WO2016090520A1 - A method and a system for image classification - Google Patents

A method and a system for image classification

Info

Publication number
WO2016090520A1
Authority
WO
WIPO (PCT)
Prior art keywords
convolutional
pooling
kernel
map
error
Prior art date
Application number
PCT/CN2014/001115
Other languages
French (fr)
Inventor
Xiaogang Wang
Hongsheng LI
Rui Zhao
Original Assignee
Xiaogang Wang
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiaogang Wang filed Critical Xiaogang Wang
Priority to CN201480083906.2A priority Critical patent/CN107004142B/en
Priority to PCT/CN2014/001115 priority patent/WO2016090520A1/en
Publication of WO2016090520A1 publication Critical patent/WO2016090520A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects


Abstract

Disclosed is an apparatus for image classification. The apparatus comprises a converter and a forward propagator. The converter is configured to retrieve a convolutional neural network with a plurality of convolutional layers and a plurality of pooling layers connected to the convolutional layers. The forward propagator is configured to feed an image into the convolutional neural network to predict classes of all pixels in the image. The converter further comprises first and second converting units. The first converting unit is configured to insert all-zero rows and columns to the convolutional kernel of the convolutional layers such that every two neighboring entries are separated from each other. The second converting unit is configured to insert unmasked rows and columns to the pooling kernel of the pooling layers such that every two neighboring entries are separated from each other. The apparatus also comprises a backward propagator to update the convolutional kernels in the converted convolutional neural network. The present application also discloses a method for image classification.

Description

A METHOD AND A SYSTEM FOR IMAGE CLASSIFICATION
Technical Field
The present application relates to a method for image classification and a system thereof.
Background
The goal of pixel-wise classification is to classify all pixels in an image into different classes. Pixel-wise classification tasks include image segmentation and object detection, which require inputting image patches into a classifier and outputting the class labels for their central pixels.
Convolutional Neural Networks (CNNs) are trainable multistage feed-forward neural networks. They have been extensively investigated to extract good hierarchical feature representations for image classification tasks. The input and output of each layer are called feature maps. The CNN generally comprises convolution layers, pooling layers and non-linearity layers. The convolution layer convolves input feature maps with 3D filter banks to generate output feature maps. Each filter extracts the same type of local features at all locations of the input feature map. The pooling layer decreases the resolution of the feature maps to make the output feature maps less sensitive to input shift and distortions. Max-pooling and average-pooling are most commonly used. The non-linearity layer is a point-wise non-linear function applied to each entry of the feature maps.
After extracting features with a multilayer convolutional network, fully connected layers with a final classifier are added to output class predictions. Given training samples and their labels, the parameters of CNNs are learned in an end-to-end supervised way by minimizing a loss function on training data. Forward and backward propagation is used to make class predictions for input samples and to update CNN parameters based on prediction errors, respectively.
However, forward and backward propagation was originally designed for  whole-image classification. Directly applying it to pixel-wise classification in a patch-by-patch scanning manner is extremely inefficient, because surrounding patches of pixels have large overlaps, which lead to a lot of redundant computation.
Summary
The present application aims to eliminate the redundant computation of forward and backward propagation in CNN-based pixel-wise classification and thereby achieve a significant speedup.
In one aspect of the present application, disclosed is an apparatus for image classification. The apparatus may comprise a converter configured to convert a convolutional neural network with a plurality of convolutional layers and a plurality of pooling layers connected to the convolutional layers. The converter may comprise a first converting unit configured to insert all-zero rows and columns to a convolutional kernel of each of the convolutional layers such that every two neighboring entries in the convolutional kernel are separated from each other, and a second converting unit configured to insert unmasked rows and columns to a pooling kernel of each of the pooling layers such that every two neighboring entries in the pooling kernel are separated from each other. The apparatus may further comprise a forward propagator configured to feed an image into the converted convolutional neural network to predict classes of all pixels in the image.
In one embodiment, the apparatus may further comprise a backward propagator. The backward propagator may be configured to update parameters of the convolutional kernel in the converted convolutional neural network.
In one embodiment, the apparatus may further comprise a chooser. The chooser may be configured to choose errors of pixels of interest, the errors being back-propagated through the converted convolutional neural network so as to update parameters of the convolutional kernel.
In another aspect of the present application, disclosed is a method for image classification. The method may comprise converting a convolutional neural network  with a plurality of convolutional layers and a plurality of pooling layers connected to the convolutional layers, and feeding an image into the converted convolutional neural network to predict classes of all pixels in the image. The step of converting may comprise inserting all-zero rows and columns to a convolutional kernel of each of the convolutional layers such that every two neighboring entries in the convolutional kernel are separated from each other, and inserting unmasked rows and columns to a pooling kernel of each of the pooling layers such that every two neighboring entries in the pooling kernel are separated from each other.
In one embodiment, the method may further comprise a step of updating parameters of the convolutional kernel in the converted convolutional neural network.
In one embodiment, the method may further comprise a step of choosing errors of pixels of interest, and back-propagating errors through the converted convolutional neural network so as to update the parameters of the convolutional kernel.
Brief Description of the Drawing
Exemplary non-limiting embodiments of the present invention are described below with reference to the attached drawings. The drawings are illustrative and generally not to an exact scale. The same or similar elements on different figures are referenced with the same reference numbers.
Fig. 1 is a schematic diagram illustrating an exemplary apparatus according to one embodiment of the present application.
Fig. 2 is a schematic diagram illustrating an exemplary forward propagator according to one embodiment of the present application.
Fig. 3 is a schematic diagram illustrating another exemplary forward propagator according to one embodiment of the present application.
Fig. 4 is a schematic diagram illustrating an exemplary chooser according to  one embodiment of the present application.
Fig. 5 is a schematic diagram illustrating an exemplary backward propagator according to one embodiment of the present application.
Fig. 6 is a schematic diagram illustrating another exemplary backward propagator according to one embodiment of the present application.
Fig. 7 is a schematic diagram illustrating yet another exemplary backward propagator according to one embodiment of the present application.
Fig. 8 is a schematic flowchart illustrating an exemplary method for image classification according to one embodiment of the present application.
Fig. 9 is a schematic flowchart illustrating the steps for converting an original CNN to a converted CNN according to one embodiment of the present application.
Fig. 10 is a schematic view illustrating inserting all-zero rows and columns to the convolutional kernel Wk and pooling kernel Pk with d=2 and d=3, respectively.
Fig. 11 is a schematic flowchart illustrating the steps for forward propagating according to one embodiment of the present application.
Fig. 12 is a schematic view illustrating performing convolution as matrix multiplication with the converted convolutional kernel.
Fig. 13 is a schematic flowchart illustrating the choosing step according to one embodiment of the present application.
Fig. 14 is a schematic flowchart illustrating the steps for backward propagating according to one embodiment of the present application.
Fig. 15 is a comparison of patch-by-patch scanning for CNN based  pixel-wise classification and the advanced method disclosed in the present application.
Detailed Description
Reference will now be made in detail to some specific embodiments of the invention including the best modes contemplated by the inventors for carrying out the invention. Examples of these specific embodiments are illustrated in the accompanying drawings. While the invention is described in conjunction with these specific embodiments, it will be understood that it is not intended to limit the invention to the described embodiments. On the contrary, it is intended to cover alternatives, modifications, and equivalents as may be included within the spirit and scope of the invention as defined by the appended claims. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. The present invention may be practiced without some or all of these specific details. In other instances, well-known process operations have not been described in detail in order not to unnecessarily obscure the present invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
As will be appreciated by one skilled in the art, the present invention may be embodied as a system, method or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit," "module" or "system." Furthermore, the present invention may take the form of a computer program product embodied in any tangible medium of expression having computer-usable program code embodied in the medium.
It is further understood that the use of relational terms such as first and second, and the like, if any, are used solely to distinguish one from another entity, item, or action without necessarily requiring or implying any actual such relationship or order between such entities, items or actions.
Much of the inventive functionality and many of the inventive principles when implemented, are best supported with or in software or integrated circuits (ICs) , such as a digital signal processor and software therefore or application specific ICs. It is expected that one of ordinary skill, notwithstanding possibly significant effort and many design choices motivated by, for example, available time, current technology, and economic considerations, when guided by the concepts and principles disclosed herein will be readily capable of generating such software instructions or ICs with minimal experimentation. Therefore, in the interest of brevity and minimization of any risk of obscuring the principles and concepts according to the present invention, further discussion of such software and ICs, if any, will be limited to the essentials with respect to the principles and concepts used by the preferred embodiments.
Fig. 1 is a schematic diagram illustrating an exemplary apparatus 100 for image classification consistent with some disclosed embodiments. As shown, the apparatus 100 may comprise a converter 10 and a forward propagator 20. The converter 10 is configured to retrieve a convolutional neural network with a plurality of convolutional layers and a plurality of pooling layers connected to the convolutional layers. The forward propagator may be configured to feed an image into the converted convolutional neural network to generate a predicted label map for the image classification. In one embodiment of the present application, the converter 10 may comprise a first converting unit 11 and a second converting unit 12. The first converting unit 11 may be configured to insert all-zero rows and columns to a convolutional kernel of each of the convolutional layers such that every two neighboring entries in the convolutional kernel are separated from each other. The second converting unit 12 may be configured to insert unmasked rows and columns to  a pooling kernel of each of the pooling layers such that every two neighboring entries in the pooling kernel are separated from each other. In some embodiments, the two neighboring entries are separated from each other by several pixels.
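As an illustration of the two converting units, the following minimal Python/NumPy sketch (not part of the original disclosure; the function names, the single 2-D kernel setting, and the boolean-mask representation of the pooling kernel are assumptions) builds a converted convolutional kernel with all-zero rows and columns inserted and a converted pooling kernel with unmasked rows and columns inserted, so that originally neighboring entries become d pixels apart.

```python
import numpy as np

def dilate_conv_kernel(W, d):
    """Insert all-zero rows/columns into a 2-D convolutional kernel so that
    entries that were neighbors become d pixels apart (first converting unit)."""
    kh, kw = W.shape
    out = np.zeros(((kh - 1) * d + 1, (kw - 1) * d + 1), dtype=W.dtype)
    out[::d, ::d] = W
    return out

def dilate_pool_mask(kh, kw, d):
    """Build a boolean mask for a pooling kernel with unmasked (False) rows/columns
    inserted between the originally neighboring masked entries (second converting unit)."""
    mask = np.zeros(((kh - 1) * d + 1, (kw - 1) * d + 1), dtype=bool)
    mask[::d, ::d] = True
    return mask

# Example: a 3x3 convolutional kernel converted with d = 2 becomes a 5x5 kernel whose
# non-zero entries sit two pixels apart; a 2x2 pooling kernel with d = 3 becomes a 4x4
# mask whose masked entries sit three pixels apart (cf. Fig. 10).
W = np.arange(1, 10, dtype=float).reshape(3, 3)
print(dilate_conv_kernel(W, 2))
print(dilate_pool_mask(2, 2, 3).astype(int))
```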
Referring to Fig. 1, to enable the convolutional neural network to work effectively, the apparatus 100 further comprises a backward propagator 30 for calculating the gradients of parameters of the modified CNN. In the present embodiment, the backward propagator 30 may be configured to update the parameters of the convolutional kernel in the converted convolutional neural network. In some embodiments, the apparatus 100 further comprises a chooser 40, which calculates the errors of the predicted label map and chooses only the errors of pixels of interest for training CNN parameters. In the present embodiment, the chooser 40 may be configured to choose errors of pixels of interest, the errors being back-propagated through the converted convolutional neural network so as to update the parameters of the convolutional kernel.
Fig. 2 is a schematic diagram illustrating an exemplary forward propagator 20. As shown, the forward propagator 20 may comprise a first extracting unit 21, a first vectorizing unit 22, and a first convolution unit 23, wherein the first extracting unit 21 is configured to extract feature values specified by non-zero entries in the converted convolutional kernel from each neighborhood in an input feature of images to be classified, the first vectorizing unit 22 is configured to vectorize the non-zero entries of the converted convolutional kernel, and the first convolution unit 23 is configured to perform convolution on the feature values extracted by the first extracting unit and the non-zero entries vectorized by the first vectorizing unit to generate an output feature map, which may be used in the CNN as an intermediate result.
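A minimal sketch of the forward pass through one converted convolution layer, assuming a single-channel input map and taking the un-dilated kernel entries as input (these are exactly the non-zero entries of the converted kernel); the helper name and the valid-convolution boundary handling are assumptions of the sketch. It gathers the feature values addressed by the non-zero kernel entries from every neighborhood (first extracting unit), vectorizes the kernel entries (first vectorizing unit), and performs the convolution as a single matrix multiplication (first convolution unit).

```python
import numpy as np

def conv_with_dilated_kernel(X, W, d):
    """Forward pass of one converted convolution layer on a single-channel map X.
    Only the non-zero entries of the dilated kernel are gathered, so the cost is
    the same as convolving with the original (un-dilated) kernel."""
    kh, kw = W.shape                      # original (un-dilated) kernel size
    H, w_in = X.shape
    out_h = H - (kh - 1) * d              # "valid" output size with stride 1
    out_w = w_in - (kw - 1) * d
    # First extracting unit: one row per neighborhood, one column per non-zero entry.
    cols = np.empty((out_h * out_w, kh * kw), dtype=X.dtype)
    idx = 0
    for i in range(kh):
        for j in range(kw):
            patch = X[i * d:i * d + out_h, j * d:j * d + out_w]
            cols[:, idx] = patch.ravel()
            idx += 1
    # First vectorizing unit: vectorize the non-zero entries of the converted kernel.
    w_vec = W.ravel()
    # First convolution unit: convolution performed as a matrix-vector multiplication.
    out = cols @ w_vec
    return out.reshape(out_h, out_w)

X = np.random.rand(8, 8)
W = np.random.rand(3, 3)
Y = conv_with_dilated_kernel(X, W, d=2)   # 4x4 output feature map
print(Y.shape)
```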
Fig. 3 is a schematic diagram illustrating another exemplary forward propagator 20’. As shown in Fig. 3, the forward propagator 20’ may comprise a second extracting unit 24, and a calculating unit 25, wherein the second extracting unit 24 is configured to extract feature values specified by masked entries in the converted pooling kernel from each neighborhood in an input feature of images to be classified, and the calculating unit 25 is configured to calculate a mean value for an average pooling layer in said plurality of pooling layers or a max value for a max pooling layer in said plurality of pooling layers from the feature values extracted in the second extracting unit to generate an output feature map. As well known in the art, the pooling layer may be a layer in a convolutional neural network and can be at any layer of the CNN, and the average pooling layer calculates the mean value from the feature values extracted from each neighborhood of the input feature map. As to the max pooling layer, it may calculate the max value from the feature values extracted from each neighborhood of the input feature map.
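A corresponding sketch for one converted pooling layer, again assuming a single-channel input map; only the values at the masked entries of the dilated pooling kernel are gathered (second extracting unit), and either their mean (average pooling) or their maximum (max pooling) is taken (calculating unit). The function name and the "valid" boundary handling are assumptions of the sketch.

```python
import numpy as np

def pool_with_dilated_mask(X, kh, kw, d, mode="max"):
    """Forward pass of one converted pooling layer on a single-channel map X.
    Only values at the masked entries of the dilated pooling kernel are used."""
    H, w_in = X.shape
    out_h = H - (kh - 1) * d
    out_w = w_in - (kw - 1) * d
    # Second extracting unit: gather the values at the masked kernel entries.
    vals = np.stack([X[i * d:i * d + out_h, j * d:j * d + out_w]
                     for i in range(kh) for j in range(kw)], axis=0)
    # Calculating unit: mean for average pooling, max for max pooling.
    if mode == "avg":
        return vals.mean(axis=0)
    out = vals.max(axis=0)
    # For max pooling, the argmax indices would also be recorded here for use
    # during backward propagation (cf. the first transferring unit).
    return out

X = np.random.rand(8, 8)
print(pool_with_dilated_mask(X, 2, 2, d=3).shape)   # (5, 5)
```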
It should be understood that, in some embodiments, the forward propagator 20 may comprise a first extracting unit 21, a first vectorizing unit 22, a first convolution unit 23, a second extracting unit 24, and a calculating unit 25. It should be understood that, although one forward propagator 20/20’ is shown in Figs. 2 and 3, there may be more than one forward propagators 20/20’ in other embodiments.
Fig. 4 is a schematic diagram illustrating an exemplary chooser 40. As shown in Fig. 4, the chooser 40 may comprise a comparer 41, which is configured to compare a predicted label map generated in the forward propagator 20 with a ground-truth label map to obtain pixel-wise errors for the label map.
In some embodiments, the chooser 40 may further comprise a multiplier 42, which is configured to multiply each of the pixel-wise errors with a pixel-of-interest mask to generate a masked map for the errors.
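A minimal sketch of the chooser, assuming the pixel-wise error is a simple difference between the predicted and ground-truth label maps (the disclosure only specifies that the maps are compared) and that the pixel-of-interest mask is 1 at pixels of interest and 0 elsewhere.

```python
import numpy as np

def choose_errors(predicted_label_map, ground_truth_map, pixel_of_interest_mask):
    """Comparer 41: pixel-wise errors between prediction and ground truth.
    Multiplier 42: keep only the errors of pixels of interest."""
    pixel_errors = predicted_label_map - ground_truth_map      # comparer (assumed error measure)
    masked_errors = pixel_errors * pixel_of_interest_mask      # multiplier
    return masked_errors

pred = np.random.rand(4, 4)
gt = np.random.rand(4, 4)
mask = (np.random.rand(4, 4) > 0.5).astype(float)   # 1 at pixels of interest
print(choose_errors(pred, gt, mask))
```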
Fig. 5 is a schematic diagram illustrating an exemplary backward propagator 30. As shown in Fig. 5, the backward propagator 30 may comprise a third extracting unit 31, a second vectorizing unit 32, and a second convolution unit 33. The third extracting unit 31 is configured to extract feature values specified by non-zero entries in the converted convolutional kernel from each neighborhood in an input feature of images to be classified, the second vectorizing unit 32 is configured to vectorize the error map received from the chooser 40 or the error map of the next layer, and the second convolution unit 33 is configured to perform convolution on the feature values extracted by the third extracting unit and the error map vectorized by the second vectorizing unit 32 to calculate the gradients of the convolutional kernel for updating the convolutional kernel.
In some embodiments, the backward propagator 30 further comprises a third vectorizing unit 321, a fourth extracting unit 311, and a third convolution unit 331. The third vectorizing unit 321 may be configured to rotate the non-zero entries of the converted convolutional kernel by a certain degree and vectorize the rotated non-zero entries. The fourth extracting unit 311 may be configured to extract feature values specified by the rotated non-zero entries. The third convolution unit 331 is configured to perform convolution on the feature values extracted by the fourth extracting unit 311 and the non-zero entries vectorized by the third vectorizing unit 321 to generate an error map, which is propagated backward through the network layer by layer to update the convolutional kernels of former convolutional layers.
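A sketch of the backward pass through one converted convolution layer for a single channel. The kernel gradient is obtained by correlating the gathered input values with the error map of the next layer (third extracting unit, second vectorizing unit, second convolution unit), and the error map handed to the previous layer is obtained with the 180-degree-rotated kernel (third vectorizing unit, fourth extracting unit, third convolution unit). The explicit loops and the zero-padding are implementation choices of the sketch, not prescribed by the disclosure.

```python
import numpy as np

def conv_backward_dilated(X, W, delta, d):
    """Backward pass of one converted convolution layer (single channel).
    Returns the kernel gradient, the bias gradient and the error map for the previous layer."""
    kh, kw = W.shape
    out_h, out_w = delta.shape
    # Kernel gradient: correlate the gathered input values with the error map.
    dW = np.zeros_like(W)
    for i in range(kh):
        for j in range(kw):
            patch = X[i * d:i * d + out_h, j * d:j * d + out_w]
            dW[i, j] = np.sum(patch * delta)
    db = delta.sum()                       # bias gradient for this channel
    # Error map of the current layer: spread the (zero-padded) error map with the
    # 180-degree rotated kernel, keeping the same d-pixel spacing between entries.
    P = (kh - 1) * d
    W_rot = W[::-1, ::-1]
    delta_pad = np.pad(delta, P)
    dX = np.zeros_like(X)
    for i in range(kh):
        for j in range(kw):
            dX += W_rot[i, j] * delta_pad[i * d:i * d + X.shape[0],
                                          j * d:j * d + X.shape[1]]
    return dW, db, dX

X = np.random.rand(8, 8)
W = np.random.rand(3, 3)
delta = np.random.rand(4, 4)               # error map of the next layer
dW, db, dX = conv_backward_dilated(X, W, delta, d=2)
print(dW.shape, dX.shape)                   # (3, 3) (8, 8)
```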
Fig. 6 is a schematic diagram illustrating another exemplary backward propagator 30’. As shown in Fig. 6, the backward propagator 30’ may comprise a first transferring unit 34, and a first accumulating unit 35. The first transferring unit 34 may be configured to transfer the error value of the error map received from the chooser 40 or the error map of a next layer to a corresponding entry on the error map of a current layer, whose indices are recorded in the forward propagator 20, and the first accumulating unit 35 may be configured to accumulate the transferred error values at each entry of the error map of the current layer.
Fig. 7 is a schematic diagram illustrating yet another exemplary backward propagator 30”. As shown in Fig. 7, the backward propagator 30” may comprise a dividing unit 36, a second transferring unit 37, and a second accumulating unit 38. The dividing unit 36 may be configured to divide each error value on the error map received from the chooser 40 or the error map of a next layer by the number of masked entries in the pooling kernel. The second transferring unit 37 may be configured to transfer the divided error values back to the neighborhood on the error map of a current layer, whose indices are recorded in the forward propagator 20. The second accumulating unit 38 may be configured to accumulate the transferred error values at each entry of the error map of the current layer.
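A sketch of the two pooling backward passes. For max pooling the sketch recomputes the argmax from the input map instead of reading indices recorded during forward propagation, which selects the same entries; for average pooling each error value is divided by the number of masked entries and spread back over its neighborhood. The function names and the single-channel setting are assumptions.

```python
import numpy as np

def max_pool_backward_dilated(X, delta, kh, kw, d):
    """Max pooling backward: each error value is transferred to the entry that produced
    the maximum during forward propagation (first transferring unit), and the transferred
    values are accumulated (first accumulating unit)."""
    out_h, out_w = delta.shape
    dX = np.zeros_like(X)
    for p in range(out_h):
        for q in range(out_w):
            rows = [p + i * d for i in range(kh)]          # masked entries of this neighborhood
            cols = [q + j * d for j in range(kw)]
            vals = np.array([[X[r, c] for c in cols] for r in rows])
            i_max, j_max = np.unravel_index(np.argmax(vals), vals.shape)
            dX[rows[i_max], cols[j_max]] += delta[p, q]
    return dX

def avg_pool_backward_dilated(delta, in_shape, kh, kw, d):
    """Average pooling backward: divide each error value by the number of masked entries
    (dividing unit), transfer it back to the whole neighborhood (second transferring unit),
    and accumulate (second accumulating unit)."""
    out_h, out_w = delta.shape
    dX = np.zeros(in_shape, dtype=delta.dtype)
    share = delta / (kh * kw)
    for i in range(kh):
        for j in range(kw):
            dX[i * d:i * d + out_h, j * d:j * d + out_w] += share
    return dX

X = np.random.rand(8, 8)
delta = np.random.rand(5, 5)               # error map of the next layer (2x2 kernel, d=3)
print(max_pool_backward_dilated(X, delta, 2, 2, d=3).shape)
print(avg_pool_backward_dilated(delta, X.shape, 2, 2, d=3).shape)
```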
Fig. 8 is a schematic flowchart illustrating an exemplary method 200 for image classification. As shown, the method 200 may comprise the following steps. At step 210, a convolutional neural network with a plurality of convolutional layers and a plurality of pooling layers connected to the convolutional layers is retrieved. At step 220, an image is fed into the convolutional neural network to predict classes of all pixels in the image. At step 260, errors of pixels of interest are chosen and back-propagated through the converted convolutional neural network so as to update the convolutional kernels. At step 240, the convolutional kernels in the converted convolutional neural network are updated.
In the present embodiment, the convolutional neural network is retrieved or obtained by converting an original CNN that takes image patches as input to a converted CNN that is able to take a whole image as input. Particularly, some parameters, such as a convolutional kernel Wk of a convolution layer, a pooling kernel Pk of a pooling layer, and a stride dk of a layer, are converted by the steps shown in Fig. 9. In step 211, d and k are initially set to 1. In step 212, the type of the layer k is determined. If the layer is neither a convolution layer nor a pooling layer, the conversion process goes to step 214. If the layer is a convolution layer in step 212, the method proceeds to step 213 at which the convolutional kernel Wk is converted by inserting all-zero rows and columns to the convolutional kernel Wk such that every two neighboring entries are d pixels away from each other. If the layer is a pooling layer at step 212, the pooling kernel Pk is converted by inserting unmasked rows and columns to the kernel Pk at step 213 such that every two neighboring entries are d pixels away from each other. Then the process proceeds to step 214 at which whether the current layer is the last layer is determined. If yes, the process goes to step 215 at which the process ends. If not, k is increased by 1 (i.e., k=k+1), and the process goes back to step 212 to process the next layer.
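A sketch of the Fig. 9 conversion loop, reusing the hypothetical dilate_conv_kernel and dilate_pool_mask helpers from the earlier sketch. The text does not state how d evolves from layer to layer; the sketch assumes d is multiplied by each converted layer's original stride, which is consistent with all converted layers being run with stride 1 during propagation (steps 224 and 229).

```python
import numpy as np

def convert_cnn(layers):
    """Sketch of the Fig. 9 conversion loop. `layers` is a list of dicts with keys
    'type' ('conv' | 'pool' | other), 'kernel' and 'stride'."""
    d = 1                                     # step 211: d and k start at 1
    for layer in layers:                      # steps 212-214: process layer k
        if layer['type'] == 'conv':
            layer['kernel'] = dilate_conv_kernel(layer['kernel'], d)   # step 213
        elif layer['type'] == 'pool':
            kh, kw = layer['kernel'].shape
            layer['kernel'] = dilate_pool_mask(kh, kw, d)              # step 213
        # other layer types (e.g. non-linearity layers) are left unchanged
        if layer['type'] in ('conv', 'pool'):
            d *= layer['stride']              # assumption: d accumulates the original strides
            layer['stride'] = 1               # converted layers then run with stride 1
    return layers

layers = [
    {'type': 'conv', 'kernel': np.ones((3, 3)), 'stride': 2},
    {'type': 'pool', 'kernel': np.ones((2, 2)), 'stride': 2},
    {'type': 'conv', 'kernel': np.ones((3, 3)), 'stride': 1},
]
converted = convert_cnn(layers)
print([l['kernel'].shape for l in converted])   # [(3, 3), (3, 3), (9, 9)]
```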
Fig. 10 is an illustration of inserting all-zero rows and columns to the convolutional kernel Wk and pooling kernel Pk with d=2 and d=3, respectively.
Fig. 11 is a schematic flowchart illustrating the forward propagation according to one embodiment of the present application, which may be carried out by the forward propagator 20 mentioned above. As shown in Fig. 11, the forward propagation starts from step 221 at which an image is set as the input feature map and k is set to 1. Then the process goes to step 222 at which the type of the current layer (i.e., layer k) is determined. If the current layer is neither a convolution layer nor a pooling layer, for example, the layer is a non-linearity layer, the method goes to step 223 at which the operation is performed in its original way. If the current layer is a convolution layer with the converted convolutional kernel Wk and the bias vector bk, the process goes to step 224 at which the stride of the layer is set to 1, and then goes to step 225 at which feature values specified by non-zero entries in the kernel are extracted from each neighborhood in the input feature map Xk to create a matrix. Then, the process goes to step 226 at which the non-zero entries of the convolutional kernel Wk are vectorized to create a vector. Then the process goes to step 227 at which convolution is performed by matrix multiplication between the matrix and the vector created in the former steps. Then the process goes to step 228 at which the results are stored in the output feature map.
At step 222, if the current layer is a pooling layer with the converted pooling kernel Pk, the process goes to step 229 at which the stride of the layer is set to 1, and then goes to step 230 at which feature values specified by masked entries in the pooling kernel are extracted from each neighborhood in the input feature map Xk. Then, the process goes to step 231 at which a mean value for the average pooling layer or a max value for the max pooling layer is calculated from the feature values extracted at step 230. Similar to step 228, the calculation results are stored in the output feature map at step 232. Then the process proceeds to step 233 at which whether the current layer is the last layer is determined. If yes, the process goes to step 234 at which the output feature map of the last layer is output to generate a predicted label map for classifying all pixels in the image, and the process ends. If not, the process goes back to step 222 to process the next layer.
In some embodiments, the output feature map Xk+1 may be created by re-organizing the matrix multiplication result. In some embodiments, a bias value bk (i) is added to all the values for the ith channel of the output feature map Xk+1.
Fig. 12 is a schematic view illustrating performing convolution as matrix multiplication with the converted convolutional kernel.
Fig. 13 is a schematic flowchart illustrating the choosing step according to one embodiment of the present application, which may be carried out by the chooser 40 mentioned above. As shown in Fig. 13, a predicted label map generated in the forward propagator based on the output feature map is compared with a ground-truth label map to calculate pixel-wise errors for the label map in step 261, and then each of the pixel-wise errors is multiplied with a pixel-of-interest mask to generate a masked map for the errors at step 262. Finally, the masked error map is output at step 263.
Fig. 14 is a schematic flowchart illustrating the steps for backward propagating according to one embodiment of the present application, which may be carried out by the backward propagator 30 mentioned above.
As shown in Fig. 14, the backward propagation starts from step 241 at which the current error map is set as the input error map and k is set to K. Then the process goes to step 242 at which the type of the current layer (i.e., layer k) is determined. If the current layer is neither a convolution layer nor a pooling layer, for example, the layer is a non-linearity layer, the method goes to step 243 at which the operation is performed in its original way.
At step 242, if the layer k is a convolution layer with the converted convolutional kernel Wk and the bias vector bk, the process goes to step 244 at which feature values specified by non-zero entries in the converted convolutional kernel are extracted from each neighborhood in the input feature map Xk to create a matrix. Then the error map δk+1 is rotated by a certain degree, for example, 180 degrees, and vectorized to create a vector at step 245. After that, convolution is performed as matrix multiplication with the created matrix and the vector to calculate the gradients of the kernel Wk at step 246. Then the process goes to step 247 at which the results are stored as the gradients of the kernel. For the ith channel in the error map δk+1, all the error values in that error channel are summed up as the gradient of the bias bk (i).
At the same time as step 244, the kernel may be rotated by a certain degree, such as 180 degrees, and vectorized to create a vector at step 248. Then, for each neighborhood in the error map δk+1, error values specified by non-zero entries in the kernel are extracted from the error map to create a matrix at step 250. Then the process goes to step 251 at which convolution is performed as matrix multiplication with the created matrix and vector to calculate the error map of the previous layer. Finally, the results are stored in the error map δk of the previous layer, i.e., layer k-1.
At step 242, if the layer k is a max pooling layer with the modified pooling kernel Pk, then the process goes to step 249 at which, for each error value of the error map δk+1, the error value is transferred to the corresponding entry on the error map δk, whose indices are recorded during the forward propagation. Then the transferred error values are accumulated at each entry of δk. If layer k is an average pooling layer with the modified pooling kernel Pk, each error value on δk+1 is divided by the number of masked entries in the pooling kernel Pk at step 249. Then the divided values are transferred back to the neighborhood on the error map δk, whose indices are recorded during the forward propagation. Finally, the transferred error values are accumulated at each entry of δk.
After above steps, the process proceeds to step 253 at which whether the current layer is the first layer is determined. If yes, the process goes to step 254 at which the process ends. If not, k is decreased by 1 (i.e., k=k-1) , and the process goes back to step 242 to continue.
At the end of the process, the gradients of all convolutional kernels and bias  vectors are output.
Fig. 15 is a comparison of patch-by-patch scanning for CNN based pixel-wise classification and the advanced method disclosed in the present application. Compared with the conventional classification scheme, the present solution has the advantage of eliminating the redundant computation of forward and backward propagation in CNN based pixel-wise classification, and achieves a significant speedup.
The corresponding structures, materials, acts, and equivalents of all means or step plus function elements in the claims below are intended to include any structure, material, or act for performing the function in combination with other claimed elements as specifically claimed. The description of the present invention has been presented for purposes of illustration and description, but is not intended to be exhaustive or limited to the invention in the form disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the invention. The embodiment was chosen and described in order to best explain the principles of the invention and the practical application, and to enable others of ordinary skill in the art to understand the invention for various embodiments with various modifications as are suited to the particular use contemplated.

Claims (20)

  1. An apparatus for image classification, comprising:
    a converter configured to convert a convolutional neural network with a plurality of convolutional layers and a plurality of pooling layers connected to the convolutional layers, wherein the converter further comprises:
    a first converting unit configured to insert all-zero rows and columns to a convolutional kernel of each of the convolutional layers such that every two neighboring entries in the convolutional kernel are separated from each other, and
    a second converting unit configured to insert unmasked rows and columns to a pooling kernel of each of the pooling layers such that every two neighboring entries in the pooling kernel are separated from each other, and
    a forward propagator configured to feed an image into the converted convolutional neural network to generate a predicted label map for the image classification.
  2. The apparatus of claim 1, further comprising:
    a backward propagator configured to update the convolutional kernel in the converted convolutional neural network.
  3. The apparatus of claim 2, further comprising:
    a chooser configured to choose errors of pixels of interest, the errors being back-propagated through the converted convolutional neural network so as to update the convolutional kernel.
  4. The apparatus of claim 1, wherein the forward propagator further comprises:
    a first extracting unit configured to extract feature values specified by non-zero entries in the converted convolutional kernel from each neighborhood in an input feature of an image to be classified;
    a first vectorizing unit configured to vectorize the non-zero entries of the converted convolutional kernel; and
    a first convolution unit configured to perform convolution on the feature values extracted by the first extracting unit and the non-zero entries vectorized by the first vectorizing unit to generate a feature map to be outputted from the convolutional layers.
  5. The apparatus of claim 1, wherein the pooling layers comprise an average pooling layer or a max pooling layer, and
    wherein the forward propagator further comprises:
    a second extracting unit configured to extract feature values specified by masked entries in the converted pooling kernel from each neighborhood in an input feature of an image to be classified; and
    a calculating unit configured to calculate a mean value for the average pooling layer or a max value for the max pooling layer from the feature values extracted in the second extracting unit to generate a feature map to be outputted from the pooling layers.
  6. The apparatus of claim 3, wherein the chooser further comprises:
    a comparer configured to compare the predicted label map with a ground-truth label map to obtain pixel-wise errors for the predicted label map; and
    a multiplier configured to multiply each of the pixel-wise errors with a pixel-of-interest mask to generate a masked error map.
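The chooser of claim 6 amounts to a per-pixel comparison followed by masking. A minimal sketch, assuming the predicted and ground-truth label maps are arrays of the same shape, the pixel-of-interest mask is 0/1, and a simple difference stands in for whatever loss derivative the network actually uses:

```python
import numpy as np

def masked_error_map(predicted, ground_truth, poi_mask):
    """Keep pixel-wise errors only at pixels of interest; everything else is zeroed out."""
    pixel_error = predicted - ground_truth     # stand-in for the pixel-wise error of the label map
    return pixel_error * poi_mask              # masked error map

pred = np.array([[0.9, 0.2], [0.4, 0.7]])
gt   = np.array([[1.0, 0.0], [0.0, 1.0]])
poi  = np.array([[1, 0], [0, 1]])              # only the diagonal pixels are of interest
print(masked_error_map(pred, gt, poi))
```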
  7. The apparatus of claim 2, wherein the backward propagator further comprises:
    a third extracting unit configured to extract feature values specified by non-zero entries in the converted convolutional kernel from each neighborhood in an input feature of an image to be classified;
    a second vectorizing unit configured to vectorize the error map received from the chooser; and
    a second convolution unit configured to perform convolution on the feature values extracted by the third extracting unit and the error map vectorized by the second vectorizing unit to calculate the gradients of the convolutional kernel for updating the convolutional kernel.
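One way to read the gradient computation of claim 7, as an illustrative sketch only: each non-zero kernel entry accumulates the product of the masked error at every output position with the feature value that entry saw at that position.

```python
import numpy as np

def conv_kernel_gradient(feature, dilated_kernel, error_map):
    """Gradient of the non-zero kernel entries, driven by the (masked) error map."""
    ys, xs = np.nonzero(dilated_kernel)            # positions of the non-zero entries
    grad = np.zeros(len(ys), dtype=float)          # one gradient value per non-zero entry
    oh, ow = error_map.shape                       # error map has the output-map size
    for i in range(oh):
        for j in range(ow):
            values = feature[i + ys, j + xs]       # extracted feature values
            grad += values * error_map[i, j]       # accumulate error-weighted features
    return grad                                    # ordered as returned by np.nonzero
```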
  8. The apparatus of claim 2, wherein the backward propagator further comprises:
    a third vectorizing unit configured to rotate the non-zero entries of the converted convolutional kernel by a certain degree and vectorize the rotated non-zero entries;
    a fourth extracting unit configured to extract feature values specified by the rotated non-zero entries; and
    a third convolution unit configured to perform convolution on the feature values extracted by the fourth extracting unit and the non-zero entries vectorized by the third vectorizing unit to generate an error map for a former layer.
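Claim 8 rotates the kernel "by a certain degree"; the sketch below assumes the usual 180-degree rotation used when propagating errors back to the preceding layer, and uses zero padding to realize a full-size convolution. This is an interpretation offered for illustration, not the claim itself.

```python
import numpy as np

def backprop_error_to_former_layer(error_map, dilated_kernel):
    """Error map of the former layer: full correlation with the 180-degree rotated kernel."""
    rotated = np.rot90(dilated_kernel, 2)                      # assumed 180-degree rotation
    kh, kw = rotated.shape
    padded = np.pad(error_map, ((kh - 1, kh - 1), (kw - 1, kw - 1)))  # zero padding
    ys, xs = np.nonzero(rotated)
    weights = rotated[ys, xs]                                  # vectorized rotated non-zero entries
    oh, ow = padded.shape[0] - kh + 1, padded.shape[1] - kw + 1
    former = np.zeros((oh, ow), dtype=float)
    for i in range(oh):
        for j in range(ow):
            former[i, j] = padded[i + ys, j + xs] @ weights    # convolution with the rotated kernel
    return former
```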
  9. The apparatus of claim 8, wherein the backward propagator further comprises:
    a first transferring unit configured to transfer the error value of the error map of a next layer to a corresponding entry on the error map of a current pooling layer of the plurality of pooling layers, whose indices are recorded in the forward propagator; and
    a first accumulating unit configured to accumulate the transferred error values at each entry of the error map of a current layer.
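For claim 9 (max pooling backward), a sketch assuming the argmax indices recorded in the pooling sketch above: each error value of the next layer is routed back to the single masked entry that produced the maximum, and errors landing on the same entry are accumulated.

```python
import numpy as np

def max_pool_backward(next_error, argmax, mask, input_shape):
    """Transfer each error to the recorded max position and accumulate overlapping transfers."""
    ys, xs = np.nonzero(mask)                     # offsets of the masked entries
    prev_error = np.zeros(input_shape, dtype=float)
    oh, ow = next_error.shape
    for i in range(oh):
        for j in range(ow):
            k = argmax[i, j]                      # index recorded during the forward pass
            prev_error[i + ys[k], j + xs[k]] += next_error[i, j]
    return prev_error
```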
  10. The apparatus of claim 8, wherein the backward propagator further comprises:
    a dividing unit configured to divide each error value on the error map of a next layer by the number of masked entries in the pooling kernel;
    a second transferring unit configured to transfer the divided error values back to the neighborhood on the error map of a current pooling layer of the plurality of pooling layers, whose indices are recorded in the forward propagator; and
    a second accumulating unit configured to accumulate the transferred error values at each entry of the error map of a current layer.
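And for claim 10 (average pooling backward), the analogous sketch: every error value is divided by the number of masked entries in the pooling kernel and spread back over the masked positions of its neighborhood, with overlaps accumulated.

```python
import numpy as np

def avg_pool_backward(next_error, mask, input_shape):
    """Spread each error, divided by the number of masked entries, back over its neighborhood."""
    ys, xs = np.nonzero(mask)
    n = len(ys)                                   # number of masked entries in the pooling kernel
    prev_error = np.zeros(input_shape, dtype=float)
    oh, ow = next_error.shape
    for i in range(oh):
        for j in range(ow):
            share = next_error[i, j] / n          # divided error value
            prev_error[i + ys, j + xs] += share   # transferred back and accumulated
    return prev_error
```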
  11. A method for image classification, comprising:
    converting a convolutional neural network with a plurality of convolutional layers and a plurality of pooling layers connected to the convolutional layers, wherein the step of converting comprises:
    inserting all-zero rows and columns to a convolutional kernel of each of the convolutional layers such that every two neighboring entries in the convolutional kernel are separated from each other, and
    inserting unmasked rows and columns to a pooling kernel of each of the pooling layers such that every two neighboring entries in the pooling kernel are separated from each other, and
    feeding an image into the converted convolutional neural network to generate a predicted label map for the image classification.
  12. The method of claim 11, further comprising:
    updating the convolutional kernel in the converted convolutional neural network.
  13. The method of claim 12, further comprising:
    choosing errors of pixels of interest, and back-propagating the errors through the converted convolutional neural network so as to update the convolutional kernel.
  14. The method of claim 11, wherein the step of feeding comprises:
    extracting feature values specified by non-zero entries in the converted convolutional kernel from each neighborhood in an input feature of an image to be classified;
    vectorizing the non-zero entries of the converted convolutional kernel; and
    performing convolution on the feature values extracted in the step of extracting and the non-zero entries vectorized in the step of vectorizing to generate a feature map to be outputted from the convolutional layers.
  15. The method of claim 11, wherein the pooling layers comprise an average pooling layer or a max pooling layer, and
    wherein the step of feeding further comprises:
    extracting feature values specified by masked entries in the converted pooling kernel from each neighborhood in an input feature of an image to be classified; and
    calculating a mean value for the average pooling layer or a max value for the max pooling layer from the feature values extracted in the step of extracting to generate a feature map to be outputted from the pooling layers.
  16. The method of claim 13, wherein the step of choosing comprises:
    comparing the predicted label map with a ground-truth label map to obtain pixel-wise errors for the predicted label map; and
    multiplying each of the pixel-wise errors with a pixel-of-interest mask to generate a masked error map.
  17. The method of claim 12, wherein the step of updating comprises:
    extracting feature values specified by non-zero entries in the converted convolutional kernel from each neighborhood in an input feature of an image to be classified;
    vectorizing the error map obtained in the step of choosing errors; and
    performing convolution on the feature values extracted in the step of extracting and the error map vectorized in the step of vectorizing to calculate the gradients of the convolutional kernel for updating the convolutional kernel.
  18. The method of claim 12, wherein the step of updating further comprises:
    rotating the non-zero entries of the converted convolutional kernel by a certain degree and vectorizing the rotated non-zero entries;
    extracting feature values specified by the rotated non-zero entries; and
    performing convolution on the feature values extracted in the step of extracting and the non-zero entries vectorized in the step of vectorizing to generate an error map for a former layer.
  19. The method of claim 18, wherein the step of updating further comprises:
    transferring the error value of the error map of a next layer to a corresponding entry on the error map of a current pooling layer of the plurality of pooling layers, whose indices are recorded during the step of feeding; and
    accumulating the transferred error values at each entry of the error map of a current layer.
  20. The method of claim 18, wherein the step of updating further comprises:
    dividing each error value on the error map of a next layer by the number of masked entries in the pooling kernel;
    transferring the divided error values back to the neighborhood on the error map of a current pooling layer of the plurality of pooling layers, whose indices are recorded during the step of feeding; and
    accumulating the transferred error values at each entry of the error map of a current layer.
PCT/CN2014/001115 2014-12-10 2014-12-10 A method and a system for image classification WO2016090520A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201480083906.2A CN107004142B (en) 2014-12-10 2014-12-10 Method and system for image classification
PCT/CN2014/001115 WO2016090520A1 (en) 2014-12-10 2014-12-10 A method and a system for image classification

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/001115 WO2016090520A1 (en) 2014-12-10 2014-12-10 A method and a system for image classification

Publications (1)

Publication Number Publication Date
WO2016090520A1 (en) 2016-06-16

Family

ID=56106391

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/001115 WO2016090520A1 (en) 2014-12-10 2014-12-10 A method and a system for image classification

Country Status (2)

Country Link
CN (1) CN107004142B (en)
WO (1) WO2016090520A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107967484A (en) * 2017-11-14 2018-04-27 中国计量大学 A kind of image classification method based on multiresolution
CN108734269A (en) * 2017-04-18 2018-11-02 三星电子株式会社 Convolutional neural network and computer-implemented method for generating a classification of an input image
CN109886404A (en) * 2019-02-01 2019-06-14 东南大学 A convolutional neural network pooling method with staggered diamond perception
US10719737B2 (en) 2018-08-23 2020-07-21 Denso International America, Inc. Image classification system for resizing images to maintain aspect ratio information
EP3687152A1 (en) * 2019-01-23 2020-07-29 StradVision, Inc. Learning method and learning device for pooling roi by using masking parameters to be used for mobile devices or compact networks via hardware optimization, and testing method and testing device using the same
CN112651420A (en) * 2019-10-11 2021-04-13 百度(美国)有限责任公司 System and method for training image classification model and method for classifying images
JP2022500786A (en) * 2019-05-21 2022-01-04 深圳市商湯科技有限公司 Shenzhen Sensetime Technology Co., Ltd. Information processing methods and devices, electronic devices, storage media and computer programs

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109726709A (en) * 2017-10-31 2019-05-07 优酷网络技术(北京)有限公司 Icon-based programming method and apparatus based on convolutional neural networks
CN109165666A (en) * 2018-07-05 2019-01-08 南京旷云科技有限公司 Multi-tag image classification method, device, equipment and storage medium
CN109102070B (en) * 2018-08-22 2020-11-24 地平线(上海)人工智能技术有限公司 Preprocessing method and device for convolutional neural network data
CN111797881A (en) * 2019-07-30 2020-10-20 华为技术有限公司 Image classification method and device
CN113850275A (en) * 2019-09-27 2021-12-28 深圳市商汤科技有限公司 Image processing method, image processing device, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544506A (en) * 2013-10-12 2014-01-29 Tcl集团股份有限公司 Method and device for classifying images on basis of convolutional neural network
CN103984959A (en) * 2014-05-26 2014-08-13 中国科学院自动化研究所 Data-driven and task-driven image classification method
CN104067314A (en) * 2014-05-23 2014-09-24 中国科学院自动化研究所 Human-shaped image segmentation method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7747070B2 (en) * 2005-08-31 2010-06-29 Microsoft Corporation Training convolutional neural networks on graphics processing units

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544506A (en) * 2013-10-12 2014-01-29 Tcl集团股份有限公司 Method and device for classifying images on basis of convolutional neural network
CN104067314A (en) * 2014-05-23 2014-09-24 中国科学院自动化研究所 Human-shaped image segmentation method
CN103984959A (en) * 2014-05-26 2014-08-13 中国科学院自动化研究所 Data-driven and task-driven image classification method

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108734269A (en) * 2017-04-18 2018-11-02 三星电子株式会社 Convolutional neural network and computer-implemented method for generating a classification of an input image
US11164071B2 (en) * 2017-04-18 2021-11-02 Samsung Electronics Co., Ltd. Method and apparatus for reducing computational complexity of convolutional neural networks
CN108734269B (en) * 2017-04-18 2024-01-09 三星电子株式会社 Convolutional neural network and computer-implemented method for generating a classification of an input image
CN107967484A (en) * 2017-11-14 2018-04-27 中国计量大学 A kind of image classification method based on multiresolution
US10719737B2 (en) 2018-08-23 2020-07-21 Denso International America, Inc. Image classification system for resizing images to maintain aspect ratio information
EP3687152A1 (en) * 2019-01-23 2020-07-29 StradVision, Inc. Learning method and learning device for pooling roi by using masking parameters to be used for mobile devices or compact networks via hardware optimization, and testing method and testing device using the same
CN109886404A (en) * 2019-02-01 2019-06-14 东南大学 A convolutional neural network pooling method with staggered diamond perception
CN109886404B (en) * 2019-02-01 2023-08-04 东南大学 Convolutional neural network pooling method for staggered diamond perception
JP2022500786A (en) * 2019-05-21 2022-01-04 深圳市商湯科技有限公司 Shenzhen Sensetime Technology Co., Ltd. Information processing methods and devices, electronic devices, storage media and computer programs
JP7140912B2 2022-09-21 深圳市商湯科技有限公司 Information processing method and device, electronic device, storage medium and computer program
CN112651420A (en) * 2019-10-11 2021-04-13 百度(美国)有限责任公司 System and method for training image classification model and method for classifying images

Also Published As

Publication number Publication date
CN107004142B (en) 2018-04-17
CN107004142A (en) 2017-08-01

Similar Documents

Publication Publication Date Title
WO2016090520A1 (en) A method and a system for image classification
US20220327355A1 (en) Sparsified Training of Convolutional Neural Networks
Sameen et al. Classification of very high resolution aerial photos using spectral-spatial convolutional neural networks
Can et al. Learning to segment medical images with scribble-supervision alone
Lin et al. Exploring context with deep structured models for semantic segmentation
US11403486B2 (en) Methods and systems for training convolutional neural network using built-in attention
Lin et al. Towards accurate binary convolutional neural network
US11461628B2 (en) Method for optimizing neural networks
Gadde et al. Superpixel convolutional networks using bilateral inceptions
US20180181867A1 (en) Artificial neural network class-based pruning
WO2016054779A1 (en) Spatial pyramid pooling networks for image processing
US20190286982A1 (en) Neural network apparatus, vehicle control system, decomposition device, and program
US20150278634A1 (en) Information processing apparatus and information processing method
EP3480689B1 (en) Hierarchical mantissa bit length selection for hardware implementation of deep neural network
EP3528181B1 (en) Processing method of neural network and apparatus using the processing method
EP3480743A1 (en) End-to-end data format selection for hardware implementation of deep neural network
US20200202514A1 (en) Image analyzing method and electrical device
CN109255382B (en) Neural network system, method and device for picture matching positioning
CN114419406A (en) Image change detection method, training method, device and computer equipment
CN113920382B (en) Cross-domain image classification method based on class consistency structured learning and related device
Suzuki et al. Superpixel convolution for segmentation
Barbu Robust contour tracking model using a variational level-set algorithm
Wei et al. Sparsifiner: Learning sparse instance-dependent attention for efficient vision transformers
Chen et al. Dual dictionary learning for mining a unified feature subspace between different hyperspectral image scenes
Hu et al. Unifying label propagation and graph sparsification for hyperspectral image classification

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14907619

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 14907619

Country of ref document: EP

Kind code of ref document: A1