WO2017214968A1 - Method and apparatus for convolutional neural networks

Info

Publication number: WO2017214968A1
Application number: PCT/CN2016/086152
Authority: WO (WIPO (PCT))
Prior art keywords: elements, channel, region, pooling, convolutional neural
Other languages: French (fr)
Inventor: Jiale CAO
Original Assignee: Nokia Technologies Oy; Nokia Technologies (Beijing) Co., Ltd.
Application filed by Nokia Technologies Oy and Nokia Technologies (Beijing) Co., Ltd.
Priority to PCT/CN2016/086152
Publication of WO2017214968A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/42 - Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G06V10/431 - Frequency domain transformation; Autocorrelation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449 - Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451 - Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454 - Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Abstract

Embodiments of the present disclosure relate to a method and apparatus for convolutional neural networks. In example embodiments, a method implemented at a pooling layer of a convolutional neural network comprises receiving a channel including a two-dimensional array of elements, the elements being associated with a feature extracted from data being processed by the convolutional neural network. The method further comprises dividing the channel into regions, each region including a sub-array of the array. The method further comprises calculating magnitudes of Fourier transformed elements of each region. The method further comprises reshaping the calculated magnitudes of respective regions into respective vectors to complete pooling of the feature.

Description

METHOD AND APPARATUS FOR CONVOLUTIONAL NEURAL NETWORKS
FIELD
Embodiments of the present disclosure generally relate to the field of convolutional neural networks, and in particular, to a method and apparatus for a pooling layer of a convolutional neural network.
BACKGROUND
Convolutional neural networks (CNNs) have achieved state-of-the-art performance in applications such as image recognition, object detection, and acoustic recognition. Representative applications of convolutional neural networks include, but are not limited to, AlphaGo, Advanced Driver Assistance Systems (ADAS), self-driving cars, Optical Character Recognition (OCR), face recognition, large-scale image classification (for example, ImageNet classification), and Human Machine Interaction (HCI). Convolutional neural networks are mainly organized as interleaved layers of two types: convolutional layers and pooling (subsampling) layers, with a convolutional layer (or several convolutional layers) followed by a pooling layer.
Pooling is a process that replaces the output of the corresponding convolutional layer at a certain location with a summary statistic of the nearby outputs. Pooling over spatial regions helps make the feature representation shift invariant and also improves the computational efficiency of the convolutional neural network.
Currently, max-pooling and average pooling (also called sum pooling) are the two dominant pooling methods. The max-pooling operation outputs the maximum value within a rectangular neighborhood; the average pooling operation outputs the average value within a rectangular neighborhood. In some scenarios, max-pooling and average pooling can be combined.
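For concreteness, a minimal NumPy sketch of these two operations over non-overlapping neighborhoods might look as follows (the helper name block_pool and the 2×2 window size are illustrative, not from the source):

```python
import numpy as np

def block_pool(x, n, op):
    """Pool a 2-D array over non-overlapping n x n neighborhoods with op."""
    h, w = x.shape
    assert h % n == 0 and w % n == 0, "array must tile evenly into n x n blocks"
    # View x as a grid of n x n blocks, then reduce each block to one value.
    blocks = x.reshape(h // n, n, w // n, n)
    return op(blocks, axis=(1, 3))

x = np.arange(16, dtype=float).reshape(4, 4)
max_pooled = block_pool(x, 2, np.max)   # maximum of each 2 x 2 neighborhood
avg_pooled = block_pool(x, 2, np.mean)  # average of each 2 x 2 neighborhood
```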
In addition, stochastic pooling has been proposed, in which the deterministic pooling operation is replaced with a stochastic procedure that randomly picks an activation within each pooling region according to a multinomial distribution. In another traditional pooling method, pooling is formulated as a kind of convolution. Moreover, spatial pyramid pooling has been proposed to deal with training images of different sizes.
SUMMARY
In general, example embodiments of the present disclosure provide a method and apparatus for a pooling layer of a convolutional neural network.
In a first aspect, a method implemented at a pooling layer of a convolutional neural network is provided. The method comprises receiving a channel including a two-dimensional array of elements, the elements being associated with a feature extracted from data being processed by the convolutional neural network. The method further comprises dividing the channel into regions, each region including a sub-array of the array. The method further comprises calculating magnitudes of Fourier transformed elements of each region. The method further comprises reshaping the calculated magnitudes of respective regions into respective vectors to complete pooling of the feature.
In some embodiments, the receiving may comprise receiving the channel from a convolutional layer of the convolutional neural network.
In some embodiments, the dividing may comprise dividing the channel into non-overlapping regions.
In some embodiments, each region may have a same quantity of elements.
In some embodiments, each region may have n×n elements, where n is an integer equal to or greater than two.
In some embodiments, the reshaping comprises arranging the calculated magnitudes along the channel dimension.
In some embodiments, the arranging may comprise arranging the calculated magnitudes of the respective regions in a same order.
In some embodiments, the data may be associated with at least one of image recognition, acoustic recognition, object detection, Advanced Driver Assistance Systems (ADAS), self-driving cars, Optical Character Recognition (OCR), face recognition, large-scale image classification, and Human Machine Interaction (HCI).
In a second aspect, an apparatus implemented at a pooling layer of a convolutional neural network is provided. The apparatus comprises at least one processor and at least one memory including computer program code. The at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to perform: receiving a channel including a two-dimensional array of elements, the elements being associated with a feature extracted from data being processed by the convolutional neural network; dividing the channel into regions, each region including a sub-array of the array; calculating magnitudes of Fourier transformed elements of each region; and reshaping the calculated magnitudes of respective regions into respective vectors to complete pooling of the feature.
In some embodiments, the at least one memory and the computer program code may further be configured to, with the at least one processor, cause the apparatus to perform receiving the channel from a convolutional layer of the convolutional neural network.
In some embodiments, the at least one memory and the computer program code may further be configured to, with the at least one processor, cause the apparatus to perform dividing the channel into non-overlapping regions.
In some embodiments, each region may have a same quantity of elements.
In some embodiments, each region may have n×n elements, where n is an integer equal to or greater than two.
In some embodiments, the at least one memory and the computer program code may further be configured to, with the at least one processor, cause the apparatus to perform: reshaping the calculated magnitudes of respective regions by arranging the calculated magnitudes along the channel dimension.
In some embodiments, the at least one memory and the computer program code may further be configured to, with the at least one processor, cause the apparatus to perform: arranging the calculated magnitudes of the respective regions in a same order.
In some embodiments, the data may be associated with at least one of image recognition, acoustic recognition, object detection, Advanced Driver Assistance Systems (ADAS), self-driving cars, Optical Character Recognition (OCR), face recognition, large-scale image classification, and Human Machine Interaction (HCI).
In a third aspect, an apparatus is provided. The apparatus comprises means for performing a method according to the first aspect.
In a fourth aspect, a computer program product is provided. The computer program product comprises at least one computer readable non-transitory memory medium having program code stored thereon. The program code, when executed by an apparatus, causes the apparatus to perform a method according to the first aspect.
Compared to existing pooling methods, the proposed pooling method and apparatus can achieve complete shift-invariance. Moreover, instead of representing a region with a single value, the proposed pooling method and apparatus represent a region with a vector, and thus they do not lose useful information.
It is to be understood that the summary section is not intended to identify key or essential features of embodiments of the present disclosure, nor is it intended to be used to limit the scope of the present disclosure. Other features of the present disclosure will become easily comprehensible through the following description.
BRIEF DESCRIPTION OF THE DRAWINGS
Through the more detailed description of some embodiments of the present disclosure in the accompanying drawings, the above and other objects, features and advantages of the present disclosure will become more apparent, wherein:
Fig. 1 is a schematic diagram of an environment in which embodiments of the present disclosure can be implemented;
Fig. 2 is a flowchart illustrating a method implemented at a pooling layer of a convolutional neural network in accordance with embodiments of the present disclosure;
Fig. 3 is a schematic diagram illustrating a method implemented at a pooling layer of a convolutional neural network in accordance with embodiments of the present disclosure;
Fig. 4 is a block diagram illustrating an apparatus implemented at a pooling layer of a convolutional neural network in accordance with an embodiment of the present disclosure; and
Fig. 5 is a block diagram illustrating an apparatus implemented at a pooling layer of a convolutional neural network in accordance with another embodiment of the present disclosure.
Throughout the drawings, the same or similar reference numerals represent the same or similar elements.
DETAILED DESCRIPTION
Hereinafter, the principle and spirit of the present disclosure will be described with reference to illustrative embodiments. It should be understood that all these embodiments are given merely for those skilled in the art to better understand and further practice the present disclosure, not to limit its scope. For example, features illustrated or described as part of one embodiment may be used with another embodiment to yield still a further embodiment. In the interest of clarity, not all features of an actual implementation are described in this specification.
References in the specification to “one embodiment”, “an embodiment”, “an example embodiment”, etc. indicate that the embodiment described may include a particular feature, structure, or characteristic, but it is not necessary that every embodiment include the particular feature, structure, or characteristic. Moreover, such phrases do not necessarily refer to the same embodiment. Further, when a particular feature, structure, or characteristic is described in connection with an embodiment, it is submitted that it is within the knowledge of one skilled in the art to effect such a feature, structure, or characteristic in connection with other embodiments, whether or not explicitly described.
It shall be understood that, although the terms “first” and “second” etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element could be termed a second element, and similarly, a second element could be termed a first element, without departing from the scope of example embodiments. As used herein, the term “and/or” includes any and all combinations of one or more of the associated listed terms.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments. As used herein, the singular forms “a”, “an” and “the” are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms “comprises”, “comprising”, “has”, “having”, “includes” and/or “including”, when used herein, specify the presence of stated features, elements, and/or components etc., but do not preclude the presence or addition of one or more other features, elements, components and/or combinations thereof.
In the following description and claims, unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
The example embodiments of the present disclosure will now be described with reference to Figs. 1-5. It is to be understood that these embodiments are described for the purpose of illustration only and help those skilled in the art to understand and implement the present disclosure, without suggesting any limitations as to the scope of the disclosure. The disclosure described herein can be implemented in various manners other than those described below.
Fig. 1 shows a schematic diagram of an environment in which embodiments of the present disclosure can be implemented. As shown in Fig. 1, an example framework of a convolutional neural network 100 may include an input color image 110, a first convolutional layer 120, a first pooling layer 130, a second convolutional layer 140, a second pooling layer 150, and a classification unit 160.
Although an image recognition scenario is depicted as an example application of the convolutional neural network 100, it is to be understood that the convolutional neural network 100 can also be applied in any other suitable environment as described above. For example, the convolutional neural network 100 may be equivalently or similarly applied to acoustic recognition, object detection, Advanced Driver Assistance Systems (ADAS), self-driving cars, Optical Character Recognition (OCR), face recognition, large-scale image classification, Human Machine Interaction (HCI), and so on.
Further, although the convolutional neural network 100 is depicted as comprising two convolutional layers and two pooling layers, the convolutional neural network 100 may have more or fewer convolutional layers and pooling layers. Additionally, the convolutional neural network 100 may include other layers or units which are not shown in Fig. 1 for simplicity.
In the image recognition process as shown in Fig. 1, after the input color image 110 is fed to the convolutional neural network 100, a first convolution operation 115 may be performed on the input color image 110 and the first convolutional layer 120 may be generated accordingly. Then, a first pooling operation 125 may be performed on the first convolutional layer 120, and the first pooling layer 130 may thus be generated. In the same manner, a second convolution operation 135 may be performed on the first pooling layer 130 to generate the second convolutional layer 140, and a second pooling operation 145 may be performed on the second convolutional layer 140 to generate the second pooling layer 150. Finally, the second pooling layer 150 may be classified by the classification unit 160, completing the image recognition process.
The pooling operations performed at the first pooling layer 130 and the second pooling layer 150 play two roles. First, they make the feature representation invariant to small shifts of an input: even if the input is shifted by a small amount, the values of most of the pooled outputs do not change. Second, by utilizing a single value to represent a region consisting of several elements, the pooling operation subsamples the input layers, for example, to reduce their spatial size.
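To illustrate the first role, the toy check below (an illustrative setup, not from the source) shifts a one-dimensional input by one element; the max-pooled outputs are unchanged as long as each maximum stays inside its pooling window:

```python
import numpy as np

def max_pool_1d(v, n=3):
    # Maximum over non-overlapping windows of width n.
    return v.reshape(-1, n).max(axis=1)

x = np.array([0.0, 0.0, 9.0, 0.0, 0.0, 9.0, 0.0, 0.0, 9.0])
print(max_pool_1d(x))              # [9. 9. 9.]
print(max_pool_1d(np.roll(x, 1)))  # [9. 9. 9.] -- unchanged by the small shift
```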
Example embodiments of the present disclosure aim at presenting a completely shift-invariant pooling algorithm. Further, with the method and apparatus in accordance with example embodiments of the present disclosure, not only is the full shift-invariance property obtained, but also no information is discarded. In particular, the first pooling layer 130 and/or the second pooling layer 150 may employ the pooling method as proposed herein to improve the performance of the convolutional neural network 100. This will be discussed in the following paragraphs. First of all, the pooling method as proposed herein will be described in detail with reference to Fig. 2.
Fig. 2 shows a flowchart illustrating a method 200 implemented at a pooling layer of a convolutional neural network in accordance with embodiments of the present disclosure. The method 200 can be implemented by the first pooling layer 130 and/or the second pooling layer 150 as shown in Fig. 1, for example.
In step 202, a channel including a two-dimensional array of elements is received, the elements being associated with a feature extracted from data being processed by the convolutional neural network. In some embodiments, the channel may be a feature map of a convolutional layer associated with the pooling layer at which the method 200 is implemented. The feature may be extracted from the data by the convolutional layer. In these embodiments, step 202 may further comprise receiving the channel from a convolutional layer of the convolutional neural network.
In some embodiments, the data may be associated with at least one of image recognition, acoustic recognition, object detection, Advanced Driver Assistance Systems (ADAS), self-driving cars, Optical Character Recognition (OCR), face recognition, large-scale image classification, and Human Machine Interaction (HCI). This may depend on the specific application of the convolutional neural network.
In step 204, the channel is divided into regions, each region including a sub-array of the array. The pooling operation will be based on these regions: it will be performed region by region, and the elements in a region will be pooled together in the pooling of that region. Step 204 may further comprise dividing the channel into non-overlapping regions. In this manner, the performance of the pooling operation may be improved and its complexity reduced. In some embodiments, each region may have a same quantity of elements, which may further improve the performance and reduce the complexity of the pooling operation. In other embodiments, each region may have n×n elements, where n is an integer equal to or greater than two.
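As a sketch of this dividing step, assuming the channel's height and width are multiples of n, the usual reshape-and-swap trick yields the non-overlapping n×n regions (the helper name is illustrative, not from the source):

```python
import numpy as np

def divide_into_regions(channel, n):
    """Split an H x W channel into non-overlapping n x n regions.

    Returns shape (H//n, W//n, n, n): one n x n sub-array per region,
    indexed by the region's row and column in the channel.
    """
    h, w = channel.shape
    assert h % n == 0 and w % n == 0, "channel must tile evenly into n x n regions"
    return channel.reshape(h // n, n, w // n, n).swapaxes(1, 2)
```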
In step 206, magnitudes of Fourier transformed elements of each region are calculated. The Fourier transform may be performed on elements of each region, and the same quantity of Fourier transformed elements may thus be generated. Each Fourier transformed element may have a magnitude and a phase. In some embodiments, both the magnitudes and phases of Fourier transformed elements of each region may be calculated, and then the phases may be dropped in step 206.
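In NumPy terms, this step is a 2-D FFT per region followed by taking absolute values; a minimal sketch, reusing the divide_into_regions output above:

```python
import numpy as np

def region_magnitudes(regions):
    """2-D Fourier transform of each n x n region; keep magnitudes, drop phases."""
    spectra = np.fft.fft2(regions, axes=(-2, -1))  # transform the last two axes
    magnitudes = np.abs(spectra)                   # |F(u, v)| -- kept
    # phases = np.angle(spectra)                   # arg F(u, v) -- dropped here
    return magnitudes
```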
In step 208, the calculated magnitudes of respective regions are reshaped into respective vectors to complete pooling of the feature. It is to be understood that the quantity of the calculated magnitudes of a region is equal to that of the elements in the region. Thus, if there are a number of elements in a region, there are the same number of calculated magnitudes in the region. This number of calculated magnitudes may be reshaped into a vector, and the dimensionality of the vector is the quantity of the calculated magnitudes.
In some embodiments, step 208 may further comprise arranging the calculated magnitudes along the channel dimension, so that different calculated magnitudes of a region belong to different channels. In these embodiments, step 208 may further comprise arranging the calculated magnitudes of the respective regions in a same order. For example, if two calculated magnitudes belong to two different regions but are in a same channel, their related elements are located in the same position (such as the same row and the same column) in the two different regions.
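Putting steps 202 to 208 together, a minimal sketch of the whole pooling operation for one channel might read as follows; flattening each region's n×n magnitudes in row-major order and moving them to a leading channel axis keeps the magnitudes of all regions in the same order, as described above (the name ft_pool is illustrative):

```python
import numpy as np

def ft_pool(channel, n):
    """Fourier-transform pooling of one H x W channel with n x n regions.

    Returns shape (n*n, H//n, W//n): n*n output channels, each holding
    one spectral magnitude per region.
    """
    h, w = channel.shape
    regions = channel.reshape(h // n, n, w // n, n).swapaxes(1, 2)  # step 204
    mags = np.abs(np.fft.fft2(regions))                             # step 206
    # Step 208: row-major flatten of each region's n x n magnitudes,
    # moved to a leading channel axis so that magnitude k of every
    # region lands in output channel k (same order for all regions).
    return mags.reshape(h // n, w // n, n * n).transpose(2, 0, 1)
```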
It can be seen that the pooling method proposed herein achieves complete shift-invariance due to the inherent shift-invariant nature of the Fourier transform magnitude. Moreover, instead of representing a region with a single value, the proposed pooling method represents a region with a vector. Therefore, the proposed pooling method does not lose useful information.
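The shift-invariance claim can be checked numerically: a circular shift of the input only multiplies each Fourier coefficient by a unit-modulus phase factor, so the magnitudes are untouched. A small illustrative check:

```python
import numpy as np

rng = np.random.default_rng(0)
region = rng.standard_normal((3, 3))
shifted = np.roll(region, shift=(1, 2), axis=(0, 1))  # circular shift

m_original = np.abs(np.fft.fft2(region))
m_shifted = np.abs(np.fft.fft2(shifted))
print(np.allclose(m_original, m_shifted))  # True: magnitudes ignore the shift
```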
Fig. 3 shows a schematic diagram 300 illustrating a method implemented at a pooling layer of a convolutional neural network in accordance with embodiments of the present disclosure. As shown in Fig. 3, a channel (for example, a feature map) 310 of a convolutional layer of a convolutional neural network may be inputted to a pooling layer of the convolutional neural network. As depicted, the channel 310 may have a two-dimensional array of elements 311. For example, the two-dimensional array in Fig. 3 is a 9×6 array, namely there are 9 rows and 6 columns of elements 311 in the channel 310. This is merely an example implementation without suggesting any limitations as to the scope of the present disclosure.
In a dividing step, which is depicted by the dotted lines, the elements 311 of the channel 310 may be divided into regions 312, and the elements 311 of the channel 310 are pooled region by region. As an example, each region 312 in Fig. 3 may include 3×3 elements, and the regions 312 may not overlap. Therefore, the channel 310 may be divided into 6 regions 312. Again, this is merely an example implementation without suggesting any limitations as to the scope of the present disclosure.
In step 315, the Fourier transform may be applied to each region 312 in order to calculate the magnitudes of the Fourier transformed elements of each region 312. In step 320, the 3×3 magnitudes of the result of the Fourier transform may be calculated. In step 325, the 3×3 magnitudes may be reshaped into a 9-dimensional vector (3×3 = 9). The quantity 3×3 is merely an example implementation without suggesting any limitations as to the scope of the present disclosure.
As shown in Fig. 3, after the reshaping step 325, the elements 311 in each region 312 may be reshaped into respective 9-dimensional vectors 313 along the channel dimension. Finally, the pooled result may be 9 new channels 330, and each channel 330 may include 6 respective magnitudes of the 6 respective regions 312. The particular quantities here are merely an example implementation without suggesting any limitations as to the scope of the present disclosure.
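Using the ft_pool sketch above, the quantities of Fig. 3 can be reproduced shape-wise (the array contents are arbitrary):

```python
import numpy as np

# Assumes the ft_pool sketch defined earlier in this description.
channel = np.arange(54, dtype=float).reshape(9, 6)  # the 9 x 6 channel of Fig. 3
pooled = ft_pool(channel, n=3)
print(pooled.shape)  # (9, 3, 2): 9 new channels, each with 6 magnitudes
```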
Referring back to Fig. 1, as described above, the first pooling layer 130 and/or the second pooling layer 150 as shown in Fig. 1 may employ the pooling method as proposed herein to improve the performance of the convolutional neural network 100. As in traditional convolutional neural networks, the proposed Fourier Transform based pooling operation (FT-pooling operation) may be followed by a convolution operation.
As shown in Fig. 1, an H×W×3 color image 110 may be inputted to the convolutional neural network 100. In other words, the color image 110 may have H rows, W columns, and 3 channels. The color image 110 may be convolved with an a1×b1×3 filter in a first convolutional operation 115, and the resulting first convolutional layer 120 may contain k1 channels (feature maps), with each channel being of size H×W. The first convolutional layer 120 may be processed by a first FT-pooling operation 125. Because the pooling size may be n×n, the resulting first pooling layer 130 may have k2 = n²×k1 channels, each of size (H/n) × (W/n).
Then, the first pooling layer 130 may be convolved with an a2×b2×k2 filter in a second convolutional operation 135, and the resulting second convolutional layer 140 may contain k3 channels (feature maps), with each channel being of size (H/n) × (W/n). Subsequently, the second convolutional layer 140 may be processed by a second FT-pooling operation 145. Because the pooling size is n×n, the resulting second pooling layer 150 may have k4 = n²×k3 channels, each of size (H/n²) × (W/n²). This process may continue until a predefined number of convolutional and pooling layers is obtained. Finally, a classification may be conducted by the classification unit 160 at the last layer.
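The shape bookkeeping above can be traced programmatically; the sketch below simply applies the two stated rules (a convolution sets the channel count to its number of filters, and an n×n FT-pooling multiplies the channel count by n·n while dividing each spatial side by n). The sizes in the example call are illustrative:

```python
def ftpool_shapes(h, w, c, n, filters):
    """Trace (channels, height, width) through conv / FT-pool pairs."""
    shapes = [(c, h, w)]
    for k in filters:
        shapes.append((k, h, w))        # convolution with k filters, same H x W
        c, h, w = n * n * k, h // n, w // n
        shapes.append((c, h, w))        # FT-pooling: n*n more channels, smaller map
    return shapes

# E.g. a 36 x 36 RGB image, 3 x 3 pooling, conv layers with k1=8 and k3=16 filters:
print(ftpool_shapes(36, 36, 3, 3, [8, 16]))
# [(3, 36, 36), (8, 36, 36), (72, 12, 12), (16, 12, 12), (144, 4, 4)]
```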
By employing the Fourier Transform based pooling method in accordance with example embodiments of the present disclosure, the performance of the convolutional neural network 100 can be greatly improved, because the Fourier Transform based pooling method is completely shift-invariant and does not lose useful information.
Fig. 4 shows a block diagram illustrating an apparatus 400 implemented at a pooling layer of a convolutional neural network in accordance with an embodiment of the present disclosure. The apparatus 400 may be operable to implement the example method 200 described with reference to Fig. 2 and possibly any other processes or methods. It is to be understood that the method 200 is not necessarily implemented by the apparatus 400. At least some steps of the method 200 may be performed by one or more other entities.
Particularly, as illustrated in Fig. 4, the apparatus 400 includes a receiving unit 401, a dividing unit 402, a calculating unit 403, and a reshaping unit 404. The receiving unit 401 is configured to receive a channel including a two-dimensional array of elements, the elements being associated with a feature extracted from data being processed by the convolutional neural network. The dividing unit 402 is configured to divide the channel into regions, each region including a sub-array of the array. The calculating unit 403 is configured to calculate magnitudes of Fourier transformed elements of each region. The reshaping unit 404 is configured to reshape the calculated magnitudes of respective regions into respective vectors to complete pooling of the feature.
In some embodiments, the receiving unit 401 may further be configured to receive the channel from a convolutional layer of the convolutional neural network. The dividing unit 402 may further be configured to divide the channel into non-overlapping regions. In some embodiments, each region may have a same quantity of elements. In some other embodiments, each region may have n×n elements, where n is an integer equal to or greater than two.
In some embodiments, the reshaping unit 404 may further be configured to arrange the calculated magnitudes along the channel dimension. In some other embodiments, the reshaping unit 404 may further be configured to arrange the calculated magnitudes of the respective regions in a same order.
In some embodiments, the data is associated with at least one of image recognition, acoustic recognition, object detection, Advanced Driver Assistance Systems (ADAS), self-driving cars, Optical Character Recognition (OCR), face recognition, large-scale image classification, and Human Machine Interaction (HCI).
Fig. 5 shows a block diagram illustrating an apparatus 500 implemented at a pooling layer of a convolutional neural network in accordance with another embodiment of the present disclosure. The apparatus 500 may be operable to implement the example method 200 described with reference to Fig. 2 and possibly any other processes or methods. It is to be understood that the method 200 is not necessarily implemented by the apparatus 500. At least some steps of the method 200 may be performed by one or more other entities.
The apparatus 500 may include at least one processor 501, such as a data processor (DP), and at least one memory 502 coupled to the processor 501. The memory 502 may be a non-transitory machine readable storage medium and may store a program (PROG) 503. The PROG 503 may include instructions that, when executed on the associated processor 501, enable the apparatus 500 to operate in accordance with the embodiments of the present disclosure, for example to perform the method 200. The embodiments herein may be implemented by computer software executable by the processor 501 of the apparatus 500, or by hardware, or by a combination of software and hardware. A combination of the at least one processor 501 and the at least one memory 502 may form processing means 510 adapted to implement various embodiments of the present disclosure, for example the method 200.
The memory 502 may be of any type suitable to the local technical environment and may be implemented using any suitable data storage technology, such as semiconductor based memory devices, magnetic memory devices and systems, optical memory devices and systems, fixed memory and removable memory, as non-limiting examples. While only one memory 502 is shown in the apparatus 500, there may be several physically distinct memory modules in the apparatus 500. The processor 501 may be of any type suitable to the local technical environment, and may include one or more of general purpose computers, special purpose computers, microprocessors, digital signal processors (DSPs) and processors based on multicore processor architecture, as non-limiting examples. The apparatus 500 may have multiple processors, such as an application specific integrated circuit chip that is slaved in time to a clock which synchronizes the main processor.
Generally, various embodiments of the present disclosure may be implemented in hardware or special purpose circuits, software, logic or any combination thereof. Some aspects may be implemented in hardware, while other aspects may be implemented in firmware or software which may be executed by a controller, microprocessor or other computing device. While various aspects of embodiments of the present disclosure are illustrated and described as block diagrams, flowcharts, or using some other pictorial representation, it will be appreciated that the blocks, apparatus, systems, techniques or methods described herein may be implemented in, as non-limiting examples, hardware, software, firmware, special purpose circuits or logic, general purpose hardware or controller or other computing devices, or some combination thereof.
By way of example, embodiments of the present disclosure can be described in the general context of machine-executable instructions, such as those included in program modules, being executed in a device on a target real or virtual processor. Generally, program modules include routines, programs, libraries, objects, classes, components, data structures, or the like that perform particular tasks or implement particular abstract data types. The functionality of the program modules may be combined or split between program modules as desired in various embodiments. Machine-executable instructions for program modules may be executed within a local or distributed device. In a distributed device, program modules may be located in both local and remote storage media.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. The program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowcharts and/or block diagrams to be implemented. The program code may execute entirely on a machine, partly on the machine as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of this disclosure, a machine readable medium may be any tangible medium that may contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine readable medium may be a machine readable signal medium or a machine readable storage medium. A machine readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of the machine readable storage medium would include an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are contained in the above discussions, these should not be construed as limitations on the scope of the present disclosure, but rather as descriptions of features that may be specific to particular embodiments. Certain features that are described in the context of separate embodiments may also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment may also be implemented in multiple embodiments separately or in any suitable sub-combination.
Although the present disclosure has been described in language specific to structural features and/or methodological acts, it is to be understood that the present disclosure defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (18)

  1. A method implemented at a pooling layer of a convolutional neural network, comprising:
    receiving a channel including a two-dimensional array of elements, the elements being associated with a feature extracted from data being processed by the convolutional neural network;
    dividing the channel into regions, each region including a sub-array of the array;
    calculating magnitudes of Fourier transformed elements of each region; and
    reshaping the calculated magnitudes of respective regions into respective vectors to complete pooling of the feature.
  2. The method of claim 1, wherein the receiving comprises:
    receiving the channel from a convolutional layer of the convolutional neural network.
  3. The method of claim 1, wherein the dividing comprises:
    dividing the channel into non-overlapping regions.
  4. The method of claim 3, wherein each region has a same quantity of elements.
  5. The method of claim 4, wherein each region has n×n elements, and n is an integer equal to or greater than two.
  6. The method of claim 1, wherein the reshaping comprises:
    arranging the calculated magnitudes along the channel dimension.
  7. The method of claim 6, wherein the arranging comprises:
    arranging the calculated magnitudes of the respective regions in a same order.
  8. The method of claim 1, wherein the data is associated with at least one of image recognition, acoustic recognition, object detection, advanced driver assistance system, self-driving cars, optical character recognition, face recognition, large-scale image classification, and human machine interaction.
  9. An apparatus implemented at a pooling layer of a convolutional neural network, comprising:
    at least one processor; and
    at least one memory including computer program code; wherein
    the at least one memory and the computer program code are configured to, with the at least one processor, cause the apparatus at least to:
    receive a channel including a two-dimensional array of elements, the elements being associated with a feature extracted from data being processed by the convolutional neural network;
    divide the channel into regions, each region including a sub-array of the array;
    calculate magnitudes of Fourier transformed elements of each region; and
    reshape the calculated magnitudes of respective regions into respective vectors to complete pooling of the feature.
  10. The apparatus of claim 9, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus to:
    receive the channel from a convolutional layer of the convolutional neural network.
  11. The apparatus of claim 9, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus to:
    divide the channel into non-overlapping regions.
  12. The apparatus of claim 11, wherein each region has a same quantity of elements.
  13. The apparatus of claim 12, wherein each region has n×n elements, and n is an integer equal to or greater than two.
  14. The apparatus of claim 9, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus to:
    reshape the calculated magnitudes of respective regions by arranging the calculated magnitudes along the channel dimension.
  15. The apparatus of claim 14, wherein the at least one memory and the computer program code are further configured to, with the at least one processor, cause the apparatus to:
    arrange the calculated magnitudes of the respective regions in a same order.
  16. The apparatus of claim 9, wherein the data is associated with at least one of image recognition, acoustic recognition, object detection, advanced driver assistance system, self-driving cars, optical character recognition, face recognition, large-scale image classification, and human machine interaction.
  17. An apparatus comprising means for performing a method according to any of Claims 1 to 8.
  18. A computer program product comprising at least one computer readable non-transitory memory medium having program code stored thereon, wherein the program code, when executed by an apparatus, causes the apparatus to perform a method according to any of Claims 1 to 8.
PCT/CN2016/086152 2016-06-17 2016-06-17 Method and apparatus for convolutional neural networks WO2017214968A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/086152 WO2017214968A1 (en) 2016-06-17 2016-06-17 Method and apparatus for convolutional neural networks

Publications (1)

Publication Number Publication Date
WO2017214968A1 (en) 2017-12-21

Family

ID=60663846

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/086152 WO2017214968A1 (en) 2016-06-17 2016-06-17 Method and apparatus for convolutional neural networks

Country Status (1)

Country Link
WO (1) WO2017214968A1 (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140288928A1 (en) * 2013-03-25 2014-09-25 Gerald Bradley PENN System and method for applying a convolutional neural network to speech recognition
US20160104056A1 (en) * 2014-10-09 2016-04-14 Microsoft Technology Licensing, Llc Spatial pyramid pooling networks for image processing
CN104463172A (en) * 2014-12-09 2015-03-25 中国科学院重庆绿色智能技术研究院 Face feature extraction method based on face feature point shape drive depth model

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108427921A (en) * 2018-02-28 2018-08-21 辽宁科技大学 A kind of face identification method based on convolutional neural networks
CN110275147B (en) * 2018-03-13 2022-01-04 中国人民解放军国防科技大学 Human behavior micro-Doppler classification and identification method based on migration depth neural network
CN110275147A (en) * 2018-03-13 2019-09-24 中国人民解放军国防科技大学 Human behavior micro-Doppler classification and identification method based on migration depth neural network
WO2019184657A1 (en) * 2018-03-30 2019-10-03 腾讯科技(深圳)有限公司 Image recognition method, apparatus, electronic device and storage medium
US11609968B2 (en) 2018-03-30 2023-03-21 Tencent Technology (Shenzhen) Company Ltd Image recognition method, apparatus, electronic device and storage medium
CN108446666A (en) * 2018-04-04 2018-08-24 平安科技(深圳)有限公司 The training of binary channels neural network model and face comparison method, terminal and medium
CN109146058A (en) * 2018-07-27 2019-01-04 中国科学技术大学 With the constant ability of transformation and the consistent convolutional neural networks of expression
CN109146058B (en) * 2018-07-27 2022-03-01 中国科学技术大学 Convolutional neural network with transform invariant capability and consistent expression
WO2020087742A1 (en) * 2018-11-02 2020-05-07 深圳云天励飞技术有限公司 Processing element, apparatus and method used for implementing convolution operation
CN110751181A (en) * 2019-09-23 2020-02-04 华中科技大学 Target identification method based on sum pooling characteristics
CN111330255B (en) * 2020-01-16 2021-06-08 北京理工大学 Amazon chess-calling generation method based on deep convolutional neural network
CN111330255A (en) * 2020-01-16 2020-06-26 北京理工大学 Amazon chess-calling generation method based on deep convolutional neural network
CN111645073A (en) * 2020-05-29 2020-09-11 武汉理工大学 Robot visual semantic navigation method, device and system
CN115047296A (en) * 2022-08-15 2022-09-13 四川轻化工大学 Power distribution network fault section positioning method
CN115047296B (en) * 2022-08-15 2022-10-25 四川轻化工大学 Power distribution network fault section positioning method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16905083

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16905083

Country of ref document: EP

Kind code of ref document: A1