WO2016141282A1 - Convolutional neural network with tree pooling and tree feature map selection - Google Patents

Convolutional neural network with tree pooling and tree feature map selection

Info

Publication number
WO2016141282A1
Authority
WO
WIPO (PCT)
Prior art keywords
tree
neural network
feature map
recited
convolutional neural
Application number
PCT/US2016/020869
Other languages
French (fr)
Inventor
Zhuowen Tu
Chen-Yu Lee
Original Assignee
The Regents Of The University Of California
Application filed by The Regents Of The University Of California filed Critical The Regents Of The University Of California
Publication of WO2016141282A1 publication Critical patent/WO2016141282A1/en

Classifications

    • G06F 18/24137 - Pattern recognition; classification techniques based on distances to prototypes; distances to cluster centroïds
    • G06F 18/24323 - Pattern recognition; classification techniques relating to the number of classes; tree-organised classifiers
    • G06N 3/045 - Computing arrangements based on biological models; neural network architectures; combinations of networks
    • G06N 3/084 - Computing arrangements based on biological models; neural network learning methods; backpropagation, e.g. using gradient descent
    • G06N 5/01 - Computing arrangements using knowledge-based models; dynamic search techniques; heuristics; dynamic trees; branch-and-bound
    • G06V 10/454 - Image or video recognition; local feature extraction using biologically inspired filters; integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V 10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning; using neural networks
    • G10L 15/063 - Speech recognition; training of speech recognition systems
    • G10L 15/16 - Speech recognition; speech classification or search using artificial neural networks


Abstract

In one aspect, there is provided a method for training a convolutional neural network. The method may include: receiving training data; utilizing the training data to train a convolutional neural network comprising a tree pooling layer, wherein the tree pooling layer applies a soft decision tree to generate one or more pooled feature maps; and providing a trained convolutional neural network comprising a tree pooling layer. Related systems, methods, and articles of manufacture are also disclosed.

Description

CONVOLUTIONAL NEURAL NETWORK WITH TREE POOLING AND TREE
FEATURE MAP SELECTION
CROSS REFERENCE TO RELATED APPLICATIONS
[0001] This application claims priority to U.S. Provisional Patent Application No. 62/128,393, filed March 4, 2015, entitled "FOREST CONVOLUTIONAL NEURAL NETWORKS," and U.S. Provisional Patent Application No. 62/222,676, filed September 23, 2015, entitled "GENERALIZING POOLING FUNCTIONS IN CONVOLUTIONAL NEURAL NETWORKS," the contents of both of which are hereby incorporated by reference in their entirety.
STATEMENT OF GOVERNMENT SPONSORED SUPPORT
[0002] Certain aspects of the present disclosure were developed with U.S. Government support under Grant Nos. NSF IIS-1360566 and NSF IIS-1360568 awarded by the National Science Foundation. The U.S. Government has certain rights in the subject matter of the present disclosure.
TECHNICAL FIELD
[0003] The subject matter disclosed herein relates to machine learning and more specifically to neural networks.
BACKGROUND
[0004] One of the foremost objectives in the development of artificial intelligence (AI) is to create a machine analog of the human brain. Ideally, AI should exhibit the ability to process complex data and evolve through learning. A convolutional neural network is one type of machine learning architecture that endeavors to emulate human perception and cognition. For example, a convolutional neural network can be used to perform tasks such as facial recognition, image search, speech recognition and translation, disease classification, and bio-marker discovery. A convolutional neural network can include multiple convolutional layers. At each convolutional layer, individual convolutional kernels are applied to data (e.g., image, speech, genome) to yield a number of feature maps. A convolutional kernel operates, in some respects, like a filter that detects a specific feature (e.g., lines, shapes, objects) in the data. As such, each feature map associated with the data may depict one or more occurrences of one particular feature in the data. The convolutional neural network can also include multiple pooling layers that alternate with the convolutional layers. At a pooling layer, each portion of a feature map from a preceding convolutional layer is subject to a pooling function. For example, a maximum function or an average function is typically applied to every portion of the feature map. This generates a pooled feature map that is a subsample of that feature map.
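For illustration only (this sketch is not part of the patent text): the conventional pooling just described can be expressed in a few lines of NumPy. The function name pool2x2 and the toy feature map are hypothetical.

```python
import numpy as np

def pool2x2(feature_map, op):
    """Subsample a feature map by applying a pooling function to each 2x2 window."""
    h, w = feature_map.shape
    pooled = np.empty((h // 2, w // 2))
    for i in range(0, h - 1, 2):
        for j in range(0, w - 1, 2):
            pooled[i // 2, j // 2] = op(feature_map[i:i + 2, j:j + 2])
    return pooled

fm = np.arange(16.0).reshape(4, 4)  # a toy 4x4 feature map
print(pool2x2(fm, np.max))          # max pooling: keeps the strongest response per window
print(pool2x2(fm, np.mean))         # average pooling: smooths the responses
```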
SUMMARY
[0005] Methods, systems, and apparatus, including computer program products, are provided for training a convolutional neural network having a tree pooling layer that applies a soft decision tree to generate pooled feature maps. In some example embodiments, there is provided a method that includes receiving training data; utilizing the training data to train a convolutional neural network comprising a tree pooling layer, wherein the tree pooling layer applies a soft decision tree to generate one or more pooled feature maps; and providing a trained convolutional neural network comprising a tree pooling layer.
[0006] In some variations, one or more of the features disclosed herein including the following features can optionally be included in any feasible combination. The convolutional neural network may include a convolutional layer configured to generate a plurality of feature maps based on the training data. The convolutional layer may generate a feature map by applying a convolutional kernel to the training data. The convolutional kernel may be adapted to detect a feature in the training data. The feature map may depict one or more occurrences of the feature in the training data.
[0007] The convolutional neural network may further include a tree feature map selection layer configured to generate at least one selected feature map based on the plurality of feature maps generated at the convolutional layer. The tree feature map selection layer may be configured to apply a soft decision tree to generate the at least one selected feature map. The soft decision tree may combine two or more of the plurality of the feature maps into the at least one selected feature map. The soft decision tree combines the two or more feature maps according to a mixing proportion which may indicate a portion of each of the two or more feature maps to include in the selected feature map.
[0008] The tree pooling layer may be configured to apply the soft decision tree to each portion of the selected feature map to generate a corresponding portion of the pooled feature map. The soft decision tree may include a plurality of leaf nodes and decision nodes, wherein each leaf node corresponds to a pooling filter to apply to a portion of the selected feature map, and wherein a decision node applies a soft splitting function that combines an output from each child node of that decision node according to a mixing proportion. The pooling filter may include one of a maximum operation, an average operation, and a stochastic operation. The mixing proportion may indicate a portion of the output from each child node to include in a combination of the outputs from the child nodes.
[0009] The convolutional neural network may further include an output layer configured to generate a training output based on the one or more pooled feature maps. Training the convolutional neural network may include determining, by backpropagation and gradient descent, one or more optimizations based on an error associated with the training output. Providing the trained convolutional neural network may include sending and/or storing the trained convolutional neural network.
[00010] Methods, systems, and apparatus, including computer program products, are provided for utilizing a trained convolutional neural network having a tree pooling layer that applies a soft decision tree to generate pooled feature maps. In some example embodiments, there is provided a method that includes receiving input data; processing the input data by utilizing a trained convolutional neural network comprising a tree pooling layer, wherein the tree pooling layer applies a soft decision tree to generate one or more pooled feature maps; and providing, as an output, a result of the processing performed by the trained convolutional neural network.
[00011] In some variations, one or more of the features disclosed herein including the following features can optionally be included in any feasible combination. The trained convolutional neural network may further include a convolutional layer, wherein the convolutional layer is configured to generate a plurality of feature maps based on the input data, and wherein the convolutional layer generates each of the plurality of feature maps by applying a convolutional kernel to the input data. The trained convolutional neural network may further include a tree feature map selection layer, wherein the tree feature map selection layer is configured to apply a soft decision tree to generate at least one selected feature map, and wherein the soft decision tree generates the at least one feature map by combining two or more of the plurality of feature maps generated at the convolutional layer.
DESCRIPTION OF THE DRAWINGS
[00012] The accompanying drawings, which are incorporated in and constitute a part of this specification, show certain aspects of the subject matter disclosed herein and, together with the description, help explain some of the principles associated with the disclosed implementations. In the drawings,
[00013] FIG. 1 depicts an example of a convolutional neural network, in accordance with some example embodiments;
[00014] FIG. 2 depicts a system diagram illustrating a system, in accordance with some example embodiments;
[00015] FIG. 3 depicts a flowchart illustrating a process for training a convolutional neural network, in accordance with some example embodiments;
[00016] FIG. 4 depicts a flowchart illustrating a process for training a convolutional neural network, in accordance with some example embodiments;
[00017] FIG. 5 depicts an example of a soft decision tree, in accordance with some embodiments; and
[00018] FIG. 6 depicts a flowchart illustrating a process for utilizing a trained convolutional neural network, in accordance with some example embodiments.
DETAILED DESCRIPTION
[00019] In some example embodiments, a convolutional neural network may be configured to include a tree feature map selection layer where feature maps from a preceding convolutional layer may be combined to generate selected feature maps. At the tree feature map selection layer, soft decision trees may be applied to make "soft" selections across multiple feature maps. A "soft" selection across two or more feature maps may combine the feature maps into a single selected feature map. As such, a selected feature map may represent a combination of two or more feature maps from the preceding convolutional layer.
[00020] In some example embodiments, a convolutional neural network may be configured to include a tree pooling layer at which multiple pooling filters may be applied to a feature map (e.g., from a preceding convolutional layer) or a selected feature map (e.g., from a preceding tree feature map selection layer). At the tree pooling layer, soft decision trees may be applied to make "soft" decisions over outputs from the different pooling filters. Each of the pooling filters may apply a different pooling operation to subsample a portion of the feature map. A "soft" decision may combine the outputs from two or more pooling filters to generate a corresponding portion of a pooled feature map. As such, the pooled feature map may represent a subsampling of the corresponding feature map.
[00021] In some example embodiments, a convolutional neural network having a tree feature map selection layer and/or a tree pooling layer may be trained. For example, a convolutional neural network may be trained in a supervised learning mode including backpropagation and gradient descent.
[00022] In some example embodiments, a trained convolutional neural network having a tree feature map selection layer and/or a tree pooling layer may be used to process input data. For example, a trained convolutional neural network may be used to process image, speech, genomic data, and/or any other type of data. The output of a trained convolutional neural network may be, for example, a classification of the input data.
[00023] FIG. 1 depicts an example of a convolutional neural network 100, in accordance with some example embodiments. Referring to FIG. 1, the convolutional neural network 100 may include a plurality of layers including a convolutional layer 120, a tree feature map selection layer 130, a tree pooling layer 140, and an output layer 150.
[00024] In some example embodiments, the convolutional neural network 100 receives input data 110 at the convolutional layer 120. At the convolutional layer 120, a plurality of convolutional kernels may be applied to the input data 110 including a first convolutional kernel 122 and a second convolutional kernel 124. For example, the first convolutional kernel 122 may be applied to the input data 110 to generate a first feature map 126 and the second convolutional kernel 124 may be applied to the input data 110 to generate a second feature map 128. A different number of feature maps may be generated at the convolutional layer 120 without departing from the scope of the present disclosure. For instance, additional convolutional kernels may be applied to the input data 110 at the convolutional layer 120 to generate additional feature maps.
[00025] Each convolutional kernel may process data like a filter that is adapted to detect a specific feature in the input data 110. For example, where the input data 110 represents an image, the first convolutional kernel 122 may be adapted to detect horizontal lines in the input data 110 while the second convolutional kernel 124 may be adapted to detect vertical lines in the input data 110. As such, the first feature map 126 may depict all instances of horizontal lines in the image and the second feature map 128 may depict all instances of vertical lines in the image. The first convolutional kernel 122 and the second convolutional kernel 124 may also be adapted to detect more complex features in the input data 110 including shapes and objects (e.g., facial features).
[00026] At the tree feature map selection layer 130, a first soft decision tree 132 may be applied to the feature maps generated at the convolutional layer 120 including the first feature map 126 and the second feature map 128. In some example embodiments, the soft decision tree 132 makes a "soft" selection across the first feature map 126 and the second feature map 128. The "soft" selection may combine the first feature map 126 and the second feature map 128 into a selected feature map 134. The first feature map 126 and the second feature map 128 may be combined according to a mixing proportion. The mixing proportion may indicate a portion (e.g., percentage) of each of the first feature map 126 and the second feature map 128 to include in the selected feature map 134. In some example embodiments, the mixing proportion may vary based on the first feature map 126 and the second feature map 128.
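As a minimal sketch of the "soft" selection just described (illustrative only; the scalar splitting parameter t and the function names are assumptions, and in the disclosed embodiments the mixing proportion may instead vary with the feature maps themselves):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def soft_select(fm1, fm2, t):
    """Blend two feature maps into one selected feature map.

    The mixing proportion p = sigmoid(t); p near 1 approaches a hard
    selection of fm1, and p near 0 a hard selection of fm2."""
    p = sigmoid(t)
    return p * fm1 + (1.0 - p) * fm2

fm1 = np.random.randn(8, 8)  # e.g., horizontal-line responses (first feature map)
fm2 = np.random.randn(8, 8)  # e.g., vertical-line responses (second feature map)
selected = soft_select(fm1, fm2, t=0.4)
```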
[00027] A second soft decision tree 142 may be applied to the selected feature map 134 at the tree pooling layer 140. The second soft decision tree 142 may be applied to individual portions (e.g., a first portion 134A) of the selected feature map 134 to generate corresponding portions (e.g., a second portion 144A) in a pooled feature map 144. Different pooling filters (e.g., maximum pooling filter, average pooling filter, stochastic pooling filter, and/or the like) may be applied to the first portion 134A. The second soft decision tree 142 may generate the second portion 144A by combining the outputs from the different pooling filters according to a mixing proportion that may vary based on the first portion 134A of the selected feature map 134. In some example embodiments, the pooled feature map 144 may be a sub-sample of the selected feature map 134 that maintains salient features from the selected feature map 134 while mitigating any noise. As such, the pooled feature map 144 may provide a more robust and compact representation of the selected feature map 134.
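A comparable sketch of tree pooling over 2x2 windows (illustrative only; a single fixed gate t is assumed here for brevity, whereas the mixing proportion described above may vary with each pooled portion):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def tree_pool(feature_map, t, k=2):
    """Pool each k x k window by softly mixing two pooling filters:
    p = sigmoid(t) weights the max output against the average output."""
    p = sigmoid(t)
    h, w = feature_map.shape
    out = np.empty((h // k, w // k))
    for i in range(0, h - k + 1, k):
        for j in range(0, w - k + 1, k):
            window = feature_map[i:i + k, j:j + k]
            out[i // k, j // k] = p * window.max() + (1 - p) * window.mean()
    return out

pooled = tree_pool(np.random.randn(8, 8), t=0.0)  # p = 0.5: an equal mix
```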
[00028] The output layer 150 may provide an output 152 that is generated based on a plurality of pooled feature maps including the pooled feature map 144. In some example embodiments, the convolutional neural network 100 may be trained based on the output 152. For example, training the convolutional neural network 100 may include generating a loss function representative of an error associated with the output 152 as back propagated from the output layer 150 through to the convolutional layer 120. Training the convolutional neural network 100 may further include determining one or more optimizations by performing gradient descent to minimize the loss function.
[00029] The convolutional neural network 100 may include additional and/or different layers without departing from the scope of the present disclosure. For example, in some example embodiments, the convolutional neural network 100 may include additional tree feature map selection layers and/or tree pooling layers. The convolutional neural network 100 may optionally include one or more conventional pooling layers in addition to one or more tree pooling layers (e.g., the tree pooling layer 140) without departing from the scope of the present disclosure. In some embodiments where the convolutional neural network 100 includes both conventional and tree pooling layers, tree pooling layers (e.g., the tree pooling layer 140) may occupy lower levels of the convolutional neural network 100 relative to conventional pooling layers, which may occupy higher levels of the convolutional neural network 100.
[00030] FIG. 2 depicts a system diagram illustrating a system 200, in accordance with some example embodiments. Referring to FIGS. 1-2, the convolutional neural network 100 may be implemented using the system 200. In some example embodiments, the system 200 may be realized in digital electronic circuitry, integrated circuitry, specially designed application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), computer hardware, firmware, software, and/or combinations thereof.
[00031] In some example embodiments, the system 200 may include one or more processors that implement a plurality of modules including a convolutional module 210, a tree feature map selection module 212, a tree pooling module 214, and an output module 216. The system 200 may include additional and/or different modules without departing from the scope of the present disclosure.
[00032] The convolutional module 210 may be configured to implement one or more convolutional layers (e.g., the convolutional layer 120) of the convolutional neural network 100. For example, the convolutional module 210 may receive the input data 110 and apply a plurality of convolutional kernels (e.g., the first convolutional kernel 122 and the second convolutional kernel 124) to generate a plurality of feature maps (e.g., the first feature map 126 and the second feature map 128).
[00033] The tree feature map selection module 212 may be configured to implement one or more tree feature map selection layers (e.g., the tree feature map selection layer 130) of the convolutional neural network 100. For example, the tree feature map selection module 212 may apply one or more soft decision trees (e.g., the first soft decision tree 132) to a plurality of feature maps (e.g., the first feature map 126 and the second feature map 128) to generate one or more selected feature maps (e.g., the selected feature map 134).
[00034] The tree pooling module 214 may be configured to implement one or more tree pooling layers (e.g., the tree pooling layer 140) of the convolutional neural network 100. For example, the tree pooling module 214 may apply one or more soft decision trees (e.g., the second soft decision tree 142) to portions (e.g., the first portion 134A) of each selected feature map (e.g., the selected feature map 134) to generate a corresponding pooled feature map (e.g., the pooled feature map 144).
[00035] The output module 216 may be configured to implement the output layer (e.g., the output layer 150) of the convolutional neural network 100. For example, the output module 216 may provide an output (e.g., the output 152) based on one or more pooled feature maps (e.g., the pooled feature map 144).
[00036] In some example embodiments, the system 200 may be configured to communicate with a device 220 (e.g., a personal computer, workstation, smartphone) via a wired and/or wireless network 230. The device 220 may provide a user interface for interacting with the system 200 including to train the system 200 and/or to utilize the system 200 to process input data. For instance, a user may provide, via the device 220, training data, input data, and/or hyperparameters (e.g., stride size in applying each convolutional kernel) for the system 200. The user may further receive outputs (e.g., the output 152) from the system 200 via the device 220.
[00037] FIG. 3 depicts a flowchart illustrating an example of a process 300 for training a convolutional neural network, in accordance with some example embodiments. Referring to FIGS. 1-3, the system 200 may perform the process 300 to train the convolutional neural network 100, which may have a tree feature map selection layer and/or a tree pooling layer.
[00038] At 302, the system 200 may receive training data. For example, the system 200 may receive training data directly from a user or from the device 220. In some example embodiments, training data may include at least one training input and a correct output corresponding to that training input.
[00039] At 304, the system 200 may utilize the training data to train a convolutional neural network having a tree feature map selection layer and/or a tree pooling layer. For example, the system 200 may train the convolutional neural network 100 by using the convolutional neural network 100 to process a plurality of training inputs. For each training input, an error associated with the training output of the convolutional neural network 100 (e.g., the output 152) relative to the corresponding correct output may be back propagated through the convolutional neural network 100 to generate a loss function. Gradient descent may be performed in order to determine one or more optimizations to the convolutional neural network 100 which would minimize the loss function. In some example embodiments, training the convolutional neural network 100 may include using the convolutional neural network 100 to process any appropriate or desired number of training inputs. As such, the system 200 may perform multiple iterations of optimizations (e.g., adjustments of weights, biases, and/or parameters) in order to generate a trained convolutional neural network 100.
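The training regime at 304 (forward pass, error, backpropagated gradients, gradient-descent updates) can be illustrated with a toy differentiable model standing in for the convolutional neural network 100; this is a hypothetical sketch, not the patented network:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))                       # toy training inputs
y = (X @ np.array([1.0, -2.0, 0.5]) > 0).astype(float)  # corresponding correct outputs

w = np.zeros(3)  # trainable parameters
lr = 0.1         # learning rate
for step in range(200):                                 # multiple optimization iterations
    pred = 1.0 / (1.0 + np.exp(-(X @ w)))               # forward pass (training output)
    loss = np.mean((pred - y) ** 2)                     # error of the training output
    grad = X.T @ ((pred - y) * pred * (1 - pred)) * (2 / len(y))  # backpropagated gradient
    w -= lr * grad                                      # gradient-descent update
```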
[00040] In some example embodiments, soft decision trees may be applied at both the tree feature map selection layer and the tree pooling layer. For example, the first soft decision tree 132 may be applied at the tree feature map selection layer 130 of the convolutional neural network 100. Meanwhile, the second soft decision tree 142 may be applied at the tree pooling layer 140 of the convolutional neural network 100. The application of soft decision trees may enable training of the convolutional neural network 100 in a supervised learning mode including backpropagation and gradient descent.
[00041] A conventional decision tree makes "hard" decisions in accordance with a splitting function that provides a discrete, non-continuous selection between different responses:

s(t_m) ∈ {0, 1}
[00042] This type of splitting function provides a "hard" decision and is not differentiable. As such, a conventional decision tree is incompatible with a training paradigm that employs techniques such as backpropagation and gradient descent. By contrast, a soft decision tree makes "soft" decisions in accordance with splitting functions that provide a continuous and differentiable selection between different responses. As such, the convolutional neural network 100, which has at least one tree feature map selection layer or tree pooling layer, may be trained in a supervised learning mode including backpropagation and gradient descent.
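A quick numerical illustration of why the soft splitting function admits gradient-based training while the hard one does not (hypothetical sketch):

```python
import numpy as np

def hard_split(t):
    return np.where(t > 0, 1.0, 0.0)  # s(t) in {0, 1}: flat almost everywhere

def soft_split(t):
    return 1.0 / (1.0 + np.exp(-t))   # continuous and differentiable

t, eps = 0.3, 1e-6
print((hard_split(t + eps) - hard_split(t - eps)) / (2 * eps))  # 0.0: no gradient signal
print((soft_split(t + eps) - soft_split(t - eps)) / (2 * eps))  # matches the line below
print(soft_split(t) * (1 - soft_split(t)))                      # analytic sigmoid derivative
```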
[00043] At 306, a trained convolutional neural network having one or more of a tree feature map selection layer and tree pooling layer may be provided. For example, a trained convolutional neural network 100 may be deployed to process actual input data and provide an output (e.g., classification of the input data). In some example embodiments, the trained convolutional neural network may be provided in any appropriate or desired manner including computer software, dedicated circuitry (e.g., ASICs), and/or over a cloud platform.
[00044] The process 300 may include additional and/or different operations than shown without departing from the scope of the present disclosure. For example, one or more operations of the process 300 may be repeated and/or omitted without departing from the scope of the present disclosure.
[00045] FIG. 4 depicts a flowchart illustrating an example of a process 400 for training a convolutional neural network, in accordance with some example embodiments. Referring to FIGS. 1-2 and 4, in some example embodiments, the process 400 may be performed by the system 200 to train the convolutional neural network 100 and may implement operation 304 of the process 300.
[00046] At 402, the system 200 may generate a plurality of feature maps by applying one or more convolutional kernels to training input data. For example, the system 200 (e.g., the convolutional module 210) may apply the first convolutional kernel 122 to the training input data to generate the first feature map 126. The system 200 may also apply the second convolutional kernel 124 to the training input data to generate the second feature map 128. In some example embodiments, the system 200 may apply any number of convolutional kernels to the training input data to generate the plurality of feature maps.
[00047] At 404, the system 200 may generate at least one selected feature map by applying a soft decision tree to at least some of the plurality of feature maps. For example, the system 200 (e.g., the tree feature map selection module 212) may apply the first soft decision tree 132 to make a "soft" selection over the first feature map 126 and the second feature map 128. Specifically, the first soft decision tree 132 may combine the first feature map 126 and the second feature map 128 to generate the selected feature map 134. The soft decision tree 132 may apply a "soft" splitting function at a decision node of the first soft decision tree 132 that combines the first feature map 126 and the second feature map 128 according to a mixing proportion. The "soft" splitting function may be a sigmoid function that determines the mixing proportion based on the first feature map 126 and the second feature map 128. The mixing proportion may indicate a portion (e.g., percentage) of the first feature map 126 and a portion of the second feature map 128 to include in the selected feature map 134.
[00048] At 406, the system 200 may generate a pooled feature map by applying a soft decision tree to each portion of a feature map. For example, the system 200 (e.g., the tree pooling module 214) may apply the second soft decision tree 142 to the first portion 134A of the selected feature map 134 to generate the corresponding second portion 144A in the pooled feature map 144. In some example embodiments, the pooled feature map 144 may provide a more robust and compact representation of the selected feature map 134.
[00049] Different pooling filters may be applied to the first portion 134A. In some example embodiments, the second soft decision tree 142 may make a "soft" decision over the outputs from the different pooling filters. For example, the second soft decision tree 142 may apply a "soft" splitting function at a decision node to combine the outputs from the different pooling filters at each leaf node. The outputs from the different pooling filters may be combined according to a mixing proportion. The mixing proportion may indicate a portion (e.g., percentage) of the output from each pooling filter to include in the combination of the outputs. Meanwhile, the "soft" splitting function may be a sigmoid function that determines the mixing proportion based on the portion of the selected feature map 134 being pooled (e.g., the first portion 134A).
[00050] At 408, the system 200 may determine a training output based at least in part on one or more pooled feature maps. For example, the system 200 (e.g., the output module 216) may generate the output 152 based at least in part on the pooled feature map 144. In some example embodiments, the training output (e.g., the output 152) may exhibit an error relative to the correct output associated with the training input data.
[00051] At 410, the system 200 may determine one or more optimizations based at least in part on the training output. In some example embodiments, the system 200 may determine optimizations to the convolutional neural network 100 based on the error associated with the training output (e.g., the output 152). The optimizations may include adjustments to parameters applied at the tree feature map selection layer 130 and the tree pooling layer 140. Both the tree feature map selection layer 130 and the tree pooling layer 140 apply soft decision trees (e.g., the first soft decision tree 132 and the second soft decision tree 142). As such, the system 200 may determine optimizations to the convolutional neural network 100 at both the tree feature map selection layer 130 and the tree pooling layer 140 using techniques such as backpropagation and gradient descent.
[00052] As shown in FIG. 1, the first soft decision tree 132 at the tree feature map selection layer 130 may receive a set of feature maps from the convolutional layer 120. The set of feature maps may be denoted as follows:
WX + B,

wherein W ∈ ℝ^(Nout×Nin) is the weight matrix of a convolutional kernel (e.g., the first convolutional kernel 122 or the second convolutional kernel 124) with input X (e.g., the input data 110) and biases B. The number of output channels is denoted by Nout and the number of input channels is denoted by Nin. The set of feature maps may be decomposed as follows:

WX + B = [W_1 X + B_1; W_2 X + B_2], wherein W_1, W_2 ∈ ℝ^((Nout/2)×Nin).

[00053] The first soft decision tree 132 may make "soft" selections over the set of feature maps WX + B (e.g., to generate the selected feature map 134) according to the following splitting function:

f(X) = p ∘ (W_1 X + B_1) + (1 - p) ∘ (W_2 X + B_2),

wherein p and (1 - p) may represent the mixing proportion used at a decision node in the first soft decision tree 132 to combine responses from the child nodes of that decision node (e.g., the first feature map 126 and the second feature map 128), 1 may represent a vector where all elements are one, ∘ denotes a Hadamard product, and p may be given by:

p = s(t),

wherein s(t) is a splitting function (e.g., a sigmoid function) that determines the mixing proportion p.
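A minimal sketch of this splitting function, assuming per-channel splitting parameters t so that p = s(t) holds one mixing proportion per output channel (the exact parameterization of p is an assumption here):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def tree_feature_select(X, W1, B1, W2, B2, t):
    """f(X) = p * (W1 X + B1) + (1 - p) * (W2 X + B2), with p = s(t)."""
    p = sigmoid(t)  # one mixing proportion per output channel
    return p * (W1 @ X + B1) + (1.0 - p) * (W2 @ X + B2)

n_in, n_half = 6, 4                       # Nin and Nout/2
rng = np.random.default_rng(0)
X = rng.standard_normal(n_in)
W1 = rng.standard_normal((n_half, n_in))  # first half of the decomposed W
W2 = rng.standard_normal((n_half, n_in))  # second half
B1, B2 = rng.standard_normal(n_half), rng.standard_normal(n_half)
t = rng.standard_normal(n_half)           # splitting parameters
selected = tree_feature_select(X, W1, B1, W2, B2, t)  # shape (Nout/2,)
```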
[00054] Optimizations to the convolutional neural network 100 may include adjusting the weights W and biases B applied at the tree feature map selection layer 130 in order to minimize an error E associated with the training output (e.g., the output 152). Specifically, adjustments may be made according to the following partial derivatives of the splitting function f(X) with respect to the weights W and biases B :
∂E/∂W_1 = (p ∘ δ) X^T,

∂E/∂W_2 = ((1 - p) ∘ δ) X^T,

∂E/∂B_1 = p ∘ δ,

∂E/∂B_2 = (1 - p) ∘ δ,

∂E/∂t = p ∘ (1 - p) ∘ δ ∘ [(W_1 X + B_1) - (W_2 X + B_2)],

wherein δ = ∂E/∂f(X) ∈ ℝ^(Nout/2).
[00055] Further backpropagation of the error E to a preceding layer (e.g., the convolutional layer 120) may be determined based on the following:

∂E/∂X = δ^T [p ∘ W_1 + (1 - p) ∘ W_2].
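The partial derivatives above translate directly into a backward pass. The following sketch mirrors this editor's reconstruction of the garbled equations, so the variable names and the per-channel t are assumptions:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def tree_select_backward(X, W1, B1, W2, B2, t, delta):
    """delta = dE/df(X); returns gradients for all layer parameters and X."""
    p = sigmoid(t)
    dW1 = np.outer(p * delta, X)        # dE/dW1 = (p * delta) X^T
    dW2 = np.outer((1 - p) * delta, X)  # dE/dW2 = ((1 - p) * delta) X^T
    dB1 = p * delta                     # dE/dB1 = p * delta
    dB2 = (1 - p) * delta               # dE/dB2 = (1 - p) * delta
    dt = p * (1 - p) * delta * ((W1 @ X + B1) - (W2 @ X + B2))  # dE/dt
    dX = W1.T @ (p * delta) + W2.T @ ((1 - p) * delta)          # dE/dX
    return dW1, dW2, dB1, dB2, dt, dX
```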
[00056] Similarly, a decision node (e.g., parent node, root node) in the second soft decision tree 142 at the tree pooling layer 140 may make "soft" decisions over outputs from that decision node's child nodes. For example, a decision node may make a "soft" decision over the outputs from the different pooling filters (e.g., maximum pooling filter, average pooling filter, stochastic pooling filter, and/or the like) that are applied at the leaf nodes of the second soft decision tree 142 to an individual portion (e.g., the first portion 134A) of a feature map (e.g., the selected feature map 134). As such, the output at each node m of the second soft decision tree 142 may be defined as follows:

h_m(x) = w_m^T x, if node m is a leaf node; and
h_m(x) = s(t_m) h_l(x) + (1 - s(t_m)) h_r(x), otherwise,

wherein w_m ∈ ℝ^N may be individual pooling filters, h_l(x) and h_r(x) may be the outputs of the left and right child nodes of node m, and the splitting function s(t_m) may be a sigmoid function that is applied at each decision node m based on the splitting parameter t_m. In some example embodiments, the splitting parameter t_m may be a weight that is applicable to an output at each decision node m. The splitting function s(t_m) may be defined as follows:

s(t_m) = 1 / (1 + e^(-t_m)).
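The recursive node output can be sketched as follows (illustrative only; the dictionary-based tree representation is an assumption):

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def node_output(node, x):
    """h_m(x): a leaf applies its pooling filter w_m; a decision node mixes
    its children's outputs with proportion s(t_m)."""
    if "w" in node:  # leaf node
        return node["w"] @ x
    p = sigmoid(node["t"])  # decision node
    return p * node_output(node["left"], x) + (1 - p) * node_output(node["right"], x)

x = np.array([1.0, 5.0, 2.0, 4.0])                  # one flattened 2x2 window
avg_leaf = {"w": np.full(4, 0.25)}                  # average pooling as a linear filter
other_leaf = {"w": np.array([0.0, 1.0, 0.0, 0.0])}  # another (learned) pooling filter
tree = {"t": 0.0, "left": avg_leaf, "right": other_leaf}
print(node_output(tree, x))                         # 0.5 * 3.0 + 0.5 * 5.0 = 4.0
```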
[00057] Optimizations to the convolutional neural network 100 may include adjusting the pooling filters wm and the splitting parameters tm applied at the tree pooling layer 140 in order to minimize the error E associated with the output 152. For example, the following pooling function f(x) denotes the output function of the second soft decision tree 142 where the second soft decision tree 142 has a single layer (e.g., a pair of leaf nodes descending directly from a root node):
f(x) = p (w_1^T x) + (1 - p)(w_2^T x),

wherein p and (1 - p) may represent the mixing proportion used at the root node of the second soft decision tree 142 to combine outputs from the different pooling filters at each leaf node.
[00058] As such, adjustments may be made according to the following partial derivatives of the pooling function f(x) with respect to the different pooling filters w_1 and w_2 and the splitting parameter t:

∂E/∂w_1 = (∂E/∂f(x)) (∂f(x)/∂w_1) = δ p x,

∂E/∂w_2 = (∂E/∂f(x)) (∂f(x)/∂w_2) = δ (1 - p) x,

∂E/∂t = (∂E/∂f(x)) (∂f(x)/∂t) = δ p (1 - p)(w_1^T x - w_2^T x),

wherein δ = ∂E/∂f(x), p may represent the value of the response from the splitting function s(t), and x is the output response from a previous layer.
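These gradients can be checked numerically. Below is a sketch of the forward and backward passes for the single-layer case, again following this editor's reconstruction of the equations:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def tree_pool_forward(x, w1, w2, t):
    p = sigmoid(t)
    return p * (w1 @ x) + (1 - p) * (w2 @ x), p

def tree_pool_backward(x, w1, w2, p, delta):
    """delta = dE/df(x); gradients per the partial derivatives above."""
    dw1 = delta * p * x
    dw2 = delta * (1 - p) * x
    dt = delta * p * (1 - p) * (w1 @ x - w2 @ x)
    dx = delta * (p * w1 + (1 - p) * w2)
    return dw1, dw2, dt, dx

# Numerical check of dE/dt with E = f(x), i.e. delta = 1:
x = np.array([1.0, 5.0, 2.0, 4.0])
w1, w2, t = np.full(4, 0.25), np.array([0.0, 1.0, 0.0, 0.0]), 0.3
eps = 1e-6
f_hi, _ = tree_pool_forward(x, w1, w2, t + eps)
f_lo, _ = tree_pool_forward(x, w1, w2, t - eps)
_, p = tree_pool_forward(x, w1, w2, t)
assert np.isclose((f_hi - f_lo) / (2 * eps), tree_pool_backward(x, w1, w2, p, 1.0)[2])
```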
[00059] The error E to be further propagated from the tree pooling layer 140 to a preceding layer (e.g., the tree feature map selection layer 130) may be defined as follows:
∂E/∂x = (∂E/∂f(x)) (∂f(x)/∂x) = δ [p w_1 + (1 - p) w_2].
[00060] The process 400 may include additional and/or different operations than shown without departing from the scope of the present disclosure. For example, one or more operations included in the process 400 may be repeated and/or omitted without departing from the scope of the present disclosure. Moreover, when training a convolutional neural network, the process 400 may be repeated any appropriate or desired number of times (e.g., using different input data) to achieve an optimal convolutional neural network.
[00061] FIG. 5 depicts an example of a soft decision tree 500, in accordance with some embodiments. Referring to FIGS. 1-5, in some example embodiments, the soft decision tree 500 may implement the second soft decision tree 142.
[00062] The soft decision tree 500 may include a plurality of nodes. As shown in FIG. 5, the soft decision tree 500 may include a plurality of child nodes. Each child node is associated with an output from the application of a different pooling filter to the first portion 134A of the selected feature map 134, including a first pooling output 510, a second pooling output 512, a third pooling output 514, and a fourth pooling output 516. Each pooling filter may apply a subsampling operation (e.g., maximum, average, stochastic, and/or the like) on the first portion 134A. As such, each of the first pooling output 510, the second pooling output 512, the third pooling output 514, and the fourth pooling output 516 may be a subsample of the first portion 134A of the selected feature map 134.
[00063] The soft decision tree 500 may further include a plurality of decision nodes including a first parent node 522, a second parent node 524, and a root node 526. Each decision node may apply a "soft" splitting function which combines the outputs from that decision node's child nodes according to a mixing proportion. The mixing proportion may indicate a portion (e.g., percentage) of the outputs from each child node to include in an output from the parent node. In some example embodiments, the first parent node 522, the second parent node 524, and the root node 526 may each apply the following splitting function:
s(t_m) = 1 / (1 + e^(-t_m)),

wherein s(t_m) may be a sigmoid function that determines the mixing proportion at each node m of the soft decision tree 500 in accordance with the splitting parameter t_m.
[00064] As such, the output at each node of the soft decision tree 500 may be denoted as follows:
h_m(x) = w_m^T x, if node m is a leaf node; and
h_m(x) = s(t_m) h_l(x) + (1 - s(t_m)) h_r(x), otherwise,

wherein w_m ∈ ℝ^N are individual pooling filters applied to each portion of the selected feature map 134 (e.g., the first portion 134A), and h_l(x) and h_r(x) are the outputs of the left and right child nodes of node m.
[00065] In some example embodiments, the soft decision tree 500 may be part of a convolutional neural network (e.g., the convolutional neural network 100). For example, the soft decision tree 500 may be applied at a tree pooling layer (e.g., the tree pooling layer 140) of the convolutional neural network. As such, training the convolutional neural network may include adjusting the pooling filters wm and the splitting parameters tm of the soft decision tree 500 in order to minimize the error E associated with an output of the convolutional neural network (e.g., the output 152). The adjustments may be made in a supervised learning mode where the error E is back propagated through the convolutional neural network to determine a loss function. Gradient descent may be performed to determine the pooling filters wm and the splitting parameters tm that would minimize the loss function.
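To make the training loop concrete, the following sketch fits the parameters of a depth-1 tree (two pooling filters w1, w2 and a splitting parameter t) by gradient descent; finite differences stand in for the analytic backpropagation rules given earlier, and all names and values are hypothetical:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def tree_out(params, x):
    """Depth-1 soft decision tree: two pooling filters mixed by s(t)."""
    w1, w2, t = params[:4], params[4:8], params[8]
    return sigmoid(t) * (w1 @ x) + (1 - sigmoid(t)) * (w2 @ x)

def num_grad(params, x, target, eps=1e-6):
    """Finite-difference gradient of the squared error E (stand-in for backprop)."""
    grad = np.zeros_like(params)
    for i in range(len(params)):
        d = np.zeros_like(params)
        d[i] = eps
        grad[i] = ((tree_out(params + d, x) - target) ** 2
                   - (tree_out(params - d, x) - target) ** 2) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
params = rng.standard_normal(9)             # w1 (4), w2 (4), splitting parameter t
x, target = rng.standard_normal(4), 1.5
for _ in range(200):                        # gradient descent on the error E
    params -= 0.1 * num_grad(params, x, target)
print((tree_out(params, x) - target) ** 2)  # error driven toward zero
```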
[00066] Although the soft decision tree 500 is shown to include two levels of decision nodes, the soft decision tree 500 can include a different number of levels of decision nodes without departing from the scope of the present disclosure. For example, in some example embodiments, the soft decision tree 500 may include a single level of decision nodes.
[00067] FIG. 6 depicts a flowchart illustrating a process 600 for utilizing a trained convolutional neural network, in accordance with some example embodiments. Referring to FIGS. 1-2 and 6, in some example embodiments, the process 600 may be performed by the system 200 to utilize the convolutional neural network 100 subsequent to training.
[00068] At 602, the system 200 may receive input data. For example, the system 200 may receive input data directly from a user or from the device 220. In some example embodiments, the input data may be any type of data including image, speech, genomic data, and/or any other type of data.
[00069] At 604, the system 200 may process the input data by at least utilizing a trained convolutional neural network having at least one of a tree feature map selection layer and a tree pooling layer. For example, the system 200 may utilize the trained convolutional neural network 100 to process the input data. The trained convolutional neural network 100 may include at least one of the tree feature map selection layer 130 and the tree pooling layer 140.
[00070] At 606, the system 200 may provide, as an output, a result of the processing performed by the trained convolutional neural network. For example, the result of the processing performed by the trained convolutional neural network 100 may be a classification of the input data. In some example embodiments, the system 200 may provide the output directly to a user or via the device 220.
[00071] The process 600 may include additional and/or different operations than shown without departing from the scope of the present disclosure. For example, one or more operations of the process 600 may be repeated and/or omitted without departing from the scope of the present disclosure.
[00072] One or more aspects or features of the subject matter described herein can be realized in digital electronic circuitry, integrated circuitry, specially designed application specific integrated circuits (ASICs), field programmable gate arrays (FPGAs), computer hardware, firmware, software, and/or combinations thereof. These various aspects or features can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which can be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device. The programmable system or computing system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
[00073] These computer programs, which can also be referred to as programs, software, software applications, applications, components, or code, include machine instructions for a programmable processor, and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the term "machine-readable medium" refers to any computer program product, apparatus and/or device, such as for example magnetic discs, optical disks, memory, and Programmable Logic Devices (PLDs), used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor. The machine-readable medium can store such machine instructions non-transitorily, such as for example as would a non-transient solid-state memory or a magnetic hard drive or any equivalent storage medium. The machine-readable medium can alternatively or additionally store such machine instructions in a transient manner, such as for example, as would a processor cache or other random access memory associated with one or more physical processor cores.
[00074] To provide for interaction with a user, one or more aspects or features of the subject matter described herein can be implemented on a computer having a display device, such as for example a cathode ray tube (CRT) or a liquid crystal display (LCD) or a light emitting diode (LED) monitor for displaying information to the user and a keyboard and a pointing device, such as for example a mouse or a trackball, by which the user may provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well. For example, feedback provided to the user can be any form of sensory feedback, such as for example visual feedback, auditory feedback, or tactile feedback; and input from the user may be received in any form, including, but not limited to, acoustic, speech, or tactile input. Other possible input devices include, but are not limited to, touch screens or other touch-sensitive devices such as single or multi-point resistive or capacitive track pads, voice recognition hardware and software, optical scanners, optical pointers, digital image capture devices and associated interpretation software, and the like.
[00075] The subject matter described herein can be embodied in systems, apparatus, methods, and/or articles depending on the desired configuration. The implementations set forth in the foregoing description do not represent all implementations consistent with the subject matter described herein. Instead, they are merely some examples consistent with aspects related to the described subject matter. Although a few variations have been described in detail above, other modifications or additions are possible. In particular, further features and/or variations can be provided in addition to those set forth herein. For example, the implementations described above can be directed to various combinations and subcombinations of the disclosed features and/or combinations and subcombinations of several further features disclosed above. In addition, the logic flows depicted in the accompanying figures and/or described herein do not necessarily require the particular order shown, or sequential order, to achieve desirable results. Other implementations may be within the scope of the following claims.

Claims

WHAT IS CLAIMED IS:
1. A method comprising:
receiving training data;
utilizing the training data to train a convolutional neural network comprising a tree pooling layer, wherein the tree pooling layer applies a soft decision tree to generate one or more pooled feature maps; and
providing a trained convolutional neural network comprising a tree pooling layer.
2. The method as recited in claim 1, wherein the convolutional neural network comprises a convolutional layer configured to generate a plurality of feature maps based at least in part on the training data.
3. The method as recited in claim 2, wherein the convolutional layer generates at least one feature map by at least applying a convolutional kernel to the training data.
4. The method as recited in claim 3, wherein the convolutional kernel is adapted to detect a feature in the training data.
5. The method as recited in claim 4, wherein the at least one feature map depicts one or more occurrences of the feature in the training data.
6. The method as recited in claim 2, wherein the convolutional neural network further comprises a tree feature map selection layer configured to generate at least one selected feature map based at least in part on the plurality of feature maps generated at the convolutional layer.
7. The method as recited in claim 6, wherein the tree pooling layer is configured to apply the soft decision tree to each portion of the selected feature map to generate a corresponding portion of the pooled feature map.
8. The method as recited in claim 7, wherein the soft decision tree comprises a plurality of leaf nodes and decision nodes, wherein each leaf node corresponds to a pooling filter to apply to a portion of the selected feature map, and wherein a decision node applies a soft splitting function that combines an output from each child node of that decision node according to a mixing proportion.
9. The method as recited in claim 8, wherein the pooling filter comprises one of a maximum operation, an average operation, and a stochastic operation.
10. The method as recited in claim 8, wherein the mixing proportion indicates a portion of the output from each child node to include in a combination of the outputs from the child nodes.
11. The method as recited in claim 6, wherein the tree feature map selection layer is configured to apply a soft decision tree to generate the at least one selected feature map.
12. The method as recited in claim 11, wherein the soft decision tree combines two or more of the plurality of the feature maps into the at least one selected feature map.
13. The method as recited in claim 12, wherein the soft decision tree combines the two or more feature maps according to a mixing proportion.
14. The method as recited in claim 13, wherein the mixing proportion indicates a portion of each of the two or more feature maps to include in the selected feature map.
15. The method as recited in claim 1, wherein the convolutional neural network further comprises an output layer configured to generate a training output based at least in part on the one or more pooled feature maps.
16. The method as recited in claim 15, wherein training the convolutional neural network includes determining, by at least backpropagation and gradient descent, one or more optimizations based at least in part on an error associated with the training output.
17. The method as recited in claim 1, wherein providing the trained convolutional neural network comprises sending and/or storing the trained convolutional neural network.
18. A method comprising:
receiving input data;
processing the input data by at least utilizing a trained convolutional neural network comprising a tree pooling layer, wherein the tree pooling layer applies a soft decision tree to generate one or more pooled feature maps; and
providing, as an output, a result of the processing performed by the trained convolutional neural network.
19. The method as recited in claim 18, wherein the trained convolutional neural network further comprises a convolutional layer, wherein the convolutional layer is configured to generate a plurality of feature maps based at least in part on the input data, and wherein the convolutional layer generates each of the plurality of feature maps by at least applying a convolutional kernel to the input data.
20. The method as recited in claim 19, wherein the trained convolutional neural network further comprises a tree feature map selection layer, wherein the tree feature map selection layer is configured to apply a soft decision tree to generate at least one selected feature map, and wherein the soft decision tree generates the at least one feature map by at least combining two or more of the plurality of feature maps generated at the convolutional layer.
21. A system comprising:
at least one processor; and
at least one memory including program code which when executed by the at least one processor provides operations comprising:
receiving training data;
utilizing the training data to train a convolutional neural network comprising a tree pooling layer, wherein the tree pooling layer applies a soft decision tree to generate one or more pooled feature maps; and
providing a trained convolutional neural network comprising a tree pooling layer.
22. The system as recited in claim 21, wherein the convolutional neural network comprises a convolutional layer configured to generate a plurality of feature maps based at least in part on the training data.
23. The system as recited in claim 22, wherein the convolutional layer generates at least one feature map by at least applying a convolutional kernel to the training data.
24. The system as recited in claim 23, wherein the convolutional kernel is adapted to detect a feature in the training data.
25. The system as recited in claim 24, wherein the at least one feature map depicts one or more occurrences of the feature in the training data.
26. The system as recited in claim 22, wherein the convolutional neural network further comprises a tree feature map selection layer configured to generate at least one selected feature map based at least in part on the plurality of feature maps generated at the convolutional layer.
27. The system as recited in claim 26, wherein the tree pooling layer is configured to apply the soft decision tree to each portion of the selected feature map to generate a corresponding portion of the pooled feature map.
28. The system as recited in claim 27, wherein the soft decision tree comprises a plurality of leaf nodes and decision nodes, wherein each leaf node corresponds to a pooling filter to apply to a portion of the selected feature map, and wherein a decision node applies a soft splitting function that combines an output from each child node of that decision node according to a mixing proportion.
29. The system as recited in claim 28, wherein the pooling filter comprises one of a maximum operation, an average operation, and a stochastic operation.
30. The system as recited in claim 28, wherein the mixing proportion indicates a portion of the output from each child node to include in a combination of the outputs from the child nodes.
31. The system as recited in claim 26, wherein the tree feature map selection layer is configured to apply a soft decision tree to generate the at least one selected feature map.
32. The system as recited in claim 31, wherein the soft decision tree combines two or more of the plurality of the feature maps into the at least one selected feature map.
33. The system as recited in claim 32, wherein the soft decision tree combines the two or more feature maps according to a mixing proportion.
34. The system as recited in claim 33, wherein the mixing proportion indicates a portion of each of the two or more feature maps to include in the selected feature map.
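Claims 31-34 apply the same soft-tree mechanism across feature maps rather than within a pooling region: each decision node takes a proportion of one child's feature map and the complementary proportion of the other's (claim 34). A minimal sketch, assuming three constant feature maps, a two-node tree, and hypothetical gate values:

```python
# Illustrative tree feature map selection: three same-shaped feature maps
# are blended into one selected map by two soft decision nodes.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tree_select(maps, w_inner, w_root):
    a, b, c = maps
    g_inner, g_root = sigmoid(w_inner), sigmoid(w_root)
    inner = g_inner * a + (1 - g_inner) * b       # mixing proportion g_inner
    return g_root * inner + (1 - g_root) * c      # mixing proportion g_root

maps = [np.full((4, 4), v) for v in (1.0, 2.0, 3.0)]
selected = tree_select(maps, w_inner=0.0, w_root=0.0)
print(selected[0, 0])    # 0.5 * (0.5 * 1 + 0.5 * 2) + 0.5 * 3 = 2.25
```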
35. The system as recited in claim 21, wherein the convolutional neural network further comprises an output layer configured to generate a training output based at least in part on the one or more pooled feature maps.
36. The system as recited in claim 35, wherein training the convolutional neural network includes determining, by at least backpropagation and gradient descent, one or more optimizations based at least in part on an error associated with the training output.
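Because each mixing proportion is the sigmoid of a real-valued parameter, it is differentiable, which is what lets the backpropagation and gradient descent of claim 36 reach the tree layers. A scalar sketch of one update on a hypothetical pooling gate w, assuming a squared-error loss against a target t:

```python
# One gradient-descent step on a tree-pooling gate parameter w, where the
# pooled value is y = s*max + (1-s)*avg with s = sigmoid(w). The squared-
# error loss and the target value are assumptions made for this example.
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

region = np.array([[0.2, 0.9],
                   [0.4, 0.1]])
m, a = region.max(), region.mean()        # max leaf = 0.9, average leaf = 0.4
w, t, lr = 0.0, 0.8, 1.0                  # gate parameter, target, learning rate

s = sigmoid(w)
y = s * m + (1 - s) * a                   # forward: soft mix of the two leaves
loss = (y - t) ** 2

# Backward: dL/dw = 2(y - t) * s(1 - s) * (m - a), by the chain rule.
grad_w = 2 * (y - t) * s * (1 - s) * (m - a)
w -= lr * grad_w                          # gradient-descent update

print(f"loss={loss:.4f}, new w={w:.4f}, new mix={sigmoid(w):.4f}")
```

Here the gate drifts toward the max-pooling leaf because the target exceeds the current pooled value; with the opposite error sign it would drift toward the average leaf.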
37. The system as recited in claim 21, wherein providing the trained convolutional neural network comprises at least one of sending and storing the trained convolutional neural network.
38. A system comprising:
at least one processor; and
at least one memory including program code which when executed by the at least one processor provides operations comprising:
receiving input data;
processing the input data by at least utilizing a trained convolutional neural network comprising a tree pooling layer, wherein the tree pooling layer applies a soft decision tree to generate one or more pooled feature maps; and
providing, as an output, a result of the processing performed by the trained convolutional neural network.
39. The system as recited in claim 38, wherein the trained convolutional neural network further comprises a convolutional layer, wherein the convolutional layer is configured to generate a plurality of feature maps based at least in part on the input data, and wherein the convolutional layer generates each of the plurality of feature maps by at least applying a convolutional kernel to the input data.
40. The system as recited in claim 39, wherein the trained convolutional neural network further comprises a tree feature map selection layer, wherein the tree feature map selection layer is configured to apply a soft decision tree to generate at least one selected feature map, and wherein the soft decision tree generates the at least one selected feature map by at least combining two or more of the plurality of feature maps generated at the convolutional layer.
41. A non-transitory computer-readable storage medium including program code which when executed by at least one processor causes operations comprising:
receiving training data;
utilizing the training data to train a convolutional neural network comprising a tree pooling layer, wherein the tree pooling layer applies a soft decision tree to generate one or more pooled feature maps; and
providing a trained convolutional neural network comprising a tree pooling layer.
42. A non-transitory computer-readable storage medium including program code which when executed by at least one processor causes operations comprising:
receiving input data;
processing the input data by at least utilizing a trained convolutional neural network comprising a tree pooling layer, wherein the tree pooling layer applies a soft decision tree to generate one or more pooled feature maps; and
providing, as an output, a result of the processing performed by the trained convolutional neural network.
43. An apparatus comprising:
means for receiving training data;
means for utilizing the training data to train a convolutional neural network comprising a tree pooling layer, wherein the tree pooling layer applies a soft decision tree to generate one or more pooled feature maps; and
means for providing a trained convolutional neural network comprising a tree pooling layer.
44. The apparatus as recited in claim 43, further comprising means for performing the method as recited in any of claims 2-17.
45. An apparatus comprising:
means for receiving input data;
means for processing the input data by at least utilizing a trained convolutional neural network comprising a tree pooling layer, wherein the tree pooling layer applies a soft decision tree to generate one or more pooled feature maps; and
means for providing, as an output, a result of the processing performed by the trained convolutional neural network.
46. The apparatus as recited in claim 45, further comprising means for performing the method as recited in any of claims 19-20.
PCT/US2016/020869 2015-03-04 2016-03-04 Convolutional neural network with tree pooling and tree feature map selection WO2016141282A1 (en)

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
US201562128393P 2015-03-04 2015-03-04
US62/128,393 2015-03-04
US201562222676P 2015-09-23 2015-09-23
US62/222,676 2015-09-23

Publications (1)

Publication Number Publication Date
WO2016141282A1 (en) 2016-09-09

Family

ID=56848752

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2016/020869 WO2016141282A1 (en) 2015-03-04 2016-03-04 Convolutional neural network with tree pooling and tree feature map selection

Country Status (1)

Country Link
WO (1) WO2016141282A1 (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020083424A1 (en) * 1996-03-25 2002-06-27 Anthony Passera Systems for analyzing and computing data items
US20030002731A1 (en) * 2001-05-28 2003-01-02 Heiko Wersing Pattern recognition with hierarchical networks
US20090016470A1 (en) * 2007-07-13 2009-01-15 The Regents Of The University Of California Targeted maximum likelihood estimation
US7912246B1 (en) * 2002-10-28 2011-03-22 Videomining Corporation Method and system for determining the age category of people based on facial images
WO2011088497A1 (en) * 2010-01-19 2011-07-28 Richard Bruce Baxter Object recognition method and computer system
EP2418643A1 (en) * 2010-08-11 2012-02-15 Software AG Computer-implemented method and system for analysing digital speech data
US20140270488A1 (en) * 2013-03-14 2014-09-18 Google Inc. Method and apparatus for characterizing an image
US20140288928A1 (en) * 2013-03-25 2014-09-25 Gerald Bradley PENN System and method for applying a convolutional neural network to speech recognition
US20150006444A1 (en) * 2013-06-28 2015-01-01 Denso Corporation Method and system for obtaining improved structure of a target neural network
US20150032449A1 (en) * 2013-07-26 2015-01-29 Nuance Communications, Inc. Method and Apparatus for Using Convolutional Neural Networks in Speech Recognition
US20150036920A1 (en) * 2013-07-31 2015-02-05 Fujitsu Limited Convolutional-neural-network-based classifier and classifying method and training methods for the same

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10755082B2 (en) 2016-10-25 2020-08-25 Deep North, Inc. Point to set similarity comparison and deep feature learning for visual recognition
WO2018081135A1 (en) * 2016-10-25 2018-05-03 Vmaxx Inc. Point to set similarity comparison and deep feature learning for visual recognition
KR20180051335A (en) * 2016-11-07 2018-05-16 Samsung Electronics Co., Ltd. A method for input processing based on neural network learning algorithm and a device thereof
WO2018084473A1 (en) * 2016-11-07 2018-05-11 Samsung Electronics Co., Ltd. Method for processing input on basis of neural network learning and apparatus therefor
KR102313773B1 (en) 2016-11-07 2021-10-19 Samsung Electronics Co., Ltd. A method for input processing based on neural network learning algorithm and a device thereof
US10963738B2 (en) 2016-11-07 2021-03-30 Samsung Electronics Co., Ltd. Method for processing input on basis of neural network learning and apparatus therefor
CN106971155A (en) * 2017-03-21 2017-07-21 University of Electronic Science and Technology of China Unmanned vehicle lane scene segmentation method based on elevation information
WO2018230832A1 (en) * 2017-06-15 2018-12-20 Samsung Electronics Co., Ltd. Image processing apparatus and method using multi-channel feature map
US10740865B2 (en) 2017-06-15 2020-08-11 Samsung Electronics Co., Ltd. Image processing apparatus and method using multi-channel feature map
US11232344B2 (en) 2017-10-31 2022-01-25 General Electric Company Multi-task feature selection neural networks
US11106970B2 (en) 2017-11-17 2021-08-31 International Business Machines Corporation Localizing tree-based convolutional neural networks
US11119915B2 (en) 2018-02-08 2021-09-14 Samsung Electronics Co., Ltd. Dynamic memory mapping for neural networks
US11676078B2 (en) 2018-06-29 2023-06-13 Microsoft Technology Licensing, Llc Neural trees
KR20200008845A (en) * 2018-07-17 2020-01-29 Samsung Electronics Co., Ltd. Electronic apparatus, method for processing image and computer-readable recording medium
EP3752978A4 (en) * 2018-07-17 2021-05-26 Samsung Electronics Co., Ltd. Electronic apparatus, method for processing image and computer-readable recording medium
US11347962B2 (en) 2018-07-17 2022-05-31 Samsung Electronics Co., Ltd. Electronic apparatus, method for processing image and computer-readable recording medium
KR102476239B1 (en) 2018-07-17 2022-12-12 Samsung Electronics Co., Ltd. Electronic apparatus, method for processing image and computer-readable recording medium
WO2020017875A1 (en) 2018-07-17 2020-01-23 Samsung Electronics Co., Ltd. Electronic apparatus, method for processing image and computer-readable recording medium
EP3687152A1 (en) * 2019-01-23 2020-07-29 StradVision, Inc. Learning method and learning device for pooling roi by using masking parameters to be used for mobile devices or compact networks via hardware optimization, and testing method and testing device using the same
EP3699819A1 (en) * 2019-02-19 2020-08-26 Fujitsu Limited Apparatus and method for training classification model and apparatus for performing classification by using classification model
US11514272B2 (en) 2019-02-19 2022-11-29 Fujitsu Limited Apparatus and method for training classification model and apparatus for performing classification by using classification model
JP7347202B2 (en) 2019-02-19 2023-09-20 富士通株式会社 Device and method for training a classification model and classification device using the classification model
CN112101318A (en) * 2020-11-17 2020-12-18 Shenzhen UBTECH Technology Co., Ltd. Image processing method, device, equipment and medium based on neural network model
CN115497006A (en) * 2022-09-19 2022-12-20 Hangzhou Dianzi University Urban remote sensing image change depth monitoring method and system based on dynamic hybrid strategy
CN115497006B (en) 2022-09-19 2023-08-01 Hangzhou Dianzi University Urban remote sensing image change depth monitoring method and system based on dynamic mixing strategy

Similar Documents

Publication Publication Date Title
WO2016141282A1 (en) Convolutional neural network with tree pooling and tree feature map selection
JP6771645B2 (en) Domain separation neural network
KR102100977B1 Compressed recurrent neural network model
US20220327714A1 (en) Motion Engine
CN107688493A Method, apparatus and system for training a deep neural network
CN109328362A Progressive neural network
US20150220311A1 (en) Computer implemented modeling system and method
US20180260843A1 (en) Creating targeted content based on detected characteristics of an augmented reality scene
AU2018368279A1 (en) Meta-learning for multi-task learning for neural networks
CN109478204A Machine comprehension of unstructured text
JP7316453B2 (en) Object recommendation method and device, computer equipment and medium
JP6912588B2 Image recognition with filtering of output distribution
US9436909B2 (en) Increased dynamic range artificial neuron network apparatus and methods
US20180075348A1 (en) Machine learning model for analysis of instruction sequences
CN106462801A (en) Training neural networks on partitioned training data
CN106776673A Multimedia document summarization
US9547776B2 (en) Managing access permissions to class notebooks and their section groups in a notebook application
WO2016033506A1 (en) Processing images using deep neural networks
WO2022212883A1 (en) Motion engine
US20180075349A1 (en) Training a machine learning model for analysis of instruction sequences
WO2020159890A1 (en) Method for few-shot unsupervised image-to-image translation
KR102190103B1 (en) Method of providing commercialization service of an artificial neural network
CN110516791A Visual question answering method and system based on multiple attention
US20190156193A1 (en) System and method for processing complex datasets by classifying abstract representations thereof
CN107657066A (en) Medical data scientific research field customizing method and device

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 16759569

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: PCT application non-entry in European phase

Ref document number: 16759569

Country of ref document: EP

Kind code of ref document: A1