WO2018145308A1 - Filter reuse mechanism for constructing a robust deep convolutional neural network - Google Patents

Filter reuse mechanism for constructing a robust deep convolutional neural network

Info

Publication number
WO2018145308A1
Authority
WO
WIPO (PCT)
Prior art keywords
convolutional
convolutional layer
feature maps
configuring
neural network
Prior art date
Application number
PCT/CN2017/073342
Other languages
English (en)
Inventor
Xiaoheng JIANG
Original Assignee
Nokia Technologies Oy
Nokia Technologies (Beijing) Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Technologies Oy, Nokia Technologies (Beijing) Co., Ltd. filed Critical Nokia Technologies Oy
Priority to CN201780089497.0A priority Critical patent/CN110506277B/zh
Priority to PCT/CN2017/073342 priority patent/WO2018145308A1/fr
Publication of WO2018145308A1 publication Critical patent/WO2018145308A1/fr

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • the present disclosure is related to a neural network, and more particularly, to a filtering mechanism for a convolutional neural network.
  • The power of deep convolutional neural networks (CNNs) lies in the fact that they are able to learn a hierarchy of features.
  • An example of a CNN architecture is described in G. Huang, Z. Liu, Q. Weinberger: Densely Connected Convolutional Networks, CoRR, abs/1608.06993 (2016) (hereinafter “Huang”).
  • a CNN architecture is proposed that introduces direct connections within all layers of a block in the neural network. That is, each layer is directly connected to every other layer in one block in a feed-forward fashion.
  • One block typically consists of several layers without a down-sampling operation. For each layer, the feature maps of all preceding layers are treated as separate inputs whereas its own feature maps are passed on as inputs to all subsequent layers.
  • the core idea is to reuse the feature maps generated in the previous layers. However, these feature maps themselves do not bring in new information to the neural network.
  • the present disclosure provides an apparatus and method to generate feature maps for a first convolutional layer of a convolutional neural network based on a region of an image to be evaluated and a learned filter from the first convolutional layer, to generate feature maps for one or more subsequent convolutional layers of the convolutional neural network after the first convolutional layer, and to detect a presence of an object of interest in the region of the image based on the generated feature maps of the first and one or more subsequent convolutional layers.
  • Each subsequent convolutional layer is generated based on the feature maps of a prior convolutional layer, a learned filter for the prior convolutional layer and a learned filter for the subsequent convolutional layer.
  • the apparatus and method can be further configured to receive the image which is captured from an image sensing device, and/or to initiate an alarm if the object is detected. Furthermore, the convolutional neural network can be applied to each region of the image to detect whether the object is present in any of the regions of the image.
  • the apparatus and method can also be configured to learn a filter for each convolutional layer of the convolutional neural network during a training stage (or phase) using one or more training images.
  • the apparatus and method can be configured to initialize filters for the convolutional layers of the convolutional neural network, to generate feature maps for each convolutional layer using forward-propagation, to calculate a loss using a loss function based on the generated feature maps and a score for each category and corresponding label, and to update the filters for the convolutional layers using back-propagation if the calculated loss has decreased.
  • Each subsequent convolutional layer after the first convolutional layer is generated based on the feature maps of a prior convolutional layer, a learned filter for the prior convolutional layer and a filter for the subsequent convolutional layer.
  • the apparatus and method can be configured to repeat the operations of calculating feature maps, calculating a loss, and updating the filters, until the convolutional neural network converges when the calculated loss is no longer decreasing.
  • two sets of feature maps can be generated for each of the one or more subsequent convolutional layers. Furthermore, the operations of generating feature maps for a first convolutional layer, generating feature maps for one or more subsequent convolutional layers, and detecting a presence of an object are performed in a testing stage.
  • the apparatus and method can be configured to obtain a score for the region from application of the convolutional neural network, and to compare the score for the region to a threshold value. The object is detected in the region if the score for the region is larger than the threshold value.
  • Fig. 1 illustrates a block diagram of an example system for detecting a presence or absence of an object using a convolutional neural network (CNN) with filter reuse (or sharing) in accordance with an embodiment of the present disclosure
  • Fig. 2 illustrates a block diagram of an example system for detecting a presence or absence of an object using a convolutional neural network (CNN) with filter reuse (or sharing) in accordance with another embodiment of the present disclosure
  • Fig. 3 is an example architecture of a convolutional neural network which re-uses a filter from a prior convolutional layer in a subsequent convolutional layer in accordance with an embodiment of the present disclosure
  • Fig. 4 is a flow diagram showing an example process by which a system, such as for example in Fig. 1 or 2, is configured to implement training and/or testing stages using a convolutional neural network, in accordance with an embodiment of the present disclosure;
  • Fig. 5 is a flow diagram showing an example process by which a system, such as for example in Fig. 1 or 2, is configured to implement a training stage for training a convolutional neural network, in accordance with an embodiment of the present disclosure
  • Fig. 6 is a flow diagram showing an example process by which a system, such as for example in Fig. 1 or 2, is configured to implement a testing stage for evaluating an image or regions thereof using a trained convolutional neural network, in accordance with an embodiment of the present disclosure
  • Fig. 7 is a flow diagram showing an example detection process by which a system, such as for example in Fig. 1 or 2, is configured to detect a presence (or absence) of a feature, such as an object, using a convolutional neural network, in accordance with an example embodiment of the present disclosure.
  • an apparatus and method which employ a deep convolutional neural network (CNN) with a filtering reuse mechanism to analyze an image or region thereof, and to detect a presence (or absence) of an object (s) of interest.
  • the CNN is configured to re-use filters from a prior (e.g., a previous or earlier) convolutional layer to compute map features in a subsequent convolutional layer.
  • the filters can be fully used or shared so that the ability of feature representation is significantly enhanced, thereby significantly improving the recognition accuracy of the resulting deep CNN.
  • the present CNN approach with filter reuse also can take advantage of information (e.g., filters) obtained from the prior convolutional layer, as well as generate new information (e.g., feature maps) in a current convolutional layer.
  • the architecture of such a CNN can reduce the number of parameters because each current convolutional layer reuses the filter of a prior convolutional layer. Such a configuration, thus, can address the over-fitting problem that is caused by using too many parameters.
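  • As a rough, assumed-numbers illustration of that saving (the kernel size and channel counts below are not taken from the patent), reusing the prior layer's filter means the current layer only learns the weights of its new filter:

```python
# Illustrative parameter count for one layer: 3x3 kernels, 64 input maps, 64 output maps.
k, in_maps, out_maps = 3, 64, 64
standard_params = k * k * in_maps * out_maps        # all 64 maps need newly learned weights -> 36864
reuse_params = k * k * in_maps * (out_maps // 2)    # 32 maps from a new filter, 32 from the reused filter -> 18432
print(standard_params, reuse_params)
```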
  • the apparatus and method of the present disclosure can be employed in object recognition systems, such as for example a video surveillance system that employs a camera or other sensor.
  • the camera can capture several multi-view images of the same scenario such as 360-degree images.
  • the task of the video surveillance is to detect one or more objects of interest (e.g., pedestrians, animals, or other objects) from the multi-view images, and then provide an alert or notification (e.g., an alarm or warning) to the user.
  • a camera system can be provided to capture 360-degree images
  • a video surveillance system can potentially detect all objects of interest appearing in a scenario or environment.
  • each camera or camera sub-system
  • the operations of the video surveillance system using CNN with filter reuse can involve the following.
  • Each camera of the system captures an image.
  • the CNN with filter reuse can, for example, be employed to classify the region as an object of interest if the response of the CNN is larger than a pre-defined threshold, and to classify the region as background (e.g., non-object) if the response of the CNN is equal to or less than the threshold.
  • the object detection process can involve a training stage and testing stage.
  • the goal of the training stage is to design or configure the structure of the CNN with filter reuse, and to learn the parameters (i.e., the filters) of the CNN.
  • the CNN is trained to detect a presence (or absence) of a particular object (s) using training images as input.
  • Back-propagation can, for example, be used to learn or configure the parameters, such as the filters, of the CNN to detect the presence (or absence) of an object.
  • the training images can include example images of the object (s) of interest, of background (s) , and other aspects that may be present in an image.
  • the trained CNN with filter reuse is applied to an image to be tested (e.g., input image or testing image) to detect a presence (or absence) of the particular object (s) .
  • the goal of the testing stage is to classify each region of the image by taking the region as the input of the trained CNN. The region is classified as either an object of interest or background. If the classification decision is an object of interest, the system generates, for example, an alert or notification (e.g., an alert signal in the form of voice or message) which can be immediately sent to the user via a network connection (e.g., the Internet) or other media.
  • An alert can be generated once one of the cameras in the system detects an object of interest.
  • the object detection processes may be implemented in or with each camera or each camera subsystem. Examples of a CNN with filter reuse and an object detection system are described in further detail below with reference to the figures.
  • Fig. 1 illustrates a block diagram of example components of a system 100 for detecting a presence (or absence) of an object of interest using a convolutional neural network (CNN) that reuses or shares filters.
  • the system 100 includes one or more processor (s) 110, a plurality of sensors 120, a user interface (s) 130, a memory 140, a communication interface (s) 150, a power supply 160 and output device (s) 170.
  • the power supply 160 can include a battery power unit, which can be rechargeable, or a unit that provides connection to an external power source.
  • the sensors 120 are configured to sense or monitor activities, e.g., an object (s) , in a geographical area or an environment, such as around a vehicle, around or inside a building, and so forth.
  • the sensors 120 can include one or more image sensing device (s) or sensor (s) .
  • the sensor 120 can for example be a camera with one or more lenses (e.g., a camera, a web camera, a camera system to capture panoramic or 360 degree images, a camera with a wide lens or multiple lenses, etc. ) .
  • the image sensing device is configured to capture images or image data, which can be analyzed using the CNN to detect a presence (or absence) of an object of interest.
  • the captured images or image data can include image frames, video, pictures, and/or the like.
  • the sensor 120 may also comprise a millimeter wave radar, an infrared camera, Lidar (Light Detection And Ranging) sensor and/or other types of sensors.
  • the user interface (s) 130 may include a plurality of user input devices through which a user can input information or commands to the system 100.
  • the user interface (s) 130 may include a keypad, a touch-screen display, a microphone, or other user input devices through which a user can input information or commands.
  • the output devices 170 can include a display, a speaker or other devices which are able to convey information to a user.
  • the communication interface (s) 150 can include communication circuitry (e.g., transmitter (TX) , receiver (RX) , transceiver such as a radio frequency transceiver, etc. ) for conducting line-based communications with an external device such as a USB or Ethernet cable interface, or for conducting wireless communications with an external device, such as for example through a wireless personal area network, a wireless local area network, a cellular network or wireless wide area network.
  • the communication interface (s) 150 can, for example, be used to receive a CNN and its parameters or updates thereof (e.g., learned filters for an object of interest) from an external computing device 180 (e.g., server, data center, etc. ) , to transmit an alarm or other notification to an external computing device 180 (e.g., a user’s device such as a computer, etc. ) , and/or to interact with external computing devices 180 to implement in a distributed manner the various operations described herein, such as the training stage, the testing stage, the alarm notification and/or other operations as described herein.
  • the memory 140 is a data storage device that can store computer executable code or programs, which when executed by the processor 110, controls the operations of the system 100.
  • the memory 140 also can store configuration information for a CNN 142 and its parameters 144 (e.g., learned filters) , images 146 (e.g., training images, captured images, etc. ) , and a detection algorithm 148 for implementing the various operations described herein, such as the training stage, the testing stage, the alarm notification, and other operations as described herein.
  • the processor 110 is in communication with the memory 140.
  • the processor 110 is a processing system, which can include one or more processors, such as CPU, GPU, controller, dedicated circuitry or other processing unit, which controls the operations of the system 100, including the detection operations (e.g., training stage, testing stage, alarm notification, etc. ) described herein in the present disclosure.
  • the processor 110 is configured to train the CNN 142 to detect a presence or absence of objects of interest (e.g., detect an object (s) of interest, background (s) , etc. ) by configuring or learning the parameters (e.g., learning the filters) using training images or the like, category/label information, and so forth.
  • the processor 110 is also configured to test captured image (s) or regions thereof using the trained CNN 142 with the learned parameters in order to detect a presence (or absence) of an object in an image or region thereof.
  • the object of interest may include a person such as a pedestrian, an animal, vehicles, traffic signs, road hazards, and/or the like, or other objects of interest according to the intended application.
  • the processor 110 is also configured to initiate an alarm or other notification when a presence of the object is detected, such as notifying a user by outputting the notification using the output device 170 or by transmitting the notification to an external computing device 180 (e.g., user’s device, data center, server, etc. ) via the communication interface 150.
  • the external computing device 180 can include components similar to those in the system 100, such as shown and described above with reference to Fig. 1.
  • Fig. 2 depicts an example system 200 including processor (s) 210, and sensor (s) 220 in accordance with some example embodiments.
  • the system 200 may also include a radio frequency transceiver 250.
  • the system 200 may be mounted in a vehicle 20, such as a car or truck, although the system may be used without the vehicle 20 as well.
  • the system 200 may include the same or similar components and functionality, such as provided in the system 100 of Fig. 1.
  • the sensor (s) 220 may comprise one or more image sensors configured to provide image data, such as image frames, video, pictures, and/or the like.
  • the sensor 220 may comprise a camera, millimeter wave radar, an infrared camera, Lidar (Light Detection And Ranging) sensor and/or other types of sensors.
  • the processor 210 may comprise CNN circuitry, which may represent dedicated CNN circuitry configured to implement the convolutional neural network and other operations as described herein.
  • the CNN circuitry may be implemented in other ways, such as using at least one memory including program code which, when executed by at least one processing device (e.g., CPU, GPU, controller, etc. ) , provides the CNN functionality.
  • the system 200 may have a training stage.
  • the training stage may configure the CNN circuitry to learn to detect and/or classify one or more objects of interest.
  • the processor 210 may be trained with images including objects such as people, other vehicles, road hazards, and/or the like. Once trained, when an image includes the object (s) , the trained CNN implementable via the processor 210 may detect the object (s) and provide an indication of the detection/classification of the object (s) . In the training stage, the CNN may learn its configuration (e.g., parameters, weights, and/or the like) .
  • the configured CNN can be used in a test or operational stage to detect and/or classify regions (e.g., patches or portions) of an unknown, input image and thus determine whether that input image includes an object of interest or just background (i.e., not having an object of interest) .
  • the system 200 may be trained to detect objects, such as people, animals, other vehicles, traffic signs, road hazards, and/or the like.
  • in an advanced driver assistance system (ADAS), an output such as a warning sound, haptic feedback, an indication of the recognized object, or other indication may be generated to, for example, warn or notify a driver.
  • the detected objects may signal control circuitry to take additional action in the vehicle (e.g., initiate braking, acceleration/deceleration, steering and/or some other action) .
  • the indication may be transmitted to other vehicles, IoT devices or cloud, mobile edge computing (MEC) platform and/or the like via radio transceiver 250.
  • Fig. 3 is an example of a convolutional neural network (CNN) architecture 300, which includes a plurality of convolutional layers (e.g., Layer 1 ... Layer L or l) , and a decision layer.
  • the CNN architecture 300 is configured to re-use or share filters from a prior convolutional layer in a subsequent convolutional layer.
  • In layer 1, N_1 feature maps C_1 are obtained by a filter W_1.
  • the spatial width and height of C_1 are w_1 and h_1, respectively.
  • In layer 2, feature maps C_2 are obtained not only by a new filter W_2 but also by the filter W_1 of prior layer 1. With the filter W_2, N_21 feature maps are obtained, and with the reused filter W_1, N_22 feature maps are obtained.
  • the N_21 feature maps and the N_22 feature maps are concatenated to form the feature maps C_2 in layer 2. Therefore, as shown in Fig. 3, the filter W_1 of prior layer 1 is reused in layer 2. Similarly, a new filter W_3 is used to generate the N_31 feature maps of layer 3, and the filter W_2 obtained in prior layer 2 is used to produce the N_32 feature maps of layer 3.
  • the N_31 feature maps and N_32 feature maps are concatenated to form feature maps C_3 of layer 3. In the same way, the rest of the feature maps C_4, C_5 ... C_L are computed.
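  • As an illustrative sketch only (not the patent's reference implementation), the reuse pattern described above for Fig. 3 can be written with a standard deep-learning library. The class name, channel counts, activation, and the choice of applying both filters to the same input are assumptions, since the excerpt does not spell out the exact channel bookkeeping for the reused filter:

```python
import torch
import torch.nn as nn

class ReuseBlock(nn.Module):
    """Sketch of layer l+1 in Fig. 3: its output concatenates feature maps from a
    new filter W_(l+1) with feature maps from the reused prior filter W_l."""
    def __init__(self, prior_conv: nn.Conv2d, n_new: int):
        super().__init__()
        in_ch = prior_conv.in_channels                 # assume C_l has the depth W_l expects
        self.new_conv = nn.Conv2d(in_ch, n_new, kernel_size=3, padding=1)  # W_(l+1)
        self.prior_conv = prior_conv                   # reused W_l (weights shared with layer l)

    def forward(self, c_prev: torch.Tensor) -> torch.Tensor:
        n1 = torch.relu(self.new_conv(c_prev))         # N_(l+1,1) maps from the new filter
        n2 = torch.relu(self.prior_conv(c_prev))       # N_(l+1,2) maps from the reused filter
        return torch.cat([n1, n2], dim=1)              # concatenated output C_(l+1)

# Tiny usage demo (channel sizes chosen so the shared filter fits both places).
w1 = nn.Conv2d(3, 3, kernel_size=3, padding=1)         # W_1; output depth kept equal to input depth here
layer2 = ReuseBlock(w1, n_new=32)                      # layer 2 reuses W_1
c1 = torch.relu(w1(torch.randn(1, 3, 64, 64)))         # layer-1 feature maps C_1
c2 = layer2(c1)                                        # layer-2 maps: 32 new + 3 reused channels
```

  • Keeping the prior filter's output depth equal to its input depth in this demo is only one way to make the shared weights dimensionally compatible; the patent may match dimensions differently.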
  • the CNN architecture 300 can be employed in a detection process to detect a presence (or absence) of an object (s) of interest in a region of an image, or to classify regions of interest of an image.
  • the detection process can include a training stage to learn the parameters for the CNN using training images, and a testing stage to apply the trained CNN to classify regions of an image and to detect a presence (or absence) of an object of interest. Examples of the training stage and testing stage are described below with reference to the figures.
  • Fig. 4 is a flow diagram showing an example process 400 by which a system, such as for example in Fig. 1 or 2, is configured to implement training and/or testing stages of the convolutional neural network, such as for example shown in Fig. 3.
  • the process 400 is discussed below with reference to the processor 110 and other components of the system 100 in Fig. 1, and describes high-level operations that are performed in relation to a training stage and a testing stage.
  • the processor 110 is configured to provide a convolutional neural network during a training stage.
  • the processor 110 is configured to learn a parameter (s) , such as a filter, for each convolutional layer of the convolutional neural network during a training stage.
  • the processor 110 is configured to generate feature maps for a first convolutional layer of the convolutional neural network based on a region of an image to be evaluated and a learned filter from the first convolutional layer during a testing stage.
  • the processor 110 is configured to generate feature maps for one or more subsequent convolutional layers of the convolutional neural network based on feature maps of a prior convolutional layer, a learned filter for the prior convolutional layer, and a learned filter for the subsequent convolutional layer during the testing stage.
  • the processor 110 is configured to detect a presence (or an absence) of an object of interest in the region of the image based on the generated feature maps of the first and one or more subsequent convolutional layers during the testing stage. In the event that an object is detected, the processor can be configured to initiate an alarm or other notification to a user or other entity.
  • Fig. 5 is a flow diagram showing an example process 500 by which a system, such as for example in Fig. 1 or 2, is configured to implement a training stage for training a CNN with filter reuse (see, e.g., Fig. 3) .
  • the process 500 is discussed below with reference to the processor 110 and other components of the system 100 in Fig. 1, and describes operations that are performed during the training stage.
  • a set of training images and their corresponding labels are prepared. For example, if the training image contains an object of interest, then the label is set to a number (e.g., 1) . If the training image does not contain the object of interest, then the label of the image is set to another number (e.g., -1) .
  • the set of training images and their corresponding labels are used during the training stage in the design and configuration of a CNN for detecting the object of interest.
  • the processor 110 implements an initialization operation of the parameters, such as the filters, for the CNN.
  • the processor 110 initializes the filters (e.g., W_1 ... W_L) for the convolutional layers (e.g., Layers 1 ... L or l) of the CNN, such as in Fig. 3.
  • the filters can be initialized by using a Gaussian distribution with zero mean and a small variation (e.g., 0.01) .
  • the processor 110 generates (e.g., calculates or computes) the feature maps on a convolutional layer-by-layer basis, such as for example using forward-propagation with a training image or region thereof as an input from the set of training images.
  • this operation can involve calculating feature maps using two filters, such as shown in the CNN architecture 300 of Fig. 3: one filter comes from the prior convolutional layer and the other filter comes from the current convolutional layer. For instance, given the filter W_l of layer l and the filter W_(l+1) of layer l+1, the feature maps generated in layer l are denoted by N_l.
  • in layer l+1 the convolution operation (denoted by “∘”) is carried out twice: the new filter W_(l+1) is convolved with the layer-l feature maps to produce one set of feature maps, and the reused filter W_l is convolved with them to produce a second set.
  • thereafter, the two sets of feature maps are concatenated to generate the final output N_(l+1) of layer l+1. It is noted that W_l is also used in layer l to calculate that layer's feature maps; therefore, the filter W_l used in layer l is reused in layer l+1 to generate new feature maps.
  • the processor 110 implements a decision layer in which a loss calculation is performed.
  • the processor 110 performs a loss calculation, such as by calculating the loss according to the final score for each category and the corresponding label.
  • the loss calculation can be performed using a softmax loss function.
  • An example of a softmax loss function is represented by equation (1) as follows: L = −log(exp(y_c) / Σ_j exp(y_j)), where y is the vector representing the scores for all classes and y_c is the score of the ground-truth class c.
  • instead of the softmax loss function, other functions can also be adopted in the decision layer, such as a Support Vector Machine (SVM) loss function or other suitable loss functions for use with a CNN.
  • the softmax loss function calculates the cross-entropy loss, whereas the SVM loss function calculates the hinge loss. As to the classification task, these two functions perform almost the same.
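  • For illustration only, the two decision-layer losses mentioned above can be sketched as follows, with y the class-score vector and c the ground-truth class index as in equation (1); the margin value in the hinge loss is an assumed default:

```python
import numpy as np

def softmax_loss(y, c):
    """Cross-entropy (softmax) loss for score vector y and true class index c."""
    y = y - np.max(y)                              # subtract max for numerical stability
    log_probs = y - np.log(np.sum(np.exp(y)))      # log of softmax probabilities
    return -log_probs[c]

def svm_hinge_loss(y, c, margin=1.0):
    """Multi-class SVM (hinge) loss for the same score vector and true class."""
    margins = np.maximum(0.0, y - y[c] + margin)   # margin violation per class
    margins[c] = 0.0                               # the true class contributes no loss
    return np.sum(margins)

scores = np.array([2.0, -1.0, 0.5])                # illustrative class scores
print(softmax_loss(scores, c=0), svm_hinge_loss(scores, c=0))
```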
  • the processor 110 determines whether the filters of the CNN should be updated based on the calculated loss, e.g., a change of calculated loss. For example, the processor 110 determines if the loss has stopped decreasing or changing, or in other words, if the CNN is converging. If the loss has stopped decreasing, the processor 110 outputs the filters (e.g., the learned filters) for use in the CNN during the testing stage at reference 514.
  • the outputted filters can be stored in memory for use with the CNN.
  • the processor 110 updates the filters of the CNN at reference 512.
  • the processor 110 can implement back-propagation (e.g., standard back-propagation or other variants thereof) to update all of the filters of the CNN.
  • the filters can be updated through the chain rule during back-propagation, for example, according to equation (2) as follows: W_l ← W_l − η · ∂L/∂W_l, where L represents the loss function and η represents the updating coefficient (e.g., learning rate).
  • the process 500 then continues by repeating the operations in references 506, 508 and 510, until the calculated loss stops decreasing or changing, or in other words, the CNN converges.
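  • A hedged sketch of the training stage of Fig. 5 (references 502 to 514) using standard library routines; the data loader, learning rate, and stopping tolerance are assumptions, and plain stochastic gradient descent stands in for the back-propagation update of equation (2):

```python
import torch
import torch.nn as nn

def init_filters(model, std=0.01):
    # Reference 502: initialize each convolutional filter from a zero-mean Gaussian.
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.normal_(m.weight, mean=0.0, std=std)
            if m.bias is not None:
                nn.init.zeros_(m.bias)

def train(model, loader, lr=0.01, tol=1e-4, max_epochs=100):
    init_filters(model)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)   # eta in equation (2)
    criterion = nn.CrossEntropyLoss()                        # softmax loss of equation (1)
    prev_loss = float("inf")
    for _ in range(max_epochs):
        epoch_loss = 0.0
        for regions, labels in loader:                       # training regions and their labels
            scores = model(regions)                          # reference 506: forward propagation
            loss = criterion(scores, labels)                 # reference 508: loss calculation
            optimizer.zero_grad()
            loss.backward()                                  # reference 512: back-propagation
            optimizer.step()                                 # filter update via equation (2)
            epoch_loss += loss.item()
        if prev_loss - epoch_loss < tol:                     # reference 510: loss no longer decreasing
            break
        prev_loss = epoch_loss
    return model                                             # reference 514: learned filters
```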
  • Fig. 6 is a flow diagram showing an example process 600 by which a system, such as for example in Fig. 1 or 2, is configured to implement a testing stage for evaluating an image or region thereof using a trained CNN with filter reuse (see, e.g., Fig. 3) .
  • the test stage can differ from the training stage in that it does not need to update the filters. Instead, the test stage can adopt the filters learned from the training stage to classify or detect objects. Furthermore, there is no need to calculate the loss for the decision layer. The decision layer simply decides which class has the highest score.
  • the process 600 is discussed below with reference to the processor 110 and other components of the system 100 in Fig 1, and describes operations that are performed during the testing stage.
  • the processor 110 implements a region proposal operation by determining the region (of an image) that is likely to contain the object of interest, e.g., a targeted object.
  • one simple approach to identify a region of interest for evaluation is to adopt the sliding window technique that scans an input image exhaustively. Other methods can also be adopted.
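  • A minimal sketch of the exhaustive sliding-window scan mentioned above; the window size and stride are assumed values:

```python
def sliding_windows(image, win_h=128, win_w=64, stride=16):
    """Yield (top, left, region) crops that exhaustively cover an H x W (x C) image array."""
    h, w = image.shape[0], image.shape[1]
    for top in range(0, h - win_h + 1, stride):
        for left in range(0, w - win_w + 1, stride):
            yield top, left, image[top:top + win_h, left:left + win_w]
```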
  • the processor 110 implements map feature generation using the CNN with filter reuse. For example, the processor 110 applies the region of interest of the image to the CNN, and generates the feature maps on a convolutional layer-by-layer basis using the learned parameters, e.g., the filters, such as from the training stage.
  • the map feature generation procedure in the test stage can be similar to that performed in the training stages, such as described above with reference to Fig. 5.
  • the processor 110 implements a decision layer to perform classification or object detection of the region. For example, in the decision layer, the processor 110 can take the score vector y as input and determine which entry (e.g., y_c) has the highest score. This operation outputs the label (e.g., pedestrian) corresponding to the highest score.
  • the decision layer can use the softmax loss function, or other loss functions such as the SVM loss function.
  • the softmax loss function calculates the cross-entropy loss, whereas the SVM loss function calculates the hinge loss. As to the classification task, these two functions perform almost the same.
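  • In the test stage the decision layer only needs the class with the highest score; a small sketch (the label names are assumptions for illustration):

```python
import numpy as np

LABELS = ["background", "pedestrian"]              # assumed label set

def decide(score_vector):
    c = int(np.argmax(score_vector))               # index of the highest score y_c
    return LABELS[c], float(score_vector[c])

print(decide(np.array([0.3, 1.7])))                # -> ('pedestrian', 1.7)
```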
  • Fig. 7 is a flow diagram showing an example detection process 700 by which a system, such as for example in Fig. 1 or 2, is configured to detect a presence (or absence) of an object of interest, using a trained CNN with filter reuse (see, e.g., Fig. 3) .
  • the process 700 is discussed below with reference to the processor 110 and other components of the system 100 in Fig 1.
  • the sensor (s) 120 captures image (s) .
  • the images can be captured for different scenarios depending on the application for the detection process 700.
  • the sensor (s) 120 may be positioned, installed or mounted to capture images for fixed locations (e.g., different locations in or around a building or other location) or for movable locations (e.g., locations around a moving vehicle, person or other system) .
  • a camera system such as a single or multi-lens camera or camera system to capture panoramic or 360 degree images, can be installed on a vehicle.
  • the processor 110 scans each region of an image, such as from the captured image (s) .
  • the processor 110 applies the CNN to each region of the image, such as by implementing a testing stage.
  • An example of a testing stage is described by the process 600 which is described with reference to Fig. 6.
  • the application of the CNN provides a score for the tested region of the image.
  • the processor 110 determines if the score from the CNN is larger than a threshold (e.g., a threshold value) .
  • if the score is not larger than the threshold, the processor 110 does not initiate an alarm or notification, at reference 710.
  • the process 700 continues to capture and evaluate images. Otherwise, if the score is larger than the threshold, the processor 110, at reference 712, initiates an alarm or notification reflecting a detection of an object of interest or classification of such an object.
  • an object of interest can include a pedestrian, an animal, a vehicle, a traffic sign, a road hazard or other pertinent objects depending on the intended application for the detection process.
  • the alarm or notification may be initiated locally at the system 100 via one of the output devices 170 or transmitted to an external computing device 180.
  • the alarm may be provided to the user in the form of a visual or audio notification or other suitable medium (e.g., vibrational, etc. ) .
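  • Tying the steps of Fig. 7 together, a hedged end-to-end sketch; the trained_cnn callable, the threshold value, and the notify hook are placeholders, and sliding_windows refers to the region-scan sketch shown earlier:

```python
def detect_and_alert(image, trained_cnn, threshold=0.5, notify=print):
    """Scan each region (reference 704), score it with the trained CNN (706), and
    initiate an alert when the score exceeds the threshold (708/712); otherwise
    no alarm is raised for that region (710)."""
    detections = []
    for top, left, region in sliding_windows(image):   # region scan (see earlier sketch)
        score = trained_cnn(region)                    # object-of-interest score for this region
        if score > threshold:
            detections.append((top, left, score))
            notify(f"object of interest detected at ({top}, {left}), score={score:.2f}")
    return detections
```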
  • experiments on the KITTI dataset show the effectiveness of the present method and system which employs a CNN with filter reuse.
  • the images in the KITTI dataset were captured by a pair of cameras.
  • the subset of the KITTI dataset used for pedestrian detection consists of 7481 training images and 7518 test images.
  • the sizes of the filters W_1, W_2, W_3, W_4, W_5, W_6, W_7, W_8, W_9, W_10, W_11, W_12, and W_13 are 3×3×3, 3×3×32, 3×3×64, 3×3×64, 3×3×128, 3×3×128, 3×3×128, 3×3×128, 3×3×128, 3×3×256, 3×3×256, 3×3×256, 3×3×256, 3×3×256, and 3×3×256, respectively.
  • the traditional VGG neural network is compared with an example of the present method and system which employs a filter reusing mechanism with the CNN.
  • the average precision (AP) of the present CNN with filter reuse is 60.43%, whereas the average precision of the traditional VGG neural network is 56.74% (see, e.g., Simonyan K, Zisserman A.: Very deep convolutional networks for large-scale image recognition, arXiv preprint arXiv: 1409.1556 (2014) ) . It is observed that the present CNN method with filter reuse significantly outperforms the traditional VGG method. That is, the introduction of filter reuse or sharing plays an important role in improving the performance of object detection. As such, the present method and system, which employs a filter reuse mechanism in a CNN, can provide significant improvements to the field of object detection, and thus, video surveillance.
  • system 100 or 200 can be used to implement among other things operations including the training stage, the testing stage and the alarm notification, these operations may be distributed and performed across a plurality of systems over a communication network (s) .
  • the training stage may instead employ other variants of back-propagation that may be aimed at improving the performance of back-propagation.
  • the training and testing stages may also adopt other suitable loss functions or training strategies.
  • the CNN approach with reuse or shared filters, as described herein, may be utilized in various applications, including but not limited to object detection/recognition in video surveillance systems, in autonomous or semi-autonomous vehicles, or in ADAS implementations.
  • example embodiments may be implemented as a machine, process, or article of manufacture by using standard programming and/or engineering techniques to produce programming software, firmware, hardware or any combination thereof.
  • Any resulting program (s) having computer-readable program code, may be embodied on one or more computer-usable media such as resident memory devices, smart cards or other removable memory devices, or transmitting devices, thereby making a computer program product or article of manufacture according to the embodiments.
  • the terms “article of manufacture” and “computer program product” as used herein are intended to encompass a computer program that exists permanently or temporarily on any computer-usable medium or in any transmitting medium which transmits such a program.
  • memory/storage devices can include, but are not limited to, disks, solid state drives, optical disks, removable memory devices such as smart cards, SIMs, WIMs, semiconductor memories such as RAM, ROM, PROMS, etc.
  • Transmitting mediums include, but are not limited to, transmissions via wireless communication networks, the Internet, intranets, telephone/modem-based network communication, hard-wired/cabled communication network, satellite communication, and other stationary or mobile network systems/communication links.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an apparatus and a method, the method comprising the following steps: generating feature maps for a first convolutional layer of a convolutional neural network based on a region of an image to be evaluated and a learned filter from the first convolutional layer (406); generating feature maps for one or more subsequent convolutional layers of the convolutional neural network based on the feature maps of a prior convolutional layer, a learned filter for the prior convolutional layer, and a learned filter for the subsequent convolutional layer (408); and detecting a presence of an object of interest in the region of the image based on the generated feature maps of the first and one or more subsequent convolutional layers (410).
PCT/CN2017/073342 2017-02-13 2017-02-13 Filter reuse mechanism for constructing a robust deep convolutional neural network WO2018145308A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201780089497.0A CN110506277B (zh) 2017-02-13 2017-02-13 Filter reuse mechanism for constructing a robust deep convolutional neural network
PCT/CN2017/073342 WO2018145308A1 (fr) 2017-02-13 2017-02-13 Filter reuse mechanism for constructing a robust deep convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/073342 WO2018145308A1 (fr) 2017-02-13 2017-02-13 Filter reuse mechanism for constructing a robust deep convolutional neural network

Publications (1)

Publication Number Publication Date
WO2018145308A1 true WO2018145308A1 (fr) 2018-08-16

Family

ID=63107795

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/073342 WO2018145308A1 (fr) 2017-02-13 2017-02-13 Filter reuse mechanism for constructing a robust deep convolutional neural network

Country Status (2)

Country Link
CN (1) CN110506277B (fr)
WO (1) WO2018145308A1 (fr)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10733742B2 (en) 2018-09-26 2020-08-04 International Business Machines Corporation Image labeling
CN111507172A (zh) * 2019-01-31 2020-08-07 斯特拉德视觉公司 通过预测周围物体移动支持安全的自动驾驶的方法和装置
CN111986199A (zh) * 2020-09-11 2020-11-24 征图新视(江苏)科技股份有限公司 一种基于无监督深度学习的木地板表面瑕疵检测方法
US11176427B2 (en) 2018-09-26 2021-11-16 International Business Machines Corporation Overlapping CNN cache reuse in high resolution and streaming-based deep learning inference engines

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220101095A1 (en) * 2020-09-30 2022-03-31 Lemon Inc. Convolutional neural network-based filter for video coding
CN113866571A (zh) * 2021-08-06 2021-12-31 厦门欧易奇机器人有限公司 一种局放源定位方法、装置以及设备

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104616032A (zh) * 2015-01-30 2015-05-13 浙江工商大学 基于深度卷积神经网络的多摄像机系统目标匹配方法
CN104866900A (zh) * 2015-01-29 2015-08-26 北京工业大学 一种反卷积神经网络训练方法
CN105718889A (zh) * 2016-01-21 2016-06-29 江南大学 基于GB(2D)2PCANet深度卷积模型的人脸身份识别方法
CN105913087A (zh) * 2016-04-11 2016-08-31 天津大学 基于最优池化卷积神经网络的物体识别方法

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
NL2015087B1 (en) * 2015-06-05 2016-09-09 Univ Amsterdam Deep receptive field networks.

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104866900A (zh) * 2015-01-29 2015-08-26 北京工业大学 一种反卷积神经网络训练方法
CN104616032A (zh) * 2015-01-30 2015-05-13 浙江工商大学 基于深度卷积神经网络的多摄像机系统目标匹配方法
CN105718889A (zh) * 2016-01-21 2016-06-29 江南大学 基于GB(2D)2PCANet深度卷积模型的人脸身份识别方法
CN105913087A (zh) * 2016-04-11 2016-08-31 天津大学 基于最优池化卷积神经网络的物体识别方法

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10733742B2 (en) 2018-09-26 2020-08-04 International Business Machines Corporation Image labeling
US11176427B2 (en) 2018-09-26 2021-11-16 International Business Machines Corporation Overlapping CNN cache reuse in high resolution and streaming-based deep learning inference engines
CN111507172A (zh) * 2019-01-31 2020-08-07 斯特拉德视觉公司 通过预测周围物体移动支持安全的自动驾驶的方法和装置
CN111507172B (zh) * 2019-01-31 2023-08-18 斯特拉德视觉公司 通过预测周围物体移动支持安全的自动驾驶的方法和装置
CN111986199A (zh) * 2020-09-11 2020-11-24 征图新视(江苏)科技股份有限公司 一种基于无监督深度学习的木地板表面瑕疵检测方法
CN111986199B (zh) * 2020-09-11 2024-04-16 征图新视(江苏)科技股份有限公司 一种基于无监督深度学习的木地板表面瑕疵检测方法

Also Published As

Publication number Publication date
CN110506277A (zh) 2019-11-26
CN110506277B (zh) 2023-08-08

Similar Documents

Publication Publication Date Title
WO2018145308A1 (fr) 2018-08-16 Filter reuse mechanism for constructing a robust deep convolutional neural network
EP3509011B1 (fr) Appareils et procédés de reconnaissance d'expression faciale robuste contre un changement d'expression faciale
KR102442061B1 (ko) 전자 장치, 그의 경고 메시지 제공 방법 및 비일시적 컴퓨터 판독가능 기록매체
US9811732B2 (en) Systems and methods for object tracking
CN113366496A (zh) 用于粗略和精细对象分类的神经网络
CN111656362B (zh) 基于声音反馈的认知的和偶然的深度可塑性
US20170083772A1 (en) Apparatus and method for object recognition and for training object recognition model
JP7203224B2 (ja) 開放された車両ドアを検出するための分類器のトレーニング
CN108388834A (zh) 利用循环神经网络和级联特征映射的对象检测
CN110263920B (zh) 卷积神经网络模型及其训练方法和装置、巡检方法和装置
US10872275B2 (en) Semantic segmentation based on a hierarchy of neural networks
EP4052108A1 (fr) Prédiction de trajectoire d'agent à l'aide dentrées vectorisées
US11880758B1 (en) Recurrent neural network classifier
CN114758502B (zh) 双车联合轨迹预测方法及装置、电子设备和自动驾驶车辆
US11798298B2 (en) Distracted driving detection using a multi-task training process
CN110574041B (zh) 针对深度学习领域的协同激活
CN114495006A (zh) 遗留物体的检测方法、装置及存储介质
US20210357640A1 (en) Method, apparatus and computer readable media for object detection
US20230245437A1 (en) Model generation apparatus, regression apparatus, model generation method, and computer-readable storage medium storing a model generation program
CN114846523A (zh) 使用记忆注意力的多对象跟踪
US20230082079A1 (en) Training agent trajectory prediction neural networks using distillation
US11636694B2 (en) Video-based activity recognition
Ramtoula et al. Msl-raptor: A 6dof relative pose tracker for onboard robotic perception
US11443184B2 (en) Methods and systems for predicting a trajectory of a road agent based on an intermediate space
US20240157979A1 (en) Trajectory prediction using diffusion models

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17896253

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17896253

Country of ref document: EP

Kind code of ref document: A1