CN110506277A - Filter reuse mechanism for constructing robust deep convolutional neural networks - Google Patents

Filter reuse mechanism for constructing robust deep convolutional neural networks

Info

Publication number
CN110506277A
CN110506277A CN201780089497.0A CN201780089497A
Authority
CN
China
Prior art keywords
convolutional layer
filter
feature map
convolutional
subsequent
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201780089497.0A
Other languages
Chinese (zh)
Other versions
CN110506277B (en)
Inventor
姜晓恒 (Xiaoheng Jiang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nokia Technologies Oy
Original Assignee
Nokia Technologies Oy
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nokia Technologies Oy filed Critical Nokia Technologies Oy
Publication of CN110506277A publication Critical patent/CN110506277A/en
Application granted granted Critical
Publication of CN110506277B publication Critical patent/CN110506277B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • G06F18/24133Distances to prototypes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

An apparatus and method are provided. For a first convolutional layer of a convolutional neural network, the method generates feature maps (406) based on a region of an image to be evaluated and the learned filters of the first convolutional layer; for one or more subsequent convolutional layers of the convolutional neural network, it generates feature maps (408) based on the feature maps of the previous convolutional layer, the learned filters of the previous convolutional layer, and the learned filters of the subsequent convolutional layer; and it detects the presence (410) of an object of interest in the region of the image based on the feature maps generated by the first convolutional layer and the one or more subsequent convolutional layers.

Description

Filter reuse mechanism for constructing robust deep convolutional neural networks
Technical field
This disclosure relates to neural networks and, more particularly, to filtering mechanisms for convolutional neural networks.
Background technique
Object recognition is an important component of the field of computer vision. In the past few years, deep convolutional neural networks (CNNs) have been used to advance object recognition. The power of a deep CNN lies in the fact that it is capable of learning feature hierarchies. An example of a CNN architecture is described in G. Huang, Z. Liu, K. Q. Weinberger: Densely Connected Convolutional Networks, CoRR, abs/1608.06993 (2016) (hereinafter referred to as "Huang"). In Huang, a CNN architecture is proposed that introduces direct connections among all layers within a block of the neural network. That is, within a block, each layer is directly connected to every other layer in a feed-forward fashion. A block generally includes several layers that operate without down-sampling. For each layer, all the feature maps of the preceding layers are treated as separate inputs, while its own feature maps are passed on as inputs to all subsequent layers. The core idea is to reuse the feature maps generated in the preceding layers. However, these feature maps themselves do not bring new information to the neural network.
Summary of the invention
Accordingly, the disclosure provides an apparatus and method for generating, for a first convolutional layer of a convolutional neural network, feature maps based on a region of an image to be evaluated and the learned filters of the first convolutional layer; generating feature maps for one or more subsequent convolutional layers of the convolutional neural network after the first convolutional layer; and detecting the presence of an object of interest in the region of the image based on the feature maps generated by the first convolutional layer and the one or more subsequent convolutional layers. The feature maps of each subsequent convolutional layer are generated based on the feature maps of the previous convolutional layer, the learned filters of the previous convolutional layer, and the learned filters of the subsequent convolutional layer.
The apparatus and method can be further configured to receive an image captured by an image sensing device, and/or to initiate an alarm in the case that an object is detected. In addition, the convolutional neural network can be applied to each region of the image to detect whether an object is present in any of the regions of the image.
The apparatus and method can also be configured to learn filters for each convolutional layer of the convolutional neural network using one or more training images during a training stage (or period). To learn the filters, the apparatus and method can be configured to initialize the filters for the convolutional layers of the convolutional neural network, generate feature maps for each convolutional layer using forward propagation, compute a loss using a loss function based on the generated feature maps and the scores for each class together with the corresponding labels, and update the filters for the convolutional layers using back-propagation in the case that the computed loss is decreasing. The feature maps of each subsequent convolutional layer after the first convolutional layer are generated based on the feature maps of the previous convolutional layer, the learned filters of the previous convolutional layer, and the filters of the subsequent convolutional layer. The apparatus and method can be configured to repeat the operations of computing feature maps, computing the loss, and updating the filters until the convolutional neural network converges, that is, when the computed loss no longer decreases.
In the apparatus and method, two sets of feature maps can be generated for each convolutional layer of the one or more subsequent convolutional layers. In addition, the operations of generating feature maps for the first convolutional layer, generating feature maps for the one or more subsequent convolutional layers, and detecting the presence of an object are performed in a test stage.
To detect the presence of an object in the region of the image, the apparatus and method can be configured to obtain a score for the region from the application of the convolutional neural network, and to compare the score for the region with a threshold value. If the score for the region is greater than the threshold value, the object is detected in the region.
Brief description of the drawings
Various example embodiments are described in conjunction with the accompanying drawings, in which:
Fig. 1 illustrates a block diagram of an example system for detecting the presence or absence of objects using a convolutional neural network (CNN) with filter reuse (or sharing), according to an embodiment of the present disclosure;
Fig. 2 illustrates a block diagram of an example system for detecting the presence or absence of objects using a convolutional neural network (CNN) with filter reuse (or sharing), according to another embodiment of the present disclosure;
Fig. 3 is an example architecture of a convolutional neural network according to an embodiment of the present disclosure, in which subsequent convolutional layers reuse the filters of previous convolutional layers;
Fig. 4 is a flowchart showing an example process of an embodiment of the disclosure, by which a system such as that in Fig. 1 or Fig. 2 is configured to implement the training and/or test stages using a convolutional neural network;
Fig. 5 is a flowchart showing an example process according to an embodiment of the present disclosure, by which a system such as that in Fig. 1 or Fig. 2 is configured to implement the training stage for training a convolutional neural network;
Fig. 6 is a flowchart showing an example process according to an embodiment of the present disclosure, by which a system such as that in Fig. 1 or Fig. 2 is configured to implement the test stage for evaluating an image, or a region thereof, using a trained convolutional neural network; and
Fig. 7 is a flowchart showing an example detection process according to an example embodiment of the present disclosure, by which a system such as that in Fig. 1 or Fig. 2 is configured to detect the presence (or absence) of a feature, such as an object, using a convolutional neural network.
Detailed description
According to various example embodiments, an apparatus and method are provided that use a deep convolutional neural network (CNN) with a filter reuse mechanism to analyze an image, or a region thereof, and detect the presence (or absence) of one or more objects of interest. The CNN is configured to reuse filters from previous (e.g., preceding or earlier) convolutional layers to compute feature maps in subsequent convolutional layers. In this way, the filters can be fully exploited or shared, so that the representational capacity of the features is significantly enhanced, which in turn significantly improves the recognition accuracy of the resulting deep CNN. Compared with approaches that simply reuse previous feature maps, this CNN approach with filter reuse can exploit the information (e.g., the filters) obtained from previous convolutional layers while generating new information (e.g., feature maps) in the current convolutional layer. Furthermore, because each current convolutional layer reuses the filters of the previous convolutional layer, this CNN architecture can reduce the number of parameters. This configuration can therefore mitigate the overfitting problem caused by using too many parameters.
The apparatus and method of the disclosure can be used in object recognition systems, such as video surveillance systems that use cameras or other sensors. For example, a camera may capture several multi-view images of the same scene, such as 360-degree images. The task of video surveillance is to detect one or more objects of interest (e.g., pedestrians, animals, or other objects) from the multi-view images, and then provide an alert or notification (e.g., an alarm or warning) to a user. Because the camera system can be provided to capture 360-degree images, the video surveillance system can potentially detect all objects of interest appearing in a scene or environment. In such a surveillance system, each camera (or camera subsystem) can be configured to perform object detection. For example, the operation of a video surveillance system using a CNN with filter reuse may involve the following. Each camera of the system captures images. For each region of a captured image, the CNN with filter reuse can be applied; the region is classified as an object of interest if the response of the CNN is greater than a predefined threshold, and as background (e.g., non-object) if the response of the CNN is equal to or less than the threshold.
As described herein, the object detection process can involve a training stage and a test stage. The goal of the training stage is to design or configure the structure of the CNN with filter reuse, and to learn the parameters (that is, the filters) of the CNN. In the training stage, training images are used as input to train the CNN to detect the presence (or absence) of particular object(s). For example, back-propagation can be used to learn or configure the parameters (such as the filters) of the CNN for detecting the presence (or absence) of objects. The training images may include example images of the object(s) of interest, example images of background(s), and other content that may be present in images. In the test stage, the trained CNN with filter reuse can be applied to an image to be tested (e.g., an input image or test image) to detect the presence (or absence) of particular object(s). Using the structure and parameters of the trained deep CNN, the goal of the test stage is to classify each region of the image by applying the region as input to the trained CNN. A region is classified as either an object of interest or background. For example, if the classification decision is an object of interest, the system generates an alarm or notification (e.g., a voice alert or an alarm signal in the form of a message), which can be sent immediately to the user via a network connection (e.g., the Internet) or other media. These operations implemented during object detection can be performed in each camera or camera subsystem of the surveillance system. An alarm can be generated after any one of the cameras in the system detects an object of interest. The object detection process can be implemented in, or carried out using, each camera or each camera subsystem. Examples of the CNN with filter reuse and of object detection systems are described in further detail below with reference to the accompanying drawings.
Fig. 1 illustrates a block diagram of the example components of a system 100, which uses a convolutional neural network (CNN) with reused or shared filters to detect the presence (or absence) of objects of interest. As shown in Fig. 1, the system 100 includes one or more processors 110, a plurality of sensors 120, user interface(s) 130, a memory 140, communication interface(s) 150, a power supply 160, and output device(s) 170. The power supply 160 may include a battery unit, which may be rechargeable, or may be a unit that provides a connection to an external power source.
The sensors 120 are configured to sense or monitor activity (e.g., object(s)) in a geographic area or environment (such as around a vehicle, around or inside a building, etc.). The sensors 120 may include one or more image sensing devices or sensors. For example, a sensor 120 can be a camera with one or more lenses (e.g., a camera, a web camera, a camera system that captures panoramic or 360-degree images, a camera with a wide-angle lens or multiple lenses, etc.). The image sensing device is configured to capture images or image data that can be analyzed using the CNN to detect the presence (or absence) of objects of interest in the images or image data. The captured images or image data may include image frames, video, pictures, and so on. The sensors 120 may also include millimeter-wave radar, infrared cameras, lidar (light detection and ranging) sensors, and/or other kinds of sensors.
The user interface(s) 130 may include a plurality of user input devices, through which a user can input information or commands to the system 100. The user interface(s) 130 may include a keypad, a touch-screen display, a microphone, or other user input devices through which a user can input information or commands.
The output devices 170 may include a display, a loudspeaker, or other devices that can convey information to a user. The communication interface(s) 150 may include communication circuitry, e.g., a transmitter (TX), a receiver (RX), a transceiver (such as an RF transceiver), etc., for performing wire-based communication with external devices over, for example, a USB or Ethernet cable interface, or for performing wireless communication with external devices over, for example, a wireless personal area network, a wireless local area network, a cellular network, or a wireless wide area network. For example, the communication interface(s) 150 can be used to receive the CNN and its parameters, or updates thereof (e.g., learned filters for objects of interest), from an external computing device 180 (e.g., a server, a data center, etc.), to transmit alerts or other notifications to the external computing device 180 (e.g., a device of a user, a computer, etc.), and/or to interact with the external computing device 180 to implement the various operations described herein in a distributed manner, such as the training stage, the test stage, alarm notification, and/or the other operations described herein.
The memory 140 is a data storage device, which can store computer-executable code or programs that, when executed by the processor 110, control the operations of the system 100. The memory 140 can also store configuration information for the CNN 142 and its parameters 144 (e.g., learned filters), images 146 (e.g., training images, captured images, etc.), and detection algorithms 148 for implementing the various operations described herein (such as the training stage, the test stage, alarm notification, and the other operations described herein).
The processor 110 communicates with the memory 140. The processor 110 is a processing system, which may include one or more processors, such as a CPU, a GPU, a controller, dedicated circuitry, or other processing units that control the operations of the system 100, including the detection operations described herein (e.g., the training stage, the test stage, alarm notification, etc.). For example, the processor 110 is configured to train the CNN 142 to detect the presence or absence of objects of interest (e.g., to detect object(s) of interest, background(s), etc.) by using training images, classification/label information, and learned parameters (e.g., learned filters). The processor 110 is also configured to test captured image(s), or regions thereof, using the trained CNN 142 with the learned parameters, in order to detect the presence (or absence) of objects in the image or its regions. Objects of interest may include people such as pedestrians, animals, vehicles, traffic signs, road hazards, or other objects of interest depending on the intended application. The processor 110 is also configured to initiate an alarm or other notification when the presence of an object is detected, such as by outputting a notification using the output devices 170, or by notifying a user via transmission of a notification through the communication interface 150 to the external computing device 180 (e.g., a user's device, a data center, a server, etc.). The external computing device 180 may include components similar to those in the system 100, such as shown in and described above with reference to Fig. 1.
Fig. 2 depicts an example system 200 including processor(s) 210 and sensor(s) 220, according to some example embodiments. The system 200 may also include an RF transceiver 250. In addition, the system 200 may be installed in a vehicle 20, such as a car or truck, although the system may also be used without the vehicle 20. The system 200 may include the same or similar components and functions as provided in the system 100 of Fig. 1.
For example, the sensor(s) 220 may include one or more image sensors, which are configured to provide image data such as image frames, video, pictures, and so on. For example, in the case of an advanced driver assistance system or autonomous vehicle, the sensors 220 may include cameras, millimeter-wave radar, infrared cameras, lidar (light detection and ranging) sensors, and/or other kinds of sensors.
The processor 210 may include CNN circuitry, which may represent dedicated CNN circuitry configured to implement the convolutional neural network and the other operations described herein. Alternatively or additionally, the CNN circuitry may be implemented in other ways, such as using at least one memory including program code executed by at least one processing device (e.g., a CPU, GPU, controller, etc.).
In some example embodiments, the system 200 can have a training stage. The training stage can configure the CNN circuitry to learn to detect and/or classify one or more objects of interest. The processor 210 can be trained using images of objects such as people, other vehicles, road hazards, and the like. After being trained, when an image includes the object(s), the trained CNN implemented by the processor 210 can detect the object(s) and provide an indication of the detection/classification of the object(s). In the training stage, the CNN learns its configuration (e.g., parameters, weights, etc.). After being trained, the configured CNN can be used in a test or operational stage to detect and/or classify regions (such as patches or portions) of an unknown input image, and thereby determine whether the input image includes an object of interest or only background (i.e., no object of interest).
In some example embodiments, the system 200 can be trained to detect objects such as people, animals, other vehicles, traffic signs, road hazards, and the like. In an advanced driver assistance system (ADAS), when an object such as a vehicle or person is detected, an output such as a warning, haptic feedback, an indication of the recognized object, or another indication can be generated, for example to warn or notify the driver. In the case of an autonomous vehicle including the system 200, the detected object can signal control circuitry to take additional action in the vehicle (e.g., initiate braking, acceleration/deceleration, steering, and/or other actions). In addition, the indication can be transmitted to other vehicles, IoT devices, a cloud platform, a mobile edge computing (MEC) platform, etc., via the radio transceiver 250.
Fig. 3 is an example of a convolutional neural network (CNN) architecture 300 comprising multiple convolutional layers (e.g., layer 1 ... layer L) and a decision layer. The CNN architecture 300 is configured to reuse or share, in subsequent convolutional layers, the filters from previous convolutional layers. For example, in layer 1, N_1 feature maps C_1 are obtained through the filters W_1. The spatial width and height of C_1 are w_1 and h_1, respectively. In layer 2, the feature maps C_2 are obtained not only through the new filters W_2 but also through the filters W_1 of the previous layer 1. Using the filters W_2, N_21 feature maps are obtained. Using the existing filters W_1, N_22 feature maps are obtained. The N_21 feature maps and the N_22 feature maps are concatenated to form the feature maps C_2 of layer 2. Thus, as shown in Fig. 3, the filters W_1 of the previous layer 1 are reused in layer 2. Similarly, the new filters W_3 are used to generate the N_31 feature maps of layer 3, and the filters W_2 previously obtained in layer 2 are used to generate the N_32 feature maps of layer 3. The N_31 feature maps and the N_32 feature maps are concatenated to form the feature maps C_3 of layer 3. In the same manner, the remaining feature maps C_4, C_5, ..., C_L are computed. The CNN architecture 300 can be used in a detection process to detect the presence (or absence) of object(s) of interest in a region of an image, or to classify a region of interest of the image. As described herein, the detection process may include a training stage to learn the parameters for the CNN using training images, and a test stage to apply the trained CNN to classify regions of the image and detect the presence (or absence) of objects of interest. Examples of the training stage and the test stage are described below with reference to the accompanying drawings.
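For illustration, below is a minimal PyTorch sketch of this filter-reuse mechanism. It wires only the layer-1-to-layer-2 transition of Fig. 3; the module name, channel counts, and the ReLU nonlinearity are assumptions made so the example runs, since the patent does not spell out the channel bookkeeping for deeper layers.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLayerFilterReuse(nn.Module):
    """Sketch of layers 1 and 2 of Fig. 3: layer 2 applies both its own
    new filters W2 and the reused filters W1 of layer 1, then concatenates
    the two sets of feature maps."""
    def __init__(self, in_ch=3, n1=3, n21=32):
        super().__init__()
        # W1: filters of layer 1 (n1 output maps). For the reused convolution
        # W1 applied to C1 to be shape-compatible, layer 1 must output as many
        # channels as W1 has input channels (an assumption made here).
        self.W1 = nn.Conv2d(in_ch, n1, kernel_size=3, padding=1)
        # W2: new filters of layer 2 (n21 output maps).
        self.W2 = nn.Conv2d(n1, n21, kernel_size=3, padding=1)

    def forward(self, x):
        c1 = F.relu(self.W1(x))            # layer 1: C1 from W1
        new_maps = F.relu(self.W2(c1))     # N21 maps from the new filters W2
        reused_maps = F.relu(self.W1(c1))  # N22 maps from the reused filters W1
        return torch.cat([new_maps, reused_maps], dim=1)  # C2 of layer 2

# Usage: one 64x64 RGB region.
region = torch.randn(1, 3, 64, 64)
print(TwoLayerFilterReuse()(region).shape)  # torch.Size([1, 35, 64, 64])
```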
Fig. 4 is a flowchart showing an example process 400 of an embodiment of the disclosure, by which a system such as that in Fig. 1 or Fig. 2 is configured to implement the training and/or test stages using a convolutional neural network. For purposes of illustration, the process 400 is discussed below with reference to the processor 110 and other components of the system 100 in Fig. 1; the process 400 describes the higher-level operations performed in the training stage and the test stage.
At reference numeral 402, the processor 110 is configured to provide a convolutional neural network during the training stage.
At reference numeral 404, the processor 110 is configured to learn parameters, such as filters, for each convolutional layer of the convolutional neural network during the training stage.
At reference numeral 406, the processor 110 is configured to generate, during the test stage, feature maps for the first convolutional layer of the convolutional neural network based on a region of an image to be evaluated and the learned filters of the first convolutional layer.
At reference numeral 408, the processor 110 is configured to generate, in the test stage, feature maps for one or more subsequent convolutional layers of the convolutional neural network based on the feature maps of the previous convolutional layer, the learned filters of the previous convolutional layer, and the learned filters of the subsequent convolutional layer.
At reference numeral 410, the processor 110 is configured to detect, in the test stage, the presence (or absence) of an object of interest in the region of the image based on the feature maps generated by the first convolutional layer and the one or more subsequent convolutional layers. In the case that an object is detected, the processor can be configured to initiate an alarm or other notification to a user or other entity.
Fig. 5 is a flowchart showing an example process 500, by which a system such as that in Fig. 1 or Fig. 2 is configured to implement the training stage for training a CNN with filter reuse (e.g., with reference to Fig. 3). For purposes of illustration, the process 500 is discussed below with reference to the processor 110 and other components of the system 100 in Fig. 1, and describes the operations performed during the training stage.
At reference numeral 502, a set of training images and their corresponding labels are prepared. For example, if a training image includes an object of interest, its label is set to one number (e.g., 1). If a training image does not include an object of interest, its label is set to another number (e.g., -1). The set of training images and their corresponding labels are used during the training stage in the design and configuration of the CNN for detecting objects of interest.
At reference numeral 504, the processor 110 implements an initialization operation for the parameters of the CNN, such as the filters. For example, the processor 110 initializes the filters (e.g., W_1 ... W_L) for the convolutional layers (e.g., layer 1 ... L) of the CNN, such as the CNN in Fig. 3. The filters can be initialized by sampling from a Gaussian distribution with zero mean and a very small variance (such as 0.01).
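A minimal sketch of this initialization step, assuming the filters live in PyTorch Conv2d modules and interpreting the 0.01 as the standard deviation of the Gaussian (a common convention):

```python
import torch.nn as nn

def init_filters(cnn: nn.Module, std: float = 0.01) -> None:
    """Initialize every convolutional filter from a zero-mean Gaussian."""
    for module in cnn.modules():
        if isinstance(module, nn.Conv2d):
            nn.init.normal_(module.weight, mean=0.0, std=std)
            if module.bias is not None:
                nn.init.zeros_(module.bias)
```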
At reference numeral 506, the processor 110 generates (e.g., computes or estimates) feature maps on a layer-by-layer convolution basis using forward propagation, with a training image, or a region thereof, from the training image set as input. For example, this operation may involve using two sets of filters to compute the feature maps, as shown in the CNN architecture 300 of Fig. 3: one set of filters from the previous convolutional layer and another from the current convolutional layer. For example, given the filters W_l of layer l and the filters W_{l+1} of layer l+1, the feature maps generated in layer l are denoted N_l. In computing the feature maps of layer l+1, the convolution operation is performed twice. First, the feature maps C1_{l+1} = W_{l+1} ∘ N_l are computed, where "∘" denotes the convolution operation. Second, the feature maps C2_{l+1} = W_l ∘ N_l are computed. Then, the feature maps C1_{l+1} and C2_{l+1} are concatenated to generate the final output N_{l+1} of layer l+1. It should be noted that W_l was already used in layer l to compute the feature maps N_l; thus, the filters W_l used in layer l are reused in layer l+1 to generate new feature maps.
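The two convolutions and the concatenation above can be expressed directly with functional convolutions. A minimal sketch, assuming raw weight tensors with hypothetical channel counts chosen so the shapes line up for a single l to l+1 transition:

```python
import torch
import torch.nn.functional as F

# Hypothetical shapes: N_l has 8 channels, so both filter banks take 8 input channels.
N_l = torch.randn(1, 8, 32, 32)        # feature maps of layer l
W_next = torch.randn(16, 8, 3, 3)      # W_{l+1}: 16 new filters
W_prev = torch.randn(8, 8, 3, 3)       # W_l: filters reused from layer l

C1 = F.conv2d(N_l, W_next, padding=1)  # C1_{l+1} = W_{l+1} ∘ N_l
C2 = F.conv2d(N_l, W_prev, padding=1)  # C2_{l+1} = W_l ∘ N_l (reuse)
N_next = torch.cat([C1, C2], dim=1)    # N_{l+1}: concatenated output
print(N_next.shape)                    # torch.Size([1, 24, 32, 32])
```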
At reference numeral 508, the processor 110 implements a decision layer, in which a loss computation is performed. For example, the processor 110 performs the loss computation by computing the loss from the final scores for each class and the corresponding labels. A softmax loss function can be used to perform the loss computation. An example of the softmax loss function is expressed by equation (1) as follows:

    L = -log( exp(y_c) / Σ_j exp(y_j) )        (1)

where:
y is the vector of scores for all classes, and
y_c is the score of class c.
Instead of the softmax loss function, other functions can also be used in the decision layer, such as a support vector machine (SVM) loss function or other loss functions suitable for use with a CNN. For example, the softmax loss function computes a cross-entropy loss, while the SVM loss function computes a hinge loss. For classification tasks, the two functions perform almost identically.
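For concreteness, a small worked example of equation (1) with made-up scores; PyTorch's built-in cross_entropy computes exactly this softmax loss:

```python
import torch
import torch.nn.functional as F

y = torch.tensor([[2.0, 0.5]])   # scores: [object of interest, background]
label = torch.tensor([0])        # ground-truth class c = 0 (object)

# Equation (1) by hand: -log(exp(y_c) / sum_j exp(y_j))
loss_manual = -torch.log(torch.exp(y[0, 0]) / torch.exp(y[0]).sum())
loss_builtin = F.cross_entropy(y, label)
print(loss_manual.item(), loss_builtin.item())  # both approx. 0.2014
```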
At reference numeral 510, the processor 110 determines, based on the computed loss (e.g., the change in the computed loss), whether the filters of the CNN should be updated. For example, the processor 110 determines whether the loss has stopped decreasing or changing, in other words, whether the CNN has converged. If the loss has stopped decreasing, the processor 110 outputs the filters (e.g., the learned filters) at reference numeral 514, for use in the CNN during the test stage. The output filters can be stored in memory for use with the CNN.
Otherwise, if the loss has not yet stopped decreasing, the processor 110 updates the filters of the CNN at reference numeral 512. For example, the processor 110 may implement back-propagation (e.g., standard back-propagation or a variant thereof) to update all the filters of the CNN. For example, during back-propagation, the filters can be updated by the chain rule according to equation (2):

    ∂ε/∂W_l = (∂ε/∂N_l) · (∂N_l/∂W_l)        (2)

where
ε denotes the loss function, and
∂ε/∂N_l is the gradient propagated back from the deeper layers.

The filters are then updated as follows:

    W_l ← W_l − η · (∂ε/∂W_l)        (3)

where
η denotes the update coefficient (e.g., the learning rate).
The process 500 then continues by repeating the operations at reference numerals 506, 508, and 510 until the computed loss stops decreasing or changing (in other words, until the CNN converges).
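Putting process 500 together, a hedged training-loop sketch follows. The network, the data loader, the SGD optimizer, and the convergence tolerance are assumptions; autograd carries out the chain rule of equation (2), and the SGD step applies the update of equation (3):

```python
import torch
import torch.nn.functional as F

def train(cnn, loader, lr=0.01, tol=1e-4, max_epochs=100):
    """Process 500: forward (506), loss (508), convergence check (510),
    back-propagation update (512); returns the learned filters (514)."""
    optimizer = torch.optim.SGD(cnn.parameters(), lr=lr)  # eq. (3): W <- W - lr * grad
    prev_loss = float("inf")
    for _ in range(max_epochs):
        epoch_loss = 0.0
        for regions, labels in loader:
            scores = cnn(regions)                    # 506: forward propagation
            loss = F.cross_entropy(scores, labels)   # 508: softmax loss, eq. (1)
            optimizer.zero_grad()
            loss.backward()                          # eq. (2): chain-rule gradients
            optimizer.step()                         # 512: filter update
            epoch_loss += loss.item()
        if prev_loss - epoch_loss < tol:             # 510: loss stopped decreasing
            break
        prev_loss = epoch_loss
    return cnn.state_dict()                          # 514: output learned filters
```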
Fig. 6 is a flowchart showing an example process 600, by which a system such as that in Fig. 1 or Fig. 2 is configured to implement the test stage, for evaluating an image, or a region thereof, using the trained CNN with filter reuse (e.g., with reference to Fig. 3). The test stage differs from the training stage in that it does not need to update the filters. Rather, the test stage can take the learned filters from the training stage to classify or detect objects. In addition, there is no need to compute a loss for the decision layer; the decision layer simply decides which class has the highest score. For purposes of illustration, the process 600 is discussed below with reference to the processor 110 and other components of the system 100 in Fig. 1, and describes the operations performed during the test stage.
At reference numeral 602, the processor 110 implements a region proposal operation by determining the regions (of the image) that may include an object of interest (such as a target object). For example, a simple way to identify the regions of interest for evaluation is to adopt a sliding-window technique, which scans the input image exhaustively. Other methods can also be adopted.
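A minimal sketch of this sliding-window region proposal; the window size and stride are illustrative values, not values taken from the patent:

```python
def sliding_windows(image_h, image_w, win=64, stride=16):
    """Yield (top, left, bottom, right) boxes covering the image exhaustively."""
    for top in range(0, image_h - win + 1, stride):
        for left in range(0, image_w - win + 1, stride):
            yield (top, left, top + win, left + win)

# Usage: enumerate candidate regions of a 128x256 image.
regions = list(sliding_windows(128, 256))
print(len(regions), regions[0])  # 65 windows (5 * 13); first is (0, 0, 64, 64)
```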
At reference numeral 604, the processor 110 implements feature map generation using the CNN with filter reuse. For example, the processor 110 applies a region of interest of the image to the CNN, and generates feature maps on a layer-by-layer convolution basis using the parameters (such as the filters) learned during the training stage. The feature map generation process in the test stage can be similar to the process performed in the training stage, as described above with reference to Fig. 5.
At reference numeral 606, the processor 110 implements the decision layer to perform the classification or object detection for the region. For example, in the decision layer, the processor 110 can take the score vector y as input and determine which score (e.g., y_c) is the highest. This operation outputs the label (e.g., pedestrian) corresponding to the top score.
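The decision at reference numeral 606 then reduces to an argmax over the score vector; a tiny sketch with a hypothetical label set:

```python
import torch

classes = ["pedestrian", "background"]  # hypothetical label set
y = torch.tensor([1.7, -0.3])           # score vector from the CNN
print(classes[y.argmax().item()])       # "pedestrian"
```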
As discussed previously, the decision layer can use the softmax loss function or another loss function, such as the SVM loss function. The softmax loss function computes a cross-entropy loss, while the SVM loss function computes a hinge loss. For classification tasks, the two functions perform almost identically.
Fig. 7 is a flowchart showing an example detection process 700, by which a system such as that in Fig. 1 or Fig. 2 is configured to detect the presence (or absence) of an object of interest using the trained CNN with filter reuse (e.g., with reference to Fig. 3). For purposes of illustration, the process 700 is discussed below with reference to the processor 110 and other components of the system 100 in Fig. 1.
At reference numeral 702, the sensor(s) 120 capture image(s). Depending on the application of the detection process 700, the images can be captured for different scenes. For example, the sensor(s) 120 can be positioned, mounted, or placed to capture images for a fixed position (e.g., at different locations in or around a building, or other locations) or for a movable position (e.g., positions around a moving vehicle, person, or other system). For example, a camera system, such as a single-lens or multi-lens camera or a camera system capturing panoramic or 360-degree images, can be mounted on a vehicle.
At reference numeral 704, the processor 110 scans each region of an image, such as an image from the captured image(s).
At reference numeral 706, the processor 110 applies the CNN to each region of the image, such as by implementing the test stage. An example of the test stage is described with reference to the process 600 of Fig. 6. As described above, the application of the CNN provides a score for the tested region of the image.
At reference numeral 708, the processor 110 determines whether the score from the CNN is greater than a threshold (e.g., a threshold value).
If the score is not greater than the threshold, then at reference numeral 710, the processor 110 does not initiate an alarm or notification, and the process 700 continues to capture and evaluate images. Otherwise, if the score is greater than the threshold, then at reference numeral 712, the processor 110 initiates an alarm or notification reflecting the detection of the object of interest or the classification of that object. As discussed previously, depending on the intended application of the detection process, examples of objects of interest may include pedestrians, animals, vehicles, traffic signs, road hazards, or other relevant objects. The alarm or notification can be initiated locally at the system 100 via one of the output devices 170, or transmitted to the external computing device 180. The alarm can be provided to the user in the form of a visual or audible notification or other suitable media (such as vibration, etc.).
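Tying process 700 together, a hedged sketch of the per-region detection loop is shown below. It reuses the sliding_windows helper sketched earlier; the trained cnn, the threshold value, the assumption that index 0 of the score vector is the object class, and the raise_alarm stand-in are all illustrative, not from the patent:

```python
import torch

def raise_alarm(detection):
    # Stand-in notifier; a real system would use the output devices 170 or
    # the communication interface 150 (reference numeral 712).
    print("ALERT: object of interest detected at", detection)

def detect_objects(cnn, image, threshold=0.0, win=64, stride=16):
    """Process 700: scan regions (704), score each with the CNN (706),
    compare with the threshold (708), and alarm on detection (712)."""
    cnn.eval()
    detections = []
    _, h, w = image.shape                    # image: (channels, H, W) tensor
    with torch.no_grad():
        for top, left, bottom, right in sliding_windows(h, w, win, stride):
            region = image[:, top:bottom, left:right].unsqueeze(0)
            score = cnn(region).squeeze()[0].item()  # assumed: index 0 = object class
            if score > threshold:                    # 708: compare with threshold
                detections.append((top, left, bottom, right, score))
                raise_alarm(detections[-1])          # 712: initiate alarm
    return detections                        # empty list: no alarm started (710)
```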
Experimental example
Experimental results on the KITTI dataset show the effectiveness of this method and system using the CNN with filter reuse. The KITTI dataset was captured by a pair of cameras. The subset of the KITTI dataset used for pedestrian detection includes 7481 training images and 7518 test images. In this method and system, the deep CNN can include, for example, L = 13 layers. The sizes of the filters W_1 through W_13 are, respectively, 3×3×3, 3×3×32, 3×3×64, 3×3×64, 3×3×128, 3×3×128, 3×3×128, 3×3×256, 3×3×256, 3×3×256, 3×3×256, 3×3×256, and 3×3×256. A traditional VGG neural network was compared with an example of this method and system using the CNN with the filter reuse mechanism. The average precision (AP) of the present CNN with filter reuse is 60.43%, while that of the traditional VGG neural network is 56.74% (see, e.g., Simonyan K., Zisserman A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)). It can be seen that the present CNN method with filter reuse significantly outperforms the traditional VGG method. That is, the introduction of filter reuse or sharing plays an important role in improving object detection performance. Thus, this method and system using the filter reuse mechanism in a CNN can provide significant improvements for the field of object detection and, thereby, for the field of video surveillance.
It should be understood that the described systems and methods are provided only as examples. Although the operations, including the training stage, the test stage, alarm notification, and other aspects, can be implemented using the system 100 or 200 described herein, these operations can also be distributed across multiple systems and executed over communication network(s). In addition, besides standard back-propagation, the training stage can alternatively use other variants of back-propagation, which may be intended to improve the performance of back-propagation. The training and test stages can also adopt other suitable loss functions or training strategies. As described herein, the CNN approach using reused or shared filters can be used in various applications, including but not limited to object detection/recognition implemented in video surveillance systems, in autonomous or semi-autonomous vehicles, or in ADAS.
It is also to be understood that the example embodiments disclosed and taught herein are susceptible to numerous and various modifications and alternative forms. Thus, the use of a singular term, such as, but not limited to, "a", is not intended to limit the number of items.
It is contemplated that the development of an actual, real commercial application incorporating the disclosed embodiments would require many implementation-specific decisions to achieve the developer's ultimate goal of a commercial embodiment. Such implementation-specific decisions may include, and likely are not limited to, compliance with system-related, business-related, government-related, and other constraints, which may vary by specific implementation, location, and over time. While a developer's efforts might be complex and time-consuming in an absolute sense, such efforts would nevertheless be a routine undertaking for those of skill in this art having the benefit of this disclosure.
Using the description provided herein, the example embodiments can be implemented as a machine, process, or article of manufacture by using standard programming and/or engineering techniques to produce programming software, firmware, hardware, or any combination thereof.
Any resulting program(s), having computer-readable program code, may be embodied on one or more computer-usable media, such as resident memory devices, smart cards or other removable memory devices, or transmitting devices, thereby making a computer program product or article of manufacture according to the embodiments. As such, the terms "article of manufacture" and "computer program product" as used herein are intended to encompass a computer program that exists permanently or temporarily on any computer-usable medium or in any transmitting medium which transmits such a program.
As noted above, memory/storage devices can include, but are not limited to, disks, solid-state drives, optical disks, removable memory devices (such as smart cards, SIMs, WIMs), and semiconductor memories (such as RAM, ROM, PROMs). Transmitting mediums include, but are not limited to, transmission via wireless communication networks, the Internet, intranets, telephone/modem-based network communication, hard-wired/cabled communication networks, satellite communication, and other stationary or mobile network systems/communication links.
While particular embodiments and applications of the disclosure have been illustrated and described, it is to be understood that the disclosure is not limited to the precise construction and compositions disclosed herein, and that various modifications, changes, and variations may be apparent from the foregoing descriptions without departing from the invention as defined in the appended claims.

Claims (26)

1. A computer-implemented method, comprising:
configuring to generate, for a first convolutional layer of a convolutional neural network, feature maps based on a region of an image to be evaluated and learned filters of the first convolutional layer;
configuring to generate feature maps for one or more subsequent convolutional layers of the convolutional neural network, the feature maps of each subsequent convolutional layer being generated based on the feature maps of a previous convolutional layer, the learned filters of the previous convolutional layer, and the learned filters of the subsequent convolutional layer; and
configuring to detect the presence of an object of interest in the region of the image based on the feature maps generated by the first convolutional layer and the one or more subsequent convolutional layers.
2. The method according to claim 1, further comprising:
configuring to receive the image captured from an image sensing device.
3. The method according to any one of claims 1 and 2, further comprising:
configuring to learn filters for each convolutional layer of the convolutional neural network using one or more training images during a training stage.
4. The method according to claim 3, wherein the configuring to learn filters comprises:
configuring to initialize filters for the convolutional layers of the convolutional neural network;
configuring to generate feature maps for each convolutional layer using forward propagation, the feature maps of each subsequent convolutional layer after the first convolutional layer being generated based on the feature maps of the previous convolutional layer, the learned filters of the previous convolutional layer, and the filters of the subsequent convolutional layer;
configuring to compute a loss using a loss function based on the generated feature maps and the scores for each class and corresponding labels; and
configuring to update the filters for the convolutional layers using back-propagation in the case that the computed loss is decreasing,
wherein the configuring to compute feature maps, the configuring to compute the loss, and the configuring to update the filters are repeated until the convolutional neural network converges, when the computed loss no longer decreases.
5. The method according to any one of claims 1 to 4, wherein two sets of feature maps are generated for each subsequent convolutional layer of the one or more subsequent convolutional layers.
6. The method according to claim 5, further comprising:
concatenating the two sets of feature maps for each subsequent convolutional layer of the one or more subsequent convolutional layers.
7. The method according to any one of claims 1 to 6, wherein the configuring to generate feature maps for the first convolutional layer, the configuring to generate feature maps for the one or more subsequent convolutional layers, and the configuring to detect the presence of an object are performed in a test stage.
8. The method according to any one of claims 1 to 7, wherein the configuring to detect comprises:
configuring to obtain a score for the region from the application of the convolutional neural network; and
configuring to compare the score for the region with a threshold value,
wherein the object is detected in the region if the score for the region is greater than the threshold value.
9. The method according to any one of claims 1 to 8, further comprising:
configuring to initiate an alarm in the case that the object is detected.
10. The method according to any one of claims 1 to 9, wherein the convolutional neural network is applied to each region of the image to detect whether the object is present in any of the regions of the image.
11. An apparatus comprising means for performing the method according to any one of claims 1 to 10.
12. A computer program product comprising computer code instructions which, when executed by at least one processor, cause an apparatus at least to perform the method according to any one of claims 1 to 10.
13. An apparatus, comprising:
a memory; and
one or more processors configured to:
generate, for a first convolutional layer of a convolutional neural network, feature maps based on a region of an image to be evaluated and learned filters of the first convolutional layer;
generate feature maps for one or more subsequent convolutional layers of the convolutional neural network, the feature maps of each subsequent convolutional layer being generated based on the feature maps of a previous convolutional layer, the learned filters of the previous convolutional layer, and the learned filters of the subsequent convolutional layer; and
detect the presence of an object of interest in the region of the image based on the feature maps generated by the first convolutional layer and the one or more subsequent convolutional layers.
14. The apparatus according to claim 13, wherein the one or more processors are further configured to receive the image captured from an image sensing device.
15. The apparatus according to any one of claims 13 and 14, wherein the one or more processors are further configured to learn filters for each convolutional layer of the convolutional neural network using one or more training images during a training stage.
16. The apparatus according to claim 15, wherein, to learn filters, the one or more processors are configured to:
initialize filters for the convolutional layers of the convolutional neural network;
generate feature maps for each convolutional layer using forward propagation, the feature maps of each subsequent convolutional layer after the first convolutional layer being generated based on the feature maps of the previous convolutional layer, the learned filters of the previous convolutional layer, and the filters of the subsequent convolutional layer;
compute a loss using a loss function based on the generated feature maps and the scores for each class and corresponding labels; and
update the filters for the convolutional layers using back-propagation in the case that the computed loss is decreasing,
wherein the one or more processors are configured to repeat the operations of computing feature maps, computing the loss, and updating the filters until the convolutional neural network converges, when the computed loss no longer decreases.
17. The apparatus according to any one of claims 13 to 16, wherein two sets of feature maps are generated for each subsequent convolutional layer of the one or more subsequent convolutional layers.
18. The apparatus according to any one of claims 13 to 17, wherein the one or more processors are configured to concatenate the two sets of feature maps for each subsequent convolutional layer of the one or more subsequent convolutional layers.
19. The apparatus according to any one of claims 13 to 18, wherein the one or more processors are configured to generate the feature maps for the first convolutional layer, generate the feature maps for the one or more subsequent convolutional layers, and detect the presence of the object in a test stage.
20. The apparatus according to any one of claims 13 to 19, wherein, to detect the presence of the object, the one or more processors are configured to:
obtain a score for the region from the application of the convolutional neural network; and
compare the score for the region with a threshold value,
wherein the object is detected in the region if the score for the region is greater than the threshold value.
21. The apparatus according to any one of claims 13 to 20, wherein the one or more processors are further configured to initiate an alarm in the case that the object is detected.
22. The apparatus according to any one of claims 13 to 21, wherein the convolutional neural network is applied to each region of the image to detect whether the object is present in any of the regions of the image.
23. A computer-implemented method, comprising:
configuring to initialize filters for convolutional layers of a convolutional neural network;
configuring to generate feature maps for each convolutional layer using forward propagation, the feature maps of each subsequent convolutional layer after the first convolutional layer being generated based on the feature maps of a previous convolutional layer, the learned filters of the previous convolutional layer, and the filters of the subsequent convolutional layer;
configuring to compute a loss using a loss function based on the generated feature maps and the scores for each class and corresponding labels; and
configuring to update the filters for the convolutional layers using back-propagation in the case that the computed loss is decreasing,
wherein the configuring to compute feature maps, the configuring to compute the loss, and the configuring to update the filters are repeated until the convolutional neural network converges, when the computed loss no longer decreases.
24. An apparatus comprising means for performing the method according to claim 23.
25. A computer program product comprising computer code instructions which, when executed by at least one processor, cause an apparatus at least to perform the method according to claim 23.
26. An apparatus, comprising:
a memory; and
one or more processors configured to:
initialize filters for convolutional layers of a convolutional neural network;
generate feature maps for each convolutional layer using forward propagation, the feature maps of each subsequent convolutional layer after the first convolutional layer being generated based on the feature maps of the previous convolutional layer, the learned filters of the previous convolutional layer, and the filters of the subsequent convolutional layer;
compute a loss using a loss function based on the generated feature maps and the scores for each class and corresponding labels; and
update the filters for the convolutional layers using back-propagation in the case that the computed loss is decreasing,
wherein the one or more processors are configured to repeat the operations of computing feature maps, computing the loss, and updating the filters until the convolutional neural network converges, when the computed loss no longer decreases.
CN201780089497.0A 2017-02-13 2017-02-13 Filter reuse mechanism for constructing robust deep convolutional neural networks Active CN110506277B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/073342 WO2018145308A1 (en) 2017-02-13 2017-02-13 Filter reusing mechanism for constructing robust deep convolutional neural network

Publications (2)

Publication Number Publication Date
CN110506277A true CN110506277A (en) 2019-11-26
CN110506277B CN110506277B (en) 2023-08-08

Family

ID=63107795

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201780089497.0A Active CN110506277B (en) 2017-02-13 2017-02-13 Filter reuse mechanism for constructing robust deep convolutional neural networks

Country Status (2)

Country Link
CN (1) CN110506277B (en)
WO (1) WO2018145308A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113866571A (en) * 2021-08-06 2021-12-31 厦门欧易奇机器人有限公司 Partial discharge source positioning method, device and equipment
CN114339221A (en) * 2020-09-30 2022-04-12 脸萌有限公司 Convolutional neural network based filter for video coding and decoding

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11176427B2 (en) 2018-09-26 2021-11-16 International Business Machines Corporation Overlapping CNN cache reuse in high resolution and streaming-based deep learning inference engines
US10733742B2 (en) 2018-09-26 2020-08-04 International Business Machines Corporation Image labeling
US10824947B2 (en) * 2019-01-31 2020-11-03 StradVision, Inc. Learning method for supporting safer autonomous driving without danger of accident by estimating motions of surrounding objects through fusion of information from multiple sources, learning device, testing method and testing device using the same
CN111986199B (en) * 2020-09-11 2024-04-16 征图新视(江苏)科技股份有限公司 Method for detecting surface flaws of wood floor based on unsupervised deep learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104866900A (en) * 2015-01-29 2015-08-26 北京工业大学 Deconvolution neural network training method
CN105913087A (en) * 2016-04-11 2016-08-31 天津大学 Object identification method based on optimal pooled convolutional neural network
WO2016195496A2 (en) * 2015-06-05 2016-12-08 Universiteit Van Amsterdam Deep receptive field networks

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104616032B (en) * 2015-01-30 2018-02-09 浙江工商大学 Multi-camera system target matching method based on deep convolutional neural networks
CN105718889B (en) * 2016-01-21 2019-07-16 江南大学 Face identification method based on the GB(2D)2PCANet deep convolutional model

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104866900A (en) * 2015-01-29 2015-08-26 北京工业大学 Deconvolution neural network training method
WO2016195496A2 (en) * 2015-06-05 2016-12-08 Universiteit Van Amsterdam Deep receptive field networks
CN105913087A (en) * 2016-04-11 2016-08-31 天津大学 Object identification method based on optimal pooled convolutional neural network

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114339221A (en) * 2020-09-30 2022-04-12 脸萌有限公司 Convolutional neural network based filter for video coding and decoding
CN114339221B (en) * 2020-09-30 2024-06-07 脸萌有限公司 Convolutional neural network based filter for video encoding and decoding
CN113866571A (en) * 2021-08-06 2021-12-31 厦门欧易奇机器人有限公司 Partial discharge source positioning method, device and equipment

Also Published As

Publication number Publication date
CN110506277B (en) 2023-08-08
WO2018145308A1 (en) 2018-08-16

Similar Documents

Publication Publication Date Title
CN110506277A (en) For constructing the filter reuse mechanism of the depth convolutional neural networks of robust
Zou et al. Deepsense: Device-free human activity recognition via autoencoder long-term recurrent convolutional network
US10872275B2 (en) Semantic segmentation based on a hierarchy of neural networks
CN109785368A (en) A kind of method for tracking target and device
TW202119054A (en) Apparatus of vision and radio fusion based precise indoor localization and storage medium thereof
US10509969B2 (en) Dynamic person queue analytics
CN110321761B (en) Behavior identification method, terminal equipment and computer readable storage medium
CN110264495A (en) A kind of method for tracking target and device
KR102397837B1 (en) An apparatus and a system for providing a security surveillance service based on edge computing and a method for operating them
CN110431565A (en) Zero sample learning method and system of direct-push and/or adaptive maximum boundary
US20230222647A1 (en) Method and system for detecting change to structure by using drone
Fawzi et al. Embedded real-time video surveillance system based on multi-sensor and visual tracking
CN106297184A (en) The monitoring method of mobile terminal surrounding, device and mobile terminal
CN110574041B (en) Collaborative activation for deep learning domain
Venkitachalam et al. Realtime applications with rtmaps and bluebox 2.0
Kamruzzaman et al. AI-based computer vision using deep learning in 6G wireless networks
Elsayed et al. Deep learning for covid-19 facemask detection using autonomous drone based on IoT
Huang et al. V2X cooperative perception for autonomous driving: Recent advances and challenges
CN114758502A (en) Double-vehicle combined track prediction method and device, electronic equipment and automatic driving vehicle
Ray et al. Contemporary developments and technologies in deep learning–based IoT
Farrell et al. Decentralized discovery of camera network topology
Indukuri et al. Advanced accident avoiding, tracking and SOS alert system using GPS module and Raspberry Pi
EP4071728A1 (en) Artificial intelligence model integration and deployment for providing a service
Ramtoula et al. Msl-raptor: A 6dof relative pose tracker for onboard robotic perception
WO2022055672A1 (en) Video-based activity recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant