CN110008927A

CN110008927A - An automated alert determination method based on an improved Fast-Rcnn deep learning model

Info

Publication number: CN110008927A
Application number: CN201910301206.0A
Authority: CN
Inventors: 轩素辉; 欧阳文文; 生拥宏; 于绘娟; 张瑞
Original assignee: Henan Dahua Security Polytron Technologies Inc
Current assignee: Henan Dahua Security Polytron Technologies Inc
Priority date: 2019-04-15
Filing date: 2019-04-15
Publication date: 2019-07-12

Abstract

The invention discloses an automated alert determination method based on an improved Fast-Rcnn deep learning model, comprising the following steps. S1: image acquisition, in which the analog video signal of a CCD or CMOS camera is digitized by A/D conversion using an image acquisition card or video adapter, stored, and then sent to a computer for processing; S2: candidate box determination; S3: feature extraction by a deep network, in which the CNN may attach several fully connected layers after the convolutional layers to map the feature map produced by the convolutional layers to a fixed-length feature vector, and an image block around a pixel is used as the CNN input for training and prediction; S4: image classification, in which, for the actual object class specific to the target device, an adapter is responsible for creating and returning the appropriate writer; S5: image size adjustment; S6: result prediction, in which the last two loss layers are replaced by a single softmax layer. Because the algorithm detects only human movement, it can predict well whether an alert is genuine.

Description

An automated alert determination method based on an improved Fast-Rcnn deep learning model

Technical field

The present invention relates to the technical field of computer software, and specifically to an automated alert determination method based on an improved Fast-Rcnn deep learning model.

Background art

At present, thousands of households and businesses have installed security cameras, and most of these cameras share one function: when a security alarm is triggered, the camera automatically captures photos of the scene and sends them back to the security control centre. These images are currently processed almost entirely by hand: back-office staff open each picture to decide whether there is a genuine alert and whether a police response is needed. This approach is not only slow but also inconsistent. Different staff apply different judgment criteria, so alerts are hard to judge uniformly, and even the same staff member may apply different criteria at different times. For example, when a staff member is fatigued, the accuracy of the judgment may decline.

Another problem is the volume of data. The number of security users grows every year. Manual processing may still be manageable today, but once the number of users reaches a certain level, the manpower required becomes too great and costs rise accordingly. The current manual method of judgment therefore has many drawbacks.

Summary of the invention

The purpose of the present invention is to provide an automated alert determination method based on an improved Fast-Rcnn deep learning model. Rather than detecting the movement of arbitrary objects, the algorithm detects only human movement and can therefore predict well whether an alert is genuine, so as to solve the problems raised in the background art above.

To achieve the above purpose, the invention provides the following technical solution: an automated alert determination method based on an improved Fast-Rcnn deep learning model, the specific steps of which are as follows:

S1: image acquisition. An image acquisition card or video adapter is used to digitize, by A/D conversion, the analog video signal of a CCD (Charge Coupled Device) or CMOS (complementary metal oxide semiconductor) camera; the result is stored and then sent to a computer for processing;

S2: candidate box determination. The image is normalized to 224 × 224 and fed directly into the network. The first five stages have the basic conv+relu+pooling form. At the end of the fifth stage, P candidate regions are input (each described by 1 image index plus 4 geometric coordinates; the index is used for training). The roi_pool layer divides each candidate region evenly into M × N blocks and applies max pooling to each block. Candidate regions of different sizes on the feature map are thereby converted into data of uniform size and fed to the next layer;

S3: feature extraction by a deep network. The CNN may attach several fully connected layers after the convolutional layers to map the feature map produced by the convolutional layers to a fixed-length feature vector; an image block around a pixel is used as the CNN input for training and prediction. Layers 6 and 32 in the program correspond to the pool1 and pool5 layers of the original network. The output of the whole network can finally be mapped back to the size of the original network, so that the image feature results of the other layers are saved;

S4: image classification. For the actual object class specific to the target device, an adapter is responsible for creating and returning the appropriate writer. The CIFAR10 and CIFAR100 databases are two subsets of Tiny Images, containing 10 and 100 object classes respectively;

S5: image size adjustment. The RoI pooling layer pools input tensors of different sizes to a fixed size. The RoI pooling layer specifies the number of pooling windows as W × H, and the size of each pooling window varies with the pooled region;

S6: result prediction. The last two loss layers are replaced by a single softmax layer whose input is the classification scores and whose output is probabilities; finally, NMS is applied to each class. Classification is performed with softmax, and the bounding boxes are obtained by regression. The backbone of the CNN may come from AlexNet or from VGGNet.
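Step S6 applies NMS to each class. As a rough illustration only, the following Python sketch shows greedy non-maximum suppression run separately for each class label. The box format (x1, y1, x2, y2), the function names `iou`, `nms` and `nms_per_class`, and the IoU threshold of 0.5 are assumptions made for this example; the source does not specify them.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the best-scoring box, drop boxes that overlap it too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


def nms_per_class(boxes, scores, labels, iou_threshold=0.5):
    """Apply NMS separately to the detections of each class label."""
    kept = []
    for label in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == label]
        sub = nms([boxes[i] for i in idx], [scores[i] for i in idx], iou_threshold)
        kept.extend(idx[j] for j in sub)
    return sorted(kept)


# Hypothetical detections: two heavily overlapping "person" boxes, one distant
# "person" box, and a "cat" box that coincides with the first "person" box.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (0, 0, 10, 10)]
scores = [0.9, 0.8, 0.7, 0.6]
labels = ["person", "person", "person", "cat"]
kept = nms_per_class(boxes, scores, labels)
```

Because suppression is done per class, the "cat" box survives even though it coincides with a higher-scoring "person" box; only the second "person" box is removed.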

Preferably, the CMOS in step S1 is a complementary metal oxide semiconductor. It was originally an important chip in computer systems that stores the most basic data for system boot; after processing, CMOS can also serve as the image sensor in digital photography.

Preferably, for an ordinary max pooling layer in step S2, given a node i of the input layer and a node j of the output layer, a decision function indicates whether node i is selected by node j as the maximum output. There are two possible reasons for not being selected: node i is not within the range of node j, or it is not the maximum. For roi max pooling, one input node may be connected to multiple output nodes, the output nodes being those of each candidate region.

Preferably, the images in both databases in step S4 are 32 × 32 color images.

Preferably, CIFAR10 contains 60,000 images, of which 50,000 are used for model training and 10,000 for testing; each object class has 5,000 images for training and 1,000 images for testing. CIFAR100 is similar in composition to CIFAR10.

Preferably, the pooling layer in step S5 has hyperparameters: the filter size f and the stride s. Commonly used values are f=2, s=2.

Preferably, the values f=2, s=2 are used very frequently and have the effect of halving the height and width; f=3, s=2 is also used in some cases.

Preferably, the Softmax function in step S6, also called the normalized exponential function, is a generalization of the logistic function; the Softmax function is in fact the gradient-log-normalizer of a finite discrete probability distribution.

Technical effects and advantages of the invention: compared with the prior art, the automated alert determination method based on an improved Fast-Rcnn deep learning model proposed by the present invention uses the improved Fast-Rcnn computation to detect only human movement, and can therefore predict well whether an alert is genuine. It avoids the technical problem of a high false alert rate caused by object movement in the image due to small animals or natural factors.

Brief description of the drawings

Fig. 1 is a schematic framework diagram of the automated alert determination method based on an improved Fast-Rcnn deep learning model according to the present invention.

Fig. 2 is a flowchart of the automated alert determination method based on an improved Fast-Rcnn deep learning model according to the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. The specific embodiments described herein are intended only to explain the present invention and not to limit it. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.

Referring to Fig. 1, the present invention provides an automated alert determination method based on an improved Fast-Rcnn deep learning model, comprising the following steps:

S1: image acquisition. An image acquisition card or video adapter is used to digitize, by A/D conversion, the analog video signal of a CCD (Charge Coupled Device) or CMOS (complementary metal oxide semiconductor) camera; the result is stored and then sent to a computer for processing. CCD refers to a charge-coupled device, a detecting element that represents signal magnitude by the amount of charge and transmits signals by coupling;

S2: candidate box determination. The image is normalized to 224 × 224 and fed directly into the network. The first five stages have the basic conv+relu+pooling form. At the end of the fifth stage, P candidate regions are input (each described by 1 image index plus 4 geometric coordinates; the index is used for training). The roi_pool layer divides each candidate region evenly into M × N blocks and applies max pooling to each block. Candidate regions of different sizes on the feature map are thereby converted into data of uniform size and fed to the next layer. An RoI is the mapping onto the feature map of a "candidate box" obtained after Selective Search completes; the candidate boxes are generated by an RPN, and each "candidate box" is then mapped onto the feature map to obtain the RoIs;

S3: feature extraction by a deep network. The CNN may attach several fully connected layers after the convolutional layers to map the feature map produced by the convolutional layers to a fixed-length feature vector; an image block around a pixel is used as the CNN input for training and prediction. Layers 6 and 32 in the program correspond to the pool1 and pool5 layers of the original network. The output of the whole network can finally be mapped back to the size of the original network, so that the image feature results of the other layers are saved;

S5: image size adjustment. The RoI pooling layer pools input tensors of different sizes to a fixed size. The RoI pooling layer specifies the number of pooling windows as W × H, and the size of each pooling window varies with the pooled region;
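The way steps S2 and S5 turn candidate regions of different sizes into a fixed-size output can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the patented implementation: it handles a single-channel feature map stored as a list of rows, the RoI is given in feature-map cells as an end-exclusive rectangle, and the names `ceil_div` and `roi_max_pool` are hypothetical.

```python
def ceil_div(a, b):
    """Integer ceiling of a / b for positive b."""
    return -(-a // b)


def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool one RoI of a single-channel feature map into an out_h x out_w grid.

    feature_map: list of rows of numbers.
    roi: (row0, col0, row1, col1) in feature-map cells, end-exclusive.
    """
    r0, c0, r1, c1 = roi
    height, width = r1 - r0, c1 - c0
    pooled = []
    for i in range(out_h):
        # Window boundaries scale with the RoI, so the window size depends
        # on the region being pooled while the window count stays fixed.
        rs = r0 + (i * height) // out_h
        re = r0 + ceil_div((i + 1) * height, out_h)
        row = []
        for j in range(out_w):
            cs = c0 + (j * width) // out_w
            ce = c0 + ceil_div((j + 1) * width, out_w)
            row.append(max(feature_map[r][c]
                           for r in range(rs, re) for c in range(cs, ce)))
        pooled.append(row)
    return pooled


# A 4 x 6 feature map holding the values 1..24 in row-major order.
feature_map = [[r * 6 + c + 1 for c in range(6)] for r in range(4)]
large = roi_max_pool(feature_map, (0, 0, 4, 6), 2, 2)   # 4 x 6 region
small = roi_max_pool(feature_map, (1, 1, 4, 4), 2, 2)   # 3 x 3 region
```

Both regions produce a 2 × 2 result even though one covers 24 cells and the other 9, which is the property the next layer relies on.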

The CMOS in step S1 is a complementary metal oxide semiconductor. It was originally an important chip in computer systems that stores the most basic data for system boot; after processing, CMOS can also serve as the image sensor in digital photography.

For an ordinary max pooling layer in step S2, given a node i of the input layer and a node j of the output layer, a decision function indicates whether node i is selected by node j as the maximum output. There are two possible reasons for not being selected: node i is not within the range of node j, or it is not the maximum. For roi max pooling, one input node may be connected to multiple output nodes, the output nodes being those of each candidate region.
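The decision function described above can be illustrated with a one-dimensional max pooling sketch. The example assumes, as is usual for max pooling, that the decision function is what routes gradients back to the selected inputs during training; the source does not spell this out. The function names `max_pool_1d`, `selected` and `backward` are hypothetical, and overlapping windows (f=3, s=2) stand in for the roi max pooling case in which one input node feeds several output nodes.

```python
def max_pool_1d(x, f, s):
    """1-D max pooling; returns the outputs and the input index each output chose."""
    outputs, argmax = [], []
    for start in range(0, len(x) - f + 1, s):
        best = max(range(start, start + f), key=lambda i: x[i])
        outputs.append(x[best])
        argmax.append(best)
    return outputs, argmax


def selected(i, j, argmax):
    """Decision function: True if input node i is the maximum chosen by output node j."""
    return argmax[j] == i


def backward(grad_out, argmax, n_inputs):
    """Send each output gradient to the input it selected, summing over outputs."""
    grad_in = [0.0] * n_inputs
    for j, g in enumerate(grad_out):
        grad_in[argmax[j]] += g
    return grad_in


x = [1, 2, 9, 4, 3]
outputs, argmax = max_pool_1d(x, f=3, s=2)   # windows [1, 2, 9] and [9, 4, 3]
grad_in = backward([1.0, 0.5], argmax, len(x))
```

Here input node 2 lies in both windows and is the maximum of both, so it receives the sum of the two output gradients; every other input is either out of range or not the maximum and receives zero.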

The images in both databases in step S4 are 32 × 32 color images. CIFAR10 contains 60,000 images, of which 50,000 are used for model training and 10,000 for testing; each object class has 5,000 images for training and 1,000 images for testing. CIFAR100 is similar in composition to CIFAR10.

The pooling layer in step S5 has hyperparameters: the filter size f and the stride s. Commonly used values are f=2, s=2; these are used very frequently and have the effect of halving the height and width. f=3, s=2 is also used in some cases.
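The effect of the hyperparameters f and s on the output size can be checked with a short sketch. It assumes pooling without padding, for which the output length along one dimension is floor((n - f) / s) + 1; the function names `pooled_size` and `max_pool_2d` are hypothetical.

```python
def pooled_size(n, f, s):
    """Output length along one dimension for pooling without padding."""
    return (n - f) // s + 1


def max_pool_2d(x, f=2, s=2):
    """Max pooling over a 2-D list of rows with filter size f and stride s."""
    rows = pooled_size(len(x), f, s)
    cols = pooled_size(len(x[0]), f, s)
    return [[max(x[i * s + di][j * s + dj]
                 for di in range(f) for dj in range(f))
             for j in range(cols)]
            for i in range(rows)]


# A 4 x 4 input holding the values 1..16 in row-major order.
x = [[r * 4 + c + 1 for c in range(4)] for r in range(4)]
halved = max_pool_2d(x, f=2, s=2)
overlapping = max_pool_2d(x, f=3, s=2)
```

With f=2, s=2 a 224 × 224 input becomes 112 × 112, which is the halving described above; with f=3, s=2 the same input becomes 111 × 111 because the windows overlap.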

The Softmax function in step S6, also called the normalized exponential function, is a generalization of the logistic function; the Softmax function is in fact the gradient-log-normalizer of a finite discrete probability distribution.
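Both statements about the Softmax function can be checked numerically. The sketch below computes softmax in a numerically stable way and compares it with a finite-difference gradient of the log normalizer log Σ exp(z_j). The function names and the example scores are assumptions made for illustration.

```python
import math


def softmax(scores):
    """Normalized exponential: maps class scores to probabilities that sum to 1."""
    m = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def log_sum_exp(scores):
    """Log normalizer of the distribution defined by the scores."""
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))


def numeric_gradient(fn, scores, eps=1e-6):
    """Central finite-difference gradient of fn at scores."""
    grad = []
    for i in range(len(scores)):
        up, down = list(scores), list(scores)
        up[i] += eps
        down[i] -= eps
        grad.append((fn(up) - fn(down)) / (2 * eps))
    return grad


scores = [2.0, 1.0, 0.1]                  # hypothetical class scores
probs = softmax(scores)
grad = numeric_gradient(log_sum_exp, scores)

# With two classes and scores (z, 0), softmax reduces to the logistic function.
z = 0.7
logistic = 1.0 / (1.0 + math.exp(-z))
two_class = softmax([z, 0.0])[0]
```

The probabilities agree with the gradient of the log normalizer up to finite-difference error, and the two-class case reproduces the logistic function, which is the sense in which softmax generalizes it.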

In summary: it is preferable to use the analog video signal of a CMOS (complementary metal oxide semiconductor) camera, which is digitized by A/D conversion, stored, and then sent to a computer for processing. The image is normalized to 224 × 224 and fed directly into the network; the first five stages have the basic conv+relu+pooling form. After max pooling is applied to each block, candidate regions of different sizes on the feature map are converted into data of uniform size and fed to the next layer.
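The normalization of an input image to 224 × 224 can be sketched as a simple resampling step. The source does not say which interpolation is used, so nearest-neighbor sampling is an assumption here, as are the function name `resize_nearest` and the single-channel list-of-rows image format.

```python
def resize_nearest(image, out_h=224, out_w=224):
    """Resize a single-channel image (list of rows) by nearest-neighbor sampling."""
    in_h, in_w = len(image), len(image[0])
    return [[image[(i * in_h) // out_h][(j * in_w) // out_w]
             for j in range(out_w)]
            for i in range(out_h)]


# A tiny hypothetical 3 x 4 image holding the values 0..11 in row-major order.
image = [[r * 4 + c for c in range(4)] for r in range(3)]
normalized = resize_nearest(image)
```

Whatever the captured resolution, the result always has 224 rows and 224 columns, so the first convolutional stage sees a fixed input size.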

The CNN may attach several fully connected layers after the convolutional layers to map the feature map produced by the convolutional layers to a fixed-length feature vector, and an image block around a pixel is used as the CNN input for training and prediction. For the actual object class specific to the target device, an adapter is responsible for creating and returning the appropriate writer. Input tensors of different sizes are pooled to a fixed size, with the RoI pooling layer specifying the number of pooling windows as W × H. The last two loss layers are replaced by a single softmax layer whose input is the classification scores and whose output is probabilities; NMS is applied to each class, and classification is performed with softmax.
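How a fully connected layer maps pooled feature maps to a fixed-length feature vector can be sketched as a flatten step followed by an affine map y = Wx + b. The weights, bias and function names below are illustrative assumptions; a trained network would learn the weights and would typically add a nonlinearity.

```python
def flatten(feature_maps):
    """Flatten C maps of size M x N into one vector of length C * M * N."""
    return [v for fmap in feature_maps for row in fmap for v in row]


def fully_connected(x, weights, bias):
    """Affine map y = W x + b; the output length equals the number of weight rows."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]


# Two pooled 2 x 2 feature maps (C=2, M=N=2), so the flattened length is 8.
feature_maps = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
x = flatten(feature_maps)

# Hypothetical weights for a layer with 3 output units.
weights = [[1.0] * 8, [-1.0] * 8, [0.5] * 8]
bias = [0.0, 0.0, 1.0]
features = fully_connected(x, weights, bias)
```

Because the pooled maps always have the same shape, the flattened vector always has the same length, which is what lets a fixed weight matrix be applied to every candidate region.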

Finally, it should be noted that the foregoing is only a preferred embodiment of the present invention and is not intended to limit it. Although the present invention has been described in detail with reference to the foregoing embodiments, a person skilled in the art may still modify the technical solutions described in those embodiments or make equivalent replacements of some of their technical features. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. An automated alert determination method based on an improved Fast-Rcnn deep learning model, characterized in that the specific steps of the method are as follows:

S1: image acquisition, in which an image acquisition card or video adapter is used to digitize, by A/D conversion, the analog video signal of a CCD or CMOS camera, and the result is stored and then sent to a computer for processing;

S2: candidate box determination, in which the image is normalized to 224 × 224 and fed directly into the network, the first five stages have the basic conv+relu+pooling form, P candidate regions are input at the end of the fifth stage, the roi_pool layer divides each candidate region evenly into M × N blocks and applies max pooling to each block, and candidate regions of different sizes on the feature map are thereby converted into data of uniform size and fed to the next layer;

S3: feature extraction by a deep network, in which the CNN may attach several fully connected layers after the convolutional layers to map the feature map produced by the convolutional layers to a fixed-length feature vector, an image block around a pixel is used as the CNN input for training and prediction, and layers 6 and 32 in the program correspond to the pool1 and pool5 layers of the original network; the output of the whole network can finally be mapped back to the size of the original network, so that the image feature results of the other layers are saved;

S4: image classification, in which, for the actual object class specific to the target device, an adapter is responsible for creating and returning the appropriate writer, and the CIFAR10 and CIFAR100 databases are two subsets of Tiny Images, containing 10 and 100 object classes respectively;

S5: image size adjustment, in which the RoI pooling layer pools input tensors of different sizes to a fixed size, the RoI pooling layer specifies the number of pooling windows as W × H, and the size of each pooling window varies with the pooled region;

S6: result prediction, in which the last two loss layers are replaced by a single softmax layer whose input is the classification scores and whose output is probabilities, NMS is applied to each class, classification is performed with softmax, and the bounding boxes are obtained by regression.

2. The automated alert determination method based on an improved Fast-Rcnn deep learning model according to claim 1, characterized in that: the CMOS in step S1 is a complementary metal oxide semiconductor, originally an important chip in computer systems that stores the most basic data for system boot; after processing, CMOS can also serve as the image sensor in digital photography.

3. The automated alert determination method based on an improved Fast-Rcnn deep learning model according to claim 1 or 2, characterized in that: for an ordinary max pooling layer in step S2, given a node i of the input layer and a node j of the output layer, a decision function indicates whether node i is selected by node j as the maximum output.

4. The automated alert determination method based on an improved Fast-Rcnn deep learning model according to any one of claims 1 to 3, characterized in that: the images in both databases in step S4 are 32 × 32 color images.

5. The automated alert determination method based on an improved Fast-Rcnn deep learning model according to claim 1, characterized in that: the pooling layer in step S5 has hyperparameters, namely the filter size f and the stride s, with commonly used values f=2, s=2.

6. The automated alert determination method based on an improved Fast-Rcnn deep learning model according to claim 1, characterized in that: the Softmax function in step S6 is in fact the gradient-log-normalizer of a finite discrete probability distribution.