CN110287862A - Anti-candid photography detection method based on deep learning - Google Patents

Anti-candid photography detection method based on deep learning

Info

Publication number
CN110287862A
CN110287862A (application CN201910545151.8A)
Authority
CN
China
Prior art keywords
candid photography
convolutional layer
picture
behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910545151.8A
Other languages
Chinese (zh)
Other versions
CN110287862B (en)
Inventor
张静
胡锐
周秦
申枭
李云松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xidian University
Original Assignee
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xidian University filed Critical Xidian University
Priority to CN201910545151.8A priority Critical patent/CN110287862B/en
Publication of CN110287862A publication Critical patent/CN110287862A/en
Application granted granted Critical
Publication of CN110287862B publication Critical patent/CN110287862B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present invention discloses an anti-candid photography detection method based on deep learning. The steps are: 1. construct a deep learning target detection network; 2. generate a training set; 3. mark the same candid photography behavior in each picture with bounding boxes at multiple scales; 4. train the deep learning network; 5. detect candid photography behavior; 6. apply image enhancement to pictures in which no candid behavior is found; 7. detect the feature-enhanced images again. By marking the data set with bounding boxes at multiple scales, the present invention overcomes the low detection accuracy caused by the diversity of candid photography actions; by constructing a deep learning network and applying image enhancement to the human-figure region, it ensures that candid photography behavior in surveillance video is detected in real time and with high accuracy.

Description

Anti-candid photography detection method based on deep learning
Technical field
The invention belongs to the technical field of image processing, and further relates to an anti-candid photography detection method based on deep learning in the field of target detection technology. The present invention can detect in real time the candid photography behavior of people captured in video surveillance.
Technical background
Anti-candid photography detection in video surveillance is highly necessary for many privacy-sensitive institutions and units, as it prevents internal security information of the institution or unit from being leaked. In practice, however, manually inspecting surveillance video for candid photography is time-consuming and laborious, and real-time detection is difficult to achieve. To solve these problems, target detection methods are commonly designed so that a computer detects the candid photography behavior in the surveillance video.
Fengyinian Electronics Nantong Co., Ltd., in its patent application "Anti-photographing display system and anti-photographing method based on computer vision" (application number: 201811171034.1, publication number: CN109271814A), provides an anti-photographing display system and method based on computer vision. The steps of the method are: first, load image data based on the RGB color space from a database and filter the data to smooth it; then map the RGB color space to the HSV space and apply morphological processing to the image; finally, compare the detected object contours and sizes with those of mobile phones and digital cameras to judge whether the image contains candid photography. The shortcoming of this method is that it detects only the photographing device; since candid photography actions are highly varied, other behaviors are easily misjudged as candid photography, which considerably degrades detection accuracy.
Shandong Inspur Cloud Service Information Technology Co., Ltd. discloses an anti-candid-photography system and method in the patent application "An anti-candid-photography system and a method of applying it" (application number: 201711077705.3, publication number: CN107784653A). The steps of the method are: first, collect in real time the film image displayed on the screen and output it to a candid-photography judgment module; then collect the auditorium image in real time and output it to the same module; for every image received, calculate the matching degree between the current auditorium image and the received film image, and when the calculated matching degree is greater than or equal to a preset threshold, consider that the current image contains candid photography. The shortcoming of this method is that it uses a traditional matching procedure to compare image matching degrees, whose computational cost is large, so video cannot be processed in real time.
Summary of the invention
The object of the present invention is, in view of the above shortcomings of the prior art, to propose an anti-candid photography detection method based on a deep learning network, solving the problems that detection of candid behavior in video has low accuracy and cannot reach real-time performance.
The idea for realizing the object of the invention is: first build a Yolov3 target detection network composed of four modules and set the parameters of every layer of the network; then construct a data set, marking the same candid photography behavior in each picture with bounding boxes at multiple scales and also marking the human-figure region; next input the marked pictures to train the deep learning network; finally input the pictures collected in real time into the trained network to detect candid behavior, apply local enhancement to the human-figure region of pictures in which no candid behavior is found, and input the locally enhanced pictures into the deep learning network again to detect the pictures a second time.
The specific steps for realizing the present invention are as follows:
(1) Construct the deep learning target detection network:
(1a) Build a Yolov3 target detection network composed of four modules; its specific structure is as follows:
The structure of the first module is, in order: input layer → 1st convolutional layer → 2nd convolutional layer → first convolution submodule → 3rd convolutional layer → second convolution submodule → 4th convolutional layer → third convolution submodule → 5th convolutional layer → fourth convolution submodule → 6th convolutional layer → fifth convolution submodule. The second convolution submodule is composed of four 1st convolution units connected in series; the third convolution submodule is composed of eight 2nd convolution units connected in series; the fourth convolution submodule is composed of eight 3rd convolution units connected in series; the fifth convolution submodule is composed of four 4th convolution units connected in series. The structure of every convolution unit is, in order: two convolutional layers connected in series → a ResNet layer, each ResNet layer connecting the input end of the convolution submodule where it sits to the output end, where the two are merged;
The structure of the second module is, in order: 7th convolutional layer → 8th convolutional layer → 9th convolutional layer → 10th convolutional layer → 11th convolutional layer → 12th convolutional layer → 13th convolutional layer → output layer;
The structure of the third module is, in order: 14th convolutional layer → upsampling layer → 1st concat layer → 15th convolutional layer → 16th convolutional layer → 17th convolutional layer → 18th convolutional layer → 19th convolutional layer → 20th convolutional layer → 21st convolutional layer → output layer;
The structure of the fourth module is, in order: 22nd convolutional layer → upsampling layer → 2nd concat layer → 23rd convolutional layer → 24th convolutional layer → 25th convolutional layer → 26th convolutional layer → 27th convolutional layer → 28th convolutional layer → 29th convolutional layer → output layer;
Connect the fifth convolution submodule in the first module to the 7th convolutional layer in the second module; connect the 11th convolutional layer in the second module to the 14th convolutional layer in the third module; connect the 19th convolutional layer in the third module to the 22nd convolutional layer in the fourth module; connect the fourth convolution submodule in the first module to the 1st concat layer in the third module; and connect the third convolution submodule in the first module to the 2nd concat layer in the fourth module, forming the Yolov3 target detection network;
(1b) Set the parameters of every layer of the deep learning target detection network as follows:
Set the convolution kernel sizes of the 1st to 6th convolutional layers to 3*3 and their channel numbers, in order, to 32, 64, 128, 256, 512, and 1024; set the stride of the 1st convolutional layer to 1 and the strides of the 2nd to 5th convolutional layers to 2;
Set the convolution kernel sizes of the 7th, 9th, and 11th convolutional layers to 1*1, their channel numbers to 512, and all their strides to 1;
Set the convolution kernel sizes of the 8th, 10th, and 12th convolutional layers to 3*3, their channel numbers to 1024, and all their strides to 1;
Set the convolution kernel size of the 14th convolutional layer to 1*1, its channel number to 256, and its stride to 1;
Set the convolution kernel sizes of the 15th, 17th, and 19th convolutional layers to 1*1, their channel numbers to 256, and all their strides to 1;
Set the convolution kernel sizes of the 16th, 18th, and 20th convolutional layers to 3*3, their channel numbers to 512, and all their strides to 1;
Set the convolution kernel size of the 22nd convolutional layer to 1*1, its channel number to 128, and its stride to 1;
Set the convolution kernel sizes of the 23rd, 25th, and 27th convolutional layers to 1*1, their channel numbers to 128, and all their strides to 1;
Set the convolution kernel sizes of the 24th, 26th, and 28th convolutional layers to 3*3, their channel numbers to 256, and all their strides to 1;
Set the convolution kernel sizes of the 13th, 21st, and 29th convolutional layers to 1*1, their channel numbers to 255, and all their strides to 1;
Set the convolution kernel sizes of the two convolutional layers in the 1st convolution submodule and in each of the 1st to 4th convolution units to 1*1 and 3*3 respectively, with all strides set to 1; set the channel numbers of the two convolutional layers to 32 and 64 in the 1st convolution submodule, 64 and 128 in the 1st convolution unit, 128 and 256 in the 2nd convolution unit, 256 and 512 in the 3rd convolution unit, and 512 and 1024 in the 4th convolution unit;
Set the strides of all upsampling layers in the above four modules to 2;
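As a concrete illustration of the repeated building block described above, the following is a minimal PyTorch sketch (illustrative code, not the patent's own implementation) of one convolution unit: a 1*1 convolutional layer and a 3*3 convolutional layer in series, with the ResNet layer realized as a shortcut that merges the unit input back in at the output end; batch normalization and LeakyReLU activations are conventional Yolov3 choices assumed here.

```python
import torch
import torch.nn as nn

class ConvUnit(nn.Module):
    """One convolution unit: 1*1 conv -> 3*3 conv -> ResNet shortcut."""
    def __init__(self, channels):
        super().__init__()
        mid = channels // 2  # e.g. 256 -> 128 -> 256 in the 2nd convolution unit
        self.conv1x1 = nn.Sequential(
            nn.Conv2d(channels, mid, kernel_size=1, stride=1, bias=False),
            nn.BatchNorm2d(mid), nn.LeakyReLU(0.1))
        self.conv3x3 = nn.Sequential(
            nn.Conv2d(mid, channels, kernel_size=3, stride=1, padding=1, bias=False),
            nn.BatchNorm2d(channels), nn.LeakyReLU(0.1))

    def forward(self, x):
        # ResNet layer: merge the input back in at the output end
        return x + self.conv3x3(self.conv1x1(x))

# e.g. the third convolution submodule: eight 2nd convolution units in series
third_submodule = nn.Sequential(*[ConvUnit(256) for _ in range(8)])
print(third_submodule(torch.randn(1, 256, 52, 52)).shape)  # (1, 256, 52, 52)
```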
(2) Generate the training set:
(2a) Collect at least 10,000 pictures to form a deep learning data set, in which 60% of the pictures contain candid photography behavior and 40% do not;
(2b) Randomly extract 80% of all pictures containing candid behavior and, likewise, 80% of all pictures without candid behavior, composing the training set;
(3) Mark the same candid photography behavior in each picture with bounding boxes at multiple scales:
(3a) In all pictures containing the same candid behavior, mark a bounding box around the photographing device used for the candid shot;
(3b) In all pictures containing the same candid behavior, mark a bounding box around the photographing device together with the part of the hand visible on it;
(3c) In all pictures containing the same candid behavior, mark a bounding box around the whole candid-action contour, including the complete hand and the photographing device;
(3d) Mark a bounding box around the human-figure region of every picture in the deep learning data set, obtaining the marked training set pictures;
(4) Train the deep learning network:
Input the marked training set pictures into the Yolov3 target detection network and iteratively update the network parameters; stop training when the loss function drops to 0.1 or below, obtaining the trained Yolov3 target detection network;
(5) Detect candid photography behavior:
(5a) Input the pictures collected in real time in the indoor environment under detection into the trained Yolov3 target detection network, and output, for each picture, a corresponding picture with the detection targets marked together with the target category score values calculated by the network;
(5b) Set the judgment threshold for the candid-behavior score value to 0.5: if the detected candid-behavior score value is less than 0.5, consider that there is no candid behavior; if it is greater than 0.5, consider that candid behavior is present, and output the score value and location information of the candid behavior;
(6) Apply image enhancement to pictures without candid behavior:
(6a) If no candid behavior is detected in a picture, judge whether a human-figure region can be detected: confirm a picture whose detected human-figure region score value is greater than 0.5 as containing a person, and a picture whose score value is less than 0.5 as containing no one;
(6b) If a human-figure region is detected, apply local equalization processing to the detected human-figure region of the picture; if no human-figure region is detected, consider that the picture contains no candid behavior;
(7) Detect the feature-enhanced images again:
Input the images after local histogram equalization into the deep learning network again, and perform candid-behavior detection on the pictures a second time.
Compared with the prior art, the present invention has the following advantages:
First, because the present invention marks the data set with bounding boxes at multiple scales, it overcomes the low detection accuracy of the prior art caused by the diversity of candid photography actions, so that the present invention achieves higher accuracy in detecting candid behavior.
Second, because the present invention builds a deep learning target detection network to detect candid behavior, it overcomes the slow detection speed of the prior art caused by the large computational cost of traditional methods, so that the present invention can detect candid behavior in surveillance video in real time.
Third, because the present invention applies image enhancement to pictures without detected candid behavior, the local features of the pictures are strengthened, so that the present invention obtains a better detection result for candid behavior.
Detailed description of the invention
Fig. 1 is the flow chart of the present invention;
Fig. 2 is the simulation diagram of the present invention.
Specific embodiment
The present invention will be further described below with reference to the accompanying drawings.
Referring to Fig. 1, the specific implementation steps of the present invention are described in further detail.
Step 1, construct the deep learning target detection network.
Build a Yolov3 target detection network composed of four modules.
The structure of the first module is, in order: input layer → 1st convolutional layer → 2nd convolutional layer → first convolution submodule → 3rd convolutional layer → second convolution submodule → 4th convolutional layer → third convolution submodule → 5th convolutional layer → fourth convolution submodule → 6th convolutional layer → fifth convolution submodule. The second convolution submodule is composed of four 1st convolution units connected in series; the third convolution submodule is composed of eight 2nd convolution units connected in series; the fourth convolution submodule is composed of eight 3rd convolution units connected in series; the fifth convolution submodule is composed of four 4th convolution units connected in series. The structure of every convolution unit is, in order: two convolutional layers connected in series → a ResNet layer, each ResNet layer connecting the input end of the convolution submodule where it sits to the output end, where the two are merged;
The structure of the second module is, in order: 7th convolutional layer → 8th convolutional layer → 9th convolutional layer → 10th convolutional layer → 11th convolutional layer → 12th convolutional layer → 13th convolutional layer → output layer;
The structure of the third module is, in order: 14th convolutional layer → upsampling layer → 1st concat layer → 15th convolutional layer → 16th convolutional layer → 17th convolutional layer → 18th convolutional layer → 19th convolutional layer → 20th convolutional layer → 21st convolutional layer → output layer;
The structure of the fourth module is, in order: 22nd convolutional layer → upsampling layer → 2nd concat layer → 23rd convolutional layer → 24th convolutional layer → 25th convolutional layer → 26th convolutional layer → 27th convolutional layer → 28th convolutional layer → 29th convolutional layer → output layer;
Connect the fifth convolution submodule in the first module to the 7th convolutional layer in the second module; connect the 11th convolutional layer in the second module to the 14th convolutional layer in the third module; connect the 19th convolutional layer in the third module to the 22nd convolutional layer in the fourth module; connect the fourth convolution submodule in the first module to the 1st concat layer in the third module; and connect the third convolution submodule in the first module to the 2nd concat layer in the fourth module, forming the Yolov3 target detection network.
Set the parameters of every layer of the deep learning target detection network.
Set the convolution kernel sizes of the 1st to 6th convolutional layers to 3*3 and their channel numbers, in order, to 32, 64, 128, 256, 512, and 1024; set the stride of the 1st convolutional layer to 1 and the strides of the 2nd to 5th convolutional layers to 2;
Set the convolution kernel sizes of the 7th, 9th, and 11th convolutional layers to 1*1, their channel numbers to 512, and all their strides to 1;
Set the convolution kernel sizes of the 8th, 10th, and 12th convolutional layers to 3*3, their channel numbers to 1024, and all their strides to 1;
Set the convolution kernel size of the 14th convolutional layer to 1*1, its channel number to 256, and its stride to 1;
Set the convolution kernel sizes of the 15th, 17th, and 19th convolutional layers to 1*1, their channel numbers to 256, and all their strides to 1;
Set the convolution kernel sizes of the 16th, 18th, and 20th convolutional layers to 3*3, their channel numbers to 512, and all their strides to 1;
Set the convolution kernel size of the 22nd convolutional layer to 1*1, its channel number to 128, and its stride to 1;
Set the convolution kernel sizes of the 23rd, 25th, and 27th convolutional layers to 1*1, their channel numbers to 128, and all their strides to 1;
Set the convolution kernel sizes of the 24th, 26th, and 28th convolutional layers to 3*3, their channel numbers to 256, and all their strides to 1;
Set the convolution kernel sizes of the 13th, 21st, and 29th convolutional layers to 1*1, their channel numbers to 255, and all their strides to 1;
Set the convolution kernel sizes of the two convolutional layers in the 1st convolution submodule and in each of the 1st to 4th convolution units to 1*1 and 3*3 respectively, with all strides set to 1; set the channel numbers of the two convolutional layers to 32 and 64 in the 1st convolution submodule, 64 and 128 in the 1st convolution unit, 128 and 256 in the 2nd convolution unit, 256 and 512 in the 3rd convolution unit, and 512 and 1024 in the 4th convolution unit;
Set the strides of all upsampling layers in the above four modules to 2.
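To show how the four modules fit together, the following PyTorch sketch (illustrative; all names, the assumed backbone outputs, and the 416*416 input with 13/26/52 grids are assumptions, not taken from the patent) wires the second, third, and fourth modules together, with a 1*1 convolution plus 2x upsampling feeding the 1st and 2nd concat layers:

```python
import torch
import torch.nn as nn

def conv(cin, cout, k):
    return nn.Sequential(
        nn.Conv2d(cin, cout, k, stride=1, padding=k // 2, bias=False),
        nn.BatchNorm2d(cout), nn.LeakyReLU(0.1))

class Head(nn.Module):
    """Modules 2-4: alternating 1*1/3*3 convolutional layers, 255-channel output."""
    def __init__(self, cin, mid):
        super().__init__()
        self.body = nn.Sequential(                 # e.g. 7th-11th convolutional layers
            conv(cin, mid, 1), conv(mid, mid * 2, 3),
            conv(mid * 2, mid, 1), conv(mid, mid * 2, 3),
            conv(mid * 2, mid, 1))
        self.out = nn.Sequential(conv(mid, mid * 2, 3),    # e.g. 12th + 13th layers
                                 nn.Conv2d(mid * 2, 255, 1))

    def forward(self, x):
        route = self.body(x)       # branch point handed to the next module
        return self.out(route), route

# Assumed outputs of the third/fourth/fifth convolution submodules of module 1:
c3 = torch.randn(1, 256, 52, 52)
c4 = torch.randn(1, 512, 26, 26)
c5 = torch.randn(1, 1024, 13, 13)

head2 = Head(1024, 512)
y2, r2 = head2(c5)                                                    # second module
up3 = nn.Sequential(conv(512, 256, 1), nn.Upsample(scale_factor=2))   # 14th layer
head3 = Head(256 + 512, 256)
y3, r3 = head3(torch.cat([up3(r2), c4], dim=1))                       # 1st concat layer
up4 = nn.Sequential(conv(256, 128, 1), nn.Upsample(scale_factor=2))   # 22nd layer
head4 = Head(128 + 256, 128)
y4, _ = head4(torch.cat([up4(r3), c3], dim=1))                        # 2nd concat layer
print(y2.shape, y3.shape, y4.shape)   # 255-channel maps on 13/26/52 grids
```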
Step 2, generate the training set:
Collect at least 10,000 pictures to form a deep learning data set, in which 60% of the pictures contain candid photography behavior and 40% do not;
Randomly extract 80% of all pictures containing candid behavior and, likewise, 80% of all pictures without candid behavior, composing the training set.
The candid photography behavior in the pictures refers to the behavior of a person taking pictures in an indoor environment with a handheld photographing device such as a mobile phone, camera, or tablet.
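A minimal sketch of this collection and split, assuming the pictures are already sorted into two folders by whether they contain candid behavior (folder names, paths, and the fixed seed are illustrative):

```python
import random
from pathlib import Path

def make_training_set(root="dataset", ratio=0.8, seed=0):
    """Draw a random 80% of each group (with / without candid behavior)."""
    rng = random.Random(seed)
    train = []
    for group in ("candid", "no_candid"):          # assumed folder layout
        pics = sorted(Path(root, group).glob("*.jpg"))
        rng.shuffle(pics)
        train += pics[: int(len(pics) * ratio)]    # random 80% of the group
    return train

print(f"{len(make_training_set())} training pictures")
```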
Step 3, mark the same candid photography behavior in each picture with bounding boxes at multiple scales:
In all pictures containing the same candid behavior, mark a bounding box around the photographing device used for the candid shot;
In all pictures containing the same candid behavior, mark a bounding box around the photographing device together with the part of the hand visible on it;
In all pictures containing the same candid behavior, mark a bounding box around the whole candid-action contour, including the complete hand and the photographing device.
Mark a bounding box around the human-figure region of every picture in the deep learning data set, obtaining the marked training set pictures.
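For illustration, the multi-scale marks can be stored in the plain-text label format commonly used with Yolov3, one "class cx cy w h" line per bounding box, normalized to the picture size. The class names phone and phone-hand follow the simulation experiment below; the remaining class ids and all pixel coordinates are assumptions:

```python
# Assumed class ids: the first three mark the same candid action at three
# scales, the fourth marks the human-figure region.
CLASSES = {"phone": 0, "phone_hand": 1, "action": 2, "person": 3}

def to_yolo(box, img_w, img_h):
    """Convert a pixel box (x1, y1, x2, y2) to normalized (cx, cy, w, h)."""
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2 / img_w, (y1 + y2) / 2 / img_h,
            (x2 - x1) / img_w, (y2 - y1) / img_h)

# Illustrative marks for one 1920x1080 picture: device only, device plus
# visible hand, complete action contour, and the human-figure region.
marks = {"phone": (900, 500, 980, 620), "phone_hand": (860, 460, 1020, 660),
         "action": (800, 400, 1100, 760), "person": (700, 200, 1250, 1060)}
with open("sample.txt", "w") as f:
    for name, box in marks.items():
        cx, cy, w, h = to_yolo(box, 1920, 1080)
        f.write(f"{CLASSES[name]} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}\n")
```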
Step 4, train the deep learning network:
Input the marked training set pictures into the Yolov3 target detection network and iteratively update the network parameters; stop training when the loss function drops to 0.1 or below, obtaining the trained Yolov3 target detection network.
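A schematic sketch of this training loop (model, data loader, and loss function are placeholders for a Yolov3-style implementation, not the patent's code), showing the iterative parameter update and the stop at a loss of 0.1:

```python
import torch

def train(model, loader, yolo_loss, lr=1e-4, max_epochs=1000):
    """Iteratively update the network parameters; stop once the loss <= 0.1."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(max_epochs):
        for pictures, labels in loader:
            loss = yolo_loss(model(pictures), labels)
            opt.zero_grad()
            loss.backward()
            opt.step()
            if loss.item() <= 0.1:   # stop-training threshold from the text
                return model
    return model
```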
Step 5, detect candid photography behavior:
Input the pictures collected in real time in the indoor environment under detection into the trained Yolov3 target detection network, and output, for each picture, a corresponding picture with the detection targets marked together with the target category score values calculated by the network.
Set the judgment threshold for the candid-behavior score value to 0.5: if the detected candid-behavior score value is less than 0.5, consider that there is no candid behavior; if it is greater than 0.5, consider that candid behavior is present, and output the score value and location information of the candid behavior.
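The 0.5 decision rule can be sketched as follows, assuming the post-processed network output arrives as (class name, score value, box) triples; the interface and class names are assumptions:

```python
def judge_candid(detections, threshold=0.5):
    """Print score and location for every candid detection above threshold."""
    found = False
    for name, score, box in detections:
        if name in ("phone", "phone_hand", "action") and score > threshold:
            print(f"candid behavior: score={score:.2f}, location={box}")
            found = True
    return found  # False: all score values below 0.5, no candid behavior

judge_candid([("phone", 0.83, (900, 500, 980, 620)), ("person", 0.91, None)])
```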
Step 6, apply image enhancement to pictures without candid behavior:
If no candid behavior is detected in a picture, judge whether a human-figure region can be detected: confirm a picture whose detected human-figure region score value is greater than 0.5 as containing a person, and a picture whose score value is less than 0.5 as containing no one.
If a human-figure region is detected, apply local equalization processing to the detected human-figure region of the picture; if no human-figure region is detected, consider that the picture contains no candid behavior.
Step 7, detect the feature-enhanced images again:
Input the images after local histogram equalization into the deep learning network again, and perform candid-behavior detection on the pictures a second time.
The steps of feature enhancement for pictures without a candid target are as follows: save the pictures from the surveillance video in which a person is detected but no candid behavior is detected; obtain the pixel coordinate information of the human-figure detections in these pictures; apply histogram equalization to the human-figure region using the pixel coordinate information; and finally obtain the locally equalized pictures.
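Steps 6 and 7 together can be sketched with OpenCV as follows; equalizing only the luminance channel of the human-figure region is an implementation assumption (the text does not specify how color is handled), and detect stands in for a call to the trained Yolov3 network:

```python
import cv2

def enhance_and_redetect(picture_path, person_box, detect):
    """Locally equalize the human-figure region, then run detection again."""
    img = cv2.imread(picture_path)
    x1, y1, x2, y2 = person_box  # pixel coordinates from the human-figure detection
    region = cv2.cvtColor(img[y1:y2, x1:x2], cv2.COLOR_BGR2YCrCb)
    region[:, :, 0] = cv2.equalizeHist(region[:, :, 0])  # equalize luminance only
    img[y1:y2, x1:x2] = cv2.cvtColor(region, cv2.COLOR_YCrCb2BGR)
    return detect(img)           # second pass through the trained network
```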
The effect of the present invention is further described below in combination with a simulation experiment.
1. Simulation experiment conditions:
The hardware platform of the simulation experiment of the present invention: the processor is an Intel Core i7-6850K CPU, the graphics card is an NVIDIA GeForce GTX 1080Ti, and the memory is 128 GB.
The software platform of the simulation experiment of the present invention: Ubuntu 16.04.
2. Simulation experiment content:
In the simulation experiment, 80% of the data set is randomly selected as the training set, 10% as the validation set, and the remaining 10% as the test set, using the method of the present invention. The deep learning network built above is trained on this data set, yielding the trained deep learning network.
The data set used in the simulation experiment consists of indoor pictures of people holding mobile phones in candid shooting actions, taken in March 2019, with a picture size of 1920 × 1080 and jpg picture format.
The trained deep learning network is tested on the test set to detect the candid behavior in the pictures. After the human-figure regions in the pictures are enhanced, both mark modes, phone and phone-hand, obtain correct detection results, as shown in Fig. 2. In Fig. 2, the detection result obtained by marking a bounding box around the mobile phone is denoted phone, and the detection result obtained by marking a bounding box around the candid-action contour of the hand holding the phone is denoted phone-hand.
The detection results before and after local enhancement of the pictures are evaluated with three evaluation indexes: accuracy rate Acc, false positive rate FPR, and missed detection rate FNR, calculated with the following formulas: Acc = (TP + TN) / (TP + FP + TN + FN), FPR = FP / (FP + TN), FNR = FN / (TP + FN), where TP denotes the number of pictures correctly classified as containing the target, FP the number of pictures mistakenly classified as containing the target, TN the number of pictures correctly classified as containing no target, and FN the number of pictures mistakenly classified as containing no target. All calculated results are listed in Table 1:
Table 1. Quantitative analysis of the detection results of the present invention in the simulation experiment

                           Acc      FPR      FNR
Before local enhancement   67.6%    37.4%    28.4%
After local enhancement    76.0%    26.9%    21.6%
As can be seen from Table 1, after local enhancement the accuracy rate Acc rises to 76.0%, the false positive rate FPR falls to 26.9%, and the missed detection rate FNR falls to 21.6%; all three indexes improve on the detection results before local enhancement, demonstrating that locally enhancing the pictures yields more accurate detection results.
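For reference, the three indexes computed from the four counts defined above (a direct transcription of the standard formulas):

```python
def evaluate(tp, fp, tn, fn):
    """Accuracy rate Acc, false positive rate FPR, missed detection rate FNR."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    fpr = fp / (fp + tn)
    fnr = fn / (tp + fn)
    return acc, fpr, fnr
```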
The above simulation experiment shows that: by marking the data set with bounding boxes at multiple scales, the present invention overcomes the low detection accuracy of the prior art caused by the diversity of candid photography actions, so that the present invention achieves higher accuracy during detection; and by building a deep learning target detection network for detection, it overcomes the slow detection speed of the prior art caused by the large computational cost of traditional methods, so that candid-behavior detection in surveillance video reaches real-time performance.

Claims (3)

1. An anti-candid photography detection method based on deep learning, characterized in that a deep learning target detection network is constructed, pictures are marked with bounding boxes at multiple scales, and image enhancement is applied to pictures without candid behavior; the steps of the method are as follows:
(1) Construct the deep learning target detection network:
(1a) Build a Yolov3 target detection network composed of four modules; its specific structure is as follows:
The structure of the first module is, in order: input layer → 1st convolutional layer → 2nd convolutional layer → first convolution submodule → 3rd convolutional layer → second convolution submodule → 4th convolutional layer → third convolution submodule → 5th convolutional layer → fourth convolution submodule → 6th convolutional layer → fifth convolution submodule; the second convolution submodule is composed of four 1st convolution units connected in series; the third convolution submodule is composed of eight 2nd convolution units connected in series; the fourth convolution submodule is composed of eight 3rd convolution units connected in series; the fifth convolution submodule is composed of four 4th convolution units connected in series; the structure of every convolution unit is, in order: two convolutional layers connected in series → a ResNet layer, each ResNet layer connecting the input end of the convolution submodule where it sits to the output end, where the two are merged;
The structure of the second module is, in order: 7th convolutional layer → 8th convolutional layer → 9th convolutional layer → 10th convolutional layer → 11th convolutional layer → 12th convolutional layer → 13th convolutional layer → output layer;
The structure of the third module is, in order: 14th convolutional layer → upsampling layer → 1st concat layer → 15th convolutional layer → 16th convolutional layer → 17th convolutional layer → 18th convolutional layer → 19th convolutional layer → 20th convolutional layer → 21st convolutional layer → output layer;
The structure of the fourth module is, in order: 22nd convolutional layer → upsampling layer → 2nd concat layer → 23rd convolutional layer → 24th convolutional layer → 25th convolutional layer → 26th convolutional layer → 27th convolutional layer → 28th convolutional layer → 29th convolutional layer → output layer;
Connect the fifth convolution submodule in the first module to the 7th convolutional layer in the second module; connect the 11th convolutional layer in the second module to the 14th convolutional layer in the third module; connect the 19th convolutional layer in the third module to the 22nd convolutional layer in the fourth module; connect the fourth convolution submodule in the first module to the 1st concat layer in the third module; and connect the third convolution submodule in the first module to the 2nd concat layer in the fourth module, forming the Yolov3 target detection network;
(1b) Set the parameters of every layer of the deep learning target detection network as follows:
Set the convolution kernel sizes of the 1st to 6th convolutional layers to 3*3 and their channel numbers, in order, to 32, 64, 128, 256, 512, and 1024; set the stride of the 1st convolutional layer to 1 and the strides of the 2nd to 5th convolutional layers to 2;
Set the convolution kernel sizes of the 7th, 9th, and 11th convolutional layers to 1*1, their channel numbers to 512, and all their strides to 1;
Set the convolution kernel sizes of the 8th, 10th, and 12th convolutional layers to 3*3, their channel numbers to 1024, and all their strides to 1;
Set the convolution kernel size of the 14th convolutional layer to 1*1, its channel number to 256, and its stride to 1;
Set the convolution kernel sizes of the 15th, 17th, and 19th convolutional layers to 1*1, their channel numbers to 256, and all their strides to 1;
Set the convolution kernel sizes of the 16th, 18th, and 20th convolutional layers to 3*3, their channel numbers to 512, and all their strides to 1;
Set the convolution kernel size of the 22nd convolutional layer to 1*1, its channel number to 128, and its stride to 1;
Set the convolution kernel sizes of the 23rd, 25th, and 27th convolutional layers to 1*1, their channel numbers to 128, and all their strides to 1;
Set the convolution kernel sizes of the 24th, 26th, and 28th convolutional layers to 3*3, their channel numbers to 256, and all their strides to 1;
Set the convolution kernel sizes of the 13th, 21st, and 29th convolutional layers to 1*1, their channel numbers to 255, and all their strides to 1;
Set the convolution kernel sizes of the two convolutional layers in the 1st convolution submodule and in each of the 1st to 4th convolution units to 1*1 and 3*3 respectively, with all strides set to 1; set the channel numbers of the two convolutional layers to 32 and 64 in the 1st convolution submodule, 64 and 128 in the 1st convolution unit, 128 and 256 in the 2nd convolution unit, 256 and 512 in the 3rd convolution unit, and 512 and 1024 in the 4th convolution unit;
Set the strides of all upsampling layers in the above four modules to 2;
(2) Generate the training set:
(2a) Collect at least 10,000 pictures to form a deep learning data set, in which 60% of the pictures contain candid photography behavior and 40% do not;
(2b) Randomly extract 80% of all pictures containing candid behavior and, likewise, 80% of all pictures without candid behavior, composing the training set;
(3) Mark the same candid photography behavior in each picture with bounding boxes at multiple scales:
(3a) In all pictures containing the same candid behavior, mark a bounding box around the photographing device used for the candid shot;
(3b) In all pictures containing the same candid behavior, mark a bounding box around the photographing device together with the part of the hand visible on it;
(3c) In all pictures containing the same candid behavior, mark a bounding box around the whole candid-action contour, including the complete hand and the photographing device;
(3d) Mark a bounding box around the human-figure region of every picture in the deep learning data set, obtaining the marked training set pictures;
(4) Train the deep learning network:
Input the marked training set pictures into the Yolov3 target detection network and iteratively update the network parameters; stop training when the loss function drops to 0.1 or below, obtaining the trained Yolov3 target detection network;
(5) Detect candid photography behavior:
(5a) Input the pictures collected in real time in the indoor environment under detection into the trained Yolov3 target detection network, and output, for each picture, a corresponding picture with the detection targets marked together with the target category score values calculated by the network;
(5b) Set the judgment threshold for the candid-behavior score value to 0.5: if the detected candid-behavior score value is less than 0.5, consider that there is no candid behavior; if it is greater than 0.5, consider that candid behavior is present, and output the score value and location information of the candid behavior;
(6) Apply image enhancement to pictures without candid behavior:
(6a) If no candid behavior is detected in a picture, judge whether a human-figure region can be detected: confirm a picture whose detected human-figure region score value is greater than 0.5 as containing a person, and a picture whose score value is less than 0.5 as containing no one;
(6b) If a human-figure region is detected, apply local equalization processing to the detected human-figure region of the picture; if no human-figure region is detected, consider that the picture contains no candid behavior;
(7) Detect the feature-enhanced images again:
Input the images after local histogram equalization into the deep learning network again, and perform candid-behavior detection on the pictures a second time.
2. The anti-candid photography detection method based on deep learning according to claim 1, characterized in that the candid photography behavior described in step (2a), step (2b), and step (2c) refers to the behavior of a person taking pictures in an indoor environment with a handheld photographing device such as a mobile phone, camera, or tablet.
3. The anti-candid photography detection method based on deep learning according to claim 1, characterized in that the step of applying image enhancement to pictures without candid behavior described in step (6) is as follows: save the pictures from the surveillance video in which a person is detected but no candid behavior is detected; obtain the pixel coordinate information of the human-figure detections in these pictures; apply histogram equalization to the human-figure region using the pixel coordinate information; and finally obtain the locally equalized pictures.
CN201910545151.8A 2019-06-21 2019-06-21 Anti-candid detection method based on deep learning Active CN110287862B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910545151.8A CN110287862B (en) 2019-06-21 2019-06-21 Anti-candid detection method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910545151.8A CN110287862B (en) 2019-06-21 2019-06-21 Anti-candid detection method based on deep learning

Publications (2)

Publication Number Publication Date
CN110287862A true CN110287862A (en) 2019-09-27
CN110287862B CN110287862B (en) 2021-04-06

Family

ID=68004329

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910545151.8A Active CN110287862B (en) 2019-06-21 2019-06-21 Anti-candid detection method based on deep learning

Country Status (1)

Country Link
CN (1) CN110287862B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109040654A (en) * 2018-08-21 2018-12-18 苏州科达科技股份有限公司 Recognition method and device for external capture apparatus, and storage medium
CN110807405A (en) * 2019-10-29 2020-02-18 维沃移动通信有限公司 Detection method of candid camera device and electronic equipment
CN113223036A (en) * 2020-01-21 2021-08-06 李华 Electronic equipment field positioning system
CN113408379A (en) * 2021-06-04 2021-09-17 开放智能机器(上海)有限公司 Mobile phone candid behavior monitoring method and system
CN114067441A (en) * 2022-01-14 2022-02-18 合肥高维数据技术有限公司 Shooting and recording behavior detection method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150125046A1 (en) * 2013-11-01 2015-05-07 Sony Corporation Information processing device and information processing method
CN107784653A (en) * 2017-11-06 2018-03-09 山东浪潮云服务信息科技有限公司 A kind of Anti-sneak-shooting system and method
WO2018066467A1 (en) * 2016-10-03 2018-04-12 大学共同利用機関法人情報・システム研究機構 Wearable article and method for preventing secret photographing of biometric features
CN107911551A (en) * 2017-11-16 2018-04-13 吴英 Intelligent mobile phone platform based on action recognition
CN109460754A (en) * 2019-01-31 2019-03-12 深兰人工智能芯片研究院(江苏)有限公司 A kind of water surface foreign matter detecting method, device, equipment and storage medium
CN109492594A (en) * 2018-11-16 2019-03-19 Xidian University Classroom participants' head-up rate detection method based on deep learning network

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150125046A1 (en) * 2013-11-01 2015-05-07 Sony Corporation Information processing device and information processing method
WO2018066467A1 (en) * 2016-10-03 2018-04-12 大学共同利用機関法人情報・システム研究機構 Wearable article and method for preventing secret photographing of biometric features
CN107784653A (en) * 2017-11-06 2018-03-09 山东浪潮云服务信息科技有限公司 A kind of Anti-sneak-shooting system and method
CN107911551A (en) * 2017-11-16 2018-04-13 吴英 Intelligent mobile phone platform based on action recognition
CN109492594A (en) * 2018-11-16 2019-03-19 Xidian University Classroom participants' head-up rate detection method based on deep learning network
CN109460754A (en) * 2019-01-31 2019-03-12 深兰人工智能芯片研究院(江苏)有限公司 A kind of water surface foreign matter detecting method, device, equipment and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HONGQUAN QU et al.: "A Pedestrian Detection Method Based on YOLOv3 Model and Image Enhanced by Retinex", 2018 11th International Congress on Image and Signal Processing, BioMedical Engineering and Informatics *
SHENG Heng et al.: "Laboratory people counting and management system based on Faster R-CNN and IoU optimization", Journal of Computer Applications *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109040654A (en) * 2018-08-21 2018-12-18 苏州科达科技股份有限公司 Recognition method and device for external capture apparatus, and storage medium
CN110807405A (en) * 2019-10-29 2020-02-18 维沃移动通信有限公司 Detection method of candid camera device and electronic equipment
CN113223036A (en) * 2020-01-21 2021-08-06 李华 Electronic equipment field positioning system
CN113223036B (en) * 2020-01-21 2022-04-08 湖北讯甲科技有限公司 Electronic equipment field positioning system
CN113408379A (en) * 2021-06-04 2021-09-17 开放智能机器(上海)有限公司 Mobile phone candid behavior monitoring method and system
CN114067441A (en) * 2022-01-14 2022-02-18 合肥高维数据技术有限公司 Shooting and recording behavior detection method and system
CN114067441B (en) * 2022-01-14 2022-04-08 合肥高维数据技术有限公司 Shooting and recording behavior detection method and system

Also Published As

Publication number Publication date
CN110287862B (en) 2021-04-06

Similar Documents

Publication Publication Date Title
CN110287862A (en) Anti-candid photography detection method based on deep learning
CN104166841B (en) Quick detection and recognition method for specified pedestrians or vehicles in a video surveillance network
CN110378232B (en) Rapid examinee position detection method for examination rooms based on an improved SSD dual network
CN103345758B (en) Blind detection method for JPEG image region duplication forgery based on DCT statistical features
CN108416771A (en) Metal material corrosion area detection method based on monocular camera
CN116092199B (en) Employee working state identification method and identification system
US9280209B2 (en) Method for generating 3D coordinates and mobile terminal for generating 3D coordinates
CN112183438B (en) Image identification method for illegal behaviors based on small sample learning neural network
CN113111767A (en) Fall detection method based on deep learning 3D posture assessment
CN110929687A (en) Multi-user behavior recognition system based on key point detection and working method
CN103324852A (en) Four-modal medical imaging diagnosis system based on feature matching
CN112668557A (en) Method for defending image noise attack in pedestrian re-identification system
CN109492594A (en) Classroom participants' head-up rate detection method based on deep learning network
CN109816656A (en) A kind of thermal power plant's negative pressure side system leak source accurate positioning method
CN116071424A (en) Fruit space coordinate positioning method based on monocular vision
CN109684986A (en) A kind of vehicle analysis method and system based on automobile detecting following
CN110321869A (en) Personnel's detection and extracting method based on Multiscale Fusion network
CN104243970A (en) 3D drawn image objective quality evaluation method based on stereoscopic vision attention mechanism and structural similarity
CN105989600A (en) Characteristic point distribution statistics-based power distribution network device appearance detection method and system
CN110321867A (en) Occluded target detection method based on partial constraint network
Mu et al. Salient object detection in low contrast images via global convolution and boundary refinement
CN102148919B (en) Method and system for detecting balls
CN117237990A (en) Method and device for estimating weight of pig farm, electronic equipment and storage medium
CN113538720A (en) Embedded face recognition attendance checking method based on HiSilicon intelligent AI chip
CN116052230A (en) Palm vein recognition method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant