CN110378232A

CN110378232A - Rapid examinee position detection method for examination rooms based on an improved SSD dual network

Info

Publication number: CN110378232A
Application number: CN201910534569.9A
Authority: CN
Inventors: 马苗; 陶丽丽; 裴炤; 高子昂
Original assignee: Shaanxi Normal University
Current assignee: Shaanxi Normal University
Priority date: 2019-06-20
Filing date: 2019-06-20
Publication date: 2019-10-25
Anticipated expiration: 2039-06-20
Also published as: CN110378232B

Abstract

A rapid method for detecting examinee positions in an examination room using an improved SSD dual network, consisting of the following steps: image preprocessing, constructing a dynamic threshold SSD network, constructing an up-sampling SSD network, training the improved SSD dual network, and testing on test sample images. Building on the SSD network structure and the characteristics of the examination room examinee data set, the method dynamically adjusts the intersection-over-union (IoU) threshold to increase the number of positive samples for small targets in the SSD network, adds an up-sampling layer to the SSD network to enhance the image features of small targets, and combines the two networks in a parallel dual-network structure to improve the target detection rate. Compared with the prior art, the method offers good robustness and high detection accuracy, and is suitable for detecting examinees and invigilators in examination scenes.

Description

Rapid examinee position detection method for examination rooms based on an improved SSD dual network

Technical field

The invention belongs to the technical field of image processing and target detection, and in particular relates to the identification, localization, and counting of examinees in single-frame images captured from surveillance video of standardized examination rooms.

Background technique

Examinations are an important part of educational activity. They are the main way to measure and assess students' level of attainment, and an important means by which China assesses and selects talent of all kinds. Standardized examination rooms have played an important role in major national examinations in recent years. If identification and judgment rely solely on people watching surveillance video in real time, the fatigue caused by long periods of uninterrupted work inevitably leads to misjudgments and omissions. There is therefore a growing need for a new form of invigilation that brings advanced computer vision techniques to the large volume of surveillance data from standardized examination rooms, analyzes examinee behavior intelligently with video surveillance as the primary means, and is supplemented by human invigilation.

Accurate detection of examinee positions in a standardized examination room is the prerequisite and foundation for intelligent analysis of examinee behavior. Analyzing examinee behavior in a single-frame scene relies on target detection in images, which comprises target identification and localization. Locating and counting students in single frames of examination room surveillance video is an important application of target detection in standardized examination rooms. The process involves many image processing and image analysis techniques, such as extracting human body features from the image and computing the intersection over union (IoU) of detection regions. Current image target detection methods, in China and abroad, include traditional methods, such as those based on image thresholding or on inter-frame differencing and bilinear interpolation, and deep learning methods such as Faster R-CNN, YOLO, and SSD. When these identification and localization methods are applied to single frames of standardized examination room surveillance video, the main technical problems are a low target recognition rate and low localization accuracy, to the point of large numbers of missed detections or outright failure to produce a result. As a result, existing methods see very little use for detecting abnormal student behavior in standardized examination room surveillance scenes.

Summary of the invention

The technical problem to be solved by the present invention is to overcome the slow speed and low accuracy of the above prior art, and to provide a rapid examinee position detection method for examination rooms, based on an improved SSD dual network, that is robust, fast, and highly accurate.

The technical solution adopted to solve the above technical problem consists of the following steps:

(1) image preprocessing

Select 600 to 800 training sample images and 80 to 280 test sample images from the image data set, and use bilinear interpolation to normalize the selected images to a pixel size between 250 × 250 and 500 × 500.
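The bilinear normalization step can be illustrated with a minimal pure-Python sketch. The function name `resize_bilinear`, the grayscale list-of-rows image format, and the corner-aligned coordinate mapping are assumptions made for illustration; the patent does not specify an implementation, and a real pipeline would normally use an image library.

```python
def resize_bilinear(image, out_h, out_w):
    """Resize a 2-D grayscale image (list of rows) with bilinear interpolation.

    Assumption: corner-aligned mapping, so the corner pixels of the input
    and output images coincide.
    """
    in_h, in_w = len(image), len(image[0])
    scale_y = (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
    scale_x = (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
    out = []
    for i in range(out_h):
        y = i * scale_y
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            x = j * scale_x
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = image[y0][x0] * (1 - dx) + image[y0][x1] * dx
            bottom = image[y1][x0] * (1 - dx) + image[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        out.append(row)
    return out


# Toy example: enlarge a 2 x 2 image to 3 x 3.
small = [[0.0, 10.0], [20.0, 30.0]]
resized = resize_bilinear(small, 3, 3)
```

For the method described here, the same routine would be called with a target size such as 300 × 300 on each channel of a surveillance frame.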

(2) dynamic threshold SSD network is constructed

(a) Use the SSD network structure as the initial structure of the dynamic threshold SSD network

The input layer takes the 600 to 800 training sample images of size 250 × 250 to 500 × 500. The input layer is followed by convolution block 1 of VGG16; convolution blocks 2 to 5 follow convolution block 1 in sequence; fully convolutional layers 6 and 7 follow convolution block 5 in sequence; and convolution blocks 8 to 11 follow fully convolutional layer 7 in sequence. Convolution block 4, fully convolutional layer 7, and convolution blocks 8, 9, 10, and 11 are each connected to the output layer.

(b) building pre-selection frame matching process

In the initial structure, six convolutional layers are selected: convolutional layer 4-3 in convolution block 4, fully convolutional layer 7, convolutional layer 8-2 in convolution block 8, convolutional layer 9-2 in convolution block 9, convolutional layer 10-2 in convolution block 10, and convolutional layer 11 in convolution block 11. On these six layers, pre-selection boxes (default boxes) of different scales are matched against the ground-truth boxes to divide the pre-selection boxes into positive and negative sample sets. The matching rule is as follows: a pre-selection box whose area intersection over union (IoU) with a ground-truth box is greater than the IoU threshold is marked as a positive sample; otherwise it is a negative sample.

The IoU threshold is determined as follows. The threshold is set according to the ratio of the ground-truth box area to the input image area, so that it is adjusted dynamically when dividing positive and negative samples: if the ratio is ≤ 0.01, the IoU threshold is 0.4; otherwise it is 0.5. The ground-truth box area S_GTb and the IoU threshold are given by:

$$S_{GTb} = (y_{max} - y_{min}) \times (x_{max} - x_{min}) \tag{1}$$

$$IOU = \begin{cases} 0.4, & S_{GTb} / S_{input} \le 0.01 \\ 0.5, & \text{otherwise} \end{cases} \tag{2}$$

where x_max, x_min, y_max, and y_min are the maximum and minimum horizontal and vertical coordinates of the ground-truth box, and S_input is the area of the input image. The dynamic threshold SSD network to be trained is built using this matching method.
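The dynamic threshold rule can be sketched in a few lines of Python. The function names and the `(x_min, y_min, x_max, y_max)` box layout are illustrative assumptions; only the 0.01 area ratio and the 0.4 and 0.5 thresholds come from the text above.

```python
def box_area(box):
    """Area of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return (y_max - y_min) * (x_max - x_min)


def dynamic_iou_threshold(gt_box, input_area, ratio_limit=0.01):
    """Return 0.4 for small ground-truth boxes, 0.5 otherwise."""
    return 0.4 if box_area(gt_box) / input_area <= ratio_limit else 0.5


def iou(a, b):
    """Intersection over union of two boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def is_positive(pre_box, gt_box, input_area):
    """A pre-selection box is positive if its IoU exceeds the dynamic threshold."""
    return iou(pre_box, gt_box) > dynamic_iou_threshold(gt_box, input_area)


# Hypothetical example on a 300 x 300 input: a small ground-truth box
# (ratio about 0.008) and a pre-selection box overlapping it with IoU about 0.45.
input_area = 300 * 300
gt = (100, 100, 125, 130)
pre = (100, 100, 140, 142)
positive = is_positive(pre, gt, input_area)
```

In this example the pre-selection box would be rejected under a fixed threshold of 0.5 but is accepted under the lowered threshold of 0.4, which is how the rule increases the number of positive samples for small targets.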

(3) building up-sampling SSD network

(a) building up-sampling SSD network structure

The input layer takes the 600 to 800 training sample images of size 250 × 250 to 500 × 500. The input layer is followed by convolution block 1 of VGG16, and convolution blocks 2 and 3 follow convolution block 1 in sequence. Convolution block 3 is followed in sequence by convolutional layers 4-1 and 4-2 of convolution block 4. Convolutional layer 4-2 feeds two branches: an up-sampling layer 4-4 of size 76 × 76, which is followed by prediction layer 4-5 with 512 convolution kernels of size 3 × 3; and convolutional layer 4-3, which is followed by convolution block 5. Fully convolutional layers 6 and 7 follow convolution block 5 in sequence, and convolution blocks 8 to 11 follow fully convolutional layer 7 in sequence. Prediction layer 4-5, fully convolutional layer 7, and convolution blocks 8, 9, 10, and 11 are each connected to the output layer, forming the up-sampling SSD network structure.
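The text gives only the output size of up-sampling layer 4-4 (76 × 76) and not the interpolation method. As a rough sketch, assuming a 300 × 300 input (for which the convolution block 4 feature maps of the standard SSD are 38 × 38) and assuming nearest-neighbour up-sampling by a factor of 2, a single feature-map channel could be enlarged as follows. The function name `upsample_nearest` is hypothetical.

```python
def upsample_nearest(feature_map, factor=2):
    """Nearest-neighbour up-sampling of a 2-D feature map (list of rows).

    Assumption: the patent does not state the interpolation method; this
    is one common choice, used here only for illustration.
    """
    out = []
    for row in feature_map:
        # Repeat every value `factor` times along the width...
        wide = [value for value in row for _ in range(factor)]
        # ...and repeat the widened row `factor` times along the height.
        for _ in range(factor):
            out.append(list(wide))
    return out


# One 38 x 38 channel becomes 76 x 76, matching the stated layer size.
channel = [[0.0] * 38 for _ in range(38)]
upsampled = upsample_nearest(channel)
tiny = upsample_nearest([[1, 2], [3, 4]])
```

The enlarged map gives the following prediction layer four times as many spatial positions, which is consistent with the stated goal of strengthening small-target features.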

(b) the pre-selection frame matching process of building up-sampling SSD network

In the up-sampling SSD network structure, six layers are selected: prediction layer 4-5 in convolution block 4, fully convolutional layer 7, convolutional layer 8-2 in convolution block 8, convolutional layer 9-2 in convolution block 9, convolutional layer 10-2 in convolution block 10, and convolutional layer 11 in convolution block 11. On these six layers, pre-selection boxes of different scales are matched against the ground-truth boxes to divide them into a positive sample set and a negative sample set. The matching rule is: a pre-selection box whose area IoU with a ground-truth box is greater than the IoU threshold is marked as a positive sample; otherwise it is a negative sample. Here the IoU threshold is fixed at 0.5.

(4) the improved SSD dual network of training

The improved SSD dual network consists of the dynamic threshold SSD network and the up-sampling SSD network, which are trained at the same time. The training steps are:

(a) Set the number of classes to 2, the number of training epochs over the training set to 8 to 20, the training batch size to 15 to 32, the base learning rate to 0.0002 to 0.001, and the learning rate decay factor to 0.7 to 0.95.
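The hyperparameters in step (a) can be collected in a small configuration sketch. The text does not say how the decay factor is applied; the version below assumes, purely for illustration, that the learning rate is multiplied by the decay factor once per epoch. The dictionary keys and the function `learning_rate` are hypothetical names, and the values are taken from within the stated ranges.

```python
# Values chosen from within the ranges stated in step (a).
config = {
    "num_classes": 2,
    "epochs": 10,
    "batch_size": 16,
    "base_lr": 0.0003,
    "lr_decay": 0.9,
}


def learning_rate(base_lr, decay, epoch):
    """Assumed schedule: exponential decay applied once per epoch."""
    return base_lr * decay ** epoch


schedule = [
    learning_rate(config["base_lr"], config["lr_decay"], epoch)
    for epoch in range(config["epochs"])
]
```

Under this assumption the learning rate starts at the base value and shrinks by the decay factor after every pass over the training set.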

(b) Pre-train the improved SSD dual network on the VOC2007 data set, and use the pre-trained weights and biases as the initial weights and biases of the improved SSD dual network.

(c) Input the 600 to 800 preprocessed training sample images of size 250 × 250 to 500 × 500 into the pre-trained dynamic threshold SSD network. Set the cross-entropy function as the loss function and use gradient descent to reduce the loss: forward propagation and back-propagation are iterated, updating the weights and biases of the network. After 8 to 20 training epochs, the trained dynamic threshold SSD network model is obtained.

(d) Input the 600 to 800 preprocessed training sample images of size 250 × 250 to 500 × 500 into the pre-trained up-sampling SSD network. Set the cross-entropy function as the loss function and use gradient descent to reduce the loss: forward propagation and back-propagation are iterated, updating the weights and biases of the network. After 8 to 20 training epochs, the trained up-sampling SSD network model is obtained.
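As a minimal illustration of the loss and update rule named in steps (c) and (d), the sketch below computes a softmax cross-entropy for a two-class prediction and applies one plain gradient descent step. It is not the full SSD training objective (the standard SSD loss also includes a localization term, which the text does not mention), and the function names are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of class scores."""
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, target):
    """Cross-entropy loss for one sample with integer class label `target`."""
    return -math.log(softmax(logits)[target])


def sgd_step(weights, grads, lr):
    """One gradient descent update: w <- w - lr * dL/dw."""
    return [w - lr * g for w, g in zip(weights, grads)]


# Two classes, as set in step (a): index 0 and index 1.
confident = cross_entropy([5.0, 0.0], 0)   # correct class scored highly
mistaken = cross_entropy([0.0, 5.0], 0)    # correct class scored poorly
updated = sgd_step([1.0, 2.0], [0.5, -0.5], lr=0.1)
```

A confident correct prediction yields a small loss and a confident wrong one a large loss; repeated forward and backward passes with such updates are what the training steps iterate.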

(5) test sample image is tested

(a) Set the classification confidence threshold for targets to 0.45 to 0.65.

(b) Input the 80 to 280 preprocessed test sample images into the trained dynamic threshold SSD network and the trained up-sampling SSD network at the same time. This yields 2 different sets of image target detection results, each containing the position coordinates, class, and class confidence of every target.

(c) Use the non-maximum suppression algorithm to deduplicate detections across the 2 sets whose positional overlap exceeds the overlap threshold of 0.35 to 0.6, retaining the position coordinates with the higher confidence together with their class, to obtain the final detection result. Mean average precision (mAP) is chosen as the evaluation metric for the detection results.
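The fusion of the two result sets in steps (a) to (c) can be sketched as a greedy non-maximum suppression over the pooled detections. The dictionary layout of a detection, the function name `merge_detections`, and the choice to suppress across classes (rather than per class) are assumptions; the thresholds 0.5 and 0.4 lie within the stated ranges.

```python
def iou(a, b):
    """Intersection over union of boxes given as (x_min, y_min, x_max, y_max)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def merge_detections(dets_a, dets_b, conf_thresh=0.5, overlap_thresh=0.4):
    """Pool two detection lists, drop low-confidence boxes, then apply greedy NMS."""
    pooled = [d for d in dets_a + dets_b if d["score"] >= conf_thresh]
    pooled.sort(key=lambda d: d["score"], reverse=True)
    kept = []
    for det in pooled:
        # Keep a box only if it does not overlap a higher-scoring kept box.
        if all(iou(det["box"], k["box"]) <= overlap_thresh for k in kept):
            kept.append(det)
    return kept


# Hypothetical outputs of the two networks for one image.
dynamic_net = [
    {"box": (10, 10, 50, 50), "label": "examinee", "score": 0.9},
    {"box": (100, 100, 140, 140), "label": "examinee", "score": 0.3},
]
upsample_net = [
    {"box": (12, 12, 52, 52), "label": "examinee", "score": 0.8},
    {"box": (200, 200, 240, 240), "label": "examinee", "score": 0.7},
]
final = merge_detections(dynamic_net, upsample_net)
```

Here the two overlapping boxes for the same examinee collapse to the higher-scoring one, the low-confidence box is dropped, and the target found by only one network is retained.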

In sub-step (b) of step (2), constructing the dynamic threshold SSD network, the width ω_k and height h_k of the pre-selection boxes of different scales are:

where S_k is the base size; S_min (preferably 0.1) and S_max (preferably 0.9) denote the minimum and maximum scales; m is the number of convolutional layers used for prediction (preferably 6); k is an integer in (1, m]; and a is the aspect ratio of the pre-selection box, a ∈ {1/3, 1/2, 1, 2, 3}.
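The formula itself is not reproduced in this text. The symbols match the default box formulation of the original SSD paper, in which S_k = S_min + (S_max − S_min)/(m − 1) × (k − 1), ω_k = S_k·√a, and h_k = S_k/√a, with k running from 1 to m. The sketch below assumes that standard formulation; it is not confirmed by the text of this patent, and the function names are hypothetical.

```python
import math


def scale(k, m=6, s_min=0.1, s_max=0.9):
    """Base size S_k for the k-th prediction layer (standard SSD form, assumed)."""
    return s_min + (s_max - s_min) / (m - 1) * (k - 1)


def box_shapes(k, ratios=(1 / 3, 1 / 2, 1, 2, 3), m=6, s_min=0.1, s_max=0.9):
    """(width, height) pairs, relative to the image size, for each aspect ratio a."""
    s_k = scale(k, m, s_min, s_max)
    return [(s_k * math.sqrt(a), s_k / math.sqrt(a)) for a in ratios]


# Base sizes for the six prediction layers, from S_min to S_max.
scales = [scale(k) for k in range(1, 7)]
shapes_first_layer = box_shapes(1)
```

With the preferred values the base sizes are evenly spaced between 0.1 and 0.9, and every box on a given layer has the same area S_k² regardless of aspect ratio.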

In sub-step (a) of step (4), training the improved SSD dual network, the number of training epochs over the training set is preferably 10 and the training batch size is preferably 16.

In sub-step (a) of step (5), testing on the test sample images, the classification confidence threshold for targets is preferably 0.5.

In sub-step (c) of step (5), testing on the test sample images, the overlap threshold for position coordinates in the detection results is preferably 0.4.

Building on the SSD network and the characteristics of the examination room examinee data set, the present invention dynamically adjusts the IoU threshold to increase the number of positive samples for small targets in the SSD network, adds an up-sampling layer to the SSD network to enhance the image features of small targets, and combines the two networks in a parallel dual-network structure to improve the target detection rate. Compared with the prior art, it has the advantages of good robustness and high detection accuracy, and can be used to detect students or teachers in examination scenes and in classrooms.

Brief description of the drawings

Fig. 1 is a flow chart of Embodiment 1 of the present invention.

Fig. 2 is a structure diagram of the original SSD network of Embodiment 1.

Fig. 3 is a partial structure diagram of the up-sampling SSD network of Embodiment 1.

Fig. 4 is the original image numbered 302 in the examination room examinee data set of Embodiment 1.

Fig. 5 shows the examinee localization results for Fig. 4.

Fig. 6 is the original image numbered 814 in the examination room examinee data set of Embodiment 4.

Fig. 7 shows the examinee localization results for Fig. 6.

Fig. 8 is the original image numbered 475 in the examination room examinee data set of Embodiment 5.

Fig. 9 shows the examinee localization results for Fig. 8.

Detailed description of the embodiments

The present invention is described in further detail below with reference to the accompanying drawings and embodiments, but the present invention is not limited to the following embodiments.

Embodiment 1

The images in this embodiment come from an examination room examinee data set composed of single frames of surveillance video captured in standardized examination rooms and obtained by means such as web crawling. This embodiment uses 700 images from the data set as the training set and 180 images as the test set; the training and test sets do not overlap.

As shown in Figs. 1 to 4, the rapid examinee position detection method for examination rooms based on the improved SSD dual network in this embodiment consists of the following steps:

(1) image preprocessing

Select 700 training sample images and 180 test sample images from the image data set, and use bilinear interpolation to normalize the selected images to a pixel size of 300 × 300.

(2) dynamic threshold SSD network is constructed

The SSD network structure of this embodiment is taken from the paper "SSD: Single Shot MultiBox Detector", presented at ECCV 2016. The SSD network structure consists of an input layer, convolution blocks 1 to 5, fully convolutional layer 6, fully convolutional layer 7, and convolution blocks 8 to 11, connected in sequence.

The input layer takes the 700 training sample images of size 300 × 300. The input layer is followed by convolution block 1 of VGG16, then in sequence by convolution blocks 2, 3, 4, and 5, fully convolutional layers 6 and 7, and convolution blocks 8, 9, 10, and 11. Convolution block 4, fully convolutional layer 7, and convolution blocks 8, 9, 10, and 11 are each connected to the output layer.

(b) building pre-selection frame matching process

The width ω_k and height h_k of the pre-selection boxes of different scales are:

where S_k is the base size; S_min is 0.1 and S_max is 0.9, denoting the minimum and maximum scales; m is the number of convolutional layers used for prediction and is 6; k is an integer in (1, m]; and a is the aspect ratio of the pre-selection box, a ∈ {1/3, 1/2, 1, 2, 3}.

The IoU threshold is determined as follows. The threshold is set according to the ratio of the ground-truth box area to the input image area, so that it is adjusted dynamically when dividing positive and negative samples: if the ratio is ≤ 0.01, the IoU threshold is 0.4; otherwise it is 0.5. The ground-truth box area S_GTb and the IoU threshold are determined by formula (1) and formula (2):

$$S_{GTb} = (y_{max} - y_{min}) \times (x_{max} - x_{min}) \tag{1}$$

$$IOU = \begin{cases} 0.4, & S_{GTb} / S_{input} \le 0.01 \\ 0.5, & \text{otherwise} \end{cases} \tag{2}$$

(3) building up-sampling SSD network

(a) building up-sampling SSD network structure

The input layer takes the 700 training sample images of size 300 × 300. The input layer is followed by convolution block 1 of VGG16, and convolution blocks 2 and 3 follow convolution block 1 in sequence. Convolution block 3 is followed in sequence by convolutional layers 4-1 and 4-2 of convolution block 4. Convolutional layer 4-2 feeds two branches: an up-sampling layer 4-4 of size 76 × 76, which is followed by prediction layer 4-5 with 512 convolution kernels of size 3 × 3; and convolutional layer 4-3, which is followed by convolution block 5. Fully convolutional layers 6 and 7 follow convolution block 5 in sequence, and convolution blocks 8 to 11 follow fully convolutional layer 7 in sequence. Prediction layer 4-5, fully convolutional layer 7, and convolution blocks 8, 9, 10, and 11 are each connected to the output layer, forming the up-sampling SSD network structure.

(4) the improved SSD dual network of training

(a) Set the number of classes to 2, the number of training epochs over the training set to 10, the training batch size to 16, the base learning rate to 0.0003, and the learning rate decay factor to 0.9.

(c) Input the 700 preprocessed training sample images of size 300 × 300 into the pre-trained dynamic threshold SSD network. Set the cross-entropy function as the loss function and use gradient descent to reduce the loss: forward propagation and back-propagation are iterated, updating the weights and biases of the network. After 10 training epochs, the trained dynamic threshold SSD network model is obtained.

(d) Input the 700 preprocessed training sample images of size 300 × 300 into the pre-trained up-sampling SSD network. Set the cross-entropy function as the loss function and use gradient descent to reduce the loss: forward propagation and back-propagation are iterated, updating the weights and biases of the network. After 10 training epochs, the trained up-sampling SSD network model is obtained.

(5) test sample image is tested

(a) Set the classification confidence threshold for targets to 0.5.

(b) Input the 180 preprocessed test sample images into the trained dynamic threshold SSD network and the trained up-sampling SSD network at the same time. This yields 2 different sets of image target detection results, each containing the position coordinates, class, and class confidence of every target.

(c) Use the non-maximum suppression algorithm to deduplicate detections across the 2 sets whose positional overlap exceeds the overlap threshold of 0.4, retaining the position coordinates with the higher confidence together with their class, to obtain the final detection result. Mean average precision (mAP) is chosen as the evaluation metric; the mAP reaches 85.11%, and the average detection time per image is 0.13 seconds. To make the detection results easier to inspect, this embodiment selects one image of size 704 × 576, numbered 312, from the 180 test sample images (see Fig. 4) and visualizes it; the result is shown in Fig. 5. As Fig. 5 shows, the positions of the examinees in the image are detected, the confidence for each examinee is high, and the result is good.

Embodiment 2

The rapid examinee position detection method for examination rooms based on the improved SSD dual network in this embodiment consists of the following steps:

(1) image preprocessing

Select 700 training sample images and 180 test sample images from the image data set, and use bilinear interpolation to normalize the selected images to a pixel size of 250 × 250.

(2) dynamic threshold SSD network is constructed

The SSD network structure of this embodiment consists of an input layer, convolution blocks 1 to 5, fully convolutional layer 6, fully convolutional layer 7, and convolution blocks 8 to 11, connected in sequence.

The input layer takes the 700 training sample images of size 250 × 250. The input layer is followed by convolution block 1 of VGG16, then in sequence by convolution blocks 2, 3, 4, and 5, fully convolutional layers 6 and 7, and convolution blocks 8, 9, 10, and 11. Convolution block 4, fully convolutional layer 7, and convolution blocks 8, 9, 10, and 11 are each connected to the output layer.

(b) building pre-selection frame matching process

This step is the same as in Embodiment 1.

(3) building up-sampling SSD network

(a) building up-sampling SSD network structure

The input layer takes the 700 training sample images of size 250 × 250. The input layer is followed by convolution block 1 of VGG16, and convolution blocks 2 and 3 follow convolution block 1 in sequence. Convolution block 3 is followed in sequence by convolutional layers 4-1 and 4-2 of convolution block 4. Convolutional layer 4-2 feeds two branches: an up-sampling layer 4-4 of size 76 × 76, which is followed by prediction layer 4-5 with 512 convolution kernels of size 3 × 3; and convolutional layer 4-3, which is followed by convolution block 5. Fully convolutional layers 6 and 7 follow convolution block 5 in sequence, and convolution blocks 8 to 11 follow fully convolutional layer 7 in sequence. Prediction layer 4-5, fully convolutional layer 7, and convolution blocks 8, 9, 10, and 11 are each connected to the output layer, forming the up-sampling SSD network structure.

The rest of this step is the same as in Embodiment 1.

(4) the improved SSD dual network of training

(a) Set the number of classes to 2, the number of training epochs over the training set to 8, the training batch size to 15, the base learning rate to 0.0002, and the learning rate decay factor to 0.7.

(c) Input the 700 preprocessed training sample images of size 250 × 250 into the pre-trained dynamic threshold SSD network. Set the cross-entropy function as the loss function and use gradient descent to reduce the loss: forward propagation and back-propagation are iterated, updating the weights and biases of the network. After 8 training epochs, the trained dynamic threshold SSD network model is obtained.

(d) Input the 700 preprocessed training sample images of size 250 × 250 into the pre-trained up-sampling SSD network. Set the cross-entropy function as the loss function and use gradient descent to reduce the loss: forward propagation and back-propagation are iterated, updating the weights and biases of the network. After 8 training epochs, the trained up-sampling SSD network model is obtained.

(5) test sample image is tested

(a) Set the classification confidence threshold for targets to 0.45.

(c) Use the non-maximum suppression algorithm to deduplicate detections across the 2 sets whose positional overlap exceeds the overlap threshold of 0.35, retaining the position coordinates with the higher confidence together with their class, to obtain the final detection result. Mean average precision (mAP) is chosen as the evaluation metric; the mAP reaches 85.11%, and the average detection time per image is 0.13 seconds.

Embodiment 3

(1) image preprocessing

Select 700 training sample images and 180 test sample images from the image data set, and use bilinear interpolation to normalize the selected images to a pixel size of 500 × 500.

(2) dynamic threshold SSD network is constructed

The input layer takes the 700 training sample images of size 500 × 500. The input layer is followed by convolution block 1 of VGG16, then in sequence by convolution blocks 2, 3, 4, and 5, fully convolutional layers 6 and 7, and convolution blocks 8, 9, 10, and 11. Convolution block 4, fully convolutional layer 7, and convolution blocks 8, 9, 10, and 11 are each connected to the output layer.

(b) building pre-selection frame matching process

This step is the same as in Embodiment 1.

(3) building up-sampling SSD network

(a) building up-sampling SSD network structure

The input layer takes the 700 training sample images of size 500 × 500. The input layer is followed by convolution block 1 of VGG16, and convolution blocks 2 and 3 follow convolution block 1 in sequence. Convolution block 3 is followed in sequence by convolutional layers 4-1 and 4-2 of convolution block 4. Convolutional layer 4-2 feeds two branches: an up-sampling layer 4-4 of size 76 × 76, which is followed by prediction layer 4-5 with 512 convolution kernels of size 3 × 3; and convolutional layer 4-3, which is followed by convolution block 5. Fully convolutional layers 6 and 7 follow convolution block 5 in sequence, and convolution blocks 8 to 11 follow fully convolutional layer 7 in sequence. Prediction layer 4-5, fully convolutional layer 7, and convolution blocks 8, 9, 10, and 11 are each connected to the output layer, forming the up-sampling SSD network structure.

The rest of this step is the same as in Embodiment 1.

(4) the improved SSD dual network of training

(a) Set the number of classes to 2, the number of training epochs over the training set to 20, the training batch size to 32, the base learning rate to 0.001, and the learning rate decay factor to 0.95.

(c) Input the 700 preprocessed training sample images of size 500 × 500 into the pre-trained dynamic threshold SSD network. Set the cross-entropy function as the loss function and use gradient descent to reduce the loss: forward propagation and back-propagation are iterated, updating the weights and biases of the network. After 20 training epochs, the trained dynamic threshold SSD network model is obtained.

(d) Input the 700 preprocessed training sample images of size 500 × 500 into the pre-trained up-sampling SSD network. Set the cross-entropy function as the loss function and use gradient descent to reduce the loss: forward propagation and back-propagation are iterated, updating the weights and biases of the network. After 20 training epochs, the trained up-sampling SSD network model is obtained.

(5) test sample image is tested

(a) Set the classification confidence threshold for targets to 0.65.

(c) Use the non-maximum suppression algorithm to deduplicate detections across the 2 sets whose positional overlap exceeds the overlap threshold of 0.6, retaining the position coordinates with the higher confidence together with their class, to obtain the final detection result. Mean average precision (mAP) is chosen as the evaluation metric; the mAP reaches 85.11%, and the average detection time per image is 0.13 seconds.

Embodiment 4

In above Examples 1 to 3, the present embodiment uses 600 images in the data set as training set, and 280 Image is opened as test set, and training set is not be overlapped with test set.

(1) image preprocessing

From other in image data concentration 600 training sample images of selection and 280 test sample images, the step Step is identical as corresponding embodiment.

(2) dynamic threshold SSD network is constructed

The input layer takes 600 training sample images, whose pixel size is the same as in the corresponding embodiment. The other operations in this step are the same as in Embodiment 1.

(b) building pre-selection frame matching process

This step is the same as in Embodiment 1.

(3) building up-sampling SSD network

(a) building up-sampling SSD network structure

This step is the same as in Embodiment 1.

(4) the improved SSD dual network of training

Steps (a) and (b) are the same as in the corresponding embodiment.

(c) Input the 600 preprocessed training sample images, whose pixel size is the same as in the corresponding embodiment, into the pre-trained dynamic threshold SSD network. The other operations are the same as in the corresponding embodiment.

(d) Input the 600 preprocessed training sample images, whose pixel size is the same as in the corresponding embodiment, into the pre-trained up-sampling SSD network. The other operations are the same as in the corresponding embodiment.

(5) Test the test sample images

(b) The 280 preprocessed test sample images are input simultaneously into the trained dynamic threshold SSD network and the trained up-sampling SSD network for testing, giving 2 different groups of image target detection results; each group contains the position coordinates, class, and class confidence of every target.

(c) The non-maximum suppression algorithm is applied to the 2 groups of detection results: detections whose positional overlap exceeds the overlap threshold of 0.4 are deduplicated, retaining the position coordinates with the higher confidence and their class, to obtain the final detection result. Mean average precision (mAP) is chosen as the evaluation metric. The mAP reaches 82.01%, and the average detection time per image is 0.131 seconds. To make the detection result easier to inspect, this embodiment selects from the 280 test sample images one image of size 704 × 576, numbered 814 (Fig. 6), and visualizes it; the visualization result is shown in Fig. 7. As Fig. 7 shows, the positions of the examinees in the image are detected and the class confidence of each examinee is high, so the detection performs well.

Embodiment 5

Relative to Embodiments 1 to 3 above, this embodiment uses 800 images from the data set as the training set and 80 images as the test set; the training set and the test set do not overlap.

(1) Image preprocessing

800 training sample images and 80 test sample images are selected from the image data set. The other steps of this step are the same as in the corresponding embodiment.

(2) Construct the dynamic threshold SSD network

The input layer is the 800 training sample images, and the pixel size of the training sample images is the same as in the corresponding embodiment. The other steps of this step are the same as in Embodiment 1.

(b) Construct the pre-selection box matching method

This step is the same as in Embodiment 1.

(3) Construct the up-sampling SSD network

(a) Construct the up-sampling SSD network structure

The input layer is the 800 training sample images, and the pixel size of the training sample images is the same as in the corresponding embodiment. The other steps of this step are the same as in Embodiment 1.

This step is the same as in Embodiment 1.

(4) Train the improved SSD dual network

Steps (a) and (b) are the same as in the corresponding embodiment.

(c) The 800 preprocessed training sample images are input into the pre-trained dynamic threshold SSD network; the pixel size of the training sample images is the same as in the corresponding embodiment. The other steps are the same as in the corresponding embodiment.

(d) The 800 preprocessed training sample images are input into the pre-trained up-sampling SSD network; the pixel size of the training sample images is the same as in the corresponding embodiment. The other steps are the same as in the corresponding embodiment.

(5) Test the test sample images

(b) The 80 preprocessed test sample images are input simultaneously into the trained dynamic threshold SSD network and the trained up-sampling SSD network for testing, giving 2 different groups of image target detection results; each group contains the position coordinates, class, and class confidence of every target.

(c) The non-maximum suppression algorithm is applied to the 2 groups of detection results: detections whose positional overlap exceeds the overlap threshold of 0.4 are deduplicated, retaining the position coordinates with the higher confidence and their class, to obtain the final detection result. Mean average precision (mAP) is chosen as the evaluation metric. The mAP reaches 80.21%, and the average detection time per image is 0.129 seconds. To make the detection result easier to inspect, this embodiment selects from the 80 test sample images one image of size 704 × 576, numbered 475 (Fig. 8), and visualizes it; the visualization result is shown in Fig. 9. As Fig. 9 shows, even under dim lighting the positions of the examinees in the image are detected well, no detection is missed, and the class confidence of the examinees is high.

Claims

1. A rapid detection method for examinee positions in an examination hall using an improved SSD dual network, characterized by consisting of the following steps:

(1) Image preprocessing

Select 600 to 800 training sample images and 80 to 280 test sample images from the image data set, and use bilinear interpolation to normalize the selected images to a pixel size of 250 × 250 to 500 × 500;
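The size normalization in step (1) can be illustrated with a small bilinear interpolation routine. This is a sketch only: it works on a single-channel image stored as a list of rows, the function name `resize_bilinear` is hypothetical, and the corner-aligned coordinate mapping is an assumption (image libraries differ on this detail).

```python
def resize_bilinear(image, out_h, out_w):
    """Resize a 2-D single-channel image (list of rows) by bilinear interpolation."""
    in_h, in_w = len(image), len(image[0])
    resized = []
    for i in range(out_h):
        # Map the output row to a source coordinate; corners map onto corners.
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        dy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            dx = x - x0
            # Interpolate along x on the two neighbouring rows, then along y.
            top = image[y0][x0] * (1 - dx) + image[y0][x1] * dx
            bottom = image[y1][x0] * (1 - dx) + image[y1][x1] * dx
            row.append(top * (1 - dy) + bottom * dy)
        resized.append(row)
    return resized
```

For the images in this method, the same routine would be applied per colour channel with `out_h` and `out_w` set to a value between 250 and 500.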

(2) Construct the dynamic threshold SSD network

The input layer is the 600 to 800 training sample images of size 250 × 250 to 500 × 500. The input layer is followed by convolution (1) of VGG16; convolution (1) is followed, connected end to end in sequence, by convolution (2) to convolution (5); convolution (5) is followed in sequence by fully convolutional layer (6) and fully convolutional layer (7); fully convolutional layer (7) is followed in sequence by convolution (8) to convolution (11); and convolution (4), fully convolutional layer (7), convolution (8), convolution (9), convolution (10) and convolution (11) are each connected to the output layer;

(b) Construct the pre-selection box matching method

In the initial structure, select convolutional layer (4-3) in convolution (4), fully convolutional layer (7), convolutional layer (8-2) in convolution (8), convolutional layer (9-2) in convolution (9), convolutional layer (10-2) in convolution (10), and convolutional layer (11) in convolution (11), 6 convolutional layers in total. On these 6 convolutional layers, pre-selection boxes of different scales are matched against the ground-truth boxes to divide the pre-selection boxes into positive and negative sample sets. The matching rule is: a pre-selection box in the SSD network whose area intersection-over-union (IoU) with a ground-truth box is greater than the IoU threshold is recorded as a positive sample; otherwise it is a negative sample;

The IoU threshold is determined as follows: the threshold is decided by the ratio of the area of the ground-truth box to the pixel size (area) of the input image, and is adjusted dynamically to divide positive and negative samples. If the ratio of the ground-truth box area to the input image size is ≤ 0.01, the IoU threshold is 0.4; otherwise, the IoU threshold is 0.5. The ground-truth box area $S_{GTb}$ and the IoU threshold $IOU$ are determined by the following formulas:

$$S_{GTb} = (y_{max} - y_{min}) \times (x_{max} - x_{min})$$

$$IOU = \begin{cases} 0.4, & S_{GTb} / S_{input} \le 0.01 \\ 0.5, & \text{otherwise} \end{cases}$$

where $x_{max}$, $x_{min}$, $y_{max}$, $y_{min}$ are the maximum and minimum of the abscissa and of the ordinate of the ground-truth box, respectively, and $S_{input}$ is the area of the input image. With this matching method, the dynamic threshold SSD network forms the network to be trained;
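The dynamic threshold rule and the resulting division into positive and negative samples can be sketched as follows. The thresholds 0.4 and 0.5 and the ratio 0.01 come from the text; the function names, the `(x_min, y_min, x_max, y_max)` box layout, and matching against one ground-truth box at a time are assumptions made for illustration.

```python
def box_area(box):
    """Area of a box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    return (y_max - y_min) * (x_max - x_min)

def dynamic_iou_threshold(gt_box, input_area):
    """0.4 for small ground-truth boxes (area ratio <= 0.01), otherwise 0.5."""
    return 0.4 if box_area(gt_box) / input_area <= 0.01 else 0.5

def iou(a, b):
    """Intersection over union of two boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0

def split_samples(preselection_boxes, gt_box, input_area):
    """Divide pre-selection boxes into positives and negatives for one ground-truth box."""
    threshold = dynamic_iou_threshold(gt_box, input_area)
    positives = [b for b in preselection_boxes if iou(b, gt_box) > threshold]
    negatives = [b for b in preselection_boxes if iou(b, gt_box) <= threshold]
    return positives, negatives
```

With a 300 × 300 input, a 20 × 20 ground-truth box falls under the 0.01 ratio, so a pre-selection box with an IoU of 0.45 counts as a positive sample, whereas a fixed threshold of 0.5 would reject it. This is how the rule increases the number of positive samples for small targets.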

(3) Construct the up-sampling SSD network

(a) Construct the up-sampling SSD network structure

The input layer is the 600 to 800 training sample images of size 250 × 250 to 500 × 500. The input layer is followed by convolution (1) of VGG16; convolution (1) is followed in sequence by convolution (2) and convolution (3); convolution (3) is followed in sequence by convolutional layer (4-1) and convolutional layer (4-2) of convolution (4); convolutional layer (4-2) is followed by an up-sampling layer (4-4) of size 76 × 76; up-sampling layer (4-4) is followed by a prediction layer (4-5) with 512 convolution kernels of size 3 × 3; convolutional layer (4-2) is also followed by convolutional layer (4-3); convolutional layer (4-3) is followed by convolution (5); convolution (5) is followed in sequence by fully convolutional layer (6) and fully convolutional layer (7); fully convolutional layer (7) is followed in sequence by convolution (8) to convolution (11); and prediction layer (4-5), fully convolutional layer (7), convolution (8), convolution (9), convolution (10) and convolution (11) are each connected to the output layer, forming the up-sampling SSD network structure;

In the up-sampling SSD network structure, select prediction layer (4-5) in convolution (4), fully convolutional layer (7), convolutional layer (8-2) in convolution (8), convolutional layer (9-2) in convolution (9), convolutional layer (10-2) in convolution (10), and convolutional layer (11) in convolution (11), 6 convolutional layers in total. On these 6 layers, pre-selection boxes of different scales are matched against the ground-truth boxes, dividing the pre-selection boxes into a positive sample set and a negative sample set. The specific matching rule is: a pre-selection box whose area intersection-over-union (IoU) with a ground-truth box is greater than the IoU threshold is recorded as a positive sample, otherwise it is a negative sample; here the IoU threshold is set to 0.5;

(4) Train the improved SSD dual network

The improved SSD dual network consists of the dynamic threshold SSD network and the up-sampling SSD network, which are trained simultaneously. The training steps are:

(a) Set the number of classes to 2, the number of training epochs over the training set to 8 to 20, the training batch size to 15 to 32, the base learning rate to 0.0002 to 0.001, and the learning rate decay factor to 0.7 to 0.95;

(b) Pre-train the improved SSD dual network on the VOC2007 data set, and use the weights and biases obtained from pre-training as the initial weights and biases of the improved SSD dual network;

(c) Input the 600 to 800 preprocessed training sample images of size 250 × 250 to 500 × 500 into the pre-trained dynamic threshold SSD network. Set the cross-entropy function as the loss function and use gradient descent to reduce the loss value through forward propagation and backpropagation; iterate forward propagation and backpropagation, update the weights and biases of the network, and train for 8 to 20 epochs to obtain the trained dynamic threshold SSD network model;

(d) Input the 600 to 800 preprocessed training sample images of size 250 × 250 to 500 × 500 into the pre-trained up-sampling SSD network. Set the cross-entropy function as the loss function and use gradient descent to reduce the loss value through forward propagation and backpropagation; iterate forward propagation and backpropagation, update the weights and biases of the network, and train for 8 to 20 epochs to obtain the trained up-sampling SSD network model;
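The training settings of step (4)(a) can be captured in a small configuration check, sketched below. The ranges are those stated in step (4)(a); the dictionary layout and function names are hypothetical, and applying the decay factor once per epoch is an assumption, since the text gives the factor but not the schedule.

```python
import math

# Ranges stated in step (4)(a); the dictionary layout is a hypothetical convenience.
STATED_RANGES = {
    "epochs": (8, 20),
    "batch_size": (15, 32),
    "base_lr": (0.0002, 0.001),
    "lr_decay": (0.7, 0.95),
}

def check_config(config):
    """Return the names of the settings that fall outside the stated ranges."""
    return [name for name, (low, high) in STATED_RANGES.items()
            if not low <= config[name] <= high]

def iterations_per_epoch(num_images, batch_size):
    """Number of batches needed for one pass over the training set."""
    return math.ceil(num_images / batch_size)

def learning_rate(config, epoch):
    """ASSUMPTION: the decay factor is applied once per epoch (exponential decay)."""
    return config["base_lr"] * config["lr_decay"] ** epoch
```

For example, with 700 training images and a batch size of 16, one epoch takes `iterations_per_epoch(700, 16)` batches, so the last batch of each epoch is only partly filled.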

(5) Test the test sample images

(a) Set the class confidence threshold of the target to 0.45 to 0.65;

(b) Input the 80 to 280 preprocessed test sample images simultaneously into the trained dynamic threshold SSD network and the trained up-sampling SSD network for testing, giving 2 different groups of image target detection results, where each group contains the position coordinates, class, and class confidence of every target;

(c) Apply the non-maximum suppression algorithm to the 2 groups of detection results: deduplicate the detections whose positional overlap exceeds the overlap threshold of 0.35 to 0.6, retaining the position coordinates with the higher confidence and their class, to obtain the final detection result, and choose mean average precision (mAP) as the evaluation metric for the detection results.
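As an illustration of the mAP metric named in step (5)(c), the sketch below computes average precision per class from ranked detections and averages it over the classes. The text does not state which mAP variant is used, so the all-point interpolated form here is an assumption, as are the function names and the `(confidence, is_true_positive)` input layout; deciding whether a detection is a true positive is taken as already done.

```python
def average_precision(detections, num_ground_truth):
    """All-point interpolated AP for one class.

    detections: list of (confidence, is_true_positive) pairs.
    num_ground_truth: number of ground-truth boxes of this class.
    """
    ranked = sorted(detections, key=lambda det: det[0], reverse=True)
    true_positives = 0
    precisions, recalls = [], []
    for rank, (_, is_tp) in enumerate(ranked, start=1):
        if is_tp:
            true_positives += 1
        precisions.append(true_positives / rank)
        recalls.append(true_positives / num_ground_truth)
    # Replace each precision by the best precision reached at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, previous_recall = 0.0, 0.0
    for precision, recall in zip(precisions, recalls):
        ap += (recall - previous_recall) * precision
        previous_recall = recall
    return ap

def mean_average_precision(per_class):
    """per_class maps a class name to (detections, num_ground_truth)."""
    aps = [average_precision(dets, n) for dets, n in per_class.values()]
    return sum(aps) / len(aps)
```

With the 2 classes of this method, `per_class` would hold one entry for examinees and one for invigilators.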

2. The rapid detection method for examinee positions in an examination hall using an improved SSD dual network according to claim 1, characterized in that: in step (b) of step (2), constructing the dynamic threshold SSD network, the width ω_k and height h_k of the pre-selection boxes of different scales are:

where S_k is the base scale value; S_min is 0.1 and S_max is 0.9, denoting the smallest and largest scales respectively; m is the number of prediction convolutional layers, which is 6; k is an integer in (1, m]; and a is the aspect ratio of the pre-selection box, a ∈ {1/3, 1/2, 1, 2, 3}.
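As an illustration of how the symbols above could fit together, the sketch below assumes the scale rule of the original SSD formulation: S_k = S_min + (S_max − S_min)(k − 1)/(m − 1), with width S_k·√a and height S_k/√a. This rule matches the symbols defined in this claim but is an assumption, not something confirmed by this text; the function name is hypothetical, and sizes are fractions of the input image side.

```python
import math

def preselection_box_sizes(k, m=6, s_min=0.1, s_max=0.9,
                           ratios=(1 / 3, 1 / 2, 1, 2, 3)):
    """Widths and heights of the pre-selection boxes on prediction layer k.

    ASSUMPTION: linear scale rule and sqrt(aspect ratio) scaling as in the
    original SSD formulation.
    """
    scale = s_min + (s_max - s_min) * (k - 1) / (m - 1)
    return [(scale * math.sqrt(a), scale / math.sqrt(a)) for a in ratios]
```

Under this assumption every box on a layer has the same area S_k², and only its width-to-height ratio a changes.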

3. The rapid detection method for examinee positions in an examination hall using an improved SSD dual network according to claim 1, characterized in that: in step (a) of step (4), training the improved SSD dual network, the number of training epochs over the training set is 10 and the training batch size is 16.

4. The rapid detection method for examinee positions in an examination hall using an improved SSD dual network according to claim 1, characterized in that: in step (a) of step (5), testing the test sample images, the class confidence threshold of the target is 0.5.

5. The rapid detection method for examinee positions in an examination hall using an improved SSD dual network according to claim 1, characterized in that: in step (c) of step (5), testing the test sample images, the overlap threshold for position coordinates in the detection results is 0.4.