CN109409210A

CN109409210A - Face detection method and system based on the SSD framework

Info

Publication number: CN109409210A
Application number: CN201811058101.9A
Authority: CN
Inventors: 王鲁许; 董远; 白洪亮; 熊风烨
Original assignee: Beijing Faceall Co
Current assignee: SUZHOU FEISOU TECHNOLOGY Co.,Ltd.
Priority date: 2018-09-11
Filing date: 2018-09-11
Publication date: 2019-03-01
Anticipated expiration: 2038-09-11
Also published as: CN109409210B

Abstract

Embodiments of the present invention provide a face detection method and system based on the SSD framework. The method comprises: labeling anchors on a target face image; and detecting the anchor-labeled target face image with a trained SSD-framework convolutional neural network to obtain a target face box, wherein the SSD-framework convolutional neural network is trained on sample face images labeled with anchors. By providing an anchor matching algorithm that adapts to the distribution of face sizes, the embodiments match each face in every face image to the most suitable anchor according to its size to generate face regions for training, so that the SSD-framework convolutional neural network is fully trained and face detection performance is improved.

Description

Face detection method and system based on the SSD framework

Technical field

Embodiments of the present invention relate to the field of deep learning, and in particular to a face detection method and system based on the SSD framework.

Background art

In recent years, with the rapid development of computing and other advanced technologies, neural-network-based face detection has been widely applied in fields such as photography, surveillance, and access control. Convolutional neural networks based on the SSD training framework in particular have attracted attention for their fast detection speed and their high accuracy on large objects.

However, because their low-level feature extraction is insufficient, SSD-framework convolutional neural networks perform poorly when detecting faces of different sizes, and especially poorly on small faces (here, faces smaller than 50px*50px). When training data is limited, the information in the data cannot be mined well during training, so the network is insufficiently trained and the face detection performance of the SSD-framework convolutional neural network is reduced.

Therefore, a face detection method and system based on the SSD framework is needed to solve the above problems.

Summary of the invention

In view of the problems in the prior art, embodiments of the present invention provide a face detection method and system based on the SSD framework.

In a first aspect, an embodiment of the present invention provides a face detection method based on the SSD framework, comprising:

labeling anchors on a target face image;

detecting the anchor-labeled target face image with a trained SSD-framework convolutional neural network to obtain a target face box, wherein the SSD-framework convolutional neural network is trained on sample face images labeled with anchors.

In a second aspect, an embodiment of the present invention provides a face detection system based on the SSD framework, comprising:

an anchor labeling module, configured to label anchors on a target face image;

a face detection module, configured to detect the anchor-labeled target face image with a trained SSD-framework convolutional neural network to obtain a target face box, wherein the SSD-framework convolutional neural network is trained on sample images labeled with anchors.

In a third aspect, an embodiment of the present invention provides an electronic device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the method provided in the first aspect.

In a fourth aspect, an embodiment of the present invention provides a non-transitory computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the method provided in the first aspect.

The face detection method and system based on the SSD framework provided by the embodiments of the present invention use an anchor matching algorithm that adapts to the distribution of face sizes: each face in every face image is matched to the most suitable anchor according to its size to generate face regions for training, so that the SSD-framework convolutional neural network is fully trained and face detection performance is improved.

Brief description of the drawings

To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flow diagram of the face detection method based on the SSD framework provided by an embodiment of the present invention;

Fig. 2 is a flow diagram of the training of the SSD-framework convolutional neural network provided by an embodiment of the present invention;

Fig. 3 is a structural diagram of the face detection system based on the SSD framework provided by an embodiment of the present invention;

Fig. 4 is a structural diagram of the electronic device provided by an embodiment of the present invention.

Detailed description of the embodiments

To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.

Face detection refers to the technique of using a computer algorithm to detect the face regions in any given picture and to mark their positions in the picture. In face detection, SSD-framework convolutional neural networks have attracted attention for their fast detection speed and their high accuracy on large objects, but insufficient low-level feature extraction makes them perform relatively poorly on small faces. The main problems of the SSD-based face detection frameworks commonly used in the prior art are as follows. First, the low-level feature network structure is relatively simple and cannot fully mine the features of small faces, so detection performance on small faces is poor (here, small faces are faces smaller than 50px*50px). Second, the training data cannot be matched to anchors well, so training is insufficient and the training data cannot be fully exploited; in particular, when training data is limited, the information in the data cannot be mined well for training. Third, the proportions of small, medium, and large faces in the face training samples are not necessarily balanced, so faces of different sizes contribute differently to the loss function, which hinders optimization of the network parameters.

Therefore, how to effectively improve the performance of SSD-framework convolutional neural networks on small faces, and how to raise the recall and accuracy of the whole detector, are problems that urgently need to be solved.

The sample face images and target face images described in the embodiments of the present invention are images that contain a face or a person's body; an image can serve as a face image in the embodiments even if the face or body in it is incomplete or only partially included. In addition, sample face images and target face images are distinguished only by whether they are used for training or for applying the SSD-framework convolutional neural network. They are essentially the same, both being face images, and the embodiments of the present invention do not specifically limit them.

Fig. 1 is a flow diagram of the face detection method based on the SSD framework provided by an embodiment of the present invention. As shown in Fig. 1, an embodiment of the present invention provides a face detection method based on the SSD framework, comprising:

Step 101: labeling anchors on a target face image;

Step 102: detecting the anchor-labeled target face image with a trained SSD-framework convolutional neural network to obtain a target face box, wherein the SSD-framework convolutional neural network is trained on sample face images labeled with anchors.

In this embodiment, anchors are first labeled on the target face image in step 101. In the SSD-framework convolutional neural network, anchors are labeled on every sample face image so that the facial features in the training data are extracted more effectively. In step 102, the anchor-labeled target face image is detected with the trained SSD-framework convolutional neural network to obtain the target face box, the network having been trained on sample face images labeled with anchors. During training, for each feature extraction layer, the SSD-framework convolutional neural network generates a certain number of anchors at every position of the sample face image to match the faces in the training data. Only faces matched by an anchor are treated as positive samples (faces calibrated with anchors), while regions delineated by anchors that match no face are treated as negative samples (background). Features are then extracted from the positive and negative sample regions separately and fed into the loss function for classification and regression.

The quality of anchor matching therefore directly affects classification and, in turn, the final detection performance. For example, if an anchor is matched to a face region that contains a lot of background, classification carries more redundant noise and its quality suffers. Ideally, if every anchor matches exactly a face region and contains no background noise, classification is optimal. The embodiment of the present invention selects a scale transformation ratio according to the sizes of the faces in the sample face image and scales the image accordingly. A square region is then cropped at random from the scaled sample face image, and the faces in the square region are matched to and labeled with anchors of the corresponding size, yielding a sample face image labeled with anchors. Finally, a series of operations such as random horizontal flipping, picture scaling, and mean subtraction produce the training sample set for the SSD-framework convolutional neural network.

By providing an anchor matching algorithm that adapts to the distribution of face sizes, the embodiment of the present invention matches each face in every face image to the most suitable anchor according to its size to generate face regions for training, so that the SSD-framework convolutional neural network is fully trained and face detection performance is improved.

On the basis of the above embodiment, Fig. 2 is a flow diagram of the training of the SSD-framework convolutional neural network provided by an embodiment of the present invention. As shown in Fig. 2, the SSD-framework convolutional neural network is trained through the following steps:

Step 201: applying a random scale transformation to a sample face image to obtain a first sample face image;

Step 202: cropping a square region at random from the first sample face image and, if the square region contains a face, matching and labeling an anchor of the corresponding size to obtain a sample face image labeled with anchors;

Step 203: applying a random horizontal flip to the sample face image labeled with anchors to obtain a second sample face image;

Step 204: scaling the second sample face image to the network input size and subtracting the mean to obtain a third sample face image, and training the SSD-framework convolutional neural network with the third sample face image.

In this embodiment, the faces in the training data vary greatly in scale across the captured pictures, and the data distribution may be uneven. For example, small faces make up a relatively large share of pictures of parades, whereas large faces make up a larger share of frontal portraits, party photos, and the like. Anchor matching therefore needs to be performed separately for faces of different scales. To increase the diversity of the data, so that small faces in the original image can be matched to relatively large anchors and large faces to relatively small anchors, this patent scales the original image at random and then crops regions from the scaled picture for anchor matching. In this way, enlarged small faces can be matched to large anchors and reduced large faces to small anchors, so that anchors of all sizes are trained and the expressive power of the network is improved.

First, a random scale transformation is applied to the sample face image, as follows:

S1. Count the faces smaller than 32px and the faces larger than 256px in a sample face image, as well as the total number of faces of all sizes in that image, denoted num_small, num_large, and num_total respectively;

S2. If num_small/num_total is greater than a threshold (for example 0.8), small faces dominate the sample face image, and the image is randomly either kept at its original size or enlarged;

S3. If the condition of S2 is not met, compute num_large/num_total; if it is greater than a threshold (for example 0.8), large faces dominate the image, and the sample face image is randomly either kept at its original size or reduced;

S4. If neither the condition of S2 nor that of S3 is met, the faces of different sizes in the sample face image are relatively evenly distributed, and the image is randomly kept at its original size, enlarged, or reduced.

The random scaling steps above yield the first sample face image. Square regions are then cropped at random from the first sample face image; square regions without a face are discarded until a cropped square region contains a face, and the corresponding anchor is then matched to the square region containing the face, which becomes the sample face image labeled with anchors. The sample face image labeled with anchors is then flipped horizontally at random to obtain the second sample face image, which is scaled to the network input size, for example 640px*640px. The scaling parameters include INTER_LINEAR, INTER_CUBIC, INTER_AREA, INTER_NEAREST, and INTER_LANCZOS4, and one of them is selected at random for each scaling operation. Finally, the mean is subtracted from the scaled sample face image to obtain the training sample set for the SSD-framework convolutional neural network.
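The bookkeeping behind the flip, rescale, and mean-subtraction steps can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function names, the `(x1, y1, x2, y2)` box layout, the flip probability of 0.5, and the representation of pixels as tuples are all assumptions, and the actual pixel resampling is left to whichever image library is used.

```python
import random

# Interpolation names listed in the description; the choice among them is random.
INTERPOLATIONS = ["INTER_LINEAR", "INTER_CUBIC", "INTER_AREA",
                  "INTER_NEAREST", "INTER_LANCZOS4"]

def hflip_boxes(boxes, image_width):
    """Mirror (x1, y1, x2, y2) boxes about the vertical centre line of the image."""
    return [(image_width - x2, y1, image_width - x1, y2) for x1, y1, x2, y2 in boxes]

def rescale_boxes(boxes, src_size, dst_size=640):
    """Scale box coordinates from a square crop of side src_size to dst_size."""
    f = dst_size / src_size
    return [tuple(v * f for v in box) for box in boxes]

def subtract_mean(pixels, mean):
    """Subtract a per-channel mean (supplied by the caller) from each pixel."""
    return [tuple(c - m for c, m in zip(px, mean)) for px in pixels]

def plan_augmentation(boxes, crop_size, rng=random, flip_prob=0.5, net_size=640):
    """Decide the flip and interpolation for one crop and update its face boxes."""
    flipped = rng.random() < flip_prob  # assumed probability; the source only says "random"
    if flipped:
        boxes = hflip_boxes(boxes, crop_size)
    boxes = rescale_boxes(boxes, crop_size, net_size)
    return {"flipped": flipped,
            "interpolation": rng.choice(INTERPOLATIONS),
            "boxes": boxes}
```

The returned interpolation name would then be handed to the resize call of the image library, and `subtract_mean` applied to the resized pixels.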

On the basis of the above embodiment, before the random scale transformation is applied to the sample face image to obtain the first sample face image, the method comprises:

preprocessing the sample face image, the preprocessing comprising contrast adjustment, brightness adjustment, addition of white noise, or hue adjustment.

In this embodiment, various picture operations are applied to the original image, including contrast adjustment, brightness adjustment, addition of white noise, and hue adjustment, and each preprocessing method is selected at random with probability 0.5. The main purpose of the whole preprocessing stage is to remove redundant information from the sample face image, filter out interference and noise, and restore the necessary true information, thereby making relevant information easier to detect, strengthening feature extraction, improving the reliability of image segmentation, matching, and recognition, and raising the quality of the data for subsequent processing as far as possible.
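The random selection of preprocessing operations, each with probability 0.5, might look like the sketch below. Everything beyond that probability is assumed for illustration: the image is modelled as a flat list of RGB tuples in [0, 1], the parameter ranges and the order of operations are hypothetical, and the helper names do not come from the source.

```python
import colorsys
import random

def _clip(c):
    return min(1.0, max(0.0, c))

def adjust_brightness(pixels, delta):
    return [tuple(_clip(c + delta) for c in px) for px in pixels]

def adjust_contrast(pixels, factor):
    # Stretch or compress values around the global mean intensity.
    mean = sum(sum(px) for px in pixels) / (3 * len(pixels))
    return [tuple(_clip(mean + factor * (c - mean)) for c in px) for px in pixels]

def add_white_noise(pixels, sigma, rng):
    return [tuple(_clip(c + rng.gauss(0.0, sigma)) for c in px) for px in pixels]

def shift_hue(pixels, shift):
    out = []
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        out.append(colorsys.hsv_to_rgb((h + shift) % 1.0, s, v))
    return out

def random_preprocess(pixels, rng=random, p=0.5):
    """Apply each operation independently with probability p (0.5 in the description)."""
    ops = [
        lambda px: adjust_contrast(px, rng.uniform(0.5, 1.5)),     # assumed range
        lambda px: adjust_brightness(px, rng.uniform(-0.2, 0.2)),  # assumed range
        lambda px: add_white_noise(px, 0.02, rng),                 # assumed sigma
        lambda px: shift_hue(px, rng.uniform(-0.05, 0.05)),        # assumed range
    ]
    for op in ops:
        if rng.random() < p:
            pixels = op(pixels)
    return pixels
```

Passing a seeded `random.Random` instance as `rng` makes the augmentation reproducible during debugging.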

On the basis of the above embodiment, applying the random scale transformation to the sample face image comprises:

setting the scale transformation ratio according to the distribution of face sizes in the sample face image, and using it to apply the random scale transformation to the sample face image, the scale transformation ratio being 0.5, 1, or 2.

In this embodiment, the scale transformation ratio is set to 0.5, 1, or 2, which corresponds respectively to halving the length and width of the sample face image, keeping them unchanged, and doubling them. The ratio is chosen according to the proportions of faces of different sizes in the sample face image. For example, when small faces make up a large share of the sample face image, the ratio is selected at random from 1.0 and 2.0; when large faces make up a large share, the ratio is selected at random from 0.5 and 1.0; and when the faces of different sizes in the sample face image are relatively evenly distributed, the ratio is selected at random from 0.5, 1.0, and 2.0.
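The selection rule described above, combined with the 32px and 256px size limits and the example threshold of 0.8 given in steps S1 to S4, can be sketched as a small function. The function name is hypothetical, a face's "size" is assumed to be a single pixel value such as its longest edge, and returning 1.0 for an image without faces is an added assumption.

```python
import random

def choose_scale_ratio(face_sizes, small_px=32, large_px=256, threshold=0.8, rng=random):
    """Pick a scale ratio from {0.5, 1.0, 2.0} based on the mix of face sizes.

    face_sizes: one size in pixels per face in the sample image (assumed to be
    the longest edge of the face box; the source does not define "size").
    """
    num_total = len(face_sizes)
    if num_total == 0:
        return 1.0  # assumption: images without faces are left unscaled
    num_small = sum(1 for s in face_sizes if s < small_px)
    num_large = sum(1 for s in face_sizes if s > large_px)
    if num_small / num_total > threshold:
        candidates = [1.0, 2.0]       # mostly small faces: keep or enlarge
    elif num_large / num_total > threshold:
        candidates = [0.5, 1.0]       # mostly large faces: keep or reduce
    else:
        candidates = [0.5, 1.0, 2.0]  # balanced: any of the three
    return rng.choice(candidates)
```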

By setting the scale transformation ratio in this way, the embodiment of the present invention ensures that every sample face image can be matched to the most suitable anchors and improves the training efficiency of the SSD-framework convolutional neural network.

On the basis of the above embodiment, cropping a square region at random from the first sample face image and, if the square region contains a face, matching and labeling an anchor of the corresponding size to obtain a sample face image labeled with anchors, comprises:

if the square region contains a face and the ratio of the face size to the size of the anchor to be matched is greater than or equal to a preset threshold, matching and labeling the anchor of the corresponding size;

if the square region contains a face and the ratio of the face size to the size of the anchor to be matched is less than the preset threshold, discarding the square region;

if the square region does not contain a face, discarding the square region.

In this embodiment, a square region is first cropped at random from the first sample face image. If the square region contains no face, it is discarded and cropping from the first sample face image continues until the cropped square region contains a face. The ratio between the width of the square region and the network input size (640 by default) is then computed and multiplied by the longest edge of each face in the square region, giving the corresponding face size S at the network input size. The anchor size closest to the face size S is then chosen for anchor labeling; for example, the anchor sizes are 16, 32, 64, 128, 256, and 512. When choosing, the face size S must be smaller than the anchor size, and the ratio of S to the closest anchor size must be greater than or equal to the preset threshold, which is set to 0.9 in this embodiment. If the ratio of the face size S to the closest anchor size is less than the preset threshold, the square region is discarded. In addition, if cropping square regions from the first sample face image has been attempted a certain number of times, for example 50 times, and none of the cropped square regions meets the requirements, the first sample face image itself is passed on to the subsequent processing. The embodiment is described for a square region containing one face; when the square region contains several faces, the above steps are applied to each face in turn, and if none of the faces in the square region meets the preset threshold, the square region is discarded. The specific steps are not repeated here.
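A minimal sketch of this matching rule is given below. It rests on two interpretations that the source does not state outright: the scale factor is taken as network input size divided by crop width (the factor needed to express a face at the network input size), and "closest anchor with S below it" is read as the smallest anchor that is at least S. The function names and the `propose_crop` callback are hypothetical.

```python
ANCHOR_SIZES = (16, 32, 64, 128, 256, 512)

def match_anchor(face_longest_edge, crop_width, net_size=640, threshold=0.9):
    """Return the matched anchor size, or None if this face matches no anchor."""
    # Assumption: face size S at network input = longest edge * net_size / crop_width.
    s = face_longest_edge * net_size / crop_width
    # Assumption: the candidate is the smallest anchor that is not smaller than S.
    larger = [a for a in ANCHOR_SIZES if a >= s]
    if not larger:
        return None
    anchor = min(larger)
    return anchor if s / anchor >= threshold else None

def crop_is_usable(face_edges, crop_width, **kwargs):
    """Keep a crop if at least one of its faces matches an anchor."""
    return any(match_anchor(e, crop_width, **kwargs) is not None for e in face_edges)

def sample_crop(propose_crop, max_tries=50):
    """Try up to max_tries random crops; return the first usable one, else None.

    propose_crop() is a hypothetical callback returning (crop_width, face_edges),
    where face_edges lists the longest edge of each face inside the crop.
    """
    for _ in range(max_tries):
        crop_width, face_edges = propose_crop()
        if face_edges and crop_is_usable(face_edges, crop_width):
            return crop_width, face_edges
    return None  # caller falls back to the whole first sample face image
```

With the default threshold of 0.9, a face is accepted only when it fills at least 90% of its anchor, so each anchor size effectively covers a narrow band such as 28.8px to 32px.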

By matching anchors that adapt to face size to the square regions cropped from the sample face image, the embodiment of the present invention ensures that the faces in a square region are matched to the most suitable anchors, so that the training data is fully exploited, the SSD-framework convolutional neural network is fully trained, and face detection performance is improved.

On the basis of the above embodiments, after a square region is cropped at random from the first sample face image and, if the square region contains a face, an anchor of the corresponding size is matched and labeled to obtain a sample face image labeled with anchors, the method comprises:

classifying the faces contained in the sample face image labeled with anchors into small faces, medium faces, and large faces;

calculating the face loss value of each size class separately, and calculating the final face loss value according to the formula:

Loss=L1+L2+L3+L4

where L1 is the loss value for small faces, L2 is the loss value for medium faces, L3 is the loss value for large faces, and L4 is the loss value for faces of all sizes;

optimizing on the sample face image labeled with anchors according to the final face loss value, for training the SSD-framework convolutional neural network.

In this embodiment, each feature extraction layer is responsible for detecting faces of one size; for example, Conv3_3 is responsible for detecting 16px faces. Because small faces become relatively small in the higher layers, which hinders detection, the SSD-framework convolutional neural network detects, from the lower layers to the higher layers in order, small faces (16px, 32px), medium faces (64px, 128px), and large faces (256px, 512px). However, because the feature maps of the lower layers are relatively large and small faces are relatively small, the number of positive samples matched by anchors is limited when the loss function is computed, while negative samples account for a very large share, so the ratio of positive to negative samples is severely imbalanced. To alleviate this problem, the OHEM algorithm is used in the SSD framework: only negative samples numbering three times the positive samples are selected for computing the loss function, and the loss value of the positive and negative samples extracted by all feature extraction layers is then computed. However, the face sizes in ordinary pictures fall within a certain range, and small and medium faces usually form the majority. As a result, the share of the loss value contributed by large faces is relatively small while the share contributed by small and medium faces is higher, so large faces cannot be trained well and are detected relatively poorly. To solve the imbalance between the shares of faces of different scales in the loss function, the embodiment of the present invention computes the loss values of small, medium, and large faces separately and then sums them, adds the loss value for faces of all sizes, and uses these four parts together as the final loss value for optimizing the network. Specifically, in the small-face class, the anchors matched to small faces are used as positive samples and the regions of anchors that match no face are used as negative samples; medium and large faces are handled in the same way.
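The 3:1 negative selection mentioned above can be illustrated with a short sketch. The source only states that negatives numbering three times the positives are kept; ranking negatives by their current loss and keeping the hardest ones is the usual hard-negative-mining convention and is assumed here, as are the function and parameter names.

```python
def select_hard_negatives(neg_losses, num_pos, neg_pos_ratio=3):
    """Return indices of the negatives kept for the loss, hardest (highest loss) first.

    neg_losses: classification loss of each negative anchor.
    num_pos: number of positive anchors in the same batch or size class.
    """
    keep = min(len(neg_losses), neg_pos_ratio * num_pos)
    order = sorted(range(len(neg_losses)), key=lambda i: neg_losses[i], reverse=True)
    return order[:keep]
```

For example, with one positive anchor and seven negatives, only the three negatives with the largest loss contribute to the loss function.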

The embodiment of the present invention computes the face loss values for faces of different sizes separately, sums them, adds the face loss value for faces of all sizes, and uses these four parts together as the final loss value for optimizing on the training samples. This avoids the problem that faces of different sizes cannot be optimized in a balanced way because their loss values have unbalanced shares in the loss function, and it improves the recall and accuracy of face detection.

On the basis of the above embodiments, further, calculating the face loss value of each size class separately specifically comprises:

calculating the face loss values for the different sizes in the sample image labeled with anchors, by the formula:

$$L_k(\{p_{k,i}\},\{t_{k,i}\}) = \frac{1}{N_{cls}}\sum_i L_{cls}(p_{k,i}, p_{k,i}^{*}) + \lambda\,\frac{1}{N_{reg}}\sum_i p_{k,i}^{*}\,L_{reg}(t_{k,i}, t_{k,i}^{*})$$

where $L_k(\{p_{k,i}\},\{t_{k,i}\})$ is the face loss value of the sample image labeled with anchors at the k-th size; $i$ is the index of an anchor; $p_{k,i}$ is the predicted probability that the i-th anchor is a face at the k-th size; $p_{k,i}^{*}$ is the annotated label, which is 1 for an anchor containing a face and 0 for an anchor without a face; $t_{k,i}$ is the vector of the four predicted position coordinates; $t_{k,i}^{*}$ is the vector of the four coordinates of the face anchor; $L_{cls}$ is the Softmax loss function; $L_{reg}$ is the Smooth L1 loss function; $N_{cls}$ is the number of anchors matched to faces and background; $N_{reg}$ is the number of anchors matched to faces; and $\lambda$ is the weight balancing the two loss functions, with a default value of 1;

After the face loss values of the sample face image at the different sizes are obtained, the final face loss value is calculated by the formula:

$$Loss = \sum_{k=1}^{4} \lambda_k L_k$$

where $\lambda_k$ is the weight corresponding to each of the four face loss values obtained after the faces are classified by scale, with a default value of 1. It should be noted that the classification function and regression function of the embodiment of the present invention are not limited to the Softmax loss function and the Smooth L1 loss function; other classification and regression functions may also be used. The embodiment of the present invention therefore does not specifically limit the loss function.
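To make the role of the symbols defined above concrete, the sketch below computes one per-size loss as a classification term averaged over N_cls plus λ times a Smooth L1 regression term averaged over N_reg, and then forms the weighted sum of the four parts. This reading of the loss is an assumption based on the symbol definitions in the description. The dictionary layout, function names, and the use of binary cross-entropy on the predicted face probability (equivalent to a two-class Softmax loss) are illustrative choices, and every `p` is assumed to lie strictly between 0 and 1.

```python
import math

def smooth_l1(x):
    """Smooth L1: quadratic near zero, linear elsewhere."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5

def scale_loss(anchors, lam=1.0):
    """Loss for one size class.

    anchors: list of dicts with keys
      p       predicted face probability of the anchor (0 < p < 1),
      label   1 if the anchor is matched to a face, 0 for background,
      t       four predicted coordinates,
      t_star  four target coordinates (used only when label == 1).
    """
    n_cls = len(anchors)
    n_reg = sum(a["label"] for a in anchors)
    cls = sum(-math.log(a["p"] if a["label"] == 1 else 1.0 - a["p"]) for a in anchors)
    reg = sum(
        sum(smooth_l1(t - ts) for t, ts in zip(a["t"], a["t_star"]))
        for a in anchors if a["label"] == 1
    )
    loss = cls / n_cls if n_cls else 0.0
    if n_reg:
        loss += lam * reg / n_reg
    return loss

def total_loss(groups, weights=None):
    """groups: anchor lists for small, medium, large, and all-size faces."""
    weights = weights or [1.0] * len(groups)
    return sum(w * scale_loss(g) for w, g in zip(weights, groups))
```

With all weights left at their default of 1, `total_loss` reduces to the plain sum L1+L2+L3+L4.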

The embodiment of the present invention computes the loss values for faces of different scales separately and then weights the resulting loss values. This avoids the problem that faces of different scales cannot be optimized in a balanced way because their loss values have unbalanced shares in the loss function, and it improves the detection performance of the SSD convolutional neural network.

Fig. 3 is a structural diagram of the face detection system based on the SSD framework provided by an embodiment of the present invention. As shown in Fig. 3, an embodiment of the present invention provides a face detection system based on the SSD framework, comprising an anchor labeling module 301 and a face detection module 302. The anchor labeling module 301 is configured to label anchors on a target face image. The face detection module 302 is configured to detect the anchor-labeled target face image with a trained SSD-framework convolutional neural network to obtain a target face box, wherein the SSD-framework convolutional neural network is trained on sample images labeled with anchors.

In this embodiment, to extract the facial features in the training data more effectively, the anchor labeling module 301 labels anchors on every sample face image that is input to the SSD-framework convolutional neural network. After training of the SSD-framework convolutional neural network is complete, the anchor labeling module 301 labels anchors on the target face image, and once labeling is complete the face detection module 302 detects the anchor-labeled target face image to obtain the target face box. During training, for each feature extraction layer, the SSD-framework convolutional neural network generates a certain number of anchors at every position of the sample face image to match the faces in the training data. Only faces matched by an anchor are treated as positive samples (faces calibrated with anchors), while regions delineated by anchors that match no face are treated as negative samples (background). Features are then extracted from the positive and negative sample regions separately and fed into the loss function for classification and regression.

The quality of anchor matching therefore directly affects classification and, in turn, the final detection performance. Ideally, if every anchor matches exactly a face region and contains no background noise, classification is optimal. The embodiment of the present invention selects a scale transformation ratio according to the sizes of the faces in the sample face image and scales the image accordingly. A square region is then cropped at random from the scaled sample face image, and the faces in the square region are matched to and labeled with anchors of the corresponding size, yielding a sample face image labeled with anchors. Finally, a series of operations such as random horizontal flipping, picture scaling, and mean subtraction produce the training sample set for the SSD-framework convolutional neural network.

By providing an anchor matching algorithm that adapts to the distribution of face sizes, the embodiment of the present invention matches each face in every face image to the most suitable anchor according to its size to generate face regions for training, which improves the accuracy and recall of face detection.

Fig. 4 is a structural diagram of the electronic device provided by an embodiment of the present invention. As shown in Fig. 4, the electronic device may comprise a processor 401, a communications interface 402, a memory 403, and a communication bus 404, wherein the processor 401, the communications interface 402, and the memory 403 communicate with one another through the communication bus 404. The processor 401 can call logic instructions in the memory 403 to perform the following method: labeling anchors on a target face image; and detecting the anchor-labeled target face image with a trained SSD-framework convolutional neural network to obtain a target face box, wherein the SSD-framework convolutional neural network is trained on sample face images labeled with anchors.

In addition, when the logical instructions in the memory 403 are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes a number of instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to execute all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

An embodiment of the present invention discloses a computer program product. The computer program product includes a computer program stored on a non-transitory computer-readable storage medium, and the computer program includes program instructions. When the program instructions are executed by a computer, the computer can execute the methods provided by the above method embodiments, for example: labeling a target face image with anchors; and, based on the trained SSD framework convolutional neural network, detecting the anchor-labeled target face image to obtain a target face box, wherein the SSD framework convolutional neural network is obtained by training on anchor-labeled sample face images.

An embodiment of the present invention provides a non-transitory computer-readable storage medium. The non-transitory computer-readable storage medium stores computer instructions, and the computer instructions cause a computer to execute the face detection method based on the SSD framework provided by the above embodiments, for example: labeling a target face image with anchors; and, based on the trained SSD framework convolutional neural network, detecting the anchor-labeled target face image to obtain a target face box, wherein the SSD framework convolutional neural network is obtained by training on anchor-labeled sample face images.

Finally, it should be noted that the above embodiments merely illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A face detection method based on the SSD framework, characterized by comprising:

Labeling a target face image with anchors;

Based on the trained SSD framework convolutional neural network, detecting the anchor-labeled target face image to obtain a target face box, wherein the SSD framework convolutional neural network is obtained by training on anchor-labeled sample face images.

2. The method according to claim 1, characterized in that the SSD framework convolutional neural network is obtained by training through the following steps:

Performing a random scale transformation on a sample face image to obtain a first sample face image;

Randomly taking a square region from the first sample face image and, if the square region contains a face, matching and labeling an anchor of corresponding size to obtain an anchor-labeled sample face image;

Performing a random horizontal flip on the anchor-labeled sample face image to obtain a second sample face image;

Scaling the second sample face image to the network input size and subtracting the mean to obtain a third sample face image, and training the SSD framework convolutional neural network with the third sample face image.

3. The method according to claim 2, characterized in that, before performing the random scale transformation on the sample face image to obtain the first sample face image, the method comprises:

Preprocessing the sample face image, the preprocessing including contrast adjustment, brightness adjustment, addition of white noise, or hue adjustment.

4. The method according to claim 2, characterized in that performing the random scale transformation on the sample face image comprises:

Setting a scale transformation ratio according to the distribution of face sizes in the sample face image, and performing the random scale transformation on the sample face image, the scale transformation ratio being 0.5, 1 or 2.

5. The method according to claim 2, characterized in that randomly taking a square region from the first sample face image and, if the square region contains a face, matching and labeling an anchor of corresponding size to obtain the anchor-labeled sample face image comprises:

If the square region contains a face and the ratio of the size of the face to the size of the anchor to be matched is greater than or equal to a preset threshold, matching and labeling the anchor of corresponding size;

If the square region contains a face and the ratio of the size of the face to the size of the anchor to be matched is less than the preset threshold, discarding the square region;

If the square region does not contain a face, discarding the square region.

6. The method according to claim 2, characterized in that, after randomly taking a square region from the first sample face image and, if the square region contains a face, matching and labeling an anchor of corresponding size to obtain the anchor-labeled sample face image, the method comprises:

Classifying the face images contained in the anchor-labeled sample face image as small-size, medium-size or large-size face images;

Calculating the final face loss value as

$\mathrm{Loss} = L_1 + L_2 + L_3 + L_4$

where $L_1$ is the loss value for small-size faces, $L_2$ is the loss value for medium-size faces, $L_3$ is the loss value for large-size faces, and $L_4$ is the loss value for faces of all sizes;

Optimizing the anchor-labeled sample face image according to the final face loss value, for use in training the SSD framework convolutional neural network.

7. The method according to claim 6, characterized in that calculating the face loss values separately after classifying the face images by size specifically comprises:

where $L_k(\{p_{k,i}\},\{t_{k,i}\})$ is the face loss value of the anchor-labeled sample image at the $k$-th size; $i$ is the index of an anchor; $p_{k,i}$ is the probability that the $i$-th anchor is predicted to be a face in the face loss value of the anchor-labeled sample image at the $k$-th size; $p^*_{k,i}$ is the annotated label, equal to 1 for an anchor containing a face and 0 for an anchor without a face; $t_{k,i}$ is the vector of four predicted position coordinates; $t^*_{k,i}$ is the vector of four coordinates of the face anchor; $L_{cls}$ is the Softmax loss function; $L_{reg}$ is the Smooth L1 loss function; $N_{cls}$ is the number of anchors matched to faces and background; $N_{reg}$ is the number of anchors matched to faces; and $\lambda$ is the weight balancing the two loss functions, with a default value of 1;

After the face loss values of the sample face image at the different sizes have been obtained, the final face loss value is calculated by the following formula:

$\mathrm{Loss} = \sum_{k=1}^{4} \lambda_k L_k$

where $\lambda_k$ is the weight corresponding to each of the four face loss values obtained after classifying the faces by scale, with a default value of 1.

8. A face detection system based on the SSD framework, characterized by comprising:

A face detection module, configured to detect, based on the trained SSD framework convolutional neural network, an anchor-labeled target face image to obtain a target face box, wherein the SSD framework convolutional neural network is obtained by training on anchor-labeled sample images.

9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 7.

10. A non-transitory computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
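The size-grouped loss of claims 6 and 7 can be illustrated with a short NumPy sketch. The per-size formula itself does not survive in this text, so the sketch assumes the common multi-task detection form $L_k = \frac{1}{N_{cls}} \sum_i L_{cls}(p_{k,i}, p^*_{k,i}) + \lambda \frac{1}{N_{reg}} \sum_i p^*_{k,i} L_{reg}(t_{k,i}, t^*_{k,i})$, with $L_{cls}$ the softmax loss and $L_{reg}$ the Smooth L1 loss; placing $\lambda$ on the regression term is an assumption, as are the symbol names. The function names and the `group` array, which assigns every anchor to a size class, are likewise hypothetical, since the text does not say how background anchors are grouped.

```python
import numpy as np

def softmax_loss(logits, labels):
    """Per-anchor softmax cross-entropy; logits (n, 2), integer labels (n,)."""
    z = logits - logits.max(axis=1, keepdims=True)
    log_prob = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(len(labels)), labels]

def smooth_l1(pred, target):
    """Per-anchor Smooth L1 loss summed over the four box coordinates."""
    d = np.abs(pred - target)
    return np.where(d < 1.0, 0.5 * d ** 2, d - 0.5).sum(axis=1)

def group_loss(logits, labels, pred, target, lam=1.0):
    """Assumed form of L_k: normalised classification term plus lam times
    the normalised regression term over face anchors only."""
    n_cls = max(len(labels), 1)        # anchors matched to face or background
    n_reg = max(int(labels.sum()), 1)  # anchors matched to a face
    cls = softmax_loss(logits, labels).sum() / n_cls
    reg = (labels * smooth_l1(pred, target)).sum() / n_reg
    return cls + lam * reg

def final_loss(logits, labels, pred, target, group,
               weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the small, medium, large and all-size losses.

    group holds 0 (small), 1 (medium) or 2 (large) for each anchor;
    the fourth term uses every anchor.
    """
    masks = [group == g for g in range(3)]
    masks.append(np.ones(len(labels), dtype=bool))
    return sum(w * group_loss(logits[m], labels[m], pred[m], target[m])
               for w, m in zip(weights, masks))
```

With all weights left at their default of 1, `final_loss` reduces to the plain sum $L_1 + L_2 + L_3 + L_4$ stated in claim 6.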