CN108921017A - Face detection method and system - Google Patents

Face detection method and system

Info

Publication number
CN108921017A
CN108921017A (application CN201810506447.4A)
Authority
CN
China
Prior art keywords
class feature
feature information
feature
feature map
layer
Prior art date
Legal status
Granted
Application number
CN201810506447.4A
Other languages
Chinese (zh)
Other versions
CN108921017B (en)
Inventor
王鲁许
董远
白洪亮
熊风烨
Current Assignee
SUZHOU FEISOU TECHNOLOGY Co.,Ltd.
Original Assignee
Beijing Faceall Co
Priority date
Filing date
Publication date
Application filed by Beijing Faceall Co
Priority to CN201810506447.4A
Publication of CN108921017A
Application granted
Publication of CN108921017B
Active legal status
Anticipated expiration


Classifications

    • G06V40/161 Human faces: Detection; Localisation; Normalisation
    • G06N3/045 Neural networks: Combinations of networks
    • G06N3/08 Neural networks: Learning methods
    • G06V40/168 Human faces: Feature extraction; Face representation


Abstract

The application provides a face detection method and system. The face detection method includes: processing at least one input sample picture through a network structure of at least two layers, to obtain at least two feature maps corresponding to at least two convolutional layers of the network structure; extracting the feature information of a first-type feature map and the feature information of a second-type feature map; obtaining the detected position coordinates of a face frame from the position coordinates in the sample picture corresponding to the feature information of the first-type and second-type feature maps; updating the weight value of each convolutional layer according to the degree of match between the feature map corresponding to the last convolutional layer and a target image; and generating a face detection model according to the weight values of the convolutional layers. Because a first-type feature extraction layer is located earlier in the network, its feature plane is relatively large and can be used to detect small faces; extracting the feature information of the first-type feature map therefore improves the capability of detecting small faces.

Description

Face detection method and system
Technical field
This application relates to the technical field of image detection, and in particular to a face detection method and system.
Background art
Face detection is the process of locating face regions in an image. In practical applications it is mainly used in face recognition systems, where recognition is then performed on the detected face regions. SSD (Single Shot MultiBox Detector), a fast object detection framework, is widely used for face detection; owing to the scale characteristics of its feature extraction layers, an SSD network achieves high accuracy when detecting large objects.
SSD is a single-stage detection convolutional neural network framework, broadly divided into two parts: a base network (such as VGG) at the front end, and feature extraction layers added on top of the base network. When detecting a picture, the SSD network extracts feature vectors from the feature planes of its convolutional layers and detects faces from those vectors. Because the SSD network has many layers and is relatively deep, the feature planes shrink layer by layer while the default boxes corresponding to them grow. The SSD network is therefore highly accurate on large objects, but small objects retain relatively little feature information after many layers of convolution, so its performance in detecting small faces is poor.
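The scale behaviour just described, where later and smaller feature planes pair with larger default boxes, can be made concrete with the default-box scale formula from the public SSD paper (a sketch; the s_min and s_max values are the paper's defaults, not values stated in this application):

```python
def ssd_scales(num_maps=6, s_min=0.2, s_max=0.9):
    """Default-box scale for each of num_maps feature-extraction layers.

    Linear interpolation between s_min and s_max (public SSD defaults);
    earlier layers get the smallest boxes, later layers the largest.
    """
    step = (s_max - s_min) / (num_maps - 1)
    return [round(s_min + step * k, 4) for k in range(num_maps)]
```

With the defaults this yields 0.2 for the first feature plane and 0.9 for the last, which is why only the early, large feature planes can represent small faces well.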
Summary of the invention
In view of this, the embodiments of the present application provide a face detection method and system to solve the problem of the SSD network's poor performance in detecting small faces.
The embodiment of the present application adopts the following technical solutions:
An embodiment of the present application provides a face detection method, including:
performing convolution processing on at least one input sample picture through a network structure of at least two layers, to obtain at least two feature maps corresponding to at least two convolutional layers of the network structure, where the at least two convolutional layers include at least one first-type feature extraction layer and at least one second-type feature extraction layer, the feature map corresponding to the first-type feature extraction layer is a first-type feature map, the feature map corresponding to the second-type feature extraction layer is a second-type feature map, and the first-type feature extraction layer is located before the second-type feature extraction layer;
extracting the feature information of the first-type feature map;
extracting the feature information of the second-type feature map;
obtaining the detected position coordinates of a face frame according to the position coordinates in the sample picture corresponding to the feature information of the first-type feature map and the second-type feature map;
updating the weight value of each convolutional layer according to the degree of match between the feature map corresponding to the last convolutional layer and a target image, where the last convolutional layer is the convolutional layer at the final output layer of the network structure of at least two layers;
generating a face detection model according to the weight values of the convolutional layers.
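As an illustration of the coordinate step above, the following minimal sketch decodes a default box plus regressed relative coordinates (x, y, w, h) into an absolute face-frame position, mirroring public SSD implementations; the variance constants are an assumption, since the application does not specify them:

```python
import math

def decode_box(prior, offsets, variances=(0.1, 0.2)):
    """Decode regressed relative coordinates (x, y, w, h) against a
    default (prior) box into an absolute center-size face frame.

    Mirrors the decoding in public SSD implementations; the variance
    constants are an assumption, the patent does not specify them.
    """
    pcx, pcy, pw, ph = prior   # default box: center x/y, width, height in [0, 1]
    ox, oy, ow, oh = offsets   # the network's regressed relative coordinates
    cx = pcx + ox * variances[0] * pw
    cy = pcy + oy * variances[0] * ph
    w = pw * math.exp(ow * variances[1])
    h = ph * math.exp(oh * variances[1])
    return (cx, cy, w, h)
```

A zero offset returns the default box unchanged, which makes the decoding easy to sanity-check.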
An embodiment of the present application further provides a face detection method, including:
performing convolution processing on an input picture to be detected through a network structure of at least two layers, based on a face detection model, to obtain at least two feature maps corresponding to at least two convolutional layers of the network structure, where the at least two convolutional layers include at least one first-type feature extraction layer and at least one second-type feature extraction layer, the feature map corresponding to the first-type feature extraction layer is a first-type feature map, the feature map corresponding to the second-type feature extraction layer is a second-type feature map, and the first-type feature extraction layer is located before the second-type feature extraction layer;
extracting the feature information of the first-type feature map;
extracting the feature information of the second-type feature map;
obtaining the detected position coordinates of a face frame according to the position coordinates in the picture to be detected corresponding to the feature information of the first-type feature map and the second-type feature map.
An embodiment of the present application further provides a face detection system, including:
a processing unit, configured to perform convolution processing on at least one input sample picture through a network structure of at least two layers, to obtain at least two feature maps corresponding to at least two convolutional layers of the network structure, where the at least two convolutional layers include at least one first-type feature extraction layer and at least one second-type feature extraction layer, the feature map corresponding to the first-type feature extraction layer is a first-type feature map, the feature map corresponding to the second-type feature extraction layer is a second-type feature map, and the first-type feature extraction layer is located before the second-type feature extraction layer;
a first extraction unit, configured to extract the feature information of the first-type feature map;
a second extraction unit, configured to extract the feature information of the second-type feature map;
a detection unit, configured to obtain the detected position coordinates of a face frame according to the position coordinates in the sample picture corresponding to the feature information of the first-type feature map and the second-type feature map;
an updating unit, configured to update the weight value of each convolutional layer according to the degree of match between the feature map corresponding to the last convolutional layer and a target image, where the last convolutional layer is the convolutional layer at the final output layer of the network structure of at least two layers;
a generation unit, configured to generate a face detection model according to the weight values of the convolutional layers.
An embodiment of the present application further provides a face detection system, including:
a processing unit, configured to perform convolution processing on an input picture to be detected through a network structure of at least two layers, based on a face detection model, to obtain at least two feature maps corresponding to at least two convolutional layers of the network structure, where the at least two convolutional layers include at least one first-type feature extraction layer and at least one second-type feature extraction layer, the feature map corresponding to the first-type feature extraction layer is a first-type feature map, the feature map corresponding to the second-type feature extraction layer is a second-type feature map, and the first-type feature extraction layer is located before the second-type feature extraction layer;
a first extraction unit, configured to extract the feature information of the first-type feature map;
a second extraction unit, configured to extract the feature information of the second-type feature map;
a detection unit, configured to obtain the detected position coordinates of a face frame according to the position coordinates in the picture to be detected corresponding to the feature information of the first-type feature map and the second-type feature map.
An embodiment of the present application further provides an electronic system, including at least one processor and a memory, where the memory stores a program configured to be executed by the at least one processor to perform the following steps:
performing convolution processing on at least one input sample picture through a network structure of at least two layers, to obtain at least two feature maps corresponding to at least two convolutional layers of the network structure, where the at least two convolutional layers include at least one first-type feature extraction layer and at least one second-type feature extraction layer, the feature map corresponding to the first-type feature extraction layer is a first-type feature map, the feature map corresponding to the second-type feature extraction layer is a second-type feature map, and the first-type feature extraction layer is located before the second-type feature extraction layer;
extracting the feature information of the first-type feature map;
extracting the feature information of the second-type feature map;
obtaining the detected position coordinates of a face frame according to the position coordinates in the sample picture corresponding to the feature information of the first-type feature map and the second-type feature map;
updating the weight value of each convolutional layer according to the degree of match between the feature map corresponding to the last convolutional layer and a target image, where the last convolutional layer is the convolutional layer at the final output layer of the network structure of at least two layers;
generating a face detection model according to the weight values of the convolutional layers.
An embodiment of the present application further provides a computer-readable storage medium, including a program used in combination with an electronic system, the program executable by a processor to perform the following steps:
performing convolution processing on at least one input sample picture through a network structure of at least two layers, to obtain at least two feature maps corresponding to at least two convolutional layers of the network structure, where the at least two convolutional layers include at least one first-type feature extraction layer and at least one second-type feature extraction layer, the feature map corresponding to the first-type feature extraction layer is a first-type feature map, the feature map corresponding to the second-type feature extraction layer is a second-type feature map, and the first-type feature extraction layer is located before the second-type feature extraction layer;
extracting the feature information of the first-type feature map;
extracting the feature information of the second-type feature map;
obtaining the detected position coordinates of a face frame according to the position coordinates in the sample picture corresponding to the feature information of the first-type feature map and the second-type feature map;
updating the weight value of each convolutional layer according to the degree of match between the feature map corresponding to the last convolutional layer and a target image, where the last convolutional layer is the convolutional layer at the final output layer of the network structure of at least two layers;
generating a face detection model according to the weight values of the convolutional layers.
An embodiment of the present application further provides an electronic system, including at least one processor and a memory, where the memory stores a program configured to be executed by the at least one processor to perform the following steps:
performing convolution processing on an input picture to be detected through a network structure of at least two layers, based on a face detection model, to obtain at least two feature maps corresponding to at least two convolutional layers of the network structure, where the at least two convolutional layers include at least one first-type feature extraction layer and at least one second-type feature extraction layer, the feature map corresponding to the first-type feature extraction layer is a first-type feature map, the feature map corresponding to the second-type feature extraction layer is a second-type feature map, and the first-type feature extraction layer is located before the second-type feature extraction layer;
extracting the feature information of the first-type feature map;
extracting the feature information of the second-type feature map;
obtaining the detected position coordinates of a face frame according to the position coordinates in the picture to be detected corresponding to the feature information of the first-type feature map and the second-type feature map.
An embodiment of the present application further provides a computer-readable storage medium, including a program used in combination with an electronic system, the program executable by a processor to perform the following steps:
performing convolution processing on an input picture to be detected through a network structure of at least two layers, based on a face detection model, to obtain at least two feature maps corresponding to at least two convolutional layers of the network structure, where the at least two convolutional layers include at least one first-type feature extraction layer and at least one second-type feature extraction layer, the feature map corresponding to the first-type feature extraction layer is a first-type feature map, the feature map corresponding to the second-type feature extraction layer is a second-type feature map, and the first-type feature extraction layer is located before the second-type feature extraction layer;
extracting the feature information of the first-type feature map;
extracting the feature information of the second-type feature map;
obtaining the detected position coordinates of a face frame according to the position coordinates in the picture to be detected corresponding to the feature information of the first-type feature map and the second-type feature map.
At least one of the above technical solutions adopted by the embodiments of the present application can achieve the following beneficial effects:
at least one input sample picture is processed through a network structure of at least two layers to obtain at least two feature maps corresponding to at least two convolutional layers of the network structure, where the at least two convolutional layers include at least one first-type feature extraction layer and at least one second-type feature extraction layer, the feature map corresponding to the first-type feature extraction layer is a first-type feature map, the feature map corresponding to the second-type feature extraction layer is a second-type feature map, and the first-type feature extraction layer is located before the second-type feature extraction layer; the feature information of the first-type feature map and of the second-type feature map is extracted; the detected position coordinates of a face frame are obtained from the position coordinates in the sample picture corresponding to that feature information; the weight value of each convolutional layer is updated according to the degree of match between the feature map corresponding to the last convolutional layer and a target image, where the last convolutional layer is the convolutional layer at the final output layer of the network structure of at least two layers; and a face detection model is generated according to the weight values of the convolutional layers. Because a first-type feature extraction layer is located earlier in the network, its feature plane is relatively large and can be used to detect small faces; extracting the feature information of the first-type feature map increases the depth of the features used to detect small faces, and thereby achieves the purpose of improving small-face detection capability.
Brief description of the drawings
The drawings described herein are provided for a further understanding of the present application and constitute a part of it; the illustrative embodiments of the present application and their descriptions are used to explain the application and do not constitute an undue limitation on it. In the drawings:
Fig. 1 is the structure of an existing SSD network;
Fig. 2 is a schematic flowchart of the face detection method of the present invention;
Fig. 3 is a schematic diagram of an SSD network structure implemented using the face detection method of the present invention;
Fig. 4 is a schematic diagram of the principle of merging convolutional layers in the face detection method of the present invention;
Fig. 5 is a schematic flowchart of the face detection method of the present invention;
Fig. 6 is a schematic flowchart of an embodiment of the face detection method of the present invention;
Fig. 7 is a schematic structural diagram of the face detection system of the present invention;
Fig. 8 is a schematic structural diagram of the face detection system of the present invention.
Detailed description of the embodiments
As shown in Fig. 1, an existing SSD network structure mainly consists of two parts: a base network taken from the first 5 convolutional stages of VGG-16 (VGG-16 in the dashed box), and the feature extraction layers newly added by SSD (Extra Feature Layers, i.e. the convolutional layers added on top of the base network), which are used to extract high-level feature information. The convolutional layers Conv4_3, Conv7, Conv8_2, Conv9_2, Conv10_2 and Conv11_2 are the main feature extraction layers. The output of each of these layers is convolved with two 3×3 kernels to obtain feature values: one convolution outputs classification probabilities, with 2 probability values per default box (a box on the feature plane); the other outputs regressed relative position coordinates, with 4 relative coordinate values (x, y, w, h) per default box. These 6 convolutional layers also generate the original coordinates of the default boxes through the prior box layer, and the number of default boxes for each of these 6 layers is given. Finally, the three results above are merged in the loss layer, the loss value is computed, and back-propagation adjusts the learning parameters. When a trained model detects a picture, the feature planes shrink layer by layer while the detection regions corresponding to them grow, so the SSD network is accurate on large objects; for target objects of 50×50 pixels or fewer (for example, small faces), less and less effective information remains after many layers of convolution, so detection performance is poor.
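The prior box layer mentioned above, which generates the original coordinates of the default boxes, can be sketched as follows; this is a simplified, hypothetical version (real SSD additionally inserts an extra box for aspect ratio 1 and clips boxes to the image):

```python
def prior_boxes(fmap_size, scale, aspect_ratios=(1.0, 2.0)):
    """Original coordinates of the default boxes for one feature plane.

    A simplified sketch of a prior-box layer: one box per aspect ratio,
    centered on each cell of the feature plane. Real SSD adds an extra
    box for ratio 1 and clips coordinates to the image; both omitted.
    """
    boxes = []
    for i in range(fmap_size):          # rows of the feature plane
        for j in range(fmap_size):      # columns
            cx = (j + 0.5) / fmap_size  # box center, relative to the image
            cy = (i + 0.5) / fmap_size
            for ar in aspect_ratios:
                boxes.append((cx, cy, scale * ar ** 0.5, scale / ar ** 0.5))
    return boxes
```

The box count per layer is fixed by the feature-plane size times the number of aspect ratios, matching the fixed default-box counts the text describes.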
To address this problem and realize the purpose of the application, the embodiments of the present application propose an SSD-based face detection method and system capable of detecting small faces (picture regions of 50×50 pixels or fewer). The feature information of the first-type feature map of a first-type feature extraction layer (i.e. a convolutional layer in the base network) is extracted, prediction and classification processing is performed on that feature information, and the face feature vectors in the sample picture are determined, thereby improving small-face detection capability. Applying normalization and convolution to the feature plane of the first-type feature map increases the depth of the features extracted from small target images, making full use of the advantage that the relatively large feature planes of the first-type feature extraction layers are suited to detecting small faces, and thus improving the accuracy of small-face detection.
To make the purposes, technical solutions and advantages of the application clearer, the technical solutions of the application are described clearly and completely below with reference to specific embodiments and the corresponding drawings. Obviously, the described embodiments are only some of the embodiments of the application, not all of them. Based on the embodiments of the application, all other embodiments obtained by a person of ordinary skill in the art without creative work shall fall within the protection scope of the application.
The technical solutions provided by the embodiments of the present application are described in detail below with reference to the drawings.
Fig. 2 is a schematic flowchart of the face detection method provided by an embodiment of the present application. The method may be as follows. The subject of execution in this embodiment may be a face detection system, or the detection component in a face recognition system.
Step S101: perform convolution processing on at least one input sample picture through a network structure of at least two layers, to obtain at least two feature maps corresponding to at least two convolutional layers of the network structure, where the at least two convolutional layers include at least one first-type feature extraction layer and at least one second-type feature extraction layer, the feature map corresponding to the first-type feature extraction layer is a first-type feature map, the feature map corresponding to the second-type feature extraction layer is a second-type feature map, and the first-type feature extraction layer is located before the second-type feature extraction layer.
In this embodiment of the application, when the network structure of at least two layers processes a sample picture, first-type feature maps from the base network layers and second-type feature maps from the feature extraction layers are obtained, where a first-type feature extraction layer may be a convolutional layer of the base network. The first-type feature maps obtained when processing a sample picture may come not only from the convolutional layers of the base network but also from the convolutional layers added on top of the base network (i.e. the feature extraction layers). The feature plane of a convolutional layer shrinks as the layer number increases, and the detection area corresponding to the picture feature information obtained grows as the layers deepen.
Further, before a sample picture is processed, it may be resized to a preset size, for example 300×300 pixels, and then processed, so that the size of the sample picture meets the requirements of the SSD network structure.
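That resizing step can be illustrated with a minimal nearest-neighbour resize over a plain 2-D grid; a real pipeline would use a library routine such as OpenCV's cv2.resize, and the interpolation choice here is only an assumption:

```python
def resize_nearest(img, out_h=300, out_w=300):
    """Nearest-neighbor resize of a 2-D pixel grid to a preset size.

    A minimal stand-in for a library resize (e.g. OpenCV); it only
    illustrates bringing a sample picture to the 300x300 input size
    an SSD-style network expects.
    """
    in_h, in_w = len(img), len(img[0])
    return [[img[r * in_h // out_h][c * in_w // out_w]
             for c in range(out_w)]
            for r in range(out_h)]
```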
Step S102: extract the feature information of the first-type feature map.
In this embodiment of the application, extracting the feature information of the first-type feature extraction layer includes:
compressing the first-type feature map;
performing convolution processing on the compressed first-type feature map, to obtain the feature information of the first-type feature map.
Specifically, take the case where the first-type feature extraction layer is the Conv3_3 layer: a separate feature extraction layer is added for Conv3_3. Because Conv3_3 is relatively early in the network, its feature plane is relatively large and contains some relatively low-level, simple features, and the original SSD network does not perform feature extraction on this layer separately. Since base-network features more easily detect small faces, the face detection method proposed by the application can add a separate feature extraction layer at Conv3_3 to improve the ability to detect small faces. The feature extraction structure of this layer is shown in Fig. 3: the Conv3_3 output is processed by a Norm (normalization) layer and a 3×3 convolutional layer, then passed through SSD's prior box, class probability and relative position layers to obtain the feature vectors, which are finally input to the loss layer to compute the loss value.
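The Norm layer in this branch is, in the public SSD code, a per-position L2 normalization with a learned channel scale initialized to 20; the application only names the layer, so those constants are assumptions in the sketch below:

```python
import math

def l2_normalize(channel_vector, scale=20.0):
    """L2-normalize one spatial position's channel vector, then rescale.

    This is what SSD's Norm layer does before a detection head; the
    initial scale of 20 comes from the public SSD code, the patent
    itself only names the layer.
    """
    norm = math.sqrt(sum(v * v for v in channel_vector)) or 1.0
    return [scale * v / norm for v in channel_vector]
```

The rescaling matters because early layers like Conv3_3 have much larger activation magnitudes than deeper layers, and normalization keeps the added detection head trainable.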
Normalizing and convolving the first-type feature map increases the depth of the extracted image features, making full use of the advantage that the relatively large feature plane of a first-type feature extraction layer is suited to detecting small faces, and thereby improving the accuracy of small-face detection.
Further, the original Softmax classification loss may be replaced with Focal Loss; experiments show that Focal Loss improves learning on hard samples and thereby improves network performance.
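A minimal sketch of the Focal Loss idea: the modulating factor (1−p)^γ shrinks the loss for easy, well-classified samples much more than for hard ones. The values γ=2 and α=0.25 are the commonly cited defaults, used here as assumed parameters.

```python
import numpy as np

def softmax_loss(p_true):
    # standard cross-entropy on the true-class probability
    return -np.log(p_true)

def focal_loss(p_true, gamma=2.0, alpha=0.25):
    # Focal Loss down-weights easy examples so training focuses on hard ones
    return -alpha * (1.0 - p_true) ** gamma * np.log(p_true)

easy, hard = 0.95, 0.2
# relative down-weighting: easy examples lose far more weight than hard ones
print(focal_loss(easy) / softmax_loss(easy))  # ≈ 0.000625
print(focal_loss(hard) / softmax_loss(hard))  # ≈ 0.16
```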
It should be noted that the first-type feature extraction layer may be any layer in the base network layers and is not limited to the Conv3_3 layer.
In an embodiment of the present application, while the feature information of the first-type feature extraction layer is extracted, the feature vectors of the convolutional layers after the first-type feature extraction layer may also be extracted.
Specifically, the feature vectors of the convolutional layers in the base network layers may be extracted in the same way as the feature information of the first-type feature extraction layer, and the feature vector of a feature extraction layer may be obtained by convolving the output of the convolutional layer separately with two 3×3 convolution kernels to obtain the feature values.
Step S103: extract the feature information of the second-type feature map.
In an embodiment of the present application, extracting the feature information of the second-type feature map includes:
merging the second-type feature maps corresponding to at least two second-type feature extraction layers;
extracting the corresponding feature information from the merged second-type feature map.
In this embodiment, the second-type feature extraction layer may be the convolutional layer Conv4_3 in the base network layers, and the feature information of the corresponding second-type feature map may likewise be obtained by convolving the output of the convolutional layer separately with two 3×3 convolution kernels to obtain the feature values.
Further, merging the second-type feature maps corresponding to at least two second-type feature extraction layers includes:
down-sampling the one of any two second-type feature maps whose feature plane is relatively larger, and merging the down-sampled second-type feature map with the other of the two second-type feature maps; or
deconvolving the one of any two second-type feature maps whose feature plane is relatively smaller, and merging the deconvolved second-type feature map with the other of the two second-type feature maps.
The merging of the second-type feature maps corresponding to two second-type feature extraction layers is further illustrated as follows: the output of each second-type feature extraction layer is convolved separately with two 3×3 convolution kernels to obtain feature values and thus feature information. Feature information may also be obtained by merging and then extracting, for example: the down-sampling (pooling) layer of one second-type feature map is merged with another second-type feature map to extract feature vectors, or the deconvolution layer of one second-type feature map is merged with another second-type feature map to extract feature information. It should be noted that a second-type feature map may also be merged with the first-type feature map to obtain the feature information of a feature extraction layer. Extracting feature information by merging improves the overall accuracy and recall of image detection.
Here, down-sampling means taking one sample out of every few samples of a sequence; the new sequence obtained in this way is a down-sampled version of the original. Deconvolution, as its name suggests, is the inverse of the convolution operation. Convolution takes an image as input and outputs its features; its theoretical basis is the translation invariance among statistical invariances, and it serves to reduce dimensionality. Deconvolution takes image features as input and outputs an image, serving to restore resolution.
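The two resizing operations defined above can be illustrated with toy NumPy routines: 2×2 max-pooling stands in for the down-sampling layer, and nearest-neighbour upsampling stands in for a learned deconvolution. Both factor-of-two choices are assumptions for illustration only.

```python
import numpy as np

def downsample2x(x):
    # 2x2 max-pooling: keeps one value per 2x2 block, halving the feature plane
    c, h, w = x.shape
    return x.reshape(c, h // 2, 2, w // 2, 2).max(axis=(2, 4))

def upsample2x(x):
    # nearest-neighbour upsampling, a simple stand-in for a learned deconvolution
    return x.repeat(2, axis=1).repeat(2, axis=2)

feat = np.arange(16, dtype=float).reshape(1, 4, 4)
print(downsample2x(feat).shape)  # (1, 2, 2)
print(upsample2x(feat).shape)    # (1, 8, 8)
```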
In practical applications, when there are more than two second-type feature maps corresponding to second-type feature extraction layers, feature vectors may still be extracted by merging them pairwise as described above.
In an embodiment of the present application, merging the second-type feature maps corresponding to at least three second-type feature extraction layers includes:
after the down-sampled second-type feature map is merged with the other of the two second-type feature maps, further merging the merged second-type feature map with yet another second-type feature map that has been deconvolved, the feature plane of this further second-type feature map being smaller than the feature planes of the two second-type feature maps;
where, in merging the down-sampled second-type feature map with the other of the two second-type feature maps, the one of any three second-type feature maps whose feature plane is relatively larger is the one down-sampled.
When at least three second-type feature maps are merged, the corresponding feature information may be extracted from the merge of three second-type feature maps. This is described in detail for the case of seven feature extraction layers: the down-sampled layer of one second-type feature map, the deconvolved layer of another second-type feature map, and a third second-type feature map are merged to extract feature vectors. As shown in Figs. 3-4, Conv3_3 is the first-type feature extraction layer, and the Conv4_3, Conv7, Conv8_2, Conv9_2, Conv10_2 and Conv11_2 layers are second-type feature extraction layers. Starting from Conv4_3, the Conv4_3, Conv7, Conv8_2, Conv9_2 and Conv10_2 layers are down-sampled, while deconvolution is applied layer by layer in the opposite direction starting from Conv11_2. Each middle layer is merged with the down-sampled output of the preceding layer and the deconvolved output of the following layer, forming a sandwich-like structure; four feature extraction layers are obtained in this way. These four feature extraction layers carry both the relatively simple low-level features and the relatively complex high-level features, so their expressive power is much stronger than in the previous network structure. The Conv4_3 layer and the last layer, Conv11_2, have their feature vectors extracted individually. In addition to the first-type feature extraction layer, there are thus six feature extraction layers in total.
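The sandwich-style fusion of three adjacent layers can be sketched as follows. The channel counts and plane sizes are toy assumptions, with max-pooling and nearest-neighbour upsampling standing in for the down-sampling and deconvolution branches.

```python
import numpy as np

def sandwich_merge(prev_large, middle, next_small):
    # pool the earlier (larger) map, upsample the later (smaller) map,
    # and stack all three at the middle map's resolution
    c, h, w = middle.shape
    pooled = prev_large.reshape(prev_large.shape[0], h, 2, w, 2).max(axis=(2, 4))
    upsampled = next_small.repeat(2, axis=1).repeat(2, axis=2)
    return np.concatenate([pooled, middle, upsampled], axis=0)

conv_prev = np.random.rand(4, 20, 20)   # toy stand-in for the preceding layer
conv_mid = np.random.rand(4, 10, 10)    # middle layer
conv_next = np.random.rand(4, 5, 5)     # following layer
merged = sandwich_merge(conv_prev, conv_mid, conv_next)
print(merged.shape)  # (12, 10, 10)
```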
It should be noted that in practical applications, the feature information of the first-type feature extraction layer itself, the feature information of the last second-type feature map itself, and the feature information of the merged groups of three second-type feature maps must all be extracted, so as to obtain the feature information of the three second-type feature maps. When there are more than three second-type feature maps, feature vectors may still be extracted by merging them three at a time as described above. The three merged second-type feature maps may be three adjacent second-type feature maps, or any three second-type feature maps.
In practical applications, applying normalization and convolution to the feature plane of the first-type feature extraction layer increases the depth of the features extracted from the image, thereby making full use of the relatively large feature plane of the first-type feature extraction layer, which is well suited to detecting small faces, and in turn improving the accuracy of small-face detection.
In an embodiment of the present application, merging the second-type feature maps corresponding to at least three second-type feature extraction layers includes:
after the deconvolved second-type feature map is merged with the other of the two second-type feature maps, further merging the merged second-type feature map with yet another second-type feature map that has been down-sampled, the feature plane of this further second-type feature map being larger than the feature planes of the two second-type feature maps;
where, in merging the deconvolved second-type feature map with the other of the two second-type feature maps, the one of any three second-type feature maps whose feature plane is relatively smaller is the one deconvolved.
It should be noted that in practical applications, the feature vector of the first-type feature extraction layer itself, the feature vector of the last second-type feature map itself, and the feature vectors of the merged groups of three second-type feature maps must all be extracted, so as to obtain the feature vectors of the three second-type feature maps. When there are more than three second-type feature maps, feature vectors may still be extracted by merging them three at a time as described above. The three merged second-type feature maps may be three adjacent second-type feature maps, or any three second-type feature maps.
In an embodiment of the present application, the any three second-type feature maps are pairwise-adjacent second-type feature maps.
Referring to the structure of Fig. 4, merging three second-type feature maps proceeds as follows: each of the three layers is convolved with a 3×3 convolution kernel, then passed through a BN (Batch Normalization) layer; the results are combined by an Eltw Product (element-wise product) layer and output through a PReLU layer to form the final merged feature layer (i.e., the merge layer). The SSD prior boxes and class probability values are then extracted, and finally fed into the loss layer to compute the loss value.
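A hedged NumPy sketch of the merge layer's combination step (per-branch BN, element-wise product, PReLU). The 3×3 convolutions feeding each branch are assumed to have run already, the BN here uses the map's own statistics (inference-style), and the PReLU slope 0.25 is an illustrative initial value.

```python
import numpy as np

def batch_norm(x, eps=1e-5):
    # per-channel normalization using the feature map's own statistics
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def prelu(x, a=0.25):
    # parametric ReLU: identity for positives, slope a for negatives
    return np.where(x > 0, x, a * x)

def merge_branches(b1, b2, b3):
    # BN each branch, combine by element-wise product, then PReLU
    fused = batch_norm(b1) * batch_norm(b2) * batch_norm(b3)
    return prelu(fused)

branches = [np.random.rand(4, 10, 10) for _ in range(3)]
merged = merge_branches(*branches)
print(merged.shape)  # (4, 10, 10)
```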
Step S104: obtain the detected position coordinates of the face frame according to the position coordinates in the sample picture corresponding to the feature information of the first-type feature map and the feature information of the second-type feature map.
In an embodiment of the present application, obtaining the detected position coordinates of the face frame according to the position coordinates in the sample picture corresponding to the feature information of the first-type feature map and the feature information of the second-type feature map includes:
obtaining, according to the mapping relation between the first-type feature map and the sample picture, first location information of the feature information of the first-type feature map in the sample picture;
obtaining, according to the mapping relation between the second-type feature map and the sample picture, second location information of the feature information of the second-type feature map in the sample picture;
merging the first location information with the second location information according to the score data corresponding to the first location information and the score data corresponding to the second location information, to obtain the detected position coordinates of the face frame.
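The mapping from feature-map cells back to sample-picture coordinates, and the score-based pooling of the two branches' detections, can be sketched as follows. The 300-pixel input, the cell-center mapping and the grid sizes are illustrative assumptions.

```python
def to_image_coords(fx, fy, fmap_size, image_size=300):
    # map a feature-map cell (fx, fy) back to sample-picture pixel coordinates
    stride = image_size / fmap_size
    return ((fx + 0.5) * stride, (fy + 0.5) * stride)

def fuse_detections(first_dets, second_dets):
    # pool detections from both feature-map branches and rank them by score
    return sorted(first_dets + second_dets, key=lambda d: d["score"], reverse=True)

first = [{"box": to_image_coords(4, 4, 38), "score": 0.8}]
second = [{"box": to_image_coords(2, 2, 19), "score": 0.9}]
print(fuse_detections(first, second)[0]["score"])  # 0.9
```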
It should be noted that in the embodiments of the present application, the number of anchors per feature point of each feature extraction layer is set to 6, whereas the default in the original SSD network is 4; this number directly affects the detection performance of the SSD network. Modifying the anchor number improves the recall rate and accuracy of the SSD network. The recall rate is the ratio of the number of relevant documents retrieved to the total number of relevant documents in the document library, and measures the completeness of a retrieval system.
Here, an anchor is a structure in the SSD network present in the feature extraction layers; the number of anchors directly affects detection performance.
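How the anchor count changes the number of default boxes can be checked with simple arithmetic. The grid sizes below are the usual SSD300 feature-map sizes, used here as an illustrative assumption.

```python
def num_prior_boxes(fmap_sizes, anchors_per_point):
    # total default boxes = sum over feature maps of H * W * anchors-per-location
    return sum(h * w * anchors_per_point for h, w in fmap_sizes)

fmaps = [(38, 38), (19, 19), (10, 10), (5, 5), (3, 3), (1, 1)]
print(num_prior_boxes(fmaps, 4))  # 7760 boxes with the original default of 4
print(num_prior_boxes(fmaps, 6))  # 11640 boxes with 6 anchors per point
```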
In an embodiment of the present application, prediction and classification are performed on the feature information to obtain a face classification value, which is compared with a target value. If the value falls within the target threshold range, the feature vector can serve as face feature information; if not, the image corresponding to the feature information is not a face.
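A minimal sketch of this threshold comparison, assuming a two-class (background, face) softmax and a hypothetical threshold of 0.5; neither the class layout nor the threshold value is specified by the text.

```python
import math

def softmax(logits):
    exps = [math.exp(v) for v in logits]
    s = sum(exps)
    return [v / s for v in exps]

def is_face(logits, threshold=0.5):
    # logits: [background, face]; keep the feature vector as a face only when
    # its predicted face probability meets the target threshold
    return softmax(logits)[1] >= threshold

print(is_face([0.1, 2.3]))  # True
print(is_face([2.3, 0.1]))  # False
```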
Step S105: update the weight value of each convolutional layer according to the matching degree between the feature map corresponding to the last convolutional layer and the target image, the last convolutional layer being the convolutional layer of the final output layer in the at least two-layer network structure.
Step S106: generate the face detection model according to the weight values of the convolutional layers.
In this embodiment, the feature map output by the last convolutional layer of the at least two-layer network structure is matched against the target image to obtain the corresponding matching degree (i.e., the returned gradient); the weight value and bias of each convolutional layer are adjusted according to this matching degree so that the adjusted convolutional layers are better suited to detecting small faces, improving small-face detection accuracy; the face detection model is then generated from the finally determined weight values and biases of the convolutional layers.
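The weight-update step reduces to the familiar gradient rule; the learning rate below is a placeholder assumption, and the per-layer biases mentioned above would be updated in the same way.

```python
def update_weights(weights, grads, lr=0.01):
    # one gradient step per convolutional layer: move each weight against the
    # gradient returned from the loss (the "matching degree" described above)
    return [w - lr * g for w, g in zip(weights, grads)]

weights = [0.5, -0.2]
updated = update_weights(weights, [1.0, -1.0])
print(updated)  # ≈ [0.49, -0.19]
```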
Based on the same inventive concept, Fig. 5 shows a face detection method further provided by the invention, including:
Step S201: based on the face detection model, perform convolution on an input picture to be detected through an at least two-layer network structure to obtain at least two feature maps corresponding to at least two convolutional layers of the at least two-layer network structure, the at least two convolutional layers including at least one first-type feature extraction layer and at least one second-type feature extraction layer, the feature map corresponding to the first-type feature extraction layer being a first-type feature map, the feature map corresponding to the second-type feature extraction layer being a second-type feature map, and the first-type feature extraction layer being located before the second-type feature extraction layer;
Step S202: extract the feature information of the first-type feature map;
Further, extracting the feature information of the first-type feature map includes:
compressing the first-type feature map;
performing convolution on the compressed first-type feature map to obtain the feature information of the first-type feature map.
Step S203: extract the feature information of the second-type feature map;
Further, extracting the feature information of the second-type feature map includes:
merging the second-type feature maps corresponding to at least two second-type feature extraction layers;
extracting the corresponding feature information from the merged second-type feature map.
Step S204: obtain the detected position coordinates of the face frame according to the position coordinates in the picture to be detected corresponding to the feature information of the first-type feature map and the feature information of the second-type feature map.
In an embodiment of the present application, obtaining the detected position coordinates of the face frame according to the position coordinates in the picture to be detected corresponding to the feature information of the first-type feature map and the feature information of the second-type feature map includes:
obtaining, according to the mapping relation between the first-type feature map and the picture to be detected, first location information of the feature information of the first-type feature map in the picture to be detected;
obtaining, according to the mapping relation between the second-type feature map and the picture to be detected, second location information of the feature information of the second-type feature map in the picture to be detected;
merging the first location information with the second location information using a Soft-NMS module according to the score data corresponding to the first location information and the score data corresponding to the second location information, to obtain the detected position coordinates of the face frame.
In practical applications, the original NMS module loses faces when faces overlap, giving poor detection results. The present application therefore replaces the original NMS module with a Soft-NMS module, which reduces missed detections of overlapping faces and thereby improves the accuracy of face detection.
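A hedged sketch of Gaussian Soft-NMS, which decays the scores of overlapping boxes instead of discarding them as plain NMS does, so overlapping faces are not lost outright. The sigma value and score floor are conventional illustrative choices, not values specified by the text.

```python
import numpy as np

def iou(a, b):
    # boxes as (x1, y1, x2, y2)
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    # Gaussian Soft-NMS: decay the scores of boxes overlapping the current
    # best box rather than suppressing them outright
    boxes, scores = list(boxes), list(scores)
    keep = []
    while boxes:
        i = int(np.argmax(scores))
        best_box, best_score = boxes.pop(i), scores.pop(i)
        if best_score < score_thresh:
            break
        keep.append((best_box, best_score))
        scores = [s * np.exp(-iou(best_box, b) ** 2 / sigma)
                  for b, s in zip(boxes, scores)]
    return keep

dets = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
kept = soft_nms(dets, [0.9, 0.8, 0.7])
print(len(kept))  # 3 — the overlapping face survives with a decayed score
```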
By extracting the feature information of the first-type feature extraction layer (i.e., a convolutional layer in the base network layers) and processing that feature information, the face feature information in the picture is determined, thereby improving small-face detection capability. Applying normalization and convolution to the feature plane of the first-type feature map increases the depth of the features extracted from small-object images, making full use of the relatively large feature plane of the first-type feature extraction layer, which is well suited to detecting small faces, and in turn improving the accuracy of small-face detection.
In one or more embodiments of the present application, merging the second-type feature maps corresponding to at least two second-type feature extraction layers includes:
down-sampling the one of any two second-type feature maps whose feature plane is relatively larger, and merging the down-sampled second-type feature map with the other of the two second-type feature maps; or
deconvolving the one of any two second-type feature maps whose feature plane is relatively smaller, and merging the deconvolved second-type feature map with the other of the two second-type feature maps.
In one or more embodiments of the present application, merging the second-type feature maps corresponding to at least three second-type feature extraction layers includes:
after the down-sampled second-type feature map is merged with the other of the two second-type feature maps, further merging the merged second-type feature map with yet another second-type feature map that has been deconvolved, the feature plane of this further second-type feature map being smaller than the feature planes of the two second-type feature maps;
where, in merging the down-sampled second-type feature map with the other of the two second-type feature maps, the one of any three second-type feature maps whose feature plane is relatively larger is the one down-sampled.
In one or more embodiments of the present application, merging the second-type feature maps corresponding to at least three second-type feature extraction layers includes:
after the deconvolved second-type feature map is merged with the other of the two second-type feature maps, further merging the merged second-type feature map with yet another second-type feature map that has been down-sampled, the feature plane of this further second-type feature map being larger than the feature planes of the two second-type feature maps;
where, in merging the deconvolved second-type feature map with the other of the two second-type feature maps, the one of any three second-type feature maps whose feature plane is relatively smaller is the one deconvolved.
In one or more embodiments of the present application, the any three second-type feature maps are pairwise-adjacent second-type feature maps.
As a more preferred embodiment, as shown in Fig. 6, the face detection method may include two stages: a training stage and a detection stage.
The training stage includes the following steps:
Step 1: resize the sample pictures to a preset dimension;
Step 2: perform convolution on at least one input sample picture through an at least two-layer network structure to obtain at least two feature maps corresponding to at least two convolutional layers of the at least two-layer network structure, the at least two convolutional layers including at least one first-type feature extraction layer and at least one second-type feature extraction layer, the feature map corresponding to the first-type feature extraction layer being a first-type feature map, the feature map corresponding to the second-type feature extraction layer being a second-type feature map, and the first-type feature extraction layer being located before the second-type feature extraction layer;
Step 3: extract the feature information of the first-type feature map and the feature information of the second-type feature map;
Step 4: obtain the detected position coordinates of the face frame according to the position coordinates in the sample picture corresponding to the feature information of the first-type feature map and the feature information of the second-type feature map;
Step 5: update the weight value of each convolutional layer according to the matching degree between the feature map corresponding to the last convolutional layer of the at least two-layer network structure and the target image;
Step 6: generate the face detection model according to the weight values of the convolutional layers.
The detection stage includes the following steps:
Step 7: initialize the at least two-layer network structure with the face detection model obtained from training;
Step 8: resize the picture to be detected to the preset dimension;
Step 9: perform convolution on the input picture to be detected through the at least two-layer network structure to obtain at least two feature maps corresponding to at least two convolutional layers of the at least two-layer network structure;
Step 10: extract the feature information of the first-type feature map and the feature information of the second-type feature map;
Step 11: obtain the detected position coordinates of the face frame according to the position coordinates in the picture to be detected corresponding to the feature information of the first-type feature map and the feature information of the second-type feature map.
Based on the same inventive concept, Fig. 7 shows a face detection system provided by the invention, including:
a processing unit 11, configured to perform convolution, based on the face detection model, on at least one input sample picture through an at least two-layer network structure to obtain at least two feature maps corresponding to at least two convolutional layers of the at least two-layer network structure, the at least two convolutional layers including at least one first-type feature extraction layer and at least one second-type feature extraction layer, the feature map corresponding to the first-type feature extraction layer being a first-type feature map, the feature map corresponding to the second-type feature extraction layer being a second-type feature map, and the first-type feature extraction layer being located before the second-type feature extraction layer;
a first extraction unit 12, configured to extract the feature information of the first-type feature map;
a second extraction unit 13, configured to extract the feature information of the second-type feature map;
a detection unit 14, configured to obtain the detected position coordinates of the face frame according to the position coordinates in the sample picture corresponding to the feature information of the first-type feature map and the feature information of the second-type feature map;
an updating unit 15, configured to update the weight value of each convolutional layer according to the matching degree between the feature map corresponding to the last convolutional layer and the target image, the last convolutional layer being the convolutional layer of the final output layer in the at least two-layer network structure;
a generation unit 16, configured to generate the face detection model according to the weight values of the convolutional layers.
In one or more embodiments of the present application, the first extraction unit 12 extracts the feature information of the first-type feature map by:
compressing the feature plane of the first-type feature extraction layer;
performing convolution on the compressed first-type feature map to obtain the feature information of the feature plane of the first-type feature map.
In one or more embodiments of the present application, the second extraction unit 13 extracts the feature information of the second-type feature map by:
merging the second-type feature maps corresponding to at least two second-type feature extraction layers;
extracting the corresponding feature information from the merged second-type feature map.
In one or more embodiments of the present application, the second extraction unit 13 merges the second-type feature maps corresponding to at least two second-type feature extraction layers by:
down-sampling the one of any two second-type feature maps whose feature plane is relatively larger, and merging the down-sampled second-type feature map with the other of the two second-type feature maps; or
deconvolving the one of any two second-type feature maps whose feature plane is relatively smaller, and merging the deconvolved second-type feature map with the other of the two second-type feature maps.
In one or more embodiments of the present application, the second extraction unit 13 merges the second-type feature maps corresponding to at least three second-type feature extraction layers by:
after the down-sampled second-type feature map is merged with the other of the two second-type feature maps, further merging the merged second-type feature map with yet another second-type feature map that has been deconvolved, the feature plane of this further second-type feature map being smaller than the feature planes of the two second-type feature maps;
where, in merging the down-sampled second-type feature map with the other of the two second-type feature maps, the one of any three second-type feature maps whose feature plane is relatively larger is the one down-sampled.
In one or more embodiments of the present application, the second extraction unit 13 merges the second-type feature maps corresponding to at least three second-type feature extraction layers by:
after the deconvolved second-type feature map is merged with the other of the two second-type feature maps, further merging the merged second-type feature map with yet another second-type feature map that has been down-sampled, the feature plane of this further second-type feature map being larger than the feature planes of the two second-type feature maps;
where, in merging the deconvolved second-type feature map with the other of the two second-type feature maps, the one of any three second-type feature maps whose feature plane is relatively smaller is the one deconvolved.
In one or more embodiments of the present application, the any three second-type feature maps are pairwise-adjacent second-type feature maps.
In one or more embodiments of the present application, the detection unit 14 obtains the detected position coordinates of the face frame according to the position coordinates in the sample picture corresponding to the feature information of the first-type feature map and the feature information of the second-type feature map by:
obtaining, according to the mapping relation between the first-type feature map and the sample picture, first location information of the feature information of the first-type feature map in the sample picture;
obtaining, according to the mapping relation between the second-type feature map and the sample picture, second location information of the feature information of the second-type feature map in the sample picture;
merging the first location information with the second location information according to the score data corresponding to the first location information and the score data corresponding to the second location information, to obtain the detected position coordinates of the face frame.
It should be noted that, in the embodiments of the present application, convolution processing is performed on at least one input sample picture through a network structure of at least two layers, to obtain at least two feature maps corresponding to at least two convolutional layers of the network structure. The at least two convolutional layers include at least one first-type feature extraction layer and at least one second-type feature extraction layer; the feature map corresponding to the first-type feature extraction layer is a first-type feature map, and the feature map corresponding to the second-type feature extraction layer is a second-type feature map, wherein the first-type feature extraction layer is located before the second-type feature extraction layer. The feature information of the first-type feature map and the feature information of the second-type feature map are extracted, and the detection position coordinates of the face frame are obtained according to the position coordinates, in the sample picture, corresponding to that feature information. The weight value of each convolutional layer is updated according to the degree of matching between the feature map corresponding to the last convolutional layer and the target image, the last convolutional layer being the convolutional layer at the final output layer of the network structure; a face detection model is then generated from the weight values of the convolutional layers. Since the first-type feature extraction layer is located earlier in the network, its feature plane is relatively large and therefore suited to detecting small faces; extracting the feature information of the first-type feature map increases the depth of the features used for small-face detection, thereby improving the ability to detect small faces. Merging convolutional layers further improves the recall rate and accuracy of system detection.
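The size relationship this paragraph relies on — earlier (first-type) convolutional layers keep larger feature planes than later (second-type) layers — can be illustrated with a minimal NumPy sketch. The stride-2 pooling stand-in for a learned convolution, the 640×640 input, and the four-layer depth are illustrative assumptions, not details disclosed in the embodiment:

```python
import numpy as np

def conv_stride2(x):
    # stand-in for a learned stride-2 convolution: 2x2 average pooling,
    # which halves the feature plane exactly like a stride-2 layer would
    h, w = x.shape
    x = x[:h - h % 2, :w - w % 2]
    return (x[0::2, 0::2] + x[1::2, 0::2] + x[0::2, 1::2] + x[1::2, 1::2]) / 4.0

image = np.random.rand(640, 640)   # hypothetical input sample picture
feature_maps = []
x = image
for _ in range(4):                 # a network structure of (at least) two layers
    x = conv_stride2(x)
    feature_maps.append(x)

sizes = [fm.shape for fm in feature_maps]
# earlier layers keep the larger feature planes suited to small faces:
# (320, 320) -> (160, 160) -> (80, 80) -> (40, 40)
```

In this reading, the earlier entries of `sizes` would play the role of first-type feature maps and the later ones the role of second-type feature maps.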
Based on the same inventive concept, Fig. 8 shows a face detection system provided by the present invention, including:
A processing unit 21, configured to perform, based on a face detection model, convolution processing on an input picture to be detected through a network structure of at least two layers, to obtain at least two feature maps corresponding to at least two convolutional layers of the network structure, the at least two convolutional layers including at least one first-type feature extraction layer and at least one second-type feature extraction layer, the feature map corresponding to the first-type feature extraction layer being a first-type feature map, and the feature map corresponding to the second-type feature extraction layer being a second-type feature map, wherein the first-type feature extraction layer is located before the second-type feature extraction layer;
A first extraction unit 22, configured to extract the feature information of the first-type feature map;
A second extraction unit 23, configured to extract the feature information of the second-type feature map;
A detection unit 24, configured to obtain the detection position coordinates of the face frame according to the position coordinates, in the picture to be detected, corresponding to the feature information of the first-type feature map and the feature information of the second-type feature map.
In one or more embodiments of the present application, the first extraction unit 22 extracts the feature information of the first-type feature map by:
compressing the first-type feature map; and
performing convolution processing on the compressed first-type feature map to obtain the feature information of the first-type feature map.
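A sketch of this compress-then-convolve sequence, with channel compression modelled as a 1×1 convolution and the follow-up convolution as a single 3×3 "valid" pass; the channel counts (64 to 16) and the 32×32 feature plane are hypothetical, since the embodiment does not fix them:

```python
import numpy as np

def compress_1x1(fm, w):
    # 1x1 convolution: reduces the channel count C -> C' while leaving
    # the feature plane untouched; fm has shape (C, H, W), w has shape (C', C)
    return np.tensordot(w, fm, axes=([1], [0]))

def conv3x3_valid(fm, w):
    # one 3x3 "valid" convolution over all channels, producing a single-channel map
    c, h, wd = fm.shape
    out = np.zeros((h - 2, wd - 2))
    for i in range(h - 2):
        for j in range(wd - 2):
            out[i, j] = np.sum(fm[:, i:i + 3, j:j + 3] * w)
    return out

rng = np.random.default_rng(0)
first_type_fm = rng.standard_normal((64, 32, 32))  # hypothetical first-type feature map
w_compress = rng.standard_normal((16, 64)) * 0.1   # 64 -> 16 channel compression
w_conv = rng.standard_normal((16, 3, 3)) * 0.1

compressed = compress_1x1(first_type_fm, w_compress)
feature_info = conv3x3_valid(compressed, w_conv)
# compressed: (16, 32, 32); feature_info: (30, 30)
```

Compressing before convolving keeps the later convolution cheap, which matters for the large feature planes of first-type layers.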
In one or more embodiments of the present application, the second extraction unit 23 extracts the feature information of the second-type feature map by:
merging the second-type feature maps corresponding to at least two second-type feature extraction layers; and
extracting the corresponding feature information from the merged second-type feature map.
In one or more embodiments of the present application, the second extraction unit 23 merges the second-type feature maps corresponding to at least two second-type feature extraction layers by:
performing down-sampling on whichever of any two second-type feature maps has the relatively larger feature plane, and merging the down-sampled second-type feature map with the other of the two second-type feature maps; or
performing deconvolution on whichever of any two second-type feature maps has the relatively smaller feature plane, and merging the deconvolved second-type feature map with the other of the two second-type feature maps.
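The two alternative alignments can be sketched as follows, with 2×2 average pooling standing in for the down-sampling and nearest-neighbour upsampling standing in for the deconvolution (which would be a learned transposed convolution in practice); the 40×40 and 20×20 feature planes are assumptions:

```python
import numpy as np

def downsample2(fm):
    # 2x2 average pooling: halves the larger feature plane
    return (fm[0::2, 0::2] + fm[1::2, 0::2] + fm[0::2, 1::2] + fm[1::2, 1::2]) / 4.0

def upsample2(fm):
    # nearest-neighbour stand-in for deconvolution: doubles the smaller feature plane
    return fm.repeat(2, axis=0).repeat(2, axis=1)

larger = np.ones((40, 40))        # second-type map with the larger feature plane
smaller = np.full((20, 20), 3.0)  # second-type map with the smaller feature plane

merged_a = downsample2(larger) + smaller  # shrink the larger map -> (20, 20)
merged_b = larger + upsample2(smaller)    # enlarge the smaller map -> (40, 40)
```

Element-wise addition is used here as the merge operation; channel concatenation would be an equally plausible reading of "merging".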
In one or more embodiments of the present application, the second extraction unit 23 merges the second-type feature maps corresponding to at least three second-type feature extraction layers by:
after the down-sampled second-type feature map has been merged with the other of the two second-type feature maps, remerging the merged second-type feature map with yet another, deconvolution-processed second-type feature map, the feature plane of said yet another second-type feature map being smaller than the feature planes of the two second-type feature maps;
wherein, in the course of merging the down-sampled second-type feature map with the other of the two second-type feature maps, down-sampling is applied to whichever of the three second-type feature maps has the relatively larger feature plane.
In one or more embodiments of the present application, the second extraction unit 23 merges the second-type feature maps corresponding to at least three second-type feature extraction layers by:
after the deconvolved second-type feature map has been merged with the other of the two second-type feature maps, remerging the merged second-type feature map with yet another, down-sampling-processed second-type feature map, the feature plane of said yet another second-type feature map being larger than the feature planes of the two second-type feature maps;
wherein, in the course of merging the deconvolved second-type feature map with the other of the two second-type feature maps, deconvolution is applied to whichever of the three second-type feature maps has the relatively smaller feature plane.
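A cascaded merge over three pairwise-adjacent second-type feature maps might then proceed as sketched below, reusing pooling and nearest-neighbour upsampling as stand-ins for down-sampling and deconvolution; the concrete 10/20/40 sizes and the addition-based merge are assumptions:

```python
import numpy as np

def downsample2(fm):
    # 2x2 average pooling stand-in for the down-sampling step
    return (fm[0::2, 0::2] + fm[1::2, 0::2] + fm[0::2, 1::2] + fm[1::2, 1::2]) / 4.0

def upsample2(fm):
    # nearest-neighbour stand-in for the deconvolution step
    return fm.repeat(2, axis=0).repeat(2, axis=1)

# three pairwise-adjacent second-type maps at successive scales
fm_small = np.ones((10, 10))
fm_mid = np.ones((20, 20))
fm_large = np.ones((40, 40))

# first merge: deconvolve (upsample) the smaller of the first two maps
merged = upsample2(fm_small) + fm_mid    # (20, 20)
# remerge: the remaining map has the larger feature plane, so it is down-sampled
merged = merged + downsample2(fm_large)  # still (20, 20)
```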
In one or more embodiments of the present application, the any three second-type feature maps are pairwise-adjacent second-type feature maps.
In one or more embodiments of the present application, the detection unit 24 obtains the detection position coordinates of the face frame according to the position coordinates, in the picture to be detected, corresponding to the feature information of the first-type feature map and the feature information of the second-type feature map by:
obtaining, according to the mapping relationship between the first-type feature map and the picture to be detected, first location information of the feature information of the first-type feature map in the picture to be detected;
obtaining, according to the mapping relationship between the second-type feature map and the picture to be detected, second location information of the feature information of the second-type feature map in the picture to be detected; and
merging the first location information with the second location information using a Soft-NMS module according to the score data corresponding to the first location information and the score data corresponding to the second location information, so as to obtain the detection position coordinates of the face frame.
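Soft-NMS keeps overlapping candidate boxes but decays their scores rather than suppressing them outright. A minimal Gaussian-variant sketch over the pooled first- and second-location candidates — the boxes, scores, and `sigma` value are hypothetical, and the embodiment does not specify which Soft-NMS variant is used — might look like:

```python
import numpy as np

def iou(a, b):
    # boxes as [x1, y1, x2, y2]
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    # Gaussian Soft-NMS: repeatedly keep the best box and decay the
    # scores of the remaining boxes by their overlap with it
    dets = list(zip(boxes, scores))
    kept = []
    while dets:
        m = max(range(len(dets)), key=lambda i: dets[i][1])
        best_box, best_score = dets.pop(m)
        kept.append((best_box, best_score))
        dets = [(b, s * float(np.exp(-iou(best_box, b) ** 2 / sigma))) for b, s in dets]
        dets = [(b, s) for b, s in dets if s > score_thresh]
    return kept

# hypothetical candidates from the first and second location information
first_loc = [([10, 10, 50, 50], 0.9)]
second_loc = [([12, 12, 52, 52], 0.8), ([100, 100, 140, 140], 0.7)]
boxes = [b for b, _ in first_loc + second_loc]
scores = [s for _, s in first_loc + second_loc]
result = soft_nms(boxes, scores)
# the box overlapping the best detection survives with a decayed score
```

Because overlapping boxes are down-weighted rather than discarded, crowded faces are less likely to be suppressed, which is consistent with the recall improvement claimed in the next paragraph.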
It should be noted that, in the embodiments of the present application, the detection performance of the system is improved by having the second extraction unit 23 merge convolutional layers, and merging the position coordinates of the facial features with a Soft-NMS module to generate the detection position coordinates of the face frame improves the recall rate and accuracy of system detection.
Based on the same inventive concept, the present invention provides an electronic system comprising at least one processor and a memory, the memory storing a program configured to be executed by the at least one processor to perform the following steps:
performing convolution processing on at least one input sample picture through a network structure of at least two layers, to obtain at least two feature maps corresponding to at least two convolutional layers of the network structure, the at least two convolutional layers comprising at least one first-type feature extraction layer and at least one second-type feature extraction layer, the feature map corresponding to the first-type feature extraction layer being a first-type feature map, and the feature map corresponding to the second-type feature extraction layer being a second-type feature map, wherein the first-type feature extraction layer is located before the second-type feature extraction layer;
extracting the feature information of the first-type feature map;
extracting the feature information of the second-type feature map;
obtaining the detection position coordinates of a face frame according to the position coordinates, in the sample picture, corresponding to the feature information of the first-type feature map and the feature information of the second-type feature map;
updating the weight value of each convolutional layer according to the degree of matching between the feature map corresponding to the last convolutional layer and a target image, the last convolutional layer being the convolutional layer at the final output layer of the network structure; and
generating a face detection model according to the weight values of the convolutional layers.
Based on the same inventive concept, the present invention provides a computer-readable storage medium comprising a program for use in conjunction with an electronic system, the program being executable by a processor to perform the following steps:
performing convolution processing on at least one input sample picture through a network structure of at least two layers, to obtain at least two feature maps corresponding to at least two convolutional layers of the network structure, the at least two convolutional layers comprising at least one first-type feature extraction layer and at least one second-type feature extraction layer, the feature map corresponding to the first-type feature extraction layer being a first-type feature map, and the feature map corresponding to the second-type feature extraction layer being a second-type feature map, wherein the first-type feature extraction layer is located before the second-type feature extraction layer;
extracting the feature information of the first-type feature map;
extracting the feature information of the second-type feature map;
obtaining the detection position coordinates of a face frame according to the position coordinates, in the sample picture, corresponding to the feature information of the first-type feature map and the feature information of the second-type feature map;
updating the weight value of each convolutional layer according to the degree of matching between the feature map corresponding to the last convolutional layer and a target image, the last convolutional layer being the convolutional layer at the final output layer of the network structure; and
generating a face detection model according to the weight values of the convolutional layers.
Based on the same inventive concept, the present invention provides an electronic system comprising at least one processor and a memory, the memory storing a program configured to be executed by the at least one processor to perform the following steps:
performing, based on a face detection model, convolution processing on an input picture to be detected through a network structure of at least two layers, to obtain at least two feature maps corresponding to at least two convolutional layers of the network structure, the at least two convolutional layers comprising at least one first-type feature extraction layer and at least one second-type feature extraction layer, the feature map corresponding to the first-type feature extraction layer being a first-type feature map, and the feature map corresponding to the second-type feature extraction layer being a second-type feature map, wherein the first-type feature extraction layer is located before the second-type feature extraction layer;
extracting the feature information of the first-type feature map;
extracting the feature information of the second-type feature map;
obtaining the detection position coordinates of a face frame according to the position coordinates, in the picture to be detected, corresponding to the feature information of the first-type feature map and the feature information of the second-type feature map.
Based on the same inventive concept, the present invention provides a computer-readable storage medium comprising a program for use in conjunction with an electronic system, the program being executable by a processor to perform the following steps:
performing, based on a face detection model, convolution processing on an input picture to be detected through a network structure of at least two layers, to obtain at least two feature maps corresponding to at least two convolutional layers of the network structure, the at least two convolutional layers comprising at least one first-type feature extraction layer and at least one second-type feature extraction layer, the feature map corresponding to the first-type feature extraction layer being a first-type feature map, and the feature map corresponding to the second-type feature extraction layer being a second-type feature map, wherein the first-type feature extraction layer is located before the second-type feature extraction layer;
extracting the feature information of the first-type feature map;
extracting the feature information of the second-type feature map;
obtaining the detection position coordinates of a face frame according to the position coordinates, in the picture to be detected, corresponding to the feature information of the first-type feature map and the feature information of the second-type feature map.
The embodiments in this specification are described in a progressive manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. The system embodiments in particular are described relatively briefly, since they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments.
The above descriptions are merely examples of the present application and are not intended to limit the present application. Various modifications and variations of the present application will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present application shall fall within the scope of the claims of the present application.

Claims (36)

1. A face detection method, characterized by comprising:
performing convolution processing on at least one input sample picture through a network structure of at least two layers, to obtain at least two feature maps corresponding to at least two convolutional layers of the network structure, the at least two convolutional layers comprising at least one first-type feature extraction layer and at least one second-type feature extraction layer, the feature map corresponding to the first-type feature extraction layer being a first-type feature map, and the feature map corresponding to the second-type feature extraction layer being a second-type feature map, wherein the first-type feature extraction layer is located before the second-type feature extraction layer;
extracting feature information of the first-type feature map;
extracting feature information of the second-type feature map;
obtaining detection position coordinates of a face frame according to the position coordinates, in the sample picture, corresponding to the feature information of the first-type feature map and the feature information of the second-type feature map;
updating a weight value of each convolutional layer according to the degree of matching between the feature map corresponding to the last convolutional layer and a target image, the last convolutional layer being the convolutional layer at the final output layer of the network structure; and
generating a face detection model according to the weight values of the convolutional layers.
2. The face detection method according to claim 1, characterized in that extracting the feature information of the first-type feature map comprises:
compressing the first-type feature map; and
performing convolution processing on the compressed first-type feature map to obtain the feature information of the first-type feature map.
3. The face detection method according to claim 1, characterized in that extracting the feature information of the second-type feature map comprises:
merging the second-type feature maps corresponding to at least two second-type feature extraction layers; and
extracting the corresponding feature information from the merged second-type feature map.
4. The face detection method according to claim 3, characterized in that merging the second-type feature maps corresponding to at least two second-type feature extraction layers comprises:
performing down-sampling on whichever of any two second-type feature maps has the relatively larger feature plane, and merging the down-sampled second-type feature map with the other of the two second-type feature maps; or
performing deconvolution on whichever of any two second-type feature maps has the relatively smaller feature plane, and merging the deconvolved second-type feature map with the other of the two second-type feature maps.
5. The face detection method according to claim 4, characterized in that merging the second-type feature maps corresponding to at least three second-type feature extraction layers comprises:
after the down-sampled second-type feature map has been merged with the other of the two second-type feature maps, remerging the merged second-type feature map with yet another, deconvolution-processed second-type feature map, the feature plane of said yet another second-type feature map being smaller than the feature planes of the two second-type feature maps;
wherein, in the course of merging the down-sampled second-type feature map with the other of the two second-type feature maps, down-sampling is applied to whichever of the three second-type feature maps has the relatively larger feature plane.
6. The face detection method according to claim 4, characterized in that merging the second-type feature maps corresponding to at least three second-type feature extraction layers comprises:
after the deconvolved second-type feature map has been merged with the other of the two second-type feature maps, remerging the merged second-type feature map with yet another, down-sampling-processed second-type feature map, the feature plane of said yet another second-type feature map being larger than the feature planes of the two second-type feature maps;
wherein, in the course of merging the deconvolved second-type feature map with the other of the two second-type feature maps, deconvolution is applied to whichever of the three second-type feature maps has the relatively smaller feature plane.
7. The face detection method according to claim 5, characterized in that the any three second-type feature maps are pairwise-adjacent second-type feature maps.
8. The face detection method according to claim 1, characterized in that obtaining the detection position coordinates of the face frame according to the position coordinates, in the sample picture, corresponding to the feature information of the first-type feature map and the feature information of the second-type feature map comprises:
obtaining, according to the mapping relationship between the first-type feature map and the sample picture, first location information of the feature information of the first-type feature map in the sample picture;
obtaining, according to the mapping relationship between the second-type feature map and the sample picture, second location information of the feature information of the second-type feature map in the sample picture; and
merging the first location information with the second location information according to the score data corresponding to the first location information and the score data corresponding to the second location information, to obtain the detection position coordinates of the face frame.
9. A face detection method, characterized by comprising:
performing, based on a face detection model, convolution processing on an input picture to be detected through a network structure of at least two layers, to obtain at least two feature maps corresponding to at least two convolutional layers of the network structure, the at least two convolutional layers comprising at least one first-type feature extraction layer and at least one second-type feature extraction layer, the feature map corresponding to the first-type feature extraction layer being a first-type feature map, and the feature map corresponding to the second-type feature extraction layer being a second-type feature map, wherein the first-type feature extraction layer is located before the second-type feature extraction layer;
extracting feature information of the first-type feature map;
extracting feature information of the second-type feature map;
obtaining detection position coordinates of a face frame according to the position coordinates, in the picture to be detected, corresponding to the feature information of the first-type feature map and the feature information of the second-type feature map.
10. The face detection method according to claim 9, characterized in that extracting the feature information of the first-type feature map comprises:
compressing the first-type feature map; and
performing convolution processing on the compressed first-type feature map to obtain the feature information of the first-type feature map.
11. The face detection method according to claim 9, characterized in that extracting the feature information of the second-type feature map comprises:
merging the second-type feature maps corresponding to at least two second-type feature extraction layers; and
extracting the corresponding feature information from the merged second-type feature map.
12. The face detection method according to claim 11, characterized in that merging the second-type feature maps corresponding to at least two second-type feature extraction layers comprises:
performing down-sampling on whichever of any two second-type feature maps has the relatively larger feature plane, and merging the down-sampled second-type feature map with the other of the two second-type feature maps; or
performing deconvolution on whichever of any two second-type feature maps has the relatively smaller feature plane, and merging the deconvolved second-type feature map with the other of the two second-type feature maps.
13. The face detection method according to claim 12, characterized in that merging the second-type feature maps corresponding to at least three second-type feature extraction layers comprises:
after the down-sampled second-type feature map has been merged with the other of the two second-type feature maps, remerging the merged second-type feature map with yet another, deconvolution-processed second-type feature map, the feature plane of said yet another second-type feature map being smaller than the feature planes of the two second-type feature maps;
wherein, in the course of merging the down-sampled second-type feature map with the other of the two second-type feature maps, down-sampling is applied to whichever of the three second-type feature maps has the relatively larger feature plane.
14. The face detection method according to claim 12, characterized in that merging the second-type feature maps corresponding to at least three second-type feature extraction layers comprises:
after the deconvolved second-type feature map has been merged with the other of the two second-type feature maps, remerging the merged second-type feature map with yet another, down-sampling-processed second-type feature map, the feature plane of said yet another second-type feature map being larger than the feature planes of the two second-type feature maps;
wherein, in the course of merging the deconvolved second-type feature map with the other of the two second-type feature maps, deconvolution is applied to whichever of the three second-type feature maps has the relatively smaller feature plane.
15. The face detection method according to claim 13, characterized in that the any three second-type feature maps are pairwise-adjacent second-type feature maps.
16. The face detection method according to claim 9, characterized in that obtaining the detection position coordinates of the face frame according to the position coordinates, in the picture to be detected, corresponding to the feature information of the first-type feature map and the feature information of the second-type feature map comprises:
obtaining, according to the mapping relationship between the first-type feature map and the picture to be detected, first location information of the feature information of the first-type feature map in the picture to be detected;
obtaining, according to the mapping relationship between the second-type feature map and the picture to be detected, second location information of the feature information of the second-type feature map in the picture to be detected; and
merging the first location information with the second location information using a Soft-NMS module according to the score data corresponding to the first location information and the score data corresponding to the second location information, to obtain the detection position coordinates of the face frame.
17. A face detection system, characterized by comprising:
a processing unit, configured to perform convolution processing on at least one input sample picture through a network structure of at least two layers, to obtain at least two feature maps corresponding to at least two convolutional layers of the network structure, the at least two convolutional layers comprising at least one first-type feature extraction layer and at least one second-type feature extraction layer, the feature map corresponding to the first-type feature extraction layer being a first-type feature map, and the feature map corresponding to the second-type feature extraction layer being a second-type feature map, wherein the first-type feature extraction layer is located before the second-type feature extraction layer;
a first extraction unit, configured to extract feature information of the first-type feature map;
a second extraction unit, configured to extract feature information of the second-type feature map;
a detection unit, configured to obtain detection position coordinates of a face frame according to the position coordinates, in the sample picture, corresponding to the feature information of the first-type feature map and the feature information of the second-type feature map;
an updating unit, configured to update a weight value of each convolutional layer according to the degree of matching between the feature map corresponding to the last convolutional layer and a target image, the last convolutional layer being the convolutional layer at the final output layer of the network structure; and
a generation unit, configured to generate a face detection model according to the weight values of the convolutional layers.
18. The face detection system according to claim 17, characterized in that the first extraction unit is configured to extract the feature information of the first-type feature map by:
compressing the feature plane of the first-type feature extraction layer; and
performing convolution processing on the compressed first-type feature map to obtain the feature information of the feature plane of the first-type feature map.
19. The face detection system according to claim 17, characterized in that the second extraction unit is configured to extract the feature information of the second-type feature map by:
merging the second-type feature maps corresponding to at least two second-type feature extraction layers; and
extracting the corresponding feature information from the merged second-type feature map.
20. The face detection system according to claim 19, characterized in that the second extraction unit merges the second-type feature maps corresponding to at least two second-type feature extraction layers by:
performing down-sampling on whichever of any two second-type feature maps has the relatively larger feature plane, and merging the down-sampled second-type feature map with the other of the two second-type feature maps; or
performing deconvolution on whichever of any two second-type feature maps has the relatively smaller feature plane, and merging the deconvolved second-type feature map with the other of the two second-type feature maps.
21. The face detection system according to claim 20, characterized in that the second extraction unit merges the second-type feature maps corresponding to at least three second-type feature extraction layers by:
after the down-sampled second-type feature map has been merged with the other of the two second-type feature maps, remerging the merged second-type feature map with yet another, deconvolution-processed second-type feature map, the feature plane of said yet another second-type feature map being smaller than the feature planes of the two second-type feature maps;
wherein, in the course of merging the down-sampled second-type feature map with the other of the two second-type feature maps, down-sampling is applied to whichever of the three second-type feature maps has the relatively larger feature plane.
22. The face detection system according to claim 20, wherein the second extraction unit merges the second-class feature maps corresponding to at least three second-class feature extraction layers by:
after the deconvolved second-class feature map has been merged with the other of the two second-class feature maps, further merging the resulting second-class feature map with a third, down-sampled second-class feature map, the feature plane of the third second-class feature map being larger in size than those of the two second-class feature maps;
wherein, while the deconvolved second-class feature map is merged with the other of the two second-class feature maps, deconvolution is performed on the one of the three second-class feature maps whose feature plane is relatively smaller in size.
23. The face detection system according to claim 21, wherein the three second-class feature maps are pairwise-adjacent second-class feature maps.
24. The face detection system according to claim 17, wherein the detection unit obtains the detected position coordinates of the face frame from the position coordinates in the sample picture corresponding to the characteristic information of the first-class feature map and of the second-class feature map by:
obtaining first location information of the characteristic information of the first-class feature map in the sample picture according to the mapping relation between the first-class feature map and the sample picture;
obtaining second location information of the characteristic information of the second-class feature map in the sample picture according to the mapping relation between the second-class feature map and the sample picture; and
merging the first location information with the second location information according to the score data corresponding to the first location information and the score data corresponding to the second location information, so as to obtain the detected position coordinates of the face frame.
25. A face detection system, characterized by comprising:
a processing unit, configured to perform convolution, based on a face detection model, on an input picture to be detected through a network structure of at least two layers, so as to obtain at least two feature maps corresponding to at least two convolutional layers of the network structure, the at least two convolutional layers comprising at least one first-class feature extraction layer and at least one second-class feature extraction layer, the feature map corresponding to the first-class feature extraction layer being a first-class feature map and the feature map corresponding to the second-class feature extraction layer being a second-class feature map, wherein the first-class feature extraction layer is located before the second-class feature extraction layer;
a first extraction unit, configured to extract the characteristic information of the first-class feature map;
a second extraction unit, configured to extract the characteristic information of the second-class feature map; and
a detection unit, configured to obtain the detected position coordinates of a face frame from the position coordinates in the picture to be detected corresponding to the characteristic information of the first-class feature map and of the second-class feature map.
26. The face detection system according to claim 25, wherein the first extraction unit is used for extracting the characteristic information of the first-class feature map by:
compressing the first-class feature map; and
performing convolution on the compressed first-class feature map, so as to obtain the characteristic information of the first-class feature map.
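The compress-then-convolve step of this claim is commonly realized as a 1x1 convolution that reduces the channel count, followed by a spatial convolution. A minimal NumPy sketch, with all shapes and weights chosen arbitrarily for illustration (the patent does not specify them):

```python
import numpy as np

def conv1x1(x, w):
    """1x1 convolution = per-pixel channel mixing; compresses the channels."""
    # w: (c_out, c_in); x: (c_in, h, w) -> (c_out, h, w)
    return np.tensordot(w, x, axes=([1], [0]))

def conv3x3(x, w):
    """Naive 3x3 valid convolution over a (c, h, w) feature map."""
    c_out, c_in, _, _ = w.shape
    c, h, wd = x.shape
    out = np.zeros((c_out, h - 2, wd - 2))
    for i in range(h - 2):
        for j in range(wd - 2):
            patch = x[:, i:i + 3, j:j + 3]
            out[:, i, j] = np.tensordot(w, patch, axes=([1, 2, 3], [0, 1, 2]))
    return out

feat = np.random.rand(256, 16, 16)                    # first-class feature map
compressed = conv1x1(feat, np.random.rand(64, 256))   # compress 256 -> 64 channels
info = conv3x3(compressed, np.random.rand(32, 64, 3, 3))  # extract characteristic info
```

Compressing first keeps the subsequent spatial convolution cheap, which is the usual motivation for this two-step arrangement.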
27. The face detection system according to claim 25, wherein the second extraction unit extracts the characteristic information of the second-class feature maps by:
merging the second-class feature maps corresponding to at least two second-class feature extraction layers; and
extracting the corresponding characteristic information from the merged second-class feature map.
28. The face detection system according to claim 27, wherein the second extraction unit merges the second-class feature maps corresponding to the at least two second-class feature extraction layers by:
performing down-sampling on the one of any two second-class feature maps whose feature plane is relatively larger in size, and merging the down-sampled second-class feature map with the other of the two second-class feature maps; or
performing deconvolution on the one of any two second-class feature maps whose feature plane is relatively smaller in size, and merging the deconvolved second-class feature map with the other of the two second-class feature maps.
29. The face detection system according to claim 28, wherein the second extraction unit merges the second-class feature maps corresponding to at least three second-class feature extraction layers by:
after the down-sampled second-class feature map has been merged with the other of the two second-class feature maps, further merging the resulting second-class feature map with a third, deconvolved second-class feature map, the feature plane of the third second-class feature map being smaller in size than those of the two second-class feature maps;
wherein, while the down-sampled second-class feature map is merged with the other of the two second-class feature maps, down-sampling is performed on the one of the three second-class feature maps whose feature plane is relatively larger in size.
30. The face detection system according to claim 28, wherein the second extraction unit merges the second-class feature maps corresponding to at least three second-class feature extraction layers by:
after the deconvolved second-class feature map has been merged with the other of the two second-class feature maps, further merging the resulting second-class feature map with a third, down-sampled second-class feature map, the feature plane of the third second-class feature map being larger in size than those of the two second-class feature maps;
wherein, while the deconvolved second-class feature map is merged with the other of the two second-class feature maps, deconvolution is performed on the one of the three second-class feature maps whose feature plane is relatively smaller in size.
31. The face detection system according to claim 29, wherein the three second-class feature maps are pairwise-adjacent second-class feature maps.
32. The face detection system according to claim 25, wherein the detection unit obtains the detected position coordinates of the face frame from the position coordinates in the picture to be detected corresponding to the characteristic information of the first-class feature map and of the second-class feature map by:
obtaining first location information of the characteristic information of the first-class feature map in the picture to be detected according to the mapping relation between the first-class feature map and the picture to be detected;
obtaining second location information of the characteristic information of the second-class feature map in the picture to be detected according to the mapping relation between the second-class feature map and the picture to be detected; and
merging the first location information with the second location information by a Soft-NMS module according to the score data corresponding to the first location information and the score data corresponding to the second location information, so as to obtain the detected position coordinates of the face frame.
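The Soft-NMS module named in this claim decays the scores of overlapping candidate boxes instead of discarding them outright. A minimal sketch of the Gaussian variant follows; the example boxes, scores, sigma, and score threshold are illustrative assumptions, not values from the patent.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS: repeatedly take the highest-scoring box, then
    decay (rather than delete) the scores of boxes that overlap it."""
    boxes, scores = list(boxes), list(scores)
    keep = []
    while boxes:
        i = int(np.argmax(scores))
        best_box, best_score = boxes.pop(i), scores.pop(i)
        if best_score < score_thresh:
            break
        keep.append((best_box, best_score))
        scores = [s * np.exp(-iou(best_box, b) ** 2 / sigma)
                  for b, s in zip(boxes, scores)]
    return keep

# Two heavily overlapping detections and one separate one (assumed values).
dets = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
kept = soft_nms(dets, [0.9, 0.8, 0.7])
```

Unlike hard NMS, the overlapping second box survives with a reduced score, which helps in crowded scenes where nearby faces would otherwise suppress each other.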
33. An electronic system, comprising at least one processor and a memory, the memory storing a program and being configured to execute the following steps by the at least one processor:
performing convolution on at least one input sample picture through a network structure of at least two layers, so as to obtain at least two feature maps corresponding to at least two convolutional layers of the network structure, the at least two convolutional layers comprising at least one first-class feature extraction layer and at least one second-class feature extraction layer, the feature map corresponding to the first-class feature extraction layer being a first-class feature map and the feature map corresponding to the second-class feature extraction layer being a second-class feature map, wherein the first-class feature extraction layer is located before the second-class feature extraction layer;
extracting the characteristic information of the first-class feature map;
extracting the characteristic information of the second-class feature map;
obtaining the detected position coordinates of a face frame from the position coordinates in the sample picture corresponding to the characteristic information of the first-class feature map and of the second-class feature map;
updating the weight value of each convolutional layer according to the degree of matching between the feature map corresponding to the last convolutional layer and a target image, the last convolutional layer being the final output layer of the network structure of at least two layers; and
generating a face detection model from the weight values of the convolutional layers.
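The weight-update and model-generation steps of this claim amount to iteratively reducing the mismatch between the last layer's output and the target image, then saving the resulting weights. A deliberately tiny gradient-descent sketch, with the convolutional layer reduced to a single scalar weight and a squared-error stand-in for the "matching degree" (all shapes, values, and the loss choice are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

sample = rng.random((8, 8))      # stand-in sample picture
target = 2.0 * sample            # stand-in target image the layer should match
w = 0.1                          # the layer's weight value
lr = 0.05                        # learning rate

for _ in range(500):
    out = w * sample                         # feature map of the last layer
    mismatch = out - target                  # low matching degree -> large error
    grad = 2.0 * np.mean(mismatch * sample)  # gradient of the mean squared error
    w -= lr * grad                           # update the layer's weight value

# "Generate the face detection model" from the updated weight values.
model = {"layer1_weight": w}
```

A real training run would backpropagate through every convolutional layer of the network; this collapses the whole pipeline to one weight purely to make the update rule visible.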
34. A computer-readable storage medium, comprising a program used in combination with an electronic system, the program being executable by a processor to complete the following steps:
performing convolution on at least one input sample picture through a network structure of at least two layers, so as to obtain at least two feature maps corresponding to at least two convolutional layers of the network structure, the at least two convolutional layers comprising at least one first-class feature extraction layer and at least one second-class feature extraction layer, the feature map corresponding to the first-class feature extraction layer being a first-class feature map and the feature map corresponding to the second-class feature extraction layer being a second-class feature map, wherein the first-class feature extraction layer is located before the second-class feature extraction layer;
extracting the characteristic information of the first-class feature map;
extracting the characteristic information of the second-class feature map;
obtaining the detected position coordinates of a face frame from the position coordinates in the sample picture corresponding to the characteristic information of the first-class feature map and of the second-class feature map;
updating the weight value of each convolutional layer according to the degree of matching between the feature map corresponding to the last convolutional layer and a target image, the last convolutional layer being the final output layer of the network structure of at least two layers; and
generating a face detection model from the weight values of the convolutional layers.
35. An electronic system, comprising at least one processor and a memory, the memory storing a program and being configured to execute the following steps by the at least one processor:
performing convolution, based on a face detection model, on an input picture to be detected through a network structure of at least two layers, so as to obtain at least two feature maps corresponding to at least two convolutional layers of the network structure, the at least two convolutional layers comprising at least one first-class feature extraction layer and at least one second-class feature extraction layer, the feature map corresponding to the first-class feature extraction layer being a first-class feature map and the feature map corresponding to the second-class feature extraction layer being a second-class feature map, wherein the first-class feature extraction layer is located before the second-class feature extraction layer;
extracting the characteristic information of the first-class feature map;
extracting the characteristic information of the second-class feature map; and
obtaining the detected position coordinates of a face frame from the position coordinates in the picture to be detected corresponding to the characteristic information of the first-class feature map and of the second-class feature map.
36. A computer-readable storage medium, comprising a program used in combination with an electronic system, the program being executable by a processor to complete the following steps:
performing convolution, based on a face detection model, on an input picture to be detected through a network structure of at least two layers, so as to obtain at least two feature maps corresponding to at least two convolutional layers of the network structure, the at least two convolutional layers comprising at least one first-class feature extraction layer and at least one second-class feature extraction layer, the feature map corresponding to the first-class feature extraction layer being a first-class feature map and the feature map corresponding to the second-class feature extraction layer being a second-class feature map, wherein the first-class feature extraction layer is located before the second-class feature extraction layer;
extracting the characteristic information of the first-class feature map;
extracting the characteristic information of the second-class feature map; and
obtaining the detected position coordinates of a face frame from the position coordinates in the picture to be detected corresponding to the characteristic information of the first-class feature map and of the second-class feature map.
CN201810506447.4A 2018-05-24 2018-05-24 Face detection method and system Active CN108921017B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810506447.4A CN108921017B (en) 2018-05-24 2018-05-24 Face detection method and system

Publications (2)

Publication Number Publication Date
CN108921017A (en) 2018-11-30
CN108921017B (en) 2021-05-18

Family

ID=64403533

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810506447.4A Active CN108921017B (en) 2018-05-24 2018-05-24 Face detection method and system

Country Status (1)

Country Link
CN (1) CN108921017B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109800770A (en) * 2018-12-28 2019-05-24 广州海昇计算机科技有限公司 Method, system and device for real-time target detection
CN110495962A (en) * 2019-08-26 2019-11-26 赫比(上海)家用电器产品有限公司 Method for monitoring a toothbrush position, and toothbrush and device therefor
WO2020134528A1 (en) * 2018-12-29 2020-07-02 深圳云天励飞技术有限公司 Target detection method and related product

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104778448A (en) * 2015-03-24 2015-07-15 孙建德 Structure adaptive CNN (Convolutional Neural Network)-based face recognition method
CN105891215A (en) * 2016-03-31 2016-08-24 浙江工业大学 Welding visual detection method and device based on convolutional neural network
CN106709532A (en) * 2017-01-25 2017-05-24 京东方科技集团股份有限公司 Image processing method and device
CN106841216A (en) * 2017-02-28 2017-06-13 浙江工业大学 Tunnel defect automatic identification equipment based on panoramic picture CNN
US20180032857A1 (en) * 2015-10-07 2018-02-01 Intel Corporation Method and Apparatus for Performing Different Types of Convolution Operations with the Same Processing Elements
CN107657204A (en) * 2016-07-25 2018-02-02 中国科学院声学研究所 Construction method of a deep network model, and facial expression recognition method and system

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
ALEXEY DOSOVITSKIY et al., "FlowNet: Learning Optical Flow with Convolutional Networks", 2015 IEEE International Conference on Computer Vision (ICCV) *
GOLNAZ GHIASI et al., "Laplacian Pyramid Reconstruction and Refinement for Semantic Segmentation", https://arxiv.org/abs/1605.02264 *
JIMMY REN et al., "Accurate Single Stage Detector Using Recurrent Rolling Convolution", 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) *
HOU Congcong et al., "Deep Convolutional Neural Network Based on Two-Branch Convolution Units", Laser & Optoelectronics Progress *

Also Published As

Publication number Publication date
CN108921017B (en) 2021-05-18


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210416

Address after: 215123 unit 2-b702, creative industry park, No. 328, Xinghu street, Suzhou Industrial Park, Suzhou City, Jiangsu Province

Applicant after: SUZHOU FEISOU TECHNOLOGY Co.,Ltd.

Address before: 100876 Beijing, Haidian District, 10 West Road, Beijing, 12 Beijing, North Post Science and technology exchange center, room 1216

Applicant before: BEIJING FEISOU TECHNOLOGY Co.,Ltd.

GR01 Patent grant