CN108229281A - Neural network generation method, face detection method, apparatus, and electronic device - Google Patents

Neural network generation method, face detection method, apparatus, and electronic device

Info

Publication number
CN108229281A
CN108229281A
Authority
CN
China
Prior art keywords
face
neural network
scale
range
subrange
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710277489.0A
Other languages
Chinese (zh)
Other versions
CN108229281B (en)
Inventor
杨硕
熊元骏
吕健勤
汤晓鸥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd filed Critical Beijing Sensetime Technology Development Co Ltd
Priority to CN201710277489.0A priority Critical patent/CN108229281B/en
Publication of CN108229281A publication Critical patent/CN108229281A/en
Application granted granted Critical
Publication of CN108229281B publication Critical patent/CN108229281B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks

Abstract

Embodiments of the present invention provide a neural network generation method, a face detection method, an apparatus, and an electronic device. The neural network generation method includes: dividing a target scale range to obtain multiple sub target scale ranges, where the target scale range is used by a neural network for face detection; determining a sub-neural network corresponding to each of the multiple sub target scale ranges; and fusing the sub-neural networks corresponding to the multiple sub target scale ranges to obtain the neural network. The neural network generated by the embodiments of the present invention can detect faces in face images with a very large range of face sizes, and improves both the face detection accuracy and the face detection efficiency in face images.

Description

Neural network generation method, face detection method, apparatus, and electronic device
Technical field
The embodiments of the present invention relate to artificial intelligence technology, and more particularly to a neural network generation method, apparatus, and electronic device, and to a face detection method, apparatus, and electronic device.
Background technology
Face detection is one of the most important research directions in computer vision and is the basis of many face analysis techniques, such as facial keypoint detection and face recognition. Modeling for face detection is very challenging, because faces must be detected under widely varying conditions: different poses, different degrees of occlusion, different expressions, different sizes, and different illumination. On the one hand, facial appearance differs greatly between people, and even for the same person the appearance changes considerably across poses, occlusions, expressions, and lighting conditions. On the other hand, in video surveillance systems, for example in subways, faces far from the camera are generally very small while faces near the camera are usually large, and faces of different sizes look very different: a small face shows only an outline without facial details, whereas a large face clearly shows the details of the facial features. In addition, face detection needs to run in real time, because it is the foundation of many other face analysis techniques.
Existing face detection methods handle faces in different poses and with different degrees of occlusion reasonably well, but their detection efficiency for faces of widely varying sizes is very low.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a technical solution for neural network generation and a technical solution for face detection.
According to a first aspect of the embodiments of the present invention, a neural network generation method is provided. The method includes: dividing a target scale range to obtain multiple sub target scale ranges, where the target scale range is used by a neural network for face detection; determining a sub-neural network corresponding to each of the multiple sub target scale ranges; and fusing the sub-neural networks corresponding to the multiple sub target scale ranges to obtain the neural network.
Optionally, dividing the target scale range to obtain the multiple sub target scale ranges includes: uniformly dividing the target scale range into multiple sub-ranges; for each sub-range, extracting from a training data set multiple face samples that fall within the current sub-range; and determining the multiple sub target scale ranges according to feature values of the face samples in each sub-range.
Optionally, determining the multiple sub target scale ranges according to the feature values of the face samples in each sub-range includes: for each face sample in each sub-range, generating a color histogram of the face sample based on its pixel values; for each sub-range, computing the chi-square distance between the color histograms of every two face samples in the sub-range to obtain a face-sample distance matrix of the sub-range; generating an appearance-variation feature value of the sub-range from the face-sample distance matrix; and determining the multiple sub target scale ranges according to the appearance-variation feature values of the sub-ranges.
Optionally, before generating, for each face sample in each sub-range, the color histogram of the face sample based on its pixel values, the method further includes: for each face sample in each sub-range, scaling the size of the face sample according to the lower-bound value of the current sub-range.
Optionally, dividing the target scale range to obtain the multiple sub target scale ranges includes: dividing the target scale range based on characteristic points of faces to obtain the multiple sub target scale ranges.
Optionally, determining the sub-neural network corresponding to each of the multiple sub target scale ranges includes: for each sub target scale range, determining a pooling downsampling stride corresponding to the sub target scale range, and determining the sub-neural network corresponding to the sub target scale range based on the pooling downsampling stride.
Optionally, fusing the sub-neural networks corresponding to the multiple sub target scale ranges to obtain the neural network includes: fusing the sub-neural networks corresponding to the multiple sub target scale ranges based on shared network parameters to obtain the neural network.
According to a second aspect of the embodiments of the present invention, a face detection method is provided. The method includes: scaling a face image to be detected to obtain a scaled face image; and detecting the scaled face image with a neural network to obtain a face detection result of the face image to be detected, where the neural network is generated according to the method described in the first aspect of the embodiments of the present invention.
Optionally, scaling the face image to be detected to obtain the scaled face image includes: scaling the face image to be detected according to the upper bound of the target scale range of the neural network.
Optionally, the face image to be detected is a still image or a video image in a video frame sequence.
Optionally, the video frame sequence is a video frame sequence in a live stream.
According to a third aspect of the embodiments of the present invention, a neural network generation apparatus is provided. The apparatus includes: a division module, configured to divide a target scale range to obtain multiple sub target scale ranges, where the target scale range is used by a neural network for face detection; a determining module, configured to determine a sub-neural network corresponding to each of the multiple sub target scale ranges; and a fusion module, configured to fuse the sub-neural networks corresponding to the multiple sub target scale ranges to obtain the neural network.
Optionally, the division module includes: a first division unit, configured to uniformly divide the target scale range into multiple sub-ranges; an extraction unit, configured to, for each sub-range, extract from a training data set multiple face samples that fall within the current sub-range; and a first determining unit, configured to determine the multiple sub target scale ranges according to feature values of the face samples in each sub-range.
Optionally, the first determining unit is specifically configured to: for each face sample in each sub-range, generate a color histogram of the face sample based on its pixel values; for each sub-range, compute the chi-square distance between the color histograms of every two face samples in the sub-range to obtain a face-sample distance matrix of the sub-range; generate an appearance-variation feature value of the sub-range from the face-sample distance matrix; and determine the multiple sub target scale ranges according to the appearance-variation feature values of the sub-ranges.
Optionally, the first determining unit is further configured to: before generating the color histogram of a face sample based on its pixel values, scale the size of each face sample in each sub-range according to the lower-bound value of the current sub-range.
Optionally, the division module further includes: a second division unit, configured to divide the target scale range based on characteristic points of faces to obtain the multiple sub target scale ranges.
Optionally, the determining module includes: a second determining unit, configured to, for each sub target scale range, determine the pooling downsampling stride corresponding to the sub target scale range, and determine the sub-neural network corresponding to the sub target scale range based on the pooling downsampling stride.
Optionally, the fusion module includes: a fusion unit, configured to fuse the sub-neural networks corresponding to the multiple sub target scale ranges based on shared network parameters to obtain the neural network.
According to a fourth aspect of the embodiments of the present invention, a face detection apparatus is provided. The apparatus includes: a scaling module, configured to scale a face image to be detected to obtain a scaled face image; and a detection module, configured to detect the scaled face image with a neural network to obtain a face detection result of the face image to be detected, where the neural network is generated by the apparatus described in the third aspect of the embodiments of the present invention.
Optionally, the scaling module includes: a scaling unit, configured to scale the face image to be detected according to the upper bound of the target scale range of the neural network.
Optionally, the face image to be detected is a still image or a video image in a video frame sequence.
Optionally, the video frame sequence is a video frame sequence in a live stream.
According to a fifth aspect of the embodiments of the present invention, an electronic device is provided, including: a first processor, a first memory, a first communication element, and a first communication bus, where the first processor, the first memory, and the first communication element communicate with each other through the first communication bus; the first memory is configured to store at least one executable instruction, and the executable instruction causes the first processor to perform the operations corresponding to the neural network generation method provided in any implementation of the first aspect of the embodiments of the present invention.
According to a sixth aspect of the embodiments of the present invention, an electronic device is provided, including: a second processor, a second memory, a second communication element, and a second communication bus, where the second processor, the second memory, and the second communication element communicate with each other through the second communication bus; the second memory is configured to store at least one executable instruction, and the executable instruction causes the second processor to perform the operations corresponding to the face detection method provided in any implementation of the second aspect of the embodiments of the present invention.
According to a seventh aspect of the embodiments of the present invention, a computer-readable storage medium is provided, storing: executable instructions for dividing a target scale range to obtain multiple sub target scale ranges; executable instructions for determining a sub-neural network corresponding to each of the multiple sub target scale ranges; and executable instructions for fusing the sub-neural networks corresponding to the multiple sub target scale ranges to obtain the neural network.
According to an eighth aspect of the embodiments of the present invention, another computer-readable storage medium is provided, storing: executable instructions for scaling a face image to be detected to obtain a scaled face image; and executable instructions for detecting the scaled face image with a neural network to obtain a face detection result of the face image to be detected.
According to the technical solutions provided by the embodiments of the present invention, a target scale range is divided into multiple sub target scale ranges, the sub-neural network corresponding to each sub target scale range is determined, and the sub-neural networks corresponding to the sub target scale ranges are then fused to obtain a neural network for face detection. Compared with prior-art neural networks based on scale invariance, the neural network generated in these embodiments can detect faces in face images with a very large range of face sizes, and improves both the face detection accuracy and the face detection efficiency in face images.
Description of the drawings
Fig. 1 is a flowchart of a neural network generation method according to Embodiment 1 of the present invention;
Fig. 2 is a flowchart of a neural network generation method according to Embodiment 2 of the present invention;
Fig. 3 is a schematic diagram of a concrete scenario applying the method embodiment of Fig. 2;
Fig. 4 is a flowchart of a face detection method according to Embodiment 3 of the present invention;
Fig. 5 is a flowchart of a face detection method according to Embodiment 4 of the present invention;
Fig. 6 is a structural block diagram of a neural network generation apparatus according to Embodiment 5 of the present invention;
Fig. 7 is a structural block diagram of a neural network generation apparatus according to Embodiment 6 of the present invention;
Fig. 8 is a structural block diagram of a face detection apparatus according to Embodiment 7 of the present invention;
Fig. 9 is a structural block diagram of a face detection apparatus according to Embodiment 8 of the present invention;
Fig. 10 is a structural block diagram of an electronic device according to Embodiment 9 of the present invention;
Fig. 11 is a structural block diagram of an electronic device according to Embodiment 10 of the present invention.
Detailed description of the embodiments
Specific implementations of the embodiments of the present invention are described in further detail below with reference to the accompanying drawings (in which the same reference numerals denote the same elements) and the embodiments. The following embodiments are intended to illustrate the present invention, but not to limit its scope.
Those skilled in the art will understand that terms such as "first" and "second" in the embodiments of the present invention are only used to distinguish different steps, devices, or modules, and denote neither any particular technical meaning nor any necessary logical order between them.
Embodiment one
Fig. 1 is a flowchart of a neural network generation method according to Embodiment 1 of the present invention.
Referring to Fig. 1, in step S101, a target scale range is divided to obtain multiple sub target scale ranges.
Here, the target scale range is the scale range used by the neural network to be generated for face detection; that is, the neural network performs face detection over the target scale range. In the embodiments of the present invention, the size of a face in a face image may be defined as the side length of the bounding box of the face, so the target scale range used by the neural network to be generated refers to the range of bounding-box side lengths of faces in face images, where the side length of the face bounding box can be expressed numerically by the edge of the bounding box. For example, the target scale range may be 10 pixels to 1300 pixels, and the sub target scale ranges may be 10 pixels to 500 pixels and 500 pixels to 1300 pixels. The union of the sub target scale ranges should cover the target scale range. In a specific embodiment, in order to enable the generated neural network to detect faces in face images with a very large range of face sizes, the target scale range of the neural network to be generated may be given in advance.
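For illustration only, the scale ranges described above could be represented as in the minimal Python sketch below; the names ScaleRange and covers_target are assumptions (not from the patent), and the sketch merely encodes the example bounds of 10 to 1300 pixels and checks that the union of the sub target scale ranges covers the target scale range.

from dataclasses import dataclass
from typing import List

@dataclass
class ScaleRange:
    lower: int  # lower bound of the face bounding-box side length, in pixels
    upper: int  # upper bound, in pixels

    def contains(self, face_size: int) -> bool:
        return self.lower <= face_size <= self.upper

def covers_target(target: ScaleRange, subranges: List[ScaleRange]) -> bool:
    """True if the union of the sub target scale ranges covers the target range."""
    ordered = sorted(subranges, key=lambda r: r.lower)
    if ordered[0].lower > target.lower or ordered[-1].upper < target.upper:
        return False
    # adjacent sub target scale ranges must meet or overlap
    return all(cur.upper >= nxt.lower for cur, nxt in zip(ordered, ordered[1:]))

# Example values from the text: target range 10-1300 px, split at 500 px.
target = ScaleRange(10, 1300)
subs = [ScaleRange(10, 500), ScaleRange(500, 1300)]
assert covers_target(target, subs)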
Specifically, the neural network to be generated may be a neural network capable of feature extraction or target-object detection, including but not limited to a convolutional neural network, a reinforcement learning neural network, the generative network in a generative adversarial network, and the like.
In step S102, the sub-neural network corresponding to each of the multiple sub target scale ranges is determined.
In a specific embodiment, the sub-neural network corresponding to each sub target scale range may be determined according to that sub target scale range. By designing a dedicated sub-neural network for each sub target scale range, the detection performance for faces whose sizes fall within that sub target scale range can be improved. The sub-neural network may be a neural network capable of feature extraction or target-object detection, including but not limited to a convolutional neural network, a reinforcement learning neural network, the generative network in a generative adversarial network, and the like.
In step S103, the sub-neural networks corresponding to the multiple sub target scale ranges are fused to obtain the neural network.
In this embodiment, the sub-neural networks corresponding to the sub target scale ranges have identical parameters, and the sub-neural networks can be fused according to these identical parameters to obtain the neural network for face detection.
According to the neural network generation method of this embodiment, a target scale range is divided into multiple sub target scale ranges, the sub-neural network corresponding to each sub target scale range is determined, and the sub-neural networks corresponding to the sub target scale ranges are then fused to obtain a neural network for face detection. Compared with prior-art neural networks based on scale invariance, the neural network generated in this embodiment can detect faces in face images with a very large range of face sizes, and improves both the face detection accuracy and the face detection efficiency in face images.
The neural network generation method of this embodiment may be performed by any suitable device with data processing capability, including but not limited to terminal devices and servers.
Embodiment two
Fig. 2 is a flowchart of a neural network generation method according to Embodiment 2 of the present invention.
Referring to Fig. 2, in step S201, the target scale range is divided based on characteristic points of faces to obtain the multiple sub target scale ranges.
In a specific embodiment, assume the target scale range is [n, m]. The target scale range could be divided into m-n+1 sub target scale ranges, where n and m are integers greater than or equal to 1 and n is much smaller than m. However, such a division produces far too many sub target scale ranges, and correspondingly far too many sub-neural networks, leading to highly redundant sub-neural networks. Moreover, facial appearance does not vary uniformly with face size, so this approach is clearly undesirable. According to the observations of the inventors of the present application, small faces (smaller than 40 pixels) lose most of their facial details, and such small faces are characterized mainly by rigid structure and context. Medium-sized faces (40 pixels to 140 pixels) vary widely, because such faces are usually not the photographer's main subject and therefore appear in many different poses and gaze directions. Large faces (larger than 140 pixels) usually show little variation, because they are the photographer's main subject when the image is captured and are generally in frontal or profile poses. The target scale range can therefore be divided into a small range, a medium range, and a large range according to the characteristic points of faces of different sizes, and this division also takes the network structure into account. In this way, the face detection accuracy in face images can be improved.
Optionally, dividing the target scale range to obtain the multiple sub target scale ranges includes: uniformly dividing the target scale range into multiple sub-ranges; for each sub-range, randomly extracting from the training data set of the neural network multiple face samples that fall within the current sub-range; and determining the multiple sub target scale ranges according to feature values of the face samples in each sub-range. Determining the multiple sub target scale ranges according to the feature values of the face samples in each sub-range includes: for each face sample in each sub-range, generating a color histogram of the face sample based on its pixel values; for each sub-range, computing the chi-square distance between the color histograms of every two face samples in the sub-range to obtain a face-sample distance matrix of the sub-range; generating an appearance-variation feature value of the sub-range from the face-sample distance matrix; and determining the multiple sub target scale ranges according to the appearance-variation feature values of the sub-ranges. Here, for each sub-range, the pixel values of each face sample in the sub-range are counted, and the color histogram of each face sample is obtained from these pixel values. Optionally, before generating the color histogram of a face sample based on its pixel values, the method further includes: for each face sample in each sub-range, scaling the size of the face sample according to the lower-bound value of the current sub-range; specifically, each face sample in the current sub-range is scaled to the lower-bound size of the current sub-range.
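The histogram and distance computations described above might be sketched as follows. It is an illustrative numpy implementation under assumed conventions (per-channel 256-bin histograms, the Frobenius norm of the pairwise distance matrix as the appearance-variation feature value); the patent itself does not fix these details.

import numpy as np

def color_histogram(face: np.ndarray, bins: int = 256) -> np.ndarray:
    """Per-channel color histogram of a face sample (H x W x 3), normalized to sum to 1."""
    hists = [np.histogram(face[..., c], bins=bins, range=(0, 256))[0]
             for c in range(face.shape[-1])]
    h = np.concatenate(hists).astype(np.float64)
    return h / max(h.sum(), 1.0)

def chi_square_distance(h1: np.ndarray, h2: np.ndarray, eps: float = 1e-10) -> float:
    """Chi-square distance between two color histograms."""
    return 0.5 * float(np.sum((h1 - h2) ** 2 / (h1 + h2 + eps)))

def appearance_variation(face_samples) -> float:
    """Appearance-variation feature value of one sub-range: the norm of the
    pairwise chi-square distance matrix of its face samples (Frobenius norm assumed)."""
    hists = [color_histogram(f) for f in face_samples]
    n = len(hists)
    dist = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            dist[i, j] = dist[j, i] = chi_square_distance(hists[i], hists[j])
    return float(np.linalg.norm(dist))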
In a specific embodiment, the target scale range is divided into k (k >= 100) parts. For each sub-range, s (s >= 300) face samples falling within that sub-range are randomly extracted from the training data set of the neural network to be generated, and each face sample is scaled to the lower-bound size of the sub-range. For each face sample, a color histogram is obtained from its pixel values; this color histogram serves as the feature describing the face sample. For each sub-range, the chi-square distances between the color histograms of every pair of face samples in the sub-range are computed, and the norm of the resulting distance matrix is taken as the feature value describing the magnitude of appearance variation. The appearance-variation feature value of every sub-range is computed in this manner, and these feature values are normalized to the range [0, 1] and output in the form of a histogram. The appearance-variation feature values are then divided: for example, if b sub target scale ranges are desired, the appearance-variation feature values can be divided into b parts, and the sub-range boundaries of each part are output, yielding the sub target scale ranges.
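One plausible reading of the partition step is sketched below: the k appearance-variation feature values are normalized to [0, 1], their cumulative sum is split into b roughly equal shares, and the pixel boundaries of each share become one sub target scale range. The function name and the equal-share rule are assumptions for illustration, not the patent's prescribed procedure.

import numpy as np

def divide_target_scale_range(lower, upper, variation, b):
    """Split [lower, upper] into b sub target scale ranges.

    variation[i] is the appearance-variation feature value of the i-th of k uniform
    sub-ranges. The values are normalized to [0, 1]; the cumulative variation is then
    split into b equal shares, and the pixel boundaries of each share are output."""
    k = len(variation)
    v = np.asarray(variation, dtype=float) - np.min(variation)
    v = v / max(np.max(variation) - np.min(variation), 1e-10)
    cum = np.cumsum(v) / max(v.sum(), 1e-10)          # cumulative share in [0, 1]
    edges_px = np.linspace(lower, upper, k + 1)       # pixel bounds of the k uniform parts
    cuts = [int(np.searchsorted(cum, t)) + 1 for t in np.linspace(0, 1, b + 1)[1:-1]]
    bounds = [lower] + [int(round(edges_px[i])) for i in cuts] + [upper]
    return list(zip(bounds[:-1], bounds[1:]))

# Example: k = 100 uniform parts over [10, 1300], b = 3 output sub target scale ranges.
rng = np.random.default_rng(0)
sub_ranges = divide_target_scale_range(10, 1300, rng.random(100), b=3)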
In step S202, the sub-neural network corresponding to each of the multiple sub target scale ranges is determined.
In this embodiment, determining the sub-neural network corresponding to each of the multiple sub target scale ranges includes: for each sub target scale range, determining the pooling downsampling stride corresponding to the sub target scale range, and determining the sub-neural network corresponding to the sub target scale range based on the pooling downsampling stride. Specifically, the pooling downsampling stride of the convolutional layer of the sub-neural network corresponding to a sub target scale range is determined according to that sub target scale range, such that the absolute differences between the upper and lower bounds of the scale sub-range, obtained after the sub target scale range is mapped through the convolutional layer of the corresponding sub-neural network, and the preconfigured template size of the spatial pooling operation used for feature extraction are each smaller than a preset value. In this way, the detection performance for faces in face images can be improved.
The reason for requiring the mapped upper and lower bounds of the scale sub-range to differ from the preconfigured template size by less than the preset value is that this ensures the mapped bounds are neither much larger nor much smaller than the template size, so that the scale sub-range obtained after the sub target scale range is mapped through the convolutional layer of the corresponding sub-neural network is essentially consistent with the preconfigured template size, thereby improving detection performance. Those skilled in the art can obtain the preset value through experiments; the preset value is generally 2.
Specifically, the template size refers to the size of the fixed-size face features obtained from face features of different sizes through the spatial pyramid pooling operation. The template size not only defines the size of the face features after this pooling operation, but also determines the convolutional layer of the sub-neural network from which face detection features are extracted, for example c*c (c = 3 or c = 5). Assume a sub target scale range is [a, b] and the pooling downsampling stride of the convolutional layer of the sub-neural network is n; then the scale sub-range obtained after the sub target scale range is mapped through the convolutional layer of the corresponding sub-neural network is [a/n, b/n]. After the pooling downsampling stride of the convolutional layer of the sub-neural network corresponding to a sub target scale range is determined, the sub-neural network corresponding to that sub target scale range can be obtained; the structure of the sub-neural network includes the convolutional layer and all network layers before it.
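Under the mapping rule above, choosing a pooling downsampling stride for a sub target scale range [a, b] could look like the sketch below. The candidate strides and the fallback rule are illustrative assumptions; only the criterion that the mapped bounds a/n and b/n stay close to the template size comes from the text.

def choose_pooling_stride(a, b, template=5, candidates=(4, 8, 16, 32), preset=2.0):
    """Pick a pooling downsampling stride n so that the mapped scale sub-range
    [a/n, b/n] stays close to the template size (a c*c template has size c).

    Returns the first candidate satisfying the |bound - template| < preset criterion,
    or otherwise the candidate whose mapped bounds are closest to the template size."""
    def deviation(n):
        return max(abs(a / n - template), abs(b / n - template))

    for n in candidates:
        if deviation(n) < preset:
            return n
    return min(candidates, key=deviation)

# Small faces of 10-40 pixels with a 5*5 template: stride 8 keeps the mapped
# range closest to the template size, consistent with the example that follows.
stride = choose_pooling_stride(10, 40, template=5)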
For example, when the preconfigured template size is 5*5, a sub-neural network with a pooling downsampling stride of 8 achieves the best detection performance for small faces (10 pixels to 40 pixels), because after projection the scale sub-range of small faces on the feature map is about 2 pixels to 5 pixels, which is close to the preconfigured template size.
In step S203, the sub-neural networks corresponding to the multiple sub target scale ranges are fused to obtain the neural network.
Specifically, this step includes: fusing the sub-neural networks corresponding to the multiple sub target scale ranges based on shared network parameters to obtain the neural network. That is, the sub-neural networks corresponding to the sub target scale ranges are fused, and the neural network for face detection is obtained from the fused sub-neural networks, where all sub-neural networks share identical network parameters. This reduces the redundancy of the sub-neural networks corresponding to the sub target scale ranges. In a specific embodiment, the shared network parameters do not include the pooling downsampling stride.
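A minimal PyTorch sketch of this fusion idea is given below, assuming a toy convolutional trunk: the branches share the trunk's weights and differ only in their pooling downsampling stride and detection head. It illustrates parameter sharing only and is not the patent's actual architecture.

import torch
import torch.nn as nn

class FusedFaceDetector(nn.Module):
    """Branches share one convolutional trunk; only pooling stride and head differ."""
    def __init__(self, strides=(8, 16, 32)):
        super().__init__()
        # shared network parameters: a small convolutional trunk used by every branch
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        # not shared: each sub target scale range has its own pooling downsampling stride
        self.pools = nn.ModuleList(nn.MaxPool2d(kernel_size=s, stride=s) for s in strides)
        # one lightweight detection head per branch (assumed; 4 box coordinates + 1 score)
        self.heads = nn.ModuleList(nn.Conv2d(64, 5, 1) for _ in strides)

    def forward(self, x):
        feat = self.trunk(x)  # computed once; its parameters are shared by all branches
        return [head(pool(feat)) for pool, head in zip(self.pools, self.heads)]

# One output map per scale branch for a 256 x 256 input image.
outputs = FusedFaceDetector()(torch.randn(1, 3, 256, 256))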
In this embodiment, after the neural network for face detection is generated according to steps S201-S203, the generated neural network is trained end to end. Specifically, positive and negative samples are drawn from the training sample set of the neural network according to the divided sub target scale ranges, and the positive and negative samples of the different sub target scale ranges are used to train the detection branches for the different sub target scale ranges within the single fused neural network. The errors of the detection branches for the different sub target scale ranges are computed in parallel, and the network is trained by gradient backpropagation. The training supervision may come from, but is not limited to, face boxes, facial keypoints, and face attributes. Specifically, for each sub target scale range, positive and negative samples within that sub target scale range are extracted from the training sample set. The negative samples come from two sources: one part comes from pure background, i.e., samples that do not overlap with any annotated face in any sub target scale range; the other part comes from samples that overlap only slightly with annotated faces in the current sub target scale range, for example with an overlap smaller than 1/3 of the union. These two kinds of negative samples are added to the final negative sample set with equal probability. Positive samples come from samples that overlap substantially with annotated faces in the current sub target scale range, for example with an overlap larger than 1/2 of the union. Samples that mostly overlap with annotated faces outside the current sub target scale range are ignored.
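The sampling rule above could be sketched as follows, taking the overlap measure to be intersection over union and using the 1/3 and 1/2 thresholds quoted in the text; the simplified handling of faces outside the current sub target scale range is an assumption.

def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda bx: (bx[2] - bx[0]) * (bx[3] - bx[1])
    union = area(box_a) + area(box_b) - inter
    return inter / union if union > 0 else 0.0

def assign_sample(sample_box, faces_in_range, faces_outside_range):
    """Label one candidate box for the current sub target scale range.

    Positives overlap an in-range annotated face by more than 1/2 of the union;
    negatives are pure background or overlap in-range faces by less than 1/3;
    boxes that mostly overlap a face outside the range are ignored."""
    best_in = max((iou(sample_box, f) for f in faces_in_range), default=0.0)
    best_out = max((iou(sample_box, f) for f in faces_outside_range), default=0.0)
    if best_out > 0.5:            # mostly overlaps a face outside the sub target scale range
        return "ignore"
    if best_in > 0.5:
        return "positive"
    if best_in < 1.0 / 3.0:
        # the two negative sources (pure background vs. slight overlap) are later
        # mixed into the final negative set with equal probability
        return "negative"
    return "ignore"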
Fig. 3 is a schematic diagram of a concrete scenario applying the method embodiment of Fig. 2. As shown in Fig. 3, which depicts the full face detection pipeline, this architecture contains three sub-neural networks with different spatial pooling strides and depths, and these three sub-neural networks are fused into a single backbone network through shared parameters. The single backbone network then has the same structure as ResNet-50. Region-of-interest pooling can be used to extract face features of face samples from the last layer of each block of the backbone network (for example, res2x, res3x, res4x, res5x), and these extracted face features are used to train a region proposal network (RPN) and a Faster R-CNN. The target scale detection range of the backbone network is 10 pixels to 1300 pixels; the sub target scale detection ranges of the three sub-neural networks are 10 to 40 pixels, 40 to 140 pixels, and 140 to 1300 pixels, respectively; and the template size of the spatial pooling operation is 5*5. When a face image is input into the backbone network, the face features of the last layer of block res2x (stride 4) and of the last layer of block res3x (stride 8) can be extracted, and faces 1 whose sizes fall within 10 to 40 pixels are located in the face image according to these face features; the face features of the last layer of block res3x (stride 8) and of the last layer of block res4x (stride 16) can be extracted, and faces 2 whose sizes fall within 40 to 140 pixels are located according to these face features; and the face features of the last layer of block res4x (stride 16) and of the last layer of block res5x (stride 32) can be extracted, and faces 3 whose sizes fall within 140 to 1300 pixels are located according to these face features. In this figure, multi-scale features are used for face detection. Of course, face detection can also be performed without multi-scale features: for example, block res2x can be removed from the backbone network, the face features of the last layer of block res3x (stride 8) can be extracted directly and used to locate faces of 10 to 40 pixels, the face features of the last layer of block res4x (stride 16) can be extracted directly and used to locate faces of 40 to 140 pixels, and the face features of the last layer of block res5x (stride 32) can be extracted directly and used to locate faces of 140 to 1300 pixels. In specific experiments, face detection with multi-scale features performs slightly better than face detection without multi-scale features.
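A simplified sketch of the routing idea in Fig. 3 is given below, using torchvision's ResNet-50 stages as stand-ins for res2x to res5x and the single-stage (non-multi-scale) routing variant described above. The class and method names are illustrative assumptions; the RPN, Faster R-CNN, and RoI pooling stages are omitted.

import torch
import torchvision

class MultiScaleBackbone(torch.nn.Module):
    """Simplified stand-in for the Fig. 3 backbone: ResNet-50 stages res2x to res5x."""
    # sub target scale ranges and the stage whose stride is used in the
    # non-multi-scale routing variant described in the text
    ROUTES = {(10, 40): "res3x", (40, 140): "res4x", (140, 1300): "res5x"}

    def __init__(self):
        super().__init__()
        r = torchvision.models.resnet50()  # randomly initialized trunk, for illustration
        self.stem = torch.nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool)
        self.res2x, self.res3x, self.res4x, self.res5x = r.layer1, r.layer2, r.layer3, r.layer4

    def forward(self, image):
        feats = {}
        x = self.stem(image)
        feats["res2x"] = x = self.res2x(x)  # stride 4
        feats["res3x"] = x = self.res3x(x)  # stride 8
        feats["res4x"] = x = self.res4x(x)  # stride 16
        feats["res5x"] = x = self.res5x(x)  # stride 32
        return feats

    @classmethod
    def stage_for_face(cls, face_size):
        for (lo, hi), stage in cls.ROUTES.items():
            if lo <= face_size <= hi:
                return stage
        raise ValueError("face size outside the 10-1300 pixel target scale range")

feats = MultiScaleBackbone()(torch.randn(1, 3, 224, 224))  # feature maps for all four stages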
According to the neural network generation method of this embodiment, the target scale range is divided based on characteristic points of faces to obtain multiple sub target scale ranges, the sub-neural network corresponding to each sub target scale range is determined according to that sub target scale range, and the sub-neural networks corresponding to the sub target scale ranges are then fused using shared parameters to obtain a neural network for face detection. Compared with prior-art neural networks based on scale invariance, the neural network generated in this embodiment can detect faces in face images with a very large range of face sizes and improves face detection performance, improving both the face detection accuracy and the face detection efficiency in face images.
The neural network generation method of this embodiment may be performed by any suitable device with data processing capability, including but not limited to terminal devices and servers.
Embodiment three
Fig. 4 is a flowchart of a face detection method according to Embodiment 3 of the present invention.
Referring to Fig. 4, in step S301, a face image to be detected is scaled to obtain a scaled face image.
Since the size of the face image to be detected is arbitrary, it may not fall within the target scale range of the neural network used for face detection. In that case, the face image to be detected needs to be scaled so that its size falls within the target scale range of the neural network.
In step S302, the scaled face image is detected with the neural network to obtain a face detection result of the face image to be detected.
Here, the neural network is generated according to the method described in Embodiment 1 or Embodiment 2 above. The face detection result includes the size information and position information of the faces in the face image.
The exemplary embodiment of the present invention is directed to a face detection method: a face image to be detected is scaled to obtain a scaled face image, and the scaled face image is detected with a neural network to obtain the face detection result of the face image to be detected. Compared with prior-art face detection methods, this method can detect faces in face images with a very large range of face sizes, and improves both the face detection accuracy and the face detection efficiency in face images.
The face detection method of this embodiment may be performed by any suitable device with data processing capability, including but not limited to terminal devices and servers.
Embodiment four
Fig. 5 is a flowchart of a face detection method according to Embodiment 4 of the present invention.
Referring to Fig. 5, in step S401, the face image to be detected is scaled according to the upper bound of the target scale range of the neural network.
Specifically, the face image to be detected is scaled according to the upper bound of the target scale range of the neural network so that the size of the long side of the scaled face image is smaller than or equal to the upper bound of the target scale range of the neural network.
The face image to be detected may be rectangular or square; as long as the size of the long side of the scaled face image is smaller than or equal to the upper bound of the target scale range of the neural network, the scaled face image falls within the target scale range of the neural network. Specifically, the face image to be detected may be a captured still image, a video image in a video frame sequence, a synthesized image, and so on. The video frame sequence may be a video frame sequence in a live stream.
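A minimal sketch of the scaling rule in step S401 is given below; the helper name and the choice to leave already-small images unchanged are assumptions.

def scale_for_detection(width, height, upper_bound=1300):
    """Scaled size of the face image to be detected so that its long side does not
    exceed the upper bound of the neural network's target scale range. Images whose
    long side already fits are left unchanged (an assumption)."""
    factor = min(1.0, upper_bound / max(width, height))
    return round(width * factor), round(height * factor), factor

# Example: a 4000 x 3000 photo is scaled so its long side becomes 1300 pixels.
new_w, new_h, factor = scale_for_detection(4000, 3000)
assert max(new_w, new_h) <= 1300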
In step S402, the scaled face image is detected with the neural network to obtain a face detection result of the face image to be detected.
Since this step is identical to step S302 in Embodiment 3 above, details are not repeated here.
The face detection method of the embodiments of the present invention has important applications, such as video surveillance in security systems, the autofocus systems of mobile phones and digital cameras, electronic albums, and face recognition systems. It can also help improve the effectiveness of face tracking, face recognition, and facial keypoint detection.
The exemplary embodiment of the present invention is directed to a face detection method: the face image to be detected is scaled according to the upper bound of the target scale range of the neural network so that the size of the long side of the scaled face image is smaller than or equal to that upper bound, and the scaled face image is then detected with the neural network to obtain the face detection result of the face image to be detected. Compared with prior-art face detection methods, this method can detect faces in face images with a very large range of face sizes, and improves both the face detection accuracy and the face detection efficiency in face images.
The face detection method of this embodiment may be performed by any suitable device with data processing capability, including but not limited to terminal devices and servers.
Embodiment five
Based on the same technical concept, Fig. 6 is a structural block diagram of a neural network generation apparatus according to Embodiment 5 of the present invention. It may be used to perform the neural network generation method flow described in Embodiment 1.
Referring to Fig. 6, the neural network generation apparatus includes a division module 501, a determining module 502, and a fusion module 503.
The division module 501 is configured to divide a target scale range to obtain multiple sub target scale ranges, where the target scale range is used by a neural network for face detection.
The determining module 502 is configured to determine the sub-neural network corresponding to each of the multiple sub target scale ranges.
The fusion module 503 is configured to fuse the sub-neural networks corresponding to the multiple sub target scale ranges to obtain the neural network.
With the neural network generation apparatus provided in this embodiment, a target scale range is divided into multiple sub target scale ranges, the sub-neural network corresponding to each sub target scale range is determined, and the sub-neural networks corresponding to the sub target scale ranges are then fused to obtain a neural network for face detection. Compared with prior-art neural networks based on scale invariance, the neural network generated in this embodiment can detect faces in face images with a very large range of face sizes, and improves both the face detection accuracy and the face detection efficiency in face images.
Embodiment six
Based on the same technical concept, Fig. 7 is a structural block diagram of a neural network generation apparatus according to Embodiment 6 of the present invention. It may be used to perform the neural network generation method flow described in Embodiment 2.
Referring to Fig. 7, the neural network generation apparatus includes a division module 601, a determining module 602, and a fusion module 603. The division module 601 is configured to divide a target scale range to obtain multiple sub target scale ranges, where the target scale range is used by a neural network for face detection; the determining module 602 is configured to determine the sub-neural network corresponding to each of the multiple sub target scale ranges; and the fusion module 603 is configured to fuse the sub-neural networks corresponding to the multiple sub target scale ranges to obtain the neural network.
Optionally, the division module 601 includes: a first division unit 6011, configured to uniformly divide the target scale range into multiple sub-ranges; an extraction unit 6012, configured to, for each sub-range, extract from a training data set multiple face samples that fall within the current sub-range; and a first determining unit 6013, configured to determine the multiple sub target scale ranges according to feature values of the face samples in each sub-range.
Optionally, the first determining unit 6013 is specifically configured to: for each face sample in each sub-range, generate a color histogram of the face sample based on its pixel values; for each sub-range, compute the chi-square distance between the color histograms of every two face samples in the sub-range to obtain a face-sample distance matrix of the sub-range; generate an appearance-variation feature value of the sub-range from the face-sample distance matrix; and determine the multiple sub target scale ranges according to the appearance-variation feature values of the sub-ranges.
Optionally, the first determining unit 6013 is further configured to: before generating the color histogram of a face sample based on its pixel values, scale the size of each face sample in each sub-range according to the lower-bound value of the current sub-range.
Optionally, the division module 601 further includes: a second division unit 6014, configured to divide the target scale range based on characteristic points of faces to obtain the multiple sub target scale ranges.
Optionally, the determining module 602 includes: a second determining unit 6021, configured to, for each sub target scale range, determine the pooling downsampling stride corresponding to the sub target scale range, and determine the sub-neural network corresponding to the sub target scale range based on the pooling downsampling stride.
Optionally, the fusion module 603 includes: a fusion unit 6031, configured to fuse the sub-neural networks corresponding to the multiple sub target scale ranges based on shared network parameters to obtain the neural network.
It should be noted that further details of the neural network generation apparatus provided in the embodiments of the present invention have been described in detail in the neural network generation method provided in the embodiments of the present invention and are not repeated here.
Embodiment seven
Based on the same technical concept, Fig. 8 is a structural block diagram of a face detection apparatus according to Embodiment 7 of the present invention. It may be used to perform the face detection method flow described in Embodiment 3.
Referring to Fig. 8, the face detection apparatus includes a scaling module 701 and a detection module 702.
The scaling module 701 is configured to scale a face image to be detected to obtain a scaled face image.
The detection module 702 is configured to detect the scaled face image with a neural network to obtain a face detection result of the face image to be detected, where the neural network is generated by the apparatus described in Embodiment 5 or Embodiment 6.
With the face detection apparatus provided in this embodiment, a face image to be detected is scaled to obtain a scaled face image, and the scaled face image is detected with a neural network to obtain the face detection result of the face image to be detected. Compared with prior-art face detection methods, this apparatus can detect faces in face images with a very large range of face sizes, and improves both the face detection accuracy and the face detection efficiency in face images.
Embodiment eight
Based on the same technical concept, Fig. 9 is a structural block diagram of a face detection apparatus according to Embodiment 8 of the present invention. It may be used to perform the face detection method flow described in Embodiment 4.
Referring to Fig. 9, the face detection apparatus includes a scaling module 801 and a detection module 802. The scaling module 801 is configured to scale a face image to be detected to obtain a scaled face image; the detection module 802 is configured to detect the scaled face image with a neural network to obtain a face detection result of the face image to be detected, where the neural network is generated by the apparatus described in Embodiment 5 or Embodiment 6.
Optionally, the scaling module 801 includes: a scaling unit 8011, configured to scale the face image to be detected according to the upper bound of the target scale range of the neural network.
Optionally, the face image to be detected is a still image or a video image in a video frame sequence.
Optionally, the video frame sequence is a video frame sequence in a live stream.
It should be noted that further details of the face detection apparatus provided in the embodiments of the present invention have been described in detail in the face detection method provided in the embodiments of the present invention and are not repeated here.
Embodiment nine
The embodiments of the present invention further provide an electronic device, which may be, for example, a mobile terminal, a personal computer (PC), a tablet computer, or a server. Referring to Fig. 10, it shows a structural block diagram of an electronic device 900 suitable for implementing a terminal device or a server of the embodiments of the present invention. As shown in Fig. 10, the electronic device 900 includes one or more first processors, a first communication element, and so on. The one or more first processors are, for example, one or more central processing units (CPU) 901 and/or one or more graphics processors (GPU) 913. The first processor may perform various appropriate actions and processing according to executable instructions stored in a read-only memory (ROM) 902 or loaded from a storage section 908 into a random access memory (RAM) 903. In this embodiment, the first read-only memory 902 and the random access memory 903 are collectively referred to as the first memory. The first communication element includes a communication component 912 and/or a communication interface 909. The communication component 912 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (InfiniBand) network card; the communication interface 909 includes the communication interface of a network card such as a LAN card or a modem, and the communication interface 909 performs communication processing via a network such as the Internet.
The first processor may communicate with the read-only memory 902 and/or the random access memory 903 to execute the executable instructions, is connected to the communication component 912 through the first communication bus 904, and communicates with other target devices through the communication component 912, thereby completing the operations corresponding to any neural network generation method provided by the embodiments of the present invention, for example: dividing a target scale range to obtain multiple sub target scale ranges, where the target scale range is used by a neural network for face detection; determining the sub-neural network corresponding to each of the multiple sub target scale ranges; and fusing the sub-neural networks corresponding to the multiple sub target scale ranges to obtain the neural network.
In addition, the RAM 903 may also store various programs and data needed for the operation of the apparatus. The CPU 901 or GPU 913, the ROM 902, and the RAM 903 are connected to each other through the first communication bus 904. When the RAM 903 is present, the ROM 902 is an optional module. The RAM 903 stores executable instructions, or executable instructions are written into the ROM 902 at runtime, and the executable instructions cause the first processor to perform the operations corresponding to the above method. An input/output (I/O) interface 905 is also connected to the first communication bus 904. The communication component 912 may be integrated, or may be configured with multiple sub-modules (for example, multiple IB network cards) linked on the communication bus.
The I/O interface 905 is connected to the following components: an input section 906 including a keyboard, a mouse, and the like; an output section 907 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like; a storage section 908 including a hard disk and the like; and a communication interface 909 including a network card such as a LAN card or a modem. A driver 910 is also connected to the I/O interface 905 as needed. A removable medium 911, such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, is mounted on the driver 910 as needed, so that a computer program read from it can be installed into the storage section 908 as needed.
It should be noted that the architecture shown in Fig. 10 is only one optional implementation. In practice, the number and types of the components in Fig. 10 may be selected, deleted, added, or replaced according to actual needs. Different functional components may be configured separately or integrated: for example, the GPU and the CPU may be configured separately, or the GPU may be integrated on the CPU; the communication element may be configured separately, or may be integrated on the CPU or the GPU; and so on. These alternative implementations all fall within the protection scope of the present invention.
In particular, according to the embodiments of the present invention, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present invention includes a computer program product, which includes a computer program tangibly embodied on a machine-readable medium. The computer program includes program code for performing the method shown in the flowchart, and the program code may include instructions corresponding to the method steps provided by the embodiments of the present invention, for example: dividing a target scale range to obtain multiple sub target scale ranges, where the target scale range is used by a neural network for face detection; determining the sub-neural network corresponding to each of the multiple sub target scale ranges; and fusing the sub-neural networks corresponding to the multiple sub target scale ranges to obtain the neural network. In such an embodiment, the computer program may be downloaded and installed from a network through the communication element and/or installed from the removable medium 911. When the computer program is executed by the first processor, the functions defined in the method of the embodiment of the present invention are performed.
Embodiment ten
An embodiment of the present invention further provides an electronic device, which may be, for example, a mobile terminal, a personal computer (PC), a tablet computer, or a server. Referring now to Figure 11, it shows a schematic structural diagram of an electronic device 1000 suitable for implementing a terminal device or a server according to an embodiment of the present invention. As shown in Figure 11, the electronic device 1000 includes one or more second processors, a second communication device, and the like. The one or more second processors are, for example, one or more central processing units (CPUs) 1001 and/or one or more graphics processing units (GPUs) 1013. The second processor may perform various appropriate actions and processes according to executable instructions stored in a read-only memory (ROM) 1002 or executable instructions loaded from a storage portion 1008 into a random access memory (RAM) 1003. In this embodiment, the second read-only memory 1002 and the random access memory 1003 are collectively referred to as the second memory. The second communication device includes a communication component 1012 and/or a communication interface 1009. The communication component 1012 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (InfiniBand) network card; the communication interface 1009 includes a communication interface of a network card such as a LAN card or a modem, and performs communication processing via a network such as the Internet.
The second processor may communicate with the read-only memory 1002 and/or the random access memory 1003 to execute the executable instructions, is connected to the communication component 1012 via the second communication bus 1004, and communicates with other target devices through the communication component 1012, thereby completing the operations corresponding to any face detection method provided by the embodiments of the present invention, for example: scaling a face image to be detected to obtain a scaled face image; and detecting the scaled face image through a neural network to obtain a face detection result of the face image to be detected, wherein the neural network is generated according to the method described in the first or second embodiment above.
In addition, the RAM 1003 may also store various programs and data required for device operation. The CPU 1001 or GPU 1013, the ROM 1002, and the RAM 1003 are connected to one another through the second communication bus 1004. Where the RAM 1003 is present, the ROM 1002 is an optional module. The RAM 1003 stores executable instructions, or executable instructions are written into the ROM 1002 at runtime, and the executable instructions cause the second processor to perform the operations corresponding to the above communication method. The input/output (I/O) interface 1005 is also connected to the second communication bus 1004. The communication component 1012 may be integrally disposed, or may be provided with multiple sub-modules (for example, multiple IB network cards) linked on the communication bus.
The following components are connected to the I/O interface 1005: an input portion 1006 including a keyboard, a mouse, and the like; an output portion 1007 including a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, as well as a speaker; a storage portion 1008 including a hard disk and the like; and a communication interface 1009 including a network card such as a LAN card or a modem. A driver 1010 is also connected to the I/O interface 1005 as needed. A removable medium 1011, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the driver 1010 as needed, so that a computer program read therefrom can be installed into the storage portion 1008 as needed.
It should be noted that the architecture shown in Figure 11 is only one optional implementation. In concrete practice, the number and types of the components in Figure 11 may be selected, deleted, added, or replaced according to actual needs. Different functional components may be provided separately or integrally; for example, the GPU and the CPU may be provided separately, or the GPU may be integrated on the CPU, and the communication device may be provided separately or integrated on the CPU or the GPU. These alternative implementations all fall within the protection scope of the present invention.
In particular, according to an embodiment of the present invention, the process described above with reference to the flowchart may be implemented as a computer software program. For example, an embodiment of the present invention includes a computer program product comprising a computer program tangibly embodied on a machine-readable medium. The computer program includes program code for performing the method shown in the flowchart, and the program code may include instructions corresponding to the method steps provided by the embodiments of the present invention, for example: scaling a face image to be detected to obtain a scaled face image; and detecting the scaled face image through a neural network to obtain a face detection result of the face image to be detected, wherein the neural network is generated according to the method described in the first or second embodiment above. In such an embodiment, the computer program may be downloaded and installed from a network through the communication device and/or installed from the removable medium 1011. When the computer program is executed by the second processor, the functions defined above in the methods of the embodiments of the present invention are performed.
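For illustration only, a face detection flow matching the two steps just described (scale the face image to be detected, then detect faces in the scaled image through the neural network) could be sketched as follows in Python. The use of OpenCV for image loading and resizing, the fixed scale factor, and a network object exposing a detect method are assumptions added here, not details taken from the patent.

```python
import cv2  # assumed dependency for image I/O and resizing

def detect_faces(image_path, network, scale=0.5):
    """Scale the face image to be detected, then run the (fused) detector on it."""
    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    # Step 1: scale the face image to be detected.
    scaled = cv2.resize(image, None, fx=scale, fy=scale, interpolation=cv2.INTER_LINEAR)
    # Step 2: detect faces in the scaled image with the neural network.
    detections = network.detect(scaled)
    # Map detected boxes back to the original image coordinates.
    return [tuple(coord / scale for coord in box) for box in detections]
```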
The methods, apparatuses, and devices of the present invention may be implemented in many ways. For example, the methods, apparatuses, and devices of the embodiments of the present invention may be implemented by software, hardware, firmware, or any combination of software, hardware, and firmware. The above order of the method steps is merely for illustration, and the steps of the methods of the embodiments of the present invention are not limited to the order specifically described above, unless otherwise specifically stated. In addition, in some embodiments, the present invention may also be implemented as programs recorded on a recording medium, these programs including machine-readable instructions for implementing the methods according to the embodiments of the present invention. Thus, the present invention also covers a recording medium storing a program for executing the methods according to the embodiments of the present invention.
The description of the embodiments of the present invention is provided for the purposes of illustration and description, and is not intended to be exhaustive or to limit the present invention to the disclosed forms; many modifications and variations will be apparent to those of ordinary skill in the art. The embodiments were selected and described in order to better explain the principles and practical applications of the present invention, and to enable those of ordinary skill in the art to understand the present invention so as to design various embodiments with various modifications suited to particular uses.

Claims (10)

1. A method for generating a neural network, characterized in that the method comprises:
dividing a target scale range to obtain multiple sub-target scale ranges, wherein the target scale range is used by the neural network for face detection;
determining sub-neural networks respectively corresponding to the multiple sub-target scale ranges; and
fusing the sub-neural networks respectively corresponding to the multiple sub-target scale ranges to obtain the neural network.
2. The method according to claim 1, characterized in that the dividing of the target scale range to obtain the multiple sub-target scale ranges comprises:
uniformly dividing the target scale range into multiple subranges;
for each subrange, extracting, from a training data set, multiple face samples falling within the current subrange; and
determining the multiple sub-target scale ranges according to characteristic values of the face samples in each subrange.
3. The method according to claim 2, characterized in that the determining of the multiple sub-target scale ranges according to the characteristic values of the face samples in each subrange comprises:
for each face sample in each subrange, generating a color histogram of the face sample based on the pixel values of the face sample;
for each subrange, respectively computing the chi-square distance between the color histograms of every two face samples in the subrange to obtain a face sample distance matrix of the subrange;
generating an appearance variation characteristic value of the subrange according to the face sample distance matrix; and
determining the multiple sub-target scale ranges according to the appearance variation characteristic value of each subrange.
4. The method according to claim 3, characterized in that, before the generating, for each face sample in each subrange, of the color histogram of the face sample based on the pixel values of the face sample, the method further comprises:
for each face sample in each subrange, scaling the size of the face sample according to the lower-bound value of the current subrange.
5. The method according to any one of claims 1 to 4, characterized in that the dividing of the target scale range to obtain the multiple sub-target scale ranges comprises:
dividing the target scale range based on feature points of the face to obtain the multiple sub-target scale ranges.
6. A face detection method, characterized in that the method comprises:
scaling a face image to be detected to obtain a scaled face image; and
detecting the scaled face image through a neural network to obtain a face detection result of the face image to be detected,
wherein the neural network is generated according to the method of any one of claims 1 to 5.
7. An apparatus for generating a neural network, characterized in that the apparatus comprises:
a division module, configured to divide a target scale range to obtain multiple sub-target scale ranges, wherein the target scale range is used by the neural network for face detection;
a determination module, configured to determine sub-neural networks respectively corresponding to the multiple sub-target scale ranges; and
a fusion module, configured to fuse the sub-neural networks respectively corresponding to the multiple sub-target scale ranges to obtain the neural network.
8. A face detection apparatus, characterized in that the apparatus comprises:
a scaling module, configured to scale a face image to be detected to obtain a scaled face image; and
a detection module, configured to detect the scaled face image through a neural network to obtain a face detection result of the face image to be detected,
wherein the neural network is generated by the apparatus according to claim 7.
9. An electronic device, characterized in that the device comprises: a first processor, a first memory, a first communication device, and a first communication bus, wherein the first processor, the first memory, and the first communication device complete mutual communication through the first communication bus; and
the first memory is configured to store at least one executable instruction, and the executable instruction causes the first processor to perform operations corresponding to the method for generating a neural network according to any one of claims 1 to 5.
10. An electronic device, characterized in that the device comprises: a second processor, a second memory, a second communication device, and a second communication bus, wherein the second processor, the second memory, and the second communication device complete mutual communication through the second communication bus; and
the second memory is configured to store at least one executable instruction, and the executable instruction causes the second processor to perform operations corresponding to the face detection method according to claim 6.
CN201710277489.0A 2017-04-25 2017-04-25 Neural network generation method, face detection device and electronic equipment Active CN108229281B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710277489.0A CN108229281B (en) 2017-04-25 2017-04-25 Neural network generation method, face detection device and electronic equipment


Publications (2)

Publication Number Publication Date
CN108229281A (en) 2018-06-29
CN108229281B CN108229281B (en) 2020-07-17

Family

ID=62657366

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710277489.0A Active CN108229281B (en) 2017-04-25 2017-04-25 Neural network generation method, face detection device and electronic equipment

Country Status (1)

Country Link
CN (1) CN108229281B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20140286530A1 (en) * 2013-03-22 2014-09-25 Kabushiki Kaisha Toshiba Image processing apparatus, image processing method, and computer program product
US20150347820A1 (en) * 2014-05-27 2015-12-03 Beijing Kuangshi Technology Co., Ltd. Learning Deep Face Representation
CN105510970A (en) * 2016-01-28 2016-04-20 中国石油集团川庆钻探工程有限公司地球物理勘探公司 Method for obtaining seismic facies optimal classification number
CN105975931A (en) * 2016-05-04 2016-09-28 浙江大学 Convolutional neural network face recognition method based on multi-scale pooling

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
DONG CHEN et al.: "Supervised Transformer Network for Efficient Face Detection", Computer Vision and Pattern Recognition *
ZHAOWEI CAI et al.: "A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection", Computer Vision and Pattern Recognition *
LIU, Xiaozhao: "Research on multi-scale segmentation algorithms for texture images based on wavelet transform", China Master's Theses Full-text Database, Information Science and Technology *
WU, Si: "Research on face recognition methods using convolutional neural networks based on multi-scale pooling", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113569796A (en) * 2018-11-16 2021-10-29 北京市商汤科技开发有限公司 Key point detection method and device, electronic equipment and storage medium
CN113591750A (en) * 2018-11-16 2021-11-02 北京市商汤科技开发有限公司 Key point detection method and device, electronic equipment and storage medium
CN111913400A (en) * 2020-07-28 2020-11-10 深圳Tcl新技术有限公司 Information fusion method and device and computer readable storage medium
CN112084886A (en) * 2020-08-18 2020-12-15 眸芯科技(上海)有限公司 Method and device for improving detection performance of neural network target detection
CN112686126A (en) * 2020-12-25 2021-04-20 杭州海康威视数字技术股份有限公司 Face modeling method and device

Also Published As

Publication number Publication date
CN108229281B (en) 2020-07-17

Similar Documents

Publication Publication Date Title
CN108334848B (en) Tiny face recognition method based on generation countermeasure network
Xu et al. Detecting tiny objects in aerial images: A normalized Wasserstein distance and a new benchmark
CN107341805B (en) Background segment and network model training, image processing method and device before image
CN111161275B (en) Method and device for segmenting target object in medical image and electronic equipment
CN106650662B (en) Target object shielding detection method and device
CN108334847A (en) A kind of face identification method based on deep learning under real scene
CN108229490A (en) Critical point detection method, neural network training method, device and electronic equipment
CN108596046A (en) A kind of cell detection method of counting and system based on deep learning
CN109583449A (en) Character identifying method and Related product
CN108229281A (en) The generation method and method for detecting human face of neural network, device and electronic equipment
CN106687989A (en) Method and system of facial expression recognition using linear relationships within landmark subsets
CN108229418B (en) Human body key point detection method and apparatus, electronic device, storage medium, and program
TWI667621B (en) Face recognition method
CN113034495B (en) Spine image segmentation method, medium and electronic device
WO2021238548A1 (en) Region recognition method, apparatus and device, and readable storage medium
CN108734052A (en) character detecting method, device and system
CN109657534A (en) The method, apparatus and electronic equipment analyzed human body in image
CN110532959B (en) Real-time violent behavior detection system based on two-channel three-dimensional convolutional neural network
CN110349167A (en) A kind of image instance dividing method and device
CN109670517A (en) Object detection method, device, electronic equipment and target detection model
CN108875931A (en) Neural metwork training and image processing method, device, system
CN108229494A (en) network training method, processing method, device, storage medium and electronic equipment
CN112819796A (en) Tobacco shred foreign matter identification method and equipment
CN112288699A (en) Method, device, equipment and medium for evaluating relative definition of image
CN107948586A (en) Trans-regional moving target detecting method and device based on video-splicing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant