CN103218600A - Real-time face detection algorithm - Google Patents

Real-time face detection algorithm

Info

Publication number
CN103218600A
CN103218600A CN2013101052206A CN201310105220A
Authority
CN
China
Prior art keywords
face
image
frame
people
row
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013101052206A
Other languages
Chinese (zh)
Other versions
CN103218600B (en)
Inventor
王昆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Changhong Electric Co Ltd
Original Assignee
Sichuan Changhong Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Changhong Electric Co Ltd filed Critical Sichuan Changhong Electric Co Ltd
Priority to CN201310105220.6A priority Critical patent/CN103218600B/en
Publication of CN103218600A publication Critical patent/CN103218600A/en
Application granted granted Critical
Publication of CN103218600B publication Critical patent/CN103218600B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses a real-time face detection algorithm, relates to natural human-computer interaction technology, and aims to provide a face detection method that occupies few system resources and runs fast. The key technical points are as follows: a whole-frame detection step, in which whole-frame face detection is performed on the input image and, if no face information is detected, whole-frame face detection is performed on the next frame until face information is detected; a step of recording the face region obtained by whole-frame detection, in which the position of the face within the frame and the size of the face rectangle are recorded to obtain the face region; a step of performing predicted-position detection, in turn, on the 1st, 2nd, 3rd, and 4th frames following the image in which whole-frame detection found face information; and repetition of these steps to process subsequent images.

Description

Real-time face detection algorithm
Technical field
The present invention relates to natural human-computer interaction technology, and in particular to face detection in computer vision and its real-time implementation.
Background
Natural human-computer interaction and biometric identification technologies are applied ever more widely, and face detection and recognition can greatly improve the user experience. At present, the face detection algorithm with the most stable results and the widest use is the AdaBoost face detection algorithm based on Haar features. The algorithm comprises two parts: training and detection. Training is generally performed offline: a large number of face samples are chosen as positive samples and a large number of non-face images as negative samples. Training generally takes a long time, and the training result comprises a large number of Haar features and weights.
The existing practice is as follows: during face detection, the input image is traversed with a rectangle of fixed size; for each position of the rectangle, the Haar features of the image at that position are selected and computed according to the training result, and used to judge whether the image at that position is a face. The input image is then scaled by a certain ratio and the above detection process is repeated, until the original image has been zoomed to the bounds of the face size set for detection.
Evidently, this method must traverse a multi-scale space of the image and process a large amount of data; as the size of the processed image grows, the computational load rises significantly. Every frame requires a large amount of computation, occupying considerable CPU resources. Even on a general-purpose computer the algorithm can hardly run in real time, and moreover it occupies a large amount of system resources.
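The multi-scale sliding-window traversal described above can be sketched as follows. This is an illustrative sketch, not the patent's method; the window size, step, and scale factor are assumed typical values, and `pyramid_windows` is our own name:

```python
def pyramid_windows(width, height, win=24, step=2, scale=1.25):
    """Enumerate (level, x, y) placements of a win x win detection window
    over an image pyramid in which the image shrinks by `scale` per level,
    as in whole-image Haar/AdaBoost detection."""
    level = 0
    while width >= win and height >= win:
        for y in range(0, height - win + 1, step):
            for x in range(0, width - win + 1, step):
                yield level, x, y
        # shrink the image for the next pyramid level
        width, height = int(width / scale), int(height / scale)
        level += 1
```

Even a modest 640 × 480 frame yields candidate windows on the order of hundreds of thousands across the pyramid, each requiring Haar-feature evaluation; this per-frame cost is what motivates the invention.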
Summary of the invention
The technical problem to be solved by the invention is, in view of the above problems, to provide a face detection method that occupies fewer system resources and runs fast.
The technical solution adopted by the invention comprises the following steps:
Whole-frame detection step: whole-frame face detection is performed on the input image; if no face information is detected, whole-frame face detection is performed on the next frame, until face information is detected;
Step of recording the face region obtained by whole-frame detection: the position of the face within the frame and the size of the face rectangle are recorded to obtain the face region;
A predicted-position detection step is performed, in turn, on the 1st, 2nd, 3rd, and 4th frames following the frame in which whole-frame detection found a face. The predicted-position detection step is: face detection is performed at the predicted position of the image, and the detected face-region position and face-region area are saved. The predicted position corresponds to the position of the face region detected in the previous frame; its range extends each side of the face region detected in the previous frame by 20%, without exceeding the size of the entire image;
The above steps are repeated to process subsequent images.
Preferably, the whole-frame detection step further comprises:
Step 101: skin-color extraction is performed on the image data to obtain a binary skin-color image;
Step 102: column projection and row projection are performed in turn on the binary skin-color image, and adjacent columns and rows are merged to obtain skin-color blocks;
Step 103: face detection is performed on each skin-color block separately.
Preferably, step 101 of the whole-frame detection step comprises:
Step 1011: color-space conversion is performed on the input RGB image to obtain a YUV image;
Step 1012: a single-channel mask image with the same width and height as the input image is created, and all pixel values are initialized to 0;
Step 1013: each pixel of the YUV image is traversed; if the V and U component values of a pixel lie in the ranges [80, 120] and [133, 173] respectively, the value of the corresponding pixel in the single-channel mask image is set to 1, thereby obtaining a binarized mask image.
Preferably, step 102 comprises:
Step 1021: the pixel values of each column of the binarized mask image are summed to obtain a row vector;
Step 1022: the row vector is scanned from left to right and divided into segments, using runs of 5 consecutive values less than 30 as separators; segments of length less than 20 are discarded;
Step 1023: a rectangle is constructed from each remaining segment to obtain a list of sub-mask images. The rectangle for a segment is constructed as follows: the x coordinate of the rectangle's top-left vertex is the position of the segment's starting point in the row vector, the y coordinate of the top-left vertex is 0, the width of the rectangle is the length of the segment, and the height of the rectangle is the height of the input image;
The following operations are performed on each sub-mask image in the list obtained in step 1023:
Step 1024: the pixel values of each row of the sub-mask image are summed to obtain a column vector;
Step 1025: the column vector is scanned from top to bottom and divided into segments, using runs of 5 consecutive values less than 10 as separators; segments of height less than 10 are discarded;
Step 1026: a rectangle is constructed from each remaining segment to obtain a list of skin-color blocks. The rectangle for a segment is constructed as follows: the x coordinate of the rectangle's top-left vertex is the x coordinate of the top-left vertex of the corresponding sub-mask image, the y coordinate of the top-left vertex is the position of the segment's starting point in the column vector, the width of the rectangle is the width of the corresponding sub-mask image, and the height of the rectangle is the length of the segment.
Preferably, face detection is performed on each skin-color block to obtain the face region.
Preferably, the face detection method is the AdaBoost face detection algorithm based on Haar features.
In summary, by adopting the above technical scheme, the invention has the following beneficial effects:
The invention applies a different detection method to each frame. In general, frames 1, 6, 11, ... (one whole-frame detection every 5 frames) are processed by whole-frame detection, while frames 2, 3, 4, 5, 7, 8, 9, 10, ..., i.e. the 4 frames following each whole-frame detection frame, are processed by predicted-position detection. The predicted position is the detection position of the previous frame, with the detected face region enlarged by 20% on each side, without exceeding the size of the entire image. Only the largest face is searched for within this region, so detection is very fast and occupies few system resources.
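The alternating schedule described above can be sketched in Python. This is a structural sketch only: `detect_full_frame` and `detect_in_region` are hypothetical callables standing in for the whole-frame and predicted-position detectors, and the (x, y, w, h) rectangle format is our assumption:

```python
def track_faces(frames, detect_full_frame, detect_in_region):
    """Yield (frame_index, face_rect) following the schedule above:
    whole-frame detection until a face is found, then predicted-position
    detection on the next 4 frames, then the cycle restarts."""
    region = None   # face rectangle from the previous frame, or None
    tracked = 0     # predicted-position frames used in the current cycle
    for i, frame in enumerate(frames):
        if region is None or tracked == 4:
            # whole-frame detection (frames 1, 6, 11, ... once a face is found)
            region, tracked = detect_full_frame(frame), 0
        else:
            # search only inside the previous detection, enlarged 20% per side
            region = detect_in_region(frame, region)
            tracked += 1
        yield i, region
```

With a stub detector that always succeeds, the call pattern over ten frames is one whole-frame pass followed by four predicted-position passes, repeated.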
Description of the drawings
The invention will be illustrated by way of example with reference to the accompanying drawings, in which:
Fig. 1 is a schematic diagram of the conversion from the original input image to the single-channel mask image.
Fig. 2 is a schematic diagram of the conversion from the single-channel mask image to the sub-mask image list.
Fig. 3 is a schematic diagram of the conversion from the sub-mask image list to the skin-color block list.
Fig. 4 is a schematic diagram of the skin-color block list obtained in one embodiment of the invention.
Fig. 5 is a schematic diagram of the predicted-position detection step of the invention.
Detailed description
All features disclosed in this specification, and all steps of any method or process disclosed herein, may be combined in any way, except for mutually exclusive features and/or steps.
Any feature disclosed in this specification (including any accompanying claims, the abstract, and the drawings) may, unless specifically stated otherwise, be replaced by an alternative feature serving an equivalent or similar purpose. That is, unless specifically stated otherwise, each feature is merely one example of a series of equivalent or similar features.
The algorithmic improvement adopted by the invention is to apply a different face detection method to each frame. The specific practice is as follows:
Whole-frame detection step: whole-frame face detection is performed on the input image; if no face information is detected, whole-frame face detection is performed on the next frame, until face information is detected;
Step of recording the face region detected by whole-frame detection: the face-region information comprises the position of the face within the frame and the size of the face rectangle.
Starting from the image in which whole-frame detection found face information, predicted-position detection is performed on each of its 4 subsequent frames in turn.
The predicted-position detection step is: face detection is performed at the predicted position of the image, and the detected face-region position and face-region area are saved. The predicted position corresponds to the position of the face detected in the previous frame; its range extends each side of the face rectangle detected in the previous frame by 20%, without exceeding the size of the entire image. As shown in Fig. 5, in one embodiment of the invention, suppose the top-left vertex of the face rectangle detected in the previous frame has coordinates (x1, y1), with width w1 and height h1; then the detection range in the current frame is the rectangle whose top-left vertex is (x1 - w1 × 0.2, y1 - h1 × 0.2), whose width is w1 × (1 + 2 × 0.2), and whose height is h1 × (1 + 2 × 0.2). If, on any side, the face rectangle enlarged outward by 20% exceeds the image range, the prediction region is bounded by the image border on that side.
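The region expansion and border clipping of the embodiment above can be written as a small helper. The function name and the (x, y, w, h) rectangle format are our assumptions; the 20% margin per side and the clipping to the image boundary follow the text:

```python
def predicted_region(face, img_w, img_h, margin=0.2):
    """Expand a face rectangle (x, y, w, h) by `margin` of its size on every
    side, clipped to the image boundary (the predicted-position range)."""
    x, y, w, h = face
    x0 = max(0, int(x - w * margin))            # left edge moved out by 20% of w
    y0 = max(0, int(y - h * margin))            # top edge moved out by 20% of h
    x1 = min(img_w, int(x + w * (1 + margin)))  # right edge, clipped to image
    y1 = min(img_h, int(y + h * (1 + margin)))  # bottom edge, clipped to image
    return x0, y0, x1 - x0, y1 - y0
```

For a 50 × 40 face at (100, 100) in a 640 × 480 image this yields a 70 × 56 search window; a face touching the top-left corner is clipped at the image border.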
The above steps are repeated to process subsequent images.
The whole-frame detection step further comprises:
Step 101: skin-color extraction is performed on the image data to obtain a binary skin-color image;
Step 102: column projection and row projection are performed in turn on the binary skin-color image, and adjacent columns and rows are merged to obtain skin-color blocks;
Step 103: face detection is performed on each skin-color block separately.
As shown in Fig. 1, taking an image of size 640 × 480 as an example, step 101 of the whole-frame detection step comprises:
Step 1011: color-space conversion is performed on the input 640 × 480 RGB image to obtain a YUV image;
Step 1012: a single-channel mask image of size 640 × 480 is created, and all pixel values are initialized to 0;
Step 1013: each pixel of the YUV image is traversed; if the V and U component values of a pixel lie in the ranges [80, 120] and [133, 173] respectively, the value of the corresponding pixel in the single-channel mask image is set to 1, thereby obtaining a binarized mask image of size 640 × 480.
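Steps 1011 through 1013 can be sketched with NumPy. The U/V ranges and their assignment to components are taken from the text as written; the BT.601 full-range RGB-to-YUV conversion (with +128 chroma offset) and the array layout are our assumptions:

```python
import numpy as np

def skin_mask(rgb):
    """Steps 1011-1013: binary skin mask from an H x W x 3 uint8 RGB image."""
    rgb = rgb.astype(np.float32)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    # step 1011: RGB -> YUV (assumed BT.601 coefficients, chroma offset +128)
    u = -0.169 * r - 0.331 * g + 0.500 * b + 128.0
    v = 0.500 * r - 0.419 * g - 0.081 * b + 128.0
    # step 1012: single-channel mask, same size as the input, all zeros
    mask = np.zeros(rgb.shape[:2], dtype=np.uint8)
    # step 1013: mark pixels whose V lies in [80, 120] and U in [133, 173]
    skin = (v >= 80) & (v <= 120) & (u >= 133) & (u <= 173)
    mask[skin] = 1
    return mask
```

The thresholds operate on chroma only, so the mask is largely insensitive to brightness, which is the usual rationale for skin detection in YUV space.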
As shown in Fig. 2, step 102 comprises:
Step 1021: the pixel values of each column of the binarized mask image are summed to obtain a row vector with 640 elements;
Step 1022: the row vector is scanned from left to right and divided into segments, using runs of 5 consecutive elements less than 30 as separators; segments of length less than 20 (for example, a segment with only 18 elements) are discarded;
Step 1023: a rectangle is constructed from each remaining segment to obtain a list of sub-mask images. The rectangle for a segment is constructed as follows: the x coordinate of the rectangle's top-left vertex is the position, i.e. the index, of the segment's starting element in the row vector; the y coordinate of the top-left vertex is 0; the width of the rectangle is the length of the segment (for example, if the segment has 50 elements, its length is 50); and the height of the rectangle is the height of the input image, i.e. 480 pixels;
As shown in Fig. 3, the following operations are performed on each sub-mask image in the list obtained in step 1023:
Step 1024: the pixel values of each row of the sub-mask image are summed to obtain a column vector with 480 elements;
Step 1025: the column vector is scanned from top to bottom and divided into segments, using runs of 5 consecutive elements less than 10 as separators; segments of height less than 10 are discarded;
Step 1026: a rectangle is constructed from each remaining segment to obtain a list of skin-color blocks. The rectangle for a segment is constructed as follows: the x coordinate of the rectangle's top-left vertex is the x coordinate of the top-left vertex of the corresponding sub-mask image; the y coordinate of the top-left vertex is the position, i.e. the index, of the segment's starting element in the column vector; the width of the rectangle is the width of the corresponding sub-mask image; and the height of the rectangle is the length of the segment.
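Steps 1021 through 1026 amount to a one-dimensional segmentation applied first to the column sums and then, inside each resulting vertical strip, to the row sums. A NumPy sketch follows; the separator and length thresholds are the ones given in the text, and the helper names are ours:

```python
import numpy as np

def split_segments(vec, sep_thresh, run, min_len):
    """Split a projection vector into (start, length) segments, using runs
    of `run` consecutive values below `sep_thresh` as separators and
    discarding segments shorter than `min_len` (steps 1022 / 1025)."""
    segments, start, low_run = [], None, 0
    for i, val in enumerate(list(vec) + [0] * run):  # padding flushes the tail
        if val < sep_thresh:
            low_run += 1
            if start is not None and low_run == run:
                length = i - run + 1 - start         # exclude the separator
                if length >= min_len:
                    segments.append((start, length))
                start = None
        else:
            if start is None:
                start = i
            low_run = 0
    return segments

def skin_blocks(mask):
    """Skin-color blocks (x, y, w, h) from a binary mask (steps 1021-1026)."""
    blocks = []
    col_sums = mask.sum(axis=0)                       # step 1021: per column
    for x, w in split_segments(col_sums, 30, 5, 20):  # steps 1022-1023
        sub = mask[:, x:x + w]                        # sub-mask, full height
        row_sums = sub.sum(axis=1)                    # step 1024: per row
        for y, h in split_segments(row_sums, 10, 5, 10):  # steps 1025-1026
            blocks.append((x, y, w, h))
    return blocks
```

Short dips below the threshold (fewer than 5 consecutive low values) stay inside a segment, which implements the "merge adjacent columns and rows" behavior of step 102.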
Face detection is then performed on each skin-color block to obtain the face-region information.
The face detection method used in the predicted-position detection step and in step 103 of the present invention may be the AdaBoost face detection algorithm based on Haar features, which is commonly used in this field; since it is a classic face detection algorithm, its principle is not described in detail here.
The invention is not limited to the foregoing embodiments. The invention extends to any new feature or any new combination disclosed in this specification, and to the steps of any new method or process, or any new combination thereof, disclosed herein.

Claims (6)

1. A real-time face detection algorithm, characterized by comprising the following steps:
a whole-frame detection step: whole-frame face detection is performed on the input image; if no face information is detected, whole-frame face detection is performed on the next frame, until face information is detected;
a step of recording the face region obtained by whole-frame detection: the position of the face within the frame and the size of the face rectangle are recorded to obtain the face region;
a predicted-position detection step performed, in turn, on the 1st, 2nd, 3rd, and 4th frames following the frame in which whole-frame detection found a face, the predicted-position detection step being: face detection is performed at the predicted position of the image, and the detected face-region position and face-region area are saved; the predicted position corresponds to the position of the face region detected in the previous frame, and its range extends each side of the face rectangle detected in the previous frame by 20%, without exceeding the size of the entire image;
and repetition of each of the foregoing steps in turn to process subsequent images.
2. The algorithm according to claim 1, characterized in that the whole-frame detection step further comprises:
step 101: skin-color extraction is performed on the image data to obtain a binary skin-color image;
step 102: column projection and row projection are performed in turn on the binary skin-color image, and adjacent columns and rows are merged to obtain skin-color blocks;
step 103: face detection is performed on each skin-color block separately.
3. The algorithm according to claim 2, characterized in that step 101 of the whole-frame detection step comprises:
step 1011: color-space conversion is performed on the input RGB image to obtain a YUV image;
step 1012: a single-channel mask image with the same width and height as the input image is created, and all pixel values are initialized to 0;
step 1013: each pixel of the YUV image is traversed; if the V and U component values of a pixel lie in the ranges [80, 120] and [133, 173] respectively, the value of the corresponding pixel in the single-channel mask image is set to 1, thereby obtaining a binarized mask image.
4. The algorithm according to claim 3, characterized in that step 102 comprises:
step 1021: the pixel values of each column of the binarized mask image are summed to obtain a row vector;
step 1022: the row vector is scanned from left to right and divided into segments, using runs of 5 consecutive values less than 30 as separators; segments of length less than 20 are discarded;
step 1023: a rectangle is constructed from each remaining segment to obtain a list of sub-mask images, wherein the x coordinate of the rectangle's top-left vertex is the position of the segment's starting point in the row vector, the y coordinate of the top-left vertex is 0, the width of the rectangle is the length of the segment, and the height of the rectangle is the height of the input image;
the following operations are performed on each sub-mask image in the list obtained in step 1023:
step 1024: the pixel values of each row of the sub-mask image are summed to obtain a column vector;
step 1025: the column vector is scanned from top to bottom and divided into segments, using runs of 5 consecutive values less than 10 as separators; segments of height less than 10 are discarded;
step 1026: a rectangle is constructed from each remaining segment to obtain a list of skin-color blocks, wherein the x coordinate of the rectangle's top-left vertex is the x coordinate of the top-left vertex of the corresponding sub-mask image, the y coordinate of the top-left vertex is the position of the segment's starting point in the column vector, the width of the rectangle is the width of the corresponding sub-mask image, and the height of the rectangle is the length of the segment.
5. The algorithm according to claim 4, characterized in that face detection is performed on each skin-color block to obtain the face region.
6. The algorithm according to any one of claims 2 to 5, characterized in that the face detection method in the predicted-position detection step and in step 103 is the AdaBoost face detection algorithm based on Haar features.
CN201310105220.6A 2013-03-29 2013-03-29 Real-time face detection algorithm Active CN103218600B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310105220.6A CN103218600B (en) 2013-03-29 2013-03-29 Real-time face detection algorithm


Publications (2)

Publication Number Publication Date
CN103218600A true CN103218600A (en) 2013-07-24
CN103218600B CN103218600B (en) 2017-05-03

Family

ID=48816369

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310105220.6A Active CN103218600B (en) 2013-03-29 2013-03-29 Real-time face detection algorithm

Country Status (1)

Country Link
CN (1) CN103218600B (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104375892A (en) * 2014-11-14 2015-02-25 广东欧珀移动通信有限公司 Method and device capable of achieving face recognition through intelligent and quick start and mobile terminal
CN106920218A (en) * 2015-12-25 2017-07-04 展讯通信(上海)有限公司 A kind of method and device of image procossing
CN107194817A (en) * 2017-03-29 2017-09-22 腾讯科技(深圳)有限公司 Methods of exhibiting, device and the computer equipment of user social contact information
CN107273810A (en) * 2017-05-22 2017-10-20 武汉神目信息技术有限公司 A kind of method that Face datection interest region delimited in automatic study
CN109033924A (en) * 2017-06-08 2018-12-18 北京君正集成电路股份有限公司 The method and device of humanoid detection in a kind of video
CN111179276A (en) * 2018-11-12 2020-05-19 北京京东尚科信息技术有限公司 Image processing method and device
CN113221841A (en) * 2021-06-02 2021-08-06 云知声(上海)智能科技有限公司 Face detection and tracking method and device, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1492379A (en) * 2002-10-22 2004-04-28 中国科学院计算技术研究所 Method for covering face of news interviewee using quick face detection
CN102622584A (en) * 2012-03-02 2012-08-01 成都三泰电子实业股份有限公司 Method for detecting mask faces in video monitor
CN102722698A (en) * 2012-05-17 2012-10-10 上海中原电子技术工程有限公司 Method and system for detecting and tracking multi-pose face
CN102750527A (en) * 2012-06-26 2012-10-24 浙江捷尚视觉科技有限公司 Long-time stable human face detection and tracking method in bank scene and long-time stable human face detection and tracking device in bank scene



Also Published As

Publication number Publication date
CN103218600B (en) 2017-05-03

Similar Documents

Publication Publication Date Title
CN103218600A (en) Real-time face detection algorithm
CN109376681B (en) Multi-person posture estimation method and system
US10872262B2 (en) Information processing apparatus and information processing method for detecting position of object
CN103927016B (en) Real-time three-dimensional double-hand gesture recognition method and system based on binocular vision
CN108121991B (en) Deep learning ship target detection method based on edge candidate region extraction
US20180018503A1 (en) Method, terminal, and storage medium for tracking facial critical area
CN104952083B (en) A kind of saliency detection method based on the modeling of conspicuousness target background
CN102880865B (en) Dynamic gesture recognition method based on complexion and morphological characteristics
CN105046206B (en) Based on the pedestrian detection method and device for moving prior information in video
CN104573675B (en) The methods of exhibiting and device of operation image
Lin et al. A temporal hand gesture recognition system based on hog and motion trajectory
CN103020965A (en) Foreground segmentation method based on significance detection
CN103020993A (en) Visual saliency detection method by fusing dual-channel color contrasts
Li et al. Estimating visual saliency through single image optimization
CN110889387A (en) Real-time dynamic gesture recognition method based on multi-track matching
CN114170570A (en) Pedestrian detection method and system suitable for crowded scene
CN110942037A (en) Action recognition method for video analysis
CN104050674B (en) Salient region detection method and device
CN103310193A (en) Method for recording important skill movement moments of athletes in gymnastics video
CN103295238B (en) Video real-time location method based on ROI motion detection on Android platform
CN103020631B (en) Human movement identification method based on star model
CN103955925B (en) The improvement probability Hough transformation curve detection method of minimum sampling is fixed based on piecemeal
CN102855025A (en) Optical multi-touch contact detection method based on visual attention model
Wang et al. A real-time vision-based hand gesture interaction system for virtual EAST
Feng et al. Improved Pedestrian Fall Detection Model Based on YOLOv5

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant