CN101582163B - Method for capturing clearest human face in video monitor images

Info

Publication number: CN101582163B
Application number: CN2009100537561A
Authority: CN
Inventors: 李科; 孙兵; 刘欢喜; 刘允才
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2009-06-25
Filing date: 2009-06-25
Publication date: 2011-05-04
Anticipated expiration: 2029-06-25
Also published as: CN101582163A

Abstract

The invention relates to a method for capturing the clearest human face in video surveillance images, in which the fast discrete Fourier transform is used to select the clearest face. The method comprises the following steps: acquiring a video; preprocessing the images using methods such as histogram equalization and morphological image processing; detecting faces using a rapid object detection method; eliminating incorrect faces by rejecting mutually containing regions and by using YCbCr statistical properties; estimating the position and speed of faces from the faces in multiple frames in order to track them; and judging the sharpness of the faces using the fast discrete Fourier transform in order to select the clearest face. The method can detect and track faces and process surveillance video in real time, so it does not need much space to store the surveillance video and saves time, space and cost.

Description

Method for capturing the clearest human face in video surveillance images

Technical field

The present invention relates to a method for capturing the clearest human face in video surveillance images, and is an information processing method in the field of camera surveillance.

Background technology

At present, single-camera surveillance is widely used in industrial control, security, traffic monitoring, access control systems and other fields. Many applications (e.g. security systems and access control systems) need to detect images containing human faces in video automatically and record them. Because a video recording is a continuous sequence of images, the same person can produce several thousand to tens of thousands of images, depending on how long he or she stays in view. Recording all of them wastes storage space and makes later searching very troublesome; an intuitive and feasible solution is to keep only the records with high sharpness. Because of factors such as lighting, viewing angle and occlusion, the faces detected in video are not always frontal and clear and are quite likely to be blurred, so automatically selecting a clear face from a large number of consecutive images becomes a difficult problem.

A search of the prior art found that Ying-Li Tian et al. published a paper entitled "Robust and efficient foreground analysis for real-time video surveillance" at the 2005 Computer Vision and Pattern Recognition conference. That paper models the background with a mixture-of-Gaussians method, then removes shadows using gray-level and texture information, and finally detects human faces in the resulting foreground. The method solves the detection and tracking of the foreground (human faces) well, but it cannot automatically select a high-sharpness image from the continuously acquired face images; that is, it does not solve the problem of choosing a clear image from such a series of images.

Summary of the invention

The object of the invention is to address the deficiencies of the prior art by proposing a method for capturing the clearest human face in video surveillance images, which can automatically select a high-sharpness image from continuously acquired face images.

To achieve the above object, the invention uses the fast discrete Fourier transform to select the clearest face from a large sequence of face images. The method comprises: acquiring the video; preprocessing the images with methods such as histogram equalization and morphological image processing; detecting faces with a rapid object detection method; tracking faces according to their estimated velocity; and selecting faces using the fast discrete Fourier transform.

The method of the invention is realized by the following steps:

1. Image preprocessing is performed on the video captured by the camera in the monitored area: after the brightness value is adjusted, the color image is converted into a grayscale image; histogram equalization is applied to balance the gray levels of the whole image; and morphological processing is then applied to the image.

2. Faces are detected in each morphologically processed frame using the improved AdaBoost method in the open-source library OpenCV; incorrect faces among the detections are removed by rejecting mutually containing regions and by using YCbCr statistical properties; the above face detection algorithm is applied to every frame; the size and coordinates of each face are recorded, and the direction and speed of face movement are estimated from the face positions in preceding and following frames.

3. If a face is detected in both of two consecutive frames and the face in the later frame lies near the coordinates of the face in the earlier frame, it is considered the same face; if no face is detected in the earlier frame but one is detected in the later frame, it is considered a newly appearing face; if a face is detected in the earlier frame but not in the later frame, whether the person has left the monitored area is judged from how long the face has been missing and from the position of the face in the image. Face tracking is realized in this way, and all detected face images are saved as candidate images.

4. All candidate images of the same person are normalized to the same size; the border portion of these candidate images is then removed, and the fast discrete Fourier transform is applied to the remaining portion to obtain a Fourier transform value; the candidate images corresponding to the M largest Fourier transform values are picked out, and from them the image with the most pixels is selected and saved, completing the capture of the clearest face in the video surveillance images; M is 5-10% of the total number of Fourier transform values.

Whether an image is sharp is determined in the frequency domain by its high-frequency components: the richer the high-frequency components, the larger the Fourier transform value. With suitable preprocessing, the Fourier transform value can be made proportional to sharpness, and images are selected on this basis.

Compared with the prior art, the notable effects of the invention are as follows: faces are detected and tracked more quickly, and surveillance video can be processed in real time. The invention can select the clearest face from a large number of faces and store the related information; it does not need a large amount of space to store surveillance video, only a small amount of space for key information, which saves time, space and cost.

Description of drawings

Fig. 1 is the state transition diagram of the face tracking part in an embodiment of the invention.

Embodiment

The technical solution of the invention is described in detail below with reference to the drawing and an embodiment. The following embodiment is implemented on the premise of the technical solution of the invention and gives a detailed implementation and process, but the scope of protection of the invention is not limited to the following embodiment.

The concrete steps of this embodiment are as follows:

(1) Video capture and image preprocessing

Video is captured by the camera and then transferred to a background program for processing.

Image preprocessing is performed on the video captured by the camera in the monitored area: after the brightness value is adjusted, the color image is converted into a grayscale image; histogram equalization is applied to balance the gray levels of the whole image; and morphological processing is then applied to the image.

For convenience of description, assume that the ambient brightness is low, so the brightness is compensated first. Because the acquired image is generally expressed in RGB (red, green, blue) values, the image is first converted to an HSI (hue, saturation, intensity) representation using the following conversion formulas:

I = \frac{R + G + B}{3}

S = 1 - \frac{3}{R + G + B} [\min (R, G, B)]

H = \begin{cases} \theta & B \leq G \\ 360 - \theta & B > G \end{cases}

where

\theta = \cos^{-1} \left\{ \frac{R - \frac{G}{2} - \frac{B}{2}}{\sqrt{(R - G)^{2} + (R - G)(G - B)}} \right\}

In the formulas, R denotes the red component, G the green component, B the blue component, H the hue, S the saturation and I the intensity (brightness). The I value is then increased by a suitable amount, and the color image is converted into a grayscale image.
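The conversion can be illustrated with a minimal Python sketch. The function name `rgb_to_hsi`, the assumption that R, G and B are scaled to [0, 1], and the handling of gray pixels are illustrative choices, not part of the patent. Note also that the sketch uses the common textbook denominator (R - G)^2 + (R - B)(G - B), which is an assumption: the formula as printed has (R - G)(G - B) in that position, which can become negative under the square root.

```python
import math

def rgb_to_hsi(r, g, b):
    """Convert one RGB pixel (components in [0, 1]) to (H in degrees, S, I)."""
    i = (r + g + b) / 3.0
    if i == 0:
        return 0.0, 0.0, 0.0              # black: hue and saturation undefined, use 0
    s = 1.0 - min(r, g, b) / i            # same as 1 - 3*min(R,G,B)/(R+G+B)
    num = r - g / 2.0 - b / 2.0
    # Textbook denominator (assumption, see lead-in); always non-negative.
    den = math.sqrt((r - g) ** 2 + (r - b) * (g - b))
    if den == 0:
        return 0.0, s, i                  # gray pixel: hue undefined, use 0
    ratio = max(-1.0, min(1.0, num / den))  # guard against rounding outside [-1, 1]
    theta = math.degrees(math.acos(ratio))
    h = theta if b <= g else 360.0 - theta
    return h, s, i
```

With these conventions pure red maps to a hue of 0 degrees, pure green to about 120 degrees and pure blue to about 240 degrees. The brightness compensation described in the text would then amount to raising the returned I value before using it as the gray level.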

Histogram equalization is applied to the resulting grayscale image: the gray levels of all pixels are counted to obtain the probability density of each gray value, and the probability density is then integrated (accumulated) to obtain the equalization mapping.
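A minimal sketch of this step in plain Python follows. The function name `equalize` and the representation of the image as a flat list of integer gray levels are assumptions made for illustration; in practice OpenCV's `cv2.equalizeHist` performs the same kind of operation on 8-bit images.

```python
def equalize(gray, levels=256):
    """Histogram-equalize a flat list of integer gray levels in [0, levels)."""
    n = len(gray)
    # Count how often each gray level occurs (the histogram).
    hist = [0] * levels
    for v in gray:
        hist[v] += 1
    # Accumulate the probability density into a cumulative distribution.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running / n)
    # Map every pixel through the scaled cumulative distribution.
    return [round(cdf[v] * (levels - 1)) for v in gray]
```

The mapping is monotone, so the ordering of gray levels is preserved while the output levels are spread over the full range.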

(2) Face detection

Faces are detected in each preprocessed frame using the improved AdaBoost method in the open-source library OpenCV. Although this method provides a very high detection rate and a very low false-alarm rate, some wrong regions are still treated as faces, such as regions that are not skin-colored and regions that contain one another. In the invention, incorrect faces among the detections are removed by rejecting mutually containing regions and by using YCbCr (luma, blue-difference chroma, red-difference chroma) statistical properties. First, the coordinates and size of each region considered to be a face are checked; if a large region is found to contain a small region, the larger one is removed. Then the color of each region is examined. Extensive statistics show that, in the YCbCr color space, only regions whose values fall within 0 < Y < 1.01, 0.52 < Cb < 0.66 and 0.32 < Cr < 0.63 contain faces. If the color of a region is not within this range, the region is removed.

The size and coordinates of each face are recorded, and the direction and speed of face movement are estimated from the face positions in preceding and following frames.
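The two rejection rules can be sketched as follows. The rectangle format `(x, y, w, h)`, the function names, and the idea of testing one representative normalized (Y, Cb, Cr) triple per region are assumptions for illustration; the text does not say how the color of a region is summarized or how the YCbCr values are normalized.

```python
def contains(outer, inner):
    """True if rectangle `outer` fully encloses a different rectangle `inner`."""
    ox, oy, ow, oh = outer
    ix, iy, iw, ih = inner
    return (outer != inner
            and ox <= ix and oy <= iy
            and ox + ow >= ix + iw and oy + oh >= iy + ih)

def drop_enclosing(rects):
    """Remove every rectangle that encloses another one (the larger is dropped)."""
    return [r for r in rects if not any(contains(r, other) for other in rects)]

def is_skin(y, cb, cr):
    """Range test from the text, applied to normalized Y, Cb, Cr values."""
    return 0 < y < 1.01 and 0.52 < cb < 0.66 and 0.32 < cr < 0.63

def filter_faces(rects, colors):
    """Keep rectangles that survive both rules; colors[i] is (Y, Cb, Cr) of rects[i]."""
    kept = drop_enclosing(rects)
    lookup = dict(zip(rects, colors))
    return [r for r in kept if is_skin(*lookup[r])]
```

For example, given a large detection that encloses a smaller one, only the smaller one is passed on to the color test.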
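One simple way to obtain direction and speed from recorded positions is a finite difference between face centers, sketched here. The function names and the choice of pixels per frame and degrees as units are assumptions; the text does not specify the estimator.

```python
import math

def face_center(x, y, w, h):
    """Center point of a face rectangle given as top-left corner and size."""
    return (x + w / 2.0, y + h / 2.0)

def estimate_motion(prev_center, cur_center, frames_apart=1):
    """Return (speed in pixels per frame, direction in degrees) between two centers."""
    dx = (cur_center[0] - prev_center[0]) / frames_apart
    dy = (cur_center[1] - prev_center[1]) / frames_apart
    speed = math.hypot(dx, dy)
    direction = math.degrees(math.atan2(dy, dx))  # 0 degrees points along +x
    return speed, direction
```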

(3) Face tracking

If a face is detected in both of two consecutive frames and the face in the later frame lies near the coordinates of the face in the earlier frame, it is considered the same face; if no face is detected in the earlier frame but one is detected in the later frame, it is considered a newly appearing face; if a face is detected in the earlier frame but not in the later frame, whether the person has left the monitored area is judged from how long the face has been missing and from the position of the face in the image. Face tracking is realized in this way, and all detected face images are saved as candidate images.

Fig. 1 shows the state transition diagram of the face tracking part in this embodiment. As shown in Fig. 1, this step proceeds as follows. For ease of description, several variables are defined first. The array variable PRE[] records the face information captured in the previous frame, and the array variable CUR[] records the face information captured in the current frame. An array element has two main states: useful and useless. Useful means the element is being used to record information; useless means the element is unused or has been discarded. The useful state has three sub-states: new, normal and disappeared. New means the element stores information for a newly detected face whose confidence value is below the lower confidence threshold, so it cannot yet be considered a face. Normal means the face has been detected several times in succession. Disappeared means the face was previously normal but has been missing in the most recent frames, for a time shorter than the disappearance-time threshold.

For each element of CUR, first look for a PRE element that matches it. If there is a match, record which array element matched. If there is no match, the face is considered newly appearing, and its information is stored in a useless PRE element.
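The matching of CUR elements against PRE elements can be sketched as a greedy nearest-center search. The function name `match_faces`, the use of Euclidean distance between face centers, and the `max_dist` limit are all assumptions; the text only says that a face near the previous coordinates is treated as the same face.

```python
import math

def match_faces(prev, cur, max_dist=40.0):
    """Map each index in `cur` to the index of its nearest unused entry in `prev`.

    `prev` and `cur` are lists of (x, y) face centers. A value of None means
    no previous face lies within `max_dist`, i.e. the face is newly appearing.
    """
    matches = {}
    used = set()
    for ci, (cx, cy) in enumerate(cur):
        best, best_d = None, max_dist
        for pi, (px, py) in enumerate(prev):
            if pi in used:
                continue                      # each previous face matches at most once
            d = math.hypot(cx - px, cy - py)
            if d <= best_d:
                best, best_d = pi, d
        matches[ci] = best
        if best is not None:
            used.add(best)
    return matches
```

Entries of `prev` that never appear as a value in the result are the unmatched PRE elements handled in the later step.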

For each matched PRE element, consider its state. If it is new, increase its confidence value and save the face information. If it is normal, only the face information needs to be saved. If it is disappeared, the person has been found again: change its state to normal and save the face information.

After the above steps have been carried out for every element of CUR[], the elements of PRE[] that were not matched are processed. If the state is new and the confidence value is below the threshold, the element is considered noise and is deleted directly. If the state is new and the confidence value is above the threshold, the confidence value is reduced. If the state is normal and the face is near the image boundary, the person is considered to have walked out of the monitored area, and the existing information is output. If the state is normal but the face is not near the image boundary, the element is marked as disappeared. If the state is disappeared and the disappearance time is below the threshold, the disappearance time is increased. If the state is disappeared and the disappearance time is above the threshold, the face is considered gone, and the existing information is output.

If information needs to be output for this frame, proceed to the face selection step; otherwise proceed directly to acquiring video in the next cycle.
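The per-element rules for matched and unmatched PRE entries can be sketched as a small state machine. The class `Track`, the function names, the threshold defaults and the step size of 1 are hypothetical. Freeing a slot after its information is output, resetting the missing-frame counter on state changes, and treating a value equal to a threshold as exceeding it are also assumptions. The rule that promotes a new face to normal is not spelled out in this passage and is therefore omitted.

```python
from dataclasses import dataclass

NEW, NORMAL, GONE, FREE = "new", "normal", "disappeared", "useless"

@dataclass
class Track:
    state: str = FREE
    confidence: int = 0
    missing_frames: int = 0

def on_match(track):
    """Update a PRE entry matched in the current frame (caller saves the face info)."""
    if track.state == NEW:
        track.confidence += 1
    elif track.state == GONE:
        track.state = NORMAL          # the person has been found again
        track.missing_frames = 0

def on_miss(track, near_border, conf_threshold=3, max_missing=10):
    """Update an unmatched PRE entry; return True when its info should be output."""
    if track.state == NEW:
        if track.confidence < conf_threshold:
            track.state = FREE        # treated as noise and deleted
        else:
            track.confidence -= 1
        return False
    if track.state == NORMAL:
        if near_border:
            track.state = FREE        # walked out of the monitored area
            return True
        track.state = GONE
        track.missing_frames = 0
        return False
    if track.state == GONE:
        if track.missing_frames < max_missing:
            track.missing_frames += 1
            return False
        track.state = FREE            # missing for too long
        return True
    return False
```

A return value of True corresponds to the case in which the frame triggers the face selection step.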

(4) Face selection

The face selection part is realized mainly by the fast discrete Fourier transform. First, all candidate images of the same person are normalized to the same size. The border portion of these candidate images is then removed, and the fast discrete Fourier transform is applied to the remaining portion to obtain the Fourier transform value. In this embodiment, the central 20 × 20 pixel region of each candidate image is cropped, the fast discrete Fourier transform is applied to these crops, and the transform center is then shifted to the image center.

Next, the high-frequency component of the image is computed. To do so, a 20 × 20 weight matrix is first constructed whose values are smaller toward the center and larger toward the edges. This is because, once the transform center has been shifted, the fast Fourier transform places the high-frequency information near the edges. In this embodiment, the value of the central ring is 1, and each successive outer ring adds 1.

The corresponding entries of the weight matrix and of the transformed image are multiplied and the products are summed to obtain the Fourier transform value.
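The weighted score can be sketched in plain Python as follows. To stay dependency-free, the sketch uses a naive separable DFT in place of a fast transform (for a 20 × 20 crop the result is the same, and `numpy.fft.fft2` with `numpy.fft.fftshift` would be the usual choice in practice). The function names, the use of the spectrum magnitude, and the inclusion of the zero-frequency term in the sum are assumptions; the text does not state them.

```python
import cmath

def dft_1d(values):
    """Naive discrete Fourier transform of a sequence."""
    n = len(values)
    return [sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def dft_2d(image):
    """2-D DFT of a square image: transform the rows, then the columns."""
    rows = [dft_1d(row) for row in image]
    cols = [dft_1d(list(col)) for col in zip(*rows)]
    return [list(r) for r in zip(*cols)]   # transpose back to [row][column]

def ring_weights(n):
    """Weights that are 1 on the central ring and grow by 1 per ring outward."""
    return [[max(abs(2 * i - (n - 1)), abs(2 * j - (n - 1))) // 2 + 1
             for j in range(n)] for i in range(n)]

def sharpness_score(image):
    """Sum of ring weights times the magnitude of the center-shifted spectrum."""
    n = len(image)
    spectrum = dft_2d(image)
    weights = ring_weights(n)
    half = n // 2
    score = 0.0
    for i in range(n):
        for j in range(n):
            # Index arithmetic moves the zero frequency to position (half, half).
            score += weights[i][j] * abs(spectrum[(i - half) % n][(j - half) % n])
    return score
```

For a 20 × 20 input, `ring_weights` gives 1 on the central 2 × 2 block and 10 on the outermost ring, so an image with fine detail (energy far from the center of the shifted spectrum) scores higher than a flat image of the same mean brightness.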

The Fourier transform values are compared and the 5 largest are selected (i.e. M = 5; in general, if there are more than 50 candidate images, M is taken as 5-10% of the total number of Fourier transform values), and from these the image with the most pixels is selected and saved. This image is well suited in terms of both sharpness and size, so it appears very clear to the human eye. This completes the capture of the clearest face in the video surveillance images.

Finally, the resulting face and its related information are stored for later processing and use.
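The final selection can be sketched as a two-stage choice: keep the M highest-scoring candidates, then take the one with the most pixels. The tuple layout `(score, pixel_count, label)`, the function name, and the default `fraction` of 0.1 (the upper end of the 5-10% range) are assumptions for illustration.

```python
def select_clearest(candidates, fraction=0.1):
    """Pick the largest image among the M sharpest candidates.

    Each candidate is a tuple (score, pixel_count, label). M is 5 when there
    are at most 50 candidates, otherwise `fraction` of the candidate count.
    """
    n = len(candidates)
    if n == 0:
        return None
    m = max(1, int(n * fraction)) if n > 50 else min(5, n)
    sharpest = sorted(candidates, key=lambda c: c[0], reverse=True)[:m]
    return max(sharpest, key=lambda c: c[1])
```

In this sketch a very large but blurry candidate is never chosen, because it is excluded before the size comparison takes place.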

Claims

1. A method for capturing the clearest human face in video surveillance images, characterized by comprising the following steps:

1) Image preprocessing is performed on the video captured by the camera in the monitored area: after the brightness value is adjusted, the color image is converted into a grayscale image; histogram equalization is applied to balance the gray levels of the whole image; and morphological processing is then applied to the image;

2) Faces are detected in each morphologically processed frame using the improved AdaBoost method in the open-source library OpenCV; incorrect faces among the detections are removed by rejecting mutually containing regions and by using YCbCr statistical properties: first, the coordinates and size of each region considered to be a face are checked, and if a large region is found to contain a small region, the larger one is removed; then the color of each region is examined, given that in the YCbCr color space only regions whose values fall within 0 < Y < 1.01, 0.52 < Cb < 0.66 and 0.32 < Cr < 0.63 contain faces, and a region whose color is not within this range is removed; the size and coordinates of each face are recorded, and the direction and speed of face movement are estimated from the face positions in preceding and following frames;

3) If a face is detected in both of two consecutive frames and the face in the later frame lies near the coordinates of the face in the earlier frame, it is considered the same face; if no face is detected in the earlier frame but one is detected in the later frame, it is considered a newly appearing face; if a face is detected in the earlier frame but not in the later frame, whether the person has left the monitored area is judged from how long the face has been missing and from the position of the face in the image; face tracking is realized in this way, and all detected face images are saved as candidate images;

4) All candidate images of the same person are normalized to the same size; the border portion of these candidate images is then removed, the fast discrete Fourier transform is applied to the remaining portion, and the transform center is shifted to the image center; a weight matrix is constructed whose values are smaller toward the center and larger toward the edges, and the corresponding entries of the weight matrix and of the transformed image are multiplied and the products are summed to obtain the Fourier transform value; the candidate images corresponding to the M largest Fourier transform values are picked out, and from them the image with the most pixels is selected and saved, completing the capture of the clearest face in the video surveillance images; M is 5-10% of the total number of Fourier transform values.