CN101673346B - Method, equipment and system for processing image - Google Patents
Abstract
The invention discloses a method, device, and system for processing images captured by a plurality of cameras distributed at different positions shooting the same scene. The method comprises the following steps: detecting face images in the captured images; processing each detected face image at different pose angles with a predetermined face pose model to generate synthetic images of the different pose angles, which form the synthetic image set of that face image; extracting feature vectors of the synthetic images in the synthetic image sets; computing the distance between different synthetic image sets from the feature-vector distances between their synthetic images; and clustering the detected face images based on the distances between the different synthetic image sets. With the method, device, and system, high-quality face images can easily be detected and extracted.
Description
Technical field
The present invention relates to face image processing, and in particular to a method, device, and system for processing face images from a plurality of video cameras in order to cluster them.
Background art
Face detection is a hot topic in pattern recognition research and development. In the past few years, face detection technology has made significant progress in both detection accuracy and detection speed. The purpose of face detection is to determine whether an image contains regions with face images and to locate those regions automatically. Face detection is widely used in everyday life. For example, in recent years face detection has been embedded in digital cameras, where automatic face detection helps people obtain high-quality images of faces. In security applications, face detection is used to extract face images and provide them to a face recognizer, which then performs automatic face analysis.
Camera-based face capture is one application of face detection. A face capture system first captures images with a camera and then outputs those images that contain face regions. The technique can serve as the front-end module of a face recognition or face analysis system, providing good face image data for subsequent analysis.
The purpose of face capture is to extract face images automatically from input images. Face detection is usually employed to locate the face regions in an image. Because the human face is a 3D object, a face image is in fact the projection of a three-dimensional face onto a two-dimensional image plane, and different facial poses produce different face images from the same 3D face. The task of face capture is therefore not only to detect and extract frontal face images but also to capture face images in other orientations, for example profile images.
Current face capture techniques usually use a single camera. A face detection module first acquires a captured image from the camera and then scans every position in the image. At each position, the module takes an image region of predetermined size and judges whether the region is a face region. If the region is classified as a face region, it is treated as a candidate face region. After the search, overlapping face regions are merged, and the positions of the merged regions are finally marked as face image regions. If only the face images are needed, sample regions can simply be taken from these face image regions.
The shortcoming of single-camera face detection is that the detection module achieves high accuracy on frontal face images but relatively poor accuracy on non-frontal ones. Another problem is that a frontal image can only be captured when the person faces the camera, which is not always satisfied in practice, so the detection system needs to obtain as many frontal face images as possible. In other words, single-camera face detection cannot avoid the pose problem caused by the subject, because people do not always face the lens; when they do not, the captured face images are necessarily non-frontal or profile images.
With a multi-camera face capture technique it is much easier to obtain frontal face images, because several cameras face the person and the chance of capturing a frontal view increases. In multi-camera face detection, the cameras capture images of the same face from different directions at the same moment, which raises the problem of clustering the face images coming from the different cameras.
Summary of the invention
The image processing technique of the present invention is based on multiple cameras. Since the cameras capture different images of the same face at the same moment, the invention provides a clustering method that groups these images by person. In embodiments of the invention, the distance between images is used to measure the similarity between images of different poses, which removes the heavy computation that explicit pose estimation would require. In other words, the image processing technique of the invention can capture face images effectively, without a pose estimation problem, and can output high-quality face images.
In one aspect of the invention, a method is proposed for processing images captured of the same scene by a plurality of video cameras distributed at different positions, the method comprising the steps of: detecting face images in the captured images; processing each of the detected face images at different pose angles with a predetermined face pose model to generate synthetic images of the different pose angles as the synthetic image set of the corresponding face image; extracting feature vectors of the synthetic images in the synthetic image sets; computing the distance between different synthetic image sets by computing the feature-vector distances between the synthetic images in those sets; and clustering the detected face images based on the distances between the different synthetic image sets.
In another aspect of the invention, a device is proposed for processing images captured of the same scene by a plurality of video cameras distributed at different positions, the device comprising: a detecting unit that detects face images in the captured images; a face model storage that stores a predetermined face pose model; a processing unit that processes each of the detected face images at different pose angles with the face pose model stored in the face model storage, generating synthetic images of the different pose angles as the synthetic image set of the corresponding face image; an extracting unit that extracts feature vectors of the synthetic images in the synthetic image sets; a distance calculating unit that computes the distance between different synthetic image sets by computing the feature-vector distances between the synthetic images in those sets; and a clustering unit that clusters the detected face images based on the distances between the different synthetic image sets.
In still another aspect of the invention, an image processing system is proposed, comprising: video cameras distributed at different positions for capturing images of the same scene; a detecting unit that detects face images in the captured images; a face model storage that stores a predetermined face pose model; a processing unit that processes each of the detected face images at different pose angles with the predetermined face pose model, generating synthetic images of the different pose angles as the synthetic image set of the corresponding face image; an extracting unit that extracts feature vectors of the synthetic images in the synthetic image sets; a distance calculating unit that computes the distance between different synthetic image sets by computing the feature-vector distances between the synthetic images in those sets; and a clustering unit that clusters the detected face images based on the distances between the different synthetic image sets.
With the technique of the embodiments of the present invention, high-quality face images can easily be detected and extracted. Because multiple cameras arranged at different positions are used, the technique solves the face pose problem: the cameras divide the face pose space into several sub-spaces, and the pose variation seen by each individual camera is very small.
The embodiments of the present invention also adopt an effective face distance for clustering face images, making the method more robust and less computationally expensive.
Description of drawings
The above features and advantages of the present invention will become more apparent from the following detailed description taken in conjunction with the accompanying drawings, in which:
Fig. 1 shows the structure of an image processing system according to an embodiment of the present invention;
Fig. 2 shows a flowchart of an image processing method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the pose angles used in processing a face image; and
Fig. 4 is a schematic diagram of the distance matrix used in distance calculation.
Embodiment
Preferred embodiments of the present invention are described below in detail with reference to the accompanying drawings. For clarity and conciseness, detailed descriptions of known functions and structures are omitted so as not to obscure the subject matter of the invention.
Fig. 1 shows the structure of an image processing system according to an embodiment of the present invention. As shown in Fig. 1, the image processing system comprises a video capture part 10, a face detection part 20, a face clustering part 30, and a selection part 40. According to embodiments of the invention, the face clustering part 30 comprises an image processing unit 31, a feature vector extraction unit 32, a distance calculation unit 33, and a clustering unit 34.
The video capture part 10 is, for example, a plurality of video cameras arranged at different positions that shoot the same scene, for example the entrance of a building, and convert the captured video signal into digital image data. The captured images are then fed to the face detection part 20, which locates the regions containing faces in the captured images and extracts face images from the images based on those positions. The face clustering part 30 then clusters the face images coming from the different cameras, forming image groups for the different persons. Finally, the selection part 40 selects a representative image for each cluster according to a predetermined criterion, for example sharpness or the distance between the eyes, and outputs it together with the clustered image groups.
In the face clustering part 30, the image processing unit 31 processes each face image with a 3D or 2D pose model stored in advance in a face model storage (not shown), generating synthetic face images at each pose angle as the synthetic image set of that face image. The feature vector extraction unit 32 then extracts an LDA or PCA vector from each synthetic image. The distance calculation unit 33 computes the distances between the synthetic face images of different synthetic image sets and takes the minimum distance as the distance between the two sets. Next, the clustering unit 34 clusters the face images based on the distances between the synthetic image sets, producing image groups for the different persons.
Next, as mentioned above, the selection part 40 selects a representative image for each cluster according to a predetermined criterion, for example sharpness or the distance between the eyes, and outputs it together with the clustered image groups.
The detailed construction and operation of the above parts are described below with reference to Figs. 2 to 4. Fig. 2 shows a flowchart of the image processing method according to an embodiment of the present invention.
In embodiments of the present invention, several cameras cooperate to capture images and are arranged to shoot the same target, for example the entrance of a building; in other words, at least some of the cameras share a common field of view.
At step S11, the video capture part 10, comprising the cameras and a video capture card, produces a video signal for the same scene, samples it, and converts it into digital video images. The resulting digital video images are stored in a buffer memory (not shown) of the system. According to embodiments of the invention, the image format may be PAL or NTSC, or determined according to the user's needs; the image size may likewise be predetermined or determined according to the user's needs.
At step S12, the face detection part employs a detector, for example one of the face detectors described in non-patent literature 1 (Ming-Hsuan Yang, David J. Kriegman, and Narendra Ahuja, "Detecting Faces in Images: A Survey," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 1, pp. 34-58, 2002), or the classifier proposed in non-patent literature 2 (Paul A. Viola and Michael J. Jones, "Rapid Object Detection using a Boosted Cascade of Simple Features," in Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), Vol. 1, pp. 511-518, Kauai, HI, USA, 8-14 December 2001), to detect the face regions in the captured images. For example, the classifier is first trained with face images and non-face images and then applied to the regions of interest; it outputs '1' if a region contains a face image and '0' otherwise. By searching all positions of the image at different scales, the classifier finds the face regions in the image.
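The scan-and-classify search of step S12 can be sketched as follows. This is a toy illustration, not the patent's detector: the `bright` stand-in classifier below merely labels bright regions as "faces," where a real system would use a trained boosted cascade, and the window size and step are arbitrary.

```python
import numpy as np

def scan_for_faces(image, window=8, step=4, classify=None):
    """Slide a fixed-size window over the image and collect the regions
    the classifier labels as faces (output '1'), mirroring step S12."""
    hits = []
    h, w = image.shape
    for y in range(0, h - window + 1, step):
        for x in range(0, w - window + 1, step):
            region = image[y:y + window, x:x + window]
            if classify(region):  # classifier: 1 for a face region, 0 otherwise
                hits.append((x, y, window, window))
    return hits

# Stand-in "classifier": treat any mostly bright region as a face.
bright = lambda r: r.mean() > 128

img = np.zeros((32, 32))
img[8:20, 8:20] = 255                 # one bright square to find
boxes = scan_for_faces(img, classify=bright)
```

Overlapping hits such as these would then be merged into a single face region, as the background section describes.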
At step S13, the face images are processed at different pose angles to obtain the synthetic image sets. As mentioned above, in a multi-camera application a person is captured by several cameras that are at different positions and orientations, yielding face images at different pose angles. The clustering process can therefore be seen as classifying face images of the same person under different poses.
Usually, the similarity between two face images of the same person at the same pose is greater than the similarity between images at different poses, and the similarity between face images of two different people at the same pose is less than the similarity between face images of the same person at the same pose. Face images of the same pose are therefore easily gathered into one class. According to embodiments of the invention, the distance between feature vectors such as LDA or PCA vectors, as described in non-patent literature 3 (W. Zhao, R. Chellappa, P. J. Phillips, and A. Rosenfeld, "Face Recognition: A Literature Survey," ACM Computing Surveys, Vol. 35, Issue 4, pp. 399-458, December 2003), represents the similarity between face images. According to embodiments of the invention, each face image is processed, for example rendered, at different pose angles, for example horizontal and/or vertical angles from -45 degrees to +45 degrees, producing synthetic face images that form the synthetic image set of that face image. Fig. 3 shows examples of the face pose angles.
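The construction of a synthetic image set at step S13 can be sketched as follows. The sketch assumes a renderer is available; here the `shift` lambda is only a placeholder for the 2D/3D pose model of the patent, which would actually warp the face through a fitted head model.

```python
import numpy as np

def synthesize_pose_set(face, theta=15, n=3, render=None):
    """Build the synthetic image set {f(-n*theta), ..., f(0), ..., f(n*theta)}
    of step S13, keyed by pose angle."""
    return {k * theta: render(face, k * theta) for k in range(-n, n + 1)}

# Placeholder renderer: a horizontal pixel shift proportional to the angle.
shift = lambda img, ang: np.roll(img, ang // 5, axis=1)

face = np.arange(64, dtype=float).reshape(8, 8)
pose_set = synthesize_pose_set(face, theta=15, n=3, render=shift)
```

With θ = 15 and n = 3 the set covers -45 to +45 degrees in 15-degree steps, matching the angle range mentioned in the embodiments.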
At step S14, the feature vector of each image in the synthetic image sets is extracted. At step S15, the distance between the synthetic image sets is determined from the distances between the feature vectors.
For example, the LDA or PCA feature-vector distances between the synthetic images of different image sets are computed, and the minimum of these distances is taken as the distance between the synthetic image sets. This process is described in detail below.
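The feature extraction of step S14 can be sketched with a plain PCA computed via the singular value decomposition; this is a minimal stand-in for the LDA/PCA eigenvectors of the cited literature, and the random images and the dimension of 4 are illustrative only.

```python
import numpy as np

def pca_features(images, dim=4):
    """Project flattened images onto their top `dim` principal components,
    giving one feature vector per image (step S14)."""
    X = np.stack([im.ravel() for im in images]).astype(float)
    X -= X.mean(axis=0)                        # center before PCA
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    return X @ vt[:dim].T                      # (num_images, dim) features

rng = np.random.default_rng(0)
imgs = [rng.random((8, 8)) for _ in range(10)]
feats = pca_features(imgs, dim=4)
```

In a deployed system the projection basis would be learned once from a training set rather than from the images being compared.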
Given two face images f_i and f_j, rendering each of them at the different pose angles yields the corresponding synthetic face image sets F_i and F_j, expressed as follows:

F_i = {f_i(-nθ), …, f_i(-θ), f_i(0), f_i(θ), …, f_i(nθ)},
F_j = {f_j(-nθ), …, f_j(-θ), f_j(0), f_j(θ), …, f_j(nθ)}.

Here f_i(kθ) and f_j(kθ) are the synthetic face images obtained by rendering with the model at pose angle kθ, θ is a predetermined unit pose angle, and k is an integer variable from -n to n. For every pair of synthetic images f_i(kθ) and f_j(lθ), the LDA or PCA feature-vector distance is computed, giving a distance matrix; Fig. 4 shows an example of such a distance matrix. After the distance matrix has been computed, the minimum face pose distance (MFPD) between the face images f_i and f_j is defined as the minimum entry of the distance matrix:

MFPD(f_i, f_j) = min over k, l of d(f_i(kθ), f_j(lθ)),

where d(·, ·) denotes the feature-vector distance.
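Assuming each synthetic image has already been reduced to a feature vector, the MFPD computation above can be sketched in NumPy; the two toy feature sets at the bottom are illustrative, not data from the patent.

```python
import numpy as np

def mfpd(feats_i, feats_j):
    """Minimum face pose distance: the smallest entry of the pairwise
    feature-vector distance matrix between two synthetic image sets."""
    A = np.asarray(feats_i)[:, None, :]        # shape (m, 1, d)
    B = np.asarray(feats_j)[None, :, :]        # shape (1, n, d)
    D = np.linalg.norm(A - B, axis=2)          # full (m, n) distance matrix
    return D.min()                             # MFPD = min over all pairs

Fi = np.array([[0.0, 0.0], [1.0, 1.0]])        # feature vectors of set F_i
Fj = np.array([[3.0, 4.0], [1.0, 0.0]])        # feature vectors of set F_j
d = mfpd(Fi, Fj)                               # closest pair: (1,1)-(1,0)
```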
At step S16, the face images are clustered based on the distances between the image sets. Once the distance between two face images has been obtained, a constraint-based hierarchical clustering method can be used to cluster the face images. The constraint here is that different face images from the same camera are assigned to different classes. For example, two classes whose distance is less than a predetermined threshold are merged into one class, until no classes can be merged. The distance between two classes C_i and C_j is defined as follows:

d(C_i, C_j) = min over f_a in C_i, f_b in C_j of MFPD(f_a, f_b).

The distance between two classes may also be taken as the maximum or average MFPD.
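The constrained merging of step S16 can be sketched as a greedy agglomerative loop. This is an illustrative reading of the patent's constraint-based hierarchical clustering, with minimum linkage and a same-camera cannot-link rule; the toy distance matrix and camera labels are invented for the example.

```python
def constrained_cluster(dist, camera, threshold):
    """Merge the closest pair of clusters whose distance is below `threshold`,
    never merging two faces that came from the same camera (cannot-link)."""
    clusters = [{i} for i in range(len(camera))]

    def cdist(a, b):                      # minimum-linkage class distance
        return min(dist[i][j] for i in a for j in b)

    def can_merge(a, b):                  # no shared camera between classes
        return not ({camera[i] for i in a} & {camera[j] for j in b})

    while True:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                if can_merge(clusters[x], clusters[y]):
                    d = cdist(clusters[x], clusters[y])
                    if d < threshold and (best is None or d < best[0]):
                        best = (d, x, y)
        if best is None:
            return clusters
        _, x, y = best
        clusters[x] |= clusters.pop(y)    # merge the closest mergeable pair

# Faces 0 and 1 are the same person seen by cameras 0 and 1; face 2 comes
# from camera 0 again, so it can never join a cluster containing face 0.
D = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
cams = [0, 1, 0]
groups = constrained_cluster(D, cams, threshold=2)
```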
At step S17, after clustering, one image is selected from the face images of each class as the representative image of that class according to a predetermined criterion, for example the image with the largest distance between the eyes or the image with the greatest sharpness.
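One common sharpness proxy that could serve the selection criterion of step S17 is the variance of a discrete Laplacian response; the patent does not specify a sharpness measure, so this measure, the random test images, and the crude blur are all assumptions of the sketch.

```python
import numpy as np

def sharpness(img):
    """Variance of the 4-neighbour discrete Laplacian response; higher means
    more high-frequency detail, i.e. a sharper image."""
    lap = (-4 * img[1:-1, 1:-1] + img[:-2, 1:-1] + img[2:, 1:-1]
           + img[1:-1, :-2] + img[1:-1, 2:])
    return lap.var()

rng = np.random.default_rng(1)
sharp = rng.random((16, 16))
# Crude blur: average each pixel with its left neighbour, twice.
blurred = sharp.copy()
for _ in range(2):
    blurred = 0.5 * (blurred + np.roll(blurred, 1, axis=1))

# Representative selection: keep the sharpest image of the cluster.
best = max([sharp, blurred], key=sharpness)
```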
The above description is only for implementing embodiments of the present invention. Those skilled in the art should understand that any modification or partial replacement that does not depart from the scope of the present invention falls within the scope defined by the claims; the protection scope of the present invention is therefore defined by the appended claims.
Claims (16)
1. A method for processing images captured of the same scene by a plurality of video cameras distributed at different positions, the method comprising the steps of:
detecting face images in the captured images;
processing each of the detected face images at different pose angles with a predetermined face pose model to generate synthetic images of the different pose angles as the synthetic image set of the corresponding face image;
extracting feature vectors of the synthetic images in the synthetic image sets;
computing the distance between different synthetic image sets by computing the feature-vector distances between the synthetic images in the different synthetic image sets; and
clustering the detected face images based on the distances between the different synthetic image sets;
wherein the step of computing the distance between different synthetic image sets comprises:
computing the feature-vector distances between the synthetic images in the different synthetic image sets; and
determining the minimum of the computed feature-vector distances as the distance between said different synthetic image sets.
2. The method of claim 1, further comprising:
selecting, from the images of the same cluster, the image with the largest distance between the eyes as the representative image of that cluster.
3. The method of claim 1, further comprising:
selecting, from the images of the same cluster, the image with the greatest sharpness as the representative image of that cluster.
4. The method of claim 1, wherein said feature vectors are LDA or PCA feature vectors.
5. The method of claim 1, wherein the step of clustering the detected face images based on the distances between different synthetic image sets comprises:
assigning face images whose synthetic image sets are separated by a distance less than a predetermined threshold to the same cluster.
6. The method of claim 1, wherein different face images from the same video camera are assigned to different classes.
7. The method of claim 1, wherein said pose angles range over horizontal and/or vertical angles from -45 degrees to +45 degrees.
8. The method of claim 1, wherein said face pose model is a 2D or 3D face pose model.
9. A device for processing images captured of the same scene by a plurality of video cameras distributed at different positions, the device comprising:
a detecting unit that detects face images in the captured images;
a face model storage that stores a predetermined face pose model;
a processing unit that processes each of the detected face images at different pose angles with the face pose model stored in the face model storage, generating synthetic images of the different pose angles as the synthetic image set of the corresponding face image;
an extracting unit that extracts feature vectors of the synthetic images in the synthetic image sets;
a distance calculating unit that computes the distance between different synthetic image sets by computing the feature-vector distances between the synthetic images in the different synthetic image sets; and
a clustering unit that clusters the detected face images based on the distances between the different synthetic image sets;
wherein the distance calculating unit computes the feature-vector distances between the synthetic images in the different synthetic image sets and determines the minimum of the computed feature-vector distances as the distance between said different synthetic image sets.
10. The device of claim 9, further comprising:
a selecting unit that selects, from the images of the same cluster, the image with the largest distance between the eyes as the representative image of that cluster.
11. The device of claim 9, further comprising:
a selecting unit that selects, from the images of the same cluster, the image with the greatest sharpness as the representative image of that cluster.
12. The device of claim 9, wherein said feature vectors are LDA or PCA feature vectors.
13. The device of claim 9, wherein the clustering unit assigns face images whose synthetic image sets are separated by a distance less than a predetermined threshold to the same cluster.
14. The device of claim 9, wherein the clustering unit assigns different face images from the same video camera to different classes.
15. The device of claim 9, wherein said pose angles range over horizontal and/or vertical angles from -45 degrees to +45 degrees.
16. The device of claim 9, wherein said face pose model is a 2D or 3D face pose model.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN 200810215058 CN101673346B (en) | 2008-09-09 | 2008-09-09 | Method, equipment and system for processing image |
JP2009204760A JP4642128B2 (en) | 2008-09-09 | 2009-09-04 | Image processing method, image processing apparatus and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101673346A CN101673346A (en) | 2010-03-17 |
CN101673346B true CN101673346B (en) | 2013-06-05 |