CN103077368A - Method and device for positioning mouth part of human face image as well as method and system for recognizing mouth shape - Google Patents

Method and device for positioning mouth part of human face image as well as method and system for recognizing mouth shape

Info

Publication number
CN103077368A
CN103077368A
Authority
CN
China
Prior art keywords
image
mouth
carried out
facial
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011103281380A
Other languages
Chinese (zh)
Inventor
王晓平
曾文斌
赵文忠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yinchen Intelligent Identfiying Science & Technology Co Ltd Shanghai
Original Assignee
Yinchen Intelligent Identfiying Science & Technology Co Ltd Shanghai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yinchen Intelligent Identfiying Science & Technology Co Ltd Shanghai filed Critical Yinchen Intelligent Identfiying Science & Technology Co Ltd Shanghai
Priority to CN2011103281380A priority Critical patent/CN103077368A/en
Publication of CN103077368A publication Critical patent/CN103077368A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a method and device for locating the mouth in a facial image, and a method and system for recognizing mouth shape. The mouth localization method comprises: detecting and locating the two eyes in an input facial image; performing geometric rectification on the facial image based on the localization result to form a first image; extracting a mouth candidate region from the first image to obtain a second image; building a skin color model of the face based on the first image; projecting the second image onto the skin color model to compute skin color probabilities, obtaining a third image; binarizing the third image to form a fifth image; forming an eighth image from the second image based on the hue component; using the eighth image to perform auxiliary detection on the fifth image to form a ninth image; applying mathematical morphology operations to the ninth image to form an eleventh image; and determining the position of the mouth region in the input facial image from the eleventh image by horizontal and vertical projection. The technical scheme can effectively achieve accurate localization of the mouth.

Description

Mouth localization method and device for facial images, and mouth shape recognition method and system
Technical field
The present invention relates to the fields of image processing and pattern recognition, and in particular to a mouth localization method and device for facial images, and a mouth shape recognition method and system.
Background art
With society's growing demand for intelligent technology, detecting the state of the mouth in a human face image is becoming a significant emerging research topic.
For example, in the traffic safety monitoring field, which is closely bound up with daily life, real-time intelligent monitoring of a driver, with timely prompts when fatigue symptoms such as yawning are observed, would significantly reduce the road accident rate. In the field of expression recognition, the contour shape of the mouth is a key feature for expression classification. In the field of lip-reading aids for the deaf and mute, the continuous change of mouth shape is the basic foundation for identifying speech content, and so on.
The primary prerequisite for all of the above is effective localization of the mouth. The traditional method is to collect statistics of skin color and lip color over a large number of facial images in advance, build a color distribution model, and then locate the mouth in an input facial image based on that model. However, some input facial images exhibit color cast, which strongly interferes with a color distribution model built from statistics collected in advance and can cause mouth localization to fail. To keep mouth localization effective when the facial image has a color cast, a method that can accurately locate the mouth in a color facial image is therefore highly desirable.
For related art, reference may also be made to Chinese patent application publication No. CN 1710595A, which discloses a mouth-corner positioning method.
Summary of the invention
The problem to be solved by the present invention is to provide a mouth localization method for facial images that can accurately locate the mouth in a facial image and thereby help improve the accuracy of mouth shape recognition.
To address the above problem, the technical solution of the present invention provides a mouth localization method for facial images, comprising:
detecting and locating the two eyes in an input facial image, and performing geometric rectification on the facial image based on the localization result to form a first image;
extracting a mouth candidate region from the first image to obtain a second image;
building a skin color model of the face based on the first image;
projecting the second image onto the skin color model to compute skin color probabilities, obtaining a third image;
binarizing the third image to form a fifth image;
forming an eighth image from the second image based on the hue component, and using the eighth image to perform auxiliary detection on the fifth image to form a ninth image;
applying mathematical morphology operations to the ninth image to form an eleventh image;
determining the position of the mouth region in the input facial image from the eleventh image by horizontal projection and vertical projection, respectively.
Optionally, extracting a mouth candidate region from the first image to obtain a second image comprises:
taking the point at 25% of the width of the first image, from left to right, as the left boundary of the mouth candidate region, and the point at 75% of the width as its right boundary;
taking the point at 65% of the height of the first image, from top to bottom, as the upper boundary of the mouth candidate region, and the lower edge of the first image as its lower boundary;
cropping the mouth candidate region along the four boundaries (top, bottom, left, right) to form the second image.
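As an illustrative sketch of the cropping rule above (Python/NumPy; the function name and the row/column array convention are assumptions, not part of the disclosure):

```python
import numpy as np

def crop_mouth_candidate(first_image: np.ndarray) -> np.ndarray:
    """Crop the mouth candidate region from the rectified face image.

    Boundaries follow the rule in the text: left/right at 25%/75% of the
    image width, top at 65% of the height, bottom at the image's lower edge.
    """
    h, w = first_image.shape[:2]
    left, right = int(0.25 * w), int(0.75 * w)
    top = int(0.65 * h)
    return first_image[top:h, left:right]
```

For a 100x200 image this yields a 35x100 crop: the bottom 35% of the rows and the middle 50% of the columns.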
Optionally, building a skin color model of the face based on the first image comprises:
converting the first image to a color space in which luminance and chrominance are separated;
selecting the chrominance components of that color space and building the skin color model from their statistics.
Optionally, the color space with separated luminance and chrominance is the Lab color space.
Optionally, the skin color model is a Gaussian model.
Optionally, binarizing the third image to form a fifth image comprises:
binarizing the third image to obtain a fourth image;
judging whether the number of white pixels in the fourth image is greater than a first threshold; if so, inverting the fourth image and outputting the result as the fifth image, otherwise outputting the fourth image directly as the fifth image;
the first threshold being set as the total number of pixels of the fourth image multiplied by a ratio greater than 50%.
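A minimal sketch of this binarize-then-maybe-invert rule (Python/NumPy; the threshold value and the 60% ratio are illustrative choices within the stated greater-than-50% constraint):

```python
import numpy as np

def binarize_with_inversion(img: np.ndarray, thresh: float, ratio: float = 0.6) -> np.ndarray:
    """Binarize, then invert if white pixels exceed ratio * total pixels.

    The inversion keeps the minority class (assumed to be the lips) white.
    Per the text, `ratio` must be greater than 50%; 0.6 is illustrative.
    """
    binary = (img > thresh).astype(np.uint8) * 255
    n_white = int(np.count_nonzero(binary))
    if n_white > ratio * binary.size:
        binary = 255 - binary
    return binary
```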
Optionally, forming an eighth image from the second image based on the hue component comprises:
converting the second image to a color space that has a hue component;
selecting the hue component and outputting it as a sixth image;
binarizing the sixth image to form the eighth image.
Optionally, the color space with a hue component is the HSV color space, and the hue component is the H component.
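A hedged sketch of extracting the H component (Python, using the standard library's colorsys; the per-pixel loop is used only for clarity and would normally be vectorized):

```python
import colorsys
import numpy as np

def hue_channel(rgb: np.ndarray) -> np.ndarray:
    """Extract the H (hue) component of an RGB image, scaled to [0, 1]."""
    h, w = rgb.shape[:2]
    out = np.empty((h, w), dtype=np.float64)
    norm = rgb.astype(np.float64) / 255.0
    for y in range(h):
        for x in range(w):
            # colorsys returns (h, s, v); keep only the hue
            out[y, x] = colorsys.rgb_to_hsv(*norm[y, x])[0]
    return out
```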
Optionally, binarizing the sixth image to form the eighth image comprises:
binarizing the sixth image to obtain a seventh image;
judging whether the number of white pixels in the seventh image is greater than a second threshold; if so, inverting the seventh image and outputting the result as the eighth image, otherwise outputting the seventh image directly as the eighth image;
the second threshold being set as the total number of pixels of the seventh image multiplied by a ratio greater than 50%.
Optionally, using the eighth image to perform auxiliary detection on the fifth image to form a ninth image comprises:
for each white pixel in the fifth image, if the pixels in a predetermined neighborhood of the corresponding pixel in the eighth image are black, setting that white pixel in the fifth image to black;
forming the ninth image after all pixels in the fifth image have been traversed.
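One possible reading of the auxiliary detection step, sketched in Python/NumPy (the 3x3 neighborhood is an assumption; the patent only says "predetermined neighborhood"):

```python
import numpy as np

def auxiliary_detect(fifth: np.ndarray, eighth: np.ndarray, radius: int = 1) -> np.ndarray:
    """Suppress white pixels in `fifth` that have no white support in `eighth`.

    For each white pixel in the fifth image, the (2*radius+1)^2 neighborhood
    of the corresponding pixel in the eighth image is inspected; if it contains
    no white pixel, the pixel is set to black.
    """
    ninth = fifth.copy()
    h, w = fifth.shape
    ys, xs = np.nonzero(fifth)
    for y, x in zip(ys, xs):
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        x0, x1 = max(0, x - radius), min(w, x + radius + 1)
        if not eighth[y0:y1, x0:x1].any():
            ninth[y, x] = 0
    return ninth
```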
Optionally, applying mathematical morphology operations to the ninth image to form an eleventh image comprises:
labeling the connected components of the ninth image and retaining the connected component of largest area to form a tenth image;
applying a closing operation to the tenth image to form the eleventh image.
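A sketch of this morphology step using SciPy's ndimage (the library choice and the 3x3 structuring element are illustrative assumptions; the patent names no implementation):

```python
import numpy as np
from scipy import ndimage

def morphology_cleanup(ninth: np.ndarray) -> np.ndarray:
    """Keep the largest connected white region, then apply a morphological closing."""
    mask = ninth > 0
    labels, n = ndimage.label(mask)
    if n == 0:
        return np.zeros_like(ninth)
    # area of each labeled component; keep only the biggest (assumed mouth)
    sizes = ndimage.sum(mask, labels, range(1, n + 1))
    largest = labels == (int(np.argmax(sizes)) + 1)
    # closing fills small gaps inside the retained blob
    closed = ndimage.binary_closing(largest, structure=np.ones((3, 3)))
    return closed.astype(np.uint8) * 255
```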
Optionally, determining the position of the mouth region in the input facial image from the eleventh image by horizontal projection and vertical projection comprises:
computing the horizontal projection and vertical projection of the eleventh image;
examining the horizontal projection values row by row, from top to bottom and from bottom to top; when a projection value exceeds a third threshold, taking the corresponding positions as the upper and lower boundaries of the mouth region, respectively;
examining the vertical projection values column by column, from left to right and from right to left; when a projection value exceeds a fourth threshold, taking the corresponding positions as the left and right boundaries of the mouth region, respectively;
determining the position of the mouth region in the facial image from the four boundaries (top, bottom, left, right).
Optionally, the third threshold and the fourth threshold are 1% to 5% of the maximum projection value.
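The projection-based boundary search can be sketched as follows (Python/NumPy; the 3% ratio is one value inside the 1%-5% range stated above):

```python
import numpy as np

def locate_by_projection(img: np.ndarray, ratio: float = 0.03):
    """Find the mouth bounding box from row/column projections of a binary image.

    Scanning inward from each side, the first row/column whose projection
    exceeds ratio * max_projection becomes that side's boundary.
    Returns (top, bottom, left, right).
    """
    rows = img.sum(axis=1).astype(np.float64)  # horizontal projection
    cols = img.sum(axis=0).astype(np.float64)  # vertical projection
    rt, ct = ratio * rows.max(), ratio * cols.max()
    top = int(np.argmax(rows > rt))
    bottom = int(len(rows) - 1 - np.argmax(rows[::-1] > rt))
    left = int(np.argmax(cols > ct))
    right = int(len(cols) - 1 - np.argmax(cols[::-1] > ct))
    return top, bottom, left, right
```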
To address the above problem, the technical solution of the present invention also provides a mouth shape recognition method for facial images, comprising: locating the mouth of an input facial image with the above mouth localization method, and recognizing the mouth shape based on the mouth region determined by the localization.
To address the above problem, the technical solution of the present invention also provides a mouth localization device for facial images, comprising:
a preprocessing unit, adapted to detect and locate the two eyes in an input facial image and to perform geometric rectification on the facial image based on the localization result, forming a first image;
an extraction unit, adapted to extract a mouth candidate region from the first image, obtaining a second image;
a skin color model building unit, adapted to build a skin color model of the face based on the first image;
a skin color probability computing unit, adapted to project the second image onto the skin color model to compute skin color probabilities, obtaining a third image;
a first binarization unit, adapted to binarize the third image, forming a fifth image;
a hue auxiliary detection unit, adapted to form an eighth image from the second image based on the hue component and to perform auxiliary detection on the fifth image with the eighth image, forming a ninth image;
a mathematical morphology operation unit, adapted to apply mathematical morphology operations to the ninth image, forming an eleventh image;
a projection localization unit, adapted to determine the position of the mouth region in the input facial image from the eleventh image by horizontal projection and vertical projection, respectively.
To address the above problem, the technical solution of the present invention also provides a mouth shape recognition system, comprising: the above mouth localization device for facial images; and a recognition unit, adapted to recognize the mouth shape based on the mouth region determined after the mouth localization device locates the mouth of the input facial image.
Compared with the prior art, the technical scheme of the present invention has the following advantages:
An adaptive skin color model matching the skin color distribution of the input facial image is built on the fly from that image. The mouth candidate region is then extracted to form the second image, which is projected onto the skin color model to compute skin color probabilities and obtain the third image. The third image is binarized to form the fifth image, while the eighth image, formed from the second image based on the hue component, performs auxiliary detection on the fifth image. A series of mathematical morphology operations follows, such as retaining the largest connected component and closing, and the mouth region is finally located accurately by projection. The mouth in the facial image can thus be located accurately and effectively, which in turn improves the accuracy of mouth shape recognition.
In particular, auxiliary detection with the hue component makes the binarization result more accurate, yielding better localization of the mouth region in the facial image.
Description of the drawings
Fig. 1 is a schematic flowchart of the mouth localization method for facial images provided by an embodiment of the present invention;
Fig. 2 is a schematic diagram of extracting the mouth candidate region;
Fig. 3a to Fig. 13a are actual example figures of mouth localization for a facial image without color cast according to an embodiment of the invention;
Fig. 3b to Fig. 13b are actual example figures of mouth localization for a facial image with a first kind of color cast according to an embodiment of the invention;
Fig. 3c to Fig. 13c are actual example figures of mouth localization for a facial image with a second kind of color cast according to an embodiment of the invention;
Fig. 3d to Fig. 13d are actual example figures of mouth localization for a facial image with a third kind of color cast according to an embodiment of the invention;
Fig. 14 is a schematic structural diagram of the mouth localization device for facial images provided by an embodiment of the present invention.
Detailed description of embodiments
The prior art usually collects statistics of skin color and lip color over a large number of facial images in advance, builds a color distribution model, and then locates the mouth in an input facial image based on that model. However, when the input facial image has a color cast, mouth localization may fail, making it difficult to locate the mouth accurately and in turn degrading the accuracy of mouth shape recognition.
For this reason, the technical scheme first uses an eye localization technique to locate the positions of the two eyes in the input facial image and performs geometric rectification on the facial image based on the localization result. An adaptive skin color model matching the skin color distribution of this image is then built on the fly from the input image. Next, an approximate mouth region (the mouth candidate region) is extracted, and the image it forms (the second image) is projected onto the skin color model to obtain a probability distribution map (the third image). The third image is binarized, with auxiliary detection performed at the same time using the hue component, followed by a series of mathematical morphology operations such as retaining the largest connected component and closing. Finally, the mouth region is accurately located by projection. The mouth in the facial image can thus be located accurately and effectively, which helps improve the accuracy of mouth shape recognition.
The advantage of this technical scheme is that no statistical modeling of skin color and lip color is needed in advance; instead, an adaptive model is built on the fly for each input facial image. Mouth localization is therefore not affected by color cast in the image, which improves the reliability of the localization result and also enhances the flexibility of the algorithm.
To make the above objects, features and advantages of the present invention more apparent, specific embodiments of the present invention are described in detail below with reference to the accompanying drawings. Details are set forth in the following description to facilitate a full understanding of the present invention. However, the present invention can be implemented in many ways other than those described here, and those skilled in the art can make similar generalizations without departing from its spirit. The present invention is therefore not limited by the embodiments disclosed below.
Fig. 1 is a schematic flowchart of the mouth localization method for facial images provided by an embodiment of the present invention. As shown in Fig. 1, the mouth localization method comprises:
Step S101: detect and locate the two eyes in the input facial image, and perform geometric rectification on the facial image based on the localization result to form a first image;
Step S102: extract a mouth candidate region from the first image to obtain a second image;
Step S103: build a skin color model of the face based on the first image;
Step S104: project the second image onto the skin color model to compute skin color probabilities, obtaining a third image;
Step S105: binarize the third image to form a fifth image;
Step S106: form an eighth image from the second image based on the hue component, and perform auxiliary detection on the fifth image with the eighth image to form a ninth image;
Step S107: apply mathematical morphology operations to the ninth image to form an eleventh image;
Step S108: determine the position of the mouth region in the input facial image from the eleventh image by horizontal projection and vertical projection, respectively.
Fig. 3a to Fig. 13a are actual example figures of mouth localization for a facial image without color cast according to an embodiment of the invention. The mouth localization method is elaborated below through a specific embodiment, with reference to Fig. 1 and Fig. 3a to Fig. 13a.
First, step S101 is executed: the two eyes in the input facial image are detected and located, and the facial image is geometrically rectified based on the localization result to form a first image. In this embodiment, the input facial image is a color facial image. In a specific implementation, the color facial image may also be converted to a grayscale image before the eye detection/localization and geometric rectification are performed; the first image formed after step S101 is, of course, still a color facial image.
In this embodiment, any eye localization method in the prior art may be used to detect and locate the two eye points, for example the AdaBoost method, artificial neural networks, template matching, or the gray projection method. These eye localization techniques are well known to those skilled in the art and are not repeated here.
Step S101 preprocesses the input facial image. For all face recognition methods, normalization of the facial image is very important and directly affects the final recognition result. Normalization here mainly refers to whether the relative positions of the key parts of the face are consistent across facial images. Since face recognition normally uses information from the entire image, the position of the face in an unprocessed original image may be offset, which affects correct recognition. In particular, this embodiment locates the mouth of a facial image, and an inaccurate face position in the image would certainly affect mouth localization. The input facial image therefore needs geometric rectification, so that facial images under different input conditions are all normalized to the same size and the key facial parts are kept as consistent as possible. Geometric rectification mainly comprises scaling, rotation, and flipping. Scaling zooms the face contained in the original image to a unified size, based on the coordinate relation of the two eyes (obtained by the eye localization technique above). The eyes are very important parts of the face; scaling ensures that the inter-eye distance is identical across images, so that other parts such as the nose, mouth, and cheeks also remain at relatively standard positions. Rotation rotates the facial image in the plane, its main purpose being to keep the line between the two eyes horizontal. Flipping mainly addresses facial images that may be upside down, correcting them by a flip so that the face in the target image is upright. Since the mouth localization method of this embodiment specifically locates the mouth region in the input facial image, and to allow subsequent steps to extract the mouth candidate region fairly accurately, the geometric rectification in this embodiment is mainly horizontal correction of the input facial image by rotation: after the eye positions (coordinates) are obtained with the eye localization technique, a facial image with a tilted pose is leveled using the line between the two eyes as the reference.
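A minimal sketch of this eye-line leveling (Python with SciPy; the rotation sign convention may need flipping depending on the coordinate system, so this is a sketch under assumptions rather than the patent's implementation):

```python
import numpy as np
from scipy import ndimage

def rectify_by_eyes(image, left_eye, right_eye):
    """Rotate the image so the line between the two eye centers becomes horizontal.

    Eye positions are (x, y) pixel coordinates, as supplied by any eye
    detector (AdaBoost, template matching, gray projection, ...).  Returns
    the rotated image and the applied angle in degrees.
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle_deg = float(np.degrees(np.arctan2(dy, dx)))
    rotated = ndimage.rotate(image, angle_deg, reshape=False, order=1)
    return rotated, angle_deg
```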
After the input facial image has been preprocessed by step S101 and the first image has been output, step S102 is executed: a mouth candidate region is extracted from the first image to obtain a second image. In this embodiment, step S102 may specifically comprise:
Step S102a: take the point at 25% of the width of the first image, from left to right, as the left boundary of the mouth candidate region, and the point at 75% of the width as its right boundary;
Step S102b: take the point at 65% of the height of the first image, from top to bottom, as the upper boundary of the mouth candidate region, and the lower edge of the first image as its lower boundary;
Step S102c: crop the mouth candidate region from the first image along the four boundaries determined by steps S102a and S102b, forming the second image.
Fig. 2 is a schematic diagram of extracting the mouth candidate region. Referring to Fig. 2, the figure shows the first image obtained after preprocessing the input facial image, with the main facial parts such as the eyes, nose, mouth and cheeks. Suppose the width of the first image is w and its height is h. In this embodiment, when determining the mouth candidate region, the point at 25% of the width, from left to right, is taken as the left boundary, and the point at 75% of the width as the right boundary; that is, as shown in Fig. 2, the distance between the left boundary of the mouth candidate region and the left edge of the first image is 0.25w, and the distance between the right boundary of the region and the right edge of the first image is 0.25w (i.e. the distance between the right boundary of the region and the left edge of the first image is 0.75w, marked in the figure). The point at 65% of the height, from top to bottom, is taken as the upper boundary, and the lower edge of the first image as the lower boundary; that is, the distance between the upper boundary of the region and the lower edge of the first image is 0.35h (i.e. the distance between the upper boundary of the region and the upper edge of the first image is 0.65h, not marked in the figure), and the lower boundary of the region coincides with the lower edge of the first image. Once the four boundaries (top, bottom, left, right) are determined, the mouth candidate region is determined: it is the rectangular area shown by the dashed frame in Fig. 2. The mouth candidate region marks out the approximate area of the mouth in the facial image and preliminarily delimits the range in which the mouth lies, so that the precise position of the mouth can then be determined in a more targeted way. This makes the subsequent localization of the mouth region more accurate and also reduces the amount of computation in the subsequent steps.
It should be noted that the four boundaries of the mouth candidate region in this embodiment are determined from practical experience: the mouth should be guaranteed to fall within the region, yet the region should not be so large that it affects localization accuracy and increases the computation required. In other embodiments, other criteria may be adopted to determine the four boundaries. For example, the distance between the left boundary of the mouth candidate region and the left edge of the first image may be 0.2w, the distance between the right boundary of the region and the right edge of the first image 0.2w, and the distance between the upper boundary of the region and the lower edge of the first image 0.4h, with the lower boundary of the region coinciding with the lower edge of the first image. The mouth candidate region so determined is larger than the one in this embodiment; it still guarantees that the mouth falls within the region, but the subsequent computation for accurately locating the mouth region is also relatively larger.
After the mouth candidate region has been cropped out of the first image to form the second image, the extraction of the mouth candidate region is complete. Fig. 3a is a concrete example of the second image. As shown in Fig. 3a, the extracted mouth candidate region mainly contains the mouth, chin, and part of the cheeks, and basically determines the approximate range of the mouth. To locate the mouth accurately, the next problem to face is distinguishing the two classes of skin color and lip color; the second image therefore undergoes a series of further processing steps, and the mouth contour in the second image is determined by distinguishing skin color from lip color.
After the first image has been formed by step S101, step S103 also needs to be executed: a skin color model of the face is built based on the first image. It should be noted that step S103 can only be performed on the first image formed by step S101, so it is executed after step S101; however, there is no required order between steps S102 and S103: step S102 may be executed before step S103, step S103 before step S102, or the two may be executed simultaneously.
In this embodiment, the specific modeling method of step S103 is as follows:
Step S103a: convert the first image to a color space in which luminance and chrominance are separated;
Step S103b: select the chrominance components of that color space and build the skin color model from their statistics.
Because everyday CCD (Charge-Coupled Device) image capture devices directly sense RGB (red, green, blue) components, yet RGB values are hard to relate to perceived color attributes, and every RGB component contains luminance information, which complicates image processing and compression, the usual practice is to transform the image from RGB space into a color space in which luminance and chrominance are separated, to obtain better clustering characteristics. In this embodiment, this color space is the Lab color space; that is, the skin color model is built in Lab. The Lab color space is the uniform color space recommended by the CIE (International Commission on Illumination); it consists of a luminance channel L and color channels a and b, where the a channel represents the range from red to green and the b channel the range from blue to yellow. This color space is very close to human color perception, and the distance between different colors can be measured directly by Euclidean distance. In other embodiments, the color space with separated luminance and chrominance may also be the YCbCr color space, where Y is the luminance component, Cb the blue chrominance component, and Cr the red chrominance component.
Skin color models fall roughly into two classes, parametric and non-parametric, which differ greatly in the training time required on skin color samples and in the storage needed for their parameters. Parametric skin color models include the Gaussian model (GM), the Gaussian mixture model (GMM), and the elliptic boundary model; non-parametric models include the Bayes classifier, the normalized lookup table (NLUT), and the self-organizing map (SOM).
In the present embodiment, the skin color model built is the Gaussian model among the parametric skin color models. Specifically, after the first image is converted to the Lab color space, the a and b components are collected and a two-dimensional Gaussian skin color model N(μ, σ) is fitted by statistics, where μ is the mean parameter of the model and σ its standard deviation parameter. Because skin occupies the overwhelming majority of the first image, the fitted model essentially reflects the skin color distribution; the lips, eyes, eyebrows and so on cover a much smaller area than the skin, so those pixels act only as noise in the modeling process and cannot change the overall distribution of the skin color.
It should be noted that step S103 does not statistically pre-model the skin and lip colors as is common in the prior art; instead it adaptively models each input face image on the fly, so that the model parameters change with the color of the input face image. The mouth localization is therefore not affected by color cast in the image, which improves the reliability of the subsequent mouth positioning result and also adds flexibility to the algorithm.
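As one possible illustration of steps S103a/S103b, the following NumPy sketch fits the two-dimensional Gaussian to the chroma channels of the first image (the function name is ours, and the sketch assumes the a and b channels have already been obtained as float arrays by an RGB-to-Lab conversion):

```python
import numpy as np

def fit_skin_gaussian(a_chan, b_chan):
    """Fit a 2-D Gaussian N(mu, Sigma) to the chroma samples of the
    (geometrically corrected) face image.  Lips, eyes and brows cover a
    small fraction of the face, so they act only as noise in the fit."""
    samples = np.stack([a_chan.ravel(), b_chan.ravel()], axis=1)  # n x 2
    mu = samples.mean(axis=0)
    sigma = np.cov(samples, rowvar=False)  # 2 x 2 covariance matrix
    return mu, sigma
```

Because the fit is recomputed for every input image, the parameters adapt to that image's color cast, which is the point made above.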
After steps S102 and S103 are completed, step S104 is executed in order to distinguish skin color from lip color in the second image: the second image is projected onto the skin color model to compute skin color probabilities, yielding the third image. Step S104 is thus a skin detection process that produces a skin color probability map over every pixel of the second image. Skin detection decides, via the skin color model, whether each examined pixel is a skin point. Since the skin color probability distribution can be regarded as a normal distribution, the present embodiment approximates it with the Gaussian model built above; projecting a pixel onto this model yields its skin color probability. Specifically, the skin color probability P of each pixel is computed as follows, this computation constituting the projection of the pixel onto the skin color model:
P = (2π)^(−d/2) · |Σ|^(−1/2) · exp[ −(1/2) (x − μ) Σ^(−1) (x − μ)^T ]
where d is the dimension of the vector; in the present embodiment, d = 2;
x is the modeling vector, x = (a, b);
μ is the vector mean, μ = (1/n) Σ_{i=1}^{n} x_i, where n is the number of samples;
Σ is the covariance matrix, Σ = [ σ_a²  σ_ab ; σ_ab  σ_b² ].
Applying the above formula to every pixel in turn produces the third image. Fig. 4a is a concrete example of the third image. As shown in Fig. 4a, after the second image has been projected onto the skin color model to compute skin color probabilities, the resulting third image already shows the rough outline of the mouth, but the outline is still not clear and interfering regions remain, so further processing is needed.
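The projection of step S104 can be sketched as follows (a minimal NumPy illustration of the probability formula above; the function name and the array layout are our assumptions):

```python
import numpy as np

def skin_probability_map(a_chan, b_chan, mu, sigma):
    """Evaluate the 2-D Gaussian density at every pixel (step S104)."""
    d = 2
    x = np.stack([a_chan, b_chan], axis=-1) - mu          # H x W x 2
    inv = np.linalg.inv(sigma)
    # Mahalanobis term (x - mu) Sigma^-1 (x - mu)^T at every pixel
    maha = np.einsum('...i,ij,...j->...', x, inv, x)
    norm = (2 * np.pi) ** (-d / 2) * np.linalg.det(sigma) ** -0.5
    return norm * np.exp(-0.5 * maha)
```

At the model mean the density equals the normalizing constant, e.g. 1/(2π) for an identity covariance; lip pixels, lying far from the skin mean in (a, b), receive low values.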
After step S104 forms the third image, step S105 binarizes the third image to form the fifth image. In the present embodiment, step S105 may specifically comprise:
Step S105a: perform a binarization operation on the third image to obtain the fourth image;
Step S105b: judge whether the number of white pixels in the fourth image exceeds a first threshold; if so, invert the fourth image and output the result as the fifth image, otherwise output the fourth image directly as the fifth image. The first threshold is set by multiplying the total number of pixels of the fourth image by a ratio greater than 50%.
Binarization is a basic image processing operation: for a 256-level grayscale image, every pixel value is set to either 0 or 255 so that the whole image shows a clear black-and-white effect. With a suitably chosen threshold, the binary image obtained from a 256-level grayscale image still reflects its global and local features. Binary images occupy an important place in digital image processing: binarization simplifies the image, reduces the data volume, and highlights the outline of the target of interest. For example, all pixels whose gray value is greater than or equal to the threshold may be judged to belong to the target object and set to 255, while the remaining pixels are excluded from the object region and set to 0, representing the background or exceptional regions. In practice any known thresholding method can be used, such as the between-class variance maximization (Otsu) method, the maximum entropy method, or an empirical method; the present embodiment preferably determines the threshold adaptively by the Otsu method.
Step S105a binarizes the third image, forming the fourth image. Fig. 5a is a concrete example of the fourth image. As shown in Fig. 5a, the largest black region is the mouth (corresponding to the lip color area) and the surrounding white region is the skin (corresponding to the skin color area). To ensure that the mouth is white for the convenience of subsequent operations, step S105b judges whether the mouth pixels are black or white; if they are black, an inversion operation turns the mouth pixels white and the surrounding skin pixels black. Specifically, the number of white pixels in the fourth image is counted first and compared against the configured first threshold. Because the mouth estimation region extracted in the preceding step is deliberately loose, the skin area should be clearly larger than the lip area, so the first threshold can be set at a dominant proportion of the total number of pixels in the fourth image. In the present embodiment, "dominant proportion" means the larger of the shares of total pixels taken by the white pixels and by the black pixels in the image (which consists only of white and black pixels); accordingly, the first threshold is obtained by multiplying the total number of pixels in the fourth image by a ratio greater than 50%. If the number of white pixels in the fourth image exceeds the first threshold, the white pixels (the skin area) clearly occupy the larger share of the image, so the fourth image is inverted (white pixels set to black, black pixels set to white) and output as the fifth image; otherwise the fourth image is output unchanged as the fifth image. Fig. 6a is a concrete example of the fifth image. As shown in Fig. 6a, compared with Fig. 5a the largest white region in Fig. 6a is now the mouth (corresponding to the lip color area) and the black region the surrounding skin (corresponding to the skin color area). Of course, interfering regions still exist and the binarization result of step S105 may itself contain errors, so subsequent steps are still needed to remove the interference and correct the binarization result of step S105.
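Steps S105a/S105b might be sketched as below. The Otsu thresholding and the greater-than-50% inversion test follow the embodiment's description; the function names and the exact scaling of the probability map to 8-bit gray are our assumptions:

```python
import numpy as np

def otsu_threshold(gray):
    """Between-class-variance-maximising threshold on a uint8 image."""
    hist = np.bincount(gray.ravel(), minlength=256).astype(float)
    p = hist / hist.sum()
    w0 = np.cumsum(p)                       # class-0 weight for t = 0..255
    m = np.cumsum(p * np.arange(256))       # cumulative mean
    mg = m[-1]                              # global mean
    # between-class variance; empty classes give nan and are ignored
    with np.errstate(divide='ignore', invalid='ignore'):
        var_b = (mg * w0 - m) ** 2 / (w0 * (1 - w0))
    return int(np.nanargmax(var_b))

def binarize_mouth_white(prob_img):
    """Steps S105a/S105b: binarise, then invert if white dominates,
    so that the lip region always ends up white."""
    gray = np.uint8(255 * prob_img / prob_img.max())
    t = otsu_threshold(gray)
    bw = np.where(gray > t, 255, 0).astype(np.uint8)
    if (bw == 255).sum() > 0.5 * bw.size:   # any ratio above 50% works here
        bw = 255 - bw                       # inversion operation
    return bw
```

Since the loose mouth estimation region is mostly skin, the skin class always holds the majority of pixels, which is why a simple majority test suffices to decide the inversion.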
While step S105 outputs the fifth image, step S106 is also executed: the second image is turned into the eighth image based on the hue component, and the eighth image is used to perform auxiliary detection on the fifth image, forming the ninth image.
In the present embodiment, forming the eighth image from the second image based on the hue component in step S106 may specifically comprise:
Step S106a: convert the second image to a color space that has a hue component;
Step S106b: select the hue component and output it as the sixth image;
Step S106c: binarize the sixth image to form the eighth image.
Specifically, the color space with a hue component in step S106a is the HSV color space, and the hue component is the H component. The HSV model (HSV color space) is a perceptual color model obtained from the RGB model (RGB color space) by a nonlinear transformation; it corresponds to a cone subset of a cylindrical coordinate system, in which H denotes hue, S saturation, and V value (brightness). Besides separating chrominance from luminance, this model also corresponds better to visual perception than the RGB model. On this basis, the present embodiment uses the hue component H for auxiliary detection of the mouth, the purpose being to apply a further color check to the mouth detection result obtained after step S105 and so improve the accuracy of the binarization result.
Step S106c may specifically comprise:
performing a binarization operation on the sixth image to obtain the seventh image. The binarization operation can be implemented as in step S105a; any known thresholding method may likewise be used, such as the Otsu method, the maximum entropy method, or an empirical method, and the present embodiment preferably determines the threshold adaptively by the Otsu method.
After the seventh image is formed, it is judged whether the number of white pixels in the seventh image exceeds a second threshold; if so, the seventh image is inverted and output as the eighth image, otherwise the seventh image is output directly as the eighth image. The second threshold is set by multiplying the total number of pixels of the seventh image by a ratio greater than 50%. This step can be implemented as in step S105b and serves the same purpose, namely to ensure that the mouth is white for the convenience of subsequent operations. Specifically, the number of white pixels in the seventh image is counted first and compared against the configured second threshold. Because the mouth estimation region extracted in the preceding step is deliberately loose, the skin area in this region should be clearly larger than the lip area, so the second threshold can be set at a dominant proportion of the total pixels of the seventh image, i.e. the total number of pixels in the seventh image multiplied by a ratio greater than 50%. If the number of white pixels in the seventh image exceeds the second threshold, the white pixels (the skin area) clearly occupy the larger share, so the seventh image is inverted (white pixels set to black, black pixels set to white) and output as the eighth image; otherwise the seventh image is output unchanged as the eighth image. Fig. 7a, Fig. 8a and Fig. 9a are concrete examples of the sixth, seventh and eighth images respectively: the sixth image output by step S106b is shown in Fig. 7a, and the seventh and eighth images output by step S106c are shown in Fig. 8a and Fig. 9a. Fig. 8a is the image output after binarizing Fig. 7a; in this particular example only a slight grayscale difference between Fig. 8a and Fig. 7a can be seen, but in other examples, where the input face image has a color cast of some degree, the image after binarization (the seventh image) can differ markedly from the image before binarization (the sixth image); see Fig. 7b and Fig. 8b, Fig. 7c and Fig. 8c, and Fig. 7d and Fig. 8d, which show the sixth and seventh images formed from input face images under three different degrees of color cast. Since no inversion is needed in this example, Fig. 8a is identical to Fig. 9a; in other examples, when the judgment requires the seventh image to be inverted, the output eighth image differs from the seventh image (compare for instance Fig. 8c and Fig. 9c, the seventh and eighth images formed from an input face image under the second color-cast condition). The eighth image shown in Fig. 9a also contains a white mouth outline, which differs somewhat from the white mouth outline in the fifth image shown in Fig. 6a; the eighth image can therefore subsequently be used to perform auxiliary detection on the mouth in the fifth image and so correct part of the binarization result of step S105.
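A possible sketch of step S106a/S106b is given below: the hue component is computed from RGB by the standard HSV formulas, after which the sixth image would be binarized and conditionally inverted exactly as described for step S105. The function name is ours, and a real implementation would more likely use a library conversion such as OpenCV's `cvtColor`:

```python
import numpy as np

def hue_channel(rgb):
    """H component of the HSV transform, in degrees (0..360)."""
    rgb = rgb.astype(float)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    mx = rgb.max(axis=-1)
    mn = rgb.min(axis=-1)
    d = np.where(mx == mn, 1.0, mx - mn)   # avoid /0; hue is 0 there anyway
    h = np.select(
        [mx == mn, mx == r, mx == g],
        [0.0,
         (60 * (g - b) / d) % 360,         # max channel is red
         60 * (b - r) / d + 120],          # max channel is green
        default=60 * (r - g) / d + 240)    # max channel is blue
    return h
```

Lip reds sit near hue 0/360 while skin hues cluster elsewhere, which is why thresholding this channel can confirm or veto the skin-model mask.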
In the present embodiment, performing auxiliary detection on the fifth image with the eighth image in step S106 to form the ninth image may specifically comprise:
Step S106d: for each white pixel in the fifth image, if all pixels in a predetermined neighborhood of its corresponding pixel in the eighth image are black, set that white pixel in the fifth image to black;
Step S106e: after all pixels in the fifth image have been traversed, form the ninth image.
Steps S106d and S106e constitute the auxiliary detection of the mouth in the fifth image by means of the eighth image. Specifically, the fifth image and the eighth image are compared: for each white pixel in the fifth image, the predetermined neighborhood of its corresponding pixel in the (binary) eighth image is examined. The predetermined neighborhood may be a 3×3, 5×5 or 7×7 neighborhood, among others. If every pixel in that neighborhood of the eighth image is black (value 0), the white pixel in the fifth image is set to black (value 0). After all pixels of the fifth image (in practice, all white pixels) have been traversed in this way, the ninth image is output. Fig. 10a is a concrete example of the ninth image. As shown in Fig. 10a, after the auxiliary detection of the mouth in the fifth image by the eighth image, not only is the mouth outline in the fifth image corrected but some interfering regions are also removed; the mouth outline in the figure is fairly clear, but a few small holes remain, so further processing is still needed.
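Steps S106d/S106e can be sketched directly (a minimal NumPy version; the default 3×3 neighborhood matches one of the choices named above, and zero padding at the image border is our assumption):

```python
import numpy as np

def auxiliary_detect(img5, img8, k=3):
    """Steps S106d/S106e: a white pixel of the skin-model mask (5th image)
    is kept only if the hue mask (8th image) has at least one white pixel
    in the k x k neighbourhood of the corresponding position."""
    pad = k // 2
    padded = np.pad(img8, pad, mode='constant', constant_values=0)
    out = img5.copy()
    ys, xs = np.nonzero(img5 == 255)
    for y, x in zip(ys, xs):
        # padded[y:y+k, x:x+k] is the k x k window centred on (y, x)
        if padded[y:y + k, x:x + k].max() == 0:   # all-black neighbourhood
            out[y, x] = 0
    return out
```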
Step S107 is then executed: mathematical morphology operations are performed on the ninth image to form the eleventh image. Mathematical morphology, founded on set theory, is a mathematical method for studying geometric shape and structure. In recent years it has developed into a novel approach to digital image processing and pattern recognition, applied in many fields such as image detection, texture analysis, particle size analysis, feature generation, skeletonization, shape analysis, compression, component analysis and thinning. The basic idea of applying mathematical morphology to image processing is to measure and extract corresponding shapes in the image with a structuring element of a certain form, so as to analyze and recognize the image. Its fundamental operations are dilation, erosion, opening and closing. Dilation merges into the object all background points that touch it; the result enlarges the target and shrinks apertures, and it can fill gaps inside the target so that it forms a connected region. Erosion shrinks the target, enlarges internal holes and eliminates isolated external noise. Opening removes small objects in the image, separates objects joined at thin necks, and smooths the boundary of larger objects. Closing fills small holes inside objects, joins adjacent objects, and smooths boundaries.
In the present embodiment, step S107 specifically comprises:
Step S107a: label the connected regions of the ninth image and retain only the connected region of largest area, forming the tenth image. The ninth image contains several connected regions (connected domains), of which the largest is the mouth; by labeling the connected regions of the ninth image, retaining the largest one and removing the other small ones, the mouth is kept and the peripheral interference removed. Fig. 11a is a concrete example of the tenth image. As shown in Fig. 11a, compared with Fig. 10a the small peripheral connected regions (small white patches) of Fig. 10a have been removed, and the retained largest connected region is the mouth to be located. A small amount of interference still remains in Fig. 11a, however, for example small black pixels inside the large white region.
Step S107b: perform a closing operation on the tenth image to form the eleventh image. The closing operation of this step fills the small holes, i.e. removes the small black pixels present inside the large white (mouth) region mentioned above. In implementation the morphological operator may use a 4-neighborhood or an 8-neighborhood; the eleventh image is formed after the closing operation.
For the concrete implementation of the mathematical morphology operations, reference may be made to common prior-art implementations, which are not detailed here. Fig. 12a is a concrete example of the eleventh image. As shown in Fig. 12a, a clearer and more complete mouth outline has been formed, no interference remains at the periphery, and almost no small holes remain inside the mouth outline, so the position of the mouth region can subsequently be determined from the white pixel region of the eleventh image.
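Step S107a's connected-region labeling might look as follows (a plain breadth-first labeling with 4-connectivity; in practice a library routine such as `scipy.ndimage.label` would be used, and the closing of step S107b, i.e. dilation followed by erosion, is omitted here):

```python
import numpy as np
from collections import deque

def keep_largest_component(bw):
    """Step S107a: label 4-connected white regions and keep only the
    largest one (the mouth), removing small peripheral blobs."""
    h, w = bw.shape
    labels = np.zeros((h, w), dtype=int)
    sizes = {}
    cur = 0
    for sy in range(h):
        for sx in range(w):
            if bw[sy, sx] == 255 and labels[sy, sx] == 0:
                cur += 1                     # start a new region
                q = deque([(sy, sx)])
                labels[sy, sx] = cur
                n = 0
                while q:                     # breadth-first flood fill
                    y, x = q.popleft()
                    n += 1
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and bw[ny, nx] == 255
                                and labels[ny, nx] == 0):
                            labels[ny, nx] = cur
                            q.append((ny, nx))
                sizes[cur] = n
    if not sizes:
        return bw.copy()
    best = max(sizes, key=sizes.get)
    return np.where(labels == best, 255, 0).astype(np.uint8)
```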
After step S107 forms the eleventh image, step S108 determines the position of the mouth region in the input face image based on the eleventh image, using a horizontal projection and a vertical projection respectively. It should be noted that the mouth region in step S108 is the precise mouth region to be located, whereas the mouth estimation region extracted in step S102 can only determine the approximate range of the mouth: the embodiment of the invention first extracts the mouth estimation region and then, on the basis of that region, further determines the more accurate mouth region.
In the present embodiment, step S108 may specifically comprise:
Step S108a: compute a horizontal projection and a vertical projection of the eleventh image;
Step S108b: examine the horizontal projection values row by row, from top to bottom and from bottom to top; the first positions whose projection value exceeds a third threshold are taken as the upper edge and the lower edge of the mouth region, respectively;
Step S108c: examine the vertical projection values column by column, from left to right and from right to left; the first positions whose projection value exceeds a fourth threshold are taken as the left edge and the right edge of the mouth region, respectively;
Step S108d: determine the position of the mouth region in the face image from the four edges (top, bottom, left, right).
The horizontal projection may take the following formula:

J(y) = Σ_{x=m}^{n} I(x, y)

where J(y) is the projection value at ordinate y, I(x, y) is the value of pixel (x, y), and [m, n] is the projection interval.
Similarly, the vertical projection may take the following formula:

J(x) = Σ_{y=m}^{n} I(x, y)

where J(x) is the projection value at abscissa x, I(x, y) is the value of pixel (x, y), and [m, n] is the projection interval.
In a specific embodiment, the third and fourth thresholds can be set according to actual conditions; for example, both may be set to 1%–5% of the maximum projection value, i.e. the third threshold to 1%–5% of the maximum value of the horizontal projection and the fourth threshold to 1%–5% of the maximum value of the vertical projection. In actual implementation the third threshold may also equal the fourth threshold.
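Step S108 as a whole can be sketched as below (the 3% default stands in for the 1%–5% threshold range given above; the function name and scanning details are our assumptions):

```python
import numpy as np

def mouth_bbox(bw, frac=0.03):
    """Step S108: scan the row/column projections inward from both ends;
    the first projection exceeding frac * max marks each edge."""
    rows = (bw == 255).sum(axis=1)   # horizontal projection J(y)
    cols = (bw == 255).sum(axis=0)   # vertical projection J(x)
    t_r = frac * rows.max()
    t_c = frac * cols.max()
    top = int(np.argmax(rows > t_r))                          # top-down scan
    bottom = int(len(rows) - 1 - np.argmax(rows[::-1] > t_r)) # bottom-up scan
    left = int(np.argmax(cols > t_c))                         # left-right scan
    right = int(len(cols) - 1 - np.argmax(cols[::-1] > t_c))  # right-left scan
    return top, bottom, left, right
```

The four returned indices are the upper, lower, left and right edges of the mouth region within the (cleaned) eleventh image.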
It should be noted that whether the input face image (a color image) has normal color or suffers a color cast, the above steps S101 to S108 can effectively achieve accurate localization of the mouth in the face image. Fig. 13a is an actual example of the mouth region localization result for a face image without color cast. Referring to Fig. 13a, the figure marks the mouth estimation region 20a and the mouth region 30a obtained after completing the mouth localization; it can be seen that, when the input face image has no color cast, the mouth localization method of the embodiment of the invention effectively achieves accurate localization of the mouth. The image formed by cropping the mouth estimation region 20a is the second image shown in Fig. 3a. Figs. 3b to 13b are actual example figures of mouth localization for a face image under a first color-cast condition, Figs. 3c to 13c under a second color-cast condition, and Figs. 3d to 13d under a third color-cast condition; corresponding to Figs. 3a to 13a, they show actual examples of the second through eleventh images and of the mouth region localization result in each case. For an input face image under any degree of color cast, mouth localization follows the same steps S101 to S108, which are not repeated here. Fig. 13b, Fig. 13c and Fig. 13d are actual examples of the mouth region localization results under the first, second and third color-cast conditions; they mark, respectively, the mouth estimation region 20b and located mouth region 30b of the face image under the first condition, the mouth estimation region 20c and located mouth region 30c under the second condition, and the mouth estimation region 20d and located mouth region 30d under the third condition. It can thus be seen that even when the input face image has a color cast, even a severe one, the mouth localization method of the embodiment of the invention can still locate the mouth region fairly accurately, showing that the method has strong robustness and adaptability.
In addition, it should be noted that when input face images differ in their degree of color cast, the eleventh images formed by step S107 after extraction of the mouth estimation region differ both from the eleventh image formed for an input face image without color cast and from one another, precisely because step S103 of the method models each input adaptively; this does not, however, affect the accurate localization of the mouth region. Reference may be made to Fig. 12b, Fig. 12c and Fig. 12d, actual examples of the eleventh image formed for input face images under different degrees of color cast.
Based on the above mouth localization method for a face image, an embodiment of the present invention also provides a mouth shape recognition method for a face image. The mouth shape recognition method comprises: localizing the mouth of the input face image with the above mouth localization method, and performing mouth shape recognition based on the mouth region determined by the localization. The concrete mouth shape recognition may be carried out by usual prior-art methods and is not detailed here.
It should be noted that the mouth localization method and mouth shape recognition method provided by embodiments of the present invention can be applied in the field of traffic safety monitoring to achieve real-time intelligent monitoring of driving, giving a timely warning when mouth shape recognition detects fatigue such as open-mouthed yawning; this would significantly reduce the road accident rate. They can also be applied in the field of facial expression recognition, where the contour shape of the mouth serves as a key feature for expression classification; and in lip-reading assistance for the deaf and mute, where the continuous change of mouth shape is the basic evidence for recognizing speech content.
Based on the above mouth localization method for a face image, an embodiment of the present invention also provides a mouth localization device for a face image. Fig. 14 is a structural schematic diagram of the mouth localization device for a face image provided by an embodiment of the present invention. As shown in Fig. 14, the mouth localization device comprises: a preprocessing unit 101, adapted to detect and locate the two eyes in the input face image and, based on the localization result, geometrically correct the face image to form the first image; an extraction unit 102, connected to the preprocessing unit 101 and adapted to extract the mouth estimation region from the first image to obtain the second image; a skin color model building unit 103, connected to the preprocessing unit 101 and adapted to build the skin color model of the face based on the first image; a skin color probability computing unit 104, connected to the extraction unit 102 and the skin color model building unit 103 and adapted to project the second image onto the skin color model to compute skin color probabilities, obtaining the third image; a first binarization unit 105, connected to the skin color probability computing unit 104 and adapted to binarize the third image, forming the fifth image; a hue auxiliary detection unit 106, connected to the extraction unit 102 and the first binarization unit 105 and adapted to form the eighth image from the second image based on the hue component and to perform auxiliary detection on the fifth image with the eighth image, forming the ninth image; a mathematical morphology operating unit 107, connected to the hue auxiliary detection unit 106 and adapted to perform mathematical morphology operations on the ninth image, forming the eleventh image; and a projection localization unit 108, connected to the mathematical morphology operating unit 107 and adapted to determine the position of the mouth region in the input face image based on the eleventh image, using a horizontal projection and a vertical projection respectively.
In the present embodiment, the extraction unit 102 may comprise: a first edge determining unit, adapted to take the position at 25% of the width of the first image, measured from left to right, as the left edge of the mouth estimation region and the position at 75% of the width as its right edge, and to take the position at 65% of the height of the first image, measured from top to bottom, as the upper edge of the mouth estimation region and the lower edge of the first image as its lower edge; and a cropping unit, adapted to crop out the mouth estimation region along the four edges (top, bottom, left, right) determined by the first edge determining unit, forming the second image.
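The edge-determining and cropping behavior of the extraction unit 102 can be illustrated in a few lines of NumPy (a sketch only; the function name is ours):

```python
import numpy as np

def crop_mouth_estimate(face):
    """Extraction unit 102: crop the rough mouth search window from the
    corrected face image, x in [25%, 75%) of the width and y from 65%
    of the height down to the bottom edge."""
    h, w = face.shape[:2]
    return face[int(0.65 * h):, int(0.25 * w):int(0.75 * w)]
```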
The skin color model building unit 103 may comprise: a first converting unit, adapted to convert the first image to a color space in which luminance and chrominance are separated; and a statistical modeling unit, adapted to select the chrominance components of that color space and build the skin color model by statistics. In a specific embodiment, the luminance-chrominance-separated color space may be the Lab color space and the skin color model may be a Gaussian model.
The first binarization unit 105 may comprise: a first binarization operating unit, adapted to binarize the third image to obtain the fourth image; a first judging unit, adapted to judge whether the number of white pixels in the fourth image exceeds the first threshold; and a first output unit, adapted to invert the fourth image and output the result as the fifth image when the judgment of the first judging unit is affirmative, and otherwise to output the fourth image directly as the fifth image. The first threshold is set at a dominant proportion of the total number of pixels of the fourth image.
The hue auxiliary detection unit 106 may comprise: a second conversion unit, adapted to convert the second image to a color space having a hue component; a third output unit, adapted to select the hue component and output a sixth image; a second binarization unit, adapted to perform binarization on the sixth image to form an eighth image; and a setting unit, adapted to set each white pixel in the fifth image to black if the pixels in a predetermined neighborhood of its corresponding pixel in the eighth image are black, forming the ninth image after all pixels in the fifth image have been traversed. In a specific embodiment, the color space having a hue component is the HSV color space, and the hue component is the H component.
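The setting unit's neighborhood test can be sketched as below. The patent does not fix the neighborhood size or state whether one black pixel or all black pixels trigger suppression; this sketch assumes a 3×3 neighborhood and suppresses a white pixel only when its entire neighborhood in the eighth image is black (i.e. the hue evidence gives no support at all):

```python
import numpy as np

def hue_assisted_filter(fifth_image, eighth_image, radius=1):
    """Auxiliary detection of the fifth image using the hue-based eighth image.

    Both inputs are binary uint8 images (0/255) of identical shape. A white
    pixel of the fifth image is set to black when every pixel in the
    neighborhood of its counterpart in the eighth image is black; the
    neighborhood (3x3 for radius=1) is an assumed interpretation.
    """
    h, w = fifth_image.shape
    ninth = fifth_image.copy()
    for y, x in zip(*np.nonzero(fifth_image)):       # white pixels only
        y0, y1 = max(0, y - radius), min(h, y + radius + 1)
        x0, x1 = max(0, x - radius), min(w, x + radius + 1)
        if not eighth_image[y0:y1, x0:x1].any():     # neighborhood all black
            ninth[y, x] = 0
    return ninth
```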
In addition, the second binarization unit may comprise: a second binarization operation unit, adapted to perform a binarization operation on the sixth image to obtain a seventh image; a second judging unit, adapted to judge whether the number of white pixels in the seventh image is greater than a second threshold; and a second output unit, adapted to invert the seventh image and output the result as the eighth image when the judgment of the second judging unit is affirmative, and otherwise to output the seventh image directly as the eighth image. The second threshold is set as a majority ratio (greater than 50%) of the total number of pixels of the seventh image.
The mathematical morphology operation unit 107 may comprise: a connected-region processing unit, adapted to perform connected-region labeling on the ninth image and retain the connected region of maximal area, forming a tenth image; and a closing operation unit, adapted to perform a closing operation on the tenth image, forming the eleventh image.
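Retaining the largest connected region can be implemented with a simple breadth-first labeling pass; the 4-connectivity below is an assumption, since the patent does not state the connectivity used:

```python
import numpy as np
from collections import deque

def largest_component(binary):
    """Return a copy of `binary` (uint8, 0/255) keeping only the
    largest 4-connected white region; this yields the tenth image."""
    h, w = binary.shape
    seen = np.zeros((h, w), dtype=bool)
    best = []
    for sy in range(h):
        for sx in range(w):
            if binary[sy, sx] == 0 or seen[sy, sx]:
                continue
            # breadth-first traversal of one connected region
            comp, queue = [], deque([(sy, sx)])
            seen[sy, sx] = True
            while queue:
                y, x = queue.popleft()
                comp.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w \
                            and binary[ny, nx] and not seen[ny, nx]:
                        seen[ny, nx] = True
                        queue.append((ny, nx))
            if len(comp) > len(best):
                best = comp
    out = np.zeros_like(binary)
    for y, x in best:
        out[y, x] = 255
    return out
```

The subsequent closing operation (a dilation followed by an erosion, which fills small holes inside the retained region) can then be applied with any standard morphology routine to produce the eleventh image.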
The projection localization unit 108 may comprise: a projection unit, adapted to perform a horizontal projection and a vertical projection on the eleventh image respectively; a second edge determining unit, adapted to examine the horizontal projection values row by row, from top to bottom and from bottom to top, taking the positions where the projection value exceeds a third threshold as the upper and lower edges of the mouth region respectively, and likewise to examine the vertical projection values column by column, from left to right and from right to left, taking the positions where the projection value exceeds a fourth threshold as the left and right edges of the mouth region respectively; and a localization unit, adapted to determine the position of the mouth region in the face image from the upper, lower, left and right edges. In implementation, the third threshold and the fourth threshold are 1% to 5% of the maximal projection value.
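The projection-based localization can be sketched as follows, with the thresholds expressed as a single ratio of the maximal projection value (the patent suggests 1%-5%; the 3% default here is an assumption):

```python
import numpy as np

def locate_by_projection(binary, ratio=0.03):
    """Locate the mouth bounding box from row/column projections.

    `binary` is the eleventh image (uint8, 0/255). A row or column is
    inside the mouth region when its projection exceeds `ratio` times
    the maximal projection value. Returns (top, bottom, left, right).
    """
    rows = binary.sum(axis=1, dtype=np.int64)   # horizontal projection
    cols = binary.sum(axis=0, dtype=np.int64)   # vertical projection
    r_idx = np.nonzero(rows > ratio * rows.max())[0]
    c_idx = np.nonzero(cols > ratio * cols.max())[0]
    # first/last indices above threshold give the four edges
    return int(r_idx[0]), int(r_idx[-1]), int(c_idx[0]), int(c_idx[-1])
```

Scanning from both ends of each projection, as the second edge determining unit does, is equivalent to taking the first and last indices whose projection value exceeds the threshold.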
Further, the present invention also provides a mouth-shape recognition system, comprising: the mouth localization device for a face image described above; and a recognition unit, adapted to perform mouth-shape recognition based on the mouth region determined after the mouth localization device locates the mouth of an input face image. For the implementation of the mouth localization device and the mouth-shape recognition system, reference may be made to the mouth localization method and mouth-shape recognition method described above, which are not repeated here.
In summary, the mouth localization method and device for face images and the mouth-shape recognition method and system provided by the embodiments of the present invention have at least the following beneficial effects:
An adaptive skin color model matching the skin distribution of the input face image is established on the fly from that image. The mouth estimation region is then extracted from the face image to form the second image, which is projected onto the skin color model to calculate skin color probabilities, yielding the third image. The third image is binarized to form the fifth image, while the eighth image, formed from the second image based on the hue component, provides auxiliary detection on the fifth image. A series of mathematical morphology operations follows, such as connected-region retention and closing, and finally the mouth region is accurately located by the projection method. The mouth in the face image can thus be located accurately and effectively, improving the accuracy of mouth-shape recognition.
In particular, the auxiliary detection based on the hue component makes the binarization result more accurate, yielding better localization of the mouth region in the face image.
Although the present invention is disclosed above with preferred embodiments, they are not intended to limit it. Any person skilled in the art may, without departing from the spirit and scope of the present invention, make possible changes and modifications to the technical solution of the present invention using the methods and technical content disclosed above. Therefore, any simple modification, equivalent variation or refinement made to the above embodiments according to the technical essence of the present invention, without departing from the content of the technical solution of the present invention, falls within the protection scope of the technical solution of the present invention.

Claims (27)

1. A mouth localization method for a face image, characterized by comprising:
detecting and locating the two eyes in an input face image, and performing geometric rectification on the face image based on the location result to form a first image;
extracting a mouth estimation region from the first image to obtain a second image;
establishing a skin color model of the face based on the first image;
projecting the second image onto the skin color model to calculate skin color probabilities, obtaining a third image;
performing binarization on the third image to form a fifth image;
forming an eighth image from the second image based on the hue component, and performing auxiliary detection on the fifth image with the eighth image to form a ninth image;
performing mathematical morphology operations on the ninth image to form an eleventh image;
determining the position of the mouth region in the input face image based on the eleventh image, using a horizontal projection and a vertical projection respectively.
2. The mouth localization method for a face image according to claim 1, characterized in that extracting the mouth estimation region from the first image to obtain the second image comprises:
taking the point at 25% of the width of the first image, from left to right, as the left edge of the mouth estimation region, and the point at 75% of the width as its right edge;
taking the point at 65% of the height of the first image, from top to bottom, as the upper edge of the mouth estimation region, and the lower edge of the first image as its lower edge;
cropping the mouth estimation region along the upper, lower, left and right edges to form the second image.
3. The mouth localization method for a face image according to claim 1, characterized in that establishing the skin color model of the face based on the first image comprises:
converting the first image to a color space in which luminance and chrominance are separated;
selecting the chrominance components of that color space and statistically establishing the skin color model.
4. The mouth localization method for a face image according to claim 3, characterized in that the color space in which luminance and chrominance are separated is the Lab color space.
5. The mouth localization method for a face image according to claim 1, 3 or 4, characterized in that the skin color model is a Gaussian model.
6. The mouth localization method for a face image according to claim 1, characterized in that performing binarization on the third image to form the fifth image comprises:
performing a binarization operation on the third image to obtain a fourth image;
judging whether the number of white pixels in the fourth image is greater than a first threshold; if so, inverting the fourth image and outputting the result as the fifth image, otherwise outputting the fourth image directly as the fifth image;
the first threshold being set as the total number of pixels of the fourth image multiplied by a ratio greater than 50%.
7. The mouth localization method for a face image according to claim 1, characterized in that forming the eighth image from the second image based on the hue component comprises:
converting the second image to a color space having a hue component;
selecting the hue component and outputting a sixth image;
performing binarization on the sixth image to form the eighth image.
8. The mouth localization method for a face image according to claim 7, characterized in that the color space having a hue component is the HSV color space, and the hue component is the H component.
9. The mouth localization method for a face image according to claim 7, characterized in that performing binarization on the sixth image to form the eighth image comprises:
performing a binarization operation on the sixth image to obtain a seventh image;
judging whether the number of white pixels in the seventh image is greater than a second threshold; if so, inverting the seventh image and outputting the result as the eighth image, otherwise outputting the seventh image directly as the eighth image;
the second threshold being set as the total number of pixels of the seventh image multiplied by a ratio greater than 50%.
10. The mouth localization method for a face image according to claim 1, characterized in that performing auxiliary detection on the fifth image with the eighth image to form the ninth image comprises:
for each white pixel in the fifth image, if the pixels in a predetermined neighborhood of its corresponding pixel in the eighth image are black, setting that white pixel in the fifth image to black;
forming the ninth image after all pixels in the fifth image have been traversed.
11. The mouth localization method for a face image according to claim 1, characterized in that performing mathematical morphology operations on the ninth image to form the eleventh image comprises:
performing connected-region labeling on the ninth image and retaining the connected region of maximal area to form a tenth image;
performing a closing operation on the tenth image to form the eleventh image.
12. The mouth localization method for a face image according to claim 1, characterized in that determining the position of the mouth region in the input face image based on the eleventh image, using a horizontal projection and a vertical projection respectively, comprises:
performing a horizontal projection and a vertical projection on the eleventh image respectively;
examining the horizontal projection values row by row, from top to bottom and from bottom to top; when a projection value exceeds a third threshold, taking the corresponding positions as the upper edge and the lower edge of the mouth region respectively;
examining the vertical projection values column by column, from left to right and from right to left; when a projection value exceeds a fourth threshold, taking the corresponding positions as the left edge and the right edge of the mouth region respectively;
determining the position of the mouth region in the face image from the upper, lower, left and right edges.
13. The mouth localization method for a face image according to claim 12, characterized in that the third threshold and the fourth threshold are 1% to 5% of the maximal projection value.
14. A mouth-shape recognition method for a face image, characterized in that the mouth of an input face image is located with the mouth localization method according to any one of claims 1 to 13, and mouth-shape recognition is performed based on the mouth region determined after localization.
15. A mouth localization device for a face image, characterized by comprising:
a preprocessing unit, adapted to detect and locate the two eyes in an input face image, and to perform geometric rectification on the face image based on the location result to form a first image;
an extraction unit, adapted to extract a mouth estimation region from the first image to obtain a second image;
a skin color model establishing unit, adapted to establish a skin color model of the face based on the first image;
a skin color probability calculating unit, adapted to project the second image onto the skin color model to calculate skin color probabilities, obtaining a third image;
a first binarization unit, adapted to perform binarization on the third image to form a fifth image;
a hue auxiliary detection unit, adapted to form an eighth image from the second image based on the hue component, and to perform auxiliary detection on the fifth image with the eighth image to form a ninth image;
a mathematical morphology operation unit, adapted to perform mathematical morphology operations on the ninth image to form an eleventh image;
a projection localization unit, adapted to determine the position of the mouth region in the input face image based on the eleventh image, using a horizontal projection and a vertical projection respectively.
16. The mouth localization device for a face image according to claim 15, characterized in that the extraction unit comprises:
a first edge determining unit, adapted to take the point at 25% of the width of the first image, from left to right, as the left edge of the mouth estimation region and the point at 75% of the width as its right edge, and to take the point at 65% of the height of the first image, from top to bottom, as the upper edge of the mouth estimation region, with the lower edge of the first image serving as its lower edge;
a cropping unit, adapted to crop the mouth estimation region along the upper, lower, left and right edges determined by the first edge determining unit, forming the second image.
17. The mouth localization device for a face image according to claim 15, characterized in that the skin color model establishing unit comprises:
a first conversion unit, adapted to convert the first image to a color space in which luminance and chrominance are separated;
a statistical modeling unit, adapted to select the chrominance components of that color space and statistically establish the skin color model.
18. The mouth localization device for a face image according to claim 17, characterized in that the color space in which luminance and chrominance are separated is the Lab color space.
19. The mouth localization device for a face image according to claim 15, 17 or 18, characterized in that the skin color model is a Gaussian model.
20. The mouth localization device for a face image according to claim 15, characterized in that the first binarization unit comprises:
a first binarization operation unit, adapted to perform a binarization operation on the third image to obtain a fourth image;
a first judging unit, adapted to judge whether the number of white pixels in the fourth image is greater than a first threshold;
a first output unit, adapted to invert the fourth image and output the result as the fifth image when the judgment of the first judging unit is affirmative, and otherwise to output the fourth image directly as the fifth image;
the first threshold being set as the total number of pixels of the fourth image multiplied by a ratio greater than 50%.
21. The mouth localization device for a face image according to claim 15, characterized in that the hue auxiliary detection unit comprises:
a second conversion unit, adapted to convert the second image to a color space having a hue component;
a third output unit, adapted to select the hue component and output a sixth image;
a second binarization unit, adapted to perform binarization on the sixth image to form an eighth image;
a setting unit, adapted to set each white pixel in the fifth image to black if the pixels in a predetermined neighborhood of its corresponding pixel in the eighth image are black, forming the ninth image after all pixels in the fifth image have been traversed.
22. The mouth localization device for a face image according to claim 21, characterized in that the color space having a hue component is the HSV color space, and the hue component is the H component.
23. The mouth localization device for a face image according to claim 21, characterized in that the second binarization unit comprises:
a second binarization operation unit, adapted to perform a binarization operation on the sixth image to obtain a seventh image;
a second judging unit, adapted to judge whether the number of white pixels in the seventh image is greater than a second threshold;
a second output unit, adapted to invert the seventh image and output the result as the eighth image when the judgment of the second judging unit is affirmative, and otherwise to output the seventh image directly as the eighth image;
the second threshold being set as the total number of pixels of the seventh image multiplied by a ratio greater than 50%.
24. The mouth localization device for a face image according to claim 15, characterized in that the mathematical morphology operation unit comprises:
a connected-region processing unit, adapted to perform connected-region labeling on the ninth image and retain the connected region of maximal area, forming a tenth image;
a closing operation unit, adapted to perform a closing operation on the tenth image, forming the eleventh image.
25. The mouth localization device for a face image according to claim 15, characterized in that the projection localization unit comprises:
a projection unit, adapted to perform a horizontal projection and a vertical projection on the eleventh image respectively;
a second edge determining unit, adapted to examine the horizontal projection values row by row, from top to bottom and from bottom to top, and, when a projection value exceeds a third threshold, to take the corresponding positions as the upper edge and the lower edge of the mouth region respectively; and further adapted to examine the vertical projection values column by column, from left to right and from right to left, and, when a projection value exceeds a fourth threshold, to take the corresponding positions as the left edge and the right edge of the mouth region respectively;
a localization unit, adapted to determine the position of the mouth region in the face image from the upper, lower, left and right edges.
26. The mouth localization device for a face image according to claim 25, characterized in that the third threshold and the fourth threshold are 1% to 5% of the maximal projection value.
27. A mouth-shape recognition system for a face image, characterized by comprising: the mouth localization device for a face image according to any one of claims 15 to 26; and a recognition unit, adapted to perform mouth-shape recognition based on the mouth region determined after the mouth localization device locates the mouth of an input face image.
CN2011103281380A 2011-10-25 2011-10-25 Method and device for positioning mouth part of human face image as well as method and system for recognizing mouth shape Pending CN103077368A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2011103281380A CN103077368A (en) 2011-10-25 2011-10-25 Method and device for positioning mouth part of human face image as well as method and system for recognizing mouth shape


Publications (1)

Publication Number Publication Date
CN103077368A true CN103077368A (en) 2013-05-01

Family

ID=48153893

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011103281380A Pending CN103077368A (en) 2011-10-25 2011-10-25 Method and device for positioning mouth part of human face image as well as method and system for recognizing mouth shape

Country Status (1)

Country Link
CN (1) CN103077368A (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040017930A1 (en) * 2002-07-19 2004-01-29 Samsung Electronics Co., Ltd. System and method for detecting and tracking a plurality of faces in real time by integrating visual ques
CN1512453A (en) * 2002-12-30 2004-07-14 佳能株式会社 Image processing method and device
CN1710595A (en) * 2005-06-16 2005-12-21 上海交通大学 Mouth-corner positioning method
CN101393597A (en) * 2007-09-19 2009-03-25 上海银晨智能识别科技有限公司 Method for identifying front of human face


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG XIAOPING ET AL.: "A lip localization method for lip-reading-oriented color face images", PROCEEDINGS OF THE 13TH NATIONAL CONFERENCE ON IMAGE AND GRAPHICS *

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103390282B (en) * 2013-07-30 2016-04-13 百度在线网络技术(北京)有限公司 Image labeling method and device thereof
CN103390282A (en) * 2013-07-30 2013-11-13 百度在线网络技术(北京)有限公司 Image tagging method and device
CN105975896A (en) * 2015-03-12 2016-09-28 欧姆龙株式会社 Image processing apparatus and image processing method
CN105975896B (en) * 2015-03-12 2019-05-28 欧姆龙株式会社 Image processing apparatus and image processing method
CN106067016A (en) * 2016-07-20 2016-11-02 深圳市飘飘宝贝有限公司 A kind of facial image eyeglass detection method and device
CN111526279A (en) * 2017-01-19 2020-08-11 卡西欧计算机株式会社 Image processing apparatus, image processing method, and recording medium
CN111526279B (en) * 2017-01-19 2022-10-11 卡西欧计算机株式会社 Image processing apparatus, image processing method, and recording medium
WO2018201662A1 (en) * 2017-05-05 2018-11-08 广州视源电子科技股份有限公司 Lip color rendering method, apparatus, and electronic device
CN109639960A (en) * 2017-10-05 2019-04-16 卡西欧计算机株式会社 Image processing apparatus, image processing method and recording medium
CN109639960B (en) * 2017-10-05 2020-12-29 卡西欧计算机株式会社 Image processing apparatus, image processing method, and recording medium
CN107798314A (en) * 2017-11-22 2018-03-13 北京小米移动软件有限公司 Skin color detection method and device
CN107911608A (en) * 2017-11-30 2018-04-13 西安科锐盛创新科技有限公司 The method of anti-shooting of closing one's eyes
CN110580336A (en) * 2018-06-08 2019-12-17 北京得意音通技术有限责任公司 Lip language word segmentation method and device, storage medium and electronic equipment
CN109820491A (en) * 2019-01-28 2019-05-31 中山大学孙逸仙纪念医院 Prevent asphyxia neonatorum induction chip
CN110781840A (en) * 2019-10-29 2020-02-11 深圳市梦网百科信息技术有限公司 Nose positioning method and system based on skin color detection
CN110781840B (en) * 2019-10-29 2022-08-26 深圳市梦网视讯有限公司 Nose positioning method and system based on skin color detection
CN111062266A (en) * 2019-11-28 2020-04-24 东华理工大学 Face point cloud key point positioning method based on cylindrical coordinates
CN111062266B (en) * 2019-11-28 2022-07-15 东华理工大学 Face point cloud key point positioning method based on cylindrical coordinates
CN111062965A (en) * 2019-12-26 2020-04-24 西华大学 Low-complexity double-threshold multi-resolution mouth detection method based on assembly line
CN111062965B (en) * 2019-12-26 2023-01-17 西华大学 Low-complexity double-threshold multi-resolution mouth detection method based on assembly line
CN113436131A (en) * 2020-03-04 2021-09-24 上海微创卜算子医疗科技有限公司 Defect detection method, defect detection device, electronic equipment and storage medium
CN113141416A (en) * 2021-05-12 2021-07-20 广州诚为信息技术有限公司 SDWAN's enterprise operation management platform

Similar Documents

Publication Publication Date Title
CN103077368A (en) Method and device for positioning mouth part of human face image as well as method and system for recognizing mouth shape
CN106682601B (en) A kind of driver's violation call detection method based on multidimensional information Fusion Features
CN103824081B (en) Method for detecting rapid robustness traffic signs on outdoor bad illumination condition
CN101916370B (en) Method for processing non-feature regional images in face detection
CN102254152B (en) License plate location method based on color change points and color density
CN102332157B (en) Method for eliminating shadow
CN103034852B (en) The detection method of particular color pedestrian under Still Camera scene
CN103198315B (en) Based on the Character Segmentation of License Plate of character outline and template matches
CN103366190A (en) Method for identifying traffic sign
CN102194108B (en) Smile face expression recognition method based on clustering linear discriminant analysis of feature selection
CN109657632A (en) A kind of lane detection recognition methods
CN105354554A (en) Color and singular value feature-based face in-vivo detection method
CN102999753A (en) License plate locating method
CN103020579A (en) Face recognition method and system, and removing method and device for glasses frame in face image
CN101702197A (en) Method for detecting road traffic signs
CN103679168A (en) Detection method and detection device for character region
CN105354530A (en) Vehicle body color identification method and apparatus
CN106529592A (en) License plate recognition method based on mixed feature and gray projection
CN103119625B (en) Video character separation method and device
CN106919910B (en) Traffic sign identification method based on HOG-CTH combined features
CN106529461A (en) Vehicle model identifying algorithm based on integral characteristic channel and SVM training device
CN103390167A (en) Multi-characteristic layered traffic sign identification method
CN108564034A (en) The detection method of operating handset behavior in a kind of driver drives vehicle
CN102306276A (en) Method for identifying color of vehicle body in video vehicle image based on block clustering
CN102184404B (en) Method and device for acquiring palm region in palm image

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20130501