CN104346601A - Object identification method and equipment - Google Patents

Object identification method and equipment

Info

Publication number
CN104346601A
CN104346601A (application CN201310320936.8A)
Authority
CN
China
Prior art keywords
object properties
subject area
area
face
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201310320936.8A
Other languages
Chinese (zh)
Other versions
CN104346601B (en)
Inventor
王喜顺
陈曾
李献
温东超
朱福国
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Canon Inc
Original Assignee
Canon Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Canon Inc filed Critical Canon Inc
Priority to CN201310320936.8A priority Critical patent/CN104346601B/en
Publication of CN104346601A publication Critical patent/CN104346601A/en
Application granted granted Critical
Publication of CN104346601B publication Critical patent/CN104346601B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/165 Detection; Localisation; Normalisation using facial parts and geometric relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/171 Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G06V40/175 Static expression

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Geometry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an object identification method and equipment. The equipment comprises an extraction unit configured to extract, for each object attribute pair in a set of predefined object attributes, a feature of an object region corresponding to that attribute pair based on the dissimilarity between the attributes of the pair; and an identification unit configured to identify the object attribute of the object region based on the extracted feature of the object region.

Description

Object identification method and equipment
Technical field
The present invention relates to a method and apparatus for object identification in images. More specifically, the present invention relates to a method and apparatus for recognizing an object attribute of an object region in an image.
Background art
In recent years, object detection/identification in images has been widely applied in the fields of image processing, computer vision and pattern recognition, and plays an important role therein. The object can be any of a face, a hand, a body, and so on.
A common kind of object detection/identification is the detection and identification of faces in images. In face recognition, the attribute (for example, the expression) of each face contained in an image comprising at least one face is usually identified, and there are various techniques for realizing such face recognition.
Hereinafter, the current prior-art techniques for face attribute recognition in images will be explained by taking facial expression recognition of a face contained in an image as an example. The basic principle of a facial expression recognition method follows the framework shown in Fig. 1.
More specifically, for an input face image, a facial expression recognition method first obtains the face region contained in the image (face detection), and then aligns the face corresponding to this face region, which may be in different poses, according to facial feature points extracted in the face region (face registration). Then, the method extracts features from the aligned face image (feature extraction), and finally determines the facial expression corresponding to this face region according to the extracted features.
For feature extraction, some methods focus on salient regions in the face image. Here, as shown in Fig. 2, a salient region refers to a region in the face image that is usually regarded as representing a characteristic part of the face (for example, the eye regions, the nose region, the mouth region, etc.).
In such a case, the features of the four salient regions are extracted respectively (that is, the left-eye region feature f_lefteye, the right-eye region feature f_righteye, the nose region feature f_nose, and the mouth region feature f_mouth), and these four salient-region features are concatenated to represent the overall feature of the face (f_total), that is,
f_total = f_lefteye + f_righteye + f_nose + f_mouth
The feature f_total is then used to predict the expression of the face corresponding to the face image.
In general, such methods based on salient regions in the face region extract features from the salient regions rather than from the whole face image, and then predict the expression of the face according to the extracted features, as shown in the left part of Fig. 3, which is a flowchart of prior-art facial expression recognition based on salient regions in a face image. The right part of Fig. 3 schematically shows an example of such a salient-region-based facial expression recognition method, in which, after several facial feature points are detected in the face image, four salient regions (that is, the left-eye region, the right-eye region, the nose region and the mouth region) are located accordingly.
U.S. Patent Application US2012/0169895A1 in the name of Industrial Technology Research Institute (TW) discloses a method for capturing facial expressions based on salient regions in a face image. The method captures salient-region features of the face image from four salient regions respectively to generate a target feature vector, and then compares this target feature vector with a plurality of previously stored feature vectors to generate a parameter value. When the parameter value is higher than a threshold, the method selects one of the images as a target image. Based on this target image, facial expression recognition and classification can be further performed; for example, the target image is recognized to obtain a facial expression state, and the images are classified according to the facial expression state.
As an alternative to salient regions, other types of representative regions of the face image can be used for face attribute recognition.
U.S. Patent Application US2010/0111375A1 in the name of Mitsubishi Electric Research Laboratories, Inc. discloses a method of recognizing face attributes in an image based on a set of patches contained in the face image. More specifically, the method divides the face image into a group of patches, compares each patch with prototype patches one by one to determine a matching prototype patch, and determines a set of attributes of the face according to the attribute set associated with the matching prototype patches. Here, the set of patches extracted in this method can be equivalent to parts of salient regions.
U.S. Patent Application US2012/0076418A1 in the name of Renesas Electronics Corporation discloses a face attribute estimation method and apparatus. The method extracts a specific region from the face region and sets small regions within this specific region. Then, the method uses a similarity calculation method to calculate the similarity between each of these small regions and each of the stored face components one by one, so as to determine the face attribute. Here, apart from the number of specific regions, the specific regions used in this method can be equivalent to salient regions.
The above prior-art methods usually extract features from salient regions or their equivalents (for example, a number of patches or some specific regions in the face image), and compare the extracted features with each of a set of predefined features corresponding to a plurality of known face attributes (that is, one-to-one comparisons) to perform face attribute recognition.
In addition, the located salient regions or equivalent regions in the face image to be identified do not change during recognition, so there is only one constant feature vector derived from the face image for all comparisons during recognition. That is, only one feature vector from the face image is compared with each of the plurality of previously stored feature vectors corresponding to the plurality of known face attributes.
However, using one constant feature of the face region to be identified for all one-to-one comparisons during recognition may not be efficient enough, so that the face region cannot be identified accurately.
It should be noted that some salient regions may not be discriminative for certain types of expressions. For example, for a sad expression and a neutral expression, the nose region does not differ much, so the nose region is not discriminative for distinguishing a sad expression from a neutral expression. Another problem is that some parts of a salient region are not discriminative. For example, for a sad expression and a neutral expression, the eyebrow part of the eye region is not discriminative. That is, if the located salient regions, and thus the features extracted from these regions, are kept constant for the comparisons with the set of predefined face attributes, then some regions, and some parts of regions, may be redundant for distinguishing certain pairs of expressions.
As mentioned above, there is still a need for a method that can accurately identify the attribute of a face region in an image based on more discriminative features extracted from the face region.
Summary of the invention
The present invention was developed in view of the identification of objects in images, and is intended to solve the problems described above.
According to one aspect of the present invention, there is provided a method for identifying an object region in an image, the method comprising an extraction step of extracting, for each object attribute pair in a set of predefined object attributes, a feature of the object region corresponding to that attribute pair based on the dissimilarity between the attributes of the pair; and an identification step of identifying the object attribute of the object region based on the extracted feature of the object region.
According to another aspect of the present invention, there is provided an apparatus for identifying an object region in an image, comprising: an extraction unit configured to extract, for each object attribute pair in a set of predefined object attributes, a feature of the object region corresponding to that attribute pair based on the dissimilarity between the attributes of the pair; and an identification unit configured to identify the object attribute of the object region based on the extracted feature of the object region.
With the method and apparatus according to the present invention, for each object attribute pair in the set of predefined object attributes, a feature of the object region corresponding to that attribute pair is extracted based on the dissimilarity between the attributes of the pair, and this feature is used for object identification. Therefore, recognition efficiency and accuracy can be improved.
Further features of the present invention will become apparent from the following description of exemplary embodiments with reference to the accompanying drawings.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of the specification, illustrate embodiments of the invention and, together with the description, serve to explain the principles of the invention. In the drawings, similar reference numerals indicate similar items.
Fig. 1 illustrates a typical process of facial expression recognition in the prior art.
Fig. 2 illustrates typical salient regions in a face.
Fig. 3 is a flowchart illustrating a facial expression recognition method in the prior art.
Fig. 4 is a block diagram showing an exemplary hardware configuration of a computer system that can implement embodiments of the present invention.
Fig. 5 is a flowchart of the object attribute recognition method according to the present invention.
Fig. 6 is a block diagram of the object attribute identification apparatus according to the present invention.
Fig. 7 is a diagram explaining the face region in a face image.
Fig. 8 schematically shows feature points in a face region.
Fig. 9 is a flowchart illustrating the process in the extraction step.
Fig. 10 schematically shows the localization of organ regions in a face region.
Fig. 11 schematically shows examples of facial expression pairs.
Fig. 12 is a flowchart schematically showing the determination of a template for a facial expression pair.
Fig. 13 illustrates some exemplary average images.
Fig. 14 illustrates the correspondingly divided images for each expression of a facial expression pair.
Fig. 15 illustrates the template for a facial expression pair obtained from the divided images of each expression in the pair.
Fig. 16 illustrates the localization, for a facial expression pair, of dissimilar pixel blocks in the face region, which depends on the template of that facial expression pair.
Fig. 17 is a flowchart illustrating the process in the feature extraction step.
Fig. 18 is a flowchart illustrating the process in one implementation of the identification step.
Fig. 19 is a flowchart illustrating the process in another implementation of the identification step.
Embodiment
Embodiments of the present invention are described in detail below with reference to the accompanying drawings.
It should be noted that similar reference numerals and letters in the drawings indicate similar items, so once an item is defined in one drawing, it need not be discussed again for subsequent drawings.
First, the meanings of some terms used in the context of this disclosure will be explained.
In the context of this disclosure, an image refers to any of various types of images, such as a color image, a grayscale image, etc. Since the processing of the present invention is mainly performed on grayscale images, unless otherwise stated, an image in this disclosure refers to a grayscale image comprising a plurality of pixels.
It should be noted that the solution of the present invention can also be applied to other types of images (such as color images), provided that such an image can be converted into a grayscale image and the processing of the present invention can be performed on the converted grayscale image.
An image can usually contain at least one object image, and an object image usually contains an object region; therefore, in the context of this disclosure, object image and object region are equivalent and are used interchangeably. A common object in an image is a face.
The feature of an object region in an image is usually a feature representing the characteristics of that object region, and can typically be a color feature, a texture feature, a shape feature, and so on. A common feature is the color feature, which is a global feature of the image and is usually obtained from a color histogram over color bins. The feature of an image is usually obtained in the form of a vector, each component of which corresponds to one color bin.
An object attribute refers to an apparent state of the object that may correspond to different conditions, and object attributes can belong to different categories. For a face, the category of face attributes can be selected from a group comprising facial expression and, when the face is a human face, the sex and the age of the person corresponding to this face; the category of face attributes is not limited thereto and can be other categories. When the face attribute corresponds to facial expression, the face attribute can be a particular expression (for example, sad, smile, laugh, etc.).
Of course, object attributes are not limited thereto; for example, the object can be a person, and the object attributes can correspond to different states of the person, such as running, standing, kneeling or lying down.
An object attribute pair is composed of any predefined number of object attributes contained in the set of predefined object attributes. All object attributes in this set can be distinguished within a certain category, and the set can be prepared in advance. The set of predefined object attributes can form at least one object attribute pair, and each object attribute pair has the same number of object attributes.
The object attributes contained in an object attribute pair can be selected arbitrarily from the set of predefined object attributes; in such a case, the set of predefined object attributes can yield C(n, t) object attribute pairs, where n is the number of object attributes in the set and t is the number of object attributes contained in each pair.
Preferably, the number of object attributes contained in an object attribute pair is 2.
Preferably, the object attributes of an object attribute pair can be attributes whose difference is large or even opposite. For example, for a face, an object attribute pair can in particular be composed of a laughing expression and a crying expression, so that the parts extracted for such an attribute pair are more discriminative.
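As a purely illustrative sketch (the attribute names below are hypothetical, not prescribed by the patent), enumerating all C(n, t) attribute pairs for t = 2 could look like this:

```python
from itertools import combinations

# Hypothetical set of predefined object attributes (here: facial expressions).
attributes = ["laugh", "neutral", "sad", "smile"]

# All C(n, t) attribute pairs for t = 2: C(4, 2) = 6 pairs.
pairs = list(combinations(attributes, 2))
print(len(pairs), pairs)
```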
In this disclosure, the terms "first", "second", etc. are used only to distinguish elements or steps, and are not intended to indicate temporal order, preference or importance.
Fig. 4 is a block diagram showing the hardware configuration of a computer system 1000 that can implement embodiments of the present invention.
As shown in Fig. 4, the computer system comprises a computer 1110. The computer 1110 comprises a processing unit 1120, a system memory 1130, a non-removable non-volatile memory interface 1140, a removable non-volatile memory interface 1150, a user input interface 1160, a network interface 1170, a video interface 1190 and an output peripheral interface 1195, which are connected by a system bus 1121.
The system memory 1130 comprises a ROM (read-only memory) 1131 and a RAM (random access memory) 1132. A BIOS (basic input/output system) 1133 resides in the ROM 1131. An operating system 1134, application programs 1135, other program modules 1136 and some program data 1137 reside in the RAM 1132.
A non-removable non-volatile memory 1141 (such as a hard disk) is connected to the non-removable non-volatile memory interface 1140. The non-removable non-volatile memory 1141 can store, for example, an operating system 1144, application programs 1145, other program modules 1146 and some program data 1147.
Removable non-volatile memories (such as a floppy drive 1151 and a CD-ROM drive 1155) are connected to the removable non-volatile memory interface 1150. For example, a floppy disk 1152 can be inserted into the floppy drive 1151, and a CD (compact disc) 1156 can be inserted into the CD-ROM drive 1155.
Input devices such as a mouse 1161 and a keyboard 1162 are connected to the user input interface 1160.
The computer 1110 can be connected to a remote computer 1180 through the network interface 1170. For example, the network interface 1170 can be connected to the remote computer 1180 through a local area network 1171. Alternatively, the network interface 1170 can be connected to a modem 1172, and the modem 1172 is connected to the remote computer 1180 through a wide area network 1173.
The remote computer 1180 can comprise a memory 1181 such as a hard disk, which stores remote application programs 1185.
The video interface 1190 is connected to a monitor 1191.
The output peripheral interface 1195 is connected to a printer 1196 and speakers 1197.
The computer system shown in Fig. 4 is merely illustrative, and is in no way intended to limit the invention, its application or its uses.
The computer system shown in Fig. 4 can be implemented, for any embodiment, as a standalone computer or as a processing system in a device, in which one or more unnecessary components can be removed or one or more additional components can be added.
The object identification method according to a basic embodiment of the present invention is described below with reference to Fig. 5, which illustrates the process of the method according to the basic embodiment of the present invention.
In step S100 (hereinafter referred to as the extraction step), for each object attribute pair in the set of predefined object attributes, a feature of the object region corresponding to that attribute pair is extracted based on the dissimilarity between the attributes of the pair.
As mentioned above, all object attributes of the set of predefined object attributes belong to the same category, and an object attribute pair is composed of any predetermined number (for example, two) of the object attributes contained in the set of predefined object attributes.
As an alternative, an object attribute pair can be a predetermined number (for example, two) of object attributes satisfying a predetermined relationship with each other.
In one implementation, the object region can be an aligned object region, and the alignment of the object region (for example, based on feature points detected in the object region) can be realized in various ways. It should be noted that whether the object region is aligned is optional for the realization of the extraction operation.
In step S200 (hereinafter referred to as the identification step), the object attribute of the object region is identified based on the extracted feature of the object region.
In one implementation, the process in the extraction step can comprise a process (positioning step) of locating, in the object region, at least one block corresponding to the template of the object attribute pair, the template characterizing the dissimilarity between the attributes of the pair; and a process (feature extraction step) of extracting the feature of the object region corresponding to that attribute pair based on the at least one located block.
Here, the template can be regarded as a dissimilarity template characterizing the dissimilarity between the attributes of the object attribute pair, and is usually composed of at least one dissimilar pixel block between the images of the object attributes contained in the pair. In fact, each dissimilar pixel block can correspond to corresponding pixel blocks between the images of the predetermined number of object attributes contained in the pair, the corresponding pixel blocks being located at corresponding positions of the respective images and having corresponding sizes; the positions and sizes of the dissimilar pixel blocks in the respective images can be mapped onto one another according to a predetermined rule (for example, depending on the ratio between the sizes of the images of the respective attributes when the images have different sizes).
Preferably, the image of the object region and the images of the object attributes contained in the object attribute pair can be preprocessed (for example, aligned) to have the same size; in this case, each of the dissimilar pixel blocks in the template can correspond to corresponding pixel blocks between the images of the predetermined number of object attributes contained in the pair, the corresponding pixel blocks being located at the same position in the respective images and having the same size.
Therefore, the at least one block located in the object region for the object attribute pair can be pixel blocks located according to such positions and sizes of the dissimilar pixel blocks, as long as the pixel blocks can be mapped to each other according to the predetermined rule; preferably, these pixel blocks have the same position and size.
The size of each pixel block can be set freely without affecting the realization of the solution of the present invention.
In one implementation, the template of an object attribute pair is realized as follows: two mean object region images respectively corresponding to the two object attributes contained in the pair are divided into a plurality of blocks that correspond to each other; the feature of each of the plurality of divided blocks of the mean object region image corresponding to each attribute is extracted; the similarity between the features of corresponding blocks in the two divided mean object region images is determined; and those blocks in the two divided mean object region images whose similarity is lower than a predefined threshold are selected to form the template.
Here, corresponding division means that the images of the object attributes of the pair are divided in a corresponding manner, so that each divided block in one attribute image can be mapped, according to the predefined rule, to a divided block in the other attribute image. Preferably, the images of the object attributes of the pair have the same size, so that the division pattern is identical for each image and has the same scale, and a divided block in one attribute image has the same position and size as the corresponding divided block in the other attribute image. The division pattern can be any pattern, such as a grid.
The template of an object attribute pair can be prepared and stored in advance, or can be generated during the extraction operation. The operation of obtaining the template of the object attribute pair may or may not be included in the extraction step.
The mean object region image corresponding to an object attribute can be prepared in advance in various ways; in a general implementation, it can be generated by averaging a plurality of similar object region images of the same size corresponding to the same object attribute.
Preferably, the positioning process can be performed based on auxiliary regions contained in the object region, to further improve operation efficiency. For example, an auxiliary region can be located in various ways (for example, depending on the positions of identified feature points in the object region). In such a case, the positioning process can locate, within the auxiliary regions, at least one block corresponding to the template characterizing the dissimilarity of the object attribute pair, and the template characterizing the dissimilarity of the pair can also be determined based on such auxiliary regions in the images of the object attributes of the pair, instead of based on the whole images of the object attributes of the pair.
In one implementation, the feature extraction process can comprise extracting a feature from each of the at least one block located in the object region, and concatenating the features of the extracted blocks as the feature of the object region. Therefore, the finally extracted feature is usually expressed in the form of a vector, each component group of which corresponds to one block.
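A minimal sketch of this per-block extraction and concatenation, assuming block coordinates already located from the template and a simple gray-level histogram as a stand-in descriptor (both the helper names and the descriptor are illustrative, not the patent's prescribed feature):

```python
import numpy as np

def block_feature(block: np.ndarray, bins: int = 59) -> np.ndarray:
    """Stand-in per-block descriptor: a normalized gray-level histogram."""
    hist, _ = np.histogram(block, bins=bins, range=(0, 256))
    return hist / max(hist.sum(), 1)

def region_feature(region: np.ndarray, blocks: list[tuple[int, int, int, int]]) -> np.ndarray:
    """Extract a feature from each located block (y, x, h, w) and concatenate them into one vector."""
    parts = [block_feature(region[y:y + h, x:x + w]) for y, x, h, w in blocks]
    return np.concatenate(parts)  # one group of components per block
```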
In the process of the identification step, the identification of the object attribute can be implemented in various ways.
In one implementation, identification can be realized in a so-called "one against one" manner, in which, for the set of predefined object attributes, the object attributes are voted on in C(n, t) rounds, where n is the number of object attributes contained in the set and t is the number of object attributes contained in each pair, t preferably being 2. The object attribute with the highest score is determined as the object attribute.
More specifically, this identification process can comprise an identifying step of, for each object attribute pair in the set of predefined object attributes, identifying, based on the feature of the object region corresponding to that pair, the one of the two attributes of the pair that corresponds to the object region, and increasing the score of that attribute by a predetermined value, wherein all object attributes contained in the set of predefined object attributes have the same initial score; and an attribute determining step of determining the object attribute with the highest score in the set of predefined object attributes as the object attribute of the object region.
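A sketch of this one-against-one voting, assuming a hypothetical pairwise classifier `classify_pair(region, pair)` that returns the winning attribute of one pair (the pairwise classifier itself is not shown here):

```python
from itertools import combinations

def identify_one_against_one(region, attributes, classify_pair):
    """Vote over all C(n, 2) attribute pairs; the attribute with the highest score wins."""
    scores = {a: 0 for a in attributes}        # identical initial scores
    for pair in combinations(attributes, 2):   # one voting round per attribute pair
        winner = classify_pair(region, pair)   # attribute of the pair matching the region
        scores[winner] += 1                    # increase the score by a predetermined value
    return max(scores, key=scores.get)
```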
In another implementation, identification can be realized in a so-called "one beating one" manner, in which, when the number of predefined object attributes contained in the set is n, the object attribute can be determined in n-1 rounds, only the winning attribute of the one object attribute pair considered in each round advancing to the next round, and the attribute that finally wins being determined as the object attribute.
More specifically, this identification process can comprise an identifying step of, for one object attribute pair in the set of predefined object attributes, identifying, based on the feature of the object region corresponding to that pair, the one of the two attributes of the pair that corresponds to the object region; and an attribute determining step of determining the object attribute of the object region based on the attribute corresponding to the object region and the remaining object attributes in the set of predefined object attributes other than those of the current pair, wherein, if the number of remaining object attributes equals 0, the attribute corresponding to the object region is determined as the object attribute of the object region; otherwise, the attribute corresponding to the object region and the remaining object attributes in the set other than those of the current pair are regrouped into a new object attribute set, and the identifying step and the attribute determining step are performed in turn on this new object attribute set.
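A sketch of this elimination ("one beating one") scheme under the same assumed `classify_pair` helper; each of the n-1 rounds keeps only the winning attribute:

```python
def identify_one_beating_one(region, attributes, classify_pair):
    """n-1 elimination rounds; the attribute that survives all rounds is returned."""
    remaining = list(attributes)
    winner = remaining.pop(0)
    while remaining:                                       # n-1 rounds in total
        challenger = remaining.pop(0)
        winner = classify_pair(region, (winner, challenger))
    return winner
```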
It should be noted that the above method can be performed each time for one object region in an image that may contain at least one object region, and can be repeated as many times as there are object regions, where one object region contains only one object to be identified.
Fig. 6 is a block diagram of the object identification apparatus according to the present invention.
The apparatus 600 for identifying an object region in an image can comprise an extraction unit 601 configured to extract, for each object attribute pair in the set of predefined object attributes, a feature of the object region corresponding to that attribute pair based on the dissimilarity between the attributes of the pair; and an identification unit 602 configured to identify the object attribute of the object region based on the extracted feature of the object region.
Preferably, the extraction unit 601 can comprise a positioning unit 601-1 configured to locate, in the object region, at least one block corresponding to the template of the object attribute pair, the template characterizing the dissimilarity between the attributes of the pair; and a feature extraction unit 601-2 configured to extract the feature of the object region corresponding to that attribute pair based on the at least one located block.
Preferably, the positioning unit 601-1 can comprise a unit configured to locate auxiliary regions in the object region depending on the positions of identified feature points in the object region; and a unit configured to locate, within the located auxiliary regions, at least one block corresponding to the template characterizing the dissimilarity between the attributes of the object attribute pair.
Preferably, the feature extraction unit 601-2 can comprise a unit configured to extract a feature from each of the at least one block in the object region, and a unit configured to concatenate the features of the extracted blocks as the feature of the object region.
Preferably, the identification unit 602 can comprise an identifying unit 602-1 configured to, for each object attribute pair in the set of predefined object attributes, identify, based on the feature of the object region corresponding to that pair, the one of the two attributes of the pair that corresponds to the object region, and increase the score of that attribute by a predetermined value, wherein all object attributes contained in the set of predefined object attributes have the same initial score; and an attribute determining unit 602-2 configured to determine the object attribute with the highest score in the set of predefined object attributes as the object attribute of the object region.
Additionally or alternatively, the identification unit 602 can comprise an identifying unit 602-3 configured to, for one object attribute pair in the set of predefined object attributes, identify, based on the feature of the object region corresponding to that pair, the one of the two attributes of the pair that corresponds to the object region; and an attribute determining unit 602-4 configured to determine the object attribute of the object region based on the attribute corresponding to the object region and the remaining object attributes in the set of predefined object attributes other than those of the current pair, wherein, if the number of remaining object attributes equals 0, the attribute corresponding to the object region is determined as the object attribute of the object region; otherwise, the attribute corresponding to the object region and the remaining object attributes in the set other than those of the current pair are regrouped into a new object attribute set, and the identifying operation and the attribute determining operation are performed in turn on this new object attribute set.
The templates characterizing the dissimilarity between object attributes can be formed in advance as described above and stored separately from the apparatus 600. Additionally or alternatively, the apparatus 600 can comprise a unit configured to form, in the manner described above, the templates characterizing the dissimilarity between the attributes of the object attribute pairs.
[Advantageous technical effects]
In general, the present invention provides a new idea for identifying the object attribute of an object region in an image, in which the concept of object attribute pairs is introduced to improve the feature extraction and the identification of the object region.
More specifically, the dissimilarity between the object attributes contained in an object attribute pair is used to extract, for that pair, dissimilar pixel blocks in the object region, and the extracted feature of the object region is used to determine which of the attributes of the pair the object region corresponds to. Therefore, the feature extraction and the identification of the object region are performed pairwise, and thus recognition efficiency and accuracy can be improved.
It should be noted that the dissimilar pixel blocks of the object region used as the basis of comparison in each round, for each object attribute pair in the set of predefined object attributes, are determined and extracted per pair, and can reflect the dissimilarity between the attributes contained in that pair. In addition, the extracted parts can change adaptively during recognition; that is, the dissimilar pixel blocks of the object region can change from one comparison round to another, instead of remaining constant.
Therefore, parts of the object region that are common rather than discriminative for an object attribute pair are not extracted, and the extracted parts can more accurately reflect the dissimilarity between the attributes contained in the pair and help to accurately determine which of the attributes of the pair the object region corresponds to, so that the object attribute of the object region can be determined more accurately.
Hereinafter, in order to facilitate a thorough understanding of the realization of the present invention, exemplary realizations of the solution of the present invention are explained using a face as an example of the object to be identified. It should be noted that the solution of the present invention can also be applied to other types of objects.
For a face region in an image to be identified, its attribute can belong to various categories. For example, the category of face attributes can be selected from a group comprising facial expression and, when the face is a human face, the sex and the age of the person corresponding to this face. Of course, the category of face attributes is not limited thereto, and can be categories other than those mentioned above.
[Example 1]
Hereinafter, the process according to the present invention for recognizing a face attribute (for example, the facial expression) of a face region in an image will be described.
In general, for a face region in an input image whose expression is to be identified, for each facial expression pair in the set of predefined facial expressions, the feature of the face region corresponding to that expression pair is extracted based on the dissimilarity between the expressions contained in the pair, and then the facial expression of the face region is identified based on the extracted feature of the face region. When there are multiple faces in the input image, this process is repeated as many times as there are faces.
The details of this process are described below.
First, for an input image that may contain at least one face, the face regions in the input image are detected; usually one face region corresponds to one face in the image. Fig. 7 illustrates a rectangular face region detected from an input image.
Preferably, before the detected face regions are used for feature extraction, the face regions are usually aligned respectively, and this alignment can be performed in various ways.
In one implementation, the face region is aligned based on a predetermined number of feature points extracted from the face image, where the number of feature points can be set according to the experience of the operator and is not limited to any specific number. The feature point extraction method can be, for example, the explicit shape regression disclosed in Xudong Cao, Yichen Wei, Fang Wen, Jian Sun, "Face alignment by explicit shape regression", CVPR, 2012, or the ASM disclosed in D. Cristinacce and T. F. Cootes, "Boosted regression active shape models", BMVC, 2007. It should be noted that the feature point extraction method is not limited thereto, and can be any other method known in the art.
Fig. 8 schematically shows 7 feature points extracted from a face region. As shown in Fig. 8, these 7 feature points are: the two corners (canthi) of each of the two eyes, the nose, and the two corners of the mouth.
The alignment can be performed as follows. It should be noted that the following alignment process, which is known in the art, is merely exemplary, and the alignment can also be performed by other processes.
For the alignment, the mean positions of the 7 extracted feature points are calculated from a predetermined number of manually labelled samples. Assuming that there are n labelled samples, the mean position of each of the seven points P_i(x_i, y_i) (i = 1..7) is calculated as follows:
x_i = (1/n) Σ_{j=1}^{n} x_i^(j),  y_i = (1/n) Σ_{j=1}^{n} y_i^(j)
Here, x and y denote the horizontal-axis and vertical-axis positions, and x_i^(j), y_i^(j) denote the position of point P_i in the j-th labelled sample.
The mean positions of these seven points P_i(x_i, y_i) (i = 1..7) are defined as the objective face, and an affine mapping process is adopted to align the input face to the objective face.
The size of the aligned face can be 200*200 pixels. It should be noted that the size of the aligned face region is not limited thereto, and can be any other size.
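A rough sketch of this alignment step, assuming arrays of manually labelled landmarks and a least-squares affine fit (the array shapes and helper names are assumptions; any equivalent warping routine could be used instead):

```python
import numpy as np

def mean_landmarks(samples: np.ndarray) -> np.ndarray:
    """samples: (n, 7, 2) labelled landmark positions -> (7, 2) objective face."""
    return samples.mean(axis=0)

def affine_to_objective(points: np.ndarray, objective: np.ndarray) -> np.ndarray:
    """Least-squares 2x3 affine transform mapping the input landmarks onto the objective face."""
    ones = np.ones((points.shape[0], 1))
    A = np.hstack([points, ones])                    # (7, 3): [x, y, 1] per landmark
    M, _, _, _ = np.linalg.lstsq(A, objective, rcond=None)
    return M.T                                       # apply to a point as M_T @ [x, y, 1]
```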
Next, the face region in the input image, which may have been aligned, undergoes feature extraction. Fig. 9 is a flowchart illustrating the feature extraction process, in which step S101 is shown by a dashed line, meaning that this step is optional.
In this feature extraction process, at least one dissimilar pixel block corresponding to the template of a facial expression pair from the set of predefined facial expressions is located in the face region, the template characterizing the dissimilarity between the expressions of the pair (S102); then, the feature of the face region corresponding to this expression pair is extracted based on the at least one located dissimilar pixel block (S103).
The template of a facial expression pair can be composed of at least one block corresponding between the images of the respective expressions of the pair; this at least one block can reflect the dissimilarity between the expression images of the pair, and the details of the template will be described later.
For the process of locating the dissimilar pixel blocks, in one implementation, at least one dissimilar pixel block can be located for each facial expression pair directly in the face image according to the template of that pair, relying on a predetermined correspondence relation (for example, the same position and the same block size). It should be noted that the correspondence relation between the at least one dissimilar pixel block and the blocks in the template is not limited thereto, and can satisfy other rules.
In another implementation, a process (S101) for locating auxiliary regions (for example, organ regions) in the face image can be performed in advance, so that the localization of the dissimilar pixel blocks in the face image is performed only in the auxiliary regions of the face image (for example, the organ regions in the face image), instead of in the whole face image.
An auxiliary region can have any shape, such as a rectangle, a square, etc., and can have any size according to the operator's experiments.
As shown in Fig. 10, four organ regions can be located, including two eye regions, one nose region and one mouth region. In the identification process, for each aligned face, the sizes of these four regions are fixed. For example, in a face of 200*200 pixels, the size of an eye rectangle is 80*60, the size of the nose rectangle is 140*40, and the size of the mouth rectangle is 140*80.
Preferably, the locations of the organ regions can be determined from the feature points in the face image, with the origin at the upper-left corner of the image. When locating the left-eye region, the centre of its rectangular region can coincide with the midpoint of the line AB in Fig. 10. Similarly, the centre of the rectangle of the right-eye region can coincide with the midpoint of the line CD in Fig. 10. For the nose region, if the position coordinates of its upper-left corner are (n1, n2), the position coordinates of its lower-right corner are (n3, n4), and the position coordinates of the nose point E are (e1, e2), then these coordinates satisfy the following equations:
e1 = α*(n1+n3),  e2 = n2 + β*(n4-n2),
where 0.3 ≤ α ≤ 0.7 and 0.5 ≤ β ≤ 0.8.
For the mouth region, let H(h1, h2) be the midpoint of the line FG, let the upper-left corner of the mouth region be (m1, m2) and its lower-right corner be (m3, m4). The coordinates satisfy the following equations:
h1 = γ*(m1+m3),  h2 = m2 + δ*(m4-m2),
where 0.3 ≤ γ ≤ 0.7 and 0.3 ≤ δ ≤ 0.6.
Thus, the four auxiliary regions in the face image can be located, and the dissimilar pixel blocks in the face image for a facial expression pair can be located, with reference to the template of that pair, only within these auxiliary regions. In such a case, the template of the facial expression pair can preferably be composed only of dissimilar parts in the auxiliary regions of each expression of the pair, the auxiliary regions in the expression images corresponding in a predefined manner to the auxiliary regions located in the face region (for example, being at the same position and having the same size as the auxiliary regions in the face region).
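As an illustration only, the nose-region equations above can be inverted to place a fixed-size nose rectangle around the detected nose point; the concrete α, β values and the 140*40 size reuse the numbers mentioned above, but the helper itself is an assumption, not part of the patent text:

```python
def nose_region(e1: float, e2: float, width: int = 140, height: int = 40,
                alpha: float = 0.5, beta: float = 0.65) -> tuple[int, int, int, int]:
    """Return (n1, n2, n3, n4), the upper-left and lower-right corners of the nose rectangle."""
    n1 = e1 / (2 * alpha) - width / 2   # inverts e1 = alpha*(n1+n3) given n3 = n1 + width
    n3 = n1 + width
    n2 = e2 - beta * height             # inverts e2 = n2 + beta*(n4-n2) given n4 = n2 + height
    n4 = n2 + height
    return int(n1), int(n2), int(n3), int(n4)
```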
The template of a facial expression pair is described in detail below.
In the present invention, a template is determined for each facial expression pair in the set of predefined facial expressions. A facial expression pair can contain a predetermined number of facial expressions, preferably two, and any two facial expressions can form a facial expression pair.
For example, as shown in Fig. 11, if the set of predefined facial expressions contains three expressions, sad, neutral and angry, there can be C(3, 2) facial expression pairs. As shown in Fig. 11, one facial expression pair is composed of the sad expression and the neutral expression, one is composed of the neutral expression and the angry expression, and one is composed of the sad expression and the angry expression.
In another implementation, the facial expressions contained in a facial expression pair can be expressions whose difference is very large or even opposite. For example, a facial expression pair can in particular be composed of a laughing expression and a crying expression, so that the blocks extracted for such an expression pair are more discriminative.
Fig. 12 is a flowchart of the process of configuring the template of a facial expression pair. Such a process can be performed in advance, before the process of the present invention is executed, so that the templates of all facial expression pairs contained in the set of predefined facial expressions can be preconfigured and stored. Alternatively, such a process can be performed on the fly along with the execution of the process of the present invention.
First, the two average face images respectively corresponding to the two facial expressions contained in the facial expression pair are divided into a plurality of blocks in a mutually corresponding manner.
The average face image of each expression is usually constructed by averaging aligned faces with the same expression. For the average face of the laughing expression, assuming that there are N aligned laughing samples, the average laughing face image I is obtained by the following formula:
I = (1/N) Σ_{i=1}^{N} I_i
This equation means that the gray values of corresponding pixels in the aligned face images are added together with weight 1/N, to obtain the average face image of the laughing expression.
Fig. 13 exemplarily illustrates the average face images of laugh, neutral, sad and smile; such average face images are usually generated in advance based on a facial expression database and stored.
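A minimal sketch of building such an average face image from aligned samples of one expression (the array shape is an assumption matching the 200*200 aligned faces above):

```python
import numpy as np

def average_face(aligned_faces: np.ndarray) -> np.ndarray:
    """aligned_faces: (N, 200, 200) grayscale faces of one expression -> (200, 200) average image."""
    return aligned_faces.astype(np.float64).mean(axis=0)
```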
Fig. 14 schematically shows the corresponding division of the two average face images. The two average face images are divided using the same pattern (for example, a grid), and the size of the divided blocks is not limited. For example, in an average face image of 200*200 pixels, the size of a block is 10*10 pixels.
It should be noted that the division pattern of each average face image is not limited thereto, and the blocks can correspond to each other in other ways; for example, when the average face images have different sizes, the blocks in the division patterns can correspond to each other according to the ratio of the sizes of the average face images.
Next, the feature of each of the plurality of blocks of each divided average face image is extracted. There are various extraction methods, and any of them can be applied to this process.
As known in the art, the feature extraction method can be, for example, the local binary pattern (LBP) disclosed in Timo Ojala, Matti Pietikainen, and Topi Maenpaa, "Multiresolution gray-scale and rotation invariant texture classification with local binary patterns", IEEE Transactions on Pattern Analysis and Machine Intelligence, 2002, or the local phase quantization (LPQ) disclosed in Ville Ojansivu and Janne Heikkila, "Blur insensitive texture classification using local phase quantization", ICISP 2008.
In the case of LBP, the block size is the same as the size of the dissimilar pixel blocks, and the total number of bins is, for example, 59. Therefore, the LBP feature of each block has 59 dimensions. The feature calculation process is summarized as follows:
1) For each pixel in the input image, calculate LBP_{8,1}:
a) take the value of the current pixel as the centre pixel value;
b) extract the pixel values of the eight neighbouring positions;
c) calculate g_p (p = 0, 1, ..., 7) by bilinear interpolation;
d) calculate the LBP value by LBP_{8,1} = Σ_{p=0}^{7} s(g_p - g_c) * 2^p,
where g_p is the gray value of one of the neighbouring pixels, g_c is the gray value of the centre pixel, and s(x) = 1 if x ≥ 0 and 0 otherwise.
2) Using the LBP value mapping table disclosed in Ville Ojansivu and Janne Heikkila, "Blur insensitive texture classification using local phase quantization", ICISP 2008, build a 59-dimensional LBP histogram by accumulating the mapped LBP values of the pixels in the block.
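A simplified sketch of the LBP_{8,1} code and the 59-bin histogram; `UNIFORM_MAP` stands in for the uniform-pattern mapping table, and the 8 immediate neighbours are used instead of bilinearly interpolated circular samples, so this is an approximation of the cited method rather than a faithful reimplementation:

```python
import numpy as np

UNIFORM_MAP = np.zeros(256, dtype=np.int64)  # placeholder for the 256 -> 59-bin mapping table

def lbp_8_1(image: np.ndarray) -> np.ndarray:
    """Per-pixel LBP codes over the 8 immediate neighbours of each interior pixel."""
    img = image.astype(np.int32)
    center = img[1:-1, 1:-1]
    shifts = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(center)
    for p, (dy, dx) in enumerate(shifts):
        g_p = img[1 + dy:img.shape[0] - 1 + dy, 1 + dx:img.shape[1] - 1 + dx]
        codes += ((g_p - center) >= 0).astype(np.int32) << p   # s(g_p - g_c) * 2**p
    return codes

def lbp_histogram(block: np.ndarray) -> np.ndarray:
    """59-dimensional LBP histogram of one block, using the mapping table."""
    mapped = UNIFORM_MAP[lbp_8_1(block)]
    return np.bincount(mapped.ravel(), minlength=59)
```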
Next, the similarity between the features of corresponding blocks in the two divided average face images is determined.
For example, the similarity of corresponding blocks (see Fig. 14) can be determined using the Euclidean distance. Assuming two feature vectors f1 = <a1, a2, ..., an> and f2 = <b1, b2, ..., bn>, the similarity of f1 and f2 is:
S(f1, f2) = sqrt((a1 - b1)^2 + (a2 - b2)^2 + ... + (an - bn)^2)
Therefore, the similarity between the two divided average face images can be determined block by block as described above. It should be noted that the determination of similarity is not limited thereto, and can be realized in other ways known in the art.
Finally, those blocks of the two divided average face images whose similarity is lower than a predetermined threshold are selected to form the template.
More specifically, the similarities of the corresponding blocks in the two divided average face images are sorted, the first predetermined number of block pairs are selected, and the indices of these block pairs are saved as the template of the expression pair. This predetermined number (which also corresponds to the predefined threshold; the threshold can be the similarity of the last selected block pair) is optimized by experiments. An example of a template for an expression pair is illustrated in Fig. 15.
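A sketch of this block selection, reusing the `lbp_histogram` helper above and treating the Euclidean distance between block features as the dissimilarity score; keeping the most dissimilar blocks and the `num_blocks` count standing for the experimentally tuned number are both assumptions of this sketch:

```python
import numpy as np

def build_pair_template(mean_face_a: np.ndarray, mean_face_b: np.ndarray,
                        block: int = 10, num_blocks: int = 80) -> list[tuple[int, int]]:
    """Return the (row, col) grid indices of the blocks that differ most between two mean faces."""
    h, w = mean_face_a.shape
    scored = []
    for r in range(0, h, block):
        for c in range(0, w, block):
            fa = lbp_histogram(mean_face_a[r:r + block, c:c + block])
            fb = lbp_histogram(mean_face_b[r:r + block, c:c + block])
            dist = float(np.linalg.norm(fa - fb))        # Euclidean distance of block features
            scored.append((dist, (r // block, c // block)))
    scored.sort(reverse=True)                            # most dissimilar block pairs first
    return [idx for _, idx in scored[:num_blocks]]
```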
Therefore, for each expression pair, a group of dissimilar pixel blocks can be located in the face region according to the template of that pair formed as described above. Fig. 16 illustrates this process.
First, according to the template, the input face image is divided into blocks, and the block division can be identical to the division used for the template (for example, the same pattern and the same block size). Then, since the template of the expression pair holds the indices of the dissimilar pixel blocks, the dissimilar pixel blocks are located in the aligned face image according to the indices in the template. In practice, the size of a dissimilar pixel block can be 10*10 pixels.
As mentioned above, when the auxiliary regions in the face image have been located in advance, the above process can be performed only for the auxiliary regions.
Based on the dissimilar pixel blocks located in this way in the face region to be identified, the feature of the face region can be extracted. Fig. 17 is a flowchart of such feature extraction. Specifically, a feature can be extracted from each of the at least one block in the face region (S1031), and the features of the extracted blocks are concatenated as the feature of the face region (S1032).
The feature extraction method can be any method known in the art, and can be, for example, the same as the method in the above feature extraction process (for example, LBP).
Then, the features of all dissimilar pixel blocks are concatenated to represent the feature for the facial expression pair. The dimension of the final vector is 59*n, where n is the total number of dissimilar pixel blocks and 59 is the total number of bins used for feature extraction, which can be any other number. When only the dissimilar pixel blocks in the auxiliary regions are used, the features of the dissimilar pixel blocks within each organ region are concatenated in a fixed order, and then the obtained features of the four organ regions are concatenated.
The identification of the facial expression of the face image is described below.
Fig. 18 is a flowchart of one implementation of the identification process. In this implementation, identification is realized in the so-called "one against one" manner, in which, for the set of predefined facial expressions, the expressions are voted on C(n, t) times, where n is the number of facial expressions contained in the set and t is the number of expressions contained in each pair; that is, a single vote is cast for each facial expression pair, and the expression with the highest score is determined as the final facial expression.
Each expression is denoted as B_1, ..., B_n, and the score of each expression can initially be set to 0, denoted as f(B_1) = ... = f(B_n) = 0. For each facial expression pair, when the face region is determined to correspond to one of the expressions of the pair (B_i), the score of that expression is increased by a constant value, for example f(B_i) = f(B_i) + 1.
Finally, the expression corresponding to the maximum score f(B) = max{f(B_1), ..., f(B_n)} is determined as the facial expression of the face region.
Fig. 19 is a flowchart of another implementation of the identification process. In this implementation, identification can be realized in the so-called "one beating one" manner, in which, when the number of predefined facial expressions contained in the set is n, the facial expression of the face region is determined in n-1 rounds; in each round, only the winning expression of the facial expression pair advances to the next round, and the expression that finally wins is determined as the final facial expression.
Let each expression be denoted as B_1, ..., B_n, and let a facial expression pair contain two expressions. At the beginning, a facial expression pair (B_i, B_j) is arbitrarily selected from the set of predefined facial expressions, and for this pair it is determined which expression the face region corresponds to. Suppose it is determined that the expression of the face region is B_i.
Then, the expression B_j is excluded from the initial set of expressions, and the remaining expressions are organized into a new expression set. The above process is performed again for this new set.
Therefore, such a process is performed for n-1 rounds, and the finally remaining expression is identified as the expression of the face image.
In general, the "one beating one" manner is more efficient than the "one against one" manner, and their accuracies are almost the same.
The determination of the expression for a facial expression pair can be realized in any manner known in the art, for example by means of a classifier. In the case of a classifier, the feature vector is classified for each expression pair by a binary classifier. A linear SVM, such as that disclosed in Chih-Chung Chang and Chih-Jen Lin, "LIBSVM: a library for support vector machines", 2011, can be adopted as this classifier. The decision function is sgn(w^T f + b), where w and b are stored in the dictionary, w being the weight vector trained for the SVM, b being the bias term, and f being the feature vector of the face area.
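As a small illustration, the per-pair decision then reduces to the sign of a dot product; mapping the sign +1/-1 to the first or second expression of the pair is an assumption of the sketch, not stated in the text.

```python
import numpy as np

def pair_decision(w, b, f):
    """Binary decision sgn(w^T f + b) for one expression pair. w and b are the
    trained weight vector and bias stored in the dictionary for that pair, and
    f is the face-area feature extracted with that pair's template."""
    return 1 if float(np.dot(w, f) + b) >= 0 else -1
```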
Experimental results
Table 1 shows the data set used in the test. The images in this data set were downloaded from the Internet and are frontal face images with natural expressions.
Table 1
              Laugh   Neutral   Sad    Smile
Training set   2042    1976    1746    2140
Test set        717     487    1302     518
Total          2759    2463    3048    2658
Table 2 compares the performance (e.g., facial expression recognition accuracy) of the solution of the present application with that of the prior art, e.g., U.S. Patent Application US2012/0169895.
Table 2
The confusion matrix of the prior art is shown in Table 3, and the confusion matrix of the present application is shown in Table 4.
Table 3
Table 4
[Example 2]
A process according to the present invention for identifying a face attribute (e.g., the age of a face) in an image will be described below.
Suppose there is a set of predefined age levels comprising, for example, child, teenager, adult, and elderly. Age level pairs are then formed from this set of predefined age levels, and the age of the face is determined pair by pair. This process can be implemented in a manner similar to the implementation in Example 1.
More specifically, at the beginning, the process detects the face and locates the face area in the input face image.
Next, for each age level pair, the process locates the distinctive pixel blocks in the face area according to the template of that age level pair obtained by training.
Next, for each age level pair, the process obtains the feature of the face area used for classification based on the distinctive pixel blocks. The feature corresponding to the age level pair is represented by concatenating the features of the distinctive pixel blocks.
Next, the process determines the age level of the face for each age level pair, and integrates the classification results to determine the age level of the face.
Preferably, the face can be aligned before the distinctive pixel blocks are located.
Preferably, before or while the distinctive pixel blocks are located, auxiliary areas can be located in the face area, so that only the auxiliary areas need to be processed during the locating of the distinctive pixel blocks and the subsequent operations.
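The outline below sketches this Example 2 pipeline end to end; all callables and the pair-keyed dictionaries are assumed placeholders for the steps described above, and the sign convention of the per-pair classifier is likewise an assumption.

```python
from itertools import combinations

def identify_age_level(image, age_levels, templates, classifiers,
                       detect_face, align_face, locate_blocks, extract_feature):
    """Illustrative outline of the age-level identification of Example 2."""
    face = align_face(detect_face(image))              # detect and (optionally) align
    scores = {level: 0 for level in age_levels}
    for pair in combinations(age_levels, 2):            # one age-level pair at a time
        blocks = locate_blocks(face, templates[pair])    # distinctive pixel blocks
        feature = extract_feature(face, blocks)          # concatenated block features
        winner = pair[0] if classifiers[pair](feature) >= 0 else pair[1]
        scores[winner] += 1                              # integrate per-pair results
    return max(scores, key=scores.get)

# Hypothetical usage:
# age = identify_age_level(img, ["child", "teenager", "adult", "elderly"],
#                          templates, classifiers, detect_face, align_face,
#                          locate_blocks, extract_feature)
```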
It should be noted that the above examples are merely illustrative rather than restrictive. The solution of the present invention is therefore not limited thereto, and can be applied to other types of object attribute identification.
[Industrial applicability]
The present invention can be used in a variety of applications. For example, the present invention can be applied to detecting and tracking the state of an object in an image, such as smile detection in a camera, an audience response system, and automatic picture annotation.
More specifically, in one implementation, an object can be detected, and the method of the present invention can then be used to identify an attribute of the object.
In the case of a camera application, images are captured by the camera. The system selects face images from the captured images by a face detection technique. The face images are input to a facial expression recognition module, and some predefined expressions (e.g., happy, sad, neutral, etc.) are identified. The recognition results are then input to an evaluation module, which assesses the effect of the meeting according to the expressions of the audience. The system finally outputs the assessment result of the meeting effect.
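A hypothetical sketch of such an audience response loop is given below; the module names and the scoring rule (the share of positive expressions) are illustrative assumptions rather than part of the original description.

```python
from collections import Counter

def assess_meeting(frames, detect_faces, recognize_expression,
                   positive=("happy", "smile", "laugh")):
    """Detect faces in each captured frame, recognize each face's expression,
    and report the fraction of positive expressions as a rough measure of the
    meeting effect."""
    counts = Counter()
    for frame in frames:
        for face in detect_faces(frame):
            counts[recognize_expression(face)] += 1
    total = sum(counts.values())
    return (sum(counts[e] for e in positive) / total) if total else 0.0
```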
The method and system of the present invention can be carried out in various ways. For example, the method and system of the present invention can be carried out by software, hardware, firmware, or any combination thereof. The order of the method steps described above is merely illustrative, and unless specifically stated otherwise, the steps of the method of the present invention are not limited to the order specifically described above. Furthermore, in some embodiments, the present invention can also be embodied as a program recorded in a recording medium, comprising machine-readable instructions for implementing the method according to the present invention. Therefore, the present invention also covers a recording medium storing a program for implementing the method according to the present invention.
Although the present invention has been described with reference to exemplary embodiments, it will be appreciated by those skilled in the art that the above examples are merely illustrative and are not intended to limit the scope of the invention. Those skilled in the art will understand that the above-described embodiments can be modified without departing from the scope and spirit of the present invention. The scope of the present invention is defined by the appended claims, and the scope of the appended claims is to be accorded the broadest interpretation so as to encompass all such modifications and equivalent structures and functions.

Claims (11)

1. Equipment, comprising:
an extraction unit configured to, for each object attribute pair in a set of predefined object attributes, extract a feature of an object area corresponding to the object attribute pair based on the difference of the object attribute pair; and
a recognition unit configured to identify the object attribute of the object area based on the extracted feature of the object area.
2. The equipment according to claim 1, wherein all object attributes in the set of predefined object attributes belong to the same category, and wherein an object attribute pair is formed by any two object attributes comprised in the set of predefined object attributes.
3. The equipment according to claim 1, wherein the object area is a face area and the object attributes are face attributes, and
wherein the category of the face attributes is one selected from the group comprising a facial expression and, in the case where the face is a human face, the sex and the age of the person corresponding to the face.
4. The equipment according to claim 1, wherein the object area is an object area aligned based on feature points identified in the object area.
5. The equipment according to claim 1, wherein the extraction unit comprises:
a locating unit configured to locate, in the object area, at least one block corresponding to a template of the object attribute pair, the template characterizing the difference between the object attributes of the pair; and
a feature extraction unit configured to extract the feature of the object area corresponding to the object attribute pair based on the located at least one block.
6. The equipment according to claim 5, wherein the locating unit comprises:
a unit configured to locate auxiliary areas in the object area depending on the positions of the feature points identified in the object area; and
a unit configured to locate, in the auxiliary areas, at least one block corresponding to the template of the object attribute pair characterizing the difference between the object attributes.
7. The equipment according to claim 5 or 6, wherein the template characterizing the difference between the object attributes of the object attribute pair is formed in the following manner:
dividing two mean object area images, which respectively correspond to the two object attributes comprised in the object attribute pair, into a plurality of blocks corresponding to each other;
extracting a feature of each of the plurality of blocks of each divided mean object area image corresponding to each object attribute;
determining similarities between the features of corresponding blocks in the two divided mean object area images; and
selecting, to form the template, those blocks in the two divided mean object area images for which the similarity between the corresponding blocks is lower than a predefined threshold.
8. The equipment according to claim 5, wherein the feature extraction unit comprises:
a unit configured to extract a feature from each of the at least one block of the object area; and
a unit configured to concatenate the extracted features of the blocks as the feature of the object area.
9. The equipment according to claim 1, wherein the recognition unit comprises:
an identifying unit configured to, for each object attribute pair in the set of predefined object attributes, identify, based on the feature of the object area corresponding to the object attribute pair, which of the two object attributes comprised in the object attribute pair the object area corresponds to, and increase the score of the object attribute corresponding to the object area by a predetermined value, wherein all object attributes comprised in the set of predefined object attributes have the same initial score; and
an attribute determining unit configured to determine that the object attribute with the highest score in the set of predefined object attributes is the object attribute of the object area.
10. The equipment according to claim 1, wherein the recognition unit comprises:
an identifying unit configured to, for an object attribute pair in the set of predefined object attributes, identify, based on the feature of the object area corresponding to the object attribute pair, which of the two object attributes comprised in the object attribute pair the object area corresponds to; and
an attribute determining unit configured to determine the object attribute of the object area based on the object attribute corresponding to the object area and the remaining object attributes in the set of predefined object attributes other than those of the object attribute pair,
wherein, if the number of remaining object attributes equals 0, the object attribute corresponding to the object area is determined to be the object attribute of the object area,
otherwise, the object attribute corresponding to the object area and the remaining object attributes in the set of predefined object attributes other than those of the object attribute pair are regrouped into a new object attribute set, and the identifying operation and the attribute determining operation are successively performed on the new object attribute set.
11. A method, comprising:
for each object attribute pair in a set of predefined object attributes, extracting a feature of an object area corresponding to the object attribute pair based on the difference of the object attribute pair; and
identifying the object attribute of the object area based on the extracted feature of the object area.
CN201310320936.8A 2013-07-26 2013-07-26 Object identifying method and equipment Active CN104346601B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310320936.8A CN104346601B (en) 2013-07-26 2013-07-26 Object identifying method and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310320936.8A CN104346601B (en) 2013-07-26 2013-07-26 Object identifying method and equipment

Publications (2)

Publication Number Publication Date
CN104346601A true CN104346601A (en) 2015-02-11
CN104346601B CN104346601B (en) 2018-09-18

Family

ID=52502177

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310320936.8A Active CN104346601B (en) 2013-07-26 2013-07-26 Object identifying method and equipment

Country Status (1)

Country Link
CN (1) CN104346601B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105426812A (en) * 2015-10-27 2016-03-23 浪潮电子信息产业股份有限公司 Expression recognition method and apparatus

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060285750A1 (en) * 2005-06-21 2006-12-21 Sony Corporation Imaging apparatus, processing method of the apparatus and program making computer execute the method
US20080037841A1 (en) * 2006-08-02 2008-02-14 Sony Corporation Image-capturing apparatus and method, expression evaluation apparatus, and program
CN101667248A (en) * 2008-09-04 2010-03-10 索尼株式会社 Image processing apparatus, imaging apparatus, image processing method, and program
US20120301860A1 (en) * 2011-05-23 2012-11-29 Sony Corporation Information processing device, information processing method, and program
CN102314687A (en) * 2011-09-05 2012-01-11 华中科技大学 Method for detecting small targets in infrared sequence images
CN102663413A (en) * 2012-03-09 2012-09-12 中盾信安科技(江苏)有限公司 Multi-gesture and cross-age oriented face image authentication method

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105426812A (en) * 2015-10-27 2016-03-23 浪潮电子信息产业股份有限公司 Expression recognition method and apparatus
CN105426812B (en) * 2015-10-27 2018-11-02 浪潮电子信息产业股份有限公司 A kind of expression recognition method and device

Also Published As

Publication number Publication date
CN104346601B (en) 2018-09-18

Similar Documents

Publication Publication Date Title
Endres et al. Category-independent object proposals with diverse ranking
US10204283B2 (en) Image recognizing apparatus, image recognizing method, and storage medium
Zhao et al. Learning mid-level filters for person re-identification
CN108520226B (en) Pedestrian re-identification method based on body decomposition and significance detection
Wu et al. Cluster boosted tree classifier for multi-view, multi-pose object detection
US9098775B2 (en) Multi-class identifier, method, and computer-readable recording medium
CN103136504B (en) Face identification method and device
CN110909618B (en) Method and device for identifying identity of pet
CN108549870A (en) A kind of method and device that article display is differentiated
CN111126482A (en) Remote sensing image automatic classification method based on multi-classifier cascade model
Rejeb Sfar et al. Vantage feature frames for fine-grained categorization
CN105488463A (en) Lineal relationship recognizing method and system based on face biological features
Weng et al. Multi-feature ordinal ranking for facial age estimation
CN110688888B (en) Pedestrian attribute identification method and system based on deep learning
CN103824090A (en) Adaptive face low-level feature selection method and face attribute recognition method
Li et al. Robust vehicle detection in high-resolution aerial images with imbalanced data
Fidler et al. A coarse-to-fine taxonomy of constellations for fast multi-class object detection
Perina et al. Image analysis by counting on a grid
WO2015146113A1 (en) Identification dictionary learning system, identification dictionary learning method, and recording medium
CN113673607A (en) Method and device for training image annotation model and image annotation
Chidananda et al. Entropy-cum-Hough-transform-based ear detection using ellipsoid particle swarm optimization
CN102609715B (en) Object type identification method combining plurality of interest point testers
Vora et al. Iterative spectral clustering for unsupervised object localization
Lin et al. Learning contour-fragment-based shape model with and-or tree representation
Jia et al. Multiple metric learning with query adaptive weights and multi-task re-weighting for person re-identification

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant