CN101833569A

CN101833569A - A method for automatic identification of movie face images

Info

Publication number: CN101833569A
Application number: CN 201010141915
Authority: CN
Inventors: 卢汉清; 张一帆; 徐常胜; 程健
Original assignee: Institute of Automation of Chinese Academy of Science
Current assignee: Institute of Automation of Chinese Academy of Science
Priority date: 2010-04-08
Filing date: 2010-04-08
Publication date: 2010-09-15

Abstract

The present invention relates to a method for automatically identifying face images in movies. The method comprises: Step 1: using a multi-view face detector and tracker to automatically acquire face sequences from a movie video, and clustering the face sequences so that the clusters correspond to different characters; Step 2: measuring the relationships between faces according to the frequency with which the face sequences of different characters appear together in the same scene, and establishing a face relationship network; Step 3: using a computer to download and store the plain-text movie script corresponding to the movie video, the computer counting the frequency with which the names of different characters appear together in the same scene of the script; Step 4: measuring the relationships between names according to that frequency, and establishing a name relationship network; Step 5: the computer representing the face relationship network and the name relationship network as an undirected face graph and an undirected name graph, and matching the two graphs so as to match the vertices of the face relationship network with those of the name relationship network, thereby achieving identification that associates each face with a name.

Description

A method for automatically identifying face images in movies

Technical field

The invention belongs to the field of multimedia content analysis and relates to a method for automatically identifying face images in movie videos.

Background technology

With the rapid growth of the movie industry, large numbers of movies are being produced, and indexing, organizing, and managing this massive volume of movie video data is becoming increasingly important. Because the plot of a movie unfolds around its characters, the characters are usually the focus of the audience's attention and are important content to annotate and index. Earlier face identification techniques mainly targeted news video. In news video, a large number of names can be obtained from text produced by speech transcription, and this transcript is automatically aligned in time with the video, so faces and names can be associated by temporal consistency. This approach can be called "local matching". In movie video, however, local matching does not apply, because characters' names rarely occur in the dialogue, so too few names can be extracted from the speech transcript. A movie script does contain the characters' names, but it carries no timing information and cannot be aligned in time with the video, so local matching cannot be used to match faces with names. A method that uses the movie script to automatically identify the faces in a movie video is therefore still lacking.

Summary of the invention

The object of the invention is to extract character names from the movie script and use them to recognize and label the faces in the movie. Because the script contains no timing information and cannot be aligned in time with the video, the invention uses graph matching and, given a movie video and its script, provides a method for automatically identifying face images in movies.

To achieve this object, the invention provides a method for automatically identifying face images in movies. The technical solution is implemented by the following steps:

Step S1: using a multi-view face detector and tracker, automatically acquire face sequences from a movie video, and cluster the face sequences so that the clusters correspond to different characters;

Step S2: measure the relationships between faces according to the frequency with which the face sequences of different characters appear together in the same scene, and build a face relationship network;

Step S3: using a computer, download from a movie script database and store the plain-text movie script corresponding to the movie video; the computer counts the frequency with which the names of different characters appear together in the same scene of the plain-text script;

Step S4: measure the relationships between names according to that frequency, and build a name relationship network;

Step S5: the computer represents the face relationship network and the name relationship network as an undirected face graph and an undirected name graph, and matches the two graphs so as to match the vertices of the face relationship network with those of the name relationship network, thereby achieving identification that associates each face with a name.

The advantages of the invention are as follows. Compared with traditional local matching methods, the method of the invention removes the need for timing information. Instead, it computes statistics of faces and names globally, in the video and text modalities respectively, and builds a face relationship network and a name relationship network. A correspondence between the vertices of the two networks is then established by graph matching, which achieves automatic identification of the faces. Given a movie video and its corresponding script, the invention automatically labels each face appearing in the video with the name of the corresponding character, and can support applications such as movie video retrieval and automatic video summarization.

Description of drawings

Fig. 1 is a schematic diagram of the overall framework of the invention for automatically identifying face images in movies;

Fig. 2 is a flow chart of the method of the invention for automatically identifying face images in movies.

Embodiment

The details of the technical solution of the invention are described below with reference to the accompanying drawings.

Fig. 1 shows the overall framework of the invention for automatically identifying face images in movies. The basic hardware required to implement this framework is a computer with a 2.4 GHz processor and 1 GB of memory; the required software is a programming environment (Visual C++ 6.0). The computer implements the functions of the multi-view face detector and tracker 2, the face network building unit 3, the plain-text movie script storage unit 5, the name network building unit 6, and the matching unit 7.

As shown in Fig. 1, the framework comprises: a movie video library 1, a multi-view face detector and tracker 2, a face network building unit 3, a movie script database 4, a plain-text movie script storage unit 5, a name network building unit 6, and a matching unit 7. The multi-view face detector and tracker 2 is connected to the movie video library 1; it receives a movie video from the movie video library 1, automatically acquires face sequences from the video, clusters the face sequences so that the clusters correspond to different characters, and measures the relationships between faces according to the frequency with which the face sequences of different characters appear together in the same scene. The face network building unit 3 is connected to the multi-view face detector and tracker 2 and builds the face relationship network from the relationships between faces. The plain-text movie script storage unit 5 is connected to the movie script database 4; it downloads from the database and stores the plain-text movie script corresponding to the movie video, and measures the relationships between names according to the frequency with which the names of different characters appear together in the same scene of the script. The name network building unit 6 is connected to the plain-text movie script storage unit 5 and builds the name relationship network from the relationships between names. The matching unit 7 is connected to both the face network building unit 3 and the name network building unit 6; it represents the face relationship network generated by the face network building unit 3 and the name relationship network generated by the name network building unit 6 as an undirected face graph and an undirected name graph, and matches the two graphs so as to match the vertices of the face relationship network with those of the name relationship network, thereby achieving identification that associates each face with a name.

Fig. 2 shows the flow chart of the method of the invention for automatically identifying face images in movies. The flow comprises five steps: step S1, face detection and clustering; step S2, building the face relationship network; step S3, name statistics; step S4, building the name relationship network; and step S5, representing the face relationship network and the name relationship network as graph models and performing graph matching.

1. Building the face relationship network

Step S1: In the movie video, we use a multi-view face detector and tracker (Y. Li, H. Z. Ai, C. Huang, and S. H. Lao. Robust head tracking with particles based on multiple cues fusion. In Proceedings of HCI/ECCV, pages 29-39, 2006.) to automatically acquire face sequences, and cluster the face sequences so that the clusters correspond to different characters. Within a face sequence, every face image is normalized to a 64 × 64 gray-level image and represented as a 64 × 64-dimensional gray-level feature vector; the feature vector is then reduced to 4 dimensions by locally linear embedding. To cluster the face sequences, we use the Earth Mover's Distance [2] (Y. Rubner, C. Tomasi, and L. J. Guibas. A metric for distributions with applications to image databases. In Proceedings of IEEE International Conference on Computer Vision, pages 59-66, 1998.) as the distance measure between face sequences. The Earth Mover's Distance is a distance measure between sets that originates from the transportation problem; in essence it is the minimum cost of transforming one weighted point set into another, which is a constrained optimization problem. It has two properties. (1) It allows partial similarity and does not require the two data sets to be of equal size, which is especially important for measuring the distance between face sequences that contain different numbers of images. (2) It penalizes severe dissimilarity between data sets. This matters because in movie video, owing to factors such as illumination and pose, the faces of different characters may in some cases look quite similar. When the face sequences of different characters are similar only in some of their images, the dissimilarity of the remaining images must be penalized to avoid merging them into the same person. With the distance measure established, we cluster the face sequences by agglomerative hierarchical clustering.
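For reference, the Earth Mover's Distance can be written out as the following transportation problem. This is the standard formulation from Rubner et al. [2], not a formula given in this document; the symbols P, Q, w, d_{ij} and f_{ij} are introduced here only for illustration, with P and Q standing for two face sequences whose images are the points p_i and q_j.

```latex
% Two weighted point sets (for example, two face sequences):
%   P = {(p_i, w_{p_i})}, i = 1..m      Q = {(q_j, w_{q_j})}, j = 1..n
% d_{ij} is the ground distance between p_i and q_j; f_{ij} is the flow from p_i to q_j.
\min_{f} \sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} \, d_{ij}
\quad \text{subject to} \quad
f_{ij} \ge 0, \qquad
\sum_{j=1}^{n} f_{ij} \le w_{p_i}, \qquad
\sum_{i=1}^{m} f_{ij} \le w_{q_j}, \qquad
\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} = \min \Big( \sum_{i=1}^{m} w_{p_i}, \; \sum_{j=1}^{n} w_{q_j} \Big)

\mathrm{EMD}(P, Q) = \frac{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} \, d_{ij}}{\sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij}}
```

The last constraint moves only as much total weight as the lighter set holds, which is why sets of different sizes can be compared (property 1). Every unit of that weight must still be moved at some ground cost, so strongly dissimilar images raise the distance (property 2).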

Step S2: Measure the relationships between faces according to the frequency with which the face sequences of different characters appear together in the same scene, and build the face relationship network. To compute the relationships between faces, we count how often each pair of faces appears together in the same scene of the movie. First, we count how often each face appears in each scene. Because the face sequences have already been clustered, we only need to count how the face sequences of each cluster are distributed over the scenes, which gives the face occurrence frequency distribution matrix

O_{face} = [o_{ik}^{face}]_{m_f \times n_f}

where m_f is the number of face sequence clusters, n_f is the number of scenes in the video, and the element o_{ik}^{face} is the frequency with which the i-th face appears in the k-th scene. The i-th row of the matrix,

o_{i}^{face} = [o_{i1}^{face}, o_{i2}^{face}, ..., o_{i n_f}^{face}]

is the distribution of the occurrence frequency of the i-th face over the scenes of the whole movie. We then compute the frequency c_{ij}^{k} with which any two faces appear together in the same scene:

c_{ij}^{k} = \min (o_{ik}^{face}, o_{jk}^{face})

where \min (o_{ik}^{face}, o_{jk}^{face}) takes the smaller of the two elements o_{ik}^{face} and o_{jk}^{face} in column k of the face frequency distribution matrix; the formula gives the frequency c_{ij}^{k} with which the i-th and j-th faces appear together in the k-th scene. The frequency with which these two faces appear together over all scenes of the movie is then

r_{ij}^{face} = \sum_{k=1}^{n_f} c_{ij}^{k} = \sum_{k=1}^{n_f} \min (o_{ik}^{face}, o_{jk}^{face})

where n_f is the number of scenes in the video. By computing the relationship between every pair of faces, we can build the face relationship network. This network is represented by the adjacency matrix

R_{face} = [r_{ij}^{face}]_{m_f \times m_f}

The adjacency matrix is an m_f \times m_f square matrix. Its off-diagonal element r_{ij}^{face} is the frequency with which the two corresponding faces appear together in the whole movie, and its diagonal element r_{ii}^{face} is the frequency with which the i-th face itself appears in the whole movie.
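The co-occurrence computation described above can be sketched in a few lines of Python. This is a minimal illustration, not the patent's Visual C++ implementation; the function name `cooccurrence_matrix` and the toy frequency matrix are hypothetical.

```python
def cooccurrence_matrix(freq):
    """Build the relationship (adjacency) matrix from a frequency matrix.

    freq[i][k] is how often entity i (a face cluster) appears in scene k.
    The result r satisfies r[i][j] = sum over scenes k of min(freq[i][k], freq[j][k]),
    so the diagonal r[i][i] is the total number of appearances of entity i.
    """
    m = len(freq)
    return [
        [sum(min(a, b) for a, b in zip(freq[i], freq[j])) for j in range(m)]
        for i in range(m)
    ]


# Hypothetical toy data: 3 face clusters (rows) observed over 4 scenes (columns).
o_face = [
    [2, 0, 1, 3],
    [1, 1, 0, 2],
    [0, 4, 0, 0],
]

r_face = cooccurrence_matrix(o_face)
for row in r_face:
    print(row)
```

In this toy example faces 0 and 2 never share a scene, so their off-diagonal entry is 0, while the diagonal entries are simply the row sums of `o_face`.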

2. Building the name relationship network

Step S3: Using a computer, download from a movie script database and store the plain-text movie script corresponding to the movie video; the computer counts the frequency with which the names of different characters appear together in the same scene of the plain-text script. Step S4: Measure the relationships between names according to that frequency, and build the name relationship network. As with the face relationship network, the relationship between two names is measured by the frequency with which both appear together in the same scene of the script. First, we count how often each name appears in each scene, which gives the name occurrence frequency distribution matrix

O_{name} = [o_{ik}^{name}]_{m_n \times n_n}

where m_n is the number of names and n_n is the number of scenes in the script. Then, according to the formula

r_{ij}^{name} = \sum_{k=1}^{n_n} \min (o_{ik}^{name}, o_{jk}^{name})

we compute the frequency r_{ij}^{name} with which each pair of names appears together, and generate the name relationship network. This network is likewise represented by an adjacency matrix

R_{name} = [r_{ij}^{name}]_{m_n \times m_n}

where m_n is the number of names.
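The name statistics can be sketched as follows. The document does not specify how scenes and speaker names are marked up in the plain-text script, so this sketch assumes the script has already been split into scenes, each given as a list of the character names occurring in it; the function name and the toy names are hypothetical.

```python
from collections import Counter


def name_frequency_matrix(scenes, names):
    """Return o_name with o_name[i][k] = occurrences of names[i] in scene k."""
    per_scene = [Counter(scene) for scene in scenes]
    return [[counts[name] for counts in per_scene] for name in names]


# Assumed input: one list of name occurrences per scene (hypothetical data).
scenes = [
    ["ALICE", "BOB", "ALICE"],
    ["BOB", "CAROL"],
    ["ALICE", "CAROL", "CAROL", "BOB"],
]
names = ["ALICE", "BOB", "CAROL"]

o_name = name_frequency_matrix(scenes, names)

# Same min-and-sum rule as for faces: r_name[i][j] = sum_k min(o[i][k], o[j][k]).
r_name = [
    [sum(min(a, b) for a, b in zip(row_i, row_j)) for row_j in o_name]
    for row_i in o_name
]
print(o_name)
print(r_name)
```

Keeping the scene segmentation as an explicit input makes the assumption visible: the quality of the name network depends on how reliably scene boundaries and names can be extracted from the script.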

3. Matching faces and names

Step S5: The computer represents the face relationship network and the name relationship network as an undirected face graph and an undirected name graph, and matches the two graphs so as to match the vertices of the face relationship network with those of the name relationship network, thereby achieving identification that associates each face with a name. Once the face relationship network R_{face} and the name relationship network R_{name} have been built, each can be represented by an undirected graph:

G_{face} = <V_f, E_f, W_f>, G_{name} = <V_n, E_n, W_n>.

In the undirected face graph G_{face}, the vertices V_f = {f_1, f_2, ..., f_{m_f}} represent the m_f faces and the edges E_f represent the relationships between pairs of faces. The edge weight r_{ij}^{face} (i ≠ j) records how close the relationship between the two faces is, and the vertex weight r_{ii}^{face} records the frequency with which the corresponding face appears in the whole movie.

In the undirected name graph G_{name}, the vertices V_n = {n_1, n_2, ..., n_{m_n}} represent the m_n names; likewise, the edges E_n and weights W_n represent the relationships between names.

Because the number of face sequence clusters in the video and the number of names in the script are kept equal when the face relationship network and the name relationship network are built, the undirected face graph and the undirected name graph have the same number of vertices, which is denoted here by m. Given the face graph G_{face} and the name graph G_{name}, each with m vertices, there are m × m candidate matches between faces and names, which we store in a list L. For each candidate match a = (f_i, n_{i'}), we define an index M(a), called the "matching degree", to evaluate how well the face f_i matches the name n_{i'}:

M (a) = \exp \{ - \frac{(r_{ii}^{face} - r_{i' i'}^{name})^{2}}{2 \sigma^{2}} \}

where r_{ii}^{face} is the weight of a vertex in the face graph G_{face}, r_{i' i'}^{name} is the weight of a vertex in the name graph G_{name}, σ is a freely adjustable sensitivity coefficient that controls the tolerance to noise, and exp{} is the exponential function with base e. M(a) can be regarded as a feature of the match: when the match is correct, its matching degree M(a) is high.
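A small numeric sketch of the matching degree follows. It assumes the difference of the two vertex weights is squared (a Gaussian form), which is what makes M(a) peak when the two weights are equal; the function name and the sample values are hypothetical.

```python
import math


def matching_degree(r_face_ii, r_name_ii, sigma=1.0):
    """M(a) for a candidate match a = (f_i, n_i').

    r_face_ii: vertex weight of face f_i (its frequency in the whole movie).
    r_name_ii: vertex weight of name n_i' (its frequency in the whole script).
    sigma:     sensitivity coefficient; larger values tolerate larger differences.
    """
    diff = r_face_ii - r_name_ii
    return math.exp(-(diff * diff) / (2.0 * sigma ** 2))


print(matching_degree(5, 5))             # identical weights give the maximum, 1.0
print(matching_degree(5, 3, sigma=2.0))  # moderate mismatch
print(matching_degree(5, 3, sigma=0.5))  # same mismatch, stricter sigma, lower score
```

The three calls show the role of σ: the same frequency difference is scored more harshly as σ shrinks, so σ should be chosen according to how noisy the face and name counts are expected to be.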

Consider any pair of face-name matches (a, b), where a = (f_i, n_{i'}) matches the face f_i with the name n_{i'}, and b = (f_j, n_{j'}) matches the face f_j with the name n_{j'}. In the face graph G_{face}, the relation between the i-th face f_i and the j-th face f_j is r_{ij}^{face}; in the name graph G_{name}, the relation between the name n_{i'} and the name n_{j'} is r_{i' j'}^{name}. If both matches a and b are correct, the relation values r_{ij}^{face} and r_{i' j'}^{name} should be close, and we call the two matches compatible; otherwise the two relation values differ widely, and we call the matches incompatible. For such a pair of matches we therefore likewise define an index M(a, b), called the "compatibility degree":

M (a, b) = \exp \{ - \frac{(r_{ij}^{face} - r_{i' j'}^{name})^{2}}{2 \sigma^{2}} \}

M(a, b) can be regarded as a feature of the pair of matches: if both are correct, their compatibility degree M(a, b) is high. By definition, M(a, b) is non-negative and symmetric (M(a, b) = M(b, a)). For a pair of matches we must also take into account the one-to-one mapping constraint between names and faces. When the pair conflicts with this constraint, for example a = (f_i, n_{i'}) and b = (f_i, n_{j'}), in which the face f_i is matched with both the name n_{i'} and the name n_{j'}, the compatibility degree M(a, b) of the pair is set to 0. The problem of matching the vertices of the two graphs is thus reduced to finding, among all possible candidate matches, a set of matches C that satisfies the one-to-one mapping constraint and maximizes the sum of the matching degrees and compatibility degrees of the matches it contains. The objective function is defined as:

S = \sum_{a, b \in C} M (a, b) + \sum_{a \in C} M (a)
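The objective can be evaluated directly for a small example. The sketch below assumes the squared (Gaussian) form for both M(a) and M(a, b), sums M(a, b) over ordered pairs of distinct matches, and also zeroes a pair in which two faces share one name, which the one-to-one constraint implies; the function names and the toy matrices are hypothetical.

```python
import math


def matching_degree(a, r_face, r_name, sigma=1.0):
    """M(a): agreement of the vertex weights of face i and name i'."""
    i, ip = a
    d = r_face[i][i] - r_name[ip][ip]
    return math.exp(-d * d / (2.0 * sigma ** 2))


def compatibility(a, b, r_face, r_name, sigma=1.0):
    """M(a, b): agreement of the face-face and name-name edge weights.

    Returns 0 when the pair violates the one-to-one constraint
    (same face with two names, or same name with two faces).
    """
    (i, ip), (j, jp) = a, b
    if i == j or ip == jp:
        return 0.0
    d = r_face[i][j] - r_name[ip][jp]
    return math.exp(-d * d / (2.0 * sigma ** 2))


def objective(C, r_face, r_name, sigma=1.0):
    """S = sum of M(a, b) over ordered pairs a != b in C, plus sum of M(a)."""
    pair_term = sum(
        compatibility(a, b, r_face, r_name, sigma) for a in C for b in C if a != b
    )
    unary_term = sum(matching_degree(a, r_face, r_name, sigma) for a in C)
    return pair_term + unary_term


# Hypothetical toy networks with identical statistics, so face i should match name i.
r_face = [[6, 3], [3, 4]]
r_name = [[6, 3], [3, 4]]

s_identity = objective([(0, 0), (1, 1)], r_face, r_name)
s_swapped = objective([(0, 1), (1, 0)], r_face, r_name)
print(s_identity, s_swapped)
```

The correct assignment scores higher than the swapped one because the swapped matches disagree on the vertex weights, even though the single edge weight agrees in both cases.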

To this end, we represent all possible candidate matches with a new undirected graph. Each vertex of this graph corresponds to a candidate match and is weighted by its matching degree M(a); each edge corresponds to the relationship between two candidate matches and is weighted by their compatibility degree M(a, b). Because there are m × m possible candidate matches, the graph has m^{2} vertices. The adjacency matrix of this graph is the m^{2} \times m^{2} matrix M, whose elements are the matching degrees M(a) (on the diagonal) and the compatibility degrees M(a, b) (off the diagonal). Our goal is therefore to find in the matrix M a set of elements C that satisfies the one-to-one mapping constraint and maximizes the sum of the elements in the set. To solve this constrained optimization problem, we use a spectral method [3] (M. Leordeanu and M. Hebert. A spectral technique for correspondence problems using pairwise constraints. In Proceedings of the 10th IEEE International Conference on Computer Vision, pages 1482-1489, 2005.). This method, proposed by Leordeanu and Hebert, finds the dominant set of elements in a matrix. First, we define a normalized indicator vector

x \in R^{m^{2}}, \| x \| = 1

whose element x(i) is the confidence that the i-th match a_i belongs to the target set C, and whose norm is 1. We seek the optimal solution x^{*} such that

x^{*} = \arg \max_{x} (x^{T} M x)

From the definitions of the matching degree M(a) and the compatibility degree M(a, b), the matrix M is a non-negative symmetric matrix. Therefore, by the Rayleigh quotient theorem, x^{T} M x attains its maximum when x is the principal eigenvector of M, and by the Perron-Frobenius theorem the elements of this principal eigenvector lie in the interval [0, 1], which agrees with our definition of the indicator vector. This gives the optimal indicator vector x^{*}.

Because all candidate matches have been stored in the list L, once the optimal solution is obtained we first find the largest element x^{*}(a^{*}) of the vector; the corresponding match a^{*} is the most probable match, so we keep it. Then, according to the one-to-one mapping constraint, we delete from the list L all matches that conflict with a^{*} and set the corresponding elements of x^{*} to 0. Next, we find the largest remaining element of x^{*}, keep its match in the list L, and delete the other matches that conflict with it. This is repeated until every match has been either kept or deleted. The matches that remain are the result. Every face sequence cluster generated in step S1 is thereby matched with a name, and the face sequences in the cluster are all labeled with that name.
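The whole matching step can be sketched in pure Python as follows. This is an illustrative sketch, not the patent's implementation: it assumes the squared (Gaussian) form of M(a) and M(a, b), places M(a) on the diagonal of the affinity matrix, and approximates the principal eigenvector by power iteration (a stand-in for whatever eigen-solver is actually used). All function names and the toy matrices are hypothetical.

```python
import math
from itertools import product


def affinity(a, b, r_face, r_name, sigma):
    """Entry of the m^2 x m^2 matrix M: M(a) on the diagonal, M(a, b) elsewhere."""
    (i, ip), (j, jp) = a, b
    if a == b:
        d = r_face[i][i] - r_name[ip][ip]
    elif i == j or ip == jp:
        return 0.0  # the pair violates the one-to-one mapping constraint
    else:
        d = r_face[i][j] - r_name[ip][jp]
    return math.exp(-d * d / (2.0 * sigma ** 2))


def principal_eigenvector(M, iters=200):
    """Power iteration; for a non-negative M the iterate stays non-negative."""
    n = len(M)
    x = [1.0 / math.sqrt(n)] * n
    for _ in range(iters):
        y = [sum(M[r][c] * x[c] for c in range(n)) for r in range(n)]
        norm = math.sqrt(sum(v * v for v in y))
        x = [v / norm for v in y]
    return x


def spectral_match(r_face, r_name, sigma=1.0):
    """Return {face index: name index} chosen greedily from the eigenvector."""
    m = len(r_face)
    L = list(product(range(m), range(m)))  # all m * m candidate matches
    M = [[affinity(a, b, r_face, r_name, sigma) for b in L] for a in L]
    x = principal_eigenvector(M)
    alive = list(range(len(L)))
    kept = {}
    while alive:
        best = max(alive, key=lambda t: x[t])  # most confident remaining match
        i, ip = L[best]
        kept[i] = ip
        # Drop every candidate that reuses this face or this name.
        alive = [t for t in alive if L[t][0] != i and L[t][1] != ip]
    return kept


# Hypothetical toy data: r_name is r_face with the vertices relabeled so that
# face 0 corresponds to name 2, face 1 to name 0, and face 2 to name 1.
r_face = [[6, 3, 0], [3, 4, 1], [0, 1, 4]]
r_name = [[4, 1, 3], [1, 4, 0], [3, 0, 6]]

matches = spectral_match(r_face, r_name)
print(matches)
```

Faces 1 and 2 have the same vertex weight in this example, so the matching degree alone cannot tell them apart; the pairwise compatibility terms resolve the ambiguity, which is the point of matching the two graphs globally rather than face by face.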

The above is only one embodiment of the invention, and the scope of protection of the invention is not limited to it. Any variation or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of the invention. The scope of protection of the invention shall therefore be determined by the scope of protection of the claims.

Claims

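1. A method for automatically identifying face images in movies, characterized in that the method comprises the following steps:

Step 1: using a multi-view face detector and tracker, automatically acquiring face sequences from a movie video, and clustering the face sequences so that the clusters correspond to different characters;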

Step 2: measuring the relationships between faces according to the frequency with which the face sequences of different characters appear together in the same scene, and building a face relationship network;

Step 3: using a computer, downloading from a movie script database and storing the plain-text movie script corresponding to the movie video, the computer counting the frequency with which the names of different characters appear together in the same scene of the plain-text script;

Step 4: measuring the relationships between names according to that frequency, and building a name relationship network;

Step 5: the computer representing the face relationship network and the name relationship network as an undirected face graph and an undirected name graph, and matching the two graphs so as to match the vertices of the face relationship network with those of the name relationship network, thereby achieving identification that associates each face with a name.

2. The method of claim 1, characterized in that, in measuring the relationships between faces, the distribution over the scenes of the face sequences contained in each face sequence cluster is first counted, giving the face occurrence frequency distribution matrix

O_{face} = [o_{ik}^{face}]_{m_f \times n_f}

where m_f is the number of face sequence clusters, n_f is the number of scenes in the video, and the element o_{ik}^{face} of the matrix is the frequency with which the i-th face appears in the k-th scene.

3. The method of claim 1, characterized in that the frequency c_{ij}^{k} with which any two faces appear together in the same scene is computed as c_{ij}^{k} = \min (o_{ik}^{face}, o_{jk}^{face}), where c_{ij}^{k} is the frequency with which the i-th and j-th faces appear together in the k-th scene, and \min (o_{ik}^{face}, o_{jk}^{face}) takes the smaller of the two elements o_{ik}^{face} and o_{jk}^{face} in column k of the face frequency distribution matrix.

4. The method of claim 1, characterized in that the frequency r_{ij}^{face} with which two faces appear together in the whole movie is computed as:

r_{ij}^{face} = \sum_{k=1}^{n_f} \min (o_{ik}^{face}, o_{jk}^{face})

where r_{ij}^{face} is the frequency with which the i-th and j-th faces appear together over all scenes of the movie, \min (o_{ik}^{face}, o_{jk}^{face}) takes the smaller of the two elements o_{ik}^{face} and o_{jk}^{face} in column k of the face frequency distribution matrix, and n_f is the number of scenes in the video.

5. The method of claim 1, characterized in that the face relationship network is represented by the adjacency matrix

R_{face} = [r_{ij}^{face}]_{m_f \times m_f}

where the adjacency matrix is an m_f \times m_f square matrix, m_f is the number of face sequence clusters, the off-diagonal element r_{ij}^{face} is the frequency with which the two corresponding faces appear together in the whole movie, and the diagonal element r_{ii}^{face} is the frequency with which the i-th face itself appears in the whole movie.

6. The method of claim 1, characterized in that, in measuring the relationships between names, the frequency with which each name appears in each scene is first counted, giving the name occurrence frequency distribution matrix O_{name} = [o_{ik}^{name}]_{m_n \times n_n}, where m_n is the number of names, n_n is the number of scenes in the script, and the element o_{ik}^{name} of the matrix is the frequency with which the i-th name appears in the k-th scene.

7. The method of claim 6, characterized in that, according to the formula

r_{ij}^{name} = \sum_{k=1}^{n_n} \min (o_{ik}^{name}, o_{jk}^{name})

the frequency r_{ij}^{name} with which each pair of names appears together is computed, and the name relationship network is generated.

8. The method of claim 7, characterized in that the name relationship network is represented by an adjacency matrix R_{name} = [r_{ij}^{name}]_{m_n \times m_n}.

9. The method of claim 1, characterized in that the face relationship network R_{face} and the name relationship network R_{name} are represented by undirected graphs: G_{face} = <V_f, E_f, W_f>, G_{name} = <V_n, E_n, W_n>; because the number of face sequence clusters in the video and the number of names in the script are kept equal when the face relationship network and the name relationship network are built, the undirected face graph and the undirected name graph have the same number of vertices, denoted by m; in the undirected face graph G_{face}, the vertices V_f = {f_1, f_2, ..., f_m} represent the m faces, the edges E_f represent the relationships between pairs of faces, the edge weight r_{ij}^{face} records how close the relationship between the two faces is, and the vertex weight records the frequency with which the corresponding face appears in the whole movie; in the undirected name graph G_{name}, the vertices V_n = {n_1, n_2, ..., n_m} represent the m names, and likewise the edges E_n and weights W_n represent the relationships between names.