Summary of the invention
The objective of the invention is to extract character names from a screenplay and use them to identify and label the faces in the film. Because the screenplay contains no timing information, it cannot be aligned with the video in time. Using graph matching, the present invention therefore proposes a method for automatically labeling face images in a film, given the film video and its screenplay.
To achieve this objective, the present invention proposes a method for automatically labeling face images in a film. The technical solution of the invention is realized by the following steps:
Step S1: using a multi-view face detector and tracker, automatically obtain face sequences from a film video, and cluster the face sequences so that the clusters correspond to different characters;
Step S2: measure the relationships between faces according to the frequency with which the face sequences of different characters appear together in the same scene, and build a face relationship network;
Step S3: using a computer, download from a screenplay database and store the plain-text screenplay corresponding to said film video, and have the computer count the frequency with which the names of different characters appear together in the same scene of the plain-text screenplay;
Step S4: measure the relationships between names according to said frequency, and build a name relationship network;
Step S5: the computer represents the face relationship network and the name relationship network as a face undirected graph and a name undirected graph and matches the two graphs, thereby matching the vertices of the face relationship network with those of the name relationship network, that is, labeling the faces by associating them with names.
The advantages of the invention are as follows. Compared with traditional local matching methods, the method of the invention removes the requirement for timing information. Instead, it computes statistics of faces and of names globally, in the video modality and the text modality respectively, and builds a face relationship network and a name relationship network. It then establishes correspondences between the vertices of the two networks by graph matching, thereby labeling the faces automatically. Given a film video and its corresponding screenplay, the invention labels faces automatically, annotating each face that appears in the video with the name of its character, and can support applications such as film video retrieval and automatic video summary generation.
Embodiment
Each detail of the technical solution of the present invention is described below with reference to the accompanying drawings.
Fig. 1 is a schematic diagram of the overall framework of the present invention for automatically labeling face images in a film. The basic hardware required to implement this framework is a computer with a 2.4 GHz clock frequency and 1 GB of memory; the required software is a programming environment (Visual C++ 6.0). The computer implements the functions of the multi-view face detector and tracker 2, the face network building unit 3, the plain-text screenplay storage unit 5, the name network building unit 6, and the matching unit 7.
As shown in Fig. 1, the automatic labeling framework of the present invention comprises: a film video library 1, a multi-view face detector and tracker 2, a face network building unit 3, a screenplay database 4, a plain-text screenplay storage unit 5, a name network building unit 6, and a matching unit 7. The multi-view face detector and tracker 2 is connected to the film video library 1; it receives a film video from the film video library 1, automatically obtains face sequences from the film video, clusters the face sequences so that the clusters correspond to different characters, and measures the relationships between faces according to the frequency with which the face sequences of different characters appear together in the same scene. The face network building unit 3 is connected to the multi-view face detector and tracker 2 and builds the face relationship network from the relationships between faces. The plain-text screenplay storage unit 5 is connected to the screenplay database 4; it downloads from the screenplay database 4 and stores the plain-text screenplay corresponding to said film video, and the relationships between names are measured according to the frequency with which the names of different characters appear together in the same scene of the plain-text screenplay. The name network building unit 6 is connected to the plain-text screenplay storage unit 5 and builds the name relationship network from the relationships between names. The matching unit 7 is connected to the face network building unit 3 and to the name network building unit 6, respectively. The matching unit 7 represents the face relationship network generated by the face network building unit 3 and the name relationship network generated by the name network building unit 6 as a face undirected graph and a name undirected graph and matches the two graphs, thereby matching the vertices of the face relationship network with those of the name relationship network, that is, labeling the faces by associating them with names.
Fig. 2 is a flow chart of the method of the present invention for automatically labeling face images in a film. The flow comprises five steps: step S1, face detection and clustering; step S2, building the face relationship network; step S3, name statistics; step S4, building the name relationship network; and step S5, representing the face relationship network and the name relationship network as graph models and performing graph matching.
1. Building the face relationship network
Step S1: In the film video, we use a multi-view face detector and tracker (Y. Li, H. Z. Ai, C. Huang, and S. H. Lao. Robust head tracking with particles based on multiple cues fusion. In Proceedings of HCI/ECCV, pages 29-39, 2006.) to obtain face sequences automatically, and cluster the face sequences so that the clusters correspond to different characters. Within a face sequence, every face image is normalized to a 64 × 64 grayscale image and represented as a gray-level feature vector of 64 × 64 dimensions; the feature vector is then reduced to 4 dimensions by locally linear embedding.
When clustering the face sequences, we use the earth mover's distance [2] (Y. Rubner, C. Tomasi, and L. J. Guibas. A metric for distributions with applications to image databases. In Proceedings of IEEE International Conference on Computer Vision, pages 59-66, 1998.) as the distance measure between face sequences. The earth mover's distance is a distance measure between sets that originates from the transportation problem. In essence it is the minimum cost of transforming one weighted point set into another, which is a constrained optimization problem. It has the following two properties:
(1) It allows partial similarity and does not require the two data sets to be of equal size, which is particularly important for measuring the distance between face sequences that contain different numbers of images.
(2) It penalizes cases in which the data sets are seriously dissimilar. This matters because, in film video, factors such as illumination and pose can make the faces of different characters look similar in some cases. Such similarity, which comes from only part of the images in the face sequences of different characters, must be offset by penalizing the dissimilarity of the remaining images, so that the sequences are not merged as the same person.
Once the distance measure has been fixed, we cluster the face sequences by agglomerative hierarchical clustering.
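The clustering step can be sketched as follows. This is only an illustration under stated assumptions, not the implementation of the invention: it assumes the pairwise distances between face sequences (for example, earth mover's distances) have already been computed into a matrix `dist`, and it assumes average linkage and a distance threshold `max_dist` as the stopping rule. The description specifies neither the linkage nor the stopping rule, and the function name is hypothetical.

```python
def agglomerative_cluster(dist, max_dist):
    """Merge face sequences bottom-up until no two clusters are closer than max_dist.

    dist: symmetric matrix (list of lists) of pairwise distances between sequences.
    Returns a sorted list of clusters, each a sorted list of sequence indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(a, b):
        # Average linkage (an assumption): mean distance over all cross pairs.
        return sum(dist[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = linkage(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        d, p, q = best
        if d > max_dist:  # assumed stopping rule
            break
        merged = sorted(clusters[p] + clusters[q])
        clusters = [c for idx, c in enumerate(clusters) if idx not in (p, q)]
        clusters.append(merged)
    return sorted(clusters)
```

Each resulting cluster is then treated as one character's set of face sequences in the steps that follow.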
Step S2: The relationships between faces are measured according to the frequency with which the face sequences of different characters appear together in the same scene, and a face relationship network is built. To compute the relationship between faces, we count, for every pair of faces, how often the two appear together in the same scene of the film. First, we count how often each face appears in each scene. Because the face sequences have already been clustered, it suffices to count how the face sequences of each cluster are distributed over the scenes, which yields the face occurrence-frequency distribution matrix

$$[o^{face}_{ik}]_{m_f \times n_f},$$

where $m_f$ is the number of face sequence clusters, $n_f$ is the number of scenes in the video, and the element $o^{face}_{ik}$ is the frequency with which the $i$-th face appears in the $k$-th scene. The $i$-th row of the matrix,

$$[o^{face}_{i1}, o^{face}_{i2}, \ldots, o^{face}_{i n_f}],$$

is the distribution of the occurrence frequency of the $i$-th face over the scenes of the whole film. We then compute the frequency $c^{k}_{ij}$ with which any two faces appear together in the same scene:

$$c^{k}_{ij} = \min(o^{face}_{ik}, o^{face}_{jk}),$$

where $\min(o^{face}_{ik}, o^{face}_{jk})$ takes the smaller of the elements $o^{face}_{ik}$ and $o^{face}_{jk}$ in the $k$-th column of the face frequency distribution matrix for the $i$-th and $j$-th faces. This formula gives the frequency $c^{k}_{ij}$ with which the $i$-th and $j$-th faces appear together in the $k$-th scene. The frequency with which these two faces appear together over all scenes of the whole film is then

$$r^{face}_{ij} = \sum_{k=1}^{n_f} c^{k}_{ij},$$

where $n_f$ is the number of scenes in the video. By computing the relationship between every pair of faces, we can build the face relationship network. This face relationship network is represented by the adjacency matrix

$$R^{face} = [r^{face}_{ij}]_{m_f \times m_f}.$$

The adjacency matrix is an $m_f \times m_f$ square matrix. Its off-diagonal element $r^{face}_{ij}$ is the frequency with which the two corresponding faces appear together in the whole film, and its diagonal element $r^{face}_{ii}$ is the frequency with which the $i$-th face itself appears in the whole film.
2. Building the name relationship network
Step S3: Using a computer, the plain-text screenplay corresponding to said film video is downloaded from the screenplay database and stored, and the computer counts the frequency with which the names of different characters appear together in the same scene of the plain-text screenplay. Step S4: The relationships between names are measured according to said frequency, and the name relationship network is built. As with the face relationship network, the relationship between two names is measured by the frequency with which both appear together in the same scene of the screenplay. First, we count how often each name appears in each scene, obtaining the name occurrence-frequency distribution matrix

$$[o^{name}_{ik}]_{m_n \times n_n},$$

where $m_n$ is the number of names and $n_n$ is the number of scenes in the screenplay. Then, according to the formula

$$r^{name}_{ij} = \sum_{k=1}^{n_n} \min(o^{name}_{ik}, o^{name}_{jk}),$$

we compute the frequency $r^{name}_{ij}$ with which every pair of names appears together, generating the name relationship network. This network is likewise represented by an adjacency matrix

$$R^{name} = [r^{name}_{ij}]_{m_n \times m_n},$$

where $m_n$ is the number of names.
3. Matching faces and names
Step S5: The computer represents the face relationship network and the name relationship network as a face undirected graph and a name undirected graph and matches the two graphs, thereby matching the vertices of the face relationship network with those of the name relationship network, that is, labeling the faces by associating them with names. Once the face relationship network $R^{face}$ and the name relationship network $R^{name}$ have been built, each can be represented by an undirected graph:

$$G^{face} = \langle V_f, E_f, W_f \rangle, \quad G^{name} = \langle V_n, E_n, W_n \rangle.$$

In the face undirected graph $G^{face}$, the vertices $V_f = \{f_1, f_2, \ldots, f_{m_f}\}$ represent the $m_f$ faces, and the edges $E_f$ represent the relationships between pairs of faces. The edge weights $r^{face}_{ij}$ ($i \neq j$) record how close the relationship between the two faces is, and the vertex weights $r^{face}_{ii}$ record the frequency with which the corresponding face appears in the whole film.
In the name undirected graph $G^{name}$, the vertices $V_n = \{n_1, n_2, \ldots, n_{m_n}\}$ represent the $m_n$ names. Likewise, the edges $E_n$ and the weights $W_n$ express the relationships between names.
Because the number of face sequence clusters in the video and the number of names in the screenplay are kept equal when the face relationship network and the name relationship network are built, the face undirected graph and the name undirected graph have the same number of vertices, denoted here uniformly by $m$. Given the face undirected graph $G^{face}$ and the name undirected graph $G^{name}$, each with $m$ vertices, there are $m \times m$ candidate matches between faces and names, which we store in a list $L$. For each candidate match $a = (f_i, n_{i'})$, in order to evaluate how well the face $f_i$ matches the name $n_{i'}$, we define an index $M(a)$, called the "matching degree". It is computed with the exponential function $\exp\{\cdot\}$ (base $e$) from $r^{face}_{ii}$, the weight of the vertex in the face undirected graph $G^{face}$; from $r^{name}_{i'i'}$, the weight of the vertex in the name undirected graph $G^{name}$; and from $\sigma$, a freely adjustable sensitivity coefficient that controls the tolerance to noise. $M(a)$ can be regarded as a feature of the match: when the match is correct, its matching degree $M(a)$ is higher.
Consider any two face-name matches $(a, b)$, where $a = (f_i, n_{i'})$ with face $f_i$ and name $n_{i'}$, and $b = (f_j, n_{j'})$ with face $f_j$ and name $n_{j'}$. In the face undirected graph $G^{face}$, the relationship between the $i$-th face $f_i$ and the $j$-th face $f_j$ is $r^{face}_{ij}$. In the name undirected graph $G^{name}$, the relationship between the name $n_{i'}$ and the name $n_{j'}$ is $r^{name}_{i'j'}$. If both matches $a$ and $b$ are correct, the relationship values $r^{face}_{ij}$ and $r^{name}_{i'j'}$ should be close, and we call the two matches compatible; otherwise the two relationship values differ considerably, and we call the two matches incompatible. For these two matches we therefore likewise define an index $M(a, b)$, called the "compatibility degree", which compares $r^{face}_{ij}$ with $r^{name}_{i'j'}$.
$M(a, b)$ can be regarded as a feature of the two matches: if both are correct, their compatibility degree $M(a, b)$ is higher. By definition, $M(a, b)$ is non-negative and symmetric ($M(a, b) = M(b, a)$). For these two matches we must also take into account the one-to-one mapping constraint between names and faces. When the matches conflict with this constraint, for example $a = (f_i, n_{i'})$ and $b = (f_i, n_{j'})$, that is, the face $f_i$ is matched both with the name $n_{i'}$ and with the name $n_{j'}$, the compatibility degree $M(a, b)$ of the two matches is set to 0. The matching problem between the vertices of the two graphs is thus reduced to finding, among all possible candidate matches, a set of matches $C$ that satisfies the one-to-one mapping constraint and maximizes the sum of the matching degrees and compatibility degrees of the matches it contains. The objective function is defined as:

$$S = \sum_{a, b \in C} M(a, b) + \sum_{a \in C} M(a)$$
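The objective $S$ can be evaluated directly for a small candidate set, which makes the role of the one-to-one constraint concrete. In this sketch the Gaussian-type `kernel` is an assumed form (the text states only that the degrees are exponential functions with a sensitivity coefficient $\sigma$), the pairwise sum runs over ordered pairs with $a \neq b$, and a conflict is taken to mean either a shared face or a shared name. All function names and sample matrices are hypothetical.

```python
import math


def kernel(x, y, sigma):
    # Assumed Gaussian-type similarity; the exact formula is not given in this text.
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))


def matching_degree(a, r_face, r_name, sigma=1.0):
    i, ip = a
    return kernel(r_face[i][i], r_name[ip][ip], sigma)


def compatibility(a, b, r_face, r_name, sigma=1.0):
    """Compatibility of two distinct matches a = (i, i') and b = (j, j')."""
    (i, ip), (j, jp) = a, b
    # One-to-one constraint: a shared face or a shared name is a conflict.
    if i == j or ip == jp:
        return 0.0
    return kernel(r_face[i][j], r_name[ip][jp], sigma)


def objective(C, r_face, r_name, sigma=1.0):
    """S = sum of pairwise compatibilities plus sum of matching degrees over C."""
    unary = sum(matching_degree(a, r_face, r_name, sigma) for a in C)
    pairwise = sum(compatibility(a, b, r_face, r_name, sigma)
                   for a in C for b in C if a != b)
    return pairwise + unary


# Hypothetical two-character example in which face i corresponds to name i.
r_face = [[5, 1], [1, 3]]
r_name = [[5, 1], [1, 3]]
s_right = objective([(0, 0), (1, 1)], r_face, r_name)  # correct assignment
s_wrong = objective([(0, 1), (1, 0)], r_face, r_name)  # swapped assignment
```

With these sample matrices the correct assignment scores higher than the swapped one, because the swapped matches disagree on the vertex weights even though the single edge weight agrees.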
To this end, we represent all possible candidate matches with a new undirected graph. Each vertex of the graph corresponds to one candidate match, and its weight is the matching degree $M(a)$; each edge corresponds to the relationship between two candidate matches, and its weight is the compatibility degree $M(a, b)$. Because there are $m \times m$ possible candidate matches in total, the graph has $m^2$ vertices. The adjacency matrix of this graph is an $m^2 \times m^2$ matrix $M$ whose diagonal elements are the matching degrees $M(a)$ and whose off-diagonal elements are the compatibility degrees $M(a, b)$. Our goal is therefore to find, in the matrix $M$, a set of elements $C$ that satisfies the one-to-one mapping constraint and maximizes the sum of the elements in the set.
To solve this constrained optimization problem, we adopt a spectral method [3] (M. Leordeanu and M. Hebert. A spectral technique for correspondence problems using pairwise constraints. In Proceedings of the 10th IEEE International Conference on Computer Vision, pages 1482-1489, 2005.). This method, proposed by Leordeanu and Hebert, finds the dominant set of elements in a matrix. First, we define a normalized indicator vector $x$ of length $m^2$, whose element $x(i)$ is the confidence that the corresponding $i$-th match $a_i$ belongs to the target set $C$, and whose norm is 1. We seek the optimal solution $x^*$ such that

$$x^* = \arg\max_{x} \; x^{T} M x.$$

From the definitions of the matching degree $M(a)$ and the compatibility degree $M(a, b)$, the matrix $M$ is a non-negative symmetric matrix. Therefore, by the Rayleigh quotient theorem, $x^{T} M x$ attains its maximum when $x$ is the principal eigenvector of $M$, and by the Perron-Frobenius theorem the elements of this principal eigenvector lie strictly in the interval [0, 1], which is exactly what our definition of the indicator vector requires. This completes the solution for the optimal indicator vector $x^*$.
Because all candidate matches have been stored in the list $L$, once the optimal solution has been obtained we first find the largest element $x^*(a^*)$ of the vector. Its corresponding match $a^*$ is the most probable match, so it is retained. Then, according to the one-to-one mapping constraint, we delete from the list $L$ all matches that conflict with $a^*$ and set the corresponding elements of $x^*$ to 0. Next, we again find the largest element of $x^*$, retain its corresponding match in the list $L$, and delete the other matches that conflict with it. This is repeated until every match has been either retained or deleted. The matches that finally remain are the result. Every face sequence cluster generated in step S1 is thus matched with a name, and all face sequences in the cluster are labeled with that name.
The above is only a specific embodiment of the present invention, but the protection scope of the invention is not limited thereto. Any variation or substitution that a person familiar with the art could conceive within the technical scope disclosed by the invention shall fall within the scope of the invention. Therefore, the protection scope of the invention shall be determined by the protection scope of the claims.