CN107122745B - Method and device for identifying person track in video - Google Patents


Publication number
CN107122745B
Authority
CN
China
Prior art keywords
face
tensor
data
video
color
Prior art date
Legal status
Active
Application number
CN201710293791.5A
Other languages
Chinese (zh)
Other versions
CN107122745A (en)
Inventor
徐佳宏
李益永
兰志才
曾勇
韩涛
Current Assignee
Shenzhen Ipanel TV Inc
Original Assignee
Shenzhen Ipanel TV Inc
Priority date
Filing date
Publication date
Application filed by Shenzhen Ipanel TV Inc
Priority claimed from CN201710293791.5A
Publication of CN107122745A
Application granted
Publication of CN107122745B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168: Feature extraction; Face representation
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/40: Scenes; Scene-specific elements in video content
    • G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V20/48: Matching video sequences

Abstract

The invention discloses a method and a device for identifying a person's track in a video. The method comprises: converting all video segments containing face images in a video to be identified into a plurality of color pictures containing the face images; acquiring the face position data in each color picture and establishing a fourth-order face tensor model; performing tensor calculation to obtain face feature tensor data, and calculating and summing norms between the plurality of face feature tensor data and reference face feature tensor data; classifying the faces in the color pictures so that the pictures corresponding to the same face are grouped together; and restoring the pictures of each face to the video to be recognized to obtain that person's track in the video. The method achieves a wide application range while reducing cost and computation.

Description

Method and device for identifying person track in video
Technical Field
The invention relates to the technical field of face recognition, in particular to a tensor-based face recognition method and device.
Background
With the development of science and technology, traditional means of personal authentication such as passwords, certificates and IC cards can become separated from their holders, leading to counterfeiting, theft or cracking, and can no longer meet the needs of modern social and economic activity or of public security. Biometric identification technology has therefore emerged. Among biometric techniques, face recognition has long been a research hotspot, because faces are convenient to acquire, readily accepted by users, and hard to counterfeit.
The existing face recognition technology can be summarized as follows: after the face is detected and its key features are located, the main face area is cropped out and preprocessed, the face features are extracted by a recognition algorithm, and the features are compared with known faces in a database to complete the final classification. With the growth of video applications, however, extracting information from video has become increasingly valuable, and detecting the track of a given person in a video is a meaningful task: different faces appear in the video, so identical faces must be grouped into one class and different faces into different classes in order to track a person through the video. If existing face recognition technology were used to study a person's track in a video, a database of known faces and a set of training samples would have to be built first. Storing known faces greatly narrows the applicable range of the technology; a large number of training samples can raise the recognition rate but increases the cost; and comparison against the stored known faces greatly increases the amount of computation. Clearly, it is not reasonable to rely on existing face recognition techniques for this task.
Disclosure of Invention
In view of the above problems, the present invention provides a method and an apparatus for identifying a person's track in a video, which achieve a wide application range while reducing cost and computation.
In order to achieve the above object, according to a first aspect of the present invention, there is provided a method for identifying a trajectory of a person in a video, the method comprising:
converting all video segments containing face images in a video to be identified into a plurality of color pictures containing the face images;
acquiring face position data in the color picture containing the face image, and establishing a face fourth-order tensor model of the color picture containing the face image according to the face position data;
tensor calculation is carried out in the face fourth-order tensor model, and a plurality of face feature tensor data corresponding to the color pictures containing the face images are obtained;
performing norm calculation summation on the plurality of face feature tensor data and reference face feature tensor data, wherein the reference face feature tensor data is one face feature tensor datum selected from the plurality of face feature tensor data;
setting a sampling threshold σ according to the number of color-picture samples converted from the video to be identified, and classifying the faces in the color pictures according to the sampling threshold σ and the quantity below to obtain each color picture corresponding to the same face:
[formula image not reproduced in the source]
where S is the norm-summation result, m is the face-position width data corresponding to the reference face feature tensor data, n is the face-position height data corresponding to the reference face feature tensor data, and r = 3 represents the three color channels red, green and blue;
and restoring each color picture corresponding to the same face to the video to be recognized, recording the position information of the face in the video to be recognized, and obtaining the figure track of the face in the video to be recognized.
Preferably, the obtaining of the face position data in the color picture including the face image, and establishing the face fourth-order tensor model of the color picture including the face image according to the face position data includes:
converting the color picture containing the face image into a corresponding black-and-white picture, and extracting position data of the face in the black-and-white picture, wherein the position data comprises an initial coordinate, width data and height data;
acquiring face position data of a color picture corresponding to the black-and-white picture according to the position data of the face in the black-and-white picture;
establishing a face third-order tensor A[1:m, 1:n, 1:r] according to the width data m and the height data n in the face position data of the color picture, where r = 3 represents the three color channels red, green and blue;
setting the translation tensor data m/3 according to the width data m, and obtaining the translation tensors B[i : m·2/3+i, i : n−m·2/3+i·2, 1:r], where i is the relation coefficient between a translated tensor and the original tensor and takes values 1 ≤ i ≤ m/3;
and translating the face third-order tensor according to the translation tensors to obtain the fourth-order tensor model C[1 : m/3, 1 : m·2/3, 1 : n−m·2/3, 1 : r].
Preferably, the tensor calculation in the face fourth-order tensor model to obtain a plurality of face feature tensor data corresponding to the color pictures containing the face images includes:
establishing the series C_k of the face fourth-order tensor model, whose calculation function is
C_k = 0.75·C_{k−1} + 0.25·B_k, where 1 ≤ k ≤ m/3;
calculating the value of B_k from the translation tensor B;
and calculating the face feature tensor data C, where C = C_k with k = m/3.
According to a second aspect of the present invention, there is provided an apparatus for identifying a trajectory of a person in a video, the apparatus comprising:
the conversion module is used for converting the video segments of the video to be recognized that contain face images into a plurality of color pictures containing the face images;
the establishing module is used for acquiring face position data in the color picture containing the face image and establishing a face fourth-order tensor model of the color picture containing the face image according to the face position data;
the first calculation module is used for carrying out tensor calculation in the face fourth-order tensor model to obtain a plurality of face feature tensor data corresponding to each color picture containing a face image;
the second calculation module is used for performing norm calculation summation on the plurality of face feature tensor data and reference face feature tensor data, wherein the reference face feature tensor data is one face feature tensor datum selected from the plurality of face feature tensor data;
a classification module, for setting a sampling threshold σ according to the number of color-picture samples converted from the video to be identified, and classifying the faces in the color pictures according to σ and the quantity below to obtain each color picture corresponding to the same face:
[formula image not reproduced in the source]
where S is the norm-summation result, m is the face-position width data corresponding to the reference face feature tensor data, n is the face-position height data corresponding to the reference face feature tensor data, and r = 3 represents the three color channels red, green and blue;
and the track recognition module is used for restoring each color picture corresponding to the same face into the video to be recognized, recording the position information of the face in the video to be recognized and obtaining the figure track of the face in the video to be recognized.
Preferably, the establishing module includes:
the first extraction unit is used for converting the color picture containing the face image into a corresponding black-and-white picture and extracting position data of the face in the black-and-white picture, wherein the position data comprises initial coordinates, width data and height data;
the second extraction unit is used for acquiring the face position data of the color picture corresponding to the black-and-white picture according to the face position data in the black-and-white picture;
a first establishing unit, configured to establish a face third-order tensor A[1:m, 1:n, 1:r] according to the width data m and the height data n in the face position data of the color picture, where r = 3 represents the three color channels red, green and blue;
an obtaining unit, configured to set the translation tensor data m/3 according to the width data m and obtain the translation tensors B[i : m·2/3+i, i : n−m·2/3+i·2, 1:r], where i is the relation coefficient between a translated tensor and the original tensor and takes values 1 ≤ i ≤ m/3;
and a second establishing unit, configured to translate the face third-order tensor according to the translation tensors to obtain the fourth-order tensor model C[1 : m/3, 1 : m·2/3, 1 : n−m·2/3, 1 : r].
Preferably, the first calculation module includes:
a function establishing unit, for establishing the series C_k of the face fourth-order tensor model, where
C_k = 0.75·C_{k−1} + 0.25·B_k, with 1 ≤ k ≤ m/3;
a translation tensor calculation unit, configured to calculate the value of B_k from the translation tensor B;
a data calculation unit, configured to calculate the face feature tensor data C, where C = C_k with k = m/3.
Compared with the prior art, the invention converts all video segments containing face images in the video to be recognized into color pictures containing the face images and builds a fourth-order face tensor model from the face position data in those pictures; because the model is built from the face position data of the color pictures, it contains more face information, which makes the recognition process more accurate. Tensor calculation is then performed in the fourth-order face tensor model to obtain the face feature tensor data corresponding to each color picture containing a face image; norms between the face feature tensor data and the reference face feature tensor data are calculated and summed, a sampling threshold is set, and the faces in the color pictures are classified to obtain each color picture corresponding to the same face. Finally, each color picture corresponding to the same face is restored to the video to be recognized, and the position information of the face in the video is recorded to obtain the person's track in the video. Because the recognition is based on a tensor model, different photos of the same person can be recognized without training parameters, which effectively saves cost; and because no known face of the person is needed to identify that person's track in the video, the method has a wide application range. In summary, the invention achieves a wide application range while reducing cost and computation.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
Fig. 1 is a schematic flowchart illustrating a method for identifying a track of a person in a video according to an embodiment of the present invention;
fig. 2 is a schematic flowchart illustrating a process of establishing a fourth-order tensor model in step S12 shown in fig. 1 according to an embodiment of the present invention;
fig. 3 is a schematic flowchart of a corresponding process of calculating the tensor data of the face features in step S13 shown in fig. 1 according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an apparatus for recognizing a human track in a video according to a third embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The terms "first" and "second," and the like in the description and claims of the present invention and the above-described drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "comprising" and "having," as well as any variations thereof, are intended to cover non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to the listed steps or elements but may include steps or elements not listed.
Example one
Fig. 1 is a schematic flowchart of a method for identifying a track of a person in a video according to an embodiment of the present invention, where the method includes the following steps:
S11, converting all video clips containing face images in the video to be recognized into a plurality of color pictures containing the face images;
Specifically, in order to identify the track of a person in a video, color pictures containing that person's face are acquired first.
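As a sketch of the bookkeeping behind step S11, the fragment below selects which frame indices of the face-bearing segments to convert into still pictures. The segment list and the sampling stride are hypothetical; the patent does not fix a sampling rate or a decoding API.

```python
def sample_frames(segments, stride):
    """Return the frame indices to convert into color pictures.

    segments: list of (start, end) frame ranges (inclusive) detected to
    contain a face; stride: keep every stride-th frame. Both inputs are
    hypothetical illustrations, not values fixed by the patent.
    """
    frames = []
    for start, end in segments:
        frames.extend(range(start, end + 1, stride))
    return frames

# two hypothetical face-bearing segments, sampling every 5th frame
frames = sample_frames([(0, 9), (100, 104)], stride=5)
```

Each retained index would then be decoded into one color picture for the later steps.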
S12, obtaining face position data in the color picture containing the face image, and establishing a face fourth-order tensor model of the color picture containing the face image according to the face position data;
it can be understood that, because the face fourth-order tensor model is established according to the face position data in the face color picture, the included face information is more comprehensive, the influence caused by face offset is weakened, and the recognition rate can be effectively improved.
S13, carrying out tensor calculation in the face fourth-order tensor model to obtain a plurality of face feature tensor data corresponding to the color pictures containing the face images;
S14, performing norm calculation summation on the plurality of face feature tensor data and reference face feature tensor data, where the reference face feature tensor data is one face feature tensor datum selected from the plurality of face feature tensor data;
S15, setting a sampling threshold σ according to the number of color-picture samples converted from the video to be identified, and classifying the faces in the color pictures according to σ and the quantity below to obtain each color picture corresponding to the same face:
[formula image not reproduced in the source]
where S is the norm-summation result, m is the face-position width data corresponding to the reference face feature tensor data, n is the face-position height data corresponding to the reference face feature tensor data, and r = 3 represents the three color channels red, green and blue;
S16, restoring each color picture corresponding to the same face to the video to be recognized, recording the position information of the face in the video to be recognized, and obtaining the person's track in the video to be recognized.
Specifically, recording the position information of the face appearing in the video to be recognized is as follows: and recording information such as the frame number, displacement, position and the like of the same person in the video to be identified.
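The grouping and threshold test of steps S14 and S15 can be sketched as follows. The patent's exact threshold formula is an image that is not reproduced in the source, so the normalization of the norm sum S by m·n·r, the `group_by_reference` helper, and the toy tensors below are all assumptions for illustration.

```python
import numpy as np

def group_by_reference(features, ref, sigma):
    """Split face feature tensors by norm distance to a reference tensor.

    ASSUMPTION: the patent's threshold formula is not reproduced in the
    source; here the Frobenius-norm distance S is normalized by the tensor
    size m*n*r before comparison with the sampling threshold sigma.
    """
    m, n, r = ref.shape
    same, other = [], []
    for idx, feat in enumerate(features):
        s = np.linalg.norm(feat - ref)   # norm "calculation summation" S
        if s / (m * n * r) <= sigma:     # assumed normalized comparison
            same.append(idx)             # classified as the same face
        else:
            other.append(idx)
    return same, other

# hypothetical toy data: three 4x4x3 "feature tensors"
ref = np.zeros((4, 4, 3))
features = [ref, ref + 0.001, ref + 10.0]
same, other = group_by_reference(features, ref, sigma=0.05)
```

Pictures whose indices land in `same` would then be mapped back to their frame numbers to record the person's track, as in S16.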
According to the technical scheme disclosed in the first embodiment of the invention, all video segments containing face images in the video to be recognized are converted into a plurality of color pictures containing the face images, and a fourth-order face tensor model is built from the face position data in the color pictures; because the model is built from the face position data of the color pictures, it contains more face information, which makes the recognition process more accurate. Tensor calculation is performed in the fourth-order face tensor model to obtain the face feature tensor data corresponding to each color picture containing a face image; norms between the face feature tensor data and the reference face feature tensor data are calculated and summed, a sampling threshold is set, and the faces in the color pictures are classified to obtain each color picture corresponding to the same face. Each color picture corresponding to the same face is restored to the video to be recognized, and the position information of the face in the video is recorded to obtain the person's track in the video. Because the recognition is based on a tensor model, different photos of the same person can be recognized without training parameters, which effectively saves cost; and because no known face of the person is needed, the method has a wide application range. In summary, the invention achieves a wide application range while reducing cost and computation.
Example two
On the basis of the first embodiment of the invention and the specific process of steps S11 to S16 described in fig. 1, and referring to fig. 2, a flowchart of establishing the fourth-order tensor model in step S12 of fig. 1, the acquiring of the face position data in the color picture containing the face image and the establishing of the face fourth-order tensor model of that picture according to the face position data include:
s121, converting the color picture containing the face image into a corresponding black-and-white picture, and extracting position data of the face in the black-and-white picture, wherein the position data comprises an initial coordinate, width data and height data;
specifically, the main purpose of converting a color picture into a black-and-white picture to obtain position data is to reduce the amount of computation, convert the color picture into a black-and-white picture and detect possible face positions in the black-and-white picture, further determine whether the face is the face by detecting the representative characteristics of the five sense organs such as skin, eyes, mouth, nose and the like, and record corresponding face position data.
S122, acquiring face position data of a color picture corresponding to the black-and-white picture according to the face position data in the black-and-white picture;
the face position data of the color picture comprises initial coordinates, width data and height data;
S123, establishing a face third-order tensor A[1:m, 1:n, 1:r] according to the width data m and the height data n, where r = 3 represents the three color channels red, green and blue;
S124, setting the translation tensor data m/3 according to the width data m, and obtaining the translation tensors B[i : m·2/3+i, i : n−m·2/3+i·2, 1:r], where i is the relation coefficient between a translated tensor and the original tensor and takes values 1 ≤ i ≤ m/3;
specifically, since most of the faces in the video are not front faces, or there is an offset or an occlusion, further face translation processing is required to construct a fourth-order tensor. I denotes the relationship of the flattened tensor to the original tensor, for example, I2 identifies the flattened second tensor, which takes 2 to m 2/3+2 lines of the first index of the original tensor and 2 to n-m 2/3+4 lines of the second index.
S125, translating the face third-order tensor according to the translation tensors to obtain the face fourth-order tensor model C[1 : m/3, 1 : m·2/3, 1 : n−m·2/3, 1 : r].
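Steps S123 to S125 can be sketched as follows: build the third-order tensor A and stack its translated windows into the fourth-order tensor C. The index arithmetic of the patent's formula is ambiguous in the translation, so the fixed-size, i-shifted windows used here are an interpretation, not the literal claimed slicing.

```python
import numpy as np

def build_fourth_order(a):
    """Build the fourth-order face tensor C from the third-order tensor A.

    A has shape (m, n, r) with r = 3 color channels. Following the spirit
    of S124-S125, m/3 translated windows of size (2m/3, n - 2m/3) are cut
    out of A and stacked. ASSUMPTION: the exact slicing bounds are an
    interpretation of the ambiguously translated patent formula.
    """
    m, n, r = a.shape
    k, w, h = m // 3, 2 * m // 3, n - 2 * m // 3
    slices = [a[i:i + w, 2 * i:2 * i + h, :] for i in range(k)]  # B_1..B_k
    return np.stack(slices)   # shape (m/3, 2m/3, n - 2m/3, r)

a = np.arange(9 * 24 * 3, dtype=float).reshape(9, 24, 3)  # toy A, m=9, n=24
c = build_fourth_order(a)
```

Each first-index slice of `c` plays the role of one translated tensor B_i in the later series calculation.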
Correspondingly, referring to fig. 3, a flowchart of calculating the face feature tensor data in step S13 of fig. 1, the tensor calculation performed in the face fourth-order tensor model to obtain the face feature tensor data corresponding to each color picture containing a face image includes:
S131, establishing the series C_k of the face fourth-order tensor model, where C_k = 0.75·C_{k−1} + 0.25·B_k, with 1 ≤ k ≤ m/3;
S132, calculating the value of B_k from the translation tensor B;
S133, calculating the face feature tensor data C, where C = C_k with k = m/3.
Specifically, every value in the RGB image of a face is greater than 0 and less than 255, so the tensor C corresponding to the fourth-order tensor model is a positive tensor, and its eigenvalue of largest absolute value is necessarily non-negative. If any element of a positive tensor increases, the maximum eigenvalue strictly increases, so the following algorithm for solving the maximum eigenvalue of the positive tensor C converges linearly:
||X_0|| = 1
X_k = C·X_{k−1}
t_k = max_i (X_k)_i / (X_{k−1})_i
In the above equations, k denotes the iteration number, i.e., the k-th iteration. X_0 is the initial iterate, taken as a positive vector (typically the all-ones vector) and then normalized to unit length. X_k is the k-th iterate, which converges to the eigenvector; t_k is the maximum ratio between corresponding entries of successive iterates, and converges linearly to the eigenvalue.
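The power iteration above can be sketched as follows. For brevity the sketch runs on a positive matrix (an order-2 tensor) rather than the fourth-order face tensor; the scheme, with t_k taken as the maximum entrywise ratio of successive iterates, is the same.

```python
import numpy as np

def power_iteration(c, iters=100):
    """Largest eigenvalue/eigenvector of a positive array by power iteration.

    Shown on a positive matrix for brevity; the patent applies the same
    scheme to the fourth-order tensor C. t is the maximum ratio between
    corresponding entries of successive iterates and converges to the
    dominant eigenvalue.
    """
    x = np.ones(c.shape[0])
    x /= np.linalg.norm(x)            # ||X_0|| = 1
    for _ in range(iters):
        y = c @ x                     # X_k = C * X_{k-1}
        t = np.max(y / x)             # t_k: max entrywise ratio
        x = y / np.linalg.norm(y)     # keep iterates unit length
    return t, x

c = np.array([[2.0, 1.0], [1.0, 2.0]])   # positive matrix, eigenvalues 3 and 1
t, x = power_iteration(c)
```

For this matrix the iteration settles on the dominant eigenpair, illustrating the linear convergence claimed above.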
Although solving for the eigenvalue and eigenvector of the positive tensor converges linearly, the tensor holds a large amount of data and tensor-vector multiplication is expensive, so even a modest number of iterations is costly. To speed up the computation, the embodiment of the invention preferably replaces the eigenvalue/eigenvector solution with a number-series method, namely:
C_k = 0.75·C_{k−1} + 0.25·B_k
C_1 = C(1)
B_{i+1} = C(i+1)
C_1, the first term of the series, is the first translated tensor B_1; C_i, the i-th term, is obtained by iterating the expression; B_{i+1} is the (i+1)-th translated tensor. The face tensor C is thereby updated into the series value C_k, new data that embody the face features.
According to the technical scheme disclosed in the second embodiment of the invention, the color picture is first converted into a black-and-white picture, and the face position data of the color picture is then obtained from the face position data computed in the black-and-white picture, which reduces the amount of computation in the recognition process. Because most faces in a video are not frontal, translation processing of the face is needed to establish the fourth-order face tensor; the eigenvalue and eigenvector of the positive tensor reflect the essential characteristics of the face, and solving for the face feature tensor data with the series method effectively reduces the amount of computation while improving recognition efficiency.
EXAMPLE III
Corresponding to the method for identifying the trajectory of the person in the video disclosed in the first embodiment and the second embodiment of the present invention, a third embodiment of the present invention further provides a device for identifying the trajectory of the person in the video, referring to fig. 4, which is a schematic structural diagram of the device for identifying the trajectory of the person in the video provided in the third embodiment of the present invention, and the device includes:
the conversion module 1 is used for converting a video to be identified into a plurality of color pictures containing face images according to whether the video segments contain the face images or not;
the establishing module 2 is used for acquiring face position data in the color picture containing the face image and establishing a face fourth-order tensor model of the color picture containing the face image according to the face position data;
the first calculating module 3 is configured to perform tensor calculation in the face fourth-order tensor model to obtain a plurality of face feature tensor data corresponding to each color picture including a face image;
the second calculation module 4 is configured to perform norm calculation summation on the plurality of face feature tensor data and reference face feature tensor data, where the reference face feature tensor data is one face feature tensor datum selected from the plurality of face feature tensor data;
a classification module 5, configured to set a sampling threshold σ according to the number of color-picture samples converted from the video to be recognized, and to classify the faces in the color pictures according to σ and the quantity below to obtain each color picture corresponding to the same face:
[formula image not reproduced in the source]
where S is the norm-summation result, m is the face-position width data corresponding to the reference face feature tensor data, n is the face-position height data corresponding to the reference face feature tensor data, and r = 3 represents the three color channels red, green and blue;
and the track recognition module 6 is configured to restore each color picture corresponding to the same face to the video to be recognized, record position information of the face appearing in the video to be recognized, and obtain a figure track of the face in the video to be recognized.
Correspondingly, the establishing module 2 comprises:
a first extraction unit 21, configured to convert the color picture containing the face image into a corresponding black-and-white picture, and extract position data of the face in the black-and-white picture, where the position data includes an initial coordinate, width data, and height data;
the second extraction unit 22 is configured to obtain face position data of a color picture corresponding to the black-and-white picture according to the position data of the face in the black-and-white picture;
a first establishing unit 23, configured to establish, according to the width data m and the height data n in the face position data of the color picture, a face third-order tensor a [1: m, 1: n, 1: r ], wherein r ═ 3, represents three colors of red, yellow and blue;
an obtaining unit 24, configured to set translation tensor data m/3 according to the width data m, and obtain a translation tensor B[i : m*2/3+i, i : n-m*2/3+i*2, 1:r], wherein i represents a relation coefficient between the translated tensor and the original tensor, and 1 ≤ i ≤ m/3;
the second establishing unit 25 is configured to perform translation processing on the face third-order tensor according to the translation tensor, to obtain a fourth-order tensor model C[1:m/3, 1:m*2/3, 1:n-m*2/3, 1:r].
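The translated-window construction performed by units 23 to 25 can be illustrated with NumPy. The slice bounds below are one reading of the partly garbled ranges in the text, and the axis order of the resulting array is chosen for convenience, so treat the exact indexing as an assumption:

```python
import numpy as np

def build_fourth_order_tensor(img):
    """Sketch of stacking m/3 translated windows of a face crop into a
    fourth-order tensor, following the patent's C[1:m/3, 1:m*2/3,
    1:n-m*2/3, 1:r] shape (here ordered slabs, height, width, channels)."""
    n, m, r = img.shape   # height n, width m, channels r = 3
    w = (2 * m) // 3      # window width m*2/3
    shifts = m // 3       # i ranges over the m/3 translations
    slabs = []
    for i in range(shifts):
        # diagonally translated window of the face tensor
        slabs.append(img[i : n - w + i, i : w + i, :])
    return np.stack(slabs, axis=0)  # shape (m/3, n - m*2/3, m*2/3, r)
```

For a 12x9 RGB crop this yields three 6x6x3 translated slices stacked into a (3, 6, 6, 3) tensor.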
Correspondingly, the first computing module 3 includes:
a function establishing unit 31, configured to establish a calculation function for the series C_k of the face fourth-order tensor model, wherein C_k = 0.75*C_{k-1} + 0.25*B_k, and 1 ≤ k ≤ m/3;
a translation tensor calculation unit 32, configured to calculate the value of B_k according to the translation tensor B;
a data calculating unit 33, configured to calculate the face feature tensor data C, wherein C = C_k and k = m/3.
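The recursion computed by units 31 to 33 is an exponentially weighted blend of the translated slices. A minimal sketch, assuming the fourth-order tensor stores the m/3 translated slices B_k along its first axis and that the recursion starts from C_1 = B_1 (the initialization is not stated in the text):

```python
import numpy as np

def face_feature_tensor(C4):
    """Fold the slabs of the fourth-order tensor C4 into the face feature
    tensor via C_k = 0.75*C_{k-1} + 0.25*B_k, k = 1..m/3."""
    C = np.asarray(C4[0], dtype=float)  # assumed initialization C_1 = B_1
    for Bk in C4[1:]:
        C = 0.75 * C + 0.25 * Bk        # C_k = 0.75*C_{k-1} + 0.25*B_k
    return C                            # the final C = C_{m/3}
```

The 0.75/0.25 weighting means earlier slices decay geometrically, so the result emphasizes the later translated windows.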
In the third embodiment of the invention, the conversion module converts all video segments containing face images in the video to be recognized into a plurality of color pictures containing the face images, and the establishing module then builds a face fourth-order tensor model from the face position data in each color picture. The first calculation module performs tensor calculation to obtain the face feature tensor data corresponding to each color picture containing a face image. The second calculation module performs norm calculation and summation on the plurality of face feature tensor data and the reference face feature tensor data; the classification module then sets a sampling threshold and classifies the faces corresponding to the color pictures, obtaining the color pictures that correspond to the same face. Finally, the track recognition module restores each color picture corresponding to the same face to the video to be recognized and records the position information of the face in the video, yielding the person track of that face in the video to be recognized. Because recognition is based on a tensor model, different photos of the same person can be recognized without training parameters, which effectively saves cost; at the same time, the track of a person in the video can be identified without the person's face being known in advance, so the method has a wide application range. In summary, the invention achieves a wide application range while reducing cost and computation.
The embodiments in the present description are described in a progressive manner; each embodiment focuses on its differences from the others, and the same or similar parts among the embodiments may be referred to one another. Since the device disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is brief; for relevant details, reference may be made to the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (6)

1. A method for identifying a trajectory of a person in a video, the method comprising:
converting all video segments containing face images in a video to be identified into a plurality of color pictures containing the face images;
acquiring face position data in the color picture containing the face image, and establishing a face fourth-order tensor model of the color picture containing the face image according to the face position data;
tensor calculation is carried out in the face fourth-order tensor model, and a plurality of face feature tensor data corresponding to the color pictures containing the face images are obtained;
performing norm calculation and summation on the plurality of face feature tensor data and reference face feature tensor data, wherein the reference face feature tensor data is one piece of face feature tensor data selected from the plurality of face feature tensor data;
setting a sampling threshold σ according to the number of samples of the color pictures converted from the video to be recognized, and classifying the faces corresponding to the color pictures according to the threshold criterion [formula image relating S, σ, m, n and r; not reproduced in the text], obtaining the color pictures corresponding to the same face, wherein S is the norm calculation summation result, m is the face position width data corresponding to the reference face feature tensor data, n is the face position height data corresponding to the reference face feature tensor data, and r = 3 represents the three colors red, yellow and blue;
and restoring each color picture corresponding to the same face to the video to be recognized, recording the position information of the face in the video to be recognized, and obtaining the figure track of the face in the video to be recognized.
2. The method according to claim 1, wherein the obtaining face position data in the color picture including the face image, and establishing a face fourth-order tensor model of the color picture including the face image according to the face position data comprises:
converting the color picture containing the face image into a corresponding black-and-white picture, and extracting position data of the face in the black-and-white picture, wherein the position data comprises an initial coordinate, width data and height data;
acquiring face position data of a color picture corresponding to the black-and-white picture according to the position data of the face in the black-and-white picture;
according to the width data m and the height data n in the face position data of the color picture, establishing a face third-order tensor A[1:m, 1:n, 1:r], wherein r = 3 represents the three colors red, yellow and blue;
and setting translation tensor data m/3 according to the width data m, and obtaining a translation tensor B[i : m*2/3+i, i : n-m*2/3+i*2, 1:r], wherein i represents a relation coefficient between the translated tensor and the original tensor, and 1 ≤ i ≤ m/3;
and carrying out translation processing on the face third-order tensor according to the translation tensor, to obtain a fourth-order tensor model C[1:m/3, 1:m*2/3, 1:n-m*2/3, 1:r].
3. The method according to claim 2, wherein the performing tensor calculation in the face fourth-order tensor model to obtain a plurality of face feature tensor data corresponding to the color pictures containing the face images comprises:
establishing a calculation function for the series C_k of the face fourth-order tensor model, wherein C_k = 0.75*C_{k-1} + 0.25*B_k, and 1 ≤ k ≤ m/3;
calculating the value of B_k according to the translation tensor B;
and calculating the face feature tensor data C, wherein C = C_k and k = m/3.
4. An apparatus for identifying a trajectory of a person in a video, the apparatus comprising:
the conversion module is used for converting the video to be recognized into a plurality of color pictures containing the face images according to whether the video segments contain the face images or not;
the establishing module is used for acquiring face position data in the color picture containing the face image and establishing a face fourth-order tensor model of the color picture containing the face image according to the face position data;
the first calculation module is used for carrying out tensor calculation in the face fourth-order tensor model to obtain a plurality of face feature tensor data corresponding to each color picture containing a face image;
the second calculation module is used for performing norm calculation and summation on the plurality of face feature tensor data and reference face feature tensor data, wherein the reference face feature tensor data is one piece of face feature tensor data selected from the plurality of face feature tensor data;
a classification module, for setting a sampling threshold σ according to the number of samples of the color pictures converted from the video to be recognized, and classifying the faces corresponding to the color pictures according to the threshold criterion [formula image relating S, σ, m, n and r; not reproduced in the text], obtaining the color pictures corresponding to the same face, wherein S is the norm calculation summation result, m is the face position width data corresponding to the reference face feature tensor data, n is the face position height data corresponding to the reference face feature tensor data, and r = 3 represents the three colors red, yellow and blue;
and the track recognition module is used for restoring each color picture corresponding to the same face into the video to be recognized, recording the position information of the face in the video to be recognized and obtaining the figure track of the face in the video to be recognized.
5. The apparatus of claim 4, wherein the establishing module comprises:
the first extraction unit is used for converting the color picture containing the face image into a corresponding black-and-white picture and extracting position data of the face in the black-and-white picture, wherein the position data comprises initial coordinates, width data and height data;
the second extraction unit is used for acquiring the face position data of the color picture corresponding to the black-and-white picture according to the face position data in the black-and-white picture;
a first establishing unit, configured to establish, according to the width data m and the height data n in the face position data of the color picture, a face third-order tensor A[1:m, 1:n, 1:r], wherein r = 3 represents the three colors red, yellow and blue;
an obtaining unit, configured to set translation tensor data m/3 according to the width data m, and obtain a translation tensor B[i : m*2/3+i, i : n-m*2/3+i*2, 1:r], wherein i represents a relation coefficient between the translated tensor and the original tensor, and 1 ≤ i ≤ m/3;
and the second establishing unit, configured to perform translation processing on the face third-order tensor according to the translation tensor, to obtain a fourth-order tensor model C[1:m/3, 1:m*2/3, 1:n-m*2/3, 1:r].
6. The apparatus of claim 5, wherein the first computing module comprises:
a function establishing unit, for establishing a calculation function for the series C_k of the face fourth-order tensor model, wherein C_k = 0.75*C_{k-1} + 0.25*B_k, and 1 ≤ k ≤ m/3;
a translation tensor calculation unit, configured to calculate the value of B_k according to the translation tensor B;
a data calculation unit, for calculating the face feature tensor data C, wherein C = C_k and k = m/3.
CN201710293791.5A 2017-04-28 2017-04-28 Method and device for identifying person track in video Active CN107122745B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710293791.5A CN107122745B (en) 2017-04-28 2017-04-28 Method and device for identifying person track in video


Publications (2)

Publication Number Publication Date
CN107122745A CN107122745A (en) 2017-09-01
CN107122745B true CN107122745B (en) 2020-10-20

Family

ID=59725486

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710293791.5A Active CN107122745B (en) 2017-04-28 2017-04-28 Method and device for identifying person track in video

Country Status (1)

Country Link
CN (1) CN107122745B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108615007B (en) * 2018-04-23 2019-07-19 深圳大学 Three-dimensional face identification method, device and storage medium based on characteristic tensor

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101436250A (en) * 2008-11-19 2009-05-20 西安电子科技大学 Multi-view angle human face recognizing method based on non-linear tensor resolution and view angle manifold
CN104933428A (en) * 2015-07-23 2015-09-23 苏州大学 Human face recognition method and device based on tensor description
CN105956603A (en) * 2016-04-15 2016-09-21 天津大学 Video sequence classifying method based on tensor time domain association model

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7936906B2 (en) * 2007-06-15 2011-05-03 Microsoft Corporation Face recognition using discriminatively trained orthogonal tensor projections


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Efficient face recognition using tensor subspace regression; Ziyu Guan et al.; NeuroComputing; 2010-05-04; pp. 2744-2753 *
Face recognition and micro-expression recognition based on discriminant tensor subspace analysis plus extreme learning machine; Sujing Wang et al.; Neural Process Lett; 2013-02-27; pp. 1-19 *
Face recognition via tensor local discriminant projection (张量局部判别投影的人脸识别); Li Yongzhou et al.; Acta Electronica Sinica (电子学报); 2008-10-31; No. 10; pp. 2070-2075 *
Tensor completion algorithms and their application in face recognition (张量补全算法及其在人脸识别中的应用); Shi Jiarong et al.; Pattern Recognition and Artificial Intelligence (模式识别与人工智能); 2011-04-30; Vol. 24, No. 2; pp. 255-261 *


Similar Documents

Publication Publication Date Title
CN107506700B (en) Pedestrian re-identification method based on generalized similarity measurement learning
CN112381775B (en) Image tampering detection method, terminal device and storage medium
CN109740572B (en) Human face living body detection method based on local color texture features
JP2000003452A (en) Method for detecting face surface in digital picture, its detecting device, picture judging method, picture judging device and computer readable record medium
CN103824059A (en) Facial expression recognition method based on video image sequence
WO2023005161A1 (en) Face image similarity calculation method, apparatus and device, and storage medium
CN103116763A (en) Vivo-face detection method based on HSV (hue, saturation, value) color space statistical characteristics
CN111126240B (en) Three-channel feature fusion face recognition method
Hassanat et al. Colour-based lips segmentation method using artificial neural networks
WO2011152843A1 (en) Clothing-based image clustering based on cluster ensemble sets
Atharifard et al. Robust component-based face detection using color feature
CN107766864B (en) Method and device for extracting features and method and device for object recognition
Huong et al. Static hand gesture recognition for vietnamese sign language (VSL) using principle components analysis
Fernando et al. Novel approach to use HU moments with image processing techniques for real time sign language communication
CN107392105B (en) Expression recognition method based on reverse collaborative salient region features
Fernando et al. Low cost approach for real time sign language recognition
Guo et al. Person re-identification by weighted integration of sparse and collaborative representation
Linder et al. Real-time full-body human attribute classification in RGB-D using a tessellation boosting approach
WO2011074014A2 (en) A system for lip corner detection using vision based approach
Deng et al. Attention-aware dual-stream network for multimodal face anti-spoofing
CN107122745B (en) Method and device for identifying person track in video
Youlian et al. Face detection method using template feature and skin color feature in rgb color space
Azad et al. Novel and tuneable method for skin detection based on hybrid color space and color statistical features
Işikdoğan et al. Automatic recognition of Turkish fingerspelling
Nguyen et al. Hand posture recognition using Kernel Descriptor

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant