CN110543584A

CN110543584A - Method, device, processing server and storage medium for establishing face index

Info

Publication number: CN110543584A
Application number: CN201810534101.5A
Authority: CN
Inventors: 杨宇; 谢金运; 伍倡辉
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2018-05-29
Filing date: 2018-05-29
Publication date: 2019-12-06
Anticipated expiration: 2038-05-29
Also published as: CN110543584B

Abstract

The embodiment of the invention provides a method, a device, a processing server and a storage medium for establishing a face index. The method comprises: acquiring at least one video; respectively determining the face features corresponding to each video, and determining the video information of each face feature in the corresponding video; clustering the face features of the same face to obtain at least one first cluster, wherein the face features aggregated by one first cluster represent the face features of the same face in the at least one video; and, for the face features aggregated by each first cluster, associating the face features with the video information of the corresponding video to obtain the face index of the at least one video. The embodiment of the invention can efficiently establish the face index without marking face information.

Description

Method, device, processing server and storage medium for establishing face index

Technical Field

The invention relates to the technical field of data processing, and in particular to a method and a device for establishing a face index, a processing server and a storage medium.

Background

A face index represents the association between the face features in a video and video information. Once a face index is established for a video, the video information associated with a target person (such as the video time points or the video progress at which the target person appears) can be provided efficiently when the video is queried for that person. Because of this property, face indexes are widely used in fields such as video on demand and security.

Establishing a face index generally requires a user to mark face information in advance. However, a video usually contains many people, so establishing the face index from user-marked face information makes the marking work extremely tedious and difficult, which results in extremely low efficiency in establishing the face index.

Disclosure of Invention

In view of this, embodiments of the present invention provide a method, an apparatus, a processing server, and a storage medium for establishing a face index, so as to efficiently establish a face index without marking face information.

In order to achieve the above purpose, the embodiments of the present invention provide the following technical solutions:

A method of creating a face index, comprising:

acquiring at least one video;

respectively determining the face features corresponding to each video, and determining the video information of each face feature in the corresponding video;

clustering the face features of the same face to obtain at least one first cluster, wherein the face features aggregated by one first cluster represent the face features of the same face in the at least one video;

and, for the face features aggregated by each first cluster, associating the face features with the video information of the corresponding video to obtain the face index of the at least one video.

The embodiment of the invention also provides a device for establishing a face index, comprising:

a video acquisition module, configured to acquire at least one video;

a face feature and video information determining module, configured to respectively determine the face features corresponding to each video, and to determine the video information of each face feature in the corresponding video;

a first cluster obtaining module, configured to cluster the face features of the same face to obtain at least one first cluster, wherein the face features aggregated by one first cluster represent the face features of the same face in the at least one video;

and a face index establishing module, configured to associate, for the face features aggregated by each first cluster, the face features with the video information of the corresponding video to obtain the face index of the at least one video.

An embodiment of the present invention further provides a processing server, including at least one memory and at least one processing chip, wherein the memory stores a program, and the processing chip calls the program to implement the steps of the above method for establishing a face index.

The embodiment of the invention also provides a storage medium, wherein the storage medium stores a program suitable for being executed by a processing chip so as to implement the steps of the above method for establishing a face index.

Based on the above technical solution, the method for establishing a face index provided in the embodiment of the present invention can cluster the face features of the same face for at least one video to obtain the first clusters corresponding to the at least one video, and then associate each face feature aggregated by each first cluster with the video information of the corresponding video, so that a face index suitable for the at least one video can be established efficiently without marking face information.

Further, because the face features of the same face in the at least one video are clustered, the face features aggregated in one first cluster can be considered to represent the same face, so that the number of faces in the face index library can be reduced, and the data redundancy of the face index can be greatly reduced.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description are only embodiments of the present invention; those skilled in the art can obtain other drawings from the provided drawings without creative effort.

Fig. 1 is a diagram illustrating a conventional method for creating a face index based on user-tagged face information;

Fig. 2 is a schematic structural diagram of a system for establishing a face index according to an embodiment of the present invention;

Fig. 3 is a signaling flowchart of a method for establishing a face index according to an embodiment of the present invention;

Fig. 4 is a diagram illustrating an example of processing for creating a face index in the case of multiple videos;

Fig. 5 is another signaling flowchart of a method for establishing a face index according to an embodiment of the present invention;

Fig. 6 is a diagram illustrating an example of processing for creating a face index in the case of one video;

Fig. 7 is a flowchart of a method for clustering face features of the same face according to an embodiment of the present invention;

Fig. 8 is a flowchart of another method for clustering face features of the same face according to an embodiment of the present invention;

Fig. 9 is a schematic diagram of adding face features to a cluster;

Fig. 10 is a schematic diagram of deleting the clusters of face features of secondary faces;

Fig. 11 is a diagram illustrating an exemplary application provided by an embodiment of the present invention;

Fig. 12 is a block diagram of a structure of an apparatus for creating a face index according to an embodiment of the present invention;

Fig. 13 is a block diagram of a processing server according to an embodiment of the present invention.

Detailed Description

Fig. 1 is a schematic diagram of a conventional method for creating a face index based on face information marked by a user. As shown in Fig. 1, when creating a face index, the user needs to mark face information such as basic information of a person (e.g., name, gender) and face images of the person (e.g., a front face image, a side face image). The server takes the marked face information as input for registration, determines through the registration processing the video information corresponding to the face features of the face images in a video, and associates that video information with the basic information and the face features of the person to create the face index of the person.

It can be seen that, in the conventional way of establishing a face index, when a video contains many people (especially for large-scale video data), marking the face information is extremely tedious and difficult for the user, so the efficiency of establishing the face index is very low.

Further, for a person who appears in a video but has no marked face information, the conventional way of establishing a face index registers each such unmarked face image in the video as an independent input, so that the data of the face index is extremely redundant.

In order to solve the above-mentioned drawbacks, embodiments of the present invention provide a scheme for efficient face index establishment without face information tagging; the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Fig. 2 is a schematic diagram of an optional architecture of a system for establishing a face index according to an embodiment of the present invention. As shown in Fig. 2, the system may include: a video source 10, a processing server 20, and a face index library 30.

The video source 10 may be considered as the source of a video. Depending on the application scenario of the embodiment of the present invention, the video may be represented by streaming data, such as a video stream.

As an example, for a live broadcast scenario, the video source may be a live video server, and the embodiment of the present invention may establish a face index for the live video output by the live video server. For an on-demand scenario, the video source may be an on-demand video library in which a plurality of videos are recorded and which provides a video when the user requests it; in the embodiment of the present invention, the face index may be established for the videos recorded in the on-demand video library. Of course, the video source may also be a movie video library, a television video library, etc., which are not described one by one. The video source may take at least one (one or more) of the forms described above.

The processing server 20 is a service device for establishing a face index for a video provided by the video source 10, and is a main processing device for establishing a face index in the embodiment of the present invention; the processing server may be implemented by a single server or a server group consisting of a plurality of servers.

The face index library 30 is a database for recording the face indexes established in the embodiment of the present invention.

As a system structure variation, the face index library may also be implemented by a storage unit in the processing server, for example, a storage device with data storage capability in the processing server may record the face index established by the embodiment of the present invention.

In order to efficiently establish a face index without marking face information, the core process of establishing the face index in the embodiment of the invention may be as follows:

The processing server acquires at least one video that is sent by a video source and used for establishing a face index. For the at least one video, the processing server may respectively determine the face features corresponding to each video, and determine the video information of each face feature in the corresponding video (such as the video time point and/or the video progress of each face feature in the video to which it belongs). The processing server may then cluster the face features of the same face to obtain at least one first cluster (for convenience of description, in the embodiment of the present invention, one cluster of the face features of the same face in the at least one video is defined as one first cluster). Further, for each face feature aggregated by each first cluster, the face feature is associated with the video information of the corresponding video to obtain a face index suitable for the at least one video.

Based on this core process, the processing server can, without marking face information, cluster the face features of the same face in the at least one acquired video to obtain at least one first cluster, and then associate the face features aggregated by each first cluster with their video information in the corresponding video to obtain the face index of the at least one video. The face index is thus established efficiently without marking face information.

Based on the above core flow, the embodiment of the present invention may establish a face index for one video, and may also establish a face index for multiple videos, as described below.

Optionally, Fig. 3 shows an optional signaling flow for establishing a face index for multiple videos according to an embodiment of the present invention. It should be noted that the flow shown in Fig. 3 is only one optional flow for the multiple-video case; based on the core flow described above, the embodiment of the present invention may also establish a face index for multiple videos by other method flows.

Referring to fig. 3, when face index establishment is performed based on multiple videos, a process provided by an embodiment of the present invention may include:

Step S10, the video source inputs a plurality of videos to the processing server.

The multiple videos referred to here may be multiple videos of a video source, and may be considered as the multiple videos used for establishing a face index in the embodiment of the present invention; that is, the embodiment of the present invention may establish a face index for the multiple videos as a whole.

As an example, a video source may randomly input a plurality of its videos to the processing server, and the processing server may establish a face index for the plurality of input videos as a whole. Alternatively, the videos may be designated (e.g., by the processing server or by a staff member), and the video source inputs the designated plurality of videos to the processing server.

Optionally, in the embodiment of the present invention, each video of the video source may have a unique identifier (e.g., a video ID).

Optionally, any one of the videos input by the video source to the processing server may be in streaming form (i.e., a video stream); accordingly, the video source may input a plurality of video streams to the processing server. The streaming form is only optional: the embodiment of the present invention also supports face index establishment for videos in other forms. For example, the processing server may acquire complete videos from the video source and then establish the face index for the complete videos.

Accordingly, the processing server may obtain multiple videos.

Step S11, the processing server determines the face features corresponding to each video, and determines the video information of each face feature in the corresponding video.

The processing server can respectively determine corresponding face features for each acquired video, and determine video information (such as video time points and/or video progress of the face features in the corresponding video) of each determined face feature in the corresponding video;

As an example, for any video acquired by the processing server, the processing server may extract face features from the key video frames in the video; processing each acquired video in this way determines the face features corresponding to each video. Meanwhile, for the face features determined for each video, the processing server determines the video information, in the corresponding video, of the key video frame corresponding to each face feature (for example, the video time point and/or video progress of that key video frame in the video to which it belongs).

For example, for a certain video acquired by the processing server, the processing server may extract the face features of the key video frames in the video, and determine the video information in the video of the key video frame of each face feature.

As another example, for any video acquired by the processing server, the processing server may take screenshots of the video at a set time interval and extract face features from each screenshot; processing each acquired video in this way determines the face features corresponding to each video. Meanwhile, the processing server determines the video information, in the corresponding video, of the screenshot corresponding to each face feature (such as the video time point and/or video progress of that screenshot in the corresponding video).

For example, for a certain video acquired by the processing server, the processing server may take screenshots of the video at a set time interval, extract the face features of each screenshot, and determine the video information in the video of the screenshot of each face feature.

Optionally, the two manners shown above for determining the face features corresponding to the videos may be used in combination or as alternatives. Of course, the embodiment of the present invention may also extract face features from every video frame of each acquired video to determine the face features corresponding to each video, and at the same time determine the video information in the corresponding video of the video frame corresponding to each face feature.

Optionally, the video information of the face features in the corresponding video may further include: identification of the video (such as video ID) corresponding to the facial features.
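The set-time-interval manner and the video information described above can be sketched as follows. This is a minimal illustration only, not the patented implementation: the names VideoInfo and sample_video_info, the use of seconds for time points, and the expression of video progress as a fraction of the video duration are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class VideoInfo:
    """Hypothetical record of where a screenshot (and its face features) sits in a video."""
    video_id: str      # identifier of the video, e.g. "VID1"
    time_point: float  # seconds from the start of the video (assumed unit)
    progress: float    # fraction of the video elapsed, between 0 and 1 (assumed form)


def sample_video_info(video_id: str, duration: float, interval: float) -> List[VideoInfo]:
    """Return one VideoInfo per screenshot taken every `interval` seconds."""
    if duration <= 0 or interval <= 0:
        raise ValueError("duration and interval must be positive")
    infos = []
    k = 0
    # Take a screenshot at 0, interval, 2 * interval, ... up to the end of the video.
    while k * interval <= duration:
        t = k * interval
        infos.append(VideoInfo(video_id, t, t / duration))
        k += 1
    return infos
```

Face feature extraction itself is outside the scope of this sketch; each face feature extracted from a screenshot would simply carry the VideoInfo of that screenshot.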

Step S12, the processing server performs clustering processing on the face features of the same face respectively for the face features corresponding to each video, so as to obtain at least one second cluster corresponding to each video respectively.

After the face features corresponding to each video are determined, for any video, the face features of the same face in the video can be grouped into one class to obtain a second cluster corresponding to the video (that is, one second cluster of a video represents one cluster of the face features of the same face in that video). Processing each video in this way yields the second clusters corresponding to each video.

It is understood that a first cluster refers to a cluster of facial features of the same face in at least one video that the processing server uses to create the face index; and the second cluster refers to a cluster of the face features of the same face in one video under the condition that the processing server acquires a plurality of videos.

For any video, the embodiment of the invention can analyze the similarity among the face features of the video, and cluster the face features of which the similarity meets the requirement of preset similarity in the video into one class, thereby realizing the clustering processing of the face features of the same face in the video;

Optionally, the predetermined similarity requirement may be set according to the actual situation, and the embodiment of the present invention does not limit it. As an optional implementation, the face features may be represented by face feature vectors (for example, high-dimensional face feature vectors), and the similarity between face features may be represented by the distance between their face feature vectors. A predetermined vector distance corresponding to face features of the same face may be set, and the face features whose face feature vectors lie within the predetermined vector distance of one another are grouped into one class.
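One way to picture the vector-distance criterion is the following sketch. The text above only states that face features within a predetermined vector distance are grouped into one class; the choice of Euclidean distance, the greedy one-pass assignment, and the comparison against a running cluster mean are assumptions made here for illustration, and the function name cluster_by_distance is hypothetical.

```python
import math
from typing import List, Sequence


def cluster_by_distance(features: Sequence[Sequence[float]],
                        max_distance: float) -> List[List[int]]:
    """Group feature vectors greedily; returns clusters as lists of feature indices.

    A feature joins the first existing cluster whose mean vector lies within
    `max_distance` (Euclidean); otherwise it starts a new cluster.
    """
    clusters: List[List[int]] = []
    means: List[List[float]] = []
    for idx, vec in enumerate(features):
        for c, mean in enumerate(means):
            if math.dist(vec, mean) <= max_distance:
                clusters[c].append(idx)
                n = len(clusters[c])
                # Incremental update of the cluster mean with the new member.
                means[c] = [m + (v - m) / n for m, v in zip(mean, vec)]
                break
        else:
            # No existing cluster is close enough: start a new one.
            clusters.append([idx])
            means.append(list(vec))
    return clusters
```

A greedy pass like this is order-dependent; it is shown only because it makes the role of the predetermined vector distance easy to see.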

Further, a video often has primary faces and secondary faces. For any video, the embodiment of the present invention may determine the cluster of face features of each primary face in the video (a video may have one or more primary faces) to obtain the second clusters corresponding to the video.

Correspondingly, when clustering the face features of the same face for the face features corresponding to each video, for any video, the embodiment of the present invention may, after clustering the face features of the same face in the video, delete the clusters corresponding to secondary faces from the clusters of the video and keep only the clusters corresponding to primary faces, to obtain the second clusters corresponding to the video.

As an optional implementation, for any video, after clustering the face features of the same face in the video, the embodiment of the present invention may delete, from the clusters of the video, each cluster whose number of face feature occurrences is smaller than an occurrence-count threshold (i.e., delete the clusters of secondary faces), so as to obtain the second clusters corresponding to the video.
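The occurrence-count filter described above can be sketched in a few lines. The function name drop_secondary_clusters and the representation of a cluster as a list of items are assumptions for illustration; the threshold value is left to the caller, as the text does not fix one.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def drop_secondary_clusters(clusters: Sequence[Sequence[T]],
                            min_occurrences: int) -> List[List[T]]:
    """Keep only clusters with at least `min_occurrences` face features.

    Clusters with fewer occurrences are treated as secondary faces and removed.
    """
    return [list(c) for c in clusters if len(c) >= min_occurrences]
```

For instance, with a threshold of 2, a cluster holding a single face feature would be discarded while clusters with two or more would be kept as second clusters.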

Step S13, for the face features aggregated by the second clusters corresponding to each video, the processing server associates the face features with the video information of the corresponding video to obtain the clustering result of each video.

After the second clusters corresponding to each video have been determined, the embodiment of the invention can, for the face features aggregated by each second cluster of each video, associate the face features with the video information of the corresponding video. For any second cluster (a second cluster represents a cluster of the face features of the same face in one video), each face feature aggregated by the second cluster may be associated with its video time point and/or video progress in the video to which it belongs. Processing the second clusters of each video in this way yields the clustering result of each video.

It can be seen that the clustering result of a video may include at least: at least one second cluster aggregating the face features of the same face in the video, and the video information in the video of each face feature aggregated by each second cluster.
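The association performed in step S13 can be illustrated with a small data-structure sketch. The names IndexEntry and clustering_result are hypothetical, and the sketch assumes that the video information consists of a video ID and a time point and that each second cluster is given as a list of indices into the feature list of the video.

```python
from typing import List, NamedTuple, Sequence, Tuple


class IndexEntry(NamedTuple):
    """One face feature together with its video information (assumed layout)."""
    feature: Tuple[float, ...]
    video_id: str
    time_point: float


def clustering_result(video_id: str,
                      features: Sequence[Sequence[float]],
                      time_points: Sequence[float],
                      second_clusters: Sequence[Sequence[int]]) -> List[List[IndexEntry]]:
    """Attach video information to every face feature of every second cluster."""
    return [
        [IndexEntry(tuple(features[i]), video_id, time_points[i]) for i in cluster]
        for cluster in second_clusters
    ]
```

Each inner list of the returned value corresponds to one second cluster, that is, to one face in the video, with every feature carrying its own video ID and time point.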

It should be noted that step S12 and step S13 are only one optional way of obtaining the clustering result of each video in the embodiment of the present invention; other ways may also be used to obtain the clustering result of each video.

Step S14, the processing server clusters the face features of the same face according to the clustering results of the videos, to obtain at least one first cluster corresponding to the plurality of videos.

After the clustering result of each video is obtained, the second clusters of the face features of the same face are known for each video. On this basis, the face features of the same face may be clustered across the second clusters of the videos, so as to determine the clusters of the face features of the same face across the plurality of videos, i.e., the first clusters corresponding to the plurality of videos (where the at least one video is a plurality of videos, a first cluster is a cluster of the face features of the same face in the plurality of videos).
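The cross-video clustering of step S14 might look like the sketch below. It assumes that each second cluster is summarized by the mean of its face feature vectors and that two clusters belong to the same face when their means lie within a threshold Euclidean distance; the threshold, the greedy merge order, and the names mean_vector and merge_second_clusters are illustrative assumptions rather than details taken from this step.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# A second cluster: (video ID, face feature vectors of one face in that video).
SecondCluster = Tuple[str, List[Vector]]


def mean_vector(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def merge_second_clusters(second_clusters: Sequence[SecondCluster],
                          max_distance: float) -> List[List[SecondCluster]]:
    """Group second clusters from several videos into first clusters."""
    first_clusters: List[List[SecondCluster]] = []
    for sc in second_clusters:
        centre = mean_vector(sc[1])
        for fc in first_clusters:
            # Mean over all face features already gathered in this first cluster.
            fc_centre = mean_vector([v for _, feats in fc for v in feats])
            if math.dist(centre, fc_centre) <= max_distance:
                fc.append(sc)
                break
        else:
            # No matching face so far: this second cluster starts a new first cluster.
            first_clusters.append([sc])
    return first_clusters
```

A second cluster that matches no other cluster simply becomes a first cluster on its own, which corresponds to a person who appears in only one of the videos.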

Step S15, for the face features aggregated by each first cluster, the face features are associated with the video information of the corresponding video to obtain the face index of the at least one video.

After the first clusters corresponding to the multiple videos are obtained, each face feature aggregated by each first cluster can be associated with the video information of the corresponding video to obtain the face index of the multiple videos.

Step S16, the processing server writes the face index into the face index library.
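Writing the face index into the face index library, and later looking up where a face appears, can be pictured with the in-memory stand-in below. The class FaceIndexLibrary, its methods, and the use of an arbitrary face_id string to label each first cluster are all hypothetical; the text does not specify the storage format or query interface of the library.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class FaceIndexLibrary:
    """In-memory stand-in for a face index library (illustrative only)."""

    def __init__(self) -> None:
        # face_id -> list of (video ID, time point) pairs
        self._entries: Dict[str, List[Tuple[str, float]]] = defaultdict(list)

    def write(self, face_id: str, video_id: str, time_point: float) -> None:
        """Record that the face labelled `face_id` appears in a video at a time point."""
        self._entries[face_id].append((video_id, time_point))

    def appearances(self, face_id: str) -> Dict[str, List[float]]:
        """Return the sorted time points of a face, grouped by video ID."""
        by_video: Dict[str, List[float]] = defaultdict(list)
        for video_id, time_point in self._entries.get(face_id, []):
            by_video[video_id].append(time_point)
        return {vid: sorted(points) for vid, points in by_video.items()}
```

Because no face information is marked, the label of a first cluster carries no identity by itself; in this sketch it only serves as a key under which all appearances of the same face are stored.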

In the case of multiple videos, the method for establishing a face index provided by the embodiment of the invention can cluster the face features of the same face separately for each video to obtain the second clusters corresponding to each video, and associate the face features aggregated by those second clusters with the video information of the corresponding video to obtain the clustering result of each video. The face features of the same face across the clustering results of the videos can then be clustered further to obtain the first clusters applicable to the multiple videos. Each face feature aggregated by each first cluster is then associated with the video information of the corresponding video, so that a face index suitable for the multiple videos can be established efficiently without marking face information.

For ease of understanding, take two videos, a first video and a second video, as the plurality of videos. Assume that person A appears in both videos, person B appears only in the first video, and person C appears only in the second video (optionally, persons A and B may be the primary faces of the first video, and persons A and C the primary faces of the second video). A corresponding processing example may be as shown in Fig. 4:

After acquiring the first video, the processing server extracts face features from the key video frames and/or from screenshots taken at a set time interval in the first video, and clusters the face features of the same face. This yields a second cluster FA of the face features of person A in the first video, represented by [FA1, FA2, …, FAn], with the corresponding time points TA of these face features in the first video represented by [TA1, TA2, …, TAn]; and a second cluster FB of the face features of person B in the first video, represented by [FB1, FB2, …, FBn], with the corresponding time points TB in the first video represented by [TB1, TB2, …, TBn].

Likewise, after acquiring the second video, the processing server may extract face features from the key video frames and/or from screenshots taken at a set time interval in the second video, and cluster the face features of the same face. This yields a second cluster FA' of the face features of person A in the second video, represented by [FA1', FA2', …, FAn'], with the corresponding time points TA' of these face features in the second video represented by [TA1', TA2', …, TAn']; and a second cluster FC of the face features of person C in the second video, represented by [FC1, FC2, …, FCn], with the corresponding time points TC in the second video represented by [TC1, TC2, …, TCn].

After grouping the face features of person A in the first video into one class and those of person B into another, the processing server may associate each face feature in the second cluster of person A in the first video with its corresponding time point in the first video and with the ID of the first video (which may be represented by VID1), obtaining the clustering result of person A in the first video, which may be represented by [<FA1, VID1, TA1>, <FA2, VID1, TA2>, …, <FAn, VID1, TAn>]. Likewise, each face feature in the second cluster of person B in the first video is associated with its corresponding time point in the first video and with the ID of the first video, obtaining the clustering result of person B in the first video, which may be represented by [<FB1, VID1, TB1>, <FB2, VID1, TB2>, …, <FBn, VID1, TBn>].

After grouping the face features of person A in the second video into one class and those of person C into another, the processing server may associate each face feature in the second cluster of person A in the second video with its corresponding time point in the second video and with the ID of the second video (which may be represented by VID2), obtaining the clustering result of person A in the second video, which may be represented by [<FA1', VID2, TA1'>, <FA2', VID2, TA2'>, …, <FAn', VID2, TAn'>]. Likewise, each face feature in the second cluster of person C in the second video is associated with its corresponding time point in the second video and with the ID of the second video, obtaining the clustering result of person C in the second video, which may be represented by [<FC1, VID2, TC1>, <FC2, VID2, TC2>, …, <FCn, VID2, TCn>].

After obtaining the clustering results of person A and person B in the first video and the clustering results of person A and person C in the second video, the processing server may cluster the face features of the same face across the first video and the second video, that is, the face features in the clustering result of person A in the first video and the face features in the clustering result of person A in the second video, to obtain a first cluster of person A applicable to both videos, represented by [<FA1, FA2, …, FAn>, <FA1', FA2', …, FAn'>]. Since persons B and C each appear in only one of the two videos, the second cluster of person B in the first video may serve as the first cluster of person B applicable to both videos, and the second cluster of person C in the second video may serve as the first cluster of person C applicable to both videos.

Further, the face features aggregated by each first cluster can be associated with the video information of those face features in the corresponding videos, to obtain a face index applicable to the first video and the second video; the resulting face index can be expressed as: [ < FA1, VID1, TA1 >, < FA2, VID1, TA2 >, …, < FAn, VID1, TAn >, < FA1', VID2, TA1' >, < FA2', VID2, TA2' >, …, < FAn', VID2, TAn' > ], [ < FB1, VID1, TB1 >, < FB2, VID1, TB2 >, …, < FBn, VID1, TBn > ], [ < FC1, VID2, TC1 >, < FC2, VID2, TC2 >, …, < FCn, VID2, TCn > ].

Alternatively, the second cluster FA of the face features of person A in the first video may be represented by the mean value of [ FA1, FA2, …, FAn ], the second cluster FB of the face features of person B in the first video may be represented by the mean value of [ FB1, FB2, …, FBn ], the second cluster FA' of the face features of person A in the second video may be represented by the mean value of [ FA1', FA2', …, FAn' ], and the second cluster FC of the face features of person C in the second video may be represented by the mean value of [ FC1, FC2, …, FCn ].
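The two-video example above can be sketched in code. The following is only a minimal illustration, not the implementation of the embodiment: it assumes that face features are plain lists of floats, that each second cluster is compared through its mean value, and that two clusters belong to the same face when their means lie within a hypothetical Euclidean distance `max_dist`. The names `mean_vector`, `merge_video_clusters` and `max_dist` are introduced here for illustration.

```python
import math


def mean_vector(vectors):
    """Component-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def merge_video_clusters(per_video, max_dist):
    """Merge the second clusters of several videos into first clusters.

    per_video: {video_id: [cluster, ...]}, where each cluster is a list of
               (feature, time_point) pairs for one face in that video.
    max_dist:  assumed distance threshold between cluster means.
    Returns a list of first clusters, each a list of
    (feature, video_id, time_point) triples.
    """
    first_clusters = []  # each item: {"mean": [...], "entries": [...]}
    for video_id, clusters in per_video.items():
        for cluster in clusters:
            rep = mean_vector([f for f, _ in cluster])
            entries = [(f, video_id, t) for f, t in cluster]
            # Look for an existing first cluster whose mean is close enough.
            target = None
            for fc in first_clusters:
                if math.dist(rep, fc["mean"]) <= max_dist:
                    target = fc
                    break
            if target is None:
                first_clusters.append({"mean": rep, "entries": entries})
            else:
                target["entries"].extend(entries)
                target["mean"] = mean_vector([f for f, _, _ in target["entries"]])
    return [fc["entries"] for fc in first_clusters]
```

With a first video containing persons A and B and a second video containing persons A and C, the result would hold three first clusters, and only the cluster of person A would carry entries from both video IDs.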

The above shows a scheme for establishing a face index in the case of multiple videos; the embodiment of the invention can also establish the face index under the condition of one video; optionally, fig. 5 shows another optional signaling flow for establishing a face index according to an embodiment of the present invention; referring to fig. 5, when face index establishment is performed based on a video, a process provided by an embodiment of the present invention may include:

in step S20, the video source inputs a video to the processing server.

The video referred to here may be any video of a video source, and may be considered as any video for establishing a face index in the embodiment of the present invention;

As an example, the video source may input its videos to the processing server one by one or in random order; in this process, the processing server may establish a face index for each input video in turn; alternatively, a video of the video source may be designated (e.g., by the processing server or by an operator), and the designated video is input to the processing server;

Alternatively, the video may exist in the form of a video stream.

Accordingly, the processing server may retrieve the video.

and step S21, the processing server determines the face characteristics corresponding to the video and determines the video information corresponding to the face characteristics in the video.

Alternatively, for a piece of video, the manner of determining the corresponding facial features of the video may be as shown in step S11 in fig. 3.

and step S22, the processing server carries out clustering processing on the face features of the same face according to the face features corresponding to the video to obtain a first cluster corresponding to the video.

Optionally, for a video, the process of clustering the face features of the same face in the face features corresponding to the video may be as shown in step S12 in fig. 3;

It should be noted here that, in the case of multiple videos, a cluster of the face features of the same face within one video is referred to as a second cluster, and a cluster formed across the multiple videos from the second clusters of the same face may be referred to as a first cluster applicable to the multiple videos (this is the form taken by the first cluster of the embodiment of the present invention in the case of multiple videos);

in the case of a video, the cluster of facial features of the same face in the video can be regarded as the first cluster (a form of the first cluster in the case of a video) suitable for the video.

And step S23, the processing server associates the face features with the video information of the video according to the face features of the clusters corresponding to the video, and establishes the face index of the video.

After the face features of the same face in the video are clustered to obtain the clusters corresponding to the video, the embodiment of the invention can associate the face features of each cluster with the video information of those face features in the video, to obtain the clustering result corresponding to the video; therefore, in the case of establishing a face index for one video, the clustering result corresponding to the video can be used as the face index of the video.

as can be seen, in the embodiment of the present invention, the clustering result corresponding to one video may include: and each cluster of the face features of the same face in the video and the video information corresponding to the face features of each cluster in the video.

Optionally, the video information of the face features of one cluster in the corresponding video may at least include: the time point of each face feature of the cluster in the corresponding video; as an alternative, it may include: the video progress of each face feature of the cluster in the corresponding video; obviously, it may also include: the time point and/or the video progress of each face feature of the cluster in the corresponding video;

further, the video information may also include: identification of the video, such as an ID of the video, etc.

As a hypothetical example, in the case of one video, let F denote a face feature (e.g., a high-dimensional face feature vector) and T denote the time point of a face feature in the video; if there is one main face in the video, a first cluster of the face features of that main face can be obtained, represented by [ F11, F12, …, F1n ]; meanwhile, the time points [ T11, T12, …, T1n ] of the face features of the first cluster in the video can be determined; after each face feature of the first cluster is associated with its time point in the video, an optional representation of the face index for one video may be:

[ < F11, F12, …, F1n >, < T11, T12, …, T1n > ]; wherein, < F11, F12, …, F1n > may also be represented using the mean value of the human face features; optionally, the face index may further include: an identification of the video;

of course, when there are multiple faces (e.g., multiple main faces) in a video, there may be multiple first clusters of face features belonging to different faces in the face index, and each face feature in each first cluster is associated with a corresponding time point in the video.

And step S24, the processing server writes the face index into a face index library.

Under the condition of establishing a face index for a video, after a corresponding clustering result of the video is obtained, the embodiment of the invention can realize clustering of the face features of the same face in the video through the corresponding clustering result of the video, and associate the face features of each cluster with the video information corresponding to the video, and under the condition of no face information mark, associate the face features of at least one same figure in the video with the video information, thereby achieving the establishment of the face index of the video.

And the facial features of the same face are clustered, so that the facial features of the same cluster represent the same face, the number of faces in a face index library can be reduced, and the data redundancy of face indexes can be greatly reduced.

For ease of understanding, the processing in the case of one video may be as shown in fig. 6, where the video information is represented by the video time points of the face features in the video; referring to fig. 6, after the face features of the same face in the video are clustered, a plurality of clusters (i.e., a plurality of first clusters corresponding to the video) may be obtained; cluster 1 may be a cluster of the face features < F11, F12, …, F1n > of one face, and each face feature in cluster 1 is associated with its video time point in the video, giving [ < F11, T11 >, < F12, T12 >, …, < F1n, T1n > ], where < T11, T12, …, T1n > are the time points of < F11, F12, …, F1n > in the video;

Cluster 2 may be a cluster of the face features < F21, F22, …, F2n > of another face, and each face feature in cluster 2 is associated with its video time point in the video, giving [ < F21, T21 >, < F22, T22 >, …, < F2n, T2n > ], where < T21, T22, …, T2n > are the time points of < F21, F22, …, F2n > in the video.
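As a rough illustration of the fig. 6 layout, the sketch below pairs each face feature of a cluster with its time point and attaches the video identification. It assumes that each cluster arrives as a (features, time_points) pair of equal length; the function name `build_single_video_index` and the dictionary keys are hypothetical and not taken from the embodiment.

```python
def build_single_video_index(video_id, clusters):
    """Build a face index for one video.

    clusters: list of (features, time_points) pairs, one pair per face,
              where features[i] was observed at time_points[i].
    Returns one index entry per cluster, each holding the video ID and the
    list of (feature, time_point) associations.
    """
    index = []
    for features, time_points in clusters:
        if len(features) != len(time_points):
            raise ValueError("each face feature needs exactly one time point")
        index.append({
            "video_id": video_id,
            "entries": list(zip(features, time_points)),
        })
    return index
```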

The above describes face index establishment based on the case of multiple videos, and face index establishment in the case of one video; in summary, the embodiment of the present invention clusters the face features of the same face for at least one video for which a face index needs to be established, to obtain first clusters suitable for the at least one video, and associates the face features of the first clusters with video information of corresponding videos, to establish the face index suitable for the at least one video; it is noted that the at least one video may be a plurality of videos (as in the processing case shown in fig. 3 above) or may be one video (as in the processing case shown in fig. 5 above);

Of course, in addition to the processing cases shown in fig. 3 and fig. 5, in the embodiment of the present invention, the face features of the same face in multiple videos may be clustered, instead of clustering the face features of the same face in each video separately for each video.

optionally, in an implementation process of clustering the same face features in the face features of the video, the embodiment of the present invention may set a predetermined similarity requirement, and cluster the face features whose similarities meet the predetermined similarity requirement into one class by analyzing the similarities between the face features, so as to implement clustering the face features of the same face in the face features of the video.

Optionally, in the embodiment of the present invention, the face features may also be extracted from the video according to the playing time sequence of the video (especially, when the video is a video stream); at this time, the facial features of the video are extracted in sequence according to the playing time sequence, so the embodiment of the invention can also perform clustering processing on the facial features of the same face according to the extracting sequence of the facial features, namely for the facial features extracted first, the embodiment of the invention can perform clustering processing first, and for the facial features extracted later, the embodiment of the invention can perform clustering processing later;

optionally, fig. 7 shows a flowchart of a method for clustering face features of the same face according to an embodiment of the present invention, where the method may be applied to any stage of clustering face features of the same face, such as the stages in step S12 in fig. 3, step S14 in fig. 3, step S22 in fig. 5, and the like; referring to fig. 7, the method may include:

step S30, for any face feature to be clustered, detecting whether a target cluster with the similarity meeting the preset similarity requirement with the face feature to be clustered exists in the obtained clusters, if so, executing step S31, otherwise, executing step S32.

In the embodiment of the invention, the face features can be represented by face feature vectors; because the face features are extracted from the video in sequence according to the playing time sequence, when a face feature is extracted, the processing server may or may not already have clustered other face features similar to it;

Optionally, in the embodiment of the present invention, for any face feature to be clustered (for example, a face feature currently extracted from any one of videos, or, in the case of multiple videos, any face feature in a second cluster of any one of the videos), it may be detected whether a target cluster whose similarity to the face feature to be clustered meets a predetermined similarity requirement exists in the obtained clusters, so as to determine whether other face features belonging to the same face as the face feature to be clustered have been clustered.

And step S31, aggregating the facial features to be clustered to the target cluster.

when the target cluster with the similarity meeting the preset similarity requirement with the face feature to be clustered exists in the obtained cluster, the other face features belonging to the same face with the face feature to be clustered can be considered to be clustered, and the face feature to be clustered can be gathered to the target cluster.

and step S32, setting a new cluster, and gathering the face features to be clustered to the new cluster.

when the obtained clusters are detected, and no target cluster with the similarity meeting the preset similarity requirement with the face features to be clustered exists, the face features to be clustered can be considered to belong to a new face, the face features to be clustered can be independently used as one cluster, and the face features to be clustered are clustered into the new cluster by setting the new cluster.
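The flow of steps S30 to S32 can be sketched as follows. This is an illustration under stated assumptions rather than the method of the embodiment: similarity is measured here by cosine similarity against the mean of each obtained cluster, and the predetermined similarity requirement is modelled as a hypothetical lower bound `min_similarity`.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def cluster_by_similarity(features, min_similarity):
    """Cluster features in extraction order, following steps S30 to S32."""
    clusters = []  # each cluster is a list of feature vectors
    for feature in features:
        # Step S30: look for a target cluster that is similar enough.
        target = None
        for cluster in clusters:
            centre = [sum(col) / len(cluster) for col in zip(*cluster)]
            if cosine_similarity(feature, centre) >= min_similarity:
                target = cluster
                break
        if target is not None:
            target.append(feature)       # Step S31: join the target cluster
        else:
            clusters.append([feature])   # Step S32: start a new cluster
    return clusters
```

Because features are processed in the order in which they are extracted, the same loop could be applied to a video stream, with each newly extracted feature either joining an obtained cluster or opening a new one.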

The embodiment of the invention can express the similarity between the human face features through the distance of the human face feature vector of the human face features; optionally, fig. 8 shows another flow of a method for clustering face features of the same face according to an embodiment of the present invention, where the flow shown in fig. 8 may be regarded as a refinement of the flow shown in fig. 7, and with reference to fig. 8, the flow may include:

Step S40, for any face feature to be clustered, detecting whether an obtained cluster exists within the predetermined vector distance of the face feature vector of the face feature to be clustered; if so, executing step S41; if not, executing step S43.

and step S41, judging whether the corresponding radius of the obtained cluster is larger than a radius threshold value after the face features to be clustered are added into the obtained cluster, if not, executing step S42, and if so, executing step S43.

And step S42, determining the obtained cluster as a target cluster, and gathering the face features to be clustered to the target cluster.

And step S43, setting a new cluster, and gathering the face features to be clustered to the new cluster.

it can be seen that if an obtained cluster exists within a predetermined vector distance of a face feature vector corresponding to a face feature to be clustered, and after the face feature to be clustered is added to the obtained cluster, the radius of the obtained cluster is not greater than a radius threshold, it is determined that the obtained cluster is the target cluster, and the face feature to be clustered can be aggregated to the target cluster;

If the obtained cluster does not exist in the preset vector distance of the face feature vector corresponding to the face feature to be clustered, determining that the face feature to be clustered belongs to a new face, setting a new cluster, and aggregating the face feature to be clustered to the new cluster;

If an obtained cluster exists within the predetermined vector distance of the face feature vector corresponding to the face feature to be clustered, but the radius of the obtained cluster would be larger than the radius threshold after the face feature to be clustered is added to it, it is determined that the face features of the obtained cluster and the face feature to be clustered do not belong to the same face; a new cluster can be set, and the face feature to be clustered is aggregated to the new cluster.
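The three cases above (steps S40 to S43) can be illustrated with the sketch below. Several points are assumptions made for the illustration: the distance is Euclidean and measured to the cluster centroid, the radius of a cluster is taken as the largest distance from its centroid to any member, and when several obtained clusters lie within the predetermined distance only the nearest one is considered. The names `assign`, `epsilon` and `max_radius` are hypothetical.

```python
import math


def centroid(members):
    """Component-wise mean of the member vectors."""
    return [sum(col) / len(members) for col in zip(*members)]


def radius(members):
    """Largest distance from the centroid to any member (assumed definition)."""
    c = centroid(members)
    return max(math.dist(c, m) for m in members)


def assign(feature, clusters, epsilon, max_radius):
    """Place `feature` into `clusters` (modified in place); return its cluster index."""
    # Step S40: find obtained clusters within epsilon of the feature.
    near = [
        (math.dist(feature, centroid(m)), i)
        for i, m in enumerate(clusters)
        if math.dist(feature, centroid(m)) <= epsilon
    ]
    if near:
        _, i = min(near)  # assumption: only the nearest candidate is tried
        # Step S41: would the radius stay within the threshold after adding?
        if radius(clusters[i] + [feature]) <= max_radius:
            clusters[i].append(feature)  # Step S42: join the target cluster
            return i
    clusters.append([feature])           # Step S43: start a new cluster
    return len(clusters) - 1
```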

Further, after the face feature to be clustered is aggregated into a cluster (which may be an obtained target cluster or a new cluster), the radius and centroid of that cluster can be updated; furthermore, a video information list (such as a video time point list, a video progress list and the like) can be set for each cluster, so that after the face feature to be clustered is aggregated into a cluster, the video information of the face feature in the corresponding video is inserted into the video information list of that cluster.

correspondingly, when detecting whether the obtained cluster exists in the preset vector distance of the face feature vector of any face feature to be clustered, whether the obtained cluster exists or not can be judged;

As an example, taking the clustering of the face features of the same face in one video: face features may be extracted according to the video frame sequence of the video, and the extracted face features become the face features to be clustered; for instance, the face features < F1, F2, …, Fn > of the same face may be extracted at the video time points < T1, T2, …, Tn > of the video;

For any extracted face feature, it is judged whether a cluster exists in the neighborhood within distance epsilon of the face feature vector of the face feature; if such a cluster exists, an adjustment of the cluster can be attempted, determining whether the updated radius of the cluster would exceed the maximum radius threshold after the face feature is added: if the maximum radius threshold would be exceeded, the face feature is set as a new cluster, and the video time point of the face feature in the video is inserted into the video time point list corresponding to the new cluster; otherwise, the face feature is added to the cluster, the centroid and the radius of the cluster are updated, and the video time point of the face feature in the video is added to the video time point list corresponding to the cluster; a corresponding schematic diagram may be as shown in fig. 9.

It should be noted that clustering is actually dividing a data set into different classes or clusters according to a certain criterion (e.g., a distance criterion), so that the similarity of data in the same cluster is as large as possible, and the difference of data objects not in the same cluster is also as large as possible. The clustered data of the same class are gathered together as much as possible, and different data are separated as much as possible; in the corresponding illustration of fig. 9 described above, an empirical radius value epsilon may be set, where epsilon is used as the radius to form a circle, the data in the circle is considered similar, the centroid corresponding to a cluster is the center of the circle, and when the data gathered in a cluster changes, the centroid and the radius corresponding to the cluster are updated accordingly.
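The bookkeeping described above (updating the centroid and radius of a cluster and maintaining its video time point list) might look like the following sketch. It is not the implementation of the embodiment: the centroid is kept as a running mean so that earlier members need not be stored, and the radius is only an approximation, namely the largest distance observed between a newly added feature and the centroid at the moment of addition. The class name `FaceCluster` and its fields are hypothetical.

```python
import math
from dataclasses import dataclass, field


@dataclass
class FaceCluster:
    centroid: list                 # running mean of all member features
    count: int = 1                 # number of member features so far
    radius: float = 0.0            # approximate radius (see add())
    time_points: list = field(default_factory=list)

    def add(self, feature, time_point):
        """Add one face feature and its video time point to the cluster."""
        self.count += 1
        # Running-mean update: c_new = c_old + (x - c_old) / count.
        self.centroid = [
            c + (x - c) / self.count for c, x in zip(self.centroid, feature)
        ]
        # Approximation: track the largest distance seen at insertion time.
        self.radius = max(self.radius, math.dist(self.centroid, feature))
        self.time_points.append(time_point)
```

An exact radius would require keeping all member features and recomputing the distances whenever the centroid moves; the running form trades that precision for constant memory per cluster.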

It should be further noted that, if videos are input to the processing server in the form of video streams, the processing server may distinguish the video streams of the plurality of videos by establishing video index chains; optionally, when the processing server acquires a video stream, it may determine, through the identifier of the video stream, whether a video index chain of the video stream exists, where one video index chain may record the clusters of the face features of the different faces aggregated in the video (one form of the second cluster); if so, the face features of the video stream are extracted according to the video frame sequence (one form of the video playing time sequence), and the cluster corresponding to each extracted face feature in the video index chain of the video is determined (for example, in the manner of fig. 7 or fig. 8);

if not, a video index chain of the video stream may be created, the facial features of the video stream are extracted according to the video frame sequence, and corresponding clusters of the extracted facial features in the created video index chain of the video are determined (for example, implemented in the manner of fig. 7 or fig. 8).

Optionally, the video index chain may exist in a key value form, for example, the video index chain of one video may use the identifier of the video as a primary key, and use each cluster of the face features of the same face in the video as a value associated with the primary key.
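A key-value video index chain of this kind could be sketched with an ordinary dictionary, as below. The sketch assumes that the chains are held in memory, keyed by the video identifier, with the list of clusters as the value; the function name `get_or_create_chain` is hypothetical.

```python
def get_or_create_chain(index_chains, video_id):
    """Return (chain, created) for a video stream.

    index_chains: dict mapping a video identifier (primary key) to the list
                  of clusters recorded for that video (value).
    created is True when no chain existed and an empty one was created.
    """
    created = video_id not in index_chains
    chain = index_chains.setdefault(video_id, [])
    return chain, created
```

Each face feature extracted from the stream would then be assigned to a cluster inside the returned chain, for example in the manner of fig. 7 or fig. 8.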

Optionally, at the stage of clustering the face features of the same face in any one of the videos to obtain clusters of the same face features (e.g., at the stage of S12 shown in fig. 3, S22 shown in fig. 5, etc.), the embodiment of the present invention may set the clusters of the face features corresponding to any one of the videos as clusters of the face features of the main faces in the videos;

Optionally, for any video, after at least one cluster of the face features of the same face in the video is obtained, the cluster in which the number of occurrences of the face features in the at least one cluster is smaller than a number threshold may be deleted;

In the case of multiple videos, when the second clusters corresponding to each video are being obtained, for any video, after at least one cluster of the face features of the same face in the video is obtained, the clusters in which the number of occurrences of face features is smaller than the number threshold may be deleted, so as to obtain at least one second cluster corresponding to the video;

In the case of one video, when the first clusters corresponding to the video are being obtained, after at least one cluster of the face features of the same face in the video is obtained, the clusters in which the number of occurrences of face features is smaller than the number threshold may be deleted, so as to obtain at least one first cluster corresponding to the video.
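A minimal sketch of this filtering step follows. It assumes that the number of occurrences of face features in a cluster is simply the number of face features the cluster holds; the names `drop_rare_clusters` and `min_occurrences` are hypothetical.

```python
def drop_rare_clusters(clusters, min_occurrences):
    """Keep only clusters holding at least `min_occurrences` face features.

    clusters: list of clusters, each a list of face features (or of any
              per-feature records, such as (feature, time_point) pairs).
    """
    return [c for c in clusters if len(c) >= min_occurrences]
```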

Optionally, the video information of a face feature in the corresponding video may include: the video time point of the face feature in the corresponding video; optionally, for any video, the embodiment of the present invention may use the method shown in fig. 10 to delete, from the clusters of face features of the video, the clusters of face features of secondary faces; referring to fig. 10, the method may include:

Step S50, for any video, after at least one cluster of the face features of the same face in the video is obtained, determining the video time point distribution corresponding to the face features of each cluster of the video.

in order to distinguish a primary face from a secondary face, after the face features of the same face in a video are clustered for any one video, the distribution of the video time points of the face features clustered by each cluster in the corresponding video is determined, that is, the distribution of the face features of each cluster in the corresponding video time points of the video is determined.

And step S51, deleting clusters of which the video time point distribution is not in at least one set time interval.

Optionally, in the embodiment of the present invention, at least one set time interval may be set, for example, n time intervals with time duration t; therefore, for any video, the video time point distribution corresponding to the face features of each cluster of the video can be judged, whether the video time point distribution is in more than n time intervals with the duration of t is judged, if not, the face features of the clusters can be determined to belong to the secondary faces in the video and can be deleted, and if so, the face features of the clusters can be determined to belong to the main faces in the video and can be reserved.
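One possible reading of steps S50 and S51 is sketched below. The assumptions are that the video timeline is divided into consecutive intervals of duration t starting at time zero, that the distribution of a cluster is the set of intervals containing at least one of its time points, and that a cluster is kept when it occupies at least n intervals (the boundary case is an assumption, since the text does not fix it). The names `occupied_intervals`, `keep_primary_clusters`, `interval_seconds` and `min_intervals` are hypothetical.

```python
def occupied_intervals(time_points, interval_seconds):
    """Indices of the fixed-length intervals that contain a time point."""
    return {int(t // interval_seconds) for t in time_points}


def keep_primary_clusters(cluster_time_points, interval_seconds, min_intervals):
    """Keep the time point lists of clusters spread over enough intervals.

    cluster_time_points: one list of video time points (seconds) per cluster.
    Clusters occupying fewer than `min_intervals` distinct intervals are
    treated as secondary faces and dropped.
    """
    return [
        tps for tps in cluster_time_points
        if len(occupied_intervals(tps, interval_seconds)) >= min_intervals
    ]
```

For example, with 60-second intervals and a minimum of three intervals, a face seen at 1 s, 65 s, 130 s and 200 s would be kept, while a face seen only at 3 s, 4 s and 5 s would be dropped.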

The scheme of establishing the face index for one or more videos under the condition of no face information mark is described from different angles, and the first cluster suitable for one or more videos is obtained by clustering the face features of the same face in one or more videos; the face features aggregated by each first cluster are associated with the video information of the corresponding video, so that efficient face index establishment can be realized; furthermore, through each first cluster, the face features of the same face in one or more videos can be aggregated, the number of faces in a face index library can be reduced, and data redundancy of face indexes can be greatly reduced.

After the face index is established, the embodiment of the invention can support a user to retrieve the video with the face of the target person by inputting the face image of the target person when retrieving the video; as an application example, an application process of the embodiment of the present invention may be as shown in fig. 11, and referring to fig. 11, the process may include:

And step S60, the terminal sends a video retrieval request to the processing server, wherein the video retrieval request carries the target face image.

When a user needs to search for a video with a target person, a target face image of the target person can be input to the terminal, the terminal can correspondingly send a video retrieval request to the processing server, and the video retrieval request carries the target face image;

as an example, a user may import a target face image on a page of a video client loaded on a terminal, and after the user clicks a search button of the page, the terminal may send a video retrieval request carrying the target face image to a processing server through the video client.

and step S61, the processing server extracts the target face features of the target face image.

after the processing server obtains the video retrieval request, a target face image carried in the video retrieval request can be obtained through analysis, and face features (called as target face features) of the target face image are extracted.

And step S62, the processing server determines the video information of the corresponding video associated with the target face feature from the face index.

After the processing server establishes the face index, based on the face features aggregated by each first cluster in the face index and the video information of the associated face features in the corresponding video, the processing server may determine the video information of the corresponding video associated with the target face features (including each video corresponding to the target face features, the video time points of the target face features in each corresponding video, and the like).

Optionally, as an implementation, the processing server may perform k-nearest neighbor query on the centroid of each first cluster in the face index based on the target face features to obtain the first cluster where the target face features are located, and determine the video IDs of the videos corresponding to the face features aggregated in the first cluster where the target face features are located, and the video time points in the videos.
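The centroid query described above can be sketched as a brute-force nearest-neighbor search. The sketch is illustrative only: it assumes Euclidean distance, an in-memory index in which each first cluster stores a centroid and a list of (video ID, time point) entries, and a hypothetical rejection threshold `max_dist` beyond which the target face is treated as absent from the index (the text does not specify such a threshold).

```python
import heapq
import math


def nearest_clusters(target, index, k=1):
    """Return the k first clusters whose centroids are closest to `target`."""
    return heapq.nsmallest(
        k, index, key=lambda c: math.dist(target, c["centroid"])
    )


def lookup(target, index, max_dist):
    """Map each video ID to the time points at which the target face appears.

    index: list of {"centroid": [...], "entries": [(video_id, time_point), ...]}
    Returns {} when the index is empty or the nearest centroid is farther
    away than `max_dist`.
    """
    hits = nearest_clusters(target, index, k=1)
    if not hits or math.dist(target, hits[0]["centroid"]) > max_dist:
        return {}
    result = {}
    for video_id, time_point in hits[0]["entries"]:
        result.setdefault(video_id, []).append(time_point)
    return result
```

For a large face index library, the linear scan over centroids would normally be replaced by a dedicated nearest-neighbor structure; the linear form is used here only to keep the example self-contained.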

And step S63, the processing server sends the video information to the terminal.

And after determining the video information of the corresponding video associated with the target face characteristics, the processing server can output the determined video information, namely, the video information is sent to a terminal.

And step S64, the terminal displays the video information.

Optionally, the terminal may display the video information on a search result page of the video client, for example, display a video corresponding to the target face feature, and display a video time point of the corresponding video.

The application example given above is explained by using a video client to perform video retrieval, but it is needless to say that a user may also log in a video website using a browsing component such as a browser to perform video retrieval through the video website.

the following introduces a device for establishing a face index provided by the embodiment of the present invention; the apparatus for creating a face index described below may be regarded as a program module that is required by the processing server to implement the method for creating a face index provided in the embodiment of the present invention. The contents of the apparatus for creating a face index described below may be referred to in correspondence with the contents of the method for creating a face index described above.

fig. 12 is a block diagram of an apparatus for creating a face index according to an embodiment of the present invention, where the apparatus is applicable to a processing server, and referring to fig. 12, the apparatus may include:

A video obtaining module 100, configured to obtain at least one video;

A face feature and video information determining module 200, configured to determine the face features corresponding to each video respectively, and determine the video information of each face feature in its corresponding video;

A first cluster obtaining module 300, configured to perform cluster processing on the face features of the same face to obtain at least one first cluster; wherein, the face features aggregated by one first cluster represent the face features of the same face in the at least one video;

The face index establishing module 400 is configured to associate, for each face feature aggregated by each first cluster, video information of the face feature in a corresponding video to obtain a face index of the at least one video.

optionally, under the condition of multiple videos, the first cluster obtaining module 300 is configured to perform clustering on the face features of the same face to obtain at least one first cluster, and specifically includes:

obtaining the clustering result of each video, wherein the clustering result of one video at least comprises: at least one second cluster aggregating the face features of the same face in the video, and the video information, in the video, of the face features aggregated by each second cluster;

and aiming at the clustering result of each video, clustering the face features of the same face to obtain at least one first cluster corresponding to the plurality of videos.

Optionally, in the case of multiple videos, the first clustering obtaining module 300 is configured to obtain a clustering result of each video, and specifically includes:

respectively clustering the face features of the same face according to the corresponding face features of each video to respectively obtain at least one second cluster corresponding to each video;

and, for the face features aggregated by the second clusters corresponding to each video, associating each face feature with its video information in the corresponding video, to obtain the clustering result of each video.

Optionally, in the case of multiple videos, the first cluster obtaining module 300 is configured to obtain at least one second cluster corresponding to each video, and specifically includes:

for any video, after at least one cluster of the face features of the same face in the video is obtained, deleting the clusters in which the number of occurrences of face features is smaller than a number threshold, to obtain at least one second cluster corresponding to the video.

Optionally, the video information of the face features in the corresponding video comprises at least: the video time points of the face features in the corresponding video. In the case of multiple videos, when deleting, for any video and after at least one cluster of the face features of the same face in the video is obtained, the clusters whose number of face feature occurrences is smaller than the frequency threshold to obtain at least one second cluster corresponding to the video, the first cluster obtaining module 300 is specifically configured for:

for any video, after at least one cluster of the face features of the same face in the video is obtained, determining the video time point distribution corresponding to the face features of each cluster of the video;

and deleting the clusters whose video time point distribution does not fall within at least one set duration interval, to obtain at least one second cluster corresponding to the video.
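The text does not spell out how a time point distribution is tested against the set duration intervals. The sketch below adopts one possible reading as an assumption: a cluster is kept when at least one of its video time points falls inside at least one configured interval. The names `in_any_interval` and `drop_clusters_outside_intervals` are hypothetical.

```python
def in_any_interval(time_points, intervals):
    """True if any time point lies inside any (start, end) interval, bounds included."""
    return any(start <= t <= end for t in time_points for start, end in intervals)


def drop_clusters_outside_intervals(clusters, intervals):
    """clusters maps a cluster id to the video time points of its face features.

    Assumed rule: a cluster survives if it has a time point in at least one interval.
    """
    return {
        cluster_id: time_points
        for cluster_id, time_points in clusters.items()
        if in_any_interval(time_points, intervals)
    }
```

A stricter reading, for example requiring a time point in every interval, would only change the `any` over intervals into an `all`.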

Optionally, in the case of multiple videos, when clustering the face features of the same face among the face features of each video, the first cluster obtaining module 300 is specifically configured for:

for any acquired video, if a video index chain exists for the video, determining, for each face feature extracted from the video, the second cluster to which the face feature corresponds in the video index chain of the video; wherein a video index chain records the second clusters that aggregate the face features of the different faces in the video;

and, for any acquired video, if no video index chain exists for the video, creating a video index chain corresponding to the video, and determining, for each face feature extracted from the video, the second cluster to which the face feature corresponds in the created video index chain.
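The lookup-or-create behaviour of the video index chain can be sketched as below. The sketch assumes the chains are kept in a dictionary keyed by a video identifier and that a chain is simply a list of second clusters; both choices, and the name `chain_for_video`, are illustrative assumptions.

```python
def chain_for_video(index_chains, video_id):
    """Return the index chain of a video, creating an empty chain on first use."""
    if video_id not in index_chains:
        index_chains[video_id] = []  # no chain exists yet for this video
    return index_chains[video_id]
```

Face features extracted from the video are then clustered against the second clusters held in the returned chain.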

Optionally, for any face feature to be clustered, the process of clustering the face features of the same face specifically includes:

detecting whether the obtained clusters include a target cluster whose similarity to the face feature to be clustered meets a predetermined similarity requirement;

if the target cluster exists, aggregating the face feature to be clustered into the target cluster;

and if the target cluster does not exist, creating a new cluster and aggregating the face feature to be clustered into the new cluster.

Optionally, the process of detecting whether the obtained clusters include a target cluster whose similarity to the face feature to be clustered meets the predetermined similarity requirement may specifically include:

detecting whether an obtained cluster exists within a preset vector distance of the face feature vector of the face feature to be clustered;

if such an obtained cluster exists, and its radius is not greater than a radius threshold after the face feature to be clustered is added to it, determining the obtained cluster as the target cluster;

and if no such obtained cluster exists, or such an obtained cluster exists but its radius would be greater than the radius threshold after the face feature to be clustered is added to it, determining that the target cluster does not exist.
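The detection and assignment steps above can be sketched as follows. This is an illustration under stated assumptions: clusters are lists of feature vectors, the vector distance is Euclidean and is measured from the cluster centroid, and the radius is the largest distance from the centroid to any member. The function names and the default thresholds are hypothetical.

```python
import math


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def radius(points):
    """Largest distance from the centroid to any member (assumed definition)."""
    c = centroid(points)
    return max(math.dist(c, p) for p in points)


def find_target_cluster(clusters, feature, max_dist, max_radius):
    """Return the index of the target cluster for feature, or None."""
    for i, members in enumerate(clusters):
        if math.dist(centroid(members), feature) > max_dist:
            continue  # cluster is not within the preset vector distance
        if radius(members + [feature]) <= max_radius:
            return i  # radius stays within the threshold after adding
    return None


def assign(clusters, feature, max_dist=1.0, max_radius=0.4):
    """Aggregate feature into its target cluster, or into a new cluster."""
    i = find_target_cluster(clusters, feature, max_dist, max_radius)
    if i is None:
        clusters.append([feature])
        return len(clusters) - 1
    clusters[i].append(feature)
    return i
```

With the default thresholds, a feature that is near an existing cluster but would stretch its radius beyond `max_radius` starts a new cluster instead of joining.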

Optionally, for the face feature to be clustered, the process of associating the face feature with its video information in the corresponding video may specifically include:

inserting the video information of the face feature to be clustered in the corresponding video into a video information list corresponding to the cluster into which the face feature to be clustered is aggregated.

Optionally, the apparatus for establishing a face index provided in the embodiment of the present invention may be further configured for: updating the radius and the centroid of the cluster into which the face feature to be clustered is aggregated.
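One way to perform this update is sketched below. It assumes the centroid is the mean of the member vectors, updated incrementally as a running mean, and that the radius is the largest distance from the updated centroid to any member. The class name `Cluster` and its attributes are hypothetical.

```python
import math


class Cluster:
    """A cluster that keeps its centroid and radius current as features are added."""

    def __init__(self, first_feature):
        self.members = [tuple(first_feature)]
        self.centroid = tuple(first_feature)
        self.radius = 0.0

    def add(self, feature):
        feature = tuple(feature)
        self.members.append(feature)
        n = len(self.members)
        # Running mean: move the centroid toward the new feature by 1/n.
        self.centroid = tuple(c + (x - c) / n for c, x in zip(self.centroid, feature))
        # Assumed radius definition: farthest member from the updated centroid.
        self.radius = max(math.dist(self.centroid, m) for m in self.members)
```

Recomputing the radius over all members costs time linear in the cluster size; an implementation that cannot afford this could track an upper bound instead, at the price of a looser radius check.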

Optionally, in the embodiment of the present invention, the video information of the face features in the corresponding video may specifically include:

the video time point of the face features in the corresponding video, and/or the video progress, and/or the identifier of the video.

Optionally, the apparatus for establishing a face index provided in the embodiment of the present invention may be further configured for:

acquiring a video retrieval request, wherein the video retrieval request carries a target face image;

extracting the target face features of the target face image;

determining video information of a corresponding video associated with the target face features from the face index;

and outputting the determined video information.
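The retrieval steps can be sketched as a nearest-cluster lookup. The sketch assumes the target face feature has already been extracted as a numeric vector (feature extraction is outside its scope), that each face index entry holds a cluster centroid and a video information list, and that a match requires the Euclidean distance to be within a hypothetical `max_dist`. The entry layout and the name `search_face_index` are illustrative.

```python
import math


def search_face_index(face_index, target_feature, max_dist=0.5):
    """Return the video information of the closest cluster, or [] if none is close enough.

    face_index is a list of entries shaped like
    {"centroid": (..), "video_info": [{"video_id": .., "time_point": ..}, ..]}.
    """
    best = min(
        face_index,
        key=lambda entry: math.dist(entry["centroid"], target_feature),
        default=None,
    )
    if best is None or math.dist(best["centroid"], target_feature) > max_dist:
        return []
    return best["video_info"]
```

For example, a query vector close to the centroid of a cluster built from video `v1` returns the time points recorded for that face in `v1`, while a query far from every centroid returns an empty list.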

An embodiment of the present invention further provides a processing server, which may be loaded with the above program modules to execute the method for establishing a face index provided in the embodiments of the present invention; the program modules may be implemented in the form of a program. Fig. 13 is a block diagram of the structure of a processing server according to an embodiment of the present invention. Referring to Fig. 13, the processing server may include:

at least one processing chip 1, at least one communication interface 2, at least one memory 3 and at least one communication bus 4;

in the embodiment of the present invention, there is at least one each of the processing chip 1, the communication interface 2, the memory 3, and the communication bus 4, and the processing chip 1, the communication interface 2, and the memory 3 communicate with one another through the communication bus 4;

The processing chip 1 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present invention.

The memory 3 may comprise a high-speed RAM memory and may also comprise a non-volatile memory, such as at least one disk memory.

The memory 3 stores a program, and the processing chip 1 calls the program stored in the memory 3 to implement the steps of the above method for establishing a face index executed by the processing server.

The program is specifically used for:

acquiring at least one video;

Optionally, an embodiment of the present invention further provides a storage medium, where the storage medium may store a program suitable for execution by a processing chip, so as to implement the steps of the above method for establishing a face index executed by the processing server.

The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is brief, and the relevant details can be found in the description of the method.

Those of skill in the art would further appreciate that the various illustrative units and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processing chip, or in a combination of the two. A software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method for establishing a face index, comprising:

acquiring at least one video;

2. The method for creating a face index according to claim 1, wherein the at least one video is a plurality of videos; the clustering the face features of the same face to obtain at least one first cluster comprises:

3. The method for establishing a face index according to claim 2, wherein the obtaining of the clustering result of each video comprises:

4. The method of claim 3, wherein the obtaining at least one second cluster corresponding to each video respectively comprises:

5. The method according to claim 3, wherein the clustering the face features of the same face with respect to the face features corresponding to the videos respectively comprises:

6. The method for establishing a face index according to any one of claims 1 to 5, wherein for any face feature to be clustered, the process of clustering the face features of the same face comprises:

7. The method of claim 6, wherein the detecting whether there is a target cluster in the obtained clusters, the similarity of which to the face features to be clustered meets a predetermined similarity requirement, comprises:

8. The method for establishing the face index according to claim 6, wherein for the face features to be clustered, the process of associating the face features with their video information in the corresponding video comprises:

9. The method of claim 6, further comprising:

and updating the corresponding radius and the centroid of the cluster to which the face features to be clustered are gathered.

10. The method of claim 1, wherein the video information of the face features in the corresponding video comprises:

11. The method of claim 4, wherein the video information of the face features in the corresponding video comprises at least: the video time points of the face features in the corresponding video;

and the deleting, for any video and after at least one cluster of the face features of the same face in the video is obtained, of the clusters in which the number of occurrences of the face features is smaller than a frequency threshold, to obtain at least one second cluster corresponding to the video, comprises:

12. The method for creating a face index according to claim 1, further comprising:

extracting the target face features of the target face image;

outputting the determined video information.

13. An apparatus for creating a face index, comprising:

a video acquisition module, configured to acquire at least one video;

14. A processing server, comprising: at least one memory storing a program and at least one processing chip invoking said program for implementing the steps of the method of creating a face index according to any one of claims 1-12.

15. A storage medium storing a program adapted to be executed by a processing chip to implement the steps of the method for creating a face index according to any one of claims 1 to 12.