CN110348362B - Label generation method, video processing method, device, electronic equipment and storage medium - Google Patents

Label generation method, video processing method, device, electronic equipment and storage medium

Info

Publication number
CN110348362B
CN110348362B CN201910604117.3A
Authority
CN
China
Prior art keywords
face feature
feature vector
identity
video
standard
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910604117.3A
Other languages
Chinese (zh)
Other versions
CN110348362A (en)
Inventor
杨帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN201910604117.3A priority Critical patent/CN110348362B/en
Publication of CN110348362A publication Critical patent/CN110348362A/en
Application granted granted Critical
Publication of CN110348362B publication Critical patent/CN110348362B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F18/22: Pattern recognition; Analysing; Matching criteria, e.g. proximity measures
    • G06N3/045: Computing arrangements based on biological models; Neural networks; Architecture, e.g. interconnection topology; Combinations of networks
    • G06N3/08: Computing arrangements based on biological models; Neural networks; Learning methods
    • G06V40/168: Human faces, e.g. facial parts, sketches or expressions; Feature extraction; Face representation
    • G06V40/172: Human faces, e.g. facial parts, sketches or expressions; Classification, e.g. identification
    • H04N21/44008: Processing of video elementary streams, involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/4666: Learning process for intelligent management, characterized by learning algorithms using neural networks, e.g. processing the feedback provided by the user
    • H04N21/4668: Learning process for intelligent management, for recommending content, e.g. movies
    • H04N21/84: Generation or processing of descriptive data, e.g. content descriptors

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure relates to a tag generation method, a video processing method, a tag generation apparatus, a video processing apparatus, an electronic device and a storage medium. The tag generation method includes the following steps: acquiring a video to be processed, and extracting a face feature vector corresponding to a person in the video to be processed; acquiring a plurality of standard face feature vectors and an identity tag corresponding to each standard face feature vector; respectively calculating the similarity between the face feature vector and each standard face feature vector; and determining the identity tags corresponding to at least two standard face feature vectors whose similarity is greater than a preset first similarity threshold and is the highest as the identity tags corresponding to the video to be processed. Because identity tags are generated for the video, whether the persons in different videos are the same can be determined from the identity tags of the videos, which provides a basis for services such as video recommendation. Moreover, whether the persons are the same can be determined by comparing the number of identical identity tags corresponding to different videos, which is simpler and more efficient than directly comparing face features.

Description

Label generation method, video processing method, device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of video processing technologies, and in particular, to a tag generation method, a video processing method, a tag generation device, a video processing device, an electronic device, and a storage medium.
Background
With the rapid development of Internet technology, users increasingly rely on the network to obtain information. To meet users' demand for watching videos, numerous video websites have emerged.
Video websites usually attach tags to videos to support video recommendation and other services. A video tag describes information related to the video. In the related art, when the tag is derived from a person appearing in the video, it is generally limited to the person's sex and age.
However, such video tags carry limited information and cannot accurately describe the identity of the persons in the video, so it is impossible to judge from the tags whether the persons in different videos are the same.
Disclosure of Invention
The disclosure provides a tag generation method, a video processing method, a tag generation apparatus, a video processing apparatus, an electronic device and a storage medium, and aims to at least solve the problems in the related art that video tag information is not comprehensive and cannot accurately describe the identity information of persons in a video. The technical solution of the disclosure is as follows:
according to a first aspect of the embodiments of the present disclosure, there is provided a tag generation method, including:
acquiring a video to be processed, and extracting a face feature vector corresponding to a person in the video to be processed;
acquiring a plurality of standard face feature vectors and an identity label corresponding to each standard face feature vector;
respectively calculating the similarity between the face feature vector and each standard face feature vector;
and determining the identity labels corresponding to at least two standard face feature vectors whose similarity is greater than a preset first similarity threshold and is the highest as the identity labels corresponding to the video to be processed.
Optionally, the step of extracting the face feature vector corresponding to the person in the video to be processed includes: inputting an image containing a person in the video to be processed into a preset neural network model; and taking a vector formed by the face features output by the second last layer of the neural network model as a face feature vector corresponding to the person in the video to be processed.
Optionally, the step of obtaining a plurality of standard face feature vectors and an identity tag corresponding to each standard face feature vector includes: and acquiring a plurality of standard face feature vectors and identity labels corresponding to the standard face feature vectors from a pre-generated standard face library.
Optionally, the standard face library is generated by: obtaining a plurality of sample images, and extracting sample face feature vectors corresponding to people in the sample images; the sample image is provided with a corresponding identity label, and the identity label is used as an identity label corresponding to the sample face feature vector; randomly selecting a sample face feature vector as a standard face feature vector, and adding the standard face feature vector and a corresponding identity label to a standard face library; respectively calculating the similarity between the sample face feature vector and each standard face feature vector added to the standard face library aiming at each residual sample face feature vector; and taking the sample face feature vectors with the similarity smaller than a preset second similarity threshold value as standard face feature vectors, and adding the standard face feature vectors and the corresponding identity labels to the standard face library.
According to a second aspect of the embodiments of the present disclosure, there is provided a video processing method, including:
acquiring a video to be compared, and acquiring an identity tag corresponding to the video to be compared; wherein the identity tag is generated using the tag generation method as described above;
acquiring the number of identical identity tags in the identity tags corresponding to every two videos to be compared;
and when the number of the same identity tags exceeds a preset number threshold value, determining that the characters in the two videos to be compared are the same.
According to a third aspect of the embodiments of the present disclosure, there is provided a tag generation apparatus including:
the first extraction module is configured to acquire a video to be processed and extract a face feature vector corresponding to a person in the video to be processed;
the first acquisition module is configured to acquire a plurality of standard human face feature vectors and identity labels corresponding to the standard human face feature vectors;
a first calculation module configured to perform respective calculation of similarity of the face feature vector and each standard face feature vector;
the first determining module is configured to determine, as the identity tag corresponding to the video to be processed, the identity tag corresponding to at least two standard human face feature vectors with the similarity greater than a preset first similarity threshold and the highest similarity.
Optionally, the first extraction module includes: the input unit is configured to input one frame of image containing a person in the video to be processed into a preset neural network model; and taking a vector formed by the face features output by the second last layer of the neural network model as a face feature vector corresponding to the person in the video to be processed.
Optionally, the first obtaining module is configured to specifically perform obtaining, from a pre-generated standard face library, a plurality of standard face feature vectors and an identity tag corresponding to each standard face feature vector.
Optionally, the standard face library is generated by the following modules: a second extraction module configured to acquire a plurality of sample images and extract sample face feature vectors corresponding to the persons in the sample images, where each sample image has a corresponding identity tag, which is used as the identity tag corresponding to the sample face feature vector; a first adding module configured to randomly select one sample face feature vector as a standard face feature vector, and add the standard face feature vector and the corresponding identity tag to the standard face library; a second calculation module configured to calculate, for each remaining sample face feature vector, the similarity between the sample face feature vector and each standard face feature vector already added to the standard face library; and a second adding module configured to take the sample face feature vectors whose similarity is smaller than a preset second similarity threshold as standard face feature vectors, and add them, together with the corresponding identity tags, to the standard face library.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a video processing apparatus including:
the second acquisition module is used for acquiring the video to be compared and acquiring the identity tag corresponding to the video to be compared; wherein the identity tag is generated using a tag generation apparatus as described above;
the third acquisition module is used for acquiring the number of the same identity tags in the identity tags corresponding to every two videos to be compared;
and the second determining module is used for determining that the people in the two videos to be compared are the same when the number of the same identity tags exceeds a preset number threshold.
According to a fifth aspect of the embodiments of the present disclosure, there is provided a server, including: a processor; a memory for storing the processor-executable instructions; wherein the processor is configured to execute the instructions to implement the tag generation method, and/or the video processing method, as described above.
According to a sixth aspect of embodiments of the present disclosure, there is provided a storage medium having instructions that, when executed by a processor of an electronic device, enable the electronic device to perform the tag generation method, and/or the video processing method, as described above.
According to a seventh aspect of embodiments of the present disclosure, there is provided a computer program product comprising readable program code which, when run on a computing device, can cause the computing device to perform the tag generation method, and/or the video processing method, as described above.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
In the embodiment of the disclosure, a video to be processed is obtained, and a face feature vector corresponding to a person in the video to be processed is extracted; a plurality of standard face feature vectors and an identity tag corresponding to each standard face feature vector are obtained; the similarity between the face feature vector and each standard face feature vector is calculated; and the identity tags corresponding to at least two standard face feature vectors whose similarity is greater than a preset first similarity threshold and is the highest are determined as the identity tags corresponding to the video to be processed. In other words, identity tags are generated for the video, and the identity tags indicate the identity information of the persons in the video, so whether the persons in different videos are the same can be determined from the identity tags of the videos, which provides a basis for services such as video recommendation. Furthermore, one video corresponds to a plurality of identity tags, so when checking whether the persons in different videos are the same, it suffices to compare the number of identical identity tags corresponding to the videos; compared with directly comparing face features, this is simpler, more convenient and more efficient.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a flow diagram illustrating a method of tag generation according to an exemplary embodiment.
FIG. 2 is a flow diagram illustrating a method of tag generation according to an example embodiment.
Fig. 3 is a flow chart illustrating a video processing method according to an exemplary embodiment.
Fig. 4 is a block diagram illustrating a label generation apparatus according to an example embodiment.
Fig. 5 is a block diagram illustrating a video processing apparatus according to an example embodiment.
FIG. 6 is a block diagram illustrating an apparatus in accordance with an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
FIG. 1 is a flow diagram illustrating a method of tag generation according to an exemplary embodiment. As shown in fig. 1, the tag generation method may be used in a server, including the following steps.
In step S11, a video to be processed is obtained, and a face feature vector corresponding to a person in the video to be processed is extracted.
The video to be processed refers to any video for which a tag needs to be generated. For example, every video in the video library of a video website may be used as a video to be processed.
In this embodiment, for the video to be processed, an identity tag corresponding to the video to be processed is generated. The identity label can indicate the identity information of people in the video to be processed, and the identity information can be recognized based on the human face characteristics. Therefore, in the embodiment, the face feature vector corresponding to the person in the video to be processed is extracted. For example, a neural network model with a function of recognizing human face features may be used to recognize a video to be processed, so as to extract a human face feature vector corresponding to a person therein. Specific procedures will be described in detail in the following examples.
In step S12, a plurality of standard face feature vectors and an identity tag corresponding to each standard face feature vector are obtained.
A standard face feature vector is a face feature vector corresponding to a representative person. The representative persons act like basis vectors in the face feature space: a person appearing in a video can be characterized in terms of the representative persons he or she resembles. For example, a representative person may be a person who appears frequently in videos.
In this embodiment, a plurality of standard face feature vectors may be collected in advance, and an identity tag corresponding to each standard face feature vector may be marked and stored.
In step S13, the similarity between the face feature vector and each standard face feature vector is calculated.
In step S14, the identity tags corresponding to the at least two standard face feature vectors with the similarity greater than the preset first similarity threshold and the highest similarity are determined as the identity tags corresponding to the to-be-processed video.
If the similarity between the face feature vector corresponding to the person in the video to be processed and a certain standard face feature vector is greater than the preset first similarity threshold, the two face feature vectors are highly similar, and therefore so are the corresponding persons.
The similarity between the face feature vector corresponding to the person in the video to be processed and several standard face feature vectors may all exceed the preset first similarity threshold. In this embodiment, at least two standard face feature vectors with the highest similarity are selected, and the identity tags corresponding to the selected standard face feature vectors are determined as the identity tags corresponding to the video to be processed. Therefore, one video to be processed may have at least two identity tags, indicating that the person in the video to be processed is most similar to the persons corresponding to these identity tags.
For example, identity tags corresponding to 50 standard face feature vectors with the highest similarity may be selected as identity tags corresponding to the video to be processed, and if the number of the standard face feature vectors with the similarity greater than the preset first similarity threshold is less than 50, identity tags corresponding to all the standard face feature vectors with the similarity greater than the preset first similarity threshold are selected as identity tags corresponding to the video to be processed.
For the specific numerical value of the first similarity threshold, a person skilled in the art may set any suitable value according to practical situations, which is not limited in this embodiment. For example, the first similarity threshold may be set to 0.5, 0.6, 0.7, and so on.
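To make steps S13 and S14 concrete, the following minimal sketch (in Python, not part of the patent) scores one face feature vector against every standard face feature vector and keeps the identity tags of the most similar vectors above the threshold; the use of cosine similarity, the function name and the default values 0.5 and 50 are assumptions drawn from the examples in this description, not a definitive implementation.

    import numpy as np

    def select_identity_tags(face_vec, standard_vecs, standard_tags,
                             sim_threshold=0.5, top_k=50):
        """Illustrative sketch of steps S13-S14. face_vec is a 1-D NumPy array,
        standard_vecs an (N, d) NumPy array, standard_tags a list of N tags.
        Returns the tags of the top_k most similar standard vectors whose
        similarity exceeds sim_threshold (cosine similarity is assumed)."""
        face_vec = face_vec / np.linalg.norm(face_vec)
        standard_vecs = standard_vecs / np.linalg.norm(standard_vecs, axis=1, keepdims=True)
        sims = standard_vecs @ face_vec                   # one similarity per standard vector
        candidates = np.where(sims > sim_threshold)[0]    # above the first similarity threshold
        ranked = candidates[np.argsort(sims[candidates])[::-1]]
        return [standard_tags[i] for i in ranked[:top_k]]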
In the embodiment of the disclosure, identity tags are generated for the video, and the identity tags indicate the identity information of the persons in the video, so whether the persons in different videos are the same can be determined from the identity tags of the videos, which provides a basis for services such as video recommendation. One video corresponds to a plurality of identity tags, so whether the persons in different videos are the same can be determined by comparing the number of identical identity tags corresponding to the videos; compared with directly comparing face features, this is simpler, more convenient and more efficient.
FIG. 2 is a flowchart illustrating a method of tag generation, according to an example embodiment. As shown in fig. 2, the tag generation method includes the following steps.
In step S21, a neural network model is generated.
In this embodiment, a neural network model for extracting a face feature vector may be generated.
When training the model, a large number of sample images containing persons can be obtained from the Internet, and the identity tag corresponding to each sample image is annotated manually, with each identity tag treated as one class. The identity tag can be a unique identifier of the person in the sample image, such as the person's unique account number or unique serial number. Among the sample images there are multiple sample images corresponding to the same identity tag. For example, 500,000 sample images may be obtained, corresponding to a total of 10,000 identity tags.
In an implementation, the neural network model may adopt a ResNet (residual network) model. In a ResNet model the network is fitted not to the original mapping directly but to a residual mapping. The ResNet model may include convolutional layers, a fully-connected layer, and so on, where the convolutional layers extract features and the fully-connected layer classifies the extracted features. Therefore, a sample image is input into the ResNet model, the face features in the sample image are extracted by the convolutional layers, the extracted face features are passed to the fully-connected layer, and the fully-connected layer classifies them. The neural network model is trained with the plurality of sample images and their corresponding identity tags, and the final neural network model is obtained after training.
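A rough sketch of such a training setup is given below; the ResNet-50 backbone, the number of identities, the optimizer and the hyperparameters are illustrative assumptions, since the disclosure only specifies a ResNet whose convolutional layers extract features and whose fully-connected layer classifies them.

    import torch
    import torch.nn as nn
    import torchvision

    # Hypothetical training sketch: a ResNet classifier over the identity tags.
    num_identities = 10_000                       # e.g. 10,000 identity tags
    model = torchvision.models.resnet50(num_classes=num_identities)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

    def train_one_epoch(loader):
        """One pass over a DataLoader yielding (image batch, identity-tag id batch)."""
        model.train()
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()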
In step S22, a standard face library is generated.
In this embodiment, a standard face library may be generated, where the standard face library is used to store standard face feature vectors and corresponding identity labels. The process of generating the standard face library comprises the following steps A1-A4:
a1, obtaining a plurality of sample images, and extracting sample face feature vectors corresponding to people in the sample images.
The sample images in this step may be the same sample images used for generating the neural network model, or may be a new batch of sample images of frequently appearing persons obtained from the Internet.
And extracting a face feature vector corresponding to a person in each sample image as a sample face feature vector. In implementation, the neural network model generated in step S21 is used to extract a face feature vector corresponding to a person in the sample image.
The process of extracting a face feature vector may include: inputting the sample image into the preset neural network model; and taking the vector formed by the face features output by the second-to-last layer of the neural network model as the face feature vector corresponding to the person in the sample image. Because the convolutional layers of the neural network model extract features and the fully-connected layer classifies the extracted features, the face features output by the second-to-last layer (namely, the last convolutional layer) are taken here, and the vector formed by these features is used as the face feature vector corresponding to the person in the sample image. For example, the face feature vector may be a 1024-dimensional vector.
And A2, randomly selecting a sample face feature vector as a standard face feature vector, and adding the standard face feature vector and the corresponding identity label to a standard face library.
In implementation, an iterative method can be used to find the standard face feature vector. Firstly, a sample face feature vector is randomly selected as a standard face feature vector, and the standard face feature vector and a corresponding identity label are added to a standard face library. The sample image has a corresponding identity label, and the identity label is an identity label corresponding to a sample face feature vector corresponding to a person in the sample image.
And A3, respectively calculating the similarity between the sample face feature vector and each standard face feature vector added to the standard face library aiming at each residual sample face feature vector.
And A4, taking the sample face feature vectors with the similarity smaller than a preset second similarity threshold value as standard face feature vectors, and adding the standard face feature vectors and the corresponding identity labels to the standard face library.
For each remaining sample face feature vector, if its similarity to every standard face feature vector already added to the standard face library is smaller than the preset second similarity threshold, no similar standard face feature vector has yet been added to the library, so the sample face feature vector is taken as a standard face feature vector and added, together with its corresponding identity tag, to the standard face library. If its similarity to some standard face feature vector already added to the standard face library is greater than or equal to the preset second similarity threshold, a similar standard face feature vector has already been added to the library, and the sample face feature vector does not need to be added.
For the specific value of the second similarity threshold, a person skilled in the art may set any suitable value according to the actual situation, and the embodiment does not limit this. For example, the second similarity threshold may be set to 0.4, 0.5, 0.6, etc. The second similarity threshold may be the same as or different from the first similarity threshold.
In an alternative embodiment, the similarity of two feature vectors can be measured by the cosine distance between them. The cosine distance uses the cosine of the angle between two vectors in the vector space as a measure of how much two individuals differ. The larger the cosine distance between two feature vectors, the more similar the two feature vectors are. Thus, if similarity is measured by cosine distance, a second distance threshold may be set. When the cosine distance between a sample face feature vector and a standard face feature vector is smaller than the second distance threshold, it is determined that the similarity between them is smaller than the second similarity threshold.
For the specific value of the second distance threshold, a person skilled in the art may set any suitable value according to the actual situation, and the embodiment does not limit this. For example, the second distance threshold may be set to 0.4, 0.5, 0.6, etc.
Suppose that the two feature vectors are f(x_i) and f(x_j), where i and j are natural numbers; then the cosine distance between the two feature vectors is:

cos(f(x_i), f(x_j)) = (f(x_i) · f(x_j)) / (||f(x_i)|| · ||f(x_j)||)
Of course, the similarity between two feature vectors may also be measured in other ways, such as the Euclidean distance, Mahalanobis distance, or Manhattan distance, which is not limited in this embodiment.
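Putting steps A1 to A4 together with the cosine measure above, one possible sketch of the library construction is as follows; the greedy loop, the function names and the threshold value are illustrative assumptions rather than the patent's own implementation, and the sample face feature vectors are assumed to have already been extracted in step A1.

    import random
    import numpy as np

    def cosine_similarity(a, b):
        """Cosine of the angle between two feature vectors (the measure above)."""
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def build_standard_face_library(sample_vectors, sample_tags, second_threshold=0.5):
        """Illustrative sketch of steps A1-A4: starting from a random sample,
        keep a sample face feature vector only if it is not yet similar to any
        vector already in the library."""
        order = list(range(len(sample_vectors)))
        random.shuffle(order)                      # A2: the first kept vector is random
        library = []                               # list of (standard vector, identity tag)
        for i in order:
            vec, tag = sample_vectors[i], sample_tags[i]
            # A3/A4: add only if similarity to every stored vector is below threshold
            if all(cosine_similarity(vec, std_vec) < second_threshold
                   for std_vec, _ in library):
                library.append((vec, tag))
        return library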
In step S23, a video to be processed is obtained, and a face feature vector corresponding to a person in the video to be processed is extracted.
And extracting the face characteristic vector corresponding to the character in the video to be processed aiming at the video to be processed. In implementation, the neural network model generated in step S21 is used to extract a face feature vector corresponding to a person in the video to be processed.
The process of extracting the face feature vector corresponding to the person in the video to be processed may include: inputting one frame of image containing a person from the video to be processed into the preset neural network model; and taking the vector formed by the face features output by the second-to-last layer of the neural network model as the face feature vector corresponding to the person in the video to be processed. The frame containing a person may be a frame randomly extracted from the video to be processed.
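A minimal sketch of this step is shown below, under the assumptions that the trained model is a torchvision ResNet as sketched for step S21, that OpenCV decodes the video, and that the randomly chosen frame already contains the person (face detection is omitted); the helper names, the preprocessing and the 2048-dimensional output of a ResNet-50 backbone (rather than the 1024 dimensions mentioned earlier as an example) are illustrative details only.

    import random
    import cv2
    import torch
    import torch.nn as nn
    import torchvision

    # Keep everything up to the global pooling layer of the trained backbone and
    # drop the final fully-connected classifier, so the output is the vector of
    # features from the second-to-last layer (loading of trained weights omitted).
    backbone = torchvision.models.resnet50(num_classes=10_000)
    feature_extractor = nn.Sequential(*list(backbone.children())[:-1])

    @torch.no_grad()
    def face_feature_vector(image_tensor):            # (3, H, W), RGB, values in 0..1
        feature_extractor.eval()
        feats = feature_extractor(image_tensor.unsqueeze(0))   # shape (1, 2048, 1, 1)
        return feats.flatten()                                  # the face feature vector

    def video_face_feature_vector(video_path):
        """Randomly pick one frame of the video and extract its face feature vector."""
        capture = cv2.VideoCapture(video_path)
        frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
        capture.set(cv2.CAP_PROP_POS_FRAMES, random.randrange(max(frame_count, 1)))
        ok, frame = capture.read()
        capture.release()
        if not ok:
            raise ValueError("could not read a frame from " + video_path)
        frame = cv2.resize(frame, (224, 224))                  # illustrative preprocessing
        tensor = torch.from_numpy(frame[:, :, ::-1].copy()).permute(2, 0, 1).float() / 255.0
        return face_feature_vector(tensor)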
In step S24, a plurality of standard face feature vectors and an identity tag corresponding to each standard face feature vector are obtained from a pre-generated standard face library.
As described above, the standard face library stores a plurality of standard face feature vectors and an identity tag corresponding to each standard face feature vector.
In step S25, the similarity between the face feature vector and each standard face feature vector is calculated.
In step S26, the identity tags corresponding to the at least two standard face feature vectors with the similarity greater than the preset first similarity threshold and the highest similarity are determined as the identity tags corresponding to the to-be-processed video.
If the similarity between the face feature vector corresponding to the person in the video to be processed and a standard face feature vector is greater than the preset first similarity threshold, the person in the video to be processed is highly similar to the person corresponding to that standard face feature vector.
Similar to step S22 above, the similarity between two feature vectors may be measured by the cosine distance between them. Thus, if similarity is measured by cosine distance, a first distance threshold may be set. When the cosine distance between the face feature vector corresponding to the person in the video to be processed and a standard face feature vector is greater than the first distance threshold, it is determined that the similarity between the two is greater than the first similarity threshold.
For the specific value of the first distance threshold, a person skilled in the art may set any suitable value according to the actual situation, and the embodiment does not limit this. For example, the first distance threshold may be set to 0.5, 0.6, 0.7, etc.
Fig. 3 is a flow chart illustrating a video processing method according to an exemplary embodiment. As shown in fig. 3, the video processing method may be used in a server, including the following steps.
In step S31, a video to be compared is obtained, and an identity tag corresponding to the video to be compared is obtained.
In this embodiment, after identity tags have been generated for videos with the tag generation method described above, the identity tags are used to compare whether the persons in two videos are the same.
A video to be compared is a video for which it needs to be determined whether its person is the same as the person in another video. For example, during video recommendation, in order to avoid recommending several videos featuring the same person, the candidate videos containing the same person may be screened out and only one of them recommended; in this scenario all candidate videos are videos to be compared. The identity tags corresponding to the videos to be compared are generated with the tag generation method described above, and each video to be compared corresponds to at least two identity tags.
In step S32, the number of the same identity tags in the identity tags corresponding to each two videos to be compared is obtained.
An identity tag of a video to be compared indicates that the person in the video is highly similar to the person corresponding to that tag. Therefore, if two different videos share some identity tags, the persons in the two videos are both highly similar to the persons corresponding to those shared tags.
Therefore, in this embodiment, for every two videos to be compared, the identity tags corresponding to the two videos to be compared are compared, and the number of the same identity tags in the identity tags corresponding to the two videos to be compared is obtained.
In step S33, when the number of identical identity tags exceeds a preset number threshold, it is determined that the people in the two videos to be compared are identical.
The more identical identity tags there are among the identity tags corresponding to the two videos to be compared, the more similar the persons in the two videos are. Therefore, when the number of identical identity tags exceeds a preset number threshold, the persons in the two videos to be compared can be considered the same. A person skilled in the art may set the number threshold to any suitable value according to the actual situation, which is not limited in this embodiment. For example, the number threshold may be set to 5, 10, 15, and so on.
With this comparison method, whether the persons in two videos are the same can be determined in a short time. The computation is small: at most about 2,500 pairwise tag comparisons (50 tags per video), and roughly O(50·log 50) comparisons after optimization. By contrast, directly comparing face features costs about 1024 multiply-add operations; multiply-add operations are more complex and slower than tag comparisons, so the algorithm provided by the disclosure offers high timeliness in time-sensitive scenarios such as video recommendation.
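The tag comparison itself reduces to a set intersection; the short sketch below illustrates steps S31 to S33, where the tag values and the threshold are illustrative assumptions.

    def same_person(tags_a, tags_b, count_threshold=10):
        """Sketch of steps S32-S33: two videos are judged to show the same person
        when their identity tags share more than count_threshold entries.
        The threshold 10 is only one of the example values mentioned above."""
        shared = set(tags_a) & set(tags_b)          # identical identity tags
        return len(shared) > count_threshold

    # Hypothetical usage with tags produced by the tag generation method:
    video_a_tags = ["id_0012", "id_0487", "id_0930", "id_1541"]
    video_b_tags = ["id_0487", "id_0930", "id_2277"]
    print(same_person(video_a_tags, video_b_tags))  # False: only 2 shared tags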
Fig. 4 is a block diagram illustrating a tag generation apparatus according to an example embodiment. Referring to fig. 4, the apparatus includes a first extraction module 401, a first acquisition module 402, a first calculation module 403, and a first determination module 404.
The first extraction module 401 is configured to perform acquiring a video to be processed, and extract a face feature vector corresponding to a person in the video to be processed.
The first obtaining module 402 is configured to obtain a plurality of standard face feature vectors and an identity tag corresponding to each standard face feature vector.
A first calculating module 403 configured to perform calculating the similarity of the face feature vector and each standard face feature vector respectively.
The first determining module 404 is configured to determine, as the identity tag corresponding to the video to be processed, the identity tag corresponding to at least two standard face feature vectors whose similarity is greater than a preset first similarity threshold and whose similarity is the highest.
In an alternative embodiment, the first extraction module 401 includes: the input unit is configured to input one frame of image containing a person in the video to be processed into a preset neural network model; and taking a vector formed by the face features output by the second last layer of the neural network model as a face feature vector corresponding to the person in the video to be processed.
In an optional implementation manner, the first obtaining module 402 is configured to specifically obtain a plurality of standard face feature vectors and an identity tag corresponding to each standard face feature vector from a pre-generated standard face library.
In an alternative embodiment, the standard face library is generated by the following modules: a second extraction module configured to acquire a plurality of sample images and extract sample face feature vectors corresponding to the persons in the sample images, where each sample image has a corresponding identity tag, which is used as the identity tag corresponding to the sample face feature vector; a first adding module configured to randomly select one sample face feature vector as a standard face feature vector, and add the standard face feature vector and the corresponding identity tag to the standard face library; a second calculation module configured to calculate, for each remaining sample face feature vector, the similarity between the sample face feature vector and each standard face feature vector already added to the standard face library; and a second adding module configured to take the sample face feature vectors whose similarity is smaller than a preset second similarity threshold as standard face feature vectors, and add them, together with the corresponding identity tags, to the standard face library.
Fig. 5 is a block diagram illustrating a video processing device according to an example embodiment. Referring to fig. 5, the apparatus includes a second obtaining module 501, a third obtaining module 502 and a second determining module 503.
The second obtaining module 501 is configured to obtain a video to be compared, and obtain an identity tag corresponding to the video to be compared. Wherein the identity tag is generated using the apparatus shown in figure 4.
A third obtaining module 502, configured to obtain the number of the same identity tags in the identity tags corresponding to every two videos to be compared.
A second determining module 503, configured to determine that the people in the two videos to be compared are the same when the number of the same identity tags exceeds a preset number threshold.
In the embodiment of the disclosure, identity tags are generated for the video, and the identity tags indicate the identity information of the persons in the video, so whether the persons in different videos are the same can be determined from the identity tags of the videos, which provides a basis for services such as video recommendation. Moreover, one video corresponds to a plurality of identity tags, so whether the persons in different videos are the same can be determined by comparing the number of identical identity tags corresponding to the videos; compared with directly comparing the persons' features, this is simpler, more convenient and more efficient.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 6 is a block diagram illustrating an apparatus 600 for tag generation and/or video processing, according to an example embodiment. For example, the apparatus 600 may be provided as a server.
Referring to fig. 6, the apparatus 600 includes a processing component 622 that further includes one or more processors and memory resources, represented by memory 632, for storing instructions, such as applications, that are executable by the processing component 622. The application programs stored in memory 632 may include one or more modules that each correspond to a set of instructions. Further, the processing component 622 is configured to execute instructions to perform the above-described methods.
The apparatus 600 may also include a power component 626 configured to perform power management of the apparatus 600, a wired or wireless network interface 650 configured to connect the apparatus 600 to a network, and an input/output (I/O) interface 658. The apparatus 600 may operate based on an operating system stored in the memory 632, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, there is also provided a storage medium comprising instructions, such as a memory comprising instructions, executable by a processor of an apparatus for tag generation, and/or video processing, to perform the above method. Alternatively, the storage medium may be a non-transitory computer readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product is also provided, which comprises readable program code executable by a processor of an apparatus for tag generation and/or video processing to perform the above method. Alternatively, the program code may be stored in a storage medium of the apparatus for tag generation and/or video processing, which may be a non-transitory computer-readable storage medium, for example, a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements that have been described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (6)

1. A tag generation method, comprising:
acquiring a video to be processed, and extracting a face feature vector corresponding to a person in the video to be processed;
acquiring a plurality of standard face feature vectors and an identity label corresponding to each standard face feature vector; the identity label is a unique identifier of the standard human face feature vector;
respectively calculating the similarity of the face feature vector and each standard face feature vector;
determining identity labels corresponding to at least two standard face feature vectors with similarity greater than a preset first similarity threshold and highest similarity as identity labels corresponding to the videos to be processed, so that when the number of the same identity labels corresponding to the two videos exceeds a preset number threshold, a server determines that people in the two videos are the same;
the step of obtaining a plurality of standard face feature vectors and an identity label corresponding to each standard face feature vector comprises the following steps:
acquiring a plurality of standard face feature vectors and an identity label corresponding to each standard face feature vector from a pre-generated standard face library;
the standard face library is generated by the following steps:
obtaining a plurality of sample images, and extracting sample face feature vectors corresponding to people in the sample images; the sample image is provided with a corresponding identity label, and the identity label is used as an identity label corresponding to the sample face feature vector;
randomly selecting a sample face feature vector as a standard face feature vector, and adding the standard face feature vector and a corresponding identity label to a standard face library;
respectively calculating the similarity between the sample face feature vector and each standard face feature vector added to the standard face library aiming at each residual sample face feature vector;
and taking the sample face feature vectors with the similarity smaller than a preset second similarity threshold value as standard face feature vectors, and adding the standard face feature vectors and the corresponding identity labels to the standard face library.
2. The tag generation method according to claim 1, wherein the step of extracting the face feature vector corresponding to the person in the video to be processed comprises:
inputting an image containing a person in one frame of the video to be processed into a preset neural network model;
and taking a vector formed by the face features output by the second last layer of the neural network model as a face feature vector corresponding to the person in the video to be processed.
3. A label generation apparatus, comprising:
the first extraction module is configured to acquire a video to be processed and extract a face feature vector corresponding to a person in the video to be processed;
the first acquisition module is configured to acquire a plurality of standard face feature vectors and identity labels corresponding to the standard face feature vectors; the identity label is a unique identifier of the standard human face feature vector;
a first calculation module configured to perform respective calculation of similarity of the face feature vector and each standard face feature vector;
the first determining module is configured to determine identity tags corresponding to at least two standard face feature vectors with similarity greater than a preset first similarity threshold and highest similarity as identity tags corresponding to the videos to be processed, so that when the number of the same identity tags corresponding to two videos exceeds a preset number threshold, the server determines that people in the two videos are the same;
the first acquisition module is configured to specifically execute acquisition of a plurality of standard face feature vectors and an identity tag corresponding to each standard face feature vector from a pre-generated standard face library;
the standard face library is generated by the following modules:
the second extraction module is configured to acquire a plurality of sample images and extract sample face feature vectors corresponding to people in the sample images; the sample image is provided with a corresponding identity label, and the identity label is used as an identity label corresponding to the sample face feature vector;
the system comprises a first adding module, a second adding module and a third adding module, wherein the first adding module is configured to execute random selection of a sample face feature vector as a standard face feature vector, and add the standard face feature vector and a corresponding identity label to a standard face library;
a second calculation module configured to perform, for each of the remaining sample face feature vectors, calculating a similarity between the sample face feature vector and each of the standard face feature vectors added to the standard face library, respectively;
and the second adding module is configured to perform the steps of taking the sample face feature vectors with the similarity smaller than a preset second similarity threshold value as standard face feature vectors, and adding the standard face feature vectors and the corresponding identity labels to the standard face library.
4. The tag generation apparatus according to claim 3, wherein the first extraction module includes:
the input unit is configured to input one frame of image containing a person in the video to be processed into a preset neural network model;
and taking a vector formed by the face features output by the second last layer of the neural network model as a face feature vector corresponding to the person in the video to be processed.
5. A server, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the label generation method of any of claims 1 to 2.
6. A storage medium in which instructions, when executed by a processor of a server, enable the server to perform the label generation method of any one of claims 1 to 2.
CN201910604117.3A 2019-07-05 2019-07-05 Label generation method, video processing method, device, electronic equipment and storage medium Active CN110348362B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910604117.3A CN110348362B (en) 2019-07-05 2019-07-05 Label generation method, video processing method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910604117.3A CN110348362B (en) 2019-07-05 2019-07-05 Label generation method, video processing method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110348362A CN110348362A (en) 2019-10-18
CN110348362B true CN110348362B (en) 2022-10-28

Family

ID=68177897

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910604117.3A Active CN110348362B (en) 2019-07-05 2019-07-05 Label generation method, video processing method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110348362B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110675433A (en) 2019-10-31 2020-01-10 北京达佳互联信息技术有限公司 Video processing method and device, electronic equipment and storage medium
CN110866491B (en) * 2019-11-13 2023-11-24 腾讯科技(深圳)有限公司 Target retrieval method, apparatus, computer-readable storage medium, and computer device
CN111353300B (en) * 2020-02-14 2023-09-01 中科天玑数据科技股份有限公司 Data set construction and related information acquisition method and device
CN113365102B (en) * 2020-03-04 2022-08-16 阿里巴巴集团控股有限公司 Video processing method and device and label processing method and device
CN111444366B (en) * 2020-04-10 2024-02-20 Oppo广东移动通信有限公司 Image classification method, device, storage medium and electronic equipment
CN111488936B (en) * 2020-04-14 2023-07-28 深圳力维智联技术有限公司 Feature fusion method and device and storage medium
CN111708988B (en) * 2020-05-15 2023-05-30 北京奇艺世纪科技有限公司 Infringement video identification method and device, electronic equipment and storage medium
CN112163122B (en) * 2020-10-30 2024-02-06 腾讯科技(深圳)有限公司 Method, device, computing equipment and storage medium for determining label of target video

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20150093058A (en) * 2014-02-06 2015-08-17 주식회사 에스원 Method and apparatus for recognizing face
WO2017024506A1 (en) * 2015-08-11 2017-02-16 常平 Method for prompting information and system for pushing advertisement when inserting advertisement before playing video
CN108764611A (en) * 2018-04-12 2018-11-06 合肥指南针电子科技有限责任公司 A kind of public security equipment intellectualized management system

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108932254A (en) * 2017-05-25 2018-12-04 中兴通讯股份有限公司 A kind of detection method of similar video, equipment, system and storage medium
CN107423678A (en) * 2017-05-27 2017-12-01 电子科技大学 A kind of training method and face identification method of the convolutional neural networks for extracting feature
CN109726619A (en) * 2017-10-31 2019-05-07 深圳市祈飞科技有限公司 A kind of convolutional neural networks face identification method and system based on parameter sharing
CN108846694A (en) * 2018-06-06 2018-11-20 厦门集微科技有限公司 A kind of elevator card put-on method and device, computer readable storage medium
CN109598190A (en) * 2018-10-23 2019-04-09 深圳壹账通智能科技有限公司 Method, apparatus, computer equipment and storage medium for action recognition
CN109598211A (en) * 2018-11-16 2019-04-09 恒安嘉新(北京)科技股份公司 A kind of real-time dynamic human face recognition methods and system
CN109523325A (en) * 2018-11-29 2019-03-26 成都睿码科技有限责任公司 A kind of targetedly self-regulation advertisement delivery system based on recognition of face

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20150093058A (en) * 2014-02-06 2015-08-17 주식회사 에스원 Method and apparatus for recognizing face
WO2017024506A1 (en) * 2015-08-11 2017-02-16 常平 Method for prompting information and system for pushing advertisement when inserting advertisement before playing video
CN108764611A (en) * 2018-04-12 2018-11-06 合肥指南针电子科技有限责任公司 A kind of public security equipment intellectualized management system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Matching faces with textual cues in soccer videos; Bertini M et al.; 2006 IEEE International Conference on Multimedia and Expo; 2006-12-31; pp. 537-540 *
Face annotation of news images based on multimodal information fusion; 征察 et al.; Journal of Computer Applications (计算机应用); 2017-12-31; Vol. 37, No. 10; pp. 3006-3010 *
Research on pedestrian re-identification technology in intelligent visual surveillance; 四建楼; China Doctoral Dissertations Full-text Database; 2018-09-15; pp. I138-33 *

Also Published As

Publication number Publication date
CN110348362A (en) 2019-10-18

Similar Documents

Publication Publication Date Title
CN110348362B (en) Label generation method, video processing method, device, electronic equipment and storage medium
CN110362677B (en) Text data category identification method and device, storage medium and computer equipment
US8064641B2 (en) System and method for identifying objects in video
CN106547744B (en) Image retrieval method and system
CN110909182B (en) Multimedia resource searching method, device, computer equipment and storage medium
CN108491794B (en) Face recognition method and device
CN106575280B (en) System and method for analyzing user-associated images to produce non-user generated labels and utilizing the generated labels
CN111814817A (en) Video classification method and device, storage medium and electronic equipment
CN112364204A (en) Video searching method and device, computer equipment and storage medium
CN115331150A (en) Image recognition method, image recognition device, electronic equipment and storage medium
CN112966626A (en) Face recognition method and device
CN113128526B (en) Image recognition method and device, electronic equipment and computer-readable storage medium
CN113033507B (en) Scene recognition method and device, computer equipment and storage medium
CN113254687B (en) Image retrieval and image quantification model training method, device and storage medium
CN112070744B (en) Face recognition method, system, device and readable storage medium
CN108024148B (en) Behavior feature-based multimedia file identification method, processing method and device
CN114943549A (en) Advertisement delivery method and device
CN115599953A (en) Training method and retrieval method of video text retrieval model and related equipment
CN114048344A (en) Similar face searching method, device, equipment and readable storage medium
RU2708504C1 (en) Method of training goods recognition system on images
CN114443904A (en) Video query method, video query device, computer equipment and computer readable storage medium
CN113869099A (en) Image processing method and device, electronic equipment and storage medium
KR102060110B1 (en) Method, apparatus and computer program for classifying object in contents
JP2015097036A (en) Recommended image presentation apparatus and program
CN112115740A (en) Method and apparatus for processing image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant