CN110348362B - Label generation method, video processing method, device, electronic equipment and storage medium - Google Patents

Label generation method, video processing method, device, electronic equipment and storage medium

Info

Publication number
CN110348362B
CN110348362B CN201910604117.3A
Authority
CN
China
Prior art keywords
face feature
feature vector
identity
video
standard
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910604117.3A
Other languages
Chinese (zh)
Other versions
CN110348362A (en)
Inventor
杨帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Dajia Internet Information Technology Co Ltd
Original Assignee
Beijing Dajia Internet Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Dajia Internet Information Technology Co Ltd filed Critical Beijing Dajia Internet Information Technology Co Ltd
Priority to CN201910604117.3A priority Critical patent/CN110348362B/en
Publication of CN110348362A publication Critical patent/CN110348362A/en
Application granted granted Critical
Publication of CN110348362B publication Critical patent/CN110348362B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06F18/22: Pattern recognition; Analysing; Matching criteria, e.g. proximity measures
    • G06N3/045: Computing arrangements based on biological models; Neural networks; Architecture, e.g. interconnection topology; Combinations of networks
    • G06N3/08: Computing arrangements based on biological models; Neural networks; Learning methods
    • G06V40/168: Human faces, e.g. facial parts, sketches or expressions; Feature extraction; Face representation
    • G06V40/172: Human faces, e.g. facial parts, sketches or expressions; Classification, e.g. identification
    • H04N21/44008: Processing of video elementary streams, involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H04N21/4666: Learning process for intelligent management, characterized by learning algorithms using neural networks, e.g. processing the feedback provided by the user
    • H04N21/4668: Learning process for intelligent management, for recommending content, e.g. movies
    • H04N21/84: Generation or processing of descriptive data, e.g. content descriptors

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure relates to a tag generation method, a video processing method, a tag generation apparatus, a video processing apparatus, an electronic device and a storage medium. The tag generation method includes the following steps: acquiring a video to be processed, and extracting a face feature vector corresponding to a person in the video to be processed; acquiring a plurality of standard face feature vectors and an identity tag corresponding to each standard face feature vector; respectively calculating the similarity between the face feature vector and each standard face feature vector; and determining the identity tags corresponding to at least two standard face feature vectors whose similarity is greater than a preset first similarity threshold and is the highest as the identity tags corresponding to the video to be processed. Because identity tags are generated for the video, whether the persons in different videos are the same can be determined from the identity tags of the videos, which provides a basis for services such as video recommendation. Moreover, whether the persons are the same can be determined by comparing the number of identical identity tags corresponding to different videos, which is simpler and more efficient than directly comparing face features.

Description

Label generation method, video processing method, device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of video processing technologies, and in particular, to a tag generation method, a video processing method, a tag generation device, a video processing device, an electronic device, and a storage medium.
Background
With the rapid development of Internet technology, users increasingly rely on the network to obtain information. To meet users' demand for watching videos, numerous video websites have emerged.
Video websites usually attach tags to videos to support video recommendation and other services. A video tag describes information related to the video. In the related art, when the tag is derived from a person appearing in the video, it is generally limited to the person's sex and age.
However, such video tags carry limited information and cannot accurately describe the identity of the persons in the video, so it is impossible to judge from the tags whether the persons in different videos are the same.
Disclosure of Invention
The disclosure provides a tag generation method, a video processing method, a tag generation apparatus, a video processing apparatus, an electronic device and a storage medium, and aims to at least solve the problems in the related art that video tag information is not comprehensive and cannot accurately describe the identity information of persons in a video. The technical solution of the disclosure is as follows:
according to a first aspect of the embodiments of the present disclosure, there is provided a tag generation method, including:
acquiring a video to be processed, and extracting a face feature vector corresponding to a person in the video to be processed;
acquiring a plurality of standard face feature vectors and an identity label corresponding to each standard face feature vector;
respectively calculating the similarity between the face feature vector and each standard face feature vector;
and determining the identity labels corresponding to at least two standard face feature vectors whose similarity is greater than a preset first similarity threshold and is the highest as the identity labels corresponding to the video to be processed.
Optionally, the step of extracting the face feature vector corresponding to the person in the video to be processed includes: inputting an image containing a person in the video to be processed into a preset neural network model; and taking a vector formed by the face features output by the second last layer of the neural network model as a face feature vector corresponding to the person in the video to be processed.
Optionally, the step of obtaining a plurality of standard face feature vectors and an identity tag corresponding to each standard face feature vector includes: and acquiring a plurality of standard face feature vectors and identity labels corresponding to the standard face feature vectors from a pre-generated standard face library.
Optionally, the standard face library is generated by: obtaining a plurality of sample images, and extracting sample face feature vectors corresponding to people in the sample images; the sample image is provided with a corresponding identity label, and the identity label is used as an identity label corresponding to the sample face feature vector; randomly selecting a sample face feature vector as a standard face feature vector, and adding the standard face feature vector and a corresponding identity label to a standard face library; respectively calculating the similarity between the sample face feature vector and each standard face feature vector added to the standard face library aiming at each residual sample face feature vector; and taking the sample face feature vectors with the similarity smaller than a preset second similarity threshold value as standard face feature vectors, and adding the standard face feature vectors and the corresponding identity labels to the standard face library.
According to a second aspect of the embodiments of the present disclosure, there is provided a video processing method, including:
acquiring a video to be compared, and acquiring an identity tag corresponding to the video to be compared; wherein the identity tag is generated using the tag generation method as described above;
acquiring the number of identical identity tags in the identity tags corresponding to every two videos to be compared;
and when the number of the same identity tags exceeds a preset number threshold value, determining that the characters in the two videos to be compared are the same.
According to a third aspect of the embodiments of the present disclosure, there is provided a tag generation apparatus including:
the first extraction module is configured to acquire a video to be processed and extract a face feature vector corresponding to a person in the video to be processed;
the first acquisition module is configured to acquire a plurality of standard human face feature vectors and identity labels corresponding to the standard human face feature vectors;
a first calculation module configured to perform respective calculation of similarity of the face feature vector and each standard face feature vector;
the first determining module is configured to determine, as the identity tag corresponding to the video to be processed, the identity tag corresponding to at least two standard human face feature vectors with the similarity greater than a preset first similarity threshold and the highest similarity.
Optionally, the first extraction module includes: the input unit is configured to input one frame of image containing a person in the video to be processed into a preset neural network model; and taking a vector formed by the face features output by the second last layer of the neural network model as a face feature vector corresponding to the person in the video to be processed.
Optionally, the first obtaining module is configured to specifically perform obtaining, from a pre-generated standard face library, a plurality of standard face feature vectors and an identity tag corresponding to each standard face feature vector.
Optionally, the standard face library is generated by the following modules: a second extraction module configured to acquire a plurality of sample images and extract sample face feature vectors corresponding to the persons in the sample images, where each sample image has a corresponding identity tag, which is used as the identity tag corresponding to the sample face feature vector; a first adding module configured to randomly select one sample face feature vector as a standard face feature vector, and add the standard face feature vector and the corresponding identity tag to the standard face library; a second calculation module configured to calculate, for each remaining sample face feature vector, the similarity between the sample face feature vector and each standard face feature vector already added to the standard face library; and a second adding module configured to take the sample face feature vectors whose similarity is smaller than a preset second similarity threshold as standard face feature vectors, and add them, together with the corresponding identity tags, to the standard face library.
According to a fourth aspect of the embodiments of the present disclosure, there is provided a video processing apparatus including:
the second acquisition module is used for acquiring the video to be compared and acquiring the identity tag corresponding to the video to be compared; wherein the identity tag is generated using a tag generation apparatus as described above;
the third acquisition module is used for acquiring the number of the same identity tags in the identity tags corresponding to every two videos to be compared;
and the second determining module is used for determining that the people in the two videos to be compared are the same when the number of the same identity tags exceeds a preset number threshold.
According to a fifth aspect of the embodiments of the present disclosure, there is provided a server, including: a processor; a memory for storing the processor-executable instructions; wherein the processor is configured to execute the instructions to implement the tag generation method, and/or the video processing method, as described above.
According to a sixth aspect of embodiments of the present disclosure, there is provided a storage medium having instructions that, when executed by a processor of an electronic device, enable the electronic device to perform the tag generation method, and/or the video processing method, as described above.
According to a seventh aspect of embodiments of the present disclosure, there is provided a computer program product comprising readable program code which, when run on a computing device, can cause the computing device to perform the tag generation method, and/or the video processing method, as described above.
The technical scheme provided by the embodiment of the disclosure at least brings the following beneficial effects:
In the embodiment of the disclosure, a video to be processed is obtained, and a face feature vector corresponding to a person in the video to be processed is extracted; a plurality of standard face feature vectors and an identity tag corresponding to each standard face feature vector are obtained; the similarity between the face feature vector and each standard face feature vector is calculated; and the identity tags corresponding to at least two standard face feature vectors whose similarity is greater than a preset first similarity threshold and is the highest are determined as the identity tags corresponding to the video to be processed. In other words, identity tags are generated for the video, and the identity tags indicate the identity information of the persons in the video, so whether the persons in different videos are the same can be determined from the identity tags of the videos, which provides a basis for services such as video recommendation. Furthermore, one video corresponds to a plurality of identity tags, so when checking whether the persons in different videos are the same, it suffices to compare the number of identical identity tags corresponding to the videos; compared with directly comparing face features, this is simpler, more convenient and more efficient.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure and are not to be construed as limiting the disclosure.
FIG. 1 is a flow diagram illustrating a method of tag generation according to an exemplary embodiment.
FIG. 2 is a flow diagram illustrating a method of tag generation according to an example embodiment.
Fig. 3 is a flow chart illustrating a video processing method according to an exemplary embodiment.
Fig. 4 is a block diagram illustrating a label generation apparatus according to an example embodiment.
Fig. 5 is a block diagram illustrating a video processing apparatus according to an example embodiment.
FIG. 6 is a block diagram illustrating an apparatus in accordance with an example embodiment.
Detailed Description
In order to make the technical solutions of the present disclosure better understood, the technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the disclosure described herein are capable of operation in sequences other than those illustrated or otherwise described herein. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
FIG. 1 is a flow diagram illustrating a method of tag generation according to an exemplary embodiment. As shown in fig. 1, the tag generation method may be used in a server, including the following steps.
In step S11, a video to be processed is obtained, and a face feature vector corresponding to a person in the video to be processed is extracted.
The video to be processed refers to any video for which a tag needs to be generated. For example, every video in the video library of a video website may be used as a video to be processed.
In this embodiment, for the video to be processed, an identity tag corresponding to the video to be processed is generated. The identity label can indicate the identity information of people in the video to be processed, and the identity information can be recognized based on the human face characteristics. Therefore, in the embodiment, the face feature vector corresponding to the person in the video to be processed is extracted. For example, a neural network model with a function of recognizing human face features may be used to recognize a video to be processed, so as to extract a human face feature vector corresponding to a person therein. Specific procedures will be described in detail in the following examples.
In step S12, a plurality of standard face feature vectors and an identity tag corresponding to each standard face feature vector are obtained.
A standard face feature vector is a face feature vector corresponding to a representative person. The representative persons act like basis vectors in the face feature space: a person appearing in a video can be characterized in terms of the representative persons he or she resembles. For example, a representative person may be a person who appears frequently in videos.
In this embodiment, a plurality of standard face feature vectors may be collected in advance, and an identity tag corresponding to each standard face feature vector may be marked and stored.
In step S13, the similarity between the face feature vector and each standard face feature vector is calculated.
In step S14, the identity tags corresponding to the at least two standard face feature vectors with the similarity greater than the preset first similarity threshold and the highest similarity are determined as the identity tags corresponding to the to-be-processed video.
If the similarity between the face feature vector corresponding to the person in the video to be processed and a certain standard face feature vector is greater than the preset first similarity threshold, the two face feature vectors are highly similar, and therefore so are the corresponding persons.
The similarity between the face feature vector corresponding to the person in the video to be processed and several standard face feature vectors may all exceed the preset first similarity threshold. In this embodiment, at least two standard face feature vectors with the highest similarity are selected, and the identity tags corresponding to the selected standard face feature vectors are determined as the identity tags corresponding to the video to be processed. Therefore, one video to be processed may have at least two identity tags, indicating that the person in the video to be processed is most similar to the persons corresponding to these identity tags.
For example, identity tags corresponding to 50 standard face feature vectors with the highest similarity may be selected as identity tags corresponding to the video to be processed, and if the number of the standard face feature vectors with the similarity greater than the preset first similarity threshold is less than 50, identity tags corresponding to all the standard face feature vectors with the similarity greater than the preset first similarity threshold are selected as identity tags corresponding to the video to be processed.
For the specific numerical value of the first similarity threshold, a person skilled in the art may set any suitable value according to practical situations, which is not limited in this embodiment. For example, the first similarity threshold may be set to 0.5, 0.6, 0.7, and so on.
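To make steps S13 and S14 concrete, the following minimal sketch (in Python, not part of the patent) scores one face feature vector against every standard face feature vector and keeps the identity tags of the most similar vectors above the threshold; the use of cosine similarity, the function name and the default values 0.5 and 50 are assumptions drawn from the examples in this description, not a definitive implementation.

    import numpy as np

    def select_identity_tags(face_vec, standard_vecs, standard_tags,
                             sim_threshold=0.5, top_k=50):
        """Illustrative sketch of steps S13-S14. face_vec is a 1-D NumPy array,
        standard_vecs an (N, d) NumPy array, standard_tags a list of N tags.
        Returns the tags of the top_k most similar standard vectors whose
        similarity exceeds sim_threshold (cosine similarity is assumed)."""
        face_vec = face_vec / np.linalg.norm(face_vec)
        standard_vecs = standard_vecs / np.linalg.norm(standard_vecs, axis=1, keepdims=True)
        sims = standard_vecs @ face_vec                   # one similarity per standard vector
        candidates = np.where(sims > sim_threshold)[0]    # above the first similarity threshold
        ranked = candidates[np.argsort(sims[candidates])[::-1]]
        return [standard_tags[i] for i in ranked[:top_k]]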
In the embodiment of the disclosure, identity tags are generated for the video, and the identity tags indicate the identity information of the persons in the video, so whether the persons in different videos are the same can be determined from the identity tags of the videos, which provides a basis for services such as video recommendation. One video corresponds to a plurality of identity tags, so whether the persons in different videos are the same can be determined by comparing the number of identical identity tags corresponding to the videos; compared with directly comparing face features, this is simpler, more convenient and more efficient.
FIG. 2 is a flowchart illustrating a method of tag generation, according to an example embodiment. As shown in fig. 2, the tag generation method includes the following steps.
In step S21, a neural network model is generated.
In this embodiment, a neural network model for extracting a face feature vector may be generated.
When training the model, a large number of sample images containing persons can be obtained from the Internet, and the identity tag corresponding to each sample image is annotated manually, with each identity tag treated as one class. The identity tag can be a unique identifier of the person in the sample image, such as the person's unique account number or unique serial number. Among the sample images there are multiple sample images corresponding to the same identity tag. For example, 500,000 sample images may be obtained, corresponding to a total of 10,000 identity tags.
In an implementation, the neural network model may adopt a ResNet (residual network) model. In a ResNet model the network is fitted not to the original mapping directly but to a residual mapping. The ResNet model may include convolutional layers, a fully-connected layer, and so on, where the convolutional layers extract features and the fully-connected layer classifies the extracted features. Therefore, a sample image is input into the ResNet model, the face features in the sample image are extracted by the convolutional layers, the extracted face features are passed to the fully-connected layer, and the fully-connected layer classifies them. The neural network model is trained with the plurality of sample images and their corresponding identity tags, and the final neural network model is obtained after training.
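A rough sketch of such a training setup is given below; the ResNet-50 backbone, the number of identities, the optimizer and the hyperparameters are illustrative assumptions, since the disclosure only specifies a ResNet whose convolutional layers extract features and whose fully-connected layer classifies them.

    import torch
    import torch.nn as nn
    import torchvision

    # Hypothetical training sketch: a ResNet classifier over the identity tags.
    num_identities = 10_000                       # e.g. 10,000 identity tags
    model = torchvision.models.resnet50(num_classes=num_identities)
    criterion = nn.CrossEntropyLoss()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

    def train_one_epoch(loader):
        """One pass over a DataLoader yielding (image batch, identity-tag id batch)."""
        model.train()
        for images, labels in loader:
            optimizer.zero_grad()
            loss = criterion(model(images), labels)
            loss.backward()
            optimizer.step()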
In step S22, a standard face library is generated.
In this embodiment, a standard face library may be generated, where the standard face library is used to store standard face feature vectors and corresponding identity labels. The process of generating the standard face library comprises the following steps A1-A4:
a1, obtaining a plurality of sample images, and extracting sample face feature vectors corresponding to people in the sample images.
The sample images in this step may be the same sample images used for generating the neural network model, or may be a new batch of sample images of frequently appearing persons obtained from the Internet.
And extracting a face feature vector corresponding to a person in each sample image as a sample face feature vector. In implementation, the neural network model generated in step S21 is used to extract a face feature vector corresponding to a person in the sample image.
The process of extracting a face feature vector may include: inputting the sample image into the preset neural network model; and taking the vector formed by the face features output by the second-to-last layer of the neural network model as the face feature vector corresponding to the person in the sample image. Because the convolutional layers of the neural network model extract features and the fully-connected layer classifies the extracted features, the face features output by the second-to-last layer (namely, the last convolutional layer) are taken here, and the vector formed by these features is used as the face feature vector corresponding to the person in the sample image. For example, the face feature vector may be a 1024-dimensional vector.
And A2, randomly selecting a sample face feature vector as a standard face feature vector, and adding the standard face feature vector and the corresponding identity label to a standard face library.
In implementation, an iterative method can be used to find the standard face feature vector. Firstly, a sample face feature vector is randomly selected as a standard face feature vector, and the standard face feature vector and a corresponding identity label are added to a standard face library. The sample image has a corresponding identity label, and the identity label is an identity label corresponding to a sample face feature vector corresponding to a person in the sample image.
And A3, respectively calculating the similarity between the sample face feature vector and each standard face feature vector added to the standard face library aiming at each residual sample face feature vector.
And A4, taking the sample face feature vectors with the similarity smaller than a preset second similarity threshold value as standard face feature vectors, and adding the standard face feature vectors and the corresponding identity labels to the standard face library.
For each remaining sample face feature vector, if its similarity to every standard face feature vector already added to the standard face library is smaller than the preset second similarity threshold, no similar standard face feature vector has yet been added to the library, so the sample face feature vector is taken as a standard face feature vector and added, together with its corresponding identity tag, to the standard face library. If its similarity to some standard face feature vector already added to the standard face library is greater than or equal to the preset second similarity threshold, a similar standard face feature vector has already been added to the library, and the sample face feature vector does not need to be added.
For the specific value of the second similarity threshold, a person skilled in the art may set any suitable value according to the actual situation, and the embodiment does not limit this. For example, the second similarity threshold may be set to 0.4, 0.5, 0.6, etc. The second similarity threshold may be the same as or different from the first similarity threshold.
In an alternative embodiment, the similarity of two feature vectors can be measured by the cosine distance between them. The cosine distance uses the cosine of the angle between two vectors in the vector space as a measure of how much two individuals differ. The larger the cosine distance between two feature vectors, the more similar the two feature vectors are. Thus, if similarity is measured by cosine distance, a second distance threshold may be set. When the cosine distance between a sample face feature vector and a standard face feature vector is smaller than the second distance threshold, it is determined that the similarity between them is smaller than the second similarity threshold.
For the specific value of the second distance threshold, a person skilled in the art may set any suitable value according to the actual situation, and the embodiment does not limit this. For example, the second distance threshold may be set to 0.4, 0.5, 0.6, etc.
Suppose that the two feature vectors are f(x_i) and f(x_j), where i and j are natural numbers; then the cosine distance between the two feature vectors is:

cos(f(x_i), f(x_j)) = (f(x_i) · f(x_j)) / (||f(x_i)|| · ||f(x_j)||)
Of course, the similarity between two feature vectors may also be measured in other ways, such as the Euclidean distance, Mahalanobis distance, or Manhattan distance, which is not limited in this embodiment.
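Putting steps A1 to A4 together with the cosine measure above, one possible sketch of the library construction is as follows; the greedy loop, the function names and the threshold value are illustrative assumptions rather than the patent's own implementation, and the sample face feature vectors are assumed to have already been extracted in step A1.

    import random
    import numpy as np

    def cosine_similarity(a, b):
        """Cosine of the angle between two feature vectors (the measure above)."""
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def build_standard_face_library(sample_vectors, sample_tags, second_threshold=0.5):
        """Illustrative sketch of steps A1-A4: starting from a random sample,
        keep a sample face feature vector only if it is not yet similar to any
        vector already in the library."""
        order = list(range(len(sample_vectors)))
        random.shuffle(order)                      # A2: the first kept vector is random
        library = []                               # list of (standard vector, identity tag)
        for i in order:
            vec, tag = sample_vectors[i], sample_tags[i]
            # A3/A4: add only if similarity to every stored vector is below threshold
            if all(cosine_similarity(vec, std_vec) < second_threshold
                   for std_vec, _ in library):
                library.append((vec, tag))
        return library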
In step S23, a video to be processed is obtained, and a face feature vector corresponding to a person in the video to be processed is extracted.
And extracting the face characteristic vector corresponding to the character in the video to be processed aiming at the video to be processed. In implementation, the neural network model generated in step S21 is used to extract a face feature vector corresponding to a person in the video to be processed.
The process of extracting the face feature vector corresponding to the person in the video to be processed may include: inputting one frame of image containing a person from the video to be processed into the preset neural network model; and taking the vector formed by the face features output by the second-to-last layer of the neural network model as the face feature vector corresponding to the person in the video to be processed. The frame containing a person may be a frame randomly extracted from the video to be processed.
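A minimal sketch of this step is shown below, under the assumptions that the trained model is a torchvision ResNet as sketched for step S21, that OpenCV decodes the video, and that the randomly chosen frame already contains the person (face detection is omitted); the helper names, the preprocessing and the 2048-dimensional output of a ResNet-50 backbone (rather than the 1024 dimensions mentioned earlier as an example) are illustrative details only.

    import random
    import cv2
    import torch
    import torch.nn as nn
    import torchvision

    # Keep everything up to the global pooling layer of the trained backbone and
    # drop the final fully-connected classifier, so the output is the vector of
    # features from the second-to-last layer (loading of trained weights omitted).
    backbone = torchvision.models.resnet50(num_classes=10_000)
    feature_extractor = nn.Sequential(*list(backbone.children())[:-1])

    @torch.no_grad()
    def face_feature_vector(image_tensor):            # (3, H, W), RGB, values in 0..1
        feature_extractor.eval()
        feats = feature_extractor(image_tensor.unsqueeze(0))   # shape (1, 2048, 1, 1)
        return feats.flatten()                                  # the face feature vector

    def video_face_feature_vector(video_path):
        """Randomly pick one frame of the video and extract its face feature vector."""
        capture = cv2.VideoCapture(video_path)
        frame_count = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
        capture.set(cv2.CAP_PROP_POS_FRAMES, random.randrange(max(frame_count, 1)))
        ok, frame = capture.read()
        capture.release()
        if not ok:
            raise ValueError("could not read a frame from " + video_path)
        frame = cv2.resize(frame, (224, 224))                  # illustrative preprocessing
        tensor = torch.from_numpy(frame[:, :, ::-1].copy()).permute(2, 0, 1).float() / 255.0
        return face_feature_vector(tensor)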
In step S24, a plurality of standard face feature vectors and an identity tag corresponding to each standard face feature vector are obtained from a pre-generated standard face library.
As described above, the standard face library stores a plurality of standard face feature vectors and an identity tag corresponding to each standard face feature vector.
In step S25, the similarity between the face feature vector and each standard face feature vector is calculated.
In step S26, the identity tags corresponding to the at least two standard face feature vectors with the similarity greater than the preset first similarity threshold and the highest similarity are determined as the identity tags corresponding to the to-be-processed video.
If the similarity between the face feature vector corresponding to the person in the video to be processed and a standard face feature vector is greater than the preset first similarity threshold, the person in the video to be processed is highly similar to the person corresponding to that standard face feature vector.
Similar to step S22 above, the similarity between two feature vectors may be measured by the cosine distance between them. Thus, if similarity is measured by cosine distance, a first distance threshold may be set. When the cosine distance between the face feature vector corresponding to the person in the video to be processed and a standard face feature vector is greater than the first distance threshold, it is determined that the similarity between the two is greater than the first similarity threshold.
For the specific value of the first distance threshold, a person skilled in the art may set any suitable value according to the actual situation, and the embodiment does not limit this. For example, the first distance threshold may be set to 0.5, 0.6, 0.7, etc.
Fig. 3 is a flow chart illustrating a video processing method according to an exemplary embodiment. As shown in fig. 3, the video processing method may be used in a server, including the following steps.
In step S31, a video to be compared is obtained, and an identity tag corresponding to the video to be compared is obtained.
In this embodiment, after identity tags have been generated for videos with the tag generation method described above, the identity tags are used to compare whether the persons in two videos are the same.
A video to be compared is a video for which it needs to be determined whether its person is the same as the person in another video. For example, during video recommendation, in order to avoid recommending several videos featuring the same person, the candidate videos containing the same person may be screened out and only one of them recommended; in this scenario all candidate videos are videos to be compared. The identity tags corresponding to the videos to be compared are generated with the tag generation method described above, and each video to be compared corresponds to at least two identity tags.
In step S32, the number of the same identity tags in the identity tags corresponding to each two videos to be compared is obtained.
An identity tag of a video to be compared indicates that the person in the video is highly similar to the person corresponding to that tag. Therefore, if two different videos share some identity tags, the persons in the two videos are both highly similar to the persons corresponding to those shared tags.
Therefore, in this embodiment, for every two videos to be compared, the identity tags corresponding to the two videos to be compared are compared, and the number of the same identity tags in the identity tags corresponding to the two videos to be compared is obtained.
In step S33, when the number of identical identity tags exceeds a preset number threshold, it is determined that the people in the two videos to be compared are identical.
The more identical identity tags there are among the identity tags corresponding to the two videos to be compared, the more similar the persons in the two videos are. Therefore, when the number of identical identity tags exceeds a preset number threshold, the persons in the two videos to be compared can be considered the same. A person skilled in the art may set the number threshold to any suitable value according to the actual situation, which is not limited in this embodiment. For example, the number threshold may be set to 5, 10, 15, and so on.
With this comparison method, whether the persons in two videos are the same can be determined in a short time. The computation is small: at most about 2,500 pairwise tag comparisons (50 tags per video), and roughly O(50·log 50) comparisons after optimization. By contrast, directly comparing face features costs about 1024 multiply-add operations; multiply-add operations are more complex and slower than tag comparisons, so the algorithm provided by the disclosure offers high timeliness in time-sensitive scenarios such as video recommendation.
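The tag comparison itself reduces to a set intersection; the short sketch below illustrates steps S31 to S33, where the tag values and the threshold are illustrative assumptions.

    def same_person(tags_a, tags_b, count_threshold=10):
        """Sketch of steps S32-S33: two videos are judged to show the same person
        when their identity tags share more than count_threshold entries.
        The threshold 10 is only one of the example values mentioned above."""
        shared = set(tags_a) & set(tags_b)          # identical identity tags
        return len(shared) > count_threshold

    # Hypothetical usage with tags produced by the tag generation method:
    video_a_tags = ["id_0012", "id_0487", "id_0930", "id_1541"]
    video_b_tags = ["id_0487", "id_0930", "id_2277"]
    print(same_person(video_a_tags, video_b_tags))  # False: only 2 shared tags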
Fig. 4 is a block diagram illustrating a tag generation apparatus according to an example embodiment. Referring to fig. 4, the apparatus includes a first extraction module 401, a first acquisition module 402, a first calculation module 403, and a first determination module 404.
The first extraction module 401 is configured to perform acquiring a video to be processed, and extract a face feature vector corresponding to a person in the video to be processed.
The first obtaining module 402 is configured to obtain a plurality of standard face feature vectors and an identity tag corresponding to each standard face feature vector.
A first calculating module 403 configured to perform calculating the similarity of the face feature vector and each standard face feature vector respectively.
The first determining module 404 is configured to determine, as the identity tag corresponding to the video to be processed, the identity tag corresponding to at least two standard face feature vectors whose similarity is greater than a preset first similarity threshold and whose similarity is the highest.
In an alternative embodiment, the first extraction module 401 includes: the input unit is configured to input one frame of image containing a person in the video to be processed into a preset neural network model; and taking a vector formed by the face features output by the second last layer of the neural network model as a face feature vector corresponding to the person in the video to be processed.
In an optional implementation manner, the first obtaining module 402 is configured to specifically obtain a plurality of standard face feature vectors and an identity tag corresponding to each standard face feature vector from a pre-generated standard face library.
In an alternative embodiment, the standard face library is generated by the following modules: a second extraction module configured to acquire a plurality of sample images and extract sample face feature vectors corresponding to the persons in the sample images, where each sample image has a corresponding identity tag, which is used as the identity tag corresponding to the sample face feature vector; a first adding module configured to randomly select one sample face feature vector as a standard face feature vector, and add the standard face feature vector and the corresponding identity tag to the standard face library; a second calculation module configured to calculate, for each remaining sample face feature vector, the similarity between the sample face feature vector and each standard face feature vector already added to the standard face library; and a second adding module configured to take the sample face feature vectors whose similarity is smaller than a preset second similarity threshold as standard face feature vectors, and add them, together with the corresponding identity tags, to the standard face library.
Fig. 5 is a block diagram illustrating a video processing device according to an example embodiment. Referring to fig. 5, the apparatus includes a second obtaining module 501, a third obtaining module 502 and a second determining module 503.
The second obtaining module 501 is configured to obtain a video to be compared, and obtain an identity tag corresponding to the video to be compared. Wherein the identity tag is generated using the apparatus shown in figure 4.
A third obtaining module 502, configured to obtain the number of the same identity tags in the identity tags corresponding to every two videos to be compared.
A second determining module 503, configured to determine that the people in the two videos to be compared are the same when the number of the same identity tags exceeds a preset number threshold.
In the embodiment of the disclosure, identity tags are generated for the video, and the identity tags indicate the identity information of the persons in the video, so whether the persons in different videos are the same can be determined from the identity tags of the videos, which provides a basis for services such as video recommendation. Moreover, one video corresponds to a plurality of identity tags, so whether the persons in different videos are the same can be determined by comparing the number of identical identity tags corresponding to the videos; compared with directly comparing the persons' features, this is simpler, more convenient and more efficient.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 6 is a block diagram illustrating an apparatus 600 for tag generation and/or video processing, according to an example embodiment. For example, the apparatus 600 may be provided as a server.
Referring to fig. 6, the apparatus 600 includes a processing component 622 that further includes one or more processors and memory resources, represented by memory 632, for storing instructions, such as applications, that are executable by the processing component 622. The application programs stored in memory 632 may include one or more modules that each correspond to a set of instructions. Further, the processing component 622 is configured to execute instructions to perform the above-described methods.
The apparatus 600 may also include a power component 626 configured to perform power management of the apparatus 600, a wired or wireless network interface 650 configured to connect the apparatus 600 to a network, and an input/output (I/O) interface 658. The apparatus 600 may operate based on an operating system stored in the memory 632, such as Windows Server™, Mac OS X™, Unix™, Linux™, FreeBSD™, or the like.
In an exemplary embodiment, there is also provided a storage medium comprising instructions, such as a memory comprising instructions, executable by a processor of an apparatus for tag generation, and/or video processing, to perform the above method. Alternatively, the storage medium may be a non-transitory computer readable storage medium, which may be, for example, a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
In an exemplary embodiment, a computer program product is also provided, which comprises readable program code executable by a processor of an apparatus for tag generation and/or video processing to perform the above method. Alternatively, the program code may be stored in a storage medium of the apparatus for tag generation and/or video processing, which may be a non-transitory computer-readable storage medium, for example, a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements that have been described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (6)

1. A tag generation method, comprising:
acquiring a video to be processed, and extracting a face feature vector corresponding to a person in the video to be processed;
acquiring a plurality of standard face feature vectors and an identity label corresponding to each standard face feature vector; the identity label is a unique identifier of the standard human face feature vector;
respectively calculating the similarity of the face feature vector and each standard face feature vector;
determining identity labels corresponding to at least two standard face feature vectors with similarity greater than a preset first similarity threshold and highest similarity as identity labels corresponding to the videos to be processed, so that when the number of the same identity labels corresponding to the two videos exceeds a preset number threshold, a server determines that people in the two videos are the same;
the step of obtaining a plurality of standard face feature vectors and an identity label corresponding to each standard face feature vector comprises the following steps:
acquiring a plurality of standard face feature vectors and an identity label corresponding to each standard face feature vector from a pre-generated standard face library;
the standard face library is generated by the following steps:
obtaining a plurality of sample images, and extracting sample face feature vectors corresponding to people in the sample images; the sample image is provided with a corresponding identity label, and the identity label is used as an identity label corresponding to the sample face feature vector;
randomly selecting a sample face feature vector as a standard face feature vector, and adding the standard face feature vector and a corresponding identity label to a standard face library;
respectively calculating the similarity between the sample face feature vector and each standard face feature vector added to the standard face library aiming at each residual sample face feature vector;
and taking the sample face feature vectors with the similarity smaller than a preset second similarity threshold value as standard face feature vectors, and adding the standard face feature vectors and the corresponding identity labels to the standard face library.
2. The tag generation method according to claim 1, wherein the step of extracting the face feature vector corresponding to the person in the video to be processed comprises:
inputting an image containing a person in one frame of the video to be processed into a preset neural network model;
and taking a vector formed by the face features output by the second last layer of the neural network model as a face feature vector corresponding to the person in the video to be processed.
3. A label generation apparatus, comprising:
the first extraction module is configured to acquire a video to be processed and extract a face feature vector corresponding to a person in the video to be processed;
the first acquisition module is configured to acquire a plurality of standard face feature vectors and identity labels corresponding to the standard face feature vectors; the identity label is a unique identifier of the standard human face feature vector;
a first calculation module configured to perform respective calculation of similarity of the face feature vector and each standard face feature vector;
the first determining module is configured to determine identity tags corresponding to at least two standard face feature vectors with similarity greater than a preset first similarity threshold and highest similarity as identity tags corresponding to the videos to be processed, so that when the number of the same identity tags corresponding to two videos exceeds a preset number threshold, the server determines that people in the two videos are the same;
the first acquisition module is configured to specifically execute acquisition of a plurality of standard face feature vectors and an identity tag corresponding to each standard face feature vector from a pre-generated standard face library;
the standard face library is generated by the following modules:
the second extraction module is configured to acquire a plurality of sample images and extract sample face feature vectors corresponding to people in the sample images; the sample image is provided with a corresponding identity label, and the identity label is used as an identity label corresponding to the sample face feature vector;
the system comprises a first adding module, a second adding module and a third adding module, wherein the first adding module is configured to execute random selection of a sample face feature vector as a standard face feature vector, and add the standard face feature vector and a corresponding identity label to a standard face library;
a second calculation module configured to perform, for each of the remaining sample face feature vectors, calculating a similarity between the sample face feature vector and each of the standard face feature vectors added to the standard face library, respectively;
and the second adding module is configured to perform the steps of taking the sample face feature vectors with the similarity smaller than a preset second similarity threshold value as standard face feature vectors, and adding the standard face feature vectors and the corresponding identity labels to the standard face library.
4. The tag generation apparatus according to claim 3, wherein the first extraction module includes:
the input unit is configured to input one frame of image containing a person in the video to be processed into a preset neural network model;
and taking a vector formed by the face features output by the second last layer of the neural network model as a face feature vector corresponding to the person in the video to be processed.
5. A server, comprising:
a processor;
a memory for storing the processor-executable instructions;
wherein the processor is configured to execute the instructions to implement the label generation method of any of claims 1 to 2.
6. A storage medium in which instructions, when executed by a processor of a server, enable the server to perform the label generation method of any one of claims 1 to 2.
CN201910604117.3A 2019-07-05 2019-07-05 Label generation method, video processing method, device, electronic equipment and storage medium Active CN110348362B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910604117.3A CN110348362B (en) 2019-07-05 2019-07-05 Label generation method, video processing method, device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910604117.3A CN110348362B (en) 2019-07-05 2019-07-05 Label generation method, video processing method, device, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN110348362A CN110348362A (en) 2019-10-18
CN110348362B true CN110348362B (en) 2022-10-28

Family

ID=68177897

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910604117.3A Active CN110348362B (en) 2019-07-05 2019-07-05 Label generation method, video processing method, device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN110348362B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110675433A (en) 2019-10-31 2020-01-10 北京达佳互联信息技术有限公司 Video processing method and device, electronic equipment and storage medium
CN110866491B (en) * 2019-11-13 2023-11-24 腾讯科技(深圳)有限公司 Target retrieval method, apparatus, computer-readable storage medium, and computer device
CN111353300B (en) * 2020-02-14 2023-09-01 中科天玑数据科技股份有限公司 Data set construction and related information acquisition method and device
CN113365102B (en) * 2020-03-04 2022-08-16 阿里巴巴集团控股有限公司 Video processing method and device and label processing method and device
CN111444366B (en) * 2020-04-10 2024-02-20 Oppo广东移动通信有限公司 Image classification method, device, storage medium and electronic equipment
CN111488936B (en) * 2020-04-14 2023-07-28 深圳力维智联技术有限公司 Feature fusion method and device and storage medium
CN111708988B (en) * 2020-05-15 2023-05-30 北京奇艺世纪科技有限公司 Infringement video identification method and device, electronic equipment and storage medium
CN112163122B (en) * 2020-10-30 2024-02-06 腾讯科技(深圳)有限公司 Method, device, computing equipment and storage medium for determining label of target video

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20150093058A (en) * 2014-02-06 2015-08-17 주식회사 에스원 Method and apparatus for recognizing face
WO2017024506A1 (en) * 2015-08-11 2017-02-16 常平 Method for prompting information and system for pushing advertisement when inserting advertisement before playing video
CN108764611A (en) * 2018-04-12 2018-11-06 合肥指南针电子科技有限责任公司 A kind of public security equipment intellectualized management system

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108932254A (en) * 2017-05-25 2018-12-04 中兴通讯股份有限公司 A kind of detection method of similar video, equipment, system and storage medium
CN107423678A (en) * 2017-05-27 2017-12-01 电子科技大学 A kind of training method and face identification method of the convolutional neural networks for extracting feature
CN109726619A (en) * 2017-10-31 2019-05-07 深圳市祈飞科技有限公司 A kind of convolutional neural networks face identification method and system based on parameter sharing
CN108846694A (en) * 2018-06-06 2018-11-20 厦门集微科技有限公司 A kind of elevator card put-on method and device, computer readable storage medium
CN109598190A (en) * 2018-10-23 2019-04-09 深圳壹账通智能科技有限公司 Method, apparatus, computer equipment and storage medium for action recognition
CN109598211A (en) * 2018-11-16 2019-04-09 恒安嘉新(北京)科技股份公司 A kind of real-time dynamic human face recognition methods and system
CN109523325A (en) * 2018-11-29 2019-03-26 成都睿码科技有限责任公司 A kind of targetedly self-regulation advertisement delivery system based on recognition of face

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20150093058A (en) * 2014-02-06 2015-08-17 주식회사 에스원 Method and apparatus for recognizing face
WO2017024506A1 (en) * 2015-08-11 2017-02-16 常平 Method for prompting information and system for pushing advertisement when inserting advertisement before playing video
CN108764611A (en) * 2018-04-12 2018-11-06 合肥指南针电子科技有限责任公司 A kind of public security equipment intellectualized management system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Matching faces with textual cues in soccer videos; Bertini M et al.; 2006 IEEE International Conference on Multimedia and Expo; 2006-12-31; pp. 537-540 *
Face annotation of news images based on multimodal information fusion; 征察 et al.; Journal of Computer Applications (计算机应用); 2017-12-31; Vol. 37, No. 10; pp. 3006-3010 *
Research on pedestrian re-identification technology in intelligent visual surveillance; 四建楼; China Doctoral Dissertations Full-text Database; 2018-09-15; pp. I138-33 *

Also Published As

Publication number Publication date
CN110348362A (en) 2019-10-18

Similar Documents

Publication Publication Date Title
CN110348362B (en) Label generation method, video processing method, device, electronic equipment and storage medium
CN110362677B (en) Text data category identification method and device, storage medium and computer equipment
US8064641B2 (en) System and method for identifying objects in video
CN106547744B (en) Image retrieval method and system
CN110909182B (en) Multimedia resource searching method, device, computer equipment and storage medium
CN108491794B (en) Face recognition method and device
CN106575280B (en) System and method for analyzing user-associated images to produce non-user generated labels and utilizing the generated labels
CN111814817A (en) Video classification method and device, storage medium and electronic equipment
CN112364204A (en) Video searching method and device, computer equipment and storage medium
CN115331150A (en) Image recognition method, image recognition device, electronic equipment and storage medium
CN112966626A (en) Face recognition method and device
CN113128526B (en) Image recognition method and device, electronic equipment and computer-readable storage medium
CN113033507B (en) Scene recognition method and device, computer equipment and storage medium
CN113254687B (en) Image retrieval and image quantification model training method, device and storage medium
CN112070744B (en) Face recognition method, system, device and readable storage medium
CN108024148B (en) Behavior feature-based multimedia file identification method, processing method and device
CN114943549A (en) Advertisement delivery method and device
CN115599953A (en) Training method and retrieval method of video text retrieval model and related equipment
CN114048344A (en) Similar face searching method, device, equipment and readable storage medium
RU2708504C1 (en) Method of training goods recognition system on images
CN114443904A (en) Video query method, video query device, computer equipment and computer readable storage medium
CN113869099A (en) Image processing method and device, electronic equipment and storage medium
KR102060110B1 (en) Method, apparatus and computer program for classifying object in contents
JP2015097036A (en) Recommended image presentation apparatus and program
CN112115740A (en) Method and apparatus for processing image

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant