CN113963303A - Image processing method, video recognition method, device, equipment and storage medium


Info

Publication number: CN113963303A
Authority: CN (China)
Prior art keywords: face, face feature, features, target, feature set
Legal status: Pending
Application number: CN202111331061.2A
Other languages: Chinese (zh)
Inventors: 蔡祎俊, 陈德健, 项伟
Current assignee: Bigo Technology Singapore Pte Ltd
Original assignee: Bigo Technology Singapore Pte Ltd
Application filed by Bigo Technology Singapore Pte Ltd
Priority to: CN202111331061.2A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering

Abstract

The embodiment of the invention discloses an image processing method, a video recognition method, an apparatus, a device, and a storage medium. The image processing method includes: extracting face features meeting preset requirements from avatar images of a target user and adding them to a reference face feature set; acquiring, from historical videos published by the target user, face features corresponding to target faces whose occurrence frequency meets a preset frequency requirement and adding them to a candidate face feature set; screening the face features in the candidate face feature set based on the reference face feature set to obtain an extended face feature set; and finally constructing a target face library corresponding to the target user from the reference face feature set and the extended face feature set. With this technical solution, the avatar images of a user and the historical videos the user has published can be automatically combined to construct the user's face library; the construction cost is low and the construction efficiency is high, and the constructed face library covers more face state changes, improving the accuracy of the face library.

Description

Image processing method, video recognition method, device, equipment and storage medium
Technical Field
Embodiments of the present invention relate to the field of image processing, and in particular, to an image processing method, a video recognition method, an apparatus, a device, and a storage medium.
Background
With the rapid development of internet technology, more and more applications and platforms allow users to upload audio and video content, such as short videos. As video content on the internet grows enormously, application demands in areas such as video content understanding and content auditing keep emerging, among which originality recognition is an important direction.
At present, common implementations of originality recognition include video-retrieval-based methods and video-face-recognition-based methods. The core of video retrieval is to build a retrieval library storing the retrieval features of each original video; whether a video to be recognized already exists is then determined by comparing it against the videos in the retrieval library, which in turn determines whether it is original. Video face recognition requires building a face library in advance; for a video to be recognized, faces are detected from the video and matched against the face library to determine whether the faces in the video are consistent with the face of the video uploader, and hence whether the video is original.
Disclosure of Invention
The embodiment of the invention provides an image processing method, a video recognition method, an apparatus, a device, and a storage medium, which can optimize existing face-based image processing and video recognition schemes.
In a first aspect, an embodiment of the present invention provides an image processing method, where the method includes:
acquiring an avatar image of a target user in a preset application program, and, when a first face feature meeting preset requirements is extracted from the avatar image, adding the first face feature to a reference face feature set;
acquiring, from a historical video published by the target user in the preset application program, a second face feature corresponding to a target face whose occurrence frequency meets a preset frequency requirement, and adding the second face feature to a candidate face feature set;
screening the face features in the candidate face feature set based on the reference face feature set, and determining an extended face feature set according to a screening result;
and constructing a target face library corresponding to the target user according to the reference face feature set and the extended face feature set.
In a second aspect, an embodiment of the present invention provides a video recognition method, where the method includes:
extracting face features to be recognized from a target video to be recognized uploaded by a target user;
and comparing the face features to be recognized with each face feature in a target face library corresponding to the target user, and recognizing the originality of the target video according to the comparison result, where the target face library is obtained by the image processing method provided by the embodiment of the invention.
In a third aspect, an embodiment of the present invention provides an image processing apparatus, including:
an avatar image processing module, configured to acquire an avatar image of a target user in a preset application program and, when a first face feature meeting preset requirements is extracted from the avatar image, add the first face feature to a reference face feature set;
a historical video processing module, configured to acquire, from a historical video published by the target user in the preset application program, a second face feature corresponding to a target face whose occurrence frequency meets a preset frequency requirement, and add the second face feature to a candidate face feature set;
a face feature screening module, configured to screen the face features in the candidate face feature set based on the reference face feature set and determine an extended face feature set according to the screening result;
and a face library construction module, configured to construct a target face library corresponding to the target user according to the reference face feature set and the extended face feature set.
In a fourth aspect, an embodiment of the present invention provides a video recognition apparatus, where the apparatus includes:
a face feature extraction module, configured to extract face features to be recognized from a target video to be recognized uploaded by a target user;
and an originality recognition module, configured to compare the face features to be recognized with each face feature in a target face library corresponding to the target user and recognize the originality of the target video according to the comparison result, where the target face library is obtained by the image processing method provided by the embodiment of the invention.
In a fifth aspect, an embodiment of the present invention provides a computer device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the computer program, implements the image processing method and/or the video recognition method according to the embodiments of the present invention.
In a sixth aspect, an embodiment of the present invention provides a computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the image processing method and/or the video recognition method provided by the embodiments of the present invention.
According to the image processing scheme provided by the embodiment of the invention, a first face feature meeting preset requirements is extracted from the avatar image of a target user and added to a reference face feature set; a second face feature corresponding to a target face whose occurrence frequency meets a preset frequency requirement is obtained from the historical videos published by the target user and added to a candidate face feature set; the face features in the candidate face feature set are screened based on the reference face feature set to obtain an extended face feature set; and finally a target face library corresponding to the target user is constructed from the reference face feature set and the extended face feature set. With this technical solution, the avatar images of a user and the historical videos the user has published can be automatically combined to construct the user's face library. The construction cost is low and the construction efficiency is high; the constructed face library covers more face state changes and represents the user's face characteristics more comprehensively and accurately, improving the accuracy of the face library, so that better results can be achieved when the face library is applied to tasks such as video originality recognition.
According to the video recognition scheme provided by the embodiment of the invention, face features to be recognized are extracted from a target video to be recognized uploaded by a target user, the face features to be recognized are compared with the face features in the target face library corresponding to the target user obtained by the image processing method provided by the embodiment of the invention, and the originality of the target video is recognized according to the comparison result.
Drawings
Fig. 1 is a schematic flowchart of an image processing method according to an embodiment of the present invention;
Fig. 2 is a schematic flowchart of another image processing method according to an embodiment of the present invention;
Fig. 3 is a schematic diagram illustrating the principle of an image processing method according to an embodiment of the present invention;
fig. 4 is a schematic flowchart of a video recognition method according to an embodiment of the present invention;
fig. 5 is a block diagram of an image processing apparatus according to an embodiment of the present invention;
fig. 6 is a block diagram of a video recognition apparatus according to an embodiment of the present invention;
fig. 7 is a block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures. In addition, the embodiments and features of the embodiments in the present invention may be combined with each other without conflict.
Fig. 1 is a flowchart illustrating an image processing method according to an embodiment of the present invention, where the method may be executed by an image processing apparatus, where the apparatus may be implemented by software and/or hardware, and may be generally integrated in a computer device. The computer device may be a mobile device such as a mobile phone, a tablet computer, a notebook computer, or a personal digital assistant, or another device such as a desktop computer or a server. As shown in fig. 1, the method includes:
step 101, acquiring an avatar image of a target user in a preset application program, and adding a first facial feature into a reference facial feature set under the condition that the first facial feature meeting preset requirements is extracted from the avatar image.
In the embodiment of the present invention, the preset application program may be any application with a video publishing function; the specific type is not limited, and it may, for example, be a social application such as a short video application or another type of application. The preset application program may be installed in an electronic device as a client; the electronic device may be the same as or different from the computer device described in the embodiment of the present invention, and in some cases the computer device may be a server corresponding to the preset application program.
For example, a user may register an account in the preset application program and publish works such as videos through the registered account. Taking a specific application scenario as an example, where the preset application program is a short video application, a user acting as a video author can upload a short video work he or she has shot to the short video platform, the platform can distribute the work to the short video applications used by other users for playback, and those users can then watch the short video works published by the video author.
For example, after registering an account, the user may set an avatar for the account. In general, to make the account more recognizable, users tend to use a photo containing their own face (sometimes also the face of a closely related person such as a parent, friend, or partner) as the avatar. In addition, the display area of the avatar image in the preset application program is usually small, and to let other users see the face clearly, the proportion of the face in the whole avatar image is usually high. In the embodiment of the invention, the reference face features are therefore preferentially acquired from the avatar image, which effectively improves the acquisition efficiency of the reference face features and reduces the acquisition cost.
For example, the target user is a user for whom a face library needs to be constructed, and may be any user or a designated user, without limitation. The target account is the account the target user registers and uses in the preset application program. The acquired avatar images may include all or part of the avatar images used by the target user, that is, all or part of the avatar images the target user has set for the target account; for example, they may include the avatar image currently in use, the avatar images used within a first preset historical period (for example, the last year), or all avatar images used from the registration of the target account to the present.
For example, since the user's avatar is usually a custom image, there is generally no guarantee that it contains a face, nor any guarantee of face quality. The avatar image may, for instance, be a natural scene or an unreal figure such as a cartoon character, or the face in it may be occluded or shown in profile. To better acquire features that accurately represent the target user's face, preset requirements can be set, and a first face feature meeting the preset requirements is extracted from the avatar image and added to the reference face feature set. The preset requirements can be set according to the actual situation, for example: the clarity of the face meets a preset clarity requirement (e.g., clarity greater than a preset clarity threshold; the specific way of computing clarity is not limited), the face rotation angle is smaller than a preset angle threshold, and the face carries no occlusion. The face rotation angle is the angle between the plane of the face and the plane of the camera lens; for a standard frontal face, the rotation angle is 0 or close to 0. The occlusion may be, for example, a face mask or sunglasses.
For example, a preset face detection model may be used to detect whether an avatar image contains a face; when it does, a preset face recognition model is used to extract the face features, each face feature is checked against the preset requirements, and those meeting the requirements are recorded as first face features. The number of first face features extracted may be zero, one, or more. If at least one first face feature is extracted, the extracted first face features are added to the reference face feature set.
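As a minimal sketch of this step, the snippet below collects first face features from avatar images under the preset requirements. The detector and recognizer interfaces, the sharpness proxy, and both thresholds are illustrative assumptions rather than the patent's prescribed implementation.

```python
import numpy as np

# Assumed (hypothetical) model interfaces; any detector/recognizer with
# these signatures could be plugged in:
#   detect_faces(image) -> list of dicts with "crop", "yaw_angle", "occluded"
#   extract_feature(face_crop) -> 1-D L2-normalised feature vector

def sharpness(gray: np.ndarray) -> float:
    """Variance of a simple Laplacian response as a clarity proxy
    (edge wrap-around from np.roll is ignored for brevity)."""
    lap = (np.roll(gray, 1, 0) + np.roll(gray, -1, 0)
           + np.roll(gray, 1, 1) + np.roll(gray, -1, 1) - 4 * gray)
    return float(lap.var())

def first_face_features(avatar_images, detect_faces, extract_feature,
                        sharp_thresh=100.0, angle_thresh=30.0):
    """Collect features meeting the preset requirements (reference set).
    Both thresholds are assumed example values."""
    reference_set = []
    for image in avatar_images:
        for face in detect_faces(image):
            gray = face["crop"].mean(axis=-1)              # rough grayscale
            if (sharpness(gray) > sharp_thresh             # clear enough
                    and abs(face["yaw_angle"]) < angle_thresh  # near-frontal
                    and not face["occluded"]):             # no mask/sunglasses
                reference_set.append(extract_feature(face["crop"]))
    return reference_set
```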
Alternatively, whether the avatar image contains face features meeting the preset requirements can be predicted directly from the image; if so, the face features are extracted from the avatar image and added to the reference face feature set as first face features. The advantage is that the amount of data processed during face detection and face recognition is reduced, improving the extraction efficiency of the first face features.
For example, the initial state of the reference face feature set may be an empty set, or it may contain initial reference face features, without limitation. The initial reference face features may be face features extracted from a target image uploaded by the target user; for example, the preset application program may prompt the target user to actively upload an image containing his or her own face as the target image, so that the reference face is determined more accurately.
Step 102, acquiring, from a historical video published by the target user in the preset application program, a second face feature corresponding to a target face whose occurrence frequency meets a preset frequency requirement, and adding the second face feature to a candidate face feature set.
In the embodiment of the present invention, the historical videos may include all or part of the videos published by the target user in the preset application program, for example the videos published within a second preset historical period (e.g., the last three months, which may be determined dynamically from factors such as the target user's publishing frequency). Screening dimensions other than time may also be adopted: for example, selecting videos recorded with the front camera according to the service characteristics of the preset application program, or grouping videos by recording manner and screening second face features per group. For many users, the user himself or herself is usually the subject being filmed, that is, the video content usually contains the face of the publishing user, so face features can be extracted from the target user's historical videos to enrich the corresponding face library. Faces of users other than the target user may also appear in the historical videos, but generally less frequently than the target user's face, so a preset frequency requirement can be set. For example, the face with the highest occurrence frequency is determined as the target face; or several frequently occurring faces (faces whose frequency exceeds a preset frequency threshold, or a preset number of the most frequent faces) are determined as target faces, considering that friends or relatives of the target user may also appear frequently depending on the style or content of the published works (some users, for instance, habitually film the daily life of family members). Face features are then extracted for the target face and recorded as second face features.
For example, a face library is generally used for video recognition. Faces appearing in videos to be recognized may exhibit various state changes and interference factors such as posture, face rotation angle, makeup, and illumination; a manually built face library can hardly cover all of these variations and is costly. In the embodiment of the invention, face features are mined automatically from the user's historical videos, and subsequent screening based on the reference face features can quickly and accurately select features representing the user's face in different states, improving the accuracy of the face library and thus the video recognition effect.
Optionally, besides the second face features, the candidate face feature set may further include face features from other sources, which are not limited; for example, they may come from avatar images or photos actively uploaded by the user.
Step 103, screening the face features in the candidate face feature set based on the reference face feature set, and determining an extended face feature set according to the screening result.
Illustratively, the face features in the reference face feature set accurately represent the target user's face, while the face features in the candidate face feature set come from videos and cover more face state changes. Using the reference face feature set to screen the candidate face feature set eliminates face features of poor quality or low reference value and improves the accuracy of the final face library. Optionally, the screening may retain the face features close to those in the reference face feature set and add the retained face features to the extended face feature set.
Step 104, constructing a target face library corresponding to the target user according to the reference face feature set and the extended face feature set.
Illustratively, the reference face feature set and the extended face feature set may be merged to form the target face library corresponding to the target user. Optionally, the face features in the reference face feature set and the extended face feature set are added to an initial face library corresponding to the target user to obtain the target face library. The initial face library may be empty or may already contain initial face features; the source of the initial face features is not limited and may, for example, be face features extracted from a target image containing the user's face actively uploaded by the target user.
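A minimal sketch of the merge, assuming the library and both feature sets are stored as NumPy arrays of equal feature dimension (an assumed representation, not mandated by the patent):

```python
import numpy as np

def build_target_face_library(initial, reference_set, extended_set):
    """Merge the reference and extended feature sets into the (possibly
    empty) initial library; the result indexes the target user's faces."""
    parts = [np.asarray(p) for p in (initial, reference_set, extended_set)
             if len(p) > 0]
    return np.vstack(parts) if parts else np.empty((0, 0))
```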
For example, the target face library may be used to recognize the originality of videos uploaded by the target user. The specific application is not limited; optional applications are described below.
The image processing method provided by the embodiment of the invention extracts a first face feature meeting preset requirements from the avatar image of a target user and adds it to a reference face feature set, acquires from the historical videos published by the target user a second face feature corresponding to a target face whose occurrence frequency meets a preset frequency requirement and adds it to a candidate face feature set, screens the face features in the candidate face feature set based on the reference face feature set to obtain an extended face feature set, and finally constructs a target face library corresponding to the target user from the reference face feature set and the extended face feature set. With this technical solution, the avatar images of a user and the historical videos the user has published can be automatically combined to construct the user's face library. The construction cost is low and the construction efficiency is high; the constructed face library covers more face state changes and represents the user's face characteristics more comprehensively and accurately, improving the accuracy of the face library, so that better results can be achieved when the face library is applied to tasks such as video originality recognition.
In some embodiments, before the screening of the face features in the candidate face feature set based on the reference face feature set, the method further includes: adding a third face feature that is extracted from the avatar image but does not meet the preset requirements to the candidate face feature set. The advantage is that such face features still belong to the target user but may have failed to enter the reference face feature set because of interference factors such as occlusion or a side face; since the same interference factors can also appear in videos to be recognized, adding the third face features to the candidate face feature set lets the face library cover richer face states, improving the anti-interference capability of video face recognition and further improving the application effect of the face library. The third face features may include all or part of the face features in the avatar images that do not meet the preset requirements.
In some embodiments, after the acquiring of the avatar image of the target user in the preset application program, the method further includes: inputting the avatar image into a preset classification model and determining the category of the avatar image according to the model's output, where the labels of the training sample images for the preset classification model include an available label and an unavailable label; and extracting face features from avatar images whose category is available to obtain the first face features meeting the preset requirements. The advantage is that the classification model can quickly and accurately identify available avatar images, improving the extraction efficiency of the first face features.
For example, the preset classification model may be a binary classification model or a multi-class model, without limitation. A binary model has the advantages of simple training and low computational overhead, minimizing the extra cost. Taking a binary model as an example, in the training stage the initial classification model is trained with positive and negative samples to obtain the preset classification model. Positive and negative samples are collectively called training samples, specifically training sample images: images carrying labels, where positive samples are labeled available and negative samples unavailable. Labels can be added by manual annotation with reference to the clarity of the face and the face rotation angle; for example, images containing a clear, unoccluded face are given the available label, and other images the unavailable label. Sample images may also be evaluated with preset rules to determine the labels: an available training sample image has a face whose clarity meets the preset clarity requirement and whose rotation angle is smaller than the preset angle threshold, while an unavailable one fails the clarity requirement or has a rotation angle greater than or equal to the threshold. Optionally, whether the face is occluded can also be judged; if so, the corresponding label is unavailable.
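The sketch below illustrates one way such rule-based labels and a binary availability classifier could be produced. The thresholds, the rule helper, and the choice of a linear classifier over precomputed image embeddings are all assumptions, since the patent does not fix a model architecture.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def rule_label(face_sharpness: float, yaw_deg: float, occluded: bool,
               sharp_thresh: float = 100.0, angle_thresh: float = 30.0) -> int:
    """Rule-based label: 1 ('available') only for a clear, near-frontal,
    unoccluded face; 0 ('unavailable') otherwise. Thresholds are assumed."""
    ok = (not occluded
          and face_sharpness >= sharp_thresh   # meets the clarity requirement
          and abs(yaw_deg) < angle_thresh)     # close enough to frontal
    return int(ok)

def train_availability_model(embeddings: np.ndarray, labels: np.ndarray):
    """Fit a simple binary classifier on image embeddings; any binary
    classifier could stand in here."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(embeddings, labels)
    return clf
```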
For example, after the avatar image is input into the preset classification model, the current avatar image is determined to be an available or unavailable avatar. Avatar images whose category is available can be considered to contain face features meeting the preset requirements; the preset face recognition model is then used to extract face features from the available avatar images to obtain the first face features.
In some embodiments, the acquiring, from a historical video published by the target user in the preset application program, of a second face feature corresponding to a target face whose occurrence frequency meets a preset frequency requirement includes: extracting face features from the video frames of the historical videos published by the target user in the preset application program, and adding the extracted face features to an alternative face feature pool as alternative face features; performing preset clustering on the alternative face features in the alternative face feature pool to obtain a plurality of clusters; counting the number of alternative face features contained in each cluster, and determining the clusters whose accumulated number meets a preset number requirement as high-frequency clusters, where the faces to which the alternative face features in a high-frequency cluster belong are recorded as target faces whose occurrence frequency meets the preset frequency requirement; and acquiring second face features from the high-frequency clusters. The advantage is that a clustering algorithm can quickly sort the many face features appearing in the historical videos, making it easy to find frequently appearing target faces.
For example, the clustering algorithm used in the preset clustering is not limited; it may be, for example, density-based spatial clustering of applications with noise (DBSCAN), spectral clustering, or K-means. To ensure that the face features in each cluster correspond to the same person's face, a strict similarity threshold may be used: two alternative face features, viewed as two feature points, are connected only when their similarity is high, forming a cluster. The preset number requirement may be, for example, the largest number, a number greater than a preset number threshold, or a rank at least above a preset ranking threshold. Screening by the preset number requirement yields one or more clusters containing more face features, recorded as high-frequency clusters, which indicates that the corresponding faces appear with higher frequency. All or part of the face features in the high-frequency clusters are then acquired as second face features and added to the candidate face feature set, as sketched below.
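A minimal sketch of this clustering step, assuming L2-normalised features, scikit-learn's DBSCAN with a cosine metric, and the strict 0.95 similarity threshold mentioned later (DBSCAN's eps is the corresponding cosine distance); min_samples and top_a are illustrative choices:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def high_frequency_clusters(features: np.ndarray, sim_thresh: float = 0.95,
                            top_a: int = 1):
    """Cluster alternative face features and return the index lists of the
    largest clusters (the 'high-frequency clusters').

    features: (K, d) array of L2-normalised face features.
    """
    labels = DBSCAN(eps=1.0 - sim_thresh,   # cosine distance threshold
                    min_samples=2,          # assumed: a pair already clusters
                    metric="cosine").fit_predict(features)
    clusters = {}
    for idx, lab in enumerate(labels):
        if lab != -1:                       # -1 marks DBSCAN outliers
            clusters.setdefault(lab, []).append(idx)
    # The biggest clusters correspond to the most frequently seen faces.
    ranked = sorted(clusters.values(), key=len, reverse=True)
    return ranked[:top_a]
```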
In some embodiments, the acquiring of the second face features from a high-frequency cluster includes: for each alternative face feature in the high-frequency cluster, calculating the first similarities between the current alternative face feature and every other alternative face feature in the cluster, and computing a first value, namely the sum of those first similarities; and acquiring, as second face features, the alternative face features in the high-frequency cluster whose first value meets a first preset value requirement. The advantage is that the more representative alternative face features are further screened out of the high-frequency cluster as the second face features, reducing the subsequent computation of screening the candidate face feature set against the reference face feature set, saving computing and storage resources, and also improving the accuracy and precision of the face library.
For example, the way the first similarity is calculated is not limited; it may be cosine similarity or Euclidean distance, or another similarity measure chosen according to the characteristics of the data, as in the sketch below.
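The sketch below shows the row-sum selection described above for one cluster, assuming L2-normalised features so that dot products equal cosine similarities; top_k is an illustrative parameter:

```python
import numpy as np

def representative_features(cluster_feats: np.ndarray, top_k: int = 1):
    """Pick the feature(s) with the largest summed cosine similarity to the
    rest of the cluster (the 'first value' of each feature).

    cluster_feats: (K, d) array of L2-normalised features from one cluster.
    """
    M = cluster_feats @ cluster_feats.T     # cosine similarity matrix
    np.fill_diagonal(M, 0.0)                # exclude self-similarity
    row_sums = M.sum(axis=1)                # first value per feature
    order = np.argsort(row_sums)[::-1]      # most representative first
    return cluster_feats[order[:top_k]]
```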
In some embodiments, the method further includes: when no first face feature meeting the preset requirements can be extracted from the avatar image, determining the cluster with the largest accumulated number as a target cluster, the high-frequency clusters including the target cluster; and adding the alternative face features in the target cluster whose first value meets a second preset value requirement to the reference face feature set. The second preset value requirement is generally stricter than the first preset value requirement and may be, for example, the largest first value. The advantage is that when the reference face feature set cannot be determined from the avatar image, it is determined from the most frequently appearing face in the historical videos, guaranteeing that the target face library can still be built.
In some embodiments, the screening of the face features in the candidate face feature set based on the reference face feature set and determining of an extended face feature set according to the screening result includes: for each candidate face feature in the candidate face feature set, calculating second similarities between the current candidate face feature and each reference face feature in the reference face feature set, and, when at least one second similarity is greater than a preset similarity threshold, determining the current candidate face feature as an extended face feature and adding it to the extended face feature set. The advantage is that candidate face features close to the reference face features can be screened out quickly based on similarity, which to some degree guarantees that the retained face features belong to the target user, improves the efficiency of face library construction, and improves the accuracy and precision of the face library.
For example, the way the second similarity is calculated is not limited and may be the same as or different from that of the first similarity: cosine similarity, Euclidean distance, or another similarity measure chosen according to the characteristics of the data, as in the sketch below.
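A minimal sketch of this screening step, again assuming L2-normalised features; the 0.8 threshold is an assumed value, the patent only requiring some preset similarity threshold:

```python
import numpy as np

def screen_candidates(candidates: np.ndarray, references: np.ndarray,
                      sim_thresh: float = 0.8):
    """Keep a candidate if it is close enough to at least one reference
    feature (the 'second similarity' test).

    Both inputs are (n, d) arrays of L2-normalised features.
    """
    sims = candidates @ references.T        # (n_cand, n_ref) cosine sims
    keep = (sims > sim_thresh).any(axis=1)  # one match suffices
    return candidates[keep]                 # the extended face feature set
```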
Fig. 2 is a schematic flowchart of another image processing method according to an embodiment of the present invention, optimized on the basis of the above alternative embodiments, and fig. 3 is a schematic diagram of the principle of the image processing method according to an embodiment of the present invention; the method can be understood with reference to fig. 2 and fig. 3. As shown in fig. 2, the method may include:
step 201, acquiring an avatar image of a target user in a preset application program.
In the embodiment of the present invention, the image processing method may be executed offline or online, without limitation. For offline execution, the avatar images and historical videos of each user may be stored in a database; when a face library needs to be built for a target user, the corresponding avatar images (i.e., the user avatars shown in fig. 3) are acquired from the database associated with the preset application program.
Step 202, inputting the avatar image into a preset classification model, and determining the category of the avatar image according to the output result of the preset classification model.
For example, the preset classification model may be a binary classification model for determining whether an avatar image is available, and may therefore be called the avatar availability model. Mainstream face recognition models generally perform best on clear, unoccluded frontal faces and tend to err when occlusion or side faces are present, so in the embodiment of the invention an available avatar is defined as one containing a clear, unoccluded frontal face. Specifically, during model training, sample images whose face clarity meets the preset clarity requirement, whose face rotation angle is smaller than the preset angle threshold, and which contain no occlusion are given the available label and become positive samples; sample images failing these conditions are given the unavailable label and become negative samples.
Step 203, extracting face features from the avatar images whose category is available to obtain the first face features meeting the preset requirements and adding them to the reference face feature set; extracting face features from the avatar images whose category is unavailable to obtain third face features and adding them to the candidate face feature set.
For example, based on the judgment of the avatar availability model, the available avatars can be screened out of all the avatars used by the target user, and faces are detected from the available avatars through the face detection model to form the reference face set of the target user, which may be recorded as F_ref = {f_1, f_2, ..., f_m}. The first face features are obtained by extracting features of each reference face through the face recognition model. Considering that none of the target user's avatars may be available, the reference face feature set may be empty; if no avatar image of the available category exists, no processing of the reference face feature set is needed in this step.
For example, the avatar images whose category is unavailable may exhibit interference factors, such as face occlusion or side faces, that also appear in videos uploaded by users, so these avatar images can be used to improve the anti-interference capability of the face library. Specifically, face detection can be performed on these avatar images, the detected faces form an avatar face pool, face features are then extracted to obtain the third face features, and the third face features are added to the candidate face feature set for further screening in subsequent steps. Unavailable avatar images may also be absent, for example when the target user's avatar is a clear, unoccluded frontal face; in that case no third face feature exists.
Step 204, extracting face features from the video frames of the historical videos published by the target user in the preset application program, and adding the extracted face features to the alternative face feature pool as alternative face features.
For example, the corresponding historical videos (that is, the user-uploaded videos shown in fig. 3) may be acquired for the target user from a database associated with the preset application program. To improve the comprehensiveness of the face library, all the videos published by the target user may be used here, forming a historical video set that may be recorded as V = {v_1, v_2, ..., v_N}. Each historical video is decoded separately to obtain a corresponding video frame set, which may be recorded as F_i = {f_(i,1), f_(i,2), ..., f_(i,T_i)} for video v_i. An alternative face pool is obtained from the video frame sets using the face detection model, which may be written as D = {d_1, d_2, ..., d_K}. A high-frequency face is a face that appears with high frequency in the videos published by the user. To find high-frequency faces in the alternative face pool, the face similarities within the pool are calculated first: the face recognition model extracts features for each face in the alternative face pool, yielding an alternative face feature pool that may be recorded as E = {e_1, e_2, ..., e_K}.
Step 205, performing preset clustering on the alternative face features in the alternative face feature pool to obtain a plurality of clusters.
Illustratively, cosine similarity and the DBSCAN clustering algorithm are adopted to divide all the alternative face features into a plurality of clusters, each cluster containing one or more alternative face features. For two faces face_i and face_j with features e_i and e_j, the cosine similarity sim(e_i, e_j) = (e_i · e_j) / (||e_i|| · ||e_j||) represents the similarity of the two faces: the larger the cosine similarity, the more similar the faces. The DBSCAN clustering algorithm is a density-based clustering method that connects similar feature points in a bottom-up manner, all connected feature points forming one cluster. DBSCAN is fast, can produce clusters of arbitrary shape, and is insensitive to outliers, so it fits the task of mining high-frequency faces well. To ensure that the face features in each cluster correspond to the same face, a strict similarity threshold, such as 95%, may be used.
Step 206, counting the number of alternative face features contained in each cluster, and determining the clusters whose accumulated number meets the preset number requirement as high-frequency clusters.
Illustratively, the alternative face feature pool yields a plurality of clusters after the DBSCAN algorithm; the more face features a cluster contains, the more frequently the corresponding face appears. The preset number requirement may be, for example, sorting by accumulated number and taking the top A clusters (A may be set according to the actual situation, e.g., 1 or 3) as high-frequency clusters.
Step 207, for each alternative face feature in a high-frequency cluster, calculating the first similarities between the current alternative face feature and every other alternative face feature in the cluster and computing the first value, and acquiring, as second face features, the alternative face features in the high-frequency cluster whose first value meets the first preset value requirement.
Illustratively, the same cluster may contain features of a face in many different states, and some faces have poor usability due to factors such as angle, occlusion, or video blur, so the most usable high-frequency face features can be further selected from inside each high-frequency cluster, ensuring that the face library built from high-frequency faces has higher quality. To this end, the cosine similarity between every pair of face features in each high-frequency cluster is further calculated, yielding a similarity matrix M ∈ R^(K×K), where M_ij denotes the cosine similarity between the i-th and the j-th face features in the cluster. The second face features are obtained by selecting the face features whose summed similarity to the other samples in the cluster is the largest (or among the largest); this amounts to summing each row of the similarity matrix M and taking the columns with the largest values, that is, p = argmax_i Σ_j M_ij, which gives the index p of the high-frequency face in the cluster and thus a second face feature. The set of second face features corresponds to the high-frequency face pool in fig. 3.
Step 208, adding the second face features to the candidate face feature set, and, when no avatar image of the available category exists, determining the cluster with the largest accumulated number as the target cluster and adding the alternative face features in the target cluster whose first value meets the second preset value requirement to the reference face feature set.
Illustratively, after the second face features are obtained, they are added to the candidate face feature set. When third face features exist, the candidate face feature set then contains both the face features from the avatar images and the high-frequency face features from the historical videos, so it can cover faces in many different states. That is, the candidate face feature set comprises the avatar face pool and the high-frequency face pool in fig. 3.
If the reference face feature set cannot be obtained from the avatar images, the most frequently appearing face can be determined as the user's reference face, and the most usable face features are screened out of its cluster and added to the reference face feature set.
Step 209, for each candidate face feature in the candidate face feature set, calculating the second similarities between the current candidate face feature and each reference face feature in the reference face feature set, and, when at least one second similarity is greater than the preset similarity threshold, determining the current candidate face feature as an extended face feature and adding it to the extended face feature set.
Illustratively, after the reference face feature set and the candidate face feature set are obtained, a face library screening algorithm selects from the candidate face feature set the face features consistent with the identity of the reference face, expanding the face library and improving face recognition under the various interference factors found in videos. Specifically, the cosine similarity between each face feature in the candidate face feature set and the reference face features is calculated, and the face features whose cosine similarity is greater than the preset similarity threshold are selected and added to the extended face feature set.
Step 210, adding the reference face feature set and the extended face feature set to the initial face library corresponding to the target user to obtain the target face library.
The initial face library may be an empty set or may already contain initial face features. For example, after the target face library is obtained, it may be updated periodically or on demand: an update may be triggered when the target user uploads a new avatar, or when the number of the target user's newly uploaded videos reaches a set value. After an update is triggered, the image processing method of the embodiment of the invention may be executed again, or an incremental update may be performed on the newly uploaded avatar images and newly published videos. For example, a newly uploaded avatar may be input into the preset classification model; if the output category is available, its face feature is added directly to the target face library, and if unavailable, the target face library is left unchanged. For a newly uploaded video, the face features in it can be extracted and the second similarities to each reference face feature in the reference face feature set computed; the face features are added to the target face library when at least one second similarity is greater than the preset similarity threshold, as sketched below.
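A sketch of such an incremental update under assumed details: a scikit-learn-style availability classifier whose label 1 means available, L2-normalised features, and the same assumed cosine threshold as above.

```python
import numpy as np

def incremental_update(library, references, avatar_clf,
                       new_avatar_feat=None, new_avatar_embedding=None,
                       new_video_feats=None, sim_thresh=0.8):
    """Incrementally extend the target face library (assumed conventions).

    - A newly uploaded avatar's feature is added only when the availability
      classifier judges the avatar usable.
    - Features from a newly uploaded video are added when they match at
      least one reference feature.
    """
    if new_avatar_feat is not None and new_avatar_embedding is not None:
        if avatar_clf.predict(new_avatar_embedding[None])[0] == 1:  # available
            library = np.vstack([library, new_avatar_feat[None]])
    if new_video_feats is not None and len(new_video_feats) > 0:
        sims = new_video_feats @ references.T       # second similarities
        keep = (sims > sim_thresh).any(axis=1)      # one match suffices
        library = np.vstack([library, new_video_feats[keep]])
    return library
```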
The image processing method provided by the embodiment of the invention preferentially extracts reference face features from the avatars used by the user: a binary classification model divides the avatar images into available and unavailable avatars, and the reference face features are extracted from the available avatars. A candidate face feature set is built from the high-frequency face features mined from the user's published historical videos and the face features extracted from the unavailable avatar images; the reference face feature set is then used to screen out the face features with higher similarity as extended face features, expanding the reference face feature set so that the target user's face features are represented more comprehensively. This avoids manually collecting face material for every user, saving substantial labor cost, and achieves efficient, automatic face library construction. The face library is highly extensible and adapts well to service scenarios such as short video, where users continuously upload large numbers of new videos. The method can be completed offline, that is, the computational cost falls in the offline face library construction stage and no extra step is added to the online face recognition stage, so the computation cost and processing latency of online face recognition do not increase. Mainstream face detection and face recognition models can be used throughout construction, and no additional training data needs to be collected to optimize or adjust the models, avoiding the cost of data collection and model optimization and improving flexibility in practical applications.
Fig. 4 is a flowchart of a video recognition method according to an embodiment of the present invention, which may be executed by a video recognition apparatus, where the apparatus may be implemented by software and/or hardware, and may be generally integrated in a computer device. The computer device may be a mobile device such as a mobile phone, a tablet computer, a notebook computer, or a personal digital assistant, or another device such as a desktop computer or a server. As shown in fig. 4, the method includes:
step 401, extracting the face features to be recognized from the target video to be recognized uploaded by the target user.
For example, the target video may be an unpublished video uploaded by the target user or an already published one, without limitation. The target video is decoded to obtain the corresponding video frames, a group of faces is detected from the frames with the face detection model, and face features are extracted with a face recognition model (generally the same model used during face library construction) to obtain the face features to be recognized, of which there may be more than one.
Step 402, comparing the face features to be recognized with each face feature in a target face library corresponding to the target user, and recognizing the originality of the target video according to the comparison result.
The target face library is obtained by adopting any image processing method provided by the embodiment of the invention.
For example, the face features to be recognized are matched one by one against the face features in the target face library; the matching may, for example, compute the similarity between the face features to be recognized and those in the target face library. If some face feature to be recognized has a similarity greater than a set similarity threshold, the target video to be recognized contains the target user's face and can be considered an original video, or one of high originality, as in the sketch below.
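A minimal sketch of this comparison, assuming L2-normalised features and an assumed similarity threshold:

```python
import numpy as np

def recognize_originality(video_feats: np.ndarray, face_library: np.ndarray,
                          sim_thresh: float = 0.8) -> bool:
    """Return True when any face extracted from the target video matches
    the target user's face library (cosine threshold is assumed)."""
    if len(video_feats) == 0 or len(face_library) == 0:
        return False
    sims = video_feats @ face_library.T     # (n_video_faces, n_library)
    return bool((sims > sim_thresh).any())  # a match suggests originality
```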
Illustratively, after the originality of the target video is recognized, the video can be handled accordingly, with the handling set according to the actual service scenario. For example, if the originality is high, the probability of the target video entering the video recommendation queue can be increased so that more users see it; if the originality is low, the probability can be decreased so that fewer or no users see it.
According to the video recognition method provided by the embodiment of the invention, face features to be recognized are extracted from the target video to be recognized uploaded by the target user, the face features to be recognized are compared with the face features in the target face library corresponding to the target user obtained by the image processing method provided by the embodiment of the invention, and the originality of the target video is recognized according to the comparison result.
Fig. 5 is a block diagram of an image processing apparatus according to an embodiment of the present invention, where the apparatus may be implemented by software and/or hardware, and may be generally integrated in a computer device, and may be configured to construct a face library by executing an image processing method. As shown in fig. 5, the apparatus includes:
the head portrait image processing module 501 is configured to acquire a head portrait image of a target user in a preset application program, and add a first facial feature to a reference facial feature set when the first facial feature meeting a preset requirement is extracted from the head portrait image;
a historical video processing module 502, configured to obtain, from a historical video issued by the target user in the preset application program, a second face feature corresponding to a target face whose occurrence frequency meets a preset frequency requirement, and add the second face feature to a candidate face feature set;
a face feature screening module 503, configured to screen face features in the candidate face feature set based on the reference face feature set, and determine an extended face feature set according to a screening result;
a face library construction module 504, configured to construct a target face library corresponding to the target user according to the reference face feature set and the extended face feature set.
The image processing apparatus provided by the embodiment of the invention can automatically combine a user's avatar image with the historical videos that user has published to construct the user's face library. The construction cost is low and the efficiency is high; the resulting face library covers more face state changes and therefore represents the user's face characteristics more comprehensively and accurately, improving the accuracy of the face library and, in turn, the results obtained when the library is applied to tasks such as video originality recognition.
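To make the screening performed by module 503 concrete, here is a minimal sketch that keeps a candidate feature when its similarity to at least one reference feature exceeds a threshold (cf. claim 7 below), assuming cosine similarity over L2-normalized vectors; the threshold value is illustrative.

```python
# Sketch of module 503's screening, assuming cosine similarity over
# L2-normalized vectors; the 0.55 threshold is illustrative.
import numpy as np

def screen_candidates(candidate_set, reference_set, threshold=0.55):
    """Keep a candidate feature only if it matches at least one reference feature."""
    refs = np.stack(reference_set)
    refs /= np.linalg.norm(refs, axis=1, keepdims=True)
    extended_set = []
    for c in candidate_set:
        second_sims = refs @ (c / np.linalg.norm(c))   # "second similarities"
        if second_sims.max() > threshold:              # at least one above threshold
            extended_set.append(c)                     # becomes an extended face feature
    return extended_set
```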
Fig. 6 is a block diagram of a video recognition apparatus according to an embodiment of the present invention. The apparatus may be implemented in software and/or hardware, may generally be integrated in a computer device, and may perform originality recognition on a video by executing the video recognition method. As shown in fig. 6, the apparatus includes:
a face feature extraction module 601, configured to extract face features to be recognized from a target video to be recognized that is uploaded by a target user;
an originality identification module 602, configured to compare the face features to be recognized with each face feature in a target face library corresponding to the target user, and identify the originality of the target video according to the comparison result, where the target face library is obtained by the image processing method provided in the embodiment of the present invention.
The video recognition apparatus provided by the embodiment of the invention extracts the face features to be recognized from the target video uploaded by the target user, compares them with the face features in the target face library corresponding to that user (obtained by the image processing method provided by the embodiment of the invention), and identifies the originality of the target video according to the comparison result.
An embodiment of the invention provides a computer device in which the image processing apparatus provided by the embodiment of the invention can be integrated. Fig. 7 is a block diagram of a computer device according to an embodiment of the present invention. The computer device 700 comprises a memory 701, a processor 702 and a computer program stored on the memory 701 and executable on the processor 702; the processor 702 implements the image processing method and/or the video recognition method provided by the embodiments of the invention when executing the computer program.
Embodiments of the present invention also provide a storage medium containing computer-executable instructions which, when executed by a computer processor, perform the image processing method and/or the video recognition method provided by the embodiments of the present invention.
The image processing apparatus, video recognition apparatus, device and storage medium provided in the above embodiments can execute the corresponding methods provided by the embodiments of the present invention and have the corresponding functional modules and beneficial effects. For technical details not elaborated in the above embodiments, refer to the method provided in any embodiment of the invention.
Note that the above are only preferred embodiments of the present invention. Those skilled in the art will understand that the invention is not limited to the particular embodiments described herein, and that various obvious changes, rearrangements and substitutions can be made without departing from the scope of the invention. Therefore, although the invention has been described in some detail through the above embodiments, it is not limited to them and may include other equivalent embodiments without departing from its spirit; its scope is determined by the appended claims.

Claims (12)

1. An image processing method, comprising:
acquiring an avatar image of a target user in a preset application program and, when a first face feature meeting a preset requirement is extracted from the avatar image, adding the first face feature to a reference face feature set;
acquiring, from historical videos published by the target user in the preset application program, a second face feature corresponding to a target face whose occurrence frequency meets a preset frequency requirement, and adding the second face feature to a candidate face feature set;
screening the face features in the candidate face feature set based on the reference face feature set, and determining an extended face feature set according to a screening result;
and constructing a target face library corresponding to the target user according to the reference face feature set and the extended face feature set.
2. The method of claim 1, further comprising, before the screening of the face features in the candidate face feature set based on the reference face feature set:
adding a third face feature, which is extracted from the avatar image and does not meet the preset requirement, to the candidate face feature set.
3. The method according to claim 1, further comprising, after the acquiring of the avatar image of the target user in the preset application program:
inputting the avatar image into a preset classification model, and determining the category of the avatar image according to the output of the preset classification model, wherein the labels of the training sample images for the preset classification model include an available label and an unavailable label;
and extracting face features from avatar images whose category is available, to obtain the first face feature meeting the preset requirement.
4. The method according to claim 1, wherein the acquiring, from the historical videos published by the target user in the preset application program, of the second face feature corresponding to the target face whose occurrence frequency meets the preset frequency requirement comprises:
extracting face features from video frames of the historical videos published by the target user in the preset application program, and adding the extracted face features to an alternative face feature pool as alternative face features;
performing preset clustering processing on the alternative face features in the alternative face feature pool to obtain a plurality of clusters;
counting the number of alternative face features contained in each cluster, and determining a cluster whose count meets a preset number requirement as a high-frequency cluster, wherein the face to which the alternative face features in the high-frequency cluster belong is recorded as a target face whose occurrence frequency meets the preset frequency requirement;
and acquiring a second face feature from the high-frequency cluster.
5. The method of claim 4, wherein the acquiring of the second face feature from the high-frequency cluster comprises:
for each alternative face feature in the high-frequency cluster, calculating a first similarity between the current alternative face feature and each other alternative face feature in the high-frequency cluster, and calculating a first value, the first value being the sum of those first similarities;
and taking, as second face features, the alternative face features in the high-frequency cluster whose first value meets a first preset value requirement.
6. The method of claim 5, further comprising:
when no first face feature meeting the preset requirement can be extracted from the avatar image, determining the cluster containing the largest number of alternative face features as a target cluster, the high-frequency clusters including the target cluster;
and adding, to the reference face feature set, the alternative face features in the target cluster whose first value meets a second preset value requirement.
7. The method of claim 1, wherein the screening of the face features in the candidate face feature set based on the reference face feature set and the determining of the extended face feature set according to the screening result comprise:
for each candidate face feature in the candidate face feature set, calculating a second similarity between the current candidate face feature and each reference face feature in the reference face feature set, and, when at least one second similarity is greater than a preset similarity threshold, determining the current candidate face feature as an extended face feature and adding it to the extended face feature set.
8. A video recognition method, comprising:
extracting face features to be recognized from a target video to be recognized uploaded by a target user;
comparing the face features to be recognized with each face feature in a target face library corresponding to the target user, and recognizing the originality of the target video according to the comparison result, wherein the target face library is obtained by adopting the image processing method according to any one of claims 1 to 7.
9. An image processing apparatus characterized by comprising:
the system comprises an avatar image processing module, a standard face feature set and a face feature extraction module, wherein the avatar image processing module is used for acquiring an avatar image of a target user in a preset application program, and adding a first face feature into the standard face feature set under the condition that the first face feature meeting preset requirements is extracted from the avatar image;
a historical video processing module, configured to acquire, from historical videos published by the target user in the preset application program, a second face feature corresponding to a target face whose occurrence frequency meets a preset frequency requirement, and add the second face feature to a candidate face feature set;
a face feature screening module, configured to screen the face features in the candidate face feature set based on the reference face feature set and determine an extended face feature set according to the screening result;
and a face library construction module, configured to construct a target face library corresponding to the target user according to the reference face feature set and the extended face feature set.
10. A video recognition apparatus, comprising:
a face feature extraction module, configured to extract face features to be recognized from a target video to be recognized that is uploaded by a target user;
an originality recognition module, configured to compare the face features to be recognized with each face feature in a target face library corresponding to the target user, and recognize the originality of the target video according to the comparison result, wherein the target face library is obtained by the image processing method according to any one of claims 1 to 7.
11. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1-8 when executing the computer program.
12. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1 to 8.
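For illustration, the clustering and representative selection recited in claims 4 and 5 could be sketched as follows. K-means is used as one possible "preset clustering processing" (this publication is classified under K-means clustering, but the claims do not mandate it); n_clusters, min_cluster_size and top_k are assumed stand-ins for the preset number requirement and the first preset value requirement.

```python
# Sketch of claims 4-5: cluster the alternative face feature pool, keep
# high-frequency clusters, and pick each cluster's most representative features
# by the sum of pairwise "first similarities". Parameters are illustrative.
import numpy as np
from sklearn.cluster import KMeans

def mine_second_face_features(pool: np.ndarray, n_clusters: int = 20,
                              min_cluster_size: int = 30, top_k: int = 3):
    """pool: (N, D) alternative face features extracted from the historical videos."""
    labels = KMeans(n_clusters=n_clusters, n_init=10).fit_predict(pool)
    second_features = []
    for c in range(n_clusters):
        cluster = pool[labels == c]
        if len(cluster) < min_cluster_size:        # not a high-frequency cluster
            continue
        normed = cluster / np.linalg.norm(cluster, axis=1, keepdims=True)
        sims = normed @ normed.T                   # pairwise "first similarities"
        first_values = sims.sum(axis=1) - 1.0      # exclude each feature's self-similarity
        best = np.argsort(first_values)[-top_k:]   # most representative of the cluster
        second_features.extend(cluster[best])
    return second_features
```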
CN202111331061.2A 2021-11-11 2021-11-11 Image processing method, video recognition method, device, equipment and storage medium Pending CN113963303A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111331061.2A CN113963303A (en) 2021-11-11 2021-11-11 Image processing method, video recognition method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111331061.2A CN113963303A (en) 2021-11-11 2021-11-11 Image processing method, video recognition method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113963303A (en) 2022-01-21

Family

ID=79470012

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111331061.2A Pending CN113963303A (en) 2021-11-11 2021-11-11 Image processing method, video recognition method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113963303A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115909434A (en) * 2022-09-07 2023-04-04 以萨技术股份有限公司 Data processing system for acquiring human face image characteristics
CN115240265A (en) * 2022-09-23 2022-10-25 深圳市欧瑞博科技股份有限公司 User intelligent identification method, electronic equipment and storage medium
CN115240265B (en) * 2022-09-23 2023-01-10 深圳市欧瑞博科技股份有限公司 User intelligent identification method, electronic equipment and storage medium
CN115965848A (en) * 2023-03-13 2023-04-14 腾讯科技(深圳)有限公司 Image processing method and related device

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination