CN112100427A - Video processing method and device, electronic equipment and storage medium - Google Patents

Info

Publication number
CN112100427A
CN112100427A (application number CN202010916957.6A)
Authority
CN
China
Prior art keywords
face
faces
clustering
images
node
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010916957.6A
Other languages
Chinese (zh)
Inventor
程文龙 (Cheng Wenlong)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202010916957.6A priority Critical patent/CN112100427A/en
Publication of CN112100427A publication Critical patent/CN112100427A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50: Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50: Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53: Querying
    • G06F16/538: Presentation of query results
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/23: Clustering techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Library & Information Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a video processing method and apparatus, an electronic device, and a storage medium. The video processing method is applied to an electronic device and includes the following steps: acquiring a plurality of face images containing faces from video data to be processed; clustering the plurality of face images to obtain a clustering result, the clustering result including a face image set for each of a plurality of faces; obtaining the node betweenness corresponding to each face based on the clustering result; sorting the face image sets according to the node betweenness to obtain a sorting result; and presenting the face image sets of the plurality of faces based on the sorting result. The method can cluster the face images in a video into album groups for the corresponding persons and can sort the face image sets according to the node betweenness of the faces, making it convenient for a user to view the face image sets.

Description

Video processing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of electronic device technologies, and in particular, to a video processing method and apparatus, an electronic device, and a storage medium.
Background
Electronic devices, such as mobile phones and tablet computers, have become some of the most common consumer electronic products in daily life. With the development of technology, mobile terminals are usually equipped with a camera to provide a shooting function, so people can use electronic devices ever more conveniently to record the memorable moments in their lives. An electronic device can store the videos people shoot so that users can review them later; however, when a user needs to view content related to a particular person in a video, the user has to watch the video and search through it, which makes searching inefficient.
Disclosure of Invention
In view of the foregoing problems, the present application provides a video processing method, an apparatus, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present application provides a video processing method applied to an electronic device. The method includes: acquiring a plurality of face images containing faces from video data to be processed; clustering the plurality of face images to obtain a clustering result, the clustering result including a face image set for each of a plurality of faces; obtaining the node betweenness corresponding to each face based on the clustering result; sorting the face image sets according to the node betweenness to obtain a sorting result; and presenting the face image sets of the plurality of faces based on the sorting result.
In a second aspect, an embodiment of the present application provides a video processing apparatus applied to an electronic device. The apparatus includes an image acquisition module, a face clustering module, a betweenness acquisition module, an image set sorting module, and an image set display module. The image acquisition module is configured to acquire a plurality of face images containing faces from video data to be processed; the face clustering module is configured to cluster the face images to obtain a clustering result, the clustering result including a face image set for each of a plurality of faces; the betweenness acquisition module is configured to obtain the node betweenness corresponding to each face based on the clustering result; the image set sorting module is configured to sort the face image sets according to the node betweenness to obtain a sorting result; and the image set display module is configured to display the face image sets of the plurality of faces based on the sorting result.
In a third aspect, an embodiment of the present application provides an electronic device including: one or more processors; a memory; and one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors to perform the video processing method provided by the first aspect above.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium storing program code that can be invoked by a processor to execute the video processing method provided in the first aspect.
According to this scheme, a plurality of face images containing faces are acquired from the video data to be processed and clustered to obtain a clustering result that includes a face image set for each of a plurality of faces; the node betweenness corresponding to each face is obtained based on the clustering result; the face image sets are sorted according to the node betweenness to obtain a sorting result; and the face image sets of the plurality of faces are presented based on the sorting result. In this way, the face images in a video can be clustered into album groups for the corresponding persons, and the face image sets can be sorted according to the node betweenness of the faces, which makes it convenient for a user to view the face image sets.
Drawings
To illustrate the technical solutions in the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; other drawings can be obtained from them by those skilled in the art without creative effort.
Fig. 1 shows a flow diagram of a video processing method according to an embodiment of the application.
Fig. 2 shows a flow diagram of a video processing method according to another embodiment of the present application.
Fig. 3 shows a flowchart of step S240 in a video processing method according to another embodiment of the present application.
Fig. 4 shows a flow diagram of a video processing method according to yet another embodiment of the present application.
Fig. 5 shows a flow diagram of a video processing method according to yet another embodiment of the present application.
Fig. 6 shows a block diagram of a video processing apparatus according to an embodiment of the present application.
Fig. 7 is a block diagram of an electronic device according to an embodiment of the present application, configured to execute a video processing method according to an embodiment of the present application.
Fig. 8 is a storage unit for storing or carrying program codes for implementing a video processing method according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application.
At present, mobile terminals are widely used in daily life, with penetration covering nearly everyone, and the camera module has become a main feature of smart terminals: users can take photos through the camera function of a mobile terminal to record moments of life, study, and work.
In addition, an electronic device can store the videos people shoot, making it convenient for users to review them. In some scenarios, a user may need to view the content corresponding to a particular person in a video; the user then has to play the video and look through its content piece by piece to find the relevant parts. In other scenarios, a user may need to capture a still image of a person in a video; the user likewise has to watch the video and take a screenshot after locating a frame showing that person. In either case, the user may spend a significant amount of time watching the video to find the desired person-related content.
Of course, there are techniques that can generate face albums from the faces of the people appearing in video data. However, among the generated face albums of multiple people, the user cannot tell which people's albums are more important, so these techniques cannot help the user identify the important people in the video.
In view of the above problems, the inventor provides a video processing method, an apparatus, an electronic device, and a storage medium that can cluster the face images in a video into album groups for the corresponding persons and sort the face image sets according to the node betweenness of the faces, so that when viewing the face image sets the user can readily identify those belonging to the important people in the video. The specific video processing method is described in detail in the following embodiments.
Referring to fig. 1, fig. 1 is a schematic flowchart illustrating a video processing method according to an embodiment of the present application. In a specific embodiment, the video processing method is applied to the video processing apparatus 400 shown in fig. 6 and the electronic device 100 (fig. 7) equipped with the video processing apparatus 400. The following will describe a specific process of this embodiment by taking an electronic device as an example, and it is understood that the electronic device applied in this embodiment may be a smart phone, a tablet computer, a smart watch, smart glasses, a notebook computer, and the like, which is not limited herein. As will be described in detail with respect to the flow shown in fig. 1, the video processing method may specifically include the following steps:
step S110: and acquiring a plurality of face images containing faces in the video data to be processed.
In this embodiment of the application, the electronic device can process the video data to be processed so as to generate a face image set for each distinct face in the video data, making it convenient for a user to view the face images in the video. The video data to be processed may be any video data for which face image sets are to be generated.
In some embodiments, the video data to be processed may be obtained by the electronic device detecting a user operation: when a video selection operation is determined from the detected operation, a local video selected by the user is taken as the video to be processed, and its video data is obtained. For example, the electronic device may present a picture library containing a plurality of videos; when the electronic device detects a selection operation on a target video, it may respond by taking the selected video as the video to be processed. Of course, this is only an example; a video selected from a folder may likewise be taken as the video to be processed according to a selection operation in that folder.
In other embodiments, the video data to be processed may be video data received from another device for which face image sets need to be generated. For example, the electronic device may download the video data to be processed from a server, so that after subsequent processing, a face image set is obtained for each face in the video.
Of course, in the embodiment of the present application, the manner of acquiring the video data to be processed may not be limited.
In this embodiment of the application, the electronic device can detect the face images containing faces in the video data, thereby acquiring a plurality of face images from the video data. To acquire the face images, the electronic device may split the video data into multiple frames and detect whether a face exists in each frame, so that all face images containing faces can be determined.
It can be understood that the video data may contain only a single face image; in that case, that image alone can serve as the face image set of the corresponding face in the video data to be processed. In general, however, a video contains a large number of face images, since a user shooting a video with an electronic device usually records people at many moments, so the electronic device needs to divide the face images in the video data into face image sets. The embodiments of the present application therefore mainly take video data containing a plurality of face images as an example; the specific number of face images in the video data is not limited. In addition, a face image obtained from the video data may contain a single face, that is, one person's face, or multiple faces, that is, the faces of multiple people: for example, when a group photo is being taken, a person may be captured by the electronic device's camera and recorded in the video, so multiple faces appear together in the video data.
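The frame-splitting and per-frame detection just described can be sketched as follows. This is a minimal illustration, not the application's implementation: `extract_face_frames` is a hypothetical helper, and `detect_faces` is a stand-in for any real face detector.

```python
from typing import Callable, List, Sequence, Tuple

def extract_face_frames(
    frames: Sequence, detect_faces: Callable
) -> List[Tuple[int, object, list]]:
    """Keep only the frames containing at least one face.

    detect_faces(frame) is assumed to return a list of detections
    (e.g. bounding boxes); frames with an empty result are skipped.
    """
    kept = []
    for idx, frame in enumerate(frames):
        faces = detect_faces(frame)
        if faces:
            kept.append((idx, frame, faces))
    return kept

# Toy usage with a stub detector: each "frame" just lists who appears in it.
frames = [{"people": []}, {"people": ["A"]}, {"people": ["A", "B"]}]
face_frames = extract_face_frames(frames, lambda f: f["people"])
# face_frames keeps only frames 1 and 2, which contain faces
```

In a real pipeline the frames would come from a video decoder and `detect_faces` from a face-detection model; both are outside the scope of this sketch.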
Step S120: and clustering the plurality of face images to obtain a clustering result, wherein the clustering result comprises a face image set of each face in the plurality of faces.
In this embodiment of the application, the electronic device clusters the acquired face images so that the images belonging to the same face can be extracted as that face's image set; by viewing the face image set of each face, a user can find the face images of interest in the video data.
In some embodiments, the electronic device may cluster the face images according to their facial features: for example, extract the facial features of each face image with a feature-extraction algorithm, cluster the features with a clustering algorithm, and treat the images whose features fall into the same category as face images of the same face. It can be understood that when different face images contain the same face, their facial features should be similar. In addition, when a face image contains multiple faces, the features of each face in that image can be clustered against the features of the other face images, so the image is grouped into a class as long as the features of any one of its faces are similar to that class, that is, the image is clustered under the same face. For example, if face image A contains persons 1 and 2, and face image B contains persons 2 and 3, face images A and B may be clustered into one class.
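The feature-extraction-plus-clustering step can be sketched as below. This is a toy illustration under stated assumptions, not the application's algorithm: it assumes pre-computed feature vectors and uses a simple greedy cosine-similarity threshold in place of a real clustering algorithm.

```python
import math
from typing import Dict, List, Sequence, Tuple

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def cluster_faces(
    features: Sequence[Tuple[float, ...]], threshold: float = 0.9
) -> Dict[int, List[int]]:
    """Greedy clustering: put each feature into the first cluster whose
    representative is similar enough, otherwise open a new cluster."""
    reps: List[Tuple[float, ...]] = []    # one representative per cluster
    clusters: Dict[int, List[int]] = {}   # cluster id -> image indices
    for i, f in enumerate(features):
        for cid, rep in enumerate(reps):
            if cosine(f, rep) >= threshold:
                clusters[cid].append(i)
                break
        else:
            reps.append(f)
            clusters[len(reps) - 1] = [i]
    return clusters

# Images 0 and 1 have near-identical features; image 2 is a different face.
feats = [(1.0, 0.0), (0.99, 0.05), (0.0, 1.0)]
print(cluster_faces(feats))  # {0: [0, 1], 1: [2]}
```

A production system would use learned face embeddings and a proper clustering algorithm (e.g. density-based clustering); the greedy threshold here only illustrates the grouping idea.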
Step S130: and acquiring the node betweenness corresponding to each face based on the clustering result.
In this embodiment of the application, after the electronic device clusters the face images in the video data and obtains the clustering result, face image sets for the different faces are available. When viewing these sets, however, the user cannot tell which people's face albums in the video are more important, so the user gets no help in identifying the important people in the video. The electronic device can therefore further mine the importance of the different faces, that is, of the different people, so that this importance is reflected when the face image sets are presented later, making it convenient for the user to view them.
In this embodiment of the application, the electronic device may obtain, based on the clustering result, the node betweenness corresponding to each face, that is, the node betweenness of the person corresponding to that face. Node betweenness is the proportion, among all shortest paths in the network, of those that pass through the node; here the network is a person relationship network. Node betweenness reflects the role and influence of the corresponding node or edge in the whole network and is an important global geometric quantity. That is, the node betweenness of a face reflects the role and influence of the corresponding person in the person relationship network: the larger the node betweenness, the more important the person is in the network; conversely, the smaller the node betweenness, the less important the person.
In some embodiments, the above person relationship network may refer to a person relationship network formed by faces appearing in the video data, that is, a person relationship network may be constructed according to different faces in the clustering result, and then the node betweenness corresponding to each face is determined from the person relationship network.
Step S140: and sequencing the face image set of each face according to the node betweenness to obtain a sequencing result.
In this embodiment of the application, after the electronic device obtains the node betweenness of the different faces, it can sort the face image sets according to the node betweenness, thereby determining the order of the face image set of each face.
In some embodiments, sorting the face image sets according to the node betweenness of each face may include: sorting the face image sets in descending order of the node betweenness corresponding to each face to obtain the sorting result. It can be understood that, since the node betweenness of a face reflects the role and influence of the corresponding person in the person relationship network, a larger node betweenness means a more important person; therefore, in the sorting result, the earlier a face image set appears, the more important the face it belongs to.
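A minimal sketch of the descending sort, with hypothetical toy betweenness values and album contents (none of the names or values come from the application):

```python
# Hypothetical betweenness values per face and each face's image set.
betweenness = {"A": 0.6, "B": 0.1, "C": 0.3}
albums = {"A": ["a1.jpg"], "B": ["b1.jpg"], "C": ["c1.jpg", "c2.jpg"]}

# Descending node betweenness: the most important person's album comes first.
ordered = sorted(albums, key=lambda face: betweenness[face], reverse=True)
print(ordered)  # ['A', 'C', 'B']
```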
Step S150: and presenting the face image set of the plurality of faces based on the sorting result.
In this embodiment of the application, after sorting the face image sets of the plurality of faces and obtaining the sorting result, the electronic device may present the face image sets based on that result. In one implementation, when the electronic device detects a viewing operation on the face image sets, it may display them in their sorted order; for example, each face image set may be treated as a folder, and the folders displayed in a page in sorted order, with the higher-ranked sets placed earlier in the page. In another implementation, the electronic device may distribute the face image sets over multiple pages according to their ranking, each page containing one or at least two sets, with the display order of the pages following the ranking; the pages are then displayed in sequence, for example one set per page, so that a user can switch pages to view different sets, and the more important sets appear on earlier pages, letting the user view them first. In yet another implementation, the electronic device may display only the top N face image sets in the ranking, for example by showing the folders of the top N sets in a page. Of course, the manner of presenting the face image sets based on their ranking is not limited here.
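The page-based presentation described above can be sketched as a simple chunking of the already-sorted sets. `paginate` is a hypothetical helper, not part of the application:

```python
from typing import List, Sequence

def paginate(ordered_sets: Sequence[str], per_page: int) -> List[List[str]]:
    """Split the ranked face image sets into display pages, highest-ranked first."""
    return [list(ordered_sets[i:i + per_page])
            for i in range(0, len(ordered_sets), per_page)]

# Five face image sets already sorted by descending betweenness, two per page.
pages = paginate(["A", "C", "B", "D", "E"], per_page=2)
print(pages)  # [['A', 'C'], ['B', 'D'], ['E']]
```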
In the video processing method provided by this embodiment of the application, a plurality of face images containing faces are acquired from the video data to be processed and clustered to obtain a clustering result that includes a face image set for each of a plurality of faces; the node betweenness corresponding to each face is then obtained based on the clustering result; the face image sets are sorted according to the node betweenness to obtain a sorting result; and the face image sets of the plurality of faces are presented based on the sorting result. In this way, the face images in a video can be clustered into the face image sets of the corresponding persons, and the sets can be sorted according to the node betweenness of the faces, which makes it convenient for a user to view them.
Referring to fig. 2, fig. 2 is a schematic flowchart illustrating a video processing method according to another embodiment of the present application. The video processing method is applied to the electronic device, and will be described in detail with respect to the flow shown in fig. 2, and the video processing method may specifically include the following steps:
step S210: and acquiring a plurality of face images containing faces in the video data to be processed.
Step S220: and clustering the plurality of face images to obtain a clustering result, wherein the clustering result comprises a face image set of each face in the plurality of faces.
In the embodiment of the present application, step S210 and step S220 may refer to the contents of the foregoing embodiments, and are not described herein again.
Step S230: and taking each face in the clustering result as a node of a character relationship network.
The video processing method provided by this embodiment of the application describes in detail how the node betweenness corresponding to each face is obtained after the face image sets of the different faces in the video data have been produced.
In this embodiment of the application, when obtaining the node betweenness of each face, the electronic device can treat each face as a node in the person relationship network. Since each face in the face image sets obtained in this embodiment corresponds to a person, each face can serve as a node of the person relationship network; an edge connects two nodes and characterizes the relationship between them.
Step S240: and acquiring the node betweenness corresponding to each node in the character relationship network as the node betweenness corresponding to each face.
In this embodiment of the application, after the electronic device takes each face as a node in the character relationship network, the node betweenness corresponding to each node is the node betweenness corresponding to each face. Therefore, the electronic device can obtain the node betweenness corresponding to each node to obtain the node betweenness corresponding to each face.
In some embodiments, referring to fig. 3, obtaining the node betweenness corresponding to each node in the person relationship network may include:
step S241: and acquiring the times of group photo among different faces in the clustering result based on the plurality of face images.
In this embodiment, a face image from the video data may contain a single face, that is, one person's face, or multiple faces, that is, the faces of multiple people; for example, when a group photo is taken, a person may be captured by the electronic device and recorded in the video, so multiple faces appear together in the video data. Therefore, for a face image containing multiple faces, whether a group photo exists between different faces can be determined from that image.
In this embodiment, the electronic device may determine the number of group photos between different faces in the clustering result based on the plurality of face images in the video data. It can be understood that the number of group photos reflects how closely two faces are connected: the more group photos two faces share, the more often the corresponding persons appear together, that is, the closer the connection between them.
Optionally, the electronic device may filter out, from the plurality of face images, those containing at least two faces, and determine the number of group photos between different faces from all such images.
To determine the number of group photos between different faces from all face images containing at least two faces, the electronic device may traverse each such image and, according to the face image sets to which the faces in the image belong, record a group photo between the corresponding faces; after all images containing at least two faces have been traversed, the number of group photos between each face and every other face can be determined from the accumulated results, that is, the number of group photos between different faces is obtained.
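The traversal-and-counting step above can be sketched as follows; the face labels are hypothetical stand-ins for the cluster identifiers of the face image sets:

```python
from collections import Counter
from itertools import combinations

def group_photo_counts(images):
    """images: one set of face labels per face image.

    Counts, for each unordered pair of faces, how many images contain both,
    i.e. the number of group photos between the two faces."""
    counts = Counter()
    for faces in images:
        # only images with at least two faces yield pairs
        for pair in combinations(sorted(faces), 2):
            counts[pair] += 1
    return counts

# Four face images; single-face images contribute no pairs.
images = [{"A"}, {"A", "B"}, {"A", "B", "C"}, {"B", "C"}]
counts = group_photo_counts(images)
# ("A","B") co-occurs twice, ("B","C") twice, ("A","C") once
```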
Step S242: and taking the times of group photo between different faces as the edge weight between the nodes corresponding to different faces in the character relation network.
In this embodiment, since the number of group photos reflects how closely two faces are connected, the number of group photos between different faces can be used as the edge weight between the corresponding nodes in the person relationship network. The edge weight reflects the degree of association between nodes, that is, how close their relationship is: the larger the edge weight, the closer the relationship between the nodes and the shorter the corresponding edge; conversely, the smaller the edge weight, the more distant the relationship and the longer the corresponding edge. Thus, by determining the edge weights of the edges between the different nodes in the person relationship network, the lengths of the edges between the nodes can be determined.
Step S243: determining a first number of shortest paths in the human relationship network and a second number of shortest paths through each node based on edge weights between nodes corresponding to the different faces.
In this embodiment, after obtaining the edge weights between the nodes corresponding to different faces, the electronic device may determine, according to these edge weights, the first number of shortest paths in the person relationship network and the second number of shortest paths passing through each node. It can be understood that, since the distance between nodes in the person relationship network, that is, the length of an edge, is related to the edge weight, both the shortest paths in the network and those shortest paths passing through the node corresponding to each face can be determined based on the edge weights. A shortest path is the path with the smallest total distance among all connected paths between two nodes.
Optionally, for a person relationship network with edge weights assigned, since the distances satisfy the triangle inequality (that is, the sum of the lengths of any two sides is necessarily greater than the third side), the shortest paths can be determined according to the relationship between edge weights and edge lengths.

Optionally, the electronic device may also determine the distance between each pair of nodes according to the relationship between edge weight and edge length, and then, when determining a shortest path, traverse all connected paths between two nodes, compute the distance of each path, and select the path with the smallest of the computed distances.
Step S244: and acquiring the ratio of the second quantity to the first quantity corresponding to each node to obtain the node betweenness corresponding to each node.
In this embodiment, after obtaining the second number of shortest paths passing through each node and the first number of all shortest paths in the network, the electronic device may obtain the node betweenness corresponding to each node as the ratio of the second number to the first number.
The node betweenness can be calculated according to the following formula:

CB(v) = ∑(s,t∈V) σ(s, t|v) / σ(s, t)

where σ(s, t|v) is the number of shortest paths from node s to node t that pass through node v, and σ(s, t) is the number of shortest paths from node s to node t. When the shortest path between each pair of nodes is unique, CB(v) simply counts the number of shortest paths passing through v.
The calculation above is performed for the node corresponding to each face, thereby obtaining the node betweenness of the node corresponding to each face.
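A minimal sketch of steps S242–S244, under stated assumptions: the group-photo counts are hypothetical, each edge weight is converted to a distance as 1/weight (one possible choice consistent with "larger weight means shorter edge"), shortest paths are enumerated by brute force (acceptable only for tiny demo graphs), and the node betweenness is computed as the ratio described in step S244 (shortest paths through the node divided by all shortest paths) rather than the per-pair normalized sum of the classic formula:

```python
from itertools import combinations

# Hypothetical group-photo counts between faces (edge weights).
weights = {("A", "B"): 4, ("B", "C"): 4, ("A", "C"): 1, ("C", "D"): 2}
nodes = sorted({n for edge in weights for n in edge})

# Larger weight = closer relationship = shorter edge; 1/weight is one possible mapping.
dist = {}
for (a, b), w in weights.items():
    dist[(a, b)] = dist[(b, a)] = 1.0 / w

def shortest_paths(s, t):
    """All shortest paths from s to t, by brute-force enumeration of simple paths."""
    best, paths = [float("inf")], []
    def walk(path, length):
        u = path[-1]
        if u == t:
            if length < best[0] - 1e-12:      # strictly shorter path found
                best[0], paths[:] = length, [path]
            elif abs(length - best[0]) <= 1e-12:  # tie: another shortest path
                paths.append(path)
            return
        for v in nodes:
            if v not in path and (u, v) in dist:
                walk(path + [v], length + dist[(u, v)])
    walk([s], 0.0)
    return paths

# First number: all shortest paths in the network, over every node pair.
all_paths = [p for s, t in combinations(nodes, 2) for p in shortest_paths(s, t)]
# Second number / first number: fraction of shortest paths with v as an inner node.
betweenness = {v: sum(v in p[1:-1] for p in all_paths) / len(all_paths)
               for v in nodes}
```

In this toy graph, B and C lie on the shortest paths between the outer nodes, so they receive the highest betweenness, while the peripheral nodes A and D receive zero.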
Step S250: and sequencing the face image set of each face according to the node betweenness to obtain a sequencing result.
Step S260: and presenting the face image set of the plurality of faces based on the sorting result.
In the embodiment of the present application, step S250 and step S260 may refer to the contents of the foregoing embodiments, and are not described herein again.
The video processing method provided by the embodiment of the application can cluster the face images in the video into the face image sets of the corresponding persons, can sort the face image sets according to the node betweenness of the faces, and is convenient for users to check the face image sets. In addition, a calculation mode of the node betweenness of the faces is provided, and the node betweenness reflects the importance of the figures corresponding to the faces in the figure relation network, so that when the face image sets of a plurality of faces are presented according to the sorting result, a user can know the importance of the figures corresponding to the faces, and the user can conveniently check the required face image sets of the figures.
Referring to fig. 4, fig. 4 is a flowchart illustrating a video processing method according to another embodiment of the present application. The video processing method is applied to the electronic device, and will be described in detail with respect to the flow shown in fig. 4, and the video processing method may specifically include the following steps:
step S310: and acquiring a key frame image in the video data.
In the embodiment of the application, when acquiring a plurality of face images containing faces from the video data, the electronic device may first acquire the key frame images in the video data so as to filter out irrelevant or repeated frame images. This reduces the amount of computation in subsequent processing and also prevents duplicates among the acquired face images.
In some embodiments, since many frames in a video are not associated with motion, the frames associated with motion may be used as key frames. Optionally, the optical flow of object motion in the video may be analyzed, and each time the video frame with the smallest amount of optical-flow motion within a shot of the video is selected as the extracted key frame. The motion amount of a video frame computed with the optical flow method can be calculated according to the following formula:

M(k) = ∑i∑j ( |Lx(i, j, k)| + |Ly(i, j, k)| )

where M(k) represents the motion amount of the k-th frame, Lx(i, j, k) represents the x component of the optical flow at pixel (i, j) of the k-th frame, and Ly(i, j, k) represents the y component of the optical flow at pixel (i, j) of the k-th frame.
After the motion amount of each video frame is calculated, the frames at local minima of the motion amount can be taken as key frames, thereby obtaining the key frames in the video data.
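A sketch of this key-frame selection, assuming the per-frame optical-flow component fields Lx and Ly have already been computed by some optical-flow estimator (the flow field and motion-amount sequence below are synthetic):

```python
def motion_amount(flow_x, flow_y):
    """M(k) = sum over pixels (i, j) of |Lx(i, j, k)| + |Ly(i, j, k)|."""
    return sum(abs(x) + abs(y)
               for row_x, row_y in zip(flow_x, flow_y)
               for x, y in zip(row_x, row_y))

def key_frame_indices(motions):
    """Frames whose motion amount is a local minimum are taken as key frames."""
    return [k for k in range(1, len(motions) - 1)
            if motions[k] < motions[k - 1] and motions[k] < motions[k + 1]]

# Synthetic example: one 1x2 flow field, then a sequence of per-frame motion amounts.
m = motion_amount([[1.0, -2.0]], [[0.0, 3.0]])   # (|1|+|0|) + (|-2|+|3|) = 6.0
keys = key_frame_indices([5.0, 2.0, 4.0, 1.0, 3.0])
```

The frames at indices 1 and 3 are the local minima of the synthetic sequence, so they would be extracted as key frames.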
In other embodiments, the key frames in the video data may be extracted based on a clustering method. Optionally, the electronic device may divide the video frames into a plurality of clusters by clustering and then select a representative frame from each cluster as a key frame. Specifically, the electronic device may initialize the cluster centers according to a clustering algorithm; classify the current frame into an existing class, or take it as a new cluster center, by computing its distance to the cluster centers; and then select the video frame closest to each cluster center as a key frame.
Of course, the specific way the electronic device extracts the key frame images from the video data is not limited; for example, the electronic device may also extract the key frames from the video data by using video processing tools (e.g., FFmpeg).
Step S320: and performing face detection on the key frame image to acquire a plurality of face images containing faces in the key frame image.
In the embodiment of the application, after the electronic device acquires the key frame images in the video data, the electronic device may perform face detection on the key frame images, so as to acquire a plurality of face images including faces in all the key frame images.
In some embodiments, before performing face detection on the key frame images to acquire the plurality of face images containing faces, the electronic device may further screen the key frame images so as to retain only those meeting a certain quality condition. Specifically, the electronic device may screen out the key frame images satisfying a first preset quality condition.
Optionally, the image quality of a key frame image may be measured by image parameters such as the dispersion of the color level distribution, the sharpness, and the brightness. The dispersion of the color level distribution can be expressed by the image standard deviation, the sharpness by the image average gradient, and the brightness by the image mean.
The standard deviation of the image reflects the dispersion degree of the gray value of the image pixel relative to the average value, and the larger the standard deviation is, the more dispersed the gray level distribution in the image is, and the better the image quality is. Given a key frame image F, where M is the number of rows of the image matrix and N is the number of columns of the image matrix, the image standard deviation std of F is:
std = sqrt( (1/(M·N)) ∑i∑j (F(i, j) − u)² )

where the sums run over i = 1..M and j = 1..N, and u is the image mean of F (defined below).
The average gradient of the image can measure the detail contrast and the degree of texture variation in the image, and reflects to a certain extent the sharpness of the image. The image average gradient ḡ of F is:

ḡ = (1/((M−1)·(N−1))) ∑i∑j sqrt( ( Δx F(i, j)² + Δy F(i, j)² ) / 2 )

where the sums run over i = 1..M−1 and j = 1..N−1.
where Δx F(i, j) and Δy F(i, j) respectively denote the first-order difference of pixel (i, j) in the x and y directions.
The image mean may reflect the average brightness of the image; the greater the average brightness, the better the image quality. The image mean u of the key frame image F is:

u = (1/(M·N)) ∑i∑j F(i, j)
In this embodiment, the first preset quality condition may include a combination of one or more of the following conditions: the image standard deviation is greater than a first threshold, the image average gradient is greater than a second threshold, and the image mean is greater than a third threshold. The specific values of the first threshold, the second threshold, and the third threshold are not limited.
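The three quality metrics above can be sketched in plain Python, following the formulas directly (F is a grayscale image represented as a list of rows; the sample image is synthetic):

```python
import math

def image_mean(F):
    """u: average brightness of the image."""
    M, N = len(F), len(F[0])
    return sum(sum(row) for row in F) / (M * N)

def image_std(F):
    """Standard deviation: dispersion of pixel values around the mean."""
    M, N = len(F), len(F[0])
    u = image_mean(F)
    return math.sqrt(sum((F[i][j] - u) ** 2
                         for i in range(M) for j in range(N)) / (M * N))

def average_gradient(F):
    """Average gradient: mean RMS of the first-order differences in x and y."""
    M, N = len(F), len(F[0])
    total = 0.0
    for i in range(M - 1):
        for j in range(N - 1):
            dx = F[i + 1][j] - F[i][j]  # first-order difference in the x direction
            dy = F[i][j + 1] - F[i][j]  # first-order difference in the y direction
            total += math.sqrt((dx * dx + dy * dy) / 2)
    return total / ((M - 1) * (N - 1))

F = [[0.0, 2.0], [2.0, 4.0]]  # synthetic 2x2 grayscale image
```

A key frame would then be kept only if these values exceed the first, second, and third thresholds chosen for the first preset quality condition.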
Further, after the electronic device screens out the key frame images meeting the first preset quality condition, the electronic device may perform face detection on the key frame images meeting the first preset quality condition to obtain a plurality of face images including faces.
In the embodiment of the present application, the specific face detection algorithm is not limited; for example, it may be the RetinaFace algorithm or the like.
Step S330: and acquiring the face images meeting a second preset quality condition from the plurality of face images.
In the embodiment of the application, after the electronic device acquires a plurality of face images, the face images can be screened to filter out low-quality face images. Specifically, the electronic device may acquire a face image satisfying a second preset quality condition from the plurality of face images.
In some embodiments, the second preset quality condition may include: the face detection confidence of the face image is larger than a fourth threshold value, and/or the face angle is smaller than a fifth threshold value. Specific values of the fourth threshold and the fifth threshold may not be limited.
Step S340: and filtering the face images which do not belong to any category from the face images meeting the second preset quality condition to obtain the face images to be clustered.
In the embodiment of the application, after the face images meeting the second preset quality condition are acquired, passer-by filtering may be performed to remove the face images of passers-by. Specifically, the electronic device may extract the facial features of each face image, for example using the FaceNet algorithm, then cluster the features with HDBSCAN, and filter out all face images labeled −1 (i.e., noise points that do not belong to any cluster).
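As a sketch, assuming `labels` is the label array an HDBSCAN run over the extracted face features would produce (in that convention, −1 marks noise points, i.e., passers-by that fall into no cluster; the image names and labels here are hypothetical):

```python
# Hypothetical per-image cluster labels as returned by HDBSCAN (-1 = noise/passer-by).
face_images = ["f0", "f1", "f2", "f3", "f4", "f5", "f6"]
labels = [0, 0, -1, 1, 1, -1, 0]

# Keep only the face images assigned to a real cluster; these become
# the face images to be clustered in the next step.
to_cluster = [img for img, lbl in zip(face_images, labels) if lbl != -1]
```

The images labeled −1 ("f2" and "f5") are discarded as passers-by before the main clustering step.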
Step S350: and clustering the face images to be clustered to obtain a clustering result, wherein the clustering result comprises a face image set of each face in a plurality of faces.
In the embodiment of the application, after the electronic device obtains the face images to be clustered, the face images to be clustered can be clustered, so that a clustering result is obtained. The electronic device can extract the face features of the face images to be clustered, and then perform clustering by using the face features of the face images to be clustered.
In some embodiments, the video data to be processed may be video data of multiple videos, and in this case, when clustering the face images to be clustered, the electronic device may cluster the face images to be clustered corresponding to each of the multiple videos, and obtain a first clustering result corresponding to each video; then, acquiring face images of different faces in each video according to the first clustering result corresponding to each video; then obtaining the average human face characteristics of the human face images of different faces in each video; clustering the average face features corresponding to different faces in the videos based on the average face features corresponding to different faces in each video to obtain a second clustering result; and finally, integrating the first clustering result according to the second clustering result to obtain a third clustering result.
When processing the face images of a single video, the electronic device may directly cluster with K-means based on the number K of detected faces and compute the average face feature of each class; the per-class average face features are then used to cluster the average features corresponding to different faces across the multiple videos, obtaining the second clustering result. It can be understood that by obtaining the average face feature of each class (i.e., each face) per video and then clustering these averages across videos, the huge amount of computation caused by directly clustering all the face images of multiple videos together can be avoided.
After obtaining the second clustering result, the electronic device may integrate the images of the respective categories in the first clustering result into the corresponding categories in the second clustering result, so as to obtain a final clustering result.
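A simplified stand-in for the two-stage scheme (the identifiers and the greedy cosine-similarity merge below are illustrative choices, not the patent's prescribed second-stage clustering): each first-stage cluster is reduced to its mean feature, and clusters from different videos whose mean features are similar enough are grouped into one identity.

```python
import math

def mean_feature(vectors):
    """Average face feature of one first-stage cluster."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def merge_across_videos(per_video, threshold=0.9):
    """per_video: {video_id: {cluster_id: [feature vector, ...]}} from the first stage."""
    groups = []  # each group: {"mean": feature, "members": [(video_id, cluster_id), ...]}
    for vid, clusters in per_video.items():
        for cid, feats in clusters.items():
            m = mean_feature(feats)
            for g in groups:
                if cosine(g["mean"], m) >= threshold:  # same person across videos
                    g["members"].append((vid, cid))
                    break
            else:
                groups.append({"mean": m, "members": [(vid, cid)]})
    return groups

# Synthetic features: the first clusters of v1 and v2 should merge into one identity.
per_video = {"v1": {0: [[1.0, 0.0], [1.0, 0.0]]},
             "v2": {0: [[0.99, 0.1]], 1: [[0.0, 1.0]]}}
groups = merge_across_videos(per_video)
```

Each resulting group corresponds to one category of the third clustering result: the images of its member first-stage clusters are integrated into that category.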
Step S360: and acquiring the node betweenness corresponding to each face based on the clustering result.
Step S370: and sequencing the face image set of each face according to the node betweenness to obtain a sequencing result.
Step S380: and presenting the face image set of the plurality of faces based on the sorting result.
In the embodiment of the present application, steps S360 to S380 may refer to the contents of the foregoing embodiments, and are not described herein again.
According to the video processing method provided by the embodiment of the application, when the face images in the video data are obtained, the key frame images are obtained first and the face images containing faces are then extracted from them, which greatly reduces the amount of computation of the electronic device and avoids duplicates in the subsequently obtained face image sets. In addition, the face images are screened before clustering, so that face images of better quality are retained, misclassification during clustering is avoided, and the clustering accuracy is improved. The face image sets are subsequently sorted according to the node betweenness of the faces, which makes it convenient for users to view them.
Referring to fig. 5, fig. 5 is a schematic flowchart illustrating a video processing method according to still another embodiment of the present application. The video processing method is applied to the electronic device, and will be described in detail with respect to the flow shown in fig. 5, and the video processing method may specifically include the following steps:
step S410: and acquiring a plurality of face images containing faces in the video data to be processed.
Step S420: and clustering the plurality of face images to obtain a clustering result, wherein the clustering result comprises a face image set of each face in the plurality of faces.
Step S430: and acquiring the node betweenness corresponding to each face based on the clustering result.
Step S440: and sequencing the face image set of each face according to the node betweenness to obtain a sequencing result.
In the embodiment of the present application, steps S410 to S440 may refer to the contents of the foregoing embodiments, and are not described herein again.
Step S450: and determining whether the photo album corresponding to each face exists.
In the embodiment of the application, after the electronic device sorts the face image set of each face and obtains the sorting result, an album corresponding to each face can be formed, so that a user can view the face image set of the face in the video.
In some embodiments, after obtaining the face image sets, the electronic device may identify the categories to which the faces corresponding to the respective face image sets belong. The category to which each face belongs may be a person corresponding to the face, and for example, it may be recognized that the face is a user corresponding to the electronic device, or a person other than the user (for example, family, friend, colleague), and the like.
Optionally, the electronic device may perform category identification on the face images in the face image set by using a pre-trained classification model. The classification model is obtained by training the initial model by using the face image which is marked with the category of the person in advance, so that the classification model can identify the category to which the face image in the face image set belongs.
Further, after determining the category corresponding to each face image set, the electronic device may determine whether an album corresponding to that category currently exists on the device, that is, whether an album corresponding to the face exists. It can be understood that an album of a certain person may already exist on the electronic device, so a newly acquired face image set of that person can be added to the existing album, thereby realizing the formation of the album from the face image set.
Step S460: and when the photo album corresponding to each face exists, adding the face image set corresponding to each face to the photo album corresponding to each face.
In the embodiment of the application, when the photo album corresponding to each face exists, the images in the face image set can be directly added to the photo album corresponding to the face; when the photo album corresponding to any face does not exist, the photo album corresponding to the face can be created, and the images in the face image set corresponding to the face are added into the newly created photo album, so that the face photo album of the face is formed.
Step S470: and presenting the photo album of the plurality of faces based on the sequencing result.
In the embodiment of the application, when the electronic device presents the face image sets according to the sorting result, since the face images in each set have been added to the album of the corresponding face, the albums of the plurality of faces can be presented based on the sorting result. For example, the albums of the plurality of faces may be displayed in the page in ranked order: the earlier a face image set appears in the ranking, the further forward the corresponding album is placed in the page.
In some embodiments, the electronic device may also generate a cover of the photo album of each face, and when the photo album of each face is displayed, the cover of the photo album may be displayed, so that a user can conveniently identify the photo albums corresponding to different faces.
Optionally, the electronic device may obtain an importance score for each face image in an album and then take the face image with the highest score as the album cover. The score of a face image may be determined according to the user's historical operation records on that image; since these records reflect how much attention the user pays to the image, such an album cover makes it easy for the user to identify the face corresponding to the album.
Optionally, the face images in the album may simultaneously include face images of other faces, so that the electronic device may further screen out the face images only including the face corresponding to the album from all the face images in the album, and then select the face image with the highest image quality from the screened face images as the album cover. The image quality may include a color level distribution dispersion degree, a definition degree, a brightness, and the like. It can be understood that, because the cover of the album is the face image only containing the face, and the image quality is the best, the user can conveniently identify the face corresponding to the album.
The video processing method provided by the embodiment of the application can cluster the face images in the video into the album groups of the corresponding persons, can sort the face image sets according to the node betweenness of the faces, and is convenient for a user to check the face image sets. In addition, the electronic equipment also adds the face images in the face image set to the photo album corresponding to the faces aiming at each face image set, so that the user can conveniently check the face image set.
Referring to fig. 6, a block diagram of a video processing apparatus 400 according to an embodiment of the present disclosure is shown. The video processing apparatus 400 applies the above-mentioned electronic device, and the video processing apparatus 400 includes: an image acquisition module 410, a face clustering module 420, an betweenness acquisition module 430, an image set ordering module 440, and an image set presentation module 450. The image obtaining module 410 is configured to obtain a plurality of face images including faces in video data to be processed; the face clustering module 420 is configured to cluster the plurality of face images to obtain a clustering result, where the clustering result includes a face image set of each of a plurality of faces; the betweenness obtaining module 430 is configured to obtain a node betweenness corresponding to each face based on the clustering result; the image set ordering module 440 is configured to order the face image set of each face according to the node betweenness, so as to obtain an ordering result; the image set presentation module 450 is configured to present a face image set of the plurality of faces based on the ranking result.
In some embodiments, the betweenness obtaining module 430 may include: a node acquisition unit and an betweenness determination unit. The node acquisition unit is used for taking each face in the clustering result as a node of a character relationship network; the betweenness determining unit is used for acquiring the node betweenness corresponding to each node in the character relationship network as the node betweenness corresponding to each face.
In this embodiment, the betweenness determining unit may be specifically configured to: acquiring the times of group photo among different faces in the clustering result based on the plurality of face images; taking the times of group photo between different faces as the edge weight between the nodes corresponding to different faces in the character relation network; determining a first number of shortest paths in the people relationship network and a second number of shortest paths through each node based on edge weights between nodes corresponding to the different faces; and acquiring the ratio of the second quantity to the first quantity corresponding to each node to obtain the node betweenness corresponding to each node.
In some embodiments, the image acquisition module 410 may include: a key frame image acquisition unit and a face image acquisition unit. The key frame image acquisition unit is used for acquiring key frame images in the video data; the face image acquisition unit is used for carrying out face detection on the key frame image and acquiring a plurality of face images containing faces in the key frame image.
In this embodiment, the image acquisition module 410 may further include a first screening unit. The first screening unit is used for screening the key frame images meeting a first preset quality condition from the key frame images before the key frame images are subjected to face detection to obtain a plurality of face images including faces in the key frame images; the face image acquisition unit is used for carrying out face detection on the key frame images meeting the first preset quality condition and acquiring a plurality of face images containing faces.
In some embodiments, the face clustering module 420 may include: the image filtering device comprises a second image screening unit, an image filtering unit and an image clustering unit. The second image screening unit is used for acquiring the face images meeting a second preset quality condition from the plurality of face images; the image filtering unit is used for filtering face images which do not belong to any category from the face images meeting the second preset quality condition to obtain face images to be clustered; and the image clustering unit is used for clustering the face images to be clustered to obtain clustering results.
In this embodiment, the video data includes video data of a plurality of videos. The image clustering unit may specifically be configured to: clustering face images to be clustered corresponding to each video in the plurality of videos to obtain a first clustering result corresponding to each video; acquiring face images of different faces in each video according to the first clustering result corresponding to each video; acquiring average face features of face images of different faces in each video; clustering the average face features corresponding to different faces in the plurality of videos based on the average face features corresponding to different faces in each video to obtain a second clustering result; and integrating the first clustering results according to the second clustering results to obtain third clustering results.
In some embodiments, the image set ordering module 440 may be specifically configured to: and sequencing the face image set of each face based on the sequence of the node betweenness corresponding to each face from large to small to obtain a sequencing result.
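The descending sort by node betweenness can be a one-liner (the betweenness values and image sets here are hypothetical):

```python
# Hypothetical node betweenness per face and the corresponding face image sets.
betweenness = {"alice": 0.42, "bob": 0.10, "carol": 0.25}
image_sets = {"alice": ["a1", "a2"], "bob": ["b1"], "carol": ["c1", "c2"]}

# Order faces from largest betweenness (most central person) to smallest.
ordered = sorted(image_sets, key=lambda face: betweenness[face], reverse=True)
```

The face image sets would then be presented in the order `ordered`, most central person first.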
In some embodiments, the image set presentation module 450 may include: the album display system comprises an album determining unit, an album adding unit and an album displaying unit. The photo album determining unit is used for determining whether a photo album corresponding to each face exists or not; the album adding unit is used for adding the face image set corresponding to each face to the album corresponding to each face when the album corresponding to each face exists; and the album display unit is used for displaying the albums of the plurality of faces based on the sequencing result.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described apparatuses and modules may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, the coupling between the modules may be electrical, mechanical or other type of coupling.
In addition, functional modules in the embodiments of the present application may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
In summary, according to the scheme provided by the application, a plurality of face images containing faces in the video data to be processed are acquired and clustered to obtain a clustering result, which includes a face image set for each of a plurality of faces. The node betweenness corresponding to each face is obtained based on the clustering result, the face image sets are sorted according to the node betweenness to obtain a sorting result, and the face image sets of the plurality of faces are presented based on that result. In this way, the face images in a video can be clustered into album groups of the corresponding persons, the face image sets can be sorted according to the node betweenness of the faces, and users can conveniently view the face image sets they need.
Referring to fig. 7, a block diagram of an electronic device according to an embodiment of the present application is shown. The electronic device 100 may be an electronic device capable of running an application, such as a smart phone, a tablet computer, a smart watch, smart glasses, and a notebook computer. The electronic device 100 in the present application may include one or more of the following components: a processor 110, a memory 120, and one or more applications, wherein the one or more applications may be stored in the memory 120 and configured to be executed by the one or more processors 110, the one or more programs configured to perform a method as described in the aforementioned method embodiments.
Processor 110 may include one or more processing cores. The processor 110 connects various parts within the overall electronic device 100 using various interfaces and lines, and performs the various functions of the electronic device 100 and processes data by running or executing instructions, programs, code sets, or instruction sets stored in the memory 120 and calling data stored in the memory 120. Optionally, the processor 110 may be implemented in hardware using at least one of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), and Programmable Logic Array (PLA). The processor 110 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, user interface, applications, and the like; the GPU is responsible for rendering and drawing display content; and the modem handles wireless communication. It is understood that the modem may not be integrated into the processor 110 and may instead be implemented by a separate communication chip.
The memory 120 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). The memory 120 may be used to store instructions, programs, code sets, or instruction sets. The memory 120 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for implementing at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the foregoing method embodiments, and the like. The data storage area may also store data created by the electronic device 100 during use (e.g., phone book, audio-video data, chat log data), and the like.
Referring to fig. 8, a block diagram of a computer-readable storage medium according to an embodiment of the present application is shown. The computer-readable medium 800 has stored therein a program code that can be called by a processor to execute the method described in the above-described method embodiments.
The computer-readable storage medium 800 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. Alternatively, the computer-readable storage medium 800 includes a non-volatile computer-readable storage medium. The computer readable storage medium 800 has storage space for program code 810 to perform any of the method steps of the method described above. The program code can be read from or written to one or more computer program products. The program code 810 may be compressed, for example, in a suitable form.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and that such modifications and replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (12)

1. A video processing method, applied to an electronic device, the method comprising:
acquiring a plurality of face images containing faces from video data to be processed;
clustering the plurality of face images to obtain a clustering result, wherein the clustering result comprises a face image set for each of a plurality of faces;
acquiring a node betweenness corresponding to each face based on the clustering result;
sorting the face image set of each face according to the node betweenness to obtain a sorting result;
and presenting the face image sets of the plurality of faces based on the sorting result.
2. The method according to claim 1, wherein the acquiring the node betweenness corresponding to each face based on the clustering result comprises:
taking each face in the clustering result as a node of a person-relationship network;
and acquiring the node betweenness corresponding to each node in the person-relationship network as the node betweenness corresponding to each face.
3. The method according to claim 2, wherein the acquiring the node betweenness corresponding to each node in the person-relationship network comprises:
acquiring, based on the plurality of face images, the number of times different faces in the clustering result appear together;
taking the number of times different faces appear together as the edge weight between the nodes corresponding to those faces in the person-relationship network;
determining, based on the edge weights between the nodes corresponding to the different faces, a first number of shortest paths in the person-relationship network and a second number of shortest paths passing through each node;
and acquiring, for each node, the ratio of the second number to the first number to obtain the node betweenness corresponding to that node.
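Claims 2 and 3 describe node betweenness: for each node, the fraction of shortest paths in the person-relationship network that pass through it. A minimal sketch using Brandes-style shortest-path counting on an unweighted co-occurrence graph (the claim's co-occurrence edge weights are omitted here for simplicity; all names are illustrative, not from the patent):

```python
from collections import defaultdict, deque

def node_betweenness(edges):
    # Build an undirected adjacency list from co-occurrence edges.
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    nodes = list(adj)
    bc = dict.fromkeys(nodes, 0.0)
    for s in nodes:
        # BFS from s: sigma[v] counts shortest s->v paths.
        sigma = dict.fromkeys(nodes, 0); sigma[s] = 1
        dist = dict.fromkeys(nodes, -1); dist[s] = 0
        preds = defaultdict(list)
        order, q = [], deque([s])
        while q:
            v = q.popleft(); order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    q.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate path dependencies to accumulate betweenness.
        delta = dict.fromkeys(nodes, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair was counted from both endpoints; halve.
    return {v: b / 2 for v, b in bc.items()}
```

On a chain A-B-C, only the pair (A, C) routes through B, so B's betweenness is 1 and the endpoints' is 0, matching the ratio in claim 3.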
4. The method according to claim 1, wherein the acquiring a plurality of face images from the video data to be processed comprises:
acquiring key frame images from the video data;
and performing face detection on the key frame images to acquire a plurality of face images containing faces in the key frame images.
5. The method according to claim 4, wherein before the performing face detection on the key frame images to acquire a plurality of face images containing faces in the key frame images, the method further comprises:
screening out key frame images meeting a first preset quality condition from the key frame images;
and the performing face detection on the key frame images to acquire a plurality of face images containing faces in the key frame images comprises:
performing face detection on the key frame images meeting the first preset quality condition to acquire a plurality of face images containing faces.
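Claims 4 and 5 screen key frames against a "first preset quality condition" before running face detection. The patent does not specify the condition; one common proxy is sharpness via the variance of a Laplacian response. A hedged stdlib sketch on grayscale frames represented as 2-D lists (the score and threshold are illustrative assumptions):

```python
def laplacian_variance(gray):
    # gray: 2-D list of pixel intensities. Apply a 4-neighbour Laplacian
    # at each interior pixel and return the variance of the response;
    # blurry frames give a low variance, sharp frames a high one.
    h, w = len(gray), len(gray[0])
    vals = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x] +
                   gray[y][x - 1] + gray[y][x + 1] - 4 * gray[y][x])
            vals.append(lap)
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

def filter_key_frames(frames, min_sharpness):
    # Keep only key frames meeting the (assumed) quality condition.
    return [f for f in frames if laplacian_variance(f) >= min_sharpness]
```

A uniform frame scores 0 and is dropped, while a high-contrast frame passes; in practice the condition could also combine brightness, face size, or blur detection from an image library.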
6. The method according to claim 1, wherein the clustering the plurality of face images to obtain a clustering result comprises:
acquiring, from the plurality of face images, the face images meeting a second preset quality condition;
filtering out, from the face images meeting the second preset quality condition, the face images that do not belong to any category, to obtain face images to be clustered;
and clustering the face images to be clustered to obtain the clustering result.
7. The method according to claim 6, wherein the video data comprises video data of a plurality of videos, and the clustering the face images to be clustered to obtain the clustering result comprises:
clustering the face images to be clustered corresponding to each of the plurality of videos to obtain a first clustering result corresponding to each video;
acquiring the face images of the different faces in each video according to the first clustering result corresponding to that video;
acquiring an average face feature of the face images of each of the different faces in each video;
clustering, based on the average face features corresponding to the different faces in each video, the average face features of the different faces across the plurality of videos to obtain a second clustering result;
and integrating the first clustering results according to the second clustering result to obtain a third clustering result.
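Claim 7's cross-video step averages the face features within each per-video cluster and then clusters those averages across videos. A minimal stdlib sketch; the cosine-similarity threshold and the greedy assignment (with a fixed, non-updated representative per cluster) are illustrative assumptions, since the patent does not specify the clustering algorithm:

```python
import math

def average_feature(features):
    # features: equal-length embedding vectors of one face in one video.
    n = len(features)
    return [sum(col) / n for col in zip(*features)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def cluster_by_threshold(avg_feats, threshold=0.8):
    # avg_feats: {face_id: average feature vector}. Greedily assign each
    # face to the first cluster whose representative is similar enough;
    # otherwise start a new cluster (a simplification for illustration).
    clusters = []  # list of (representative feature, member ids)
    for fid, feat in avg_feats.items():
        for rep, members in clusters:
            if cosine(feat, rep) >= threshold:
                members.append(fid)
                break
        else:
            clusters.append((feat, [fid]))
    return [members for _, members in clusters]
```

Clusters whose averages merge across videos correspond to the same person; the per-video first clustering results can then be integrated under one identity, as in the claim's third clustering result.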
8. The method according to any one of claims 1 to 7, wherein the sorting the face image set of each face according to the node betweenness to obtain a sorting result comprises:
sorting the face image sets of the faces in descending order of the node betweenness corresponding to each face to obtain the sorting result.
9. The method according to any one of claims 1 to 7, wherein the presenting the face image sets of the plurality of faces based on the sorting result comprises:
determining whether an album corresponding to each face exists;
when the album corresponding to a face exists, adding the face image set corresponding to that face to the album corresponding to that face;
and presenting the albums of the plurality of faces based on the sorting result.
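Claims 8 and 9 order each face's album by node betweenness, largest first, before presentation. A small sketch (identifiers are illustrative):

```python
def order_face_albums(face_sets, betweenness):
    # face_sets: {face_id: list of image paths}; betweenness: {face_id: float}.
    # Sort face ids by node betweenness in descending order (claim 8) and
    # pair each with its album of face images for presentation (claim 9).
    ordered_ids = sorted(face_sets,
                         key=lambda f: betweenness.get(f, 0.0),
                         reverse=True)
    return [(f, face_sets[f]) for f in ordered_ids]
```

Faces that are more "central" in the person-relationship network (e.g. a protagonist photographed with many others) surface first in the album list.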
10. A video processing apparatus, applied to an electronic device, the apparatus comprising: an image acquisition module, a face clustering module, a betweenness acquisition module, an image set sorting module, and an image set display module, wherein:
the image acquisition module is configured to acquire a plurality of face images containing faces from video data to be processed;
the face clustering module is configured to cluster the plurality of face images to obtain a clustering result, the clustering result comprising a face image set for each of a plurality of faces;
the betweenness acquisition module is configured to acquire the node betweenness corresponding to each face based on the clustering result;
the image set sorting module is configured to sort the face image set of each face according to the node betweenness to obtain a sorting result;
and the image set display module is configured to display the face image sets of the plurality of faces based on the sorting result.
11. An electronic device, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications being configured to perform the method of any one of claims 1 to 9.
12. A computer-readable storage medium, having stored thereon program code that can be invoked by a processor to perform the method according to any one of claims 1 to 9.
CN202010916957.6A 2020-09-03 2020-09-03 Video processing method and device, electronic equipment and storage medium Pending CN112100427A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010916957.6A CN112100427A (en) 2020-09-03 2020-09-03 Video processing method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010916957.6A CN112100427A (en) 2020-09-03 2020-09-03 Video processing method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112100427A true CN112100427A (en) 2020-12-18

Family

ID=73757324

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010916957.6A Pending CN112100427A (en) 2020-09-03 2020-09-03 Video processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112100427A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113221786A (en) * 2021-05-21 2021-08-06 深圳市商汤科技有限公司 Data classification method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination