WO2021129444A1 - File clustering method and apparatus, and storage medium and electronic device

Info

Publication number: WO2021129444A1
Application number: PCT/CN2020/136176
Authority: WO
Inventors: 彭冬炜
Original assignee: Oppo广东移动通信有限公司
Priority date: 2019-12-27
Filing date: 2020-12-14
Publication date: 2021-07-01
Also published as: CN111177086A

Abstract

A file clustering method, a file clustering apparatus, a computer-readable storage medium and an electronic device, which relate to the technical field of terminals. The file clustering method comprises: acquiring at least one picture file and at least one video file (S32); extracting a facial feature from each picture file (S34); extracting a facial feature from each video file (S36); and clustering the at least one picture file and the at least one video file according to the facial feature of each picture file and the facial feature of each video file (S38).

Description

File clustering method and device, storage medium and electronic equipment

Cross references to related applications

This application claims priority to Chinese patent application No. 201911382475.0, titled "File clustering method and device, storage medium and electronic equipment" and filed on December 27, 2019, the entire content of which is incorporated herein by reference.

Technical field

The present disclosure relates to the field of terminal technology, and in particular, to a file clustering method, a file clustering device, a computer-readable storage medium, and an electronic device.

Background technique

With the development of terminal technology, a large number of pictures and videos can be processed and stored on a terminal. These pictures and videos are mainly obtained by shooting scenes with the camera module of the terminal. Among them, shots whose subject is a person account for the vast majority.

In practice, files are usually organized only by type (picture or video) and shooting time. This single organizing scheme makes it inconvenient for users to quickly find the shooting results that belong to the same object.

Summary of the invention

According to a first aspect of the present disclosure, there is provided a file clustering method, including: obtaining at least one picture file and at least one video file; extracting facial features of each picture file; extracting facial features of each video file; and clustering the at least one picture file and the at least one video file according to the facial features of each picture file and the facial features of each video file.

According to a second aspect of the present disclosure, there is provided a file clustering device, including: a file acquisition module for acquiring at least one picture file and at least one video file; a first feature extraction module for extracting facial features of each picture file; a second feature extraction module for extracting facial features of each video file; and a file clustering module for clustering the at least one picture file and the at least one video file according to the facial features of each picture file and the facial features of each video file.

According to a third aspect of the present disclosure, there is provided a computer-readable storage medium on which a computer program is stored, and when the program is executed by a processor, the above-mentioned file clustering method is realized.

According to a fourth aspect of the present disclosure, there is provided an electronic device, including: a processor; and a memory configured to store one or more programs that, when executed by the processor, cause the processor to implement the above file clustering method.

Description of the drawings

FIG. 1 shows a schematic diagram of an exemplary system architecture of a file clustering method or file clustering device to which an embodiment of the present disclosure can be applied;

Figure 2 shows a schematic structural diagram of an electronic device suitable for implementing embodiments of the present disclosure;

Fig. 3 schematically shows a flowchart of a file clustering method according to an exemplary embodiment of the present disclosure;

Fig. 4 schematically shows a flowchart of extracting facial features of a video file according to an exemplary embodiment of the present disclosure;

FIG. 5 schematically shows a flowchart of the entire process of file clustering according to an exemplary embodiment of the present disclosure;

Fig. 6 schematically shows a block diagram of a file clustering apparatus according to an exemplary embodiment of the present disclosure;

Fig. 7 schematically shows a block diagram of a file clustering apparatus according to another exemplary embodiment of the present disclosure.

Detailed description

Example embodiments will now be described more fully with reference to the accompanying drawings. However, the example embodiments can be implemented in various forms and should not be construed as being limited to the examples set forth herein; on the contrary, these embodiments are provided so that the present disclosure will be more comprehensive and complete, and so that the concept of the example embodiments will be fully conveyed to those skilled in the art. The described features, structures or characteristics can be combined in one or more embodiments in any suitable way. In the following description, many specific details are provided to give a sufficient understanding of the embodiments of the present disclosure. However, those skilled in the art will realize that the technical solutions of the present disclosure can be practiced without one or more of the specific details, or that other methods, components, devices, steps, etc. can be used. In other cases, well-known technical solutions are not shown or described in detail, in order to avoid obscuring aspects of the present disclosure.

In addition, the drawings are only schematic illustrations of the present disclosure, and are not necessarily drawn to scale. The same reference numerals in the figures denote the same or similar parts, and thus their repeated description will be omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in the form of software, or implemented in one or more hardware modules or integrated circuits, or implemented in different networks and/or processor devices and/or microcontroller devices.

The flowchart shown in the drawings is only an exemplary description, and does not necessarily include all the steps. For example, some steps can be decomposed, and some steps can be combined or partially combined, so the actual execution order may be changed according to actual conditions. In addition, all the terms "first" and "second" below are only for distinguishing purposes and should not be regarded as a limitation of the present disclosure.

FIG. 1 shows a schematic diagram of an exemplary system architecture of a file clustering method or a file clustering device to which an embodiment of the present disclosure can be applied.

As shown in FIG. 1, the system architecture 1000 may include one or more of terminal devices 1001, 1002, 1003, a network 1004 and a server 1005. The network 1004 is used to provide a medium for communication links between the terminal devices 1001, 1002, 1003 and the server 1005. The network 1004 may include various connection types, such as wired or wireless communication links, or fiber optic cables.

It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. According to implementation needs, there can be any number of terminal devices, networks, and servers. For example, the server 1005 may be a server cluster composed of multiple servers.

The user can use the terminal devices 1001, 1002, 1003 to interact with the server 1005 through the network 1004 to receive or send messages and so on. The terminal devices 1001, 1002, 1003 may be various electronic devices with display screens, including but not limited to smart phones, tablet computers, portable computers, desktop computers, and so on.

In an embodiment in which the terminal devices 1001, 1002, 1003 execute the file clustering method described below, the terminal devices 1001, 1002, 1003 can obtain at least one picture file and at least one video file from the server 1005 through the network 1004. In this case, the server 1005 may be, for example, a cloud platform such as a cloud photo album. Alternatively, the terminal devices 1001, 1002, 1003 may take pictures and videos with their own camera modules and obtain at least one picture file and at least one video file from the results. Or, some of the picture files and video files acquired by the terminal devices 1001, 1002, 1003 are shot by their own camera modules, and the rest come from the server 1005. The present disclosure does not limit the sources of picture files and video files.

It should be noted that the picture files and video files described in the exemplary embodiments of the present disclosure both contain human face images, that is, the present disclosure mainly focuses on the clustering of pictures and videos based on human faces. However, it should be understood that the solution of the present disclosure can also be applied to clustering of other photographed objects, and these other photographed objects may include, for example, animals, vehicles, buildings, etc., which is not limited in the present disclosure.

Next, for the acquired picture files and video files, the terminal devices 1001, 1002, 1003 may respectively extract facial features, and use the extracted facial features to cluster the acquired picture files and video files. This allows picture files and video files of the same subject to be assigned the same cluster ID, which is convenient for users to view.
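The idea of assigning one cluster ID to pictures and videos of the same subject can be illustrated with a minimal sketch. It is not the clustering procedure of the present disclosure, which is not specified at this point; it assumes, purely for illustration, that each file is summarized by a single feature vector, that similarity is measured by cosine similarity, and that a greedy threshold rule is used. The names `cosine`, `cluster_files` and the threshold value are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def cluster_files(features: Dict[str, Sequence[float]],
                  threshold: float = 0.8) -> Dict[str, int]:
    """Greedy clustering: a file joins the first cluster whose representative
    is similar enough, otherwise it starts a new cluster (assumed rule)."""
    representatives: List[Sequence[float]] = []
    assignment: Dict[str, int] = {}
    for name, vec in features.items():
        for cluster_id, rep in enumerate(representatives):
            if cosine(vec, rep) >= threshold:
                assignment[name] = cluster_id
                break
        else:
            # No existing cluster matched, so this file opens a new one.
            representatives.append(vec)
            assignment[name] = len(representatives) - 1
    return assignment


# Picture and video files share one ID space, so a photo and a video of the
# same person end up with the same cluster ID.
ids = cluster_files({
    "a.jpg": [1.0, 0.0],
    "b.mp4": [0.9, 0.1],
    "c.jpg": [0.0, 1.0],
})
```

In this toy input, `a.jpg` and `b.mp4` receive the same ID and `c.jpg` receives a different one.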

In an embodiment in which the server 1005 executes the file clustering method described below, the server 1005 can obtain the picture files and video files taken by the camera modules of the terminal devices 1001, 1002, 1003, extract their facial features, and cluster the obtained picture files and video files according to the facial features of the picture files and video files.

The following description takes the case in which the terminal devices 1001, 1002, 1003 execute the solution of the present disclosure as an example. In this case, the file clustering apparatus of the exemplary embodiment of the present disclosure may be configured in the terminal devices 1001, 1002, 1003.

FIG. 2 shows a schematic diagram of an electronic device suitable for implementing the exemplary embodiments of the present disclosure, and the electronic device corresponds to the above terminal device such as a mobile phone. It should be noted that the electronic device shown in FIG. 2 is only an example, and should not bring any limitation to the function and scope of use of the embodiments of the present disclosure.

The electronic device of the present disclosure includes at least a processor and a memory. The memory is used to store one or more programs. When the one or more programs are executed by the processor, the processor can implement the file clustering method of the exemplary embodiment of the present disclosure.

Specifically, as shown in FIG. 2, the electronic device 200 may include: a processor 210, an internal memory 221, an external memory interface 222, a universal serial bus (USB) interface 230, a charging management module 240, a power management module 241, a battery 242, an antenna 1, an antenna 2, a mobile communication module 250, a wireless communication module 260, an audio module 270, a speaker 271, a receiver 272, a microphone 273, an earphone interface 274, a sensor module 280, a display screen 290, a camera module 291, an indicator 292, a motor 293, a button 294, a Subscriber Identification Module (SIM) card interface 295, etc. The sensor module 280 may include a depth sensor 2801, a pressure sensor 2802, a gyroscope sensor 2803, an air pressure sensor 2804, a magnetic sensor 2805, an acceleration sensor 2806, a distance sensor 2807, a proximity light sensor 2808, a fingerprint sensor 2809, a temperature sensor 2810, a touch sensor 2811, an ambient light sensor 2812, a bone conduction sensor 2813, etc.

It can be understood that the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 200. In other embodiments of the present application, the electronic device 200 may include more or fewer components than shown, or combine certain components, or split certain components, or arrange different components. The illustrated components can be implemented in hardware, software, or a combination of software and hardware.

The processor 210 may include one or more processing units. For example, the processor 210 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor and/or a neural-network processing unit (NPU), etc. The different processing units may be independent devices or may be integrated in one or more processors. In addition, a memory may be provided in the processor 210 to store instructions and data.

The USB interface 230 is an interface that complies with the USB standard specification, and specifically may be a Mini USB interface, a Micro USB interface, a USB Type-C interface, and so on. The USB interface 230 can be used to connect a charger to charge the electronic device 200, and can also be used to transfer data between the electronic device 200 and peripheral devices. It can also be used to connect earphones and play audio through the earphones. This interface can also be used to connect other electronic devices, such as AR devices.

The charging management module 240 is used to receive charging input from the charger. Among them, the charger can be a wireless charger or a wired charger. The power management module 241 is used to connect the battery 242, the charging management module 240, and the processor 210. The power management module 241 receives input from the battery 242 and/or the charging management module 240, and supplies power to the processor 210, the internal memory 221, the display screen 290, the camera module 291, and the wireless communication module 260.

The wireless communication function of the electronic device 200 can be implemented by the antenna 1, the antenna 2, the mobile communication module 250, the wireless communication module 260, the modem processor, and the baseband processor.

The mobile communication module 250 can provide a wireless communication solution including 2G/3G/4G/5G and the like applied to the electronic device 200.

The wireless communication module 260 can provide wireless communication solutions applied to the electronic device 200, including wireless local area networks (WLAN) (such as Wireless Fidelity (Wi-Fi) networks), Bluetooth (BT), Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), Infrared (IR) technology, and the like.

The electronic device 200 implements a display function through a GPU, a display screen 290, an application processor, and the like. The GPU is a microprocessor for image processing and is connected to the display screen 290 and the application processor. The GPU is used to perform mathematical and geometric calculations and is used for graphics rendering. The processor 210 may include one or more GPUs that execute program instructions to generate or change display information.

The electronic device 200 can realize a shooting function through an ISP, a camera module 291, a video codec, a GPU, a display screen 290, and an application processor. In some embodiments, the electronic device 200 may include 1 or N camera modules 291, and N is a positive integer greater than 1. If the electronic device 200 includes N cameras, one of the N cameras is the main camera.

The internal memory 221 may be used to store computer executable program code, where the executable program code includes instructions. The internal memory 221 may include a storage program area and a storage data area. The external memory interface 222 may be used to connect an external memory card, such as a Micro SD card, so as to expand the storage capacity of the electronic device 200.

The electronic device 200 can implement audio functions through an audio module 270, a speaker 271, a receiver 272, a microphone 273, a headphone interface 274, an application processor, and the like. For example, music playback, recording, etc.

The audio module 270 is used to convert digital audio information into an analog audio signal for output, and is also used to convert an analog audio input into a digital audio signal. The audio module 270 can also be used to encode and decode audio signals. In some embodiments, the audio module 270 may be provided in the processor 210, or part of the functional modules of the audio module 270 may be provided in the processor 210.

The speaker 271, also called a "horn", is used to convert audio electrical signals into sound signals. The user can listen to music or to a hands-free call through the speaker 271 of the electronic device 200. The receiver 272, also called an "earpiece", is used to convert audio electrical signals into sound signals. When the electronic device 200 is used to answer a call or a voice message, the user can hear the voice by bringing the receiver 272 close to the ear. The microphone 273, also called a "mic" or "sound transducer", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 273 so that the sound signal is input to the microphone 273. The electronic device 200 may be provided with at least one microphone 273. The earphone interface 274 is used to connect wired earphones.

Regarding the sensors included in the electronic device 200, the depth sensor 2801 is used to obtain depth information of the scene. The pressure sensor 2802 is used to sense the pressure signal and can convert the pressure signal into an electrical signal. The gyro sensor 2803 may be used to determine the movement posture of the electronic device 200. The air pressure sensor 2804 is used to measure air pressure. The magnetic sensor 2805 includes a Hall sensor. The electronic device 200 can use the magnetic sensor 2805 to detect the opening and closing of the flip holster. The acceleration sensor 2806 can detect the magnitude of the acceleration of the electronic device 200 in various directions (generally three axes). The distance sensor 2807 is used to measure distance. The proximity light sensor 2808 may include, for example, a light emitting diode (LED) and a light detector, such as a photodiode. The fingerprint sensor 2809 is used to collect fingerprints. The temperature sensor 2810 is used to detect temperature. The touch sensor 2811 may pass the detected touch operation to the application processor to determine the type of the touch event. The visual output related to the touch operation can be provided through the display screen 290. The ambient light sensor 2812 is used to sense the brightness of the ambient light. The bone conduction sensor 2813 can acquire vibration signals.

The button 294 includes a power-on button, a volume button, and so on. The button 294 may be a mechanical button. It can also be a touch button. The motor 293 can generate vibration prompts. The motor 293 can be used for incoming call vibration notification, and can also be used for touch vibration feedback. The indicator 292 can be an indicator light, which can be used to indicate the charging status, power change, or to indicate messages, missed calls, notifications, and so on. The SIM card interface 295 is used to connect to the SIM card. The electronic device 200 interacts with the network through the SIM card to implement functions such as call and data communication.

The present application also provides a computer-readable storage medium. The computer-readable storage medium may be included in the electronic device described in the foregoing embodiment; or it may exist alone without being assembled into the electronic device.

The computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the above. More specific examples of computer-readable storage media may include, but are not limited to: electrical connections with one or more wires, portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage devices, magnetic storage devices, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.

The computer-readable storage medium can send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device. The program code contained on the computer-readable storage medium can be transmitted by any suitable medium, including but not limited to: wireless, wire, optical cable, RF, etc., or any suitable combination of the above.

The computer-readable storage medium carries one or more programs, and when the above one or more programs are executed by an electronic device, the electronic device realizes the method described in the following embodiments.

The flowcharts and block diagrams in the accompanying drawings illustrate the possible implementation architecture, functions, and operations of the system, method, and computer program product according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagram may represent a module, program segment, or part of the code, and the module, program segment, or part of the code contains one or more executable instructions for realizing the specified logic function. It should also be noted that, in some alternative implementations, the functions marked in the blocks may occur in an order different from the order marked in the drawings. For example, two blocks shown one after another can actually be executed substantially in parallel, and they can sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagram or flowchart, and each combination of blocks in the block diagram or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The units described in the embodiments of the present disclosure may be implemented in software or hardware, and the described units may also be provided in a processor. Among them, the names of these units do not constitute a limitation on the unit itself under certain circumstances.

FIG. 3 schematically shows a flowchart of a file clustering method according to an exemplary embodiment of the present disclosure. Referring to FIG. 3, the file clustering method may include the following steps:

S32. Obtain at least one picture file and at least one video file.

In the exemplary embodiment of the present disclosure, for the acquired file, the terminal device can determine the data format of the file to determine whether the file is a picture file or a video file.
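As a minimal sketch of this format check, the following assumes that the data format can be inferred from the file name extension via Python's standard `mimetypes` module. The disclosure does not say how the format is determined, so this choice and the function name `classify_file` are illustrative assumptions; a real implementation might inspect the file header instead.

```python
import mimetypes


def classify_file(path: str) -> str:
    """Return 'picture', 'video', or 'other' from the guessed MIME type.

    Assumption: the extension is a reliable indicator of the data format.
    """
    mime, _encoding = mimetypes.guess_type(path)
    if mime is None:
        return "other"
    if mime.startswith("image/"):
        return "picture"
    if mime.startswith("video/"):
        return "video"
    return "other"


kinds = {name: classify_file(name) for name in ("a.jpg", "b.mp4", "notes.txt")}
```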

For a picture file, the terminal device can use a face detection algorithm to determine whether the picture file contains a human face. If it contains a human face, the file is determined to be a picture file to be clustered in the present disclosure. If it does not contain a human face, the picture file can be discarded, where "discard" means not using the picture file as a picture file to be clustered in the present disclosure.

For a video file, the terminal device can use a video-based face detection algorithm to determine whether the video file contains a human face. Similarly, if it contains a human face, the video file is determined to be a video file to be clustered in the present disclosure; if it does not contain a human face, the video file can be discarded, where "discard" means not using the video file as a video file to be clustered in the present disclosure.

That is, in an actual scene containing pictures and videos, the terminal device may first obtain a candidate picture set and a candidate video set, the candidate picture set includes at least one candidate picture, and the candidate video set includes at least one candidate video. Next, on the one hand, face detection is performed on each candidate picture in the candidate picture set, and the candidate picture containing the face is determined as the above-mentioned picture file, which is used to perform the following method and process; on the other hand, for the candidate video set Face detection is performed on each candidate video, and the candidate video containing the face is determined as the above-mentioned video file, which is used to perform the following method and process.
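The filtering of the two candidate sets can be sketched as follows. The face detectors are passed in as callables because the disclosure does not fix a particular detection algorithm; `select_files_to_cluster` and the stand-in detectors used in the example are hypothetical.

```python
from typing import Callable, Iterable, List, Tuple


def select_files_to_cluster(
    candidate_pictures: Iterable[str],
    candidate_videos: Iterable[str],
    picture_has_face: Callable[[str], bool],
    video_has_face: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Keep only candidates in which a face is detected.

    Candidates without a face are simply not used for clustering;
    nothing is deleted from storage.
    """
    pictures = [p for p in candidate_pictures if picture_has_face(p)]
    videos = [v for v in candidate_videos if video_has_face(v)]
    return pictures, videos


# Stand-in detectors for illustration only: a lookup of pre-labelled files.
labelled_with_face = {"selfie.jpg", "party.mp4"}
pictures, videos = select_files_to_cluster(
    ["selfie.jpg", "landscape.jpg"],
    ["party.mp4", "timelapse.mp4"],
    picture_has_face=lambda p: p in labelled_with_face,
    video_has_face=lambda v: v in labelled_with_face,
)
```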

According to some embodiments of the present disclosure, the terminal device can directly acquire at least one picture file containing a face image and at least one video file containing a face image. The process of specifically determining whether a picture file and a video file contain a face image can be performed by the front-end module before executing the solution of the present disclosure, and the present disclosure does not limit the function of the front-end module except for determining whether a face image is contained.

S34. Extract the facial features of each picture file.

For each picture file obtained in step S32, the following process is performed:

First, the facial features of all faces contained in the picture file can be extracted. Specifically, some embodiments of the present disclosure may extract the facial features of all faces contained in a picture file through a convolutional neural network. The present disclosure does not limit the model structure and training process of the convolutional neural network. Besides convolutional neural networks, methods based on geometric features, template matching methods, methods based on wavelet theory, methods based on hidden Markov models, methods based on support vector machines, etc. can also be used to extract facial features, which is not particularly limited in the present disclosure.

Next, at least one target object is determined, and facial features related to the at least one target object are extracted from the facial features of all faces as the facial features of the image file.

Among them, taking the shooting scene as an example, the target object may be an object determined in the shooting scene, and may also be referred to as a target shooting object. In addition, in the case where the picture file does not correspond to the shooting scene, for example, a picture file downloaded from the Internet or transmitted by other users, the target object may be an object designated by the user, which is not limited in the present disclosure.

For the process of determining the target subject from multiple subjects in the shooting scene, in one embodiment, the user's operation of clicking on a subject during the camera preview can be obtained to determine the target subject; that is, during the preview, the object at the position where the user clicks on the screen is the target subject. In another embodiment, the user can set determination criteria for the target subject. These criteria may include, but are not limited to, a number of appearances in historical pictures that exceeds a predetermined number, children whose height is less than 120 cm, subjects wearing hats, etc.

It should be noted that, according to some other embodiments of the present disclosure, through the facial feature extraction process, sub-pictures containing only the target object can be cropped from the picture file, used as the pictures for later analysis, clustering, and display, and saved in the album. It is easy to see that such a sub-picture does not contain the face image of any non-target object.
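The click-based selection can be sketched as a point-in-rectangle test. The sketch assumes that a face detector has already produced bounding boxes in screen coordinates as `(x, y, width, height)` tuples, and it resolves overlapping boxes by choosing the smallest one; both the box format and the tie-breaking rule are assumptions, not part of the disclosure.

```python
from typing import List, Optional, Tuple

Box = Tuple[int, int, int, int]  # (x, y, width, height), assumed screen coordinates


def face_at_tap(boxes: List[Box], tap: Tuple[int, int]) -> Optional[int]:
    """Return the index of the face box that contains the tap, or None.

    If several boxes contain the tap, the smallest box is chosen
    (an assumed tie-breaking rule).
    """
    tx, ty = tap
    hits = [
        i for i, (x, y, w, h) in enumerate(boxes)
        if x <= tx < x + w and y <= ty < y + h
    ]
    if not hits:
        return None
    return min(hits, key=lambda i: boxes[i][2] * boxes[i][3])


boxes = [(0, 0, 100, 100), (200, 50, 80, 80)]
target_index = face_at_tap(boxes, (220, 60))   # the tap lands inside the second box
no_target = face_at_tap(boxes, (500, 500))     # the tap hits no face
```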

In the case that there are picture files that do not contain human faces among the multiple picture files obtained in step S32, through the process of extracting human face features in step S34, such picture files that do not contain human faces can also be eliminated.

S36. Extract the facial features of each video file.

The process of extracting the facial features of a video file is described below with reference to steps S402 to S406 in FIG. 4.

In step S402, at least one key frame image is extracted from the video file.

According to some embodiments of the present disclosure, first, the terminal device may perform image quality evaluation on each video frame image of the video file to obtain a quality score. Specifically, the quality score can be determined based on factors such as saturation and exposure. In addition, the image quality of each video frame image can also be evaluated based on the Human Visual System (HVS).

Next, the terminal device may obtain the quality threshold, compare the quality score of each video frame image with the quality threshold, and determine the video frame image with the quality score greater than the quality threshold as the key frame image. Among them, the quality threshold can be set in advance, and the present disclosure does not limit its value. For example, in an example where the image quality score ranges from 0 to 10, the quality threshold may be set to 7.5. In addition, the quality threshold may be determined in combination with the processing capability of the terminal device. For example, the higher the processing capability of the terminal device, the lower the quality threshold may be set to obtain multiple key frame images.
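The threshold comparison can be sketched as follows, assuming the per-frame quality scores have already been computed on the 0 to 10 scale of the example above. How the scores are computed is left open here, and the function name `select_key_frames` is hypothetical.

```python
from typing import List, Sequence


def select_key_frames(scores: Sequence[float], threshold: float = 7.5) -> List[int]:
    """Return the indices of frames whose quality score exceeds the threshold."""
    return [i for i, score in enumerate(scores) if score > threshold]


frame_scores = [6.0, 8.1, 7.5, 9.0]
key_frames = select_key_frames(frame_scores)           # strict comparison, so 7.5 is excluded
more_key_frames = select_key_frames(frame_scores, 5.0)  # a lower threshold keeps more frames
```

Lowering the threshold, as suggested for devices with more processing capability, yields more key frame images from the same video.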

According to other embodiments of the present disclosure, video frame images may be extracted from the video file at a predetermined time interval as key frame images; the predetermined time interval may be, for example, 3 seconds.
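Interval-based sampling can be sketched by converting the time interval into a frame step. The sketch assumes a constant frame rate and that the frame count is known; `key_frame_indices` is a hypothetical helper, and decoding the selected frames is outside its scope.

```python
from typing import List


def key_frame_indices(frame_count: int, fps: float, interval_s: float = 3.0) -> List[int]:
    """Return the indices of frames sampled every `interval_s` seconds.

    Assumes a constant frame rate; the first frame is always included.
    """
    if frame_count <= 0 or fps <= 0 or interval_s <= 0:
        raise ValueError("frame_count, fps and interval_s must be positive")
    step = max(1, round(interval_s * fps))  # at least one frame per step
    return list(range(0, frame_count, step))


# A 10-second clip at 30 frames per second, sampled every 3 seconds.
indices = key_frame_indices(frame_count=300, fps=30.0)
```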

According to still other embodiments of the present disclosure, the terminal device may extract only one video frame image from the video file through analysis means, as a key frame image, to represent the entire video file.

In step S404, the facial features of each key frame image are extracted.

Similar to determining the facial features of the picture file described above, first, the facial features of all faces contained in the key frame image can be extracted. Specifically, a convolutional neural network can also be used to extract the facial features of all faces contained in the key frame image, and the convolutional neural network used here can be the same as the convolutional neural network used for determining the facial features of the picture file. Next, at least one target object is determined, and the facial features related to the at least one target object are extracted from the facial features of all faces as the facial features of the key frame image.

In step S406, the facial features of the video file are determined according to the facial features of each key frame image.

According to some embodiments of the present disclosure, the facial features of each key frame image can be used as the facial features of the video file. That is to say, in the example of extracting only one key frame image, all the facial features in that key frame image are taken as the facial features of the video file; in the example of extracting two or more key frame images, all the facial features of each key frame image are used as the facial features of the video file. In addition, there is no restriction on the number of object categories corresponding to the facial features, that is, there is no restriction on the number of different human faces contained in the key frame images.

According to other embodiments of the present disclosure, at least two key frame images are extracted from the video file. In this case, first, the key frame images can be sorted according to the time point of each key frame image in the video file, that is, in the order in which the key frame images appear when the video file is played, to obtain an image sequence. Next, the correlation between adjacent key frame images in the image sequence can be determined, and the key frame images whose correlation is less than a correlation threshold are removed to obtain a key frame image set. Then, the facial features of the video file can be determined according to the facial features of each key frame image in the key frame image set. In an embodiment, the facial features of each key frame image in the key frame image set can be used as the facial features of the video file.

Specifically, the correlation between key frame images can be determined based on image quality and the similarity of the target object. The higher the image quality and the higher the similarity of the target object, the higher the correlation. For example, given a key frame image sequence A, B, C, D, E in which image B is blurry and the target object in image E has low similarity to the target object in the other images, image B and image E can be removed from the sequence. It should be noted that image quality and similarity can also be combined, with a weighted combination of the two used to determine the correlation.
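
The A to E example can be reproduced with a small sketch. The disclosure does not fix a formula, so the following is only one possible reading: each frame's correlation is assumed to be a weighted sum of its image quality (normalised to the range 0 to 1) and the mean cosine similarity between its target-object feature and those of its immediate neighbours in the time-ordered sequence. The names `KeyFrame` and `filter_by_correlation`, the weights and the threshold are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class KeyFrame:
    name: str
    timestamp_s: float          # time point of the frame in the video
    quality: float              # image quality, assumed normalised to [0, 1]
    feature: Sequence[float]    # facial feature vector of the target object


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filter_by_correlation(frames: Sequence[KeyFrame], threshold: float = 0.65,
                          w_quality: float = 0.5,
                          w_similarity: float = 0.5) -> List[KeyFrame]:
    """Sort frames by time and drop those whose correlation is below the threshold."""
    sequence = sorted(frames, key=lambda f: f.timestamp_s)
    kept: List[KeyFrame] = []
    for i, frame in enumerate(sequence):
        # Adjacent frames: the previous and the next one, where they exist.
        neighbours = sequence[max(i - 1, 0):i] + sequence[i + 1:i + 2]
        if neighbours:
            similarity = sum(cosine_similarity(frame.feature, n.feature)
                             for n in neighbours) / len(neighbours)
        else:
            similarity = 1.0  # a single frame has nothing to disagree with
        correlation = w_quality * frame.quality + w_similarity * similarity
        if correlation >= threshold:
            kept.append(frame)
    return kept


frames = [
    KeyFrame("A", 0.0, 0.9, (1.0, 0.0)),
    KeyFrame("B", 3.0, 0.2, (1.0, 0.0)),   # blurry
    KeyFrame("C", 6.0, 0.9, (1.0, 0.0)),
    KeyFrame("D", 9.0, 0.9, (1.0, 0.0)),
    KeyFrame("E", 12.0, 0.9, (0.0, 1.0)),  # dissimilar target object
]
kept = filter_by_correlation(frames)
```

With these made-up values, images B and E fall below the threshold and images A, C and D remain as the key frame image set.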

In addition, in another embodiment, the facial features of the video file are determined from the facial features of the key frame images in the key frame image set as follows. First, the facial features of each key frame image in the key frame image set are clustered to obtain a facial feature set for at least one object category, where different faces correspond to different object categories. Next, the score of each facial feature in the facial feature set of each object category is determined, and, for each object category, the facial feature with the highest score is selected as the facial feature corresponding to that object category and is determined as a facial feature of the video file.

The face score can be determined based on the feature scoring result of the above-mentioned convolutional neural network. In addition, a separate face scoring model can also be constructed to score different facial features. The present disclosure does not limit this.

For example, suppose that 10 key frame images remain for a video file after weakly correlated images are excluded. Each key frame image contains three objects a, b, and c, so the facial features can be clustered by object into three categories. Subsequently, the face scores can be determined through analysis, and the facial feature with the highest score in each cluster can be determined as a facial feature of the video file.
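
A minimal sketch of this per-object selection follows. The disclosure does not specify how the facial features of a single video are clustered, so the sketch swaps in a simple greedy grouping by cosine similarity against the first member of each group; the threshold of 0.8, the tuple layout and the function names are assumptions for illustration only.

```python
import math
from typing import List, Sequence, Tuple

# (facial feature vector, face score); the layout is an assumption of this sketch.
Face = Tuple[Sequence[float], float]


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def group_by_object(faces: Sequence[Face],
                    same_object_threshold: float = 0.8) -> List[List[Face]]:
    """Greedily group faces: a face joins the first group whose seed it resembles."""
    groups: List[List[Face]] = []
    for face in faces:
        for group in groups:
            if cosine_similarity(face[0], group[0][0]) >= same_object_threshold:
                group.append(face)
                break
        else:
            groups.append([face])
    return groups


def best_face_per_object(faces: Sequence[Face],
                         same_object_threshold: float = 0.8) -> List[Face]:
    """Keep, for each object category, only the facial feature with the highest score."""
    return [max(group, key=lambda face: face[1])
            for group in group_by_object(faces, same_object_threshold)]


faces = [
    ((1.0, 0.0, 0.0), 0.6),    # object a, first key frame
    ((0.0, 1.0, 0.0), 0.7),    # object b, first key frame
    ((0.0, 0.0, 1.0), 0.8),    # object c, first key frame
    ((0.98, 0.05, 0.0), 0.9),  # object a, second key frame
    ((0.05, 0.99, 0.0), 0.5),  # object b, second key frame
]
video_features = best_face_per_object(faces)
```

Here three object categories are found, and the video file is represented by one facial feature per category: the second observation of object a and the first observations of objects b and c.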

According to still other embodiments of the present disclosure, after the facial features of each key frame image are determined, these facial features are clustered to distinguish different photographed objects. Then, for each subject, the face feature with the highest face score is determined from the clustering result, and used as the face feature of the video file.

It is understandable that, where the video files obtained in step S32 include video files that do not contain human faces, the process of extracting facial features in step S36 can also cull such video files.

In addition, the order of step S34 and step S36 of the exemplary embodiment of the present disclosure may be interchanged.

S38. Cluster the at least one picture file and the at least one video file according to the facial features of each picture file and the facial features of each video file.

In the exemplary embodiment of the present disclosure, the at least one picture file and the at least one video file obtained in step S32 can be clustered by photographed object using the facial features determined in step S34 and step S36. Specifically, a machine learning algorithm such as the K-means clustering algorithm may be used to implement the clustering process, which is not limited in the present disclosure.
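
The following is a minimal pure-Python K-means sketch showing how picture files and video files could end up under shared cluster IDs. It is not the implementation of the disclosure: the number of clusters `k` is assumed to be known, the farthest-point initialisation is a choice made here for determinism, and the file names and two-dimensional features are made up. A video file that contributes features of several objects may appear in several clusters.

```python
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def init_centroids(points: Sequence[Vector], k: int) -> List[List[float]]:
    """Deterministic start: first point, then repeatedly the farthest remaining point."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(squared_distance(p, c)
                                                 for c in centroids))
        centroids.append(list(farthest))
    return centroids


def k_means(points: Sequence[Vector], k: int, max_iter: int = 100) -> List[int]:
    """Return one cluster label per point."""
    if not points:
        return []
    centroids = init_centroids(points, k)
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: squared_distance(p, centroids[j]))
                      for p in points]
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_files(entries: Sequence[Tuple[str, Vector]], k: int) -> Dict[int, List[str]]:
    """Map each cluster ID to the picture and video files assigned to it."""
    labels = k_means([feature for _, feature in entries], k)
    clusters: Dict[int, List[str]] = {}
    for (file_name, _), label in zip(entries, labels):
        files = clusters.setdefault(label, [])
        if file_name not in files:
            files.append(file_name)
    return clusters


entries = [
    ("IMG_001.jpg", (0.90, 0.10)),
    ("IMG_002.jpg", (0.10, 0.90)),
    ("VID_001.mp4", (0.85, 0.15)),  # first person in the video
    ("VID_001.mp4", (0.15, 0.85)),  # second person in the same video
    ("VID_002.mp4", (0.12, 0.88)),
]
clusters = cluster_files(entries, k=2)
```

In this toy data, cluster 0 collects IMG_001.jpg and VID_001.mp4, and cluster 1 collects IMG_002.jpg, VID_001.mp4 and VID_002.mp4. In practice the number of photographed objects is usually not known in advance, so a method that does not require `k` may be preferable; the text leaves the choice of algorithm open.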

After clustering, different clusters correspond to different shooting objects. That is to say, the picture file and the video file are divided according to the cluster ID, and the shooting object corresponds to the cluster ID one to one.

In addition, for a target object, only the sub-picture of the target object may be extracted from the picture file, and the same cluster ID may be assigned to the sub-picture and the corresponding video file containing the target object. Alternatively, a video segment containing the target object may be extracted from the video file, and the same cluster ID may be assigned to the video segment and the aforementioned sub-picture.

The present disclosure also provides a solution for editing the clustering result.

First, the terminal device can display the clustering result; specifically, the result can be displayed in the album as separate modules. Next, the terminal device can edit the clustering result in response to the user's editing operation on the clustering result, and save the edited result. The editing operations may include, but are not limited to: modifying the name of the album, deleting one or more picture files, deleting one or more video files, adding comments, changing the size, and so on.
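
One way to picture the editable clustering result is as a small data structure with one album per cluster ID. The sketch below is purely illustrative: the `Album` and `ClusteringResult` names, the fields and the JSON serialisation used for saving are assumptions, not details taken from the disclosure.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class Album:
    cluster_id: int
    name: str
    files: List[str] = field(default_factory=list)
    comment: str = ""


class ClusteringResult:
    """Holds one album per cluster ID and applies user editing operations."""

    def __init__(self, albums: List[Album]) -> None:
        self.albums: Dict[int, Album] = {album.cluster_id: album for album in albums}

    def rename(self, cluster_id: int, new_name: str) -> None:
        self.albums[cluster_id].name = new_name

    def delete_file(self, cluster_id: int, file_name: str) -> None:
        self.albums[cluster_id].files.remove(file_name)

    def add_comment(self, cluster_id: int, comment: str) -> None:
        self.albums[cluster_id].comment = comment

    def to_json(self) -> str:
        """Serialise the edited result so it can be saved or backed up."""
        return json.dumps([asdict(album) for album in self.albums.values()])


result = ClusteringResult([Album(0, "Person 0", ["IMG_001.jpg", "VID_001.mp4"])])
result.rename(0, "Alice")
result.delete_file(0, "VID_001.mp4")
result.add_comment(0, "Family trip")
saved = result.to_json()
```

The serialised string stands in for whatever persistence the terminal device actually uses.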

In addition, the edited clustering result can be uploaded to the cloud for backup.

The entire process of file clustering in an exemplary embodiment of the present disclosure will be described below with reference to FIG. 5.

In step S512, the terminal device can obtain at least one picture file; in step S514, the terminal device can extract the facial features of each picture file; in step S516, the terminal device performs feature filtering on the facial features to remove face information in the picture that the user is not interested in.

In step S522, the terminal device can obtain at least one video file; in step S524, the terminal device can extract the key frame images of each video file; in step S526, the terminal device extracts facial features from the key frame images; in step S528, the terminal device may perform feature denoising, that is, remove face information in the key frame images that the user is not interested in, as well as key frame images with poor correlation. In addition, individual clustering is performed, that is, clustering is performed for the different photographed objects, so that the better-quality facial features of each photographed object are determined as the facial features of the video file.

In step S530, the picture file and the video file are clustered using the facial features, and the same cluster ID is assigned to the same object.
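
The flow of FIG. 5 can be outlined as a skeleton in which the feature extraction and clustering stages are supplied as callables. This is only a structural sketch: the function `run_pipeline`, its parameters and the stub stages used in the example are hypothetical stand-ins for the face model and clustering algorithm, which the text does not fix.

```python
from typing import Callable, Dict, Iterable, List, Sequence, Tuple

Feature = Tuple[float, ...]


def run_pipeline(
    pictures: Iterable[str],
    videos: Iterable[str],
    picture_features: Callable[[str], List[Feature]],      # S514 and S516
    video_features: Callable[[str], List[Feature]],        # S524 to S528
    cluster: Callable[[Sequence[Feature]], List[int]],     # S530
) -> Dict[int, List[str]]:
    """Collect features from both branches, cluster them jointly, group files by cluster ID."""
    entries: List[Tuple[str, Feature]] = []
    for name in pictures:
        entries.extend((name, feature) for feature in picture_features(name))
    for name in videos:
        entries.extend((name, feature) for feature in video_features(name))

    labels = cluster([feature for _, feature in entries])

    result: Dict[int, List[str]] = {}
    for (name, _), label in zip(entries, labels):
        files = result.setdefault(label, [])
        if name not in files:
            files.append(name)
    return result


# Stub stages with made-up features; a real system would call a face model here.
stub_features = {
    "p1.jpg": [(1.0, 0.0)],
    "p2.jpg": [(0.0, 1.0)],
    "v1.mp4": [(0.9, 0.1), (0.1, 0.9)],
    "v2.mp4": [],                        # no face found, so the file drops out
}


def stub_cluster(features: Sequence[Feature]) -> List[int]:
    # Stand-in for a real clustering algorithm: split on the dominant component.
    return [0 if feature[0] > feature[1] else 1 for feature in features]


albums = run_pipeline(["p1.jpg", "p2.jpg"], ["v1.mp4", "v2.mp4"],
                      stub_features.__getitem__, stub_features.__getitem__,
                      stub_cluster)
```

A file that yields no facial features, such as the stub video without faces, contributes no entries and therefore does not appear in any cluster.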

It should be noted that although the various steps of the method in the present disclosure are described in a specific order in the drawings, this does not require or imply that these steps must be performed in that specific order, or that all the steps shown must be performed to achieve the desired result. Additionally or alternatively, some steps may be omitted, multiple steps may be combined into one step for execution, and/or one step may be decomposed into multiple steps for execution, etc.

Further, this exemplary embodiment also provides a file clustering device.

FIG. 6 schematically shows a block diagram of a file clustering device according to an exemplary embodiment of the present disclosure. Referring to FIG. 6, the file clustering device 6 according to an exemplary embodiment of the present disclosure may include a file acquisition module 61, a first feature extraction module 63, a second feature extraction module 65, and a file clustering module 67.

Specifically, the file acquisition module 61 can be used to acquire at least one picture file and at least one video file; the first feature extraction module 63 can be used to extract the facial features of each picture file; the second feature extraction module 65 can be used to extract the facial features of each video file; the file clustering module 67 can be used to cluster the at least one picture file and the at least one video file according to the facial features of each picture file and the facial features of each video file.

Based on the file clustering device of the exemplary embodiment of the present disclosure, hybrid clustering of pictures and videos can be realized, and pictures and videos can be classified according to shooting objects, which helps users quickly identify pictures and videos containing the same shooting object and perform operations such as viewing, sharing, and deleting.

According to an exemplary embodiment of the present disclosure, the first feature extraction module 63 may be configured to perform: extracting the facial features of all faces contained in the picture file; and determining at least one target object, and extracting, from the facial features of all faces, the facial features related to the at least one target object as the facial features of the picture file.

According to an exemplary embodiment of the present disclosure, the second feature extraction module 65 may be configured to perform: extracting at least one key frame image from the video file; extracting the facial features of each key frame image; and determining the facial features of the video file according to the facial features of each key frame image.

According to an exemplary embodiment of the present disclosure, the process of extracting at least one key frame image from the video file by the second feature extraction module 65 may be configured to perform: performing image quality evaluation on each video frame image of the video file to obtain a quality score; obtaining a quality threshold, and comparing the quality score of each video frame image with the quality threshold; and determining each video frame image whose quality score is greater than the quality threshold as a key frame image.

According to an exemplary embodiment of the present disclosure, the process of extracting the facial features of each key frame image by the second feature extraction module 65 may be configured to perform: extracting the facial features of all faces contained in the key frame image; and determining at least one target object, and extracting, from the facial features of all faces, the facial features related to the at least one target object as the facial features of the key frame image.

According to an exemplary embodiment of the present disclosure, the number of key frame images in the video file is two or more. In this case, the process by which the second feature extraction module 65 determines the facial features of the video file according to the facial features of each key frame image may be configured to perform: sorting the key frame images according to the time point of each key frame image in the video file to obtain an image sequence; determining the correlation between adjacent key frame images in the image sequence; removing the key frame images whose correlation is less than the correlation threshold from the two or more key frame images to obtain a key frame image set; and determining the facial features of the video file according to the facial features of each key frame image in the key frame image set.

According to an exemplary embodiment of the present disclosure, the process by which the second feature extraction module 65 determines the facial features of the video file using the facial features of the key frame images in the key frame image set may be configured to perform: clustering the facial features of each key frame image in the key frame image set to obtain a facial feature set of at least one object category; determining the score of each facial feature in the facial feature set of each object category; selecting the facial feature with the highest score as the facial feature corresponding to the object category; and determining the facial feature corresponding to the object category as a facial feature of the video file.

According to an exemplary embodiment of the present disclosure, the file obtaining module 61 may be configured to perform: obtaining a candidate picture set and a candidate video set; performing face detection on each candidate picture in the candidate picture set, and determining the candidate pictures containing faces as picture files; and performing face detection on each candidate video in the candidate video set, and determining the candidate videos containing faces as video files.

According to an exemplary embodiment of the present disclosure, referring to FIG. 7, compared with the file clustering device 6, the file clustering device 7 may further include a result editing module 71.

Specifically, the result editing module 71 may be configured to perform: displaying the clustering result, wherein each cluster in the clustering result corresponds to a different face object category; and, in response to an editing operation on the clustering result, editing and saving the clustering result.

Since each functional module of the file clustering device in the embodiment of the present disclosure is the same as in the above method embodiment, it will not be repeated here.

Through the description of the above embodiments, those skilled in the art can easily understand that the example embodiments described here can be implemented by software, or by combining software with the necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, USB flash drive, removable hard disk, etc.) or on a network, and which includes several instructions to cause a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.

In addition, the above-mentioned drawings are merely schematic illustrations of the processing included in the method according to the exemplary embodiments of the present disclosure, and are not intended for limitation. It is easy to understand that the processing shown in the above drawings does not indicate or limit the time sequence of these processings. In addition, it is easy to understand that these processes can be executed synchronously or asynchronously in multiple modules, for example.

It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, this division is not mandatory. In fact, according to the embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of a module or unit described above can be further divided into multiple modules or units to be embodied.

Those skilled in the art will readily conceive of other embodiments of the present disclosure after considering the description and practicing the content disclosed herein. This application is intended to cover any variations, uses, or adaptive changes of the present disclosure that follow the general principles of the present disclosure and include common knowledge or conventional technical means in the technical field that are not disclosed in the present disclosure. The description and the embodiments are to be regarded as exemplary only, and the true scope and spirit of the present disclosure are pointed out by the claims.

It should be understood that the present disclosure is not limited to the precise structure that has been described above and shown in the drawings, and various modifications and changes can be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims

1. A file clustering method, comprising:

obtaining at least one picture file and at least one video file;

extracting the facial features of each of the picture files;

extracting the facial features of each of the video files; and

clustering the at least one picture file and the at least one video file according to the facial features of each of the picture files and the facial features of each of the video files.
2. The file clustering method according to claim 1, wherein extracting the facial features of each of the picture files comprises:

extracting the facial features of all faces contained in the picture file; and

determining at least one target object, and extracting, from the facial features of all the faces, the facial features related to the at least one target object as the facial features of the picture file.
3. The file clustering method according to claim 1, wherein extracting the facial features of each of the video files comprises:

extracting at least one key frame image from the video file;

extracting the facial features of each of the key frame images; and

determining the facial features of the video file according to the facial features of each of the key frame images.
4. The file clustering method according to claim 3, wherein extracting at least one key frame image from the video file comprises:

performing image quality evaluation on each video frame image of the video file to obtain a quality score;

obtaining a quality threshold, and comparing the quality score of each video frame image with the quality threshold; and

determining each video frame image whose quality score is greater than the quality threshold as the key frame image.
5. The file clustering method according to claim 3, wherein extracting the facial features of each of the key frame images comprises:

extracting the facial features of all faces contained in the key frame image; and

determining at least one target object, and extracting, from the facial features of all the faces, the facial features related to the at least one target object as the facial features of the key frame image.
6. The file clustering method according to claim 5, wherein two or more key frame images are extracted from the video file, and wherein determining the facial features of the video file according to the facial features of each of the key frame images comprises:

sorting the key frame images according to the time point of each of the key frame images in the video file to obtain an image sequence;

determining the correlation between adjacent key frame images in the image sequence;

removing the key frame images whose correlation is less than the correlation threshold from the two or more key frame images to obtain a key frame image set; and

determining the facial features of the video file according to the facial features of each key frame image in the key frame image set.
7. The file clustering method according to claim 6, wherein determining the facial features of the video file according to the facial features of each key frame image in the key frame image set comprises:

clustering the facial features of each key frame image in the key frame image set to obtain a facial feature set of at least one object category;

determining the score of each facial feature in the facial feature set of each object category;

selecting the facial feature with the highest score as the facial feature corresponding to the object category; and

determining the facial feature corresponding to the object category as a facial feature of the video file.
8. The file clustering method according to claim 1, wherein the file clustering method further comprises:

obtaining a candidate picture set and a candidate video set;

performing face detection on each candidate picture in the candidate picture set, and determining a candidate picture containing a face as the picture file; and

performing face detection on each candidate video in the candidate video set, and determining a candidate video containing a face as the video file.
9. The file clustering method according to any one of claims 1 to 8, wherein, after clustering the at least one picture file and the at least one video file, the file clustering method further comprises:

displaying the clustering result, wherein each cluster in the clustering result corresponds to a different face object category; and

in response to an editing operation on the clustering result, editing and saving the clustering result.
10. A file clustering device, comprising:

a file obtaining module configured to obtain at least one picture file and at least one video file;

a first feature extraction module configured to extract the facial features of each of the picture files;

a second feature extraction module configured to extract the facial features of each of the video files; and

a file clustering module configured to cluster the at least one picture file and the at least one video file according to the facial features of each of the picture files and the facial features of each of the video files.
11. The file clustering device according to claim 10, wherein the first feature extraction module is configured to extract the facial features of all faces contained in the picture file, determine at least one target object, and extract, from the facial features of all the faces, the facial features related to the at least one target object as the facial features of the picture file.
12. The file clustering device according to claim 10, wherein the second feature extraction module is configured to extract at least one key frame image from the video file, extract the facial features of each of the key frame images, and determine the facial features of the video file according to the facial features of each of the key frame images.
13. The file clustering device according to claim 12, wherein the process of extracting at least one key frame image from the video file by the second feature extraction module is configured to: perform image quality evaluation on each video frame image of the video file to obtain a quality score, obtain a quality threshold, compare the quality score of each video frame image with the quality threshold, and determine each video frame image whose quality score is greater than the quality threshold as the key frame image.
14. The file clustering device according to claim 12, wherein the process of extracting the facial features of each of the key frame images by the second feature extraction module is configured to: extract the facial features of all faces contained in the key frame image, determine at least one target object, and extract, from the facial features of all the faces, the facial features related to the at least one target object as the facial features of the key frame image.
15. The file clustering device according to claim 14, wherein, in the case where two or more key frame images are extracted from the video file, the process by which the second feature extraction module determines the facial features of the video file according to the facial features of each of the key frame images is configured to: sort the key frame images according to the time point of each key frame image in the video file to obtain an image sequence; determine the correlation between adjacent key frame images in the image sequence; remove, from the two or more key frame images, the key frame images whose correlation is less than the correlation threshold to obtain a key frame image set; and determine the facial features of the video file according to the facial features of each key frame image in the key frame image set.
16. The file clustering device according to claim 15, wherein the process by which the second feature extraction module determines the facial features of the video file according to the facial features of each key frame image in the key frame image set is configured to: cluster the facial features of each key frame image in the key frame image set to obtain a facial feature set of at least one object category; determine the score of each facial feature in the facial feature set of each object category; select the facial feature with the highest score as the facial feature corresponding to the object category; and determine the facial feature corresponding to the object category as a facial feature of the video file.
17. The file clustering device according to claim 10, wherein the file obtaining module is configured to obtain a candidate picture set and a candidate video set, perform face detection on each candidate picture in the candidate picture set, determine a candidate picture containing a face as the picture file, perform face detection on each candidate video in the candidate video set, and determine a candidate video containing a face as the video file.
18. The file clustering device according to any one of claims 10 to 17, wherein the file clustering device further comprises:

a result editing module configured to display the clustering result, wherein each cluster in the clustering result corresponds to a different face object category, and, in response to an editing operation on the clustering result, to edit and save the clustering result.
19. A computer-readable storage medium having a computer program stored thereon, wherein, when the program is executed by a processor, the file clustering method according to any one of claims 1 to 9 is implemented.
20. An electronic device, comprising:

a processor; and

a memory configured to store one or more programs which, when executed by the processor, cause the processor to implement the file clustering method according to any one of claims 1 to 9.