WO2021129444A1 - File clustering method and apparatus, and storage medium and electronic device - Google Patents


Info

Publication number
WO2021129444A1
Authority
WO
WIPO (PCT)
Prior art keywords
file
key frame
facial features
video
frame image
Application number
PCT/CN2020/136176
Other languages
French (fr)
Chinese (zh)
Inventor
彭冬炜
Original Assignee
Oppo广东移动通信有限公司 (Guangdong Oppo Mobile Telecommunications Corp., Ltd.)
Application filed by Oppo广东移动通信有限公司
Publication of WO2021129444A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation

Definitions

  • the present disclosure relates to the field of terminal technology, and in particular, to a file clustering method, a file clustering apparatus, a computer-readable storage medium, and an electronic device.
  • a file clustering method is provided, including: obtaining at least one picture file and at least one video file; extracting facial features of each picture file; extracting facial features of each video file; and clustering the at least one picture file and the at least one video file according to the facial features of each picture file and the facial features of each video file.
  • a file clustering apparatus is provided, including: a file acquisition module configured to acquire at least one picture file and at least one video file; a first feature extraction module configured to extract facial features of each picture file; a second feature extraction module configured to extract facial features of each video file; and a file clustering module configured to cluster the at least one picture file and the at least one video file according to the facial features of each picture file and the facial features of each video file.
  • a computer-readable storage medium is provided, on which a computer program is stored; when the program is executed by a processor, the above file clustering method is implemented.
  • an electronic device is provided, including a processor and a memory configured to store one or more programs; when the one or more programs are executed by the processor, the processor implements the above file clustering method.
  • FIG. 1 shows a schematic diagram of an exemplary system architecture to which a file clustering method or a file clustering apparatus of an embodiment of the present disclosure can be applied;
  • FIG. 2 shows a schematic structural diagram of an electronic device suitable for implementing embodiments of the present disclosure;
  • FIG. 3 schematically shows a flowchart of a file clustering method according to an exemplary embodiment of the present disclosure;
  • FIG. 4 schematically shows a flowchart of extracting facial features of a video file according to an exemplary embodiment of the present disclosure;
  • FIG. 5 schematically shows a flowchart of the entire file clustering process according to an exemplary embodiment of the present disclosure;
  • FIG. 6 schematically shows a block diagram of a file clustering apparatus according to an exemplary embodiment of the present disclosure;
  • FIG. 7 schematically shows a block diagram of a file clustering apparatus according to another exemplary embodiment of the present disclosure.
  • FIG. 1 shows a schematic diagram of an exemplary system architecture of a file clustering method or a file clustering device to which an embodiment of the present disclosure can be applied.
  • the system architecture 1000 may include one or more of terminal devices 1001, 1002, 1003, a network 1004 and a server 1005.
  • the network 1004 is used to provide a medium for communication links between the terminal devices 1001, 1002, 1003 and the server 1005.
  • the network 1004 may include various connection types, such as wired, wireless communication links, or fiber optic cables, and so on.
  • the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. According to implementation needs, there can be any number of terminal devices, networks, and servers.
  • the server 1005 may be a server cluster composed of multiple servers.
  • the user can use the terminal devices 1001, 1002, 1003 to interact with the server 1005 through the network 1004 to receive or send messages and so on.
  • the terminal devices 1001, 1002, 1003 may be various electronic devices with display screens, including but not limited to smart phones, tablet computers, portable computers, desktop computers, and so on.
  • the terminal devices 1001, 1002, 1003 can obtain at least one picture file and at least one video file from the server 1005 through the network 1004; in this case, the server 1005 may be, for example, a cloud platform such as a cloud photo album.
  • the terminal devices 1001, 1002, 1003 may take pictures and videos through the camera modules equipped with them, and obtain at least one picture file and at least one video file from them.
  • some of the picture files and video files acquired by the terminal devices 1001, 1002, 1003 may be captured by their own camera modules, while the others come from the server 1005. This disclosure does not limit the sources of picture files and video files.
  • the picture files and video files described in the exemplary embodiments of the present disclosure both contain human face images, that is, the present disclosure mainly focuses on the clustering of pictures and videos based on human faces.
  • the solution of the present disclosure can also be applied to clustering of other photographed objects, and these other photographed objects may include, for example, animals, vehicles, buildings, etc., which is not limited in the present disclosure.
  • the terminal devices 1001, 1002, 1003 may each extract facial features and use the extracted facial features to cluster the acquired picture files and video files. This allows picture files and video files of the same subject to be assigned the same cluster ID, which makes them convenient for users to view.
  • alternatively, the server 1005 can obtain the picture files and video files taken by the camera modules of the terminal devices 1001, 1002, 1003, extract their facial features, and cluster the obtained picture files and video files according to these facial features.
  • the file clustering apparatus of the exemplary embodiment of the present disclosure may be configured in the terminal equipment 1001, 1002, 1003.
  • FIG. 2 shows a schematic diagram of an electronic device suitable for implementing the exemplary embodiments of the present disclosure, and the electronic device corresponds to the above terminal device such as a mobile phone. It should be noted that the electronic device shown in FIG. 2 is only an example, and should not bring any limitation to the function and scope of use of the embodiments of the present disclosure.
  • the electronic device of the present disclosure includes at least a processor and a memory.
  • the memory is used to store one or more programs.
  • when the one or more programs are executed by the processor, the processor can implement the file clustering method of the exemplary embodiments of the present disclosure.
  • the electronic device 200 may include: a processor 210, an internal memory 221, an external memory interface 222, a universal serial bus (USB) interface 230, a charging management module 240, a power management module 241, a battery 242, an antenna 1, an antenna 2, a mobile communication module 250, a wireless communication module 260, an audio module 270, a speaker 271, a receiver 272, a microphone 273, an earphone interface 274, a sensor module 280, a display screen 290, a camera module 291, an indicator 292, a motor 293, a button 294, a subscriber identification module (SIM) card interface 295, etc.
  • the sensor module 280 may include a depth sensor 2801, a pressure sensor 2802, a gyroscope sensor 2803, an air pressure sensor 2804, a magnetic sensor 2805, an acceleration sensor 2806, a distance sensor 2807, a proximity light sensor 2808, a fingerprint sensor 2809, a temperature sensor 2810, a touch sensor 2811, an ambient light sensor 2812, a bone conduction sensor 2813, etc.
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 200.
  • the electronic device 200 may include more or fewer components than shown, or combine certain components, or split certain components, or arrange different components.
  • the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • the processor 210 may include one or more processing units.
  • the processor 210 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor and/or a neural-network processing unit (NPU), etc.
  • the different processing units may be independent devices or integrated in one or more processors.
  • a memory may be provided in the processor 210 to store instructions and data.
  • the USB interface 230 is an interface that complies with the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, and so on.
  • the USB interface 230 can be used to connect a charger to charge the electronic device 200, and can also be used to transfer data between the electronic device 200 and peripheral devices. It can also be used to connect earphones and play audio through earphones. This interface can also be used to connect other electronic devices, such as AR devices.
  • the charging management module 240 is used to receive charging input from the charger.
  • the charger can be a wireless charger or a wired charger.
  • the power management module 241 is used to connect the battery 242, the charging management module 240, and the processor 210.
  • the power management module 241 receives input from the battery 242 and/or the charging management module 240, and supplies power to the processor 210, the internal memory 221, the display screen 290, the camera module 291, and the wireless communication module 260.
  • the wireless communication function of the electronic device 200 can be implemented by the antenna 1, the antenna 2, the mobile communication module 250, the wireless communication module 260, the modem processor, and the baseband processor.
  • the mobile communication module 250 can provide a wireless communication solution including 2G/3G/4G/5G and the like applied to the electronic device 200.
  • the wireless communication module 260 can provide wireless communication solutions applied to the electronic device 200, including wireless local area networks (WLAN) (such as Wireless Fidelity (Wi-Fi) networks), Bluetooth (BT), Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), and Infrared (IR) technology.
  • the electronic device 200 implements a display function through a GPU, a display screen 290, an application processor, and the like.
  • the GPU is a microprocessor for image processing and is connected to the display screen 290 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations and is used for graphics rendering.
  • the processor 210 may include one or more GPUs that execute program instructions to generate or change display information.
  • the electronic device 200 can realize a shooting function through an ISP, a camera module 291, a video codec, a GPU, a display screen 290, and an application processor.
  • the electronic device 200 may include 1 or N camera modules 291, and N is a positive integer greater than 1. If the electronic device 200 includes N cameras, one of the N cameras is the main camera.
  • the internal memory 221 may be used to store computer executable program code, where the executable program code includes instructions.
  • the internal memory 221 may include a storage program area and a storage data area.
  • the external memory interface 222 may be used to connect an external memory card, such as a Micro SD card, so as to expand the storage capacity of the electronic device 200.
  • the electronic device 200 can implement audio functions through an audio module 270, a speaker 271, a receiver 272, a microphone 273, a headphone interface 274, an application processor, and the like. For example, music playback, recording, etc.
  • the audio module 270 is used to convert digital audio information into an analog audio signal for output, and is also used to convert an analog audio input into a digital audio signal.
  • the audio module 270 can also be used to encode and decode audio signals.
  • the audio module 270 may be provided in the processor 210, or part of the functional modules of the audio module 270 may be provided in the processor 210.
  • the speaker 271 is used to convert audio electrical signals into sound signals.
  • through the speaker 271, the user of the electronic device 200 can listen to music or to a hands-free call.
  • the microphone 273 is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak close to the microphone 273 to input the sound signal into it.
  • the electronic device 200 may be provided with at least one microphone 273.
  • the earphone interface 274 is used to connect wired earphones.
  • the depth sensor 2801 is used to obtain depth information of the scene.
  • the pressure sensor 2802 is used to sense the pressure signal and can convert the pressure signal into an electrical signal.
  • the gyro sensor 2803 may be used to determine the movement posture of the electronic device 200.
  • the air pressure sensor 2804 is used to measure air pressure.
  • the magnetic sensor 2805 includes a Hall sensor.
  • the electronic device 200 can use the magnetic sensor 2805 to detect the opening and closing of the flip holster.
  • the acceleration sensor 2806 can detect the magnitude of the acceleration of the electronic device 200 in various directions (generally three axes).
  • the distance sensor 2807 is used to measure distance.
  • the proximity light sensor 2808 may include, for example, a light emitting diode (LED) and a light detector, such as a photodiode.
  • the fingerprint sensor 2809 is used to collect fingerprints.
  • the temperature sensor 2810 is used to detect temperature.
  • the touch sensor 2811 may pass the detected touch operation to the application processor to determine the type of the touch event.
  • the visual output related to the touch operation can be provided through the display screen 290.
  • the ambient light sensor 2812 is used to sense the brightness of the ambient light.
  • the bone conduction sensor 2813 can acquire vibration signals.
  • the button 294 includes a power-on button, a volume button, and so on.
  • the button 294 may be a mechanical button or a touch button.
  • the motor 293 can generate vibration prompts. The motor 293 can be used for incoming call vibration notification, and can also be used for touch vibration feedback.
  • the indicator 292 can be an indicator light, which can be used to indicate the charging status, power change, or to indicate messages, missed calls, notifications, and so on.
  • the SIM card interface 295 is used to connect to the SIM card.
  • the electronic device 200 interacts with the network through the SIM card to implement functions such as call and data communication.
  • the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be included in the electronic device described in the foregoing embodiment; or it may exist alone without being assembled into the electronic device.
  • the computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
  • the computer-readable storage medium can send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device.
  • the program code contained on the computer-readable storage medium can be transmitted by any suitable medium, including but not limited to: wireless, wire, optical cable, RF, etc., or any suitable combination of the above.
  • the computer-readable storage medium carries one or more programs, and when the above one or more programs are executed by an electronic device, the electronic device realizes the method described in the following embodiments.
  • each block in the flowchart or block diagram may represent a module, program segment, or part of code, which contains one or more executable instructions for implementing the specified logical function.
  • in some alternative implementations, the functions marked in the blocks may also occur in an order different from that marked in the drawings. For example, two blocks shown in succession may actually be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved.
  • each block in the block diagram or flowchart, and combinations of blocks in the block diagram or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units described in the embodiments of the present disclosure may be implemented in software or hardware, and the described units may also be provided in a processor. The names of these units do not, in some cases, constitute a limitation on the units themselves.
  • FIG. 3 schematically shows a flowchart of a file clustering method according to an exemplary embodiment of the present disclosure.
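  • the file clustering method may include the following steps: acquiring at least one picture file and at least one video file (step S32); extracting the facial features of each picture file (step S34); extracting the facial features of each video file (step S36); and clustering the at least one picture file and the at least one video file according to these facial features.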
  • the terminal device can determine the data format of the file to determine whether the file is a picture file or a video file.
  • the terminal device can use a face detection algorithm to determine whether a picture file contains a human face. If it does, the file is determined to be a picture file to be clustered in the present disclosure; if it does not, the picture file can be discarded, where "discarded" means that the picture file is not used as a picture file to be clustered in the present disclosure.
  • the terminal device can use a video-based face detection algorithm to determine whether a video file contains a human face. Similarly, if it does, the video file is determined to be a video file to be clustered in the present disclosure; if it does not, the video file can be discarded, meaning that it is not used as a video file to be clustered in the present disclosure.
  • the terminal device may first obtain a candidate picture set and a candidate video set, the candidate picture set includes at least one candidate picture, and the candidate video set includes at least one candidate video.
  • face detection is performed on each candidate picture in the candidate picture set, and the candidate picture containing the face is determined as the above-mentioned picture file, which is used to perform the following method and process;
  • Face detection is performed on each candidate video, and the candidate video containing the face is determined as the above-mentioned video file, which is used to perform the following method and process.
  • the terminal device can directly acquire at least one picture file containing a face image and at least one video file containing a face image.
  • the process of determining whether picture files and video files contain a face image can be performed by a front-end module before the solution of the present disclosure is executed; apart from determining whether a face image is contained, the present disclosure does not limit the functions of the front-end module.
  • the facial features of all faces contained in the picture file can be extracted.
  • some embodiments of the present disclosure may extract facial features of all faces contained in a picture file through a convolutional neural network.
  • the present disclosure does not limit the model structure and training process of the convolutional neural network.
  • methods based on geometric features, template matching, wavelet theory, hidden Markov models, support vector machines, etc. can also be used to extract facial features; the present disclosure does not particularly limit the feature extraction method.
  • At least one target object is determined, and facial features related to the at least one target object are extracted from the facial features of all faces as the facial features of the image file.
  • the target object may be an object determined in the shooting scene, and may also be referred to as a target shooting object.
  • the target object may be an object designated by the user, which is not limited in the present disclosure.
  • the target shooting object can be determined from the user's operation of tapping a subject during the camera preview; that is, when the picture is previewed, the object corresponding to the position the user taps on the screen is the target shooting object.
  • the user can also set criteria for determining the target subject. These criteria may include, but are not limited to, for example: subjects that appear in historical pictures more than a predetermined number of times, children shorter than 120 cm, subjects wearing hats, etc.
  • sub-pictures containing only the target object can be cropped from the picture file, used as the pictures for subsequent analysis, clustering, and display, and saved in the album. Such a sub-picture does not contain face images of non-target objects.
  • the process of extracting the facial features of a video file is described below with reference to steps S402 to S406 in FIG. 4.
  • in step S402, at least one key frame image is extracted from the video file.
  • the terminal device may perform image quality evaluation on each video frame image of the video file to obtain a quality score.
  • the quality score can be determined based on factors such as saturation and exposure.
  • the image quality of each video frame image can also be evaluated based on the Human Visual System (HVS).
  • the terminal device may obtain the quality threshold, compare the quality score of each video frame image with the quality threshold, and determine the video frame image with the quality score greater than the quality threshold as the key frame image.
  • the quality threshold can be set in advance, and the present disclosure does not limit its value.
  • the quality threshold may be set to 7.5.
  • the quality threshold may be determined in combination with the processing capability of the terminal device. For example, the higher the processing capability of the terminal device, the lower the quality threshold may be set, so that more key frame images are obtained.
  • alternatively, video frame images may be extracted at a predetermined time interval as key frame images; for example, the predetermined time interval may be 3 seconds.
  • the terminal device may extract only one video frame image from the video file through analysis means, as a key frame image, to represent the entire video file.
  • in step S404, the facial features of each key frame image are extracted.
  • the facial features of all faces contained in the key frame image can be extracted.
  • a convolutional neural network can also be used to extract the facial features of all faces contained in the key frame image, and the convolutional neural network used here can be the same as the one used to determine the facial features of the picture files.
  • at least one target object is determined, and the facial features related to the at least one target object are extracted from the facial features of all faces as the facial features of the key frame image.
  • in step S406, the facial features of the video file are determined according to the facial features of each key frame image.
  • the facial features of each key frame image can be used as the facial features of the video file. That is to say, when only one key frame image is extracted, all facial features in that key frame image are taken as the facial features of the video file; when two or more key frame images are extracted, all facial features of every key frame image are taken as the facial features of the video file.
  • there is no restriction on the number of object categories corresponding to the facial features; that is, there is no restriction on the number of different human faces contained in a key frame image.
  • the number of key frame images extracted from the video file is at least two.
  • the key frame images can be sorted according to their time points in the video file, that is, in the order in which they appear when the video file is played, to obtain an image sequence; next, the correlation between adjacent key frame images in the image sequence can be determined, and the key frame images whose correlation is less than a correlation threshold are removed to obtain a key frame image set; then, the facial features of the video file can be determined according to the facial features of each key frame image in the key frame image set.
  • the facial feature of each key frame image in the set of key frame images can be used as the facial feature of the video file.
  • the correlation between the key frame images can be determined based on the image quality and the similarity of the target object.
  • image B and image E can be removed from the sequence.
  • image quality and similarity can also be combined by weighting the two to determine the correlation.
  • when the facial features of the video file are determined according to the facial features of the key frame images in the key frame image set, the facial features of every key frame image in the set are first clustered to obtain a facial feature set for at least one object category, where different faces correspond to different object categories.
  • next, for each object category, the facial features in its set are scored, and the facial feature with the highest face score is determined as a facial feature of the video file.
  • the face scoring can be based on the feature scoring result of the above convolutional neural network.
  • alternatively, a dedicated face scoring model can be constructed to score different facial features; the present disclosure does not limit this.
  • for example, a video file is left with 10 key frame images after weakly correlated images are excluded.
  • each key frame image contains three objects a, b, and c, so the facial features can be clustered by object into three categories.
  • the face scores can then be determined through analysis, and the facial feature with the highest score in each category is determined as a facial feature of the video file.
  • after the facial features of each key frame image are determined, these facial features are clustered to distinguish different photographed objects. Then, for each object, the facial feature with the highest face score is determined from the clustering result and used as a facial feature of the video file.
  • in step S36, video files that do not contain human faces can also be culled.
  • step S34 and step S36 of the exemplary embodiment of the present disclosure may be interchanged.
  • the at least one picture file and the at least one video file obtained in step S32 can be clustered by using the facial features determined in step S34 and step S36.
  • a machine learning algorithm such as K-means clustering may be used to implement the clustering process, which is not limited in the present disclosure.
  • the resulting clusters correspond to different photographed objects. That is to say, the picture files and video files are divided according to cluster IDs, and photographed objects correspond one-to-one to cluster IDs.
  • the present disclosure also provides a solution for editing the clustering result.
  • the terminal device can display the clustering result, for example grouped by cluster in the album; next, in response to the user's editing operation on the clustering result, the terminal device can edit the clustering result and save the edited result.
  • the editing operations may include, but are not limited to: modifying the name of the album, deleting one or more picture files, deleting one or more video files, adding comments, changing the size, and so on.
  • the terminal device can obtain at least one picture file; in step S514, the terminal device can extract the facial features of each picture file; in step S516, the terminal device performs feature filtering on the facial features to remove face information in the pictures that the user is not interested in.
  • the terminal device can obtain at least one video file; in step S524, the terminal device can extract the key frame images of each video file; in step S526, the terminal device extracts facial features from the key frame images; in step S528, the terminal device may perform feature denoising, that is, remove the face information in the key frame images that the user is not interested in as well as key frame images with poor correlation, and may also perform per-object clustering, that is, cluster the faces of different photographed objects and determine the better-quality facial feature of each photographed object as a facial feature of the video file.
  • in step S530, the picture files and video files are clustered using the facial features, and the same cluster ID is assigned to files of the same object.
  • this exemplary embodiment also provides a file clustering device.
  • Fig. 6 schematically shows a block diagram of a file clustering apparatus according to an exemplary embodiment of the present disclosure.
  • the file clustering apparatus 6 may include a file acquisition module 61, a first feature extraction module 63, a second feature extraction module 65, and a file clustering module 67.
  • the file acquisition module 61 can be used to acquire at least one picture file and at least one video file; the first feature extraction module 63 can be used to extract the facial features of each picture file; the second feature extraction module 65 can be used to extract the facial features of each video file; and the file clustering module 67 can be used to cluster the at least one picture file and the at least one video file according to the facial features of each picture file and the facial features of each video file.
  • mixed clustering of pictures and videos can be realized: pictures and videos are classified by photographed subject, which helps users quickly find the pictures and videos containing the same subject and then view, share, delete, or otherwise operate on them.
  • the first feature extraction module 63 may be configured to: extract the facial features of all faces contained in the picture file; and determine at least one target object and extract, from the facial features of all faces, the facial features related to the at least one target object as the facial features of the picture file.
  • the second feature extraction module 65 may be configured to: extract at least one key frame image from the video file; extract the facial features of each key frame image; and determine the facial features of the video file according to the facial features of each key frame image.
  • when extracting at least one key frame image from the video file, the second feature extraction module 65 may be configured to: perform image quality evaluation on each video frame image of the video file to obtain a quality score; obtain a quality threshold and compare the quality score of each video frame image with the quality threshold; and determine the video frame images whose quality score is greater than the quality threshold as key frame images.
  • when extracting the facial features of each key frame image, the second feature extraction module 65 may be configured to: extract the facial features of all faces contained in the key frame image; and determine at least one target object and extract, from the facial features of all faces, the facial features related to the at least one target object as the facial features of the key frame image.
  • the number of key frame images in the video file is two or more.
  • when determining the facial features of the video file according to the facial features of each key frame image, the second feature extraction module 65 may be configured to: sort the key frame images by their time points in the video file to obtain an image sequence; determine the correlation between adjacent key frame images in the image sequence; remove, from the two or more key frame images, those whose correlation is less than a correlation threshold to obtain a key frame image set; and determine the facial features of the video file according to the facial features of each key frame image in the key frame image set.
  • when determining the facial features of the video file using the facial features of the key frame images in the key frame image set, the second feature extraction module 65 may be configured to: cluster the facial features of the key frame images to obtain a facial feature set for at least one object category; determine a score for each facial feature in the facial feature set of each object category; select the facial feature with the highest score as the facial feature corresponding to that object category; and determine the facial features corresponding to the object categories as the facial features of the video file.
  • the file acquisition module 61 may be configured to: obtain a candidate picture set and a candidate video set; perform face detection on each candidate picture in the candidate picture set and determine the candidate pictures containing faces as picture files; and perform face detection on each candidate video in the candidate video set and determine the candidate videos containing faces as video files.
  • the file clustering apparatus 7 may further include a result editing module 71.
  • the result editing module 71 may be configured to: display the clustering result, in which each cluster corresponds to a different face object category; and edit the clustering result in response to the user's editing operation and save the edited result.
  • the example embodiments described here can be implemented by software, or by software combined with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes several instructions that cause a computing device (such as a personal computer, a server, a terminal device, or a network device) to execute the method according to the embodiments of the present disclosure.
  • although several modules or units of the device for performing actions are mentioned in the above detailed description, this division is not mandatory.
  • the features and functions of two or more modules or units described above may be embodied in one module or unit.
  • the features and functions of a module or unit described above can be further divided into multiple modules or units to be embodied.
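The key-frame steps described above for the second feature extraction module 65 (quality threshold, then sorting by time and dropping poorly correlated frames) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the disclosed implementation: the `VideoFrame` type and its `descriptor` field are hypothetical, Pearson correlation is an assumed choice of correlation measure, and the rule that a key frame is dropped only when it is weakly correlated with every adjacent neighbour is one possible reading of the text.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class VideoFrame:
    """One decoded video frame (hypothetical structure)."""
    timestamp: float         # time point of the frame in the video, in seconds
    quality: float           # image-quality score from some upstream evaluator (assumed)
    descriptor: list[float]  # hypothetical feature vector used to measure correlation


def select_key_frames(frames: list[VideoFrame], quality_threshold: float) -> list[VideoFrame]:
    """Keep the frames whose quality score is strictly greater than the threshold."""
    return [f for f in frames if f.quality > quality_threshold]


def pearson(a: list[float], b: list[float]) -> float:
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    norm_a = sqrt(sum((x - mean_a) ** 2 for x in a))
    norm_b = sqrt(sum((y - mean_b) ** 2 for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return cov / (norm_a * norm_b)


def filter_by_adjacent_correlation(key_frames: list[VideoFrame],
                                   corr_threshold: float) -> list[VideoFrame]:
    """Sort key frames by time, then drop frames weakly correlated with all their neighbours."""
    sequence = sorted(key_frames, key=lambda f: f.timestamp)
    if len(sequence) < 2:
        return sequence  # nothing to compare against
    kept = []
    for i, frame in enumerate(sequence):
        neighbours = [sequence[j] for j in (i - 1, i + 1) if 0 <= j < len(sequence)]
        best = max(pearson(frame.descriptor, n.descriptor) for n in neighbours)
        if best >= corr_threshold:
            kept.append(frame)
    return kept
```

With this rule a single stray frame (for example, one taken across a sudden cut) is removed, while its neighbours survive as long as they correlate well with the frames on their other side.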
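Selecting one facial feature per subject from the key-frame faces, as described above (cluster the faces, then keep the highest-scoring face in each cluster), might look like the sketch below. The text does not name the clustering method for this step, so a simple greedy cosine-similarity grouping is swapped in here as an assumption; the `KeyFrameFace` type, the precomputed `score` field, and the 0.8 similarity threshold are likewise hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class KeyFrameFace:
    """A face found in a key frame (hypothetical structure)."""
    embedding: list[float]  # face feature vector, e.g. from a face-recognition model
    score: float            # face score (assumed precomputed, e.g. from sharpness or pose)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_faces(faces: list[KeyFrameFace],
                sim_threshold: float = 0.8) -> list[list[KeyFrameFace]]:
    """Greedy grouping: a face joins the first group whose seed face is similar enough."""
    groups: list[list[KeyFrameFace]] = []
    for face in faces:
        for group in groups:
            if cosine(face.embedding, group[0].embedding) >= sim_threshold:
                group.append(face)
                break
        else:  # no similar group found: this face starts a new subject
            groups.append([face])
    return groups


def video_face_features(faces: list[KeyFrameFace],
                        sim_threshold: float = 0.8) -> list[list[float]]:
    """Return one feature per subject: the embedding of the highest-scoring face in each group."""
    return [max(group, key=lambda f: f.score).embedding
            for group in group_faces(faces, sim_threshold)]
```

A video showing two people would thus contribute two features, one per subject, each taken from the best-scoring appearance of that subject.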

Abstract

A file clustering method, a file clustering apparatus, a computer-readable storage medium and an electronic device, which relate to the technical field of terminals. The file clustering method comprises: acquiring at least one picture file and at least one video file (S32); extracting a facial feature from each picture file (S34); extracting a facial feature from each video file (S36); and clustering the at least one picture file and the at least one video file according to the facial feature of each picture file and the facial feature of each video file (S38).

Description

File clustering method and apparatus, storage medium, and electronic device
Cross-Reference to Related Applications
This application claims priority to Chinese Patent Application No. 201911382475.0, entitled "File clustering method and device, storage medium and electronic equipment" and filed on December 27, 2019, the entire content of which is incorporated herein by reference.
Technical Field
The present disclosure relates to the field of terminal technology, and in particular, to a file clustering method, a file clustering apparatus, a computer-readable storage medium, and an electronic device.
Background
With the development of terminal technology, terminals can process and store large numbers of pictures and videos. These pictures and videos are mainly captured with the terminal's camera module, and in the vast majority of cases the subject is a person.
In practice, these files are usually organized only by type (picture or video) and shooting time. This single organization scheme makes it inconvenient for users to quickly find the shots that belong to the same subject.
Summary
According to a first aspect of the present disclosure, there is provided a file clustering method, including: obtaining at least one picture file and at least one video file; extracting facial features of each picture file; extracting facial features of each video file; and clustering the at least one picture file and the at least one video file according to the facial features of each picture file and the facial features of each video file.
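The four steps of the first aspect can be illustrated with a minimal K-means sketch in plain Python, in which picture and video face features are pooled and clustered together so that both file types share cluster IDs. The assumptions here: each file is represented by a single face feature vector (a video may in fact contribute several), the number of subjects `k` is known in advance, centroids are initialised from the first `k` points, and `kmeans` and `cluster_files` are hypothetical helper names.

```python
from math import dist


def kmeans(points: list[list[float]], k: int, iterations: int = 20) -> list[int]:
    """Plain K-means; returns a cluster index for each point."""
    # Naive initialisation (an assumption): use the first k points as centroids.
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def cluster_files(picture_features: dict[str, list[float]],
                  video_features: dict[str, list[float]],
                  k: int) -> dict[tuple[str, str], int]:
    """Cluster pictures and videos together; keys are ("picture" or "video", file name)."""
    keys = [("picture", name) for name in picture_features] + \
           [("video", name) for name in video_features]
    vectors = list(picture_features.values()) + list(video_features.values())
    labels = kmeans(vectors, k)
    return dict(zip(keys, labels))
```

For example, `cluster_files({"a.jpg": [0.0, 0.0], "b.jpg": [10.0, 10.0]}, {"c.mp4": [0.5, 0.0], "d.mp4": [10.0, 9.5]}, k=2)` puts `a.jpg` with `c.mp4` and `b.jpg` with `d.mp4`. In practice, a better initialisation (such as k-means++) and an unknown number of subjects would need handling.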
According to a second aspect of the present disclosure, there is provided a file clustering apparatus, including: a file acquisition module, configured to acquire at least one picture file and at least one video file; a first feature extraction module, configured to extract facial features of each picture file; a second feature extraction module, configured to extract facial features of each video file; and a file clustering module, configured to cluster the at least one picture file and the at least one video file according to the facial features of each picture file and the facial features of each video file.
According to a third aspect of the present disclosure, there is provided a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above file clustering method.
According to a fourth aspect of the present disclosure, there is provided an electronic device, including: a processor; and a memory configured to store one or more programs which, when executed by the processor, cause the processor to implement the above file clustering method.
Brief Description of the Drawings
FIG. 1 shows a schematic diagram of an exemplary system architecture to which the file clustering method or file clustering apparatus of the embodiments of the present disclosure can be applied;
FIG. 2 shows a schematic structural diagram of an electronic device suitable for implementing embodiments of the present disclosure;
FIG. 3 schematically shows a flowchart of a file clustering method according to an exemplary embodiment of the present disclosure;
FIG. 4 schematically shows a flowchart of extracting facial features of a video file according to an exemplary embodiment of the present disclosure;
FIG. 5 schematically shows a flowchart of the entire file clustering process according to an exemplary embodiment of the present disclosure;
FIG. 6 schematically shows a block diagram of a file clustering apparatus according to an exemplary embodiment of the present disclosure;
FIG. 7 schematically shows a block diagram of a file clustering apparatus according to another exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the example embodiments can be implemented in various forms and should not be construed as being limited to the examples set forth herein; rather, these embodiments are provided so that the present disclosure will be thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in one or more embodiments in any suitable way. In the following description, many specific details are provided to give a thorough understanding of the embodiments of the present disclosure. However, those skilled in the art will realize that the technical solutions of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other cases, well-known technical solutions are not shown or described in detail to avoid obscuring aspects of the present disclosure.
In addition, the drawings are only schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the figures denote the same or similar parts, so their repeated description will be omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flowcharts shown in the drawings are only exemplary illustrations and do not necessarily include all the steps. For example, some steps can be decomposed, and some steps can be combined or partially combined, so the actual execution order may change according to the actual situation. In addition, the terms "first" and "second" below are used only for distinction and should not be regarded as limiting the present disclosure.
FIG. 1 shows a schematic diagram of an exemplary system architecture to which the file clustering method or file clustering apparatus of the embodiments of the present disclosure can be applied.
As shown in FIG. 1, the system architecture 1000 may include one or more of terminal devices 1001, 1002, and 1003, a network 1004, and a server 1005. The network 1004 provides the medium for communication links between the terminal devices 1001, 1002, 1003 and the server 1005. The network 1004 may include various connection types, such as wired or wireless communication links, fiber optic cables, and so on.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks, and servers according to implementation needs. For example, the server 1005 may be a server cluster composed of multiple servers.
The user can use the terminal devices 1001, 1002, 1003 to interact with the server 1005 through the network 1004 to receive or send messages and so on. The terminal devices 1001, 1002, 1003 may be various electronic devices with display screens, including but not limited to smartphones, tablet computers, portable computers, desktop computers, and so on.
In embodiments where the terminal devices 1001, 1002, 1003 execute the file clustering method described below, the terminal devices 1001, 1002, 1003 can obtain at least one picture file and at least one video file from the server 1005 through the network 1004; in this case, the server 1005 may be, for example, a cloud platform such as a cloud photo album. Alternatively, the terminal devices 1001, 1002, 1003 may capture pictures and videos with their own camera modules and obtain at least one picture file and at least one video file from them. Or, some of the picture files and video files acquired by the terminal devices 1001, 1002, 1003 may be captured by their own camera modules while the rest come from the server 1005. The present disclosure does not limit the sources of picture files and video files.
It should be noted that the picture files and video files described in the exemplary embodiments of the present disclosure all contain face images; that is, the present disclosure mainly focuses on face-based clustering of pictures and videos. However, it should be understood that the solution of the present disclosure can also be applied to clustering other photographed subjects, such as animals, vehicles, and buildings, which is not limited in the present disclosure.
Next, for the acquired picture files and video files, the terminal devices 1001, 1002, 1003 may extract their facial features respectively and use the extracted facial features to cluster the acquired picture files and video files, so that picture files and video files showing the same subject are assigned the same cluster ID, which makes them convenient for users to view.
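Once every file has a cluster ID, gathering the picture and video files of the same subject for display could be as simple as the sketch below. It assumes the clustering output is a mapping from a hypothetical `(kind, file_name)` key to an integer cluster ID; `build_albums` and the album layout are illustrative only.

```python
from collections import defaultdict


def build_albums(assignments: dict[tuple[str, str], int]) -> dict[int, dict[str, list[str]]]:
    """Group files by cluster ID so each subject gets one album holding pictures and videos."""
    albums = defaultdict(lambda: {"pictures": [], "videos": []})
    for (kind, name), cluster_id in assignments.items():
        # Files with the same cluster ID (same subject) land in the same album.
        albums[cluster_id]["pictures" if kind == "picture" else "videos"].append(name)
    return dict(albums)
```

For example, `build_albums({("picture", "a.jpg"): 0, ("video", "b.mp4"): 0, ("picture", "c.jpg"): 1})` yields one album per subject, with `a.jpg` and `b.mp4` together under cluster 0.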
In embodiments where the server 1005 executes the file clustering method described below, the server 1005 can obtain the picture files and video files captured by the camera modules of the terminal devices 1001, 1002, 1003, extract their facial features respectively, and cluster the obtained picture files and video files according to their facial features.
The following description takes the terminal devices 1001, 1002, 1003 executing the solution of the present disclosure as an example. In this case, the file clustering apparatus of the exemplary embodiments of the present disclosure may be configured in the terminal devices 1001, 1002, 1003.
FIG. 2 shows a schematic diagram of an electronic device suitable for implementing the exemplary embodiments of the present disclosure; the electronic device corresponds to a terminal device described above, such as a mobile phone. It should be noted that the electronic device shown in FIG. 2 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
The electronic device of the present disclosure includes at least a processor and a memory. The memory is used to store one or more programs which, when executed by the processor, enable the processor to implement the file clustering method of the exemplary embodiments of the present disclosure.
Specifically, as shown in FIG. 2, the electronic device 200 may include: a processor 210, an internal memory 221, an external memory interface 222, a Universal Serial Bus (USB) interface 230, a charging management module 240, a power management module 241, a battery 242, an antenna 1, an antenna 2, a mobile communication module 250, a wireless communication module 260, an audio module 270, a speaker 271, a receiver 272, a microphone 273, an earphone interface 274, a sensor module 280, a display screen 290, a camera module 291, an indicator 292, a motor 293, buttons 294, a Subscriber Identification Module (SIM) card interface 295, etc. The sensor module 280 may include a depth sensor 2801, a pressure sensor 2802, a gyroscope sensor 2803, an air pressure sensor 2804, a magnetic sensor 2805, an acceleration sensor 2806, a distance sensor 2807, a proximity light sensor 2808, a fingerprint sensor 2809, a temperature sensor 2810, a touch sensor 2811, an ambient light sensor 2812, a bone conduction sensor 2813, etc.
It can be understood that the structure illustrated in the embodiments of the present application does not constitute a specific limitation on the electronic device 200. In other embodiments of the present application, the electronic device 200 may include more or fewer components than shown, combine or split certain components, or arrange the components differently. The illustrated components can be implemented in hardware, software, or a combination of software and hardware.
The processor 210 may include one or more processing units. For example, the processor 210 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a Neural-network Processing Unit (NPU), etc. The different processing units may be independent devices or may be integrated into one or more processors. In addition, a memory may be provided in the processor 210 to store instructions and data.
The USB interface 230 is an interface that complies with the USB standard specification, such as a Mini USB interface, a Micro USB interface, or a USB Type-C interface. The USB interface 230 can be used to connect a charger to charge the electronic device 200, and can also be used to transfer data between the electronic device 200 and peripheral devices. It can also be used to connect earphones and play audio through them. The interface can also be used to connect other electronic devices, such as AR devices.
The charging management module 240 is used to receive charging input from a charger, which may be a wireless charger or a wired charger. The power management module 241 is used to connect the battery 242 and the charging management module 240 to the processor 210. The power management module 241 receives input from the battery 242 and/or the charging management module 240 and supplies power to the processor 210, the internal memory 221, the display screen 290, the camera module 291, the wireless communication module 260, and so on.
The wireless communication function of the electronic device 200 can be implemented by the antenna 1, the antenna 2, the mobile communication module 250, the wireless communication module 260, the modem processor, the baseband processor, and so on.
The mobile communication module 250 can provide wireless communication solutions, including 2G/3G/4G/5G, applied on the electronic device 200.
The wireless communication module 260 can provide wireless communication solutions applied on the electronic device 200, including Wireless Local Area Networks (WLAN) (such as Wireless Fidelity (Wi-Fi) networks), Bluetooth (BT), Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), Infrared (IR), etc.
The electronic device 200 implements a display function through the GPU, the display screen 290, the application processor, and so on. The GPU is a microprocessor for image processing and is connected to the display screen 290 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 210 may include one or more GPUs that execute program instructions to generate or change display information.
电子设备200可以通过ISP、摄像模组291、视频编解码器、GPU、显示屏290及应 用处理器等实现拍摄功能。在一些实施例中,电子设备200可以包括1个或N个摄像模组291,N为大于1的正整数,若电子设备200包括N个摄像头,N个摄像头中有一个是主摄像头。The electronic device 200 can realize a shooting function through an ISP, a camera module 291, a video codec, a GPU, a display screen 290, and an application processor. In some embodiments, the electronic device 200 may include 1 or N camera modules 291, and N is a positive integer greater than 1. If the electronic device 200 includes N cameras, one of the N cameras is the main camera.
内部存储器221可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。内部存储器221可以包括存储程序区和存储数据区。外部存储器接口222可以用于连接外部存储卡,例如Micro SD卡,实现扩展电子设备200的存储能力。The internal memory 221 may be used to store computer executable program code, where the executable program code includes instructions. The internal memory 221 may include a storage program area and a storage data area. The external memory interface 222 may be used to connect an external memory card, such as a Micro SD card, so as to expand the storage capacity of the electronic device 200.
电子设备200可以通过音频模块270、扬声器271、受话器272、麦克风273、耳机接口274及应用处理器等实现音频功能。例如音乐播放、录音等。The electronic device 200 can implement audio functions through an audio module 270, a speaker 271, a receiver 272, a microphone 273, a headphone interface 274, an application processor, and the like. For example, music playback, recording, etc.
音频模块270用于将数字音频信息转换成模拟音频信号输出,也用于将模拟音频输入转换为数字音频信号。音频模块270还可以用于对音频信号编码和解码。在一些实施例中,音频模块270可以设置于处理器210中,或将音频模块270的部分功能模块设置于处理器210中。The audio module 270 is used to convert digital audio information into an analog audio signal for output, and is also used to convert an analog audio input into a digital audio signal. The audio module 270 can also be used to encode and decode audio signals. In some embodiments, the audio module 270 may be provided in the processor 210, or part of the functional modules of the audio module 270 may be provided in the processor 210.
扬声器271,也称“喇叭”,用于将音频电信号转换为声音信号。电子设备200可以通过扬声器271收听音乐,或收听免提通话。受话器272,也称“听筒”,用于将音频电信号转换成声音信号。当电子设备200接听电话或语音信息时,可以通过将受话器272靠近人耳接听语音。麦克风273,也称“话筒”,“传声器”,用于将声音信号转换为电信号。当拨打电话或发送语音信息时,用户可以通过人嘴靠近麦克风273发声,将声音信号输入到麦克风273。电子设备200可以设置至少一个麦克风273。耳机接口274用于连接有线耳机。The speaker 271, also called a "speaker", is used to convert audio electrical signals into sound signals. The electronic device 200 can listen to music through the speaker 271, or listen to a hands-free call. The receiver 272, also called "earpiece", is used to convert audio electrical signals into sound signals. When the electronic device 200 answers a call or voice message, it can receive the voice by bringing the receiver 272 close to the human ear. The microphone 273, also called "microphone" or "microphone", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can approach the microphone 273 through the mouth to make a sound, and input the sound signal to the microphone 273. The electronic device 200 may be provided with at least one microphone 273. The earphone interface 274 is used to connect wired earphones.
针对电子设备200包括的传感器,深度传感器2801用于获取景物的深度信息。压力传感器2802用于感受压力信号,可以将压力信号转换成电信号。陀螺仪传感器2803可以用于确定电子设备200的运动姿态。气压传感器2804用于测量气压。磁传感器2805包括霍尔传感器。电子设备200可以利用磁传感器2805检测翻盖皮套的开合。加速度传感器2806可检测电子设备200在各个方向上(一般为三轴)加速度的大小。距离传感器2807用于测量距离。接近光传感器2808可以包括例如发光二极管(LED)和光检测器,例如光电二极管。指纹传感器2809用于采集指纹。温度传感器2810用于检测温度。触摸传感器2811可以将检测到的触摸操作传递给应用处理器,以确定触摸事件类型。可以通过显示屏290提供与触摸操作相关的视觉输出。环境光传感器2812用于感知环境光亮度。骨传导传感器2813可以获取振动信号。Regarding the sensors included in the electronic device 200, the depth sensor 2801 is used to obtain depth information of the scene. The pressure sensor 2802 is used to sense the pressure signal and can convert the pressure signal into an electrical signal. The gyro sensor 2803 may be used to determine the movement posture of the electronic device 200. The air pressure sensor 2804 is used to measure air pressure. The magnetic sensor 2805 includes a Hall sensor. The electronic device 200 can use the magnetic sensor 2805 to detect the opening and closing of the flip holster. The acceleration sensor 2806 can detect the magnitude of the acceleration of the electronic device 200 in various directions (generally three axes). The distance sensor 2807 is used to measure distance. The proximity light sensor 2808 may include, for example, a light emitting diode (LED) and a light detector, such as a photodiode. The fingerprint sensor 2809 is used to collect fingerprints. The temperature sensor 2810 is used to detect temperature. The touch sensor 2811 may pass the detected touch operation to the application processor to determine the type of the touch event. The visual output related to the touch operation can be provided through the display screen 290. The ambient light sensor 2812 is used to sense the brightness of the ambient light. The bone conduction sensor 2813 can acquire vibration signals.
The keys 294 include a power key, volume keys, and so on; they may be mechanical keys or touch keys. The motor 293 can generate vibration alerts, both for incoming calls and for touch feedback. The indicator 292 may be an indicator light used to show the charging status and changes in battery level, or to indicate messages, missed calls, notifications, and so on. The SIM card interface 295 is used to connect a SIM card. The electronic device 200 interacts with the network through the SIM card to implement functions such as calls and data communication.
The present application also provides a computer-readable storage medium. The computer-readable storage medium may be included in the electronic device described in the foregoing embodiments, or it may exist separately without being assembled into the electronic device.
The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by, or in connection with, an instruction execution system, apparatus, or device.
The computer-readable storage medium can send, propagate, or transmit a program for use by, or in connection with, an instruction execution system, apparatus, or device. The program code contained on the computer-readable storage medium can be transmitted over any suitable medium, including but not limited to wireless links, wires, optical cables, RF, or any suitable combination of the above.
The computer-readable storage medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the method described in the following embodiments.
The flowcharts and block diagrams in the accompanying drawings illustrate possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that shown in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block in a block diagram or flowchart, and any combination of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented in software or in hardware, and the described units may also be provided in a processor. The names of these units do not, in some cases, constitute a limitation on the units themselves.
FIG. 3 schematically shows a flowchart of a file clustering method according to an exemplary embodiment of the present disclosure. Referring to FIG. 3, the file clustering method may include the following steps:
S32. Obtain at least one picture file and at least one video file.
In an exemplary embodiment of the present disclosure, for each acquired file, the terminal device can examine the file's data format to determine whether it is a picture file or a video file.
For a picture file, the terminal device can use a face detection algorithm to determine whether the picture contains a human face. If it does, the file is taken as a picture file to be clustered in the present disclosure; if it does not, the picture file can be discarded, where "discarded" only means that it is not used as a picture file to be clustered.
For a video file, the terminal device can use a video-based face detection algorithm to determine whether the video contains a human face. Similarly, if it does, the video file is taken as a video file to be clustered in the present disclosure; if it does not, the video file can be discarded, where "discarded" only means that it is not used as a video file to be clustered.
That is, in a practical scenario containing both pictures and videos, the terminal device may first obtain a candidate picture set containing at least one candidate picture and a candidate video set containing at least one candidate video. Then, on the one hand, face detection is performed on each candidate picture in the candidate picture set, and the candidate pictures containing faces are determined as the picture files on which the following method is performed; on the other hand, face detection is performed on each candidate video in the candidate video set, and the candidate videos containing faces are determined as the video files on which the following method is performed.
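The candidate screening just described can be sketched as follows. This is a minimal illustration, not the disclosed implementation: it assumes the data format is recognised from the file extension (the extension sets are hypothetical), and `picture_has_face` / `video_has_face` are hypothetical stand-ins for whichever face detectors are actually used.

```python
import os
from typing import Callable, Iterable, List, Tuple

# Assumed extension sets; the disclosure only says the "data format" is checked,
# so a real implementation might inspect file headers or MIME types instead.
PICTURE_EXTS = {".jpg", ".jpeg", ".png", ".heic"}
VIDEO_EXTS = {".mp4", ".mov", ".3gp"}


def classify(path: str) -> str:
    """Return "picture", "video" or "other" based on the file extension."""
    ext = os.path.splitext(path)[1].lower()
    if ext in PICTURE_EXTS:
        return "picture"
    if ext in VIDEO_EXTS:
        return "video"
    return "other"


def select_files(
    paths: Iterable[str],
    picture_has_face: Callable[[str], bool],
    video_has_face: Callable[[str], bool],
) -> Tuple[List[str], List[str]]:
    """Split candidates into picture files and video files that contain a face.

    picture_has_face / video_has_face are hypothetical face-detector callbacks.
    """
    pictures: List[str] = []
    videos: List[str] = []
    for path in paths:
        kind = classify(path)
        if kind == "picture" and picture_has_face(path):
            pictures.append(path)
        elif kind == "video" and video_has_face(path):
            videos.append(path)
        # Anything else is "discarded": it is simply not passed on for clustering.
    return pictures, videos
```

Passing the detectors in as callables keeps the screening logic independent of any particular detection model, which matches the disclosure's statement that this step may also be done by a separate front-end module.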
According to some embodiments of the present disclosure, the terminal device can directly obtain at least one picture file containing a face image and at least one video file containing a face image. In that case, determining whether the picture and video files contain face images can be performed by a front-end module before the solution of the present disclosure is executed; apart from determining whether a face image is contained, the present disclosure places no limitation on the functions of that front-end module.
S34. Extract the facial features of each picture file.
The following process is performed for each picture file obtained in step S32:
First, the facial features of all faces contained in the picture file can be extracted. Specifically, some embodiments of the present disclosure may use a convolutional neural network to extract the facial features of all faces in the picture file; the present disclosure does not limit the model structure or the training process of the network. Besides convolutional neural networks, facial feature extraction can also be implemented by geometric-feature-based methods, template matching, wavelet-theory-based methods, hidden Markov model-based methods, support vector machine-based methods, and so on; the present disclosure places no particular limitation on this.
Next, at least one target object is determined, and the facial features related to the at least one target object are extracted from the facial features of all faces as the facial features of the picture file.
Taking a shooting scene as an example, the target object may be an object determined in the shooting scene, and may also be called the target subject. Where the picture file does not correspond to a shooting scene, for example a picture downloaded from the Internet or sent by another user, the target object may be an object designated by the user; the present disclosure does not limit this.
As for determining the target subject among multiple subjects in a shooting scene, in one embodiment the target subject can be determined from the user's selection operation of tapping a subject during camera preview; that is, while the preview is displayed, the object at the position the user taps on the screen is the target subject. In another embodiment, the user can set his or her own criteria for determining the target subject, which may include, but are not limited to: appearing in historical pictures more than a predetermined number of times, being a child shorter than 120 cm, wearing a hat, and so on.
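A minimal sketch of the tap-to-select embodiment and of keeping only the facial features related to the target objects. It assumes each detected face comes with a bounding box in preview coordinates and a feature vector, and that "related" means a cosine similarity above a threshold (0.8 here); the `DetectedFace` type, the threshold, and the similarity test are all illustrative assumptions, not details given in the disclosure.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass
class DetectedFace:
    box: Tuple[float, float, float, float]  # (left, top, right, bottom) in preview pixels
    feature: Sequence[float]                # face feature vector from the extractor


def face_at_tap(faces: List[DetectedFace], x: float, y: float) -> Optional[DetectedFace]:
    """Return the face whose bounding box contains the tap point, if any."""
    for face in faces:
        left, top, right, bottom = face.box
        if left <= x <= right and top <= y <= bottom:
            return face
    return None


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0


def target_features(
    faces: List[DetectedFace], targets: List[Sequence[float]], threshold: float = 0.8
) -> List[Sequence[float]]:
    """Keep only the features that are similar enough to at least one target object."""
    return [f.feature for f in faces if any(cosine(f.feature, t) >= threshold for t in targets)]
```

With this split, the tapped face's feature becomes the reference for the target object, and every face in the picture (or, later, in a key frame image) whose feature does not match any reference is dropped from that file's facial features.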
It should be noted that, according to other embodiments of the present disclosure, the facial feature extraction process can also be used to crop from the picture file a sub-picture that contains only the target object; this sub-picture is then used for subsequent analysis, clustering, and display, and is saved in the album. Clearly, such a sub-picture contains no face images of non-target objects.
If some of the picture files obtained in step S32 contain no faces, such picture files can also be eliminated by the facial feature extraction process of step S34.
S36. Extract the facial features of each video file.
The process of extracting the facial features of a video file is described below with reference to steps S402 to S406 in FIG. 4.
In step S402, at least one key frame image is extracted from the video file.
According to some embodiments of the present disclosure, the terminal device may first evaluate the image quality of each video frame image of the video file to obtain a quality score. Specifically, the quality score can be determined from factors such as saturation and exposure. The image quality of each video frame image can also be evaluated on the basis of the Human Visual System (HVS).
Next, the terminal device can obtain a quality threshold, compare the quality score of each video frame image with it, and determine the video frame images whose quality score exceeds the threshold as key frame images. The quality threshold can be set in advance, and the present disclosure does not limit its value; for example, where image quality scores range from 0 to 10, the threshold may be set to 7.5. The threshold may also be chosen according to the processing capability of the terminal device: the higher the processing capability, the lower the threshold can be set, so that more key frame images are obtained.
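The thresholding step can be illustrated with a small sketch. The disclosure only says that saturation and exposure may feed the quality score; the 0 to 10 scoring formula below (equal weights on mean saturation and on how close mean brightness is to mid-range) is a hypothetical heuristic chosen for illustration, and frames are assumed to be plain lists of RGB pixels.

```python
import colorsys
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # RGB, each channel 0..255


def quality_score(pixels: Sequence[Pixel]) -> float:
    """Hypothetical 0-10 score rewarding saturation and mid-range exposure."""
    if not pixels:
        return 0.0
    sats, vals = [], []
    for r, g, b in pixels:
        _, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        sats.append(s)
        vals.append(v)
    mean_s = sum(sats) / len(sats)
    mean_v = sum(vals) / len(vals)
    exposure = 1.0 - abs(mean_v - 0.5) * 2  # 1.0 at mid brightness, 0.0 at pure black/white
    return 10.0 * (0.5 * mean_s + 0.5 * exposure)


def select_key_frames(frames: Sequence[Sequence[Pixel]], threshold: float = 7.5) -> List[int]:
    """Indices of frames whose quality score is strictly greater than the threshold."""
    return [i for i, frame in enumerate(frames) if quality_score(frame) > threshold]
```

For a black frame, a half-bright pure red frame, and a half-bright grey frame, only the red frame (score close to 10) passes the 7.5 threshold; the grey frame scores about 5 because it has no saturation.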
According to other embodiments of the present disclosure, video frame images can be sampled from the video file at a predetermined time interval, for example every 3 seconds, and used as key frame images.
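Fixed-interval sampling reduces to computing frame indices from the frame rate. A minimal sketch, assuming the frame count and frames-per-second are already known (for example from the container metadata); indices are computed as `round(k * step)` rather than by accumulating the step, so that non-integer frame rates such as 29.97 fps do not drift.

```python
from typing import List


def interval_frame_indices(frame_count: int, fps: float, interval_s: float = 3.0) -> List[int]:
    """Frame indices sampled every `interval_s` seconds, starting at frame 0."""
    if frame_count <= 0 or fps <= 0 or interval_s <= 0:
        return []
    step = interval_s * fps  # frames per sampling interval
    indices: List[int] = []
    k = 0
    while True:
        idx = int(round(k * step))
        if idx >= frame_count:
            break
        indices.append(idx)
        k += 1
    return indices
```

For a 10-second clip at 30 fps (300 frames), this yields frames 0, 90, 180 and 270 as key frame images.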
According to still other embodiments of the present disclosure, the terminal device may extract, by analysis, only one video frame image from the video file as the key frame image representing the entire video file.
In step S404, the facial features of each key frame image are extracted.
Similar to determining the facial features of a picture file as described above, the facial features of all faces contained in the key frame image are first extracted. Specifically, a convolutional neural network can also be used for this, and it may be the same network used above to determine the facial features of picture files. Next, at least one target object is determined, and the facial features related to the at least one target object are extracted from the facial features of all faces as the facial features of the key frame image.
In step S406, the facial features of the video file are determined according to the facial features of each key frame image.
According to some embodiments of the present disclosure, the facial features of every key frame image can be used as the facial features of the video file. That is, when only one key frame image is extracted, all facial features in that image are taken as the facial features of the video file; when two or more key frame images are extracted, all facial features of every key frame image are taken as the facial features of the video file. The number of object categories corresponding to the facial features is not limited here; that is, there is no limit on the number of different faces contained in a key frame image.
According to other embodiments of the present disclosure, at least two key frame images are extracted from the video file. In this case, the key frame images are first sorted by their time points in the video file, that is, in the order in which they appear during playback, to obtain an image sequence. Next, the correlation between adjacent key frame images in the image sequence is determined, and the key frame images whose correlation is below a correlation threshold are removed, yielding a key frame image set. Then, the facial features of the video file are determined according to the facial features of each key frame image in the key frame image set. In one embodiment, the facial features of every key frame image in the key frame image set can be used as the facial features of the video file.
Specifically, the correlation between key frame images can be determined from image quality and target object similarity: the higher the image quality and the more similar the target objects, the higher the correlation. For example, given a key frame image sequence A, B, C, D, E in which image B is blurry and the target object in image E is less similar to the target object in the other images, images B and E can be removed from the sequence. Image quality and similarity can also be combined by weighting to determine the correlation.
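One way to realise the weighted combination of image quality and target-object similarity is sketched below. The disclosure does not give a formula, so this is an assumption: each key frame's correlation is taken as `weight * quality + (1 - weight) * mean cosine similarity of its target-object feature to its immediate neighbours` in the time-sorted sequence, and frames below the threshold are dropped. The `KeyFrame` type, the weight of 0.5, and the threshold of 0.65 are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class KeyFrame:
    name: str
    timestamp: float                 # seconds into the video
    quality: float                   # quality score normalised to [0, 1]
    target_feature: Sequence[float]  # feature of the target object in this frame


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0


def filter_key_frames(frames: List[KeyFrame], weight: float = 0.5, threshold: float = 0.65) -> List[KeyFrame]:
    """Sort by playback time, score each frame, and keep frames scoring >= threshold."""
    seq = sorted(frames, key=lambda f: f.timestamp)  # the image sequence
    kept: List[KeyFrame] = []
    for i, frame in enumerate(seq):
        neighbours = [seq[j] for j in (i - 1, i + 1) if 0 <= j < len(seq)]
        if neighbours:
            sim = sum(cosine(frame.target_feature, n.target_feature) for n in neighbours) / len(neighbours)
        else:
            sim = 1.0  # a lone key frame has no neighbour to disagree with
        correlation = weight * frame.quality + (1 - weight) * sim
        if correlation >= threshold:
            kept.append(frame)
    return kept
```

Applied to the A to E example (B blurry with quality 0.2, E showing a dissimilar target), this scoring removes B and E and keeps A, C and D, regardless of the order in which the frames are supplied.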
In another embodiment, the facial features of the video file are determined from the facial features of the key frame images in the key frame image set as follows. First, the facial features of each key frame image in the set are clustered to obtain a facial feature set for at least one object category, where different faces correspond to different object categories. Next, a score is determined for each facial feature in each object category's facial feature set; for each object category, the facial feature with the highest score is selected as the facial feature corresponding to that category and is determined as a facial feature of the video file.
The face scores can be taken from the feature scoring results of the convolutional neural network described above; alternatively, a dedicated face scoring model can be built to score different facial features. The present disclosure does not limit this.
For example, suppose 10 key frame images of a video file remain after the weakly correlated images are removed, and each contains three objects a, b, and c. Clustering by object divides the facial features into three categories. Face scores can then be determined by analysis, and the highest-scoring facial feature in each cluster is taken as a facial feature of the video file.
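The "cluster, then keep the best-scoring face per object" step can be sketched as follows. The disclosure does not fix the clustering algorithm; this sketch swaps in a simple greedy grouping (a feature joins the first group whose first member has cosine similarity at least 0.8, otherwise it starts a new group), and it assumes each facial feature already carries a score, for example from the network mentioned above. The grouping rule and threshold are assumptions.

```python
import math
from typing import List, Sequence, Tuple

ScoredFeature = Tuple[Sequence[float], float]  # (facial feature, face score)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0


def group_faces(faces: List[ScoredFeature], sim_threshold: float = 0.8) -> List[List[ScoredFeature]]:
    """Greedy grouping: one group per object category (i.e. per distinct face)."""
    groups: List[List[ScoredFeature]] = []
    for feature, score in faces:
        for group in groups:
            if cosine(feature, group[0][0]) >= sim_threshold:  # compare with the group's first member
                group.append((feature, score))
                break
        else:
            groups.append([(feature, score)])
    return groups


def best_face_per_object(faces: List[ScoredFeature], sim_threshold: float = 0.8) -> List[Sequence[float]]:
    """Highest-scoring facial feature of each group, used as the video's facial features."""
    return [max(group, key=lambda fs: fs[1])[0] for group in group_faces(faces, sim_threshold)]
```

In the a/b/c example above, the facial features from all remaining key frames fall into three groups, and the video file ends up with exactly three facial features, one per person.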
According to still other embodiments of the present disclosure, after the facial features of each key frame image are determined, these facial features are clustered to distinguish different subjects. Then, for each subject, the facial feature with the highest face score in the clustering result is determined as a facial feature of the video file.
It is understandable that if some of the video files obtained in step S32 contain no faces, such video files can also be eliminated by the facial feature extraction process of step S36.
In addition, the order of steps S34 and S36 in the exemplary embodiment of the present disclosure can be interchanged.
S38. Cluster the at least one picture file and the at least one video file according to the facial features of each picture file and the facial features of each video file.
In an exemplary embodiment of the present disclosure, the at least one picture file and the at least one video file obtained in step S32 can be clustered by subject using the facial features determined in steps S34 and S36. Specifically, a machine learning algorithm such as K-means (the K-means clustering algorithm) may be used to implement the clustering; the present disclosure does not limit this.
After clustering, different clusters correspond to different subjects. That is, the picture files and video files are partitioned by cluster ID, with a one-to-one correspondence between subjects and cluster IDs.
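A minimal pure-Python K-means sketch of step S38. Assumptions not stated in the disclosure: the number of subjects `k` is known in advance, initial centroids are chosen by deterministic farthest-point selection (k-means++ is a common alternative), each facial feature is clustered individually, and a file receives every cluster ID that any of its features falls into, so a video showing two people appears under both IDs.

```python
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = Sequence[float]


def _dist2(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: List[Vector], k: int, iters: int = 50) -> List[int]:
    """Return a cluster label (0..k-1) for each point."""
    if not points:
        return []
    k = max(1, min(k, len(points)))
    # Deterministic farthest-point initialisation.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(_dist2(p, c) for c in centroids))
        centroids.append(list(far))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        labels = [min(range(k), key=lambda c: _dist2(p, centroids[c])) for p in points]
        # Update step: mean of each cluster (keep the old centroid if a cluster empties).
        new = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            new.append([sum(col) / len(members) for col in zip(*members)] if members else centroids[c])
        if new == centroids:
            break
        centroids = new
    return labels


def cluster_files(file_features: Dict[str, List[Vector]], k: int) -> Dict[int, List[str]]:
    """Map each cluster ID to the picture and video files containing that subject."""
    owners, points = [], []
    for name, feats in file_features.items():
        for feat in feats:
            owners.append(name)
            points.append(feat)
    clusters: Dict[int, List[str]] = defaultdict(list)
    for name, label in zip(owners, kmeans(points, k)):
        if name not in clusters[label]:
            clusters[label].append(name)
    return dict(clusters)
```

Because pictures and videos are both reduced to facial feature vectors before this step, the clustering itself does not need to know which files are pictures and which are videos; that is what allows the mixed picture/video clustering described here.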
In addition, for a target object, a sub-picture containing only that target object can be extracted from a picture file, and the same cluster ID is assigned to the sub-picture and to the corresponding video file containing the target object. Alternatively, a video segment containing the target object can be extracted from the video file, and the same cluster ID is assigned to that video segment and to the sub-picture.
The present disclosure also provides a scheme for editing the clustering results.
First, the terminal device can display the clustering results, for example as separate groups in the album. Next, in response to the user's editing operations on the clustering results, the terminal device edits the results and saves the edited version. The editing operations may include, but are not limited to: renaming an album, deleting one or more picture files, deleting one or more video files, adding annotations, resizing, and so on.
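The editable result can be modelled as one album per cluster ID. This is a hypothetical data model for illustration only: the `ClusterAlbum` type and the operation names are invented here, only renaming, deletion, and annotation are shown, and persistence (saving, cloud backup) and resizing are left out.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ClusterAlbum:
    cluster_id: int
    name: str
    pictures: List[str] = field(default_factory=list)
    videos: List[str] = field(default_factory=list)
    notes: List[str] = field(default_factory=list)


def apply_edit(album: ClusterAlbum, op: str, arg: str) -> ClusterAlbum:
    """Apply one user edit to the album in place; op names are hypothetical."""
    if op == "rename":
        album.name = arg
    elif op == "delete_picture":
        album.pictures.remove(arg)
    elif op == "delete_video":
        album.videos.remove(arg)
    elif op == "annotate":
        album.notes.append(arg)
    else:
        raise ValueError(f"unsupported edit operation: {op}")
    return album
```

Keeping the cluster ID on the album means a renamed album still maps to the same subject, so newly clustered files can continue to be added to it.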
In addition, the edited clustering results can be uploaded to the cloud for backup.
The overall file clustering process of an exemplary embodiment of the present disclosure is described below with reference to FIG. 5.
In step S512, the terminal device can obtain at least one picture file; in step S514, it can extract the facial features of each picture file; in step S516, it performs feature filtering on the facial features to remove face information in the images that the user is not interested in.
In step S522, the terminal device can obtain at least one video file; in step S524, it can extract the key frame images of each video file; in step S526, it extracts facial features from the key frame images; in step S528, it can perform feature denoising, that is, remove from the key frame images the face information the user is not interested in, as well as weakly correlated key frame images. It also performs individual clustering, that is, clustering by subject to determine a better-quality facial feature for each subject as the facial features of the video file.
In step S530, the picture files and video files are clustered using the facial features, and the same cluster ID is assigned to the same object.
It should be noted that although the steps of the method of the present disclosure are described in a specific order in the drawings, this does not require or imply that the steps must be performed in that order, or that all the steps shown must be performed to achieve the desired result. Additionally or alternatively, some steps may be omitted, multiple steps may be combined into one, and/or one step may be split into several.
Further, this exemplary embodiment also provides a file clustering apparatus.
FIG. 6 schematically shows a block diagram of a file clustering apparatus according to an exemplary embodiment of the present disclosure. Referring to FIG. 6, the file clustering apparatus 6 according to an exemplary embodiment of the present disclosure may include a file acquisition module 61, a first feature extraction module 63, a second feature extraction module 65, and a file clustering module 67.
Specifically, the file acquisition module 61 can be used to obtain at least one picture file and at least one video file; the first feature extraction module 63 can be used to extract the facial features of each picture file; the second feature extraction module 65 can be used to extract the facial features of each video file; and the file clustering module 67 can be used to cluster the at least one picture file and the at least one video file according to the facial features of each picture file and the facial features of each video file.
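The division into modules 61, 63, 65, and 67 can be mirrored in software by composing four pluggable callables. A minimal sketch: the class name and the callable signatures are hypothetical, and each callable could be backed by routines like those sketched for steps S32 to S38.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Feature = Sequence[float]


class FileClusteringApparatus:
    """Wires the four modules of FIG. 6 together as plain callables (hypothetical interfaces)."""

    def __init__(
        self,
        acquire: Callable[[], Tuple[List[str], List[str]]],                   # module 61
        extract_picture: Callable[[str], List[Feature]],                       # module 63
        extract_video: Callable[[str], List[Feature]],                         # module 65
        cluster: Callable[[Dict[str, List[Feature]]], Dict[int, List[str]]],  # module 67
    ) -> None:
        self.acquire = acquire
        self.extract_picture = extract_picture
        self.extract_video = extract_video
        self.cluster = cluster

    def run(self) -> Dict[int, List[str]]:
        """Acquire files, extract facial features per file, then cluster all files together."""
        pictures, videos = self.acquire()
        features: Dict[str, List[Feature]] = {}
        for path in pictures:
            features[path] = self.extract_picture(path)
        for path in videos:
            features[path] = self.extract_video(path)
        return self.cluster(features)
```

Injecting the modules this way keeps each one replaceable, for instance swapping the video feature extractor between the single-key-frame, fixed-interval, and quality-threshold embodiments without touching the rest of the pipeline.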
The file clustering apparatus of the exemplary embodiments of the present disclosure clusters pictures and videos together, classifying both by subject. This helps users quickly find the pictures and videos containing the same subject and then view, share, delete them, and so on.
According to an exemplary embodiment of the present disclosure, the first feature extraction module 63 may be configured to: extract the facial features of all faces contained in a picture file; and determine at least one target object and extract, from the facial features of all faces, the facial features related to the at least one target object as the facial features of the picture file.
According to an exemplary embodiment of the present disclosure, the second feature extraction module 65 may be configured to: extract at least one key frame image from a video file; extract the facial features of each key frame image; and determine the facial features of the video file according to the facial features of each key frame image.
According to an exemplary embodiment of the present disclosure, to extract at least one key frame image from a video file, the second feature extraction module 65 may be configured to: evaluate the image quality of each video frame image of the video file to obtain a quality score; obtain a quality threshold and compare the quality score of each video frame image with it; and determine the video frame images whose quality score is greater than the quality threshold as key frame images.
According to an exemplary embodiment of the present disclosure, to extract the facial features of each key frame image, the second feature extraction module 65 may be configured to: extract the facial features of all faces contained in the key frame image; and determine at least one target object and extract, from the facial features of all faces, the facial features related to the at least one target object as the facial features of the key frame image.
According to an exemplary embodiment of the present disclosure, when a video file has two or more key frame images, the second feature extraction module 65, in determining the facial features of the video file from the facial features of each key frame image, may be configured to: sort the key frame images by their time points in the video file to obtain an image sequence; determine the correlation between adjacent key frame images in the image sequence; remove, from the two or more key frame images, those whose correlation is below a correlation threshold to obtain a key frame image set; and determine the facial features of the video file according to the facial features of each key frame image in the key frame image set.
According to an exemplary embodiment of the present disclosure, to determine the facial features of the video file from the facial features of the key frame images in the key frame image set, the second feature extraction module 65 may be configured to: cluster the facial features of each key frame image in the key frame image set to obtain a facial feature set for at least one object category; determine a score for each facial feature in each object category's facial feature set; and select the highest-scoring facial feature as the facial feature corresponding to that object category, which is then determined as a facial feature of the video file.
According to an exemplary embodiment of the present disclosure, the file obtaining module 61 may be configured to: obtain a candidate picture set and a candidate video set; perform face detection on each candidate picture in the candidate picture set, and determine the candidate pictures containing a face as picture files; and perform face detection on each candidate video in the candidate video set, and determine the candidate videos containing a face as video files.
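The face detector is not specified, nor is how face detection is applied to a whole video. The following is a minimal sketch, assuming the detector is passed in as a `has_face` callable and that a video (represented as a sequence of sampled frames) is kept if any sampled frame contains a face. `select_face_files` is a hypothetical name, and the string-based detector in the example is a toy stand-in.

```python
from typing import Callable, Iterable, List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def select_face_files(
    candidate_pictures: Iterable[Frame],
    candidate_videos: Iterable[Sequence[Frame]],
    has_face: Callable[[Frame], bool],
) -> Tuple[List[Frame], List[Sequence[Frame]]]:
    """Keep only the candidate pictures and videos that contain a face.

    `has_face` stands in for any face detector. A video is represented as a
    sequence of sampled frames and is kept if any frame contains a face.
    """
    picture_files = [pic for pic in candidate_pictures if has_face(pic)]
    video_files = [
        video for video in candidate_videos
        if any(has_face(frame) for frame in video)
    ]
    return picture_files, video_files


# Toy detector for illustration: a "frame" is just a label string.
pictures, videos = select_face_files(
    ["face.jpg", "landscape.jpg"],
    [["sky", "face_close_up"], ["sea", "sand"]],
    has_face=lambda frame: "face" in frame,
)
print(pictures, videos)  # prints ['face.jpg'] [['sky', 'face_close_up']]
```

Because `any` stops at the first positive frame, a video is not scanned further once a face has been found.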
According to an exemplary embodiment of the present disclosure, referring to FIG. 7, in addition to the modules of the file clustering device 6, the file clustering device 7 may further include a result editing module 71.
Specifically, the result editing module 71 may be configured to: display the clustering result, in which each cluster corresponds to a different face object category; and, in response to an editing operation on the clustering result, edit and save the clustering result.
Since the functional modules of the file clustering device in the embodiments of the present disclosure are the same as those in the method embodiments described above, they are not described again here.
From the description of the above embodiments, those skilled in the art will readily understand that the example embodiments described here may be implemented in software, or in software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied as a software product. The software product may be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, or removable hard disk) or on a network, and includes instructions that cause a computing device (such as a personal computer, server, terminal device, or network device) to execute the method according to the embodiments of the present disclosure.
In addition, the above drawings are merely schematic illustrations of the processing included in the method according to the exemplary embodiments of the present disclosure and are not intended to be limiting. It is readily understood that the processing shown in the drawings does not indicate or limit the chronological order of these processes. It is also readily understood that these processes may be executed, for example, synchronously or asynchronously in multiple modules.
It should be noted that although several modules or units of the device for performing actions are mentioned in the detailed description above, this division is not mandatory. In fact, according to the embodiments of the present disclosure, the features and functions of two or more modules or units described above may be embodied in a single module or unit. Conversely, the features and functions of one module or unit described above may be further divided and embodied by multiple modules or units.
Other embodiments of the present disclosure will readily occur to those skilled in the art after considering the specification and practicing the content disclosed herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common knowledge or customary technical means in the art that are not disclosed herein. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of the present disclosure indicated by the claims.
It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes can be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (20)

  1. A file clustering method, comprising:
    obtaining at least one picture file and at least one video file;
    extracting facial features of each of the picture files;
    extracting facial features of each of the video files; and
    clustering the at least one picture file and the at least one video file according to the facial features of each of the picture files and the facial features of each of the video files.
  2. The file clustering method according to claim 1, wherein extracting the facial features of each of the picture files comprises:
    extracting facial features of all faces contained in the picture file; and
    determining at least one target object, and extracting, from the facial features of all the faces, facial features related to the at least one target object as the facial features of the picture file.
  3. The file clustering method according to claim 1, wherein extracting the facial features of each of the video files comprises:
    extracting at least one key frame image from the video file;
    extracting facial features of each of the key frame images; and
    determining the facial features of the video file according to the facial features of each of the key frame images.
  4. The file clustering method according to claim 3, wherein extracting at least one key frame image from the video file comprises:
    performing image quality evaluation on each video frame image of the video file to obtain a quality score;
    obtaining a quality threshold, and comparing the quality score of each video frame image with the quality threshold; and
    determining a video frame image whose quality score is greater than the quality threshold as the key frame image.
  5. The file clustering method according to claim 3, wherein extracting the facial features of each of the key frame images comprises:
    extracting facial features of all faces contained in the key frame image; and
    determining at least one target object, and extracting, from the facial features of all the faces, facial features related to the at least one target object as the facial features of the key frame image.
  6. The file clustering method according to claim 5, wherein two or more key frame images are extracted from the video file, and determining the facial features of the video file according to the facial features of each of the key frame images comprises:
    sorting the key frame images according to their time points in the video file to obtain an image sequence;
    determining a correlation between adjacent key frame images in the image sequence;
    removing, from the two or more key frame images, key frame images whose correlation is less than a correlation threshold, to obtain a key frame image set; and
    determining the facial features of the video file according to the facial features of each key frame image in the key frame image set.
  7. The file clustering method according to claim 6, wherein determining the facial features of the video file according to the facial features of each key frame image in the key frame image set comprises:
    clustering the facial features of each key frame image in the key frame image set to obtain a facial feature set of at least one object category;
    determining a score for each facial feature in the facial feature set of each object category;
    selecting the facial feature with the highest score as the facial feature corresponding to the object category; and
    determining the facial feature corresponding to the object category as a facial feature of the video file.
  8. The file clustering method according to claim 1, further comprising:
    obtaining a candidate picture set and a candidate video set;
    performing face detection on each candidate picture in the candidate picture set, and determining candidate pictures containing a face as the picture files; and
    performing face detection on each candidate video in the candidate video set, and determining candidate videos containing a face as the video files.
  9. The file clustering method according to any one of claims 1 to 8, wherein, after the at least one picture file and the at least one video file are clustered, the file clustering method further comprises:
    displaying a clustering result, wherein each cluster in the clustering result corresponds to a different face object category; and
    in response to an editing operation on the clustering result, editing and saving the clustering result.
  10. A file clustering device, comprising:
    a file obtaining module, configured to obtain at least one picture file and at least one video file;
    a first feature extraction module, configured to extract facial features of each of the picture files;
    a second feature extraction module, configured to extract facial features of each of the video files; and
    a file clustering module, configured to cluster the at least one picture file and the at least one video file according to the facial features of each of the picture files and the facial features of each of the video files.
  11. The file clustering device according to claim 10, wherein the first feature extraction module is configured to extract facial features of all faces contained in the picture file, determine at least one target object, and extract, from the facial features of all the faces, facial features related to the at least one target object as the facial features of the picture file.
  12. The file clustering device according to claim 10, wherein the second feature extraction module is configured to extract at least one key frame image from the video file, extract facial features of each of the key frame images, and determine the facial features of the video file according to the facial features of each of the key frame images.
  13. The file clustering device according to claim 12, wherein, to extract at least one key frame image from the video file, the second feature extraction module is configured to: perform image quality evaluation on each video frame image of the video file to obtain a quality score; obtain a quality threshold; compare the quality score of each video frame image with the quality threshold; and determine a video frame image whose quality score is greater than the quality threshold as the key frame image.
  14. The file clustering device according to claim 12, wherein, to extract the facial features of each of the key frame images, the second feature extraction module is configured to: extract facial features of all faces contained in the key frame image; determine at least one target object; and extract, from the facial features of all the faces, facial features related to the at least one target object as the facial features of the key frame image.
  15. The file clustering device according to claim 14, wherein, when two or more key frame images are extracted from the video file, to determine the facial features of the video file according to the facial features of each of the key frame images, the second feature extraction module is configured to: sort the key frame images according to their time points in the video file to obtain an image sequence; determine a correlation between adjacent key frame images in the image sequence; remove, from the two or more key frame images, key frame images whose correlation is less than a correlation threshold, to obtain a key frame image set; and determine the facial features of the video file according to the facial features of each key frame image in the key frame image set.
  16. The file clustering device according to claim 15, wherein, to determine the facial features of the video file according to the facial features of each key frame image in the key frame image set, the second feature extraction module is configured to: cluster the facial features of each key frame image in the key frame image set to obtain a facial feature set of at least one object category; determine a score for each facial feature in the facial feature set of each object category; select the facial feature with the highest score as the facial feature corresponding to the object category; and determine the facial feature corresponding to the object category as a facial feature of the video file.
  17. The file clustering device according to claim 10, wherein the file obtaining module is configured to: obtain a candidate picture set and a candidate video set; perform face detection on each candidate picture in the candidate picture set, and determine candidate pictures containing a face as the picture files; and perform face detection on each candidate video in the candidate video set, and determine candidate videos containing a face as the video files.
  18. The file clustering device according to any one of claims 10 to 17, further comprising:
    a result editing module, configured to display a clustering result, wherein each cluster in the clustering result corresponds to a different face object category, and, in response to an editing operation on the clustering result, to edit and save the clustering result.
  19. A computer-readable storage medium storing a computer program which, when executed by a processor, implements the file clustering method according to any one of claims 1 to 9.
  20. An electronic device, comprising:
    a processor; and
    a memory, configured to store one or more programs which, when executed by the processor, cause the processor to implement the file clustering method according to any one of claims 1 to 9.
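Claim 1 leaves the final clustering algorithm open. The following is a minimal sketch of that step, assuming each picture or video file has already been reduced to a list of face feature vectors (for a video, for instance, the per-category features of claim 7) and using a simple greedy threshold clustering as a stand-in for whatever method an implementation actually uses. A file showing several people can land in several clusters. `cosine`, `cluster_files`, and the 0.8 threshold are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two feature vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_files(
    files: Dict[str, List[List[float]]], threshold: float = 0.8
) -> List[List[str]]:
    """Group picture and video files by the people whose faces they contain."""
    seeds: List[List[float]] = []   # first feature seen for each person cluster
    members: List[List[str]] = []   # file names per cluster, in insertion order
    for name, features in files.items():
        for feature in features:
            for i, seed in enumerate(seeds):
                if cosine(feature, seed) >= threshold:
                    if name not in members[i]:
                        members[i].append(name)
                    break
            else:
                seeds.append(feature)
                members.append([name])
    return members


# Pictures and videos are treated uniformly once reduced to face features.
files = {
    "a.jpg": [[1.0, 0.0]],
    "b.mp4": [[0.95, 0.05], [0.0, 1.0]],  # video showing two people
    "c.jpg": [[0.1, 1.0]],
}
print(cluster_files(files))  # prints [['a.jpg', 'b.mp4'], ['b.mp4', 'c.jpg']]
```

Each resulting cluster corresponds to one face object category, which is the form of result that claim 9 describes displaying and editing.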
PCT/CN2020/136176 2019-12-27 2020-12-14 File clustering method and apparatus, and storage medium and electronic device WO2021129444A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911382475.0A CN111177086A (en) 2019-12-27 2019-12-27 File clustering method and device, storage medium and electronic equipment
CN201911382475.0 2019-12-27

Publications (1)

Publication Number Publication Date
WO2021129444A1 true WO2021129444A1 (en) 2021-07-01

Family

ID=70623970

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/136176 WO2021129444A1 (en) 2019-12-27 2020-12-14 File clustering method and apparatus, and storage medium and electronic device

Country Status (2)

Country Link
CN (1) CN111177086A (en)
WO (1) WO2021129444A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111177086A (en) * 2019-12-27 2020-05-19 Oppo广东移动通信有限公司 File clustering method and device, storage medium and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020165964A1 (en) * 2001-04-19 2002-11-07 International Business Machines Corporation Method and apparatus for providing a single system image in a clustered environment
CN105426515A (en) * 2015-12-01 2016-03-23 小米科技有限责任公司 Video classification method and apparatus
CN110175549A (en) * 2019-05-20 2019-08-27 腾讯科技(深圳)有限公司 Face image processing process, device, equipment and storage medium
CN111177086A (en) * 2019-12-27 2020-05-19 Oppo广东移动通信有限公司 File clustering method and device, storage medium and electronic equipment

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5149570B2 (en) * 2006-10-16 2013-02-20 キヤノン株式会社 File management apparatus, file management apparatus control method, and program
CN105631408B (en) * 2015-12-21 2019-12-27 小米科技有限责任公司 Face photo album processing method and device based on video
CN110347876A (en) * 2019-07-12 2019-10-18 Oppo广东移动通信有限公司 Video classification methods, device, terminal device and computer readable storage medium

Also Published As

Publication number Publication date
CN111177086A (en) 2020-05-19

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 20907967; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
Ref country code: DE
122 Ep: PCT application non-entry in European phase
Ref document number: 20907967; Country of ref document: EP; Kind code of ref document: A1