WO2021129444A1 - File clustering method and apparatus, storage medium, and electronic device - Google Patents

File clustering method and apparatus, storage medium, and electronic device

Info

Publication number
WO2021129444A1
Authority
WO
WIPO (PCT)
Prior art keywords
file
key frame
facial features
video
frame image
Prior art date
2019-12-27
Application number
PCT/CN2020/136176
Other languages
English (en)
Chinese (zh)
Inventor
彭冬炜
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-12-27
Filing date
2020-12-14
Publication date
Application filed by Oppo广东移动通信有限公司
Publication of WO2021129444A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/10: File systems; File servers
    • G06F 16/16: File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V 40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V 40/168: Feature extraction; Face representation

Definitions

  • the present disclosure relates to the field of terminal technology, and in particular, to a file clustering method, a file clustering device, a computer-readable storage medium, and an electronic device.
  • a file clustering method, including: obtaining at least one picture file and at least one video file; extracting the facial features of each picture file; extracting the facial features of each video file; and clustering the at least one picture file and the at least one video file according to the facial features of each picture file and the facial features of each video file.
  • a file clustering device, including: a file acquisition module for obtaining at least one picture file and at least one video file; a first feature extraction module for extracting the facial features of each picture file; a second feature extraction module for extracting the facial features of each video file; and a file clustering module for clustering the at least one picture file and the at least one video file according to the facial features of each picture file and the facial features of each video file.
  • a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the above-mentioned file clustering method is implemented.
  • an electronic device, including a processor and a memory configured to store one or more programs which, when executed by the processor, cause the processor to implement the above-mentioned file clustering method.
  • FIG. 1 shows a schematic diagram of an exemplary system architecture to which the file clustering method or file clustering apparatus of an embodiment of the present disclosure can be applied;
  • FIG. 2 shows a schematic structural diagram of an electronic device suitable for implementing embodiments of the present disclosure;
  • FIG. 3 schematically shows a flowchart of a file clustering method according to an exemplary embodiment of the present disclosure;
  • FIG. 4 schematically shows a flowchart of extracting facial features of a video file according to an exemplary embodiment of the present disclosure;
  • FIG. 5 schematically shows a flowchart of the entire process of file clustering according to an exemplary embodiment of the present disclosure;
  • FIG. 6 schematically shows a block diagram of a file clustering apparatus according to an exemplary embodiment of the present disclosure;
  • FIG. 7 schematically shows a block diagram of a file clustering apparatus according to another exemplary embodiment of the present disclosure.
  • FIG. 1 shows a schematic diagram of an exemplary system architecture of a file clustering method or a file clustering device to which an embodiment of the present disclosure can be applied.
  • the system architecture 1000 may include one or more of terminal devices 1001, 1002, 1003, a network 1004 and a server 1005.
  • the network 1004 is used to provide a medium for communication links between the terminal devices 1001, 1002, 1003 and the server 1005.
  • the network 1004 may include various connection types, such as wired, wireless communication links, or fiber optic cables, and so on.
  • the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. According to implementation needs, there can be any number of terminal devices, networks, and servers.
  • the server 1005 may be a server cluster composed of multiple servers.
  • the user can use the terminal devices 1001, 1002, 1003 to interact with the server 1005 through the network 1004 to receive or send messages and so on.
  • the terminal devices 1001, 1002, 1003 may be various electronic devices with display screens, including but not limited to smart phones, tablet computers, portable computers, desktop computers, and so on.
  • the terminal devices 1001, 1002, 1003 can obtain at least one picture file and at least one video file from the server 1005 through the network 1004; in this case, the server 1005 may be, for example, a cloud platform such as a cloud photo album.
  • the terminal devices 1001, 1002, 1003 may also take pictures and videos through their own camera modules, thereby obtaining at least one picture file and at least one video file.
  • some of the picture files and video files acquired by the terminal devices 1001, 1002, 1003 may be captured by their own camera modules, while the rest come from the server 1005. This disclosure does not limit the sources of picture files and video files.
  • the picture files and video files described in the exemplary embodiments of the present disclosure both contain human face images, that is, the present disclosure mainly focuses on the clustering of pictures and videos based on human faces.
  • the solution of the present disclosure can also be applied to clustering of other photographed objects, and these other photographed objects may include, for example, animals, vehicles, buildings, etc., which is not limited in the present disclosure.
  • the terminal devices 1001, 1002, 1003 may respectively extract facial features and use the extracted facial features to cluster the acquired picture files and video files. This allows picture files and video files for the same subject to be assigned the same cluster ID, which is convenient for users to view.
  • the server 1005 can obtain the picture files and video files taken by the camera modules of the terminal devices 1001, 1002, 1003, extract their facial features, and cluster the obtained picture files and video files according to those facial features.
  • the file clustering apparatus of the exemplary embodiment of the present disclosure may be configured in the terminal devices 1001, 1002, 1003.
  • FIG. 2 shows a schematic diagram of an electronic device suitable for implementing the exemplary embodiments of the present disclosure, and the electronic device corresponds to the above terminal device such as a mobile phone. It should be noted that the electronic device shown in FIG. 2 is only an example, and should not bring any limitation to the function and scope of use of the embodiments of the present disclosure.
  • the electronic device of the present disclosure includes at least a processor and a memory. The memory is used to store one or more programs; when the one or more programs are executed by the processor, the processor can implement the file clustering method of the exemplary embodiments of the present disclosure.
  • the electronic device 200 may include: a processor 210, an internal memory 221, an external memory interface 222, a universal serial bus (USB) interface 230, a charging management module 240, a power management module 241, a battery 242, an antenna 1, an antenna 2, a mobile communication module 250, a wireless communication module 260, an audio module 270, a speaker 271, a receiver 272, a microphone 273, an earphone interface 274, a sensor module 280, a display screen 290, a camera module 291, an indicator 292, a motor 293, a button 294, a subscriber identification module (SIM) card interface 295, etc.
  • the sensor module 280 may include a depth sensor 2801, a pressure sensor 2802, a gyroscope sensor 2803, an air pressure sensor 2804, a magnetic sensor 2805, an acceleration sensor 2806, a distance sensor 2807, a proximity light sensor 2808, a fingerprint sensor 2809, a temperature sensor 2810, and a touch sensor. 2811, ambient light sensor 2812, bone conduction sensor 2813, etc.
  • the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 200.
  • the electronic device 200 may include more or fewer components than shown, or combine certain components, or split certain components, or arrange different components.
  • the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • the processor 210 may include one or more processing units.
  • the processor 210 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • the different processing units may be independent devices or integrated in one or more processors.
  • a memory may be provided in the processor 210 to store instructions and data.
  • the USB interface 230 is an interface that complies with the USB standard specification, and specifically may be a Mini USB interface, a Micro USB interface, a USB Type-C interface, and so on.
  • the USB interface 230 can be used to connect a charger to charge the electronic device 200, and can also be used to transfer data between the electronic device 200 and peripheral devices. It can also be used to connect earphones and play audio through earphones. This interface can also be used to connect other electronic devices, such as AR devices.
  • the charging management module 240 is used to receive charging input from the charger.
  • the charger can be a wireless charger or a wired charger.
  • the power management module 241 is used to connect the battery 242, the charging management module 240, and the processor 210.
  • the power management module 241 receives input from the battery 242 and/or the charging management module 240, and supplies power to the processor 210, the internal memory 221, the display screen 290, the camera module 291, and the wireless communication module 260.
  • the wireless communication function of the electronic device 200 can be implemented by the antenna 1, the antenna 2, the mobile communication module 250, the wireless communication module 260, the modem processor, and the baseband processor.
  • the mobile communication module 250 can provide wireless communication solutions, including 2G/3G/4G/5G, applied to the electronic device 200.
  • the wireless communication module 260 can provide wireless communication solutions applied to the electronic device 200, including wireless local area network (WLAN) (such as Wireless Fidelity (Wi-Fi)) networks, Bluetooth (BT), Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), Infrared (IR), and the like.
  • the electronic device 200 implements a display function through a GPU, a display screen 290, an application processor, and the like.
  • the GPU is a microprocessor for image processing and is connected to the display screen 290 and the application processor.
  • the GPU is used to perform mathematical and geometric calculations and is used for graphics rendering.
  • the processor 210 may include one or more GPUs that execute program instructions to generate or change display information.
  • the electronic device 200 can realize a shooting function through an ISP, a camera module 291, a video codec, a GPU, a display screen 290, and an application processor.
  • the electronic device 200 may include 1 or N camera modules 291, and N is a positive integer greater than 1. If the electronic device 200 includes N cameras, one of the N cameras is the main camera.
  • the internal memory 221 may be used to store computer executable program code, where the executable program code includes instructions.
  • the internal memory 221 may include a storage program area and a storage data area.
  • the external memory interface 222 may be used to connect an external memory card, such as a Micro SD card, so as to expand the storage capacity of the electronic device 200.
  • the electronic device 200 can implement audio functions through an audio module 270, a speaker 271, a receiver 272, a microphone 273, a headphone interface 274, an application processor, and the like. For example, music playback, recording, etc.
  • the audio module 270 is used to convert digital audio information into an analog audio signal for output, and is also used to convert an analog audio input into a digital audio signal.
  • the audio module 270 can also be used to encode and decode audio signals.
  • the audio module 270 may be provided in the processor 210, or part of the functional modules of the audio module 270 may be provided in the processor 210.
  • the speaker 271, also called a "loudspeaker", is used to convert audio electrical signals into sound signals.
  • the electronic device 200 can listen to music through the speaker 271, or listen to a hands-free call.
  • the microphone 273, also called a "mic", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak close to the microphone 273 to input a sound signal.
  • the electronic device 200 may be provided with at least one microphone 273.
  • the earphone interface 274 is used to connect wired earphones.
  • the depth sensor 2801 is used to obtain depth information of the scene.
  • the pressure sensor 2802 is used to sense the pressure signal and can convert the pressure signal into an electrical signal.
  • the gyro sensor 2803 may be used to determine the movement posture of the electronic device 200.
  • the air pressure sensor 2804 is used to measure air pressure.
  • the magnetic sensor 2805 includes a Hall sensor.
  • the electronic device 200 can use the magnetic sensor 2805 to detect the opening and closing of the flip holster.
  • the acceleration sensor 2806 can detect the magnitude of the acceleration of the electronic device 200 in various directions (generally three axes).
  • the distance sensor 2807 is used to measure distance.
  • the proximity light sensor 2808 may include, for example, a light emitting diode (LED) and a light detector, such as a photodiode.
  • the fingerprint sensor 2809 is used to collect fingerprints.
  • the temperature sensor 2810 is used to detect temperature.
  • the touch sensor 2811 may pass the detected touch operation to the application processor to determine the type of the touch event.
  • the visual output related to the touch operation can be provided through the display screen 290.
  • the ambient light sensor 2812 is used to sense the brightness of the ambient light.
  • the bone conduction sensor 2813 can acquire vibration signals.
  • the button 294 includes a power button, volume buttons, and so on; the button 294 may be a mechanical button or a touch button.
  • the motor 293 can generate vibration prompts. The motor 293 can be used for incoming call vibration notification, and can also be used for touch vibration feedback.
  • the indicator 292 can be an indicator light, which can be used to indicate the charging status, power change, or to indicate messages, missed calls, notifications, and so on.
  • the SIM card interface 295 is used to connect to the SIM card.
  • the electronic device 200 interacts with the network through the SIM card to implement functions such as call and data communication.
  • the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium may be included in the electronic device described in the foregoing embodiment; or it may exist alone without being assembled into the electronic device.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program, and the program may be used by or in combination with an instruction execution system, apparatus, or device.
  • the computer-readable storage medium can send, propagate, or transmit the program for use by or in combination with the instruction execution system, apparatus, or device.
  • the program code contained on the computer-readable storage medium can be transmitted by any suitable medium, including but not limited to: wireless, wire, optical cable, RF, etc., or any suitable combination of the above.
  • the computer-readable storage medium carries one or more programs, and when the above one or more programs are executed by an electronic device, the electronic device realizes the method described in the following embodiments.
  • each block in the flowchart or block diagram may represent a module, program segment, or part of code, and the module, program segment, or part of code contains one or more executable instructions for realizing the specified logical function. It should also be noted that the functions may occur in a different order from the order marked in the drawings; for example, two blocks shown in succession can actually be executed substantially in parallel, and they can sometimes be executed in the reverse order, depending on the functions involved.
  • each block in the block diagram or flowchart, and combinations of blocks in the block diagram or flowchart, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
  • the units described in the embodiments of the present disclosure may be implemented in software or hardware, and the described units may also be provided in a processor. In some cases, the name of a unit does not constitute a limitation on the unit itself.
  • FIG. 3 schematically shows a flowchart of a file clustering method according to an exemplary embodiment of the present disclosure.
  • the file clustering method may include the following steps:
  • the terminal device can determine the data format of a file to determine whether it is a picture file or a video file.
  • the terminal device can use a face detection algorithm to determine whether a picture contains a human face. If it does, the picture is determined to be a picture file to be clustered in the present disclosure; if it does not, the picture can be discarded, where "discard" means not using it as a picture file to be clustered in the present disclosure.
  • similarly, the terminal device can use a video-based face detection algorithm to determine whether a video contains a human face. If it does, the video is determined to be a video file to be clustered in the present disclosure; if it does not, the video can be discarded, where discarding means not using it as a video file to be clustered in the present disclosure.
  • the terminal device may first obtain a candidate picture set and a candidate video set, the candidate picture set includes at least one candidate picture, and the candidate video set includes at least one candidate video.
  • face detection is performed on each candidate picture in the candidate picture set, and the candidate pictures containing human faces are determined as the above-mentioned picture files for the subsequent steps of the method; face detection is likewise performed on each candidate video, and the candidate videos containing human faces are determined as the above-mentioned video files for the subsequent steps of the method.
  • the terminal device can directly acquire at least one picture file containing a face image and at least one video file containing a face image.
  • the process of specifically determining whether picture files and video files contain face images can be performed by a front-end module before the solution of the present disclosure is executed; the present disclosure does not limit the functions of this front-end module beyond determining whether a face image is contained.
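  • A minimal sketch of this candidate-filtering step follows, assuming OpenCV's Haar cascade detector as a stand-in for the unspecified face detection algorithm; the candidate lists, helper names, and probing interval are illustrative.

```python
import cv2

# Haar cascade face detector shipped with OpenCV; a stand-in for the
# unspecified face detection algorithm.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def picture_has_face(path: str) -> bool:
    image = cv2.imread(path)
    if image is None:
        return False
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    return len(detector.detectMultiScale(gray, 1.1, 5)) > 0

def video_has_face(path: str, probe_every: int = 30) -> bool:
    # Probe every Nth frame rather than decoding every frame.
    capture = cv2.VideoCapture(path)
    index, found = 0, False
    while not found:
        ok, frame = capture.read()
        if not ok:
            break
        if index % probe_every == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            found = len(detector.detectMultiScale(gray, 1.1, 5)) > 0
        index += 1
    capture.release()
    return found

# Keep only candidates that contain at least one face (illustrative lists).
candidate_pictures = ["p1.jpg", "p2.jpg"]
candidate_videos = ["v1.mp4"]
picture_files = [p for p in candidate_pictures if picture_has_face(p)]
video_files = [v for v in candidate_videos if video_has_face(v)]
```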
  • the facial features of all faces contained in the picture file can be extracted.
  • some embodiments of the present disclosure may extract facial features of all faces contained in a picture file through a convolutional neural network.
  • the present disclosure does not limit the model structure and training process of the convolutional neural network.
  • methods based on geometric features, template matching methods, methods based on wavelet theory, methods based on hidden Markov models, methods based on support vector machines, etc., can also be used to implement facial feature extraction, which is not particularly limited in this disclosure.
  • at least one target object is determined, and the facial features related to the at least one target object are extracted from the facial features of all faces as the facial features of the picture file.
  • the target object may be an object determined in the shooting scene, and may also be referred to as a target shooting object.
  • the target object may be an object designated by the user, which is not limited in the present disclosure.
  • the user's selection operation of tapping a subject during camera preview can be obtained to determine the target object; that is, when previewing the picture, the object corresponding to the position where the user taps the screen is the target shooting object.
  • the user can also set determination criteria for the target object. These criteria may include, but are not limited to: appearing in historical pictures more than a predetermined number of times, children shorter than 120 cm, subjects wearing hats, and so on.
  • sub-pictures containing only the target object can be cropped from the picture file, used as the pictures for subsequent analysis, clustering, and display, and saved in the album. It is easy to see that such a sub-picture does not contain the face images of non-target objects.
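  • A sketch of picture-level feature extraction with a target-object filter follows; the open-source face_recognition library (a CNN-based encoder) stands in for the unspecified convolutional neural network, and matching against reference encodings of the target objects is one illustrative way to keep only target-related features.

```python
import face_recognition

def picture_face_features(path, target_encodings, threshold=0.6):
    """Return the encodings of faces in `path` that match a target object.

    `target_encodings` is a list of reference encodings, one per target
    object (an assumption; the disclosure leaves the criterion open).
    """
    image = face_recognition.load_image_file(path)
    all_encodings = face_recognition.face_encodings(image)  # all faces
    kept = []
    for enc in all_encodings:
        distances = face_recognition.face_distance(target_encodings, enc)
        # Keep the feature only if it is close to some target object.
        if len(distances) and distances.min() < threshold:
            kept.append(enc)
    return kept
```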
  • with reference to steps S402 to S406 in FIG. 4, the process of extracting the facial features of a video file will now be described.
  • in step S402, at least one key frame image is extracted from the video file.
  • the terminal device may perform image quality evaluation on each video frame image of the video file to obtain a quality score.
  • the quality score can be determined based on factors such as saturation and exposure.
  • the image quality of each video frame image can also be evaluated based on the Human Visual System (HVS).
  • HVS Human Visual System
  • the terminal device may obtain the quality threshold, compare the quality score of each video frame image with the quality threshold, and determine the video frame image with the quality score greater than the quality threshold as the key frame image.
  • the quality threshold can be set in advance, and the present disclosure does not limit its value.
  • the quality threshold may be set to 7.5.
  • the quality threshold may also be determined in combination with the processing capability of the terminal device; for example, the higher the processing capability of the terminal device, the lower the quality threshold may be set, so that more key frame images are obtained.
  • alternatively, video frame images may be extracted at a predetermined time interval as key frame images; the predetermined time interval may be, for example, 3 seconds.
  • in addition, the terminal device may, through analysis, extract only one video frame image from the video file as a key frame image to represent the entire video file.
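  • The sketch below illustrates quality-based key frame selection; variance of the Laplacian (a sharpness measure) stands in for the saturation/exposure-based quality score described above, and the threshold value is illustrative.

```python
import cv2

def extract_key_frames(video_path: str, quality_threshold: float = 100.0):
    """Keep video frames whose quality score exceeds the threshold."""
    capture = cv2.VideoCapture(video_path)
    key_frames = []
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Sharpness score as a proxy for image quality.
        score = cv2.Laplacian(gray, cv2.CV_64F).var()
        if score > quality_threshold:
            key_frames.append(frame)
    capture.release()
    return key_frames
```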
  • in step S404, the facial features of each key frame image are extracted.
  • the facial features of all faces contained in the key frame image can be extracted.
  • a convolutional neural network can also be used to extract the facial features of all faces contained in the key frame image, and the convolutional neural network used here can be the same as the one used to determine the facial features of the picture files.
  • at least one target object is determined, and the facial features related to the at least one target object are extracted from the facial features of all faces as the facial features of the key frame image.
  • in step S406, the facial features of the video file are determined according to the facial features of each key frame image.
  • the facial features of each key frame image can be used directly as the facial features of the video file. That is to say, when only one key frame image is extracted, all the facial features in that key frame image are taken as the facial features of the video file; when two or more key frame images are extracted, all the facial features of each key frame image are taken as the facial features of the video file.
  • there is no restriction on the number of object categories corresponding to the facial features; that is, there is no restriction on the number of different human faces contained in a key frame image.
  • the number of key frame images extracted in the video file is at least two.
  • the key frame images can be sorted according to their time points in the video file, that is, by the order in which they appear when the video file is played, to obtain an image sequence. Next, the correlation between adjacent key frame images in the image sequence can be determined, and the key frame images whose correlation is less than a correlation threshold can be removed to obtain a key frame image set. Then, the facial features of the video file can be determined according to the facial features of each key frame image in the key frame image set.
  • the facial feature of each key frame image in the set of key frame images can be used as the facial feature of the video file.
  • the correlation between the key frame images can be determined based on the image quality and the similarity of the target object.
  • for example, if image B and image E are weakly correlated with their adjacent images, image B and image E can be removed from the sequence.
  • image quality and similarity can also be combined, with a weighted sum of the two used to determine the correlation.
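  • A sketch of the correlation filter over the time-ordered key frames follows; color-histogram correlation between adjacent frames stands in for the quality/similarity-weighted correlation, and the threshold is illustrative.

```python
import cv2

def filter_key_frames(key_frames, corr_threshold=0.5):
    """Drop key frames weakly correlated with the previous kept frame."""
    if not key_frames:
        return []

    def hist(frame):
        h = cv2.calcHist([frame], [0, 1, 2], None, [8, 8, 8], [0, 256] * 3)
        return cv2.normalize(h, h).flatten()

    kept = [key_frames[0]]
    for frame in key_frames[1:]:
        corr = cv2.compareHist(hist(kept[-1]), hist(frame),
                               cv2.HISTCMP_CORREL)
        if corr >= corr_threshold:  # frames below the threshold are removed
            kept.append(frame)
    return kept
```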
  • the facial features of the key frame images in the key frame image set can be used to determine the facial features of the video file as follows: first, the facial features of each key frame image in the key frame image set are clustered to obtain a facial feature set of at least one object category, where different faces correspond to different object categories; then, for each object category, the score of each facial feature in the facial feature set is determined, and the facial feature with the highest score is determined as a facial feature of the video file.
  • the face score can be determined based on the feature scoring result of the above-mentioned convolutional neural network.
  • a dedicated face scoring model can also be constructed to score different facial features, which is not limited in this disclosure.
  • for example, suppose a video file has 10 key frame images left after weakly correlated images are excluded, and each key frame image contains three objects a, b, and c. The facial features can be clustered by object into three categories; the face score of each feature is then determined through analysis, and the facial feature with the highest score in each cluster is determined as a facial feature of the video file.
  • that is, after the facial features of each key frame image are determined, these facial features are clustered to distinguish different photographed objects; then, for each photographed object, the facial feature with the highest face score is determined from the clustering result and used as a facial feature of the video file.
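  • This per-object selection can be sketched as below; DBSCAN stands in for the unspecified identity clustering, and the per-face quality score is assumed to be supplied by the caller (for example, from the feature-scoring output of the network).

```python
import numpy as np
from sklearn.cluster import DBSCAN

def video_face_features(encodings, scores):
    """Cluster face encodings by identity; keep the best-scored one each.

    encodings: (n, d) array of face features from all key frames.
    scores:    length-n sequence, one quality score per face.
    """
    labels = DBSCAN(eps=0.5, min_samples=1).fit_predict(encodings)
    best = {}
    for label, enc, score in zip(labels, encodings, scores):
        if label not in best or score > best[label][1]:
            best[label] = (enc, score)
    # One representative facial feature per photographed object.
    return [enc for enc, _ in best.values()]
```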
  • in step S36, video files that do not contain human faces can also be culled.
  • the order of step S34 and step S36 in the exemplary embodiment of the present disclosure may be interchanged.
  • in step S38, the at least one picture file and the at least one video file obtained in step S32 can be clustered using the facial features determined in step S34 and step S36.
  • a machine learning algorithm such as the K-means clustering algorithm may be used to implement the clustering process, which is not limited in the present disclosure.
  • different clusters correspond to different shooting objects. That is to say, the picture files and video files are divided according to cluster IDs, and shooting objects correspond to cluster IDs one to one.
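  • A sketch of this joint clustering step with K-means follows; the feature matrix stacks one row per facial feature, `owners` records which file each row came from, and all names are illustrative. A file containing several objects receives several cluster IDs.

```python
import numpy as np
from sklearn.cluster import KMeans

def cluster_files(features_by_file, n_objects):
    """Assign cluster IDs to picture/video files by their face features."""
    rows, owners = [], []
    for file_id, feats in features_by_file.items():
        for feat in feats:
            rows.append(feat)
            owners.append(file_id)
    labels = KMeans(n_clusters=n_objects, n_init=10).fit_predict(
        np.asarray(rows))
    cluster_ids = {}
    for file_id, label in zip(owners, labels):
        cluster_ids.setdefault(file_id, set()).add(int(label))
    return cluster_ids  # files sharing an ID show the same object
```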
  • the present disclosure also provides a solution for editing the clustering result.
  • the terminal device can display the clustering results, specifically, in sections in the album; next, the terminal device can, in response to the user's editing operation on the clustering results, edit the clustering results and save the edited results.
  • the editing operations may include, but are not limited to: modifying the name of the album, deleting one or more picture files, deleting one or more video files, adding comments, changing the size, and so on.
  • the terminal device can obtain at least one picture file; in step S514, the terminal device can extract the facial features of each picture file; in step S516, the terminal device performs feature filtering on the facial features to remove face information that the user is not interested in.
  • the terminal device can obtain at least one video file; in step S524, the terminal device can extract the key frame images of each video file; in step S526, the terminal device extracts facial features from the key frame images; in step S528, the terminal device may perform feature denoising, that is, remove face information that the user is not interested in and key frame images with poor correlation, and may also perform per-object clustering, that is, cluster the features of different photographed objects to determine a high-quality facial feature for each photographed object as a facial feature of the video file.
  • in step S530, the picture files and video files are clustered using the facial features, and the same cluster ID is assigned to the same object.
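  • Chaining the earlier sketches gives an end-to-end picture of the FIG. 5 flow; every helper comes from the sketches above and remains illustrative, and encode_and_score is a hypothetical helper that runs the face encoder over the kept frames and returns per-face encodings and scores.

```python
def cluster_album(candidate_pictures, candidate_videos,
                  target_encodings, n_objects):
    features_by_file = {}
    # Picture branch: feature extraction (S514) + feature filtering (S516).
    for path in candidate_pictures:
        feats = picture_face_features(path, target_encodings)
        if feats:
            features_by_file[path] = feats
    # Video branch: key frames (S524), features (S526), denoising (S528).
    for path in candidate_videos:
        frames = filter_key_frames(extract_key_frames(path))
        encs, scores = encode_and_score(frames)  # hypothetical helper
        if len(encs):
            features_by_file[path] = video_face_features(encs, scores)
    # Joint clustering (S530): same object -> same cluster ID.
    return cluster_files(features_by_file, n_objects)
```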
  • this exemplary embodiment also provides a file clustering device.
  • Fig. 6 schematically shows a block diagram of a file clustering apparatus according to an exemplary embodiment of the present disclosure.
  • the file clustering apparatus 6 may include a file acquisition module 61, a first feature extraction module 63, a second feature extraction module 65, and a file clustering module 67.
  • the file acquisition module 61 can be used to acquire at least one picture file and at least one video file; the first feature extraction module 63 can be used to extract the facial features of each picture file; the second feature extraction module 65 can be used to extract the facial features of each video file; and the file clustering module 67 can be used to cluster the at least one picture file and the at least one video file according to the facial features of each picture file and the facial features of each video file.
  • a hybrid clustering effect for pictures and videos can thus be realized, with pictures and videos classified according to shooting objects, which helps users quickly locate pictures and videos containing the same shooting object and perform operations on them such as viewing, sharing, and deleting.
  • the first feature extraction module 63 may be configured to: extract the facial features of all faces contained in the picture file; and determine at least one target object and extract, from the facial features of all faces, the facial features related to the at least one target object as the facial features of the picture file.
  • the second feature extraction module 65 may be configured to: extract at least one key frame image from the video file; extract the facial features of each key frame image; and determine the facial features of the video file according to the facial features of each key frame image.
  • the process of extracting at least one key frame image from the video file by the second feature extraction module 65 may be configured to: perform image quality evaluation on each video frame image of the video file to obtain a quality score; obtain a quality threshold and compare the quality score of each video frame image with the quality threshold; and determine the video frame images whose quality scores are greater than the quality threshold as key frame images.
  • the process of extracting the facial features of each key frame image by the second feature extraction module 65 may be configured to: extract the facial features of all faces contained in the key frame image; and determine at least one target object and extract, from the facial features of all faces, the facial features related to the at least one target object as the facial features of the key frame image.
  • when the number of key frame images in the video file is two or more, the process by which the second feature extraction module 65 determines the facial features of the video file according to the facial features of each key frame image may be configured to: sort the key frame images according to their time points in the video file to obtain an image sequence; determine the correlation between adjacent key frame images in the image sequence; remove the key frame images whose correlation is less than the correlation threshold from the two or more key frame images to obtain a key frame image set; and determine the facial features of the video file according to the facial features of each key frame image in the key frame image set.
  • the process of determining the facial features of the video file by the second feature extraction module 65 using the facial features of the key frame images in the key frame image set may be configured to: cluster the facial features of the key frame images to obtain a facial feature set of at least one object category; determine the score of each facial feature in the facial feature set of each object category; and select the facial feature with the highest score as the facial feature corresponding to the object category, and determine the facial feature corresponding to the object category as a facial feature of the video file.
  • the file acquisition module 61 may be configured to: obtain a candidate picture set and a candidate video set; perform face detection on each candidate picture in the candidate picture set and determine the candidate pictures containing human faces as picture files; and perform face detection on each candidate video in the candidate video set and determine the candidate videos containing human faces as video files.
  • the file clustering device 7 may further include a result editing module 71.
  • the result editing module 71 may be configured to: display the clustering results, where each cluster in the clustering results corresponds to a different face object category; and, in response to the user's editing operation, edit the clustering results and save the edited results.
  • the example embodiments described here can be implemented by software, or by software combined with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions to cause a computing device (which may be a personal computer, a server, a terminal device, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
  • although modules or units of the device for action execution are mentioned in the above detailed description, this division is not mandatory.
  • the features and functions of two or more modules or units described above may be embodied in one module or unit.
  • the features and functions of a module or unit described above can be further divided into multiple modules or units to be embodied.

Abstract

A file clustering method and apparatus, a computer-readable storage medium, and an electronic device, relating to the field of terminal technology. The file clustering method comprises: acquiring at least one picture file and at least one video file (S32); extracting facial features from each picture file (S34); extracting facial features from each video file (S36); and clustering the at least one picture file and the at least one video file according to the facial features of each picture file and the facial features of each video file (S38).
PCT/CN2020/136176 2019-12-27 2020-12-14 File clustering method and apparatus, storage medium, and electronic device WO2021129444A1

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911382475.0 2019-12-27
CN201911382475.0A CN111177086A (zh) 2019-12-27 2019-12-27 File clustering method and apparatus, storage medium, and electronic device

Publications (1)

Publication Number Publication Date
WO2021129444A1 (fr)

Family

ID=70623970

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/136176 WO2021129444A1 (fr) 2019-12-27 2020-12-14 File clustering method and apparatus, storage medium, and electronic device

Country Status (2)

Country Link
CN (1) CN111177086A (fr)
WO (1) WO2021129444A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111177086A (zh) * 2019-12-27 2020-05-19 Oppo广东移动通信有限公司 File clustering method and apparatus, storage medium, and electronic device

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020165964A1 (en) * 2001-04-19 2002-11-07 International Business Machines Corporation Method and apparatus for providing a single system image in a clustered environment
CN105426515A (zh) * 2015-12-01 2016-03-23 小米科技有限责任公司 Video classification method and apparatus
CN110175549A (zh) * 2019-05-20 2019-08-27 腾讯科技(深圳)有限公司 Face image processing method, apparatus, device, and storage medium
CN111177086A (zh) * 2019-12-27 2020-05-19 Oppo广东移动通信有限公司 File clustering method and apparatus, storage medium, and electronic device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5149570B2 (ja) * 2006-10-16 2013-02-20 キヤノン株式会社 File management device, control method of file management device, and program
CN105631408B (zh) * 2015-12-21 2019-12-27 小米科技有限责任公司 Video-based face album processing method and apparatus
CN110347876A (zh) * 2019-07-12 2019-10-18 Oppo广东移动通信有限公司 Video classification method and apparatus, terminal device, and computer-readable storage medium

Also Published As

Publication number Publication date
CN111177086A (zh) 2020-05-19

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20907967

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20907967

Country of ref document: EP

Kind code of ref document: A1