WO2020173379A1 - Picture grouping method and device - Google Patents
- Publication number
- WO2020173379A1 (PCT/CN2020/076040)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- face
- video
- category
- user
- electronic device
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/55—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/70—Information retrieval; Database structures therefor; File system structures therefor of video data
- G06F16/75—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
Definitions
- the embodiments of the present application relate to the field of electronic technology, and in particular, to a method and device for grouping pictures.
- Current clustering methods mainly use face detection algorithms to detect faces and feature points in pictures (such as key points such as corners of eyes, nose tip, and mouth corners), extract facial features, and use facial features to cluster pictures.
- This method achieves high clustering accuracy for frontal face pictures, but low clustering accuracy for face pictures taken from other angles.
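The conventional pipeline described above (detect faces, extract features, cluster by feature similarity) can be sketched as follows. This is an illustrative toy, not the patented method: the hand-made feature vectors stand in for the output of a hypothetical face feature extractor, and a simple greedy threshold rule stands in for a production clustering algorithm.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two face feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def cluster_by_threshold(features, threshold=0.8):
    """Greedy clustering: assign each picture to the first cluster whose
    representative (first) feature is similar enough, else start a new cluster."""
    clusters = []  # each cluster: list of indices into `features`
    for i, feat in enumerate(features):
        for cluster in clusters:
            if cosine_similarity(feat, features[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters

# Toy features: two near-identical frontal faces and one dissimilar side face.
feats = [[1.0, 0.0, 0.1], [0.9, 0.1, 0.1], [0.0, 1.0, 0.2]]
print(cluster_by_threshold(feats))  # → [[0, 1], [2]]
```

The side-face vector lands in its own cluster, which is exactly the accuracy problem the background paragraph notes for non-frontal pictures.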
- the embodiments of the present application provide a picture grouping method and device, which can cluster face pictures stored in an electronic device according to face images of different shapes in a reference image set obtained by the electronic device, and improve clustering accuracy.
- an embodiment of the present application provides a picture grouping method, which can be applied to an electronic device.
- the electronic device obtains at least one face picture.
- the method includes: the electronic device obtains at least one video. Then, the electronic device extracts multiple face image frames from at least one video.
- the electronic device performs clustering processing on at least one face picture according to the multiple face image frames. After that, the electronic device displays at least one group according to the clustering result, and each group includes at least one face picture of a user.
- the electronic device can use the multiple face image frames in the at least one video as prior information and cluster the face pictures according to those frames, thereby grouping the face pictures by user so that face pictures of the same user fall into the same group, which improves the accuracy of face picture clustering and grouping.
- the electronic device performs clustering processing on at least one face picture according to the multiple face image frames, including: the electronic device divides the multiple face image frames into at least one category, where each category corresponds to multiple face image frames of different forms of one user. The electronic device then performs clustering processing on the at least one face picture according to the classification results of the multiple face image frames.
- the electronic device can group the face pictures and the divided categories into a group according to the classification results, or regroup the face pictures into a group.
- the electronic device can accurately group face pictures of different face angles, expressions, and other forms according to the different face images of different users, which improves the accuracy of clustering and grouping and reduces clustering dispersion.
- the electronic device classifying the multiple face image frames into at least one category includes: the electronic device separately classifies the face image frames in each video into at least one category. If the similarity between the facial features of a first face image frame in a first category and the facial features of a second face image frame in a second category is greater than or equal to a preset value, the electronic device merges the first category and the second category into the same category.
- the electronic device can first classify the face image frames within each video, and then merge categories from different videos whose face image frames have high similarity; that is, face image frames of the same user in different videos are merged into the same category.
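The cross-video merge step above can be sketched as follows, under stated assumptions: per-video categories are given as lists of already-extracted feature vectors (a hypothetical extractor), similarity is cosine similarity, and a small union-find structure merges any two categories containing a frame pair at or above the preset value. None of these implementation choices are mandated by the text; they are illustrative.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two face feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def merge_categories(categories, threshold=0.8):
    """categories: one list of face feature vectors per per-video category.
    Merge any two categories that contain a pair of frames whose feature
    similarity is >= threshold, using union-find to chain merges."""
    parent = list(range(len(categories)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(categories)):
        for j in range(i + 1, len(categories)):
            if any(cosine_similarity(f1, f2) >= threshold
                   for f1 in categories[i] for f2 in categories[j]):
                parent[find(i)] = find(j)  # union

    merged = {}
    for i, cat in enumerate(categories):
        merged.setdefault(find(i), []).extend(cat)
    return list(merged.values())
```

With three per-video categories where the first two hold near-identical features, `merge_categories` returns two categories: the same user's frames from different videos end up together.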
- the electronic device dividing the face image frames in each video into at least one category includes: the electronic device uses a face tracking algorithm to classify temporally continuous face image frames of the same user in each video into the same category.
- the face image frames of the same user with temporal continuity may be adjacent image frames.
- the face images tracked in the same video by the face tracking algorithm have temporal continuity and satisfy the must-link constraint: they belong to the same user's face and can therefore be classified into the same category.
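A minimal sketch of how temporal continuity could yield such must-link categories, assuming a simple stand-in tracker: faces are linked across adjacent frames when their bounding boxes overlap strongly (intersection-over-union), which is one common tracking heuristic; the text does not specify a particular face tracking algorithm, so both the IoU rule and its threshold are assumptions.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) face boxes."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def track_faces(detections, iou_threshold=0.5):
    """detections: per-frame lists of face boxes, in temporal order.
    Link a face to an open track when its box overlaps the track's last
    box from the previous frame; otherwise open a new track. Each track
    is one must-link category of the same user's face image frames."""
    tracks = []  # each track: list of (frame_index, box)
    for t, boxes in enumerate(detections):
        for box in boxes:
            for track in tracks:
                last_t, last_box = track[-1]
                if last_t == t - 1 and iou(last_box, box) >= iou_threshold:
                    track.append((t, box))
                    break
            else:
                tracks.append([(t, box)])
    return tracks
```

A face box drifting slightly between frames 0 and 1 stays in one track, while a box appearing elsewhere in frame 2 starts a new track, i.e. a new candidate category.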
- each group also includes any one or a combination of the following: the video in which the user's face image frame is located, the video segment in which the user's face image frame is located, or at least one face image frame of the user.
- the electronic device can not only group face pictures, but also group videos, video segments, and face image frames, and jointly manage face pictures together with videos, video segments, and face image frames, improving the user's search efficiency and management experience.
- At least one face picture of a user included in each group is a single photo or a group photo.
- obtaining at least one video by the electronic device includes: the electronic device obtains at least one video from a storage area of the electronic device.
- the at least one video may be a video previously captured, downloaded, copied, or obtained by other means by the electronic device.
- the electronic device acquiring at least one video includes: the electronic device prompts the user to shoot a video including human face image frames.
- the electronic device records and generates at least one video after detecting the operation instructed by the user to shoot the video.
- the electronic device can record a video in real time for use in grouping face pictures.
- the method further includes: the electronic device acquires at least one image group, and each image group includes multiple image frames of the same user in different forms.
- At least one image group includes any one or a combination of the following: moving pictures, a pre-photographed image group that includes different forms of the same user's face, an image group formed by multiple frames of images collected in real time during shooting preview, or an image group formed by multiple frames of images taken during continuous shooting.
- the electronic device extracting multiple face image frames from at least one video includes: the electronic device extracts multiple face image frames from at least one video and at least one image group.
- the electronic device can classify face pictures not only according to videos, but also according to various image groups including user face picture frames of different forms, such as moving pictures.
- in response to the user's instruction, the electronic device performs clustering processing on at least one face picture according to the multiple face image frames; and according to the clustering result, displays at least one group, each group including at least one face picture of a user.
- the electronic device can display the grouping result of the face pictures in response to the user's instruction.
- after the album is opened, the electronic device automatically performs clustering processing on at least one face picture according to the multiple face image frames; and according to the clustering result, displays at least one group, each group including at least one face picture of a user.
- that is, after the album is opened, the electronic device can automatically perform the clustering and group display processing.
- during charging, when the battery level is higher than a preset level, the electronic device automatically performs clustering processing on at least one face picture according to the multiple face image frames; then, according to the clustering result, it displays at least one group, each group including at least one face picture of a user.
- the electronic device can automatically perform clustering and display grouping processing at different times.
- when the electronic device displays the at least one group, it may also prompt the user that the groups are obtained by grouping face pictures according to face image frames in videos.
- an embodiment of the present application provides a method for grouping pictures, which is applied to an electronic device.
- the electronic device stores at least one video and at least one face picture.
- the method includes: after the electronic device detects a user operation for viewing image classification, it displays at least one group.
- each group includes at least one face picture of a user, and any one or a combination of the following: the video in which the user's face image frame is located, the video segment in which the user's face image frame is located, or at least one face image frame of the user.
- an embodiment of the present application provides a picture grouping method, which is applied to an electronic device, and at least one face picture is stored on the electronic device.
- the method includes: the electronic device acquires at least one reference image set, where the reference image set includes a series of face image frames with temporal continuity. Then, the electronic device performs clustering processing on at least one face picture according to the face image frames. After that, the electronic device can display at least one group according to the clustering result, and each group includes at least one face picture of a user.
- the reference image set may be the face image frames in a video; the face image frames in an animation; a collection of temporally continuous multi-frame images collected in real time in the shooting preview state; a collection of temporally continuous multi-frame images captured in the capture mode; a collection of temporally continuous multi-frame images captured by the electronic device during continuous shooting; or a user-preset image group including different forms of the same user's face; etc.
- an embodiment of the present application provides a picture grouping method, which is applied to an electronic device, and at least one picture is stored on the electronic device.
- the method includes: the electronic device acquires at least one video, where the video includes image frames; performs clustering processing on at least one picture according to the image frames; and, according to the clustering result, displays at least one group, each group including at least one picture of an entity.
- the entity may include human faces, dogs, cats, houses, etc.
- an embodiment of the present application provides a picture grouping device, which is included in an electronic device, and the device has a function of implementing the behavior of the electronic device in any of the foregoing aspects and possible implementation manners.
- This function can be realized by hardware, or by hardware executing corresponding software.
- the hardware or software includes at least one module or unit corresponding to the above-mentioned functions, for example, an acquiring module or unit, an extracting module or unit, a clustering module or unit, and a displaying module or unit.
- an embodiment of the present application provides an electronic device, including at least one processor and at least one memory.
- the at least one memory is coupled with at least one processor, and the at least one memory is used to store computer program code, and the computer program code includes computer instructions.
- when the at least one processor executes the computer instructions, the electronic device is made to perform the picture grouping method in any of the possible implementations of the foregoing aspects.
- an embodiment of the present application provides a computer storage medium, including computer instructions, which when the computer instructions run on an electronic device, cause the electronic device to execute the picture grouping method in any of the possible implementations of the foregoing aspects.
- an embodiment of the present application provides a computer program product, which when the computer program product runs on a computer, causes the computer to execute the picture grouping method in any one of the possible implementations of the foregoing aspects.
- FIG. 1 is a schematic structural diagram of an electronic device provided by an embodiment of the application.
- Figure 2 is a schematic diagram of a set of interfaces provided by an embodiment of the application.
- FIG. 3 is a schematic diagram of an interface provided by an embodiment of the application.
- FIG. 4 is a schematic diagram of another interface provided by an embodiment of the application.
- Figure 5 is a schematic diagram of another interface provided by an embodiment of the application.
- FIG. 6 is a schematic diagram of another interface provided by an embodiment of the application.
- FIG. 7A is a schematic diagram of another interface provided by an embodiment of the application.
- FIG. 7B is a schematic diagram of a video and a face image frame in the video provided by an embodiment of the application.
- FIG. 8A is a schematic diagram of a classification effect provided by an embodiment of this application.
- FIG. 8B is a schematic diagram of another classification effect provided by an embodiment of the application.
- FIG. 9A is a schematic diagram of another interface provided by an embodiment of the application.
- FIG. 9B is a schematic diagram of another interface provided by an embodiment of the application.
- FIG. 9C is a schematic diagram of another interface provided by an embodiment of the application.
- FIG. 10 is a schematic diagram of another set of interfaces provided by an embodiment of the application.
- FIG. 11 is a schematic diagram of another set of interfaces provided by an embodiment of the application.
- FIG. 12 is a schematic diagram of another set of interfaces provided by an embodiment of the application.
- FIG. 13 is a schematic diagram of another set of interfaces provided by an embodiment of the application.
- FIG. 14 is a schematic diagram of another set of interfaces provided by an embodiment of the application.
- FIG. 15 is a schematic diagram of another interface provided by an embodiment of the application.
- FIG. 16 is a schematic diagram of another set of interfaces provided by an embodiment of the application.
- FIG. 17 is a schematic diagram of face image frames in an image group provided by an embodiment of the application.
- FIG. 18 is a schematic diagram of another set of interfaces provided by an embodiment of the application.
- FIG. 19A is a schematic diagram of another interface provided by an embodiment of this application.
- FIG. 19B is a schematic diagram of another interface provided by an embodiment of the application.
- FIG. 20 is a schematic diagram of a face image frame in another image group provided by an embodiment of the application.
- FIG. 21 is a schematic diagram of another interface provided by an embodiment of the application.
- FIG. 22 is a schematic diagram of another set of interfaces provided by an embodiment of the application.
- FIG. 23 is a flowchart of a method for grouping pictures according to an embodiment of this application.
- FIG. 24 is a schematic structural diagram of another electronic device provided by an embodiment of the application.
- the embodiment of the present application provides a method for grouping pictures, which can be applied to electronic devices.
- the electronic device may cluster the face pictures (that is, pictures containing the face images) stored on the electronic device according to the reference image set.
- the reference image set includes multiple face images of different shapes with temporal continuity.
- the form here can include the angle of the face (such as a side face, or a face tilted up or down), the facial expression (such as laughing, crying, or a funny expression), whether there is a beard, whether sunglasses are worn, whether the face is covered by a hat, whether the face is covered by hair, and so on.
- the face pictures stored on the electronic device refer to static pictures that exist independently.
- the reference image set may include a collection of a series of image frames with temporal continuity in the video acquired by the electronic device.
- the video may be a video taken by the camera of the electronic device, a video obtained by the electronic device from an application (App) (such as Douyin, Kuaishou, Meipai, YOYO, etc.), a video obtained by the electronic device from another device, or a video saved during a video call, etc.
- the reference image set may also include animated images (Gif) acquired by the electronic device, and the animated images include multiple frames of images with temporal continuity.
- the reference image set may also include an image group composed of a series of images with time continuity acquired by the electronic device.
- the image group may be a collection of time-continuous multi-frame images collected in real time by the electronic device in the shooting preview state.
- the image group may be a collection of time-continuous multi-frame images captured by the electronic device in the capture mode (the electronic device or the user may designate one of the images as the captured image).
- the image group may be a collection of multiple frames of images with time continuity captured by the electronic device during continuous shooting.
- the image group may be a user-preset image group including different forms of the face of the same user (for example, one or more of a pre-photographed frontal face image, side face image, and laughing face image of the same user), etc.
- the electronic device can use the face images in the reference image set as prior information. According to the face images of different forms in the reference image set, the electronic device clusters the stored face pictures, so that face pictures of different forms can also be accurately clustered, which improves the clustering accuracy of face pictures.
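The prior-information idea can be illustrated as follows, with heavy assumptions: `reference_categories` is a hypothetical mapping from a user label to feature vectors extracted from that user's temporally continuous reference frames (covering many face forms), and each stored still picture is assigned to the category containing its most similar reference frame. The actual method is not limited to this nearest-reference rule; this is only a sketch of why multi-form reference frames help non-frontal pictures.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two face feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def group_pictures(picture_feats, reference_categories, threshold=0.7):
    """picture_feats: face features of the stored still pictures.
    reference_categories: {user_label: [feature, ...]} built from the
    reference image set (frames of the same user in different forms).
    Assign each picture to the category holding its most similar
    reference frame; pictures below `threshold` fall into 'ungrouped'."""
    groups = {label: [] for label in reference_categories}
    groups["ungrouped"] = []
    for i, feat in enumerate(picture_feats):
        best_label, best_sim = "ungrouped", threshold
        for label, refs in reference_categories.items():
            sim = max(cosine_similarity(feat, r) for r in refs)
            if sim >= best_sim:
                best_label, best_sim = label, sim
        groups[best_label].append(i)
    return groups
```

Because each user's category holds several reference forms (frontal, side, laughing, ...), a side-face still that matches any one of them is pulled into the right group rather than scattering into its own cluster.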
- the electronic device may be a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), or the like.
- FIG. 1 shows a schematic structural diagram of an electronic device 100.
- the electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, and an antenna 2.
- a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone jack 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and so on.
- the sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, etc.
- the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100.
- the electronic device 100 may include more or fewer components than shown, or combine certain components, or split certain components, or use a different arrangement of components.
- the components shown in the figure can be implemented in hardware, software or a combination of software and hardware.
- the processor 110 may include one or more processing units.
- the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
- the controller can be the nerve center and command center of the electronic device 100.
- the controller can generate operation control signals according to the command operation code and timing signals to complete the control of fetching and executing commands.
- a memory may also be provided in the processor 110 to store instructions and data.
- the memory in the processor 110 is a cache memory.
- the memory can store instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from the memory. This avoids repeated accesses, reduces the waiting time of the processor 110, and improves system efficiency.
- the processor 110 may include one or more interfaces.
- the interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface, etc.
- the I2C interface is a two-way synchronous serial bus that includes a serial data line (SDA) and a serial clock line (SCL).
- the processor 110 may include multiple sets of I2C buses.
- the processor 110 may be coupled to the touch sensor 180K, charger, flash, camera 193, etc., through different I2C bus interfaces.
- the processor 110 may couple the touch sensor 180K through the I2C interface, so that the processor 110 and the touch sensor 180K communicate through the I2C bus interface to realize the touch function of the electronic device 100.
- the I2S interface can be used for audio communication.
- the processor 110 may include multiple sets of I2S buses.
- the processor 110 may be coupled with the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170.
- the audio module 170 may transmit audio signals to the wireless communication module 160 through the I2S interface, so as to realize the function of answering calls through the Bluetooth headset.
- the PCM interface can also be used for audio communication to sample, quantize and encode analog signals.
- the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface.
- the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to realize the function of answering calls through the Bluetooth headset. Both I2S interface and PCM interface can be used for audio communication.
- the UART interface is a universal serial data bus used for asynchronous communication.
- the bus can be a two-way communication bus. It converts the data to be transmitted between serial communication and parallel communication.
- the UART interface is generally used to connect the processor 110 and the wireless communication module 160.
- the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to realize the Bluetooth function.
- the audio module 170 can transmit audio signals to the wireless communication module 160 through the UART interface, so as to realize the function of playing music through the Bluetooth headset.
- the MIPI interface can be used to connect the processor 110 with the display screen 194, the camera 193 and other peripheral devices.
- MIPI interface includes camera serial interface (CSI), display serial interface (DSI) and so on.
- the processor 110 and the camera 193 communicate through a CSI interface to implement the shooting function of the electronic device 100.
- the processor 110 and the display screen 194 communicate through the DSI interface to realize the display function of the electronic device 100.
- the GPIO interface can be configured through software.
- the GPIO interface can be configured as a control signal or as a data signal.
- the GPIO interface may be used to connect the processor 110 with the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and so on.
- the GPIO interface can also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, etc.
- the USB interface 130 is an interface that complies with the USB standard specification, and specifically may be a Mini USB interface, a Micro USB interface, a USB Type C interface, and so on.
- the USB interface 130 can be used to connect a charger to charge the electronic device 100, and can also be used to transfer data between the electronic device 100 and peripheral devices. It can also be used to connect headphones and play audio through headphones. This interface can also be used to connect other electronic devices, such as AR devices.
- the interface connection relationship between the modules illustrated in the embodiment of the present application is merely illustrative, and does not constitute a structural limitation of the electronic device 100.
- the electronic device 100 may also adopt different interface connection modes in the foregoing embodiments, or a combination of multiple interface connection modes.
- the charging management module 140 is used to receive charging input from the charger.
- the charger can be a wireless charger or a wired charger.
- the charging management module 140 may receive the charging input of the wired charger through the USB interface 130.
- the charging management module 140 may receive the wireless charging input through the wireless charging coil of the electronic device 100. While the charging management module 140 charges the battery 142, the power management module 141 can also supply power to the electronic device.
- the power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
- the power management module 141 receives input from the battery 142 and/or the charge management module 140, and supplies power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, and the wireless communication module 160.
- the power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle times, and battery health status (leakage, impedance).
- the power management module 141 may also be provided in the processor 110. In other embodiments, the power management module 141 and the charging management module 140 may also be provided in the same device.
- the wireless communication function of the electronic device 100 can be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, and the baseband processor.
- Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
- Each antenna in the electronic device 100 can be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization.
- Antenna 1 can be multiplexed as a diversity antenna for wireless LAN.
- the antenna can be used in combination with a tuning switch.
- the mobile communication module 150 can provide a wireless communication solution including 2G/3G/4G/5G and the like applied to the electronic device 100.
- the mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), etc.
- the mobile communication module 150 can receive electromagnetic waves by the antenna 1, and perform processing such as filtering and amplifying the received electromagnetic waves, and then transmitting them to the modem processor for demodulation.
- the mobile communication module 150 can also amplify the signal modulated by the modem processor, and convert it into electromagnetic waves through the antenna 1 for radiation.
- At least part of the functional modules of the mobile communication module 150 may be provided in the processor 110. In some embodiments, at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be provided in the same device.
- the modem processor may include a modulator and a demodulator.
- the modulator is used to modulate the low frequency baseband signal to be sent into a medium and high frequency signal.
- the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal. Then the demodulator transmits the demodulated low-frequency baseband signal to the baseband processor for processing. After the low-frequency baseband signal is processed by the baseband processor, it is passed to the application processor.
- the application processor outputs sound signals through audio equipment (not limited to speakers 170A, receiver 170B, etc.), or displays images or videos through the display 194.
- the modem processor may be an independent device. In other embodiments, the modem processor may be independent of the processor 110 and be provided in the same device as the mobile communication module 150 or other functional modules.
- the wireless communication module 160 can provide wireless communication solutions applied to the electronic device 100, including wireless local area network (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) technology, and so on.
- the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
- the wireless communication module 160 receives electromagnetic waves via the antenna 2, frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110.
- the wireless communication module 160 may also receive a signal to be sent from the processor 110, perform frequency modulation, amplify it, and convert it into electromagnetic waves through the antenna 2 for radiation.
- the antenna 1 of the electronic device 100 is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the electronic device 100 can communicate with the network and other devices through wireless communication technology.
- the wireless communication technologies can include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
- GNSS can include the global positioning system (GPS), among other satellite systems.
- the electronic device 100 implements a display function through the GPU, the display screen 194, and the application processor.
- the GPU is a microprocessor for image processing, connected to the display 194 and the application processor.
- the GPU is used to perform mathematical and geometric calculations, and is used for graphics rendering.
- the processor 110 may include one or more GPUs, which execute program instructions to generate or change display information.
- the display screen 194 is used to display images, videos, etc.
- the display screen 194 includes a display panel.
- the display panel can adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLed, a MicroLed, a Micro-oLed, quantum dot light emitting diodes (QLED), etc.
- the electronic device 100 may include one or N display screens 194, and N is a positive integer greater than one.
- the electronic device 100 can implement shooting functions through an ISP, a camera 193, a video codec, a GPU, a display screen 194, and an application processor.
- the ISP is used to process the data fed back from the camera 193. For example, when taking a picture, the shutter opens, light is transmitted through the lens to the photosensitive element of the camera, the optical signal is converted into an electrical signal, and the photosensitive element passes the electrical signal to the ISP for processing, which converts it into an image visible to the naked eye.
- ISP can also optimize the image noise, brightness, and skin color.
- the ISP can also optimize the exposure, color temperature and other parameters of the shooting scene.
- the ISP may be provided in the camera 193.
- the camera 193 is used to capture still images or videos.
- the object generates an optical image through the lens and projects it to the photosensitive element.
- the photosensitive element can be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
- CMOS complementary metal-oxide-semiconductor
- the photosensitive element converts the optical signal into an electrical signal, and then transfers the electrical signal to the ISP to convert it into a digital image signal.
- ISP outputs digital image signals to DSP for processing.
- DSP converts digital image signals into standard RGB, YUV and other formats.
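As an illustration of the conversion step described above (not part of the embodiment itself), a DSP converting YUV samples to RGB could apply the common BT.601 full-range equations; the function below is a minimal per-pixel sketch under that assumption:

```python
def yuv_to_rgb(y, u, v):
    """Convert one full-range BT.601 YUV sample (0-255) to RGB.

    Illustrative sketch only; real DSPs operate on whole frames in
    fixed-point hardware.
    """
    r = y + 1.402 * (v - 128)
    g = y - 0.344136 * (u - 128) - 0.714136 * (v - 128)
    b = y + 1.772 * (u - 128)
    clamp = lambda x: max(0, min(255, int(round(x))))  # keep values in 8-bit range
    return clamp(r), clamp(g), clamp(b)
```

For a neutral gray sample (U = V = 128), the chroma terms vanish and R = G = B = Y, as expected.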
- the electronic device 100 may include one or N cameras 193, and N is a positive integer greater than one.
- the digital signal processor is used to process digital signals. In addition to processing digital image signals, it can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform a Fourier transform on the frequency point energy, and so on.
- Video codecs are used to compress or decompress digital video.
- the electronic device 100 may support one or more video codecs. In this way, the electronic device 100 can play or record videos in multiple encoding formats, such as moving picture experts group (MPEG)-1, MPEG-2, MPEG-3, MPEG-4, etc.
- NPU is a neural-network (NN) computing processor.
- through the NPU, applications such as intelligent cognition of the electronic device 100 can be implemented, for example image recognition, face recognition, voice recognition, text understanding, and so on.
- the NPU or another processor may be used to perform face detection, face tracking, facial feature extraction, and image clustering on the face images in videos stored by the electronic device 100; to perform operations such as face detection and facial feature extraction on the face images in stored pictures; and to cluster the pictures stored in the electronic device 100 according to the facial features of the pictures and the clustering results of the face images in the videos.
- the external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100.
- the external memory card communicates with the processor 110 through the external memory interface 120 to realize the data storage function. For example, save music, video and other files in an external memory card.
- the internal memory 121 may be used to store computer executable program code, and the executable program code includes instructions.
- the processor 110 executes various functional applications and data processing of the electronic device 100 by running instructions stored in the internal memory 121.
- the internal memory 121 may include a program storage area and a data storage area.
- the storage program area can store an operating system, at least one application program (such as a sound playback function, an image playback function, etc.) required by at least one function.
- the data storage area can store data (such as audio data, phone book, etc.) created during the use of the electronic device 100.
- the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash storage (UFS), and the like.
- the electronic device 100 can implement audio functions through an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone interface 170D, and an application processor. For example, music playback, recording, etc.
- the audio module 170 is used to convert digital audio information into an analog audio signal for output, and also used to convert an analog audio input into a digital audio signal.
- the audio module 170 can also be used to encode and decode audio signals.
- the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.
- the speaker 170A, also called the "horn", is used to convert audio electrical signals into sound signals.
- the electronic device 100 can listen to music through the speaker 170A, or listen to a hands-free call.
- the receiver 170B, also called the "handset", is used to convert audio electrical signals into sound signals.
- when the electronic device 100 answers a phone call or a voice message, the voice can be heard by bringing the receiver 170B close to the ear.
- the microphone 170C, also called the "mouthpiece" or "transmitter", is used to convert sound signals into electrical signals.
- when making a call or sending a voice message, the user can speak close to the microphone 170C to input a sound signal into it.
- the electronic device 100 may be provided with at least one microphone 170C.
- the electronic device 100 may be provided with two microphones 170C, which can not only collect sound signals, but also implement noise reduction functions.
- the electronic device 100 may also be provided with three, four or more microphones 170C, which can collect sound signals, reduce noise, identify sound sources, and realize directional recording functions.
- the earphone interface 170D is used to connect wired earphones.
- the earphone interface 170D can be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
- the pressure sensor 180A is used to sense the pressure signal and can convert the pressure signal into an electrical signal.
- the pressure sensor 180A may be provided on the display screen 194.
- the capacitive pressure sensor may include at least two parallel plates with conductive material. When a force is applied to the pressure sensor 180A, the capacitance between the electrodes changes.
- the electronic device 100 determines the intensity of the pressure according to the change in capacitance. When a touch operation acts on the display 194, the electronic device 100 detects the intensity of the touch operation according to the pressure sensor 180A. The electronic device 100 may also calculate the touched position according to the detection signal of the pressure sensor 180A.
- touch operations acting on the same touch position but with different touch operation strengths may correspond to different operation instructions. For example: when a touch operation whose intensity is less than the first pressure threshold is applied to the short message application icon, the instruction to view the short message is executed. When a touch operation with a touch operation intensity greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
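The pressure-dependent dispatch described above can be sketched as a simple threshold test. The threshold value and instruction names below are illustrative assumptions, not values from the embodiment:

```python
def dispatch_touch(app, pressure, first_pressure_threshold=0.5):
    """Map a touch on an app icon to an instruction based on touch pressure.

    Illustrative sketch: a light press on the short-message icon views
    messages; a firm press (>= threshold) creates a new message.
    The threshold value is an assumption.
    """
    if app == "short_message":
        if pressure < first_pressure_threshold:
            return "view_short_message"
        return "new_short_message"
    return "open_app"  # default behavior for other icons
```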
- the gyro sensor 180B can be used to determine the movement posture of the electronic device 100.
- the angular velocity of the electronic device 100 around three axes can be determined through the gyro sensor 180B.
- the gyro sensor 180B can be used for shooting anti-shake.
- the gyro sensor 180B detects the shake angle of the electronic device 100, calculates the distance that the lens module needs to compensate according to the angle, and allows the lens to counteract the shake of the electronic device 100 through a reverse movement to achieve anti-shake.
- the gyro sensor 180B can also be used for navigation and somatosensory game scenes.
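The anti-shake compensation above can be approximated with a simple optical model: for a detected shake angle theta and lens focal length f, the lens module must shift by roughly f * tan(theta) in the opposite direction. This is a simplified sketch under that assumed model, not the embodiment's actual algorithm:

```python
import math

def ois_compensation_mm(shake_angle_deg, focal_length_mm):
    """Lens displacement needed to counteract a shake angle.

    Simplified optical-image-stabilization model (assumed for
    illustration): displacement ~= f * tan(theta).
    """
    return focal_length_mm * math.tan(math.radians(shake_angle_deg))
```

With no shake the compensation is zero; a 1-degree shake on a 4 mm lens needs a displacement of about 0.07 mm.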
- the air pressure sensor 180C is used to measure air pressure.
- the electronic device 100 calculates the altitude based on the air pressure value measured by the air pressure sensor 180C, and assists positioning and navigation.
- the magnetic sensor 180D includes a Hall sensor.
- the electronic device 100 can use the magnetic sensor 180D to detect the opening and closing of the flip holster.
- when the electronic device 100 is a flip phone, it can detect the opening and closing of the flip cover according to the magnetic sensor 180D, and then set features such as automatic unlocking upon opening according to the detected open/closed state of the holster or the flip cover.
- the acceleration sensor 180E can detect the magnitude of the acceleration of the electronic device 100 in various directions (generally three axes). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. It can also be used to identify the posture of electronic equipment, and can be used in applications such as horizontal and vertical screen switching, pedometers and so on.
- the distance sensor 180F is used to measure distance. The electronic device 100 can measure distance by infrared or laser. In some embodiments, when shooting a scene, the electronic device 100 may use the distance sensor 180F to measure distance to achieve fast focusing.
- the proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector, such as a photodiode.
- the light emitting diode may be an infrared light emitting diode.
- the electronic device 100 emits infrared light to the outside through the light emitting diode.
- the electronic device 100 uses a photodiode to detect infrared reflected light from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 100. When insufficient reflected light is detected, the electronic device 100 can determine that there is no object near the electronic device 100.
- the electronic device 100 can use the proximity light sensor 180G to detect that the user holds the electronic device 100 close to the ear to talk, so as to automatically turn off the screen to save power.
- the proximity light sensor 180G can also be used in holster mode and pocket mode to automatically unlock and lock the screen.
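The "sufficient reflected light" decision described above amounts to a threshold test on the photodiode reading. The sketch below adds a hysteresis band so the state does not flicker near the boundary; the threshold values and the hysteresis itself are illustrative assumptions:

```python
def near_object(reflected_light, on_threshold=80, off_threshold=40, was_near=False):
    """Decide whether an object is near from the infrared reflection reading.

    Sketch with assumed thresholds: readings above on_threshold mean
    "near", readings below off_threshold mean "far", and readings in
    between keep the previous state (hysteresis).
    """
    if reflected_light >= on_threshold:
        return True
    if reflected_light <= off_threshold:
        return False
    return was_near  # in the hysteresis band, keep the previous state
```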
- the ambient light sensor 180L is used to sense the brightness of the ambient light.
- the electronic device 100 can adjust the brightness of the display screen 194 automatically according to the perceived brightness of the ambient light.
- the ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures.
- the ambient light sensor 180L can also cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in the pocket to prevent accidental touch.
- the fingerprint sensor 180H is used to collect fingerprints.
- the electronic device 100 can use the collected fingerprint characteristics to realize fingerprint unlocking, access to the application lock, fingerprint photographs, fingerprint answering calls, and so on.
- the temperature sensor 180J is used to detect temperature.
- the electronic device 100 executes a temperature processing strategy based on the temperature detected by the temperature sensor 180J. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the electronic device 100 reduces the performance of a processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection.
- in other embodiments, when the temperature is lower than another threshold, the electronic device 100 heats the battery 142 to avoid an abnormal shutdown of the electronic device 100 caused by low temperature.
- in some other embodiments, when the temperature is lower than still another threshold, the electronic device 100 boosts the output voltage of the battery 142 to avoid an abnormal shutdown caused by low temperature.
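The temperature processing strategy above can be sketched as a small policy function. The specific threshold values and action names are illustrative assumptions; the embodiment only states that distinct thresholds trigger throttling, battery heating, and voltage boosting:

```python
def thermal_policy(temp_c, high=45.0, low=0.0, very_low=-10.0):
    """Pick a temperature-handling action (threshold values are assumptions).

    - above `high`: throttle the nearby processor for thermal protection
    - below `very_low`: boost battery output voltage to avoid shutdown
    - below `low`: heat the battery to avoid low-temperature shutdown
    """
    if temp_c > high:
        return "throttle_processor"
    if temp_c < very_low:
        return "boost_battery_voltage"
    if temp_c < low:
        return "heat_battery"
    return "normal"
```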
- the touch sensor 180K is also called a "touch panel".
- the touch sensor 180K can be disposed on the display screen 194; the touch sensor 180K and the display screen 194 form a touch screen, also called a "touchscreen".
- the touch sensor 180K is used to detect touch operations on or near it.
- the touch sensor can transmit the detected touch operation to the application processor to determine the type of touch event.
- the display screen 194 can provide visual output related to the touch operation.
- the touch sensor 180K may also be disposed on the surface of the electronic device 100, which is different from the position of the display screen 194.
- the bone conduction sensor 180M can acquire vibration signals.
- the bone conduction sensor 180M can obtain the vibration signal of the vibrating bone mass of the human voice.
- the bone conduction sensor 180M can also contact the human pulse and receive the blood pressure pulse signal.
- the bone conduction sensor 180M may also be provided in the earphone, combined with the bone conduction earphone.
- the audio module 170 may parse the voice signal based on the vibration signal of the vibrating bone block of the voice obtained by the bone conduction sensor 180M, and implement the voice function.
- the application processor can analyze the heart rate information based on the blood pressure beating signal obtained by the bone conduction sensor 180M, and realize the heart rate detection function.
- the button 190 includes a power button, a volume button, and so on.
- the button 190 may be a mechanical button. It can also be a touch button.
- the electronic device 100 can receive key input, and generate key signal input related to user settings and function control of the electronic device 100.
- the motor 191 can generate vibration prompts.
- the motor 191 can be used for incoming call vibration notification, and can also be used for touch vibration feedback.
- touch operations applied to different applications can correspond to different vibration feedback effects.
- for touch operations acting on different areas of the display screen 194, the motor 191 can also produce different vibration feedback effects.
- different application scenarios (for example, time reminders, receiving messages, alarm clocks, and games) can also correspond to different vibration feedback effects.
- the touch vibration feedback effect can also support customization.
- the indicator 192 may be an indicator light, which may be used to indicate a charging state, a change in power, and may also be used to indicate messages, missed calls, notifications, and the like.
- the SIM card interface 195 is used to connect to the SIM card.
- the SIM card can be inserted into the SIM card interface 195 or pulled out from the SIM card interface 195 to achieve contact and separation with the electronic device 100.
- the electronic device 100 may support one or N SIM card interfaces, and N is a positive integer greater than one.
- SIM card interface 195 can support Nano SIM card, Micro SIM card, SIM card and so on.
- multiple cards can be inserted into the same SIM card interface 195 at the same time. The types of the multiple cards can be the same or different.
- the SIM card interface 195 can also be compatible with different types of SIM cards.
- the SIM card interface 195 is also compatible with external memory cards.
- the electronic device 100 interacts with the network through the SIM card to realize functions such as call and data communication.
- the electronic device 100 adopts an eSIM, that is, an embedded SIM card.
- the eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100.
- the following mainly takes the electronic device 100 as a mobile phone as an example to describe the image grouping method provided in the embodiment of the present application.
- the above-mentioned reference image set, such as a video or an image group, usually records a continuous, dynamically changing process in time. Therefore, the reference image set often includes a series of face images of the same user in different forms during this dynamic process. The mobile phone can first track the face images in each reference image set to obtain face images of the same user that are temporally continuous and have different angles, different expressions, different decorations, and different hairstyles, automatically group the face images in each reference image set into one category, and obtain the reference image set clustering result. Then, according to the similarity between the facial features in the face pictures and the facial features in the reference image set clustering result, the face pictures are clustered, so that face pictures of different forms can also be clustered correctly, improving the clustering accuracy of face pictures.
- the reference image set clustering process and the face image clustering process can be automatically performed.
- if the mobile phone obtains a reference image set, it can automatically perform the reference image set clustering process; after detecting the user's instruction to classify portraits or to turn on the portrait classification function, the mobile phone clusters the face pictures stored on it according to the clustering result of the reference image set.
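The two-stage grouping idea described above can be sketched as follows: faces tracked within one reference set (e.g. one video track of the same user) are first merged into a category, and standalone face pictures are then assigned to the most similar category by feature similarity. The feature vectors, cosine-similarity measure, centroid averaging, and threshold below are all illustrative assumptions, not the embodiment's exact algorithm:

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def cluster_with_reference(tracks, pictures, threshold=0.8):
    """Cluster face pictures using tracked reference sets as prior information.

    tracks: list of tracks; each track is a list of feature vectors of the
            same user obtained by face tracking in one reference image set.
    pictures: list of (picture_id, feature_vector) pairs.
    """
    # Stage 1: each track forms one category, represented by its mean vector.
    categories = []
    for track in tracks:
        dim = len(track[0])
        centroid = [sum(f[i] for f in track) / len(track) for i in range(dim)]
        categories.append({"centroid": centroid, "pictures": []})
    # Stage 2: assign each picture to the most similar category, if any.
    unassigned = []
    for pic_id, feat in pictures:
        best = max(categories, key=lambda c: cosine(feat, c["centroid"]))
        if cosine(feat, best["centroid"]) >= threshold:
            best["pictures"].append(pic_id)
        else:
            unassigned.append(pic_id)
    return categories, unassigned
```

Because each track already spans different angles and expressions of one user, its centroid tolerates more variation than any single picture would, which is the source of the accuracy gain claimed above.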
- when the mobile phone detects that the user clicks the album icon 201 shown in (a) in FIG. 2, the mobile phone opens the album and displays the interface shown in (b) in FIG. 2. After the mobile phone detects that the user clicks the control 202 shown in (b) in FIG. 2, it displays the portrait classification control 203 shown in (c) in FIG. 2; after the mobile phone detects that the user clicks the control 203, it determines that the user's instruction to classify portraits, or to enable the portrait classification function, has been detected. Alternatively, the mobile phone displays the interface shown in (d) in FIG. 2 after opening the album.
- after the mobile phone detects that the user clicks the discovery control 204 shown in (d) in FIG. 2, it can display the portrait classification control 205, the detail control 206, and so on, shown in (e) in FIG. 2. After the mobile phone detects that the user clicks the control 206, it displays the portrait classification function description 207 shown in (f) in FIG. 2, so that the user can understand the specific content of the function. After the mobile phone detects that the user clicks the control 205, it can determine that the user's instruction to classify portraits, or to enable the portrait classification function, has been detected.
- after the mobile phone obtains the reference image set and detects the user's operation instructing portrait classification or instructing to turn on the portrait classification function, it performs reference image set clustering and face picture clustering.
- the user can also choose whether to classify the face pictures according to the reference image set.
- the mobile phone may display the interface shown in FIG. 3. If the mobile phone detects that the user clicks the control 302, it indicates that the user chooses to classify the face pictures according to the reference image set; if the mobile phone detects that the user clicks the control 301, it indicates that the user chooses not to use the reference image set and to classify portraits directly based on the face pictures. As another example, the user can instruct the mobile phone by voice or by a preset operation to classify the face pictures according to the reference image set.
- the user can also set the content of the reference image set.
- the mobile phone can set the reference image set to include content such as videos obtained by the mobile phone.
- the mobile phone may also prompt the user whether to cluster face pictures according to the reference image set.
- the mobile phone may prompt the user through a prompt box 501.
- after the album is opened on the mobile phone, or after detecting that the user chooses to cluster face pictures according to the reference image set, if the mobile phone has not obtained a reference image set, it may prompt the user to add one or more reference image sets that include face images of the users corresponding to the face pictures, so that the mobile phone can cluster the face pictures more accurately according to the reference image sets.
- the mobile phone can prompt the user to shoot (or download, copy) a video about the target user in the face picture through the prompt box 601.
- the mobile phone can prompt the user to let the target user play a game that can collect the face image of the target user, such as YOYO Xuanwu, and the mobile phone can record a video about the target user during the game.
- the mobile phone may prompt the user to add an image group, which may be multiple face pictures of the same user in different forms selected by the user from the face pictures. Then, the mobile phone can cluster the face pictures of the target user according to the acquired reference image sets, such as the video or the image group.
- a large number of face pictures and videos can be stored on the mobile phone.
- the face picture may be taken by the user through the camera of the mobile phone, or downloaded through the network or an App, or obtained through screenshots, or copied from other devices, or obtained by other means.
- the video can be a video taken by the user through the camera of a mobile phone, or a video downloaded through the network or an App, or a video saved during a video call, or a video copied from other devices, or a video obtained by other means.
- the video and face picture may include face images of the user or other users (such as relatives, friends, celebrities, etc.).
- the video records a continuous and dynamic process, so the video can often include multiple face images of the same user during the dynamic process.
- the mobile phone can first track the face images in the video to obtain face images of the same user that are temporally continuous and have different forms, such as different angles, different expressions, different decorations, and different hairstyles, and automatically cluster these face images into one category to obtain the video clustering result. Then, using the video clustering result as prior information, the face pictures are clustered based on the similarity between the facial features in the face pictures and the facial features in the video clustering result, so that face pictures of different forms can also be clustered correctly, improving the clustering accuracy of face pictures.
- the mobile phone stores video 1 and a large number of face pictures.
- the face picture includes face picture 1, face picture 2, face picture 3, and face picture 4.
- The mobile phone detects face 1 at time 1, where face 1 is the face on frontal face A, and continues to track face 1 during time period 1; the mobile phone detects face 2 at time 2, where face 2 is the face on smiling face D, and continues to track face 2 during time period 2; the mobile phone detects face 3 at time 3, where face 3 is the face on upward face G, and continues to track face 3 during time period 3.
- The face images tracked by the mobile phone in time period 1 include frontal face A, side face B, face C wearing sunglasses, etc.; the face images tracked in time period 2 include smiling face D, face E with closed eyes, face F with a funny expression, etc.; the face images tracked in time period 3 include upward face G and downward face H.
- the skin color model method detects human faces based on the relatively concentrated distribution of facial skin color in the color space.
- In the template method, one or several standard face templates are preset; the matching degree between the tested sample image and the standard template is then calculated, and a threshold is used to determine whether a face is present.
- Other detection methods include the eigenface (feature sub-face) method, the face rule method, the sample learning method, etc.
- There are also many face tracking methods.
- Model-based tracking methods may include the skin color model, ellipse model, texture model, and binocular template.
- The tracking method based on motion information mainly uses the continuity of target motion between consecutive frames to predict the face area and achieve fast tracking. Methods such as motion segmentation, optical flow, and stereo vision are usually used, and spatio-temporal gradients and Kalman filters are often used for tracking.
- tracking methods based on local features of human faces and tracking methods based on neural networks.
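- The idea of tracking a face across consecutive frames can be illustrated with a minimal sketch (not the patent's implementation): detected face boxes in adjacent frames are associated by intersection-over-union (IoU), so that boxes of the same face join the same track. The box format, function names, and the 0.5 threshold are all illustrative assumptions.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def track_faces(frames, iou_threshold=0.5):
    """frames: list of per-frame lists of face boxes.
    Returns a list of tracks; each track is a list of boxes for one face."""
    tracks = []  # each track's last element is its most recent box
    for boxes in frames:
        unmatched = list(boxes)
        for track in tracks:
            # greedily extend each track with the best-overlapping new box
            best = max(unmatched, key=lambda b: iou(track[-1], b), default=None)
            if best is not None and iou(track[-1], best) >= iou_threshold:
                track.append(best)
                unmatched.remove(best)
        # boxes that matched no existing track start new tracks
        tracks.extend([b] for b in unmatched)
    return tracks
```

A face that moves slightly between frames (high IoU) stays in one track, which is what lets the phone later cluster all of that track's face images into one category.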
- Based on the tracking results, the mobile phone can automatically cluster frontal face A, side face B, and face C wearing sunglasses in time period 1 into one category, for example, category 1; automatically cluster smiling face D, face E with closed eyes, and face F with a funny expression in time period 2 into one category, for example, category 2; and automatically cluster upward face G and downward face H in time period 3 into one category, for example, category 3.
- the categories here can also be called cluster centers. It is understandable that, for videos other than Video 1 stored on the mobile phone, the mobile phone may also use face detection and face tracking methods to cluster the face images in the video.
- The mobile phone can also cluster the face images across different tracking results. Specifically, the mobile phone can extract the facial features of the face images in different tracking results (before extracting the facial features, the mobile phone may also perform face correction on the face images, that is, convert face images at other angles into frontal face images). If a face image in one category has high similarity with a face image in another category, the two face images can be grouped into one category, and then all the face images in these two categories can be grouped into one category.
- For example, if the mobile phone determines that frontal face A in category 1 has a high similarity to upward face G in category 3, the two can be grouped into one category, and then frontal face A, side face B, face C wearing sunglasses, upward face G, and downward face H in categories 1 and 3 can all be grouped together.
- There are many face clustering methods for grouping different face images into one category, such as hierarchical clustering methods, partition-based clustering methods, density-based clustering methods, and grid-based clustering methods.
- Other methods include the model-based clustering method, the distance-based clustering method, the interconnection-based clustering method, etc.
- Specific clustering algorithms include the K-Means algorithm, DBSCAN algorithm, BIRCH algorithm, MeanShift algorithm, etc.
- the mobile phone can extract the facial features of different facial images, and perform clustering according to the similarity of the different facial features.
- face feature extraction can be understood as a process of mapping a face image to an n-dimensional vector (n is a positive integer), and the n-dimensional vector has the ability to characterize the face image.
- For example, when the face feature is a multi-dimensional vector, the similarity may be the distance between the multi-dimensional vectors corresponding to the face features of different faces.
- the distance may be Euclidean distance, Mahalanobis distance, Manhattan distance, etc.
- the similarity can be the cosine similarity, correlation coefficient, information entropy, etc. between the facial features of different faces.
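- As a sketch of the measures named above, the Euclidean distance and cosine similarity between two feature vectors can be computed as follows; the feature values are hypothetical, not taken from the patent.

```python
import math

def euclidean_distance(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Two hypothetical face features of the same person: similar vectors yield
# high cosine similarity and small Euclidean distance.
f1 = [0.9, 0.6, 0.5]
f2 = [0.8, 0.7, 0.5]
```

The other measures mentioned above (Mahalanobis distance, Manhattan distance, correlation coefficient, information entropy) would simply replace these functions without changing the clustering logic.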
- For example, when the face feature is a multi-dimensional vector, suppose face feature 1 of frontal face A in category 1 extracted by the mobile phone is [0.88, 0.64, 0.58, 0.11, ..., 0.04, 0.23], and face feature 2 of upward face G in category 3 is [0.68, 0.74, 0.88, 0.81, ..., 0.14, 0.53]. The similarity between face feature 1 and face feature 2 is measured by the cosine similarity between the corresponding multi-dimensional vectors.
- the cosine similarity is 0.96.
- That is, the similarity between face feature 1 and face feature 2 is 96%. Suppose the similarity threshold for clustering is 80%; since the similarity of 96% is greater than the threshold of 80%, frontal face A in category 1 and upward face G in category 3 can be grouped into one category, and all face images in category 1 and category 3 can be grouped into one category.
- For another example, the face feature is a multi-dimensional vector, and the correspondence between each face image and its face feature is shown in Table 1.
- Suppose the distance threshold for clustering is 5. If the Euclidean distance between face feature A of frontal face A in category 1 and face feature G of upward face G in category 3 (see Table 1) is less than the distance threshold of 5, then frontal face A in category 1 and upward face G in category 3 can be grouped into one category, and all face images in category 1 and category 3 can be grouped into one category.
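- This cross-category merging step can be sketched as follows (a hypothetical illustration, not the patent's exact procedure): two categories are merged whenever some pair of their face features is closer than the distance threshold (5 in the example above), and merging repeats until no such pair remains.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def merge_close_categories(categories, threshold=5.0):
    """categories: list of categories, each a list of feature vectors.
    Repeatedly merge any two categories that contain a feature pair
    closer than the threshold; return the merged categories."""
    cats = [list(c) for c in categories]
    merged = True
    while merged:
        merged = False
        for i in range(len(cats)):
            for j in range(i + 1, len(cats)):
                if any(euclidean(f, g) < threshold
                       for f in cats[i] for g in cats[j]):
                    cats[i].extend(cats.pop(j))  # fold category j into i
                    merged = True
                    break
            if merged:
                break
    return cats
```

Note that this single-link rule can chain merges: once two categories are folded together, the combined feature set is compared against the remaining categories again.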
- Then, based on the video clustering results (such as category 1, category 2, and category 3) and the extracted facial features of the face images in each category, the mobile phone can use an incremental clustering algorithm or another clustering algorithm to cluster face picture 1, face picture 2, face picture 3, and face picture 4, thereby fusing the face images in the video with the face pictures and clustering the face pictures into the previous video clustering results. For example, if the facial features of a face picture stored on the mobile phone are similar to the facial features of a face image in a certain category (such as category 1), for example, the similarity is greater than or equal to preset value 1, the face picture can be clustered into that category. In this way, the video clustering result can be expanded incrementally. If the similarity between the facial features of a certain face picture and the facial features of the face images in every category of the video clustering result is small (for example, less than preset value 1), the face picture is grouped into a new category.
- the correspondence between face pictures and face features can be found in Table 1 above.
- For example, suppose the clustering distance threshold is 5. If the Euclidean distance between face feature a of face picture 1 and face feature A of frontal face A is less than the distance threshold of 5, face picture 1 can be clustered into category 1 where frontal face A is located; if the Euclidean distance between face feature b of face picture 2 and face feature B of side face B is less than the distance threshold of 5, face picture 2 can be clustered into category 1 where side face B is located; if the Euclidean distance between face feature c of face picture 3 and face feature D of smiling face D is less than the distance threshold of 5, face picture 3 can be clustered into category 2 where smiling face D is located; and if the Euclidean distance between face feature d of face picture 4 and face feature G of upward face G is less than the distance threshold of 5, face picture 4 can be clustered into category 3 where upward face G is located.
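- The incremental step described above can be sketched as follows. The feature vectors and the threshold are hypothetical stand-ins for Table 1: each stored face picture joins the existing video category whose nearest face feature lies within the distance threshold, and otherwise starts a new category.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def assign_pictures(categories, picture_features, threshold=5.0):
    """categories: dict of category name -> list of feature vectors
    (the video clustering result); mutated in place.
    picture_features: dict of picture name -> feature vector."""
    for name, feat in picture_features.items():
        best_cat, best_d = None, threshold
        for cat, feats in categories.items():
            d = min(euclidean(feat, f) for f in feats)
            if d < best_d:  # closer than the threshold and any previous category
                best_cat, best_d = cat, d
        if best_cat is None:
            categories[f"new_{name}"] = [feat]  # no category close enough
        else:
            categories[best_cat].append(feat)   # extend the video category
    return categories
```

Because assigned pictures are appended to their category, later pictures can also match features contributed by earlier pictures, which is the incremental expansion the text describes.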
- For another example, suppose the range of the Euclidean distance between the face features corresponding to category 1 and a reference feature is [0, 50], and the range for the face features corresponding to category 3 is [100, 150]. If category 1 and category 3 are clustered into the same category 4, the range of the Euclidean distance between the face features corresponding to category 4 and the reference feature is [0, 50] ∪ [100, 150]. Since the face features of face picture 1, face picture 2, and face picture 4 all fall within [0, 50] ∪ [100, 150], face picture 1, face picture 2, and face picture 4 can all be clustered into category 4, that is, into the same category.
- a schematic diagram of the clustering effect can be seen in Figure 8A.
- In one implementation, the mobile phone can separately extract the facial features of all the face pictures and face images listed in Table 1, and then cluster them according to the similarity of the facial features.
- In this way, the mobile phone can group multiple face images of the same user in different forms into the same category, so that face pictures similar to the different-form face images in the category are also clustered into this category. Compared with the prior art, this reduces the dispersion of clustering, improves the accuracy of face clustering, facilitates management by the user, and improves the user experience.
- In contrast, suppose the similarity between face features is measured by Euclidean distance and the clustering distance threshold is 5. Without the video clustering result, if face picture 1, face picture 2, face picture 3, and face picture 4 are clustered directly, then since the Euclidean distance between the facial features of every two face pictures is greater than the distance threshold of 5, no two pictures can be clustered into one category, and each face picture ends up in its own category.
- the clustering effect diagram can be seen in FIG. 8B. Compared with FIG. 8A, the face clustering shown in FIG. 8B has a greater degree of dispersion and a lower clustering accuracy, which causes problems such as false positives or false negatives in the clustering results.
- the mobile phone may also perform identity marking on the face in the video according to the video clustering result.
- the above description mainly takes the reference image set as the video as an example.
- When the reference image set is the image group mentioned above (for example, an image group formed by moving pictures, or the image group corresponding to the multi-frame images obtained during shooting preview), or when the reference image set includes both the video and the aforementioned image group, the mobile phone can still perform clustering in a manner similar to the video processing process, which will not be repeated here.
- When the reference image set is an image group preset by the user, the images in the image group are usually different face images of the same user set by the user. Therefore, the mobile phone does not need to perform face detection and tracking, and can directly and automatically group the face images in the image group into one category.
- the mobile phone can also cluster the newly added face pictures according to the clustering result of the reference image set.
- the mobile phone can extend the newly added face pictures to the previous clustering results through incremental clustering.
- After the mobile phone clusters face picture 1, face picture 2, face picture 3, and face picture 4 according to the reference image set clustering result, if the mobile phone subsequently obtains a new reference image set (such as video 2), then in one solution, the mobile phone performs face detection, tracking, and clustering on both the previous reference image set and the new reference image set, and clusters face picture 1, face picture 2, face picture 3, and face picture 4 according to the resulting clustering result; in another solution, the mobile phone does not immediately re-cluster face picture 1, face picture 2, face picture 3, and face picture 4, and re-clusters only after detecting a user operation instructing portrait classification.
- In another solution, regardless of whether the mobile phone acquires a new reference image set, the mobile phone periodically performs face detection, tracking, and clustering on the currently acquired reference image set, and clusters the currently stored face pictures based on the reference image set clustering result.
- In another solution, after the mobile phone detects the user's instruction to classify portraits, it performs face detection, tracking, and clustering on the currently acquired reference image set, and clusters the face pictures according to the reference image set clustering result.
- In addition, the mobile phone can perform face detection, tracking, and clustering on the currently acquired reference image set, and cluster the currently stored face pictures, during a preset time period (for example, 00:00-6:00 at night), in an idle state (for example, when the mobile phone is not performing other services), or when the mobile phone is charging and the battery level is greater than or equal to preset value 2.
- the mobile phone can display the clustering results.
- For example, the mobile phone can display the clustering results in groups (for example, folders).
- The following description takes as an example the case where the reference image set is still video 1, category 1 and category 3 are clustered into category 4, and the face pictures stored on the mobile phone include face picture 1, face picture 2, face picture 3, and face picture 4.
- the mobile phone may display the video clustering result.
- On the video portrait classification interface (that is, the video clustering result interface), the mobile phone can display group 1 corresponding to category 4 and group 2 corresponding to category 2.
- the group corresponding to each cluster category may include the video where the face image of the category is located.
- group 1 corresponding to category 4 and group 2 corresponding to category 2 both include video 1.
- the cover image displayed by the thumbnail of the video in the group may be a face image belonging to the category in the video.
- The cover image may be a relatively frontal face image, or an image designated by the user.
- For example, the mobile phone can put the video into the groups corresponding to all categories of the face images appearing in the video; or, when the face image of a certain category appears in the video for a duration greater than or equal to a preset duration, the mobile phone puts the video into the group corresponding to that category; or, when the number of frames containing the face image of a certain category in the video is greater than or equal to preset value 3, the mobile phone puts the video into the group corresponding to that category; or, when a frontal face image of a certain category appears in the video, the mobile phone puts the video into the group corresponding to that category.
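- The alternative rules above can be combined into a single check, sketched below. The statistics structure, field names, and threshold values are illustrative assumptions, not taken from the patent.

```python
def video_belongs_to_group(stats, min_duration=3.0, min_frames=30):
    """Decide whether a video joins a category's group.
    stats describes one category's appearance in the video:
      'duration'    - seconds the category's face is on screen
      'frames'      - number of frames containing the category's face
      'has_frontal' - whether a frontal face of the category appears."""
    return (stats["duration"] >= min_duration
            or stats["frames"] >= min_frames
            or stats["has_frontal"])
```

A phone could evaluate this once per (video, category) pair and place the video into every group whose check passes.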
- the group corresponding to each cluster category may include a video segment where the face image of the category is located, and the video segment of the face image of the category appears.
- For example, group 1 corresponding to category 4 may be group 1A in FIG. 9B, and group 1A may include video segment 1 corresponding to time period 1 in video 1 and video segment 3 corresponding to time period 3; in addition, group 2 corresponding to category 2 may be group 2A, and group 2A may include video segment 2 corresponding to time period 2 in video 1.
- the group corresponding to each cluster category may include the face image frame of the category in the video where the face image of the category is located.
- For example, group 1 corresponding to category 4 may be group 1B in FIG. 9C, and group 1B may include frontal face A, side face B, face C wearing sunglasses, upward face G, and downward face H.
- the group 2 corresponding to category 2 can be group 2B, and group 2B can include smiling face D, face E with closed eyes, and face F with funny expressions.
- the group corresponding to the same category may include multiple sub-groups, and the face image frames belonging to the category in the same video belong to the same sub-group.
- the group corresponding to each cluster category may include the video segment in which the face image of the category is located in the video where the face image of the category appears, and the face image frame of the category.
- In this way, the mobile phone can display the video clustering result, which makes it convenient for users to categorize and manage videos based on the face images in them, improves the efficiency of searching for and managing videos, and improves the user experience.
- Alternatively, the mobile phone may not display the video clustering result immediately, and instead display the clustering result only after face picture clustering is completed.
- the mobile phone may display the face image clustering result.
- the group corresponding to each cluster category includes the face pictures of that category.
- For example, the mobile phone can display group 3 corresponding to category 4 and group 4 corresponding to category 2, where group 3 includes face picture 1, face picture 2, and face picture 4, and group 4 includes face picture 3.
- the mobile phone can display the video clustering result and the face image clustering result.
- the video clustering result and the face image clustering result can be displayed in different groups respectively, or can be combined and displayed in the same group.
- When the video clustering result and the face picture clustering result are displayed in different groups, the video clustering result can be displayed in group 5, and the face picture clustering result can be displayed in group 6.
- The content in group 5 may be the video clustering result described above (for example, as shown in FIGS. 9A-9C), and the content in group 6 may be the face picture clustering result described above (for example, as shown in (a)-(c) in FIG. 10).
- When combined and displayed in the same group, the group corresponding to each clustering category may include both the face picture clustering result and the video clustering result described above.
- category 4 corresponds to group 7, and category 2 corresponds to group 8.
- For example, the group corresponding to each cluster category may include the face pictures of the category and the video where the face images of the category are located.
- Group 7 corresponding to category 4 is group 7A, and group 7A includes face picture 1, face picture 2, face picture 4, and video 1; referring to (c) in FIG. 11, group 8 corresponding to category 2 is group 8A, and group 8A includes face picture 3 and video 1.
- The cover image of the group can be a face picture of this category, or a face image of this category in the video.
- the cover image of video 1 in group 7 and the cover image of video 1 in group 8 may be the same or different.
- the cover image of Video 1 may be a face image included in the category corresponding to the group.
- the face image of this category may belong to one sub-group, and the video where the face image of this category is located may belong to another sub-group.
- group 7 corresponding to category 4 is group 7B, and group 7B includes sub-group 7-1 corresponding to face pictures and sub-group 7-2 corresponding to videos. See (b) in Figure 12, sub-group 7-1 includes face picture 1, face picture 2 and face picture 4; see (c) in Figure 12, sub-group 7-2 includes video 1.
- the group corresponding to each cluster category may include the face image of the category, and the video segment in which the face image of the category appears in the video where the face image of the category is located.
- For example, group 7 corresponding to category 4 is group 7C, and group 7C includes face picture 1, face picture 2, face picture 4, video segment 1, and video segment 3; group 8 corresponding to category 2 is group 8C, and group 8C includes face picture 3 and video segment 2.
- the face pictures of this category may belong to one sub-group, and the video segment may belong to another sub-group.
- Alternatively, the group corresponding to each cluster category may include the face pictures of the category and the image frames captured or selected from the video where the face images of the category are located.
- For example, group 7 corresponding to category 4 is group 7D, and group 7D includes face picture 1, face picture 2, face picture 4, and face images A, B, C, G, H in video 1; group 8 corresponding to category 2 is group 8D, and group 8D includes face picture 3 and face images D, E, F in video 1.
- the face pictures of this category may belong to one subgroup; in the video where the face images of this category are located, the face image frames of this category may belong to another subgroup.
- Alternatively, the group corresponding to each cluster category may include face pictures of that category, and may also include one or more of: the video where the face images of the category are located; the video segments, in that video, in which the face images of the category appear; and the captured or selected image frames.
- the face pictures of this category and the face image frames of this category may belong to one subgroup; the video or video segment corresponding to this category may belong to another subgroup.
- the face pictures of this category may belong to one subgroup, and the face image frames and videos or video segments of this category may belong to another subgroup.
- the face images of this category, the face image frames, videos, and video segments of this category belong to different subgroups respectively.
- the mobile phone may display the face picture clustering result, and determine whether to display the video clustering result according to the user's instruction.
- the name of the group may be a name manually input by the user; or it may be a name obtained by the mobile phone itself through learning.
- the mobile phone can determine the user identity in the picture, such as father, mother, wife (or husband), son, daughter, etc., according to the actions and intimacy between users in the picture or video, and set the user identity as the group name.
- When the mobile phone displays the face picture clustering result for the first time (or every time), it may also prompt the user that the face picture clustering result is obtained by classifying the face pictures according to a reference image set such as a video.
- the mobile phone may prompt the user by displaying information 1501, so that the user can learn the portrait classification function of the mobile phone.
- Based on the clustering results, the mobile phone can comprehensively manage and display the face pictures and videos, improving the efficiency with which the user finds and manages them and improving the user experience.
- The user can actively add a reference image set corresponding to a user in the face pictures. For example, if the clustering result of face picture 5 is wrong, see (a) in FIG. 16: the user can click control 1601, or click control 1601 after selecting face picture 5; then, the user can add a reference image set through control 1602 shown in (b) in FIG. 16. Alternatively, the user can add a reference image set through voice, a preset gesture, or the like.
- The reference image set may be a video or a set of images captured by the user in real time, or a set of images otherwise obtained by the user through the mobile phone, and the image set includes face images, in different forms, of the user corresponding to the incorrectly clustered face picture.
- For example, the reference image set may be the image set shown in (a)-(h) in FIG. 17. After the reference image set is added, the mobile phone can re-cluster the incorrectly clustered face pictures in combination with the reference image set added by the user, or re-cluster all face pictures stored on the phone.
- The clustering method described in the above embodiment classifies the face pictures of different users according to the similarity of facial features, so the groups corresponding to different clustering categories can also be understood as the groups corresponding to different users.
- the groups corresponding to different cluster categories displayed on the mobile phone may correspond to different priorities.
- The users corresponding to high-priority groups may be the users the user is more concerned about.
- For example, the mobile phone can determine that the users who appear most frequently in the saved face pictures and videos are the users the user cares most about, and the groups corresponding to these users have the highest priority.
- For another example, the mobile phone can determine that groups corresponding to users who have high intimacy with the mobile phone user have higher priority.
- Specifically, the mobile phone can use an emotion analysis algorithm, based on factors such as the intimacy of actions between different users, the expressions of different users, the frequency with which different users appear in videos and face pictures, and the positions of different users in videos and face pictures, to determine the intimacy between different users and the mobile phone user. Users with higher intimacy are the users the user cares more about, and the groups corresponding to these users have higher priority.
- For another example, relatives are usually the users the user cares more about, and the user prefers that the groups corresponding to relatives be displayed first; the mobile phone can therefore determine that groups corresponding to users whose face information is close to that of the mobile phone user have higher priority.
- groups with high priority may be displayed first.
- the mobile phone can display the high-priority groups on the top of the portrait classification interface, and the low-priority groups need to be viewed by the user by sliding up or switching pages.
- Alternatively, the mobile phone may display only the top N (N is a positive integer) groups with the highest priority on the portrait classification interface, and may not display the groups corresponding to other users that the user does not care about.
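- A minimal sketch of the frequency-based priority described above (group names and counts are hypothetical): count how often each user appears across the saved pictures and videos, then keep only the N highest-priority groups for display.

```python
from collections import Counter

def top_groups(appearances, n=3):
    """appearances: list of group (user) names, one entry per occurrence of
    that user in a saved picture or video. Returns the n most frequent
    groups, in descending order of appearance count."""
    return [group for group, _ in Counter(appearances).most_common(n)]
```

The intimacy-based or face-similarity-based priorities mentioned above would replace the raw counts with a different score, but the ranking and top-N display would work the same way.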
- In another example, a certain user may be a user the user is more concerned about, and the mobile phone may display the group corresponding to that user on the classification interface.
- a group photo of a certain user and another user in the face picture may be in the group corresponding to the certain user, and may also be in the group where the other user is located.
- For example, face picture 6 is a group photo of user 1 and user 2; referring to (b) in FIG. 18, face picture 6 is in both the group corresponding to user 1 and the group corresponding to user 2.
- Alternatively, the groups corresponding to different users include only single-person photos of each user, and group photos of multiple users are displayed additionally or placed in another group.
- the mobile phone can also mark the face in the picture according to the clustering result.
- The mobile phone may also display the clustering results in a personalized manner. For example, for pictures in a group, when the mobile phone detects the user's operation instructing color retention, the mobile phone can retain the area indicated by the user or a preset area in the grouped picture as a color image, while other areas of the picture become a grayscale image.
- For example, when the preset area is the area where the user is located, the mobile phone may retain the color of the image in the area where the user is located, while the images in other areas become grayscale images.
- For another example, the image of the area where the target user is located is retained, and the images of other areas disappear, that is, the other areas may be blank, black, gray, or another preset color.
- the mobile phone may also generate a protagonist story.
- The protagonist story may include a series of multiple images of a certain user.
- the images in the protagonist's story are images of the same category. Specifically, they can be images in the reference image set (for example, they can include video segments in a video or face image frames in a video), or they can be images in a face picture.
- In this way, the mobile phone can not only extract face pictures to edit the protagonist story, but also combine face images in reference image sets such as videos, so that the sources of the protagonist's images are wider and the protagonist story is more vivid, interesting, and colorful.
- the above description has taken the video as the reference image set as an example.
- When the reference image set is another kind of reference image set (for example, the image group captured by the mobile phone shown in (a)-(f) in FIG. 20), the face pictures can still be clustered according to that reference image set in the manner described in the above embodiment, which will not be repeated here.
- the above description is based on an example of a human face as a classification object.
- When the classification object is another object, the clustering method provided in this embodiment of the present application can still be used to cluster the pictures on the mobile phone.
- the user can also set the classification objects that the mobile phone can cluster.
- The classification objects may be animal faces (such as the face of a dog or a cat), objects (such as a house, a car, a mobile phone, a water cup, etc.), logo marks (such as the Olympic rings logo), etc.
- For example, taking a house as the classification object, the mobile phone may first cluster, in the manner described in the above embodiments (for example, tracking and automatic clustering), the reference image sets of houses acquired at different angles, in different directions and positions, under different brightness, and in different scenes, and then cluster the pictures of houses stored on the mobile phone according to the reference image set clustering results. In this way, pictures of houses of different appearances are clustered more accurately, which makes it convenient for users to find and manage pictures of houses.
- The clustering results displayed by the mobile phone can include both the grouping of face pictures and the grouping of other classification objects; in other words, the mobile phone can perform clustering and grouping according to different entities.
- the mobile phone can display in the clustering results the groups corresponding to the faces of different users (such as user 1 and user 2), and different dogs (such as Dog 1) The corresponding grouping, and the corresponding grouping of different houses (such as house 1).
- the clustering result may include a group 9 corresponding to a human face, a group 10 corresponding to a dog, and a group 11 corresponding to a house.
- group 9 may include sub-groups corresponding to different users (for example, user 1 and user 2), group 10 may include sub-groups corresponding to different dogs, and group 11 may include sub-groups corresponding to different houses.
- the sub-groups may include image clustering results, or include image clustering results and reference image set clustering results, which will not be described in detail here.
- the user can also select the classification result of the classification object currently to be displayed.
- after the mobile phone detects that the user taps the control 2201 shown in (a) in Figure 22, it can display the interface shown in (b) in Figure 22; after the mobile phone detects that the user taps the control 2202 shown in (b) in Figure 22, it can display the interface shown in (c) in Figure 22. Then, when the user selects the portrait category, the mobile phone displays only faces; when the user selects the dog category, the mobile phone displays only dogs; when the user selects the house category, the mobile phone displays only houses; when the user selects another classification object, the mobile phone displays the clustering results of that classification object. It should be noted that there can be multiple ways for the user to select the clustering results of the classification objects to be displayed, which is not limited to the example shown in FIG. 22.
- another embodiment of the present application provides a method for grouping pictures, which can be implemented in an electronic device having the hardware structure shown in FIG. 1. At least one face picture is saved on the electronic device. As shown in Figure 23, the method may include:
- the electronic device acquires at least one video.
- the at least one video obtained by the electronic device may together include multiple face image frames, and each individual video may also include multiple face image frames.
- the at least one face picture saved on the electronic device is a static picture previously taken by the user, or a static picture obtained by the electronic device through downloading, copying, or the like.
- the at least one face picture may be face picture 1-face picture 4 shown in FIG. 8A.
- the storage area of the electronic device stores at least one video
- the electronic device obtains at least one video from the storage area.
- the video stored in the storage area may be taken by the user before, downloaded by the electronic device, or obtained by the electronic device during the running of the application program.
- the electronic device may prompt the user to shoot a video including face image frames, and after detecting the user's instruction to shoot the video, record and generate at least one video.
- the electronic device prompts the user to download at least one video, and the downloaded video is obtained after the user instructs to download.
- the at least one video acquired by the electronic device may include video 1 shown in FIG. 7B.
- the electronic device extracts multiple face image frames from at least one video.
- after acquiring the at least one video, the electronic device can extract multiple face image frames from the at least one video, so that the face pictures can subsequently be grouped according to the extracted face image frames.
- the video acquired by the electronic device includes the video 1 shown in FIG. 7B
- the face image frame extracted by the electronic device from the video 1 may be the face image frame A-face image frame H in FIG. 7B.
- the electronic device may also extract a face image frame from at least one video, so that the face images may be grouped according to the extracted face image frame later.
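The frame-extraction step above (step 2302) can be pictured with a minimal sketch. This is purely illustrative and not part of the patent's disclosure: `sample_face_frames`, the dict-based toy frames, and the `has_face` predicate are invented stand-ins for a real video decoder and face detector.

```python
# Hypothetical sketch of step 2302: sample every `stride`-th frame of a video
# and keep only frames in which a face is detected. `has_face` stands in for
# a real face detector; frames are modeled as plain dicts.

def sample_face_frames(frames, stride=5, has_face=lambda f: f.get("face") is not None):
    """Return every `stride`-th frame that contains a detected face."""
    return [f for i, f in enumerate(frames) if i % stride == 0 and has_face(f)]

# Toy "video": 20 frames, a face appears from frame 6 onward.
video = [{"idx": i, "face": "user1" if i >= 6 else None} for i in range(20)]
face_frames = sample_face_frames(video, stride=5)
print([f["idx"] for f in face_frames])  # [10, 15]
```

In practice the stride and detector would be chosen to balance coverage of the user's different face forms against processing cost.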
- the electronic device performs clustering processing on at least one face image according to multiple face image frames.
- the electronic device may perform clustering processing on the face picture 1-the face picture 4 according to the extracted face image frame A-face image frame H.
- there may be multiple applicable clustering algorithms.
- the electronic device displays at least one group according to the clustering result, and each group includes at least one face picture of one user.
- each group obtained by the clustering process may include at least one face picture of one user; that is, a group may include at least one face picture of the same user, and the face pictures of the same user can be placed in the same group.
- the electronic device may use the multiple face image frames in the at least one video as prior information and cluster the face pictures according to those frames, so that the face pictures are grouped by user, face pictures of the same user are clustered into the same group, and the accuracy of face picture grouping is improved.
- At least one face picture included in a group may be a face picture of the same user determined by the electronic device.
- the electronic device may calculate the similarity between the facial features in the face pictures, and determine that different face pictures whose similarity is greater than or equal to a first preset value are face pictures of the same user.
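The similarity comparison in the bullet above can be sketched as follows, assuming face pictures have already been reduced to feature vectors. The patent does not specify the feature extractor, the metric, or the grouping strategy; cosine similarity and the greedy first-match grouping below are illustrative assumptions.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def group_pictures(features, threshold=0.8):
    """Greedily group pictures: a picture joins the first group whose
    representative (first member) is at least `threshold` similar,
    mirroring the "first preset value" comparison described above."""
    groups = []
    for pic, feat in features.items():
        for g in groups:
            if cosine_similarity(feat, features[g[0]]) >= threshold:
                g.append(pic)
                break
        else:
            groups.append([pic])
    return groups

# Toy 2-D features: p1 and p2 point the same way (same user), p3 does not.
feats = {"p1": [1.0, 0.0], "p2": [0.9, 0.1], "p3": [0.0, 1.0]}
print(group_pictures(feats))  # [['p1', 'p2'], ['p3']]
```

A production system would compare against high-dimensional embeddings from a face recognition network rather than hand-made 2-D vectors.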
- the resulting groups may be group 3 shown in (b) in FIG. 10 and group 4 shown in (c) in FIG. 10.
- Group 3 includes the face picture of user 1
- group 4 includes the face picture of user 2.
- each group may further include any one or any combination of the following: the video in which the user's face image frames appear, the video segment in which the user's face image frames appear, or at least one face image frame of the user. That is, the electronic device can group face pictures, videos, video segments, and face image frames by user, and manage a user's videos and pictures in a unified or joint manner, which makes it convenient for users to find and manage them and improves the user experience.
- the group 7A corresponding to user 1 includes the face picture of the user and the video 1 where the face image frame of the user 1 is located.
- the group 7C corresponding to user 1 includes the user's face picture and the video segment where the user's face image frame is located.
- the group 7D corresponding to the user 1 includes the face picture of the user and multiple face image frames of the user.
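One way to picture the per-user group described in the bullets above is as a record holding the user's pictures alongside the related videos, segments, and frames. This dataclass is purely an illustrative assumption; the patent does not prescribe any particular data structure.

```python
from dataclasses import dataclass, field

@dataclass
class UserGroup:
    """Hypothetical per-user group: face pictures plus the videos, video
    segments, and face image frames of the same user, managed jointly."""
    user: str
    face_pictures: list = field(default_factory=list)
    videos: list = field(default_factory=list)
    video_segments: list = field(default_factory=list)
    face_frames: list = field(default_factory=list)

# Mirrors the group 7A example: user 1's picture together with video 1.
group_7a = UserGroup(user="user 1")
group_7a.face_pictures.append("face picture of user 1")
group_7a.videos.append("video 1")
```

`field(default_factory=list)` gives each group its own empty lists, so adding a video to one user's group never leaks into another's.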
- At least one face picture of a user included in each group is a single photo or a group photo.
- group 3 in FIG. 10 includes a single photo of user 1
- group 4 shown in (c) in FIG. 10 includes a single photo of user 2.
- Group 9 shown in (a) in FIG. 18 includes the single photo and group photo of user 1, and group 10 shown in (b) in FIG. 18 includes the single photo and group photo of user 2.
- the foregoing step 2303 may specifically include:
- the electronic device divides the multiple face image frames into at least one category, and each category corresponds to multiple face image frames of different forms of a user.
- the electronic device may divide face image frames A-C into category 1, where category 1 includes multiple face image frames of user 1 in different forms; divide face image frames D-F into category 2, where category 2 includes multiple face image frames of user 2 in different forms; and divide face image frames G-H into category 3, where category 3 includes multiple face image frames of user 1 in different forms.
- the electronic device performs clustering processing on at least one face image according to the classification results of multiple face image frames.
- the electronic device may perform clustering processing on the face pictures 1-4 according to category 1, category 2, and category 3 shown in FIG. 8A.
- the electronic device may group the face pictures and the divided categories into a group according to the classification results, or divide the face pictures into a new group.
- the face image frames in a video usually change dynamically and may include face images of different forms.
- therefore, the electronic device can accurately group face pictures of different forms, such as different face angles and expressions, according to the different forms of face images of different users in the different categories, improving the accuracy of grouping.
- the above step 2303A may specifically include: the electronic device separately classifies the face image frames in each video into at least one category.
- adjacent image frames in the same video have temporal continuity
- multiple face image frames of the same user with temporal continuity in the video can be classified into one category.
- the face image frames of the same user with temporal continuity can usually be adjacent face image frames.
- the face images in the same video tracked by the electronic device through the face tracking algorithm have temporal continuity, meet the must-link constraint, are the faces of the same user, and can be classified into the same category. Therefore, the electronic device can separately classify multiple face image frames of the same user with temporal continuity in each video into the same category through the face tracking algorithm. In this way, the face image frames of multiple users in the same video can correspond to multiple categories.
- the result of the electronic device classifying the face image frame in the video 1 may be category 1, category 2, and category 3 shown in FIG. 8A.
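As a minimal sketch of the per-video division just described, suppose a face tracker has already attached a track identity to each detected face frame; frames sharing a track have temporal continuity (the must-link constraint) and fall into one category. The `track_id` labels below are invented for illustration.

```python
def categorize_by_track(face_frames):
    """Group a video's face image frames by tracker identity.

    `face_frames` is a time-ordered list of (frame_index, track_id) pairs;
    frames sharing a track_id satisfy the must-link constraint and are
    placed in the same category.
    """
    categories = {}
    for frame_idx, track_id in face_frames:
        categories.setdefault(track_id, []).append(frame_idx)
    return categories

# Mirroring the FIG. 8A example: frames A-C, D-F, and G-H as three tracks.
video1 = [(0, "t1"), (1, "t1"), (2, "t1"),
          (3, "t2"), (4, "t2"), (5, "t2"),
          (6, "t3"), (7, "t3")]
print(categorize_by_track(video1))  # {'t1': [0, 1, 2], 't2': [3, 4, 5], 't3': [6, 7]}
```

Note that tracks from different videos stay separate at this stage; merging them is the cross-video step discussed next.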
- the above step 2303A may specifically further include: if the similarity between the facial features of a first face image frame in a first category of the at least one category and the facial features of a second face image frame in a second category is greater than or equal to a second preset value, the electronic device may merge the first category and the second category into the same category.
- in this case, the two face image frames are generally face image frames of the same user, and the categories to which the two face image frames belong are also categories of the same user.
- the electronic device can merge the categories of the two face image frames into the same category.
- the electronic device can first divide the face image frames in the same video into categories, and then merge the categories containing highly similar face image frames from different videos, so that the face image frames of the same user in different videos are merged into the same category.
- for example, the electronic device merges category 1 and category 3 into category 4.
- the electronic device may perform clustering processing on at least one face image saved (acquired) by the electronic device according to category 2 and category 4.
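The cross-video merge above (categories 1 and 3 collapsing into category 4) can be sketched as follows, assuming each category is summarized by one representative feature vector and compared with cosine similarity against the "second preset value". Both choices are illustrative assumptions, not the patent's specified method.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def merge_categories(cat_features, threshold=0.85):
    """Merge categories whose representative features are at least
    `threshold` similar, so the same user's categories from different
    videos collapse into one."""
    merged = []
    for name, feat in cat_features.items():
        for group in merged:
            if cosine_similarity(feat, group["feat"]) >= threshold:
                group["members"].append(name)
                break
        else:
            merged.append({"feat": feat, "members": [name]})
    return [g["members"] for g in merged]

# Category 1 and category 3 (user 1 in two videos) merge; category 2 stays.
cats = {"category 1": [1.0, 0.0], "category 2": [0.0, 1.0], "category 3": [0.95, 0.05]}
print(merge_categories(cats))  # [['category 1', 'category 3'], ['category 2']]
```

A representative could be the mean embedding of a category's frames; comparing single frame pairs, as the patent's wording allows, works the same way with more comparisons.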
- the method may further include:
- the electronic device acquires at least one image group, and each image group includes multiple image frames of the same user in different forms.
- the at least one image group includes any one or a combination of any of the following: a moving image, a pre-photographed image group including different forms of the same user's face, an image formed by multiple frames of images collected in real time during the shooting preview Group, or an image group formed by multiple frames of images taken during continuous shooting.
- step 2302 may specifically include: the electronic device extracts multiple face image frames from at least one video and at least one image group.
- the image group in step 2305 and the video in step 2301 may be the reference image sets described in the foregoing embodiments of this application. That is, the electronic device can obtain multiple face image frames of the same user in different poses from one or more reference image sets, so that it can accurately group the face pictures according to those frames and reduce the dispersion of the clusters.
- an electronic device includes hardware structures and/or software modules corresponding to each function. With reference to the algorithm steps of the examples described in the embodiments disclosed herein, the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is executed by hardware or by computer-software-driven hardware depends on the specific application and the design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application in combination with the embodiments, but such implementations should not be considered beyond the scope of the present application.
- the embodiment of the present application may divide the electronic device into functional modules according to the foregoing method examples.
- each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module.
- the above integrated modules can be implemented in the form of hardware. It should be noted that the division of modules in the embodiments of the present application is illustrative, and is only a logical function division, and there may be other division methods in actual implementation.
- FIG. 24 shows a schematic diagram of a possible composition of the electronic device 2400 involved in the foregoing embodiment.
- the electronic device 2400 may include: an acquiring unit 2401, extraction unit 2402, clustering unit 2403, display unit 2404, and so on.
- the obtaining unit 2401 may be used to support the electronic device 2400 to perform the foregoing step 2301, and/or other processes used in the technology described herein.
- the extracting unit 2402 may be used to support the electronic device 2400 to perform the foregoing step 2302, etc., and/or used in other processes of the technology described herein.
- the clustering unit 2403 may be used to support the electronic device 2400 to perform the above steps 2303, 2303A, 2303B, etc., and/or other processes of the technology described herein.
- the display unit 2404 may be used to support the electronic device 2400 to perform the above-mentioned steps 2304, etc., and/or used in other processes of the technology described herein.
- the electronic device provided in the embodiment of the present application is used to perform the above-mentioned grouping method for pictures, and therefore can achieve the same effect as the above-mentioned implementation method.
- the electronic device may include a processing module and a storage module.
- the processing module can be used to control and manage the actions of the electronic device. For example, it can be used to support the electronic device to execute the steps performed by the above-mentioned obtaining unit 2401, extraction unit 2402, clustering unit 2403, and display unit 2404.
- the storage module can be used to support electronic devices to store reference image sets such as face pictures and videos, moving pictures, and to store program codes and data.
- the electronic device may also include a communication module, which may be used to support communication between the electronic device and other devices.
- the processing module may be a processor or a controller. It can implement or execute various exemplary logical blocks, modules, and circuits described in conjunction with the disclosure of this application.
- the processor may also be a combination of computing functions, for example, a combination of one or more microprocessors, a combination of digital signal processing (DSP) and a microprocessor, etc.
- the storage module may be a memory.
- the communication module may specifically be a radio frequency circuit, a Bluetooth chip, a Wi-Fi chip, or another device that interacts with other electronic devices.
- the electronic device involved in the embodiment of the present application may be an electronic device having the structure shown in FIG. 1.
- the internal memory 121 shown in FIG. 1 may store computer program instructions, and when the instructions are executed by the processor 110, the electronic device can execute: acquire at least one video; extract multiple face image frames from the at least one video; Perform clustering processing on at least one face picture according to multiple face image frames; and according to the clustering processing result, display at least one group, and each group includes at least one face picture of a user.
- the electronic device can specifically execute: dividing the multiple face image frames into at least one category, each category corresponding to multiple face image frames of one user in different forms; performing clustering processing on the at least one face picture according to the category division result of the multiple face image frames; and the other steps in the foregoing method embodiments.
- the embodiment of the present application also provides a computer storage medium, the computer storage medium stores computer instructions, when the computer instructions run on the electronic device, the electronic device executes the above-mentioned related method steps to implement the picture grouping method in the above-mentioned embodiment .
- the embodiments of the present application also provide a computer program product.
- the computer program product runs on a computer, the computer is caused to execute the above-mentioned related steps, so as to realize the picture grouping method in the above-mentioned embodiment.
- the embodiments of the present application also provide a device.
- the device may specifically be a chip, component or module.
- the device may include a connected processor and a memory; where the memory is used to store computer execution instructions, and when the device is running, The processor can execute the computer-executable instructions stored in the memory, so that the chip executes the picture grouping methods in the foregoing method embodiments.
- the electronic device, computer storage medium, computer program product, or chip provided in the embodiments of the present application are all used to execute the corresponding methods provided above; therefore, for the beneficial effects they can achieve, refer to the beneficial effects of the corresponding methods provided above, which are not repeated here.
- the disclosed device and method can be implemented in other ways.
- the device embodiments described above are merely illustrative. For example, the division into modules or units is only a logical function division, and there may be other division methods in actual implementation; for example, multiple units or components may be combined or integrated into another device, or some features may be ignored or not implemented.
- the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
- the units described as separate parts may or may not be physically separate.
- the parts displayed as a unit may be one physical unit or multiple physical units, that is, they may be located in one place or distributed to multiple different places. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
- each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
- the above-mentioned integrated unit can be realized in the form of hardware or software functional unit.
- the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a readable storage medium.
- the technical solutions of the embodiments of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, can be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application.
- the aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media that can store program code.
Abstract
A picture grouping method and device, relating to the technical field of electronics, used for grouping face pictures and capable of clustering the face pictures on an electronic device according to face images of different forms in a reference image set obtained by the electronic device, improving clustering accuracy. The method comprises: an electronic device obtains at least one video; extracts a plurality of face image frames from the at least one video; performs, according to the plurality of face image frames, clustering processing on at least one face picture obtained by the electronic device; and displays at least one group according to the clustering result, each group comprising at least one face picture of one user.
Description
Picture grouping method and device. This application claims priority to the Chinese patent application filed with the China National Intellectual Property Administration on February 27, 2019, with application number 201910147299.6 and entitled "Picture grouping method and device", the entire content of which is incorporated herein by reference.

Technical Field

The embodiments of the present application relate to the field of electronic technology, and in particular to a picture grouping method and device.

Background

With the continuous development of terminal technology, users take more and more pictures with mobile phones and other terminal devices; some users even store thousands of pictures on their mobile phones. Manually searching for a target picture among a large number of pictures, and classifying and managing a large number of pictures, often costs the user considerable time and effort.

With the advancement of facial feature extraction technology, clustering different face pictures by means of face information provides an effective picture clustering method, which makes it convenient for users to manage and find face pictures on a mobile phone.

Current clustering methods mainly use face detection algorithms to detect the faces and feature points in a picture (such as key points at the corners of the eyes, the tip of the nose, and the corners of the mouth), extract facial features, and use the facial features to cluster pictures. This method achieves high clustering accuracy for frontal face pictures but low accuracy for face pictures taken from other angles.

Summary

The embodiments of the present application provide a picture grouping method and device, which can cluster the face pictures stored on an electronic device according to face images of different forms in a reference image set obtained by the electronic device, improving clustering accuracy.

To achieve the foregoing objectives, the embodiments of this application adopt the following technical solutions:

In one aspect, an embodiment of the present application provides a picture grouping method, which can be applied to an electronic device that has obtained at least one face picture. The method includes: the electronic device obtains at least one video; the electronic device then extracts multiple face image frames from the at least one video; the electronic device performs clustering processing on the at least one face picture according to the multiple face image frames; afterwards, the electronic device displays at least one group according to the clustering result, each group including at least one face picture of one user.

In this way, the electronic device can use the multiple face image frames in the at least one video as prior information and cluster the face pictures according to those frames, so that the face pictures are grouped by user, face pictures of the same user are clustered into the same group, and the accuracy of face picture clustering and grouping is improved.
In a possible design, the electronic device performing clustering processing on the at least one face picture according to the multiple face image frames includes: the electronic device divides the multiple face image frames into at least one category, each category corresponding to multiple face image frames of one user in different forms; the electronic device then performs clustering processing on the at least one face picture according to the category division result of the multiple face image frames.

In this way, the electronic device can, according to the category division result, place a face picture into an already divided category or into a new group. When each category includes face images of the same user in different forms, the electronic device can accurately group face pictures of different forms, such as different face angles and expressions, according to the different forms of face images of different users, improving the accuracy of clustering and grouping and reducing the dispersion of clusters.

In another possible design, the electronic device dividing the multiple face image frames into at least one category includes: the electronic device separately divides the face image frames in each video into at least one category. If the similarity between the facial features of a first face image frame in a first category of the at least one category and the facial features of a second face image frame in a second category is greater than or equal to a preset value, the electronic device merges the first category and the second category into the same category.

That is to say, the electronic device can first divide the face image frames in the same video into categories, and then merge the categories containing highly similar face image frames from different videos, that is, merge the face image frames of the same user in different videos into the same category.

In another possible design, the electronic device separately dividing the face image frames in each video into at least one category includes: the electronic device, through a face tracking algorithm, divides the multiple face image frames of the same user that have temporal continuity in each video into the same category.

The face image frames of the same user that have temporal continuity may be adjacent image frames. For example, the face images in the same video tracked by the electronic device through the face tracking algorithm have temporal continuity, satisfy the must-link constraint, and belong to the same user's face, and therefore can be classified into the same category.
在另一种可能的设计中, 每个分组还包括以下任意一项或任意多项的组合: 用户的人脸 图像帧所在的视频, 用户的人脸图像帧所在的视频分段, 或用户的至少一个人脸图像帧。 In another possible design, each group also includes any one or a combination of any of the following: the video where the user’s face image frame is located, the video segment where the user’s face image frame is located, or the user’s At least one face image frame.
这样, 电子设备不仅可以对人脸图片进行分组, 还可以对视频、 视频分段和人脸图像帧 等进行分组, 并且联合管理人脸图片和视频、 视频分段以及人脸图像帧, 提高用户查找效率 和管理体验。 In this way, the electronic device can not only group face pictures, but also group videos, video segments and face image frames, etc., and jointly manage face pictures and videos, video segments, and face image frames to improve users Find efficiency and management experience.
在另一种可能的设计中,每个分组包括的一个用户的至少一张人脸图片为单人照或合影。 在另一种可能的设计中, 电子设备获取至少一个视频, 包括: 电子设备从电子设备的存 储区获取至少一个视频。 In another possible design, at least one face picture of a user included in each group is a single photo or a group photo. In another possible design, obtaining at least one video by the electronic device includes: the electronic device obtains at least one video from a storage area of the electronic device.
其中, 该至少一个视频可以是电子设备之前拍摄、 下载、 拷贝或通过其他方式获取到的 视频。 Wherein, the at least one video may be a video previously captured, downloaded, copied, or obtained by other means by the electronic device.
在另一种可能的设计中, 电子设备获取至少一个视频, 包括: 电子设备提示用户拍摄包 括人脸图像帧的视频。 电子设备在检测到用户指示拍摄视频的操作后, 录制并生成至少一个 视频。 In another possible design, the electronic device acquiring at least one video includes: the electronic device prompts the user to shoot a video including human face image frames. The electronic device records and generates at least one video after detecting the operation instructed by the user to shoot the video.
在该方案中, 电子设备可以实时录制一个视频, 以便用于人脸图片分组。 In this solution, the electronic device can record a video in real time for use in grouping face pictures.
在另一种可能的设计中, 该方法还包括: 电子设备获取至少一个图像组, 每个图像组中 包括同一用户不同形态的多个图像帧。至少一个图像组包括以下任意一项或任意多项的组合: 动图, 预先拍摄的包括同一用户不同形态的人脸的图像组, 在拍摄预览时实时采集的多帧图 像形成的图像组, 或在连拍时拍摄到的多帧图像形成的图像组。 电子设备从至少一个视频中 提取多个人脸图像帧, 包括: 电子设备从至少一个视频以及至少一个图像组中, 提取多个人 脸图像帧。 In another possible design, the method further includes: the electronic device acquires at least one image group, and each image group includes multiple image frames of the same user in different forms. At least one image group includes any one or a combination of any of the following: moving pictures, a pre-photographed image group that includes different forms of the same user's face, an image group formed by multiple frames of images collected in real time during shooting preview, or An image group formed by multiple frames of images taken during continuous shooting. The electronic device extracting multiple face image frames from at least one video includes: the electronic device extracts multiple face image frames from at least one video and at least one image group.
这样, 电子设备不仅可以根据视频, 还可以根据动图等多种包括用户不同形态的人脸图像帧的图像组, 对人脸图片进行分类。 In this way, the electronic device can classify face pictures not only according to videos, but also according to various image groups, such as moving pictures, that include face image frames of a user in different forms.
在另一种可能的设计中, 电子设备在检测到用户用于查看图像分类的操作后, 或者在检测到用户指示开启人脸分类的功能后, 根据多个人脸图像帧, 对至少一张人脸图片进行聚类处理; 并根据聚类处理结果, 显示至少一个分组, 每个分组分别包括一个用户的至少一张人脸图片。 In another possible design, after detecting a user operation for viewing image classification, or after detecting that the user instructs to enable the face classification function, the electronic device performs clustering processing on the at least one face picture according to the multiple face image frames; and according to the clustering result, displays at least one group, each group including at least one face picture of a user.
这样, 电子设备可以响应于用户的指示, 再显示人脸图片的分组结果。 In this way, the electronic device can display the grouping result of the face pictures in response to the user's instruction.
在另一种可能的设计中, 电子设备在打开相册后, 自动根据多个人脸图像帧, 对至少一张人脸图片进行聚类处理; 并根据聚类处理结果, 显示至少一个分组, 每个分组分别包括一个用户的至少一张人脸图片。 In another possible design, after the album is opened, the electronic device automatically performs clustering processing on the at least one face picture according to the multiple face image frames; and according to the clustering result, displays at least one group, each group including at least one face picture of a user.
在该方案中, 在打开相册后, 电子设备可以自动进行聚类和显示分组的处理。 In this solution, after opening the album, the electronic device can automatically perform clustering and display grouping processing.
在另一种可能的设计中, 电子设备在充电过程中, 电量高于预设电量值的情况下, 自动根据多个人脸图像帧, 对至少一张人脸图片进行聚类处理; 在打开相册后, 根据聚类处理结果, 显示至少一个分组, 每个分组分别包括一个用户的至少一张人脸图片。 In another possible design, during charging and when the battery level is higher than a preset value, the electronic device automatically performs clustering processing on the at least one face picture according to the multiple face image frames; after the album is opened, it displays, according to the clustering result, at least one group, each group including at least one face picture of a user.
在该方案中, 电子设备分别可以在不同时机自动进行聚类和显示分组的处理。 In this solution, the electronic device can automatically perform clustering and display grouping processing at different times.
在另一种可能的设计中, 电子设备在显示至少一个分组时, 还可以提示用户该分组是根 据视频中的人脸图像帧, 对人脸图片进行分组得到的。 In another possible design, when the electronic device displays at least one group, it may also prompt the user that the group is obtained by grouping face pictures according to face image frames in the video.
这样, 可以便于用户获知电子设备当前是根据视频进行人脸图片分组的。 In this way, it is convenient for the user to know that the electronic device currently groups the face pictures according to the video.
另一方面, 本申请实施例提供了一种图片分组方法, 应用于电子设备, 电子设备上保存有至少一个视频和至少一张人脸图片, 该方法包括: 电子设备在检测到用户用于查看图像分类的操作后, 显示至少一个分组。 其中, 每个分组分别包括一个用户的至少一张人脸图片, 以及以下任意一项或任意多项的组合: 用户的人脸图像帧所在的视频, 用户的人脸图像帧所在的视频分段, 或用户的至少一个人脸图像帧。 On another aspect, an embodiment of the present application provides a picture grouping method applied to an electronic device on which at least one video and at least one face picture are stored. The method includes: after detecting a user operation for viewing image classification, the electronic device displays at least one group. Each group includes at least one face picture of a user, plus any one or a combination of the following: the video in which the user's face image frames are located, the video segment in which the user's face image frames are located, or at least one face image frame of the user.
另一方面, 本申请实施例提供了一种图片分组方法, 应用于电子设备, 电子设备上保存有至少一张人脸图片, 该方法包括: 电子设备获取至少一个参考图像集, 参考图像集包括具有时间连续性的一系列的人脸图像帧。 而后, 电子设备根据人脸图像帧, 对至少一张人脸图片进行聚类处理。 之后, 电子设备可以根据聚类处理结果, 显示至少一个分组, 每个分组分别包括一个用户的至少一张人脸图片。 On another aspect, an embodiment of the present application provides a picture grouping method applied to an electronic device on which at least one face picture is stored. The method includes: the electronic device acquires at least one reference image set, where the reference image set includes a series of face image frames with temporal continuity. The electronic device then performs clustering processing on the at least one face picture according to the face image frames. After that, the electronic device may display, according to the clustering result, at least one group, each group including at least one face picture of a user.
在一种可能的设计中, 该参考图像集可以是视频中的人脸图像帧; 动图中的人脸图像帧; 或者在拍摄预览状态实时采集的具有时间连续性的多帧图像的集合, 在抓拍模式采集到的具有时间连续性的多帧图像的集合, 电子设备在连拍时拍摄到的具有时间连续性的多帧图像的集合; 或者用户预设的包括同一用户的不同形态的人脸的图像组等。 In a possible design, the reference image set may be: face image frames in a video; face image frames in a moving picture; a set of temporally continuous multi-frame images collected in real time in the shooting preview state; a set of temporally continuous multi-frame images collected in the capture mode; a set of temporally continuous multi-frame images shot by the electronic device during continuous shooting; or a user-preset image group that includes faces of the same user in different forms; and so on.
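The "series of face image frames with temporal continuity" described above can be sketched concretely. The following is a minimal, hypothetical Python illustration (not the patented implementation): per-frame face bounding boxes are linked into tracks when boxes in nearby frames overlap, so each track gathers different forms of one person's face from a video. The `iou` helper, the `max_gap` and `iou_thresh` thresholds, and the greedy matching rule are all illustrative assumptions.

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    x1, y1 = max(ax, bx), max(ay, by)
    x2, y2 = min(ax + aw, bx + bw), min(ay + ah, by + bh)
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    return inter / float(aw * ah + bw * bh - inter)

def build_face_tracks(detections, max_gap=2, iou_thresh=0.3):
    """Group per-frame face boxes into temporally continuous tracks.

    detections: list of (frame_index, box), box = (x, y, w, h).
    A detection joins an existing track when its frame index is within
    max_gap of the track's last frame AND its box overlaps the track's
    last box; otherwise it starts a new track. Each resulting track is
    one candidate "reference image set" for a single person.
    """
    tracks = []  # each track: list of (frame_index, box)
    for frame_idx, box in sorted(detections):
        for track in tracks:
            last_idx, last_box = track[-1]
            if frame_idx - last_idx <= max_gap and iou(box, last_box) >= iou_thresh:
                track.append((frame_idx, box))
                break
        else:
            tracks.append([(frame_idx, box)])
    return tracks
```

With toy detections, two temporally separated faces yield two tracks; in a real pipeline, the face crops along each track would then be embedded and used as one person's reference image set.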
另一方面, 本申请实施例提供了一种图片分组方法, 应用于电子设备, 电子设备上保存有至少一张图片, 该方法包括: 电子设备获取至少一个视频, 视频包括图像帧; 根据图像帧, 对至少一张图片进行聚类处理; 根据聚类处理结果, 显示至少一个分组, 每个分组分别包括一个实体的至少一张图片。 例如, 该实体可以包括人脸、 狗、 猫、 房子等。 On another aspect, an embodiment of the present application provides a picture grouping method applied to an electronic device on which at least one picture is stored. The method includes: the electronic device acquires at least one video, where the video includes image frames; performs clustering processing on the at least one picture according to the image frames; and displays, according to the clustering result, at least one group, each group including at least one picture of an entity. For example, the entity may be a human face, a dog, a cat, a house, and so on.
另一方面, 本申请实施例提供了一种图片分组装置, 该装置包含在电子设备中, 该装置具有实现上述方面及可能的实现方式中任一方法中电子设备行为的功能。 该功能可以通过硬件实现, 也可以通过硬件执行相应的软件实现。 硬件或软件包括至少一个与上述功能相对应的模块或单元。 例如, 获取模块或单元、 提取模块或单元、 聚类模块或单元以及显示模块或单元等。 On another aspect, an embodiment of the present application provides a picture grouping apparatus, which is included in an electronic device and has the function of implementing the behavior of the electronic device in any of the methods of the foregoing aspects and possible implementations. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes at least one module or unit corresponding to the foregoing function, for example, an acquiring module or unit, an extracting module or unit, a clustering module or unit, and a display module or unit.
又一方面, 本申请实施例提供了一种电子设备, 包括至少一个处理器和至少一个存储器。 该至少一个存储器与至少一个处理器耦合, 至少一个存储器用于存储计算机程序代码, 计算机程序代码包括计算机指令, 当至少一个处理器执行计算机指令时, 使得电子设备执行上述方面任一项可能的实现中的图片分组方法。 In another aspect, an embodiment of the present application provides an electronic device including at least one processor and at least one memory. The at least one memory is coupled to the at least one processor and is used to store computer program code, where the computer program code includes computer instructions. When the at least one processor executes the computer instructions, the electronic device is caused to perform the picture grouping method in any possible implementation of the foregoing aspects.
另一方面, 本申请实施例提供了一种计算机存储介质, 包括计算机指令, 当计算机指令在电子设备上运行时, 使得电子设备执行上述方面任一项可能的实现中的图片分组方法。 On another aspect, an embodiment of the present application provides a computer storage medium including computer instructions that, when run on an electronic device, cause the electronic device to execute the picture grouping method in any possible implementation of the foregoing aspects.
又一方面, 本申请实施例提供了一种计算机程序产品, 当计算机程序产品在计算机上运行时, 使得计算机执行上述方面任一项可能的实现中的图片分组方法。 In another aspect, an embodiment of the present application provides a computer program product that, when run on a computer, causes the computer to execute the picture grouping method in any possible implementation of the foregoing aspects.
附图说明 In another aspect, an embodiment of the present application provides a computer program product, which when the computer program product runs on a computer, causes the computer to execute the picture grouping method in any one of the possible implementations of the foregoing aspects. Description of the drawings
图 1为本申请实施例提供的一种电子设备的结构示意图; FIG. 1 is a schematic structural diagram of an electronic device provided by an embodiment of the application;
图 2为本申请实施例提供的一组界面示意图; Figure 2 is a schematic diagram of a set of interfaces provided by an embodiment of the application;
图 3为本申请实施例提供的一种界面示意图; FIG. 3 is a schematic diagram of an interface provided by an embodiment of the application;
图 4为本申请实施例提供的另一种界面示意图; FIG. 4 is a schematic diagram of another interface provided by an embodiment of the application;
图 5为本申请实施例提供的另一种界面示意图; Figure 5 is a schematic diagram of another interface provided by an embodiment of the application;
图 6为本申请实施例提供的另一种界面示意图; FIG. 6 is a schematic diagram of another interface provided by an embodiment of the application;
图 7A为本申请实施例提供的另一种界面示意图; FIG. 7A is a schematic diagram of another interface provided by an embodiment of the application;
图 7B为本申请实施例提供的一个视频及视频中的人脸图像帧的示意图; FIG. 7B is a schematic diagram of a video and a face image frame in the video provided by an embodiment of the application;
图 8A为本申请实施例提供的一种分类效果示意图; FIG. 8A is a schematic diagram of a classification effect provided by an embodiment of this application;
图 8B为本申请实施例提供的另一种分类效果示意图; FIG. 8B is a schematic diagram of another classification effect provided by an embodiment of the application;
图 9A为本申请实施例提供的另一种界面示意图; FIG. 9A is a schematic diagram of another interface provided by an embodiment of the application;
图 9B为本申请实施例提供的另一种界面示意图; FIG. 9B is a schematic diagram of another interface provided by an embodiment of the application;
图 9C为本申请实施例提供的另一种界面示意图; FIG. 9C is a schematic diagram of another interface provided by an embodiment of the application;
图 10为本申请实施例提供的另一组界面示意图; FIG. 10 is a schematic diagram of another set of interfaces provided by an embodiment of the application;
图 11为本申请实施例提供的另一组界面示意图; FIG. 11 is a schematic diagram of another set of interfaces provided by an embodiment of the application;
图 12为本申请实施例提供的另一组界面示意图; FIG. 12 is a schematic diagram of another set of interfaces provided by an embodiment of the application;
图 13为本申请实施例提供的另一组界面示意图; FIG. 13 is a schematic diagram of another set of interfaces provided by an embodiment of the application;
图 14为本申请实施例提供的另一组界面示意图; FIG. 14 is a schematic diagram of another set of interfaces provided by an embodiment of the application;
图 15为本申请实施例提供的另一种界面示意图; FIG. 15 is a schematic diagram of another interface provided by an embodiment of the application;
图 16为本申请实施例提供的另一组界面示意图; FIG. 16 is a schematic diagram of another set of interfaces provided by an embodiment of the application;
图 17为本申请实施例提供的一个图像组中的人脸图像帧的示意图; FIG. 17 is a schematic diagram of face image frames in an image group provided by an embodiment of the application;
图 18为本申请实施例提供的另一组界面示意图; FIG. 18 is a schematic diagram of another set of interfaces provided by an embodiment of the application;
图 19A为本申请实施例提供的另一种界面示意图; FIG. 19A is a schematic diagram of another interface provided by an embodiment of this application;
图 19B为本申请实施例提供的另一种界面示意图; FIG. 19B is a schematic diagram of another interface provided by an embodiment of the application;
图 20为本申请实施例提供的另一个图像组中的人脸图像帧的示意图; FIG. 20 is a schematic diagram of a face image frame in another image group provided by an embodiment of the application;
图 21为本申请实施例提供的另一种界面示意图; FIG. 21 is a schematic diagram of another interface provided by an embodiment of the application;
图 22为本申请实施例提供的另一组界面示意图; FIG. 22 is a schematic diagram of another set of interfaces provided by an embodiment of the application;
图 23为本申请实施例提供的一种图片分组方法流程图; FIG. 23 is a flowchart of a method for grouping pictures according to an embodiment of this application;
图 24为本申请实施例提供的另一种电子设备的结构示意图。 FIG. 24 is a schematic structural diagram of another electronic device provided by an embodiment of the application.
具体实施方式 Detailed Description
下面将结合本申请实施例中的附图, 对本申请实施例中的技术方案进行描述。 其中, 在本申请实施例的描述中, 除非另有说明, “/”表示或的意思, 例如, A/B可以表示 A或 B; 本文中的“和/或”仅仅是一种描述关联对象的关联关系, 表示可以存在三种关系, 例如, A 和/或 B, 可以表示: A, B, 以及 AB这三种情况。 另外, 在本申请实施例的描述中, “多个”是指两个或多于两个。 The technical solutions in the embodiments of the present application are described below with reference to the drawings in the embodiments of the present application. In the description of the embodiments of the present application, unless otherwise specified, "/" means "or"; for example, A/B may mean A or B. "And/or" in this document merely describes an association relationship between the associated objects, indicating that three relationships may exist; for example, "A and/or B" may mean three cases: A alone, B alone, and both A and B. In addition, in the description of the embodiments of the present application, "multiple" means two or more.
本申请实施例提供一种图片分组方法, 可以应用于电子设备。 电子设备可以根据参考图像集对电子设备上存储的人脸图片 (即包含人脸图像的图片)进行聚类。 参考图像集中包括具有时间连续性的多张不同形态的人脸图像。 其中, 这里的形态可以包括人脸的角度(例如侧脸、 仰脸或俯脸等), 人脸的表情(例如大笑、 大哭或搞怪表情等), 是否留胡子, 是否戴墨镜, 脸部是否被帽子遮挡, 脸部是否被头发遮挡等。 与参考图像集中的视频或图像组不同, 电子设备上存储的人脸图片是指独立存在的一张一张的静态图片。 The embodiments of the present application provide a picture grouping method that can be applied to an electronic device. The electronic device may cluster the face pictures (that is, pictures containing face images) stored on the electronic device according to a reference image set. The reference image set includes multiple temporally continuous face images in different forms. Here, the forms may include the angle of the face (for example, a side face, an upturned face, or a downturned face), the expression of the face (for example, laughing, crying, or a funny expression), whether there is a beard, whether sunglasses are worn, whether the face is blocked by a hat, whether the face is blocked by hair, and so on. Unlike the videos or image groups in the reference image set, the face pictures stored on the electronic device are static pictures that exist independently, one by one.
其中, 参考图像集可以包括电子设备获取的视频中具有时间连续性的一系列图像帧的集合。 例如, 该视频可以是电子设备的摄像头拍摄的视频, 电子设备从应用程序 (application, App) (例如抖音、 快手、 美拍、 YOYO炫舞等)获取的视频, 电子设备从其他设备获取到的视频, 或者视频通话过程中保存的视频等。 The reference image set may include a collection of a series of temporally continuous image frames in a video acquired by the electronic device. For example, the video may be a video shot by a camera of the electronic device, a video the electronic device obtained from an application (app) (for example, Douyin, Kuaishou, Meipai, or YOYO), a video obtained from another device, or a video saved during a video call.
参考图像集还可以包括电子设备获取的动图 ( Gif ), 动图中包括具有时间连续性的多帧 图像。 The reference image set may also include animated images (Gif) acquired by the electronic device, and the animated images include multiple frames of images with temporal continuity.
此外, 参考图像集还可以包括电子设备获取的具有时间连续性的一系列图像组成的图像组。 例如, 该图像组可以是电子设备在拍摄预览状态实时采集的具有时间连续性的多帧图像的集合。 再例如, 该图像组可以是电子设备在抓拍模式采集到的具有时间连续性的多帧图像的集合 (电子设备或用户可以指定其中一张图像为抓拍获得的图像)。 又例如, 该图像组可以是电子设备在连拍时拍摄到的具有时间连续性的多帧图像的集合。 再例如, 该图像组可以是用户预设的包括同一用户的不同形态的人脸的图像组 (例如预先拍摄的同一用户的正面人脸图像、 侧面人脸图像、 大笑的人脸图像等组成的图像组)等中的一种或多种。 In addition, the reference image set may also include an image group composed of a series of temporally continuous images acquired by the electronic device. For example, the image group may be a set of temporally continuous multi-frame images collected in real time by the electronic device in the shooting preview state. For another example, the image group may be a set of temporally continuous multi-frame images collected by the electronic device in the capture mode (the electronic device or the user may designate one of the images as the captured image). For another example, the image group may be a set of temporally continuous multi-frame images shot by the electronic device during continuous shooting. For yet another example, the image group may be an image group preset by the user that includes faces of the same user in different forms (for example, an image group composed of a pre-photographed frontal face image, side face image, and laughing face image of the same user); the reference image set may include one or more of the foregoing.
由于参考图像集中通常包括同一用户的多种不同形态的人脸图像, 因而电子设备可以将参考图像集中的人脸图像作为先验信息, 根据参考图像集中不同形态的人脸图像, 对电子设备上存储的图片进行聚类处理, 使得不同形态的人脸图片也能够准确聚类, 提高人脸图片的聚类精度。 Since the reference image set usually includes face images of the same user in multiple different forms, the electronic device can use the face images in the reference image set as prior information and, according to the face images in different forms in the reference image set, perform clustering processing on the pictures stored on the electronic device, so that face pictures in different forms can also be clustered accurately, improving the clustering accuracy of face pictures.
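As a rough sketch of how such prior information could be used (a hypothetical illustration only, not the implementation claimed here): each video track contributes several embeddings of one person in different forms, and a stored photo joins that person's group if it is close to any frame of the track, so a side-face photo can match via a side-face frame even when it is far from the frontal frame. The toy 2-D embedding vectors and the 0.8 cosine threshold are assumptions for illustration.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def cluster_with_video_prior(photo_embs, video_tracks, thresh=0.8):
    """Assign stored photos to person groups using video tracks as priors.

    video_tracks: list of tracks; each track is a list of embeddings of
    temporally continuous frames known to show ONE person in different
    forms. A photo joins group i if it matches ANY frame of track i,
    which is what lets varied poses in a video pull matching photos
    into the same group. Unmatched photos are returned separately.
    """
    groups = [[] for _ in video_tracks]
    leftovers = []
    for idx, p in enumerate(photo_embs):
        best, best_sim = None, thresh
        for g, track in enumerate(video_tracks):
            sim = max(cosine(p, f) for f in track)
            if sim > best_sim:
                best, best_sim = g, sim
        (groups[best] if best is not None else leftovers).append(idx)
    return groups, leftovers
```

In the toy run below, the photo `[0.68, 0.73]` (a "side-face" embedding) is far from the track's frontal frame `[1, 0]` but matches its side-face frame `[0.7, 0.7]`, so it still lands in that person's group.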
其中, 该电子设备可以是手机、 平板电脑、 可穿戴设备、 车载设备、 增强现实 (augmented reality, AR)/虚拟现实 (virtual reality, VR)设备、 笔记本电脑、 超级移动个人计算机 (ultra-mobile personal computer, UMPC)、 上网本、 个人数字助理 (personal digital assistant, PDA)等电子设备, 本申请实施例对电子设备的具体类型不作任何限制。 The electronic device may be a mobile phone, a tablet computer, a wearable device, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), or another electronic device; the embodiments of this application do not impose any restriction on the specific type of the electronic device.
示例性的, 图 1示出了电子设备 100的一种结构示意图。 电子设备 100可以包括处理器 110,外部存储器接口 120,内部存储器 121 ,通用串行总线 (universal serial bus, USB)接口 130, 充电管理模块 140, 电源管理模块 141 , 电池 142, 天线 1 , 天线 2, 移动通信模块 150, 无线 通信模块 160, 音频模块 170, 扬声器 170A, 受话器 170B , 麦克风 170C, 耳机接口 170D, 传感器模块 180, 按键 190, 马达 191 , 指示器 192, 摄像头 193 , 显示屏 194, 以及用户标 识模块 (subscriber identification module, SIM)卡接口 195等。 其中传感器模块 180可以包括 压力传感器 180A, 陀螺仪传感器 180B , 气压传感器 180C, 磁传感器 180D, 加速度传感器 180E, 距离传感器 180F, 接近光传感器 180G, 指纹传感器 180H, 温度传感器 180J, 触摸传 感器 180K, 环境光传感器 180L, 骨传导传感器 180M等。 Exemplarily, FIG. 1 shows a schematic structural diagram of an electronic device 100. The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, and an antenna 2. , Mobile communication module 150, wireless communication module 160, audio module 170, speaker 170A, receiver 170B, microphone 170C, earphone jack 170D, sensor module 180, button 190, motor 191, indicator 192, camera 193, display 194, and Subscriber identification module (subscriber identification module, SIM) card interface 195, etc. The sensor module 180 may include pressure sensor 180A, gyroscope sensor 180B, air pressure sensor 180C, magnetic sensor 180D, acceleration sensor 180E, distance sensor 180F, proximity light sensor 180G, fingerprint sensor 180H, temperature sensor 180J, touch sensor 180K, ambient light Sensor 180L, bone conduction sensor 180M, etc.
可以理解的是, 本申请实施例示意的结构并不构成对电子设备 100的具体限定。 在本申 请另一些实施例中, 电子设备 100可以包括比图示更多或更少的部件, 或者组合某些部件, 或者拆分某些部件, 或者不同的部件布置。 图示的部件可以以硬件, 软件或软件和硬件的组 合实现。 It can be understood that the structure illustrated in the embodiment of the present application does not constitute a specific limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may include more or fewer components than shown, or combine certain components, or disassemble certain components, or arrange different components. The components shown in the figure can be implemented in hardware, software or a combination of software and hardware.
处理器 110 可以包括一个或多个处理单元, 例如: 处理器 110 可以包括应用处理器 (application processor, AP), 调制解调处理器, 图形处理器 (graphics processing unit, GPU), 图像信号处理器 (image signal processor, ISP), 控制器, 存储器, 视频编解码器, 数字信号处理器 (digital signal processor, DSP), 基带处理器, 和/或神经网络处理器 (neural-network processing unit, NPU)等。 其中, 不同的处理单元可以是独立的器件, 也可以集成在一个或多个处理器中。 The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent devices, or may be integrated into one or more processors.
其中, 控制器可以是电子设备 100的神经中枢和指挥中心。 控制器可以根据指令操作码 和时序信号, 产生操作控制信号, 完成取指令和执行指令的控制。 Wherein, the controller can be the nerve center and command center of the electronic device 100. The controller can generate operation control signals according to the command operation code and timing signals to complete the control of fetching and executing commands.
处理器 110中还可以设置存储器, 用于存储指令和数据。 在一些实施例中, 处理器 110 中的存储器为高速缓冲存储器。 该存储器可以保存处理器 110刚用过或循环使用的指令或数 据。如果处理器 110需要再次使用该指令或数据,可从存储器中直接调用。避免了重复存取, 减少了处理器 110的等待时间, 因而提高了系统的效率。 A memory may also be provided in the processor 110 to store instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory can store instructions or data that the processor 110 has just used or used cyclically. If the processor 110 needs to use the instruction or data again, it can be called directly from the memory. Repeated access is avoided, the waiting time of the processor 110 is reduced, and the efficiency of the system is improved.
在一些实施例中, 处理器 110 可以包括一个或多个接口。 接口可以包括集成电路 (inter-integrated circuit, I2C)接口, 集成电路内置音频 (inter-integrated circuit sound, I2S)接口, 脉冲编码调制 (pulse code modulation, PCM)接口,通用异步收发传输器 (universal asynchronous receiver/transmitter , UART)接口, 移动产业处理器接口 (mobile industry processor interface, MIPI) , 通用输入输出 (general-purpose input/output , GPIO)接口, 用户标识模块 (subscriber identity module, SIM)接口, 和 /或通用串行总线 (universal serial bus, USB)接口等。 In some embodiments, the processor 110 may include one or more interfaces. The interface may include an integrated circuit (inter-integrated circuit, I2C) interface, an integrated circuit audio (inter-integrated circuit sound, I2S) interface, a pulse code modulation (PCM) interface, and a universal asynchronous transmitter (universal asynchronous transmitter) interface. receiver/transmitter, UART) interface, mobile industry processor interface (MIPI), general-purpose input/output (GPIO) interface, subscriber identity module (SIM) interface, and / Or universal serial bus (USB) interface, etc.
I2C接口是一种双向同步串行总线, 包括一根串行数据线 (serial data line, SDA)和一根串 行时钟线 (derail clock line, SCL)。 在一些实施例中, 处理器 110可以包含多组 I2C总线。 处 理器 110可以通过不同的 I2C总线接口分别耦合触摸传感器 180K, 充电器, 闪光灯, 摄像头 193等。 例如: 处理器 110可以通过 I2C接口耦合触摸传感器 180K, 使处理器 110与触摸传 感器 180K通过 I2C总线接口通信, 实现电子设备 100的触摸功能。 The I2C interface is a two-way synchronous serial bus that includes a serial data line (SDA) and a derail clock line (SCL). In some embodiments, the processor 110 may include multiple sets of I2C buses. The processor 110 may be coupled to the touch sensor 180K, charger, flash, camera 193, etc., through different I2C bus interfaces. For example, the processor 110 may couple the touch sensor 180K through the I2C interface, so that the processor 110 and the touch sensor 180K communicate through the I2C bus interface to realize the touch function of the electronic device 100.
I2S接口可以用于音频通信。 在一些实施例中, 处理器 110可以包含多组 I2S总线。 处理 器 110可以通过 I2S总线与音频模块 170耦合,实现处理器 110与音频模块 170之间的通信。 在一些实施例中, 音频模块 170可以通过 I2S接口向无线通信模块 160传递音频信号, 实现 通过蓝牙耳机接听电话的功能。 The I2S interface can be used for audio communication. In some embodiments, the processor 110 may include multiple sets of I2S buses. The processor 110 may be coupled with the audio module 170 through an I2S bus to implement communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may transmit audio signals to the wireless communication module 160 through the I2S interface, so as to realize the function of answering calls through the Bluetooth headset.
PCM接口也可以用于音频通信, 将模拟信号抽样, 量化和编码。 在一些实施例中, 音频 模块 170与无线通信模块 160可以通过 PCM总线接口耦合。 The PCM interface can also be used for audio communication to sample, quantize and encode analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface.
在一些实施例中,音频模块 170也可以通过 PCM接口向无线通信模块 160传递音频信号, 实现通过蓝牙耳机接听电话的功能。 I2S接口和 PCM接口都可以用于音频通信。 In some embodiments, the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to realize the function of answering calls through the Bluetooth headset. Both I2S interface and PCM interface can be used for audio communication.
UART接口是一种通用串行数据总线, 用于异步通信。 该总线可以为双向通信总线。 它 将要传输的数据在串行通信与并行通信之间转换。 The UART interface is a universal serial data bus used for asynchronous communication. The bus can be a two-way communication bus. It converts the data to be transmitted between serial communication and parallel communication.
在一些实施例中, UART接口通常被用于连接处理器 110与无线通信模块 160。 例如: 处 理器 110通过 UART接口与无线通信模块 160中的蓝牙模块通信, 实现蓝牙功能。 在一些实 施例中, 音频模块 170可以通过 UART接口向无线通信模块 160传递音频信号, 实现通过蓝 牙耳机播放音乐的功能。 In some embodiments, the UART interface is generally used to connect the processor 110 and the wireless communication module 160. For example, the processor 110 communicates with the Bluetooth module in the wireless communication module 160 through the UART interface to realize the Bluetooth function. In some embodiments, the audio module 170 can transmit audio signals to the wireless communication module 160 through the UART interface, so as to realize the function of playing music through the Bluetooth headset.
MIPI接口可以被用于连接处理器 110与显示屏 194, 摄像头 193等外围器件。 MIPI接口 包括摄像头串行接口 (camera serial interface, CSI) , 显示屏串行接口 (display serial interface, DSI)等。 在一些实施例中, 处理器 110和摄像头 193通过 CSI接口通信, 实现电子设备 100 的拍摄功能。 处理器 110和显示屏 194通过 DSI接口通信, 实现电子设备 100的显示功能。 The MIPI interface can be used to connect the processor 110 with the display screen 194, the camera 193 and other peripheral devices. MIPI interface includes camera serial interface (CSI), display serial interface (DSI) and so on. In some embodiments, the processor 110 and the camera 193 communicate through a CSI interface to implement the shooting function of the electronic device 100. The processor 110 and the display screen 194 communicate through the DSI interface to realize the display function of the electronic device 100.
GPIO接口可以通过软件配置。 GPIO接口可以被配置为控制信号, 也可被配置为数据信号。 在一些实施例中, GPIO接口可以用于连接处理器 110与摄像头 193, 显示屏 194, 无线通信模块 160, 音频模块 170, 传感器模块 180等。 GPIO接口还可以被配置为 I2C接口, I2S接口, UART接口, MIPI接口等。 The GPIO interface can be configured through software. The GPIO interface can be configured as a control signal or as a data signal. In some embodiments, the GPIO interface may be used to connect the processor 110 with the camera 193, the display screen 194, the wireless communication module 160, the audio module 170, the sensor module 180, and so on. The GPIO interface can also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, and so on.
USB接口 130是符合 USB标准规范的接口, 具体可以是 Mini USB接口, Micro USB接 口, USB Type C接口等。 USB接口 130可以用于连接充电器为电子设备 100充电, 也可以用 于电子设备 100与外围设备之间传输数据。 也可以用于连接耳机, 通过耳机播放音频。 该接 口还可以用于连接其他电子设备, 例如 AR设备等。 The USB interface 130 is an interface that complies with the USB standard specification, and specifically may be a Mini USB interface, a Micro USB interface, a USB Type C interface, and so on. The USB interface 130 can be used to connect a charger to charge the electronic device 100, and can also be used to transfer data between the electronic device 100 and peripheral devices. It can also be used to connect headphones and play audio through headphones. This interface can also be used to connect other electronic devices, such as AR devices.
可以理解的是, 本申请实施例示意的各模块间的接口连接关系, 只是示意性说明, 并不 构成对电子设备 100的结构限定。 在本申请另一些实施例中, 电子设备 100也可以采用上述 实施例中不同的接口连接方式, 或多种接口连接方式的组合。 It can be understood that the interface connection relationship between the modules illustrated in the embodiment of the present application is merely illustrative, and does not constitute a structural limitation of the electronic device 100. In other embodiments of the present application, the electronic device 100 may also adopt different interface connection modes in the foregoing embodiments, or a combination of multiple interface connection modes.
充电管理模块 140用于从充电器接收充电输入。 其中, 充电器可以是无线充电器, 也可 以是有线充电器。 在一些有线充电的实施例中, 充电管理模块 140可以通过 USB接口 130接 收有线充电器的充电输入。 在一些无线充电的实施例中, 充电管理模块 140可以通过电子设 备 100的无线充电线圈接收无线充电输入。 充电管理模块 140为电池 142充电的同时, 还可 以通过电源管理模块 141为电子设备供电。 The charging management module 140 is used to receive charging input from the charger. Among them, the charger can be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive the charging input of the wired charger through the USB interface 130. In some embodiments of wireless charging, the charging management module 140 may receive the wireless charging input through the wireless charging coil of the electronic device 100. While the charging management module 140 charges the battery 142, the power management module 141 can also supply power to electronic devices.
电源管理模块 141用于连接电池 142, 充电管理模块 140与处理器 110。 电源管理模块 141接收电池 142和 /或充电管理模块 140的输入, 为处理器 110, 内部存储器 121 , 外部存储 器, 显示屏 194, 摄像头 193 , 和无线通信模块 160等供电。 电源管理模块 141还可以用于监 测电池容量, 电池循环次数, 电池健康状态(漏电, 阻抗)等参数。 The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140, and supplies power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, and the wireless communication module 160. The power management module 141 can also be used to monitor parameters such as battery capacity, battery cycle times, and battery health status (leakage, impedance).
在其他一些实施例中, 电源管理模块 141也可以设置于处理器 110中。 在另一些实施例 中, 电源管理模块 141和充电管理模块 140也可以设置于同一个器件中。 In some other embodiments, the power management module 141 may also be provided in the processor 110. In other embodiments, the power management module 141 and the charging management module 140 may also be provided in the same device.
电子设备 100的无线通信功能可以通过天线 1, 天线 2, 移动通信模块 150, 无线通信模 块 160, 调制解调处理器以及基带处理器等实现。 The wireless communication function of the electronic device 100 can be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, and the baseband processor.
天线 1和天线 2用于发射和接收电磁波信号。 电子设备 100中的每个天线可用于覆盖单 个或多个通信频带。 不同的天线还可以复用, 以提高天线的利用率。 例如: 可以将天线 1复 用为无线局域网的分集天线。 在另外一些实施例中, 天线可以和调谐开关结合使用。 Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the electronic device 100 can be used to cover a single or multiple communication frequency bands. Different antennas can also be reused to improve antenna utilization. For example: Antenna 1 can be multiplexed as a diversity antenna for wireless LAN. In other embodiments, the antenna can be used in combination with a tuning switch.
移动通信模块 150可以提供应用在电子设备 100上的包括 2G/3G/4G/5G等无线通信的解 决方案。 移动通信模块 150可以包括至少一个滤波器, 开关, 功率放大器, 低噪声放大器(low noise amplifier, LNA)等。 移动通信模块 150可以由天线 1接收电磁波, 并对接收的电磁波进 行滤波, 放大等处理, 传送至调制解调处理器进行解调。 移动通信模块 150还可以对经调制 解调处理器调制后的信号放大, 经天线 1转为电磁波辐射出去。 The mobile communication module 150 can provide a wireless communication solution including 2G/3G/4G/5G and the like applied to the electronic device 100. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), etc. The mobile communication module 150 can receive electromagnetic waves by the antenna 1, and perform processing such as filtering and amplifying the received electromagnetic waves, and then transmitting them to the modem processor for demodulation. The mobile communication module 150 can also amplify the signal modulated by the modem processor, and convert it into electromagnetic waves through the antenna 1 for radiation.
In some embodiments, at least some functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some functional modules of the mobile communication module 150 and at least some modules of the processor 110 may be disposed in the same device.
The modem processor may include a modulator and a demodulator. The modulator is used to modulate a to-be-sent low-frequency baseband signal into a medium- or high-frequency signal. The demodulator is used to demodulate a received electromagnetic wave signal into a low-frequency baseband signal, and then transmits the demodulated low-frequency baseband signal to the baseband processor for processing. After being processed by the baseband processor, the low-frequency baseband signal is passed to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, and the like), or displays an image or a video through the display screen 194. In some embodiments, the modem processor may be an independent device. In other embodiments, the modem processor may be independent of the processor 110 and disposed in the same device as the mobile communication module 150 or other functional modules.
The wireless communication module 160 can provide wireless communication solutions applied to the electronic device 100, including a wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), a global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR) technology, and the like. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering on the electromagnetic wave signals, and sends the processed signals to the processor 110. The wireless communication module 160 may also receive a to-be-sent signal from the processor 110, perform frequency modulation and amplification on it, and convert it into an electromagnetic wave for radiation through the antenna 2.
In some embodiments, the antenna 1 of the electronic device 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with a network and other devices through wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
The electronic device 100 implements a display function through the GPU, the display screen 194, the application processor, and the like. The GPU is a microprocessor for image processing, connecting the display screen 194 and the application processor. The GPU is used to perform mathematical and geometric computation for graphics rendering. The processor 110 may include one or more GPUs, which execute program instructions to generate or change display information.
The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel. The display panel may adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, quantum dot light emitting diodes (QLED), or the like. In some embodiments, the electronic device 100 may include one or N display screens 194, where N is a positive integer greater than 1.
The electronic device 100 can implement a shooting function through the ISP, the camera 193, the video codec, the GPU, the display screen 194, the application processor, and the like.
The ISP is used to process data fed back by the camera 193. For example, when a photo is taken, the shutter is opened and light is transmitted through the lens to the photosensitive element of the camera; the light signal is converted into an electrical signal, and the photosensitive element of the camera passes the electrical signal to the ISP for processing, which converts it into an image visible to the naked eye. The ISP can also perform algorithm optimization on image noise, brightness, and skin color, and can optimize parameters such as exposure and color temperature of the shooting scene. In some embodiments, the ISP may be disposed in the camera 193.
The camera 193 is used to capture still images or videos. An optical image of an object is generated through the lens and projected onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the light signal into an electrical signal, and then passes the electrical signal to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV. In some embodiments, the electronic device 100 may include one or N cameras 193, where N is a positive integer greater than 1.
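As an illustration of the standard-format conversion mentioned above, the following sketch converts one RGB pixel to YUV using the widely used BT.601 full-range coefficients. The coefficients come from the public BT.601 definition, and the function name is illustrative; this application does not prescribe a specific conversion matrix.

```python
def rgb_to_yuv(r: float, g: float, b: float):
    """Convert one 8-bit RGB pixel to full-range YUV (BT.601 coefficients).

    Illustrative sketch only; a DSP may use other matrices or ranges.
    """
    y = 0.299 * r + 0.587 * g + 0.114 * b          # luma
    u = -0.14713 * r - 0.28886 * g + 0.436 * b     # blue-difference chroma
    v = 0.615 * r - 0.51499 * g - 0.10001 * b      # red-difference chroma
    return y, u, v
```

For a pure white pixel (255, 255, 255), the luma is 255 and both chroma components are approximately zero, which is a quick sanity check for the matrix.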
The digital signal processor is used to process digital signals; in addition to digital image signals, it can also process other digital signals. For example, when the electronic device 100 selects a frequency point, the digital signal processor is used to perform Fourier transform and the like on the frequency point energy.
The video codec is used to compress or decompress digital video. The electronic device 100 may support one or more video codecs. In this way, the electronic device 100 can play or record videos in multiple encoding formats, such as moving picture experts group (MPEG)-1, MPEG-2, MPEG-3, and MPEG-4.
The NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, for example the transfer mode between neurons in the human brain, it processes input information quickly and can also continuously self-learn. Applications such as intelligent cognition of the electronic device 100 can be implemented through the NPU, for example image recognition, face recognition, speech recognition, and text understanding.
In the embodiments of the present application, the NPU or another processor may be used to perform operations such as face detection, face tracking, facial feature extraction, and image clustering on the face images in the videos stored by the electronic device 100; to perform operations such as face detection and facial feature extraction on the face images in the pictures stored by the electronic device 100; and to cluster the pictures stored by the electronic device 100 according to the facial features of the pictures and the clustering results of the face images in the videos.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function, for example, saving files such as music and videos in the external memory card.
The internal memory 121 may be used to store computer-executable program code, where the executable program code includes instructions. The processor 110 executes various functional applications and data processing of the electronic device 100 by running the instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The program storage area can store an operating system and an application program required by at least one function (such as a sound playback function or an image playback function). The data storage area can store data created during use of the electronic device 100 (such as audio data and a phone book).
In addition, the internal memory 121 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS).
The electronic device 100 can implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headset interface 170D, the application processor, and the like.
The audio module 170 is used to convert digital audio information into an analog audio signal for output, and also to convert an analog audio input into a digital audio signal. The audio module 170 can also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.
The speaker 170A, also called a "horn", is used to convert an audio electrical signal into a sound signal. The electronic device 100 can be used to listen to music or a hands-free call through the speaker 170A.
The receiver 170B, also called an "earpiece", is used to convert an audio electrical signal into a sound signal. When the electronic device 100 answers a call or a voice message, the voice can be heard by bringing the receiver 170B close to the ear.
The microphone 170C, also called a "mike" or "mic", is used to convert a sound signal into an electrical signal. When making a call or sending a voice message, the user can speak with the mouth close to the microphone 170C to input the sound signal into the microphone 170C. The electronic device 100 may be provided with at least one microphone 170C. In other embodiments, the electronic device 100 may be provided with two microphones 170C, which, in addition to collecting sound signals, can also implement a noise reduction function. In still other embodiments, the electronic device 100 may be provided with three, four, or more microphones 170C to collect sound signals, reduce noise, identify sound sources, implement a directional recording function, and so on.
The headset interface 170D is used to connect a wired headset. The headset interface 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a cellular telecommunications industry association of the USA (CTIA) standard interface.
The pressure sensor 180A is used to sense a pressure signal, and can convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. There are many types of pressure sensors 180A, such as resistive, inductive, and capacitive pressure sensors. A capacitive pressure sensor may include at least two parallel plates made of conductive material. When a force acts on the pressure sensor 180A, the capacitance between the electrodes changes, and the electronic device 100 determines the intensity of the pressure according to the change in capacitance. When a touch operation acts on the display screen 194, the electronic device 100 detects the intensity of the touch operation according to the pressure sensor 180A. The electronic device 100 may also calculate the touched position according to the detection signal of the pressure sensor 180A.
In some embodiments, touch operations acting on the same touch position but with different touch operation intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than a first pressure threshold acts on the short message application icon, an instruction to view the short message is executed; when a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
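The two-threshold behavior described above can be sketched as follows. The concrete threshold value, the function name, and the instruction strings are illustrative assumptions, not values specified by this application:

```python
# Assumed normalized pressure scale in [0, 1]; the application does not
# specify a concrete first pressure threshold.
FIRST_PRESSURE_THRESHOLD = 0.5

def handle_sms_icon_touch(pressure: float) -> str:
    """Map the intensity of a touch on the SMS app icon to an instruction."""
    if pressure < FIRST_PRESSURE_THRESHOLD:
        return "view_sms"     # light press: view the short message
    return "compose_sms"      # press at or above the threshold: new message
```

The same pattern generalizes to any icon whose light press and deep press trigger different instructions.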
The gyroscope sensor 180B may be used to determine the motion posture of the electronic device 100. In some embodiments, the angular velocities of the electronic device 100 around three axes (that is, the x, y, and z axes) can be determined through the gyroscope sensor 180B. The gyroscope sensor 180B can be used for image stabilization during shooting. For example, when the shutter is pressed, the gyroscope sensor 180B detects the shake angle of the electronic device 100 and calculates, based on the angle, the distance that the lens module needs to compensate, allowing the lens to counteract the shake of the electronic device 100 through reverse movement to achieve image stabilization. The gyroscope sensor 180B can also be used for navigation and somatosensory game scenarios.
The air pressure sensor 180C is used to measure air pressure. In some embodiments, the electronic device 100 calculates altitude from the air pressure value measured by the air pressure sensor 180C, to assist positioning and navigation.
The magnetic sensor 180D includes a Hall sensor. The electronic device 100 can use the magnetic sensor 180D to detect the opening and closing of a flip leather case. In some embodiments, when the electronic device 100 is a flip phone, the electronic device 100 can detect the opening and closing of the flip cover according to the magnetic sensor 180D, and then set features such as automatic unlocking upon flip-open according to the detected opening/closing state of the leather case or the flip cover.
The acceleration sensor 180E can detect the magnitude of acceleration of the electronic device 100 in various directions (generally along three axes). When the electronic device 100 is stationary, the magnitude and direction of gravity can be detected. It can also be used to identify the posture of the electronic device, and is applied to landscape/portrait switching, pedometers, and other applications.
The distance sensor 180F is used to measure distance. The electronic device 100 can measure distance by infrared or laser. In some embodiments, in a shooting scene, the electronic device 100 may use the distance sensor 180F to measure distance to achieve fast focusing.
The proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector such as a photodiode. The light emitting diode may be an infrared light emitting diode. The electronic device 100 emits infrared light outward through the light emitting diode and uses the photodiode to detect infrared light reflected from a nearby object. When sufficient reflected light is detected, it can be determined that there is an object near the electronic device 100; when insufficient reflected light is detected, the electronic device 100 can determine that there is no object nearby. The electronic device 100 can use the proximity light sensor 180G to detect that the user is holding the electronic device 100 close to the ear during a call, so as to automatically turn off the screen to save power. The proximity light sensor 180G can also be used for automatic unlocking and screen locking in leather case mode and pocket mode.
The ambient light sensor 180L is used to sense ambient light brightness. The electronic device 100 can adaptively adjust the brightness of the display screen 194 according to the perceived ambient light brightness. The ambient light sensor 180L can also be used to automatically adjust the white balance when taking pictures, and can cooperate with the proximity light sensor 180G to detect whether the electronic device 100 is in a pocket, to prevent accidental touches.
The fingerprint sensor 180H is used to collect fingerprints. The electronic device 100 can use the collected fingerprint characteristics to implement fingerprint unlocking, application lock access, fingerprint photographing, fingerprint call answering, and so on.
The temperature sensor 180J is used to detect temperature. In some embodiments, the electronic device 100 executes a temperature processing policy based on the temperature detected by the temperature sensor 180J. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the electronic device 100 reduces the performance of a processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection. In other embodiments, when the temperature is lower than another threshold, the electronic device 100 heats the battery 142 to avoid an abnormal shutdown of the electronic device 100 caused by low temperature. In still other embodiments, when the temperature is lower than yet another threshold, the electronic device 100 boosts the output voltage of the battery 142 to avoid an abnormal shutdown caused by low temperature.
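A minimal sketch of such a temperature processing policy follows. The threshold values and action names are illustrative assumptions; the application does not specify concrete numbers:

```python
def temperature_policy(temp_c: float):
    """Return the list of actions taken at a reported temperature.

    HIGH, LOW, and VERY_LOW are assumed example thresholds (degrees Celsius).
    """
    HIGH, LOW, VERY_LOW = 45.0, 0.0, -10.0
    actions = []
    if temp_c > HIGH:
        actions.append("throttle_nearby_processor")   # thermal protection
    if temp_c < LOW:
        actions.append("heat_battery")                # avoid cold shutdown
    if temp_c < VERY_LOW:
        actions.append("boost_battery_output_voltage")
    return actions
```

Note that the two low-temperature branches are independent thresholds, so at a very low temperature both battery heating and voltage boosting apply.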
The touch sensor 180K is also called a "touch panel". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touchscreen, also called a "touch screen". The touch sensor 180K is used to detect a touch operation acting on or near it. The touch sensor can pass the detected touch operation to the application processor to determine the type of touch event. Visual output related to the touch operation can be provided through the display screen 194. In other embodiments, the touch sensor 180K may also be disposed on the surface of the electronic device 100, at a position different from that of the display screen 194.
The bone conduction sensor 180M can acquire vibration signals. In some embodiments, the bone conduction sensor 180M can acquire the vibration signal of the vibrating bone of the human vocal part. The bone conduction sensor 180M can also contact the human pulse and receive a blood pressure beating signal.
In some embodiments, the bone conduction sensor 180M may also be disposed in a headset to form a bone conduction headset. The audio module 170 can parse out a voice signal based on the vibration signal of the vocal-part vibrating bone acquired by the bone conduction sensor 180M, to implement a voice function. The application processor can parse heart rate information based on the blood pressure beating signal acquired by the bone conduction sensor 180M, to implement a heart rate detection function.
The buttons 190 include a power button, volume buttons, and so on. The buttons 190 may be mechanical buttons or touch buttons. The electronic device 100 can receive button input and generate button signal input related to the user settings and function control of the electronic device 100.
The motor 191 can generate vibration prompts. The motor 191 can be used for incoming call vibration prompts as well as touch vibration feedback. For example, touch operations acting on different applications (such as photographing and audio playback) can correspond to different vibration feedback effects, and touch operations acting on different areas of the display screen 194 can also correspond to different vibration feedback effects of the motor 191. Different application scenarios (for example, time reminders, received messages, alarm clocks, and games) can also correspond to different vibration feedback effects. The touch vibration feedback effect can also be customized.
The indicator 192 may be an indicator light, which can be used to indicate the charging state and power change, and can also be used to indicate messages, missed calls, notifications, and the like.
The SIM card interface 195 is used to connect a SIM card. The SIM card can be inserted into or pulled out of the SIM card interface 195 to achieve contact with and separation from the electronic device 100. The electronic device 100 may support one or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface 195 can support a Nano SIM card, a Micro SIM card, a SIM card, and so on. Multiple cards can be inserted into the same SIM card interface 195 at the same time; the types of the multiple cards may be the same or different. The SIM card interface 195 is also compatible with different types of SIM cards and with external memory cards. The electronic device 100 interacts with the network through the SIM card to implement functions such as calls and data communication. In some embodiments, the electronic device 100 adopts an eSIM, that is, an embedded SIM card. The eSIM card can be embedded in the electronic device 100 and cannot be separated from the electronic device 100.
The following mainly takes the electronic device 100 being a mobile phone as an example to describe the picture grouping method provided in the embodiments of the present application. A reference image set such as the above-mentioned video or image group usually records a temporally continuous, dynamically changing process, so a reference image set can often include a series of face images of the same user in different forms during the dynamic change. Therefore, the mobile phone can first track the face images in each reference image set to obtain, for the same user in each reference image set, temporally continuous face images in different forms, such as different angles, expressions, decorations, and hairstyles, and automatically group these face images in each reference image set into one category, obtaining a reference image set clustering result. Then, according to the similarity between the facial features in the face pictures and the facial features of the face images in the reference image set clustering result, the face pictures are clustered, so that face pictures of different forms can also be clustered correctly, improving the clustering accuracy of the face pictures.
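The second clustering step described above, assigning each face picture to the most similar reference-image-set category by facial-feature similarity, can be sketched as follows. Cosine similarity, the similarity threshold, and all names are illustrative assumptions; the application does not prescribe a specific similarity measure or feature extractor:

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors (assumed non-zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def assign_to_reference_clusters(face_features, reference_centroids, threshold=0.6):
    """Assign each face picture's feature vector to the most similar
    reference-set category; -1 marks pictures matching no category."""
    labels = []
    for feat in face_features:
        sims = [cosine(feat, c) for c in reference_centroids]
        best = max(range(len(sims)), key=lambda i: sims[i])
        labels.append(best if sims[best] >= threshold else -1)
    return labels
```

Because each reference-set category already covers one user's face in many forms (angles, expressions, hairstyles), a face picture in an unusual form can still land near that user's category, which is the intuition behind the improved clustering accuracy.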
In an embodiment of the present application, if the mobile phone obtains a reference image set, it can automatically perform the reference image set clustering process and the face picture clustering process.
In another embodiment of the present application, if the mobile phone obtains a reference image set, it can automatically perform the reference image set clustering process; after detecting the user's operation indicating portrait classification or the user's instruction to turn on the portrait classification function, it then clusters the face pictures stored on the mobile phone according to the reference image set clustering result.
Exemplarily, when the mobile phone detects that the user taps the album icon 201 shown in (a) of FIG. 2, the mobile phone opens the album and displays the interface shown in (b) of FIG. 2. After detecting the user's operation of tapping the control 202 shown in (b) of FIG. 2, the mobile phone displays the portrait classification control 203 shown in (c) of FIG. 2; after detecting the user's operation of tapping the control 203, the mobile phone determines that it has detected the user's operation indicating portrait classification, or the user's instruction to enable the portrait classification function. Alternatively, after opening the album, the mobile phone displays the interface shown in (d) of FIG. 2; after detecting that the user taps the discovery control 204 shown in (d) of FIG. 2, the mobile phone can display the portrait classification control 205, the details control 206, and the like shown in (e) of FIG. 2. After detecting the user's operation of tapping the control 206, the mobile phone displays the portrait classification function description 207 shown in (f) of FIG. 2, so that the user can understand the specific content of the function. After detecting the user's operation of tapping the control 205, the mobile phone can determine that it has detected the user's operation indicating portrait classification, or the user's instruction to enable the portrait classification function.
In yet another embodiment of this application, if the mobile phone obtains a reference image set and detects an operation by which the user instructs portrait classification or turns on the portrait classification function, it performs both reference-image-set clustering and face-picture clustering.

In another embodiment of this application, the user may also choose whether the face pictures are classified according to the reference image set.

For example, after detecting that the user taps the control 202 shown in (b) of FIG. 2, the mobile phone may display the interface shown in FIG. 3. If the mobile phone detects that the user taps the control 302, the user has chosen to classify the face pictures according to the reference image set; if the mobile phone detects that the user taps the control 301, the user has chosen to classify the face pictures directly, without using the reference image set. As another example, the user may instruct the mobile phone, by voice or by a preset gesture, to classify the face pictures according to the reference image set.

In addition, the user may also configure the content of the reference image set. For example, referring to FIG. 4, the mobile phone may be set so that the reference image set includes content such as videos obtained by the mobile phone.
In yet another embodiment of this application, when the mobile phone opens the album for the first time (or every time), or after the mobile phone detects an operation by which the user instructs portrait classification, the mobile phone may also ask the user whether to cluster the face pictures according to the reference image set. For example, after the mobile phone detects that the user taps the control 203 or the control 205, referring to FIG. 5, the mobile phone may prompt the user through a prompt box 501.

In another embodiment of this application, after the mobile phone opens the album, or after it detects that the user has chosen to cluster the face pictures according to the reference image set, if the mobile phone has not obtained a reference image set, it may prompt the user to add one or more reference image sets containing face images of the users who appear in the face pictures, so that the mobile phone can cluster the face pictures more accurately according to the reference image set.

For example, referring to FIG. 6, the mobile phone may use a prompt box 601 to prompt the user to shoot (or download, or copy) a video of a target user who appears in the face pictures. As another example, the mobile phone may suggest that the user let the target user play a game, such as YOYO dancing, through which the target user's face images can be captured, and the mobile phone may record a video of the target user during the game. As yet another example, the mobile phone may prompt the user to add an image group, which may be multiple face pictures, in different forms, of the same user that the user selects from the face pictures. The mobile phone can then cluster the target user's face pictures according to the obtained reference image set, such as the video or the image group.
In the following, taking a video as the reference image set, the clustering of the reference image set by the mobile phone, and the clustering of the face pictures according to the reference-image-set clustering result, are described as examples.

A large number of face pictures and videos may be stored on the mobile phone. The face pictures may be taken by the user with the mobile phone's camera, downloaded via the network or an app, captured as screenshots, copied from other devices, or obtained in other ways. The videos may be videos shot by the user with the mobile phone's camera, videos downloaded via the network or an app, videos saved during video calls, videos copied from other devices, or videos obtained in other ways. The videos and face pictures may contain face images of the user or of other people (such as relatives, friends, or celebrities).
A video records a continuous, dynamically changing process, so a video often contains face images of the same user in many forms as that process unfolds. The mobile phone can first track the face images in the video to obtain temporally continuous face images of the same user in different forms (different angles, expressions, accessories, hairstyles, and so on), and automatically group these face images into one category to obtain a video clustering result. The video clustering result is then used as prior information: the face pictures are clustered according to the similarity between the facial features of the face pictures and the facial features of the face images in the video clustering result. In this way, face pictures of the same user in different forms can also be clustered correctly, improving the clustering accuracy of the face pictures.

For example, referring to FIG. 7A, the mobile phone stores video 1 and a large number of face pictures, for example face picture 1, face picture 2, face picture 3, and face picture 4.

In video 1, referring to FIG. 7B, the mobile phone detects face 1 at time 1, which is the face in frontal face A, and continuously tracks face 1 during time period 1; the mobile phone detects face 2 at time 2, which is the face in smiling face D, and continuously tracks face 2 during time period 2; the mobile phone detects face 3 at time 3, which is the face in upward-tilted face G, and continuously tracks face 3 during time period 3. The face images tracked by the mobile phone during time period 1 include frontal face A, profile face B, face C wearing sunglasses, and so on; the face images tracked during time period 2 include smiling face D, face E with closed eyes, face F with a funny expression, and so on; the face images tracked during time period 3 include upward-tilted face G and downward-tilted face H.
There are multiple possible face detection methods. For example, the skin-color model method detects faces based on the fact that facial skin colors are distributed in a relatively concentrated region of a color space. As another example, the reference template method presets one or several standard face templates, computes the degree of match between a captured test image and the standard templates, and uses a threshold to decide whether a face is present. Other examples include the eigenface method, face-rule methods, and sample-learning methods.

There are also multiple possible face tracking methods. For example, in model-based tracking, common tracking models include skin-color models, ellipse models, texture models, and two-eye templates. As another example, tracking based on motion information mainly exploits the continuity of target motion across consecutive frames to predict the face region and achieve fast tracking; such methods typically use motion segmentation, optical flow, or stereo vision, and often rely on spatio-temporal gradients or Kalman filters. Other examples include tracking based on local facial features and tracking based on neural networks.
Because the face images tracked by the mobile phone are temporally continuous and satisfy a must-link constraint — they belong to the same user — the mobile phone can automatically group frontal face A, profile face B, and face C wearing sunglasses from time period 1 into one category, for example category 1; automatically group smiling face D, face E with closed eyes, and face F with a funny expression from time period 2 into one category, for example category 2; and automatically group upward-tilted face G and downward-tilted face H from time period 3 into one category, for example category 3. The categories here may also be called cluster centers.

It can be understood that, for videos other than video 1 stored on the mobile phone, the mobile phone can also use face detection and face tracking to cluster the face images in those videos.
After clustering the face images within each tracking result, in one solution the mobile phone can further cluster face images across different tracking results. Specifically, the mobile phone can extract the facial features of the face images in the different tracking results (before extracting facial features, the mobile phone may also frontalize the face images, that is, convert face images taken from other angles into frontal face images). If a face image in one category is similar enough to a face image in another category to be grouped together, then all the face images in the two categories can be merged into one category. For example, if the mobile phone determines that frontal face A in category 1 and upward-tilted face G in category 3 are similar enough to be grouped together, then frontal face A, profile face B, face C wearing sunglasses, upward-tilted face G, and downward-tilted face H in categories 1 and 3 can be merged into one category.
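The cross-track merging just described can be sketched as a single-linkage merge over per-track clusters: two clusters are joined whenever any cross-cluster pair of faces is similar enough. The cosine measure and the 0.8 threshold below are illustrative assumptions, not values fixed by the embodiment.

```python
import math

def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def merge_track_clusters(clusters, threshold=0.8):
    """Merge per-track clusters (one list of feature vectors per tracked
    time period) whenever any cross-cluster face pair is similar enough.
    Single-linkage merging implemented with union-find."""
    n = len(clusters)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if any(cosine_similarity(a, b) >= threshold
                   for a in clusters[i] for b in clusters[j]):
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri  # merge the two track clusters

    merged = {}
    for i in range(n):
        merged.setdefault(find(i), []).extend(clusters[i])
    return list(merged.values())
```

With this sketch, the must-link constraint is already encoded by the per-track clusters; the merge step only decides which tracks show the same user.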
There are many possible face clustering methods for grouping different face images into one category, for example hierarchy-based, partition-based, density-based, grid-based, model-based, distance-based, and connectivity-based clustering methods. Specific algorithms include K-Means, DBSCAN, BIRCH, and MeanShift. For example, in one clustering method, the mobile phone extracts the facial features of different face images and clusters them according to the similarity of those features. Facial feature extraction can be understood as a process of mapping a face image to an n-dimensional vector (n being a positive integer) that is able to characterize the face image. The higher the similarity between the facial features of different face images, the more likely those face images are to be grouped into one category.

There are multiple ways of measuring similarity. For example, if a facial feature is a multi-dimensional vector, the similarity may be the distance between the multi-dimensional vectors corresponding to the facial features of different faces, such as the Euclidean distance, Mahalanobis distance, or Manhattan distance. As another example, the similarity may be the cosine similarity, correlation coefficient, information entropy, or the like, between the facial features of different faces.

For example, suppose a facial feature is a multi-dimensional vector, the facial feature 1 that the mobile phone extracts for frontal face A in category 1 is [0.88, 0.64, 0.58, 0.11, ..., 0.04, 0.23], and the facial feature 2 that it extracts for upward-tilted face G in category 3 is [0.68, 0.74, 0.88, 0.81, ..., 0.14, 0.53]. The similarity between facial feature 1 and facial feature 2 is measured by the cosine similarity between the two corresponding multi-dimensional vectors. If the cosine similarity is 0.96, the similarity between facial feature 1 and facial feature 2 is determined to be 96%; if the similarity threshold for clustering is 80%, then the similarity of 96% exceeds the 80% threshold, so frontal face A in category 1 and upward-tilted face G in category 3 can be grouped into one category, and all the face images in categories 1 and 3 can be merged into one category.
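This similarity check can be sketched as follows. Note that the full feature vectors above are elided ("..."), so the shortened six-component vectors below are purely illustrative and do not reproduce the 0.96 figure from the example.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two feature vectors: dot(u, v) / (|u| * |v|)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Only the components visible in the example; the elided middle is dropped.
feature_1 = [0.88, 0.64, 0.58, 0.11, 0.04, 0.23]  # frontal face A, category 1
feature_2 = [0.68, 0.74, 0.88, 0.81, 0.14, 0.53]  # upward-tilted face G, category 3

similarity = cosine_similarity(feature_1, feature_2)
same_category = similarity >= 0.80  # clustering similarity threshold of 80%
```

Even on these truncated vectors, the similarity clears the 80% threshold, so the two categories would be merged.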
As another example, suppose a facial feature is a multi-dimensional vector; the correspondence between each face image and its facial feature is shown in Table 1.

Table 1

If the similarity between facial features is measured by Euclidean distance and the clustering distance threshold is 5, then because the Euclidean distance between facial feature A of frontal face A in category 1, shown in Table 1, and facial feature G of upward-tilted face G in category 3 is less than the distance threshold of 5, frontal face A in category 1 and upward-tilted face G in category 3 can be grouped into one category, and all the face images in categories 1 and 3 can be merged into one category.
After the video clustering is completed, the mobile phone can use an incremental clustering algorithm or another clustering method, based on the video clustering result (for example categories 1, 2, and 3) and the extracted facial features of the face images in each category, to cluster face picture 1, face picture 2, face picture 3, and face picture 4, thereby fusing the face images in the video with the face pictures and clustering the face pictures into the earlier video clustering result. For example, if the facial feature of a face picture stored on the mobile phone is sufficiently similar (for example, the similarity is greater than or equal to a preset value 1) to the facial feature of some face image in a category (for example category 1), the face picture can be clustered into the category that the face image belongs to. When an incremental clustering algorithm is used, the face pictures extend the video clustering result incrementally. If the similarity between the facial feature of a stored face picture and the facial features of the face images in every category of the video clustering result is low (for example, less than the preset value 1), the face picture is placed into a new category.
For example, the correspondence between face pictures and facial features can be seen in Table 1 above. In one example, the similarity between facial features is measured by Euclidean distance and the clustering distance threshold is 5. If the Euclidean distance between facial feature a of face picture 1 and facial feature A of frontal face A is less than the distance threshold of 5, face picture 1 can be clustered into category 1, where frontal face A belongs; if the Euclidean distance between facial feature b of face picture 2 and facial feature B of profile face B is less than the distance threshold of 5, face picture 2 can be clustered into category 1, where profile face B belongs; if the Euclidean distance between facial feature c of face picture 3 and facial feature D of smiling face D is less than the distance threshold of 5, face picture 3 can be clustered into category 2, where smiling face D belongs; and if the Euclidean distance between facial feature d of face picture 4 and facial feature G of upward-tilted face G is less than the distance threshold of 5, face picture 4 can be clustered into category 3, where upward-tilted face G belongs.

In another example, the similarity between facial features is measured by the Euclidean distance to a reference feature: the Euclidean distances between the facial features of category 1 and the reference feature fall in the range 0-50, and the clustering range of the facial features of category 3 is 100-150. If category 1 and category 3 are merged into the same category 4, the range of Euclidean distances between the facial features of category 4 and the reference feature is [0,50] ∪ [100,150]. The facial features of face picture 1, face picture 2, and face picture 4 all fall within [0,50] ∪ [100,150], so face picture 1, face picture 2, and face picture 4 can all be clustered into category 4, that is, into the same category. An illustration of this clustering effect is shown in FIG. 8A.

In yet another example, the mobile phone may separately extract the facial features of all the face images and face pictures listed in Table 1, and then cluster them according to the similarity of those facial features.
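The incremental assignment described above — join the nearest existing category when the distance is under the threshold, otherwise open a new category — can be sketched as follows. The Euclidean metric and the threshold of 5 follow the examples; the function and field names are hypothetical.

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def assign_incrementally(clusters, picture_features, distance_threshold=5.0):
    """Extend a video clustering result with face pictures, incrementally.

    clusters: dict mapping category name -> list of feature vectors
    (the video clustering result used as prior information).
    """
    new_id = 0
    for feature in picture_features:
        best_name, best_dist = None, float("inf")
        for name, members in clusters.items():
            for member in members:
                d = euclidean(feature, member)
                if d < best_dist:
                    best_name, best_dist = name, d
        if best_dist < distance_threshold:
            clusters[best_name].append(feature)  # extend the existing category
        else:
            new_id += 1
            clusters[f"new_{new_id}"] = [feature]  # no match: open a new category
    return clusters
```

Because each assigned picture immediately becomes part of its category, later pictures can match it even if they are far from the original video faces — which is how the result is extended "in an incremental manner".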
As can be seen from the above, after tracking and automatically clustering the faces in a video, the mobile phone can group face images of the same user in many different forms into the same category, so that face pictures resembling the different-form face images in that category are also clustered into it. Compared with the prior art, this reduces the dispersion of the clustering, improves the accuracy of face clustering, makes management more convenient for the user, and improves the user experience.

By contrast, suppose the facial features of face picture 1, face picture 2, face picture 3, and face picture 4 stored on the mobile phone are still as shown in Table 1, the similarity between facial features is measured by Euclidean distance, and the clustering distance threshold is 5. If the video clustering result is ignored and face picture 1, face picture 2, face picture 3, and face picture 4 are clustered directly, then because the Euclidean distance between the facial features of every pair of face pictures is greater than the distance threshold of 5, no two pictures can be grouped together, and each face picture ends up as its own category. An illustration of this clustering effect is shown in FIG. 8B. Compared with FIG. 8A, the face clustering shown in FIG. 8B is more dispersed and less accurate, leading to problems such as false positives or false negatives in the clustering result.
In addition, in the embodiments of this application, after the video clustering is completed, the mobile phone may also label the identities of the faces in the video according to the video clustering result.

The above description mainly uses a video as the reference image set. When the reference image set is an image group as mentioned above (for example, an image group forming an animated picture, or an image group corresponding to the multiple frames captured during shooting preview), or when the reference image set includes both videos and such image groups, the mobile phone can still perform clustering in a manner similar to the video processing described above, which is not repeated here.

It should be noted that if the reference image set is an image group preset by the user, the images in the group are usually face images of the same user in different forms that the user has deliberately selected, so the mobile phone can directly and automatically group the face images in the image group into one category without performing face detection and tracking.
In addition, after face picture 1, face picture 2, face picture 3, and face picture 4 have been clustered according to the reference-image-set clustering result, if face pictures of user 1 or user 2 are later added to the mobile phone, the mobile phone can, similarly, cluster the newly added face pictures according to the reference-image-set clustering result. In one specific implementation, the mobile phone can extend the earlier clustering result with the newly added face pictures through incremental clustering.

After face picture 1, face picture 2, face picture 3, and face picture 4 have been clustered according to the reference-image-set clustering result, if the mobile phone later obtains a new reference image set (for example video 2), then in one solution the mobile phone performs face detection, tracking, and clustering on both the earlier and the new reference image sets, and clusters face picture 1, face picture 2, face picture 3, and face picture 4 according to the video clustering result; in another solution, the mobile phone does not immediately re-cluster face picture 1, face picture 2, face picture 3, and face picture 4, but re-clusters them after detecting an operation by which the user instructs portrait classification.

In another embodiment, regardless of whether the mobile phone obtains a new reference image set, the mobile phone periodically performs face detection, tracking, and clustering on the currently obtained reference image sets, and clusters the currently stored face pictures according to the reference-image-set clustering result.

In another embodiment, after detecting an operation by which the user instructs portrait classification, the mobile phone performs face detection, tracking, and clustering on the currently obtained reference image sets, and clusters the currently stored face pictures according to the reference-image-set clustering result.

In another embodiment, because face clustering consumes considerable resources, the mobile phone may perform face detection, tracking, and clustering on the currently obtained reference image sets, and cluster the currently stored face pictures according to the result, during a preset time period (for example 00:00-6:00 at night), or while idle (for example when the mobile phone is not running other services), or while the mobile phone is charging and its battery level is greater than or equal to a preset value 2.
After the clustering is completed, the mobile phone can display the clustering result, for example in the form of groups (which may be folders). In the following, the reference image set is still video 1, categories 1 and 3 have been merged into category 4, and the face pictures stored on the mobile phone include face picture 1, face picture 2, face picture 3, and face picture 4.

In one embodiment of this application, after the video clustering is completed, the mobile phone can display the video clustering result. For example, in the video portrait classification interface (that is, the video clustering result interface) shown in FIG. 9A, the mobile phone can display group 1 corresponding to category 4 and group 2 corresponding to category 2.
In one solution, the group corresponding to each cluster category may include the videos in which face images of that category appear. For example, group 1 corresponding to category 4 and group 2 corresponding to category 2 both include video 1. In one implementation, referring to FIG. 9A, the cover image displayed as the thumbnail of a video in a group may be a face image in that video belonging to the category; in particular, the cover image may be a relatively frontal face image, or an image designated by the user.

For a video, the mobile phone may put the video into the groups corresponding to all the categories of the face images that appear in it; or the mobile phone may put the video into the group corresponding to a category only when face images of that category appear in the video for a duration greater than or equal to a preset duration; or only when the number of frames containing face images of that category is greater than or equal to a preset value 3; or only when a frontal face image of that category appears in the video.
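The four alternative grouping criteria above can be sketched as one filter over per-category statistics. The statistics dictionary, field names, and thresholds here are hypothetical; the embodiment would typically apply only one of the criteria at a time.

```python
def groups_for_video(category_stats, min_duration_s=None,
                     min_frames=None, require_frontal=False):
    """Decide which category groups a video belongs to.

    category_stats: dict mapping category name -> dict with keys
    'duration_s', 'frames', and 'has_frontal', summarizing how that
    category's face images appear in the video.  With no criteria set,
    the video joins every category whose faces appear in it.
    """
    groups = []
    for category, stats in category_stats.items():
        if min_duration_s is not None and stats["duration_s"] < min_duration_s:
            continue  # this category's face is on screen too briefly
        if min_frames is not None and stats["frames"] < min_frames:
            continue  # too few frames contain this category's face
        if require_frontal and not stats["has_frontal"]:
            continue  # no frontal face image of this category appears
        groups.append(category)
    return groups
```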
In another solution, the group corresponding to each cluster category may include, from the videos in which face images of that category appear, the video segments in which those face images appear.

For example, group 1 corresponding to category 4 may be group 1A in FIG. 9B, which may include video segment 1 corresponding to time period 1 and video segment 3 corresponding to time period 3 of video 1; and group 2 corresponding to category 2 may be group 2A, which may include video segment 2 corresponding to time period 2 of video 1.
In yet another solution, the group corresponding to each cluster category may include, from the videos in which face images of that category appear, the frames containing face images of that category.

For example, group 1 corresponding to category 4 may be group 1B in FIG. 9C, which may include frontal face A, profile face B, face C wearing sunglasses, upward-tilted face G, and downward-tilted face H; group 2 corresponding to category 2 may be group 2B, which may include smiling face D, face E with closed eyes, and face F with a funny expression. In one implementation, the group corresponding to a category may include multiple subgroups, with the face image frames of that category from the same video belonging to the same subgroup.
在另一种方案中,每个聚类类别对应的分组中可以包括该类别的人脸图像所在的视频中, 出现该类别的人脸图像的视频分段, 以及该类别的人脸图像帧。 In another solution, the group corresponding to each cluster category may include the video segment in which the face image of the category is located in the video where the face image of the category appears, and the face image frame of the category.
In this embodiment, the mobile phone can display the video clustering results, which helps the user categorize and manage videos according to the video images, improves the efficiency of searching for and managing videos, and improves the user experience.
In another embodiment of the present application, the mobile phone may refrain from displaying the clustering results after video clustering is completed, and display them only after face picture clustering is completed.
In one solution, after face picture clustering is completed, the mobile phone may display the face picture clustering results, where the group corresponding to each cluster category includes the face pictures of that category.
For example, referring to (a)-(c) in FIG. 10, the mobile phone may display group 3 corresponding to category 4 and group 4 corresponding to category 2; group 3 includes face picture 1, face picture 2, and face picture 4, and group 4 includes face picture 3.
In another solution, after face picture clustering is completed, the mobile phone may display both the video clustering results and the face picture clustering results. The video clustering results and the face picture clustering results may be displayed in separate groups, or combined and displayed in the same group.
When the video clustering results and the face picture clustering results are displayed in separate groups, the video clustering results may be displayed in group 5 and the face picture clustering results in group 6. The content of group 5 may be the video clustering results described above (for example, as shown in FIGS. 9A-9C), and the content of group 6 may be the face picture clustering results described above (for example, as shown in (a)-(c) in FIG. 10).
When the video clustering results and the face picture clustering results are combined and displayed in the same group, the group corresponding to each cluster category may include both the face picture clustering results and the video clustering results described above.
For example, referring to (a) in FIG. 11, category 4 corresponds to group 7 and category 2 corresponds to group 8. In one solution, the group corresponding to each cluster category may include the face pictures of that category as well as the videos in which face images of that category appear. Exemplarily, referring to (b) in FIG. 11, group 7 corresponding to category 4 is group 7A, which includes face picture 1, face picture 2, face picture 4, and video 1; referring to (c) in FIG. 11, group 8 corresponding to category 2 is group 8A, which includes face picture 3 and video 1.
It should be noted that the cover image of a group may be a face picture in that category or a face image of that category from a video. The cover image of video 1 in group 7 and the cover image of video 1 in group 8 may be the same or different. Preferably, the cover image of video 1 is a face image belonging to the category corresponding to the group it is in.
In one implementation, within the group corresponding to a category, the face pictures of that category may belong to one sub-group, and the videos containing face images of that category may belong to another sub-group. Exemplarily, referring to (a) in FIG. 12, group 7 corresponding to category 4 is group 7B, which includes sub-group 7-1 for face pictures and sub-group 7-2 for videos. Referring to (b) in FIG. 12, sub-group 7-1 includes face picture 1, face picture 2, and face picture 4; referring to (c) in FIG. 12, sub-group 7-2 includes video 1.
In another solution, the group corresponding to each cluster category may include the face pictures of that category, together with the video segments, within the videos containing face images of that category, in which face images of that category appear. Exemplarily, as an alternative to (b) and (c) in FIG. 11, referring to (a) and (b) in FIG. 13, group 7 corresponding to category 4 is group 7C, which includes face picture 1, face picture 2, face picture 4, video segment 1, and video segment 3; group 8 corresponding to category 2 is group 8C, which includes face picture 3 and video segment 2. In one implementation, within the group corresponding to a category, the face pictures of that category may belong to one sub-group and the video segments may belong to another sub-group.
In yet another solution, the group corresponding to each cluster category may include the face pictures of that category as well as image frames captured or selected from the videos containing face images of that category. Exemplarily, as an alternative to (b) and (c) in FIG. 11, referring to (a) and (b) in FIG. 14, group 7 corresponding to category 4 is group 7D, which includes face picture 1, face picture 2, face picture 4, and face images A, B, C, G, and H from video 1; group 8 corresponding to category 2 is group 8D, which includes face picture 3 and face images D, E, and F from video 1. In one implementation, the face pictures of the category may belong to one sub-group, and the face image frames of the category taken from the videos may belong to another sub-group.
In another solution, the group corresponding to each cluster category may include the face pictures of that category, and may further include one or more of the following: the videos containing face images of that category; the video segments of those videos in which face images of that category appear; and the captured or selected image frames. In one implementation, the face pictures and the face image frames of the category may belong to one sub-group, and the videos or video segments corresponding to the category may belong to another sub-group.
In another implementation, the face pictures of the category may belong to one sub-group, and the face image frames together with the videos or video segments of the category may belong to another sub-group. In yet another implementation, the face pictures, face image frames, videos, and video segments of the category belong to separate sub-groups.
In other solutions, after face picture clustering is completed, the mobile phone may display the face picture clustering results and determine, according to the user's instruction, whether to display the video clustering results.
It should be noted that, in the embodiments of the present application, the name of a group may be a name manually entered by the user, or a name obtained by the mobile phone itself through learning. For example, the mobile phone may determine the identity of a person in a picture, such as father, mother, wife (or husband), son, or daughter, according to the actions and intimacy between people in pictures or videos, and set that identity as the group name.
In addition, in this embodiment, when displaying the face picture clustering results for the first time or each time, the mobile phone may also prompt the user that the face picture clustering results are obtained by classifying the face pictures according to a reference image set such as a video. Exemplarily, when displaying group 7 corresponding to category 4 and group 8 corresponding to category 2, referring to FIG. 15, the mobile phone may prompt the user by displaying information 1501, so that the user learns of the portrait classification function of the mobile phone.
In this embodiment, by managing and displaying the clustering results of face pictures and videos in a unified manner, the mobile phone improves the efficiency with which the user searches for and manages face pictures and videos, and improves the user experience.
In another embodiment of the present application, after face picture clustering is completed, if the user finds that the clustering result for a face picture is wrong, the user may actively add a reference image set for the person shown in that face picture. For example, if the clustering result for face picture 5 is wrong, then referring to (a) in FIG. 16, the user may tap control 1601, or tap control 1601 after selecting face picture 5; the user may then tap control 1602 shown in (b) in FIG. 16 to add a reference image set. Alternatively, the user may add a reference image set by voice, a preset gesture, or the like.
The reference image set may be a video or a set of images captured by the user in real time, or a set of images obtained by the user through the mobile phone, where the set includes faces, in different forms, of the person shown in the incorrectly clustered face picture. Exemplarily, the reference image set may be the image group shown in (a)-(h) in FIG. 17. After the reference image set is added, the mobile phone may re-cluster the incorrectly clustered face pictures in combination with the reference image set added by the user, or re-cluster all the face pictures stored on the mobile phone.
It should be noted that the clustering method described in the above embodiments classifies the face pictures of different users according to the similarity of facial features; therefore, the groups corresponding to different cluster categories can also be understood as the groups corresponding to different users.
In some embodiments of the present application, the groups corresponding to different cluster categories displayed on the mobile phone, that is, the groups corresponding to different users, may have different priorities. The person corresponding to a high-priority group is likely to be someone the user cares about more.
In one technical solution, the more the user cares about a person, the more face pictures and videos of that person the user typically saves on the mobile phone. The mobile phone can therefore determine that the people who appear most frequently in the saved face pictures and videos are the ones the user cares about most, and the groups corresponding to these people have the highest priority.
In another technical solution, the mobile phone may assign higher priority to the groups corresponding to people who are close to the owner of the phone. For example, based on factors such as the intimacy of the actions between different people, their expressions, how frequently they appear in videos and face pictures, and their positions within videos and face pictures, the mobile phone may use a sentiment analysis algorithm to determine each person's closeness to the owner, conclude that closer people are the ones the owner cares about more, and assign higher priority to their groups.
In yet another technical solution, since the facial information of the user's relatives is usually more similar to the user's own, relatives are usually the people the user cares about more, and the user prefers their groups to be displayed first. The mobile phone may therefore assign higher priority to the groups corresponding to people whose facial information is closer to that of the phone's owner.
In some embodiments, high-priority groups may be displayed first. In one technical solution, the mobile phone may display high-priority groups at the top of the portrait classification interface, while low-priority groups must be viewed by the user by sliding up or switching pages. In another technical solution, the mobile phone may display only the top N (N is a positive integer) highest-priority groups on the portrait classification interface, and may omit the groups corresponding to other people the user cares less about.
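The frequency-based priority scheme combined with top-N display can be sketched as follows. Scoring a group by its raw appearance count is an assumption made for illustration; as described above, priority may equally be derived from intimacy or facial similarity:

```python
# Illustrative sketch: rank face groups by how often each person appears in the
# saved pictures and videos, then show only the top-N groups. The identifiers
# below are hypothetical examples.

from collections import Counter

def top_groups(appearances, n=3):
    """appearances: one person id per saved face picture or video.
    Returns the n person ids with the highest appearance counts."""
    counts = Counter(appearances)
    return [person for person, _ in counts.most_common(n)]

media_owners = ["dad", "dad", "mom", "mom", "mom", "friend", "coworker"]
print(top_groups(media_owners, n=2))  # ['mom', 'dad']
```

Only the returned groups would appear on the portrait classification interface; the remaining groups are either hidden or reached by sliding up, as described above.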
In other embodiments, if the number of face pictures and videos of a person saved on the mobile phone exceeds a preset value 4 (which may be, for example, 5), that person may be someone the user cares about, and the mobile phone may display the corresponding group on the portrait classification interface.
In some embodiments of the present application, a group photo of one user together with another user may be placed in the group corresponding to the first user and, at the same time, in the group corresponding to the other user.
Exemplarily, referring to (a) in FIG. 18, face picture 6 is a group photo of user 1 and user 2; referring to (b) in FIG. 18, face picture 6 appears both in the group corresponding to user 1 and in the group corresponding to user 2.
In other embodiments of the present application, referring to FIG. 19A, the group corresponding to each user includes only individual photos of that user, and group photos of multiple users are displayed separately.
In still other embodiments of the present application, referring to FIG. 19B, the group corresponding to each user includes only individual photos of that user, and group photos of multiple users are placed in another group.
In addition, after face picture clustering is completed, the mobile phone may also tag the identities of the faces in the pictures according to the clustering results.
In other embodiments of the present application, after face picture clustering is completed, the mobile phone may also display the clustering results in a personalized manner. For example, for a picture in a group, when the mobile phone detects an operation in which the user requests color retention, it may keep the area of the picture indicated by the user, or a preset area, as a color image, while the other areas of the picture become grayscale.
Exemplarily, the preset area is the area where the person is located: the mobile phone keeps the colors of the image within that area, and the images in the other areas are grayscale. As another example, for a picture in the group corresponding to a target user, when the mobile phone detects an operation in which the user requests that only the target user be retained, the image content in the area where the target user is located is kept while the image content in the other areas disappears; that is, the other areas may be blank, black, gray, or another preset color.
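The color-retention display can be illustrated with a minimal sketch that keeps color inside a selected rectangular region and converts everything else to grayscale. The rectangular region, the tuple-based pixel model, and the channel-averaging grayscale formula are simplifying assumptions; a real implementation would operate on the segmented person region of a bitmap:

```python
# Minimal sketch of the "color retention" effect: pixels are (R, G, B) tuples
# in a nested list; everything outside `region` is converted to gray.

def retain_color(image, region):
    """region = (top, left, bottom, right), half-open row/column bounds."""
    top, left, bottom, right = region
    out = []
    for r, row in enumerate(image):
        new_row = []
        for c, (red, green, blue) in enumerate(row):
            if top <= r < bottom and left <= c < right:
                new_row.append((red, green, blue))   # inside: keep color
            else:
                g = (red + green + blue) // 3        # outside: simple grayscale
                new_row.append((g, g, g))
        out.append(new_row)
    return out

img = [[(255, 0, 0), (0, 255, 0)],
       [(0, 0, 255), (90, 90, 90)]]
print(retain_color(img, (0, 0, 1, 1)))
# [[(255, 0, 0), (85, 85, 85)], [(85, 85, 85), (90, 90, 90)]]
```

The variant in which the other areas "disappear" would simply replace the grayscale branch with a constant fill color.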
In other embodiments of the present application, after face picture clustering is completed, the mobile phone may also generate a "protagonist story". A protagonist story may include a series of images of a particular person. The images in a protagonist story belong to the same category; specifically, they may be images from the reference image set (for example, video segments or face image frames from a video) or images from the face pictures.
That is, the mobile phone can not only extract face pictures from still pictures to compose a protagonist story, but can also incorporate face images from reference image sets such as videos. This broadens the sources of protagonist images and makes the protagonist story more vivid, interesting, and colorful.
It should be noted that the above description uses a video as the reference image set as an example. When the reference image set is another type of reference image set (for example, the image group captured by the mobile phone in burst mode shown in (a)-(f) in FIG. 20), the face pictures can still be clustered according to that reference image set in the manner described in the above embodiments, and details are not repeated here.
The above description uses human faces as the classification object as an example. When the classification object is another type of object, the clustering method provided in the embodiments of this application can still be used to cluster the pictures on the mobile phone. In addition, the user may set which classification objects the mobile phone clusters.
For example, the classification object may be an animal's face (such as a dog's or a cat's face), an object (such as a house, a car, a mobile phone, or a cup), or a logo (such as the logo of the Olympic rings). Taking a house as the classification object, the mobile phone may first cluster, in the manner described in the above embodiments (for example, through tracking and automatic clustering), the reference image sets it has obtained that contain the house at different angles, orientations, positions, brightness levels, and scenes, and then cluster the stored pictures of houses according to the clustering results of the reference image set. This yields higher clustering accuracy for pictures of houses with different appearances, making it easier for the user to find and manage pictures of houses.
When the classification objects include multiple types of objects other than human faces, the clustering results displayed by the mobile phone may include groups of face pictures as well as groups of the other classification objects; in other words, the mobile phone may cluster and group by different entities.
For example, when the classification objects include human faces, dogs, and houses, referring to FIG. 21, the mobile phone may display in the clustering results the groups corresponding to the faces of different users (for example, user 1 and user 2), the groups corresponding to different dogs (for example, dog 1), and the groups corresponding to different houses (for example, house 1).
In another example, when the classification objects include human faces, dogs, and houses, the clustering results may include group 9 corresponding to human faces, group 10 corresponding to dogs, and group 11 corresponding to houses. Group 9 may include sub-groups corresponding to different users (for example, user 1 and user 2), group 10 may include sub-groups corresponding to different dogs, and group 11 may include sub-groups corresponding to different houses. Moreover, a sub-group may include the picture clustering results, or both the picture clustering results and the reference image set clustering results, which is not described in detail here.
In another solution, the user may also choose which classification objects' results are currently displayed. Exemplarily, referring to (a) in FIG. 22, after the mobile phone detects that the user taps control 2201 shown in (a) in FIG. 22, it may display the interface shown in (b) in FIG. 22; after the mobile phone detects that the user taps control 2202 shown in (b) in FIG. 22, it may display the interface shown in (c) in FIG. 22. Then, when the user selects the portrait category, the mobile phone displays only faces; when the user selects the dog category, the mobile phone displays only dogs; when the user selects the house category, the mobile phone displays only houses; and when the user selects another classification object, the clustering results of that object are displayed. It should be noted that there are many ways for the user to choose which classification objects' clustering results are currently displayed, and they are not limited to the example shown in FIG. 22.
In combination with the above embodiments and the corresponding drawings, another embodiment of the present application provides a picture grouping method, which may be implemented in an electronic device having the hardware structure shown in FIG. 1. At least one face picture is stored on the electronic device. As shown in FIG. 23, the method may include the following steps.
2301. The electronic device acquires at least one video.
The at least one video acquired by the electronic device may include multiple face image frames, and each video may also include multiple face image frames. The at least one face picture stored on the electronic device is a static picture previously taken by the user, or obtained by the electronic device through downloading, copying, or the like.
Exemplarily, the at least one face picture may be face pictures 1-4 shown in FIG. 8A.
The electronic device may acquire the at least one video in various ways. For example, the storage area of the electronic device stores at least one video, and the electronic device obtains the at least one video from the storage area. The video in the storage area may have been previously shot by the user, downloaded by the electronic device, or obtained by the electronic device while an application was running.
As another example, referring to FIG. 6, the electronic device may prompt the user to shoot a video that includes face image frames, and, after detecting the user's operation instructing video shooting, record and generate at least one video.
As yet another example, the electronic device prompts the user to download at least one video and obtains the downloaded video after the user instructs the download. Exemplarily, the at least one video acquired by the electronic device may include video 1 shown in FIG. 7B.
2302. The electronic device extracts multiple face image frames from the at least one video.
After acquiring the at least one video, the electronic device may extract multiple face image frames from it, so that the face pictures can subsequently be grouped according to the extracted frames. Exemplarily, when the video acquired by the electronic device includes video 1 shown in FIG. 7B, the face image frames extracted from video 1 may be face image frames A-H in FIG. 7B.
In other embodiments, the electronic device may also extract a single face image frame from the at least one video, so that the face pictures can subsequently be grouped according to that extracted frame.
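Step 2302 can be sketched as sampling the video at an interval and keeping only the frames in which a face is detected. The sampling step and the stubbed detector are assumptions made for illustration; in practice the per-frame detection would be done by a face detection algorithm, not by precomputed labels:

```python
# Hypothetical sketch of step 2302: sample the video every `step` frames and
# keep the sampled frames that contain a face. `has_face` stands in for the
# output of a real face detector.

def extract_face_frames(frames, has_face, step=2):
    """frames: list of frame ids; has_face: parallel list of booleans."""
    return [f for i, f in enumerate(frames)
            if i % step == 0 and has_face[i]]

frames = ["f0", "f1", "f2", "f3", "f4", "f5"]
has_face = [True, True, False, True, True, False]
print(extract_face_frames(frames, has_face))  # ['f0', 'f4']
```

Sampling keeps the cost of the subsequent clustering bounded even for long videos, at the price of possibly skipping brief appearances; setting `step=1` recovers exhaustive extraction.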
2303. The electronic device performs clustering processing on the at least one face picture according to the multiple face image frames.
Exemplarily, the electronic device may perform clustering processing on face pictures 1-4 according to the extracted face image frames A-H.
Various clustering algorithms may be used; for details, refer to the relevant descriptions in the above embodiments and to the related technologies of existing clustering algorithms.
2304. The electronic device displays at least one group according to the clustering processing results, where each group includes at least one face picture of one user.
In this step, each group obtained by the clustering processing may include at least one face picture of one user; that is, a group may include at least one face picture of the same user, and the face pictures of the same user may be placed in the same group.
In other words, the electronic device can use the multiple face image frames in the at least one video as prior information and cluster the face pictures according to those frames, thereby grouping the face pictures by user so that the face pictures of the same user are clustered into the same group, which improves the accuracy of face picture grouping.
The at least one face picture included in a group may be the face pictures that the electronic device has determined to belong to the same user. The electronic device may compute the similarity between the facial features in the face pictures and determine that different face pictures whose similarity is greater than or equal to a first preset value are face pictures of the same user.
Exemplarily, after the electronic device performs clustering processing on face pictures 1-4 according to face image frames A-H, the resulting groups may be group 3 shown in (b) in FIG. 10 and group 4 shown in (c) in FIG. 10. Group 3 includes the face pictures of user 1, and group 4 includes the face picture of user 2.
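The same-user test against the first preset value can be sketched as follows. Using cosine similarity over facial feature vectors and a threshold of 0.8 are assumptions made for illustration; the application does not fix a particular similarity measure or threshold value:

```python
# Sketch of the similarity test: two face pictures are attributed to the same
# user when the similarity of their feature vectors reaches the "first preset
# value". The feature vectors below are hypothetical.

import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def same_user(feat_a, feat_b, threshold=0.8):   # threshold = first preset value
    return cosine(feat_a, feat_b) >= threshold

print(same_user([0.9, 0.1, 0.2], [0.8, 0.2, 0.3]))  # True  (similar features)
print(same_user([0.9, 0.1, 0.2], [0.1, 0.9, 0.1]))  # False (dissimilar)
```

Raising the threshold makes the grouping stricter (fewer false merges, more split identities); lowering it has the opposite effect.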
在一种技术方案中, 每个分组还包括以下任意一项或任意多项的组合: 用户的人脸图像 帧所在的视频, 用户的人脸图像帧所在的视频分段, 或用户的至少一个人脸图像帧。 也就是 说, 电子设备可以根据不同用户对人脸图片、 视频、 视频分段和人脸图像帧进行分组, 统一 或联合管理用户的视频和图片, 方便用户查找和管理, 提高用户使用体验。 In a technical solution, each group further includes any one or a combination of any of the following: the video where the user's face image frame is located, the video segment where the user's face image frame is located, or at least one of the user's face image frames Face image frame. That is, the electronic device can group face pictures, videos, video segments, and face image frames according to different users, unified or jointly manage users' videos and pictures, which is convenient for users to find and manage, and improve user experience.
示例性的, 参见图 11中的 (b), 用户 1对应的分组 7A中包括用户的人脸图片以及用户 1的人脸图像帧所在的视频 1。 Exemplarily, referring to (b) in FIG. 11, the group 7A corresponding to user 1 includes face pictures of the user and video 1, in which the face image frames of user 1 are located.
再示例性的, 参见图 13中的 (a), 用户 1对应的分组 7C中包括用户的人脸图片以及用 户的人脸图像帧所在的视频分段。 For another example, referring to (a) in FIG. 13, the group 7C corresponding to user 1 includes the user's face picture and the video segment where the user's face image frame is located.
再示例性的, 参见图 13中的 (a), 用户 1对应的分组 7D中包括用户的人脸图片以及用 户的多个人脸图像帧。 For another example, referring to (a) in FIG. 13, the group 7D corresponding to the user 1 includes the face picture of the user and multiple face image frames of the user.
在一种技术方案中, 每个分组包括的一个用户的至少一张人脸图片为单人照或合影。 示例性的, 图 10中的 (b)分组 3中包括用户 1的单人照, 图 10中的 (c)所示的分组 4中包括用户 2的单人照。 图 18中的 (a)所示的分组 9中包括用户 1的单人照和合影, 图 18中的 ( b)所示的分组 10中包括用户 2的单人照和合影。 In a technical solution, at least one face picture of a user included in each group is a single photo or a group photo. Exemplarily, (b) group 3 in FIG. 10 includes a single photo of user 1, and group 4 shown in (c) in FIG. 10 includes a single photo of user 2. Group 9 shown in (a) in FIG. 18 includes the single photo and group photo of user 1, and group 10 shown in (b) in FIG. 18 includes the single photo and group photo of user 2.
如图 23所示, 上述步骤 2303具体可以包括: As shown in Figure 23, the foregoing step 2303 may specifically include:
2303A、 电子设备将多个人脸图像帧划分为至少一个类别, 每个类别分别对应于一个用 户不同形态的多个人脸图像帧。 2303A. The electronic device divides the multiple face image frames into at least one category, and each category corresponds to multiple face image frames of different forms of a user.
示例性的, 参见图 8A, 电子设备可以将人脸图像帧 A-C划分为类别 1, 类别 1中包括用户 1不同形态的多个人脸图像帧; 将人脸图像帧 D-F划分为类别 2, 类别 2中包括用户 2不同形态的多个人脸图像帧; 将人脸图像帧 G-H划分为类别 3, 类别 3中包括用户 1不同形态的多个人脸图像帧。 Exemplarily, referring to FIG. 8A, the electronic device may divide face image frames A to C into category 1, where category 1 includes multiple face image frames of user 1 in different forms; divide face image frames D to F into category 2, where category 2 includes multiple face image frames of user 2 in different forms; and divide face image frames G to H into category 3, where category 3 includes multiple face image frames of user 1 in different forms.
2303B、 电子设备根据多个人脸图像帧的类别划分结果, 对至少一张人脸图片进行聚类 处理。 2303B. The electronic device performs clustering processing on at least one face image according to the classification results of multiple face image frames.
示例性的, 电子设备可以根据图 8A所示的类别 1, 类别 2和类别 3, 对人脸图片 1-4进行聚类处理。 电子设备可以根据类别划分结果, 将人脸图片与已划分的类别归为一组, 或者将人脸图片划分至一个新的分组。 Exemplarily, the electronic device may perform clustering processing on face pictures 1 to 4 according to category 1, category 2, and category 3 shown in FIG. 8A. According to the category division results, the electronic device may group a face picture together with an already divided category, or place the face picture in a new group.
其中, 视频中的人脸图像帧通常是动态变化的人脸图像帧, 可以包括不同形态的人脸图像。 当至少一个视频中的多个人脸图像帧划分的每个类别中, 分别包括同一用户不同形态的人脸图像时, 电子设备可以根据不同类别中不同用户不同形态的人脸图像, 对不同人脸角度、 表情等不同形态的人脸图片进行准确分组, 提高分组的准确性。 Here, the face image frames in a video are usually dynamically changing face image frames and may include face images in different forms. When each category obtained by dividing the multiple face image frames in the at least one video includes face images of one user in different forms, the electronic device can, according to the face images of different users in different forms in the different categories, accurately group face pictures in different forms such as different face angles and expressions, improving the accuracy of the grouping.
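Steps 2303A and 2303B can be sketched as follows, with the video-frame categories acting as prior anchors for the standalone face pictures. The cosine metric, the threshold, and all names are illustrative assumptions rather than the patent's mandated implementation:

```python
def cosine(a, b):
    # Cosine similarity between two face feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / ((sum(x * x for x in a) ** 0.5) * (sum(x * x for x in b) ** 0.5))

def cluster_pictures(pictures, categories, first_preset=0.8):
    """Step 2303B sketch: assign each standalone face picture to the most
    similar video-frame category, or open a new group when no category is
    similar enough. `pictures` maps picture id -> feature vector;
    `categories` maps category id -> list of frame feature vectors that
    serve as prior information (hypothetical data layout)."""
    groups = {cid: [] for cid in categories}
    new_groups = []
    for pid, feat in pictures.items():
        best_cid, best_sim = None, 0.0
        for cid, frames in categories.items():
            # A category holds frames of one user in different forms, so
            # compare against every frame and keep the best match.
            sim = max(cosine(feat, f) for f in frames)
            if sim > best_sim:
                best_cid, best_sim = cid, sim
        if best_cid is not None and best_sim >= first_preset:
            groups[best_cid].append(pid)
        else:
            new_groups.append([pid])
    return groups, new_groups
```

A picture matching no category strongly enough opens a new group, mirroring the text's "group the face picture together with an already divided category, or place it in a new group".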
上述步骤 2303A具体可以包括: 电子设备分别将每个视频中的人脸图像帧划分为至少一个类别。 The above step 2303A may specifically include: the electronic device separately divides the face image frames in each video into at least one category.
其中, 同一视频中的相邻图像帧具有时间连续性, 视频中具有时间连续性的同一用户的多个人脸图像帧可以归为一个类别。 而在视频中, 具有时间连续性的同一用户的人脸图像帧通常可以是相邻的人脸图像帧。 Here, adjacent image frames in the same video have temporal continuity, and multiple face image frames of the same user that have temporal continuity in a video can be classified into one category. In a video, the face image frames of the same user that have temporal continuity are usually adjacent face image frames.
例如, 电子设备通过人脸跟踪算法跟踪到的同一视频中的人脸图像具有时间连续性, 满 足 must- link约束, 是同一个用户的人脸, 可以归为同一个类别。 因而, 电子设备可以通过人 脸跟踪算法, 分别将每个视频中具有时间连续性的同一用户的多个人脸图像帧划分为同一个 类别。 这样, 同一视频中多个用户的人脸图像帧就可以对应多个类别。 For example, the face images in the same video tracked by the electronic device through the face tracking algorithm have temporal continuity, meet the must-link constraint, are the faces of the same user, and can be classified into the same category. Therefore, the electronic device can separately classify multiple face image frames of the same user with temporal continuity in each video into the same category through the face tracking algorithm. In this way, the face image frames of multiple users in the same video can correspond to multiple categories.
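A minimal stand-in for the face tracking step is a greedy tracker that links face boxes in adjacent frames when they overlap, so each temporally continuous track becomes one category (the must-link constraint). The IoU criterion and its threshold are assumptions; the patent does not specify a particular tracking algorithm:

```python
def track_faces(detections, iou_thresh=0.5):
    """Greedy tracker sketch: a face box in the current frame inherits the
    track id of an overlapping box from the previous frame; otherwise it
    starts a new track. `detections` is a list of per-frame lists of
    bounding boxes (x1, y1, x2, y2)."""
    def iou(a, b):
        # Intersection-over-union of two axis-aligned boxes.
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
        union = area(a) + area(b) - inter
        return inter / union if union else 0.0

    next_id = 0
    prev = []     # (track_id, box) pairs from the previous frame
    labels = []   # per-frame list of track ids (one category per track)
    for boxes in detections:
        cur, frame_labels = [], []
        for box in boxes:
            match = next((tid for tid, pb in prev if iou(pb, box) >= iou_thresh), None)
            if match is None:
                match, next_id = next_id, next_id + 1
            cur.append((match, box))
            frame_labels.append(match)
        prev = cur
        labels.append(frame_labels)
    return labels
```

Because only adjacent frames are linked, each track id groups exactly the temporally continuous face image frames of one user, matching the must-link behavior described above.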
示例性的, 电子设备对视频 1 中的人脸图像帧划分类别后的结果, 可以为图 8A所示的 类别 1、 类别 2和类别 3。 Exemplarily, the result of the electronic device classifying the face image frame in the video 1 may be category 1, category 2, and category 3 shown in FIG. 8A.
上述步骤 2303A具体还可以包括: 若至少一个类别中第一类别中的第一人脸图像帧的人脸特征, 与第二类别中的第二人脸图像帧的人脸特征之间的相似度大于或者等于第二预设值, 则电子设备可以将第一类别和第二类别合并为同一个类别。 The above step 2303A may further include: if the similarity between the face features of a first face image frame in a first category of the at least one category and the face features of a second face image frame in a second category is greater than or equal to a second preset value, the electronic device may merge the first category and the second category into the same category.
其中, 由于人脸特征之间的相似度大于或者等于第二预设值的两个人脸图像帧, 一般为同一用户的人脸图像帧, 这两个人脸图像帧分别所在的类别也与同一用户对应, 因而电子设备可以将这两个人脸图像帧分别所在的类别合并为同一类别。 Here, two face image frames whose face-feature similarity is greater than or equal to the second preset value are generally face image frames of the same user, and the categories in which these two frames are respectively located also correspond to that same user; therefore, the electronic device can merge the two categories into one category.
这样, 电子设备可以先将同一个视频中的人脸图像帧划分类别, 而后再将不同视频中相似度较大的人脸图像帧所在的类别合并, 即将不同视频中同一用户的人脸图像帧合并为同一个类别。 In this way, the electronic device can first divide the face image frames in the same video into categories, and then merge the categories containing highly similar face image frames from different videos, that is, merge the face image frames of the same user in different videos into the same category.
示例性的, 若类别 1中的第一人脸图像帧与类别 3中的第二人脸图像帧的人脸特征之间的相似度大于或者等于第二预设值, 则电子设备将类别 1和类别 3合并为类别 4。 Exemplarily, if the similarity between the face features of the first face image frame in category 1 and the second face image frame in category 3 is greater than or equal to the second preset value, the electronic device merges category 1 and category 3 into category 4.
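The category merging step can be sketched with a union-find structure: any two categories containing a frame pair whose feature similarity reaches the second preset value are merged, and merges chain transitively across videos. The cosine metric and the threshold are hypothetical choices:

```python
def merge_categories(categories, second_preset=0.85):
    """Union-find sketch of cross-video category merging.
    `categories` maps category id -> list of frame feature vectors."""
    ids = list(categories)
    parent = {cid: cid for cid in ids}

    def find(c):
        # Find the root of c with path halving.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / ((sum(x * x for x in a) ** 0.5) * (sum(x * x for x in b) ** 0.5))

    for i, ci in enumerate(ids):
        for cj in ids[i + 1:]:
            # One sufficiently similar frame pair merges the two categories.
            if any(cosine(f1, f2) >= second_preset
                   for f1 in categories[ci] for f2 in categories[cj]):
                parent[find(cj)] = find(ci)

    merged = {}
    for cid in ids:
        merged.setdefault(find(cid), []).extend(categories[cid])
    return merged
```

With the example above, category 1 and category 3 (same user in two videos) collapse into one merged category while category 2 stays separate.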
在后续的步骤 2303B中, 电子设备可以根据类别 2和类别 4, 对电子设备保存的 (获取 的) 至少一张人脸图片进行聚类处理。 In the subsequent step 2303B, the electronic device may perform clustering processing on at least one face image saved (acquired) by the electronic device according to category 2 and category 4.
此外, 参见图 23, 该方法还可以包括: In addition, referring to FIG. 23, the method may further include:
2305、 电子设备获取至少一个图像组, 每个图像组中包括同一用户不同形态的多个图像 帧。 2305. The electronic device acquires at least one image group, and each image group includes multiple image frames of the same user in different forms.
其中, 该至少一个图像组包括以下任意一项或任意多项的组合: 动图, 预先拍摄的包括同一用户不同形态的人脸的图像组, 在拍摄预览时实时采集的多帧图像形成的图像组, 或在连拍时拍摄到的多帧图像形成的图像组。 Here, the at least one image group includes any one or a combination of any of the following: a moving picture, a pre-captured image group including faces of the same user in different forms, an image group formed by multiple frames of images collected in real time during shooting preview, or an image group formed by multiple frames of images captured during continuous shooting.
在步骤 2305的基础上, 上述步骤 2302具体可以包括: 电子设备从至少一个视频以及至 少一个图像组中, 提取多个人脸图像帧。 On the basis of step 2305, the above step 2302 may specifically include: the electronic device extracts multiple face image frames from at least one video and at least one image group.
其中, 步骤 2305中的图像组以及步骤 2301中的视频, 可以为本申请上述实施例描述的参考图像集。 也就是说, 电子设备可以从一个或多个参考图像集中获取同一用户不同姿态的多个人脸图像帧, 以便于电子设备根据同一用户不同姿态的多个人脸图像帧对人脸图片进行精确分组, 降低聚类的分散度。 Here, the image groups in step 2305 and the videos in step 2301 may be the reference image sets described in the foregoing embodiments of this application. That is, the electronic device can obtain multiple face image frames of the same user in different poses from one or more reference image sets, so that the electronic device can accurately group the face pictures according to those frames, reducing the dispersion of the clusters.
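Collecting the prior face frames from both kinds of reference image sets can be sketched as below; the frame sampling rate is a hypothetical choice, since the patent does not state how frames are sampled from a video:

```python
def collect_reference_frames(videos, image_groups, sample_rate=10):
    """Pool prior face frames from videos (every Nth frame, a hypothetical
    sampling choice) and from image groups (all frames, since moving
    pictures and burst shots are already short)."""
    frames = []
    for video in videos:
        frames.extend(video[::sample_rate])
    for group in image_groups:
        frames.extend(group)
    return frames
```

The pooled frames then feed the category division and clustering steps described above.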
可以理解的是, 电子设备为了实现上述功能, 其包含了执行各个功能相应的硬件和软件模块。 结合本文中所公开的实施例描述的各示例的算法步骤, 本申请能够以硬件或硬件和计算机软件的结合形式来实现。 某个功能究竟以硬件还是计算机软件驱动硬件的方式来执行, 取决于技术方案的特定应用和设计约束条件。 本领域技术人员可以结合实施例对每个特定的应用来使用不同方法来实现所描述的功能, 但是这种实现不应认为超出本申请的范围。 It can be understood that, in order to implement the above functions, the electronic device includes corresponding hardware and software modules for performing each function. With reference to the algorithm steps of the examples described in the embodiments disclosed herein, this application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application in combination with the embodiments, but such implementations should not be considered beyond the scope of this application.
本申请实施例可以根据上述方法示例对电子设备进行功能模块的划分, 例如, 可以对应 各个功能划分各个功能模块, 也可以将两个或两个以上的功能集成在一个处理模块中。 上述 集成的模块可以采用硬件的形式实现。 需要说明的是, 本申请实施例中对模块的划分是示意 性的, 仅仅为一种逻辑功能划分, 实际实现时可以有另外的划分方式。 The embodiment of the present application may divide the electronic device into functional modules according to the foregoing method examples. For example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The above integrated modules can be implemented in the form of hardware. It should be noted that the division of modules in the embodiments of the present application is illustrative, and is only a logical function division, and there may be other division methods in actual implementation.
在采用对应各个功能划分各个功能模块的情况下,图 24示出了上述实施例中涉及的电子 设备 2400的一种可能的组成示意图, 如图 24所示, 该电子设备 2400可以包括: 获取单元 2401、 提取单元 2402、 聚类单元 2403和显示单元 2404等。 In the case of dividing each functional module corresponding to each function, FIG. 24 shows a schematic diagram of a possible composition of the electronic device 2400 involved in the foregoing embodiment. As shown in FIG. 24, the electronic device 2400 may include: an acquiring unit 2401, extraction unit 2402, clustering unit 2403, display unit 2404, and so on.
其中, 获取单元 2401可以用于支持电子设备 2400执行上述步骤 2301, 和 /或用于本文所 描述的技术的其他过程。 Wherein, the obtaining unit 2401 may be used to support the electronic device 2400 to perform the foregoing step 2301, and/or other processes used in the technology described herein.
提取单元 2402可以用于支持电子设备 2400执行上述步骤 2302等, 和 /或用于本文所描述的技术的其他过程。 The extracting unit 2402 may be used to support the electronic device 2400 in performing the foregoing step 2302 and the like, and/or used in other processes of the technology described herein.
聚类单元 2403可以用于支持电子设备 2400执行上述步骤 2303、步骤 2303A、步骤 2303B 等, 和 /或用于本文所描述的技术的其他过程。 The clustering unit 2403 may be used to support the electronic device 2400 to perform the above steps 2303, 2303A, 2303B, etc., and/or other processes of the technology described herein.
显示单元 2404可以用于支持电子设备 2400执行上述步骤 2304等, 和 /或用于本文所描 述的技术的其他过程。 The display unit 2404 may be used to support the electronic device 2400 to perform the above-mentioned steps 2304, etc., and/or used in other processes of the technology described herein.
需要说明的是, 上述方法实施例涉及的各步骤的所有相关内容均可以援引到对应功能模 块的功能描述, 在此不再赘述。 It should be noted that all relevant content of the steps involved in the above method embodiments can be cited in the functional description of the corresponding functional module, which will not be repeated here.
本申请实施例提供的电子设备, 用于执行上述图片的分组方法, 因此可以达到与上述实 现方法相同的效果。 The electronic device provided in the embodiment of the present application is used to perform the above-mentioned grouping method for pictures, and therefore can achieve the same effect as the above-mentioned implementation method.
在采用集成的单元的情况下, 电子设备可以包括处理模块和存储模块。 其中, 处理模块 可以用于对电子设备的动作进行控制管理, 例如, 可以用于支持电子设备执行上述获取单元 2401、 提取单元 2402、 聚类单元 2403和显示单元 2404执行的步骤。 In the case of an integrated unit, the electronic device may include a processing module and a storage module. The processing module can be used to control and manage the actions of the electronic device. For example, it can be used to support the electronic device to execute the steps performed by the above-mentioned obtaining unit 2401, extraction unit 2402, clustering unit 2403, and display unit 2404.
存储模块可以用于支持电子设备存储人脸图片和视频、 动图等参考图像集, 以及存储程 序代码和数据等。 The storage module can be used to support electronic devices to store reference image sets such as face pictures and videos, moving pictures, and to store program codes and data.
另外, 电子设备还可以包括通信模块, 可以用于支持电子设备与其他设备的通信。 In addition, the electronic device may also include a communication module, which may be used to support communication between the electronic device and other devices.
其中, 处理模块可以是处理器或控制器。 其可以实现或执行结合本申请公开内容所描述的各种示例性的逻辑方框, 模块和电路。 处理器也可以是实现计算功能的组合, 例如包含一个或多个微处理器组合, 数字信号处理 (digital signal processing, DSP) 和微处理器的组合等等。 存储模块可以是存储器。 通信模块具体可以为射频电路、 蓝牙芯片、 Wi-Fi 芯片等与其他电子设备交互的设备。 Here, the processing module may be a processor or a controller, which may implement or execute the various exemplary logical blocks, modules, and circuits described with reference to the disclosure of this application. The processor may also be a combination implementing computing functions, for example, a combination including one or more microprocessors, or a combination of digital signal processing (DSP) and a microprocessor. The storage module may be a memory. The communication module may specifically be a radio frequency circuit, a Bluetooth chip, a Wi-Fi chip, or another device that interacts with other electronic devices.
在一个实施例中, 当处理模块为处理器, 存储模块为存储器时, 本申请实施例所涉及的电子设备可以为具有图 1所示结构的电子设备。 具体的, 图 1所示的内部存储器 121可以存储有计算机程序指令, 当指令被处理器 110执行时, 使得电子设备可以执行: 获取至少一个视频; 从至少一个视频中提取多个人脸图像帧; 根据多个人脸图像帧, 对至少一张人脸图片进行聚类处理; 根据聚类处理结果, 显示至少一个分组, 每个分组分别包括一个用户的至少一张人脸图片。 In an embodiment, when the processing module is a processor and the storage module is a memory, the electronic device involved in the embodiments of this application may be an electronic device having the structure shown in FIG. 1. Specifically, the internal memory 121 shown in FIG. 1 may store computer program instructions, and when the instructions are executed by the processor 110, the electronic device is caused to: acquire at least one video; extract multiple face image frames from the at least one video; perform clustering processing on at least one face picture according to the multiple face image frames; and display at least one group according to the clustering processing result, where each group includes at least one face picture of one user.
具体的, 当指令被处理器 110执行时, 使得电子设备具体可以执行: 将多个人脸图像帧划分为至少一个类别, 每个类别分别对应于一个用户不同形态的多个人脸图像帧; 并根据多个人脸图像帧的类别划分结果, 对至少一张人脸图片进行聚类处理等上述方法实施例中的步骤。 Specifically, when the instructions are executed by the processor 110, the electronic device is caused to: divide the multiple face image frames into at least one category, where each category corresponds to multiple face image frames of one user in different forms; and perform clustering processing on the at least one face picture according to the category division results of the multiple face image frames, among other steps of the foregoing method embodiments.
本申请实施例还提供一种计算机存储介质, 该计算机存储介质中存储有计算机指令, 当 该计算机指令在电子设备上运行时, 使得电子设备执行上述相关方法步骤实现上述实施例中 的图片分组方法。 The embodiment of the present application also provides a computer storage medium, the computer storage medium stores computer instructions, when the computer instructions run on the electronic device, the electronic device executes the above-mentioned related method steps to implement the picture grouping method in the above-mentioned embodiment .
本申请实施例还提供一种计算机程序产品, 当该计算机程序产品在计算机上运行时, 使 得计算机执行上述相关步骤, 以实现上述实施例中的图片分组方法。 The embodiments of the present application also provide a computer program product. When the computer program product runs on a computer, the computer is caused to execute the above-mentioned related steps, so as to realize the picture grouping method in the above-mentioned embodiment.
另外, 本申请的实施例还提供一种装置, 这个装置具体可以是芯片, 组件或模块, 该装置可包括相连的处理器和存储器; 其中, 存储器用于存储计算机执行指令, 当装置运行时, 处理器可执行存储器存储的计算机执行指令, 以使芯片执行上述各方法实施例中的图片分组方法。 In addition, the embodiments of this application further provide an apparatus, which may specifically be a chip, a component, or a module. The apparatus may include a processor and a memory that are connected, where the memory is used to store computer-executable instructions. When the apparatus runs, the processor can execute the computer-executable instructions stored in the memory, so that the chip executes the picture grouping methods in the foregoing method embodiments.
其中, 本申请实施例提供的电子设备、 计算机存储介质、 计算机程序产品或芯片均用于执行上文所提供的对应的方法, 因此, 其所能达到的有益效果可参考上文所提供的对应的方法中的有益效果, 此处不再赘述。 The electronic device, computer storage medium, computer program product, and chip provided in the embodiments of this application are all used to perform the corresponding methods provided above. Therefore, for the beneficial effects they can achieve, reference may be made to the beneficial effects of the corresponding methods provided above, which are not repeated here.
通过以上实施方式的描述, 所属领域的技术人员可以清楚地了解到, 为描述的方便和简洁, 仅以上述各功能模块的划分进行举例说明, 实际应用中, 可以根据需要而将上述功能分配由不同的功能模块完成, 即将装置的内部结构划分成不同的功能模块, 以完成以上描述的全部或者部分功能。 Through the description of the above implementations, those skilled in the art can clearly understand that, for convenience and brevity of description, only the division of the above functional modules is used as an example for illustration. In actual applications, the above functions can be allocated to different functional modules as needed, that is, the internal structure of the apparatus is divided into different functional modules to complete all or part of the functions described above.
在本申请所提供的几个实施例中, 应该理解到, 所揭露的装置和方法, 可以通过其它的 方式实现。 例如, 以上所描述的装置实施例仅仅是示意性的, 例如, 模块或单元的划分, 仅 仅为一种逻辑功能划分, 实际实现时可以有另外的划分方式, 例如多个单元或组件可以结合 或者可以集成到另一个装置, 或一些特征可以忽略, 或不执行。 另一点, 所显示或讨论的相 互之间的耦合或直接耦合或通信连接可以是通过一些接口, 装置或单元的间接耦合或通信连 接, 可以是电性, 机械或其它的形式。 In the several embodiments provided in this application, it should be understood that the disclosed device and method can be implemented in other ways. For example, the device embodiments described above are merely illustrative, for example, the division of modules or units is only a logical function division, and there may be other division methods in actual implementation, for example, multiple units or components may be combined or It can be integrated into another device, or some features can be ignored or not implemented. In addition, the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
作为分离部件说明的单元可以是或者也可以不是物理上分开的, 作为单元显示的部件可 以是一个物理单元或多个物理单元,即可以位于一个地方,或者也可以分布到多个不同地方。 可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。 The units described as separate parts may or may not be physically separate. The parts displayed as a unit may be one physical unit or multiple physical units, that is, they may be located in one place or distributed to multiple different places. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
另外, 在本申请各个实施例中的各功能单元可以集成在一个处理单元中, 也可以是各个 单元单独物理存在, 也可以两个或两个以上单元集成在一个单元中。 上述集成的单元既可以 采用硬件的形式实现, 也可以采用软件功能单元的形式实现。 In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above-mentioned integrated unit can be realized in the form of hardware or software functional unit.
集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时, 可以存储在一个可读取存储介质中。 基于这样的理解, 本申请实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来, 该软件产品存储在一个存储介质中, 包括若干指令用以使得一个设备 (可以是单片机, 芯片等) 或处理器 (processor) 执行本申请各个实施例所述方法的全部或部分步骤。 而前述的存储介质包括: U盘、 移动硬盘、 只读存储器 (read only memory, ROM)、 随机存取存储器 (random access memory, RAM)、 磁碟或者光盘等各种可以存储程序代码的介质。 If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium. Based on this understanding, the technical solutions of the embodiments of this application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions used to cause a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of this application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
以上内容, 仅为本申请的具体实施方式, 但本申请的保护范围并不局限于此, 任何在本申请揭露的技术范围内的变化或替换, 都应涵盖在本申请的保护范围之内。 因此, 本申请的保护范围应以所述权利要求的保护范围为准。 The above content is only specific implementations of this application, but the protection scope of this application is not limited thereto. Any variation or replacement within the technical scope disclosed in this application shall be covered by the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.
The above content is only the specific implementation manners of this application, but the protection scope of this application is not limited to this. Changes or replacements within the scope of the technology disclosed in the application shall be covered by the scope of protection of this application. Therefore, the protection scope of this application should be subject to the protection scope of the claims.
Claims
1、 一种图片分组方法, 应用于电子设备, 所述电子设备上保存有至少一张人脸图片, 其 特征在于, 所述方法包括: 1. A picture grouping method applied to an electronic device, where at least one face picture is stored on the electronic device, characterized in that the method includes:
获取至少一个视频; acquiring at least one video;
从所述至少一个视频中提取多个人脸图像帧; Extracting multiple face image frames from the at least one video;
根据所述多个人脸图像帧, 对所述至少一张人脸图片进行聚类处理; Perform clustering processing on the at least one face picture according to the multiple face image frames;
根据所述聚类处理结果, 显示至少一个分组, 每个所述分组分别包括一个用户的至少一张人脸图片。 displaying at least one group according to the clustering processing result, where each group includes at least one face picture of one user.
2、 根据权利要求 1所述的方法, 其特征在于, 所述根据所述多个人脸图像帧, 对所述至 少一张人脸图片进行聚类处理, 包括: 2. The method according to claim 1, wherein the clustering of the at least one face picture according to the multiple face image frames comprises:
将所述多个人脸图像帧划分为至少一个类别, 每个所述类别分别对应于一个用户不同形 态的多个人脸图像帧; Dividing the multiple face image frames into at least one category, and each category corresponds to multiple face image frames of different forms of a user;
根据所述多个人脸图像帧的类别划分结果, 对所述至少一张人脸图片进行聚类处理。 Perform clustering processing on the at least one face picture according to the classification results of the multiple face image frames.
3、根据权利要求 2所述的方法, 其特征在于, 所述将所述多个人脸图像帧划分为至少一 个类别, 包括: 3. The method according to claim 2, wherein the dividing the plurality of face image frames into at least one category comprises:
分别将每个所述视频中的人脸图像帧划分为至少一个类别; Dividing the face image frames in each video into at least one category;
若所述至少一个类别中第一类别中的第一人脸图像帧的人脸特征, 与第二类别中的第二人脸图像帧的人脸特征之间的相似度大于或者等于预设值, 则将所述第一类别和所述第二类别合并为同一个类别。 if the similarity between face features of a first face image frame in a first category of the at least one category and face features of a second face image frame in a second category is greater than or equal to a preset value, merging the first category and the second category into the same category.
4、根据权利要求 3所述的方法, 其特征在于, 所述分别将每个所述视频中的人脸图像帧 划分为至少一个类别, 包括: 4. The method according to claim 3, wherein the separately dividing the face image frames in each of the videos into at least one category comprises:
通过人脸跟踪算法, 分别将每个所述视频中, 具有时间连续性的同一用户的多个人脸图 像帧划分为同一个类别。 Through the face tracking algorithm, the multiple face image frames of the same user with temporal continuity in each video are divided into the same category.
5、 根据权利要求 1-4任一项所述的方法, 其特征在于, 每个所述分组还包括以下任意一 项或任意多项的组合: 所述用户的人脸图像帧所在的视频, 所述用户的人脸图像帧所在的视 频分段, 或所述用户的至少一个人脸图像帧。 5. The method according to any one of claims 1-4, wherein each of the groups further includes any one or a combination of any of the following: the video where the user's face image frame is located, The video segment in which the face image frame of the user is located, or at least one face image frame of the user.
6、 根据权利要求 1-5任一项所述的方法, 其特征在于, 每个所述分组包括的一个用户的 至少一张人脸图片为单人照或合影。 6. The method according to any one of claims 1-5, wherein at least one face picture of a user included in each group is a single photo or a group photo.
7、 根据权利要求 1-6任一项所述的方法, 其特征在于, 所述获取至少一个视频, 包括: 从所述电子设备的存储区获取所述至少一个视频。 7. The method according to any one of claims 1-6, wherein the acquiring at least one video comprises: acquiring the at least one video from a storage area of the electronic device.
8、 根据权利要求 1-6任一项所述的方法, 其特征在于, 所述获取至少一个视频, 包括: 提示用户拍摄包括人脸图像帧的视频; 8. The method according to any one of claims 1-6, wherein said acquiring at least one video comprises: prompting a user to shoot a video including face image frames;
在检测到用户指示拍摄视频的操作后, 录制并生成至少一个视频。 After detecting the user's instruction to shoot the video, at least one video is recorded and generated.
9、 根据权利要求 1-8任一项所述的方法, 其特征在于, 所述方法还包括: 9. The method according to any one of claims 1-8, wherein the method further comprises:
获取至少一个图像组, 每个所述图像组中包括同一用户不同形态的多个图像帧; 所述至 少一个图像组包括以下任意一项或任意多项的组合: 动图, 预先拍摄的包括同一用户不同形 态的人脸的图像组, 在拍摄预览时实时采集的多帧图像形成的图像组, 或在连拍时拍摄到的 多帧图像形成的图像组; At least one image group is acquired, and each image group includes multiple image frames of the same user in different forms; the at least one image group includes any one or a combination of any of the following: moving pictures, pre-photographed including the same Image groups of different forms of the user’s face, an image group formed by multiple frames of images captured in real-time during shooting preview, or an image group formed by multiple frames of images captured during continuous shooting;
所述从所述至少一个视频中提取多个人脸图像帧, 包括: The extracting multiple face image frames from the at least one video includes:
从所述至少一个视频以及所述至少一个图像组中, 提取所述多个人脸图像帧。
Extracting the plurality of face image frames from the at least one video and the at least one image group.
10、 一种电子设备, 其特征在于, 所述电子设备包括: 至少一个处理器; 至少一个存储器; 其中, 所述至少一个存储器中存储有计算机程序指令, 当所述指令被所述至少一个处理器执行时, 使得所述电子设备执行以下步骤: 10. An electronic device, characterized in that the electronic device comprises: at least one processor; and at least one memory; wherein the at least one memory stores computer program instructions, and when the instructions are executed by the at least one processor, the electronic device is caused to perform the following steps:
获取至少一个视频; acquiring at least one video;
从所述至少一个视频中提取多个人脸图像帧; Extracting multiple face image frames from the at least one video;
根据所述多个人脸图像帧, 对所述至少一张人脸图片进行聚类处理; Perform clustering processing on the at least one face picture according to the multiple face image frames;
根据所述聚类处理结果, 显示至少一个分组, 每个所述分组分别包括一个用户的至少一张人脸图片。 displaying at least one group according to the clustering processing result, where each group includes at least one face picture of one user.
11、 根据权利要求 10所述的电子设备, 其特征在于, 所述根据所述多个人脸图像帧, 对 所述至少一张人脸图片进行聚类处理, 具体包括: 11. The electronic device according to claim 10, wherein the clustering process on the at least one face image according to the multiple face image frames specifically comprises:
将所述多个人脸图像帧划分为至少一个类别, 每个所述类别分别对应于一个用户不同形 态的多个人脸图像帧; Dividing the multiple face image frames into at least one category, and each category corresponds to multiple face image frames of different forms of a user;
根据所述多个人脸图像帧的类别划分结果, 对所述至少一张人脸图片进行聚类处理。 Perform clustering processing on the at least one face picture according to the classification results of the multiple face image frames.
12、 根据权利要求 11所述的电子设备, 其特征在于, 所述将所述多个人脸图像帧划分为 至少一个类别, 具体包括: 12. The electronic device according to claim 11, wherein the dividing the plurality of face image frames into at least one category specifically comprises:
分别将每个所述视频中的人脸图像帧划分为至少一个类别; Dividing the face image frames in each video into at least one category;
若所述至少一个类别中第一类别中的第一人脸图像帧的人脸特征, 与第二类别中的第二人脸图像帧的人脸特征之间的相似度大于或者等于预设值, 则将所述第一类别和所述第二类别合并为同一个类别。 if the similarity between face features of a first face image frame in a first category of the at least one category and face features of a second face image frame in a second category is greater than or equal to a preset value, merging the first category and the second category into the same category.
13、根据权利要求 12所述的电子设备, 其特征在于, 所述分别将每个所述视频中的人脸 图像帧划分为至少一个类别, 具体包括: 13. The electronic device according to claim 12, wherein said separately dividing the face image frames in each said video into at least one category specifically comprises:
通过人脸跟踪算法, 分别将每个所述视频中, 具有时间连续性的同一用户的多个人脸图 像帧划分为同一个类别。 Through the face tracking algorithm, the multiple face image frames of the same user with temporal continuity in each video are divided into the same category.
14、 根据权利要求 10-13任一项所述的电子设备, 其特征在于, 每个所述分组还包括以 下任意一项或任意多项的组合: 所述用户的人脸图像帧所在的视频, 所述用户的人脸图像帧 所在的视频分段, 或所述用户的至少一个人脸图像帧。 14. The electronic device according to any one of claims 10-13, wherein each of the groups further comprises any one or a combination of any of the following: the video where the user's face image frame is located , The video segment where the face image frame of the user is located, or at least one face image frame of the user.
15、 根据权利要求 10-14任一项所述的电子设备, 其特征在于, 每个所述分组包括的一 个用户的至少一张人脸图片为单人照或合影。 15. The electronic device according to any one of claims 10-14, wherein at least one face picture of a user included in each group is a single photo or a group photo.
16、根据权利要求 10-15任一项所述的电子设备,其特征在于,所述获取至少一个视频, 具体包括: 16. The electronic device according to any one of claims 10-15, wherein said acquiring at least one video specifically comprises:
从所述至少一个存储器获取所述至少一个视频。 The at least one video is acquired from the at least one memory.
17、根据权利要求 10-15任一项所述的电子设备,其特征在于,所述获取至少一个视频, 具体包括: 17. The electronic device according to any one of claims 10-15, wherein said acquiring at least one video specifically comprises:
提示用户拍摄包括人脸图像帧的视频; Prompt the user to shoot a video including face image frames;
在检测到用户指示拍摄视频的操作后, 录制并生成至少一个视频。 After detecting the user's instruction to shoot the video, at least one video is recorded and generated.
18、 根据权利要求 10-17任一项所述的电子设备, 其特征在于, 当所述指令被所述至少 一个处理器执行时, 还使得所述电子设备执行以下步骤: 18. The electronic device according to any one of claims 10-17, wherein when the instruction is executed by the at least one processor, the electronic device is further caused to execute the following steps:
获取至少一个图像组, 每个所述图像组中包括同一用户不同形态的多个图像帧; 所述至少一个图像组包括以下任意一项或任意多项的组合: 动图, 预先拍摄的包括同一用户不同形态的人脸的图像组, 在拍摄预览时实时采集的多帧图像形成的图像组, 或在连拍时拍摄到的多帧图像形成的图像组; acquiring at least one image group, where each image group includes multiple image frames of the same user in different forms, and the at least one image group includes any one or a combination of any of the following: a moving picture, a pre-captured image group including faces of the same user in different forms, an image group formed by multiple frames of images collected in real time during shooting preview, or an image group formed by multiple frames of images captured during continuous shooting;
所述从所述至少一个视频中提取多个人脸图像帧, 具体包括: The extracting multiple face image frames from the at least one video specifically includes:
从所述至少一个视频以及所述至少一个图像组中, 提取所述多个人脸图像帧。 Extracting the plurality of face image frames from the at least one video and the at least one image group.
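The pipeline described in claims 10-18 can be summarized as: extract face image frames from videos (and image groups), compute a feature for each face, and cluster the frames so that each group holds one user's faces. Below is a minimal, hedged sketch of that grouping step. The `embed` function and the toy feature vectors are hypothetical stand-ins; a real implementation would use the face-feature extraction model that the patent leaves unspecified.

```python
# Sketch only: greedy similarity clustering of face frames into per-user
# groups. The embeddings here are hand-made stand-ins for real face
# features, not the patent's actual algorithm.
import math

def cosine_similarity(a, b):
    # Cosine similarity between two feature vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def group_face_frames(frames, embed, threshold=0.9):
    """Assign each frame to the first group whose representative
    embedding is similar enough; otherwise start a new group."""
    groups = []  # each entry: (representative_embedding, [frames])
    for frame in frames:
        vec = embed(frame)
        for rep, members in groups:
            if cosine_similarity(vec, rep) >= threshold:
                members.append(frame)
                break
        else:
            groups.append((vec, [frame]))
    return [members for _, members in groups]

# Toy demo: two frames of one user, one frame of another.
fake_embeddings = {
    "video1_frame3": [1.0, 0.0],
    "video1_frame9": [0.98, 0.05],
    "video2_frame1": [0.0, 1.0],
}
groups = group_face_frames(fake_embeddings, fake_embeddings.get)
print(len(groups))  # two users -> two groups
```

In practice, a production system would replace the greedy pass with a proper clustering algorithm (the claims mention a clustering algorithm without naming one) and would attach each group's source video and video segments as claims 14 and 18 describe.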
19. A computer storage medium, comprising computer instructions which, when run on an electronic device, cause the electronic device to perform the picture grouping method according to any one of claims 1-9.
20. A computer program product which, when run on a computer, causes the computer to perform the picture grouping method according to any one of claims 1-9.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910147299.6A CN111625670A (en) | 2019-02-27 | 2019-02-27 | Picture grouping method and device |
CN201910147299.6 | 2019-02-27 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020173379A1 true WO2020173379A1 (en) | 2020-09-03 |
Family
ID=72239103
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2020/076040 WO2020173379A1 (en) | 2019-02-27 | 2020-02-20 | Picture grouping method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN111625670A (en) |
WO (1) | WO2020173379A1 (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112364688B (en) * | 2020-09-30 | 2022-04-08 | 北京奇信智联科技有限公司 | Face clustering method and device, computer equipment and readable storage medium |
CN113542594B (en) * | 2021-06-28 | 2023-11-17 | 惠州Tcl云创科技有限公司 | High-quality image extraction processing method and device based on video and mobile terminal |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130076943A1 (en) * | 2007-08-23 | 2013-03-28 | Samsung Electronics Co., Ltd. | Apparatus and method for image recognition of facial areas in photographic images from a digital camera |
CN103034714A (en) * | 2012-12-11 | 2013-04-10 | 北京百度网讯科技有限公司 | Mobile terminal, picture sorting management method and picture sorting management device of mobile terminal |
CN105608425A (en) * | 2015-12-17 | 2016-05-25 | 小米科技有限责任公司 | Method and device for sorted storage of pictures |
CN105631408A (en) * | 2015-12-21 | 2016-06-01 | 小米科技有限责任公司 | Video-based face album processing method and processing device |
CN105740850A (en) * | 2016-03-04 | 2016-07-06 | 北京小米移动软件有限公司 | Method and device for classifying photos |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104731964A (en) * | 2015-04-07 | 2015-06-24 | 上海海势信息科技有限公司 | Face abstracting method and video abstracting method based on face recognition and devices thereof |
CN105243098B (en) * | 2015-09-16 | 2018-10-26 | 小米科技有限责任公司 | The clustering method and device of facial image |
CN106776662B (en) * | 2015-11-25 | 2020-03-03 | 腾讯科技(深圳)有限公司 | Photo sorting method and device |
CN107977674B (en) * | 2017-11-21 | 2020-02-18 | Oppo广东移动通信有限公司 | Image processing method, image processing device, mobile terminal and computer readable storage medium |
- 2019
- 2019-02-27: CN application CN201910147299.6A, publication CN111625670A, status: active, Pending
- 2020
- 2020-02-20: WO application PCT/CN2020/076040, publication WO2020173379A1, status: active, Application Filing
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112287152A (en) * | 2020-10-26 | 2021-01-29 | 山东晨熙智能科技有限公司 | Photo classification method and system |
CN112395037A (en) * | 2020-12-07 | 2021-02-23 | 深圳云天励飞技术股份有限公司 | Dynamic cover selection method and device, electronic equipment and storage medium |
CN112950601A (en) * | 2021-03-11 | 2021-06-11 | 成都微识医疗设备有限公司 | Method, system and storage medium for screening pictures for esophageal cancer model training |
CN112950601B (en) * | 2021-03-11 | 2024-01-09 | 成都微识医疗设备有限公司 | Picture screening method, system and storage medium for esophageal cancer model training |
CN113111934A (en) * | 2021-04-07 | 2021-07-13 | 杭州海康威视数字技术股份有限公司 | Image grouping method and device, electronic equipment and storage medium |
CN113111934B (en) * | 2021-04-07 | 2023-08-08 | 杭州海康威视数字技术股份有限公司 | Image grouping method and device, electronic equipment and storage medium |
CN113378764A (en) * | 2021-06-25 | 2021-09-10 | 深圳市斯博科技有限公司 | Video face acquisition method, device, equipment and medium based on clustering algorithm |
CN113378764B (en) * | 2021-06-25 | 2022-11-29 | 深圳万兴软件有限公司 | Video face acquisition method, device, equipment and medium based on clustering algorithm |
CN116708751A (en) * | 2022-09-30 | 2023-09-05 | 荣耀终端有限公司 | Method and device for determining photographing duration and electronic equipment |
CN116708751B (en) * | 2022-09-30 | 2024-02-27 | 荣耀终端有限公司 | Method and device for determining photographing duration and electronic equipment |
Also Published As
Publication number | Publication date |
---|---|
CN111625670A (en) | 2020-09-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2021213120A1 (en) | Screen projection method and apparatus, and electronic device | |
WO2020259038A1 (en) | Method and device for capturing images | |
WO2020186969A1 (en) | Multi-path video recording method and device | |
WO2020173379A1 (en) | Picture grouping method and device | |
WO2020078299A1 (en) | Method for processing video file, and electronic device | |
AU2019418925B2 (en) | Photographing method and electronic device | |
WO2020238775A1 (en) | Scene recognition method, scene recognition device, and electronic apparatus | |
CN111132234B (en) | Data transmission method and corresponding terminal | |
US20220180485A1 (en) | Image Processing Method and Electronic Device | |
WO2021057277A1 (en) | Photographing method in dark light and electronic device | |
WO2021068926A1 (en) | Model updating method, working node, and model updating system | |
WO2022022319A1 (en) | Image processing method, electronic device, image processing system and chip system | |
CN112840635A (en) | Intelligent photographing method, system and related device | |
WO2021115483A1 (en) | Image processing method and related apparatus | |
WO2021057626A1 (en) | Image processing method, apparatus, device, and computer storage medium | |
CN112150499B (en) | Image processing method and related device | |
WO2021208677A1 (en) | Eye bag detection method and device | |
WO2023241209A9 (en) | Desktop wallpaper configuration method and apparatus, electronic device and readable storage medium | |
WO2020062304A1 (en) | File transmission method and electronic device | |
WO2022135144A1 (en) | Self-adaptive display method, electronic device, and storage medium | |
WO2024045661A1 (en) | Image processing method and electronic device | |
WO2023005882A1 (en) | Photographing method, photographing parameter training method, electronic device, and storage medium | |
WO2020078267A1 (en) | Method and device for voice data processing in online translation process | |
WO2022214004A1 (en) | Target user determination method, electronic device and computer-readable storage medium | |
WO2022143158A1 (en) | Data backup method, electronic device, data backup system and chip system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 20763392; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 20763392; Country of ref document: EP; Kind code of ref document: A1 |