WO2017092127A1 - Video classification method and apparatus - Google Patents

Video classification method and apparatus

Info

Publication number
WO2017092127A1
Authority
WO
WIPO (PCT)
Prior art keywords
video
face
category
determining
picture
Prior art date
Application number
PCT/CN2015/099610
Other languages
French (fr)
Chinese (zh)
Inventor
陈志军
侯文迪
龙飞
Original Assignee
小米科技有限责任公司
Priority date
Filing date
Publication date
Application filed by 小米科技有限责任公司
Priority to RU2016136707A (RU2667027C2)
Priority to JP2016523976A (JP6423872B2)
Priority to MX2016005882A
Priority to KR1020167010359A (KR101952486B1)
Publication of WO2017092127A1

Classifications

    • G06F16/7837: Information retrieval of video data; retrieval characterised by using metadata automatically derived from the content, using objects detected or recognised in the video content
    • G06F16/784: Information retrieval of video data; retrieval using objects detected or recognised in the video content, the detected or recognised objects being people
    • G06F16/7867: Information retrieval of video data; retrieval characterised by using manually generated metadata, e.g. tags, keywords, comments, title and artist information
    • G06F18/22: Pattern recognition; analysing; matching criteria, e.g. proximity measures
    • G06F18/23: Pattern recognition; analysing; clustering techniques
    • G06F18/24: Pattern recognition; analysing; classification techniques
    • G06V10/462: Extraction of image or video features; salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V20/41: Scenes and scene-specific elements in video content; higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V40/168: Recognition of human faces; feature extraction; face representation
    • G06V40/172: Recognition of human faces; classification, e.g. identification
    • G11B27/10: Editing; indexing; addressing; timing or synchronising; measuring tape travel
    • H04N21/8456: Structuring of content, e.g. decomposing content into time segments, by decomposing the content in the time domain

Abstract

The present disclosure relates to a video classification method and apparatus. The method comprises: acquiring a key frame that contains a human face from a video; acquiring a human face feature from the key frame; acquiring a human face feature corresponding to a picture category; determining, according to the human face feature in the key frame and the human face feature corresponding to the picture category, the picture category to which the video belongs; and assigning the video to the picture category to which the video belongs. This technical solution can intelligently and automatically classify a video into the picture category corresponding to a person appearing in the video, so that manual classification by the user is not needed and the classification accuracy is high.

Description

Video categorization method and apparatus
The present application is based upon, and claims priority to, Chinese Patent Application No. 2015108674365, filed on December 1, 2015, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the field of multimedia clustering technologies, and in particular, to a video categorization method and apparatus.
Background
At present, a user can capture multimedia data such as videos and photos with a shooting device. For photos, face clustering techniques already exist that can group photos in which a given person appears into that person's photo collection. However, there is currently no technique for face-clustering videos together with photos in which the same person appears, so the user can only classify videos manually, which is neither intelligent nor efficient.
Summary
Embodiments of the present disclosure provide a video categorization method and apparatus. The technical solution is as follows.
According to a first aspect of the embodiments of the present disclosure, a video categorization method is provided, including:
acquiring a key frame that contains a human face from a video;
acquiring a face feature from the key frame;
acquiring a face feature corresponding to a picture category;
determining, according to the face feature in the key frame and the face feature corresponding to the picture category, the picture category to which the video belongs; and
assigning the video to the picture category to which the video belongs.
In one embodiment, acquiring a key frame that contains a human face from the video includes:
acquiring, from the video, at least one video frame that contains a human face;
determining a face parameter in each of the at least one video frame, the face parameter including either or both of the number of faces and the face positions; and
determining the key frame in the video according to the face parameter in each video frame.
In one embodiment, determining the key frame in the video according to the face parameter in each video frame includes:
determining, according to the face parameter in each video frame, non-repeated video frames whose face parameters do not recur in other video frames; and
determining at least one of the non-repeated video frames as the key frame.
In one embodiment, determining the key frame in the video according to the face parameter in each video frame includes:
determining, according to the face parameter in each video frame, at least one group of repeated video frames having the same face parameter, where each group of repeated video frames includes at least two video frames, the difference between the capture time of the latest-captured video frame and that of the earliest-captured video frame in each group is less than or equal to a preset duration, and all video frames in each group have the same face parameter; and
determining any one video frame in each group of repeated video frames as the key frame.
In one embodiment, determining the picture category to which the video belongs according to the face feature in the key frame and the face feature corresponding to the picture category includes: when there are at least two videos, determining the face feature in the key frame of each video; performing face clustering on the at least two videos according to the face feature in the key frame of each video to obtain at least one video category; and determining, according to the face feature corresponding to each of the at least one video category and the face feature corresponding to the picture category, the video category and the picture category that correspond to the same face feature.
Assigning the video to the picture category to which the video belongs then includes: assigning the videos in each video category to the picture category corresponding to the same face feature.
In one embodiment, determining the picture category to which the video belongs according to the face feature in the key frame and the face feature corresponding to the picture category includes:
determining, among the face features corresponding to the picture categories, the picture category that matches the face feature in the key frame; and
determining the matched picture category as the picture category to which the video belongs.
In one embodiment, the method further includes:
acquiring the shooting time and the shooting location of the video;
determining a target picture whose shooting time and shooting location are the same as those of the video; and
assigning the video to the picture category to which the target picture belongs.
According to a second aspect of the embodiments of the present disclosure, a video categorization apparatus is provided, including:
a first acquiring module, configured to acquire a key frame that contains a human face from a video;
a second acquiring module, configured to acquire the face feature in the key frame acquired by the first acquiring module;
a third acquiring module, configured to acquire a face feature corresponding to a picture category;
a first determining module, configured to determine, according to the face feature in the key frame acquired by the second acquiring module and the face feature corresponding to the picture category acquired by the third acquiring module, the picture category to which the video belongs; and
a first assigning module, configured to assign the video to the picture category, determined by the first determining module, to which the video belongs.
In one embodiment, the first acquiring module includes:
an acquiring submodule, configured to acquire, from the video, at least one video frame that contains a human face;
a first determining submodule, configured to determine a face parameter in each of the at least one video frame acquired by the acquiring submodule, the face parameter including either or both of the number of faces and the face positions; and
a second determining submodule, configured to determine the key frame in the video according to the face parameter in each video frame.
In one embodiment, the second determining submodule is further configured to determine, according to the face parameter in each video frame, non-repeated video frames whose face parameters do not recur in other video frames, and to determine at least one of the non-repeated video frames as the key frame.
In one embodiment, the second determining submodule is further configured to determine, according to the face parameter in each video frame, at least one group of repeated video frames having the same face parameter, where each group of repeated video frames includes at least two video frames, the difference between the capture time of the latest-captured video frame and that of the earliest-captured video frame in each group is less than or equal to a preset duration, and all video frames in each group have the same face parameter; and to determine any one video frame in each group of repeated video frames as the key frame.
In one embodiment, the first determining module includes:
a third determining submodule, configured to: when there are at least two videos, determine the face feature in the key frame of each video; perform face clustering on the at least two videos according to the face feature in the key frame of each video to obtain at least one video category; and determine, according to the face feature corresponding to each of the at least one video category and the face feature corresponding to the picture category, the video category and the picture category that correspond to the same face feature.
The first assigning module includes:
a first assigning submodule, configured to assign the videos in each video category determined by the third determining submodule to the picture category corresponding to the same face feature.
In one embodiment, the first determining module includes:
a fourth determining submodule, configured to determine, among the face features corresponding to the picture categories, the picture category that matches the face feature in the key frame; and
a second assigning submodule, configured to determine the matched picture category determined by the fourth determining submodule as the picture category to which the video belongs.
In one embodiment, the apparatus further includes:
a fourth acquiring module, configured to acquire the shooting time and the shooting location of the video;
a second determining module, configured to determine a target picture whose shooting time and shooting location are the same as those of the video acquired by the fourth acquiring module; and
a second assigning module, configured to assign the video to the picture category, determined by the second determining module, to which the target picture belongs.
According to a third aspect of the embodiments of the present disclosure, a video classification apparatus is provided, including:
a processor; and
a memory for storing instructions executable by the processor;
wherein the processor is configured to:
acquire a key frame that contains a human face from a video;
acquire the face feature in the key frame;
acquire a face feature corresponding to a picture category;
determine, according to the face feature in the key frame and the face feature corresponding to the picture category, the picture category to which the video belongs; and
assign the video to the picture category to which the video belongs.
The technical solutions provided by the embodiments of the present disclosure may have the following beneficial effects:
the above technical solution can intelligently and automatically classify a video into the picture category corresponding to a person appearing in the video, which requires no manual classification by the user and achieves high classification accuracy.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present disclosure.
Brief Description of the Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure.
FIG. 1 is a flowchart of a video categorization method according to an exemplary embodiment.
FIG. 2 is a flowchart of another video categorization method according to an exemplary embodiment.
FIG. 3 is a flowchart of still another video categorization method according to an exemplary embodiment.
FIG. 4 is a block diagram of a video categorization apparatus according to an exemplary embodiment.
FIG. 5 is a block diagram of another video categorization apparatus according to an exemplary embodiment.
FIG. 6 is a block diagram of still another video categorization apparatus according to an exemplary embodiment.
FIG. 7 is a block diagram of yet another video categorization apparatus according to an exemplary embodiment.
FIG. 8 is a block diagram of yet another video categorization apparatus according to an exemplary embodiment.
FIG. 9 is a block diagram applicable to a network-connected apparatus, according to an exemplary embodiment.
Detailed Description
Exemplary embodiments will be described in detail herein, examples of which are illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present disclosure as detailed in the appended claims.
The embodiments of the present disclosure provide a video categorization technique that can intelligently and automatically classify a video into the picture category corresponding to a person appearing in the video, which requires no manual classification by the user and achieves high classification accuracy.
Before describing the method provided by the embodiments of the present disclosure, the picture category and the way it is generated are explained first. One picture category corresponds to one face: every picture in a picture category contains the same face, so it can also be said that one picture category corresponds to one person. Each picture category therefore contains a group of pictures sharing the same face feature. The embodiments of the present disclosure may use the following face clustering method to generate picture categories, but are not limited to this method.
In face clustering, the first clustering pass is usually initialized with a full clustering over all data, and subsequent clustering is generally performed incrementally. The face clustering method may include the following steps A1-A5; a code sketch follows the steps.
Step A1: Acquire the face feature contained in each of N pictures to obtain N face features, where N is greater than or equal to 2. At the start of clustering, each face is treated as its own class, so initially there are N classes.
Step A2: Among the N classes, compute the distance between every pair of classes; the distance between two classes is the distance between the faces they contain.
Step A3: Set a distance threshold θ in advance. When the distance between two classes is less than θ, the two classes are considered to correspond to the same person, and this round of iteration merges the two classes into one new class.
Step A4: Repeat step A3 iteratively until a round of iteration produces no new class, at which point the iteration terminates.
Step A5: The result is M classes in total, each containing at least one face, where one class represents one person.
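A minimal Python sketch of this threshold-based agglomerative clustering (steps A1-A5) is given below. The face embeddings, the choice of Euclidean distance, and the use of minimum distance as the class-to-class distance are illustrative assumptions; the patent does not fix these details.

```python
import numpy as np

def cluster_faces(features, theta):
    """Agglomerative face clustering (steps A1-A5).

    features: list of 1-D numpy arrays, one face embedding per picture (step A1).
    theta: distance threshold below which two classes are merged (step A3).
    Returns a list of clusters, each a list of picture indices (step A5).
    """
    # Initially every face is its own class.
    clusters = [[i] for i in range(len(features))]

    def class_distance(a, b):
        # Class-to-class distance: here the minimum distance between their faces (an assumption).
        return min(np.linalg.norm(features[i] - features[j]) for i in a for j in b)

    merged = True
    while merged:                      # step A4: iterate until no new class appears
        merged = False
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                if class_distance(clusters[x], clusters[y]) < theta:   # step A3
                    clusters[x] = clusters[x] + clusters[y]
                    del clusters[y]
                    merged = True
                    break
            if merged:
                break
    return clusters                    # M classes, one class per person
```

This brute-force version is quadratic in the number of classes per round; it is meant to mirror the steps, not to be an efficient implementation.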
FIG. 1 is a flowchart of a video categorization method provided by an embodiment of the present disclosure. The method may be executed by an application used to manage multimedia files; in that case, the videos, picture categories, and pictures under those categories involved in the method are the videos, picture categories, and pictures stored on the device on which the application runs. The method may also be executed by an electronic device that stores multimedia files; in that case, the videos, picture categories, and pictures under those categories involved in the method are those stored in the electronic device. The application or electronic device may trigger the method automatically on a periodic basis, upon receiving an instruction from the user, or automatically upon detecting that at least one new video has been produced. The method can be triggered in many other ways as well, not limited to the examples listed here; its ultimate purpose is to classify videos intelligently and to save manual effort. As shown in FIG. 1, the method includes steps S101-S105:
In step S101, a key frame containing a human face is acquired from a video.
In one embodiment, any one or more video frames containing a face may be selected from the video as key frames, or the key frame may be acquired as shown in FIG. 2. As shown in FIG. 2, step S101 may be implemented as the following steps S201-S203:
In step S201, at least one video frame containing a human face is acquired from the video.
In step S202, a face parameter is determined in each of the at least one video frame, the face parameter including either or both of the number of faces and the face positions.
In step S203, the key frame in the video is determined according to the face parameter in each video frame.
Step S203 may be implemented in either or both of the following two manners.
Manner 1: according to the face parameter in each video frame, determine non-repeated video frames whose face parameters do not recur in any other video frame, and determine at least one non-repeated video frame as the key frame.
That is, a non-repeated video frame is a video frame whose face parameter differs from that of every other video frame, meaning its face picture does not recur elsewhere in the video; therefore, one or more non-repeated video frames may be selected arbitrarily as key frames.
Manner 2: according to the face parameter in each video frame, determine at least one group of repeated video frames having the same face parameter, where each group includes at least two video frames, the difference between the capture time of the latest-captured frame and that of the earliest-captured frame in each group is less than or equal to a preset duration, and all frames in a group share the same face parameter; then determine any one video frame in each group of repeated video frames as the key frame.
The preset duration can be set in advance. Since the same picture in a video usually does not last very long, the preset duration should not be too long. Considering that video is played at 24 frames per second, the preset duration can be kept within N/24 seconds, where N is greater than or equal to 1 and less than or equal to 24 (or 36, or another value chosen as needed); the shorter the preset duration, the more accurate the finally selected key frames. In other words, the face picture of every video frame in a group of repeated video frames is the same, i.e., the same face picture appears in multiple video frames. Therefore, any one video frame in each group of repeated video frames can be selected as the key frame, which removes duplicates and improves the efficiency of key frame selection.
Manner 1 and Manner 2 above may be implemented separately or in combination; a combined code sketch is given below.
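The following sketch illustrates one way to combine the two manners: frames whose face parameters recur within the preset duration form a group represented by a single key frame (Manner 2), while frames whose parameters never recur are kept directly (Manner 1). The frame representation, the exact-equality test on face parameters, and the choice of N = 6 at 24 fps are illustrative assumptions, not requirements of the patent.

```python
from dataclasses import dataclass

@dataclass
class Frame:
    timestamp: float          # capture time in seconds
    face_count: int           # number of faces detected in the frame
    face_positions: tuple     # e.g. tuple of (x, y, w, h) boxes

def select_key_frames(frames, preset_duration=6 / 24):
    """Select key frames from frames that contain faces (step S203).

    preset_duration defaults to N/24 seconds with N = 6, an illustrative choice.
    Frames with identical face parameters captured within preset_duration seconds
    of each other form one group of repeated frames; one frame per group is kept.
    A frame whose parameters never recur is a non-repeated frame and is kept too.
    """
    key_frames = []
    group = []                                        # current group of repeated frames
    for frame in sorted(frames, key=lambda f: f.timestamp):
        # Exact equality is used for simplicity; a tolerance on positions could be used instead.
        same_params = group and (
            frame.face_count == group[0].face_count
            and frame.face_positions == group[0].face_positions
        )
        within_window = group and (frame.timestamp - group[0].timestamp) <= preset_duration
        if same_params and within_window:
            group.append(frame)                       # still the same repeated picture
        else:
            if group:
                key_frames.append(group[0])           # any frame of the group will do
            group = [frame]
    if group:
        key_frames.append(group[0])
    return key_frames
```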
In step S102, the face feature in the key frame is acquired.
In step S103, the face feature corresponding to a picture category is acquired.
In step S104, the picture category to which the video belongs is determined according to the face feature in the key frame and the face feature corresponding to the picture category.
In step S105, the video is assigned to the picture category to which the video belongs.
The above method provided by the embodiments of the present disclosure can intelligently and automatically classify videos together with pictures; it requires no manual classification by the user and, because the classification is based on face features, its accuracy is high.
In one embodiment, step S104 may be implemented as steps B1-B2. Step B1: among the face features corresponding to the picture categories, determine the picture category that matches the face feature in the key frame; for example, steps A1-A5 above may be performed so that face clustering determines, from the face feature in the key frame, the picture category to which the key frame belongs, and that picture category is the picture category matching the face feature in the key frame. Step B2: determine the matched picture category determined in step B1 as the picture category to which the video belongs. A sketch of this matching step follows.
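As a rough illustration of steps B1-B2, the sketch below matches a key-frame face feature against per-category reference features using the same distance threshold θ as in the clustering sketch above. Keeping a single representative embedding per picture category is an assumption made for brevity; the patent does not specify how the category's face feature is represented.

```python
import numpy as np

def match_picture_category(key_frame_feature, category_features, theta):
    """Steps B1-B2: find the picture category whose face feature matches the key frame.

    category_features: dict mapping category id -> representative face embedding
                       (assumed here to be one embedding per category).
    Returns the matching category id, or None if no category is within theta.
    """
    best_category, best_distance = None, theta
    for category_id, feature in category_features.items():
        distance = np.linalg.norm(key_frame_feature - feature)
        if distance < best_distance:                 # step B1: this category matches better
            best_category, best_distance = category_id, distance
    return best_category                             # step B2: category the video belongs to
```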
In another embodiment, step S104 may be implemented as steps C1-C3:
Step C1: when there are at least two videos, determine the face feature in the key frame of each video. Step C2: perform face clustering on the at least two videos according to the face feature in the key frame of each video to obtain at least one video category, where one video category corresponds to one face. Specifically, the face clustering method shown in steps A1-A5 above may be applied to the key frames to obtain at least one class; each class is a video category, so each video category corresponds to one face feature, and the video category to which a video's key frame belongs is the video category to which that video belongs. Step C3: according to the face feature corresponding to each of the at least one video category and the face feature corresponding to the picture category, determine the video category and the picture category that correspond to the same face feature. Correspondingly, step S105 above may be implemented as: assigning the videos in each video category to the picture category corresponding to the same face feature. In this manner, the videos are first face-clustered to obtain video categories, then the video categories and the picture categories are matched by face clustering to find the video category and picture category corresponding to the same face, and the videos in each video category are assigned to the picture category corresponding to the same face feature, thereby achieving the categorization of the videos. A sketch of this category-to-category mapping follows.
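A minimal sketch of steps C1-C3 is shown below, reusing the cluster_faces and match_picture_category helpers from the earlier sketches. Using the mean embedding of a video cluster as its representative face feature, and one key-frame embedding per video, are simplifying assumptions made for illustration only.

```python
import numpy as np

def categorize_videos(video_key_frame_features, category_features, theta):
    """Steps C1-C3: cluster videos by key-frame face features, then map each
    video category to the picture category with the same face feature.

    video_key_frame_features: list of 1-D embeddings, one per video (step C1).
    category_features: dict of picture-category id -> representative embedding.
    Returns a dict mapping picture-category id -> list of video indices.
    """
    assignment = {}
    video_clusters = cluster_faces(video_key_frame_features, theta)              # step C2
    for cluster in video_clusters:
        # Representative face feature of the video category (assumed: mean embedding).
        representative = np.mean([video_key_frame_features[i] for i in cluster], axis=0)
        category_id = match_picture_category(representative, category_features, theta)  # step C3
        if category_id is not None:
            assignment.setdefault(category_id, []).extend(cluster)               # step S105
    return assignment
```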
In one embodiment, the above method may also categorize videos in the following way, which requires no face clustering but instead roughly assumes that a video and a picture shot at the same time and place involve the same person and can therefore be put into the same category; this approach has a certain degree of accuracy and is fast. As shown in FIG. 3, the above method may further include steps S301-S303. Step S301: acquire the shooting time and shooting location of the video. Step S302: determine a target picture whose shooting time and shooting location are the same as those of the video. Step S303: assign the video to the picture category to which the target picture belongs. A sketch of this time-and-place matching is given below.
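The sketch below illustrates steps S301-S303 under the assumption that "same time and place" means within a small time window and coordinate radius; the metadata fields and the tolerance values are illustrative choices, not specified by the patent.

```python
from dataclasses import dataclass

@dataclass
class MediaItem:
    timestamp: float        # shooting time, e.g. a Unix timestamp in seconds
    latitude: float         # shooting location
    longitude: float
    category_id: int = -1   # picture category the item belongs to (-1 if none)

def assign_by_time_and_place(video, pictures, time_tol=3600.0, degree_tol=0.001):
    """Steps S301-S303: assign the video to the category of a picture taken at
    (approximately) the same time and place. Tolerances are illustrative."""
    for picture in pictures:
        same_time = abs(video.timestamp - picture.timestamp) <= time_tol
        same_place = (abs(video.latitude - picture.latitude) <= degree_tol
                      and abs(video.longitude - picture.longitude) <= degree_tol)
        if same_time and same_place and picture.category_id != -1:
            return picture.category_id      # step S303: category of the target picture
    return None
```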
A second aspect of the embodiments of the present disclosure provides a video categorization apparatus. The apparatus may be used in an application that manages multimedia files, in which case the videos, picture categories, and pictures under those categories involved in the apparatus are those stored on the device on which the application runs. The apparatus may also be used in an electronic device that stores multimedia files, in which case the videos, picture categories, and pictures under those categories involved in the apparatus are those stored in the electronic device. The application or electronic device may trigger the apparatus to operate automatically on a periodic basis, upon receiving an instruction from the user, or automatically upon detecting that at least one new video has been produced. The apparatus can be triggered in many other ways as well, not limited to the examples listed here; its ultimate purpose is to classify videos intelligently and to save manual effort. As shown in FIG. 4, the apparatus includes:
a first acquiring module 41, configured to acquire a key frame containing a human face from a video;
a second acquiring module 42, configured to acquire the face feature in the key frame acquired by the first acquiring module 41;
a third acquiring module 43, configured to acquire the face feature corresponding to a picture category;
a first determining module 44, configured to determine, according to the face feature in the key frame acquired by the second acquiring module 42 and the face feature corresponding to the picture category acquired by the third acquiring module 43, the picture category to which the video belongs; and
a first assigning module 45, configured to assign the video to the picture category, determined by the first determining module 44, to which the video belongs.
The above apparatus provided by the embodiments of the present disclosure can intelligently and automatically classify videos together with pictures; it requires no manual classification by the user and, because the classification is based on face features, its accuracy is high.
In one embodiment, as shown in FIG. 5, the first acquiring module 41 includes:
an acquiring submodule 51, configured to acquire, from the video, at least one video frame containing a human face;
a first determining submodule 52, configured to determine a face parameter in each of the at least one video frame acquired by the acquiring submodule 51, the face parameter including either or both of the number of faces and the face positions; and
a second determining submodule 53, configured to determine the key frame in the video according to the face parameter in each video frame.
In one embodiment, the second determining submodule 53 is further configured to determine, according to the face parameter in each video frame, non-repeated video frames whose face parameters do not recur in other video frames, and to determine at least one non-repeated video frame as the key frame. That is, a non-repeated video frame is a video frame whose face parameter differs from that of every other video frame, meaning its face picture does not recur elsewhere in the video; therefore, one or more non-repeated video frames may be selected arbitrarily as key frames.
In one embodiment, the second determining submodule 53 is further configured to determine, according to the face parameter in each video frame, at least one group of repeated video frames having the same face parameter, where each group includes at least two video frames, the difference between the capture time of the latest-captured frame and that of the earliest-captured frame in each group is less than or equal to a preset duration, and all frames in a group share the same face parameter; and to determine any one video frame in each group of repeated video frames as the key frame.
The preset duration can be set in advance. Since the same picture in a video usually does not last very long, the preset duration should not be too long. Considering that video is played at 24 frames per second, the preset duration can be kept within N/24 seconds, where N is greater than or equal to 1 and less than or equal to 24 (or 36, or another value chosen as needed); the shorter the preset duration, the more accurate the finally selected key frames. In other words, the face picture of every video frame in a group of repeated video frames is the same, i.e., the same face picture appears in multiple video frames. Therefore, any one video frame in each group of repeated video frames can be selected as the key frame, which removes duplicates and improves the efficiency of key frame selection.
In one embodiment, as shown in FIG. 6, the first determining module 44 includes:
a third determining submodule 61, configured to: when there are at least two videos, determine the face feature in the key frame of each video; perform face clustering on the at least two videos according to the face feature in the key frame of each video to obtain at least one video category, where one video category corresponds to one face (specifically, the face clustering method shown in steps A1-A5 above may be applied to the key frames to obtain at least one class, each class being a video category, so that each video category corresponds to one face feature, and the video category to which a video's key frame belongs is the video category to which the video belongs); and determine, according to the face feature corresponding to each of the at least one video category and the face feature corresponding to the picture category, the video category and the picture category that correspond to the same face feature.
The first assigning module 45 includes:
a first assigning submodule 62, configured to assign the videos in each video category determined by the third determining submodule 61 to the picture category corresponding to the same face feature.
In the above apparatus, the videos are first face-clustered to obtain video categories, then the video categories and the picture categories are matched by face clustering to find the video category and picture category corresponding to the same face, and the videos in each video category are assigned to the picture category corresponding to the same face feature, thereby achieving the categorization of the videos.
In one embodiment, as shown in FIG. 7, the first determining module 44 includes:
a fourth determining submodule 71, configured to determine, among the face features corresponding to the picture categories, the picture category that matches the face feature in the key frame; and
a second assigning submodule 72, configured to determine the matched picture category determined by the fourth determining submodule 71 as the picture category to which the video belongs.
In one embodiment, as shown in FIG. 8, the above apparatus further includes:
a fourth acquiring module 81, configured to acquire the shooting time and shooting location of the video;
a second determining module 82, configured to determine a target picture whose shooting time and shooting location are the same as those of the video acquired by the fourth acquiring module 81; and
a second assigning module 83, configured to assign the video to the picture category to which the target picture determined by the second determining module 82 belongs.
The above apparatus requires no face clustering; instead, it roughly assumes that a video and a picture shot at the same time and place involve the same person and can be put into the same category. This approach has a certain degree of accuracy and is fast.
According to a third aspect of the embodiments of the present disclosure, a video classification apparatus is provided, including:
a processor; and
a memory for storing instructions executable by the processor;
wherein the processor is configured to:
acquire a key frame containing a human face from a video;
acquire the face feature in the key frame;
acquire the face feature corresponding to a picture category;
determine, according to the face feature in the key frame and the face feature corresponding to the picture category, the picture category to which the video belongs; and
assign the video to the picture category to which the video belongs.
FIG. 9 is a block diagram of an apparatus 800 for video categorization according to an exemplary embodiment. For example, the apparatus 800 may be a mobile device such as a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, or a personal digital assistant.
Referring to FIG. 9, the apparatus 800 may include one or more of the following components: a processing component 802, a memory 804, a power component 806, a multimedia component 808, an audio component 810, an input/output (I/O) interface 812, a sensor component 814, and a communication component 816.
The processing component 802 typically controls the overall operation of the apparatus 800, such as operations associated with display, telephone calls, data communication, camera operation, and recording operation. The processing component 802 may include one or more processors 820 to execute instructions to complete all or part of the steps of the above methods. In addition, the processing component 802 may include one or more modules to facilitate interaction between the processing component 802 and the other components. For example, the processing component 802 may include a multimedia module to facilitate interaction between the multimedia component 808 and the processing component 802.
The memory 804 is configured to store various types of data to support operation of the apparatus 800. Examples of such data include instructions for any application or method operating on the apparatus 800, contact data, phone book data, messages, pictures, videos, and the like. The memory 804 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
The power component 806 provides power to the various components of the apparatus 800. The power component 806 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the apparatus 800.
The multimedia component 808 includes a screen providing an output interface between the apparatus 800 and the user. In some embodiments, the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from the user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensors may not only sense the boundary of a touch or swipe action but also detect the duration and pressure associated with the touch or swipe operation. In some embodiments, the multimedia component 808 includes a front camera and/or a rear camera. When the apparatus 800 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera can receive external multimedia data. Each front or rear camera may be a fixed optical lens system or have focal length and optical zoom capability.
The audio component 810 is configured to output and/or input audio signals. For example, the audio component 810 includes a microphone (MIC) configured to receive external audio signals when the apparatus 800 is in an operating mode such as a call mode, a recording mode, or a voice recognition mode. The received audio signal may be further stored in the memory 804 or transmitted via the communication component 816. In some embodiments, the audio component 810 further includes a speaker for outputting audio signals.
The I/O interface 812 provides an interface between the processing component 802 and peripheral interface modules, which may be a keyboard, a click wheel, buttons, and the like. These buttons may include, but are not limited to, a home button, a volume button, a start button, and a lock button.
The sensor component 814 includes one or more sensors for providing status assessments of various aspects of the apparatus 800. For example, the sensor component 814 may detect the on/off state of the apparatus 800 and the relative positioning of components, for example the display and the keypad of the apparatus 800; the sensor component 814 may also detect a change in position of the apparatus 800 or of one of its components, the presence or absence of user contact with the apparatus 800, the orientation or acceleration/deceleration of the apparatus 800, and temperature changes of the apparatus 800. The sensor component 814 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact. The sensor component 814 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor component 814 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 816 is configured to facilitate wired or wireless communication between the apparatus 800 and other devices. The apparatus 800 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In one exemplary embodiment, the communication component 816 receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component 816 further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, Infrared Data Association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 800 may be implemented by one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic elements, for performing the above methods.
In an exemplary embodiment, there is also provided a non-transitory computer-readable storage medium including instructions, such as the memory 804 including instructions, which are executable by the processor 820 of the apparatus 800 to perform the above methods. For example, the non-transitory computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
There is provided a non-transitory computer-readable storage medium; when the instructions in the storage medium are executed by a processor of a mobile terminal, the mobile terminal is enabled to perform a video categorization method, the method including:
acquiring a key frame containing a human face from a video;
acquiring the face feature in the key frame;
acquiring the face feature corresponding to a picture category;
determining, according to the face feature in the key frame and the face feature corresponding to the picture category, the picture category to which the video belongs; and
assigning the video to the picture category to which the video belongs.
Other embodiments of the present disclosure will be readily apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations of the present disclosure that follow its general principles and include common general knowledge or customary technical means in the art not disclosed herein. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise structures described above and illustrated in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (15)

  1. A video categorization method, comprising:
    acquiring a key frame including a face from a video;
    acquiring a face feature in the key frame;
    acquiring a face feature corresponding to a picture category;
    determining, according to the face feature in the key frame and the face feature corresponding to the picture category, a picture category to which the video belongs; and
    assigning the video to the picture category to which the video belongs.
  2. The method according to claim 1, wherein acquiring the key frame including a face from the video comprises:
    acquiring at least one video frame including a face from the video;
    determining a face parameter in each video frame of the at least one video frame, the face parameter comprising either or both of a face number and a face position; and
    determining the key frame in the video according to the face parameter in each video frame.
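For illustration only, a sketch of the frame-level step in claim 2: sample frames from the video, detect faces, and record the face parameters (face count and face positions). OpenCV's bundled Haar cascade is used purely as a stand-in detector; the detector choice, the sampling stride, and all names below are assumptions rather than the claimed implementation.

```python
import cv2

def face_parameters(video_path, sample_stride=30):
    """Yield (frame_index, face_count, face_boxes) for sampled frames that contain faces."""
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    cap = cv2.VideoCapture(video_path)
    index = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % sample_stride == 0:
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
            if len(faces) > 0:
                # Face parameters per the claim: number of faces and their positions.
                yield index, len(faces), [tuple(box) for box in faces]
        index += 1
    cap.release()
```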
  3. The method according to claim 2, wherein determining the key frame in the video according to the face parameter in each video frame comprises:
    determining, according to the face parameter in each video frame, non-repetitive video frames whose face parameters do not recur in other video frames; and
    determining at least one of the non-repetitive video frames as the key frame.
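One possible reading of the claim 3 rule, sketched under the assumption that two frames "repeat" when their face count and face positions are exactly equal; a real system would more likely compare positions with a tolerance. The input format mirrors the hypothetical face_parameters sketch above.

```python
from collections import Counter

def non_repetitive_key_frames(frame_params):
    """frame_params: list of (frame_index, face_count, face_boxes) tuples.
    Returns the frames whose face parameters appear in no other frame."""
    signature = lambda p: (p[1], tuple(sorted(p[2])))  # (face count, sorted face boxes)
    counts = Counter(signature(p) for p in frame_params)
    return [p for p in frame_params if counts[signature(p)] == 1]
```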
  4. The method according to claim 2, wherein determining the key frame in the video according to the face parameter in each video frame comprises:
    determining, according to the face parameter in each video frame, at least one group of repetitive video frames having the same face parameter, wherein each group of repetitive video frames includes at least two video frames, the difference between the capture time of the latest-captured video frame and the capture time of the earliest-captured video frame in each group is less than or equal to a preset duration, and all video frames in each group have the same face parameter; and
    determining any one video frame in each group of repetitive video frames as the key frame.
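A hedged sketch of the claim 4 grouping rule, assuming frames arrive sorted by capture time (in seconds) and that equality of face parameters is exact; the 5-second preset duration is an arbitrary placeholder, not a value taken from the disclosure.

```python
def key_frames_from_repetitive_groups(frames, preset_duration=5.0):
    """frames: list of (capture_time, face_params) tuples sorted by capture_time.
    Returns one key frame per group of repetitive frames (same face params,
    time span <= preset_duration, at least two frames per group)."""
    key_frames, group = [], []
    for frame in frames:
        # Close the current group if the face parameters change or the time span is exceeded.
        if group and (frame[1] != group[0][1] or
                      frame[0] - group[0][0] > preset_duration):
            if len(group) >= 2:                  # qualifies as a repetitive group
                key_frames.append(group[0])      # any member may serve as the key frame
            group = []
        group.append(frame)
    if len(group) >= 2:
        key_frames.append(group[0])
    return key_frames
```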
  5. The method according to claim 1, wherein
    determining, according to the face feature in the key frame and the face feature corresponding to the picture category, the picture category to which the video belongs comprises:
    when the number of videos is at least two, determining the face feature in the key frame of each video;
    performing face clustering on the at least two videos according to the face feature in the key frame of each video, to obtain at least one video category; and
    determining, according to the face feature corresponding to each of the at least one video category and the face feature corresponding to the picture category, a video category and a picture category corresponding to the same face feature; and
    assigning the video to the picture category to which the video belongs comprises:
    assigning the videos in each video category to the picture category corresponding to the same face feature.
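An illustrative sketch of the claim 5 clustering step, assuming each video is summarized by a single key-frame face embedding. DBSCAN with a cosine metric stands in for the unspecified face clustering algorithm; the eps value and all names are assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN

def cluster_videos(video_features):
    """video_features: list of per-video key-frame face embeddings (1-D np.ndarray).
    Returns a mapping: video category label -> list of video indices."""
    labels = DBSCAN(eps=0.4, min_samples=1, metric="cosine").fit_predict(
        np.vstack(video_features))
    clusters = {}
    for video_index, label in enumerate(labels):
        clusters.setdefault(label, []).append(video_index)
    return clusters

def match_clusters_to_categories(clusters, video_features, category_features):
    """Pair each video category with the picture category of the same face feature,
    here approximated by the most cosine-similar category reference feature."""
    def similarity(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))
    mapping = {}
    for label, members in clusters.items():
        centroid = np.mean([video_features[i] for i in members], axis=0)
        mapping[label] = max(category_features,
                             key=lambda c: similarity(centroid, category_features[c]))
    return mapping
```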
  6. The method according to claim 1, wherein determining, according to the face feature in the key frame and the face feature corresponding to the picture category, the picture category to which the video belongs comprises:
    determining, among the face features corresponding to the picture categories, a picture category matching the face feature in the key frame; and
    determining the matched picture category as the picture category to which the video belongs.
  7. The method according to claim 1, further comprising:
    acquiring a shooting time and a shooting location of the video;
    determining a target picture having the same shooting time and shooting location as the video; and
    assigning the video to the picture category to which the target picture belongs.
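A sketch of the claim 7 metadata-based assignment, assuming the shooting time is available as Unix seconds and the shooting location as latitude/longitude; the matching tolerances are arbitrary placeholders, since the claim only requires "the same" time and location, and real EXIF/video metadata parsing is omitted.

```python
def find_matching_picture(video_meta, pictures, time_tolerance=60.0, loc_tolerance=1e-3):
    """video_meta and each picture: dict with 'time' (Unix seconds), 'lat'/'lon'
    (degrees); each picture additionally carries its 'category'.
    Returns the category of the first picture shot at (roughly) the same time and place."""
    for picture in pictures:
        same_time = abs(picture["time"] - video_meta["time"]) <= time_tolerance
        same_place = (abs(picture["lat"] - video_meta["lat"]) <= loc_tolerance and
                      abs(picture["lon"] - video_meta["lon"]) <= loc_tolerance)
        if same_time and same_place:
            return picture["category"]
    return None
```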
  8. A video categorization apparatus, comprising:
    a first acquiring module, configured to acquire a key frame including a face from a video;
    a second acquiring module, configured to acquire a face feature in the key frame acquired by the first acquiring module;
    a third acquiring module, configured to acquire a face feature corresponding to a picture category;
    a first determining module, configured to determine, according to the face feature in the key frame acquired by the second acquiring module and the face feature corresponding to the picture category acquired by the third acquiring module, a picture category to which the video belongs; and
    a first assigning module, configured to assign the video to the picture category, determined by the first determining module, to which the video belongs.
  9. The apparatus according to claim 8, wherein the first acquiring module comprises:
    an acquiring submodule, configured to acquire at least one video frame including a face from the video;
    a first determining submodule, configured to determine a face parameter in each video frame of the at least one video frame acquired by the acquiring submodule, the face parameter comprising either or both of a face number and a face position; and
    a second determining submodule, configured to determine the key frame in the video according to the face parameter in each video frame.
  10. The apparatus according to claim 9, wherein
    the second determining submodule is further configured to determine, according to the face parameter in each video frame, non-repetitive video frames whose face parameters do not recur in other video frames, and to determine at least one of the non-repetitive video frames as the key frame.
  11. The apparatus according to claim 9, wherein
    the second determining submodule is further configured to determine, according to the face parameter in each video frame, at least one group of repetitive video frames having the same face parameter, wherein each group of repetitive video frames includes at least two video frames, the difference between the capture time of the latest-captured video frame and the capture time of the earliest-captured video frame in each group is less than or equal to a preset duration, and all video frames in each group have the same face parameter; and to determine any one video frame in each group of repetitive video frames as the key frame.
  12. The apparatus according to claim 8, wherein
    the first determining module comprises:
    a third determining submodule, configured to: when the number of videos is at least two, determine the face feature in the key frame of each video; perform face clustering on the at least two videos according to the face feature in the key frame of each video, to obtain at least one video category; and determine, according to the face feature corresponding to each of the at least one video category and the face feature corresponding to the picture category, a video category and a picture category corresponding to the same face feature; and
    the first assigning module comprises:
    a first assigning submodule, configured to assign the videos in each video category determined by the third determining submodule to the picture category corresponding to the same face feature.
  13. The apparatus according to claim 8, wherein the first determining module comprises:
    a fourth determining submodule, configured to determine, among the face features corresponding to the picture categories, a picture category matching the face feature in the key frame; and
    a second assigning submodule, configured to determine the matched picture category, determined by the fourth determining submodule, as the picture category to which the video belongs.
  14. The apparatus according to claim 8, further comprising:
    a fourth acquiring module, configured to acquire a shooting time and a shooting location of the video;
    a second determining module, configured to determine a target picture having the same shooting time and shooting location as the video acquired by the fourth acquiring module; and
    a second assigning module, configured to assign the video to the picture category to which the target picture determined by the second determining module belongs.
  15. A video classification apparatus, comprising:
    a processor; and
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to:
    acquire a key frame including a face from a video;
    acquire a face feature in the key frame;
    acquire a face feature corresponding to a picture category;
    determine, according to the face feature in the key frame and the face feature corresponding to the picture category, a picture category to which the video belongs; and
    assign the video to the picture category to which the video belongs.
PCT/CN2015/099610 2015-12-01 2015-12-29 Video classification method and apparatus WO2017092127A1 (en)

Priority Applications (4)

Application Number Priority Date Filing Date Title
RU2016136707A RU2667027C2 (en) 2015-12-01 2015-12-29 Method and device for video categorization
JP2016523976A JP6423872B2 (en) 2015-12-01 2015-12-29 Video classification method and apparatus
MX2016005882A MX2016005882A (en) 2015-12-01 2015-12-29 Video classification method and apparatus.
KR1020167010359A KR101952486B1 (en) 2015-12-01 2015-12-29 Video categorization method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510867436.5 2015-12-01
CN201510867436.5A CN105426515B (en) 2015-12-01 2015-12-01 video classifying method and device

Publications (1)

Publication Number Publication Date
WO2017092127A1 true WO2017092127A1 (en) 2017-06-08

Family

ID=55504727

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/099610 WO2017092127A1 (en) 2015-12-01 2015-12-29 Video classification method and apparatus

Country Status (8)

Country Link
US (1) US10115019B2 (en)
EP (1) EP3176709A1 (en)
JP (1) JP6423872B2 (en)
KR (1) KR101952486B1 (en)
CN (1) CN105426515B (en)
MX (1) MX2016005882A (en)
RU (1) RU2667027C2 (en)
WO (1) WO2017092127A1 (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106227868A (en) * 2016-07-29 2016-12-14 努比亚技术有限公司 The classifying method of video file and device
CN106453916B (en) * 2016-10-31 2019-05-31 努比亚技术有限公司 Object classification device and method
KR20190007816A (en) 2017-07-13 2019-01-23 삼성전자주식회사 Electronic device for classifying video and operating method thereof
CN108830151A (en) * 2018-05-07 2018-11-16 国网浙江省电力有限公司 Mask detection method based on gauss hybrid models
CN108986184B (en) * 2018-07-23 2023-04-18 Oppo广东移动通信有限公司 Video creation method and related device
CN110334753B (en) * 2019-06-26 2023-04-07 Oppo广东移动通信有限公司 Video classification method and device, electronic equipment and storage medium
CN110516624A (en) * 2019-08-29 2019-11-29 北京旷视科技有限公司 Image processing method, device, electronic equipment and storage medium
CN110580508A (en) * 2019-09-06 2019-12-17 捷开通讯(深圳)有限公司 video classification method and device, storage medium and mobile terminal
CN111177086A (en) * 2019-12-27 2020-05-19 Oppo广东移动通信有限公司 File clustering method and device, storage medium and electronic equipment
CN111553191A (en) * 2020-03-30 2020-08-18 深圳壹账通智能科技有限公司 Video classification method and device based on face recognition and storage medium
CN112069875A (en) * 2020-07-17 2020-12-11 北京百度网讯科技有限公司 Face image classification method and device, electronic equipment and storage medium
CN112835807B (en) * 2021-03-02 2022-05-31 网易(杭州)网络有限公司 Interface identification method and device, electronic equipment and storage medium
CN115115822B (en) * 2022-06-30 2023-10-31 小米汽车科技有限公司 Vehicle-end image processing method and device, vehicle, storage medium and chip

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040228504A1 (en) * 2003-05-13 2004-11-18 Viswis, Inc. Method and apparatus for processing image
CN103207870A (en) * 2012-01-17 2013-07-17 华为技术有限公司 Method, server, device and system for photo sort management
CN103530652A (en) * 2013-10-23 2014-01-22 北京中视广信科技有限公司 Face clustering based video categorization method and retrieval method as well as systems thereof
CN103827856A (en) * 2011-09-27 2014-05-28 惠普发展公司,有限责任合伙企业 Retrieving visual media
CN104284240A (en) * 2014-09-17 2015-01-14 小米科技有限责任公司 Video browsing method and device
CN104317932A (en) * 2014-10-31 2015-01-28 小米科技有限责任公司 Photo sharing method and device

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005227957A (en) * 2004-02-12 2005-08-25 Mitsubishi Electric Corp Optimal face image recording device and optimal face image recording method
AR052601A1 (en) 2005-03-10 2007-03-21 Qualcomm Inc CLASSIFICATION OF CONTENTS FOR MULTIMEDIA PROCESSING
JP4616091B2 (en) * 2005-06-30 2011-01-19 株式会社西部技研 Rotary gas adsorption concentrator
US8150155B2 (en) * 2006-02-07 2012-04-03 Qualcomm Incorporated Multi-mode region-of-interest video object segmentation
KR100771244B1 (en) * 2006-06-12 2007-10-29 삼성전자주식회사 Method and apparatus for processing video data
JP4697106B2 (en) * 2006-09-25 2011-06-08 ソニー株式会社 Image processing apparatus and method, and program
JP2008117271A (en) * 2006-11-07 2008-05-22 Olympus Corp Object recognition device of digital image, program and recording medium
US8488901B2 (en) * 2007-09-28 2013-07-16 Sony Corporation Content based adjustment of an image
JP5278425B2 (en) * 2008-03-14 2013-09-04 日本電気株式会社 Video segmentation apparatus, method and program
JP5134591B2 (en) * 2009-06-26 2013-01-30 京セラドキュメントソリューションズ株式会社 Wire locking structure
JP2011100240A (en) * 2009-11-05 2011-05-19 Nippon Telegr & Teleph Corp <Ntt> Representative image extraction method, representative image extraction device, and representative image extraction program
US8452778B1 (en) * 2009-11-19 2013-05-28 Google Inc. Training of adapted classifiers for video categorization
JP2011234180A (en) * 2010-04-28 2011-11-17 Panasonic Corp Imaging apparatus, reproducing device, and reproduction program
US9405771B2 (en) * 2013-03-14 2016-08-02 Microsoft Technology Licensing, Llc Associating metadata with images in a personal image collection
US9471675B2 (en) * 2013-06-19 2016-10-18 Conversant Llc Automatic face discovery and recognition for video content analysis
EP3089102B1 (en) * 2013-12-03 2019-02-20 ML Netherlands C.V. User feedback for real-time checking and improving quality of scanned image
CN104133875B (en) * 2014-07-24 2017-03-22 北京中视广信科技有限公司 Face-based video labeling method and face-based video retrieving method
CN104361128A (en) * 2014-12-05 2015-02-18 河海大学 Data synchronization method of PC (Personnel Computer) end and mobile terminal based on hydraulic polling business

Also Published As

Publication number Publication date
EP3176709A1 (en) 2017-06-07
US10115019B2 (en) 2018-10-30
KR20180081637A (en) 2018-07-17
JP6423872B2 (en) 2018-11-14
US20170154221A1 (en) 2017-06-01
KR101952486B1 (en) 2019-02-26
RU2016136707A3 (en) 2018-03-16
RU2016136707A (en) 2018-03-16
CN105426515A (en) 2016-03-23
MX2016005882A (en) 2017-08-02
RU2667027C2 (en) 2018-09-13
CN105426515B (en) 2018-12-18
JP2018502340A (en) 2018-01-25

Similar Documents

Publication Publication Date Title
WO2017092127A1 (en) Video classification method and apparatus
WO2017031875A1 (en) Method and apparatus for changing emotion icon in chat interface, and terminal device
WO2021031609A1 (en) Living body detection method and device, electronic apparatus and storage medium
WO2016090829A1 (en) Image shooting method and device
WO2016029641A1 (en) Photograph acquisition method and apparatus
WO2016090822A1 (en) Method and device for upgrading firmware
WO2017096782A1 (en) Method of preventing from blocking camera view and device
RU2648625C2 (en) Method and apparatus for determining spatial parameter by using image, and terminal device
WO2015169061A1 (en) Image segmentation method and device
US20170154206A1 (en) Image processing method and apparatus
WO2017084183A1 (en) Information displaying method and device
WO2018120906A1 (en) Buffer state report (bsr) report trigger method, device and user terminal
US10230891B2 (en) Method, device and medium of photography prompts
WO2021036382A9 (en) Image processing method and apparatus, electronic device and storage medium
JP6333990B2 (en) Panorama photo generation method and apparatus
WO2017000491A1 (en) Iris image acquisition method and apparatus, and iris recognition device
WO2018228422A1 (en) Method, device, and system for issuing warning information
CN106534951B (en) Video segmentation method and device
WO2016078394A1 (en) Voice call reminding method and device
WO2016110146A1 (en) Mobile terminal and virtual key processing method
US20170090684A1 (en) Method and apparatus for processing information
WO2017080084A1 (en) Font addition method and apparatus
WO2016173246A1 (en) Telephone call method and device based on name card in cloud
WO2017219497A1 (en) Message generation method and apparatus
WO2017140108A1 (en) Pressure detection method and apparatus

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2016523976

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20167010359

Country of ref document: KR

Kind code of ref document: A

WWE Wipo information: entry into national phase

Ref document number: MX/A/2016/005882

Country of ref document: MX

ENP Entry into the national phase

Ref document number: 2016136707

Country of ref document: RU

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15909641

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15909641

Country of ref document: EP

Kind code of ref document: A1