CN107358141B - Data identification method and device - Google Patents

Data identification method and device

Info

Publication number
CN107358141B
CN107358141B CN201610306321.3A CN201610306321A
Authority
CN
China
Prior art keywords
frame
superframe
sampling
sampling frame
divided
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610306321.3A
Other languages
Chinese (zh)
Other versions
CN107358141A (en)
Inventor
毛锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201610306321.3A priority Critical patent/CN107358141B/en
Publication of CN107358141A publication Critical patent/CN107358141A/en
Application granted granted Critical
Publication of CN107358141B publication Critical patent/CN107358141B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06V20/42 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items of sport video content

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application provides a data identification method and device. The method comprises the following steps: performing superframe segmentation on a live video, wherein a video segment composed of a plurality of temporally adjacent frames that meet a preset similarity condition is segmented into one superframe; each time a superframe is segmented, acquiring key frames of the segmented superframe; and identifying the key frames to determine whether the superframe contains a specific type of content. With this technical scheme, specific types of content in a live video can be monitored in real time.

Description

Data identification method and device
Technical Field
The present application relates to the field of internet technologies, and in particular, to a data identification method and apparatus.
Background
Internet live video, as a new mode of content distribution, improves user experience and efficiency on the one hand, but on the other hand provides a new transmission channel for objectionable content, particularly pornographic content. This not only exposes internet platform merchants and operators to huge compliance risks, but also has a strongly negative impact on society. Because live video generates content in real time, manual review is both extremely costly and inefficient.
Identifying objectionable content in a video generally involves two stages: extracting video frames and judging the extracted frames. For frame extraction, the prior art generally samples frames at equal intervals. The quality of frames obtained in this way is often poor, which lowers the accuracy of the subsequent judgment stage; moreover, the large number of frames to be identified increases both the computational load and the system delay. The prior art also extracts frames through shot boundary detection, i.e., the video is divided into a number of independent shots and frames are taken randomly from each shot.
For the judgment of video frames, existing identification methods based on a skin color model or a sensitive-part model are sensitive to illumination and to differences in skin color across ethnicities, and have a high misjudgment rate for normally exposed skin such as faces and arms; the sensitive-part model is also sensitive to occlusion, and both feature extraction and recognition are slow.
Disclosure of Invention
An object of the present application is to provide a method and an apparatus for data identification, so as to monitor whether a specific type of content exists in a live video in real time.
According to an aspect of the present application, there is provided a method of data identification, wherein the method comprises the steps of:
performing superframe segmentation on a live video to obtain a plurality of superframes, wherein a video segment composed of a plurality of temporally adjacent frames that meet a preset similarity condition is segmented into one superframe; each time a superframe is segmented, acquiring key frames of the segmented superframe; and identifying the key frames of the superframe to determine whether the superframe contains a specific type of content.
Optionally, in the method, the step of performing superframe segmentation on the video in the live broadcast to obtain a plurality of superframes includes: sampling the video in the live broadcast once every a preset frame number interval to obtain a sampling frame; judging whether the obtained sampling frame and a video segment which is not divided into superframes before the sampling frame can be distributed in the same superframe or not; if not, the video segment which is not divided into the super frame before the sampling frame is divided into a super frame.
Optionally, in the method, the step of determining whether the obtained sample frame and the video segment that has not been divided into superframes before the sample frame can be allocated in the same superframe comprises: calculating the frame-to-frame difference between the sampling frame and the last sampling frame according to the color histogram of the sampling frame and the color histogram of the last sampling frame, which are obtained by extracting the color features of the sampling frame and the last sampling frame, wherein the last sampling frame is a sampling frame obtained by sampling the video in the live broadcast before the sampling frame; and determining whether the video segments which are not divided into the super frames before the sampling frame and the sampling frame can be distributed in the same super frame according to the inter-frame difference between the sampling frame and the last sampling frame and the average inter-frame difference of the video segments which are not divided into the super frames before the sampling frame.
Optionally, in the method, the step of determining, according to the inter-frame difference between the sampling frame and the last sampling frame and the average inter-frame difference of the video segment that has not been divided into superframes before the sampling frame, whether the sampling frame and that video segment can be allocated in the same superframe comprises: judging whether the inter-frame difference between the sampling frame and the last sampling frame is smaller than a first predetermined threshold, and whether the difference between that inter-frame difference and the average inter-frame difference of the video segment that has not been divided into superframes before the sampling frame is smaller than a second predetermined threshold; if yes, determining that the sampling frame and the video segment that has not been divided into superframes before the sampling frame can be allocated in the same superframe;
and/or,
judging whether the inter-frame difference between the sampling frame and the last sampling frame is larger than a third preset threshold value or not, and whether the difference value between the inter-frame difference between the sampling frame and the last sampling frame and the average inter-frame difference of the video segments which are not divided into super frames before the sampling frame is smaller than a fourth preset threshold value or not; if so, it is determined that the sample frame and the video segment that has not been previously segmented into superframes by the sample frame can be allocated within the same superframe.
Optionally, in the method, the step of determining whether the sample frame and the video segment that has not been divided into superframes before the sample frame can be allocated in the same superframe further comprises: acquiring motion matching information of the sampling frame and the previous sampling frame; judging whether the motion matching information meets a preset condition or not, wherein the preset condition comprises the following steps: whether the number of the feature points matched with the sampling frame and the last sampling frame is greater than a first preset number, whether the ratio of the number of the feature points matched with the sampling frame and the last sampling frame to the number of the feature points of the last sampling frame is greater than a first preset ratio, and whether the proportion of stationary points in the feature points matched with the sampling frame and the last sampling frame is greater than a first preset ratio; if so, it is determined that the sample frame and the video segment that has not been previously segmented into superframes by the sample frame can be allocated within the same superframe.
Optionally, in the method, the step of obtaining motion matching information of the sampling frame and a previous sampling frame includes: and performing optical flow calculation based on the feature point set of the sampling frame and the feature point set of the previous sampling frame, which are obtained by performing texture feature extraction on the sampling frame and the previous sampling frame, so as to obtain the motion matching information of the sampling frame and the previous sampling frame.
Optionally, in the method, the step of performing superframe segmentation on the video in the live broadcast further includes judging whether the divided superframe needs to be merged with the previous superframe, which comprises: judging whether the length of the divided superframe is less than a predetermined number of frames; if so, judging whether the average inter-frame difference of the superframe is greater than a fifth predetermined threshold and whether the difference between the inter-frame difference of the sampling frame and the average inter-frame difference of the superframe is less than a sixth predetermined threshold, and if so, determining that the divided superframe needs to be merged with the previous superframe; or, judging whether the length of the divided superframe is less than a predetermined number of frames; if so, judging whether the number of feature points matched between the sampling frame and the last sampling frame is greater than a second predetermined number, whether the ratio of the number of matched feature points to the number of feature points of the last sampling frame is greater than a second predetermined ratio, and whether the proportion of stationary points among the matched feature points is greater than a second predetermined proportion, and if so, determining that the divided superframe needs to be merged with the previous superframe;
and if so, merging the divided superframe with the previous superframe.
Optionally, in the method, the step of dividing the video segment, which has not been divided into superframes before the sampling frame, into a superframe includes: calculating the inter-frame difference of every two adjacent frames among the non-sampled frames between the sampling frame and the last sampling frame; and determining the position with the largest inter-frame difference between two adjacent frames as the cut position between the divided superframe and the next superframe.
Optionally, in the method, the step of acquiring the key frame of the divided superframe includes: acquiring a sampling frame in the superframe as a candidate key frame set of the superframe; performing deduplication processing on candidate key frames in the candidate key frame set to obtain a deduplication candidate key frame set of the superframe; and screening the key frames of the super frame from the de-duplication candidate key frame set according to the quality scores of the candidate key frames in the de-duplication candidate key frame set.
Optionally, in the method, the step of performing deduplication processing on the candidate keyframes in the candidate keyframe set to obtain a deduplication candidate keyframe set of the superframe includes: calculating the inter-frame difference of every two candidate key frames in the candidate key frame set, and if the inter-frame difference of the two candidate key frames is smaller than a preset inter-frame difference threshold value, determining the two candidate key frames as similar frames; and for the similar frames in the candidate key frame set, keeping any one frame and deleting the rest similar frames.
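A minimal Python sketch of this de-duplication step, assuming an inter_frame_diff helper such as the histogram-based difference used for superframe segmentation; the function names and the threshold value are illustrative:

```python
def deduplicate_candidates(candidates, inter_frame_diff, diff_threshold=0.1):
    """Keep one frame from each group of mutually similar candidate key frames.

    `candidates` is a list of frames in temporal order; `inter_frame_diff` is
    assumed to return a value in [0, 1] (0 = identical); `diff_threshold` is
    the predetermined inter-frame difference threshold (illustrative value).
    """
    kept = []
    for frame in candidates:
        # A candidate is a duplicate if it is similar to any already-kept frame.
        if all(inter_frame_diff(frame, k) >= diff_threshold for k in kept):
            kept.append(frame)
    return kept
```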
Optionally, in the method, the quality scores of the candidate keyframes in the de-duplication candidate keyframe set are obtained by performing quality judgment on the candidate keyframes in the de-duplication candidate keyframe set based on a pre-established quality judgment model; the step of performing quality judgment on the candidate key frames in the de-duplication candidate key frame set based on a pre-established quality judgment model comprises the following steps: calculating the information entropy of the candidate key frame according to the global color histogram of the candidate key frame in the duplicate removal candidate key frame set; generating image feature vectors of the candidate key frames based on the calculated information entropy of the candidate key frames and the motion features of the candidate key frames; and performing quality judgment on the image feature vector according to a pre-established quality judgment model to obtain the quality scores of the candidate key frames.
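A minimal Python sketch of the entropy computation and feature-vector assembly described here, assuming the global color histogram and the motion features of a candidate key frame are already available; quality_model stands in for the pre-established quality judgment model and is a placeholder:

```python
import numpy as np

def histogram_entropy(global_hist):
    """Information entropy of a global color histogram (any bin layout)."""
    p = np.asarray(global_hist, dtype=np.float64).ravel()
    p = p / max(p.sum(), 1e-12)          # normalize to a probability distribution
    p = p[p > 0]                         # 0 * log(0) is treated as 0
    return float(-(p * np.log2(p)).sum())

def quality_score(global_hist, motion_val, motion_direct_var, quality_model):
    """Build the image feature vector and score it with a pre-trained model.

    `quality_model` is a placeholder for the pre-established quality judgment
    model (any regressor/classifier exposing a predict-style call).
    """
    feature_vec = np.array([histogram_entropy(global_hist),
                            motion_val,
                            motion_direct_var], dtype=np.float32)
    return float(quality_model.predict(feature_vec.reshape(1, -1))[0])
```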
Optionally, in the method, the step of screening out the key frames of the superframe from the de-duplication candidate key frame set according to the quality scores of the candidate key frames in the de-duplication candidate key frame set includes: eliminating candidate key frames whose quality scores are smaller than a predetermined score threshold in the de-duplication candidate key frame set, and determining the retained candidate key frames as the key frames of the superframe; or, sorting the candidate key frames in the de-duplication candidate key frame set in descending order of quality score, and retaining a predetermined number of the top-ranked candidate key frames as the key frames of the superframe.
Optionally, in the method, identifying a key frame of the superframe to determine whether the superframe contains a specific type of content includes:
identifying each key frame of the super frame based on a pre-established image identification model for identifying the specific type of content so as to acquire the probability that each key frame of the super frame contains the specific type of content; and judging whether the superframe contains the content of the specific type or not according to the average value of the probabilities that each key frame of the superframe contains the content of the specific type and a preset probability threshold.
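A minimal Python sketch of this decision rule, assuming a model callable that returns the per-key-frame probability of containing the specific type of content; the threshold value is illustrative:

```python
def superframe_contains_content(key_frames, model, prob_threshold=0.5):
    """Flag a superframe when the mean per-key-frame probability exceeds a threshold.

    `model(frame)` is assumed to return the probability that a single key frame
    contains the specific type of content; `prob_threshold` is illustrative.
    """
    probs = [model(frame) for frame in key_frames]
    return sum(probs) / len(probs) >= prob_threshold
```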
Optionally, in the method, the image recognition model for recognizing the specific type of content is an image recognition model based on a deep convolutional neural network.
According to another aspect of the present application, there is also provided an apparatus for data identification, wherein the apparatus includes:
the video segment dividing unit is used for dividing the video in the live broadcast into a plurality of superframes, wherein the video segment composed of a plurality of frames which are adjacent in time and meet the preset similar condition is divided into one superframe; a key frame acquisition unit configured to acquire a key frame of a superframe divided when divided into the superframe; a content identification unit, configured to identify a key frame of the superframe to determine whether the superframe contains a specific type of content.
Optionally, in the apparatus, the superframe splitting unit includes:
the sampling unit is used for sampling the video in the live broadcast once at intervals of a predetermined number of frames to obtain a sampling frame; a judging unit, configured to judge whether the obtained sampling frame and a video segment that has not been divided into superframes before the sampling frame can be allocated in the same superframe; and a dividing unit, configured to divide the video segment that has not been divided into superframes before the sampling frame into one superframe if they cannot.
Optionally, in the apparatus, the determining unit includes: an inter-frame difference calculation unit, configured to calculate an inter-frame difference between the sampling frame and a previous sampling frame according to a color histogram of the sampling frame and a color histogram of the previous sampling frame, where the color histogram of the sampling frame is obtained by performing color feature extraction on the sampling frame and the previous sampling frame, and the previous sampling frame is a sampling frame that is sampled before the sampling frame and is spaced from the sampling frame by a predetermined number of frames; a first determining unit, configured to determine whether the video segment that has not been divided into superframes before the sampling frame and the sampling frame can be allocated in the same superframe according to an interframe difference between the sampling frame and a previous sampling frame and an average interframe difference of the video segment that has not been divided into superframes before the sampling frame.
Optionally, in the apparatus, the first determining unit is further configured to: judging whether the inter-frame difference between the sampling frame and the last sampling frame is smaller than a first preset threshold value or not, and whether the difference value between the inter-frame difference between the sampling frame and the last sampling frame and the average inter-frame difference of the video segments which are not divided into super frames before the sampling frame is smaller than a second preset threshold value or not; if yes, determining that the sampling frame and a video segment which is not divided into superframes before the sampling frame can be distributed in the same superframe; and/or judging whether the inter-frame difference between the sampling frame and the last sampling frame is greater than a third preset threshold value and whether the difference value between the inter-frame difference between the sampling frame and the last sampling frame and the average inter-frame difference of the video segments which are not divided into superframes before the sampling frame is less than a fourth preset threshold value; if so, it is determined that the sample frame and the video segment that has not been previously segmented into superframes by the sample frame can be allocated within the same superframe.
Optionally, in the apparatus, the determining unit further includes: the information acquisition unit is used for acquiring the motion matching information of the sampling frame and the previous sampling frame; a condition judgment unit configured to judge whether the motion matching information meets a predetermined condition, where the predetermined condition includes: whether the number of the feature points matched with the sampling frame and the last sampling frame is greater than a first preset number, whether the ratio of the number of the feature points matched with the sampling frame and the last sampling frame to the number of the feature points of the last sampling frame is greater than a first preset ratio, and whether the proportion of stationary points in the feature points matched with the sampling frame and the last sampling frame is greater than a first preset ratio; and a second determining unit, configured to determine that the sample frame and the video segment that has not been divided into superframes before the sample frame can be allocated in the same superframe if yes.
Optionally, in the apparatus, the information obtaining unit is further configured to:
and performing optical flow calculation based on the feature point set of the sampling frame and the feature point set of the previous sampling frame, which are obtained by performing texture feature extraction on the sampling frame and the previous sampling frame, so as to obtain the motion matching information of the sampling frame and the previous sampling frame.
Optionally, in the apparatus, the determining unit further includes: a superframe judging unit, configured to judge whether the length of a divided superframe is less than a predetermined number of frames; if so, to judge whether the average inter-frame difference of the superframe is greater than a fifth predetermined threshold and whether the difference between the inter-frame difference of the sampling frame and the average inter-frame difference of the superframe is less than a sixth predetermined threshold; and if so, to determine that the divided superframe needs to be merged with the previous superframe;
or, configured to judge whether the length of the divided superframe is less than a predetermined number of frames; if so, to judge whether the number of feature points matched between the sampling frame and the last sampling frame is greater than a second predetermined number, whether the ratio of the number of matched feature points to the number of feature points of the last sampling frame is greater than a second predetermined ratio, and whether the proportion of stationary points among the matched feature points is greater than a second predetermined proportion; and if so, to determine that the divided superframe needs to be merged with the previous superframe; and a superframe merging unit, configured to merge the divided superframe with the previous superframe if the superframe judging unit judges that the merge is needed.
Optionally, in the apparatus, the dividing unit includes: a calculating unit, configured to calculate the inter-frame difference of every two adjacent frames between the sampling frame and the previous sampling frame; and a cut position determining unit, configured to determine the position where the inter-frame difference between two adjacent frames is the largest as the cut position between the divided superframe and the next superframe.
Optionally, in the apparatus, the key frame acquiring unit includes: a candidate key frame set acquiring unit, configured to acquire a sample frame in the super frame as a candidate key frame set of the super frame; a duplicate removal processing unit, configured to perform duplicate removal processing on the candidate key frames in the candidate key frame set to obtain a duplicate removal candidate key frame set of the super frame; and the key frame screening unit is used for screening the key frames of the super frame from the de-duplication candidate key frame set according to the quality scores of the candidate key frames in the de-duplication candidate key frame set.
Optionally, in the apparatus, the deduplication processing unit includes: a similar frame determining unit, configured to calculate an inter-frame difference between every two candidate key frames in the candidate key frame set, and determine that the two candidate key frames are similar frames if the inter-frame difference between the two candidate key frames is smaller than a predetermined inter-frame difference threshold; and the similar frame duplicate removal unit is used for keeping any one of the similar frames in the candidate key frame set and deleting the rest similar frames.
Optionally, in the apparatus, the quality scores of the candidate keyframes in the de-duplication candidate keyframe set are obtained by performing quality judgment on the candidate keyframes in the de-duplication candidate keyframe set based on a pre-established quality judgment model; the device further comprises: the quality judgment unit is used for carrying out quality judgment on the candidate key frames in the duplicate removal candidate key frame set based on a pre-established quality judgment model; the quality determination unit includes: the information entropy calculation unit is used for calculating the information entropy of the candidate key frames according to the global color histogram of the candidate key frames in the duplicate removal candidate key frame set; the image feature vector acquisition unit is used for generating an image feature vector of the candidate key frame based on the calculated information entropy of the candidate key frame and the motion feature of the candidate key frame; and the quality score acquisition unit is used for carrying out quality judgment on the image feature vector according to a pre-established quality judgment model so as to acquire the quality score of the candidate key frame.
Optionally, in the apparatus, the key frame filtering unit is further configured to: eliminate candidate key frames whose quality scores are smaller than a predetermined score threshold in the de-duplication candidate key frame set, and determine the retained candidate key frames as the key frames of the superframe; or, sort the candidate key frames in the de-duplication candidate key frame set in descending order of quality score, and retain a predetermined number of the top-ranked candidate key frames as the key frames of the superframe.
Optionally, in the apparatus, the content identifying unit includes: a probability obtaining unit, configured to identify, based on a pre-established image identification model for identifying the specific type of content, each key frame of the super frame to obtain a probability that each key frame of the super frame includes the specific type of content; and the content judging unit is used for judging whether the superframe contains the content of the specific type according to the average value of the probabilities that each key frame of the superframe contains the content of the specific type and a preset probability threshold.
Optionally, in the apparatus, the image recognition model for recognizing the specific type of content is an image recognition model based on a deep convolutional neural network.
Compared with the prior art, the embodiment of the application has the following advantages:
the video live broadcasting method based on the super-frame concept has the advantages that the super-frame segmentation is carried out on continuously generated videos in the live broadcasting process of the videos, and the identification of specific types of contents is carried out by taking the super-frame as a unit, so that a super-frame video section containing the specific types of contents is identified in the live broadcasting process, and the real-time monitoring on the specific types of contents of the videos in the live broadcasting process is realized. According to the method and the device, based on the calculation of the entropy of the image information and the extraction of the motion characteristics, the quality judgment is carried out on each candidate key frame (sampling frame), so that the key frame of the super frame is screened out, and the high-quality key frame can be obtained. According to the method and the device, the specific type of content is identified by using the image identification model based on the deep convolutional neural network, so that the identification accuracy is improved.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a flow chart of a method provided by one embodiment of the present application;
fig. 2 is a flowchart of an embodiment of a step (S110) of performing superframe segmentation on a video in live broadcast according to an embodiment of the present disclosure;
FIG. 3 is a flowchart of one embodiment of the step (S112) of determining whether the sample frame and a video segment that has not been previously segmented into superframes can be allocated in the same superframe according to the present application;
fig. 4 is a flowchart of another implementation of the step (S112) of determining whether the sample frame and the video segment that has not been divided into superframes before the sample frame can be allocated in the same superframe according to the present embodiment;
fig. 5 is a specific flowchart of the step (S113) of dividing the video segment which has not been divided into superframes according to the embodiment of the present application;
fig. 6 is a flowchart of another implementation of the step (S110) of performing superframe segmentation on a video in live broadcast according to an embodiment of the present application;
fig. 7 is a detailed flowchart of the step (S120) of acquiring the key frame of the superframe divided according to the embodiment of the present application;
FIG. 8 is a flowchart illustrating the step (S122) of performing deduplication processing on candidate keyframes in the candidate keyframe set according to an embodiment of the present disclosure;
FIG. 9 is a flowchart illustrating a specific process of determining the quality of candidate keyframes in the de-duplication candidate keyframe set based on a pre-established quality determination model according to an embodiment of the present application;
fig. 10 is a flowchart illustrating the step of identifying key frames of the superframe (S130) according to an embodiment of the present disclosure;
FIG. 11 is a schematic illustration of an apparatus provided by an embodiment of the present application;
fig. 12 is a schematic diagram illustrating an implementation manner of a superframe splitting unit 210 in an apparatus according to an embodiment of the present disclosure;
fig. 13 is a schematic diagram illustrating an implementation manner of the determining unit 212 in the apparatus according to the embodiment of the present disclosure;
fig. 14 is a schematic diagram of another implementation manner of the determining unit 212 in the apparatus provided in the embodiment of the present application;
fig. 15 is a schematic diagram illustrating another implementation manner of the superframe segmentation unit 210 in the apparatus according to the embodiment of the present application;
fig. 16 is a schematic diagram of a segmentation unit 213 in the apparatus provided in the embodiment of the present application;
fig. 17 is a schematic diagram of a key frame obtaining unit 220 in the apparatus according to the embodiment of the present application;
fig. 18 is a schematic diagram of a deduplication processing unit 222 in an apparatus provided in an embodiment of the present application;
fig. 19 is a schematic diagram of a content identification unit 230 in the apparatus according to the embodiment of the present application.
The same or similar reference numbers in the drawings identify the same or similar elements.
Detailed Description
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel, concurrently, or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
The term "computer device" or "computer" in this context refers to an intelligent electronic device that can execute predetermined processes such as numerical calculation and/or logic calculation by running predetermined programs or instructions, and may include a processor and a memory, wherein the processor executes a pre-stored instruction stored in the memory to execute the predetermined processes, or the predetermined processes are executed by hardware such as ASIC, FPGA, DSP, or a combination thereof. Computer devices include, but are not limited to, servers, personal computers, laptops, tablets, smart phones, and the like.
The computer equipment comprises user equipment and network equipment. Wherein the user equipment includes but is not limited to computers, smart phones, PDAs, etc.; the network device includes, but is not limited to, a single network server, a server group consisting of a plurality of network servers, or a Cloud Computing (Cloud Computing) based Cloud consisting of a large number of computers or network servers, wherein Cloud Computing is one of distributed Computing, a super virtual computer consisting of a collection of loosely coupled computers. The computer equipment can be independently operated to realize the application, and can also be accessed into a network to realize the application through the interactive operation with other computer equipment in the network. The network in which the computer device is located includes, but is not limited to, the internet, a wide area network, a metropolitan area network, a local area network, a VPN network, and the like.
It should be noted that the user equipment, the network device, the network, etc. are only examples, and other existing or future computer devices or networks may also be included in the scope of the present application, if applicable, and are included by reference.
The methods discussed below, some of which are illustrated by flow diagrams, may be implemented by hardware, software, firmware, middleware, microcode, hardware description languages, or any combination thereof. When implemented in software, firmware, middleware or microcode, the program code or code segments to perform the necessary tasks may be stored in a machine or computer readable medium such as a storage medium. The processor(s) may perform the necessary tasks.
Specific structural and functional details disclosed herein are merely representative and are provided for purposes of describing example embodiments of the present application. This application may, however, be embodied in many alternate forms and should not be construed as limited to only the embodiments set forth herein.
It will be understood that, although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish one element from another. For example, a first element may be termed a second element, and, similarly, a second element may be termed a first element, without departing from the scope of example embodiments. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used herein, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be noted that, in some alternative implementations, the functions/acts noted may occur out of the order noted in the figures. For example, two figures shown in succession may, in fact, be executed substantially concurrently, or the figures may sometimes be executed in the reverse order, depending upon the functionality/acts involved.
Known techniques on which this application is based and which help in understanding it:
live video: and synchronously making and releasing information along with the occurrence and development processes of events on site, namely, shooting contents which are generated and played in real time. Such as live (television or web) programming.
Video shot: a video segment captured continuously by a camera in a single take; a movie, for example, is typically composed of many shots.
Video key frame: a shot is usually shot in a scene, so there is considerable repeated information in each frame image under a shot, and therefore, frames capable of describing the main content of the shot are usually selected as key frames to express the shot compactly. A shot may have one or more key frames depending on the complexity of the shot content.
Image recognition based on a deep convolutional neural network: a multi-layer neural network specially designed for processing two-dimensional shapes. It adds a large number of convolution filters (local receptive fields), learns features autonomously from large amounts of labeled data while generating the classifier, and the resulting model can cope with problems such as varied ambient lighting and differences in appearance across ethnicities.
Superframe: a video segment composed of a plurality of time-adjacent frames with certain similarity (reaching a preset similarity) in the video forms a superframe.
The present application is described in further detail below with reference to the attached figures.
Fig. 1 is a flowchart of a data identification method according to an embodiment of the present application. The method 1 according to the present application comprises at least a step S110, a step S120 and a step S130. The method and the device can be applied to a video live broadcast system so as to identify the specific type of content of the live broadcast video.
In step S110, a video in live is superframe-divided to obtain a plurality of superframes.
A video segment composed of a plurality of temporally adjacent frames that meet a preset similarity condition is divided into one superframe, i.e., a superframe is a video segment consisting of a series of consecutive frames with a certain similarity. For example, a person first sits in front of the camera (a number of consecutive frames, each similar to its neighbors), then stands up, and then walks out of the shot: this sequence consists of 3 superframes. Whether two adjacent frames (or two adjacent sampling frames) have the required similarity, i.e., can be allocated to the same superframe video segment, can be determined, for example, by extracting color features from the frames of the live video and calculating the inter-frame difference between them. Superframe segmentation allocates a number of consecutive frames with a certain similarity (satisfying the predetermined similarity condition) to one video segment as one superframe, so that specific types of content can then be determined in units of superframes. In effect, superframe segmentation determines the boundary between the current superframe and the next superframe.
Referring to fig. 2, in one embodiment, step S110 specifically includes step S111, step S112, and step S113.
And step S111, sampling the video in the live broadcast once at intervals of preset frame numbers to obtain a sampling frame.
Since live video is streaming content, it is processed in a streaming fashion, i.e., the video data is processed as it is generated in real time. To improve identification speed, the system does not process every frame as it flows in; instead, during the live broadcast the video is sampled once every predetermined number of frames (i.e., a frame currently flowing into the live video system is extracted at each interval), and superframe segmentation is attempted based on the obtained sampling frames. For example, the live video generated in real time (streaming into the live video system) is sampled once every 10 frames. Sampling during superframe segmentation reduces the amount of computation and improves segmentation efficiency.
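As an illustration, a minimal Python sketch of such a streaming sampling loop, assuming OpenCV reads the live stream and that a hypothetical try_segment callback performs the superframe-segmentation attempt on each sampled frame (the names and the 10-frame interval are illustrative):

```python
import cv2

SAMPLE_INTERVAL = 10          # sample once every 10 frames, as in the example above

def sample_live_stream(stream_url, try_segment):
    """Pull frames from a live stream and hand every Nth frame to the segmenter."""
    cap = cv2.VideoCapture(stream_url)
    frame_index = 0
    while True:
        ok, frame = cap.read()
        if not ok:               # stream ended or dropped
            break
        if frame_index % SAMPLE_INTERVAL == 0:
            try_segment(frame)   # superframe-segmentation attempt on the sampled frame
        frame_index += 1
    cap.release()
```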
In step S112, it is determined whether the obtained sampling frame and the video segment that has not been divided into superframes before the sampling frame can be allocated in the same superframe.
The video segment that has not been divided into superframes before the sampling frame is the segment composed of the frames after the last (already) divided superframe and before the sampling frame. Superframe segmentation in effect determines the boundary between the current superframe and the next superframe; judging whether the sampling frame and the video segment that has not yet been divided into superframes can be allocated in the same superframe is equivalent to judging whether the sampling frame is a superframe boundary, i.e., whether the preceding not-yet-divided video segment can form a new superframe (the superframe following the last divided one).
In a specific embodiment, based on picture color characteristics of the sampling frame and a last sampling frame (wherein the last sampling frame belongs to a video segment which is not divided into superframes before the sampling frame), similarity between the sampling frame and the last sampling frame is compared, so as to judge whether the sampling frame and the video segment which is not divided into superframes before can be allocated in the same superframe.
Referring to fig. 3, step S112 specifically includes step S1121 and step S1122.
Step S1121, calculating the inter-frame difference between the sampling frame and the previous sampling frame according to the color histogram of the sampling frame and the color histogram of the previous sampling frame, which are obtained by performing color feature extraction on the sampling frame and the previous sampling frame.
In a specific embodiment, a color histogram of each frame of image is calculated by adopting a color feature extraction method of block extraction. And when the color histogram of the sampling frame is calculated, dividing the sampling frame into N multiplied by N blocks, calculating the color histogram of each block respectively, and calculating the inter-frame difference between the sampling frame and the previous sampling frame according to the histogram of each corresponding block of the previous sampling frame obtained when the previous sampling frame is processed. Since the picture of the live video may be unstable, the calculation may be performed by overlapping blocks. For example, the sample frame is equally divided into 3 × 3 blocks, and each block is calculated by taking 1/3 of its own block and its neighboring blocks. And when calculating the color histogram of each block, converting the sampling frame from the RGB color space to the HSV color space, and calculating the color histogram of the HSV color space of each block after quantizing the HSV color space.
For example, H, S, V three components are quantized to 8, 3, 2, respectively, i.e., hue H space is divided into 8, saturation S space is divided into 3, and brightness V space is divided into 2. Therefore, the color space is divided into 8 × 3 × 2 — 48 color bins (bins) by quantization, and a color histogram is obtained by calculating the number of pixels whose colors fall within each color bin.
Wherein the color histogram of each block is Hist[i] = {(h_hist[hi], s_hist[si], v_hist[vi])}, with 0 ≤ i < 9, 0 ≤ hi < 8, 0 ≤ si < 3, 0 ≤ vi < 2;
where h_hist[hi] = Σ f(IH(x) = hi), i.e., the normalized count of pixels x in block i whose quantized H component equals hi, with IH(x) denoting the H component value of pixel x and f a normalization function; similarly, s_hist[si] = Σ f(IS(x) = si) and v_hist[vi] = Σ f(IV(x) = vi).
It should be noted that the color histogram of each block of the previous sampled frame is obtained by using the above-described embodiment when the previous sampled frame is processed.
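A minimal Python sketch of this block-wise HSV histogram extraction, assuming OpenCV-style BGR input frames; the grid size, the handling of the 1/3 overlap into neighbouring blocks, and the function name are illustrative assumptions:

```python
import cv2
import numpy as np

def block_color_histograms(frame_bgr, grid=3, overlap=1.0 / 3.0):
    """48-bin (8x3x2) HSV histogram for each of grid x grid overlapping blocks."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    h, w = hsv.shape[:2]
    bh, bw = h // grid, w // grid
    pad_h, pad_w = int(bh * overlap), int(bw * overlap)   # extend into neighbours
    hists = []
    for r in range(grid):
        for c in range(grid):
            y0, y1 = max(r * bh - pad_h, 0), min((r + 1) * bh + pad_h, h)
            x0, x1 = max(c * bw - pad_w, 0), min((c + 1) * bw + pad_w, w)
            block = hsv[y0:y1, x0:x1]
            # H quantized into 8 bins, S into 3, V into 2 (8 * 3 * 2 = 48 bins);
            # OpenCV stores 8-bit hue in [0, 180).
            hist = cv2.calcHist([block], [0, 1, 2], None, [8, 3, 2],
                                [0, 180, 0, 256, 0, 256])
            hists.append(cv2.normalize(hist, None).flatten())
    return hists   # list of 9 histograms, each with 48 bins
```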
And calculating the interframe difference of the corresponding blocks of the sampling frame and the previous sampling frame according to the obtained color histograms of the blocks of the sampling frame and the previous sampling frame, and determining the interframe difference of the sampling frame and the previous sampling frame according to the calculated interframe difference of the corresponding blocks.
For example, the Bhattacharyya distance between histograms is used to calculate the inter-frame difference of the corresponding blocks of the two frames:
Dist[i] = 1 - Σ sqrt(Hist1[j] * Hist2[j]), 0 ≤ i < 9, 0 ≤ j < 48 (1)
wherein i is the index of the block of the sampling frame, the color space having been divided into 8 × 3 × 2 = 48 color bins by the quantization described above, and j is the index of the color bin in the histogram; Hist1[j] and Hist2[j] represent the values of color bin j in the color histograms of block i of the (current) sampling frame and of the last sampling frame, respectively. The calculated Bhattacharyya distance lies in the range 0 to 1, and results closer to 0 indicate greater similarity. The Bhattacharyya distances of the corresponding blocks of the sampling frame and the last sampling frame are calculated separately, and the average of the three largest values is taken as the inter-frame difference between the sampling frame and the last sampling frame. It should be noted that the inter-frame difference calculation is not limited to the Bhattacharyya distance; other measures such as the Euclidean distance may also be used.
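A minimal sketch of the corresponding inter-frame difference computation, reusing the per-block histograms from the previous sketch; averaging the three largest block distances follows the description above, and OpenCV's built-in Bhattacharyya comparison is assumed as the distance measure:

```python
import cv2

def inter_frame_diff(hists_cur, hists_prev):
    """Inter-frame difference between two frames from their per-block histograms.

    Computes the Bhattacharyya distance per corresponding block and averages
    the three largest block distances, as described above.
    """
    dists = [cv2.compareHist(h1, h2, cv2.HISTCMP_BHATTACHARYYA)
             for h1, h2 in zip(hists_cur, hists_prev)]
    top3 = sorted(dists, reverse=True)[:3]
    return sum(top3) / len(top3)
```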
Step S1122 determines whether the video segments of the sample frame and the sample frame that have not been divided into superframes can be allocated in the same superframe according to the interframe difference between the sample frame and the previous sample frame and the average interframe difference of the video segments of the sample frame that have not been divided into superframes.
Calculating the inter-frame difference between the sampling frame and the last sampling frame in effect compares their similarity. Superframe video segments can be divided into still superframe segments and motion superframe segments. In a still superframe segment, the two images at consecutive sampling intervals are relatively still; the camera may shake slightly or objects in the video may move slowly, for example a person sitting in front of the camera. In a motion superframe segment, a certain proportion or more of the content differs between the two images at consecutive sampling intervals, for example a person sitting in front of the camera stands up. Determining from the inter-frame difference whether the sampling frame and the video segment that has not yet been divided into superframes can be allocated in the same superframe comprises:
(1) judging whether the inter-frame difference between the sampling frame and the last sampling frame is smaller than a first preset threshold value or not, and whether the difference value between the inter-frame difference between the sampling frame and the last sampling frame and the average inter-frame difference of the video segments which are not divided into super frames before the sampling frame is smaller than a second preset threshold value or not; if so, it is determined that the sample frame and the video segment that has not been previously segmented into superframes by the sample frame can be allocated within the same superframe.
This specifically judges whether the currently sampled frame and the video segment that has not yet been divided into superframes before it can be allocated in the same still superframe video segment. For example, if the inter-frame difference between the current sampling frame and the last sampling frame is less than or equal to 0.3, and the difference between that value and the average inter-frame difference of the video segment not yet divided into superframes is less than 0.05, it can be determined that the sampling frame and that video segment can be allocated in the same still superframe video segment.
and/or,
(2) judging whether the inter-frame difference between the sampling frame and the last sampling frame is larger than a third preset threshold value or not, and whether the difference value between the inter-frame difference between the sampling frame and the last sampling frame and the average inter-frame difference of the video segments which are not divided into super frames before the sampling frame is smaller than a fourth preset threshold value or not; if so, it is determined that the sample frame and the video segment that has not been previously segmented into superframes by the sample frame can be allocated within the same superframe.
This specifically judges whether the currently sampled frame and the video segment that has not yet been divided into superframes before it can be allocated in the same motion superframe video segment. For example, if the inter-frame difference between the sampling frame and the last sampling frame is greater than 0.3 (the strong-motion case), and the difference between that value and the average inter-frame difference of the video segment not yet divided into superframes is less than 0.17, it can be determined that the sampling frame and that video segment can be allocated in the same motion superframe video segment.
It should be noted that the first predetermined threshold is smaller than the third predetermined threshold, the second predetermined threshold is smaller than the fourth predetermined threshold, and manner (2) may be performed when the result of manner (1) is negative. That is, it is first determined whether the sampling frame and the preceding not-yet-divided video segment can be allocated in the same still superframe video segment; if not, it is determined whether they can be allocated in the same motion superframe video segment; and if that also fails, they cannot be allocated in the same superframe.
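A minimal sketch of this two-way decision, assuming the illustrative threshold values from the examples above and an absolute difference against the segment's average inter-frame difference:

```python
# Illustrative thresholds taken from the examples above; real values are tunable.
T1_STILL_DIFF   = 0.3    # first predetermined threshold
T2_STILL_DELTA  = 0.05   # second predetermined threshold
T3_MOTION_DIFF  = 0.3    # third predetermined threshold
T4_MOTION_DELTA = 0.17   # fourth predetermined threshold

def can_join_same_superframe(diff_to_prev, avg_diff_of_segment):
    """Decide whether the sampled frame can stay in the not-yet-segmented segment.

    `diff_to_prev` is the inter-frame difference to the previous sampled frame;
    `avg_diff_of_segment` is the average inter-frame difference of the segment
    that has not yet been divided into a superframe.
    """
    delta = abs(diff_to_prev - avg_diff_of_segment)
    # (1) still-superframe case: small difference that stays close to the segment average.
    if diff_to_prev <= T1_STILL_DIFF and delta < T2_STILL_DELTA:
        return True
    # (2) motion-superframe case: large but steady difference.
    if diff_to_prev > T3_MOTION_DIFF and delta < T4_MOTION_DELTA:
        return True
    return False
```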
Referring to fig. 4, based on the above embodiment, step S112 further includes step S1123, step S1124, and step S1125. In order to enhance the reliability of the above determination, in another embodiment, in a case where it is determined by the inter-frame difference that the sample frame and the video segment which has not been divided into super frames before the sample frame can be allocated in the same super frame, the re-determination is performed using the matching result of the motion information of the sample frame and the previous sample frame.
Step S1123, obtaining motion matching information of the sampling frame and the previous sampling frame.
Specifically, if it is determined in steps S1121 to S1122 that the sample frame and the video segment that has not been divided into superframes before the sample frame can be allocated in the same superframe, the motion matching information of the sample frame and the previous sample frame is acquired.
In a specific embodiment, optical flow calculation is performed on the feature point set of the sampling frame and the feature point set of the previous sampling frame, which are acquired by performing texture feature extraction on the sampling frame and the previous sampling frame, so as to acquire motion matching information of the sampling frame and the previous sampling frame.
In order to quickly acquire the texture features of the sampling frame, a FAST corner feature extraction method may be adopted to extract the feature point set of the sampling frame. Specifically, any pixel point on the sampling frame is taken as a candidate feature point; determining a gray value difference value of a pixel point at a preset pixel distance around the candidate feature point and the candidate feature point; judging whether the number of pixel points with the gray value difference value of the candidate characteristic points larger than a preset gray value exceeds a preset number or not; and if so, determining the candidate feature points as the feature points of the sampling frame. In order to improve the efficiency of feature point extraction, only a predetermined number of peripheral pixels are usually used for comparison, for example, 9 peripheral pixels are used for comparison.
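A minimal sketch of this feature point extraction using OpenCV's FAST detector; the detection threshold and the output layout (chosen to feed the optical flow step below) are assumptions:

```python
import cv2

def extract_feature_points(frame_bgr, threshold=20):
    """Extract FAST corner feature points from a sampled frame.

    FAST compares each candidate pixel with pixels on a surrounding circle and
    keeps it when enough of them differ by more than `threshold` in gray value.
    """
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    fast = cv2.FastFeatureDetector_create(threshold=threshold, nonmaxSuppression=True)
    keypoints = fast.detect(gray, None)
    # Return coordinates as an N x 1 x 2 float32 array, the layout expected by
    # cv2.calcOpticalFlowPyrLK in the tracking step below.
    return cv2.KeyPoint_convert(keypoints).reshape(-1, 1, 2).astype("float32")
```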
And acquiring a feature point set of a sampling frame obtained by current sampling through the process, and performing optical flow calculation based on the feature point set of the sampling frame and the feature point set of the previous sampling frame to acquire a motion matching result of the sampling frame and the previous sampling frame.
In a specific embodiment, the pyramid Lucas-Kanade optical flow tracking algorithm is adopted to perform motion tracking on the currently sampled sampling frame. The pyramid Lucas-Kanade optical flow tracking algorithm calculates the optical flow at the highest layer of the image pyramid, uses the obtained motion estimation result as the starting point of the next layer of pyramid, and repeats the process until reaching the lowest layer of the pyramid. And a group of feature points are required to be specified before tracking during calculation, so that the optical flow method tracking is performed based on the feature point set of the sampling frame obtained by texture feature extraction and the feature point set of the last sampling frame. And performing optical flow calculation according to the feature point coordinates of the sampling frame and the feature coordinates of the last sampling frame to obtain the motion information of the feature points.
By the optical flow calculation, the following motion information can be obtained:
total_match_point: the number of feature points matched between the sampling frame and the previous sampling frame;
match_rate: the ratio of the number of matched feature points to the number of feature points of the previous sampling frame;
static_rate: the proportion of stationary points among the matched feature points, a stationary point being a point whose motion modulus is smaller than a predetermined modulus value;
motion_val: the average optical flow strength of the sampling frame;
motion_direct_var: the statistical variance of the optical flow field direction of the sampling frame, i.e., the variance of the angle between the feature points' motion directions and the coordinate axis, which can represent the complexity of the motion.
Among these, total_match_point, match_rate, and static_rate are the motion matching information of the sampling frame and the previous sampling frame; motion_val and motion_direct_var are the motion characteristic information of the sampling frame (relative to the previous sampling frame).
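The five quantities above could be derived from the matched point pairs roughly as follows; this is a sketch under the assumption that a stationary point is one whose motion modulus falls below an illustrative threshold and that the direction variance is taken over the angle to the x axis.

```python
import numpy as np

def motion_statistics(matched_prev, matched_curr, prev_point_count,
                      static_module_threshold=0.5):
    """Compute the motion matching / motion characteristic quantities listed above."""
    flow = (matched_curr - matched_prev).reshape(-1, 2)   # per-point motion vectors
    total_match_point = len(flow)
    if total_match_point == 0:
        return dict(total_match_point=0, match_rate=0.0, static_rate=0.0,
                    motion_val=0.0, motion_direct_var=0.0)
    magnitudes = np.linalg.norm(flow, axis=1)
    angles = np.arctan2(flow[:, 1], flow[:, 0])           # direction of each motion vector
    return dict(
        total_match_point=total_match_point,
        match_rate=total_match_point / max(prev_point_count, 1),
        static_rate=float(np.mean(magnitudes < static_module_threshold)),
        motion_val=float(np.mean(magnitudes)),
        motion_direct_var=float(np.var(angles)))
```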
Step S1124, determining whether the motion matching information meets a predetermined condition.
The predetermined conditions include, for example: whether the number of feature points matched between the sampling frame and the previous sampling frame is greater than a predetermined number (for example, whether total_match_point is greater than 80), whether the ratio of the number of matched feature points to the number of feature points of the previous sampling frame is greater than a first predetermined ratio (for example, whether match_rate is greater than 80%), and whether the proportion of stationary points among the matched feature points is greater than a first predetermined proportion (for example, whether static_rate is greater than 70%).
Step S1125: if yes, it is determined that the sampling frame and the video segment that has not been divided into superframes before the sampling frame can be allocated in the same superframe.
And if the motion matching information of the sampling frame and the previous sampling frame does not meet the preset condition, the sampling frame and the previous sampling frame cannot be allocated in the same superframe.
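A compact sketch of the check in step S1124, using the example thresholds mentioned above (80 matched points, 80%, 70%) and the statistics dictionary produced by the previous sketch; the threshold values remain examples, not fixed values of the method.

```python
def motion_match_ok(stats, min_points=80, min_match_rate=0.80, min_static_rate=0.70):
    """True when the motion matching information meets the predetermined condition."""
    return (stats["total_match_point"] > min_points
            and stats["match_rate"] > min_match_rate
            and stats["static_rate"] > min_static_rate)
```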
Returning to fig. 2, if not, in step S113, the video segment that has not been divided into superframes before the sampling frame is divided into one superframe.
If the currently sampled frame and the video segment that has not been divided into a superframe before it cannot be allocated in the same superframe, the sampling frame and that preceding video segment do not meet the similarity condition, so the preceding video segment can form a superframe and is divided into a new superframe. If the sampling frame and the previous sampling frame can be allocated in the same superframe, processing and determination continue with the next sampling frame.
Because sampling is performed at equal intervals, the superframe division carried out in steps S111 and S112 above is only a coarse division. The cut position can therefore be located precisely between the currently sampled frame and the previous sampling frame.
Referring to fig. 5, the step of dividing the video segment that has not been divided into superframes into one superframe specifically includes step S1131 and step S1132.
Step S1131, calculating an inter-frame difference between every two adjacent frames between the sampling frame and the previous sampling frame.
When processing the live video being broadcast, the incoming video data is preprocessed: for example, the side of the current incoming frame is scaled to 128 pixels, and black-edge removal and gamma correction are applied to reduce the influence of illumination on the image. The processed frames are buffered for use in this step. Calculating the inter-frame difference between every two adjacent frames between the sampling frame and the previous sampling frame means calculating the inter-frame difference between every two adjacent frames among all the buffered frames (the sampling frame, the previous sampling frame, and the non-sampled frames between them). Each of these inter-frame differences is calculated with the same method used for the sampling frame and the previous sampling frame in step S1121, i.e., with formula (1), which is not repeated here.
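A sketch of the preprocessing described above, assuming it is the shorter side that is scaled to 128 pixels and assuming a fixed gamma value; black-edge removal and the inter-frame difference of formula (1) are omitted because their exact definitions appear earlier in the application.

```python
import cv2
import numpy as np

def preprocess_frame(frame_bgr, target_short_side=128, gamma=1.5):
    """Scale the incoming frame and apply gamma correction to reduce the
    influence of illumination (illustrative parameter values)."""
    h, w = frame_bgr.shape[:2]
    scale = target_short_side / min(h, w)
    resized = cv2.resize(frame_bgr, (int(round(w * scale)), int(round(h * scale))))
    # Gamma correction via a lookup table.
    table = np.array([((i / 255.0) ** (1.0 / gamma)) * 255 for i in range(256)],
                     dtype=np.uint8)
    return cv2.LUT(resized, table)
```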
Step S1132: the position where the inter-frame difference between two adjacent frames is the largest is taken as the cut position between the divided superframe and the next superframe.
The video segment composed of the frames after the previous superframe and before the cut position is divided into one superframe, and the frames after the cut position belong to the next (not yet divided) superframe.
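The refinement of step S1132 amounts to taking the argmax over the buffered inter-frame differences; a minimal sketch, assuming those differences have already been computed with formula (1):

```python
def refine_cut_position(adjacent_frame_diffs, prev_sample_index):
    """adjacent_frame_diffs[i] is the inter-frame difference between buffered
    frame i and frame i + 1 (buffered from the previous sampling frame onward).
    Returns the absolute index of the first frame of the next superframe."""
    max_offset = max(range(len(adjacent_frame_diffs)),
                     key=lambda i: adjacent_frame_diffs[i])
    return prev_sample_index + max_offset + 1
```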
Referring to fig. 6, step 110 further includes step S114 and step S115.
Step S114 determines whether the divided superframe needs to be merged with the previous superframe.
The previous superframe is the superframe divided immediately before the currently divided superframe, that is, the already-divided superframe adjacent to it. When the shooting device moves during the live video, it may translate, rotate, zoom, or tilt in some direction, or the background may shake, which can cause content changes, distortion, and illumination changes in the captured picture; such motion tends to produce short superframes that should be merged with the preceding superframe.
Step S114 may include the following two specific embodiments:
(1) Judging whether the length of the divided superframe is less than a predetermined number of frames; if so, judging whether the average inter-frame difference of the superframe is greater than a fifth predetermined threshold and whether the difference between the average frame difference of the sampling frame and that of the superframe is less than a sixth predetermined threshold; if so, determining that the divided superframe needs to be merged with the previous superframe.
First, it is determined whether the length of the divided superframe is less than a predetermined number of frames, for example 40 frames; if not, the divided superframe does not need to be merged. If the length is less than the predetermined number of frames, it is then judged whether the superframe is a motion superframe video segment, i.e., whether merging is needed is determined by judging whether the average inter-frame difference of the superframe is greater than a fifth predetermined threshold and whether the difference between the average frame difference of the sampling frame and that of the superframe is less than a sixth predetermined threshold. Assuming the fifth predetermined threshold is 0.08 and the sixth predetermined threshold is 0.2, if the average inter-frame difference of the divided superframe is greater than 0.08 and the difference between the average frame difference of the sampling frame and that of the superframe is less than 0.2, the superframe is a motion superframe video segment and needs to be merged with the previous superframe. This method uses inter-frame differences to determine whether the divided superframe is a motion superframe video segment.
(2) Judging whether the length of the divided superframe is less than a predetermined number of frames; if so, judging whether the number of feature points matched between the sampling frame and the previous sampling frame is greater than a second predetermined number, whether the ratio of the number of matched feature points to the number of feature points of the previous sampling frame is greater than a second predetermined ratio, and whether the proportion of stationary points among the matched feature points is greater than a second predetermined proportion; if so, determining that the divided superframe needs to be merged with the previous superframe.
In this method, it is first determined whether the length of the divided superframe is less than the predetermined number of frames; if not, the divided superframe does not need to be merged. If the length is less than the predetermined number of frames, it is then judged whether the superframe is a motion superframe video segment, i.e., whether merging is needed is determined by judging whether the number of feature points (total_match_point) matched between the sampling frame and the previous sampling frame is greater than a second predetermined number, whether the ratio (match_rate) of the number of matched feature points to the number of feature points of the previous sampling frame is greater than a second predetermined ratio, and whether the proportion (static_rate) of stationary points among the matched feature points is greater than a second predetermined proportion. For example, the second predetermined number is 80, the second predetermined ratio is 80%, and the second predetermined proportion is 30%.
In step S115, if yes, the divided super frame is merged with the previous super frame.
If the determination result in any one of the above steps S114 is positive, the divided superframes and the previous superframe are merged, that is, merged into one superframe.
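The merge decision of steps S114 and S115 could be sketched as below; the two criteria are alternatives in the application but are combined with a logical or here purely for compactness, and every numeric threshold simply echoes the example values above.

```python
def should_merge_with_previous(superframe_len, avg_interframe_diff, sample_frame_diff,
                               stats, min_len=40,
                               diff_threshold=0.08, diff_gap_threshold=0.2,
                               min_points=80, min_match_rate=0.80, min_static_rate=0.30):
    """Decide whether a short, motion-dominated superframe should be merged
    with the previous superframe (illustrative thresholds)."""
    if superframe_len >= min_len:
        return False
    # Criterion (1): based on inter-frame differences.
    by_diff = (avg_interframe_diff > diff_threshold
               and abs(sample_frame_diff - avg_interframe_diff) < diff_gap_threshold)
    # Criterion (2): based on motion matching statistics.
    by_motion = (stats["total_match_point"] > min_points
                 and stats["match_rate"] > min_match_rate
                 and stats["static_rate"] > min_static_rate)
    return by_diff or by_motion
```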
Returning to fig. 1, in step S120, when divided into one superframe, a key frame of the divided superframe is acquired.
In other words, during the superframe division of the live video, each time a superframe is divided, the key frame of the newly divided superframe is acquired, so that the specific type of content in the live video is identified in real time with the superframe as the unit. Referring to fig. 7, step S120 specifically includes step S121, step S122, and step S123.
Step S121, acquiring a sampling frame in the superframe as a candidate key frame set of the superframe.
In order to reduce memory usage and improve processing efficiency, the key frames are acquired in a streaming manner, simultaneously with the superframe division. If it is determined that the currently sampled frame and the video segment that has not yet formed a superframe before it can be allocated to the same superframe, i.e., the sampling frame does not start a new superframe, the sampling frame is used as a candidate key frame of that video segment (which will later be divided into a superframe) and is stored in the candidate key frame set of that future superframe.
Step S122, performing deduplication processing on the candidate keyframes in the candidate keyframe set to obtain a deduplication candidate keyframe set of the superframe.
Referring to fig. 8, step S122 specifically includes step S1221 and step S1222.
Step S1221, calculating an inter-frame difference between every two candidate key frames in the candidate key frame set, and if the inter-frame difference between the two candidate key frames is smaller than a predetermined inter-frame difference threshold, determining that the two candidate key frames are similar frames.
The inter-frame difference between the two candidate key frames may be calculated by using the method for calculating the inter-frame difference in step 1121 and using formula (1), and if the inter-frame difference between the two candidate key frames obtained by calculation is smaller than a predetermined inter-frame difference threshold, the two candidate key frames may be determined to be similar frames.
Step S1222, for the similar frames in the candidate key frame set, retaining any one of the frames, and deleting the rest of the similar frames.
For example, if 3 candidate key frames are similar, only one of them is retained and the remaining two are deleted. After the similar frames in the candidate key frame set are processed in this way, the de-duplication candidate key frame set of the divided superframe is obtained.
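De-duplication of the candidate key frame set can be sketched as a greedy pass; frame_diff is assumed to be the inter-frame difference of formula (1), and the threshold value is an illustrative assumption.

```python
def deduplicate_candidates(candidate_frames, frame_diff, diff_threshold=0.1):
    """Keep a candidate only if it is not similar to any frame already kept."""
    kept = []
    for frame in candidate_frames:
        if all(frame_diff(frame, other) >= diff_threshold for other in kept):
            kept.append(frame)
    return kept
```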
Step S123, according to the quality scores of the candidate key frames in the de-duplication candidate key frame set, screening out the key frames of the super-frame from the de-duplication candidate key frame set.
The quality scores of the candidate key frames in the de-duplication candidate key frame set are obtained by performing quality judgment on the candidate key frames in the de-duplication candidate key frame set based on a pre-established quality judgment model. The quality determination of the candidate keyframes in the duplicate removal candidate keyframe set may be performed after obtaining the candidate keyframe set (i.e., after dividing into the current superframe), or may be performed while performing superframe division, i.e., while processing each sample frame, calculating the quality score of the sample frame.
The step of performing quality determination on the candidate keyframes in the de-duplication candidate keyframe set based on the pre-established quality determination model specifically includes step S1, step S2, and step S3, as shown in fig. 9.
Step S1, calculating the information entropy of the candidate key frame according to the global color histogram of the candidate key frame in the duplicate removal candidate key frame set.
Specifically, the color feature extraction method in step 1121 is adopted to obtain the block color histogram of the candidate key frame (the block color histogram of the candidate key frame obtained in step S1121 may be directly used), then the corresponding components of the color histograms of each block are added to obtain the global color histogram of the candidate key frame, and then the information entropy of the candidate key frame is calculated according to the global color histogram of the candidate key frame.
For example, the candidate key frame is divided into N × N blocks (e.g., 3 × 3 blocks), and the color histogram of each block is calculated:
hist[i] = {(h_hist[i][hi], s_hist[i][si], v_hist[i][vi])}, 0 ≤ i < 9, 0 ≤ hi < 8, 0 ≤ si < 3, 0 ≤ vi < 2;
adding the corresponding components of the color histograms of all blocks gives the global color histogram of the candidate key frame:
gHist = {(gh_hist[hi], gs_hist[si], gv_hist[vi])}, 0 ≤ hi < 8, 0 ≤ si < 3, 0 ≤ vi < 2,
wherein:
gh_hist[hi] = ∑ h_hist[i][hi], 0 ≤ i < 9;
gs_hist[si] = ∑ s_hist[i][si], 0 ≤ i < 9;
gv_hist[vi] = ∑ v_hist[i][vi], 0 ≤ i < 9;
then the information entropy imgEntropy of the candidate key frame is calculated over the 48 components of the global color histogram gHist:
imgEntropy = 0 − ∑ (gHist[j] * log(gHist[j])), 0 ≤ j < 48    (2)
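A sketch of the entropy computation, under the assumptions that the per-block histograms are HSV histograms with 8 × 3 × 2 bins (matching the index ranges above) and that the summed global histogram is normalized to a probability distribution before formula (2) is applied.

```python
import cv2
import numpy as np

def image_entropy(frame_bgr, grid=3, bins=(8, 3, 2)):
    """imgEntropy from the 48-bin global HSV histogram of a grid x grid partition."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    h, w = hsv.shape[:2]
    global_hist = np.zeros(bins, dtype=np.float64)
    for r in range(grid):
        for c in range(grid):
            block = hsv[r * h // grid:(r + 1) * h // grid,
                        c * w // grid:(c + 1) * w // grid]
            # Block histogram over the H, S, V channels.
            global_hist += cv2.calcHist([block], [0, 1, 2], None, list(bins),
                                        [0, 180, 0, 256, 0, 256])
    g = global_hist.flatten()
    g = g / g.sum()                  # normalise so the 48 bins form a distribution
    nonzero = g[g > 0]               # skip empty bins to avoid log(0)
    return float(-np.sum(nonzero * np.log(nonzero)))
```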
step S2, generating image feature vectors of the candidate key frames based on the calculated information entropy of the candidate key frames and the motion information of the candidate key frames.
Specifically, the motion information of the candidate key frame may be extracted using the same method in step S1123. That is, optical flow calculation is performed based on the feature point set of the candidate key frame and the feature point set of the last sampling frame of the candidate key frame, which are acquired by performing texture feature extraction on the candidate key frame and the last sampling frame of the candidate key frame, so as to acquire motion information of the candidate key frame and the last sampling frame of the candidate key frame. For a specific calculation process, reference may be made to step S1123, which is not described herein again.
Writing the information entropy of the candidate key frame and the motion information of the candidate key frame into a vector form, the following image feature vectors can be obtained:
framefeature = [total_match_point, motion_val, match_rate, static_rate, motion_direct_var, imgEntropy].
step S3, performing quality determination on the image feature vector according to a pre-established quality determination model to obtain a quality score of the candidate keyframe.
And inputting the obtained image characteristic vector of the candidate key frame into a pre-established quality judgment model, and outputting the quality score of the candidate key frame.
Specifically, a large number of screenshots from videos are collected and the quality judgment model is obtained by training. The quality judgment classifier (i.e., the quality judgment model) can be established using logistic regression, and the model outputs the quality score of the current frame, a value between 0 and 1.
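A minimal sketch of such a quality classifier using logistic regression; the training arrays below are placeholders standing in for the collected screenshots and their manual quality labels, which this document does not provide.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Placeholder training data: each row is a 6-dimensional framefeature vector
# [total_match_point, motion_val, match_rate, static_rate, motion_direct_var, imgEntropy]
# and each label marks whether the screenshot was judged to be of good quality.
X_train = np.random.rand(1000, 6)
y_train = (np.random.rand(1000) > 0.5).astype(int)

quality_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

def quality_score(frame_feature):
    """Return the 0-1 quality score of one candidate key frame."""
    return float(quality_model.predict_proba([frame_feature])[0, 1])
```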
Step S123 specifically includes the following two specific embodiments:
(1) eliminating the candidate key frames whose quality scores are smaller than a predetermined score threshold from the de-duplication candidate key frame set, and determining the retained candidate key frames as the key frames of the superframe.
The predetermined score threshold is, for example, 0.2, that is, frames with a quality score less than 0.2 in the deduplication candidate key frame set are rejected, and frames with a remaining quality score greater than or equal to 0.2 are taken as key frames of the super frame.
Alternatively,
(2) sorting the candidate key frames in the duplicate removal candidate key frame set in descending order of quality score; reserving a predetermined number of top-ranked candidate key frames as the key frames of the superframe.
For example, after the candidate key frames in the de-duplication candidate key frame set are sorted in descending order of quality score, the top 10 candidate key frames are retained as the key frames of the superframe.
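Both selection strategies can be sketched in a few lines; the score threshold (0.2) and the number of retained frames (10) are simply the example values above.

```python
def select_key_frames(dedup_candidates, scores, score_threshold=0.2, top_k=10):
    """Strategy (1): drop low-scoring frames; strategy (2): keep the top_k frames."""
    by_threshold = [f for f, s in zip(dedup_candidates, scores) if s >= score_threshold]
    ranked = sorted(zip(dedup_candidates, scores), key=lambda p: p[1], reverse=True)
    by_ranking = [f for f, _ in ranked[:top_k]]
    return by_threshold, by_ranking
```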
Returning to fig. 1, in step S130, a key frame of the superframe is identified to determine whether the superframe contains a specific type of content.
Referring to fig. 10, step S130 specifically includes step S131 and step S132.
Step S131, identifying each key frame of the super frame based on a pre-established image identification model for identifying the specific type of content, so as to obtain the probability that each key frame of the super frame contains the specific type of content.
The pre-established image recognition model that recognizes the specific type of content is an image recognition model based on a deep convolutional neural network. The model is established by training on a large number of images of the specific type of content; each key frame of the superframe is then input into the model to obtain the probability that the key frame contains the specific type of content, i.e., the probability that its content belongs to that type. Using an image recognition model based on a deep convolutional neural network avoids the complex feature extraction and data reconstruction of traditional image recognition algorithms.
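A sketch of the per-key-frame inference step; the application only specifies a deep convolutional neural network, so the ResNet-18 below is a stand-in architecture, assumed to have been trained on images of the specific type of content.

```python
import torch
import torchvision.transforms as T
from torchvision.models import resnet18

# Hypothetical recognition model with a 2-way output (contains / does not contain).
model = resnet18(num_classes=2).eval()

preprocess = T.Compose([T.ToPILImage(), T.Resize((224, 224)), T.ToTensor()])

@torch.no_grad()
def content_probability(frame_bgr):
    """Probability that one key frame contains the specific type of content."""
    rgb = frame_bgr[:, :, ::-1].copy()              # OpenCV BGR -> RGB
    x = preprocess(rgb).unsqueeze(0)                # add batch dimension
    logits = model(x)
    return float(torch.softmax(logits, dim=1)[0, 1])
```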
Step S132, determining whether the super frame includes the content of the specific type according to the average value of the probabilities that each key frame of the super frame includes the content of the specific type and a predetermined probability threshold.
And if the average value of the probabilities that all key frames of the superframe contain the specific type of content is greater than a preset probability threshold, determining that the superframe contains the specific type of content. Determining that the super frame does not contain the content of the specific type if the average of the probabilities that the key frames of the super frame contain the content of the specific type is less than or equal to a predetermined probability threshold. For example, the predetermined probability threshold may be 0.9, after the probability that each key frame of the currently segmented super frame includes the content of the specific type is output through the image recognition model, an average value of the probabilities of each key frame including the content of the specific type is calculated, whether the average value exceeds 0.9 is judged, and if yes, the super frame is determined to include the content of the specific type.
If the (currently divided) superframe contains the specific type of content, the system can be alerted to stop playback of the live video; if it does not, superframe division and the determination of the specific type of content continue until the live broadcast ends. In this way, whether the live video contains the specific type of content can be identified in real time.
The application proposes the concept of a video superframe: during the live broadcast, the continuously generated video is divided into superframes, and the specific type of content is identified with the superframe as the unit, so that superframe video segments containing that content are identified during the live broadcast and real-time monitoring of the specific type of content in the live video is achieved. Based on information entropy calculation and motion feature extraction, the quality judgment model outputs a quality judgment result for each sampling frame, from which the key frames of the currently divided superframe are screened out; this saves computing resources while still obtaining high-quality key frames. The specific type of content is identified with an image recognition model based on a deep convolutional neural network, which improves recognition accuracy.
Based on the same inventive concept as the method, the application also provides a data identification device. Fig. 11 is a schematic diagram of a data recognition apparatus 2, the apparatus 2 comprising:
a superframe splitting unit 210, configured to perform superframe splitting on a live video to obtain multiple superframes, where a video segment composed of multiple frames that are adjacent in time and meet a predetermined similarity condition is split into one superframe;
a key frame acquiring unit 220 for acquiring a key frame of the divided superframe when divided into one superframe;
a content identification unit 230, configured to identify a key frame of the superframe to determine whether the superframe contains a specific type of content.
Referring to fig. 12, in a specific embodiment, the superframe segmentation unit 210 includes:
a sampling unit 211, configured to sample the live video at intervals of a predetermined number of frames to obtain a sampling frame;
a determining unit 212, configured to determine whether the obtained sample frame and a video segment that has not been divided into superframes before the sample frame can be allocated in the same superframe;
a dividing unit 213, configured to divide the video segment that has not been divided into superframes before the sampling frame into one superframe if not.
Referring to fig. 13, in a specific embodiment, the determining unit 212 includes:
an inter-frame difference calculation unit 2121, configured to calculate an inter-frame difference between the sample frame and a previous sample frame according to a color histogram of the sample frame obtained by performing color feature extraction on the sample frame and the previous sample frame and a color histogram of the previous sample frame, where the previous sample frame is a sample frame that is sampled before the sample frame and is separated from the sample frame by a predetermined number of frames;
a first determining unit 2122, configured to determine whether the video segment that has not been divided into the super frame before the sampling frame and the sampling frame can be allocated in the same super frame according to the inter-frame difference between the sampling frame and the previous sampling frame and the average inter-frame difference of the video segments that have not been divided into the super frame before the sampling frame.
Wherein the first determining unit 2122 is further configured to:
judging whether the inter-frame difference between the sampling frame and the last sampling frame is smaller than a first preset threshold value or not, and whether the difference value between the inter-frame difference between the sampling frame and the last sampling frame and the average inter-frame difference of the video segments which are not divided into super frames before the sampling frame is smaller than a second preset threshold value or not;
if yes, determining that the sampling frame and a video segment which is not divided into superframes before the sampling frame can be distributed in the same superframe;
and/or,
judging whether the inter-frame difference between the sampling frame and the last sampling frame is larger than a third preset threshold value or not, and whether the difference value between the inter-frame difference between the sampling frame and the last sampling frame and the average inter-frame difference of the video segments which are not divided into super frames before the sampling frame is smaller than a fourth preset threshold value or not;
if so, it is determined that the sample frame and the video segment that has not been previously segmented into superframes by the sample frame can be allocated within the same superframe.
Referring to fig. 14, based on the embodiment described in fig. 13, the determining unit 212 further includes:
an information obtaining unit 2123, configured to obtain motion matching information of the sample frame and a previous sample frame;
a condition determining unit 2124, configured to determine whether the motion matching information meets a predetermined condition, where the predetermined condition includes: whether the number of the feature points matched with the sampling frame and the last sampling frame is greater than a first preset number, whether the ratio of the number of the feature points matched with the sampling frame and the last sampling frame to the number of the feature points of the last sampling frame is greater than a first preset ratio, and whether the proportion of stationary points in the feature points matched with the sampling frame and the last sampling frame is greater than a first preset ratio;
a second determining unit 2125, configured to determine that the sample frame and the video segment that has not been divided into superframes before the sample frame can be allocated in the same superframe if yes.
Wherein the information obtaining unit 2123 is further configured to:
and performing optical flow calculation based on the feature point set of the sampling frame and the feature point set of the previous sampling frame, which are obtained by performing texture feature extraction on the sampling frame and the previous sampling frame, so as to obtain the motion matching information of the sampling frame and the previous sampling frame.
Referring to fig. 15, based on the above embodiment, the superframe splitting unit 210 further includes:
a superframe judging unit 214, configured to judge whether a length of a divided superframe is smaller than a predetermined number of frames, if so, judge whether an average inter-frame difference of the superframe is larger than a fifth predetermined threshold, and whether a difference between the average frame difference of the sampling frame and the superframe is smaller than a sixth predetermined threshold, and if so, determine that the divided superframe needs to be merged with a previous superframe;
alternatively,
the superframe judging unit 214 is configured to judge whether the length of the divided superframe is less than a predetermined number of frames; if so, to judge whether the number of feature points matched between the sampling frame and the previous sampling frame is greater than a second predetermined number, whether the ratio of the number of matched feature points to the number of feature points of the previous sampling frame is greater than a second predetermined ratio, and whether the proportion of stationary points among the matched feature points is greater than a second predetermined proportion; and if so, to determine that the divided superframe needs to be merged with the previous superframe;
a superframe merging unit 215, configured to merge the divided superframe with a previous superframe if the superframe determining unit determines that the divided superframe needs to be merged with the previous superframe.
Referring to fig. 16, the segmentation unit 213 includes:
a calculating unit 2131, configured to calculate an inter-frame difference between every two adjacent frames between the sampling frame and a previous sampling frame;
a shear position determining unit 2132, configured to determine a position where an inter-frame difference between two adjacent frames is the largest as a shear position of the divided super frame and a next super frame.
Referring to fig. 17, according to any of the above embodiments, the key frame acquiring unit 220 includes:
a candidate key frame set obtaining unit 221, configured to obtain a sample frame in the super frame as a candidate key frame set of the super frame;
a deduplication processing unit 222, configured to perform deduplication processing on candidate keyframes in the candidate keyframe set to obtain a deduplication candidate keyframe set of the superframe;
a key frame screening unit 223, configured to screen out the key frame of the super frame from the de-duplication candidate key frame set according to the quality score of the candidate key frame in the de-duplication candidate key frame set.
Referring to fig. 18, based on the embodiment described in fig. 17, the deduplication processing unit 222 includes:
a similar frame determining unit 2221, configured to calculate an inter-frame difference between every two candidate key frames in the candidate key frame set, and determine that the two candidate key frames are similar frames if the inter-frame difference between the two candidate key frames is smaller than a predetermined inter-frame difference threshold;
a similar frame deduplication unit 2222, configured to, for a similar frame in the candidate key frame set, retain any one of the frames, and delete the remaining similar frames.
Wherein the quality scores of the candidate key frames in the de-duplication candidate key frame set are obtained by performing quality judgment on the candidate key frames in the de-duplication candidate key frame set based on a pre-established quality judgment model,
the apparatus 2 further comprises: a quality determination unit (not shown) for performing quality determination on the candidate keyframes in the de-duplication candidate keyframe set based on a pre-established quality determination model;
the quality determination unit includes:
the information entropy calculation unit is used for calculating the information entropy of the candidate key frames according to the global color histogram of the candidate key frames in the duplicate removal candidate key frame set;
the image feature vector acquisition unit is used for generating an image feature vector of the candidate key frame based on the calculated information entropy of the candidate key frame and the motion feature of the candidate key frame;
and the quality score acquisition unit is used for carrying out quality judgment on the image feature vector according to a pre-established quality judgment model so as to acquire the quality score of the candidate key frame.
Based on the embodiment described in fig. 17, the key frame screening unit 223 is further configured to:
eliminating candidate key frames with the quality scores smaller than a preset score threshold value in the duplication elimination candidate key frame set, and determining the reserved candidate key frames as the key frames of the super frame;
alternatively,
sorting the candidate key frames in the duplicate removal candidate key frame set in descending order of quality score; reserving a predetermined number of top-ranked candidate key frames as the key frames of the superframe.
Referring to fig. 19, based on any of the above embodiments, the content identification unit 230 includes:
a probability obtaining unit 231, configured to identify, based on a pre-established image identification model for identifying the specific type of content, each key frame of the super frame to obtain a probability that each key frame of the super frame contains the specific type of content;
a content determining unit 232, configured to determine whether each key frame of the super frame includes a content of a specific type according to an average value of probabilities that each key frame of the super frame includes the content of the specific type and a predetermined probability threshold.
Wherein the pre-established image recognition model that recognizes the specific type of content is an image recognition model based on a deep convolutional neural network.
It is noted that the present application may be implemented in software and/or a combination of software and hardware, for example, the various means of the present application may be implemented using Application Specific Integrated Circuits (ASICs) or any other similar hardware devices. In one embodiment, the software programs of the present application may be executed by a processor to implement the steps or functions described above. Likewise, the software programs (including associated data structures) of the present application may be stored in a computer readable recording medium, such as RAM memory, magnetic or optical drive or diskette and the like. Additionally, some of the steps or functions of the present application may be implemented in hardware, for example, as circuitry that cooperates with the processor to perform various steps or functions.
It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned. Furthermore, it is obvious that the word "comprising" does not exclude other elements or steps, and the singular does not exclude the plural. A plurality of units or means recited in the system claims may also be implemented by one unit or means in software or hardware. The terms first, second, etc. are used to denote names, but not any particular order.
While exemplary embodiments have been particularly shown and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the claims.

Claims (16)

1. A method of data identification, the method comprising the steps of:
the method comprises the following steps of carrying out superframe segmentation on a video in live broadcast to obtain a plurality of superframes, and specifically comprises the following steps:
sampling the video in the live broadcast once every a preset frame number interval to obtain a sampling frame;
judging whether the obtained sampling frame and a video segment which is not divided into superframes before the sampling frame can be distributed in the same superframe or not;
if not, dividing the video segment which is not divided into the super frame before the sampling frame into a super frame;
wherein, a video segment composed of a plurality of frames which are adjacent in time and meet the preset similarity condition is divided into a superframe;
when the super frame is divided into the super frames, acquiring key frames of the divided super frames;
identifying a key frame of the superframe to determine whether the superframe contains a particular type of content;
judging whether the inter-frame difference between the sampling frame and the last sampling frame is smaller than a first preset threshold value or not, and whether the difference value between the inter-frame difference between the sampling frame and the last sampling frame and the average inter-frame difference of the video segments which are not divided into super frames before the sampling frame is smaller than a second preset threshold value or not;
if yes, determining that the sampling frame and a video segment which is not divided into superframes before the sampling frame can be distributed in the same superframe;
and/or,
judging whether the inter-frame difference between the sampling frame and the last sampling frame is larger than a third preset threshold value or not, and whether the difference value between the inter-frame difference between the sampling frame and the last sampling frame and the average inter-frame difference of the video segments which are not divided into super frames before the sampling frame is smaller than a fourth preset threshold value or not;
if so, it is determined that the sample frame and the video segment that has not been previously segmented into superframes by the sample frame can be allocated within the same superframe.
2. The method according to claim 1, wherein the step of determining whether the obtained sample frame and the video segment not divided into superframes before the sample frame can be allocated in the same superframe comprises:
calculating the frame-to-frame difference between the sampling frame and the last sampling frame according to the color histogram of the sampling frame and the color histogram of the last sampling frame, which are obtained by extracting the color features of the sampling frame and the last sampling frame, wherein the last sampling frame is a sampling frame obtained by sampling the video in the live broadcast before the sampling frame;
and determining whether the video segments which are not divided into the super frames before the sampling frame and the sampling frame can be distributed in the same super frame according to the inter-frame difference between the sampling frame and the last sampling frame and the average inter-frame difference of the video segments which are not divided into the super frames before the sampling frame.
3. The method according to claim 2, wherein the step of determining whether the sample frame and the video segment that has not been previously segmented into superframes by the sample frame can be allocated within the same superframe further comprises:
acquiring motion matching information of the sampling frame and the previous sampling frame;
judging whether the motion matching information meets a preset condition or not, wherein the preset condition comprises the following steps: whether the number of the feature points matched with the sampling frame and the last sampling frame is greater than a first preset number, whether the ratio of the number of the feature points matched with the sampling frame and the last sampling frame to the number of the feature points of the last sampling frame is greater than a first preset ratio, and whether the proportion of stationary points in the feature points matched with the sampling frame and the last sampling frame is greater than a first preset ratio;
if so, it is determined that the sample frame and the video segment that has not been previously segmented into superframes by the sample frame can be allocated within the same superframe.
4. The method of claim 3, wherein the step of obtaining the motion matching information of the sample frame and the previous sample frame comprises:
and performing optical flow calculation based on the feature point set of the sampling frame and the feature point set of the previous sampling frame, which are obtained by performing texture feature extraction on the sampling frame and the previous sampling frame, so as to obtain the motion matching information of the sampling frame and the previous sampling frame.
5. The method of claim 4, wherein the step of performing superframe segmentation on the video in live broadcast further comprises:
judging whether the divided superframe needs to be merged with the previous superframe or not, comprising the following steps:
judging whether the length of the divided superframe is less than a preset frame number, if so, judging whether the average frame difference of the superframe is greater than a fifth preset threshold value, and whether the difference value of the average frame difference of the sampling frame and the superframe is less than a sixth preset threshold value, if so, determining that the divided superframe needs to be merged with the previous superframe;
or,
judging whether the length of the divided superframe is less than a preset frame number, if so, judging whether the number of the feature points matched with the previous sampling frame of the sampling frame is more than a second preset number, whether the ratio of the number of the feature points matched with the previous sampling frame of the sampling frame to the number of the feature points of the previous sampling frame is more than a second preset ratio, and whether the proportion occupied by static points in the feature points matched with the previous sampling frame of the sampling frame is more than a second preset ratio, if so, determining that the divided superframe needs to be merged with the previous superframe;
and if so, merging the divided superframe with the previous superframe.
6. The method of claim 1, wherein the step of segmenting into a superframe a video segment that has not been previously segmented into superframes by said sample frame comprises:
calculating the interframe difference of every two adjacent frames between the sampling frame and the last sampling frame;
and determining the position with the largest inter-frame difference of two adjacent frames as the shearing position of the divided superframe and the next superframe.
7. The method according to any of claims 1-6, wherein the step of obtaining key frames of the segmented superframe comprises:
acquiring a sampling frame in the superframe as a candidate key frame set of the superframe;
performing deduplication processing on candidate key frames in the candidate key frame set to obtain a deduplication candidate key frame set of the superframe;
and screening the key frames of the super frame from the de-duplication candidate key frame set according to the quality scores of the candidate key frames in the de-duplication candidate key frame set.
8. The method of claim 7, wherein the step of performing deduplication processing on the candidate keyframes in the set of candidate keyframes to obtain a set of deduplication candidate keyframes for the superframe comprises:
calculating the inter-frame difference of every two candidate key frames in the candidate key frame set, and if the inter-frame difference of the two candidate key frames is smaller than a preset inter-frame difference threshold value, determining the two candidate key frames as similar frames;
and for the similar frames in the candidate key frame set, keeping any one frame and deleting the rest similar frames.
9. The method according to claim 7, wherein the quality scores of the candidate keyframes in the de-duplication candidate keyframe set are obtained by performing quality judgment on the candidate keyframes in the de-duplication candidate keyframe set based on a pre-established quality judgment model;
the step of performing quality judgment on the candidate key frames in the de-duplication candidate key frame set based on a pre-established quality judgment model comprises the following steps:
calculating the information entropy of the candidate key frame according to the global color histogram of the candidate key frame in the duplicate removal candidate key frame set;
generating image feature vectors of the candidate key frames based on the calculated information entropy of the candidate key frames and the motion features of the candidate key frames;
and performing quality judgment on the image feature vector according to a pre-established quality judgment model to obtain the quality scores of the candidate key frames.
10. The method of claim 7, wherein the step of filtering out key frames of the superframe from the set of deduplication candidate key frames according to quality scores of candidate key frames in the set of deduplication candidate key frames comprises:
eliminating candidate key frames with the quality scores smaller than a preset score threshold value in the duplication elimination candidate key frame set, and determining the reserved candidate key frames as the key frames of the super frame;
or,
sorting the candidate key frames in the duplicate removal candidate key frame set in descending order of quality score;
reserving a predetermined number of top ranked candidate key frames as key frames of the superframe.
11. The method of claim 1, wherein identifying key frames of the superframe to determine whether the superframe contains a particular type of content comprises:
identifying each key frame of the super frame based on a pre-established image identification model for identifying the specific type of content so as to acquire the probability that each key frame of the super frame contains the specific type of content;
and judging whether the superframe contains the content of the specific type or not according to the average value of the probabilities that each key frame of the superframe contains the content of the specific type and a preset probability threshold.
12. The method of claim 11, wherein the image recognition model that identifies the particular type of content is an image recognition model based on a deep convolutional neural network.
13. An apparatus for data recognition, the apparatus comprising:
the video segment dividing unit is used for dividing the video in the live broadcast into a plurality of superframes, wherein the video segment composed of a plurality of frames which are adjacent in time and meet the preset similar condition is divided into one superframe;
the superframe division unit includes:
the sampling unit is used for sampling the video in the live broadcast once at intervals of preset frame numbers to obtain a sampling frame;
a judging unit, configured to judge whether the obtained sampling frame and a video segment that has not been divided into superframes before the sampling frame can be allocated in the same superframe;
a dividing unit, configured to divide, if not, the video segment that has not been divided into superframes before the sampling frame into a superframe;
a key frame acquisition unit configured to acquire a key frame of a superframe divided when divided into the superframe;
a content identification unit, configured to identify a key frame of the superframe to determine whether the superframe contains a specific type of content;
judging whether the inter-frame difference between the sampling frame and the last sampling frame is smaller than a first preset threshold value or not, and whether the difference value between the inter-frame difference between the sampling frame and the last sampling frame and the average inter-frame difference of the video segments which are not divided into super frames before the sampling frame is smaller than a second preset threshold value or not;
if yes, determining that the sampling frame and a video segment which is not divided into superframes before the sampling frame can be distributed in the same superframe;
and/or,
judging whether the inter-frame difference between the sampling frame and the last sampling frame is larger than a third preset threshold value or not, and whether the difference value between the inter-frame difference between the sampling frame and the last sampling frame and the average inter-frame difference of the video segments which are not divided into super frames before the sampling frame is smaller than a fourth preset threshold value or not;
if so, it is determined that the sample frame and the video segment that has not been previously segmented into superframes by the sample frame can be allocated within the same superframe.
14. The apparatus according to claim 13, wherein the judging unit includes:
an inter-frame difference calculation unit, configured to calculate an inter-frame difference between the sampling frame and a previous sampling frame according to a color histogram of the sampling frame and a color histogram of the previous sampling frame, where the color histogram of the sampling frame is obtained by performing color feature extraction on the sampling frame and the previous sampling frame, and the previous sampling frame is a sampling frame that is sampled before the sampling frame and is spaced from the sampling frame by a predetermined number of frames;
a first determining unit, configured to determine whether the video segment that has not been divided into superframes before the sampling frame and the sampling frame can be allocated in the same superframe according to an interframe difference between the sampling frame and a previous sampling frame and an average interframe difference of the video segment that has not been divided into superframes before the sampling frame.
15. The apparatus according to claim 14, wherein the determining unit further comprises:
the information acquisition unit is used for acquiring the motion matching information of the sampling frame and the previous sampling frame;
a condition judgment unit configured to judge whether the motion matching information meets a predetermined condition, where the predetermined condition includes: whether the number of the feature points matched with the sampling frame and the last sampling frame is greater than a first preset number, whether the ratio of the number of the feature points matched with the sampling frame and the last sampling frame to the number of the feature points of the last sampling frame is greater than a first preset ratio, and whether the proportion of stationary points in the feature points matched with the sampling frame and the last sampling frame is greater than a first preset ratio;
and a second determining unit, configured to determine that the sample frame and the video segment that has not been divided into superframes before the sample frame can be allocated in the same superframe if yes.
16. The apparatus of claim 15, wherein the information obtaining unit is further configured to:
and performing optical flow calculation based on the feature point set of the sampling frame and the feature point set of the previous sampling frame, which are obtained by performing texture feature extraction on the sampling frame and the previous sampling frame, so as to obtain the motion matching information of the sampling frame and the previous sampling frame.
CN201610306321.3A 2016-05-10 2016-05-10 Data identification method and device Active CN107358141B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610306321.3A CN107358141B (en) 2016-05-10 2016-05-10 Data identification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610306321.3A CN107358141B (en) 2016-05-10 2016-05-10 Data identification method and device

Publications (2)

Publication Number Publication Date
CN107358141A CN107358141A (en) 2017-11-17
CN107358141B true CN107358141B (en) 2020-10-23

Family

ID=60271331

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610306321.3A Active CN107358141B (en) 2016-05-10 2016-05-10 Data identification method and device

Country Status (1)

Country Link
CN (1) CN107358141B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108124191B (en) * 2017-12-22 2019-07-12 北京百度网讯科技有限公司 A kind of video reviewing method, device and server
CN108154137B (en) * 2018-01-18 2020-10-20 厦门美图之家科技有限公司 Video feature learning method and device, electronic equipment and readable storage medium
CN110210379A (en) * 2019-05-30 2019-09-06 北京工业大学 A kind of lens boundary detection method of combination critical movements feature and color characteristic
CN111325096B (en) * 2020-01-19 2021-04-20 北京字节跳动网络技术有限公司 Live stream sampling method and device and electronic equipment
CN112487943B (en) * 2020-11-25 2023-06-27 北京有竹居网络技术有限公司 Key frame de-duplication method and device and electronic equipment
CN113709562B (en) * 2021-04-27 2023-05-16 武汉星巡智能科技有限公司 Automatic editing method, device, equipment and storage medium based on baby action video
CN113596578B (en) * 2021-07-26 2023-07-25 深圳创维-Rgb电子有限公司 Video processing method and device, electronic equipment and computer readable storage medium
CN113810764B (en) * 2021-08-12 2022-12-06 荣耀终端有限公司 Video editing method and video editing device

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6782049B1 (en) * 1999-01-29 2004-08-24 Hewlett-Packard Development Company, L.P. System for selecting a keyframe to represent a video
CN102930553A (en) * 2011-08-10 2013-02-13 中国移动通信集团上海有限公司 Method and device for identifying objectionable video content
CN103400155A (en) * 2013-06-28 2013-11-20 西安交通大学 Pornographic video detection method based on semi-supervised learning of images
CN103942751A (en) * 2014-04-28 2014-07-23 中央民族大学 Method for extracting video key frame
CN105488478A (en) * 2015-12-02 2016-04-13 深圳市商汤科技有限公司 Face recognition system and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基于颜色加权的新闻视频镜头检测方法;张晶等;《吉林大学学报(信息科学版)》;20150115;第33卷(第1期);39-44 *

Also Published As

Publication number Publication date
CN107358141A (en) 2017-11-17

Similar Documents

Publication Publication Date Title
CN107358141B (en) Data identification method and device
CN109151501B (en) Video key frame extraction method and device, terminal equipment and storage medium
CN110235138B (en) System and method for appearance search
CN104063883B (en) A kind of monitor video abstraction generating method being combined based on object and key frame
CN108492319B (en) Moving target detection method based on deep full convolution neural network
US20230289979A1 (en) A method for video moving object detection based on relative statistical characteristics of image pixels
CN107087211B (en) Method and device for detecting lens of host
US8358837B2 (en) Apparatus and methods for detecting adult videos
CN108491784B (en) Single person close-up real-time identification and automatic screenshot method for large live broadcast scene
Yuan et al. UG $^{2+} $ Track 2: A Collective Benchmark Effort for Evaluating and Advancing Image Understanding in Poor Visibility Environments
CN107273832B (en) License plate recognition method and system based on integral channel characteristics and convolutional neural network
Mohan et al. Video image processing for moving object detection and segmentation using background subtraction
JP2006031678A (en) Image processing
Shah et al. A Self-adaptive CodeBook (SACB) model for real-time background subtraction
CN111382766A (en) Equipment fault detection method based on fast R-CNN
CN108647703B (en) Saliency-based classification image library type judgment method
Qiu et al. A fully convolutional encoder–decoder spatial–temporal network for real-time background subtraction
Angelo A novel approach on object detection and tracking using adaptive background subtraction method
CN116095363B (en) Mobile terminal short video highlight moment editing method based on key behavior recognition
CN107341456B (en) Weather sunny and cloudy classification method based on single outdoor color image
Selim et al. Image Quality-aware Deep Networks Ensemble for Efficient Gender Recognition in the Wild.
Ma et al. Local blur mapping: Exploiting high-level semantics by deep neural networks
CN116052090A (en) Image quality evaluation method, model training method, device, equipment and medium
CN112348011B (en) Vehicle damage assessment method and device and storage medium
CN113158720A (en) Video abstraction method and device based on dual-mode feature and attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant