CN111178204A - Video data editing and identifying method and device, intelligent terminal and storage medium - Google Patents

Video data editing and identifying method and device, intelligent terminal and storage medium Download PDF

Info

Publication number
CN111178204A
Authority
CN
China
Prior art keywords
video
video data
training
editing
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911325258.8A
Other languages
Chinese (zh)
Other versions
CN111178204B (en)
Inventor
梁文俊
黄继武
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University filed Critical Shenzhen University
Priority to CN201911325258.8A priority Critical patent/CN111178204B/en
Publication of CN111178204A publication Critical patent/CN111178204A/en
Application granted granted Critical
Publication of CN111178204B publication Critical patent/CN111178204B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The invention discloses a video data editing and identifying method, a video data editing and identifying device, an intelligent terminal and a storage medium, wherein the method comprises the following steps: acquiring video data to be identified; inputting the video data to be identified into a trained feature model to obtain a classification result; and identifying whether the video data has been edited according to the classification result, where the feature model is constructed based on a gray level co-occurrence matrix. Compared with the existing SRM model, the dimension-reduced SRM model has fewer dimensions, and no additional information is needed to distinguish original videos from videos processed by different software. The method judges the originality of a video from the perspective of video editing software: by detecting whether the video has been compressed by editing software, it determines whether the video has been processed, providing an auxiliary means for video forensics and an effective method for verifying the originality of surveillance video.

Description

Video data editing and identifying method and device, intelligent terminal and storage medium
Technical Field
The invention relates to the technical field of multimedia information, in particular to a video data editing and identifying method, a video data editing and identifying device, an intelligent terminal and a storage medium.
Background
Video is a widely used medium, and in many situations its authenticity must be guaranteed, especially when video is presented as evidence in court; both its originality and whether it has been tampered with need to be verified. However, with the rapid development of image and video processing technology and the help of professional video editing software (such as Adobe Premiere, Adobe After Effects and Corel Video Studio), ordinary users can tamper with digital video without leaving visible traces. This calls the authenticity of video into question and subverts the traditional notion that "seeing is believing". In recent years, video tampering has affected politics, law, the media and other fields, and the resulting ethical and legal problems are becoming increasingly serious. For example, the face-swapping technique Deepfake can replace the face in a video with someone else's face, and has been used to fabricate videos of speeches never given by politicians, indecent videos of celebrities, and so on. Instances of malicious tampering are countless, and video authentication technology has therefore become important.
Typical active video authentication techniques include digital watermarking and digital signatures. Both methods need to embed additional information into the video in advance, extract the corresponding information from the video during authentication, and then match it against the pre-embedded information. In practical applications, effective additional information is often unavailable, so active authentication methods have significant limitations. Passive video authentication technology requires no additional information; instead, it analyzes intrinsic characteristics of the video itself to achieve authentication. At present, there is no method for video forensics from the perspective of video editing software. Given that tampering with video editing software is increasingly common, performing passive video forensics from the perspective of video editing software is both targeted and effective.
Accordingly, the prior art is yet to be improved and developed.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a video data editing and identifying method, device, intelligent terminal and storage medium that perform video forensics from the perspective of video editing software, so that forensics is more targeted and effective. The technical scheme adopted by the invention to solve this technical problem is as follows:
in a first aspect, an embodiment of the present invention provides a video data editing and identifying method, where the method includes:
acquiring video data to be identified;
inputting the video data to be recognized into the trained feature model to obtain a classification result; identifying whether the video data is edited or not according to the classification result; the feature model is constructed based on a gray level co-occurrence matrix.
The video data editing and identifying method is characterized in that the feature model training process comprises the following steps:
acquiring video data, and preprocessing the video data to obtain a training data set and a detection data set;
extracting features from the training data in the training data set by using the feature model, and constructing the features extracted from training data processed by the same type of video editing software into a same-type video feature set;
training binary classifiers pairwise from the constructed video feature sets of different types, and training the binary classifiers with the training data set to obtain a trained feature model.
In the video data editing and identifying method, preprocessing the video data to obtain a training data set and a detection data set comprises the following steps:
importing the video data into video editing software to obtain edited video data; the edited video data comprises first edited video data and second edited video data;
forming the first edited video data and part of the video data into a training data set;
and forming the second edited video data and another part of the video data into a detection data set.
In the video data editing and identifying method, extracting features from the training data comprises the following steps:
expanding each video in the training data set into video frames, where each video frame is stored as a portable graymap (PGM) image;
and extracting features from each video frame with the feature model to form multi-dimensional features, and combining the multi-dimensional features extracted from training data processed by the same type of video editing software into a same-type video feature set.
In the video data editing and identifying method, inputting the video data to be identified into the trained feature model to obtain a classification result comprises the following steps:
converting the video data to be identified into video frames in portable graymap (PGM) format;
and extracting the features of each video frame with the feature model, and classifying each video frame with the binary classifiers to obtain a classification result.
In the video data editing and identifying method, the binary classifiers are Ensemble classifiers.
According to the video data editing and identifying method, the feature model is a 603-dimensional feature model obtained by reducing the dimension of a spatial rich model (SRM) based on gray level co-occurrence matrices.
In a second aspect, an apparatus for video data editing and identification, the apparatus comprising:
the acquisition unit is used for acquiring video data to be identified;
the classification recognition unit is used for inputting the video data to be recognized into the trained feature model to obtain a classification result; identifying whether the video data is edited or not according to the classification result; the feature model is constructed based on a gray level co-occurrence matrix.
In a third aspect, an intelligent terminal includes a memory, one or more processors, and one or more programs, where the one or more programs are stored in the memory, configured to be executed by the one or more processors, and include instructions for performing the above video data editing and identifying method.
In a fourth aspect, a non-transitory computer readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the above video data edit identification method.
The invention has the following beneficial effects: a feature model is constructed based on a gray level co-occurrence matrix and trained, and the trained feature model is used to classify the video to be identified, thereby recognizing whether the video data has been processed by editing software. Compared with the existing SRM (Spatial Rich Model), the dimension-reduced SRM model has fewer dimensions and higher identification efficiency. In addition, a majority vote over the classification results of all video frames in a video yields a classification result in units of videos, which acts as an error-correction step over the per-frame decisions, so the identification accuracy is high.
Drawings
Fig. 1 is a flowchart illustrating a video data editing and identifying method according to a preferred embodiment of the present invention.
Fig. 2 is a flowchart of a feature model training process in the video data editing and identifying method provided by the invention.
Fig. 3 is a schematic view of a video processing process based on video software in the video data editing and identifying method provided by the present invention.
Fig. 4 is a framework diagram of the classifiers for detecting software-processed videos in the video data editing and identifying method provided by the invention.
Fig. 5 shows the video-level classification model in the video data editing and identifying method provided by the present invention.
Fig. 6 is a functional schematic diagram of a video data editing and recognizing apparatus according to the present invention.
Fig. 7 is a functional schematic diagram of the intelligent terminal provided by the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer and clearer, the present invention is further described in detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, the present invention provides a video data editing and identifying method, which includes the following steps:
and S10, acquiring the video data to be identified.
Specifically, in conjunction with fig. 3, the video data to be identified comes from an MP4-format surveillance video segment captured by a camera. Of course, video segments in non-MP4 formats captured by other video capture devices are also possible.
S20, inputting the video data to be recognized into the trained feature model to obtain a classification result; identifying whether the video data is edited or not according to the classification result; the feature model is constructed based on a gray level co-occurrence matrix.
Specifically, a feature model is constructed first, and the feature model is constructed based on a gray level co-occurrence matrix. And then training the feature model by using the video data to obtain a trained feature model, classifying the video data to be recognized by using the trained feature model, and recognizing whether the video data is edited or not.
The feature model can be constructed by reducing the dimension of the spatial rich model (SRM), a model based on gray level co-occurrence matrices that is popular in the field of image steganalysis. For example, the image residual quantization threshold T in the SRM model is set to 2, the order of the co-occurrence matrix O is set to 2, and only one quantization step q is selected for quantizing the residual video frame obtained by each filter. The quantization step q of each filter does not affect the dimension of the feature extraction model; it only affects the quantization of the video frame residual, and the value of q has only a slight influence on the extracted video frame features. In the original SRM model, the residual filtered by each filter is quantized with several different quantization steps q, yielding different sub-models, 106 in total. The SRM model has 39 filters in total, so when only one quantization step q is selected for the residual image obtained by each filter, 39 SRM sub-models are obtained. The resulting feature extractor has 603 dimensions and is referred to below as the 603-dimensional feature extractor. All 39 high-pass filters are retained; these filters suppress the content of the video frame in different ways, and features are finally extracted from the residual of the video frame. Table 1 lists the quantization step q set for each filter.
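To make this concrete, the following is a minimal sketch of one SRM-style sub-model: a single high-pass residual, quantization and truncation with T = 2 and one step q, and a second-order co-occurrence histogram of the truncated residual. The filter choice, scan direction and normalization are illustrative assumptions; this is not the patented 603-dimensional extractor itself.

# Sketch of one SRM-style sub-model (assumed details, not the full 603-dim extractor):
# high-pass residual -> quantize/truncate with T = 2 -> 2nd-order co-occurrence histogram.
import numpy as np

T, Q = 2, 1.0  # truncation threshold and a single quantization step q

def first_order_residual(frame: np.ndarray) -> np.ndarray:
    """Horizontal first-order residual r[i, j] = x[i, j+1] - x[i, j]."""
    frame = frame.astype(np.float64)
    return frame[:, 1:] - frame[:, :-1]

def quantize_truncate(residual: np.ndarray, q: float = Q, t: int = T) -> np.ndarray:
    """Quantize by q, round, and clip to the range [-T, T]."""
    return np.clip(np.round(residual / q), -t, t).astype(np.int64)

def cooccurrence_features(r: np.ndarray, t: int = T) -> np.ndarray:
    """Second-order horizontal co-occurrence histogram of the truncated residual."""
    bins = 2 * t + 1                                   # residual values -T..T -> 5 bins
    pairs = (r[:, :-1] + t) * bins + (r[:, 1:] + t)    # encode each adjacent pair of values
    hist = np.bincount(pairs.ravel(), minlength=bins * bins).astype(np.float64)
    return hist / hist.sum()                           # 25-dim feature for this one sub-model

def frame_features(frame: np.ndarray) -> np.ndarray:
    return cooccurrence_features(quantize_truncate(first_order_residual(frame)))

# Example with a stand-in grayscale frame (in practice, a PGM frame of the video):
frame = np.random.randint(0, 256, size=(720, 1280), dtype=np.uint8)
print(frame_features(frame).shape)                     # (25,)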
In this embodiment, the training process of the feature model includes:
s201, acquiring video data, and preprocessing the video data to obtain a training data set and a detection data set;
specifically, the preprocessing of the video data includes, but is not limited to, importing each original monitoring video segment into video editing software, and then setting parameters (such as code rate, video format, key frame number, and frame rate) of the video editing software to be the same as the original monitoring video parameters; and finally, rendering and outputting the video processed by the video editing software.
The edited video data is divided into two parts, one part (first edited video data) is used for subsequent training, and the other part (second edited video data) is used for subsequent detection. Similarly, the original video data is also divided into two parts, one part is used for subsequent training, and the other part is used for subsequent detection.
Further, the video editing software may be Adobe Premiere, Adobe After Effects or Corel Video Studio.
S202, extracting features from the training data in the training data set by using the feature model, and constructing the features extracted from training data processed by the same type of video editing software into a same-type video feature set;
specifically, with reference to fig. 4, feature extraction is performed on each original surveillance Video and videos (of 3 types in total) processed by 3 types of Video editing software (Adobe Premiere, Adobe After Effects and core Video Studio), where the feature extraction process is as follows:
PGM format, PGM is the picture of the portable gray scale format (portable gray map file format), the picture of the form has not been compressed by the picture, therefore can avoid the compression of the picture to the influence of the compression trace left by the video software. For each video frameMentioned with step S20
Figure BDA0002328226810000081
And extracting features to form 603-dimensional features. The extracted features of the videos processed by the same type of video software can form a feature set of the videos. One video frame through
Figure BDA0002328226810000082
603 dimensional features can be extracted, and when a video contains N number of video frames, N × 603 dimensional features can be obtained. If the number of the video samples processed by a certain type of video software is M, the feature set of the merged type of video samples is M × N × 603 dimensional features.
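A minimal sketch of this frame-expansion and feature-stacking step follows, assuming OpenCV is available; extract_features stands in for the 603-dimensional extractor, and the output-directory naming is hypothetical.

# Expand a video into grayscale frames, save them as PGM images (uncompressed, so no
# extra lossy step is introduced), and stack per-frame feature vectors into an N x D matrix.
# extract_features is a stand-in for the 603-dimensional extractor described above.
import os
import cv2
import numpy as np

def video_to_pgm_frames(video_path: str, out_dir: str) -> list[str]:
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    paths, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        path = os.path.join(out_dir, f"frame_{idx:05d}.pgm")
        cv2.imwrite(path, gray)            # the .pgm extension selects the PGM writer
        paths.append(path)
        idx += 1
    cap.release()
    return paths

def video_feature_matrix(video_path: str, extract_features) -> np.ndarray:
    """Return an N x D matrix of per-frame features (D = 603 for the extractor above)."""
    frames = video_to_pgm_frames(video_path, out_dir=video_path + "_frames")
    feats = [extract_features(cv2.imread(p, cv2.IMREAD_GRAYSCALE)) for p in frames]
    return np.vstack(feats)

# Stacking the matrices of the M videos processed by one editing software gives the
# (M*N) x 603 feature set of that class described in the text.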
S203, training binary classifiers pairwise from the constructed video feature sets of different classes, and training the binary classifiers with the training data set to obtain the trained feature model.
Specifically, using the per-software video feature sets obtained in step S202, the features of videos edited by the same software are merged into one class, and each video-editing-software class is trained pairwise against the other classes and against the original surveillance videos, so the classifier model contains 6 classifiers. The binary classifier chosen here is the Ensemble classifier. Every pair of classes is combined to train one binary classifier: if there are n classes, n(n-1)/2 binary classifiers are obtained by pairwise training, and in the testing stage each binary classifier votes for one of the two classes it was trained on, so every class has the same number of voting opportunities. In the present application there are 4 video classes, i.e. n = 4, so 6 binary classifiers are obtained through training, and majority voting over the 6 binary classifiers yields a four-class classifier model.
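A structural sketch of this pairwise training follows, under stated assumptions: the class names and feature-matrix shapes are illustrative, and a plain linear discriminant analysis classifier stands in for the steganalysis Ensemble classifier (a random-subspace and bagging sketch of that ensemble itself is given further below).

# One-vs-one training structure: with n = 4 classes (original, Premiere, After Effects,
# Corel), n(n-1)/2 = 6 binary classifiers are trained. LDA is used here only as a
# stand-in for the Ensemble classifier; this is an assumption, not the patent's classifier.
from itertools import combinations
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

def train_pairwise(feature_sets: dict[str, np.ndarray]) -> dict[tuple[str, str], object]:
    """feature_sets maps a class name to its (num_frames x 603) feature matrix."""
    classifiers = {}
    for a, b in combinations(sorted(feature_sets), 2):     # 6 pairs for 4 classes
        X = np.vstack([feature_sets[a], feature_sets[b]])
        y = np.array([a] * len(feature_sets[a]) + [b] * len(feature_sets[b]))
        classifiers[(a, b)] = LinearDiscriminantAnalysis().fit(X, y)
    return classifiers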
The classifiers are trained with the training data in the training data set to obtain the trained feature model.
For detection, all videos in the detection data set are converted into video frames in portable graymap (PGM) format; the feature model extracts the features of each video frame, and the binary classifiers classify each video frame to obtain a classification result.
In this embodiment, classification with respect to video editing software should take the whole video as the classification unit, but the classifiers trained above all take a single video frame image as their classification unit, so a conversion from frame-level to video-level decisions is needed. As shown in the classification framework of fig. 5, each video frame is first passed through all the binary classifiers, and the class receiving the most votes in this first majority vote is the classification result of that frame. A second majority vote is then taken over the video frames: the video is assigned the class to which most of its frames belong. The video-level multi-class decision therefore involves two majority-voting steps.
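The two voting stages can be sketched as follows, reusing the hypothetical train_pairwise output from the previous sketch; classify_frame performs the per-frame vote over the 6 pairwise classifiers, and classify_video performs the per-video vote. The function names are illustrative, not from the patent.

# Two-level majority voting: frames are voted into one of the 4 classes by the 6
# pairwise classifiers, then the video takes the class that wins among its frames.
from collections import Counter
import numpy as np

def classify_frame(feat: np.ndarray, classifiers: dict) -> str:
    votes = Counter()
    for (a, b), clf in classifiers.items():
        votes[clf.predict(feat.reshape(1, -1))[0]] += 1    # first majority vote
    return votes.most_common(1)[0][0]

def classify_video(frame_feats: np.ndarray, classifiers: dict) -> str:
    frame_labels = [classify_frame(f, classifiers) for f in frame_feats]
    return Counter(frame_labels).most_common(1)[0][0]      # second majority vote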
TABLE 1 Quantization step q selected for each SRM sub-model filter
The principle of the invention is as follows:
all software processing videos finally need to render and output the videos through software compression coding. The video frame is a component of the video, the video compression of the software can cause the modification of the pixel points of the video frame, and the correlation between the pixel points and the adjacent pixels cannot be avoided from being distorted. Different video editing software has different influences on the video frame pixel points after compression. Modern common steganalysis methods (such as rich model SRM and local binary model LBP) can derive statistical information of pixel points from a large number of image residuals. Therefore, the difference between the video frames can be described by constructing the characteristics of the video frames through the steganalysis method. The features are trained by a machine learning method, so that a classification model for distinguishing compressed videos of different video editing software can be obtained.
The invention greatly reduces the dimension of the steganalysis model (SRM), extracts video frame features with the dimension-reduced SRM model, and trains on these features. The Ensemble classifier is mainly used, and the overall classification framework is realized through random subspaces and bagging. In this framework, the ensemble consists of several linear discriminant analysis classifiers, each trained on a randomly selected subspace of the feature space using a randomly selected subset of all the training samples, so the computational complexity of training each linear discriminant classifier is particularly low, while combining enough of them gives good classification performance. The Ensemble classifier therefore achieves comparable performance at low training complexity and is well suited to high-dimensional features. Majority voting then identifies which video editing software has processed the video.
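The random-subspace-plus-bagging idea described here can be sketched from scratch as follows; the number of base learners and the subspace dimension are illustrative assumptions rather than values taken from the patent.

# Ensemble of Fisher/linear discriminant base learners: each is trained on a random
# subspace of the 603-dim feature space and a bootstrap sample of the training frames,
# and the ensemble predicts by majority vote. Hyperparameters below are assumptions.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

class FLDEnsemble:
    def __init__(self, n_learners: int = 51, subspace_dim: int = 60, seed: int = 0):
        self.n_learners, self.subspace_dim = n_learners, subspace_dim
        self.rng = np.random.default_rng(seed)
        self.members = []                                  # (subspace dims, fitted LDA)

    def fit(self, X: np.ndarray, y: np.ndarray) -> "FLDEnsemble":
        n, d = X.shape
        for _ in range(self.n_learners):
            dims = self.rng.choice(d, self.subspace_dim, replace=False)  # random subspace
            rows = self.rng.choice(n, n, replace=True)                   # bootstrap sample
            clf = LinearDiscriminantAnalysis().fit(X[np.ix_(rows, dims)], y[rows])
            self.members.append((dims, clf))
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        all_votes = np.stack([clf.predict(X[:, dims]) for dims, clf in self.members])
        labels = []
        for col in all_votes.T:                            # majority vote per sample
            values, counts = np.unique(col, return_counts=True)
            labels.append(values[np.argmax(counts)])
        return np.array(labels)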
The above process provided by the present invention is further illustrated by the following specific examples.
This experiment tests whether a video can be identified as having been processed by particular video editing software. For data diversity, 5 surveillance cameras of the same model are used. In each run, all videos of the i-th camera are selected for testing, while part of the videos from the remaining 4 cameras are used to train the model and part to test its performance. The experiment is set to i ∈ {3, 4, 5}, so it can be repeated three times and the classification performance averaged.
Since i ∈ {3, 4, 5}, i = 5 is chosen here to describe the procedure: 65 videos from each of cameras 1, 2, 3 and 4 are used to train the model, the remaining 35 videos from each of those cameras are used to test it, and the surveillance videos of camera 5 do not participate in training, so all 100 of them are used to test the model. The specific process is as follows:
from 5 Haokwev monitoring cameras of the same type, 100 segments of videos are respectively adopted, each segment of video is 5 seconds, the original monitoring video totally has 500 segments, the frame rate of each segment of video is 25 frames/second, the code rate is 7000kbps-9000kbps, the resolution of the video is 1280 x 720, and the video is in an MP4 format.
The original videos are processed with Adobe Premiere Pro to obtain 500 corresponding Pr videos, with Adobe After Effects to obtain 500 corresponding AE videos, and with Corel Video Studio Pro to obtain 500 corresponding Corel videos.
Thus, the experimental sample contains four types of video, 2000 video segments in total; at 5 seconds × 25 frames/second = 125 frames per segment, this amounts to 250000 video frames. Each video type contains 500 video segments (62500 video frames), covering the 5 cameras with 100 video segments per camera.
For ease of description, it is agreed that Ori_1_65 denotes the 65 training videos of camera 1 among the original videos, AE_3_35 denotes the 35 videos of camera 3 among the videos processed by the AE video editing software, and so on.
The training samples include: original videos: Ori_1_65, Ori_2_65, Ori_3_65, Ori_4_65; Pr software videos: Pr_1_65, Pr_2_65, Pr_3_65, Pr_4_65; AE software videos: AE_1_65, AE_2_65, AE_3_65, AE_4_65; Corel software videos: Corel_1_65, Corel_2_65, Corel_3_65, Corel_4_65.
The Pr software videos, AE software videos and Corel software videos in the training sample correspond to the first edited video data described above. For each video type, 65 surveillance video segments from each of cameras 1, 2, 3 and 4 are used for training, so each software type has 260 training videos, i.e. 32500 video frames participate in the training process. The 603-dimensional feature extractor is applied to each video frame, so the videos of each software type yield a 32500 × 603-dimensional feature set. In addition, the 100 videos of camera 5 for each video type do not participate in training.
Test sample 1 (the cameras of the test sample also participate in the training process) includes: original videos: Ori_1_35, Ori_2_35, Ori_3_35, Ori_4_35; Pr software videos: Pr_1_35, Pr_2_35, Pr_3_35, Pr_4_35; AE software videos: AE_1_35, AE_2_35, AE_3_35, AE_4_35; Corel software videos: Corel_1_35, Corel_2_35, Corel_3_35, Corel_4_35.
The Pr software videos, AE software videos and Corel software videos in test sample 1 correspond to the second edited video data described above. Test sample 1 consists of 35 videos from each of cameras 1, 2, 3 and 4, while the other 65 videos of each of these cameras participate in training. Thus, for each video type, 140 videos, i.e. 17500 frames, are tested, and the test accuracy is reported in units of videos.
Test sample 2 (the camera of the test sample does not participate in the training process) includes: original videos: Ori_5_100; Pr software videos: Pr_5_100; AE software videos: AE_5_100; Corel software videos: Corel_5_100.
Test sample 2 serves as the video data to be identified and is used to assess the generalization ability of the trained model. It consists of the 100 videos of camera 5, none of which participate in training. Thus, for each video type, 100 videos, i.e. 12500 frames, are tested, and the test accuracy is reported in units of videos.
All videos in the training sample are first converted into video frames in portable graymap (PGM) format. The 603-dimensional feature extractor is used to extract the features of each video frame, and the features of each video class are combined. Each class is trained against each of the remaining classes with one Ensemble binary classifier, so 6 Ensemble classifiers are trained in the experiment, and together these 6 classifiers form the four-class video classifier of the experiment.
All videos in the test samples are first converted into video frames in portable graymap (PGM) format. The 603-dimensional feature extractor extracts the features of each video frame, each video frame passes through the 6 binary classifiers, and the majority vote of the binary classifiers assigns the frame to one of the 4 classes. Finally, the classification results of all the video frames in a video segment are combined, and a second majority vote gives the four-class result for that video.
Results of the experiment
In the experiment, i ∈ {3, 4, 5}; the above process is repeated for i = 3, i = 4 and i = 5 and the classification performance is averaged. The experimental results are shown below: for tables 2 and 4, some videos from the test-sample cameras participate in the training process, while for tables 3 and 5 the test-sample camera does not participate in training at all. Tables 2 and 3 give the classification accuracy of test sample 1 and test sample 2 over all the binary classifiers, and tables 4 and 5 give the classification accuracy of test sample 1 and test sample 2 under the four-class model. Tables 2 and 4 show that when videos from a camera participate in the training process, the classification accuracy is 100% for both the binary and the multi-class models. Tables 3 and 5 show that even when no video from the camera participates in training, the classification accuracy remains very high for both the binary and the multi-class models, indicating that the trained classification model generalizes well to cameras of the same model.
TABLE 2 Average classification accuracy of test sample 1 over the binary classifiers
TABLE 3 Average classification accuracy of test sample 2 over the binary classifiers
TABLE 4 Average classification accuracy of test sample 1 under the four-class model
TABLE 5 Average classification accuracy of test sample 2 under the four-class model
From the experimental results and analysis, the method provided by the invention can reliably identify whether a video has been processed by particular video editing software, and the identification performance is good. The method can therefore be used as an auxiliary means of video originality (authenticity) identification.
Exemplary device
Referring to fig. 6, an embodiment of the present invention provides an apparatus for identifying video data editing, where the apparatus includes: an acquisition unit 610 and a classification identification unit 620.
Specifically, the obtaining unit 610 is configured to obtain the video data to be identified; the classification recognition unit 620 is configured to input the video data to be identified into the trained feature model to obtain a classification result and to identify, according to the classification result, whether the video data has been edited, where the feature model is constructed based on a gray level co-occurrence matrix.
Based on the above embodiment, the present invention further provides an intelligent terminal, and a schematic block diagram thereof may be as shown in fig. 7. The intelligent terminal comprises a processor, a memory, a network interface, a display screen and a temperature sensor which are connected through a system bus. Wherein, the processor of the intelligent terminal is used for providing calculation and control capability. The memory of the intelligent terminal comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the intelligent terminal is used for being connected and communicated with an external terminal through a network. The computer program is executed by a processor to implement a video data edit recognition method. The display screen of the intelligent terminal can be a liquid crystal display screen or an electronic ink display screen, and the temperature sensor of the intelligent terminal is arranged inside the intelligent terminal in advance and used for detecting the current operating temperature of internal equipment.
It will be understood by those skilled in the art that the block diagram of fig. 7 is only a block diagram of a part of the structure related to the solution of the present invention, and does not constitute a limitation to the intelligent terminal to which the solution of the present invention is applied, and a specific intelligent terminal may include more or less components than those shown in the figure, or combine some components, or have different arrangements of components.
In one embodiment, an intelligent terminal is provided that includes a memory, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, the one or more programs including instructions for:
acquiring video data to be identified;
inputting the video data to be recognized into the trained feature model to obtain a classification result; identifying whether the video data is edited or not according to the classification result; the feature model is constructed based on a gray level co-occurrence matrix.
In this embodiment, the intelligent terminal serves as the deployment carrier for the whole video data editing and identifying process. A typical workflow of the intelligent terminal is as follows: a user submits data; the system automatically selects an algorithm and triggers model training; after the model converges, its accuracy is tested; once the accuracy reaches the required standard, a new model version is formed, submitted to a model repository for management, and packaged as a new microservice; the logic code calls the new microservice through an interface, and the continuous-integration and continuous-deployment module is started to deploy the new application onto the intelligent terminal.
It will be understood by those skilled in the art that all or part of the processes of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the above method embodiments. Any reference to memory, storage, databases or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
In summary, the invention discloses a video data editing and identifying method and device, an intelligent terminal and a storage medium, wherein the method comprises: acquiring video data to be identified; inputting the video data to be identified into a trained feature model to obtain a classification result; and identifying, according to the classification result, whether the video data has been edited, where the feature model is constructed based on a gray level co-occurrence matrix. The optimized SRM model captures the different traces left by different video processing software during compression, so it can accurately identify that a video has been processed by particular video editing software. Compared with the existing SRM model, the dimension-reduced SRM model has fewer dimensions, and the accuracy in distinguishing original videos from videos processed by different software is extremely high. By detecting whether a video has been compressed by editing software, the method judges whether the video has been processed, providing an auxiliary means for video forensics and an effective method for verifying the originality of surveillance video.
It is to be understood that the invention is not limited to the examples described above, but that modifications and variations may be effected thereto by those of ordinary skill in the art in light of the foregoing description, and that all such modifications and variations are intended to be within the scope of the invention as defined by the appended claims.

Claims (10)

1. A method for identifying video data editing, the method comprising:
acquiring video data to be identified;
inputting the video data to be recognized into the trained feature model to obtain a classification result; identifying whether the video data is edited or not according to the classification result; the feature model is constructed based on a gray level co-occurrence matrix.
2. The method for video data editing and recognition according to claim 1, wherein the feature model training process comprises:
acquiring video data, and preprocessing the video data to obtain a training data set and a detection data set;
extracting features from the training data in the training data set by using the feature model; constructing the features extracted from the training data processed by the same type of video editing software into a same-type video feature set;
training binary classifiers pairwise from the constructed video feature sets of different types; and training the binary classifiers with the training data set to obtain a trained feature model.
3. The method for editing and identifying video data according to claim 2, wherein the preprocessing the video data to obtain a training data set and a detection data set comprises:
importing the video data into video editing software to obtain edited video data; the edited video data comprises first edited video data and second edited video data;
forming the first edited video data and part of the video data into a training data set;
and forming the second edited video data and another part of the video data into a detection data set.
4. The method for editing and identifying video data according to claim 2, wherein extracting features from the training data comprises the following steps:
expanding each video in the training data set into video frames, where each video frame is stored as a portable graymap (PGM) image;
and extracting features from each video frame with the feature model to form multi-dimensional features, and combining the multi-dimensional features extracted from the training data processed by the same type of video editing software into a same-type video feature set.
5. The method for editing and recognizing video data according to claim 2, wherein the step of inputting the video data to be recognized into the trained feature model to obtain a classification result comprises:
converting the video data to be identified into video frames in portable graymap (PGM) format;
and extracting the features of each video frame with the feature model, and classifying each video frame with the binary classifiers to obtain a classification result.
6. The method of claim 2, wherein the binary classifiers are Ensemble classifiers.
7. The video data editing and identifying method according to claim 1, wherein the feature model is a 603-dimensional feature model obtained by performing dimension reduction on a spatial rich model (SRM) based on gray level co-occurrence matrices.
8. An apparatus for video data editing and recognition, the apparatus comprising:
the acquisition unit is used for acquiring video data to be identified;
the classification recognition unit is used for inputting the video data to be recognized into the trained feature model to obtain a classification result; identifying whether the video data is edited or not according to the classification result; the feature model is constructed based on a gray level co-occurrence matrix.
9. An intelligent terminal comprising a memory, one or more processors, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and comprise instructions for performing the method of any of claims 1-7.
10. A non-transitory computer-readable storage medium, wherein instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the method of any of claims 1-7.
CN201911325258.8A 2019-12-20 2019-12-20 Video data editing and identifying method and device, intelligent terminal and storage medium Active CN111178204B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911325258.8A CN111178204B (en) 2019-12-20 2019-12-20 Video data editing and identifying method and device, intelligent terminal and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911325258.8A CN111178204B (en) 2019-12-20 2019-12-20 Video data editing and identifying method and device, intelligent terminal and storage medium

Publications (2)

Publication Number Publication Date
CN111178204A true CN111178204A (en) 2020-05-19
CN111178204B CN111178204B (en) 2023-05-09

Family

ID=70657425

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911325258.8A Active CN111178204B (en) 2019-12-20 2019-12-20 Video data editing and identifying method and device, intelligent terminal and storage medium

Country Status (1)

Country Link
CN (1) CN111178204B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112634942A (en) * 2020-12-28 2021-04-09 深圳大学 Method for identifying originality of mobile phone recording, storage medium and equipment
CN112788331A (en) * 2020-12-30 2021-05-11 深圳大学 Video recompression detection method, terminal equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104244016A (en) * 2014-08-12 2014-12-24 中山大学 H264 video content tampering detection method
WO2018133791A1 (en) * 2017-01-19 2018-07-26 腾讯科技(深圳)有限公司 Living body discrimination method and system based on video analysis, and storage medium
CN110121109A (en) * 2019-03-22 2019-08-13 西安电子科技大学 Towards the real-time source tracing method of monitoring system digital video, city video monitoring system

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104244016A (en) * 2014-08-12 2014-12-24 中山大学 H264 video content tampering detection method
WO2018133791A1 (en) * 2017-01-19 2018-07-26 腾讯科技(深圳)有限公司 Living body discrimination method and system based on video analysis, and storage medium
CN110121109A (en) * 2019-03-22 2019-08-13 西安电子科技大学 Towards the real-time source tracing method of monitoring system digital video, city video monitoring system

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112634942A (en) * 2020-12-28 2021-04-09 深圳大学 Method for identifying originality of mobile phone recording, storage medium and equipment
CN112634942B (en) * 2020-12-28 2022-05-17 深圳大学 Method for identifying originality of mobile phone recording, storage medium and equipment
CN112788331A (en) * 2020-12-30 2021-05-11 深圳大学 Video recompression detection method, terminal equipment and storage medium
CN112788331B (en) * 2020-12-30 2022-07-12 深圳大学 Video recompression detection method, terminal equipment and storage medium

Also Published As

Publication number Publication date
CN111178204B (en) 2023-05-09

Similar Documents

Publication Publication Date Title
Zhuang et al. Image tampering localization using a dense fully convolutional network
Korshunov et al. Vulnerability assessment and detection of deepfake videos
CN109858371B (en) Face recognition method and device
US10304458B1 (en) Systems and methods for transcribing videos using speaker identification
Anand et al. An improved local binary patterns histograms techniques for face recognition for real time application
CN111079816A (en) Image auditing method and device and server
CN111178204B (en) Video data editing and identifying method and device, intelligent terminal and storage medium
Korshunov et al. Vulnerability of face recognition to deep morphing
De Las Heras et al. Use case visual bag-of-words techniques for camera based identity document classification
Tembe et al. Survey of copy-paste forgery detection in digital image forensic
CN116543334A (en) Key frame extraction method, device, electronic equipment and storage medium
Lu et al. Channel-wise spatiotemporal aggregation technology for face video forensics
CN114067381A (en) Deep forgery identification method and device based on multi-feature fusion
CN116189063B (en) Key frame optimization method and device for intelligent video monitoring
Kauba et al. Identifying the origin of iris images based on fusion of local image descriptors and PRNU based techniques
Yan et al. TransU 2-Net: A Hybrid Transformer Architecture for Image Splicing Forgery Detection
Hutagalung et al. The Effectiveness Of OpenCV Based Face Detection In Low-Light Environments
Wang et al. An audio-visual attention based multimodal network for fake talking face videos detection
Agrawal et al. Error Level Analysis and Deep Learning For Detecting Image Forgeries
Borghi et al. Revelio: A Modular and Effective Framework for Reproducible Training and Evaluation of Morphing Attack Detectors
Veksler et al. Video origin camera identification using ensemble CNNs of positional patches
CN111209863A (en) Living body model training and human face living body detection method, device and electronic equipment
CN114760484B (en) Live video identification method, live video identification device, computer equipment and storage medium
Shao-Jie et al. Detection of image compositing based on a statistical model for natural images
Javed et al. Faceswap Deepfakes Detection using Novel Multi-directional Hexadecimal Feature Descriptor

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant