CN111444878A - Video classification method and device and computer readable storage medium - Google Patents

Video classification method and device and computer readable storage medium Download PDF

Info

Publication number
CN111444878A
Authority
CN
China
Prior art keywords
video
classification
sample set
training sample
augmented
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010272792.3A
Other languages
Chinese (zh)
Other versions
CN111444878B (en)
Inventor
尹康
吴宇斌
郭烽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202010272792.3A priority Critical patent/CN111444878B/en
Publication of CN111444878A publication Critical patent/CN111444878A/en
Application granted granted Critical
Publication of CN111444878B publication Critical patent/CN111444878B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a video classification method, a video classification device and a computer-readable storage medium, wherein the video classification method comprises the following steps: acquiring an original training sample set comprising a plurality of video samples marked with classification labels; selecting video sample combinations and the corresponding classification labels from the original training sample set for weighted fusion to obtain an augmented training sample set; inputting the video samples in the augmented training sample set into a neural network for training to obtain a video classification model; and classifying videos to be classified based on the video classification model. Through implementation of this scheme, the original video samples and their classification labels are fused by weighted fusion in the model training stage to obtain the augmented training sample set, which guarantees the scale and diversity of the training sample set while effectively reducing the operational complexity of constructing it and improving the realizability of its construction.

Description

Video classification method and device and computer readable storage medium
Technical Field
The present application relates to the field of electronic technologies, and in particular, to a video classification method and apparatus, and a computer-readable storage medium.
Background
As a fundamental task in the field of computer vision, video classification has long been a research focus in the industry. With the continuous development of hardware such as high-definition video equipment, artificial intelligence solutions based on video classification technology are widely applied to video interest recommendation, video security, smart home, and other areas, and the range of application scenarios is extremely broad.
In practical applications, compared with an image classification model that classifies single-frame images, a video classification model must capture the correlation among multiple input image frames and therefore requires a larger model structure, which in turn demands a larger amount of training data during model training. At present, however, the training data set is usually constructed by manual labeling, so constructing it is operationally complex and its realizability is poor.
Disclosure of Invention
The embodiment of the application provides a video classification method, a video classification device and a computer-readable storage medium, which can at least solve the problems of high operation complexity and poor realizability caused by performing class marking on training data required by a video classification model in a manual marking mode in the related art.
A first aspect of an embodiment of the present application provides a video classification method, including:
acquiring an original training sample set comprising a plurality of video samples marked with classification labels;
selecting a video sample combination and the corresponding classification label from the original training sample set for weighted fusion to obtain an augmented training sample set; wherein the sample size of the augmented training sample set is larger than that of the original training sample set;
inputting the video samples in the augmented training sample set into a neural network for training to obtain a video classification model;
and classifying the video to be classified based on the video classification model.
A second aspect of the embodiments of the present application provides a video classification apparatus, including:
the system comprises an acquisition module, a classification module and a classification module, wherein the acquisition module is used for acquiring an original training sample set comprising a plurality of video samples marked with classification labels;
the augmentation module is used for selecting a video sample combination and the corresponding classification label from the original training sample set to carry out weighted fusion to obtain an augmented training sample set; wherein the sample size of the augmented training sample set is larger than that of the original training sample set;
the training module is used for inputting the video samples in the augmented training sample set into a neural network for training to obtain a video classification model;
and the classification module is used for classifying the video to be classified based on the video classification model.
A third aspect of embodiments of the present application provides an electronic apparatus, including: a memory, a processor, and a bus; the bus is used for realizing the connection communication between the memory and the processor; a processor for executing a computer program stored on the memory; when the processor executes the computer program, the steps in the video classification method provided by the first aspect of the embodiment of the present application are implemented.
A fourth aspect of the embodiments of the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps in the video classification method provided in the first aspect of the embodiments of the present application.
As can be seen from the above, according to the video classification method, apparatus, and computer-readable storage medium provided in the present application, an original training sample set comprising a plurality of video samples marked with classification labels is obtained; video sample combinations and the corresponding classification labels are selected from the original training sample set for weighted fusion to obtain an augmented training sample set; the video samples in the augmented training sample set are input into a neural network for training to obtain a video classification model; and videos to be classified are classified based on the video classification model. Through implementation of this scheme, the original video samples and their classification labels are fused by weighted fusion in the model training stage to obtain the augmented training sample set, which guarantees the scale and diversity of the training sample set while effectively reducing the operational complexity of constructing it and improving the realizability of its construction.
Drawings
Fig. 1 is a schematic basic flowchart of a video classification method according to a first embodiment of the present application;
fig. 2 is a flowchart illustrating a specific video classification method according to a first embodiment of the present application;
fig. 3 is a schematic flowchart of a training sample augmentation method according to a first embodiment of the present disclosure;
fig. 4 is a schematic diagram of sample weighted fusion according to a first embodiment of the present application;
FIG. 5 is a schematic flow chart illustrating a model training method according to a first embodiment of the present disclosure;
fig. 6 is a schematic flowchart of a model testing method according to a first embodiment of the present application;
fig. 7 is a schematic flowchart of a refinement method of a video classification method according to a second embodiment of the present application;
fig. 8 is a schematic diagram illustrating program modules of a video classification apparatus according to a third embodiment of the present application;
fig. 9 is a schematic diagram illustrating program modules of another video classification apparatus according to a third embodiment of the present application;
fig. 10 is a schematic structural diagram of an electronic device according to a fourth embodiment of the present application.
Detailed Description
In order to make the objects, features and advantages of the present invention more apparent and understandable, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In order to overcome the defects of high operation complexity and poor realizability caused by performing class marking on training data required by a video classification model in a manual marking mode in the related art, a first embodiment of the application provides a video classification method. As shown in fig. 1, which is a basic flowchart of a video classification method provided in this embodiment, the video classification method includes the following steps:
step 101, obtaining an original training sample set comprising a plurality of video samples marked with classification labels.
Specifically, in practical application, the neural network is trained under a supervised learning framework, so that training samples need to be obtained in the embodiment, and the neural network is trained based on different training samples. Wherein each sample in each sample set is provided with a classification label for representing the category of each sample, such as drama, war, psychology, comedy, etc.
It should be noted that, in this embodiment, the original training sample set is a batch of video samples with manually labeled categories, acquired by the user through self-collection and labeling or by downloading a public data set; the original training sample set is a small-scale training sample set.
In some embodiments of this embodiment, in order to ensure the accuracy of the subsequently trained model, after the original training sample set is obtained, the video samples in it may be subjected to an adjustment process. First, each video sample is uniformly sampled at a preset sampling frequency fs, where fs may preferably be 0.5 Hz. Then each sampled image frame is scaled so that its long side has a preset length W, and its short side is padded to W with black pixels (RGB value (0, 0, 0)); in this embodiment, W may be 512 pixels.
In addition, in order to increase the speed of subsequently reading the video samples, the adjusted video samples may be stored as binary files in this embodiment; the binary file format may be TFRecord, which can effectively improve the efficiency of the subsequent model training. A sketch of this adjustment step follows.
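As a concrete illustration, the following minimal sketch samples a video at fs = 0.5 Hz and letterboxes every kept frame to W × W with black padding. It assumes OpenCV and NumPy; the function name preprocess_video and the loop details are illustrative, not part of the disclosed method.

```python
import cv2
import numpy as np

def preprocess_video(path, fs=0.5, W=512):
    """Uniformly sample a video at fs Hz and letterbox every kept frame to
    W x W, padding the short side with black pixels (RGB (0, 0, 0))."""
    cap = cv2.VideoCapture(path)
    native_fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back if unknown
    step = max(int(round(native_fps / fs)), 1)      # keep every step-th frame

    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            h, w = frame.shape[:2]
            scale = W / max(h, w)                   # scale the long side to W
            frame = cv2.resize(frame, (int(w * scale), int(h * scale)))
            canvas = np.zeros((W, W, 3), dtype=np.uint8)  # black padding
            canvas[:frame.shape[0], :frame.shape[1]] = frame
            frames.append(canvas)
        idx += 1
    cap.release()
    return np.stack(frames)  # (N, W, W, 3) array, ready for serialization
```

Serializing the returned array, for example into the TFRecord format mentioned above, would then speed up repeated reads during training.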
Step 102, selecting a video sample combination and a corresponding classification label from the original training sample set for weighted fusion to obtain an augmented training sample set.
Specifically, the sample size of the augmented training sample set of the present embodiment is larger than that of the original training sample set, that is, the augmented training sample set contains more samples. In the present embodiment, "augmentation" may be understood as adding and expanding: the augmented training sample set is a large-scale training sample set obtained by expanding the samples of the original training sample set.
In this embodiment, before selecting a video sample combination and performing weighted fusion of the corresponding classification labels, the training samples in the original training sample set may be preprocessed. A first preset value M and a second preset value H are set, where M may preferably be 12 and H may preferably be 448 pixels. First, the video samples in the original training sample set are randomly cropped in the spatial dimension: an H × H sub-image region is randomly selected within the W × W image frame. Then the video samples are randomly cropped in the temporal dimension: let the original video sample have N frames; if M is less than N, M consecutive frames are randomly selected from the original video; if M is greater than N, M − N pure black frames (RGB value (0, 0, 0)) of H × H pixels are appended after the original video sample; and if M equals N, no operation is performed. This cropping and padding is sketched below.
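A minimal sketch of the spatial and temporal random cropping, assuming NumPy and a clip already adjusted to shape (N, W, W, 3); the function name is illustrative.

```python
import numpy as np

def random_crop_or_pad(video, M=12, H=448):
    """Randomly crop a (N, W, W, 3) clip to (M, H, H, 3), padding with pure
    black frames (RGB (0, 0, 0)) when the clip has fewer than M frames."""
    N, W = video.shape[0], video.shape[1]

    # Spatial dimension: pick a random H x H window inside the W x W frame.
    top = np.random.randint(0, W - H + 1)
    left = np.random.randint(0, W - H + 1)
    video = video[:, top:top + H, left:left + H]

    # Temporal dimension: M consecutive frames, or black-frame padding.
    if M < N:
        start = np.random.randint(0, N - M + 1)
        video = video[start:start + M]
    elif M > N:
        pad = np.zeros((M - N, H, H, 3), dtype=video.dtype)
        video = np.concatenate([video, pad], axis=0)
    return video  # (M, H, H, 3); unchanged in time when M == N
```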
Step 103, inputting the video samples in the augmented training sample set into a neural network for training to obtain a video classification model.
Specifically, the present embodiment implements video classification based on a deep learning algorithm, wherein the neural network used may include any one of a Deep Neural Network (DNN), a Convolutional Neural Network (CNN), and a Recurrent Neural Network (RNN). In this embodiment, based on the training samples in the augmented training sample set, a certain optimization algorithm is adopted to train the neural network in a specific training environment; the learning rate and the number of training iterations may be determined according to actual requirements and are not specifically limited here. It should be understood that the neural network of this embodiment may be chosen according to the operating scenario of the algorithm; for example, in a scenario insensitive to the running time of the classification algorithm, a neural network with higher structural complexity may be adopted to guarantee algorithm performance, thereby improving the accuracy of the final classification result. An illustrative model structure is sketched below.
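Since the embodiment leaves the concrete backbone open (DNN, CNN, or RNN), the following PyTorch sketch uses a deliberately small 3D CNN only to make the input/output contract concrete: a clip tensor of shape (batch, 3, M, H, H) in, n class logits out. It is a placeholder, not the patent's architecture.

```python
import torch.nn as nn

class SimpleVideoClassifier(nn.Module):
    """Toy stand-in for the video classification network: a tiny 3D CNN.
    Purely illustrative; the embodiment does not fix the architecture."""
    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1),  # (B, 3, M, H, H) in
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool3d(1),                     # global spatiotemporal pooling
        )
        self.classifier = nn.Linear(16, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))  # raw logits
```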
Step 104, classifying the video to be classified based on the video classification model.
Specifically, in this embodiment, the video to be classified is used as the input of the trained video classification model; the model predicts the category of the video to be classified and assigns it the corresponding classification label, thereby classifying the video. Because the video classification model of this embodiment is trained on the augmented training sample set, it has strong generalization capability, and the accuracy of its classification results is higher.
As shown in fig. 2, which is a schematic flow chart of a specific video classification method provided in this embodiment, in some embodiments of this embodiment, classifying videos to be classified based on a video classification model specifically includes the following steps:
step 201, preprocessing a video to be classified to obtain a plurality of video segments;
step 202, inputting a plurality of video segments into a video classification model to obtain a plurality of prediction classification label vectors;
step 203, determining the classification of the video to be classified based on the prediction classification label vector of which the maximum value of the classification label is greater than a preset threshold value among the plurality of prediction classification label vectors.
Further, preprocessing the video to be classified to obtain a plurality of video segments may include: uniformly sampling videos to be classified according to a preset sampling frequency; and equally dividing the sampled video to be classified according to the preset video segment length to obtain a plurality of video segments.
Specifically, in this embodiment, the video to be classified may first be adjusted by the method described above for the video samples in the original training sample set: the video to be classified is uniformly sampled at fs, each frame is adjusted to W × W, and finally scaled to H × H. Let the video to be classified have T frames in total. If T is smaller than M, the video is padded to M frames according to the preprocessing method described in the foregoing embodiment; if T is greater than M, the video is divided equally into segments of M frames each, and the last segment is padded to M frames if it is shorter than M frames; and if T equals M, the video forms a single segment of M frames.
In addition, in this embodiment, the m video segments obtained by preprocessing are input into the trained model to obtain m prediction vectors. If there are n classes in total to which the video to be classified may belong, the i-th prediction vector is denoted pred_i = {p_i_1, p_i_2, …, p_i_n}. The m prediction vectors are then traversed position by position: if the maximum of p_1_j, p_2_j, …, p_m_j is greater than the preset threshold t, the input video belongs to the j-th class; otherwise it does not. In this embodiment, t may preferably be set to 0.5. This decision rule is sketched below.
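The per-class maximum rule can be written compactly; the sketch below assumes the m prediction vectors are stacked into an (m, n) NumPy array of per-class scores.

```python
import numpy as np

def classify_segments(preds, t=0.5):
    """preds: (m, n) array of per-segment prediction vectors. Class j is
    assigned when the maximum of p_1_j, ..., p_m_j exceeds threshold t."""
    per_class_max = np.asarray(preds).max(axis=0)  # position-wise traversal
    return np.flatnonzero(per_class_max > t)       # indices of assigned classes
```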
Further, before uniformly sampling the video to be classified according to the preset sampling frequency, the method includes: obtaining the allowed time consumption of the classification operation corresponding to the video to be classified; and determining the sampling frequency according to the allowed time consumption of the classification operation.
Specifically, in this embodiment, the sampling frequencies used for uniform sampling may differ between scenarios. For example, the corresponding sampling frequency may be determined according to the allowed time consumption of the classification operation in each classification scenario; that is, in a scenario insensitive to the running time of the classification algorithm, a relatively high sampling frequency may be used. One way to pick the frequency is sketched below.
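The embodiment does not specify how the allowed time maps to a sampling frequency, so the sketch below assumes an invented linear cost model; cost_per_frame and the nominal clip length are placeholders chosen only to illustrate picking the highest frequency that fits the budget.

```python
def choose_sampling_frequency(allowed_seconds,
                              clip_seconds=60.0,
                              cost_per_frame=0.05,
                              fs_options=(0.25, 0.5, 1.0, 2.0)):
    """Return the highest candidate frequency whose estimated classification
    cost (frames sampled x assumed per-frame model time) fits the budget."""
    for fs in sorted(fs_options, reverse=True):
        if fs * clip_seconds * cost_per_frame <= allowed_seconds:
            return fs
    return min(fs_options)  # fall back to the cheapest option
```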
As shown in fig. 3, which is a flow diagram of the training sample augmentation method provided in this embodiment, in some embodiments of this embodiment, selecting a video sample combination and the corresponding classification label from the original training sample set for weighted fusion to obtain an augmented training sample set specifically includes the following steps:
step 301, randomly selecting two video samples from an original training sample set;
step 302, performing weighted fusion on the two video samples and the corresponding classification labels according to a preset weighted fusion formula to obtain augmented video samples correspondingly marked with the classification labels;
step 303, obtaining an augmented training sample set based on all the augmented video samples.
Specifically, fig. 4 shows a sample weighted-fusion diagram provided in this embodiment, and the weighted fusion formula of this embodiment is expressed as:

x = β · x1 + (1 − β) · x2
y = β · y1 + (1 − β) · y2

wherein x1 and x2 respectively represent the two video samples, y1 and y2 respectively represent the classification labels corresponding to the two video samples, x represents the augmented video sample, y represents the classification label corresponding to the augmented video sample, and β ~ B(a, a) represents a weight drawn from a Beta distribution with preset parameter a, where a may preferably be 0.4.
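In code, the weighted fusion is a single convex combination per sample pair; this sketch assumes float arrays for the clips and one-hot (or soft) label vectors, with a = 0.4 as the preferred Beta parameter.

```python
import numpy as np

def weighted_fusion(x1, y1, x2, y2, a=0.4):
    """Fuse two video clips and their label vectors with a weight beta
    drawn from a Beta(a, a) distribution, per the fusion formula above."""
    beta = np.random.beta(a, a)
    x = beta * x1 + (1.0 - beta) * x2  # element-wise blend of the clips
    y = beta * y1 + (1.0 - beta) * y2  # matching blend of the labels
    return x, y
```

Repeating this over randomly selected sample pairs and pooling the results yields the augmented training sample set of step 303.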
As shown in fig. 5, which is a schematic flow chart of a model training method provided in this embodiment, in some embodiments of this embodiment, inputting video samples in an augmented training sample set to a neural network for training, and obtaining a video classification model specifically includes the following steps:
step 501, inputting video samples in an augmented training sample set into a neural network for training to obtain a predicted classification label vector actually output by the iterative training;
step 502, comparing the classification label vector corresponding to the augmented training sample set with the prediction classification label vector by adopting a preset loss function;
step 503, when the comparison result meets a preset model convergence condition, determining the network model obtained by the iterative training as the trained video classification model.
Specifically, in this embodiment, the training process is repeated for multiple iterations of optimization. The output predicted by the neural network in each training pass and the classification label carried by the sample are used to compute a loss function (Loss Function); for example, if the CNN structure is MobileNet-V3 + NeXtVLAD, the loss function may be cross-entropy loss. Then, for example, the BP algorithm is used to update the trainable parameters in the network by reverse gradient propagation, adjusting parameters such as the weights of the neural network so as to reduce the loss function value of the next iteration. When the loss function value satisfies a preset standard, the model convergence condition is determined to be satisfied, that is, the training process of the whole neural network model is completed; otherwise, the next training iteration continues until the model convergence condition is satisfied. A sketch of such a training loop follows.
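A sketch of one possible training loop follows. The Adam optimizer, learning rate, and the soft cross-entropy form (needed because fused labels are weighted mixtures rather than hard class indices) are assumptions; the embodiment only requires some optimization algorithm driven by a loss function.

```python
import torch

def train(model, loader, epochs=10, lr=1e-3):
    """Iterative training: forward pass, loss against (possibly soft) label
    vectors, then reverse-gradient (BP) update of trainable parameters."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for clips, labels in loader:          # labels: (B, n) soft vectors
            log_probs = torch.log_softmax(model(clips), dim=1)
            loss = -(labels * log_probs).sum(dim=1).mean()  # soft cross-entropy
            opt.zero_grad()
            loss.backward()                   # backpropagate gradients
            opt.step()                        # reduce next iteration's loss
```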
As shown in fig. 6, which is a schematic flow chart of a model testing method provided in this embodiment, in some embodiments of this embodiment, after inputting video samples in an augmented training sample set to a neural network for training to obtain a video classification model, the method further includes the following steps:
step 601, obtaining a test sample set comprising a plurality of video samples marked with classification labels;
step 602, inputting video samples in a test sample set into a video classification model to obtain a test classification label vector;
step 603, carrying out correlation calculation on the test classification label vector and the classification label vector marked by the test sample set;
step 604, determining that the video classification model is effective when the correlation degree is greater than a preset correlation degree threshold value.
Specifically, in this embodiment, after the video classification model is trained, test samples are further used to verify the validity of the trained video classification model: the test samples are input into the trained model, and the correlation between the output label vectors and the label vectors of the test samples is computed to determine the validity of the model. When the correlation is greater than the preset threshold, the trained video classification model is determined to be a correct and effective model, and the step of classifying the video to be classified based on the video classification model is then allowed to proceed; otherwise, the model performance is poor and the model needs to be retrained. A sketch of this check follows.
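The validity check can be sketched as a correlation test between predicted and ground-truth label vectors. Pearson correlation and the 0.8 threshold are illustrative assumptions; the embodiment only specifies a preset correlation threshold.

```python
import numpy as np

def model_is_valid(pred_vectors, true_vectors, corr_threshold=0.8):
    """Flatten predicted and labeled test vectors and compare their Pearson
    correlation against the preset threshold."""
    p = np.asarray(pred_vectors, dtype=float).ravel()
    g = np.asarray(true_vectors, dtype=float).ravel()
    corr = np.corrcoef(p, g)[0, 1]
    return corr > corr_threshold
```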
Based on the technical scheme of the embodiment of the application, an original training sample set comprising a plurality of video samples marked with classification labels is obtained; video sample combinations and the corresponding classification labels are selected from the original training sample set for weighted fusion to obtain an augmented training sample set; the video samples in the augmented training sample set are input into a neural network for training to obtain a video classification model; and videos to be classified are classified based on the video classification model. Through implementation of this scheme, the original video samples and their classification labels are fused by weighted fusion in the model training stage to obtain the augmented training sample set, which guarantees the scale and diversity of the training sample set while effectively reducing the operational complexity of constructing it and improving the realizability of its construction.
The method in fig. 7 is a refined video classification method provided in the second embodiment of the present application, and the video classification method includes:
Step 701, obtaining an original training sample set including a plurality of video samples marked with classification labels.
In this embodiment, the original training sample set is a batch of video samples with manually labeled categories, acquired by the user through self-collection and labeling or by downloading a public data set; the original training sample set is a small-scale training sample set.
Step 702, selecting a video sample combination and a corresponding classification label from an original training sample set to perform weighted fusion to obtain an augmented training sample set.
Specifically, the augmented training sample set of the present embodiment is a large-scale training sample set obtained after sample expansion is performed on the basis of the original training sample set, and the sample scale of the augmented training sample set is larger than that of the original training sample set. In this embodiment, any two samples and their corresponding label vectors are taken, and the augmented sample and its label vector are generated according to a weighted fusion method.
Step 703, inputting the video samples in the augmented training sample set into a neural network for training to obtain a predicted classification label vector actually output by the iterative training.
Step 704, comparing the classification label vector corresponding to the augmented training sample set with the predicted classification label vector by using a preset loss function.
In this embodiment, the training process is repeated for multiple iterations of optimization: the output predicted by the neural network in each training pass and the classification label carried by the sample are used to compute a loss function (Loss Function); then, for example, the BP algorithm is used to update the trainable parameters in the network by reverse gradient propagation, adjusting parameters such as the weights of the neural network to reduce the loss function value of the next iteration.
Step 705, when the comparison result meets the model convergence condition, determining the network model obtained by the iterative training as the trained video classification model.
Specifically, in this embodiment, when the loss function value satisfies the preset standard, it is determined that the model convergence condition is satisfied, that is, the training process of the whole neural network model is completed, otherwise, the next iterative training is continued until the model convergence condition is satisfied.
Step 706, inputting a plurality of video segments obtained by preprocessing the video to be classified into the video classification model to obtain a plurality of prediction classification label vectors.
Specifically, this embodiment uniformly samples the video to be classified according to a preset sampling frequency, and equally divides the sampled video to be classified according to a preset video segment length to obtain a plurality of video segments. In addition, in this embodiment, the m video segments obtained by preprocessing are input into the trained model to obtain m prediction vectors; if there are n classes in total to which the video to be classified may belong, the i-th prediction vector is denoted pred_i = {p_i_1, p_i_2, …, p_i_n}.
Step 707, determining the classification of the video to be classified based on the predicted classification label vector of which the maximum value of the classification label is greater than a preset threshold value among the plurality of predicted classification label vectors.
Specifically, this embodiment traverses the m prediction vectors position by position: if the maximum of p_1_j, p_2_j, …, p_m_j is greater than the preset threshold t, the input video belongs to the j-th class; otherwise, the input video does not belong to the j-th class. This embodiment may preferably set t to 0.5.
It should be understood that the sequence numbers of the steps in this embodiment do not imply an execution order; the execution order of the steps should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
The embodiment of the application discloses a video classification method: obtaining an original training sample set comprising a plurality of video samples marked with classification labels; selecting video sample combinations and the corresponding classification labels from the original training sample set for weighted fusion to obtain an augmented training sample set; inputting the video samples in the augmented training sample set into a neural network for training to obtain a video classification model; and classifying videos to be classified based on the video classification model. Through implementation of this scheme, the original video samples and their classification labels are fused by weighted fusion in the model training stage to obtain the augmented training sample set, which guarantees the scale and diversity of the training sample set while effectively reducing the operational complexity of constructing it and improving the realizability of its construction.
Fig. 8 is a video classification apparatus according to a third embodiment of the present application. The video classification apparatus can be used to implement the video classification method in the foregoing embodiments. As shown in fig. 8, the video classification apparatus mainly includes:
an obtaining module 801, configured to obtain an original training sample set including a plurality of video samples labeled with classification labels;
the augmentation module 802 is configured to select a video sample combination and a corresponding classification label from an original training sample set to perform weighted fusion, so as to obtain an augmented training sample set; wherein the sample scale of the augmented training sample set is larger than that of the original training sample set;
the training module 803 is configured to input the video samples in the augmented training sample set to a neural network for training to obtain a video classification model;
the classification module 804 is configured to classify the video to be classified based on the video classification model.
In some embodiments of the present embodiment, the augmentation module 802 is specifically configured to: randomly selecting two video samples from an original training sample set; performing weighted fusion on the two video samples and the corresponding classification labels according to a preset weighted fusion formula to obtain the corresponding augmented video samples marked with the classification labels; and obtaining an augmented training sample set based on all the augmented video samples. The weighted fusion formula of the present embodiment can be expressed as:
x = β · x1 + (1 − β) · x2
y = β · y1 + (1 − β) · y2

wherein x1 and x2 respectively represent the two video samples, y1 and y2 respectively represent the classification labels corresponding to the two video samples, x represents the augmented video sample, y represents the classification label corresponding to the augmented video sample, and β represents a weight drawn from a Beta distribution subject to a preset parameter.
In some embodiments of this embodiment, the training module 803 is specifically configured to: inputting the video samples in the augmented training sample set into a neural network for training to obtain a predicted classification label vector actually output by the iterative training; comparing the classification label vector corresponding to the augmented training sample set with the prediction classification label vector by adopting a preset loss function; and when the comparison result meets the preset model convergence condition, determining the network model obtained by the iterative training as the trained video classification model.
As shown in fig. 9, in another implementation of this embodiment, the video classification apparatus further includes: a testing module 805, configured to, after the video samples in the augmented training sample set are input into a neural network for training to obtain a video classification model, obtain a test sample set including a plurality of video samples labeled with classification labels; input the video samples in the test sample set into the video classification model to obtain a test classification label vector; perform correlation calculation between the test classification label vector and the classification label vector labeled in the test sample set; and determine that the video classification model is valid when the correlation is greater than a preset correlation threshold. Correspondingly, the classification module 804 performs its function when the video classification model is valid.
In other embodiments of this embodiment, the classification module 804 is specifically configured to: preprocessing a video to be classified to obtain a plurality of video segments; inputting a plurality of video segments into a video classification model to obtain a plurality of prediction classification label vectors; and determining the classification of the video to be classified based on the prediction classification label vector of which the maximum value of the classification label is greater than a preset threshold value in the prediction classification label vectors.
Further, in some embodiments of this embodiment, when the classification module 804 preprocesses the video to be classified to obtain a plurality of video segments, it is specifically configured to: uniformly sampling videos to be classified according to a preset sampling frequency; and equally dividing the sampled video to be classified according to the preset video segment length to obtain a plurality of video segments.
Referring to fig. 9, in some implementations of this embodiment, the video classification apparatus further includes: a determining module 806, configured to obtain the allowed time consumption of the classification operation corresponding to the video to be classified before the video to be classified is uniformly sampled according to the preset sampling frequency, and to determine the sampling frequency according to the allowed time consumption of the classification operation.
It should be noted that, the video classification methods in the first and second embodiments can be implemented based on the video classification device provided in this embodiment, and persons skilled in the art can clearly understand that, for convenience and simplicity of description, the specific working process of the video classification device described in this embodiment may refer to the corresponding process in the foregoing method embodiment, and details are not described here.
According to the video classification apparatus provided by this embodiment, an original training sample set comprising a plurality of video samples marked with classification labels is obtained; video sample combinations and the corresponding classification labels are selected from the original training sample set for weighted fusion to obtain an augmented training sample set; the video samples in the augmented training sample set are input into a neural network for training to obtain a video classification model; and videos to be classified are classified based on the video classification model. Through implementation of this scheme, the original video samples and their classification labels are fused by weighted fusion in the model training stage to obtain the augmented training sample set, which guarantees the scale and diversity of the training sample set while effectively reducing the operational complexity of constructing it and improving the realizability of its construction.
Referring to fig. 10, fig. 10 is an electronic device according to a fourth embodiment of the present disclosure. The electronic device can be used for implementing the video classification method in the foregoing embodiment. As shown in fig. 10, the electronic device mainly includes:
a memory 1001, a processor 1002, a bus 1003 and a computer program stored on the memory 1001 and executable on the processor 1002, the memory 1001 and the processor 1002 being connected by the bus 1003. The processor 1002, when executing the computer program, implements the video classification method in the foregoing embodiments. Wherein the number of processors may be one or more.
The Memory 1001 may be a high-speed Random Access Memory (RAM) Memory or a non-volatile Memory (e.g., a disk Memory). The memory 1001 is used for storing executable program code, and the processor 1002 is coupled to the memory 1001.
Further, an embodiment of the present application also provides a computer-readable storage medium, where the computer-readable storage medium may be provided in an electronic device in the foregoing embodiments, and the computer-readable storage medium may be the memory in the foregoing embodiment shown in fig. 10.
The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, implements the video classification method in the foregoing embodiments. Further, the computer-readable storage medium may be various media that can store program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a RAM, a magnetic disk, or an optical disk.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, a division of modules is merely a division of logical functions, and an actual implementation may have another division, for example, a plurality of modules or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be in an electrical, mechanical or other form.
Modules described as separate parts may or may not be physically separate, and parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment.
In addition, functional modules in the embodiments of the present application may be integrated into one processing module, or each of the modules may exist alone physically, or two or more modules are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode.
The integrated module, if implemented in the form of a software functional module and sold or used as a separate product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application may be substantially implemented or contributed to by the prior art, or all or part of the technical solution may be embodied in a software product, which is stored in a readable storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method of the embodiments of the present application. And the aforementioned readable storage medium includes: various media capable of storing program codes, such as a U disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
It should be noted that, for the sake of simplicity, the above-mentioned method embodiments are described as a series of acts or combinations, but those skilled in the art should understand that the present application is not limited by the described order of acts, as some steps may be performed in other orders or simultaneously according to the present application. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The video classification method, apparatus, and computer-readable storage medium provided by the present application have been described above. Those skilled in the art will recognize that the specific embodiments and application scope of the method, apparatus, and storage medium may vary.

Claims (10)

1. A method of video classification, comprising:
acquiring an original training sample set comprising a plurality of video samples marked with classification labels;
selecting a video sample combination and the corresponding classification label from the original training sample set for weighted fusion to obtain an augmented training sample set; wherein the sample size of the augmented training sample set is larger than that of the original training sample set;
inputting the video samples in the augmented training sample set into a neural network for training to obtain a video classification model;
and classifying the video to be classified based on the video classification model.
2. The video classification method according to claim 1, wherein the selecting a video sample combination and the corresponding classification label from the original training sample set for weighted fusion to obtain an augmented training sample set comprises:
randomly selecting two video samples from the original training sample set;
performing weighted fusion on the two video samples and the corresponding classification labels according to a preset weighted fusion formula to obtain the augmented video samples marked with the classification labels correspondingly; the weighted fusion formula is expressed as:
x = β · x1 + (1 − β) · x2
y = β · y1 + (1 − β) · y2

wherein x1 and x2 respectively represent the two video samples, y1 and y2 respectively represent the classification labels corresponding to the two video samples, x represents the augmented video sample, y represents the classification label corresponding to the augmented video sample, and β represents a weight drawn from a Beta distribution subject to a preset parameter;
and obtaining an augmented training sample set based on all the augmented video samples.
3. The video classification method according to claim 1, wherein the inputting the video samples in the augmented training sample set into a neural network for training to obtain a video classification model comprises:
inputting the video samples in the augmented training sample set into a neural network for training to obtain a predicted classification label vector actually output by the iterative training;
comparing the classification label vector corresponding to the augmented training sample set with the prediction classification label vector by adopting a preset loss function;
and when the comparison result meets a preset model convergence condition, determining the network model obtained by the iterative training as a trained video classification model.
4. The video classification method according to claim 1, wherein after the video samples in the augmented training sample set are input to a neural network for training, and a video classification model is obtained, the method further comprises:
obtaining a test sample set comprising a plurality of video samples marked with classification labels;
inputting the video samples in the test sample set into the video classification model to obtain a test classification label vector;
performing correlation calculation on the test classification label vector and the classification label vector marked by the test sample set;
and when the correlation degree is greater than a preset correlation degree threshold value, determining that the video classification model is effective, and then executing the step of classifying the video to be classified based on the video classification model.
5. The video classification method according to any one of claims 1 to 4, wherein the classifying the video to be classified based on the video classification model comprises:
preprocessing a video to be classified to obtain a plurality of video segments;
inputting the video segments into the video classification model to obtain a plurality of prediction classification label vectors;
and determining the classification of the video to be classified based on the prediction classification label vector of which the maximum value of the classification label is greater than a preset threshold value in the plurality of prediction classification label vectors.
6. The video classification method according to claim 5, wherein the preprocessing the video to be classified to obtain a plurality of video segments comprises:
uniformly sampling the video to be classified according to a preset sampling frequency;
and equally dividing the sampled video to be classified according to the preset video segment length to obtain a plurality of video segments.
7. The video classification method according to claim 6, wherein before uniformly sampling the video to be classified according to a preset sampling frequency, the method comprises:
acquiring allowable time consumption of classification operation corresponding to the video to be classified;
the sampling frequency is determined according to the allowed time of the classification operation.
8. A video classification apparatus, comprising:
the system comprises an acquisition module, a classification module and a classification module, wherein the acquisition module is used for acquiring an original training sample set comprising a plurality of video samples marked with classification labels;
the augmentation module is used for selecting a video sample combination and the corresponding classification label from the original training sample set to carry out weighted fusion to obtain an augmented training sample set; wherein the sample size of the augmented training sample set is larger than that of the original training sample set;
the training module is used for inputting the video samples in the augmented training sample set into a neural network for training to obtain a video classification model;
and the classification module is used for classifying the video to be classified based on the video classification model.
9. An electronic device, comprising: a memory, a processor, and a bus;
the bus is used for realizing connection communication between the memory and the processor;
the processor is configured to execute a computer program stored on the memory;
the processor, when executing the computer program, performs the steps of the method of any one of claims 1 to 7.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of any one of claims 1 to 7.
CN202010272792.3A 2020-04-09 2020-04-09 Video classification method, device and computer readable storage medium Active CN111444878B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010272792.3A CN111444878B (en) 2020-04-09 2020-04-09 Video classification method, device and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010272792.3A CN111444878B (en) 2020-04-09 2020-04-09 Video classification method, device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN111444878A true CN111444878A (en) 2020-07-24
CN111444878B CN111444878B (en) 2023-07-18

Family

ID=71650174

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010272792.3A Active CN111444878B (en) 2020-04-09 2020-04-09 Video classification method, device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN111444878B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10402691B1 (en) * 2018-10-04 2019-09-03 Capital One Services, Llc Adjusting training set combination based on classification accuracy
CN109978071A (en) * 2019-04-03 2019-07-05 西北工业大学 Hyperspectral image classification method based on data augmentation and Multiple Classifier Fusion
CN110263217A (en) * 2019-06-28 2019-09-20 北京奇艺世纪科技有限公司 A kind of video clip label identification method and device
CN110633751A (en) * 2019-09-17 2019-12-31 上海眼控科技股份有限公司 Training method of car logo classification model, car logo identification method, device and equipment
CN110751224A (en) * 2019-10-25 2020-02-04 Oppo广东移动通信有限公司 Training method of video classification model, video classification method, device and equipment
CN110837579A (en) * 2019-11-05 2020-02-25 腾讯科技(深圳)有限公司 Video classification method, device, computer and readable storage medium
CN110807437A (en) * 2019-11-08 2020-02-18 腾讯科技(深圳)有限公司 Video granularity characteristic determination method and device and computer-readable storage medium
CN110929622A (en) * 2019-11-15 2020-03-27 腾讯科技(深圳)有限公司 Video classification method, model training method, device, equipment and storage medium

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860671A (en) * 2020-07-28 2020-10-30 中山大学 Classification model training method and device, terminal equipment and readable storage medium
CN111783902B (en) * 2020-07-30 2023-11-07 腾讯科技(深圳)有限公司 Data augmentation, service processing method, device, computer equipment and storage medium
CN112052356A (en) * 2020-08-14 2020-12-08 腾讯科技(深圳)有限公司 Multimedia classification method, apparatus and computer-readable storage medium
CN112052356B (en) * 2020-08-14 2023-11-24 腾讯科技(深圳)有限公司 Multimedia classification method, apparatus and computer readable storage medium
CN112000842A (en) * 2020-08-31 2020-11-27 北京字节跳动网络技术有限公司 Video processing method and device
CN112131430A (en) * 2020-09-24 2020-12-25 腾讯科技(深圳)有限公司 Video clustering method and device, storage medium and electronic equipment
CN113392269A (en) * 2020-10-22 2021-09-14 腾讯科技(深圳)有限公司 Video classification method, device, server and computer readable storage medium
US11928563B2 (en) 2020-12-18 2024-03-12 Beijing Baidu Netcom Science Technology Co., Ltd. Model training, image processing method, device, storage medium, and program product
CN112489043A (en) * 2020-12-21 2021-03-12 无锡祥生医疗科技股份有限公司 Heart disease detection device, model training method, and storage medium
CN112651356A (en) * 2020-12-30 2021-04-13 杭州菲助科技有限公司 Video difficulty grading model obtaining method and video difficulty grading method
CN112651356B (en) * 2020-12-30 2024-01-23 杭州菲助科技有限公司 Video difficulty grading model acquisition method and video difficulty grading method
CN112686193B (en) * 2021-01-06 2024-02-06 东北大学 Action recognition method and device based on compressed video and computer equipment
CN112686193A (en) * 2021-01-06 2021-04-20 东北大学 Action recognition method and device based on compressed video and computer equipment
CN112883861A (en) * 2021-02-07 2021-06-01 同济大学 Feedback-type bait-casting control method based on fine-grained classification of fish-school feeding state
CN112784111A (en) * 2021-03-12 2021-05-11 有半岛(北京)信息科技有限公司 Video classification method, device, equipment and medium
CN113705315A (en) * 2021-04-08 2021-11-26 腾讯科技(深圳)有限公司 Video processing method, device, equipment and storage medium
CN113705315B (en) * 2021-04-08 2024-05-14 腾讯科技(深圳)有限公司 Video processing method, device, equipment and storage medium
CN113178189A (en) * 2021-04-27 2021-07-27 科大讯飞股份有限公司 Information classification method and device and information classification model training method and device
CN113011534A (en) * 2021-04-30 2021-06-22 平安科技(深圳)有限公司 Classifier training method and device, electronic equipment and storage medium
CN113011534B (en) * 2021-04-30 2024-03-29 平安科技(深圳)有限公司 Classifier training method and device, electronic equipment and storage medium
CN113344214A (en) * 2021-05-31 2021-09-03 北京百度网讯科技有限公司 Training method and device of data processing model, electronic equipment and storage medium
CN113344214B (en) * 2021-05-31 2022-06-14 北京百度网讯科技有限公司 Training method and device of data processing model, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN111444878B (en) 2023-07-18

Similar Documents

Publication Publication Date Title
CN111444878A (en) Video classification method and device and computer readable storage medium
Giraldo et al. Graph moving object segmentation
Suganuma et al. Attention-based adaptive selection of operations for image restoration in the presence of unknown combined distortions
Xiong et al. Learning to generate time-lapse videos using multi-stage dynamic generative adversarial networks
Dosovitskiy et al. Generating images with perceptual similarity metrics based on deep networks
Vondrick et al. Generating the future with adversarial transformers
CN109543502B (en) Semantic segmentation method based on deep multi-scale neural network
Zhang et al. Unifying motion deblurring and frame interpolation with events
CN113688723B (en) Infrared image pedestrian target detection method based on improved YOLOv5
Pang et al. Visual haze removal by a unified generative adversarial network
CN113378600B (en) Behavior recognition method and system
KR20200145827A (en) Facial feature extraction model learning method, facial feature extraction method, apparatus, device, and storage medium
CN111488932B (en) Self-supervision video time-space characterization learning method based on frame rate perception
Zhang et al. Single image dehazing via dual-path recurrent network
CN112560827B (en) Model training method, model training device, model prediction method, electronic device, and medium
CN114494981B (en) Action video classification method and system based on multi-level motion modeling
CN114463218B (en) Video deblurring method based on event data driving
CN112257855B (en) Neural network training method and device, electronic equipment and storage medium
Desai et al. Next frame prediction using ConvLSTM
CN112633100B (en) Behavior recognition method, behavior recognition device, electronic equipment and storage medium
CN112383824A (en) Video advertisement filtering method, device and storage medium
CN114170484B (en) Picture attribute prediction method and device, electronic equipment and storage medium
CN112084371B (en) Movie multi-label classification method and device, electronic equipment and storage medium
CN112926517B (en) Artificial intelligence monitoring method
Razali et al. A log-likelihood regularized KL divergence for video prediction with a 3D convolutional variational recurrent network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant