WO2020221278A1 - Model training method, video classification method, related apparatus, and electronic device - Google Patents

Model training method, video classification method, related apparatus, and electronic device

Info

Publication number
WO2020221278A1
WO2020221278A1 (PCT/CN2020/087690)
Authority
WO
WIPO (PCT)
Prior art keywords
classification
video
frame
training
neural network
Prior art date
Application number
PCT/CN2020/087690
Other languages
English (en)
Chinese (zh)
Inventor
苏驰
李凯
陈宜航
刘弘也
Original Assignee
北京金山云网络技术有限公司
北京金山云科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京金山云网络技术有限公司 and 北京金山云科技有限公司
Publication of WO2020221278A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames

Definitions

  • This application relates to the field of image processing technology, and in particular to a video classification method and model training method, device and electronic equipment.
  • A video can be classified with a three-dimensional (3D) convolutional neural network, in which the spatiotemporal features of the video are extracted by 3D convolution.
  • However, the network parameters of a 3D convolutional neural network are numerous, resulting in high computational cost and long running time in both the training process and the recognition process; in addition, the number of layers of a 3D convolutional neural network is relatively shallow, making it difficult to mine high-level semantic features, which keeps the video classification accuracy low.
  • The purpose of this application is to provide a video classification method, a model training method, a related apparatus, and an electronic device, so as to reduce the amount of calculation, improve model training and recognition efficiency, and at the same time improve the accuracy of video classification.
  • an embodiment of the present application provides a method for training a video classification model.
  • The method includes: determining current training data based on a preset training set, where the training data includes multiple video frames; and inputting the training data to an initial model.
  • The initial model includes a convolutional neural network, a recurrent neural network and an output network. Initial features of the multiple video frames are extracted through the convolutional neural network; final features of the multiple video frames are extracted from the initial features through the recurrent neural network; the final features are input to the output network, which outputs the prediction result of the multiple video frames; the loss value of the prediction result is determined through a preset prediction loss function; and the initial model is trained according to the loss value until the parameters in the initial model converge, yielding the video classification model.
  • An embodiment of the present application provides a video classification method, which includes: obtaining a video to be classified; obtaining multiple video frames from the video according to a preset sampling interval; inputting the multiple video frames to a pre-trained video classification model, which outputs the classification results of the multiple video frames, where the video classification model is trained through the above-mentioned training method; and determining the video category according to the classification results of the multiple video frames.
  • an embodiment of the present application provides a training device for a video classification model.
  • the device includes: a training data determination module configured to determine current training data based on a preset training set; the training data includes multiple video frames;
  • the training data input module is set to input training data to the initial model;
  • the initial model includes a convolutional neural network, a cyclic neural network and an output network;
  • the initial feature extraction module is configured to extract the initial features of the multiple video frames through the convolutional neural network;
  • the final feature extraction module is configured to extract the final features of the multiple video frames from the initial features through the recurrent neural network;
  • the prediction result output module is configured to input the final features to the output network and output the prediction result of the multiple video frames;
  • the loss value determination and training module is configured to determine the loss value of the prediction result through a preset prediction loss function, and to train the initial model according to the loss value until the parameters in the initial model converge, yielding the video classification model.
  • an embodiment of the present application provides a video classification device.
  • The device includes: a video acquisition module configured to acquire a video to be classified; a video frame acquisition module configured to acquire multiple video frames from the video at a preset sampling interval; a classification module configured to input the multiple video frames to the pre-trained video classification model and output the classification result of the multiple video frames, where the video classification model is trained through the above-mentioned training method; and a category determination module configured to determine the category of the video according to the classification result of the multiple video frames.
  • an embodiment of the present application provides an electronic device, including a processor and a memory.
  • The memory stores machine-executable instructions that can be executed by the processor.
  • The processor executes the machine-executable instructions to implement the steps of the aforementioned training method for the video classification model, or the steps of the above video classification method.
  • an embodiment of the present application provides a machine-readable storage medium that stores machine-executable instructions.
  • When the machine-executable instructions are called and executed by a processor, they prompt the processor to implement the training method of the video classification model or the steps of the video classification method.
  • An embodiment of the present application provides executable program code, which is configured to be executed so as to perform any of the above-mentioned training methods for the video classification model, or the steps of any of the above-mentioned video classification methods.
  • The video classification method and its model training method, device and electronic equipment provided by the embodiments of the application combine a convolutional neural network with a recurrent neural network and extract features through a combination of two-dimensional convolution and one-dimensional convolution. Compared with three-dimensional convolution, the amount of calculation can be greatly reduced, thereby improving the efficiency of model training and recognition; this approach also takes the correlation information between video frames into account during feature extraction, so the extracted features can accurately characterize the video type, thereby improving the accuracy of video classification.
  • FIG. 1 is a flowchart of a method for training a video classification model provided by an embodiment of the application
  • FIG. 2 is a schematic structural diagram of a convolutional neural network in an initial model provided by an embodiment of this application;
  • FIG. 3 is a schematic structural diagram of an initial model provided by an embodiment of this application.
  • FIG. 4 is a schematic structural diagram of another initial model provided by an embodiment of the application.
  • FIG. 5 is a flowchart of another video classification model training method provided by an embodiment of the application.
  • FIG. 6 is a flowchart of a video classification method provided by an embodiment of the application.
  • FIG. 7 is a schematic structural diagram of a training device for a video classification model provided by an embodiment of the application.
  • FIG. 8 is a schematic structural diagram of a video classification device provided by an embodiment of this application.
  • FIG. 9 is a schematic structural diagram of an electronic device provided by an embodiment of the application.
  • Embodiments of the present application provide a video classification method, a model training method, a related device and electronic equipment; this technology can be widely used for classifying conventional videos and short videos in various formats, and can be applied in scenarios such as video surveillance, video push, and video management.
  • the method includes the following steps:
  • Step S102 Determine current training data based on a preset training set; the training data includes multiple video frames.
  • The training data needs to be determined multiple times during the training of the initial model; in one embodiment, the current training data can be determined from the preset training set each time; alternatively, in other implementations, new training data can be acquired each time.
  • the training set can contain multiple videos or multiple groups of video frames, and each group contains multiple video frames.
  • the multi-frame video frames are collected from the same video.
  • Each video or each group of video frames is pre-labeled with type tags, which can be assigned from multiple angles, such as video theme, scene, action, character attributes, and so on, so each video or each group of video frames can be classified from multiple angles.
  • For example, the type tags of video A may include TV series, metropolis, crime solving, idol, and so on.
  • If the training set contains multiple videos, a video can be selected from it, multiple video frames can then be collected from that video, and the collected video frames are determined as the training data; if the training set contains multiple groups of video frames, a group of video frames can be selected from it, and the video frames in that group are determined as the training data.
  • the above-mentioned training set can also be divided into a training subset and a cross-validation subset according to a preset ratio.
  • the current training data can be determined from the training subset.
  • test data can be obtained from the cross-validation subset to verify the performance of the model.
  • Step S104 input the training data to the initial model;
  • the initial model includes a convolutional neural network, a cyclic neural network and an output network.
  • multiple video frames in the training data can be adjusted to a preset size, such as 512 pixels * 512 pixels, so that the input video frame matches the convolutional neural network.
  • Step S106, the features of the multiple video frames are extracted through the convolutional neural network as the initial features.
  • multiple video frames may be input to the convolutional neural network.
  • the features output by the convolutional neural network are referred to as initial features.
  • the convolutional neural network can be implemented by a multi-layer convolutional layer, of course, it can also include a pooling layer, a fully connected layer, an activation function, and so on.
  • The convolutional neural network performs convolution operations on each input video frame to obtain the feature map corresponding to that video frame. That is, the initial features include multiple feature maps, or can be regarded as one large feature map composed of the feature maps of the individual frames.
  • Step S108, the features of the multiple video frames are extracted from the initial features through the recurrent neural network as the final features.
  • the above-mentioned initial features can be input to the recurrent neural network.
  • the features output by the recurrent neural network are referred to as final features here.
  • the multiple video frames are related to each other in content.
  • the above-mentioned convolutional neural network usually processes each video frame separately, and the extracted feature maps of each video frame are not related to each other.
  • Therefore, the initial features can be processed through the recurrent neural network: according to the temporal order of the multiple video frames, the correlation information of preceding and following video frames is introduced into the feature processing, which makes the final features more representative of the video type.
  • A recurrent neural network is a type of neural network that takes sequence data as input and recurses along the evolution direction of the sequence. Therefore, using a recurrent neural network to process the initial features can introduce the correlation information of preceding and following video frames.
  • Step S110 input the final feature to the output network, and output the prediction result of the multi-frame video frame.
  • The output network can be realized by a fully connected layer. If the final feature is a two-dimensional multi-layer feature, the fully connected layer converts it into a prediction result in the form of a one-dimensional vector, where each element corresponds to a category and the value of that element represents the likelihood that the video belongs to the category. Alternatively, the final feature may also be a feature of other dimensions, which is not specifically limited here.
  • Step S112 Determine the loss value of the prediction result through the preset prediction loss function; train the initial model according to the loss value until the parameters in the initial model converge to obtain the video classification model.
  • the multi-frame video frames in the training data are pre-labeled with type labels.
  • the type labels can be converted into vector form.
  • the probability value corresponding to the category of the video is usually 1.
  • the probability value corresponding to the category not belonging is usually 0.
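  • As an illustration of such a label vector, a minimal sketch is given below; the category list and the use of Python are assumptions for illustration only, not part of the described method.

```python
CATEGORIES = ["TV series", "movie", "metropolis", "crime solving", "idol", "sports"]  # hypothetical category list

def labels_to_vector(labels):
    """Standard probability vector: 1 for the categories the video belongs to, 0 otherwise."""
    return [1.0 if c in labels else 0.0 for c in CATEGORIES]

print(labels_to_vector({"TV series", "metropolis", "crime solving", "idol"}))
# -> [1.0, 0.0, 1.0, 1.0, 1.0, 0.0]
```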
  • the prediction loss function can compare the difference between the prediction result and the labeled type label. Generally, the greater the difference, the greater the aforementioned loss value.
  • the parameters of each part of the above-mentioned initial model can be adjusted to achieve the purpose of training. When each parameter in the model converges, the training ends and the video classification model is obtained.
  • This training method for the video classification model combines a convolutional neural network with a recurrent neural network and extracts features through a combination of two-dimensional convolution and one-dimensional convolution. Compared with three-dimensional convolution, the amount of calculation can be greatly reduced, thereby improving the efficiency of model training and recognition; the method also takes the correlation information between video frames into account during feature extraction, so the extracted features can accurately represent the video type, thereby improving the accuracy of video classification.
  • The above model can recognize the video category from multiple video frames sampled from the video.
  • The amount of data to be processed is small, which further reduces the amount of calculation and improves the efficiency of training and recognition.
  • The embodiment of this application also provides another method for training a video classification model, which is implemented on the basis of the method described in the above embodiment; from the above embodiment, it can be seen that the initial model includes a convolutional neural network, a recurrent neural network, and an output network. In this embodiment, the specific structure of the initial model is further described.
  • Figure 2 shows a schematic diagram of the structure of a convolutional neural network in an initial model.
  • The convolutional neural network includes multiple groups of sub-networks connected in sequence (three groups of sub-networks are taken as an example in Figure 2), a global average pooling layer and a classification fully connected layer; each group of sub-networks includes a batch normalization layer, an activation function layer, a convolution layer, and a pooling layer that are sequentially connected.
  • the batch normalization layer in each group of sub-networks is used to normalize the data in the input video frame or feature map.
  • This process can speed up the convergence of the convolutional neural network and of the initial model, and can alleviate the problem of gradient dispersion in a multi-layer convolutional network. The activation function layer in the convolutional neural network then performs a function transformation on the normalized video frame or feature map.
  • This transformation breaks the linear combination of the convolutional layer inputs and can improve the feature expression ability of the convolutional neural network.
  • the activation function layer may specifically be Sigmoid function, tanh function, Relu function, etc.
  • the convolution layer is used to perform convolution calculations on the video frame or feature map transformed by the activation function layer, and output the corresponding feature map;
  • The pooling layer can be an average pooling layer (average pooling or mean-pooling), a global average pooling layer (global average pooling), a max pooling layer (max-pooling), and so on;
  • The pooling layer can be used to compress the feature map output by the convolutional layer, retaining the main features and discarding non-main features, so as to reduce the dimension of the feature map.
  • Taking the average pooling layer as an example, it averages the feature point values in a neighborhood of a preset size around the current feature point and uses the average value as the new value of the current feature point.
  • The pooling layer also helps the feature map maintain certain invariances, such as rotation invariance, translation invariance, and scale invariance.
  • The global average pooling layer connected after the sub-networks operates on the feature maps output by the last group of sub-networks: the feature sub-map of each layer is averaged to obtain a one-dimensional feature vector, further reducing the dimensionality of the feature map.
  • the classification fully connected layer performs fully connected calculations on the feature vectors output by the global average pooling layer, and normalizes the calculation results through functions such as softmax.
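  • To make the structure in Figure 2 concrete, a minimal sketch of one sub-network group and the classification head is given below. PyTorch is used purely for illustration (the application does not mandate any framework), and the channel sizes and kernel sizes are assumptions.

```python
import torch
import torch.nn as nn

class SubNetwork(nn.Module):
    """One sub-network group from Figure 2:
    batch normalization -> activation -> convolution -> pooling."""
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.bn = nn.BatchNorm2d(in_channels)   # normalizes the input video frame / feature map
        self.act = nn.ReLU()                    # activation (Sigmoid, tanh or ReLU are all possible)
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1)
        self.pool = nn.MaxPool2d(2)             # compresses the feature map, keeping the main features

    def forward(self, x):
        return self.pool(self.conv(self.act(self.bn(x))))

class ConvNet(nn.Module):
    """Three sub-network groups followed by global average pooling and a
    classification fully connected layer; channel sizes are illustrative."""
    def __init__(self, num_classes):
        super().__init__()
        self.blocks = nn.Sequential(
            SubNetwork(3, 64), SubNetwork(64, 128), SubNetwork(128, 256))
        self.gap = nn.AdaptiveAvgPool2d(1)      # global average pooling -> one-dimensional feature vector
        self.fc = nn.Linear(256, num_classes)   # classification fully connected layer

    def forward(self, x):
        feat = self.gap(self.blocks(x)).flatten(1)
        return torch.softmax(self.fc(feat), dim=1)  # normalized with softmax
```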
  • Before executing the training method of the video classification model, the convolutional neural network may be pre-trained on large data sets in advance, so as to obtain the initial parameters of the convolutional neural network.
  • the data set may include an object recognition data set and a scene recognition data set.
  • During pre-training, the batch size can be set to 256 (that is, the above-mentioned preset number), the momentum to 0.9, and the weight attenuation coefficient to 0.0001; the momentum and weight attenuation coefficient are used when updating the parameters of the convolutional neural network through the back propagation algorithm and the stochastic gradient descent method.
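  • A hedged sketch of how these pre-training hyperparameters could be wired up is shown below; the choice of SGD with momentum follows from the description of back propagation with stochastic gradient descent, while the learning rate and the placeholder data are assumptions not given in this text.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Small random tensors stand in for the object/scene recognition data sets.
dataset = TensorDataset(torch.randn(256, 3, 64, 64), torch.randint(0, 1000, (256,)))
loader = DataLoader(dataset, batch_size=256, shuffle=True)   # batch size 256, as described above

convnet = ConvNet(num_classes=1000)                          # ConvNet from the sketch above
optimizer = torch.optim.SGD(
    convnet.parameters(),
    lr=0.1,              # assumed learning rate; not specified in this description
    momentum=0.9,        # momentum set to 0.9
    weight_decay=1e-4,   # weight attenuation coefficient set to 0.0001
)
```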
  • Figure 3 shows a schematic structural diagram of an initial model
  • The initial model includes a convolutional neural network, a recurrent neural network and an output network, and also includes a global average pooling network; the global average pooling network is set between the convolutional neural network and the recurrent neural network. Through the global average pooling network, the initial features can be reduced in dimensionality so that their dimension matches the recurrent neural network. That is, dimensionality reduction is performed on the initial features through the global average pooling network to obtain dimensionality-reduced features, and the features of the multiple video frames are then extracted from the dimensionality-reduced features through the recurrent neural network as the final features.
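  • Putting the pieces of Figure 3 together, the sketch below shows one plausible way to arrange the convolutional neural network, the global average pooling network, the recurrent neural network and the output network; the framework, hidden size and the way frames are batched are assumptions for illustration, and SubNetwork refers to the earlier sketch.

```python
import torch
import torch.nn as nn

class InitialModel(nn.Module):
    """CNN -> global average pooling -> LSTM -> output network (Figure 3)."""
    def __init__(self, num_classes, feat_dim=256, hidden_dim=512):
        super().__init__()
        self.cnn = nn.Sequential(                 # stand-in for the 2D convolutional neural network
            SubNetwork(3, 64), SubNetwork(64, 128), SubNetwork(128, feat_dim))
        self.gap = nn.AdaptiveAvgPool2d(1)        # global average pooling network: reduces each
                                                  # frame's feature map to a feat_dim vector
        self.rnn = nn.LSTM(feat_dim, hidden_dim, batch_first=True)  # recurrent neural network
        self.fc = nn.Linear(hidden_dim, num_classes)                # output network

    def forward(self, frames):                    # frames: (batch, M, 3, H, W)
        b, m = frames.shape[:2]
        x = frames.flatten(0, 1)                  # process the M frames with shared 2D convolutions
        z = self.gap(self.cnn(x)).flatten(1).view(b, m, -1)  # M dimensionality-reduced features z_t
        _, (h_m, _) = self.rnn(z)                 # h_M: the final feature after the last frame
        return self.fc(h_m[-1])                   # prediction result of the multi-frame input
```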
  • The recurrent neural network may specifically be a Long Short-Term Memory network (LSTM network for short).
  • The performance of the long short-term memory network is better than that of an ordinary recurrent neural network, and it can make up for defects of ordinary recurrent neural networks such as gradient explosion and gradient vanishing.
  • The LSTM network includes input gates, output gates, and forget gates; the input gate is configured to extract, from the initial features, the features that need to be memorized; the output gate is configured to read the memorized features; and the forget gate is configured to determine whether to retain the memorized features.
  • the opening and closing timing of the input gate, output gate, and forget gate can be trained to complete the training of the cyclic neural network.
  • Suppose the initial features contain M feature vectors, expressed as z_t, t ∈ [1, ..., M]; these M feature vectors are then fed into the LSTM network.
  • After processing, the final feature of the multiple video frames is obtained, denoted h_M; the calculation performed by the LSTM network for each feature vector is described below,
  • where W_f, W_i, W_C, W_o, b_f, b_i, b_C and b_o are the preset parameters of the LSTM; after the M-th feature vector has been input to the LSTM, h_M is obtained; this h_M, i.e. the final feature, can then be input to the subsequent output network.
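  • For reference, the standard LSTM update equations consistent with the parameter names W_f, W_i, W_C, W_o and b_f, b_i, b_C, b_o listed above are reproduced below; the patent's exact formulation may differ in detail.

```latex
\begin{aligned}
f_t &= \sigma\!\left(W_f\,[h_{t-1}, z_t] + b_f\right) && \text{(forget gate)}\\
i_t &= \sigma\!\left(W_i\,[h_{t-1}, z_t] + b_i\right) && \text{(input gate)}\\
\tilde{C}_t &= \tanh\!\left(W_C\,[h_{t-1}, z_t] + b_C\right) && \text{(candidate memory)}\\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{(memory update)}\\
o_t &= \sigma\!\left(W_o\,[h_{t-1}, z_t] + b_o\right) && \text{(output gate)}\\
h_t &= o_t \odot \tanh(C_t) && \text{(hidden state)}
\end{aligned}
```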
  • the above-mentioned output network may include a classification fully connected layer; the above-mentioned final features are input to the classification fully connected layer, and the classification result vector may be output.
  • The classification fully connected layer contains multiple neurons and is preset with a weight vector; the weight vector contains the weight elements corresponding to each neuron of the classification fully connected layer, and each neuron is connected with every feature element of the final feature.
  • Each neuron multiplies the feature elements of the final feature by the corresponding weight elements of the weight vector to obtain the predicted value of that neuron; since the fully connected layer contains multiple neurons, the predicted values of the neurons together constitute the above classification result vector.
  • the above-mentioned initial model may further include a classification function; inputting the classification result vector output by the above-mentioned classification fully connected layer into the classification function can output the classification probability vector corresponding to the classification result vector.
  • the classification function is used to calculate the probability of each element in the classification result vector.
  • the function can be a Softmax function or other probability regression functions.
  • The above-mentioned initial model combines a convolutional neural network with a long short-term memory network and extracts features through a combination of two-dimensional convolution and one-dimensional convolution.
  • This approach also takes the correlation information between video frames into account during feature extraction, so the extracted features can accurately represent the video type; moreover, the long short-term memory network avoids the problems of gradient explosion and gradient vanishing that arise when the network is deep,
  • which improves the performance of the model, is conducive to extracting the deep features of the video frames, and thereby further improves the accuracy of video classification.
  • the embodiment of the present application also provides another method for training a video classification model, which is implemented on the basis of the method described in the foregoing embodiment; this embodiment focuses on the specific content of the output network and the prediction loss function.
  • the prediction loss function includes a classification loss function;
  • The classification loss function can be expressed in terms of the following quantities, where Σ denotes summation, exp denotes the exponential function with the natural constant e as its base, and log denotes the logarithm:
  • p_l is the l-th element of the classification probability vector corresponding to the classification result vector in the prediction result;
  • y_l is the l-th element of the standard probability vector pre-labeled for the multiple video frames;
  • r_l is the proportion in the training set of the category corresponding to y_l;
  • a preset hyperparameter, which can be set to 1, also appears in the per-category weight w_l.
  • Since r_l is the proportion of the corresponding category in the training set, a category with a low proportion has a smaller r_l and a larger weight w_l. This plays a balancing role, alleviating the problem of uneven sample distribution across categories, and thereby improves the training efficiency of the model and its recognition accuracy.
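  • The exact weighting expression is not reproduced in this text, so the sketch below only illustrates the stated behaviour: a class-balanced cross-entropy in which rarer categories (smaller r_l) receive larger weights w_l. The specific weight form used here (w_l = exp(-gamma * r_l)) and the use of PyTorch are assumptions, not the patent's formula.

```python
import torch

def class_balanced_loss(p, y, r, gamma=1.0):
    """Weighted cross-entropy between the classification probability vector p
    and the pre-labelled standard probability vector y.

    p: (batch, L) classification probability vector (e.g. softmax output)
    y: (batch, L) standard probability vector (1 for labelled categories, 0 otherwise)
    r: (L,)       proportion of each category in the training set
    gamma:        preset hyperparameter (set to 1 in the description above)
    """
    w = torch.exp(-gamma * r)               # illustrative weight: smaller r_l -> larger w_l
    eps = 1e-8                               # numerical stability for the logarithm
    return -(w * y * torch.log(p + eps)).sum(dim=1).mean()
```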
  • the output network includes a classification fully connected layer.
  • In one embodiment, the output network also includes a threshold fully connected layer, as shown in Figure 4; the final feature is input to the threshold fully connected layer, which outputs a threshold result vector.
  • The threshold fully connected layer contains multiple neurons and is preset with a weight vector; the weight vector contains the weight elements corresponding to each neuron of the threshold fully connected layer, and each neuron is connected with every feature element of the final feature.
  • Each neuron multiplies the feature elements of the final feature by the corresponding weight elements of the weight vector to obtain the predicted value of that neuron; since the fully connected layer contains multiple neurons, the predicted values of the neurons together constitute the above threshold result vector.
  • The threshold fully connected layer is configured to extract from the final feature the thresholds that the model learns for each category, that is, the threshold result vector.
  • Each category corresponds to its own threshold.
  • The thresholds of different categories can be the same or different. Compared with setting thresholds manually, the thresholds learned by the model are more accurate and reasonable, which helps improve the classification accuracy of the model.
  • In this case, the prediction loss function also includes a threshold loss function used to evaluate the accuracy of the threshold result vector;
  • the function value of the classification loss function and the function value of the threshold loss function can be weighted and summed to obtain the loss value of the prediction result.
  • The classification loss function takes the proportion of each category in the training set into account, which alleviates the problem of uneven sample distribution across categories and thereby improves the training efficiency and recognition accuracy of the model; the output network also contains the threshold fully connected layer, and compared with setting thresholds manually, the thresholds learned by the model are more accurate and reasonable, which further improves the classification accuracy of the model.
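  • A hedged sketch of an output network with both a classification fully connected layer and a threshold fully connected layer, and of the weighted sum of the two loss terms, is given below. The threshold loss shown is only a placeholder standing in for the threshold loss function described above, and the weighting coefficient is an assumption.

```python
import torch
import torch.nn as nn

class OutputNetwork(nn.Module):
    """Two fully connected heads over the final feature h_M (Figure 4)."""
    def __init__(self, hidden_dim, num_classes):
        super().__init__()
        self.cls_fc = nn.Linear(hidden_dim, num_classes)   # classification fully connected layer
        self.thr_fc = nn.Linear(hidden_dim, num_classes)   # threshold fully connected layer

    def forward(self, h_m):
        scores = self.cls_fc(h_m)              # classification result vector
        p = torch.softmax(scores, dim=1)       # classification probability vector
        tau = self.thr_fc(h_m)                 # threshold result vector (one learned threshold per category)
        return p, tau

def prediction_loss(p, tau, y, r, lambda_thr=1.0):
    """Weighted sum of the classification loss and a threshold loss term.
    The threshold term below is a placeholder that merely rewards thresholds tau
    which separate positive from negative categories; the patent's exact form may differ."""
    cls_loss = class_balanced_loss(p, y, r)                                  # from the earlier sketch
    thr_loss = nn.functional.binary_cross_entropy_with_logits(p - tau, y)    # illustrative placeholder
    return cls_loss + lambda_thr * thr_loss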
  • This embodiment of the application also provides another method for training a video classification model, which is implemented on the basis of the method described in the above embodiment; this embodiment focuses on the specific process of training the initial model according to the loss value. As shown in Figure 5, the method includes the following steps:
  • Step S502 Determine current training data based on a preset training set; the training data includes multiple video frames;
  • Step S504 input training data to an initial model;
  • the initial model includes a convolutional neural network, a cyclic neural network, and an output network;
  • Step S506 extracting features of multiple video frames through a convolutional neural network as initial features
  • Step S508 extracting the features of the multi-frame video frame from the initial features through the recurrent neural network as the final feature;
  • Step S510 input the final feature to the output network, and output the prediction result of the multi-frame video frame;
  • Step S512 Determine the loss value of the prediction result through a preset prediction loss function
  • Step S514 Update the parameters in the initial model according to the loss value;
  • the function mapping relationship can be set in advance, the original parameters and the loss value are input into the function mapping relationship, and the updated parameters can be calculated.
  • the function mapping relationship of different parameters can be the same or different.
  • The parameters to be updated can be determined from the initial model according to preset rules; the parameters to be updated can be all the parameters in the initial model, or some parameters randomly determined from the initial model. Then the derivative ∂L/∂W of the loss value with respect to each parameter to be updated is calculated, where L is the loss value, W is the parameter to be updated, and ∂ denotes the partial derivative operation; the parameters to be updated can also be called the weights of the neurons. This process can also be called the back-propagation algorithm; if the loss value is large, it means that the output of the current initial model does not match the expected output, so the derivative of the loss value with respect to the parameters to be updated in the initial model can be calculated and used as a basis for adjusting those parameters.
  • Each parameter to be updated is then adjusted against its derivative, for example as W' = W − α·∂L/∂W, where α is a preset coefficient.
  • This process can also be referred to as the stochastic gradient descent algorithm; the derivative of each parameter to be updated can be understood as the direction in which the loss value drops fastest at the current parameter value. By adjusting the parameter in this direction, the loss value can be quickly reduced so that the parameter converges.
  • Each time the initial model is trained, a loss value is obtained. At this point, one or more parameters can be randomly selected from the parameters of the initial model for the above update process, which shortens model training time and speeds up the algorithm; of course, the above update process can also be performed on all the parameters in the initial model, in which case the training is more accurate.
  • Step S516 Determine whether the updated parameters are all converged; if the updated parameters are all converged, perform step S518; if the updated parameters are not all converged, perform step S502;
  • the step of determining the current training data based on the preset training set is continued until the updated parameters all converge.
  • Step S518 Determine the initial model after parameter update as the video classification model.
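  • The flow of Figure 5 can be summarised by a hedged training-loop sketch; the convergence test (a small change in loss between iterations), the optimizer settings, the random placeholder data and the sizes used are all assumptions for illustration, and InitialModel and class_balanced_loss refer to the earlier sketches.

```python
import torch

NUM_CLASSES, M = 10, 8                                   # hypothetical sizes for illustration
model = InitialModel(num_classes=NUM_CLASSES)            # from the earlier sketch
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)   # assumed settings
class_ratios = torch.full((NUM_CLASSES,), 1.0 / NUM_CLASSES)             # placeholder for r_l

prev_loss = float("inf")
for step in range(1000):
    # S502: determine current training data (random tensors stand in for sampled video frames)
    frames = torch.randn(4, M, 3, 64, 64)
    y = torch.zeros(4, NUM_CLASSES).scatter_(1, torch.randint(0, NUM_CLASSES, (4, 1)), 1.0)
    # S504-S510: forward pass through the CNN, the LSTM and the output network
    logits = model(frames)
    p = torch.softmax(logits, dim=1)
    # S512: loss value of the prediction result
    loss = class_balanced_loss(p, y, class_ratios)
    optimizer.zero_grad()
    loss.backward()      # back propagation: derivatives of the loss w.r.t. the parameters
    optimizer.step()     # S514: stochastic gradient descent parameter update
    # S516/S518: a simple stand-in for the convergence check
    if abs(prev_loss - loss.item()) < 1e-6:
        break
    prev_loss = loss.item()
```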
  • In this method, the convolutional neural network and the recurrent neural network are combined, and features are extracted through the combination of two-dimensional convolution and one-dimensional convolution.
  • Compared with three-dimensional convolution, the amount of calculation can be greatly reduced, thereby improving model training and recognition efficiency; the method also takes the correlation information between video frames into account during feature extraction, so the extracted features can accurately characterize the video type, thereby improving the accuracy of video classification.
  • Based on the above-described training method for the video classification model, an embodiment of the present application also provides a video classification method; as shown in FIG. 6, the method includes the following steps:
  • Step S602 Obtain a video to be classified
  • The video can be a regular video or a short video; the specific format of the video can be MPEG (Moving Picture Experts Group), AVI (Audio Video Interleaved), MOV (QuickTime movie format) and so on, which is not limited here.
  • Step S604 Obtain multiple video frames from the video according to a preset sampling interval
  • the sampling interval can be preset, for example, the sampling interval is 0.2 seconds, that is, 5 frames are sampled in 1 second.
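  • As an illustration, frames could be sampled from a video at a fixed interval with a library such as OpenCV; the use of cv2 and the 512×512 resize below are assumptions for illustration rather than requirements of the method.

```python
import cv2

def sample_frames(video_path, interval_s=0.2, size=(512, 512)):
    """Sample one frame every `interval_s` seconds (0.2 s -> 5 frames per second)
    and resize each sampled frame to the preset input size."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS) or 25.0
    step = max(1, round(fps * interval_s))    # number of source frames between samples
    frames, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            frames.append(cv2.resize(frame, size))
        idx += 1
    cap.release()
    return frames
```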
  • Step S606 Input the multi-frame video frame to the pre-trained video classification model, and output the classification result of the multi-frame video frame; the video classification model is obtained by training the above-mentioned video classification model training method;
  • Step S608 Determine the video category according to the classification result of the multiple video frames.
  • The video classification method provided by the embodiment of the application first obtains multiple video frames from a video to be classified according to a preset sampling interval, inputs the multiple video frames to a pre-trained video classification model, and obtains the classification result of the multiple video frames; the category of the video is then determined according to that classification result. The video classification model combines a convolutional neural network with a recurrent neural network, and features are extracted through the combination of two-dimensional convolution and one-dimensional convolution.
  • The method also takes the correlation information between video frames into account during feature extraction, so the extracted features can accurately characterize the video type, thereby improving the accuracy of video classification.
  • the classification result of the multi-frame video frame output by the above-mentioned video classification model may include one or more categories, and the classification result of the multi-frame video frame can be directly determined as the video category.
  • the classification result of the multi-frame video frame includes a classification probability vector and a threshold result vector.
  • the probability value of each category in the classification probability vector can be compared with the corresponding threshold in the threshold result vector to determine the video category.
  • Specifically, a category vector of the video can be calculated by comparing the two vectors element by element, where p_l is the l-th element of the classification probability vector and τ_l is the l-th element of the threshold result vector; the l-th element of the category vector is non-zero when p_l is greater than τ_l and zero otherwise.
  • In the category vector, the category corresponding to each non-zero element is determined as a category of the video: since the probability value of that category is greater than its corresponding threshold, the category can be regarded as a category of the video.
  • In this way, the model outputs not only the classification probability vector but also the threshold result vector, and the video categories are finally determined based on the comparison of the two vectors.
  • The thresholds output by the model are more accurate and reasonable than manually set ones, which helps improve the accuracy of video classification. Labeling videos with the resulting tags helps users quickly discover content they are interested in, and also helps recommend videos of interest to users, which improves the user experience.
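  • A minimal sketch of this final decision step is given below: each category whose predicted probability exceeds its learned threshold is kept as a category of the video. The category names and numeric values are hypothetical.

```python
def determine_categories(p, tau, category_names):
    """Compare the classification probability vector p with the threshold result
    vector tau element-wise; non-zero entries of the category vector give the
    categories of the video."""
    category_vector = [1 if p_l > t_l else 0 for p_l, t_l in zip(p, tau)]
    return [name for name, c in zip(category_names, category_vector) if c]

# Example with hypothetical values:
print(determine_categories(
    p=[0.92, 0.10, 0.71, 0.05],
    tau=[0.50, 0.40, 0.60, 0.30],
    category_names=["TV series", "variety", "crime solving", "sports"]))
# -> ['TV series', 'crime solving']
```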
  • the training data determining module 70 is configured to determine the current training data based on a preset training set; the training data includes multiple video frames;
  • the training data input module 71 is configured to input training data to the initial model;
  • the initial model includes a convolutional neural network, a recurrent neural network, and an output network;
  • the initial feature extraction module 72 is configured to extract features of multiple frames of video frames through a convolutional neural network as the initial features
  • the final feature extraction module 73 is configured to extract features of multiple video frames from the initial features through a recurrent neural network as the final feature;
  • the prediction result output module 74 is configured to input the final feature to the output network and output the prediction result of the multi-frame video frame;
  • the loss value determination and training module 75 is configured to determine the loss value of the prediction result through a preset prediction loss function; the initial model is trained according to the loss value until the parameters in the initial model converge to obtain a video classification model.
  • This training device for the video classification model combines a convolutional neural network with a recurrent neural network and extracts features through a combination of two-dimensional convolution and one-dimensional convolution. Compared with three-dimensional convolution, the amount of calculation can be greatly reduced, thereby improving the efficiency of model training and recognition; the device also takes the correlation information between video frames into account during feature extraction, so the extracted features can accurately represent the video type, thereby improving the accuracy of video classification.
  • The above-mentioned convolutional neural network includes multiple groups of sub-networks connected in sequence, a global average pooling layer, and a classification fully connected layer; each group of sub-networks includes a batch normalization layer, an activation function layer, a convolution layer and a pooling layer; the initial parameters of the above-mentioned convolutional neural network are obtained by training on a preset data set.
  • The above-mentioned initial model further includes a global average pooling network, which is set between the convolutional neural network and the recurrent neural network. The above-mentioned device further includes a dimensionality reduction module configured to perform dimensionality reduction on the initial features through the global average pooling network to obtain dimensionality-reduced features; the final feature extraction module 73 is specifically configured to extract the features of the multiple video frames from the dimensionality-reduced features through the recurrent neural network as the final features.
  • the above-mentioned cyclic neural network includes a long and short-term memory network.
  • The above-mentioned output network includes a classification fully connected layer, and the initial model also includes a classification function; the above-mentioned prediction result output module is configured to input the final feature into the classification fully connected layer and output a classification result vector. The above device also includes a probability vector output module configured to input the classification result vector into the classification function and output the classification probability vector corresponding to the classification result vector.
  • The aforementioned prediction loss function includes a classification loss function,
  • in which Σ denotes summation, exp denotes the exponential function with the natural constant e as its base, and log denotes the logarithm;
  • p_l is the l-th element of the classification probability vector corresponding to the classification result vector in the prediction result;
  • y_l is the l-th element of the standard probability vector pre-labeled for the multiple video frames;
  • r_l is the proportion in the training set of the category corresponding to y_l;
  • and a preset hyperparameter is also used.
  • the aforementioned output network includes a threshold fully connected layer; the aforementioned prediction result output module is configured to: input the final feature to the threshold fully connected layer, and output a threshold result vector.
  • The aforementioned prediction loss function includes a threshold loss function,
  • in which y_l is the l-th element of the standard probability vector pre-labeled for the multiple video frames,
  • the threshold loss compares each p_l with the corresponding threshold (it depends on the difference p_l − τ_l),
  • and τ_l is the l-th element of the threshold result vector in the prediction result.
  • When the above prediction loss function includes both the classification loss function and the threshold loss function, the above loss value determination and training module is configured to perform a weighted summation of the function value of the classification loss function and the function value of the threshold loss function to obtain the loss value of the prediction result.
  • The above-mentioned loss value determination and training module is configured to: update the parameters in the initial model according to the loss value; determine whether the updated parameters have all converged; if the updated parameters have all converged, determine the initial model with the updated parameters as the video classification model;
  • if the updated parameters have not all converged, continue to perform the step of determining the current training data based on the preset training set until the updated parameters all converge.
  • The aforementioned loss value determination and training module is configured to: determine the parameters to be updated from the initial model according to preset rules; calculate the derivative ∂L/∂W of the loss value with respect to each parameter W to be updated, where L is the loss value and ∂ denotes the partial derivative operation; and update each parameter to be updated, for example as W' = W − α·∂L/∂W, where α is a preset coefficient.
  • See FIG. 8 for a schematic structural diagram of a video classification device; the device includes:
  • the video acquisition module 80 is configured to acquire the video to be classified
  • the video frame obtaining module 81 is configured to obtain multiple video frames from the video according to a preset sampling interval
  • the classification module 82 is configured to input a multi-frame video frame to the pre-trained video classification model, and output the classification result of the multi-frame video frame; the video classification model is obtained through training of the above-mentioned video classification model training method;
  • the category determining module 83 is configured to determine the category of the video according to the classification result of the multiple video frames.
  • The classification result of the above-mentioned multiple video frames includes a classification probability vector and a threshold result vector; the above-mentioned category determination module is configured to calculate the category vector of the video, where p_l is the l-th element of the classification probability vector and τ_l is the l-th element of the threshold result vector, and, in the category vector, the category corresponding to each non-zero element is determined as a category of the video.
  • the electronic device includes a memory 100 and a processor 101, where the memory 100 is configured to store one or more computer instructions, and one or more computer instructions are The processor 101 executes to implement the training method of the video classification model or the steps of the video classification method.
  • the electronic device shown in FIG. 9 further includes a bus 102 and a communication interface 103, and the processor 101, the communication interface 103, and the memory 100 are connected through the bus 102.
  • the memory 100 may include a high-speed random access memory (RAM, Random Access Memory), and may also include a non-volatile memory (non-volatile memory), such as at least one disk memory.
  • The communication connection between the system network element and at least one other network element is realized through at least one communication interface 103 (which may be wired or wireless), and the Internet, a wide area network, a local area network, a metropolitan area network, and the like may be used.
  • the bus 102 may be an ISA bus, PCI bus, EISA bus, or the like.
  • the bus can be divided into address bus, data bus, control bus, etc. For ease of presentation, only one bidirectional arrow is used in FIG. 9, but it does not mean that there is only one bus or one type of bus.
  • the processor 101 may be an integrated circuit chip with signal processing capability. In the implementation process, the steps of the foregoing method can be completed by an integrated logic circuit of hardware in the processor 101 or instructions in the form of software.
  • The aforementioned processor 101 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the methods, steps, and logical block diagrams disclosed in the embodiments of the present application can be implemented or executed.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
  • The software module can be located in a storage medium mature in the field, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register.
  • the storage medium is located in the memory 100, and the processor 101 reads information in the memory 100, and completes the steps of the method of the foregoing embodiment in combination with its hardware.
  • The embodiment of the present application also provides a machine-readable storage medium that stores machine-executable instructions.
  • When the machine-executable instructions are called and executed by a processor, they prompt the processor to implement the training method of the above-mentioned video classification model or the steps of the video classification method; for details, please refer to the method embodiments, which will not be repeated here.
  • The computer program product of the video classification method, its model training method, device and electronic device provided by the embodiments of the present application includes a computer-readable storage medium storing program code, and the instructions included in the program code can be configured to execute the methods described in the foregoing method embodiments; for details, refer to the method embodiments, which will not be repeated here.
  • If the functions are implemented in the form of software functional units and sold or used as independent products, they can be stored in a computer-readable storage medium.
  • The computer software product is stored in a storage medium and includes several instructions
  • used to make a computer device (which may be a personal computer, a server, a network device, or the like) execute all or part of the steps of the methods described in the various embodiments of the present application.
  • The aforementioned storage media include: a USB flash disk, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or other media that can store program code.
  • the embodiment of the present application provides an executable program code that is configured to be executed to execute any of the above-mentioned training methods for video classification models or the steps of any of the above-mentioned video classification methods.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a video classification method, a model training method, a related apparatus, and an electronic device. The training method comprises: extracting initial features of a plurality of video frames by means of a convolutional neural network; extracting final features of the plurality of video frames from the initial features by means of a recurrent neural network; inputting the final features into an output network and outputting a prediction result of the plurality of video frames; determining a loss value of the prediction result by means of a preset prediction loss function; and training an initial model according to the loss value until the parameters in the initial model converge, to obtain a video classification model. According to the present invention, the convolutional neural network and the recurrent neural network are combined, so that the amount of computation can be greatly reduced, thereby improving model training and recognition efficiency; at the same time, association information between video frames can be taken into account in the feature extraction process, so that the extracted features can accurately represent video types and the accuracy of video classification is improved.
PCT/CN2020/087690 2019-04-29 2020-04-29 Model training method, video classification method, related apparatus, and electronic device WO2020221278A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910359704.0 2019-04-29
CN201910359704.0A CN110070067B (zh) 2019-04-29 2019-04-29 视频分类方法及其模型的训练方法、装置和电子设备

Publications (1)

Publication Number Publication Date
WO2020221278A1 true WO2020221278A1 (fr) 2020-11-05

Family

ID=67369701

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/087690 WO2020221278A1 (fr) 2019-04-29 2020-04-29 Procédé d'entraînement de modèles, procédé de classification de vidéos, appareil associé, et dispositif électronique

Country Status (2)

Country Link
CN (1) CN110070067B (fr)
WO (1) WO2020221278A1 (fr)

Families Citing this family (43)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110070067B (zh) * 2019-04-29 2021-11-12 北京金山云网络技术有限公司 视频分类方法及其模型的训练方法、装置和电子设备
CN110457525B (zh) * 2019-08-12 2023-09-26 央视国际网络无锡有限公司 一种短视频分类方法
CN110489593B (zh) * 2019-08-20 2023-04-28 腾讯科技(深圳)有限公司 视频的话题处理方法、装置、电子设备及存储介质
CN110418163B (zh) * 2019-08-27 2021-10-08 北京百度网讯科技有限公司 视频帧采样方法、装置、电子设备及存储介质
CN110503160B (zh) * 2019-08-28 2022-03-25 北京达佳互联信息技术有限公司 图像识别方法、装置、电子设备及存储介质
CN110674488B (zh) * 2019-09-06 2024-04-26 深圳壹账通智能科技有限公司 基于神经网络的验证码识别方法、系统及计算机设备
CN110751030A (zh) * 2019-09-12 2020-02-04 厦门网宿有限公司 一种视频分类方法、设备及系统
CN110852195A (zh) * 2019-10-24 2020-02-28 杭州趣维科技有限公司 一种基于video slice的视频类型分类方法
CN110766096B (zh) * 2019-10-31 2022-09-23 北京金山云网络技术有限公司 视频分类方法、装置及电子设备
CN110807437B (zh) * 2019-11-08 2023-01-03 腾讯科技(深圳)有限公司 视频粒度特征确定方法、装置和计算机可读存储介质
CN110929780B (zh) * 2019-11-19 2023-07-11 腾讯科技(深圳)有限公司 视频分类模型构建、视频分类的方法、装置、设备及介质
CN111008579A (zh) * 2019-11-22 2020-04-14 华中师范大学 专注度识别方法、装置和电子设备
CN111046232B (zh) * 2019-11-30 2024-06-14 北京达佳互联信息技术有限公司 一种视频分类方法、装置及系统
CN111177460B (zh) * 2019-12-20 2023-04-18 腾讯科技(深圳)有限公司 提取关键帧的方法及装置
CN111143612B (zh) * 2019-12-27 2023-06-27 广州市百果园信息技术有限公司 视频审核模型训练方法、视频审核方法及相关装置
CN111242222B (zh) * 2020-01-14 2023-12-19 北京迈格威科技有限公司 分类模型的训练方法、图像处理方法及装置
CN113539304B (zh) * 2020-04-21 2022-09-16 华为云计算技术有限公司 视频拆条方法和装置
CN111507289A (zh) * 2020-04-22 2020-08-07 上海眼控科技股份有限公司 视频匹配方法、计算机设备和存储介质
CN111507288A (zh) * 2020-04-22 2020-08-07 上海眼控科技股份有限公司 图像检测方法、装置、计算机设备和存储介质
CN111783613B (zh) * 2020-06-28 2021-10-08 北京百度网讯科技有限公司 异常检测方法、模型的训练方法、装置、设备及存储介质
CN113842111A (zh) * 2020-06-28 2021-12-28 珠海格力电器股份有限公司 一种睡眠分期方法、装置、计算设备及存储介质
CN111782879B (zh) * 2020-07-06 2023-04-18 Oppo(重庆)智能科技有限公司 模型训练方法及装置
CN111967382A (zh) * 2020-08-14 2020-11-20 北京金山云网络技术有限公司 年龄估计方法、年龄估计模型的训练方法及装置
CN112131995A (zh) * 2020-09-16 2020-12-25 北京影谱科技股份有限公司 一种动作分类方法、装置、计算设备、以及存储介质
CN112330711B (zh) * 2020-11-26 2023-12-05 北京奇艺世纪科技有限公司 模型生成方法、信息提取方法、装置及电子设备
CN112464831B (zh) * 2020-12-01 2021-07-30 马上消费金融股份有限公司 视频分类方法、视频分类模型的训练方法及相关设备
CN112488014B (zh) * 2020-12-04 2022-06-10 重庆邮电大学 基于门控循环单元的视频预测方法
CN112669270A (zh) * 2020-12-21 2021-04-16 北京金山云网络技术有限公司 视频质量的预测方法、装置及服务器
CN112766618B (zh) * 2020-12-25 2024-02-02 苏艺然 异常预测方法及装置
CN112804561A (zh) * 2020-12-29 2021-05-14 广州华多网络科技有限公司 视频插帧方法、装置、计算机设备及存储介质
CN112799547B (zh) * 2021-01-26 2023-04-07 广州创知科技有限公司 红外触摸屏的触摸定位方法、模型训练方法、装置、设备及介质
CN113011562A (zh) * 2021-03-18 2021-06-22 华为技术有限公司 一种模型训练方法及装置
CN113163121A (zh) * 2021-04-21 2021-07-23 安徽清新互联信息科技有限公司 一种视频防抖方法及可读存储介质
CN113268631B (zh) * 2021-04-21 2024-04-19 北京点众快看科技有限公司 一种基于大数据的视频筛选方法和装置
CN113536939B (zh) * 2021-06-18 2023-02-10 西安电子科技大学 一种基于3d卷积神经网络的视频去重方法
CN113473026B (zh) * 2021-07-08 2023-04-07 厦门四信通信科技有限公司 一种摄像头的日夜切换方法、装置、设备和存储介质
CN113449700B (zh) * 2021-08-30 2021-11-23 腾讯科技(深圳)有限公司 视频分类模型的训练、视频分类方法、装置、设备及介质
CN113822382B (zh) * 2021-11-22 2022-02-15 平安科技(深圳)有限公司 基于多模态特征表示的课程分类方法、装置、设备及介质
CN114064973B (zh) * 2022-01-11 2022-05-03 人民网科技(北京)有限公司 视频新闻分类模型建立方法、分类方法、装置及设备
CN115119013B (zh) * 2022-03-26 2023-05-05 浙江九鑫智能科技有限公司 多级数据机控应用系统
CN117351463A (zh) * 2022-06-28 2024-01-05 魔门塔(苏州)科技有限公司 参数检测方法和设备
CN115205768B (zh) * 2022-09-16 2023-01-31 山东百盟信息技术有限公司 一种基于分辨率自适应网络的视频分类方法
CN117456308A (zh) * 2023-11-20 2024-01-26 脉得智能科技(无锡)有限公司 一种模型训练方法、视频分类方法及相关装置

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170178346A1 (en) * 2015-12-16 2017-06-22 High School Cube, Llc Neural network architecture for analyzing video data
CN107330362B (zh) * 2017-05-25 2020-10-09 北京大学 一种基于时空注意力的视频分类方法
CN107341462A (zh) * 2017-06-28 2017-11-10 电子科技大学 一种基于注意力机制的视频分类方法
CN108805259A (zh) * 2018-05-23 2018-11-13 北京达佳互联信息技术有限公司 神经网络模型训练方法、装置、存储介质及终端设备
CN108899075A (zh) * 2018-06-28 2018-11-27 众安信息技术服务有限公司 一种基于深度学习的dsa图像检测方法、装置及设备

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104331442A (zh) * 2014-10-24 2015-02-04 华为技术有限公司 视频分类方法和装置
US20170169315A1 (en) * 2015-12-15 2017-06-15 Sighthound, Inc. Deeply learned convolutional neural networks (cnns) for object localization and classification
CN107480707A (zh) * 2017-07-26 2017-12-15 天津大学 一种基于信息无损池化的深度神经网络方法
CN109409242A (zh) * 2018-09-28 2019-03-01 东南大学 一种基于循环卷积神经网络的黑烟车检测方法
CN110070067A (zh) * 2019-04-29 2019-07-30 北京金山云网络技术有限公司 视频分类方法及其模型的训练方法、装置和电子设备

Cited By (86)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112364204A (zh) * 2020-11-12 2021-02-12 北京达佳互联信息技术有限公司 视频搜索方法、装置、计算机设备及存储介质
CN112364204B (zh) * 2020-11-12 2024-03-12 北京达佳互联信息技术有限公司 视频搜索方法、装置、计算机设备及存储介质
CN112418320A (zh) * 2020-11-24 2021-02-26 杭州未名信科科技有限公司 一种企业关联关系识别方法、装置及存储介质
CN112418320B (zh) * 2020-11-24 2024-01-19 杭州未名信科科技有限公司 一种企业关联关系识别方法、装置及存储介质
CN112597864A (zh) * 2020-12-16 2021-04-02 佳都新太科技股份有限公司 一种监控视频异常检测方法及装置
CN112560996A (zh) * 2020-12-24 2021-03-26 北京百度网讯科技有限公司 用户画像识别模型训练方法、设备、可读存储介质及产品
CN112734699A (zh) * 2020-12-24 2021-04-30 浙江大华技术股份有限公司 物品状态告警方法、装置、存储介质及电子装置
CN112560996B (zh) * 2020-12-24 2024-03-05 北京百度网讯科技有限公司 用户画像识别模型训练方法、设备、可读存储介质及产品
CN112613577B (zh) * 2020-12-31 2024-06-11 上海商汤智能科技有限公司 神经网络的训练方法、装置、计算机设备及存储介质
CN112613577A (zh) * 2020-12-31 2021-04-06 上海商汤智能科技有限公司 神经网络的训练方法、装置、计算机设备及存储介质
CN112633407A (zh) * 2020-12-31 2021-04-09 深圳云天励飞技术股份有限公司 分类模型的训练方法、装置、电子设备及存储介质
CN112633407B (zh) * 2020-12-31 2023-10-13 深圳云天励飞技术股份有限公司 分类模型的训练方法、装置、电子设备及存储介质
CN112613486A (zh) * 2021-01-07 2021-04-06 福州大学 基于多层注意力和BiGRU的专业立体视频舒适度分类方法
CN112613486B (zh) * 2021-01-07 2023-08-08 福州大学 基于多层注意力和BiGRU的专业立体视频舒适度分类方法
CN112734013A (zh) * 2021-01-07 2021-04-30 北京迈格威科技有限公司 图像处理方法、装置、电子设备及存储介质
CN112835008A (zh) * 2021-01-12 2021-05-25 西安电子科技大学 基于姿态自适应卷积网络的高分辨距离像目标识别方法
CN112835008B (zh) * 2021-01-12 2022-03-04 西安电子科技大学 基于姿态自适应卷积网络的高分辨距离像目标识别方法
CN112866156A (zh) * 2021-01-15 2021-05-28 浙江工业大学 一种基于深度学习的无线电信号聚类方法及系统
CN112749685A (zh) * 2021-01-28 2021-05-04 北京百度网讯科技有限公司 视频分类方法、设备和介质
CN112749685B (zh) * 2021-01-28 2023-11-03 北京百度网讯科技有限公司 视频分类方法、设备和介质
CN112819011A (zh) * 2021-01-28 2021-05-18 北京迈格威科技有限公司 对象间关系的识别方法、装置和电子系统
CN112954312B (zh) * 2021-02-07 2024-01-05 福州大学 一种融合时空特征的无参考视频质量评估方法
CN112954312A (zh) * 2021-02-07 2021-06-11 福州大学 一种融合时空特征的无参考视频质量评估方法
CN112950581A (zh) * 2021-02-25 2021-06-11 北京金山云网络技术有限公司 质量评估方法、装置和电子设备
CN112949460A (zh) * 2021-02-26 2021-06-11 陕西理工大学 一种基于视频的人体行为网络模型及识别方法
CN112949456B (zh) * 2021-02-26 2023-12-12 北京达佳互联信息技术有限公司 视频特征提取模型训练、视频特征提取方法和装置
CN112950579A (zh) * 2021-02-26 2021-06-11 北京金山云网络技术有限公司 图像质量评价方法、装置和电子设备
CN112950579B (zh) * 2021-02-26 2024-05-31 北京金山云网络技术有限公司 图像质量评价方法、装置和电子设备
CN112949456A (zh) * 2021-02-26 2021-06-11 北京达佳互联信息技术有限公司 视频特征提取模型训练、视频特征提取方法和装置
CN112949460B (zh) * 2021-02-26 2024-02-13 陕西理工大学 一种基于视频的人体行为网络模型及识别方法
CN113079136A (zh) * 2021-03-22 2021-07-06 广州虎牙科技有限公司 动作捕捉方法、装置、电子设备和计算机可读存储介质
CN113095372A (zh) * 2021-03-22 2021-07-09 国网江苏省电力有限公司营销服务中心 一种基于鲁棒神经网络的低压台区线损合理区间计算方法
CN113177540A (zh) * 2021-04-14 2021-07-27 北京明略软件系统有限公司 基于铁轨旁部件的定位方法和系统
CN113094933A (zh) * 2021-05-10 2021-07-09 华东理工大学 基于注意力机制的超声波损伤检测分析方法及其应用
CN113094933B (zh) * 2021-05-10 2023-08-08 华东理工大学 基于注意力机制的超声波损伤检测分析方法及其应用
CN113158971B (zh) * 2021-05-11 2024-03-08 北京易华录信息技术股份有限公司 一种事件检测模型训练方法及事件分类方法、系统
CN113158971A (zh) * 2021-05-11 2021-07-23 北京易华录信息技术股份有限公司 一种事件检测模型训练方法及事件分类方法、系统
CN113112998B (zh) * 2021-05-11 2024-03-15 腾讯音乐娱乐科技(深圳)有限公司 模型训练方法、混响效果复现方法、设备及可读存储介质
CN113112998A (zh) * 2021-05-11 2021-07-13 腾讯音乐娱乐科技(深圳)有限公司 模型训练方法、混响效果复现方法、设备及可读存储介质
CN113139956A (zh) * 2021-05-12 2021-07-20 深圳大学 基于语言知识导向的切面识别模型的生成方法及识别方法
CN113223058B (zh) * 2021-05-12 2024-04-30 北京百度网讯科技有限公司 光流估计模型的训练方法、装置、电子设备及存储介质
CN113139956B (zh) * 2021-05-12 2023-04-14 深圳大学 基于语言知识导向的切面识别模型的生成方法及识别方法
CN113223058A (zh) * 2021-05-12 2021-08-06 北京百度网讯科技有限公司 光流估计模型的训练方法、装置、电子设备及存储介质
CN113220940B (zh) * 2021-05-13 2024-02-09 北京小米移动软件有限公司 视频分类方法、装置、电子设备及存储介质
CN113220940A (zh) * 2021-05-13 2021-08-06 北京小米移动软件有限公司 视频分类方法、装置、电子设备及存储介质
CN113177529A (zh) * 2021-05-27 2021-07-27 腾讯音乐娱乐科技(深圳)有限公司 识别花屏的方法、装置、设备及存储介质
CN113177529B (zh) * 2021-05-27 2024-04-23 腾讯音乐娱乐科技(深圳)有限公司 识别花屏的方法、装置、设备及存储介质
CN113239869B (zh) * 2021-05-31 2023-08-11 西安电子科技大学 基于关键帧序列和行为信息的两阶段行为识别方法及系统
CN113239869A (zh) * 2021-05-31 2021-08-10 西安电子科技大学 基于关键帧序列和行为信息的两阶段行为识别方法及系统
CN113411425B (zh) * 2021-06-21 2023-11-07 深圳思谋信息科技有限公司 视频超分模型构建处理方法、装置、计算机设备和介质
CN113411425A (zh) * 2021-06-21 2021-09-17 深圳思谋信息科技有限公司 视频超分模型构建处理方法、装置、计算机设备和介质
CN113469249A (zh) * 2021-06-30 2021-10-01 阿波罗智联(北京)科技有限公司 图像分类模型训练方法、分类方法、路侧设备和云控平台
CN113469249B (zh) * 2021-06-30 2024-04-09 阿波罗智联(北京)科技有限公司 图像分类模型训练方法、分类方法、路侧设备和云控平台
CN113591603A (zh) * 2021-07-09 2021-11-02 北京旷视科技有限公司 证件的验证方法、装置、电子设备及存储介质
CN113469450B (zh) * 2021-07-14 2024-05-10 华润数字科技有限公司 一种数据分类方法、装置、计算机设备及存储介质
CN113469450A (zh) * 2021-07-14 2021-10-01 润联软件系统(深圳)有限公司 一种数据分类方法、装置、计算机设备及存储介质
CN113625244A (zh) * 2021-08-11 2021-11-09 青岛本原微电子有限公司 一种基于lstm的多源域的高重频雷达目标检测方法
CN113627536B (zh) * 2021-08-12 2024-01-16 北京百度网讯科技有限公司 模型训练、视频分类方法,装置,设备以及存储介质
WO2023016290A1 (fr) * 2021-08-12 2023-02-16 北京有竹居网络技术有限公司 Procédé et appareil de classification de vidéo, support lisible et dispositif électronique
CN113627536A (zh) * 2021-08-12 2021-11-09 北京百度网讯科技有限公司 模型训练、视频分类方法,装置,设备以及存储介质
CN113749668A (zh) * 2021-08-23 2021-12-07 华中科技大学 一种基于深度神经网络的可穿戴心电图实时诊断系统
CN113749668B (zh) * 2021-08-23 2022-08-09 华中科技大学 一种基于深度神经网络的可穿戴心电图实时诊断系统
CN113705686B (zh) * 2021-08-30 2023-09-15 平安科技(深圳)有限公司 图像分类方法、装置、电子设备及可读存储介质
CN113705686A (zh) * 2021-08-30 2021-11-26 平安科技(深圳)有限公司 图像分类方法、装置、电子设备及可读存储介质
CN113794900A (zh) * 2021-08-31 2021-12-14 北京达佳互联信息技术有限公司 视频处理方法和装置
CN113794900B (zh) * 2021-08-31 2023-04-07 北京达佳互联信息技术有限公司 视频处理方法和装置
CN113869182B (zh) * 2021-09-24 2024-05-31 北京理工大学 一种视频异常检测网络及其训练方法
CN113869182A (zh) * 2021-09-24 2021-12-31 北京理工大学 一种视频异常检测网络及其训练方法
CN114611584A (zh) * 2022-02-21 2022-06-10 上海市胸科医院 Cp-ebus弹性模式视频的处理方法、装置、设备与介质
CN114708531A (zh) * 2022-03-18 2022-07-05 南京大学 电梯内异常行为检测方法、装置及存储介质
CN114550310A (zh) * 2022-04-22 2022-05-27 杭州魔点科技有限公司 一种识别多标签行为的方法和装置
CN114611634B (zh) * 2022-05-11 2023-07-28 上海闪马智能科技有限公司 一种行为类型的确定方法、装置、存储介质及电子装置
CN114611634A (zh) * 2022-05-11 2022-06-10 上海闪马智能科技有限公司 一种行为类型的确定方法、装置、存储介质及电子装置
CN115205763B (zh) * 2022-09-09 2023-02-17 阿里巴巴(中国)有限公司 视频处理方法及设备
CN115205763A (zh) * 2022-09-09 2022-10-18 阿里巴巴(中国)有限公司 视频处理方法及设备
CN115695025B (zh) * 2022-11-04 2024-05-14 中国电信股份有限公司 网络安全态势预测模型的训练方法及装置
CN115695025A (zh) * 2022-11-04 2023-02-03 中国电信股份有限公司 网络安全态势预测模型的训练方法及装置
WO2024104068A1 (fr) * 2022-11-15 2024-05-23 腾讯科技(深圳)有限公司 Procédé et appareil de détection vidéo, dispositif, support de stockage et produit
CN115618282B (zh) * 2022-12-16 2023-06-06 国检中心深圳珠宝检验实验室有限公司 一种合成宝石的鉴定方法、装置及存储介质
CN115618282A (zh) * 2022-12-16 2023-01-17 国检中心深圳珠宝检验实验室有限公司 一种合成宝石的鉴定方法、装置及存储介质
CN115830516A (zh) * 2023-02-13 2023-03-21 新乡职业技术学院 一种用于电池爆燃检测的计算机神经网络图像处理方法
CN116567294A (zh) * 2023-05-19 2023-08-08 上海国威互娱文化科技有限公司 全景视频分割处理方法及系统
CN116451770A (zh) * 2023-05-19 2023-07-18 北京百度网讯科技有限公司 神经网络模型的压缩方法、训练方法、处理方法和装置
CN116451770B (zh) * 2023-05-19 2024-03-01 北京百度网讯科技有限公司 神经网络模型的压缩方法、训练方法、处理方法和装置
CN116935363B (zh) * 2023-07-04 2024-02-23 东莞市微振科技有限公司 刀具识别方法、装置、电子设备及可读存储介质
CN116935363A (zh) * 2023-07-04 2023-10-24 东莞市微振科技有限公司 刀具识别方法、装置、电子设备及可读存储介质

Also Published As

Publication number Publication date
CN110070067A (zh) 2019-07-30
CN110070067B (zh) 2021-11-12

Similar Documents

Publication Publication Date Title
WO2020221278A1 (fr) Procédé d'entraînement de modèles, procédé de classification de vidéos, appareil associé, et dispositif électronique
CN109359636B (zh) 视频分类方法、装置及服务器
CN110175580B (zh) 一种基于时序因果卷积网络的视频行为识别方法
CN109543714B (zh) 数据特征的获取方法、装置、电子设备及存储介质
CN110147711B (zh) 视频场景识别方法、装置、存储介质和电子装置
WO2020114378A1 (fr) Procédé et appareil d'identification de filigrane de vidéo, dispositif, et support de stockage
US11042798B2 (en) Regularized iterative collaborative feature learning from web and user behavior data
CN110633669B (zh) 家居环境中基于深度学习的移动端人脸属性识别方法
WO2019052301A1 (fr) Procédé de classification de vidéos, procédé de traitement d'informations et serveur
CN111522996B (zh) 视频片段的检索方法和装置
WO2020108396A1 (fr) Procédé de classement de vidéo et serveur
CN111062871A (zh) 一种图像处理方法、装置、计算机设备及可读存储介质
CN113850162B (zh) 一种视频审核方法、装置及电子设备
JP7089045B2 (ja) メディア処理方法、その関連装置及びコンピュータプログラム
WO2020238353A1 (fr) Procédé et appareil de traitement de données, support de stockage et dispositif électronique
CN112508094A (zh) 垃圾图片的识别方法、装置及设备
WO2021138855A1 (fr) Procédé d'instruction de modèle, procédé et appareil de traitement de vidéos, support de stockage et dispositif électronique
CN113469289B (zh) 视频自监督表征学习方法、装置、计算机设备和介质
CN112488218A (zh) 图像分类方法、图像分类模型的训练方法和装置
US11948359B2 (en) Video processing method and apparatus, computing device and medium
CN114282047A (zh) 小样本动作识别模型训练方法、装置、电子设备及存储介质
CN115203471B (zh) 一种基于注意力机制的多模融合视频推荐方法
WO2023123923A1 (fr) Procédé d'identification de poids de corps humain, dispositif d'identification de poids de corps humain, dispositif informatique, et support
CN112581355A (zh) 图像处理方法、装置、电子设备和计算机可读介质
CN115705706A (zh) 视频处理方法、装置、计算机设备和存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20799318

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 180222)

122 Ep: pct application non-entry in european phase

Ref document number: 20799318

Country of ref document: EP

Kind code of ref document: A1