CN110096617A - Video classification methods, device, electronic equipment and computer readable storage medium - Google Patents
- Publication number: CN110096617A (application CN201910357559.2A)
- Authority: CN (China)
- Prior art keywords: sequence, feature, video, sub-feature, vector
- Prior art date
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G — PHYSICS
  - G06 — COMPUTING; CALCULATING OR COUNTING
    - G06F — ELECTRIC DIGITAL DATA PROCESSING
      - G06F16/00 — Information retrieval; Database structures therefor; File system structures therefor
        - G06F16/70 — Information retrieval of video data
          - G06F16/75 — Clustering; Classification
    - G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
      - G06N3/00 — Computing arrangements based on biological models
        - G06N3/02 — Neural networks
          - G06N3/04 — Architecture, e.g. interconnection topology
            - G06N3/045 — Combinations of networks
Abstract
The present invention provides a video classification method, an apparatus, an electronic device, and a computer-readable storage medium. The method comprises: obtaining a first feature sequence of a video to be classified, wherein the features in the first feature sequence are arranged in chronological order; inputting the first feature sequence into a target pyramid attention network, and obtaining a first output result of the target pyramid attention network; obtaining a target vector according to the first output result; and classifying the video to be classified according to the target vector. Compared with the prior art, embodiments of the present invention can markedly improve the efficiency of video classification. Moreover, because the target pyramid attention network uses an attention mechanism, it can extract and fuse the most informative features of the video for classification, which helps ensure the accuracy of the classification results.
Description
Technical field
Embodiments of the present invention relate to the field of video classification, and in particular to a video classification method, an apparatus, an electronic device, and a computer-readable storage medium.
Background
Video classification is one of the most important and fundamental tasks in computer vision. It refers to assigning a video to a predefined category by analyzing and understanding information related to the video. Video classification plays a key role in application scenarios such as video search and video recommendation, and video techniques such as video tagging, video surveillance, and video title generation depend heavily on it.
At present, a common approach to video classification is to feed all frames of the video directly into a device configured for video classification and obtain the classification result output by the device. With this approach, every frame of the video must be analyzed, so classification efficiency is very low.
Summary of the invention
Embodiments of the present invention provide a video classification method, an apparatus, an electronic device, and a computer-readable storage medium, so as to solve the problem of the low classification efficiency of existing video classification approaches.
To solve the above technical problem, the present invention is implemented as follows:
In a first aspect, an embodiment of the present invention provides a video classification method, the method comprising:
obtaining a first feature sequence of a video to be classified, wherein the features in the first feature sequence are arranged in chronological order;
inputting the first feature sequence into a target pyramid attention network, and obtaining a first output result of the target pyramid attention network;
obtaining a target vector according to the first output result;
classifying the video to be classified according to the target vector.
In a second aspect, an embodiment of the present invention provides a video classification apparatus, the apparatus comprising:
a first obtaining module, configured to obtain a first feature sequence of a video to be classified, wherein the features in the first feature sequence are arranged in chronological order;
a second obtaining module, configured to input the first feature sequence into a target pyramid attention network and obtain a first output result of the target pyramid attention network;
a third obtaining module, configured to obtain a target vector according to the first output result;
a classification module, configured to classify the video to be classified according to the target vector.
In a third aspect, an embodiment of the present invention provides an electronic device, comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the above video classification method.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the above video classification method.
In the embodiments of the present invention, to classify a video, the first feature sequence of the video to be classified can first be input into the target pyramid attention network; a target vector is then obtained according to the first output result of the target pyramid attention network, and the video to be classified is finally classified according to the target vector. Thus, in the embodiments of the present invention, video classification can be achieved using the first feature sequence of the video to be classified together with the target pyramid attention network. Compared with the prior-art case in which every frame of the video to be classified must be analyzed, the embodiments of the present invention can markedly improve classification efficiency. Moreover, because the target pyramid attention network uses an attention mechanism, it can extract and fuse the most informative features of the video for classification, which helps ensure the accuracy of the classification results.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present invention, and those of ordinary skill in the art can derive other drawings from these drawings without creative effort.
Fig. 1 is a flowchart of a video classification method provided by an embodiment of the present invention;
Fig. 2 is a first schematic diagram of a video classification method provided by an embodiment of the present invention;
Fig. 3 is a sequence diagram of the use of a charging plug;
Fig. 4 is a second schematic diagram of a video classification method provided by an embodiment of the present invention;
Fig. 5 is a third schematic diagram of a video classification method provided by an embodiment of the present invention;
Fig. 6 is a structural block diagram of a video classification apparatus provided by an embodiment of the present invention;
Fig. 7 is a structural schematic diagram of an electronic device provided by an embodiment of the present invention.
Detailed description
The technical solutions in the embodiments of the present invention will be described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
The video classification method provided by the embodiments of the present invention is described first.
It should be noted that the video classification method provided by the embodiments of the present invention is applied to an electronic device. Specifically, the electronic device may be a server; of course, the type of electronic device is not limited to a server, and it may be any other type of device capable of performing video classification. The embodiments of the present invention place no restriction on the type of electronic device.
Referring to Fig. 1, a flowchart of the video classification method provided by an embodiment of the present invention is shown. As shown in Fig. 1, the method includes the following steps:
Step 101: obtain a first feature sequence of a video to be classified, wherein the features in the first feature sequence are arranged in chronological order.
In step 101, the electronic device may use a model based on a convolutional neural network (Convolutional Neural Network, CNN) to extract the key features of the video, thereby obtaining the first feature sequence of the video to be classified, in which the features may be arranged from earliest to latest. It should be understood that a CNN is a feedforward neural network (Feedforward Neural Network, FNN) that involves convolution computation and has a deep structure, and is one of the representative algorithms of deep learning.
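The patent does not give code for this step; purely as an illustrative sketch, the per-frame extraction could look as follows, with the CNN backbone stood in by any callable that maps a frame to a feature vector (the function and parameter names here are hypothetical, not from the patent):

```python
import numpy as np

def extract_feature_sequence(frames, cnn_features, sample_rate=1):
    """Build a time-ordered first feature sequence from video frames.

    frames: list of frames in temporal order (earliest first).
    cnn_features: callable mapping one frame -> 1-D feature vector,
                  standing in for a pretrained CNN backbone.
    sample_rate: keep every sample_rate-th frame to avoid analyzing
                 all frames of the video.
    """
    sampled = frames[::sample_rate]             # preserves temporal order
    feats = [cnn_features(f) for f in sampled]  # one feature per kept frame
    return np.stack(feats)                      # shape (T, D)
```

Sampling frames rather than using all of them is what gives the efficiency gain over the prior-art approach described in the Background.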
Step 102: input the first feature sequence into the target pyramid attention network, and obtain the first output result of the target pyramid attention network.
Here, the target pyramid attention network may include only one type of pyramid attention network; for example, it may include only a temporal pyramid attention network or only a channel pyramid attention network. Alternatively, the target pyramid attention network may include at least two types of pyramid attention network; for example, it may include both a temporal pyramid attention network and a channel pyramid attention network.
If the target pyramid attention network includes at least two types of pyramid attention network, then in step 102 the first feature sequence may be input into each type of pyramid attention network separately, so as to obtain the first output result of each type of pyramid attention network, and the subsequent step 103 is performed according to the first output result of each type of pyramid attention network.
Step 103: obtain a target vector according to the first output result.
Here, the target vector is a vector that can represent the features of the entire video to be classified. It should be noted that there are various specific ways to obtain the target vector from the first output result; for clarity of presentation, examples are given later.
Step 104: classify the video to be classified according to the target vector.
It should be noted that there may be K video classes in total in the embodiments of the present invention, denoted B1, B2, ..., BK, where K is an integer greater than 1. After the video to be classified is classified according to the target vector, the classification result obtained by the electronic device may include K probability values, denoted G1, G2, ..., GK, where G1 is the probability that the video to be classified belongs to class B1, G2 is the probability that it belongs to class B2, ..., and GK is the probability that it belongs to class BK.
In step 104, if single-label classification is performed on the video to be classified, then the sum of the K probability values G1, G2, ..., GK is 1; if multi-label classification is performed, then the sum of G1, G2, ..., GK may or may not be 1.
In the embodiments of the present invention, to classify a video, the first feature sequence of the video to be classified can first be input into the target pyramid attention network; a target vector is then obtained according to the first output result of the target pyramid attention network, and the video to be classified is finally classified according to the target vector. Thus, in the embodiments of the present invention, video classification can be achieved using the first feature sequence of the video to be classified together with the target pyramid attention network. Compared with the prior-art case in which every frame of the video to be classified must be analyzed, the embodiments of the present invention can markedly improve classification efficiency. Moreover, because the target pyramid attention network uses an attention mechanism, it can extract and fuse the most informative features of the video for classification, which helps ensure the accuracy of the classification results.
Optionally, classifying the video to be classified according to the target vector comprises:
inputting the target vector into a fully connected network, so as to obtain the classification result of the video to be classified, output by the fully connected network.
Here, the fully connected network may be regarded as a classification model trained in advance and stored locally on the electronic device, where the classification model may be trained with the target vectors of a large number of videos as input and the types of those videos as output. Specifically, the classification model may be trained by the electronic device itself, or it may be distributed to the electronic device after being trained by another device.
In this embodiment, the classification result of the video to be classified can be obtained simply by inputting the target vector into the fully connected network, so the operation of obtaining the classification result is very convenient to implement.
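The patent describes the fully connected network only as a pre-trained classifier over the target vector. Purely to illustrate its shape, a minimal one-layer sketch with random placeholder weights (in practice these would come from training; all names here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

class FullyConnectedHead:
    """One fully connected layer mapping a D-dim target vector
    to K class probabilities, followed by a softmax."""

    def __init__(self, dim, num_classes):
        # Random placeholders standing in for trained parameters.
        self.W = rng.standard_normal((num_classes, dim)) * 0.01
        self.b = np.zeros(num_classes)

    def __call__(self, target_vector):
        logits = self.W @ target_vector + self.b
        e = np.exp(logits - logits.max())
        return e / e.sum()   # K probabilities summing to 1
```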
Optionally, the target pyramid attention network is a temporal pyramid attention network.
The first output result includes M feature-sequence sets of different time scales, where each feature-sequence set consists of the second feature sequences obtained by dividing the first feature sequence according to the corresponding time scale, the second feature sequences in each set are arranged in chronological order, the features in each second feature sequence are arranged in chronological order, and M is an integer greater than 1.
Here, the value of M may be 2, 3, 4, 5, 6, or an integer greater than 6; the possibilities are not enumerated one by one here. In addition, since the time scales of the feature-sequence sets differ, the number of second feature sequences in each set may differ.
Suppose the first feature sequence of the video to be classified is X(1) in Fig. 2, which contains the features x1, x2, x3, x4, x5, x6, x7, and x8 arranged in chronological order. After X(1) is input into the temporal pyramid attention network, the first output result of the network may include 3 feature-sequence sets; that is, the value of M is 3, and the temporal pyramid attention network can be considered to have 3 pyramid levels, e.g., level 1, level 2, and level 3 in Fig. 2, where level 1, level 2, and level 3 correspond to feature-sequence sets of different time scales.
Specifically, the feature-sequence set corresponding to level 1 may consist of one second feature sequence, namely X(1) itself. The feature-sequence set corresponding to level 2 may consist of the two second feature sequences X(2)1 and X(2)2, where X(2)1 contains x1, x2, x3, and x4 arranged in chronological order, and X(2)2 contains x5, x6, x7, and x8 arranged in chronological order. The feature-sequence set corresponding to level 3 may consist of the four second feature sequences X(3)1, X(3)2, X(3)3, and X(3)4, where X(3)1 contains x1 and x2, X(3)2 contains x3 and x4, X(3)3 contains x5 and x6, and X(3)4 contains x7 and x8, each arranged in chronological order.
It is easy to see that the feature-sequence set corresponding to level 1 is obtained by keeping X(1) whole in time, the set corresponding to level 2 is obtained by dividing X(1) into two equal parts in time, and the set corresponding to level 3 is obtained by dividing X(1) into four equal parts in time.
In this way, the electronic device can obtain a first output result that includes the feature-sequence set corresponding to level 1, the set corresponding to level 2, and the set corresponding to level 3. Next, the target vector can be obtained according to the first output result, and the video to be classified can be classified according to the target vector.
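The level-1/2/3 division described above (whole sequence, halves, quarters) can be sketched as follows; the function name is a hypothetical label, not from the patent:

```python
def temporal_pyramid(sequence, num_levels=3):
    """Split a time-ordered feature sequence into M time scales.

    Level m (1-based) cuts the sequence into 2**(m-1) equal,
    time-ordered segments, mirroring X(1), X(2)1..X(2)2, X(3)1..X(3)4.
    """
    levels = []
    for m in range(num_levels):
        parts = 2 ** m
        size = len(sequence) // parts
        levels.append([sequence[i * size:(i + 1) * size]
                       for i in range(parts)])
    return levels
```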
It should be noted that if, when classifying a video, the temporal order of the video is ignored entirely, all features are placed in one unordered set and treated equally, and the temporal correlation between features is completely lost. This is effective in some scenarios but ineffective in others. For example, as shown in Fig. 3, if all key features are placed unordered in one group, it is impossible to distinguish whether the user's action is inserting a charging plug into a socket or pulling the charging plug out of the socket.
In view of this, in this embodiment, a temporal pyramid attention network can be used: the first feature sequence of the video to be classified is first divided into several second feature sequences at each time scale, and the feature-sequence set of each time scale is then obtained, with the second feature sequences in each set arranged in chronological order and the features in each second feature sequence arranged in chronological order. In this way, temporal order can be introduced into an otherwise unordered attention mechanism, which effectively addresses video classification problems with strong temporal dependencies. Thus, this embodiment is applicable not only to video classification in weakly time-dependent scenarios, but also to video classification in strongly time-dependent scenarios.
Optionally, the larger the value of M, the longer the duration of the video to be classified.
Specifically, a correspondence between video duration ranges and values of M may be stored in advance on the electronic device; for example, the duration range of 10 to 15 minutes may correspond to the value 5, the range of 5 to 10 minutes to the value 4, and the range of 0 to 5 minutes to the value 3.
Then, if the duration of the video to be classified falls within the 10-to-15-minute range, the first output result may include 5 feature-sequence sets of different time scales, and the temporal pyramid attention network has 5 pyramid levels. If the duration falls within the 0-to-5-minute range, the first output result may include 3 feature-sequence sets of different time scales, and the temporal pyramid attention network has 3 pyramid levels.
Thus, in this embodiment, when the temporal pyramid attention network is used, the number of pyramid levels is not fixed: it can be adjusted flexibly according to the duration of the video to be classified, so that the number of levels matches the video duration, thereby ensuring both classification efficiency and classification quality.
Optionally, obtaining the target vector according to the first output result comprises:
inputting each second feature sequence in the first output result into a channel pyramid attention network, so as to obtain the second output result corresponding to each second feature sequence, output by the channel pyramid attention network;
obtaining the target vector according to the second output result corresponding to each second feature sequence;
wherein the second output result corresponding to any second feature sequence includes N sub-feature-sequence sets of different feature granularities, each sub-feature-sequence set consists of the sub-feature sequences obtained by splitting the second feature sequence according to the corresponding feature granularity, the sub-features in each sub-feature sequence are arranged in chronological order, and N is an integer greater than 1.
Here, the value of N may be 2, 3, 4, 5, 6, or an integer greater than 6; the possibilities are not enumerated one by one here. In addition, the values of M and N may be the same or different.
Suppose the channel pyramid attention network is denoted CPAtt. As shown in Fig. 2, after the first output result including the 7 second feature sequences X(1), X(2)1, X(2)2, X(3)1, X(3)2, X(3)3, and X(3)4 is obtained, the electronic device may input each of these 7 second feature sequences into CPAtt, so as to obtain the 7 second output results corresponding to the 7 second feature sequences, output by CPAtt.
Suppose one of the above 7 second feature sequences can also be denoted X(1) in Fig. 4, and X(1) contains the features x1, x2, ..., xL arranged in chronological order. After X(1) is input into the channel pyramid attention network, the second output result of the network may include 3 sub-feature-sequence sets of different feature granularities; that is, the value of N is 3, and the channel pyramid attention network can be considered to have 3 pyramid levels, e.g., level 1, level 2, and level 3 in Fig. 4, where level 1, level 2, and level 3 correspond to sub-feature-sequence sets of different feature granularities.
Specifically, the sub-feature-sequence set corresponding to level 1 consists of one sub-feature sequence, namely X(1) itself. The sub-feature-sequence set corresponding to level 2 may consist of the two sub-feature sequences X(2)1 and X(2)2, where X(2)1 contains the first of the two sub-features obtained by splitting x1, the first of the two sub-features obtained by splitting x2, ..., and the first of the two sub-features obtained by splitting xL, while X(2)2 contains the second of the two sub-features obtained by splitting each of x1, x2, ..., xL. The sub-feature-sequence set corresponding to level 3 may consist of the four sub-feature sequences X(3)1, X(3)2, X(3)3, and X(3)4, where X(3)1 contains the first of the four sub-features obtained by splitting each of x1, x2, ..., xL, X(3)2 contains the second of the four sub-features obtained by splitting each of x1, x2, ..., xL, and the contents of X(3)3 and X(3)4 follow by analogy and are not detailed here.
It should be noted that the contents of the second output results corresponding to the other second feature sequences can be understood with reference to the above description and are not detailed here. Afterwards, the target vector can be obtained according to the second output result corresponding to each second feature sequence.
In a specific embodiment, any second output result further includes the weight corresponding to each sub-feature in each of its sub-feature sequences.
Obtaining the target vector according to the second output result corresponding to each second feature sequence then comprises:
for each sub-feature sequence in each second output result, computing a weighted sum of its sub-features with the corresponding weights, to obtain the corresponding feature vector;
concatenating the feature vectors corresponding to all the sub-feature sequences, to obtain a concatenated vector;
taking the concatenated vector as the target vector.
Specifically, for the sub-feature sequence X(2)1 above, suppose its sub-features are x11, x21, ..., xL1 in order, where x11, x21, ..., xL1 are all vectors, the weight corresponding to x11 is z1, the weight corresponding to x21 is z2, ..., and the weight corresponding to xL1 is zL. Then the feature vector y corresponding to X(2)1 can be computed with the following formula:
y = z1·x11 + z2·x21 + ... + zL·xL1
It should be noted that the feature vectors corresponding to the other sub-feature sequences are computed in the same way as described above for X(2)1 and are not detailed here. After the feature vectors corresponding to all the sub-feature sequences are obtained, these feature vectors can be concatenated to obtain the concatenated vector used as the target vector.
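The weighted sum y = z1·x11 + z2·x21 + ... + zL·xL1 and the subsequent concatenation can be sketched directly (function names hypothetical):

```python
import numpy as np

def attention_pool(subfeatures, weights):
    """Weighted sum y = z1*x11 + z2*x21 + ... + zL*xL1
    over one sub-feature sequence."""
    return sum(w * x for w, x in zip(weights, subfeatures))

def concat_pooled(groups):
    """Concatenate the pooled vectors of several sub-feature
    sequences, each given as (subfeatures, weights)."""
    return np.concatenate([attention_pool(x, w) for x, w in groups])
```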
It should be noted that Att in Fig. 4 can be regarded as the computation of a feature vector, and Concat in Fig. 2 and Fig. 4 can be regarded as the concatenation of vectors. As shown in Fig. 4, the feature vector corresponding to X(2)1 and the feature vector corresponding to X(2)2 can be concatenated to obtain a first concatenated vector, e.g., y(2) in Fig. 4; the feature vectors corresponding to X(3)1, X(3)2, X(3)3, and X(3)4 can be concatenated to obtain a second concatenated vector, e.g., y(3) in Fig. 4. Next, the feature vector corresponding to X(1) (e.g., y(1) in Fig. 4), the first concatenated vector, and the second concatenated vector are concatenated to obtain a third concatenated vector, which corresponds to one of the above 7 second feature sequences.
Afterwards, the 6 third concatenated vectors corresponding to the other 6 second feature sequences can be obtained in a similar way; in other words, the 7 third concatenated vectors corresponding to the 7 second feature sequences X(1), X(2)1, X(2)2, X(3)1, X(3)2, X(3)3, and X(3)4 can finally be obtained. At this point, as shown in Fig. 2, the third concatenated vector corresponding to X(2)1 and the third concatenated vector corresponding to X(2)2 can be concatenated to obtain a fourth concatenated vector, e.g., y(2) in Fig. 2; the third concatenated vectors corresponding to X(3)1, X(3)2, X(3)3, and X(3)4 can be concatenated to obtain a fifth concatenated vector, e.g., y(3) in Fig. 2. Next, the third concatenated vector corresponding to X(1) (e.g., y(1) in Fig. 2), the fourth concatenated vector, and the fifth concatenated vector are concatenated to obtain the concatenated vector used as the target vector.
It should be noted that, when classifying a video, the electronic device could directly compute one weight per feature and use those weights for classification. In many cases, however, only some of the channels in the video to be classified contribute to classification. For example, as shown in Fig. 5, the video to be classified may include two frames, Frame1 and Frame2, both of which contribute to classification, but the important channels of the two frames are quite different: the important channels of Frame1 correspond to the region enclosed by rectangle 510, and the important channels of Frame2 correspond to the region enclosed by rectangle 520. On the basis of Fig. 5, if one weight is assigned to the entire feature of each frame, as shown in the lower-left corner of Fig. 5, only relatively balanced weights can be given to the two frames; for example, the weights assigned to the two features Feature1 and Feature2 may both be 0.5, in which case the weight of the irrelevant noise is also 0.5, the important channels of the two features are weakened after the weighted average, and the accuracy of the classification suffers.
In view of this, in this embodiment, the channel pyramid attention network can be used to progressively split each feature, from coarse to fine, into several sub-features and assign a corresponding weight to each sub-feature. In this way, as shown in the lower-right corner of Fig. 5, the weight of the important part of each feature can be set to 1.0 and the weight of the unimportant part to 0.0; for example, the weight of the sub-feature in the upper half of Feature1 can be set to 1.0 and the weight of the sub-feature in its lower half to 0.0, while the weight of the sub-feature in the upper half of Feature2 can be set to 0.0 and the weight of the sub-feature in its lower half to 1.0. After the subsequent weighting, the important channel information is fully retained, which helps obtain more accurate classification results. Thus, in this embodiment, using the channel pyramid attention network can effectively ensure the accuracy of the classification results.
Optionally, the larger the value of N, the longer the video length of the video to be classified.
Specifically, a correspondence between video length ranges and values of N can be stored in the electronic device in advance. For example, the video length range of 10 to 15 minutes may correspond to the value 5, the range of 5 to 10 minutes to the value 4, and the range of 0 to 5 minutes to the value 3.
Then, when the video length of the video to be classified falls within the 10-to-15-minute range, each second output result may include 5 sub-feature sequence sets of different feature granularities, in which case the channel pyramid of the channel pyramid attention network has 5 levels. When the video length falls within the 0-to-5-minute range, each second output result may include 3 sub-feature sequence sets of different feature granularities, in which case the channel pyramid has 3 levels.
As it can be seen that when using channel pyramid attention network, the pyramidal level in channel is not complete in the present embodiment
Changeless, the pyramidal level in channel can neatly be adjusted according to the length of the video length of video to be sorted,
So that the pyramidal level in channel and the video length of video to be sorted match, to guarantee classification effectiveness and classification effect
Fruit.
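The stored correspondence above can be sketched as a simple lookup. The function name is hypothetical and the ranges merely mirror the example values given here:

```python
def channel_pyramid_levels(video_length_min: float) -> int:
    """Pick the number of channel-pyramid levels N from the video length.

    Mirrors the example correspondence above (0-5 min -> 3, 5-10 min -> 4,
    10-15 min -> 5); a real system would store such a table in the
    electronic device in advance.
    """
    if video_length_min <= 5:
        return 3
    if video_length_min <= 10:
        return 4
    return 5

print(channel_pyramid_levels(3))    # 3 levels for a short video
print(channel_pyramid_levels(12))   # 5 levels for a long video
```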
Optionally, the number of first feature sequences is at least two, and each first feature sequence corresponds to a different feature type.
Here, the number of first feature sequences may be two, three, four or more; the possibilities are not enumerated one by one.
In a specific implementation, the at least two first feature sequences may include a first target feature sequence, a second target feature sequence and a third target feature sequence, wherein the feature type corresponding to the first target feature sequence is the image feature type, the feature type corresponding to the second target feature sequence is the optical flow feature type, and the feature type corresponding to the third target feature sequence is the speech feature type.
In another specific implementation, the at least two first feature sequences may include only the first target feature sequence and the second target feature sequence, wherein the feature type corresponding to the first target feature sequence is any one of the image feature type, the optical flow feature type and the speech feature type, and the feature type corresponding to the second target feature sequence is any one of the image feature type, the optical flow feature type and the speech feature type.
It should be noted that first feature sequences of different feature types can be regarded as features of different modalities of the video to be classified. Performing video classification with at least two first feature sequences therefore realizes multi-modal fusion, which improves the robustness and precision of the classification.
As it can be seen that in the present embodiment, it can be based on the multi-modal feature of video to be sorted, with time pyramid attention network
With both pyramid attention networks of channel pyramid attention network, the classification of Lai Jinhang video.It specifically, can be first
The key features such as characteristics of image, Optical-flow Feature and the phonetic feature of model extraction video based on convolutional neural networks are first used,
Then the fisrt feature sequence of various characteristic types is successively passed through into time time pyramid attention network and channel pyramid
The feature of various characteristic types is connected fusion again later by attention network, obtains the feature for representing entire video to be sorted
Object vector is classified finally by a fully-connected network, so that it is possible in each classification to obtain video to be sorted
Probability is so far achieved that the classification of video.
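As a rough, non-authoritative sketch of this pipeline: the toy code below substitutes a single attention-weighted pooling step for both pyramid attention networks and random arrays for the CNN-extracted features; all names, shapes and the scoring rule are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(seq: np.ndarray) -> np.ndarray:
    """Placeholder for the pyramid attention stages: score each time step,
    then return the attention-weighted sum of the sequence (T x D -> D)."""
    scores = softmax(seq.sum(axis=1))     # toy per-step attention scores
    return scores @ seq

# First feature sequences of three modalities (T time steps x D dims each),
# standing in for CNN-extracted image, optical flow and speech features.
modalities = {
    "image":        rng.standard_normal((8, 16)),
    "optical_flow": rng.standard_normal((8, 16)),
    "speech":       rng.standard_normal((8, 16)),
}

# Pool each modality with the (placeholder) attention stage, then
# concatenate the per-modality vectors into the target vector.
target_vector = np.concatenate([attention_pool(s) for s in modalities.values()])

# Final fully-connected layer: per-category probabilities for the video.
num_classes = 4
W = rng.standard_normal((num_classes, target_vector.size))
probs = softmax(W @ target_vector)
print(probs.shape)    # (4,)
```

The video's predicted category would be `probs.argmax()`; the real networks replace `attention_pool` with the learned time and channel pyramid attention described above.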
By the above means, using the time pyramid attention network can overcome the weakness of the prior art that timing information is not considered, and using the channel pyramid attention network can improve the overall classification accuracy and classification efficiency. In this way, the video classification method in the present embodiment obtains good results in single-label, multi-label, short-video, long-video, weak-temporal-dependency and strong-temporal-dependency video classification scenarios. Moreover, applying this method can reduce the training and tuning time for different classification scenarios; the overall process is more concise and intelligent, saving labor costs.
The video classification device provided in an embodiment of the present invention is described below.
Referring to Fig. 6, a structural block diagram of the video classification device 600 provided in an embodiment of the present invention is shown. As shown in Fig. 6, the video classification device 600 includes:
a first obtaining module 601, configured to obtain a first feature sequence of a video to be classified, wherein the features in the first feature sequence are arranged in chronological order;
a second obtaining module 602, configured to input the first feature sequence into a target pyramid attention network to obtain a first output result output by the target pyramid attention network;
a third obtaining module 603, configured to obtain a target vector according to the first output result; and
a classification module 604, configured to classify the video to be classified according to the target vector.
Optionally, the target pyramid attention network is a time pyramid attention network.
The first output result includes M feature sequence sets of different time scales; each feature sequence set is composed of second feature sequences obtained by dividing the first feature sequence according to the corresponding time scale; the second feature sequences in each feature sequence set are arranged in chronological order; the features in each second feature sequence are arranged in chronological order; and M is an integer greater than 1.
Optionally, the larger the value of M, the longer the video length of the video to be classified.
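The division of a first feature sequence into M feature sequence sets of progressively finer time scale can be sketched as follows, assuming, purely for illustration, that scale m cuts the sequence into m chronological segments:

```python
import numpy as np

def time_pyramid_split(first_seq: np.ndarray, M: int) -> list:
    """Divide a first feature sequence (T x D) into M feature sequence sets.

    At scale m (1-based), the sequence is cut into m chronological second
    feature sequences of roughly equal length, so the sets range from one
    coarse whole-sequence view down to M fine segments.
    """
    return [list(np.array_split(first_seq, m)) for m in range(1, M + 1)]

first_seq = np.arange(24).reshape(12, 2)       # 12 time steps, 2-dim features
sets = time_pyramid_split(first_seq, M=3)

print(len(sets))                # 3 feature sequence sets
print([len(s) for s in sets])   # [1, 2, 3] second feature sequences per set
```

Each inner list preserves chronological order, matching the requirement that the second feature sequences in each set, and the features within each second feature sequence, are arranged in time order.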
Optionally, the third obtaining module 603 includes:
a first obtaining unit, configured to input each second feature sequence in the first output result into a channel pyramid attention network, respectively, so as to obtain the second output result corresponding to each second feature sequence, output by the channel pyramid attention network; and
a second obtaining unit, configured to obtain a target vector according to the second output result corresponding to each second feature sequence;
wherein the second output result corresponding to any second feature sequence includes N sub-feature sequence sets of different feature granularities, each sub-feature sequence set is composed of sub-feature sequences obtained by splitting the second feature sequence according to the corresponding feature granularity, the sub-features in each sub-feature sequence are arranged in chronological order, and N is an integer greater than 1.
Optionally, the larger the value of N, the longer the video length of the video to be classified.
Optionally, any second output result further includes the weight corresponding to each sub-feature in each sub-feature sequence included in it.
The second obtaining unit includes:
a first obtaining subunit, configured to perform, for each sub-feature sequence in each second output result, a weighted summation according to the sub-features therein and the corresponding weights, to obtain a corresponding feature vector;
a second obtaining subunit, configured to perform a splicing operation according to the feature vectors corresponding to all the sub-feature sequences, to obtain a spliced vector; and
a determining subunit, configured to use the spliced vector as the target vector.
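A minimal sketch of the weighted summation and splicing performed by these subunits; the toy sub-feature sequences and weights below are assumptions for illustration only:

```python
import numpy as np

def fuse_subfeature_sequences(subseqs: list, weights: list) -> np.ndarray:
    """Weighted-sum each sub-feature sequence into one feature vector, then
    concatenate (splice) all vectors into the target vector."""
    vectors = [w @ s for s, w in zip(subseqs, weights)]   # (L,) @ (L, D) -> (D,)
    return np.concatenate(vectors)

# Two sub-feature sequences (length x dim) with per-sub-feature weights.
subseqs = [np.ones((3, 2)), 2 * np.ones((4, 2))]
weights = [np.array([0.5, 0.25, 0.25]), np.full(4, 0.25)]

target = fuse_subfeature_sequences(subseqs, weights)
print(target)   # [1. 1. 2. 2.]
```

The spliced vector `target` plays the role of the target vector handed to the classification module.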
Optionally, the classification module 604 is specifically configured to input the target vector into a fully-connected network, so as to obtain the classification result of the video to be classified, output by the fully-connected network.
Optionally, the number of first feature sequences is at least two, and each first feature sequence corresponds to a different feature type.
Optionally, the at least two first feature sequences include a first target feature sequence, a second target feature sequence and a third target feature sequence, wherein the feature type corresponding to the first target feature sequence is the image feature type, the feature type corresponding to the second target feature sequence is the optical flow feature type, and the feature type corresponding to the third target feature sequence is the speech feature type.
In the embodiment of the present invention, in order to classify a video, the first feature sequence of the video to be classified can first be input into a target pyramid attention network; a target vector is then obtained according to the first output result output by the target pyramid attention network; finally, the video to be classified is classified according to the target vector. It can be seen that, in the embodiment of the present invention, the classification of the video can be realized with the first feature sequence of the video to be classified and the target pyramid attention network. Compared with the prior art, in which all frames in the video to be classified must be analyzed, the embodiment of the present invention can effectively improve the classification efficiency of the video. Moreover, the target pyramid attention network can extract and fuse the most effective features of the video in an attention-based manner for video classification, which better guarantees the accuracy of the classification results.
The electronic device provided in an embodiment of the present invention is described below.
Referring to Fig. 7, a structural schematic diagram of the electronic device 700 provided in an embodiment of the present invention is shown. As shown in Fig. 7, the electronic device 700 includes: a processor 701, a memory 703, a user interface 704 and a bus interface.
The processor 701 is configured to read the program in the memory 703 and execute the following process:
obtaining a first feature sequence of a video to be classified, wherein the features in the first feature sequence are arranged in chronological order;
inputting the first feature sequence into a target pyramid attention network, and obtaining a first output result output by the target pyramid attention network;
obtaining a target vector according to the first output result; and
classifying the video to be classified according to the target vector.
In Fig. 7, the bus architecture may include any number of interconnected buses and bridges, linking together various circuits, specifically one or more processors represented by the processor 701 and the memory represented by the memory 703. The bus architecture may also link together various other circuits, such as peripheral devices, voltage regulators and power management circuits; all of these are well known in the art and are therefore not further described herein. The bus interface provides an interface. For different user devices, the user interface 704 may also be an interface capable of externally connecting the required devices; the connected devices include, but are not limited to, a keypad, a display, a loudspeaker, a microphone, a joystick, and the like.
The processor 701 is responsible for managing the bus architecture and general processing, and the memory 703 may store the data used by the processor 701 when performing operations.
Optionally, the target pyramid attention network is a time pyramid attention network.
The first output result includes M feature sequence sets of different time scales; each feature sequence set is composed of second feature sequences obtained by dividing the first feature sequence according to the corresponding time scale; the second feature sequences in each feature sequence set are arranged in chronological order; the features in each second feature sequence are arranged in chronological order; and M is an integer greater than 1.
Optionally, the larger the value of M, the longer the video length of the video to be classified.
Optionally, the processor 701 is specifically configured to:
input each second feature sequence in the first output result into a channel pyramid attention network, respectively, so as to obtain the second output result corresponding to each second feature sequence, output by the channel pyramid attention network; and
obtain a target vector according to the second output result corresponding to each second feature sequence;
wherein the second output result corresponding to any second feature sequence includes N sub-feature sequence sets of different feature granularities, each sub-feature sequence set is composed of sub-feature sequences obtained by splitting the second feature sequence according to the corresponding feature granularity, the sub-features in each sub-feature sequence are arranged in chronological order, and N is an integer greater than 1.
Optionally, the larger the value of N, the longer the video length of the video to be classified.
Optionally, any second output result further includes the weight corresponding to each sub-feature in each sub-feature sequence included in it.
The processor 701 is specifically configured to:
for each sub-feature sequence in each second output result, perform a weighted summation according to the sub-features therein and the corresponding weights, to obtain a corresponding feature vector;
perform a splicing operation according to the feature vectors corresponding to all the sub-feature sequences, to obtain a spliced vector; and
use the spliced vector as the target vector.
Optionally, the processor 701 is specifically configured to input the target vector into a fully-connected network, so as to obtain the classification result of the video to be classified, output by the fully-connected network.
Optionally, the number of first feature sequences is at least two, and each first feature sequence corresponds to a different feature type.
Optionally, the at least two first feature sequences include a first target feature sequence, a second target feature sequence and a third target feature sequence, wherein the feature type corresponding to the first target feature sequence is the image feature type, the feature type corresponding to the second target feature sequence is the optical flow feature type, and the feature type corresponding to the third target feature sequence is the speech feature type.
In the embodiment of the present invention, in order to classify a video, the first feature sequence of the video to be classified can first be input into a target pyramid attention network; a target vector is then obtained according to the first output result output by the target pyramid attention network; finally, the video to be classified is classified according to the target vector. It can be seen that, in the embodiment of the present invention, the classification of the video can be realized with the first feature sequence of the video to be classified and the target pyramid attention network. Compared with the prior art, in which all frames in the video to be classified must be analyzed, the embodiment of the present invention can effectively improve the classification efficiency of the video. Moreover, the target pyramid attention network can extract and fuse the most effective features of the video in an attention-based manner for video classification, which better guarantees the accuracy of the classification results.
Preferably, an embodiment of the present invention also provides an electronic device, including a processor 701, a memory 703, and a computer program stored on the memory 703 and runnable on the processor 701. When executed by the processor 701, the computer program implements each process of the above video classification method embodiment and can achieve the same technical effects; to avoid repetition, details are not repeated here.
An embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements each process of the above video classification method embodiment and can achieve the same technical effects; to avoid repetition, details are not repeated here. The computer-readable storage medium is, for example, a read-only memory (Read-Only Memory, ROM for short), a random access memory (Random Access Memory, RAM for short), a magnetic disk or an optical disc.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the invention is not limited to the above specific embodiments, which are merely illustrative rather than restrictive. Inspired by the present invention, those skilled in the art can also make many other forms without departing from the scope protected by the purpose of the present invention and the claims, all of which fall within the protection of the present invention.
Claims (20)
1. A video classification method, characterized in that the method includes:
obtaining a first feature sequence of a video to be classified, wherein features in the first feature sequence are arranged in chronological order;
inputting the first feature sequence into a target pyramid attention network to obtain a first output result output by the target pyramid attention network;
obtaining a target vector according to the first output result; and
classifying the video to be classified according to the target vector.
2. The method according to claim 1, characterized in that:
the target pyramid attention network is a time pyramid attention network; and
the first output result includes M feature sequence sets of different time scales, each feature sequence set is composed of second feature sequences obtained by dividing the first feature sequence according to the corresponding time scale, the second feature sequences in each feature sequence set are arranged in chronological order, the features in each second feature sequence are arranged in chronological order, and M is an integer greater than 1.
3. The method according to claim 2, characterized in that the larger the value of M, the longer the video length of the video to be classified.
4. The method according to claim 2, characterized in that obtaining a target vector according to the first output result includes:
inputting each second feature sequence in the first output result into a channel pyramid attention network, respectively, so as to obtain the second output result corresponding to each second feature sequence, output by the channel pyramid attention network; and
obtaining a target vector according to the second output result corresponding to each second feature sequence;
wherein the second output result corresponding to any second feature sequence includes N sub-feature sequence sets of different feature granularities, each sub-feature sequence set is composed of sub-feature sequences obtained by splitting the second feature sequence according to the corresponding feature granularity, the sub-features in each sub-feature sequence are arranged in chronological order, and N is an integer greater than 1.
5. The method according to claim 4, characterized in that the larger the value of N, the longer the video length of the video to be classified.
6. The method according to claim 4, characterized in that any second output result further includes the weight corresponding to each sub-feature in each sub-feature sequence included in it; and
obtaining a target vector according to the second output result corresponding to each second feature sequence includes:
for each sub-feature sequence in each second output result, performing a weighted summation according to the sub-features therein and the corresponding weights, to obtain a corresponding feature vector;
performing a splicing operation according to the feature vectors corresponding to all the sub-feature sequences, to obtain a spliced vector; and
using the spliced vector as the target vector.
7. The method according to claim 1, characterized in that classifying the video to be classified according to the target vector includes:
inputting the target vector into a fully-connected network, so as to obtain the classification result of the video to be classified, output by the fully-connected network.
8. The method according to claim 1, characterized in that the number of first feature sequences is at least two, and each first feature sequence corresponds to a different feature type.
9. The method according to claim 8, characterized in that the at least two first feature sequences include a first target feature sequence, a second target feature sequence and a third target feature sequence, wherein the feature type corresponding to the first target feature sequence is the image feature type, the feature type corresponding to the second target feature sequence is the optical flow feature type, and the feature type corresponding to the third target feature sequence is the speech feature type.
10. A video classification device, characterized in that the device includes:
a first obtaining module, configured to obtain a first feature sequence of a video to be classified, wherein features in the first feature sequence are arranged in chronological order;
a second obtaining module, configured to input the first feature sequence into a target pyramid attention network to obtain a first output result output by the target pyramid attention network;
a third obtaining module, configured to obtain a target vector according to the first output result; and
a classification module, configured to classify the video to be classified according to the target vector.
11. The device according to claim 10, characterized in that:
the target pyramid attention network is a time pyramid attention network; and
the first output result includes M feature sequence sets of different time scales, each feature sequence set is composed of second feature sequences obtained by dividing the first feature sequence according to the corresponding time scale, the second feature sequences in each feature sequence set are arranged in chronological order, the features in each second feature sequence are arranged in chronological order, and M is an integer greater than 1.
12. The device according to claim 11, characterized in that the larger the value of M, the longer the video length of the video to be classified.
13. The device according to claim 11, characterized in that the third obtaining module includes:
a first obtaining unit, configured to input each second feature sequence in the first output result into a channel pyramid attention network, respectively, so as to obtain the second output result corresponding to each second feature sequence, output by the channel pyramid attention network; and
a second obtaining unit, configured to obtain a target vector according to the second output result corresponding to each second feature sequence;
wherein the second output result corresponding to any second feature sequence includes N sub-feature sequence sets of different feature granularities, each sub-feature sequence set is composed of sub-feature sequences obtained by splitting the second feature sequence according to the corresponding feature granularity, the sub-features in each sub-feature sequence are arranged in chronological order, and N is an integer greater than 1.
14. The device according to claim 13, characterized in that the larger the value of N, the longer the video length of the video to be classified.
15. The device according to claim 13, characterized in that any second output result further includes the weight corresponding to each sub-feature in each sub-feature sequence included in it; and
the second obtaining unit includes:
a first obtaining subunit, configured to perform, for each sub-feature sequence in each second output result, a weighted summation according to the sub-features therein and the corresponding weights, to obtain a corresponding feature vector;
a second obtaining subunit, configured to perform a splicing operation according to the feature vectors corresponding to all the sub-feature sequences, to obtain a spliced vector; and
a determining subunit, configured to use the spliced vector as the target vector.
16. The device according to claim 10, characterized in that the classification module is specifically configured to:
input the target vector into a fully-connected network, so as to obtain the classification result of the video to be classified, output by the fully-connected network.
17. The device according to claim 10, characterized in that the number of first feature sequences is at least two, and each first feature sequence corresponds to a different feature type.
18. The device according to claim 17, characterized in that the at least two first feature sequences include a first target feature sequence, a second target feature sequence and a third target feature sequence, wherein the feature type corresponding to the first target feature sequence is the image feature type, the feature type corresponding to the second target feature sequence is the optical flow feature type, and the feature type corresponding to the third target feature sequence is the speech feature type.
19. An electronic device, characterized in that it includes a processor, a memory, and a computer program stored on the memory and runnable on the processor, wherein, when executed by the processor, the computer program implements the steps of the video classification method according to any one of claims 1 to 9.
20. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, wherein, when executed by a processor, the computer program implements the steps of the video classification method according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910357559.2A CN110096617B (en) | 2019-04-29 | 2019-04-29 | Video classification method and device, electronic equipment and computer-readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110096617A true CN110096617A (en) | 2019-08-06 |
CN110096617B CN110096617B (en) | 2021-08-10 |
Family
ID=67446566
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910357559.2A Active CN110096617B (en) | 2019-04-29 | 2019-04-29 | Video classification method and device, electronic equipment and computer-readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110096617B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111246256A (en) * | 2020-02-21 | 2020-06-05 | 华南理工大学 | Video recommendation method based on multi-mode video content and multi-task learning |
CN111291643A (en) * | 2020-01-20 | 2020-06-16 | 北京百度网讯科技有限公司 | Video multi-label classification method and device, electronic equipment and storage medium |
CN111491187A (en) * | 2020-04-15 | 2020-08-04 | 腾讯科技(深圳)有限公司 | Video recommendation method, device, equipment and storage medium |
CN111797800A (en) * | 2020-07-14 | 2020-10-20 | 中国传媒大学 | Video classification method based on content mining |
CN112507920A (en) * | 2020-12-16 | 2021-03-16 | 重庆交通大学 | Examination abnormal behavior identification method based on time displacement and attention mechanism |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105917354A (en) * | 2014-10-09 | 2016-08-31 | 微软技术许可有限责任公司 | Spatial pyramid pooling networks for image processing |
US20180032846A1 (en) * | 2016-08-01 | 2018-02-01 | Nvidia Corporation | Fusing multilayer and multimodal deep neural networks for video classification |
CN108416795A (en) * | 2018-03-04 | 2018-08-17 | 南京理工大学 | The video actions recognition methods of space characteristics is merged based on sequence pondization |
CN108830212A (en) * | 2018-06-12 | 2018-11-16 | 北京大学深圳研究生院 | A kind of video behavior time shaft detection method |
CN109359592A (en) * | 2018-10-16 | 2019-02-19 | 北京达佳互联信息技术有限公司 | Processing method, device, electronic equipment and the storage medium of video frame |
CN109389055A (en) * | 2018-09-21 | 2019-02-26 | 西安电子科技大学 | Video classification methods based on mixing convolution sum attention mechanism |
CN109670453A (en) * | 2018-12-20 | 2019-04-23 | 杭州东信北邮信息技术有限公司 | A method of extracting short video subject |
2019-04-29: Application CN201910357559.2A filed; patent CN110096617B granted (status: Active).
Non-Patent Citations (1)
Title |
---|
HU Chunhai: "Visual saliency-driven video segmentation algorithm for moving fish bodies", Journal of Yanshan University * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111291643A (en) * | 2020-01-20 | 2020-06-16 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Video multi-label classification method and device, electronic equipment and storage medium |
CN111291643B (en) * | 2020-01-20 | 2023-08-22 | Beijing Baidu Netcom Science and Technology Co., Ltd. | Video multi-label classification method and device, electronic equipment and storage medium |
CN111246256A (en) * | 2020-02-21 | 2020-06-05 | South China University of Technology | Video recommendation method based on multi-modal video content and multi-task learning |
CN111491187A (en) * | 2020-04-15 | 2020-08-04 | Tencent Technology (Shenzhen) Co., Ltd. | Video recommendation method, device, equipment and storage medium |
CN111491187B (en) * | 2020-04-15 | 2023-10-31 | Tencent Technology (Shenzhen) Co., Ltd. | Video recommendation method, device, equipment and storage medium |
CN111797800A (en) * | 2020-07-14 | 2020-10-20 | Communication University of China | Video classification method based on content mining |
CN111797800B (en) * | 2020-07-14 | 2024-03-05 | Communication University of China | Video classification method based on content mining |
CN112507920A (en) * | 2020-12-16 | 2021-03-16 | Chongqing Jiaotong University | Abnormal examination behavior recognition method based on temporal shift and attention mechanism |
CN112507920B (en) * | 2020-12-16 | 2023-01-24 | Chongqing Jiaotong University | Abnormal examination behavior recognition method based on temporal shift and attention mechanism |
Also Published As
Publication number | Publication date |
---|---|
CN110096617B (en) | 2021-08-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110096617A (en) | Video classification methods, device, electronic equipment and computer readable storage medium | |
Wang et al. | SaliencyGAN: Deep learning semisupervised salient object detection in the fog of IoT | |
Wang et al. | A deep network solution for attention and aesthetics aware photo cropping | |
Li et al. | Cnnpruner: Pruning convolutional neural networks with visual analytics | |
CN110147711A (en) | Video scene recognition method and device, storage medium and electronic device | |
CN110348387A (en) | Image processing method and device, and computer readable storage medium | |
CN104933428B (en) | Face recognition method and device based on tensor description | |
CN109145784A (en) | Method and apparatus for processing video | |
CN110353675A (en) | EEG signal emotion recognition method and device based on image generation | |
CN105893478A (en) | Tag extraction method and equipment | |
CN110503076A (en) | Video classification method, device, equipment and medium based on artificial intelligence | |
CN110378348A (en) | Video instance segmentation method, device and computer readable storage medium | |
CN109325516A (en) | Ensemble learning method and device for image classification | |
CN106529996A (en) | Deep learning-based advertisement display method and device | |
CN105989067A (en) | Method for generating text summary from image, user equipment and training server | |
CN110472050A (en) | Group clustering method and device | |
CN114913303A (en) | Virtual image generation method and related device, electronic equipment and storage medium | |
CN112508048A (en) | Image description generation method and device | |
CN114360018B (en) | Rendering method and device of three-dimensional facial expression, storage medium and electronic device | |
CN111368707A (en) | Face detection method, system, device and medium based on feature pyramid and dense block | |
CN109409305A (en) | Facial image sharpness evaluation method and device | |
CN116701706B (en) | Data processing method, device, equipment and medium based on artificial intelligence | |
CN111046213B (en) | Knowledge base construction method based on image recognition | |
CN110287761A (en) | Face age estimation method based on convolutional neural network and latent variable analysis | |
CN109658369A (en) | Intelligent video generation method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |