CN110096617A - Video classification methods, device, electronic equipment and computer readable storage medium - Google Patents


Info

Publication number
CN110096617A
CN110096617A (application CN201910357559.2A)
Authority
CN
China
Prior art keywords
sequence
feature
video
sub-feature
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910357559.2A
Other languages
Chinese (zh)
Other versions
CN110096617B (en)
Inventor
龙翔
何栋梁
李甫
迟至真
周志超
赵翔
李鑫
文石磊
丁二锐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201910357559.2A priority Critical patent/CN110096617B/en
Publication of CN110096617A publication Critical patent/CN110096617A/en
Application granted granted Critical
Publication of CN110096617B publication Critical patent/CN110096617B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 — Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/70 — Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F 16/75 — Clustering; Classification
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 — Computing arrangements based on biological models
    • G06N 3/02 — Neural networks
    • G06N 3/04 — Architecture, e.g. interconnection topology
    • G06N 3/045 — Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Television Signal Processing For Recording (AREA)

Abstract

The present invention provides a video classification method, apparatus, electronic device, and computer-readable storage medium. The method comprises: obtaining a first feature sequence of a video to be classified, wherein the features in the first feature sequence are arranged in chronological order; inputting the first feature sequence into a target pyramid attention network, and obtaining a first output result of the target pyramid attention network; obtaining a target vector according to the first output result; and classifying the video to be classified according to the target vector. Compared with the prior art, embodiments of the present invention can effectively improve the efficiency of video classification. Moreover, because the target pyramid attention network uses an attention-based approach, it can extract and fuse the most informative features of the video for classification, which helps guarantee the accuracy of the classification results.

Description

Video classification method, apparatus, electronic device, and computer-readable storage medium
Technical field
Embodiments of the present invention relate to the technical field of video classification, and in particular to a video classification method, apparatus, electronic device, and computer-readable storage medium.
Background
Video classification is one of the most important and fundamental tasks in computer vision. It refers to assigning a video to a predefined category by analyzing and understanding information related to the video. Video classification plays a key role in application scenarios such as video search and video recommendation, and video technologies such as video tagging, video moderation, and video title generation rely heavily on it.
At present, a common video classification approach is to directly input all frames of a video into a device used for video classification and obtain the classification result output by the device. With this approach, all frames of the video have to be analyzed, so the classification efficiency is very low.
Summary of the invention
Embodiments of the present invention provide a video classification method, apparatus, electronic device, and computer-readable storage medium, to solve the problem of the low efficiency of existing video classification approaches.
To solve the above technical problem, the present invention is implemented as follows:
In a first aspect, an embodiment of the present invention provides a video classification method, the method comprising:
obtaining a first feature sequence of a video to be classified, wherein the features in the first feature sequence are arranged in chronological order;
inputting the first feature sequence into a target pyramid attention network, and obtaining a first output result of the target pyramid attention network;
obtaining a target vector according to the first output result;
classifying the video to be classified according to the target vector.
In a second aspect, an embodiment of the present invention provides a video classification apparatus, the apparatus comprising:
a first obtaining module, configured to obtain a first feature sequence of a video to be classified, wherein the features in the first feature sequence are arranged in chronological order;
a second obtaining module, configured to input the first feature sequence into a target pyramid attention network and obtain a first output result of the target pyramid attention network;
a third obtaining module, configured to obtain a target vector according to the first output result;
a classification module, configured to classify the video to be classified according to the target vector.
In a third aspect, an embodiment of the present invention provides an electronic device, comprising a processor, a memory, and a computer program stored on the memory and executable on the processor, wherein the computer program, when executed by the processor, implements the steps of the above video classification method.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the above video classification method.
In embodiments of the present invention, to classify a video, the first feature sequence of the video to be classified can first be input into a target pyramid attention network; a target vector is then obtained according to the first output result of the target pyramid attention network; finally, the video to be classified is classified according to the target vector. It can be seen that, in embodiments of the present invention, the classification of a video can be achieved using the first feature sequence of the video to be classified and the target pyramid attention network. Compared with the prior art, in which all frames of the video to be classified must be analyzed, embodiments of the present invention can effectively improve the efficiency of video classification. Moreover, because the target pyramid attention network uses an attention-based approach, it can extract and fuse the most informative features of the video for classification, which helps guarantee the accuracy of the classification results.
Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a flowchart of a video classification method provided by an embodiment of the present invention;
Fig. 2 is the first schematic diagram of a video classification method provided by an embodiment of the present invention;
Fig. 3 is a sequence diagram of the use of a charging plug;
Fig. 4 is the second schematic diagram of a video classification method provided by an embodiment of the present invention;
Fig. 5 is the third schematic diagram of a video classification method provided by an embodiment of the present invention;
Fig. 6 is a structural block diagram of a video classification apparatus provided by an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some rather than all of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
The video classification method provided by embodiments of the present invention is described first below.
It should be noted that the video classification method provided by embodiments of the present invention is applied to an electronic device. Specifically, the electronic device may be a server; of course, the type of the electronic device is not limited to a server, and it may be any other type of device capable of performing video classification. Embodiments of the present invention place no restriction on the type of the electronic device.
Referring to Fig. 1, a flowchart of the video classification method provided by an embodiment of the present invention is shown. As shown in Fig. 1, the method includes the following steps:
Step 101: obtain a first feature sequence of a video to be classified, wherein the features in the first feature sequence are arranged in chronological order.
In step 101, the electronic device can use a model based on convolutional neural networks (CNN) to extract key features of the video, thereby obtaining the first feature sequence of the video to be classified, in which the features can be arranged in chronological order from earliest to latest. It can be understood that a CNN is a feedforward neural network (FNN) that involves convolution computation and has a deep structure; CNNs are among the representative algorithms of deep learning.
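Step 101 can be sketched as follows. This is a minimal illustration, not the patent's actual model: the pretrained CNN backbone is replaced by a fixed random projection, and the frame sizes and feature dimension are assumptions chosen for the example.

```python
import numpy as np

def extract_feature_sequence(frames, feature_dim=2048, seed=0):
    """Turn a list of video frames into a time-ordered feature sequence.

    Stand-in for a pretrained CNN backbone: each frame is flattened and
    passed through a fixed random projection. In practice the projection
    would be replaced by a per-frame CNN forward pass.
    """
    rng = np.random.default_rng(seed)
    flat_dim = frames[0].size
    projection = rng.standard_normal((flat_dim, feature_dim)) / np.sqrt(flat_dim)
    # Features are stacked in temporal order: row t corresponds to frame t.
    return np.stack([f.reshape(-1) @ projection for f in frames])

# Eight 32x32 grayscale frames -> a first feature sequence of shape (8, 2048).
frames = [np.random.rand(32, 32) for _ in range(8)]
seq = extract_feature_sequence(frames)
print(seq.shape)  # (8, 2048)
```

The resulting array plays the role of the first feature sequence: one feature vector per sampled frame, in chronological order.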
Step 102: input the first feature sequence into a target pyramid attention network, and obtain a first output result of the target pyramid attention network.
Here, the target pyramid attention network may include only one type of pyramid attention network; for example, it may include only a time pyramid attention network, or only a channel pyramid attention network. Alternatively, the target pyramid attention network may include at least two types of pyramid attention networks; for example, it may include both a time pyramid attention network and a channel pyramid attention network.
If the target pyramid attention network includes at least two types of pyramid attention networks, then in step 102 the first feature sequence can be input into each type of pyramid attention network separately, to obtain the first output result of each type of pyramid attention network, and the subsequent step 103 is performed according to the first output results of each type of pyramid attention network.
Step 103: obtain a target vector according to the first output result.
Here, the target vector is a vector that can represent the features of the entire video to be classified. It should be noted that there are various specific ways to obtain the target vector according to the first output result; for clarity of presentation, examples are given later.
Step 104: classify the video to be classified according to the target vector.
It should be noted that there may be K video categories involved in embodiments of the present invention, denoted B_1, B_2, ..., B_K, where K is an integer greater than 1. After the video to be classified is classified according to the target vector, the classification result obtained by the electronic device may include K probability values, denoted G_1, G_2, ..., G_K, where G_1 is the probability that the video to be classified belongs to category B_1, G_2 is the probability that it belongs to category B_2, and so on, and G_K is the probability that it belongs to category B_K.
In step 104, if single-label classification is performed on the video to be classified, the K probability values G_1, G_2, ..., G_K sum to 1; if multi-label classification is performed, the sum of G_1, G_2, ..., G_K may or may not be 1.
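The single-label versus multi-label distinction above corresponds to the usual softmax-versus-sigmoid output heads. The sketch below is an assumption about how such heads are commonly implemented (the patent does not name these functions): softmax forces the K probabilities to sum to 1, while independent sigmoids leave the sum unconstrained.

```python
import numpy as np

def single_label_probs(logits):
    # Softmax: the K probabilities G_1..G_K are forced to sum to 1.
    e = np.exp(logits - logits.max())
    return e / e.sum()

def multi_label_probs(logits):
    # Independent sigmoids: each G_k lies in (0, 1) but the sum is unconstrained.
    return 1.0 / (1.0 + np.exp(-logits))

logits = np.array([2.0, -1.0, 0.5])  # K = 3 categories
g_single = single_label_probs(logits)
g_multi = multi_label_probs(logits)
print(round(g_single.sum(), 6))  # 1.0
```

For these example logits the sigmoid outputs sum to roughly 1.77, illustrating that multi-label probabilities need not sum to 1.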
In embodiments of the present invention, to classify a video, the first feature sequence of the video to be classified can first be input into a target pyramid attention network; a target vector is then obtained according to the first output result of the target pyramid attention network; finally, the video to be classified is classified according to the target vector. It can be seen that, in embodiments of the present invention, the classification of a video can be achieved using the first feature sequence of the video to be classified and the target pyramid attention network. Compared with the prior art, in which all frames of the video to be classified must be analyzed, embodiments of the present invention can effectively improve the efficiency of video classification. Moreover, because the target pyramid attention network uses an attention-based approach, it can extract and fuse the most informative features of the video for classification, which helps guarantee the accuracy of the classification results.
Optionally, classifying the video to be classified according to the target vector comprises:
inputting the target vector into a fully connected network, to obtain the classification result of the video to be classified output by the fully connected network.
Here, the fully connected network can be regarded as a pre-trained classification model stored locally on the electronic device, where the classification model can be trained using the target vectors of a large number of videos as input and the categories of those videos as output. Specifically, the classification model may be trained by the electronic device itself, or it may be trained by another device and then distributed to the electronic device.
In this embodiment, the classification result of the video to be classified can be obtained simply by inputting the target vector into the fully connected network; the operation of obtaining the classification result is therefore very convenient to implement.
Optionally, the target pyramid attention network is a time pyramid attention network;
the first output result includes M feature sequence sets of different time scales, each feature sequence set consisting of the second feature sequences obtained by dividing the first feature sequence according to the corresponding time scale, the second feature sequences in each feature sequence set being arranged in chronological order, and the features in each second feature sequence being arranged in chronological order, where M is an integer greater than 1.
Here, the value of M can be 2, 3, 4, 5, 6, or any integer greater than 6; the possibilities are not enumerated one by one here. In addition, since the time scales of the feature sequence sets differ, the numbers of second feature sequences in different feature sequence sets can be different.
Suppose the first feature sequence of the video to be classified is X^(1) in Fig. 2, which includes the features x_1, x_2, x_3, x_4, x_5, x_6, x_7, and x_8 arranged in chronological order. After X^(1) is input into the time pyramid attention network, the first output result of the time pyramid attention network may include 3 feature sequence sets. That is, the value of M is 3, and the pyramid of the time pyramid attention network can be considered to have 3 levels, for example level 1, level 2, and level 3 in Fig. 2, which correspond to feature sequence sets of different time scales.
Specifically, the feature sequence set corresponding to level 1 can consist of a single second feature sequence, namely X^(1) itself. The feature sequence set corresponding to level 2 can consist of the two second feature sequences X^(2)_1 and X^(2)_2, where X^(2)_1 includes x_1, x_2, x_3, and x_4 arranged in chronological order, and X^(2)_2 includes x_5, x_6, x_7, and x_8 arranged in chronological order. The feature sequence set corresponding to level 3 can consist of the four second feature sequences X^(3)_1, X^(3)_2, X^(3)_3, and X^(3)_4, where X^(3)_1 includes x_1 and x_2, X^(3)_2 includes x_3 and x_4, X^(3)_3 includes x_5 and x_6, and X^(3)_4 includes x_7 and x_8, each arranged in chronological order.
It is easy to see that the feature sequence set corresponding to level 1 is obtained by keeping X^(1) as one part in time, the feature sequence set corresponding to level 2 is obtained by dividing X^(1) into two equal parts in time, and the feature sequence set corresponding to level 3 is obtained by dividing X^(1) into four equal parts in time.
In this way, the electronic device can obtain a first output result that includes the feature sequence set corresponding to level 1, the feature sequence set corresponding to level 2, and the feature sequence set corresponding to level 3. Next, the target vector can be obtained according to the first output result, and the video to be classified is classified according to the target vector.
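The division into 1, 2, and 4 equal parts in time can be sketched as follows. This is an illustrative reconstruction of the splitting step only (the attention computation itself is omitted), under the assumption that the sequence length is divisible at every level.

```python
def time_pyramid(sequence, levels=3):
    """Split a time-ordered feature sequence into M sets of sub-sequences.

    Level l (1-based) cuts the sequence into 2**(l-1) equal contiguous
    parts: level 1 keeps the whole sequence, level 2 halves it, level 3
    quarters it. Assumes len(sequence) is divisible by 2**(levels-1).
    """
    pyramid = []
    for level in range(levels):
        parts = 2 ** level
        size = len(sequence) // parts
        # Contiguous slices preserve the chronological order of features.
        pyramid.append([sequence[i * size:(i + 1) * size] for i in range(parts)])
    return pyramid

seq = ["x1", "x2", "x3", "x4", "x5", "x6", "x7", "x8"]
levels = time_pyramid(seq)
print([len(level) for level in levels])  # [1, 2, 4]
print(levels[2][0])  # ['x1', 'x2']
```

Level 1 here corresponds to X^(1), level 2 to {X^(2)_1, X^(2)_2}, and level 3 to {X^(3)_1, ..., X^(3)_4} in the example above.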
It should be noted that if the temporal order of the video is ignored when classifying the video, all features are placed in one unordered set and treated equally, completely ignoring the temporal relationships between features. This is effective in some scenarios but ineffective in others. For example, as shown in Fig. 3, if all key features are placed unordered in one set, it is impossible to distinguish whether the user's action is inserting a charging plug into a socket or pulling the charging plug out of the socket.
In view of this, in this embodiment a time pyramid attention network can be used: the first feature sequence of the video to be classified is first divided into several second feature sequences at a given time scale, and the feature sequence set for that time scale is then obtained, in which the second feature sequences are arranged in chronological order and the features within each second feature sequence are arranged in chronological order. In this way, temporal order can be introduced into an otherwise unordered attention mechanism, effectively solving video classification problems with strong temporal dependence. It can be seen that this embodiment is applicable not only to video classification in weak-temporal-dependence scenarios but also to video classification in strong-temporal-dependence scenarios.
Optionally, the larger the value of M, the longer the duration of the video to be classified.
Specifically, a correspondence between video duration ranges and values of M can be pre-stored in the electronic device; for example, the duration range of 10 to 15 minutes can correspond to the value 5, the duration range of 5 to 10 minutes can correspond to the value 4, and the duration range of 0 to 5 minutes can correspond to the value 3.
Then, when the duration of the video to be classified falls in the 10-to-15-minute range, the first output result may include 5 feature sequence sets of different time scales; that is, the time pyramid of the time pyramid attention network has 5 levels. When the duration falls in the 0-to-5-minute range, the first output result may include 3 feature sequence sets of different time scales; that is, the time pyramid has 3 levels.
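The duration-to-M correspondence amounts to a simple lookup. The sketch below hardcodes the example ranges from the text; the boundary handling and the behavior beyond 15 minutes are assumptions made for the illustration.

```python
def pyramid_levels_for_duration(minutes):
    """Map video duration (in minutes) to the number of pyramid levels M.

    Mirrors the example correspondence in the text: longer videos get
    more time-scale levels. Values beyond 15 minutes are capped at the
    largest configured level (an assumption, not stated in the text).
    """
    if minutes <= 5:
        return 3
    if minutes <= 10:
        return 4
    return 5

print(pyramid_levels_for_duration(3))   # 3
print(pyramid_levels_for_duration(12))  # 5
```

The same pattern applies later to the channel pyramid parameter N, which uses an analogous duration lookup.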
As it can be seen that in the present embodiment, when in use between pyramid attention network when, time pyramidal level is not complete Changeless, time pyramidal level can neatly be adjusted according to the length of the video length of video to be sorted, So that time pyramidal level matches with the video length of video to be sorted, to guarantee classification effectiveness and classification effect Fruit.
Optionally, obtaining the target vector according to the first output result comprises:
inputting each second feature sequence in the first output result into a channel pyramid attention network, to obtain a second output result corresponding to each second feature sequence output by the channel pyramid attention network;
obtaining the target vector according to the second output results corresponding to the second feature sequences;
wherein the second output result corresponding to any second feature sequence includes N sub-feature sequence sets of different feature granularities, each sub-feature sequence set consisting of the sub-feature sequences obtained by splitting the second feature sequence according to the corresponding feature granularity, the sub-features in each sub-feature sequence being arranged in chronological order, where N is an integer greater than 1.
Here, the value of N can be 2, 3, 4, 5, 6, or any integer greater than 6; the possibilities are not enumerated one by one here. In addition, the values of M and N may be the same or different.
Suppose the channel pyramid attention network is denoted CPAtt. As shown in Fig. 2, after obtaining the first output result including the 7 second feature sequences X^(1), X^(2)_1, X^(2)_2, X^(3)_1, X^(3)_2, X^(3)_3, and X^(3)_4, the electronic device can input each of these 7 second feature sequences into CPAtt, to obtain the 7 second output results corresponding to the 7 second feature sequences output by CPAtt.
Suppose a certain one of the above 7 second feature sequences can also be denoted X^(1) in Fig. 4, and this X^(1) includes the features x_1, x_2, ..., x_L arranged in chronological order. After this X^(1) is input into the channel pyramid attention network, the second output result of the channel pyramid attention network may include 3 sub-feature sequence sets of different feature granularities. That is, the value of N is 3, and the pyramid of the channel pyramid attention network can be considered to have 3 levels, for example level 1, level 2, and level 3 in Fig. 4, which correspond to sub-feature sequence sets of different feature granularities.
Specifically, the sub-feature sequence set corresponding to level 1 consists of one sub-feature sequence, namely X^(1) itself. The sub-feature sequence set corresponding to level 2 can consist of the two sub-feature sequences X^(2)_1 and X^(2)_2, where X^(2)_1 includes one of the two sub-features obtained by splitting x_1, one of the two sub-features obtained by splitting x_2, ..., and one of the two sub-features obtained by splitting x_L, and X^(2)_2 includes the other of the two sub-features obtained by splitting each of x_1, x_2, ..., x_L. The sub-feature sequence set corresponding to level 3 can consist of the four sub-feature sequences X^(3)_1, X^(3)_2, X^(3)_3, and X^(3)_4, where X^(3)_1 includes the first of the four sub-features obtained by splitting each of x_1, x_2, ..., x_L, X^(3)_2 includes the second of the four sub-features obtained by splitting each of x_1, x_2, ..., x_L, and the contents of X^(3)_3 and X^(3)_4 follow by analogy, which will not be repeated here.
It should be noted that the contents of the second output results corresponding to the other second feature sequences can be understood with reference to the above description and will not be repeated here. Afterwards, the target vector can be obtained according to the second output results corresponding to the second feature sequences.
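The coarse-to-fine channel splitting can be sketched in the same way as the time pyramid, but along the channel dimension of each feature rather than along time. The function below is an illustrative reconstruction, under the assumption that each feature vector is split into contiguous, equal-width channel chunks.

```python
import numpy as np

def channel_pyramid(features, levels=3):
    """Split each feature vector's channels into progressively finer parts.

    features: array of shape (T, C), one C-channel feature per time step.
    Level l splits the C channels into 2**(l-1) contiguous chunks, giving
    sub-feature sequences that keep the original temporal order. Assumes
    C is divisible by 2**(levels-1).
    """
    T, C = features.shape
    pyramid = []
    for level in range(levels):
        parts = 2 ** level
        width = C // parts
        # Each sub-feature sequence has shape (T, width).
        pyramid.append([features[:, i * width:(i + 1) * width] for i in range(parts)])
    return pyramid

feats = np.arange(8 * 4, dtype=float).reshape(8, 4)  # L=8 features, 4 channels each
pyr = channel_pyramid(feats)
print([len(lvl) for lvl in pyr])  # [1, 2, 4]
print(pyr[1][0].shape)            # (8, 2)
```

Level 1 keeps each feature whole (X^(1)), level 2 halves the channels of every feature (X^(2)_1, X^(2)_2), and level 3 quarters them (X^(3)_1, ..., X^(3)_4); concatenating the finest chunks channel-wise recovers the original features.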
In a specific embodiment, any second output result further includes a weight corresponding to each sub-feature in each sub-feature sequence it contains;
obtaining the target vector according to the second output results corresponding to the second feature sequences comprises:
for each sub-feature sequence in each second output result, computing a weighted sum of its sub-features with their corresponding weights, to obtain a corresponding feature vector;
performing a concatenation operation on the feature vectors corresponding to all the sub-feature sequences, to obtain a concatenated vector;
using the concatenated vector as the target vector.
Specifically, for the above sub-feature sequence X^(2)_1, suppose the sub-features it includes are, in order, x_{1,1}, x_{2,1}, ..., x_{L,1}, where each x_{i,1} is in vector form, and the weight corresponding to x_{1,1} is z_1, the weight corresponding to x_{2,1} is z_2, ..., and the weight corresponding to x_{L,1} is z_L. Then the feature vector y corresponding to X^(2)_1 can be computed using the following formula:
y = x_{1,1}·z_1 + x_{2,1}·z_2 + ... + x_{L,1}·z_L
It should be noted that the feature vectors corresponding to the other sub-feature sequences are computed in the same way as described above for X^(2)_1, which will not be repeated here. After the feature vectors corresponding to all the sub-feature sequences are obtained, these feature vectors can be concatenated to obtain the concatenated vector used as the target vector.
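The weighted-sum formula and the subsequent concatenation can be sketched as follows. The weights z_1, ..., z_L are taken as given (in the patent they are produced by the attention network); the numeric values here are arbitrary examples.

```python
import numpy as np

def attend(sub_sequence, weights):
    """Weighted sum y = x_1*z_1 + ... + x_L*z_L over one sub-feature sequence.

    sub_sequence: array of shape (L, D); weights: array of shape (L,).
    Returns a single pooled vector of shape (D,).
    """
    return np.tensordot(weights, sub_sequence, axes=1)

def pool_and_concat(sub_sequences, weight_sets):
    """Attention-pool each sub-feature sequence, then concatenate the results."""
    pooled = [attend(s, z) for s, z in zip(sub_sequences, weight_sets)]
    return np.concatenate(pooled)

subseq = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # L=3 sub-features, dim 2
z = np.array([0.5, 0.3, 0.2])
y = attend(subseq, z)
print(y)  # [0.7 0.5]

# Pooling two sub-feature sequences and concatenating gives a longer vector.
v = pool_and_concat([subseq, subseq], [z, np.array([1.0, 0.0, 0.0])])
print(v.shape)  # (4,)
```

Repeating this pool-then-concatenate pattern up the levels of Fig. 4 and then Fig. 2 yields the final concatenated vector used as the target vector.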
It should be noted that Att in Fig. 4 can be regarded as the computation of a feature vector, and Concat in Fig. 2 and Fig. 4 can be regarded as the concatenation of vectors. As shown in Fig. 4, the feature vector corresponding to X^(2)_1 and the feature vector corresponding to X^(2)_2 can be concatenated to obtain a first concatenated vector, for example y^(2) in Fig. 4; the feature vectors corresponding to X^(3)_1, X^(3)_2, X^(3)_3, and X^(3)_4 can be concatenated to obtain a second concatenated vector, for example y^(3) in Fig. 4. Next, the feature vector corresponding to X^(1) (for example y^(1) in Fig. 4), the first concatenated vector, and the second concatenated vector are concatenated to obtain a third concatenated vector, which corresponds to one of the above 7 second feature sequences.
Afterwards, the third concatenated vectors corresponding to the other 6 second feature sequences can be obtained in a similar way; that is, the 7 third concatenated vectors corresponding to X^(1), X^(2)_1, X^(2)_2, X^(3)_1, X^(3)_2, X^(3)_3, and X^(3)_4 are finally obtained. At this point, as shown in Fig. 2, the third concatenated vectors corresponding to X^(2)_1 and X^(2)_2 can be concatenated to obtain a fourth concatenated vector, for example y^(2) in Fig. 2; the third concatenated vectors corresponding to X^(3)_1, X^(3)_2, X^(3)_3, and X^(3)_4 can be concatenated to obtain a fifth concatenated vector, for example y^(3) in Fig. 2. Next, the third concatenated vector corresponding to X^(1) (for example y^(1) in Fig. 2), the fourth concatenated vector, and the fifth concatenated vector are concatenated to obtain the concatenated vector used as the target vector.
It should be noted that when classifying a video, if the electronic device directly computes one weight for each feature, the weight of each feature can be used directly for video classification; however, in many cases only some of the channels in the video to be classified are helpful for classification. For example, as shown in Fig. 5, the video to be classified may include the two video frames Frame1 and Frame2, both of which contribute to the classification; however, the important channels in these two frames are clearly different: the important channels of Frame1 correspond to the region enclosed by rectangular box 510, and the important channels of Frame2 correspond to the region enclosed by rectangular box 520. On the basis of Fig. 5, if one weight is assigned to the entire feature of each frame, as shown in the lower-left corner of Fig. 5, only relatively balanced weights can be given to the two frames; for example, the weight assigned to each of the two features Feature1 and Feature2 may be 0.5. In that case, the weight of the irrelevant noise is also 0.5, and after weighted averaging the important channels of the two features are weakened, which lowers the accuracy of video classification.
In view of this, in this embodiment a channel pyramid attention network can be used to progressively split each feature, from coarse to fine, into several sub-features and assign a corresponding weight to each sub-feature. In this way, as shown in the lower-right corner of Fig. 5, the weight of the important part of each feature can be set to 1.0 and the weight of the unimportant part to 0.0; for example, the weight of the sub-feature in the upper half of Feature1 can be set to 1.0 and the weight of the sub-feature in its lower half to 0.0, while the weight of the sub-feature in the upper half of Feature2 can be set to 0.0 and the weight of the sub-feature in its lower half to 1.0. After the subsequent weighting, the important channel information is fully retained, which helps obtain more accurate classification results. It can be seen that in this embodiment, using a channel pyramid attention network can effectively guarantee the accuracy of the classification results.
Optionally, the longer the duration of the video to be classified, the larger the value of N.
Specifically, a correspondence between video duration ranges and values of N can be stored in the electronic device in advance; for example, the duration range of 10 to 15 minutes may correspond to the value 5, the duration range of 5 to 10 minutes may correspond to the value 4, and the duration range of 0 to 5 minutes may correspond to the value 3.
Then, in the case where the duration of the video to be classified falls within the 10-to-15-minute range, each second output result may include 5 sub-feature sequence sets of different feature granularities, and the channel pyramid of the channel pyramid attention network has 5 levels. In the case where the duration falls within the 0-to-5-minute range, each second output result may include 3 sub-feature sequence sets, and the channel pyramid has 3 levels.
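The duration-to-level lookup described above can be sketched as a simple table lookup. The range boundaries (0–5, 5–10, and 10–15 minutes mapping to 3, 4, and 5 levels) follow the example values in the text; the clamping behavior for videos longer than 15 minutes is an assumption, since the text does not specify it.

```python
def pyramid_levels(duration_minutes):
    """Return the channel-pyramid level count N for a video duration."""
    # (upper bound of the duration range in minutes, corresponding N)
    ranges = [(5, 3), (10, 4), (15, 5)]
    for upper, n in ranges:
        if duration_minutes <= upper:
            return n
    return 5  # assumed: clamp longer videos to the deepest pyramid

print(pyramid_levels(3))   # short video -> 3 levels
print(pyramid_levels(12))  # long video  -> 5 levels
```

A stored table of this kind lets the electronic device pick the pyramid depth per video without retraining, which is how the embodiment matches pyramid depth to video duration.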
It can be seen that, in the present embodiment, the number of levels of the channel pyramid is not completely fixed when the channel pyramid attention network is used; it can be flexibly adjusted according to the duration of the video to be classified, so that the number of pyramid levels matches that duration, thereby guaranteeing both classification efficiency and classification quality.
Optionally, there are at least two first feature sequences, and the feature types corresponding to the first feature sequences differ from one another.
Here, the number of first feature sequences may be two, three, four, or more; the possibilities are not enumerated one by one.
In one specific implementation, the at least two first feature sequences may include a first target feature sequence, a second target feature sequence, and a third target feature sequence, wherein the feature type corresponding to the first target feature sequence is an image feature type, the feature type corresponding to the second target feature sequence is an optical flow feature type, and the feature type corresponding to the third target feature sequence is an audio feature type.
In another specific implementation, the at least two first feature sequences may include only a first target feature sequence and a second target feature sequence, wherein the feature type corresponding to the first target feature sequence is any one of the image feature type, the optical flow feature type, and the audio feature type, and the feature type corresponding to the second target feature sequence is any one of the image feature type, the optical flow feature type, and the audio feature type.
It should be noted that first feature sequences of different feature types can be regarded as features of different modalities of the video to be classified. Performing video classification with at least two first feature sequences achieves multimodal fusion, which improves the robustness and precision of the classification.
It can be seen that, in the present embodiment, the video can be classified on the basis of the multimodal features of the video to be classified, using both the time pyramid attention network and the channel pyramid attention network. Specifically, key features of the video, such as image features, optical flow features, and audio features, can first be extracted with models based on convolutional neural networks; the first feature sequences of the various feature types are then passed in turn through the time pyramid attention network and the channel pyramid attention network; afterwards, the features of the various types are concatenated and fused to obtain a target vector representing the entire video to be classified; finally, classification is performed by a fully connected network, yielding the probability of the video to be classified belonging to each class, thus achieving the classification of the video.
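The overall flow just described (per-modality feature sequences, attention pooling, concatenation and fusion, fully connected classification) can be sketched as follows. This is a schematic sketch only: the attention weights are uniform placeholders rather than learned ones, the two pyramid attention stages are collapsed into a single pooling step, and all names and values are invented.

```python
import math

def attention_pool(vectors):
    """Weighted sum of frame vectors with placeholder uniform attention."""
    w = 1.0 / len(vectors)
    return [w * sum(v[i] for v in vectors) for i in range(len(vectors[0]))]

def softmax(x):
    m = max(x)
    e = [math.exp(v - m) for v in x]
    s = sum(e)
    return [v / s for v in e]

def classify(modal_sequences, fc_weights):
    # One feature sequence per modality (e.g. image, optical flow, audio);
    # here the pyramid attention stages are stood in for by one pooling.
    pooled = [attention_pool(seq) for seq in modal_sequences]
    target = [c for vec in pooled for c in vec]      # concatenation/fusion
    logits = [sum(w * t for w, t in zip(row, target)) for row in fc_weights]
    return softmax(logits)                           # per-class probabilities

# Two modalities, 3 frames each, 2 channels per frame (invented values).
image_seq = [[1.0, 0.0], [0.8, 0.2], [0.6, 0.4]]
audio_seq = [[0.1, 0.9], [0.2, 0.8], [0.3, 0.7]]
fc = [[1.0, 0.0, 0.0, 1.0],   # class 0 weights (hypothetical)
      [0.0, 1.0, 1.0, 0.0]]   # class 1 weights (hypothetical)
probs = classify([image_seq, audio_seq], fc)
print(probs)  # probabilities over the two classes, summing to 1
```

The sketch shows only the data flow of the embodiment; in the patented method the pooling weights come from the time and channel pyramid attention networks and the fully connected layer is trained.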
By the above means, using the time pyramid attention network overcomes the weakness of the prior art in not considering temporal information, and using the channel pyramid attention network improves overall classification accuracy and efficiency. As a result, the video classification method of this embodiment performs well in single-label, multi-label, short-video, long-video, weak-temporal-dependency, and strong-temporal-dependency classification scenarios. Moreover, applying this method reduces the training and tuning time needed for different classification scenarios; the overall pipeline is simpler and saves labor costs.
The video classification apparatus provided by an embodiment of the present invention is described below.
Referring to Fig. 6, a structural block diagram of a video classification apparatus 600 provided by an embodiment of the present invention is shown. As shown in Fig. 6, the video classification apparatus 600 includes:
a first obtaining module 601, configured to obtain a first feature sequence of a video to be classified, wherein the features in the first feature sequence are arranged in chronological order;
a second obtaining module 602, configured to input the first feature sequence into a target pyramid attention network to obtain a first output result output by the target pyramid attention network;
a third obtaining module 603, configured to obtain a target vector according to the first output result; and
a classification module 604, configured to classify the video to be classified according to the target vector.
Optionally, the target pyramid attention network is a time pyramid attention network;
the first output result includes M feature sequence sets of different time scales, each feature sequence set being composed of second feature sequences into which the first feature sequence is divided according to the corresponding time scale; the second feature sequences in each feature sequence set are arranged in chronological order, the features in each second feature sequence are arranged in chronological order, and M is an integer greater than 1.
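The time-scale division just described can be sketched as follows. The choice of k chronological segments at scale k (for k = 1 … M) is an illustrative assumption; the text fixes only that the M sets use different time scales and that order within each set is chronological.

```python
def split_into(seq, k):
    """Split seq into k contiguous, chronological, near-equal segments."""
    out, start = [], 0
    for i in range(k):
        end = start + (len(seq) - start) // (k - i)
        out.append(seq[start:end])
        start = end
    return out

def time_pyramid(seq, m):
    """M feature sequence sets, one per time scale (k segments at scale k)."""
    return [split_into(seq, k) for k in range(1, m + 1)]

frames = list(range(8))  # stand-in for 8 per-frame feature vectors
for level in time_pyramid(frames, 3):
    print(level)
```

Each set covers the whole sequence, so coarse scales capture global context while fine scales isolate short temporal spans, which is the intent of a time pyramid.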
Optionally, the longer the duration of the video to be classified, the larger the value of M.
Optionally, the third obtaining module 603 includes:
a first obtaining unit, configured to input each second feature sequence in the first output result into a channel pyramid attention network, respectively, to obtain a second output result corresponding to each second feature sequence, output by the channel pyramid attention network; and
a second obtaining unit, configured to obtain the target vector according to the second output result corresponding to each second feature sequence;
wherein the second output result corresponding to any second feature sequence includes N sub-feature sequence sets of different feature granularities, each sub-feature sequence set being composed of sub-feature sequences into which one second feature sequence is split according to the corresponding feature granularity; the sub-features in each sub-feature sequence are arranged in chronological order, and N is an integer greater than 1.
Optionally, the longer the duration of the video to be classified, the larger the value of N.
Optionally, any second output result further includes a weight corresponding to each sub-feature in each sub-feature sequence included therein;
the second obtaining unit includes:
a first obtaining sub-unit, configured to perform, for each sub-feature sequence in each second output result, a weighted summation according to the sub-features therein and the corresponding weights, to obtain a corresponding feature vector;
a second obtaining sub-unit, configured to perform a concatenation operation according to the feature vectors corresponding to all the sub-feature sequences, to obtain a concatenated vector; and
a determining sub-unit, configured to use the concatenated vector as the target vector.
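The two steps implemented by the sub-units above (weighted summation per sub-feature sequence, then concatenation into the target vector) can be sketched as follows; the sequences and weights are invented example values, not learned ones.

```python
def fuse_subfeature_sequence(subfeatures, weights):
    """Weighted sum of a sequence of equally sized sub-feature vectors."""
    return [sum(w * v[i] for v, w in zip(subfeatures, weights))
            for i in range(len(subfeatures[0]))]

def target_vector(subfeature_sequences, weight_lists):
    # One feature vector per sub-feature sequence, then concatenation.
    fused = [fuse_subfeature_sequence(s, w)
             for s, w in zip(subfeature_sequences, weight_lists)]
    return [c for vec in fused for c in vec]

seqs = [[[1.0, 2.0], [3.0, 4.0]],   # sub-feature sequence 1 (2 sub-features)
        [[5.0, 6.0], [7.0, 8.0]]]   # sub-feature sequence 2 (2 sub-features)
wts = [[1.0, 0.0],                  # keep only the first sub-feature
       [0.5, 0.5]]                  # average the two sub-features
print(target_vector(seqs, wts))     # -> [1.0, 2.0, 6.0, 7.0]
```

The concatenated result is what the determining sub-unit passes on as the target vector for the fully connected classifier.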
Optionally, the classification module 604 is specifically configured to: input the target vector into a fully connected network to obtain a classification result of the video to be classified, output by the fully connected network.
Optionally, there are at least two first feature sequences, and the feature types corresponding to the first feature sequences differ from one another.
Optionally, the at least two first feature sequences include a first target feature sequence, a second target feature sequence, and a third target feature sequence, wherein the feature type corresponding to the first target feature sequence is an image feature type, the feature type corresponding to the second target feature sequence is an optical flow feature type, and the feature type corresponding to the third target feature sequence is an audio feature type.
In the embodiment of the present invention, to classify a video, the first feature sequence of the video to be classified can first be input into a target pyramid attention network; a target vector is then obtained according to the first output result output by the target pyramid attention network; finally, the video to be classified is classified according to the target vector. It can be seen that, in the embodiment of the present invention, the classification of the video can be achieved using the first feature sequence of the video to be classified and the target pyramid attention network. Compared with the prior art, in which all frames of the video to be classified must be analyzed, the embodiment of the present invention effectively improves classification efficiency; moreover, the target pyramid attention network can extract and fuse, in an attention-based manner, the features most effective for the video for use in classification, which better guarantees the accuracy of the classification results.
The electronic device provided by an embodiment of the present invention is described below.
Referring to Fig. 7, a structural schematic diagram of an electronic device 700 provided by an embodiment of the present invention is shown. As shown in Fig. 7, the electronic device 700 includes: a processor 701, a memory 703, a user interface 704, and a bus interface.
The processor 701 is configured to read a program in the memory 703 and execute the following process:
obtaining a first feature sequence of a video to be classified, wherein the features in the first feature sequence are arranged in chronological order;
inputting the first feature sequence into a target pyramid attention network to obtain a first output result output by the target pyramid attention network;
obtaining a target vector according to the first output result; and
classifying the video to be classified according to the target vector.
In Fig. 7, the bus architecture may include any number of interconnected buses and bridges, linking together one or more processors represented by the processor 701 and various circuits of the memory represented by the memory 703. The bus architecture may also link together various other circuits, such as peripheral devices, voltage regulators, and power management circuits; all of these are well known in the art and are therefore not further described herein. The bus interface provides an interface. For different user devices, the user interface 704 may also be an interface for externally or internally connecting needed devices, the connected devices including but not limited to a keypad, a display, a loudspeaker, a microphone, a joystick, and the like.
The processor 701 is responsible for managing the bus architecture and for general processing, and the memory 703 may store data used by the processor 701 when performing operations.
Optionally, the target pyramid attention network is a time pyramid attention network;
the first output result includes M feature sequence sets of different time scales, each feature sequence set being composed of second feature sequences into which the first feature sequence is divided according to the corresponding time scale; the second feature sequences in each feature sequence set are arranged in chronological order, the features in each second feature sequence are arranged in chronological order, and M is an integer greater than 1.
Optionally, the longer the duration of the video to be classified, the larger the value of M.
Optionally, the processor 701 is specifically configured to:
input each second feature sequence in the first output result into a channel pyramid attention network, respectively, to obtain a second output result corresponding to each second feature sequence, output by the channel pyramid attention network; and
obtain the target vector according to the second output result corresponding to each second feature sequence;
wherein the second output result corresponding to any second feature sequence includes N sub-feature sequence sets of different feature granularities, each sub-feature sequence set being composed of sub-feature sequences into which one second feature sequence is split according to the corresponding feature granularity; the sub-features in each sub-feature sequence are arranged in chronological order, and N is an integer greater than 1.
Optionally, the longer the duration of the video to be classified, the larger the value of N.
Optionally, any second output result further includes a weight corresponding to each sub-feature in each sub-feature sequence included therein;
the processor 701 is specifically configured to:
for each sub-feature sequence in each second output result, perform a weighted summation according to the sub-features therein and the corresponding weights, to obtain a corresponding feature vector;
perform a concatenation operation according to the feature vectors corresponding to all the sub-feature sequences, to obtain a concatenated vector; and
use the concatenated vector as the target vector.
Optionally, the processor 701 is specifically configured to: input the target vector into a fully connected network to obtain a classification result of the video to be classified, output by the fully connected network.
Optionally, there are at least two first feature sequences, and the feature types corresponding to the first feature sequences differ from one another.
Optionally, the at least two first feature sequences include a first target feature sequence, a second target feature sequence, and a third target feature sequence, wherein the feature type corresponding to the first target feature sequence is an image feature type, the feature type corresponding to the second target feature sequence is an optical flow feature type, and the feature type corresponding to the third target feature sequence is an audio feature type.
In the embodiment of the present invention, to classify a video, the first feature sequence of the video to be classified can first be input into a target pyramid attention network; a target vector is then obtained according to the first output result output by the target pyramid attention network; finally, the video to be classified is classified according to the target vector. It can be seen that, in the embodiment of the present invention, the classification of the video can be achieved using the first feature sequence of the video to be classified and the target pyramid attention network. Compared with the prior art, in which all frames of the video to be classified must be analyzed, the embodiment of the present invention effectively improves classification efficiency; moreover, the target pyramid attention network can extract and fuse, in an attention-based manner, the features most effective for the video for use in classification, which better guarantees the accuracy of the classification results.
Preferably, an embodiment of the present invention also provides an electronic device, including a processor 701, a memory 703, and a computer program stored in the memory 703 and executable on the processor 701. When the computer program is executed by the processor 701, the processes of the above video classification method embodiment are implemented and the same technical effects can be achieved; to avoid repetition, they are not described again here.
An embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored. When the computer program is executed by a processor, the processes of the above video classification method embodiment are implemented and the same technical effects can be achieved; to avoid repetition, they are not described again here. The computer-readable storage medium is, for example, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The embodiments of the present invention have been described above with reference to the accompanying drawings, but the invention is not limited to the above specific embodiments, which are merely illustrative rather than restrictive. Under the inspiration of the present invention, those skilled in the art can devise many further forms without departing from the scope protected by the purpose of the present invention and the claims, all of which fall within the protection of the invention.

Claims (20)

1. A video classification method, characterized in that the method comprises:
obtaining a first feature sequence of a video to be classified, wherein features in the first feature sequence are arranged in chronological order;
inputting the first feature sequence into a target pyramid attention network to obtain a first output result output by the target pyramid attention network;
obtaining a target vector according to the first output result; and
classifying the video to be classified according to the target vector.
2. The method according to claim 1, characterized in that:
the target pyramid attention network is a time pyramid attention network; and
the first output result includes M feature sequence sets of different time scales, each feature sequence set being composed of second feature sequences into which the first feature sequence is divided according to the corresponding time scale, wherein the second feature sequences in each feature sequence set are arranged in chronological order, the features in each second feature sequence are arranged in chronological order, and M is an integer greater than 1.
3. The method according to claim 2, characterized in that the longer the duration of the video to be classified, the larger the value of M.
4. The method according to claim 2, characterized in that obtaining the target vector according to the first output result comprises:
inputting each second feature sequence in the first output result into a channel pyramid attention network, respectively, to obtain a second output result corresponding to each second feature sequence, output by the channel pyramid attention network; and
obtaining the target vector according to the second output result corresponding to each second feature sequence;
wherein the second output result corresponding to any second feature sequence includes N sub-feature sequence sets of different feature granularities, each sub-feature sequence set being composed of sub-feature sequences into which one second feature sequence is split according to the corresponding feature granularity, the sub-features in each sub-feature sequence are arranged in chronological order, and N is an integer greater than 1.
5. The method according to claim 4, characterized in that the longer the duration of the video to be classified, the larger the value of N.
6. The method according to claim 4, characterized in that any second output result further includes a weight corresponding to each sub-feature in each sub-feature sequence included therein; and
obtaining the target vector according to the second output result corresponding to each second feature sequence comprises:
for each sub-feature sequence in each second output result, performing a weighted summation according to the sub-features therein and the corresponding weights, to obtain a corresponding feature vector;
performing a concatenation operation according to the feature vectors corresponding to all the sub-feature sequences, to obtain a concatenated vector; and
using the concatenated vector as the target vector.
7. The method according to claim 1, characterized in that classifying the video to be classified according to the target vector comprises:
inputting the target vector into a fully connected network to obtain a classification result of the video to be classified, output by the fully connected network.
8. The method according to claim 1, characterized in that there are at least two first feature sequences, and the feature types corresponding to the first feature sequences differ from one another.
9. The method according to claim 8, characterized in that the at least two first feature sequences include a first target feature sequence, a second target feature sequence, and a third target feature sequence, wherein the feature type corresponding to the first target feature sequence is an image feature type, the feature type corresponding to the second target feature sequence is an optical flow feature type, and the feature type corresponding to the third target feature sequence is an audio feature type.
10. A video classification apparatus, characterized in that the apparatus comprises:
a first obtaining module, configured to obtain a first feature sequence of a video to be classified, wherein features in the first feature sequence are arranged in chronological order;
a second obtaining module, configured to input the first feature sequence into a target pyramid attention network to obtain a first output result output by the target pyramid attention network;
a third obtaining module, configured to obtain a target vector according to the first output result; and
a classification module, configured to classify the video to be classified according to the target vector.
11. The apparatus according to claim 10, characterized in that:
the target pyramid attention network is a time pyramid attention network; and
the first output result includes M feature sequence sets of different time scales, each feature sequence set being composed of second feature sequences into which the first feature sequence is divided according to the corresponding time scale, wherein the second feature sequences in each feature sequence set are arranged in chronological order, the features in each second feature sequence are arranged in chronological order, and M is an integer greater than 1.
12. The apparatus according to claim 11, characterized in that the longer the duration of the video to be classified, the larger the value of M.
13. The apparatus according to claim 11, characterized in that the third obtaining module comprises:
a first obtaining unit, configured to input each second feature sequence in the first output result into a channel pyramid attention network, respectively, to obtain a second output result corresponding to each second feature sequence, output by the channel pyramid attention network; and
a second obtaining unit, configured to obtain the target vector according to the second output result corresponding to each second feature sequence;
wherein the second output result corresponding to any second feature sequence includes N sub-feature sequence sets of different feature granularities, each sub-feature sequence set being composed of sub-feature sequences into which one second feature sequence is split according to the corresponding feature granularity, the sub-features in each sub-feature sequence are arranged in chronological order, and N is an integer greater than 1.
14. The apparatus according to claim 13, characterized in that the longer the duration of the video to be classified, the larger the value of N.
15. The apparatus according to claim 13, characterized in that any second output result further includes a weight corresponding to each sub-feature in each sub-feature sequence included therein; and
the second obtaining unit comprises:
a first obtaining sub-unit, configured to perform, for each sub-feature sequence in each second output result, a weighted summation according to the sub-features therein and the corresponding weights, to obtain a corresponding feature vector;
a second obtaining sub-unit, configured to perform a concatenation operation according to the feature vectors corresponding to all the sub-feature sequences, to obtain a concatenated vector; and
a determining sub-unit, configured to use the concatenated vector as the target vector.
16. The apparatus according to claim 10, characterized in that the classification module is specifically configured to:
input the target vector into a fully connected network to obtain a classification result of the video to be classified, output by the fully connected network.
17. The apparatus according to claim 10, characterized in that there are at least two first feature sequences, and the feature types corresponding to the first feature sequences differ from one another.
18. The apparatus according to claim 17, characterized in that the at least two first feature sequences include a first target feature sequence, a second target feature sequence, and a third target feature sequence, wherein the feature type corresponding to the first target feature sequence is an image feature type, the feature type corresponding to the second target feature sequence is an optical flow feature type, and the feature type corresponding to the third target feature sequence is an audio feature type.
19. An electronic device, characterized by comprising a processor, a memory, and a computer program stored in the memory and executable on the processor, wherein when the computer program is executed by the processor, the steps of the video classification method according to any one of claims 1 to 9 are implemented.
20. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the video classification method according to any one of claims 1 to 9 are implemented.
CN201910357559.2A 2019-04-29 2019-04-29 Video classification method and device, electronic equipment and computer-readable storage medium Active CN110096617B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910357559.2A CN110096617B (en) 2019-04-29 2019-04-29 Video classification method and device, electronic equipment and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910357559.2A CN110096617B (en) 2019-04-29 2019-04-29 Video classification method and device, electronic equipment and computer-readable storage medium

Publications (2)

Publication Number Publication Date
CN110096617A true CN110096617A (en) 2019-08-06
CN110096617B CN110096617B (en) 2021-08-10

Family

ID=67446566

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910357559.2A Active CN110096617B (en) 2019-04-29 2019-04-29 Video classification method and device, electronic equipment and computer-readable storage medium

Country Status (1)

Country Link
CN (1) CN110096617B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111246256A (en) * 2020-02-21 2020-06-05 华南理工大学 Video recommendation method based on multi-mode video content and multi-task learning
CN111291643A (en) * 2020-01-20 2020-06-16 北京百度网讯科技有限公司 Video multi-label classification method and device, electronic equipment and storage medium
CN111491187A (en) * 2020-04-15 2020-08-04 腾讯科技(深圳)有限公司 Video recommendation method, device, equipment and storage medium
CN111797800A (en) * 2020-07-14 2020-10-20 中国传媒大学 Video classification method based on content mining
CN112507920A (en) * 2020-12-16 2021-03-16 重庆交通大学 Examination abnormal behavior identification method based on time displacement and attention mechanism

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105917354A (en) * 2014-10-09 2016-08-31 微软技术许可有限责任公司 Spatial pyramid pooling networks for image processing
US20180032846A1 (en) * 2016-08-01 2018-02-01 Nvidia Corporation Fusing multilayer and multimodal deep neural networks for video classification
CN108416795A (en) * 2018-03-04 2018-08-17 南京理工大学 The video actions recognition methods of space characteristics is merged based on sequence pondization
CN108830212A (en) * 2018-06-12 2018-11-16 北京大学深圳研究生院 A kind of video behavior time shaft detection method
CN109359592A (en) * 2018-10-16 2019-02-19 北京达佳互联信息技术有限公司 Processing method, device, electronic equipment and the storage medium of video frame
CN109389055A (en) * 2018-09-21 2019-02-26 西安电子科技大学 Video classification methods based on mixing convolution sum attention mechanism
CN109670453A (en) * 2018-12-20 2019-04-23 杭州东信北邮信息技术有限公司 A method of extracting short video subject


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HU Chunhai: "Visual saliency-driven video segmentation algorithm for moving fish bodies", Journal of Yanshan University *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291643A (en) * 2020-01-20 2020-06-16 Beijing Baidu Netcom Science and Technology Co., Ltd. Video multi-label classification method and device, electronic device, and storage medium
CN111291643B (en) * 2020-01-20 2023-08-22 Beijing Baidu Netcom Science and Technology Co., Ltd. Video multi-label classification method and device, electronic device, and storage medium
CN111246256A (en) * 2020-02-21 2020-06-05 South China University of Technology Video recommendation method based on multi-modal video content and multi-task learning
CN111491187A (en) * 2020-04-15 2020-08-04 Tencent Technology (Shenzhen) Co., Ltd. Video recommendation method, device, equipment, and storage medium
CN111491187B (en) * 2020-04-15 2023-10-31 Tencent Technology (Shenzhen) Co., Ltd. Video recommendation method, device, equipment, and storage medium
CN111797800A (en) * 2020-07-14 2020-10-20 Communication University of China Video classification method based on content mining
CN111797800B (en) * 2020-07-14 2024-03-05 Communication University of China Video classification method based on content mining
CN112507920A (en) * 2020-12-16 2021-03-16 Chongqing Jiaotong University Method for identifying abnormal examination behavior based on temporal shift and attention mechanism
CN112507920B (en) * 2020-12-16 2023-01-24 Chongqing Jiaotong University Method for identifying abnormal examination behavior based on temporal shift and attention mechanism

Also Published As

Publication number Publication date
CN110096617B (en) 2021-08-10

Similar Documents

Publication Publication Date Title
CN110096617A (en) Video classification methods, device, electronic equipment and computer readable storage medium
Wang et al. SaliencyGAN: Deep learning semisupervised salient object detection in the fog of IoT
Wang et al. A deep network solution for attention and aesthetics aware photo cropping
Li et al. CNNPruner: Pruning convolutional neural networks with visual analytics
CN110147711A (en) Video scene recognition method, device, storage medium, and electronic device
CN110348387A (en) An image processing method, device, and computer-readable storage medium
CN104933428B (en) A face recognition method and device based on tensor description
CN109145784A (en) Method and apparatus for processing video
CN110353675A (en) EEG signal emotion recognition method and device based on image generation
CN105893478A (en) Tag extraction method and device
CN110503076A (en) Video classification method, device, equipment, and medium based on artificial intelligence
CN110378348A (en) Video instance segmentation method, device, and computer-readable storage medium
CN109325516A (en) An ensemble learning method and device for image classification
CN106529996A (en) Deep learning-based advertisement display method and device
CN105989067A (en) Method for generating a text summary from an image, user equipment, and training server
CN110472050A (en) A group clustering method and device
CN114913303A (en) Virtual image generation method and related device, electronic device, and storage medium
CN112508048A (en) Image description generation method and device
CN114360018B (en) Method and device for rendering three-dimensional facial expressions, storage medium, and electronic device
CN111368707A (en) Face detection method, system, device, and medium based on feature pyramid and dense blocks
CN109409305A (en) A facial image sharpness evaluation method and device
CN116701706B (en) Data processing method, device, equipment, and medium based on artificial intelligence
CN111046213B (en) Knowledge base construction method based on image recognition
CN110287761A (en) A face age estimation method based on convolutional neural networks and latent variable analysis
CN109658369A (en) Intelligent video generation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant