CN109670453B - Method for extracting short video theme - Google Patents
Method for extracting short video theme
- Publication number
- CN109670453B (application CN201811567121.9A)
- Authority
- CN
- China
- Prior art keywords
- video
- space
- time characteristic
- video space
- time
- Prior art date
- 2018-12-20
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biophysics (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
A method of extracting short video topics, comprising: cutting the short video into M video frame-cut pictures; obtaining the video spatial feature vector set of the frame-cut pictures with a convolutional neural network in a transfer learning mode; forming a feature-vector time sequence from the video spatial feature vectors according to the playing order and inputting it into a bidirectional recurrent neural network, so as to output a video spatio-temporal feature sequence set H; adjusting each video spatio-temporal feature sequence in H with an attention mechanism, so as to obtain a new video spatio-temporal feature sequence set Q; and expanding Q into a video spatio-temporal feature vector Z, performing a linear transformation on Z, and then computing with a normalized exponential function the probability that the short video belongs to each topic, so as to extract the topic of the short video. The invention belongs to the field of information technology; it can automatically extract topic information from short videos and effectively reduces the amount of computation.
Description
Technical Field
The invention relates to a method for extracting short video topics and belongs to the field of information technology.
Background
Short videos are increasingly becoming a way for people to learn about the world. Automatically labeling large numbers of short videos can greatly reduce the laborious process of manual annotation, and it also lays the groundwork for subsequent short video classification and for pushing short videos that match users' interests.
Patent application CN201810496579.3 (title: a new unsupervised video semantic extraction method; filing date: 2018-05-22; applicant: University of Electronic Science and Technology of China) discloses an unsupervised video semantic extraction method comprising: constructing a three-dimensional convolutional neural network model and training it with the labeled video data set in a video database; processing the unlabeled video data in the video database with a sliding window into data conforming to the input of the three-dimensional convolutional neural network; feeding the generated data into the model and taking the output of its fully connected layer as the semantic features of a video segment; and using the generated sequence of video-segment semantic features as the input of a video semantic autoencoder, which integrates them into the overall semantic features of the video. Because this scheme extracts video-segment semantic features directly with a three-dimensional convolutional neural network, it incurs an extremely large amount of computation and the system efficiency is low.
Therefore, how to automatically extract topic information from short videos while effectively reducing the amount of computation has become a technical problem that urgently needs to be solved.
Disclosure of Invention
In view of the above, the present invention provides a method for extracting short video topics, which can automatically extract topic information from short videos and effectively reduce the amount of computation.
In order to achieve the above object, the present invention provides a method for extracting short video topics, comprising:
step one, cutting the short video into M video frame-cut pictures at a certain frame-length interval;
step two, in a transfer learning mode, obtaining with a convolutional neural network the video spatial feature vector set Y = [y_1, y_2, ..., y_M] of the M video frame-cut pictures, wherein y_1, y_2, ..., y_M are the video spatial feature vectors obtained by passing each video frame-cut picture through the convolutional neural network;
step three, forming the video spatial feature vectors of the M video frame-cut pictures into a feature-vector time sequence according to the playing order of the short video, inputting it into a bidirectional recurrent neural network, and outputting a video spatio-temporal feature sequence set H = [h_1, h_2, ..., h_M], wherein h_1, h_2, ..., h_M are the video spatio-temporal feature sequences output in the set H;
step four, calculating with an attention mechanism the attention of each video spatio-temporal feature sequence in the set H to the other video spatio-temporal feature sequences, and adjusting each sequence in H according to the attention, thereby obtaining a new video spatio-temporal feature sequence set Q = [q_1, q_2, ..., q_M], wherein q_1, q_2, ..., q_M are the video spatio-temporal feature sequences adjusted according to attention;
and step five, expanding the new video spatio-temporal feature sequence set Q into a video spatio-temporal feature vector Z, performing a linear transformation on Z, and then computing with a normalized exponential function the probability that the short video belongs to each topic, so as to extract the topic of the short video.
Compared with the prior art, the beneficial effects of the invention are as follows: a certain number of pictures are cut from the short video at a fixed frame-length interval and the spatial features of the short video are extracted from each picture; the features are then fed into a Bidirectional LSTM network in time order and combined, so that both the spatial and the temporal feature information of the short video are extracted; and an attention mechanism is introduced to focus on mining the key frames most relevant to the video category.
Drawings
Fig. 1 is a flow chart of a method for extracting short video topics according to the present invention.
Fig. 2 is a flowchart illustrating the detailed steps of step two in fig. 1.
Fig. 3 is a flowchart showing the detailed steps of step four in fig. 1.
Fig. 4 is a flowchart showing the detailed steps of step five in fig. 1.
Detailed Description
To make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings.
As shown in fig. 1, the method for extracting short video topics of the present invention includes:
step one, cutting the short video into M video frame-cut pictures at a certain frame-length interval, wherein the value of M can be set according to actual service requirements (a frame-sampling sketch is given after this list);
step two, in a transfer learning mode, obtaining with a convolutional neural network the video spatial feature vector set Y = [y_1, y_2, ..., y_M] of the M video frame-cut pictures, wherein y_1, y_2, ..., y_M are the video spatial feature vectors obtained by passing each video frame-cut picture through the convolutional neural network; in this way, content feature information can be extracted from each video frame-cut picture;
step three, forming the video spatial feature vectors of the M video frame-cut pictures into a feature-vector time sequence according to the playing order of the short video, inputting it into a bidirectional recurrent neural network, and outputting a video spatio-temporal feature sequence set H = [h_1, h_2, ..., h_M], wherein h_1, h_2, ..., h_M are the video spatio-temporal feature sequences output in the set H; through the bidirectional recurrent neural network, content and temporal feature information can be extracted from all the video frame-cut pictures;
step four, calculating with an attention mechanism the attention of each video spatio-temporal feature sequence in the set H to the other video spatio-temporal feature sequences, and adjusting each sequence in H according to the attention, thereby obtaining a new video spatio-temporal feature sequence set Q = [q_1, q_2, ..., q_M], wherein q_1, q_2, ..., q_M are the video spatio-temporal feature sequences adjusted according to attention;
and step five, expanding the new video spatio-temporal feature sequence set Q into a video spatio-temporal feature vector Z, performing a linear transformation on Z, and then computing with a normalized exponential function the probability that the short video belongs to each topic, so as to extract the topic of the short video.
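The patent does not name a tool for the frame cutting of step one. The following minimal sketch assumes OpenCV is available and that evenly spaced sampling satisfies the "certain interval" requirement; the function name and parameters are illustrative.

```python
# Frame-sampling sketch for step one (assumes OpenCV; M is service-dependent).
import cv2

def sample_frames(video_path: str, m: int):
    """Cut a short video into M evenly spaced frame-cut pictures."""
    cap = cv2.VideoCapture(video_path)
    total = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    interval = max(total // m, 1)          # fixed frame-length interval
    frames = []
    for i in range(m):
        cap.set(cv2.CAP_PROP_POS_FRAMES, i * interval)
        ok, frame = cap.read()             # frame: H x W x 3 BGR array
        if not ok:
            break
        frames.append(frame)
    cap.release()
    return frames
```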
In step two, considering that labeled sample pictures in a specific field are scarce and insufficient for full training, a transfer learning method may be adopted, using a convolutional neural network to extract the content feature information of each video frame-cut picture. As shown in fig. 2, step two may further include:

step 21, constructing and training, in a transfer learning mode, an Inception-v3 pre-trained convolutional neural network model based on the public ImageNet data set, the input of the model being a video frame-cut picture and the output being the probability that the picture belongs to different topics;

step 22, inputting the M video frame-cut pictures into the model trained in step 21, extracting the output of its penultimate layer as the video spatial feature vector of each picture, and forming the video spatial feature vector set Y = [y_1, y_2, ..., y_M] as the content features of the M video frame-cut pictures. A feature-extraction sketch follows.
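As an illustration of steps 21 and 22, the sketch below uses torchvision's pretrained Inception-v3 as a stand-in for the patent's ImageNet-based model (the patent names Inception-v3 but not a framework); replacing the final classification layer with an identity exposes the 2048-dimensional penultimate-layer output used as the spatial feature vector y_i. It assumes a recent torchvision (the weights API) and the standard ImageNet preprocessing values.

```python
# Spatial-feature extraction sketch for step two (assumes PyTorch/torchvision).
import torch
import torchvision.models as models
import torchvision.transforms as T

model = models.inception_v3(weights=models.Inception_V3_Weights.IMAGENET1K_V1)
model.fc = torch.nn.Identity()     # expose the penultimate (2048-d) layer
model.eval()

preprocess = T.Compose([
    T.ToPILImage(), T.Resize(299), T.CenterCrop(299),  # Inception-v3 input size
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def spatial_features(frames):
    """frames: list of M RGB H x W x 3 arrays (convert BGR first if from OpenCV)."""
    batch = torch.stack([preprocess(f) for f in frames])
    with torch.no_grad():
        return model(batch)        # Y: (M, 2048), one row per frame-cut picture
```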
In step three, a Bidirectional LSTM algorithm may be adopted: the video spatial feature vectors of all the video frame-cut pictures extracted in step two are input into the model in time order, and the video spatio-temporal feature sequence set H = [h_1, h_2, ..., h_M] is output. Through this time-sequence layer, the invention can combine the features of the different video frame-cut pictures, thereby obtaining the video spatio-temporal feature sequences; a sketch follows.
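A sketch of the time-sequence layer, assuming PyTorch's nn.LSTM in bidirectional mode; the hidden size is illustrative, not taken from the patent.

```python
# Bidirectional LSTM sketch for step three (assumes PyTorch).
import torch.nn as nn

bilstm = nn.LSTM(input_size=2048, hidden_size=256,
                 batch_first=True, bidirectional=True)

def spatio_temporal_sequences(y):  # y: (M, 2048) spatial features in play order
    h, _ = bilstm(y.unsqueeze(0))  # add a batch dimension
    return h.squeeze(0)            # H: (M, 512), forward and backward states
```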
In the method of the invention, an attention mechanism is added after the time-sequence layer of step three, so that the relations between the different video spatio-temporal feature sequences in the set H can be captured and the different degrees of attention among them fully mined. As shown in fig. 3, step four may further include:

step 41, calculating the relation value between every two video spatio-temporal feature sequences in the set H: f(h_i, h_j) = W_θ(h_i)^T · W_φ(h_j), wherein W_θ(h_i) and W_φ(h_j) are the values of h_i and h_j after nonlinear transformation and W_θ(h_i)^T is the transpose of W_θ(h_i);

step 42, calculating the attention of each sequence to the others: a_ij = exp(s_ij) / Σ_{t=1..M} exp(s_it), with s_ij = f(h_i, h_j);

step 43, adjusting each sequence according to the attention: q_i = Σ_{j=1..M} a_ij · h_j, thereby forming the new set Q (an attention sketch follows).
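A sketch of the step-four adjustment, following the claim-4 formulas as reconstructed above; the two learned linear maps below stand in for the patent's transforms W_θ and W_φ, whose exact form the text does not give.

```python
# Attention-adjustment sketch for step four (assumes PyTorch).
import torch
import torch.nn as nn

d = 512                            # dimension of each h_i from the BiLSTM
w_theta, w_phi = nn.Linear(d, d), nn.Linear(d, d)

def attend(h):                     # h: (M, d) spatio-temporal sequences
    s = w_theta(h) @ w_phi(h).T    # s[i, j] = f(h_i, h_j), pairwise relations
    a = torch.softmax(s, dim=1)    # a[i, j]: attention of sequence i to j
    return a @ h                   # q_i = sum_j a_ij * h_j  ->  Q: (M, d)
```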
As shown in fig. 4, step five may further include:

step 51, fully expanding the new video spatio-temporal feature sequence set Q to obtain the video spatio-temporal feature vector Z = [z_1, z_2, ..., z_M]; if the dimension of each sequence in Q is N, the dimension of Z is M × N;

step 52, performing a linear transformation on Z through a fully connected layer and then computing with a normalized exponential function the probability p_k = exp(f_w(Z)_k) / Σ_t exp(f_w(Z)_t) that the short video belongs to the k-th topic;

step 53, counting the number L of topics whose probability is greater than the probability threshold, and judging whether L is 0; if so, sorting the topic probabilities in descending order, outputting the P topics with the largest probabilities as the topics of the short video, and ending the process; if not, continuing to the next step;

step 54, judging whether L is less than or equal to P; if so, outputting the topics whose probability is greater than the probability threshold as the topics of the short video, and ending the process; if not, continuing to the next step;

and step 55, arranging the L probabilities greater than the probability threshold in descending order, and then outputting the P topics with the largest probabilities as the topics of the short video (a classification sketch follows).
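A sketch of step five together with the selection rule of steps 53 to 55; the topic count K, the probability threshold and P are illustrative parameters, not values from the patent.

```python
# Topic-probability and selection sketch for step five (assumes PyTorch).
import torch
import torch.nn as nn

d, m, k = 512, 32, 100             # feature dim N, frame count M, topic count K
fc = nn.Linear(m * d, k)           # the linear transform f_w applied to Z

def short_video_topics(q, threshold=0.2, p=3):
    z = q.reshape(-1)                        # expand Q into the vector Z (M x N)
    probs = torch.softmax(fc(z), dim=0)      # p_k for each of the K topics
    above = (probs > threshold).nonzero().flatten()
    if above.numel() == 0:         # L == 0: fall back to the P most probable
        return probs.topk(p).indices
    if above.numel() <= p:         # 0 < L <= P: all topics above the threshold
        return above
    order = probs[above].argsort(descending=True)
    return above[order[:p]]        # L > P: top P among those above threshold
```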
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and should not be taken as limiting the scope of the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (6)
1. A method for extracting short video topics is characterized by comprising the following steps:
step one, cutting the short video into M video frame-cut pictures at a certain frame-length interval;
step two, in a transfer learning mode, obtaining with a convolutional neural network the video spatial feature vector set Y = [y_1, y_2, ..., y_M] of the M video frame-cut pictures, wherein y_1, y_2, ..., y_M are the video spatial feature vectors obtained by passing each video frame-cut picture through the convolutional neural network;
step three, forming the video spatial feature vectors of the M video frame-cut pictures into a feature-vector time sequence according to the playing order of the short video, inputting it into a bidirectional recurrent neural network, and outputting a video spatio-temporal feature sequence set H = [h_1, h_2, ..., h_M], wherein h_1, h_2, ..., h_M are the video spatio-temporal feature sequences output in the set H;
step four, calculating with an attention mechanism the attention of each video spatio-temporal feature sequence in the set H to the other video spatio-temporal feature sequences, and adjusting each sequence in H according to the attention, thereby obtaining a new video spatio-temporal feature sequence set Q = [q_1, q_2, ..., q_M], wherein q_1, q_2, ..., q_M are the video spatio-temporal feature sequences adjusted according to attention;
and step five, expanding the new video spatio-temporal feature sequence set Q into a video spatio-temporal feature vector Z, performing a linear transformation on Z, and then computing with a normalized exponential function the probability that the short video belongs to each topic, so as to extract the topic of the short video.
2. The method of claim 1, wherein step two further comprises:
step 21, constructing and training, in a transfer learning mode, an Inception-v3 pre-trained convolutional neural network model based on the public ImageNet data set, wherein the input of the model is a video frame-cut picture and the output of the model is the probability that the video frame-cut picture belongs to different topics;
step 22, inputting the M video frame-cut pictures into the convolutional neural network model trained in step 21, extracting the output of the penultimate layer of the model as the video spatial feature vector of each video frame-cut picture, and forming the video spatial feature vectors of the M video frame-cut pictures into the video spatial feature vector set Y = [y_1, y_2, ..., y_M], which serves as the content features of the M video frame-cut pictures.
3. The method of claim 1, wherein step three employs a Bidirectional LSTM algorithm.
4. The method of claim 1, wherein step four further comprises:
step 41, calculating a relation value between every two video spatio-temporal feature sequences in the set H: f(h_i, h_j) = W_θ(h_i)^T · W_φ(h_j), wherein f(h_i, h_j) is the relation value between the i-th and the j-th video spatio-temporal feature sequence in H, W_θ(h_i) and W_φ(h_j) are the values of h_i and h_j after nonlinear transformation, respectively, and W_θ(h_i)^T is the transpose of W_θ(h_i);
step 42, respectively calculating the attention of each video spatio-temporal feature sequence in the set H to the other video spatio-temporal feature sequences: a_ij = exp(s_ij) / Σ_{t=1..M} exp(s_it), wherein a_ij is the attention of the i-th video spatio-temporal feature sequence to the j-th one and s_ij = f(h_i, h_j);
step 43, adjusting each video spatio-temporal feature sequence in the set H according to the attention, by the formula q_i = Σ_{j=1..M} a_ij · h_j, wherein q_i is the i-th video spatio-temporal feature sequence adjusted according to attention and h_j is the j-th video spatio-temporal feature sequence in H, thereby forming the new video spatio-temporal feature sequence set.
5. The method of claim 1, wherein step five further comprises:
step 51, fully expanding the new video spatio-temporal feature sequence set Q to obtain the video spatio-temporal feature vector Z = [z_1, z_2, ..., z_M], wherein, if the dimension of each video spatio-temporal feature sequence in Q is N, the dimension of Z is M × N;
and step 52, performing a linear transformation on the video spatio-temporal feature vector Z through a fully connected layer, and then computing with a normalized exponential function the probability that the short video belongs to each topic: p_k = exp(f_w(Z)_k) / Σ_t exp(f_w(Z)_t), wherein p_k is the probability that the short video belongs to the k-th topic, f_w(Z)_k and f_w(Z)_t are the values of the k-th and the t-th class after the linear transformation of Z, and w is the linear-function parameter.
6. The method of claim 1, wherein step five further comprises:
step A1, counting the number L of topics whose probability is greater than the probability threshold, and judging whether L is 0; if so, sorting the topic probabilities in descending order, outputting the P topics with the largest probabilities as the topics of the short video, and ending the process; if not, continuing to the next step;
step A2, judging whether L is less than or equal to P; if so, outputting the topics whose probability is greater than the probability threshold as the topics of the short video, and ending the process; if not, continuing to the next step;
and step A3, arranging the L probabilities greater than the probability threshold in descending order, and then outputting the P topics with the largest probabilities as the topics of the short video.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811567121.9A CN109670453B (en) | 2018-12-20 | 2018-12-20 | Method for extracting short video theme |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811567121.9A CN109670453B (en) | 2018-12-20 | 2018-12-20 | Method for extracting short video theme |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109670453A (en) | 2019-04-23
CN109670453B (en) | 2023-04-07
Family
ID=66144078
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811567121.9A Active CN109670453B (en) | 2018-12-20 | 2018-12-20 | Method for extracting short video theme |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109670453B (en) |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110096617B (en) * | 2019-04-29 | 2021-08-10 | 北京百度网讯科技有限公司 | Video classification method and device, electronic equipment and computer-readable storage medium |
CN110070511B (en) * | 2019-04-30 | 2022-01-28 | 北京市商汤科技开发有限公司 | Image processing method and device, electronic device and storage medium |
CN110807369B (en) * | 2019-10-09 | 2024-02-20 | 南京航空航天大学 | Short video content intelligent classification method based on deep learning and attention mechanism |
CN113762571A (en) * | 2020-10-27 | 2021-12-07 | 北京京东尚科信息技术有限公司 | Short video category prediction method, system, electronic device and storage medium |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108921032A (en) * | 2018-06-04 | 2018-11-30 | 四川创意信息技术股份有限公司 | A kind of new video semanteme extracting method based on deep learning model |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9830709B2 (en) * | 2016-03-11 | 2017-11-28 | Qualcomm Incorporated | Video analysis with convolutional attention recurrent neural networks |
CN107038221B (en) * | 2017-03-22 | 2020-11-17 | 杭州电子科技大学 | Video content description method based on semantic information guidance |
CN107341462A (en) * | 2017-06-28 | 2017-11-10 | 电子科技大学 | A kind of video classification methods based on notice mechanism |
CN108024158A (en) * | 2017-11-30 | 2018-05-11 | 天津大学 | There is supervision video abstraction extraction method using visual attention mechanism |
- 2018-12-20: application CN201811567121.9A filed in China; granted and active as CN109670453B
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108921032A (en) * | 2018-06-04 | 2018-11-30 | 四川创意信息技术股份有限公司 | A kind of new video semanteme extracting method based on deep learning model |
Also Published As
Publication number | Publication date |
---|---|
CN109670453A (en) | 2019-04-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109670453B (en) | Method for extracting short video theme | |
WO2021088510A1 (en) | Video classification method and apparatus, computer, and readable storage medium | |
CN112163594B (en) | Network encryption traffic identification method and device | |
CN108537134B (en) | Video semantic scene segmentation and labeling method | |
Wang et al. | Retweet wars: Tweet popularity prediction via dynamic multimodal regression | |
CN111144448A (en) | Video barrage emotion analysis method based on multi-scale attention convolutional coding network | |
CN111444367B (en) | Image title generation method based on global and local attention mechanism | |
CN101739428B (en) | Method for establishing index for multimedia | |
CN114666663A (en) | Method and apparatus for generating video | |
CN113434716B (en) | Cross-modal information retrieval method and device | |
CN112784929B (en) | Small sample image classification method and device based on double-element group expansion | |
CN111723239B (en) | Video annotation method based on multiple modes | |
CN111401063B (en) | Text processing method and device based on multi-pool network and related equipment | |
CN111597983B (en) | Method for realizing identification of generated false face image based on deep convolutional neural network | |
CN111061837A (en) | Topic identification method, device, equipment and medium | |
CN113064995A (en) | Text multi-label classification method and system based on deep learning of images | |
CN109635647B (en) | Multi-picture multi-face clustering method based on constraint condition | |
CN115203471A (en) | Attention mechanism-based multimode fusion video recommendation method | |
CN111488813A (en) | Video emotion marking method and device, electronic equipment and storage medium | |
US11580979B2 (en) | Methods and systems for pushing audiovisual playlist based on text-attentional convolutional neural network | |
CN111310516A (en) | Behavior identification method and device | |
CN112052869B (en) | User psychological state identification method and system | |
CN107656760A (en) | Data processing method and device, electronic equipment | |
CN113536952A (en) | Video question-answering method based on attention network of motion capture | |
WO2021081741A1 (en) | Image classification method and system employing multi-relationship social network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CP01 | Change in the name or title of a patent holder | |
Address after: 4th floor, No. 398 Wensan Road, Xihu District, Hangzhou City, Zhejiang Province, 310013
Patentee after: Xinxun Digital Technology (Hangzhou) Co.,Ltd.
Address before: 4th floor, No. 398 Wensan Road, Xihu District, Hangzhou City, Zhejiang Province, 310013
Patentee before: EB Information Technology Ltd.