CN106529492A - Video topic classification and description method based on multi-image fusion in view of network query - Google Patents
- Publication number
- CN106529492A CN106529492A CN201611035152.0A CN201611035152A CN106529492A CN 106529492 A CN106529492 A CN 106529492A CN 201611035152 A CN201611035152 A CN 201611035152A CN 106529492 A CN106529492 A CN 106529492A
- Authority
- CN
- China
- Prior art keywords
- video
- text
- event
- classification
- multi-graph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/46—Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2323—Non-hierarchical techniques based on graph theory, e.g. minimum spanning trees [MST] or graph cuts
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/44—Event detection
Abstract
The invention belongs to the technical field of video processing. Based on the characteristics of multi-video data, the method detects video events, generates a textual description of each event, and thereby realizes content-based video topic classification and description oriented to network queries. The technical scheme adopted by the invention, a video topic classification and description method based on multi-graph fusion for network queries, comprises the steps of: 1) combining the text information and visual information of the videos, building a multi-graph model, and classifying events with a graph-cut method; and 2) extracting the keywords of each video event with tf-idf or a word-vector technique, and refining those keywords with prior information about the topic gathered from websites such as Wikipedia, so as to produce a textual description of the event. The method of the invention is mainly applied to video classification scenarios.
Description
Technical field
The invention belongs to the technical field of video processing. Addressing the fact that the multimedia field produces massive amounts of video data from which users cannot easily obtain the information they need, the invention provides a method for classifying, by topic, the multiple videos returned for a single query, and, on that basis, extracts the keywords that describe each sub-topic, thereby realizing topic classification and description of videos oriented to network queries.
Background technology
With the rapid development of information technology, video data have multiplied and become one of the main channels through which people obtain information. However, because of the sharp increase in the number of videos, a large amount of redundant and repeated information appears in video data. Faced with a huge number of web videos, users find it extremely difficult to obtain the correct information. When searching for the topic of a related event, most users are interested in the main events of the topic and in how they develop, yet tracking the progress of an event across a huge number of video search results is very difficult. In this situation, a technique is urgently needed that can integrate and analyse the massive video data under the same topic, satisfying people's demand to browse the main information of videos quickly and accurately and improving their ability to obtain information.
Generally, a news topic is composed of a series of related events that occur at specific times and specific places and share a common focus, and an event is described by a number of discriminative, representative words. In the past few decades, in order to improve the efficiency of video data management and allow users to obtain the information they want quickly and accurately, researchers have studied the properties of video data and proposed methods for classifying and describing Internet videos, but the technology is still at an early stage, mainly for the following reasons. 1) Because visual features suffer from the semantic gap, it is difficult to classify events from the visual modality alone, so the text information of the videos must be combined to classify video events; yet the text information uploaded by users is limited and often noisy, ambiguous, incomplete or even misleading, so event classification and description based on it carry a certain error. 2) In addition, tag information describes the whole video rather than a specific scene or shot, and longer videos often cover multiple topics, which brings further difficulty to video classification.
In recent years, with the development of multimedia technology, researchers have proposed some countermeasures to the multi-video topic classification and description problem. Among them, exploring the event structure of Internet videos is a classical approach. That method first analyses the text features of the videos with a co-occurrence model to explore the text patterns of events; it then classifies events by transitive closure and describes them from the text perspective; finally, it detects the main events of the videos by near-duplicate frame detection, and fuses events with similar visual and textual properties to explore and describe them. Although that method improves event exploration to some extent, it explores events from the visual and textual perspectives separately: it neither detects events with a structure spanning multiple modalities nor exploits, during detection, the complementary advantages of the multi-modal information of videos.
The present invention proposes a multi-graph model: the graphs are fused, and video classification is realized with a graph-cut method. Tf-idf is then used to extract the keywords of each event class, and the events are described. This scheme makes full use of the complementary advantages of the multi-modal information of videos, and better realizes multi-graph-fusion-based video topic classification and description for network queries.
Summary of the invention
To overcome the deficiencies of the prior art, the invention aims to propose a content-based video topic classification and description method oriented to network queries. According to the characteristics of multi-video data, the method detects video events, forms a textual description of each event, and thereby realizes content-based video topic classification and description for network queries. The technical scheme adopted by the invention, a video topic classification and description method based on multi-graph fusion for network queries, comprises the steps of: 1) combining the text information and visual information of the videos, building a multi-graph model, and classifying events with a graph-cut method; 2) extracting the keywords of each video event with term frequency/inverse document frequency (tf-idf) or the text deep-representation model word2vec, and refining the keywords with prior information about the topic gathered from websites such as Wikipedia, so as to produce a textual description of the event.
In one example, the concrete steps are as follows.
First, a topic query is given; related content is then searched from related websites such as Wikipedia, and prior information relevant to the topic is obtained.
Given M videos under the same event, let T = {t_1, t_2, ..., t_M} denote the set of text labels of the videos, with t_i the i-th text feature in T, and let V = {v_1, v_2, ..., v_M}, where v_i is the visual-similarity vector between the i-th video and the M videos, v_i(j) is the number of near-duplicate frames between the i-th and j-th videos, and v_i(i) = 0. Two graphs G_1 = (T, E_1) and G_2 = (V, E_2) are built, where T and V are the vertex sets of the two graphs and E_1 and E_2 are the edge sets, representing the relation between any two videos from the text and visual information respectively. In the edge-weight formulas, s_ij is the average number of shots of videos i and j, and v_i(j) is the number of near-duplicate frames between the i-th and j-th videos. The two graphs are fused by a linear fusion technique, in which α is a positive number in (0, 1) that balances the two terms.
Video events are then classified by the graph-cut method. Finally, the keywords of each sub-topic are extracted from the text features of the videos, and the keyword set of each sub-topic is modified and expanded according to the information about the event on Wikipedia.
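The linear fusion of the two affinity graphs can be sketched as follows. This is an illustrative sketch, not the patent's implementation: it assumes the text graph and visual graph have already been expressed as symmetric affinity matrices with zero diagonals (consistent with v_i(i) = 0), and the toy values are invented for the example.

```python
import numpy as np

def fuse_graphs(W_text, W_visual, alpha=0.5):
    """Linear fusion of the text-graph and visual-graph affinities:
    W = alpha * W_text + (1 - alpha) * W_visual, with alpha in (0, 1)."""
    assert 0.0 < alpha < 1.0, "alpha must lie strictly between 0 and 1"
    return alpha * np.asarray(W_text, float) + (1.0 - alpha) * np.asarray(W_visual, float)

# Toy affinities for M = 3 videos; diagonals are zero since v_i(i) = 0.
W_t = np.array([[0.0, 0.8, 0.1],
                [0.8, 0.0, 0.2],
                [0.1, 0.2, 0.0]])
W_v = np.array([[0.0, 0.6, 0.0],
                [0.6, 0.0, 0.4],
                [0.0, 0.4, 0.0]])
W = fuse_graphs(W_t, W_v, alpha=0.5)  # fused affinity, later used by the graph cut
```

With α = 0.5 the fusion is a plain average; α closer to 1 trusts the (often noisy) text labels more, α closer to 0 trusts the near-duplicate-frame evidence more.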
Features and beneficial effects of the invention:
The invention mainly addresses the shortcomings of existing multi-video event classification and description methods, and designs a content-based video topic classification and description method for network queries that suits the structural characteristics of multi-video data and makes full use of the information peculiar to the data. Its advantages are mainly the following.
(1) Novelty: the multi-graph model is applied to video event classification, the multi-modal information of the videos is fully used, and event detection over multi-video sets is better achieved.
(2) Multi-modality: in the video sub-topic detection process, on the one hand the text information of the videos is used to compute inter-video similarity; on the other hand the visual information is used to detect near-duplicate frames between videos, from which inter-video similarity is also computed. The two aspects jointly realize the sub-topic detection of videos.
(3) Effectiveness: experiments confirm that, compared with methods typically applied to video topic classification and description, the multi-graph-model-based classification and description method designed by the invention clearly outperforms them, and is therefore better suited to video topic classification and description for network queries.
(4) Practicality: the method is simple and feasible, and can be used in the field of multimedia signal processing.
Description of the drawings:
Fig. 1 is a flow chart of the video topic detection and keyword extraction process of the invention, based on the graph-cut algorithm over the multi-graph model.
Specific embodiment
The object of the invention is to provide content-based video topic classification and description for network queries. According to the characteristics of multi-video data, a multi-graph model is first built from the text information and visual information of the videos, and the videos are clustered, i.e. video events are detected, by methods such as graph cut. Then the keywords of each event class are extracted with tf-idf or a similar technique such as the text deep-representation model word2vec, where TF is term frequency and IDF is inverse document frequency. A textual description of each event is formed, realizing content-based video topic classification and description for network queries.
The method provided by the invention is broadly divided into two processes: 1) combining the text information and visual information of the videos, building the multi-graph model, and classifying events by methods such as graph cut; 2) extracting the keywords of the video events with tf-idf, word2vec or a similar technique, and refining the keywords with prior information about the topic gathered from websites such as Wikipedia, so as to produce a textual description of each event. The general procedure is described below.
First, a topic query is given; related content is then searched from related websites such as Wikipedia, and prior information relevant to the topic is obtained.
Given M videos under the same event, let T = {t_1, t_2, ..., t_M} denote the set of text labels of the videos, with t_i the i-th text feature in T; the text labels correspond one-to-one with the videos. Let V = {v_1, v_2, ..., v_M}, where v_i is the visual-similarity vector between the i-th video and the M videos, v_i(j) is the number of near-duplicate frames between the i-th and j-th videos, and v_i(i) = 0. Two graphs G_1 = (T, E_1) and G_2 = (V, E_2) are built, where T and V are the vertex sets of the two graphs and E_1 and E_2 are the edge sets, representing the relation between any two videos from the text and visual information respectively. In the edge-weight formulas, s_ij is the average number of shots of videos i and j, and v_i(j) is the number of near-duplicate frames between the i-th and j-th videos. The two graphs are fused by a linear fusion technique, in which α is a positive number in (0, 1) that balances the two terms.
Video events are then classified by the graph-cut method. Finally, the keywords of each sub-topic are extracted from the text features of the videos, and the keyword set of each sub-topic is modified and expanded according to the information about the event on Wikipedia.
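The tf-idf keyword extraction above can be sketched in a few lines. This is a minimal, self-contained illustration over invented toy text labels, not the patent's implementation, and it omits the Wikipedia-based refinement step:

```python
import math
from collections import Counter

def tfidf_keywords(docs, top_k=3):
    """Rank the words of each document by tf-idf = tf(w, d) * log(N / df(w))
    and return the top_k highest-scoring words per document."""
    n = len(docs)
    tokenised = [d.lower().split() for d in docs]
    df = Counter(w for toks in tokenised for w in set(toks))  # document frequency
    keywords = []
    for toks in tokenised:
        tf = Counter(toks)
        scores = {w: (c / len(toks)) * math.log(n / df[w]) for w, c in tf.items()}
        ranked = sorted(scores, key=scores.get, reverse=True)
        keywords.append(ranked[:top_k])
    return keywords

# Invented text labels for three videos returned by one query.
event_texts = [
    "earthquake rescue teams earthquake relief",
    "football match final goal football",
    "earthquake aftershock damage report",
]
keywords = tfidf_keywords(event_texts, top_k=2)
```

Words shared by many labels (here "earthquake") are down-weighted by the idf factor, so the extracted keywords are the ones that discriminate one sub-topic from the others.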
Fig. 1 depicts the proposed multi-video topic detection process. Suppose there are M videos under the same event. First, the text features of the texts corresponding to the videos are extracted: T = {t_1, t_2, ..., t_M} denotes the text set of the M videos, with t_i the text feature of the i-th video. Then the visual features of the video frames are extracted, and the similarity between videos is computed from them; the number of near-duplicate frames between any two videos is detected by a similarity-detection algorithm such as MinHash. V = {v_1, v_2, ..., v_M}, where v_i is the vector formed by the numbers of near-duplicate keyframes between the i-th video and the M videos, v_i(j) is the number of near-duplicate frames between the i-th and j-th videos, and v_i(i) = 0.
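The MinHash step estimates how much two frame sets overlap. The sketch below is an illustrative stand-in, not the patent's implementation: it assumes each video has already been reduced to a set of hashable keyframe fingerprints (the fingerprint names are hypothetical), and it estimates Jaccard similarity from signature agreement.

```python
import random

def minhash_signature(items, num_hashes=64, seed=0):
    """MinHash signature of a set: for each of num_hashes random affine hash
    functions h(x) = (a*x + b) mod p, keep the minimum value over the set."""
    rng = random.Random(seed)
    p = (1 << 61) - 1  # a large Mersenne prime
    params = [(rng.randrange(1, p), rng.randrange(p)) for _ in range(num_hashes)]
    return [min((a * (hash(x) & 0xFFFFFFFF) + b) % p for x in items)
            for a, b in params]

def estimated_jaccard(sig_a, sig_b):
    """The fraction of agreeing signature positions estimates Jaccard similarity."""
    return sum(x == y for x, y in zip(sig_a, sig_b)) / len(sig_a)

frames_i = {"kf1", "kf2", "kf3", "kf4"}  # hypothetical keyframe fingerprints
frames_j = {"kf2", "kf3", "kf4", "kf5"}  # true Jaccard similarity is 3/5
sim = estimated_jaccard(minhash_signature(frames_i), minhash_signature(frames_j))
```

The estimate converges to the true Jaccard similarity as `num_hashes` grows; comparing fixed-length signatures is far cheaper than comparing all frame pairs across M videos.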
Finally, the multi-graph models G_1 = (T, E_1) and G_2 = (V, E_2) are built from the text information and visual information of the videos, where T and V are the vertex sets of the two graphs and E_1 and E_2 are the edge sets representing the relation between any two videos, i.e. formulas (1) and (2). The graphs are then fused with the weighted-average fusion technique, i.e. formula (3). Sub-topic detection is realized by the graph-cut method, and the keywords of each sub-topic are extracted with tf-idf, realizing content-based video topic classification and description for network queries.
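The graph-cut step over the fused affinity matrix can be approximated by spectral clustering. The sketch below is one common stand-in, not necessarily the cut algorithm the patent intends: it thresholds the Fiedler vector of the symmetric normalized Laplacian for a 2-way cut (a k-way cut would use k eigenvectors plus k-means), and the affinity values are invented for the example.

```python
import numpy as np

def spectral_bisection(W):
    """Approximate a 2-way normalized graph cut: build the symmetric normalized
    Laplacian L = I - D^{-1/2} W D^{-1/2}, take the eigenvector of its
    second-smallest eigenvalue (the Fiedler vector) and split by sign."""
    W = np.asarray(W, float)
    d = W.sum(axis=1)
    safe_d = np.where(d > 0, d, 1.0)            # guard isolated vertices
    d_inv_sqrt = np.where(d > 0, 1.0 / np.sqrt(safe_d), 0.0)
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)                 # eigh: ascending eigenvalues
    fiedler = vecs[:, 1]
    return (fiedler > 0).astype(int)            # sign of entry = cluster id

# Fused affinities for five videos: two tight groups with weak cross links.
W = np.array([[0.0, 0.9, 0.8, 0.0, 0.1],
              [0.9, 0.0, 0.7, 0.1, 0.0],
              [0.8, 0.7, 0.0, 0.0, 0.0],
              [0.0, 0.1, 0.0, 0.0, 0.9],
              [0.1, 0.0, 0.0, 0.9, 0.0]])
labels = spectral_bisection(W)  # videos {0,1,2} vs {3,4} form the two events
```

Because the Fiedler vector is orthogonal to the all-positive leading eigenvector, it must change sign, so the split always produces two non-empty clusters; which cluster is labelled 0 or 1 is arbitrary.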
Claims (2)
1. A video topic classification and description method based on multi-graph fusion for network queries, characterized in that the steps are: 1) combining the text information and visual information of the videos, building a multi-graph model, and classifying events with a graph-cut method; 2) extracting the keywords of the video events with term frequency/inverse document frequency (tf-idf) or the text deep-representation model word2vec, and refining the keywords with prior information about the topic gathered from websites such as Wikipedia, so as to produce a textual description of the events.
2. The video topic classification and description method based on multi-graph fusion for network queries as claimed in claim 1, characterized in that, in one example, the concrete steps are:
a topic query is given first; related content is then searched from related websites such as Wikipedia, and prior information relevant to the topic is obtained;
given M videos under the same event, let T = {t_1, t_2, ..., t_M} denote the set of text labels of the videos, with t_i the i-th text feature in T, and let V = {v_1, v_2, ..., v_M}, where v_i is the visual-similarity vector between the i-th video and the M videos, v_i(j) is the number of near-duplicate frames between the i-th and j-th videos, and v_i(i) = 0; two graphs G_1 = (T, E_1) and G_2 = (V, E_2) are built, where T and V are the vertex sets of the two graphs and E_1 and E_2 are the edge sets, representing the relation between any two videos from the text and visual information respectively; in the edge-weight formulas, s_ij is the average number of shots of videos i and j, and v_i(j) is the number of near-duplicate frames between them; the two graphs are fused by a linear fusion technique, in which α is a positive number in (0, 1) that balances the two terms;
video events are then classified by the graph-cut method; finally, the keywords of each sub-topic are extracted from the text features of the videos, and the keyword set of each sub-topic is modified and expanded according to the information about the event on Wikipedia.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611035152.0A CN106529492A (en) | 2016-11-17 | 2016-11-17 | Video topic classification and description method based on multi-image fusion in view of network query |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106529492A true CN106529492A (en) | 2017-03-22 |
Family
ID=58356169
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611035152.0A Pending CN106529492A (en) | 2016-11-17 | 2016-11-17 | Video topic classification and description method based on multi-image fusion in view of network query |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106529492A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102332031A (en) * | 2011-10-18 | 2012-01-25 | 中国科学院自动化研究所 | Method for clustering retrieval results based on video collection hierarchical theme structure |
CN103778443A (en) * | 2014-02-20 | 2014-05-07 | 公安部第三研究所 | Method for achieving scene analysis description based on theme model method and field rule library |
CN104199933A (en) * | 2014-09-04 | 2014-12-10 | 华中科技大学 | Multi-modal information fusion football video event detection and semantic annotation method |
Non-Patent Citations (1)
Title |
---|
DONG-QING ZHANG et al.: "SEMANTIC VIDEO CLUSTERING ACROSS SOURCES USING BIPARTITE SPECTRAL CLUSTERING", 《IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME)》 *
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108932252A (en) * | 2017-05-25 | 2018-12-04 | 合网络技术(北京)有限公司 | Video aggregation method and device |
CN109190471A (en) * | 2018-07-27 | 2019-01-11 | 天津大学 | The attention model method of video monitoring pedestrian search based on natural language description |
CN109190471B (en) * | 2018-07-27 | 2021-07-13 | 天津大学 | Attention model method for video monitoring pedestrian search based on natural language description |
CN109688428A (en) * | 2018-12-13 | 2019-04-26 | 连尚(新昌)网络科技有限公司 | Video comments generation method and device |
CN109688428B (en) * | 2018-12-13 | 2022-01-21 | 连尚(新昌)网络科技有限公司 | Video comment generation method and device |
CN109933709A (en) * | 2019-01-31 | 2019-06-25 | 平安科技(深圳)有限公司 | Public sentiment tracking, device and the computer equipment of videotext data splitting |
CN109933709B (en) * | 2019-01-31 | 2023-09-26 | 平安科技(深圳)有限公司 | Public opinion tracking method and device for video text combined data and computer equipment |
CN111259851A (en) * | 2020-01-23 | 2020-06-09 | 清华大学 | Multi-mode event detection method and device |
CN114201622A (en) * | 2021-12-13 | 2022-03-18 | 北京百度网讯科技有限公司 | Method and device for acquiring event information, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20170322 |