CN106529492A - Video topic classification and description method based on multi-image fusion in view of network query - Google Patents

Video topic classification and description method based on multi-image fusion in view of network query Download PDF

Info

Publication number
CN106529492A
CN106529492A CN201611035152.0A
Authority
CN
China
Prior art keywords
video
text
event
classification
many
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201611035152.0A
Other languages
Chinese (zh)
Inventor
冀中
马亚茹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201611035152.0A priority Critical patent/CN106529492A/en
Publication of CN106529492A publication Critical patent/CN106529492A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2323 Non-hierarchical techniques based on graph theory, e.g. minimum spanning trees [MST] or graph cuts
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/44 Event detection

Abstract

The invention belongs to the technical field of video processing. Based on the characteristics of multi-video data, it detects video events, generates textual descriptions of the events, and thereby achieves content-based video topic classification and description oriented toward network queries. The technical scheme adopted by the invention, a video topic classification and description method based on multi-graph fusion for network queries, comprises the steps of: 1) combining the text information and the visual information of the videos, building a multi-graph model, and classifying events with a graph-cut method; and 2) extracting the keywords of each video event with tf-idf or a word-vector technique, refining the keywords with prior information about the topic from websites such as Wikipedia, and producing a textual description of each event. The method of the invention mainly applies to video classification scenarios.

Description

Multi-graph-fusion-based video topic classification and description method for network queries
Technical field
The invention belongs to the technical field of video processing. In the multimedia field there is a large amount of video data from which users cannot easily obtain the information they need. The invention provides a topic classification method for the multiple videos returned by a single query and, on this basis, extracts the keywords corresponding to each sub-topic under an event to describe it, thereby achieving topic classification and description of videos oriented toward network queries.
Background art
With the rapid development of information technology, video data have multiplied and become one of the main channels through which people obtain information. However, because of the sharp increase in the number of videos, much of the data is redundant or repetitive. Faced with a huge number of web videos, users find it extremely difficult to locate correct information. When searching for the topic of a related event, most users are interested in the main events of the topic and their development, but tracking the progress of events from a large set of video search results is very hard. A technique is therefore urgently needed that can integrate and analyze the massive video data under the same topic, meeting people's demand to browse the main information of the videos quickly and accurately and improving their ability to obtain information.
Usually, a news topic consists of a series of related events that occur at specific times and places and share a common focus, and each event can be described by a few discriminative, representative words. Over the past few decades, to improve the efficiency of video data management and allow users to obtain the information they want quickly and accurately, researchers have proposed some methods for classifying and describing web videos, but the technology is still at an early stage, mainly for the following reasons: 1) Because visual features suffer from the semantic gap, it is difficult to classify events from the visual modality alone, so the text information of the videos must be combined to classify video events. However, the text uploaded by users is limited and often noisy, ambiguous, incomplete, or even misleading, so classifying and describing events from text alone incurs a certain error. 2) In addition, tag information describes a whole video rather than a specific scene or shot, and longer videos tend to cover multiple topics, which brings further difficulty to video classification.
In recent years, with the development of multimedia technology, researchers have proposed some countermeasures for the classification and description of multi-video topics. Among them, exploring the event structure of web videos is a classical approach. That method first analyzes the text features of the videos with a co-occurrence model to explore the text patterns of the events; it then classifies events by transitive closure and describes them from the textual perspective; finally it detects the main events of the videos with near-duplicate frame detection, fusing events with similar visual and textual properties to explore and describe them. Although this method improves event exploration to some extent, it explores events from the visual and textual perspectives separately: it neither detects events using the structure of multiple modalities simultaneously nor exploits the complementary advantage of the multi-modal information of videos during detection.
The present invention proposes a multi-graph model: the graphs are fused, and video classification is achieved with a graph-cut method. Tf-idf is then used to extract the keywords of each event class and describe the events. This scheme takes full advantage of the complementarity of the multi-modal information of videos and better achieves multi-graph-fusion-based video topic classification and description for network queries.
Summary of the invention
To overcome the deficiencies of the prior art, the invention aims to propose a content-based video topic classification and description method for network queries. According to the characteristics of multi-video data, it detects video events and produces textual descriptions of the events, achieving content-based video topic classification and description oriented toward network queries. The technical solution adopted by the invention, a multi-graph-fusion-based video topic classification and description method for network queries, comprises the steps of: 1) combining the text information and the visual information of the videos, building a multi-graph model, and classifying events with a graph-cut method; 2) extracting the keywords of each video event with term frequency-inverse document frequency (tf-idf) or a text embedding model such as word2vec, refining the keywords with prior information about the topic from websites such as Wikipedia, and producing a textual description of each event.
In one example, the concrete steps are:
First, a topic query is given; related content is then searched from relevant websites such as Wikipedia, and prior information about the topic is obtained:
Given M videos under the same event, let T = {t_1, t_2, ..., t_M} denote the set of text labels of the videos, where t_i is the i-th text feature in T. Let V = {v_1, v_2, ..., v_M}, where v_i is the visual-similarity vector between the i-th video and the M videos, v_i(j) is the number of near-duplicate frames between the i-th video and the j-th video, and v_i(i) = 0. Two graphs G_1 = (T, E_1) and G_2 = (V, E_2) are built, where T and V are the vertex sets of the two graphs and E_1, E_2 are the edge sets, representing the relation W^1_ij, W^2_ij between any two videos from the text and visual modalities respectively. The weights are computed as follows:
W^1_ij = exp(-||t_i - t_j||^2 / (2σ^2))   (1)
W^2_ij = v_i(j) / s_ij   (2)
where s_ij is the average number of shots of videos i and j, and v_i(j) is the number of near-duplicate frames between the i-th and j-th videos. The two graphs are fused with a linear fusion technique, expressed by the following formula:
W_ij = α W^1_ij + (1 - α) W^2_ij   (3)
where α is a positive number in (0, 1) that balances the two terms;
Video events are then classified with a graph-cut method. Finally, the keywords under each sub-topic are extracted from the text features of the video files, and the keyword set of each sub-topic is revised and expanded according to the information about the event on Wikipedia.
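The weight computation and linear fusion above can be sketched in a few lines of NumPy. The function name, the toy inputs, and the default values of σ and α are illustrative assumptions, not part of the patent:

```python
import numpy as np

def fuse_graphs(text_feats, near_dup_counts, shot_counts, sigma=1.0, alpha=0.5):
    """Build the text graph W1, the visual graph W2, and their linear fusion W.

    text_feats:      (M, d) array, one text feature vector t_i per video.
    near_dup_counts: (M, M) array, v_i(j) = near-duplicate frames between videos i, j.
    shot_counts:     (M, M) array, s_ij = average shot count of videos i and j (> 0).
    """
    # W1: Gaussian similarity of text features, formula (1).
    diff = text_feats[:, None, :] - text_feats[None, :, :]
    W1 = np.exp(-np.sum(diff ** 2, axis=2) / (2 * sigma ** 2))
    # W2: near-duplicate frame count normalized by average shot count, formula (2).
    W2 = near_dup_counts / shot_counts
    # Linear fusion, formula (3).
    W = alpha * W1 + (1 - alpha) * W2
    np.fill_diagonal(W, 0.0)  # v_i(i) = 0: no self-edges
    return W
```

With α = 0.5 the two modalities contribute equally; the patent only constrains α to lie in (0, 1) as a balancing parameter.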
Features and benefits of the invention:
The invention addresses the shortcomings of existing multi-video event classification and description methods by designing a content-based video topic classification and description method for network queries that suits the structural characteristics of multi-video data and fully exploits the specific information of the data. Its advantages are mainly:
(1) Novelty: applying a multi-graph model to video event classification makes full use of the multi-modal information of videos and better achieves event detection over multi-video sets.
(2) Multi-modality: during video sub-topic detection, the text information of the videos is used to compute inter-video similarity on the one hand; on the other hand, the visual information is used to detect near-duplicate frames between videos and compute inter-video similarity from them. The two aspects jointly realize sub-topic detection.
(3) Effectiveness: experiments confirm that the multi-graph-model-based classification and description method designed in the invention clearly outperforms the methods typically applied to video topic classification and description, and is therefore better suited to video topic classification and description for network queries.
(4) Practicality: the method is simple and feasible and can be used in the field of multimedia signal processing.
Description of the drawings:
Fig. 1 is the flow chart of the multi-graph-model-based graph-cut video topic detection and keyword extraction process of the invention.
Specific embodiment
The object of the invention is to provide a content-based video topic classification and description method for network queries. According to the characteristics of multi-video data, first, a multi-graph model is built from the text information and visual information of the videos, and the videos are clustered with a graph-cut method, which realizes video event detection. Then, keywords of each event class are extracted with tf-idf, where TF is term frequency (Term Frequency) and IDF is inverse document frequency (Inverse Document Frequency), or with a similar technique such as the text embedding model word2vec. This forms a textual description of each event, achieving content-based video topic classification and description for network queries.
The method provided by the invention is broadly divided into two processes: 1) combining the text information and the visual information of the videos, building a multi-graph model, and classifying events with a graph-cut method; 2) extracting the keywords of video events with tf-idf, word2vec, or a similar technique, refining the keywords with prior information about the topic from websites such as Wikipedia, and producing a textual description of each event. The general procedure is described below.
First, a topic query is given; related content is then searched from relevant websites such as Wikipedia, and prior information about the topic is obtained.
Given M videos under the same event, let T = {t_1, t_2, ..., t_M} denote the set of text labels of the videos, where t_i is the i-th text feature in T; the text labels correspond one-to-one with the videos. Let V = {v_1, v_2, ..., v_M}, where v_i is the visual-similarity vector between the i-th video and the M videos, v_i(j) is the number of near-duplicate frames between the i-th video and the j-th video, and v_i(i) = 0. Two graphs G_1 = (T, E_1) and G_2 = (V, E_2) are built, where T and V are the vertex sets of the two graphs and E_1, E_2 are the edge sets, representing the relation W^1_ij, W^2_ij between any two videos from the text and visual modalities respectively. The weights are computed as follows:
W^1_ij = exp(-||t_i - t_j||^2 / (2σ^2))   (1)
W^2_ij = v_i(j) / s_ij   (2)
where s_ij is the average number of shots of videos i and j, and v_i(j) is the number of near-duplicate frames between the i-th and j-th videos. The two graphs are fused with a linear fusion technique, expressed by the following formula:
W_ij = α W^1_ij + (1 - α) W^2_ij   (3)
where α is a positive number in (0, 1) that balances the two terms.
Video events are then classified with a graph-cut method. Finally, the keywords under each sub-topic are extracted from the text features of the video files, and the keyword set of each sub-topic is revised and expanded according to the information about the event on Wikipedia.
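The text does not pin the graph-cut step to a specific algorithm. One common way to cut a similarity graph into two event clusters is a normalized spectral bipartition, thresholding the Fiedler vector of the normalized Laplacian; the sketch below is an assumption about how the step might be realized, not the patent's prescribed algorithm:

```python
import numpy as np

def cut_graph(W):
    """Bipartition the fused similarity graph W with a normalized spectral cut.

    W is the symmetric fused weight matrix from formula (3). Returns a 0/1
    label per video: the sign of the Fiedler vector (eigenvector of the
    second-smallest eigenvalue) of L = I - D^{-1/2} W D^{-1/2}.
    """
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)       # eigenvalues in ascending order
    fiedler = vecs[:, 1]
    return (fiedler > 0).astype(int)  # cut at zero -> two event clusters
```

For more than two events the cut can be applied recursively, or the first k eigenvectors clustered with k-means, as in standard spectral clustering.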
Fig. 1 describes the proposed multi-video topic detection process. Assume there are M videos under the same event.
First, the text features of the text accompanying each video are extracted; T = {t_1, t_2, ..., t_M} denotes the text set of the M videos, with t_i the text feature of the i-th video. Then, the visual features of the video frames are extracted and the similarity between videos is computed from them: the number of near-duplicate frames between any two videos is detected with a similarity detection algorithm such as MinHash. V = {v_1, v_2, ..., v_M}, where v_i is the vector of near-duplicate keyframe counts between the i-th video and the M videos, v_i(j) is the near-duplicate frame count between the i-th and j-th videos, and v_i(i) = 0.
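The near-duplicate counts v_i(j) are attributed to a "minhash"-style similarity detection. A minimal MinHash sketch over sets of already-quantized frame descriptors might look like the following; the hash family and the string-token frame representation are illustrative assumptions:

```python
import random

def minhash_signature(items, num_hashes=64, seed=0):
    """MinHash signature of a set of hashable items (e.g. quantized frame features)."""
    rng = random.Random(seed)
    p = (1 << 61) - 1  # a large Mersenne prime
    # One (a, b) pair per hash function: h(x) = (a * hash(x) + b) mod p
    params = [(rng.randrange(1, p), rng.randrange(p)) for _ in range(num_hashes)]
    return [min((a * hash(x) + b) % p for x in items) for a, b in params]

def estimated_jaccard(sig1, sig2):
    """Fraction of matching signature slots estimates the Jaccard similarity."""
    return sum(s1 == s2 for s1, s2 in zip(sig1, sig2)) / len(sig1)
```

Two videos whose frame-descriptor sets overlap heavily get near-identical signatures, so their estimated Jaccard similarity, and hence v_i(j), is high.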
Finally, the two graphs G_1 = (T, E_1) and G_2 = (V, E_2) are built from the text and visual information of the videos, where T and V are the vertex sets and E_1, E_2 the edge sets representing the relation W^1_ij, W^2_ij between any two videos, i.e. formulas (1) and (2). The graphs are then fused with the weighted-average coefficients, i.e. formula (3), and sub-topic detection is performed with a graph-cut method. The keywords of each sub-topic are extracted with tf-idf, achieving content-based video topic classification and description for network queries.
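The closing tf-idf keyword step can be sketched with the standard definition, pooling the text labels of each detected event into one document per event; the token lists and the top_k cutoff below are illustrative:

```python
import math
from collections import Counter

def tfidf_keywords(event_docs, top_k=3):
    """Top tf-idf keywords for each event.

    event_docs: list of token lists, one pooled document per event cluster.
    Returns a list of up to top_k keywords per event, highest score first.
    """
    n = len(event_docs)
    df = Counter()                      # document frequency of each term
    for doc in event_docs:
        df.update(set(doc))
    keywords = []
    for doc in event_docs:
        tf = Counter(doc)
        # tf-idf: term frequency within the event times log inverse document frequency
        scores = {w: (tf[w] / len(doc)) * math.log(n / df[w]) for w in tf}
        keywords.append([w for w, _ in sorted(scores.items(),
                                              key=lambda kv: -kv[1])[:top_k]])
    return keywords
```

The resulting keyword lists are what the method then revises and expands with the Wikipedia prior information about the topic.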

Claims (2)

1. A multi-graph-fusion-based video topic classification and description method for network queries, characterized in that the steps are: 1) combining the text information and visual information of the videos, building a multi-graph model, and classifying events with a graph-cut method; 2) extracting the keywords of video events with term frequency-inverse document frequency (tf-idf) or the text embedding model word2vec, and refining the keywords with prior information about the topic from websites such as Wikipedia, producing a textual description of each event.
2. The multi-graph-fusion-based video topic classification and description method for network queries of claim 1, characterized in that, in one example, the concrete steps are:
first, a topic query is given; related content is then searched from relevant websites such as Wikipedia, and prior information about the topic is obtained:
given M videos under the same event, let T = {t_1, t_2, ..., t_M} denote the set of text labels of the videos, where t_i is the i-th text feature in T; let V = {v_1, v_2, ..., v_M}, where v_i is the visual-similarity vector between the i-th video and the M videos, v_i(j) is the number of near-duplicate frames between the i-th video and the j-th video, and v_i(i) = 0; two graphs G_1 = (T, E_1) and G_2 = (V, E_2) are built, where T and V are the vertex sets of the two graphs and E_1, E_2 are the edge sets, representing the relation W^1_ij, W^2_ij between any two videos from the text and visual modalities respectively, with weights computed as follows:
W^1_ij = exp(-||t_i - t_j||^2 / (2σ^2))   (1)
W^2_ij = v_i(j) / s_ij   (2)
where s_ij is the average number of shots of videos i and j, and v_i(j) is the number of near-duplicate frames between the i-th and j-th videos; the two graphs are fused with a linear fusion technique, expressed by the following formula:
W_ij = α W^1_ij + (1 - α) W^2_ij   (3)
where α is a positive number in (0, 1) that balances the two terms;
then video events are classified with a graph-cut method; finally, the keywords under each sub-topic are extracted from the text features of the video files, and the keyword set of each sub-topic is revised and expanded according to the information about the event on Wikipedia.
CN201611035152.0A 2016-11-17 2016-11-17 Video topic classification and description method based on multi-image fusion in view of network query Pending CN106529492A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611035152.0A CN106529492A (en) 2016-11-17 2016-11-17 Video topic classification and description method based on multi-image fusion in view of network query

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611035152.0A CN106529492A (en) 2016-11-17 2016-11-17 Video topic classification and description method based on multi-image fusion in view of network query

Publications (1)

Publication Number Publication Date
CN106529492A true CN106529492A (en) 2017-03-22

Family

ID=58356169

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611035152.0A Pending CN106529492A (en) 2016-11-17 2016-11-17 Video topic classification and description method based on multi-image fusion in view of network query

Country Status (1)

Country Link
CN (1) CN106529492A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108932252A (en) * 2017-05-25 2018-12-04 合网络技术(北京)有限公司 Video aggregation method and device
CN109190471A (en) * 2018-07-27 2019-01-11 天津大学 The attention model method of video monitoring pedestrian search based on natural language description
CN109688428A (en) * 2018-12-13 2019-04-26 连尚(新昌)网络科技有限公司 Video comments generation method and device
CN109933709A (en) * 2019-01-31 2019-06-25 平安科技(深圳)有限公司 Public sentiment tracking, device and the computer equipment of videotext data splitting
CN111259851A (en) * 2020-01-23 2020-06-09 清华大学 Multi-mode event detection method and device
CN114201622A (en) * 2021-12-13 2022-03-18 北京百度网讯科技有限公司 Method and device for acquiring event information, electronic equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102332031A (en) * 2011-10-18 2012-01-25 中国科学院自动化研究所 Method for clustering retrieval results based on video collection hierarchical theme structure
CN103778443A (en) * 2014-02-20 2014-05-07 公安部第三研究所 Method for achieving scene analysis description based on theme model method and field rule library
CN104199933A (en) * 2014-09-04 2014-12-10 华中科技大学 Multi-modal information fusion football video event detection and semantic annotation method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102332031A (en) * 2011-10-18 2012-01-25 中国科学院自动化研究所 Method for clustering retrieval results based on video collection hierarchical theme structure
CN103778443A (en) * 2014-02-20 2014-05-07 公安部第三研究所 Method for achieving scene analysis description based on theme model method and field rule library
CN104199933A (en) * 2014-09-04 2014-12-10 华中科技大学 Multi-modal information fusion football video event detection and semantic annotation method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
DONG-QING ZHANG 等: "SEMANTIC VIDEO CLUSTERING ACROSS SOURCES USING BIPARTITE SPECTRAL CLUSTERING", 《IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME)》 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108932252A (en) * 2017-05-25 2018-12-04 合网络技术(北京)有限公司 Video aggregation method and device
CN109190471A (en) * 2018-07-27 2019-01-11 天津大学 The attention model method of video monitoring pedestrian search based on natural language description
CN109190471B (en) * 2018-07-27 2021-07-13 天津大学 Attention model method for video monitoring pedestrian search based on natural language description
CN109688428A (en) * 2018-12-13 2019-04-26 连尚(新昌)网络科技有限公司 Video comments generation method and device
CN109688428B (en) * 2018-12-13 2022-01-21 连尚(新昌)网络科技有限公司 Video comment generation method and device
CN109933709A (en) * 2019-01-31 2019-06-25 平安科技(深圳)有限公司 Public sentiment tracking, device and the computer equipment of videotext data splitting
CN109933709B (en) * 2019-01-31 2023-09-26 平安科技(深圳)有限公司 Public opinion tracking method and device for video text combined data and computer equipment
CN111259851A (en) * 2020-01-23 2020-06-09 清华大学 Multi-mode event detection method and device
CN114201622A (en) * 2021-12-13 2022-03-18 北京百度网讯科技有限公司 Method and device for acquiring event information, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
Xue et al. Detecting fake news by exploring the consistency of multimodal data
CN106529492A (en) Video topic classification and description method based on multi-image fusion in view of network query
Papadopoulou et al. A corpus of debunked and verified user-generated videos
Thakkar et al. Approaches for sentiment analysis on twitter: A state-of-art study
Wang et al. Event driven web video summarization by tag localization and key-shot identification
CN111079444A (en) Network rumor detection method based on multi-modal relationship
CN104268160A (en) Evaluation object extraction method based on domain dictionary and semantic roles
CN109272440B (en) Thumbnail generation method and system combining text and image content
Anoop et al. Leveraging heterogeneous data for fake news detection
CN105912684A (en) Cross-media retrieval method based on visual features and semantic features
Shang et al. A duo-generative approach to explainable multimodal covid-19 misinformation detection
Lv et al. Storyrolenet: Social network construction of role relationship in video
CN104376108A (en) Unstructured natural language information extraction method based on 6W semantic annotation
Maynard et al. Multimodal sentiment analysis of social media
Abdali Multi-modal misinformation detection: Approaches, challenges and opportunities
Sreeja et al. A unified model for egocentric video summarization: an instance-based approach
Campbell et al. Content+ context networks for user classification in twitter
Nadeem et al. SSM: Stylometric and semantic similarity oriented multimodal fake news detection
Unal et al. Visual persuasion in covid-19 social media content: A multi-modal characterization
Maynard et al. Entity-based opinion mining from text and multimedia
Kuppusamy et al. CaSePer: An efficient model for personalized web page change detection based on segmentation
Zhang et al. Towards better graph representation: Two-branch collaborative graph neural networks for multimodal marketing intention detection
Tian et al. Research on image classification based on a combination of text and visual features
Brenner et al. Multimodal detection, retrieval and classification of social events in web photo collections
Arya et al. Predicting behavioural patterns in discussion forums using deep learning on hypergraphs

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170322

RJ01 Rejection of invention patent application after publication