CN102222101A - Method for video semantic mining - Google Patents

Method for video semantic mining

Info

Publication number
CN102222101A
CN102222101A (application CN201110168952A)
Authority
CN
China
Prior art keywords
video
recognition
semantic
words
vertices
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 201110168952
Other languages
Chinese (zh)
Inventor
张师林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
North China University of Technology
Original Assignee
North China University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by North China University of Technology filed Critical North China University of Technology
Priority to CN 201110168952 priority Critical patent/CN102222101A/en
Publication of CN102222101A publication Critical patent/CN102222101A/en
Pending legal-status Critical Current


Abstract

The invention relates to a video semantic mining method. The method first performs Chinese continuous speech recognition, video object recognition, and video text recognition on the video to be processed; it then applies Chinese word segmentation and part-of-speech tagging to the recognition results, retaining nouns and verbs as the vertices of a graph model, with the weight of each edge set to the Chinese semantic distance between the words its two vertices represent; finally, the semantic information of the video is mined with a dense-subgraph discovery algorithm. The invention is characterized in that it achieves semantic mining of video by fusing the three recognition results of Chinese continuous speech recognition, video object recognition, and video text recognition; it expresses the video as a graph model whose vertices are the words in the video and whose edge weights are the semantic distances between vertex pairs; and it further transforms the video semantic mining algorithm into dense-subgraph discovery on that graph model. The invention addresses the high error rate of single recognition results and the lack of effective fusion of multiple recognition results in Chinese continuous speech recognition, video object recognition, and video text recognition, and it solves the problems of structured video representation and of realizing a video semantic mining algorithm. The invention can be used for automatic annotation, classification, and semantic mining of videos in bulk.

Description

A Video Semantic Mining Method

Technical Field

The invention relates to the fields of digital media and machine learning. It performs semantic analysis on user-supplied video and annotates the video semantically by fusing speech, text, and image information.

Background Art

With the development of online video-sharing sites and video-processing technology, a large amount of content in video form has emerged. Because video is unstructured data lacking the necessary descriptive information, it cannot be processed as easily as text. Manual semantic annotation of video is time-consuming and labor-intensive and cannot meet the demands of batch video processing. Content-based video processing is a current research hotspot, but existing techniques label video content with a high error rate and do not comprehensively consider the effective fusion of image, text, and speech content. Image object recognition has gradually matured; in the visual object classification challenge it has reached a practical level. Continuous speech recognition allows speech signals to be transcribed into text. Video text recognition can extract the text embedded in a video and treat it as ordinary text. To combine these three recognition technologies, video semantic analysis needs an effective fusion method. "HowNet" is a Chinese semantic lexicon; using the conceptual hierarchy in HowNet, the semantic distance between two words can be computed, and with this distance the three kinds of recognition results can be measured semantically. Because the image, text, and speech modalities of a video are highly correlated, the different modalities can be fused effectively and recognition errors removed. A graph model, consisting of vertices and edges, can express the relationships among the concepts of an entire video, and a dense-subgraph discovery algorithm can find semantic clusters in the video graph model, achieving the purpose of video semantic annotation.

Summary of the Invention

Existing content-based video processing techniques do not fully exploit the high-level semantic information carried by images, speech, and text, and cannot classify and mine videos at the level of high-level semantics. To remedy these shortcomings of the prior art, the invention proposes a method for semantic mining of video.

To achieve the stated purpose, the invention provides a method for video representation and mining whose technical solution comprises the following steps:

Step S1: perform Chinese continuous speech recognition, video object recognition, and video text recognition on the video to be processed;

Step S2: express each of the three recognition results of step S1 as a word vector; together the three vectors form a tensor representing the video;

Step S3: apply Chinese word segmentation and part-of-speech tagging to the three word vectors of step S2, retaining nouns and verbs;

Step S4: construct a graph model to represent the video, in which the vertices are the nouns and verbs obtained in S3 and the weight of each edge is the semantic distance between the Chinese words its two vertices represent;

Step S5: mine the semantics of the graph model constructed in step S4 with a dense-subgraph discovery algorithm.

Beneficial effects of the invention: automatic semantic annotation, automatic classification, and video similarity measurement can be achieved for video. For massive video collections, the technique avoids the tedious labor of manual annotation. The invention effectively fuses the results of Chinese continuous speech recognition, Chinese text recognition, and image object recognition; by expressing the video as a graph model it exposes the semantic-distance relationships among the semantic concepts in the video, the distances being computed with a HowNet-based semantic distance measure; finally, the dense-subgraph discovery algorithm realizes the annotation and mining of the semantic concepts in the video.

Description of the Drawings

Fig. 1 is the overall video-processing flowchart of the invention.

Fig. 2 is the flowchart of Chinese continuous speech recognition in the invention.

Fig. 3 is the flowchart of video text recognition in the invention.

Fig. 4 is the flowchart of image object recognition in the invention.

Fig. 5 is the hierarchy diagram of the semantic distance measure in the invention.

Fig. 6 is the dense-subgraph mining representation of a video in the invention.

Fig. 7 shows the video annotation results of the invention.

Detailed Description

The details of the technical solution of the invention are described below with reference to the accompanying drawings. It should be noted that the described embodiments are intended only to facilitate understanding of the invention and in no way limit it.

The invention proposes a video semantic mining method whose processing flow, as shown in Fig. 1, is divided into four layers. At the bottom is the video library layer, which stores video resources of various forms. Above it is the multimodal fusion layer, where the structural analysis of the video and the recognition and effective fusion of images, text, and speech are carried out. The next layer up is the video mining layer, which realizes the graph-model representation of the video and the video mining algorithm based on dense-subgraph discovery; video classification and mining can additionally be realized with a support vector machine model. The top layer is the transparent intelligent video service layer offered to users. On the far right is the semantic computation support layer based on HowNet. The concrete implementation steps of this flow are as follows:

1. Video preprocessing

Segment the video to be processed into shots, extract keyframes for each shot, and save these keyframes for subsequent image object recognition. Sample the audio signal of the video at 16 kHz with 16-bit resolution and save it in WAV format for subsequent speech recognition.
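The shot-segmentation step above can be illustrated with a minimal pure-Python sketch. It assumes frames have already been summarized as normalized color histograms and uses an illustrative cut threshold; the patent does not specify which boundary detector is used, so this is only one common realization of the idea.

```python
# Sketch of shot segmentation: declare a cut where the histogram
# difference between consecutive frames exceeds a threshold, then keep
# the middle frame of each shot as its keyframe. The histogram inputs
# and the 0.5 threshold are illustrative assumptions.

def hist_diff(h1, h2):
    """L1 distance between two normalized frame histograms."""
    return sum(abs(a - b) for a, b in zip(h1, h2))

def segment_shots(frames, threshold=0.5):
    """Return (start, end) frame-index ranges, one per shot."""
    shots, start = [], 0
    for i in range(1, len(frames)):
        if hist_diff(frames[i - 1], frames[i]) > threshold:
            shots.append((start, i - 1))
            start = i
    shots.append((start, len(frames) - 1))
    return shots

def keyframes(shots):
    """Middle frame index of each shot."""
    return [(s + e) // 2 for s, e in shots]

# two visually distinct "scenes" of three frames each
frames = [[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3
print(segment_shots(frames))   # [(0, 2), (3, 5)]
print(keyframes(segment_shots(frames)))   # [1, 4]
```

The keyframe indices would then select the images passed to the object-recognition step.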

2. Image object recognition

First download the visual object classification challenge image library (PASCAL VOC Challenge 2010 Database, http://pascallin.ecs.soton.ac.uk/challenges/VOC/voc2010/index.html) and extract local gradient features (HOG) from each image in the library. Cluster the features with the k-means algorithm; the number of clusters can be set to 1000, yielding 1000 visual words. Describe each image with these 1000 visual words, so that each image becomes a bag of words serving as an intermediate feature. Finally, train support vector machine (SVM) classification models for 20 visual categories on the bag-of-words features. The 20 categories are: person, bird, cat, cow, dog, horse, sheep, airplane, boat, bicycle, motorcycle, train, car, bus, bottle, chair, dining table, potted plant, sofa, and television. Use these classification models to perform object recognition on the video keyframe images and save the recognition results as a text, denoted Text_OBJECT. See Fig. 4 for the processing flow of this step.
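The quantization step at the heart of this pipeline, mapping local descriptors to visual words and forming a word histogram, can be sketched in a few lines. The toy 2-D descriptors and the 3-word codebook are illustrative assumptions (the patent uses 1000 words over HOG descriptors); the resulting histogram is what would be fed to the SVM.

```python
# Sketch of the bag-of-visual-words step: each local descriptor is
# assigned to its nearest codebook centre ("visual word"), and the
# image is described by the normalized word histogram.

def nearest_word(desc, codebook):
    """Index of the codebook centre closest to one descriptor."""
    def d2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: d2(desc, codebook[i]))

def bag_of_words(descriptors, codebook):
    """Normalized visual-word histogram for one image."""
    hist = [0.0] * len(codebook)
    for desc in descriptors:
        hist[nearest_word(desc, codebook)] += 1.0
    total = sum(hist) or 1.0
    return [h / total for h in hist]

codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]   # 3 "visual words"
descs = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.9], [0.1, 1.0]]
print(bag_of_words(descs, codebook))   # [0.25, 0.5, 0.25]
```

With k = 1000 words the histogram becomes a 1000-dimensional feature vector per image, one per keyframe.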

3. Chinese continuous speech recognition

First download the open-source Sphinx continuous speech recognition system together with its Chinese language model, Chinese acoustic model, and Chinese lexicon files (Sphinx, http://sphinx.sourceforge.net). Perform continuous speech recognition on the audio signal obtained in video preprocessing, transcribing it into text, denoted Text_ASR. See Fig. 2 for the processing flow of this step.

4. Video text recognition

For text localization in video images, candidate text regions are first obtained with a double-edge model of character strokes, and the candidate regions are then decomposed into precisely localized text blocks. The text extraction algorithm takes one frame every few video frames and performs image-based text localization to obtain text objects, then tracks the text objects forward and backward through the video frame sequence, and finally recognizes the text objects to obtain the extracted text. The recognition results are saved as a text, denoted Text_VOCR. See Fig. 3 for the processing flow of this step.

5. Apply hidden-Markov-model-based Chinese word segmentation to Text_OBJECT, Text_ASR, and Text_VOCR, remove meaningless "stop words," and perform Chinese part-of-speech tagging, keeping only the verbs and nouns for the next step of the analysis; the processed results are denoted Word_OBJECT, Word_ASR, and Word_VOCR respectively. The entire video can then be represented semantically as a tensor:

Ψ = Word_OBJECT ⊗ Word_ASR ⊗ Word_VOCR

where Ψ denotes the semantic tensor feature of the video and ⊗ denotes the tensor space formed by the three vectors Word_OBJECT, Word_ASR, and Word_VOCR.

6. For the nouns and verbs obtained in the previous step, compute the semantic similarity between every two words of the same part of speech. The computation uses a HowNet-based hierarchical distance measure, with similarity defined between 0 and 1; for example, the similarity between "table" and "chair" is 0.8, while the similarity between "scenery" and "ship" is 0.1. See Fig. 5 for the processing flow of this step.
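The hierarchy-based similarity of this step can be sketched without the real HowNet lexicon. The tiny hand-made parent table below stands in for the HowNet taxonomy, and the formula sim = a / (a + dist) with a = 1.6 follows a common HowNet-style measure; both are assumptions, since the patent does not give the exact formula, so the toy values differ from the 0.8 and 0.1 examples above.

```python
# Sketch of a hierarchical semantic similarity: distance is the path
# length between two words in a concept taxonomy, and similarity decays
# with that distance. PARENT is a toy stand-in for the HowNet lexicon.

PARENT = {                       # toy taxonomy: child -> parent
    "table": "furniture", "chair": "furniture",
    "furniture": "artifact", "ship": "vehicle",
    "vehicle": "artifact", "scenery": "entity",
    "artifact": "entity",
}

def ancestors(word):
    path, node = [word], word
    while node in PARENT:
        node = PARENT[node]
        path.append(node)
    return path                  # word, parent, ..., root

def path_distance(w1, w2):
    """Edges from w1 up to the lowest common ancestor, then down to w2."""
    a1, a2 = ancestors(w1), ancestors(w2)
    for i, node in enumerate(a1):
        if node in a2:
            return i + a2.index(node)
    return len(a1) + len(a2)     # no common ancestor

def similarity(w1, w2, a=1.6):
    return a / (a + path_distance(w1, w2))

print(similarity("table", "chair"))    # close in the taxonomy
print(similarity("scenery", "ship"))   # far apart, smaller value
```

Words that share a nearby ancestor get a high score, matching the intent of the table/chair versus scenery/ship example.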

7. Take the words in Word_OBJECT, Word_ASR, and Word_VOCR as the vertices V of a graph model, define the similarity between words established in the previous step as the weight w of the edge between two vertices, and so construct a graph model representing the video. This is a weighted undirected graph G = (V, E), where V is the vertex set, E the edge set, and |V| = n the number of vertices. Each edge e ∈ E carries a non-negative weight w(e), defined as the semantic similarity between the corresponding words determined in the previous step.

8. Since the images, text, and audio of a video express a common topic, they are semantically consistent. The semantic distance among the three vectors of the video tensor should therefore be minimized; words that do not conform to this minimization principle are attributed to recognition errors and should be removed. The principle can be expressed as

min Σ_{i,j} f(w_i^m, w_j^n)

where w_i^m and w_j^n denote two words; m, n ∈ {Word_OBJECT, Word_ASR, Word_VOCR} denote vectors of the video tensor; 0 ≤ i ≤ |m| and 0 ≤ j ≤ |n| are word indices, bounded above by the dimension of the corresponding vector; and f(·) is the similarity distance between the two words, defined between 0 and 1.
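One simple way to realize this error-removal principle is to keep a word only if some word in another modality is semantically close to it. The sketch below takes that reading; the similarity function and the 0.5 threshold are illustrative assumptions, not the patent's exact criterion.

```python
# Sketch of the fusion principle of step 8: a word from one modality
# survives only if another modality contains a semantically close word;
# isolated words are treated as recognition errors and dropped.

def fuse(modalities, similarity, threshold=0.5):
    """modalities: dict name -> word list; returns surviving words."""
    kept = []
    for name, words in modalities.items():
        others = [w for n, ws in modalities.items() if n != name
                  for w in ws]
        for w in words:
            if any(similarity(w, o) >= threshold for o in others):
                kept.append(w)
    return kept

sim = lambda a, b: 1.0 if a[0] == b[0] else 0.0      # toy measure
modalities = {
    "Word_OBJECT": ["car"],
    "Word_ASR":    ["cart", "zzz"],   # "zzz" plays a recognition error
    "Word_VOCR":   ["cab"],
}
print(fuse(modalities, sim))   # ['car', 'cart', 'cab'] -- 'zzz' removed
```

The surviving words are exactly those that participate in the cross-modal agreement the minimization principle formalizes.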

9. The above optimization problem is transformed into a dense-subgraph discovery problem on the graph model: in G = (V, E), find a subgraph H = (X, F), where H is the subgraph, X its vertex set, and F its edge set. The dense-subgraph discovery objective can be expressed as

max_{H=(X,F)} (1/|X|) Σ_{l∈F} w(l)

that is, maximize the average edge weight of the subgraph, where |X| is the number of subgraph vertices, F ⊆ E is the set of subgraph edges, and w(l) is the weight of edge l, computed in the same way as f(·) in the previous step. The video semantic feature tensor space contains three vectors, Word_OBJECT, Word_ASR, and Word_VOCR; each vector forms a community of the graph model, so the whole video is expressed as a graph model composed of three communities. The dense-subgraph discovery algorithm runs on this graph model, as shown in Fig. 6.
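The objective above can be attacked with the standard greedy peeling heuristic for densest subgraph: repeatedly drop the vertex with the smallest total edge weight and remember the vertex set with the best average edge weight. This choice is an assumption consistent with claim 5's "remove isolated vertices" description, not the patent's exact code.

```python
# Greedy peeling sketch of step 9: peel the weakest vertex at each
# round and track the vertex set maximizing sum(weights)/|X|.

def densest_subgraph(vertices, edges):
    """edges: {frozenset({u, v}): weight}; returns best vertex set."""
    def density(vs):
        w = sum(wt for e, wt in edges.items() if e <= vs)
        return w / len(vs) if vs else 0.0

    vs = set(vertices)
    best, best_d = set(vs), density(vs)
    while len(vs) > 1:
        deg = {v: sum(wt for e, wt in edges.items()
                      if v in e and e <= vs)
               for v in vs}
        vs.remove(min(deg, key=deg.get))       # peel weakest vertex
        if density(vs) > best_d:
            best, best_d = set(vs), density(vs)
    return best

edges = {frozenset({"dog", "cat"}): 0.9, frozenset({"cat", "pet"}): 0.8,
         frozenset({"dog", "pet"}): 0.7, frozenset({"dog", "ship"}): 0.1}
print(densest_subgraph(["dog", "cat", "pet", "ship"], edges))
# a tightly related cluster survives; the outlier "ship" is peeled off
```

The surviving vertex set plays the role of the dense subgraph whose words become the video's annotation in the next step.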

10. For the dense subgraph found in the previous step, record the words represented by its vertices as the effective annotation of the video; this annotation embodies the semantic information of the video. The video annotation results are shown in Fig. 7.

Claims (5)

1. A video semantic mining method, characterized in that the method comprises the following steps:
Step S1: perform Chinese continuous speech recognition, video object recognition, and video text recognition on the video to be processed;
Step S2: express each of the three recognition results of step S1 as a word vector; together the three vectors form a tensor representing the video;
Step S3: apply Chinese word segmentation and part-of-speech tagging to the three word vectors of step S2, retaining nouns and verbs;
Step S4: construct a graph model to represent the video, in which the vertices are the nouns and verbs obtained in S3 and the weight of each edge is the semantic distance between the Chinese words its two vertices represent;
Step S5: mine the semantics of the graph model constructed in step S4 with a dense-subgraph discovery algorithm.

2. The video semantic mining method of claim 1, characterized in that the video object recognition first extracts gradient features (HOG) and scale-invariant features (SIFT) from the visual object classification challenge image library (PASCAL VOC Challenge 2010), clusters these features with the k-means algorithm into classes called visual words, uses the visual words to construct bag-of-words descriptions of the images in the library, trains a support vector machine (SVM) model on the bag-of-words image features, and uses the SVM model to perform object recognition on video shot keyframe images.

3. The video semantic mining method of claim 1, characterized in that the video processing fuses Chinese continuous speech recognition, video object recognition, and video text recognition and treats the three recognition results uniformly as text features; the text processing includes Chinese word segmentation and part-of-speech tagging.

4. The video semantic mining method of claim 1, characterized in that, in the construction of the graph model, the vertices represent the nouns and verbs in the three recognition results of the video and the edge weights represent the semantic distances between vertices; the edge weights are computed with a HowNet-based semantic measure, the semantic distance between two words being obtained by querying the hierarchy and subsumption relations among words in the HowNet semantic lexicon.

5. The video semantic mining method of claim 1, characterized in that the dense-subgraph discovery algorithm is realized by continually removing isolated vertices from the graph model, and the video semantic mining process is expressed as the problem of discovering dense subgraphs in the graph model.
CN 201110168952 2011-06-22 2011-06-22 Method for video semantic mining Pending CN102222101A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201110168952 CN102222101A (en) 2011-06-22 2011-06-22 Method for video semantic mining

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201110168952 CN102222101A (en) 2011-06-22 2011-06-22 Method for video semantic mining

Publications (1)

Publication Number Publication Date
CN102222101A true CN102222101A (en) 2011-10-19

Family

ID=44778653

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201110168952 Pending CN102222101A (en) 2011-06-22 2011-06-22 Method for video semantic mining

Country Status (1)

Country Link
CN (1) CN102222101A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014146463A1 (en) * 2013-03-19 2014-09-25 中国科学院自动化研究所 Behaviour recognition method based on hidden structure reasoning
CN105629747A (en) * 2015-09-18 2016-06-01 宇龙计算机通信科技(深圳)有限公司 Voice control method and device of smart home system
CN106202421A (en) * 2012-02-02 2016-12-07 联想(北京)有限公司 A kind of obtain the method for video, device and play the method for video, device
CN107203586A (en) * 2017-04-19 2017-09-26 天津大学 A kind of automation result generation method based on multi-modal information
CN108320318A (en) * 2018-01-15 2018-07-24 腾讯科技(深圳)有限公司 Image processing method, device, computer equipment and storage medium
CN108427713A (en) * 2018-02-01 2018-08-21 宁波诺丁汉大学 A kind of video summarization method and system for homemade video
CN110879974A (en) * 2019-11-01 2020-03-13 北京微播易科技股份有限公司 Video classification method and device
CN111107381A (en) * 2018-10-25 2020-05-05 武汉斗鱼网络科技有限公司 Live broadcast room bullet screen display method, storage medium, equipment and system
CN111126373A (en) * 2019-12-23 2020-05-08 北京中科神探科技有限公司 Device and method for judging Internet short video violations based on cross-modal recognition technology
CN111797850A (en) * 2019-04-09 2020-10-20 Oppo广东移动通信有限公司 Video classification method and device, storage medium and electronic equipment
CN113343974A (en) * 2021-07-06 2021-09-03 国网天津市电力公司 Multi-modal fusion classification optimization method considering inter-modal semantic distance measurement
CN115203408A (en) * 2022-06-24 2022-10-18 中国人民解放军国防科技大学 An intelligent labeling method for multimodal test data

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101093500A (en) * 2007-07-16 2007-12-26 武汉大学 Method for recognizing semantics of events in video
US20080059872A1 (en) * 2006-09-05 2008-03-06 National Cheng Kung University Video annotation method by integrating visual features and frequent patterns

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080059872A1 (en) * 2006-09-05 2008-03-06 National Cheng Kung University Video annotation method by integrating visual features and frequent patterns
CN101093500A (en) * 2007-07-16 2007-12-26 武汉大学 Method for recognizing semantics of events in video

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Liu Yanan. "Video Semantic Understanding with Multimodal Feature Fusion and Variable Selection." China Doctoral Dissertations Full-text Database (electronic journal), 2010-12-15, Chapter 4. Relevant to claims 1-5. *
Zhang Huansheng et al. "A Survey of Graph-Based Frequent Substructure Mining Algorithms." 信息化纵横 (Information Technology), Oct. 2009, pp. 5-9. Relevant to claims 1-5. *
Liu Yanan et al. "Video Semantic Mining Based on Multimodal Subspace Correlation Propagation." Journal of Computer Research and Development, Jan. 2009, pp. 1-8. Relevant to claims 1-5. *

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106202421B (en) * 2012-02-02 2020-01-31 联想(北京)有限公司 method and device for obtaining video and method and device for playing video
CN106202421A (en) * 2012-02-02 2016-12-07 联想(北京)有限公司 A kind of obtain the method for video, device and play the method for video, device
WO2014146463A1 (en) * 2013-03-19 2014-09-25 中国科学院自动化研究所 Behaviour recognition method based on hidden structure reasoning
CN105629747A (en) * 2015-09-18 2016-06-01 宇龙计算机通信科技(深圳)有限公司 Voice control method and device of smart home system
CN107203586A (en) * 2017-04-19 2017-09-26 天津大学 A kind of automation result generation method based on multi-modal information
CN108320318A (en) * 2018-01-15 2018-07-24 腾讯科技(深圳)有限公司 Image processing method, device, computer equipment and storage medium
CN108427713A (en) * 2018-02-01 2018-08-21 宁波诺丁汉大学 A kind of video summarization method and system for homemade video
CN108427713B (en) * 2018-02-01 2021-11-16 宁波诺丁汉大学 Video abstraction method and system for self-made video
CN111107381A (en) * 2018-10-25 2020-05-05 武汉斗鱼网络科技有限公司 Live broadcast room bullet screen display method, storage medium, equipment and system
CN111797850A (en) * 2019-04-09 2020-10-20 Oppo广东移动通信有限公司 Video classification method and device, storage medium and electronic equipment
CN110879974A (en) * 2019-11-01 2020-03-13 北京微播易科技股份有限公司 Video classification method and device
CN111126373A (en) * 2019-12-23 2020-05-08 北京中科神探科技有限公司 Device and method for judging Internet short video violations based on cross-modal recognition technology
CN113343974A (en) * 2021-07-06 2021-09-03 国网天津市电力公司 Multi-modal fusion classification optimization method considering inter-modal semantic distance measurement
CN115203408A (en) * 2022-06-24 2022-10-18 中国人民解放军国防科技大学 An intelligent labeling method for multimodal test data

Similar Documents

Publication Publication Date Title
CN102222101A (en) Method for video semantic mining
CN110489395B (en) Method for automatically acquiring knowledge of multi-source heterogeneous data
WO2023065617A1 (en) Cross-modal retrieval system and method based on pre-training model and recall and ranking
CN110598005B (en) Public safety event-oriented multi-source heterogeneous data knowledge graph construction method
CN102663015B (en) Video semantic labeling method based on characteristics bag models and supervised learning
CN101382937B (en) Speech recognition-based multimedia resource processing method and its online teaching system
CN110309331A (en) A Self-Supervised Cross-Modal Deep Hash Retrieval Method
CN110619051B (en) Question sentence classification method, device, electronic equipment and storage medium
WO2018010365A1 (en) Cross-media search method
CN112836487B (en) Automatic comment method and device, computer equipment and storage medium
CN108268600B (en) AI-based unstructured data management method and device
CN108132968A (en) Network text is associated with the Weakly supervised learning method of Semantic unit with image
CN106484674A (en) A kind of Chinese electronic health record concept extraction method based on deep learning
CN101877064B (en) Image classification method and image classification device
CN106446109A (en) Acquiring method and device for audio file abstract
CN105912625A (en) Linked data oriented entity classification method and system
CN109034248B (en) A deep learning-based classification method for images with noisy labels
CN112069826A (en) Vertical Domain Entity Disambiguation Method Fusing Topic Models and Convolutional Neural Networks
CN113221882B (en) Image text aggregation method and system for curriculum field
CN114997288B (en) A design resource association method
WO2024130751A1 (en) Text-to-image generation method and system based on local detail editing
CN110489548A (en) A kind of Chinese microblog topic detecting method and system based on semanteme, time and social networks
CN108009135A (en) The method and apparatus for generating documentation summary
CN110188359B (en) Text entity extraction method
CN112347761A (en) BERT-based drug relationship extraction method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20111019