CN105279495A - Video description method based on deep learning and text summarization - Google Patents
- Publication number: CN105279495A (application CN201510697454.3A)
- Authority: CN (China)
- Prior art keywords
- video
- description
- neural network
- network model
- image
- Legal status: Granted (as listed by Google Patents; not a legal conclusion)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Biophysics (AREA)
- Artificial Intelligence (AREA)
- Mathematical Physics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Multimedia (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Description
Technical Field
The invention relates to the field of video description, and in particular to a video description method based on deep learning and text summarization.
Background
Describing a video in natural language matters both for understanding the video and for retrieving it on the Web. Language description of video is also a key research topic in multimedia and computer vision. Video description means observing the content of a given video, that is, extracting its features, and generating corresponding sentences from that content. After watching a video, especially one showing an action, a person understands it to some degree and can say in words what happened, for example "A man is riding a motorcycle." Describing large numbers of videos by hand, however, costs a great deal of time, labor, and money. It is therefore necessary to analyze video features with computer technology and combine them with natural language processing to generate descriptions automatically. On the one hand, video description lets people understand videos more precisely at the semantic level. On the other hand, in video retrieval, finding the videos that match a textual query entered by a user is very difficult and remains challenging.
Many video description methods have emerged in the past few years. For example, analyzing video features makes it possible to recognize the objects in a video and the actions that relate them. A fixed language template, subject + verb + object, is then filled in. The subject and object are chosen from the recognized objects, the action between them serves as the predicate, and the resulting sentence describes the video.
Such methods have limitations. Sentences generated from language templates tend to have a fixed, monotonous structure and lack the richness of natural human expression. Recognizing objects and actions requires different features, which makes the pipeline cumbersome and the feature training time-consuming. Moreover, recognition accuracy directly determines sentence quality. This step-by-step approach must therefore be highly accurate at every step, which is difficult to achieve in practice.
Summary of the Invention
The present invention provides a video description method based on deep learning and text summarization. It uses natural language to describe the events taking place in a video and the attributes of the objects involved in those events, thereby describing and summarizing the video content, as detailed below:
A video description method based on deep learning and text summarization, characterized in that the method comprises the following steps:
Downloading videos from the Internet and describing each video to form <video, description> pairs, which make up a text description training set;
Training a convolutional neural network model on an existing image dataset for an image classification task;
Extracting a video frame sequence from each video and extracting convolutional neural network features with the convolutional neural network model; the resulting <video frame sequence, text description sequence> pairs are used as input to train a recurrent neural network model;
Describing the video frame sequence of the video to be described with the trained recurrent neural network model to obtain a description sequence;
Ranking the description sequence by graph-based lexical centrality as salience in text summarization, and outputting the final description of the video.
The step of downloading videos from the Internet and describing each video to form <video, description> pairs that make up the text description training set is specifically:
An existing video collection and the sentence descriptions of each video form <video, description> pairs, which make up the text description training set.
The step of extracting a video frame sequence from each video, extracting convolutional neural network features with the convolutional neural network model, and training the recurrent neural network model on the resulting <video frame sequence, text description sequence> pairs is specifically:
Using the parameters of the trained convolutional neural network model, extract the convolutional neural network features of each image. Model these features together with the image's sentence description to obtain an objective function;
Construct a recurrent neural network and model its nonlinear function with a long short-term memory (LSTM) network;
Optimize the objective function by gradient descent to obtain the trained LSTM parameters.
The step of describing the video frame sequence of the video to be described with the trained recurrent neural network model to obtain a description sequence is specifically:
Using the trained model parameters, extract the convolutional neural network feature of each image with the convolutional neural network model to obtain the image features;
Feed the image features into the model and apply the trained parameters to obtain sentence descriptions, which together form the sentence descriptions of the video.
The beneficial effects of the technical solution provided by the present invention are as follows. Each video consists of a sequence of frames, and a convolutional neural network extracts the low-level features of every frame. This avoids the excessive noise introduced when deep learning is conventionally used to extract video features, noise that lowers the accuracy of the sentences generated later. A trained recurrent neural network converts each frame into a sentence, producing a set of sentences. Automatic text summarization then computes the centrality of these sentences and selects a high-quality, representative sentence from the set as the video description. This improves the quality and accuracy of the description as well as the diversity of the sentences. The method based on deep learning and text summarization can also be extended to video retrieval applications; however, it is limited to English descriptions of video content.
Description of Drawings
Fig. 1 is a flowchart of the video description method based on deep learning and text summarization;
Fig. 2 is a schematic diagram of the convolutional neural network (CNN) model used in the present invention;
In the figure, Cov denotes a convolution kernel; ReLU denotes the function max(0, x); Pool denotes the pooling operation; LRN denotes local response normalization; Softmax denotes the objective function.
Fig. 3 is a schematic diagram of the recurrent neural network used in the present invention;
In the figure, x_t denotes the input at step t; h_{t−1} denotes the hidden state of the previous step; i is the input gate; f is the forget gate; o is the output gate; c is the memory cell; m_t is the output of one LSTM unit.
Fig. 4(a) is the initial fully connected LexRank graph;
Here S = {S_1, …, S_10} are the 10 sentences generated by the recurrent neural network (RNN), represented as the 10 nodes of a graph. The similarity between two nodes is drawn as an edge, forming a fully connected graph, and the thickness of each edge indicates the degree of similarity.
Fig. 4(b) is the LexRank graph after pruning;
A threshold removes the edges between nodes with low similarity; the remaining edges connect sentences with relatively high similarity.
Fig. 5 is a schematic diagram of the sentences generated for some of the video frames.
Below each frame is the sentence generated by the CNN-RNN model used in the present invention. The arrow points to the summary of these descriptions produced by LexRank, which serves as the text description of the video.
Detailed Description
To make the objectives, technical solution, and advantages of the present invention clearer, embodiments of the present invention are described in further detail below.
Deep learning has significantly improved image description. Motivated by this result and by the problems in the background art, researchers have applied deep learning to video, which has improved the diversity and accuracy of the generated video descriptions to some extent.
To this end, embodiments of the present invention propose a video description method based on deep learning and text summarization. First, the method extracts the visual features of each video frame with a convolutional neural network. Next, each frame feature is fed into a recurrent neural network, which generates one descriptive sentence per visual feature, that is, per frame. This yields a set of sentences. To choose the most expressive, highest-quality sentence as the video description, the method applies text summarization. It ranks all sentences by their mutual similarity, which keeps erroneous or low-quality sentences from becoming the final description. Automatic text summarization thus yields a representative sentence that is also reasonably correct and reliable, improving the accuracy of the video description. The method also helps overcome some technical difficulties in video retrieval.
Example 1
Referring to Fig. 1, a video description method based on deep learning and text summarization comprises the following steps:
101: Download videos from the Internet and describe each video in English to form <video, description> pairs, which make up the text description training set. Each video has multiple descriptive sentences, which form a text description sequence;
102: Train a convolutional neural network (CNN) model on an existing image dataset, for example ImageNet, for an image classification task;
103: Extract a video frame sequence from each video and extract CNN features with the CNN model. Use the resulting <video frame sequence, text description sequence> pairs as input to train a recurrent neural network (RNN) model;
104: Describe the video frame sequence of the video to be described with the trained RNN model to obtain a description sequence;
105: Rank the description sequence by plausibility with LexRank (graph-based lexical centrality as salience in text summarization), and select the most plausible description as the final description of the video.
In summary, through steps 101 to 105 this embodiment describes in natural language the events taking place in a video and the attributes of the objects involved, thereby describing and summarizing the video content.
Example 2
201: Download videos from the Internet and describe each video to form <video, description> pairs, which make up the text description training set;
This step specifically includes:
(1) Download the Microsoft Research Video Description Corpus from the Internet. This dataset contains 1,970 video clips collected from YouTube and is denoted VID.
(2) Each video has multiple descriptions. The sentence descriptions of a video are Sentences = {Sentence_1, …, Sentence_N}, where N is the number of sentences describing that video.
(3) The existing video collection VID and the sentence descriptions Sentences of each video form <video, description> pairs, which make up the text description training set.
202: Train a convolutional neural network (CNN) model on an existing image dataset for an image classification task to obtain the CNN model parameters;
This step specifically includes:
(1) Construct the AlexNet [1] CNN model shown in Fig. 2. The model has 8 layers: the first 5 are convolutional layers and the last 3 are fully connected layers.
(2) Use ImageNet as the training set and resample every image in the dataset to 256×256, …
$$F_1(\mathrm{IMAGE}) = \mathrm{norm}\{\mathrm{pool}[\max(0,\ W_1 * \mathrm{IMAGE} + B_1)]\} \tag{1}$$
Here IMAGE is the input image; W_1 is the convolution kernel parameter; B_1 is the bias; F_1(IMAGE) is the output of the first network layer; norm denotes the normalization operation. In this layer the convolved image passes through the rectified linear function max(0, x), where x = W_1 * IMAGE + B_1. It then goes through a pooling operation and finally through local response normalization (LRN), computed as:

$$b^i_{x,y} = a^i_{x,y} \Big/ \Big(k + \alpha \sum_{j=\max(0,\,i-n/2)}^{\min(M-1,\,i+n/2)} \big(a^j_{x,y}\big)^2\Big)^{\beta} \tag{2}$$
Here M is the number of feature maps after pooling; i indexes the i-th of the M feature maps; n is the local normalization size, so normalization runs over n adjacent feature maps; a^i_{x,y} is the value at coordinate (x, y) in the i-th feature map; k is a bias; α and β are normalization parameters; b^i_{x,y} is the output after LRN.
In AlexNet, k = 2, n = 5, α = 10^-4, β = 0.75.
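The LRN step described above can be sketched in plain Python over nested lists, so it runs without extra libraries. The layout a[i][y][x] (feature map, row, column) and the function name `lrn` are illustrative assumptions. The defaults are the AlexNet values quoted above.

```python
def lrn(a, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """Local response normalization across feature maps (sketch).

    a[i][y][x] is the activation of feature map i at position (x, y);
    M = len(a) is the number of feature maps after pooling.
    The nested-list layout is an assumption made for illustration.
    """
    M = len(a)
    H, W = len(a[0]), len(a[0][0])
    b = [[[0.0] * W for _ in range(H)] for _ in range(M)]
    for i in range(M):
        # Neighbouring maps j in [max(0, i - n/2), min(M - 1, i + n/2)]
        lo, hi = max(0, i - n // 2), min(M - 1, i + n // 2)
        for y in range(H):
            for x in range(W):
                s = sum(a[j][y][x] ** 2 for j in range(lo, hi + 1))
                b[i][y][x] = a[i][y][x] / (k + alpha * s) ** beta
    return b


# Example: three 1x1 feature maps; with n = 5 every map sees all three.
out = lrn([[[1.0]], [[2.0]], [[3.0]]])
```

Because n = 5 covers all three maps here, each output is its input divided by (2 + 10^-4 · 14)^0.75.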
Continuing with this model, F_1(IMAGE) is the input of the second network layer, which can be expressed as:
$$F_2(\mathrm{IMAGE}) = \mathrm{norm}\{\mathrm{pool}[\max(0,\ W_2 * F_1(\mathrm{IMAGE}) + B_2)]\} \tag{3}$$
Here W_2 is the convolution kernel parameter, B_2 the bias, and F_2(IMAGE) the output of the second layer. The second layer is configured like the first; only the kernel sizes of its convolution and pooling operations differ.
Following the AlexNet configuration, the remaining convolutional layers can be expressed in turn as:
$$F_3(\mathrm{IMAGE}) = \max(0,\ W_3 * F_2(\mathrm{IMAGE}) + B_3) \tag{4}$$
$$F_4(\mathrm{IMAGE}) = \max(0,\ W_4 * F_3(\mathrm{IMAGE}) + B_4) \tag{5}$$
$$F_5(\mathrm{IMAGE}) = \mathrm{pool}[\max(0,\ W_5 * F_4(\mathrm{IMAGE}) + B_5)] \tag{6}$$
Here W_3, W_4, W_5 and B_3, B_4, B_5 are the convolution parameters and biases of the respective layers.
The last three layers are fully connected. Following the layer settings in Fig. 2, they can be expressed in turn as:
$$F_6(\mathrm{IMAGE}) = \mathrm{fc}[F_5(\mathrm{IMAGE}),\ \theta_1] \tag{7}$$
$$F_7(\mathrm{IMAGE}) = \mathrm{fc}[F_6(\mathrm{IMAGE}),\ \theta_2] \tag{8}$$
$$F_8(\mathrm{IMAGE}) = \mathrm{fc}[F_7(\mathrm{IMAGE}),\ \theta_3] \tag{9}$$
Here fc denotes a fully connected layer and θ_1, θ_2, θ_3 are the parameters of the three fully connected layers. The last-layer feature F_8(IMAGE) is fed into a 1000-class multinomial classifier.
(3) On top of the network, set up the multinomial classifier, whose objective can be expressed as:
Here l(Θ) is the objective function; m is the number of image categories in ImageNet; x^(t) is the CNN feature extracted by the AlexNet network; y^(t) is the corresponding image label; Θ = {W_p, B_p, θ_q}, p = 1, …, 5, q = 1, 2, 3, are the parameters of the network layers. The objective is optimized by gradient descent to obtain the AlexNet parameters Θ.
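The classifier objective formula itself does not appear in this text. The sketch below assumes the standard softmax (multinomial logistic) loss, that is, the mean negative log-likelihood of the true labels. It also returns the gradient with respect to the logits, which is the quantity that gradient descent propagates back into Θ. The function names are illustrative assumptions.

```python
import math


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def softmax_loss_and_grad(logits, labels):
    """Assumed softmax objective: mean negative log-likelihood of the labels.

    logits[t]: class scores for sample t (e.g. the 1000 outputs of F8(IMAGE))
    labels[t]: integer class index y(t)
    Returns (loss, grads) where grads[t] = (p_t - onehot(y_t)) / batch size.
    """
    batch = len(labels)
    loss, grads = 0.0, []
    for z, y in zip(logits, labels):
        p = softmax(z)
        loss -= math.log(p[y])
        g = [pk / batch for pk in p]
        g[y] -= 1.0 / batch
        grads.append(g)
    return loss / batch, grads
```

With all-zero logits over 4 classes, the loss is log 4 and the gradient for the true class is −0.75.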
203: Extract a video frame sequence from each video and extract CNN features with the CNN model. Use the resulting <video frame sequence, text description sequence> pairs as input to train a recurrent neural network (RNN) model;
This step specifically includes:
(1) Using the CNN parameters trained in step 202, extract the CNN feature I of each image and model it together with the image's sentence description S. The objective function is:
$$\theta^* = \arg\max_{\theta} \sum_{(S,I)} \log p(S \mid I;\ \theta) \tag{11}$$
Here (S, I) is an image-text pair in the training data; θ are the model parameters to be optimized; θ* are the optimized parameters.
Training maximizes the sum, over all samples, of the log-probabilities of the sentences given the input image I. The probability p(S|I; θ) is expanded with the chain rule of conditional probability, where N here denotes the sentence length:

$$\log p(S \mid I) = \sum_{t=0}^{N} \log p(S_t \mid I, S_0, \ldots, S_{t-1}) \tag{12}$$

Here S_0, S_1, …, S_{t−1}, S_t are the words of the sentence. The unknown term p(S_t | I, S_0, S_1, …, S_{t−1}) is modeled with a recurrent neural network.
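The chain-rule decomposition can be made concrete with a small sketch. The per-step distributions `step_probs` are passed in directly here as a hypothetical stand-in for the RNN's outputs. In the method they would come from the trained LSTM, conditioned on the image feature and the previous words. The function names are illustrative.

```python
import math


def sentence_log_prob(step_probs, word_ids):
    """log p(S | I) = sum_t log p(S_t | I, S_0, ..., S_{t-1}).

    step_probs[t]: distribution over the vocabulary at step t, assumed to be
                   already conditioned on the image I and the previous words
                   (a hypothetical stand-in for the RNN output)
    word_ids[t]:   index of the observed word S_t
    """
    return sum(math.log(p[w]) for p, w in zip(step_probs, word_ids))


def corpus_objective(pairs):
    """The quantity maximized over theta in the training objective:
    sum of log p(S | I) over all (step_probs, word_ids) training pairs."""
    return sum(sentence_log_prob(p, s) for p, s in pairs)
```

Working in log space turns the product of conditional probabilities into a sum, which avoids underflow for long sentences.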
(2) Construct the recurrent neural network (RNN):
The preceding t−1 words are represented as a fixed-length hidden state h_t. When a new input x_t arrives, a nonlinear function f updates the hidden state:
$$h_{t+1} = f(h_t, x_t) \tag{13}$$
where h_{t+1} is the next hidden state.
(3) Model the nonlinear function f with the long short-term memory (LSTM) network shown in Fig. 3;
Here i_t is the input gate, f_t the forget gate, o_t the output gate, and c the memory cell. The states are updated and output as follows:
$$i_t = \sigma(W_{ix} x_t + W_{im} m_{t-1}) \tag{14}$$
$$f_t = \sigma(W_{fx} x_t + W_{fm} m_{t-1}) \tag{15}$$
$$o_t = \sigma(W_{ox} x_t + W_{om} m_{t-1}) \tag{16}$$
$$c_t = f_t \odot c_{t-1} + i_t \odot h(W_{cx} x_t + W_{cm} m_{t-1}) \tag{17}$$
$$m_t = o_t \odot c_t \tag{18}$$
$$p_{t+1} = \mathrm{Softmax}(m_t) \tag{19}$$
Here ⊙ denotes the element-wise product of gate values; W = {W_ix, W_im, W_fx, W_fm, W_ox, W_om, W_cx, W_cm} are the parameters to be trained; σ(·) is the sigmoid function (for example in σ(W_ix x_t + W_im m_{t−1}) and σ(W_fx x_t + W_fm m_{t−1})); h(·) is the hyperbolic tangent function (for example in h(W_cx x_t + W_cm m_{t−1})); p_{t+1} is the probability distribution of the next word after Softmax classification; m_t is the current state feature.
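One LSTM step can be sketched in plain Python, with the weight matrices held in a dict keyed by the subscripts used above. Following the text, there are no bias terms, and Softmax is applied to m_t directly. Taken literally, that makes the hidden size equal to the vocabulary size; many implementations insert a learned output projection before the Softmax instead. The function names and the dict layout are assumptions.

```python
import math


def matvec(W, v):
    """Matrix-vector product for lists of rows."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def lstm_step(x_t, m_prev, c_prev, W):
    """One LSTM step; W maps 'ix', 'im', 'fx', 'fm', 'ox', 'om', 'cx', 'cm'
    to weight matrices (an assumed layout). No biases, as in the text."""
    def pre(a, b):
        # W_a x_t + W_b m_{t-1}
        return [u + v for u, v in zip(matvec(W[a], x_t), matvec(W[b], m_prev))]

    i_t = [sigmoid(z) for z in pre("ix", "im")]    # input gate
    f_t = [sigmoid(z) for z in pre("fx", "fm")]    # forget gate
    o_t = [sigmoid(z) for z in pre("ox", "om")]    # output gate
    g_t = [math.tanh(z) for z in pre("cx", "cm")]  # h(W_cx x_t + W_cm m_{t-1})
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]
    m_t = [o * c for o, c in zip(o_t, c_t)]
    return m_t, c_t, softmax(m_t)                  # next-word distribution
```

With all-zero weights every gate equals 0.5 and the candidate is 0. The cell therefore halves (c_t = 0.5 · c_{t−1}), which is a quick sanity check of the wiring.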
(4) Optimize objective function (11) by gradient descent to obtain the trained LSTM parameters W.
204: Describe the video frame sequence of the video to be described with the trained RNN model to obtain a description sequence. Prediction proceeds as follows:
(1) Extract the test set …
(2) Using the trained model parameters Θ = {W_i, B_i, θ_j}, i = 1, …, 5, j = 1, 2, 3, extract the CNN feature of each image in Image_t with the CNN model. This gives the image features I_t = {I_t^1, …, I_t^10}.
(3) Feed the image features I_t into the model and use the trained parameters W to evaluate formula (12). This gives the sentence descriptions S = {S_1, …, S_n}, that is, the sentence descriptions of the video.
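The text does not say how a sentence is read off the per-step word distributions. The sketch below assumes greedy decoding, in which the most probable word is taken at each step; beam search is a common alternative. `step_fn`, `init_state_for`, `start_id` and `end_id` are hypothetical names for a wrapper around the trained LSTM, its frame-conditioned initial state, and the sentence start and end tokens.

```python
def greedy_decode(step_fn, state, start_id, end_id, max_len=20):
    """Generate one sentence by repeatedly taking the most probable word.

    step_fn(word_id, state) -> (probs, new_state) is a hypothetical wrapper
    around the trained LSTM; the frame feature is assumed to have been fed
    in already and is carried inside `state`.
    """
    words = []
    w = start_id
    for _ in range(max_len):
        probs, state = step_fn(w, state)
        w = max(range(len(probs)), key=probs.__getitem__)  # argmax word
        if w == end_id:
            break
        words.append(w)
    return words


def describe_frames(step_fn, init_state_for, frame_features, start_id, end_id):
    """One sentence per frame feature, giving the sentence set S = {S_1, ..., S_n}.
    init_state_for(feature) is a hypothetical helper that conditions the LSTM
    on one frame feature."""
    return [greedy_decode(step_fn, init_state_for(f), start_id, end_id)
            for f in frame_features]
```

Because every frame is decoded independently, the resulting sentence set can contain near-duplicates and outliers. This is what the LexRank step is meant to filter.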
205: Rank the description sequence by plausibility with LexRank, and select the most plausible description as the final description of the video.
(1) Run the RNN model on the video feature sequence I_t = {I_t^1, …, I_t^10} to generate the corresponding sentence set S = {S_1, …, S_i, …, S_n}.
(2) Generate sentence features. Scan all words of every sentence S_i in the sentence set S in order, where i = 1, …, N_s, keeping one copy of each distinct word. This forms the vocabulary VOL = {w_1, …, w_{N_w}}, where N_w is the total number of words in VOL. For each word w_i in VOL, scan every sentence S_j in S in order and count the number of occurrences n_ij of w_i in S_j, where j = 1, …, N_s and N_s is the total number of sentences. Also count the number of sentences num(w_i) in S that contain w_i. Then compute the term frequency tf(w_i, S_j) of each word w_i in each sentence S_j, for i = 1, …, N_w and j = 1, …, N_s, with formula (20):

$$tf(w_i, S_j) = \frac{n_{ij}}{\sum_{k} n_{kj}} \tag{20}$$
where n_kj is the number of occurrences of the k-th word in the j-th sentence.
For each word w_i in VOL, compute its inverse document frequency idf(w_i) with formula (21);
$$idf(w_i) = \log\big(N_s / num(w_i)\big) \tag{21}$$
where N_s is the total number of sentences in S and num(w_i) is the number of sentences that contain w_i.
Following the vector space model, each sentence S_j in S is represented as an N_w-dimensional vector. Its i-th dimension corresponds to the word w_i in the vocabulary and takes the value tfidf(w_i), computed as:
$$tfidf(w_i) = tf(w_i, S_j) \times idf(w_i) \tag{22}$$
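The vocabulary, term-frequency, idf and tf-idf construction above can be sketched as follows. Two points are assumptions. Sentences are tokenized by lower-casing and splitting on whitespace. The idf numerator is the total number of sentences in S, which is the standard idf definition. The function name is illustrative.

```python
import math
from collections import Counter


def tfidf_vectors(sentences):
    """Build the vocabulary VOL and one tf-idf vector per sentence.

    Assumptions: whitespace tokenization after lower-casing, and
    idf(w) = log(N_s / num(w)) with N_s the number of sentences.
    """
    docs = [s.lower().split() for s in sentences]
    vocab = sorted({w for d in docs for w in d})      # VOL = {w_1, ..., w_Nw}
    num = Counter(w for d in docs for w in set(d))    # num(w): sentences containing w
    n_s = len(docs)                                   # N_s
    idf = {w: math.log(n_s / num[w]) for w in vocab}  # inverse document frequency
    vectors = []
    for d in docs:
        counts = Counter(d)                           # n_ij for this sentence
        total = sum(counts.values())                  # sum_k n_kj
        vectors.append([counts[w] / total * idf[w] for w in vocab])  # tf x idf
    return vocab, idf, vectors
```

Words that occur in every generated sentence (such as "a" or "man" below) get idf 0. This is intended: they carry no information for telling the sentences apart.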
(3) Use the cosine between the two vectors S_i and S_j as the sentence similarity:

$$\mathrm{similarity}(S_i, S_j) = \frac{\sum_{w \in S_i, S_j} tf_{w,S_i}\, tf_{w,S_j}\, (idf_w)^2}{\sqrt{\sum_{s_m \in S_i} \big(tf_{s_m,S_i}\, idf_{s_m}\big)^2} \times \sqrt{\sum_{s_n \in S_j} \big(tf_{s_n,S_j}\, idf_{s_n}\big)^2}} \tag{23}$$

where tf_{w,S_i} is the term frequency of word w in sentence S_i; tf_{w,S_j} is the term frequency of w in S_j; idf_w is the inverse document frequency of w; s_m is any word in S_i, with term frequency tf_{s_m,S_i} in S_i and inverse document frequency idf_{s_m}; s_n is any word in S_j, with term frequency tf_{s_n,S_j} in S_j and inverse document frequency idf_{s_n}.
These similarities define a fully connected undirected graph, as in Fig. 4(a). Each node u_i is a sentence S_i, and the edge between two nodes is weighted by their sentence similarity.
(4) Set a threshold Degree and delete every edge whose similarity is below Degree, as in Fig. 4(b).
(5) Compute the LexRank score LR of each sentence node u_i. Each node's initial score is d/N, where N is the number of sentence nodes and d is the damping factor, usually chosen in [0.1, 0.2]. The score LR is computed with formula (24):

$$LR(u) = \frac{d}{N} + (1 - d) \sum_{v \in \mathrm{adj}[u]} \frac{LR(v)}{\deg(v)} \tag{24}$$

where adj[u] is the set of nodes adjacent to u after thresholding; deg(v) is the degree of node v; LR(u) is the score of node u; LR(v) is the score of node v.
(6) Sort the sentence nodes by LR score and select the highest-scoring sentence as the final description of the video.
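Steps (4) to (6) can be sketched as power iteration on a thresholded similarity matrix. The sketch assumes a symmetric similarity matrix with sim[u][u] = 1. Each node therefore keeps a self-loop and deg(v) ≥ 1, as in the original LexRank algorithm, where a sentence's similarity to itself always exceeds the threshold. The tolerance, the iteration cap, and the function names are illustrative choices. The initial score d/N follows the text.

```python
def lexrank(sim, threshold, d=0.15, tol=1e-8, max_iter=1000):
    """LexRank scores by power iteration on the thresholded graph.

    sim: symmetric N x N similarity matrix with sim[u][u] = 1 (assumed).
    Edges with similarity below `threshold` (the "Degree" threshold) are dropped.
    """
    n = len(sim)
    adj = [[v for v in range(n) if sim[u][v] >= threshold] for u in range(n)]
    deg = [len(nbrs) for nbrs in adj]   # >= 1 because of the self-loop
    lr = [d / n] * n                    # initial score d/N, as in the text
    for _ in range(max_iter):
        new = [d / n + (1 - d) * sum(lr[v] / deg[v] for v in adj[u])
               for u in range(n)]
        done = max(abs(a - b) for a, b in zip(new, lr)) < tol
        lr = new
        if done:
            break
    return lr


def best_sentence(sentences, sim, threshold=0.1, d=0.15):
    """Step (6): return the sentence with the highest LexRank score."""
    scores = lexrank(sim, threshold, d)
    return sentences[max(range(len(sentences)), key=scores.__getitem__)]
```

For example, with three sentences where only the middle one is similar to both others, the middle sentence receives the highest score and is chosen as the description. The fixed point's scores sum to 1, whatever the starting values.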
In summary, through steps 201 to 205 this embodiment describes in natural language the events taking place in a video and the attributes of the objects involved, thereby describing and summarizing the video content.
Example 3
这里选取两个视频作为待描述视频,如图5所示,使用本发明中基于深度学习和文本总结的方法对其进行预测输出相应的视频描述:Here, two videos are selected as videos to be described, as shown in Figure 5, using the method based on deep learning and text summarization in the present invention to predict and output corresponding video descriptions:
(1)使用ImageNet作为训练集,将数据集中的每一张图片采样到256*256大小的图片,
(2) Build the first convolutional layer: set the kernel size of conv1 to 11 and the stride to 4, and use ReLU, max(0, x), as the activation function. Apply a pooling operation to the resulting feature map with kernel size 3 and stride 2, and apply local response normalization (LRN) to the convolved data. As in AlexNet, the LRN constants are k = 2, n = 5, α = 10⁻⁴ and β = 0.75.
(3) Build the second convolutional layer: set the kernel size of conv2 to 5 and the stride to 1, and use ReLU, max(0, x), as the activation function. Apply pooling to the resulting feature map with kernel size 3 and stride 2, and apply local response normalization to the convolved data.
(4) Build the third convolutional layer: set the kernel size of conv3 to 3 and the stride to 1, and use ReLU, max(0, x), as the activation function.
(5) Build the fourth convolutional layer: set the kernel size of conv4 to 3 and the stride to 1, and use ReLU, max(0, x), as the activation function.
(6) Build the fifth convolutional layer: set the kernel size of conv5 to 3 and the stride to 1, use ReLU, max(0, x), as the activation function, and apply pooling to the resulting feature map with kernel size 3 and stride 2.
(7) Build the sixth layer, a fully connected layer named fc6: use ReLU, max(0, x), as the activation function and apply dropout to its output.
(8) Build the seventh layer, a fully connected layer named fc7: use ReLU, max(0, x), as the activation function and apply dropout to its output.
(9) Build the eighth layer, a fully connected layer named fc8, and add a Softmax classifier as the objective function.
(10) These eight layers together form the convolutional neural network (CNN) model.
(11) Train the parameters of the CNN model.
(12) Data processing: extract 10 evenly spaced frames from each video in the dataset and resize them to 256×256. Feed the frames into the trained CNN model to obtain image features, and randomly pair each frame with the video's five sentence descriptions to form image-text pairs.
(13) Construct the recurrent neural network (RNN) model.
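The layer settings in steps (1) to (10) and the frame sampling in step (12) can be checked with the pure-Python sketch below. It is illustrative only: the text gives no padding values, so the sketch assumes the usual AlexNet padding (0 for conv1, 2 for conv2, 1 for conv3 to conv5) and floor rounding. `uniform_frame_indices` takes the center frame of each of 10 equal segments, which is one plausible reading of "evenly extract". Pairing each frame with one randomly chosen description is likewise an assumed interpretation of step (12). The LRN function uses the AlexNet form with the constants k, n, α and β from step (2). All function names are hypothetical.

```python
import random

# (name, kernel, stride, padding, followed_by_pool) for conv1-conv5, steps (2)-(6).
# Padding is NOT given in the text; these values are assumed (common AlexNet setup).
CONV_LAYERS = [
    ("conv1", 11, 4, 0, True),
    ("conv2", 5, 1, 2, True),
    ("conv3", 3, 1, 1, False),
    ("conv4", 3, 1, 1, False),
    ("conv5", 3, 1, 1, True),
]
POOL_KERNEL, POOL_STRIDE = 3, 2  # pooling kernel 3, stride 2


def out_size(size, kernel, stride, padding=0):
    """Spatial output size of a conv or pooling window (floor rounding assumed)."""
    return (size + 2 * padding - kernel) // stride + 1


def feature_map_sizes(input_size=256):
    """Side length of the feature map after each conv layer (and its pooling, if any)."""
    sizes, s = [], input_size
    for name, kernel, stride, padding, pooled in CONV_LAYERS:
        s = out_size(s, kernel, stride, padding)
        if pooled:
            s = out_size(s, POOL_KERNEL, POOL_STRIDE)
        sizes.append((name, s))
    return sizes


def local_response_norm(channels, k=2.0, n=5, alpha=1e-4, beta=0.75):
    """AlexNet-style LRN across channels at a single spatial position.

    b_i = a_i / (k + alpha * sum_j a_j^2) ** beta, where j ranges over up to n
    neighbouring channels centred on channel i.
    """
    num_channels = len(channels)
    out = []
    for i, a in enumerate(channels):
        lo, hi = max(0, i - n // 2), min(num_channels - 1, i + n // 2)
        sq_sum = sum(channels[j] ** 2 for j in range(lo, hi + 1))
        out.append(a / (k + alpha * sq_sum) ** beta)
    return out


def uniform_frame_indices(num_frames, k=10):
    """Indices of k frames spread evenly over a video (centre of each segment)."""
    return [int((i + 0.5) * num_frames / k) for i in range(k)]


def make_image_text_pairs(frame_features, descriptions, rng):
    """Pair every frame feature with one description drawn at random (assumed reading)."""
    return [(feature, rng.choice(descriptions)) for feature in frame_features]


if __name__ == "__main__":
    print(feature_map_sizes(256))
    # [('conv1', 30), ('conv2', 14), ('conv3', 14), ('conv4', 14), ('conv5', 6)]
    print(uniform_frame_indices(100))
    # [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]
    pairs = make_image_text_pairs(uniform_frame_indices(100),
                                  ["s1", "s2", "s3", "s4", "s5"], random.Random(0))
    print(len(pairs))  # 10
```

Under these assumptions a 256×256 input shrinks to a 6×6 map after conv5 and its pooling layer, the same final size AlexNet reaches from a 227×227 crop, so fc6 would see the same number of spatial positions in either case.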
Figure 5 shows the video text descriptions produced by the present invention. The images in the figure are frames extracted from the videos, and the sentence beside each frame is the output of the language model for that frame's features. The lower part of the figure shows, after summarization, the sentences generated using only video features and those generated through image transfer, together with the video's original description.
In summary, this embodiment of the present invention converts the frame sequence of each video into a series of sentences through a convolutional neural network and a recurrent neural network, and uses text summarization to select high-quality, representative sentences from among them. Users can apply this method to obtain video descriptions of relatively high accuracy, and the method can be extended to video retrieval.
References
[1] Krizhevsky A, Sutskever I, Hinton G. ImageNet classification with deep convolutional neural networks[J]. Advances in Neural Information Processing Systems, 2012.
Those skilled in the art will understand that the accompanying drawings are only schematic diagrams of preferred embodiments, and that the serial numbers of the embodiments above are for description only and do not indicate that any embodiment is better or worse than another.
The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510697454.3A CN105279495B (en) | 2015-10-23 | 2015-10-23 | A video description method based on deep learning and text summarization |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510697454.3A CN105279495B (en) | 2015-10-23 | 2015-10-23 | A video description method based on deep learning and text summarization |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105279495A (en) | 2016-01-27 |
CN105279495B (en) | 2019-06-04 |
Family ID: 55148479
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510697454.3A Active CN105279495B (en) | 2015-10-23 | 2015-10-23 | A video description method based on deep learning and text summarization |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105279495B (en) |
- 2015-10-23: application CN201510697454.3A filed in China (CN105279495B), status Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8442927B2 (en) * | 2009-07-30 | 2013-05-14 | Nec Laboratories America, Inc. | Dynamically configurable, multi-ported co-processor for convolutional neural networks |
CN104113789A (en) * | 2014-07-10 | 2014-10-22 | 杭州电子科技大学 | On-line video abstraction generation method based on depth learning |
Non-Patent Citations (2)
Title |
---|
GUNES ERKAN: "LexRank: Graph-based Lexical Centrality as Salience in Text Summarization", JOURNAL OF ARTIFICIAL INTELLIGENCE RESEARCH * |
SUBHASHINI VENUGOPALAN et al.: "Translating Videos to Natural Language Using Deep Recurrent Neural Networks", COMPUTER SCIENCE * |
Cited By (107)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10437929B2 (en) | 2016-03-31 | 2019-10-08 | Maluuba Inc. | Method and system for processing an input query using a forward and a backward neural network specific to unigrams |
WO2017168252A1 (en) * | 2016-03-31 | 2017-10-05 | Maluuba Inc. | Method and system for processing an input query |
CN107292086A (en) * | 2016-04-07 | 2017-10-24 | 西门子保健有限责任公司 | Graphical analysis question and answer |
CN105894043A (en) * | 2016-04-27 | 2016-08-24 | 上海高智科技发展有限公司 | Method and system for generating video description sentences |
CN107391505A (en) * | 2016-05-16 | 2017-11-24 | 腾讯科技(深圳)有限公司 | A kind of image processing method and system |
CN107391505B (en) * | 2016-05-16 | 2020-10-23 | 腾讯科技(深圳)有限公司 | Image processing method and system |
CN106126492A (en) * | 2016-06-07 | 2016-11-16 | 北京高地信息技术有限公司 | Sentence recognition method and device based on bidirectional LSTM neural network |
CN106126492B (en) * | 2016-06-07 | 2019-02-05 | 北京高地信息技术有限公司 | Sentence recognition methods and device based on two-way LSTM neural network |
CN106227793A (en) * | 2016-07-20 | 2016-12-14 | 合网络技术(北京)有限公司 | Method and device for determining the degree of association between a video and video keywords |
CN106227793B (en) * | 2016-07-20 | 2019-10-22 | 优酷网络技术(北京)有限公司 | A kind of determination method and device of video and the Video Key word degree of correlation |
CN107707931A (en) * | 2016-08-08 | 2018-02-16 | 阿里巴巴集团控股有限公司 | Generated according to video data and explain data, data synthesis method and device, electronic equipment |
CN106372107B (en) * | 2016-08-19 | 2020-01-17 | 中兴通讯股份有限公司 | Method and device for generating natural language sentence library |
CN106372107A (en) * | 2016-08-19 | 2017-02-01 | 中兴通讯股份有限公司 | Generation method and device of natural language sentence library |
CN107784372A (en) * | 2016-08-24 | 2018-03-09 | 阿里巴巴集团控股有限公司 | Forecasting Methodology, the device and system of destination object attribute |
CN107784372B (en) * | 2016-08-24 | 2022-02-22 | 阿里巴巴集团控股有限公司 | Target object attribute prediction method, device and system |
CN106503055B (en) * | 2016-09-27 | 2019-06-04 | 天津大学 | A Generating Method from Structured Text to Image Descriptions |
CN106503055A (en) * | 2016-09-27 | 2017-03-15 | 天津大学 | Method for generating image descriptions from structured text |
CN106485251B (en) * | 2016-10-08 | 2019-12-24 | 天津工业大学 | Classification of egg embryos based on deep learning |
CN106485251A (en) * | 2016-10-08 | 2017-03-08 | 天津工业大学 | Egg embryo classification based on deep learning |
CN109891897B (en) * | 2016-10-27 | 2021-11-05 | 诺基亚技术有限公司 | Method for analyzing media content |
US11068722B2 (en) | 2016-10-27 | 2021-07-20 | Nokia Technologies Oy | Method for analysing media content to generate reconstructed media content |
CN109891897A (en) * | 2016-10-27 | 2019-06-14 | 诺基亚技术有限公司 | Method for analyzing media content |
CN106650789B (en) * | 2016-11-16 | 2023-04-07 | 同济大学 | Image description generation method based on depth LSTM network |
CN106650789A (en) * | 2016-11-16 | 2017-05-10 | 同济大学 | Image description generation method based on depth LSTM network |
CN106782602A (en) * | 2016-12-01 | 2017-05-31 | 南京邮电大学 | Speech emotion recognition method based on long short-term memory network and convolutional neural network |
CN106599198B (en) * | 2016-12-14 | 2021-04-06 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | An image description method based on multi-level connection recurrent neural network |
CN106599198A (en) * | 2016-12-14 | 2017-04-26 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | Image description method for multi-stage connection recurrent neural network |
CN106650756B (en) * | 2016-12-28 | 2019-12-10 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | knowledge migration-based image text description method of multi-mode recurrent neural network |
CN106650756A (en) * | 2016-12-28 | 2017-05-10 | 广东顺德中山大学卡内基梅隆大学国际联合研究院 | Image text description method based on knowledge transfer multi-modal recurrent neural network |
CN106845411B (en) * | 2017-01-19 | 2020-06-30 | 清华大学 | Video description generation method based on deep learning and probability map model |
CN106845411A (en) * | 2017-01-19 | 2017-06-13 | 清华大学 | Video description generation method based on deep learning and probabilistic graphical model |
CN106934352A (en) * | 2017-02-28 | 2017-07-07 | 华南理工大学 | Video description method based on bidirectional fractal network and LSTM |
CN106886768A (en) * | 2017-03-02 | 2017-06-23 | 杭州当虹科技有限公司 | Video fingerprinting algorithm based on deep learning |
WO2018170671A1 (en) * | 2017-03-20 | 2018-09-27 | Intel Corporation | Topic-guided model for image captioning system |
CN107038221A (en) * | 2017-03-22 | 2017-08-11 | 杭州电子科技大学 | Video content description method guided by semantic information |
CN108665055B (en) * | 2017-03-28 | 2020-10-23 | 深圳荆虹科技有限公司 | Method and device for generating graphic description |
CN108665055A (en) * | 2017-03-28 | 2018-10-16 | 上海荆虹电子科技有限公司 | Method and device for generating image captions |
US10983485B2 (en) | 2017-04-04 | 2021-04-20 | Siemens Aktiengesellschaft | Method and control device for controlling a technical system |
CN110678816A (en) * | 2017-04-04 | 2020-01-10 | 西门子股份公司 | Method and control device for controlling a technical system |
CN110678816B (en) * | 2017-04-04 | 2021-02-19 | 西门子股份公司 | Method and control device for controlling a technical system |
CN108734614A (en) * | 2017-04-13 | 2018-11-02 | 腾讯科技(深圳)有限公司 | Traffic congestion prediction technique and device, storage medium |
CN110612537A (en) * | 2017-05-02 | 2019-12-24 | 柯达阿拉里斯股份有限公司 | System and method for batch normalized loop highway network |
CN107203598A (en) * | 2017-05-08 | 2017-09-26 | 广州智慧城市发展研究院 | Method and system for converting images into labels |
US10445871B2 (en) | 2017-05-22 | 2019-10-15 | General Electric Company | Image analysis neural network systems |
CN108228686A (en) * | 2017-06-15 | 2018-06-29 | 北京市商汤科技开发有限公司 | It is used to implement the matched method, apparatus of picture and text and electronic equipment |
CN108228686B (en) * | 2017-06-15 | 2021-03-23 | 北京市商汤科技开发有限公司 | Method and device for realizing image-text matching and electronic equipment |
CN107291882A (en) * | 2017-06-19 | 2017-10-24 | 江苏软开信息科技有限公司 | Automatic statistical data analysis method |
CN107515900A (en) * | 2017-07-24 | 2017-12-26 | 宗晖(上海)机器人有限公司 | Intelligent robot and its event memorandum system and method |
CN107368887A (en) * | 2017-07-25 | 2017-11-21 | 江西理工大学 | Structure of a deep-memory convolutional neural network and construction method thereof |
CN107368887B (en) * | 2017-07-25 | 2020-08-07 | 江西理工大学 | A device for deep memory convolutional neural network and its construction method |
US11481625B2 (en) | 2017-08-04 | 2022-10-25 | Nokia Technologies Oy | Artificial neural network |
WO2019024083A1 (en) * | 2017-08-04 | 2019-02-07 | Nokia Technologies Oy | Artificial neural network |
CN107578062A (en) * | 2017-08-19 | 2018-01-12 | 四川大学 | A Image Caption Method Based on Attribute Probability Vector Guided Attention Patterns |
CN107609501A (en) * | 2017-09-05 | 2018-01-19 | 东软集团股份有限公司 | The close action identification method of human body and device, storage medium, electronic equipment |
CN109522531A (en) * | 2017-09-18 | 2019-03-26 | 腾讯科技(北京)有限公司 | Copy text generation method and device, storage medium and electronic device |
CN109522531B (en) * | 2017-09-18 | 2023-04-07 | 腾讯科技(北京)有限公司 | Document generation method and device, storage medium and electronic device |
CN110019952A (en) * | 2017-09-30 | 2019-07-16 | 华为技术有限公司 | Video presentation method, system and device |
CN110019952B (en) * | 2017-09-30 | 2023-04-18 | 华为技术有限公司 | Video description method, system and device |
CN107844751A (en) * | 2017-10-19 | 2018-03-27 | 陕西师范大学 | Hyperspectral remote sensing image classification method based on guided-filtering long short-term memory neural network |
CN107844751B (en) * | 2017-10-19 | 2021-08-27 | 陕西师范大学 | Method for classifying hyperspectral remote sensing images of guide filtering long and short memory neural network |
CN107818306B (en) * | 2017-10-31 | 2020-08-07 | 天津大学 | Video question-answering method based on attention model |
CN107818306A (en) * | 2017-10-31 | 2018-03-20 | 天津大学 | A kind of video answering method based on attention model |
CN108200483A (en) * | 2017-12-26 | 2018-06-22 | 中国科学院自动化研究所 | Dynamic multimodal video description generation method |
CN108200483B (en) * | 2017-12-26 | 2020-02-28 | 中国科学院自动化研究所 | A dynamic multimodal video description generation method |
CN108491208A (en) * | 2018-01-31 | 2018-09-04 | 中山大学 | Code comment classification method based on neural network model |
CN108307229B (en) * | 2018-02-02 | 2023-12-22 | 新华智云科技有限公司 | Video and audio data processing method and device |
CN108307229A (en) * | 2018-02-02 | 2018-07-20 | 新华智云科技有限公司 | Method and device for processing video and audio data |
CN110119750A (en) * | 2018-02-05 | 2019-08-13 | 浙江宇视科技有限公司 | Data processing method, device and electronic equipment |
CN108765383B (en) * | 2018-03-22 | 2022-03-18 | 山西大学 | Video description method based on deep migration learning |
CN108765383A (en) * | 2018-03-22 | 2018-11-06 | 山西大学 | Video description method based on deep transfer learning |
CN108830287A (en) * | 2018-04-18 | 2018-11-16 | 哈尔滨理工大学 | Chinese image semantic description method based on a residual-connected Inception network combined with multilayer GRU |
CN108881950B (en) * | 2018-05-30 | 2021-05-25 | 北京奇艺世纪科技有限公司 | Video processing method and device |
CN108683924A (en) * | 2018-05-30 | 2018-10-19 | 北京奇艺世纪科技有限公司 | Video processing method and device |
CN108881950A (en) * | 2018-05-30 | 2018-11-23 | 北京奇艺世纪科技有限公司 | Video processing method and device |
CN109522451B (en) * | 2018-12-13 | 2024-02-27 | 连尚(新昌)网络科技有限公司 | Repeated video detection method and device |
CN109522451A (en) * | 2018-12-13 | 2019-03-26 | 连尚(新昌)网络科技有限公司 | Duplicate video detection method and device |
CN111325068B (en) * | 2018-12-14 | 2023-11-07 | 北京京东尚科信息技术有限公司 | Video description method and device based on convolutional neural network |
CN111325068A (en) * | 2018-12-14 | 2020-06-23 | 北京京东尚科信息技术有限公司 | Video description method and device based on convolutional neural network |
CN109711022A (en) * | 2018-12-17 | 2019-05-03 | 哈尔滨工程大学 | A submarine anti-sinking system based on deep learning |
CN109960747B (en) * | 2019-04-02 | 2022-12-16 | 腾讯科技(深圳)有限公司 | Video description information generation method, video processing method and corresponding devices |
CN109960747A (en) * | 2019-04-02 | 2019-07-02 | 腾讯科技(深圳)有限公司 | Method for generating video description information, video processing method, and corresponding devices |
US11861886B2 (en) | 2019-04-02 | 2024-01-02 | Tencent Technology (Shenzhen) Company Limited | Method and apparatus for generating video description information, and method and apparatus for video processing |
WO2020220702A1 (en) * | 2019-04-29 | 2020-11-05 | 北京三快在线科技有限公司 | Generation of natural language |
CN110210499B (en) * | 2019-06-03 | 2023-10-13 | 中国矿业大学 | An adaptive generation system for image semantic description |
CN110188779A (en) * | 2019-06-03 | 2019-08-30 | 中国矿业大学 | A Method for Generating Image Semantic Description |
CN110210499A (en) * | 2019-06-03 | 2019-09-06 | 中国矿业大学 | Adaptive generation system for image semantic descriptions |
WO2021056750A1 (en) * | 2019-09-29 | 2021-04-01 | 北京市商汤科技开发有限公司 | Search method and device, and storage medium |
US11328512B2 (en) | 2019-09-30 | 2022-05-10 | Wipro Limited | Method and system for generating a text summary for a multimedia content |
CN110765921A (en) * | 2019-10-18 | 2020-02-07 | 北京工业大学 | A video object localization method based on weakly supervised learning and video spatiotemporal features |
CN110765921B (en) * | 2019-10-18 | 2022-04-19 | 北京工业大学 | Video object positioning method based on weak supervised learning and video spatiotemporal features |
CN110781345B (en) * | 2019-10-31 | 2022-12-27 | 北京达佳互联信息技术有限公司 | Video description generation model obtaining method, video description generation method and device |
CN110781345A (en) * | 2019-10-31 | 2020-02-11 | 北京达佳互联信息技术有限公司 | Video description generation model acquisition method, video description generation method and device |
CN111461974A (en) * | 2020-02-17 | 2020-07-28 | 天津大学 | Image scanning path control method based on coarse-to-fine LSTM model |
CN111461974B (en) * | 2020-02-17 | 2023-04-25 | 天津大学 | Image scanning path control method based on coarse-to-fine LSTM model |
CN111400545A (en) * | 2020-03-01 | 2020-07-10 | 西北工业大学 | A video annotation method based on deep learning |
CN111404676B (en) * | 2020-03-02 | 2023-08-29 | 北京丁牛科技有限公司 | Method and device for generating, storing and transmitting secret key and ciphertext |
CN111404676A (en) * | 2020-03-02 | 2020-07-10 | 北京丁牛科技有限公司 | Method and device for generating, storing and transmitting keys and ciphertext |
CN111488807A (en) * | 2020-03-29 | 2020-08-04 | 复旦大学 | Video description generation system based on graph convolution network |
CN111488807B (en) * | 2020-03-29 | 2023-10-10 | 复旦大学 | Video description generation system based on graph convolutional network |
CN111681676B (en) * | 2020-06-09 | 2023-08-08 | 杭州星合尚世影视传媒有限公司 | Method, system, device and readable storage medium for constructing audio frequency by video object identification |
CN111681676A (en) * | 2020-06-09 | 2020-09-18 | 杭州星合尚世影视传媒有限公司 | Method, system and device for constructing audio through video object recognition, and readable storage medium |
CN111931690A (en) * | 2020-08-28 | 2020-11-13 | Oppo广东移动通信有限公司 | Model training method, device, equipment and storage medium |
CN111931690B (en) * | 2020-08-28 | 2024-08-13 | Oppo广东移动通信有限公司 | Model training method, device, equipment and storage medium |
CN113191262A (en) * | 2021-04-29 | 2021-07-30 | 桂林电子科技大学 | Video description data processing method, device and storage medium |
CN113641854B (en) * | 2021-07-28 | 2023-09-26 | 上海影谱科技有限公司 | Method and system for converting text into video |
CN113641854A (en) * | 2021-07-28 | 2021-11-12 | 上海影谱科技有限公司 | Method and system for converting text into video |
CN119011953A (en) * | 2024-09-14 | 2024-11-22 | 广州九微信息科技有限公司 | Video-on-demand and audio-frequency service system and method based on cloud computing |
Also Published As
Publication number | Publication date |
---|---|
CN105279495B (en) | 2019-06-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105279495B (en) | A video description method based on deep learning and text summarization | |
CN106503055B (en) | A Generating Method from Structured Text to Image Descriptions | |
US20210256051A1 (en) | Theme classification method based on multimodality, device, and storage medium | |
CN109543084B (en) | Method for establishing detection model of hidden sensitive text facing network social media | |
CN110019839B (en) | Method and system for constructing medical knowledge graph based on neural network and remote supervision | |
CN112270196B (en) | Entity relationship identification method and device and electronic equipment | |
WO2019200806A1 (en) | Device for generating text classification model, method, and computer readable storage medium | |
CN106682192B (en) | A method and device for training an answer intent classification model based on search keywords | |
CN111159485B (en) | Tail entity linking method, device, server and storage medium | |
CN104142995B (en) | Social event recognition method based on visual attributes | |
CN106126619A (en) | Video retrieval method and system based on video content | |
CN111814477B (en) | Dispute focus discovery method, device and terminal based on dispute focus entity | |
CN108280057A (en) | A kind of microblogging rumour detection method based on BLSTM | |
CN114661872B (en) | A beginner-oriented API adaptive recommendation method and system | |
CN110377778A (en) | Figure sort method, device and electronic equipment based on title figure correlation | |
CN114818724B (en) | A method for constructing an effective disaster information detection model on social media | |
CN113343690A (en) | Text readability automatic evaluation method and device | |
CN117033558A (en) | BERT-WWM and multi-feature fused film evaluation emotion analysis method | |
CN113239159A (en) | Cross-modal retrieval method of videos and texts based on relational inference network | |
CN105975497A (en) | Automatic microblog topic recommendation method and device | |
CN110046353A (en) | An Aspect-Level Sentiment Analysis Method Based on Multilingual Hierarchical Mechanism | |
CN114579741A (en) | Syntactic information fused GCN-RN aspect level emotion analysis method and system | |
CN110110137A (en) | Method and device for determining music characteristics, electronic equipment and storage medium | |
CN106599824A (en) | GIF cartoon emotion identification method based on emotion pairs | |
CN115775349A (en) | False news detection method and device based on multi-mode fusion |
Legal Events
Code | Title | Description |
---|---|---|
C06 | Publication | |
PB01 | Publication | |
C10 | Entry into substantive examination | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
TR01 | Transfer of patent right | Effective date of registration: 20220322. Address after: 4th Floor, No. 685 Shiqiao South Road, Panyu District, Guangzhou, Guangdong 511400. Patentee after: GUANGZHOU WELLTHINKER AUTOMATION TECHNOLOGY CO.,LTD. Address before: No. 92 Wei Jin Road, Nankai District, Tianjin 300072. Patentee before: Tianjin University |