CN104113789B - Online video summary generation method based on deep learning - Google Patents

Online video summary generation method based on deep learning

Info

Publication number
CN104113789B
Authority
CN
China
Prior art keywords
video
frame
frame block
dictionary
parameter
Prior art date
Legal status
Active
Application number
CN201410326406.9A
Other languages
Chinese (zh)
Other versions
CN104113789A (en)
Inventor
李平
俞俊
李黎
徐向华
Current Assignee
Hangzhou Huicui Intelligent Technology Co ltd
Original Assignee
Hangzhou Dianzi University
Priority date
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN201410326406.9A priority Critical patent/CN104113789B/en
Publication of CN104113789A publication Critical patent/CN104113789A/en
Application granted granted Critical
Publication of CN104113789B publication Critical patent/CN104113789B/en

Landscapes

  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to an online video summary generation method based on deep learning. The original video is processed as follows: 1) the video is uniformly cut into a group of small frame blocks, and statistical features of each frame image are extracted to form corresponding vectorized representations; 2) a multi-layer deep network is pre-trained on the video frames to obtain a nonlinear representation of each frame; 3) the first m frame blocks are taken as the initial condensed video, which is reconstructed with a group sparse coding algorithm to obtain an initial dictionary and reconstruction coefficients; 4) the deep network parameters are updated according to the next frame block, that block is reconstructed and its reconstruction error computed, and if the error exceeds a set threshold the block is added to the condensed video and the dictionary is updated; 5) new frame blocks are processed online in sequence according to step 4) until the video ends, and the updated condensed video is the generated video summary. With this method, latent high-level semantic information in the video can be mined in depth and the video summary generated quickly, saving users' time and improving the visual experience.

Description

A Deep Learning-Based Online Video Summary Generation Method

Technical Field

The invention belongs to the technical field of video summary generation, and in particular relates to an online video summary generation method based on deep learning.

Background Art

In recent years, with the growing popularity of portable devices such as digital cameras, smartphones, and handheld computers, the volume of video of all kinds has grown explosively. For example, a medium-sized city may have tens of thousands of video-capture channels serving socially important fields such as intelligent transportation, security surveillance, and public-security deployment, and the video data these devices produce reaches the petabyte level. To locate a target person or vehicle, police and traffic officers must spend large amounts of time reviewing long, tedious surveillance streams, which severely reduces working efficiency and hinders the building of a safe city. Therefore, effectively selecting the video frames that carry key information from lengthy video streams, namely video summarization, has drawn wide attention from both academia and industry.

Traditional video summarization techniques mainly target edited, structured videos. A movie, for example, can be divided into multiple scenes, each scene consists of several plots occurring at the same location, and each plot is composed of a series of smooth, continuous video frames. Unlike structured videos such as movies, TV dramas, and news reports, surveillance video is generally unedited and unstructured, which poses a considerable challenge to the application of video summarization technology.

At present, the main video summarization approaches are keyframe-based methods, new-image synthesis, video frame blocks, and conversion to natural language. Keyframe-based methods include strategies such as plot edge detection, video frame clustering, color histograms, and motion stability. New-image synthesis generates an image from a few consecutive frames containing important content, and is susceptible to blurring between different frames. The video-frame-block approach uses scene edge detection, dialogue analysis, and similar techniques for structured video to cut the original into a short thematic movie. Conversion to natural language uses the subtitles and speech in a video to turn the video summary into a text summary, and is not suitable for surveillance video without subtitles or sound.

Important fields such as intelligent transportation and security deployment continuously produce large volumes of unstructured video, and traditional video summarization methods cannot meet the requirements of processing streaming video online. There is therefore an urgent need for a method that can process video streams online while efficiently and accurately selecting a video summary containing the key content.

Summary of the Invention

To condense and streamline long, tedious video streams online, efficiently and accurately, thereby saving users' time and enhancing the visual presentation of video content, the present invention proposes an online video summary generation method based on deep learning, comprising the following steps:

1. After obtaining the original video data, perform the following operations:

1) Uniformly cut the video into a group of small frame blocks, each containing multiple frames; extract statistical features from each frame image to form a corresponding vectorized representation;

2) Pre-train a multi-layer deep network on the video frames to obtain a nonlinear representation of each frame;

3) Take the first m frame blocks as the initial condensed video and reconstruct it with a group sparse coding algorithm to obtain an initial dictionary and reconstruction coefficients;

4) Update the deep network parameters according to the next frame block while reconstructing that block and computing its reconstruction error; if the error exceeds a set threshold, add the block to the condensed video and update the dictionary;

5) Process new frame blocks online in sequence according to step 4) until the video ends; the updated condensed video is the generated video summary.

Further, extracting the statistical features of each frame image in step 1) to form the corresponding vectorized representation specifically comprises:

1) Suppose the original video is uniformly divided into n frame blocks, i.e. $X = \{X_1, \dots, X_n\}$, each containing t frame images (e.g. t = 80); scale each frame image to a uniform pixel size while keeping the original aspect ratio;

2) Extract global features of each frame image, such as the color histogram, color moments, edge orientation histogram, Gabor wavelet transform, and local binary patterns (LBP), together with local features such as the Scale-Invariant Feature Transform (SIFT) and Speeded-Up Robust Features (SURF);

3) Concatenate the above image features of each frame in order, forming a vectorized representation of dimension $n_f$ (a minimal sketch of this step is given below).
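For illustration only (this sketch is not part of the patent text), the following Python fragment shows the shape of step 1): uniform cutting into frame blocks and a per-frame descriptor. It assumes frames arrive as H×W×3 uint8 NumPy arrays; the color and edge-orientation histograms stand in for the fuller feature set (Gabor, LBP, SIFT, SURF) named above, and all function names are hypothetical.

```python
import numpy as np

def split_into_blocks(frames, t=80):
    """Step 1): uniformly cut the frame list into blocks of t frames."""
    n_full = len(frames) - len(frames) % t
    return [frames[i:i + t] for i in range(0, n_full, t)]

def frame_features(frame, bins=16):
    """Toy per-frame descriptor: per-channel color histogram plus an
    edge-orientation histogram; a stand-in for the global/local
    features (color moments, Gabor, LBP, SIFT, SURF) in the text."""
    f = frame.astype(np.float64) / 255.0
    color = np.concatenate(
        [np.histogram(f[..., c], bins=bins, range=(0, 1))[0] for c in range(3)])
    gray = f.mean(axis=2)
    gy, gx = np.gradient(gray)                      # finite-difference gradients
    ang = np.arctan2(gy, gx)
    edge = np.histogram(ang, bins=bins, range=(-np.pi, np.pi),
                        weights=np.hypot(gx, gy))[0]
    v = np.concatenate([color, edge]).astype(np.float64)
    return v / (np.linalg.norm(v) + 1e-12)          # n_f-dimensional vector
```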

Further, pre-training the multi-layer deep network on video frames in step 2) to obtain the nonlinear representation of each frame specifically comprises:

Pre-training a multi-layer deep network (fewer than 10 layers) with stacked denoising autoencoders (SDA), as sketched below:

a. At each layer, process each frame image as follows: first, generate a noisy version of the frame, for example by adding small Gaussian noise or by randomly setting input variables to arbitrary values; then map the noisy image through an autoencoder (AE) to obtain its nonlinear representation;

b. Adjust and update the parameters of each layer of the deep network with a stochastic gradient descent algorithm.
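A minimal sketch of the SDA pre-training just described, under simplifying assumptions not fixed by the text (tied encoder/decoder weights, sigmoid units, squared-error loss, plain NumPy); it illustrates the layer-wise denoising scheme, not the patent's exact network:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_dae_layer(X, n_hidden, noise=0.1, lr=0.1, epochs=20):
    """Pre-train one denoising-autoencoder layer on the rows of X:
    corrupt the input with small Gaussian noise, reconstruct the clean
    input, and update the parameters by stochastic gradient descent."""
    n_vis = X.shape[1]
    W = rng.normal(0.0, 0.01, (n_vis, n_hidden))    # tied weights
    b, c = np.zeros(n_hidden), np.zeros(n_vis)      # hidden / visible biases
    for _ in range(epochs):
        for x in rng.permutation(X):
            x_noisy = x + noise * rng.standard_normal(n_vis)   # corruption
            h = sigmoid(x_noisy @ W + b)                       # encode
            r = sigmoid(h @ W.T + c)                           # decode
            dr = (r - x) * r * (1 - r)          # backprop of 0.5*||r - x||^2
            dh = (dr @ W) * h * (1 - h)
            W -= lr * (np.outer(x_noisy, dh) + np.outer(dr, h))
            b -= lr * dh
            c -= lr * dr
    return W, b

def encode_stack(X, layers):
    """Nonlinear representation of each frame: pass through all layers."""
    for W, b in layers:
        X = sigmoid(X @ W + b)
    return X
```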

Reconstructing the initial condensed video with the group sparse coding algorithm in step 3) specifically comprises:

1) The initial condensed video consists of the first m frame blocks of the original video (m a positive integer less than 50), i.e. $\hat{X} = \{X_1, \dots, X_m\}$, comprising $n_{\text{init}} = m \times t$ frame images in total, where $X_k$ is the k-th original frame block; the pre-trained deep network yields the corresponding nonlinear representations $\hat{Y} = \{Y_1, \dots, Y_m\}$, where $Y_k$ is the nonlinear representation of the k-th frame block;

2) Let the initial dictionary D consist of $n_d$ atoms, i.e. $D = \{d_1, \dots, d_{n_d}\}$, where $d_j$ is the j-th atom; let the reconstruction coefficients be C, whose number of elements corresponds to the number of frames and whose dimension corresponds to the number of dictionary atoms, where $C_k$ holds the coefficients of the k-th frame block and $c^i$ those of the i-th frame image;

3) Optimize the group sparse coding objective of the regularized dictionary with the alternating direction method of multipliers, obtaining the initial dictionary D and the reconstruction coefficients C respectively, i.e. solve

$$\min_{D,\,C}\ \sum_{k=1}^{m} F(Y_k, C_k, D) + \lambda \sum_{j=1}^{n_d} \|d_j\|_2^2,$$

where $\|\cdot\|_2$ denotes the $\ell_2$ norm of a variable, the regularization parameter λ is a real number greater than 0, and the multivariate function $F(Y_k, C_k, D)$ is

$$F(Y_k, C_k, D) = \frac{1}{2 n_f} \sum_{y_i \in Y_k} \Big\| y_i - \sum_{j=1}^{n_d} c_j^i\, d_j \Big\|_2^2 + \gamma \sum_{j=1}^{n_d} \|c_j\|_2,$$

where the parameter γ is a real number greater than 0 and the expression inside the first norm is the reconstruction of the i-th frame image with dictionary D. The alternating direction method of multipliers here proceeds as follows: first fix D, which makes the objective convex in C; then fix C, which makes it convex in D; and iterate, updating the two parameters alternately (see the sketch after this step).
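The alternation between the two convex subproblems can be sketched as follows. One substitution is made for simplicity: where the text prescribes the alternating direction method of multipliers, this sketch solves the C-subproblem by proximal gradient descent (group soft-thresholding over atom rows) and the D-subproblem in closed form; the constant $1/(2n_f)$ is absorbed into γ and λ. $Y_k$ is taken to be an $n_f \times t$ matrix of frame representations, and all names are hypothetical.

```python
import numpy as np

def group_soft_threshold(C, tau):
    """Prox of tau * sum_j ||c_j||_2: shrink whole atom rows of C, so a
    frame block either uses a dictionary atom or drops it entirely."""
    norms = np.linalg.norm(C, axis=1, keepdims=True)
    return C * np.maximum(0.0, 1.0 - tau / (norms + 1e-12))

def fit_coefficients(Y, D, gamma, n_iter=100):
    """Fix D and solve for the block's coefficients C (n_d x t) by
    proximal gradient on 0.5*||Y - D C||_F^2 + gamma * sum_j ||c_j||_2."""
    C = np.zeros((D.shape[1], Y.shape[1]))
    L = np.linalg.norm(D, 2) ** 2 + 1e-12   # Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = D.T @ (D @ C - Y)
        C = group_soft_threshold(C - grad / L, gamma / L)
    return C

def fit_dictionary(Ys, Cs, n_atoms, lam):
    """Fix C and update D in closed form:
    min_D sum_k ||Y_k - D C_k||_F^2 + lam * ||D||_F^2."""
    A = sum(C @ C.T for C in Cs) + lam * np.eye(n_atoms)
    B = sum(Y @ C.T for Y, C in zip(Ys, Cs))
    return B @ np.linalg.inv(A)

def group_sparse_coding(Ys, n_atoms=64, gamma=0.1, lam=0.01, rounds=10):
    """Alternate the two convex subproblems to get D and the C_k."""
    n_f = Ys[0].shape[0]
    D = np.random.default_rng(0).standard_normal((n_f, n_atoms))
    D /= np.linalg.norm(D, axis=0)           # unit-norm atoms
    Cs = []
    for _ in range(rounds):
        Cs = [fit_coefficients(Y, D, gamma) for Y in Ys]
        D = fit_dictionary(Ys, Cs, n_atoms, lam)
    return D, Cs
```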

Updating the deep network parameters according to the next frame block and reconstructing that block and computing its reconstruction error in step 4) specifically comprises:

1) For each frame image of the block, in turn:

a. Update the parameters of the last layer of the deep neural network, i.e. the weights W and bias b, with an online gradient descent algorithm;

b. Update the parameters of the other layers of the deep neural network with the backpropagation algorithm;

2) Update the nonlinear representation of each frame image according to the new parameters;

3) Based on the existing dictionary D, reconstruct the current frame block with group sparse coding and compute the error ε, i.e. reconstruct the nonlinear representation $Y_k$ of the current frame block $X_k$: first minimize the multivariate function $F(Y_k, C_k, D)$ to obtain the optimal reconstruction coefficients $C_k^*$; then substitute $C_k^*$ into the first term of F and evaluate it; the resulting value is the current reconstruction error ε (a sketch follows).
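Continuing the sketch above, the reconstruction error ε of sub-step 3), i.e. the first (data-fit) term of F evaluated at the optimal coefficients, might be computed as follows; the per-frame online update of W and b in sub-step 1) is a single SGD step of the same form as the pre-training loop and is omitted here:

```python
import numpy as np

def reconstruction_error(Y, D, gamma):
    """Reconstruct a new block's nonlinear representation Y with the
    current dictionary D and return the data-fit term of F as the
    error epsilon (fit_coefficients is defined in the sketch above)."""
    C = fit_coefficients(Y, D, gamma)
    n_f = Y.shape[0]
    eps = np.linalg.norm(Y - D @ C) ** 2 / (2.0 * n_f)
    return eps, C
```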

In step 4), if the error exceeds the set threshold, the current frame block is added to the condensed video and the dictionary is updated, specifically:

1) If the reconstruction error ε computed for the nonlinear representation $Y_k$ of the current frame block $X_k$ exceeds the set threshold θ (an empirical value), add the current frame block to the condensed video, i.e. $\hat{X} \leftarrow \hat{X} \cup \{X_k\}$;

2) If the current condensed video $\hat{X}$ contains q frame blocks, the set of nonlinear frame representations for the dictionary update is $\hat{Y} = \{Y_1, \dots, Y_q\}$; updating the dictionary D with $\hat{Y}$ then amounts to solving the objective

$$\min_{D}\ \sum_{k=1}^{q} F(Y_k, C_k^*, D) + \lambda \sum_{j=1}^{n_d} \|d_j\|_2^2,$$

where the parameter λ is a real number greater than 0 and adjusts the influence of the regularization term. (A driver sketch tying these steps together follows.)
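Putting the pieces together, a hypothetical driver for steps 3) to 5), reusing the sketches above. θ is the empirical threshold from the text; re-fitting the dictionary on the whole condensed video after each accepted block is a simplification of the described incremental update.

```python
def summarize_online(blocks, m=5, theta=0.1, gamma=0.1, lam=0.01):
    """Online summary loop: seed the condensed video with the first m
    blocks, then keep only blocks that the current dictionary
    reconstructs poorly, updating the dictionary after each addition."""
    summary = list(blocks[:m])                       # initial condensed video
    D, _ = group_sparse_coding(summary, gamma=gamma, lam=lam)
    for Y in blocks[m:]:
        eps, _ = reconstruction_error(Y, D, gamma)
        if eps > theta:                              # novel content: keep it
            summary.append(Y)
            D, _ = group_sparse_coding(summary, gamma=gamma, lam=lam)
    return summary, D
```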

The present invention proposes an online video summary generation method based on deep learning. Its advantages are: deep learning mines the high-level semantic features of the video, so that group sparse coding better reflects how well the dictionary reconstructs the current video frame block, and the most informative frame blocks thereby form a video summary containing the regions of interest and key people and events; the condensed summary saves users a great deal of time while enhancing the visual experience of the key content.

Brief Description of the Drawings

Fig. 1 is a flow chart of the method of the present invention.

Detailed Description

The present invention is further described with reference to Fig. 1:

1. After obtaining the original video data, steps 1) to 5) are performed exactly as set out in the Summary of the Invention above; the specifics of each step, including the feature extraction, the SDA pre-training, the group-sparse reconstruction with its objective functions, the online parameter updates, and the dictionary update, are identical to those described there.

Claims (4)

1. An online video summary generation method based on deep learning, characterized in that, after the original video is obtained, the following operations are performed:
1) uniformly cutting the video into a group of small frame blocks, each containing t frame images, and extracting the statistical features of each frame image to form a vectorized representation of dimension $n_f$;
2) pre-training a multi-layer deep network on the video frames to obtain a nonlinear representation of each frame;
3) taking the first m frame blocks as the initial condensed video and reconstructing it with a group sparse coding algorithm to obtain an initial dictionary and reconstruction coefficients;
4) updating the deep network parameters according to the next frame block while reconstructing that block and computing its reconstruction error, and, if the error exceeds a set threshold, adding the block to the condensed video and updating the dictionary;
5) processing new frame blocks online in sequence according to step 4) until the end; the updated condensed video is the generated video summary.
2. The online video summary generation method based on deep learning of claim 1, characterized in that extracting the statistical features of each frame image in step 1) to form the corresponding vectorized representation comprises the steps of:
1.1) uniformly dividing the original video into n frame blocks, i.e. $X = \{X_1, \dots, X_n\}$, each containing t frame images, and scaling each frame image to a uniform pixel size while keeping the original aspect ratio;
1.2) extracting the global features and local features of each frame image;
the global features comprising the color histogram, color moments, edge orientation histogram, Gabor wavelet transform, and local binary patterns;
the local features comprising the Scale-Invariant Feature Transform (SIFT) and Speeded-Up Robust Features (SURF);
1.3) sequentially concatenating the above image features of each frame to form a vectorized representation of dimension $n_f$.
3. The online video summary generation method based on deep learning of claim 1, characterized in that pre-training the multi-layer deep network on video frames in step 2) to obtain the nonlinear representation of each frame specifically uses stacked denoising autoencoders (SDA) to pre-train the multi-layer deep network, comprising:
a. processing each frame image at each layer as follows: first, generating a noisy version of the frame by adding Gaussian noise or randomly setting input variables to arbitrary values; then mapping the noisy image through an autoencoder (AE) to obtain its nonlinear representation;
b. adjusting and updating the parameters of each layer of the deep network with a stochastic gradient descent algorithm.
4. The online video summary generation method based on deep learning of claim 1, characterized in that reconstructing the initial condensed video with the group sparse coding algorithm in step 3) comprises the steps of:
3.1) composing the initial condensed video of the first m frame blocks of the original video, i.e. $\hat{X} = \{X_1, \dots, X_m\}$, comprising $n_{\text{init}} = m \times t$ frame images in total, where $X_k$ is the k-th original frame block; obtaining the corresponding nonlinear representations $\hat{Y} = \{Y_1, \dots, Y_m\}$ from the pre-trained deep network, where $Y_k$ is the nonlinear representation of the k-th frame block;
3.2) letting the initial dictionary D consist of $n_d$ atoms, i.e. $D = \{d_1, \dots, d_{n_d}\}$, where $d_j$ is the j-th atom; letting the reconstruction coefficients be C, whose number of elements corresponds to the number of frames and whose dimension corresponds to the number of dictionary atoms, where $C_k$ holds the reconstruction coefficients of the k-th frame block and $c^i$ those of the i-th frame image;
3.3) optimizing the group sparse coding objective of the regularized dictionary with the alternating direction method of multipliers to obtain the initial dictionary D and the reconstruction coefficients C respectively, i.e. solving:

$$\min_{D,\,C}\ \sum_{k=1}^{m} F(Y_k, C_k, D) + \lambda \sum_{j=1}^{n_d} \|d_j\|_2^2,$$

wherein the symbol $\|\cdot\|_2$ denotes the $\ell_2$ norm of a variable, the regularization parameter λ is a real number greater than 0, and the multivariate function $F(Y_k, C_k, D)$ is:

$$F(Y_k, C_k, D) = \frac{1}{2 n_f} \sum_{y_i \in Y_k} \Big\| y_i - \sum_{j=1}^{n_d} c_j^i\, d_j \Big\|_2^2 + \gamma \sum_{j=1}^{n_d} \|c_j\|_2,$$

wherein the parameter γ is a real number greater than 0 and the expression inside the first norm is the reconstruction of the i-th frame image with dictionary D; the alternating direction method of multipliers here first fixes D, making the objective convex in C, then fixes C, making it convex in D, and iterates, updating the two parameters alternately;
and wherein updating the deep network parameters according to the next frame block and reconstructing that block and computing its reconstruction error in step 4) comprises the steps of:
4.1) for each frame image of the block, in turn:
4.1.1) updating the parameters of the last layer of the deep neural network, i.e. the weights W and bias b, with an online gradient descent algorithm;
4.1.2) updating the parameters of the other layers of the deep neural network with the backpropagation algorithm;
4.2) updating the nonlinear representation of each frame image according to the new parameters;
4.3) based on the existing dictionary D, reconstructing the current frame block with group sparse coding and computing the error ε, i.e. reconstructing the nonlinear representation $Y_k$ of the current frame block $X_k$ by first minimizing the multivariate function $F(Y_k, C_k, D)$ to obtain the optimal reconstruction coefficients $C_k^*$, then substituting $C_k^*$ into the first term of F and evaluating it; the resulting value is the current reconstruction error ε;
and wherein, in step 4), if the error exceeds the set threshold, adding the current frame block to the condensed video and updating the dictionary comprises:
(1) if the reconstruction error ε computed for the nonlinear representation $Y_k$ of the current frame block $X_k$ exceeds the set threshold θ, adding the current frame block to the condensed video, i.e. $\hat{X} \leftarrow \hat{X} \cup \{X_k\}$;
(2) if the current condensed video $\hat{X}$ contains q frame blocks, taking the set of nonlinear frame representations for the dictionary update as $\hat{Y} = \{Y_1, \dots, Y_q\}$ and updating the dictionary D with $\hat{Y}$ by solving the objective:

$$\min_{D}\ \sum_{k=1}^{q} F(Y_k, C_k^*, D) + \lambda \sum_{j=1}^{n_d} \|d_j\|_2^2,$$

wherein the parameter λ is a real number greater than 0 and adjusts the influence of the regularization term.
CN201410326406.9A 2014-07-10 2014-07-10 Online video summary generation method based on deep learning Active CN104113789B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410326406.9A CN104113789B (en) 2014-07-10 2014-07-10 Online video summary generation method based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410326406.9A CN104113789B (en) 2014-07-10 2014-07-10 Online video summary generation method based on deep learning

Publications (2)

Publication Number Publication Date
CN104113789A CN104113789A (en) 2014-10-22
CN104113789B true CN104113789B (en) 2017-04-12

Family

ID=51710398

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410326406.9A Active CN104113789B (en) 2014-07-10 2014-07-10 Online video summary generation method based on deep learning

Country Status (1)

Country Link
CN (1) CN104113789B (en)

Families Citing this family (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3241185A4 (en) * 2014-12-30 2018-07-25 Nokia Technologies Oy Moving object detection in videos
CN104778659A (en) * 2015-04-15 2015-07-15 杭州电子科技大学 Single-frame image super-resolution reconstruction method on basis of deep learning
CN105279495B (en) * 2015-10-23 2019-06-04 天津大学 A video description method based on deep learning and text summarization
CN105930314B (en) * 2016-04-14 2019-02-05 清华大学 Text summary generation system and method based on encoding-decoding deep neural network
CN106331433B (en) * 2016-08-25 2020-04-24 上海交通大学 Video denoising method based on deep recurrent neural network
CN106502985B (en) * 2016-10-20 2020-01-31 清华大学 A neural network modeling method and device for generating titles
CN106778571B (en) * 2016-12-05 2020-03-27 天津大学 Digital video feature extraction method based on deep neural network
CN106686403B (en) * 2016-12-07 2019-03-08 腾讯科技(深圳)有限公司 A kind of video preview drawing generating method, device, server and system
CN106993240B (en) * 2017-03-14 2020-10-16 天津大学 Multi-video abstraction method based on sparse coding
CN107679031B (en) * 2017-09-04 2021-01-05 昆明理工大学 Advertisement and blog identification method based on stacking noise reduction self-coding machine
CN107729821B (en) * 2017-09-27 2020-08-11 浙江大学 A video generalization method based on one-dimensional sequence learning
CN107886109B (en) * 2017-10-13 2021-06-25 天津大学 A Video Summarization Method Based on Supervised Video Segmentation
CN107911755B (en) * 2017-11-10 2020-10-20 天津大学 Multi-video abstraction method based on sparse self-encoder
CN109803067A (en) * 2017-11-16 2019-05-24 富士通株式会社 Video concentration method, video enrichment facility and electronic equipment
CN108417206A (en) * 2018-02-27 2018-08-17 四川云淞源科技有限公司 High speed information processing method based on big data
CN108417204A (en) * 2018-02-27 2018-08-17 四川云淞源科技有限公司 Information security processing method based on big data
CN108388942A (en) * 2018-02-27 2018-08-10 四川云淞源科技有限公司 Information intelligent processing method based on big data
CN108419094B (en) * 2018-03-05 2021-01-29 腾讯科技(深圳)有限公司 Video processing method, video retrieval method, device, medium and server
CN110366050A (en) * 2018-04-10 2019-10-22 北京搜狗科技发展有限公司 Processing method, device, electronic equipment and the storage medium of video data
CN108848422B (en) * 2018-04-19 2020-06-02 清华大学 Video abstract generation method based on target detection
CN111046887A (en) * 2018-10-15 2020-04-21 华北电力大学(保定) A method for feature extraction of noisy images
CN109360436B (en) * 2018-11-02 2021-01-08 Oppo广东移动通信有限公司 Video generation method, terminal and storage medium
CN111246246A (en) * 2018-11-28 2020-06-05 华为技术有限公司 Video playing method and device
CN109635777B (en) * 2018-12-24 2022-09-13 广东理致技术有限公司 Video data editing and identifying method and device
CN109905778B (en) * 2019-01-03 2021-12-03 上海大学 Method for scalable compression of single unstructured video based on group sparse coding
CN110110646B (en) * 2019-04-30 2021-05-04 浙江理工大学 A deep learning-based method for extracting keyframes from gesture images
CN110225368B (en) * 2019-06-27 2020-07-10 腾讯科技(深圳)有限公司 Video positioning method and device and electronic equipment
CN110446067B (en) * 2019-08-30 2021-11-02 杭州电子科技大学 Video Enrichment Method Based on Tensor Decomposition
US11295084B2 (en) 2019-09-16 2022-04-05 International Business Machines Corporation Cognitively generating information from videos
CN111563423A (en) * 2020-04-17 2020-08-21 西北工业大学 Object detection method and system in UAV image based on deep denoising autoencoder
CN113626641B (en) * 2021-08-11 2023-09-01 南开大学 Method for generating video abstract based on neural network of multi-modal data and aesthetic principle
CN117725148B (en) * 2024-02-07 2024-06-25 湖南三湘银行股份有限公司 Question-answer word library updating method based on self-learning

Citations (7)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103167284A (en) * 2011-12-19 2013-06-19 中国电信股份有限公司 Video streaming transmission method and system based on picture super-resolution
CN102930518A (en) * 2012-06-13 2013-02-13 上海汇纳网络信息科技有限公司 Improved sparse representation based image super-resolution method
CN103118220A (en) * 2012-11-16 2013-05-22 佳都新太科技股份有限公司 Keyframe pick-up algorithm based on multi-dimensional feature vectors
CN103295242A (en) * 2013-06-18 2013-09-11 南京信息工程大学 Multi-feature united sparse represented target tracking method
CN103413125A (en) * 2013-08-26 2013-11-27 中国科学院自动化研究所 Horror Video Recognition Method Based on Discriminative Example Selection Multiple Instance Learning
CN103531199A (en) * 2013-10-11 2014-01-22 福州大学 Ecological sound recognition method based on fast sparse decomposition and deep learning
CN103761531A (en) * 2014-01-20 2014-04-30 西安理工大学 Sparse-coding license plate character recognition method based on shape and contour features

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Sparse-coding-based natural feature extraction and denoising; Shang Li; Journal of System Simulation; 2005-07-31; pp. 1782-1787 *

Also Published As

Publication number Publication date
CN104113789A (en) 2014-10-22

Similar Documents

Publication Publication Date Title
CN104113789B (en) Online video summary generation method based on deep learning
Qin et al. Coverless image steganography: a survey
CN111565318A (en) Video compression method based on sparse samples
CN102750385B (en) Correlation-quality sequencing image retrieval method based on tag retrieval
CN102682298B (en) Video fingerprint method based on graph modeling
CN115131710B (en) Real-time action detection method based on multi-scale feature fusion attention
CN111506773A (en) Video duplicate removal method based on unsupervised depth twin network
CN106203492A (en) The system and method that a kind of image latent writing is analyzed
CN106778571B (en) Digital video feature extraction method based on deep neural network
CN113902925A (en) A method and system for semantic segmentation based on deep convolutional neural network
CN105095857A (en) Face data enhancement method based on key point disturbance technology
CN107169417A (en) Strengthened based on multinuclear and the RGBD images of conspicuousness fusion cooperate with conspicuousness detection method
CN108509939A (en) A kind of birds recognition methods based on deep learning
CN104156464A (en) Micro-video retrieval method and device based on micro-video feature database
CN109034953A (en) A Movie Recommendation Method
CN111382305B (en) Video deduplication method, video deduplication device, computer equipment and storage medium
CN102547477A (en) Video fingerprint method based on contourlet transformation model
CN107027051A (en) A kind of video key frame extracting method based on linear dynamic system
Liu et al. Ensemble of CNN and rich model for steganalysis
CN107527010A (en) A kind of method that video gene is extracted according to local feature and motion vector
Zhang et al. Global priors with anchored-stripe attention and multiscale convolution for remote sensing image compression
Li et al. Coverless Video Steganography Based on Frame Sequence Perceptual Distance Mapping.
CN109657098A (en) A kind of method for extracting video fingerprints and device
Sun et al. Task-Oriented Scene Graph-Based Semantic Communications with Adaptive Channel Coding
Tian et al. Free-VSC: Free Semantics from Visual Foundation Models for Unsupervised Video Semantic Compression

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20220809

Address after: Room 406, building 19, haichuangyuan, No. 998, Wenyi West Road, Yuhang District, Hangzhou City, Zhejiang Province

Patentee after: HANGZHOU HUICUI INTELLIGENT TECHNOLOGY CO.,LTD.

Address before: 310018 No. 2 street, Xiasha Higher Education Zone, Hangzhou, Zhejiang

Patentee before: HANGZHOU DIANZI University

TR01 Transfer of patent right