CN104113789B - On-line video abstraction generation method based on depth learning - Google Patents
- Publication number: CN104113789B (application CN201410326406.9A)
- Authority
- CN
- China
- Prior art keywords
- video
- frame
- frame block
- dictionary
- parameter
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- Compression Or Coding Systems Of Tv Signals (AREA)
- Image Analysis (AREA)
Description
Technical Field
The invention belongs to the technical field of video summary generation, and in particular relates to an online video summary generation method based on deep learning.
Background Art
In recent years, with the growing popularity of portable devices such as digital cameras, smartphones, and handheld computers, the number of videos of all kinds has grown explosively. For example, a medium-sized city may deploy tens of thousands of video-capture channels in socially important domains such as intelligent transportation, security surveillance, and public-safety deployment, and these devices generate petabytes of video data. To locate a target person or vehicle, traffic police and other public-security personnel must spend a great deal of time reviewing long, tedious surveillance video streams, which severely reduces their working efficiency and hampers the building of safe cities. Therefore, effectively selecting the video frames that contain key information from lengthy video streams, known as video summarization, has drawn wide attention from both academia and industry.
Traditional video summarization techniques mainly target edited, structured videos. A movie, for example, can be divided into multiple scenes, each scene consists of several plot units taking place at the same location, and each plot unit consists of a series of smooth, continuous frames. Unlike structured videos such as movies, TV dramas, and news reports, surveillance video is generally unedited and unstructured, which poses a major challenge for the application of video summarization technology.
Current video summarization approaches fall into four main categories: key-frame based methods, new-image synthesis, frame-block methods, and conversion to natural language. Key-frame based methods include strategies such as plot boundary detection, video frame clustering, color histograms, and motion stability. New-image synthesis generates a summary image from a few consecutive frames containing important content, but is susceptible to blur caused by differences between frames. Frame-block methods use scene-boundary detection, dialogue analysis, and similar techniques for structured video to cut the original video into a short thematic film. Conversion to natural language uses the subtitles and speech in a video to turn the video summary into a text summary, which makes it unsuitable for surveillance video without subtitles or sound.
Important domains such as intelligent transportation and security deployment continuously produce large volumes of unstructured video, and traditional video summarization methods cannot meet the requirement of processing such streaming video online. There is therefore an urgent need for a method that can both process video streams online and efficiently and accurately select a summary containing the key content.
Summary of the Invention
To condense and simplify long, tedious video streams online, efficiently and accurately, thereby saving user time and enhancing the visual presentation of the video content, the present invention proposes an online video summary generation method based on deep learning. The method comprises the following steps:
1. After acquiring the raw video data, perform the following operations:
1) Evenly divide the video into a sequence of small frame blocks, each containing multiple frames; extract statistical features from each frame image to form its vectorized representation;
2) Pre-train a multi-layer deep network on the video frames to obtain a nonlinear representation of each frame;
3) Take the first m frame blocks as the initial condensed video and reconstruct it with a group sparse coding algorithm, obtaining the initial dictionary and reconstruction coefficients;
4) Update the deep network parameters with the next frame block, reconstruct that block, and compute its reconstruction error; if the error exceeds a set threshold, add the block to the condensed video and update the dictionary;
5) Repeat step 4) to process new frame blocks online until the stream ends; the condensed video as finally updated is the generated video summary. (A skeleton of this procedure is sketched below.)
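As an illustration only, not part of the patent text, the five steps can be organized as the following driver loop. This is a minimal sketch in Python with the deep network and the group-sparse-coding routines abstracted as callables; their concrete forms are sketched in the corresponding sections below, and all names are hypothetical.

```python
import numpy as np

def summarize_online(blocks, encode, code_block, update_dict, m, theta, gamma):
    """Skeleton of steps 1-5. `blocks` is the list of frame-block feature
    matrices X_k (frames as columns); `encode` maps X_k to its nonlinear
    representation Y_k (updating the network online as a side effect);
    `code_block` solves the group-sparse coefficient problem; `update_dict`
    refits the dictionary D on the current summary."""
    Ys = [encode(Xk) for Xk in blocks[:m]]     # steps 1-3: initial summary
    D = update_dict(Ys)                        # initial dictionary
    summary = list(range(m))
    for k in range(m, len(blocks)):            # steps 4-5: online phase
        Yk = encode(blocks[k])
        Ck = code_block(Yk, D, gamma)          # reconstruct with current D
        eps = np.sum((Yk - D @ Ck) ** 2)       # reconstruction error
        if eps > theta:                        # novel content: keep this block
            summary.append(k)
            Ys.append(Yk)
            D = update_dict(Ys)                # refresh the dictionary
    return summary
```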
Further, the extraction of statistical features from each frame image in step 1) to form the corresponding vectorized representation is specifically:
1) Suppose the original video is evenly divided into n frame blocks, i.e. $\mathcal{X} = \{X_1, X_2, \ldots, X_n\}$, where each frame block contains t frame images (e.g. t = 80); scale each frame image to a uniform pixel size while keeping the original aspect ratio;
2) Extract from each frame image global features such as the color histogram, color moments, edge direction histogram, Gabor wavelet transform, and local binary patterns, together with local features such as the Scale-Invariant Feature Transform (SIFT) and Speeded Up Robust Features (SURF);
3) Concatenate these image features of each frame in order, forming a vectorized representation of dimension $n_f$ (see the sketch below).
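A minimal sketch of the vectorization, illustrative rather than from the patent: only the color histogram and color moments are computed here, assuming OpenCV is available; the remaining global and local features listed above would be concatenated in the same way, and aspect-preserving padding is omitted for brevity.

```python
import cv2
import numpy as np

def frame_vector(frame, size=(128, 96)):
    """Vectorize one frame. Only the color histogram and color moments
    are computed here; the edge direction histogram, Gabor, LBP, SIFT
    and SURF features listed above would be concatenated in the same
    way. `size` and the 32 bins are illustrative choices."""
    img = cv2.resize(frame, size)              # uniform pixel size
    feats = []
    for ch in range(3):                        # per-channel color histogram
        h = cv2.calcHist([img], [ch], None, [32], [0, 256]).ravel()
        feats.append(h / (h.sum() + 1e-12))    # normalize to a distribution
    pixels = img.reshape(-1, 3).astype(np.float64)
    feats.append(pixels.mean(axis=0))          # color moments: mean ...
    feats.append(pixels.std(axis=0))           # ... and standard deviation
    return np.concatenate(feats)               # the n_f-dimensional vector

def block_matrix(frames):
    """Stack the t frame vectors of one block as the columns of X_k."""
    return np.stack([frame_vector(f) for f in frames], axis=1)
```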
Further, the pre-training of the multi-layer deep network on video frames in step 2) to obtain the nonlinear representation of each frame is specifically:
A multi-layer deep network (fewer than 10 layers) is pre-trained as a stacked denoising autoencoder (SDA):
a. At each layer, each frame image is processed as follows: first, a noisy version of the frame is generated, for example by adding small Gaussian noise or by randomly setting input variables to arbitrary values; then the noisy image is mapped through an autoencoder (AE) to obtain its nonlinear representation (see the sketch below);
b. The parameters of each layer of the deep network are adjusted and updated with the stochastic gradient descent algorithm;
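The following is a minimal single-layer sketch of this pre-training in NumPy, illustrative since the patent fixes no architecture: one tied-weight denoising autoencoder layer trained by stochastic gradient descent on the squared reconstruction error against the clean input. Stacking fewer than 10 such layers amounts to calling the function repeatedly on the codes returned by the previous layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pretrain_dae_layer(X, n_hidden, noise_std=0.1, lr=0.01, n_epochs=50, seed=0):
    """Pre-train one tied-weight denoising autoencoder layer by SGD.
    X: (n_samples, n_visible) clean inputs. Returns encoder weights W,
    bias b, and the clean codes to feed the next layer of the stack."""
    rng = np.random.default_rng(seed)
    n_vis = X.shape[1]
    W = 0.01 * rng.standard_normal((n_vis, n_hidden))
    b = np.zeros(n_hidden)        # encoder bias
    b_out = np.zeros(n_vis)       # decoder bias
    for _ in range(n_epochs):
        for x in X[rng.permutation(len(X))]:
            x_noisy = x + noise_std * rng.standard_normal(n_vis)  # corrupt input
            h = sigmoid(x_noisy @ W + b)                          # encode
            x_hat = h @ W.T + b_out                               # decode
            err = x_hat - x                    # compare against the CLEAN frame
            dh = (err @ W) * h * (1.0 - h)     # backprop through the encoder
            W -= lr * (np.outer(x_noisy, dh) + np.outer(err, h))  # tied weights
            b -= lr * dh
            b_out -= lr * err
    return W, b, sigmoid(X @ W + b)
```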
The reconstruction of the initial condensed video by the group sparse coding algorithm in step 3) is specifically:
1) The initial condensed video consists of the first m frame blocks of the original video (m is a positive integer smaller than 50), i.e. $\mathcal{X}^{(0)} = \{X_1, \ldots, X_m\}$ with $n_{init} = m \times t$ frame images in total, where $X_k$ is the k-th original frame block; the pre-trained deep network yields the corresponding nonlinear representations $\mathcal{Y} = \{Y_1, \ldots, Y_m\}$, where $Y_k$ is the nonlinear representation of the k-th frame block;
2) Let the initial dictionary D consist of $n_d$ atoms, i.e. $D = \{d_1, \ldots, d_{n_d}\}$, where $d_j$ is the j-th atom; let the reconstruction coefficients be C, whose number of elements corresponds to the number of frames and whose dimension corresponds to the number of dictionary atoms, i.e. $C = \{C_1, \ldots, C_m\}$, where $C_k$ holds the coefficients of the k-th frame block and $c_i^k$ corresponds to the i-th frame image;
3) Optimizing the group sparse coding objective with a dictionary-regularization term by the alternating direction method of multipliers (ADMM) yields the initial dictionary D and the reconstruction coefficients C, i.e. solving

$$\min_{D,\,C}\; \sum_{k=1}^{m} F(Y_k, C_k, D) \;+\; \lambda \sum_{j=1}^{n_d} \lVert d_j \rVert_2^2 ,$$

where $\lVert \cdot \rVert_2$ denotes the $\ell_2$ norm of its argument and the regularization parameter $\lambda$ is a real number greater than 0. The function $F(Y_k, C_k, D)$ is given by

$$F(Y_k, C_k, D) \;=\; \sum_{y_i \in Y_k} \Big\lVert\, y_i - \sum_{j=1}^{n_d} c_{i,j}^{k}\, d_j \,\Big\rVert_2^2 \;+\; \gamma \sum_{j=1}^{n_d} \big\lVert c_{\cdot,j}^{k} \big\rVert_2 ,$$

where the parameter $\gamma$ is a real number greater than 0, the expression inside $\lVert \cdot \rVert_2$ in the first term is the reconstruction of the i-th frame image with the dictionary D, and $c_{\cdot,j}^{k}$ collects the coefficients of atom $d_j$ across the frames of block k (the group-sparsity term). The alternating scheme is: first fix the parameter D, so that the objective becomes a convex function of C; then fix C, so that the objective becomes a convex function of D; the two parameters are updated alternately until convergence. (A simplified numerical sketch follows.)
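A simplified numerical sketch of this optimization, illustrative and not the patent's implementation: the patent specifies ADMM, while the sketch below substitutes plain alternating minimization, solving the C step by proximal gradient descent with a row-wise group soft-threshold and the D step in closed form (the $\ell_2$ atom penalty makes it ridge regression).

```python
import numpy as np

def prox_rows(C, tau):
    """Group soft-threshold: each row of C (one atom's coefficients
    across all frames of the block) is shrunk as a single group."""
    norms = np.linalg.norm(C, axis=1, keepdims=True)
    return C * np.maximum(0.0, 1.0 - tau / np.maximum(norms, 1e-12))

def code_block(Yk, D, gamma, n_iter=200):
    """Coefficient step: min_C ||Yk - D C||_F^2 + gamma * sum_j ||C_j||_2
    by proximal gradient descent (convex in C for fixed D).
    Yk: (f, t) frames as columns; D: (f, n_d)."""
    L = 2.0 * np.linalg.norm(D.T @ D, 2)       # Lipschitz constant of the gradient
    C = np.zeros((D.shape[1], Yk.shape[1]))
    for _ in range(n_iter):
        grad = 2.0 * D.T @ (D @ C - Yk)
        C = prox_rows(C - grad / L, gamma / L)
    return C

def fit_dictionary(Ys, n_atoms, gamma, lam, n_outer=20, seed=0):
    """Alternate the two convex subproblems: fix D and solve for the
    block coefficients, then fix C and solve for D in closed form
    (the l2 atom penalty reduces the D step to ridge regression)."""
    rng = np.random.default_rng(seed)
    D = rng.standard_normal((Ys[0].shape[0], n_atoms))
    D /= np.linalg.norm(D, axis=0, keepdims=True)
    for _ in range(n_outer):
        Cs = [code_block(Yk, D, gamma) for Yk in Ys]   # C step, block by block
        Y, C = np.hstack(Ys), np.hstack(Cs)
        D = Y @ C.T @ np.linalg.inv(C @ C.T + lam * np.eye(n_atoms))  # D step
    return D, Cs
```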
The update of the deep network parameters with the next frame block in step 4), together with the reconstruction of that block and the computation of the reconstruction error, is specifically:
1) For each frame image of the block, in turn:
a. Update the parameters of the last layer of the deep neural network, i.e. the weights W and biases b, with the online gradient descent algorithm;
b. Update the parameters of the other layers of the deep neural network with the back-propagation algorithm;
2) Update the nonlinear representation of each frame image with the new parameters;
3) Based on the current dictionary D, reconstruct the current frame block with group sparse coding and compute the error ε; that is, reconstruct the nonlinear representation $Y_k$ of the current frame block $X_k$ as follows: first minimize the function $F(Y_k, C_k, D)$ to obtain the optimal reconstruction coefficients $\hat{C}_k$, then substitute $\hat{C}_k$ into the first term of $F(Y_k, C_k, D)$ and evaluate it; the resulting value is the current reconstruction error ε (see the sketch below).
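A minimal sketch of the error computation, reusing `code_block` from the sketch above and again purely illustrative: ε is the reconstruction term of F evaluated at the optimal coefficients.

```python
import numpy as np

def reconstruction_error(Yk, D, gamma):
    """eps: value of the first (reconstruction) term of F(Y_k, C_k, D)
    at the optimal coefficients for the current block.
    Yk: (f, t) nonlinear frame representations as columns."""
    Ck = code_block(Yk, D, gamma)      # coefficient step, D held fixed
    R = Yk - D @ Ck                    # residual of every frame in the block
    return float(np.sum(R * R))
```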
The addition of the current frame block to the condensed video and the update of the dictionary in step 4), when the error exceeds the set threshold, is specifically:
1) If the reconstruction error ε computed for the nonlinear representation $Y_k$ of the current frame block $X_k$ exceeds the set threshold θ (an empirically chosen value), add the current frame block to the condensed video, i.e. $\mathcal{X}^{s} \leftarrow \mathcal{X}^{s} \cup \{X_k\}$;
2) If the current condensed video $\mathcal{X}^{s}$ contains q frame blocks, the set of nonlinear frame representations used to update the dictionary is $\mathcal{Y}^{s} = \{Y_1, \ldots, Y_q\}$; the dictionary D is then updated with $\mathcal{Y}^{s}$ by solving the objective

$$\min_{D,\,C}\; \sum_{k=1}^{q} F(Y_k, C_k, D) \;+\; \lambda \sum_{j=1}^{n_d} \lVert d_j \rVert_2^2 ,$$

where the parameter λ is a real number greater than 0 that adjusts the influence of the regularization term. (The closed form of the D step is derived below.)
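For reference, a short derivation consistent with the objective above, assuming the frame representations are stacked as the columns of a matrix Y and the coefficients as the columns of C (so that $\sum_j \lVert d_j \rVert_2^2 = \lVert D \rVert_F^2$): with C held fixed, the $\ell_2$ atom penalty gives the D subproblem a closed-form solution.

$$
\min_{D}\; \lVert Y - DC \rVert_F^2 + \lambda \lVert D \rVert_F^2
\;\Longrightarrow\;
-2\,(Y - DC)\,C^{\top} + 2\lambda D = 0
\;\Longrightarrow\;
D = Y C^{\top} \big( C C^{\top} + \lambda I \big)^{-1}.
$$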
The present invention proposes an online video summary generation method based on deep learning. Its advantages are: deep learning mines the high-level semantic features of the video, so that group sparse coding better reflects how well the dictionary reconstructs the current video frame block; the most informative frame blocks thus form a video summary containing the regions of interest and the key persons and events. The condensed summary saves users a great deal of time while enhancing the visual experience of the key content.
Brief Description of the Drawings
Fig. 1 is a flow chart of the method of the present invention.
Detailed Description
The present invention is further described with reference to Fig. 1:
Steps 1 to 5 and their implementation details are carried out exactly as set out in the Summary of the Invention above.
Claims (4)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410326406.9A CN104113789B (en) | 2014-07-10 | 2014-07-10 | On-line video abstraction generation method based on depth learning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104113789A CN104113789A (en) | 2014-10-22 |
CN104113789B true CN104113789B (en) | 2017-04-12 |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102930518A (en) * | 2012-06-13 | 2013-02-13 | 上海汇纳网络信息科技有限公司 | Improved sparse representation based image super-resolution method |
CN103118220A (en) * | 2012-11-16 | 2013-05-22 | 佳都新太科技股份有限公司 | Keyframe pick-up algorithm based on multi-dimensional feature vectors |
CN103167284A (en) * | 2011-12-19 | 2013-06-19 | 中国电信股份有限公司 | Video streaming transmission method and system based on picture super-resolution |
CN103295242A (en) * | 2013-06-18 | 2013-09-11 | 南京信息工程大学 | Multi-feature united sparse represented target tracking method |
CN103413125A (en) * | 2013-08-26 | 2013-11-27 | 中国科学院自动化研究所 | Horror Video Recognition Method Based on Discriminative Example Selection Multiple Instance Learning |
CN103531199A (en) * | 2013-10-11 | 2014-01-22 | 福州大学 | Ecological sound recognition method based on fast sparse decomposition and deep learning |
CN103761531A (en) * | 2014-01-20 | 2014-04-30 | 西安理工大学 | Sparse-coding license plate character recognition method based on shape and contour features |
Non-Patent Citations (1)
Title |
---|
Shang Li, "基于稀疏编码的自然特征提起及去噪" (Natural feature extraction and denoising based on sparse coding), 《系统仿真学报》 (Journal of System Simulation), 2005-07-31, pp. 1782-1787 *
Legal Events
Date | Code | Title | Description
---|---|---|---
| C06 | Publication |
| PB01 | Publication |
| C10 | Entry into substantive examination |
| SE01 | Entry into force of request for substantive examination |
| GR01 | Patent grant |
2022-08-09 | TR01 | Transfer of patent right | Patentee after: HANGZHOU HUICUI INTELLIGENT TECHNOLOGY CO.,LTD., Room 406, Building 19, Haichuangyuan, No. 998 Wenyi West Road, Yuhang District, Hangzhou, Zhejiang. Patentee before: HANGZHOU DIANZI UNIVERSITY, No. 2 Street, Xiasha Higher Education Zone, Hangzhou 310018, Zhejiang.
TR01 | Transfer of patent right |