WO2018137126A1 - Method and apparatus for generating a static video summary - Google Patents

Method and apparatus for generating a static video summary

Info

Publication number
WO2018137126A1
WO2018137126A1 (application PCT/CN2017/072416, CN2017072416W)
Authority
WO
WIPO (PCT)
Prior art keywords
feature
video
clustering
candidate
module
Prior art date
Application number
PCT/CN2017/072416
Other languages
English (en)
French (fr)
Inventor
钟圣华
吴嘉欣
黄星胜
江健民
Original Assignee
深圳大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳大学 filed Critical 深圳大学
Priority to PCT/CN2017/072416 priority Critical patent/WO2018137126A1/zh
Priority to CN201780000556.2A priority patent/CN107223344A/zh
Publication of WO2018137126A1 publication Critical patent/WO2018137126A1/zh

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/85 Assembly of content; Generation of multimedia applications
    • H04N21/854 Content authoring
    • H04N21/8549 Creating video summaries, e.g. movie trailer
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/758 Involving statistics of pixels or of feature values, e.g. histogram matching

Definitions

  • The invention belongs to the technical field of computers, and in particular relates to a method and an apparatus for generating a static video summary.
  • Static video summarization is an effective and classic way to help users quickly grasp video content: by removing redundant frames from the video, it produces a static video summary that briefly represents the video content. By watching the video summary, the user can understand the approximate content of the video and decide whether the entire video is worth watching.
  • Among the existing methods, the first and the second method both use a shot-based approach that still contains redundancy, since similar shots may appear several times in a video; the number of clusters pre-set in the first method affects the generation of the best video summary result, while in the third method the de-redundancy work before clustering is not deep enough, as only a few simple, meaningless video frames are removed.
  • An object of the present invention is to provide a method and an apparatus for generating a static video summary, aiming to solve the problem that, because the prior art cannot provide an effective way to generate static video summaries, the degree of redundant-frame removal in the video is low and the number of clusters after clustering must be specified manually, resulting in low efficiency of static video summary generation and unstable quality of the generated static video summary.
  • In one aspect, the present invention provides a method for generating a static video summary, the method comprising the steps of:
  • clustering all the histograms by a high-density peak search algorithm based on video representation, and obtaining the clustered cluster center points;
  • In another aspect, the present invention provides a static video summary generating apparatus, the apparatus comprising:
  • a video receiving module, configured to receive a video to be processed input by a user;
  • a candidate frame extraction module, configured to pre-sample the to-be-processed video by a singular value decomposition algorithm to extract candidate frames of the to-be-processed video;
  • a histogram representation module, configured to generate histograms of all the candidate frames according to a bag-of-words model algorithm;
  • a clustering operation module, configured to cluster all the histograms by a high-density peak search algorithm based on video representation and to obtain the clustered cluster center points; and
  • a video summary generating module, configured to generate a static video summary of the to-be-processed video according to each cluster center point.
  • The invention first adopts a singular value decomposition algorithm to pre-sample the video to be processed and obtain its candidate frames, then uses a bag-of-words model to generate histograms representing those candidate frames, and then clusters all the histograms with a high-density peak search algorithm based on video representation; finally, a static video summary of the video to be processed is generated from the cluster center points obtained after clustering, which effectively improves the removal of redundant frames in the video.
  • Moreover, the number of cluster centers need not be set in advance: a suitable number of cluster centers is generated adaptively according to the content of the video, which effectively improves the stability and adaptability of the clustering, reduces its time complexity, and thereby effectively improves the efficiency and quality of static video summary generation.
  • FIG. 1 is a flowchart of an implementation of a method for generating a static video summary according to Embodiment 1 of the present invention
  • FIG. 2 is a schematic structural diagram of a static video summary generating apparatus according to Embodiment 2 of the present invention.
  • FIG. 3 is a schematic structural diagram of a candidate frame extraction module in a static video summary generating apparatus according to Embodiment 2 of the present invention.
  • FIG. 4 is a schematic structural diagram of a histogram representation module in a static video summary generating apparatus according to Embodiment 2 of the present invention.
  • FIG. 5 is a schematic structural diagram of a clustering operation module in a static video summary generating apparatus according to Embodiment 2 of the present invention.
  • FIG. 6 is a schematic structural diagram of a video summary generating module in a static video summary generating apparatus according to Embodiment 2 of the present invention.
  • Embodiment 1:
  • FIG. 1 is a flowchart showing an implementation process of a method for generating a static video summary according to Embodiment 1 of the present invention. For convenience of description, only parts related to the embodiment of the present invention are shown, which are as follows:
  • In step S101, a video to be processed input by the user is received.
  • The embodiments of the present invention are applicable to platforms or smart devices capable of performing video processing.
  • When a user needs to extract a static video summary from a video, that video can be taken as the video to be processed and input into a platform or smart device capable of video processing.
  • In step S102, the video to be processed is pre-sampled by the singular value decomposition algorithm to extract candidate frames of the video to be processed.
  • Through the singular value decomposition algorithm, the singular values and the rank of the decomposed matrix can be obtained.
  • Pre-sampling the video to be processed by the singular value decomposition algorithm to extract its candidate frames may be implemented by the following steps:
  • The input frames are all the video frames of the video to be processed.
  • The time-varying feature vector corresponding to an input frame can be generated from the three color channels of the hue-saturation-value (HSV) color space of that input frame.
  • The time-varying feature vector is a row vector.
  • Each feature matrix contains the time-varying feature vectors of a preset window size of consecutive input frames.
  • The window size is equal to the number of frames in the window.
  • A feature matrix is formed from the time-varying feature vectors corresponding to window-size consecutive input frames.
  • The feature matrix corresponding to the input frame at time t has size N × L, where N is the window size, L is the length of the time-varying feature vector, and T denotes the total number of input frames in the video to be processed.
  • The feature matrix XN is composed of the window-size consecutive time-varying feature vectors x1, x2, ..., xN, and the adjacent feature matrix XN+1 is composed of the window-size consecutive time-varying feature vectors x2, x3, ..., xN+1.
  • The formula for performing singular value decomposition on a feature matrix may be:
  • X = UΣV^T, where X is the feature matrix, U is the matrix of output orthogonal singular vectors, V^T is the matrix of input orthogonal singular vectors, and Σ is the singular value matrix.
  • After singular value decomposition, the feature matrix X yields the singular value matrix Σ, which is a diagonal matrix whose diagonal elements are the singular values, arranged in descending order.
  • When the diagonal elements of the singular value matrix are q1, q2, ..., qN, then q1, q2, ..., qN are all singular values, and q1 is the largest singular value among them.
  • The rank of the corresponding feature matrix can be determined from the singular value matrix: a threshold is preset, the singular values in the singular value matrix are compared with this threshold in turn, and the number of singular values exceeding the threshold is counted; this number is the rank of the feature matrix corresponding to that singular value matrix.
  • The ranks of adjacent feature matrices are compared in turn.
  • When the rank of the second feature matrix is greater than the rank of the first feature matrix, the last input frame corresponding to the second feature matrix is set as a candidate frame.
  • In that case, the input frame corresponding to the last time-varying feature vector in the second feature matrix can be considered different in visual content from the previous input frame, so that input frame is set as the candidate frame.
  • The first feature matrix is any one of all the feature matrices, and the second feature matrix is the next feature matrix adjacent to the first feature matrix; that is, when the first feature matrix is the first of the current pair of adjacent feature matrices, the second feature matrix is the second of that pair.
  • In step S103, histograms of all candidate frames are generated according to the bag-of-words model algorithm.
  • Using the bag-of-words model for the representation of the candidate frames can effectively reduce the redundancy of frames in the video.
  • Generating the histograms of all candidate frames with the bag-of-words model can be implemented by the following steps:
  • The image features of the candidate frames are extracted by an image feature extraction algorithm.
  • The image feature extraction algorithm uses the Scale-Invariant Feature Transform (SIFT) feature extraction algorithm, which can effectively extract a large number of SIFT descriptors from the candidate frames.
  • All image features of all candidate frames are clustered by a clustering algorithm to select representative image features, and these representative image features are set as the feature codebook.
  • The clustering algorithm uses the commonly used k-means clustering algorithm.
  • A histogram representing each candidate frame is generated based on the feature distribution over the feature codebook.
  • That is, a histogram can be generated for each candidate frame, so that each candidate frame is represented by its corresponding histogram.
  • In step S104, all the histograms are clustered by the high-density peak search algorithm based on video representation, and the clustered cluster center points are obtained.
  • A high-density peak search algorithm based on video representation is proposed, which is better suited to the clustering task of frames in the video summary generation process.
  • Each histogram can be regarded as a data point; the distance between two candidate frames is the Euclidean distance between the histograms corresponding to those two candidate frames.
  • The local density is computed as ρi = Σj χ(dij - dc), where χ(dij - dc) = 1 when dij - dc < 0 and 0 otherwise, ρi is the local density of the i-th candidate frame, dij is the distance between the i-th candidate frame and the j-th candidate frame, and dc is a preset cutoff distance. It can be seen that the local density ρi of a candidate frame is the number of candidate frames whose distance to it is smaller than the cutoff distance dc.
  • The high-density point distance of a candidate frame is the distance between that candidate frame and the candidate frames having a higher local density.
  • The high-density point distance of the i-th candidate frame is computed as δi = min{dij : ρj > ρi}, where δi is the high-density point distance of the i-th candidate frame and dij is the distance between the i-th candidate frame and the j-th candidate frame.
  • When the local density ρi of the i-th candidate frame is the highest local density, the maximum distance between the i-th candidate frame and the remaining candidate frames is computed, and that maximum distance is set as the high-density point distance δi of the i-th candidate frame.
  • When the local density ρi of the i-th candidate frame is not the highest local density, the candidate frames whose local density is larger than that of the i-th candidate frame are obtained, the minimum distance between the i-th candidate frame and those candidate frames is computed, and that minimum distance is set as the high-density point distance δi of the i-th candidate frame.
  • The cluster center points are obtained according to the local density and the high-density point distance of each candidate frame, using a weighted peak search strategy whose formula is γ = α*(ρ*δ) + (1-α)*δ, where α is a preset parameter whose value ranges from 0 to 0.5, ρ is the local density, δ is the high-density point distance, and γ is the clustering value.
  • In the process of obtaining a video summary, candidate frames with a lower local density and a larger high-density point distance are more important; this new strategy makes such candidate frames more likely to be chosen as cluster center points of the video summary.
  • In step S105, a static video summary of the video to be processed is generated according to each cluster center point.
  • Not every cluster center point obtained by the clustering can be used as a frame of the static video summary, so the cluster center points are filtered.
  • The clustering values of the cluster center points are arranged to obtain a scatter plot of all clustering values.
  • From this scatter plot, the clustering value at which the growth, or the slope, suddenly increases sharply is obtained, and that clustering value is set as a threshold.
  • The clustering values of all cluster center points are then compared with the threshold one by one; when a clustering value exceeds the threshold, the candidate frame corresponding to that cluster center point is retained as one frame of the static video summary. Finally, a complete static video summary is generated.
  • In the implementation of the invention, a singular value decomposition algorithm is first used to extract the candidate frames of the video to be processed, and histograms representing these candidate frames are generated with the bag-of-words model, which effectively reduces the redundancy of frames in the video. Then the high-density peak search clustering algorithm based on video representation clusters all candidate frames, adaptively producing a suitable number of cluster centers from the histograms of the video frames; this avoids pre-setting the number of cluster centers before clustering and requires no iterative process, which effectively improves the stability and adaptability of the clustering and reduces its time complexity. Finally, the cluster centers are filtered with a preset strategy to generate a more representative static video summary, thereby effectively improving the generation efficiency and quality of the static video summary.
  • Embodiment 2:
  • FIG. 2 is a diagram showing the structure of a static video summary generating apparatus according to Embodiment 2 of the present invention. For the convenience of description, only parts related to the embodiment of the present invention are shown, including:
  • a video receiving module 21 configured to receive a video to be processed input by a user
  • the candidate frame extraction module 22 is configured to perform pre-sampling of the video to be processed by the singular value decomposition algorithm to extract a candidate frame of the video to be processed;
  • a histogram representation module 23 configured to generate a histogram of all candidate frames according to the word bag model algorithm
  • a clustering operation module 24 configured to cluster all histograms by a high-density peak search algorithm based on a video representation, and obtain clustered cluster center points;
  • the video summary generating module 25 is configured to generate a static video summary of the to-be-processed video according to each cluster center point.
  • the candidate frame extraction module 22 further includes a vector generation module 321, a feature matrix construction module 322, a singular value decomposition module 323, and a candidate frame determination module 324, wherein:
  • a vector generation module 321, configured to generate a time-varying feature vector of each input frame in the to-be-processed video
  • the feature matrix construction module 322 is configured to construct feature matrices for all input frames in turn according to the time-varying feature vectors, each feature matrix containing the time-varying feature vectors of a preset window size of consecutive input frames;
  • the singular value decomposition module 323 is configured to perform singular value decomposition on all feature matrices to obtain a singular value-decomposed singular value matrix, and determine a rank of each feature matrix according to the singular value matrix;
  • the candidate frame determining module 324 is configured to sequentially compare the ranks of the adjacent feature matrices. When the rank of the second feature matrix is greater than the rank of the first feature matrix, set the last input frame corresponding to the second feature matrix to The candidate frame, the first feature matrix is any feature matrix of all feature matrices, and the second feature matrix is the next feature matrix adjacent to the first feature matrix in all feature matrices.
  • the histogram representation module 23 further includes a feature extraction module 431, a codebook generation module 432, and a histogram generation module 433, wherein:
  • a feature extraction module 431, configured to extract image features of all candidate frames
  • a codebook generating module 432 configured to generate a feature codebook of each candidate frame by clustering according to all image features
  • the histogram generation module 433 is configured to generate a histogram for representing each candidate frame according to the feature distribution in all the feature codebooks.
  • the clustering operation module 24 further includes a candidate frame distance calculation module 541, a local density calculation module 542, a high density point distance calculation module 543, and a cluster center point acquisition module 544, wherein:
  • a candidate frame distance calculation module 541, configured to calculate a distance between every two candidate frames in all candidate frames according to all histograms
  • the local density calculation module 542 is configured to calculate a local density of each candidate frame according to a distance between each two candidate frames and a preset cutoff distance;
  • a high density point distance calculation module 543 for calculating a high density point distance of each candidate frame based on all local densities
  • the cluster center point obtaining module 544 is configured to obtain a cluster center point according to a local density and a high density point distance of each candidate frame.
  • the cluster center point acquisition module 544 further includes a cluster value calculation module 5441, wherein:
  • the clustering value calculation module 5441 is configured to calculate the clustering value corresponding to each candidate frame by using a weighted peak search clustering strategy according to the local density and the high-density point distance of each candidate frame, the formula of the weighted peak search clustering strategy being:
  • γ = α*(ρ*δ) + (1-α)*δ, where γ is the clustering value, α is a preset parameter, ρ is the local density, and δ is the high-density point distance.
  • the video summary generating module 25 further includes:
  • a threshold setting module 651, configured to arrange the clustering values of the cluster center points, obtain from all the clustering values the clustering value at which the growth or the slope suddenly increases sharply, and set that clustering value as the threshold; and
  • a video summary frame setting module 652, configured to compare each clustering value with the threshold and, when a clustering value exceeds the threshold, set the candidate frame of the cluster center point corresponding to that clustering value as a video frame in the static video summary.
  • In the embodiment of the invention, a singular value decomposition algorithm is first used to extract the candidate frames of the video to be processed, histograms representing these candidate frames are then generated with the bag-of-words model, and a high-density peak search clustering algorithm based on video representation then clusters all the frames, the cluster centers being selected during clustering so as to generate a more representative static video summary. This not only effectively reduces the redundancy of frames in the video, but also adaptively produces a suitable number of cluster centers from the histograms of the video frames, without pre-setting the number of cluster centers and without an iterative process, which improves the stability and adaptability of the clustering, reduces its time complexity, and thereby improves the generation efficiency and quality of the static video summary.
  • Each module of the static video summary generating apparatus may be implemented by corresponding hardware or software modules, and the modules may be independent software and hardware modules or may be integrated into a single software and hardware module.

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention is applicable to the field of computer technology and provides a method and an apparatus for generating a static video summary. The method comprises: receiving a video to be processed input by a user; pre-sampling the video to be processed by a singular value decomposition algorithm to extract candidate frames of the video to be processed; generating, according to a bag-of-words model algorithm, a histogram representing each candidate frame; clustering all the histograms by a high-density peak search algorithm based on video representation, and obtaining the clustered cluster center points; and generating a static video summary of the video to be processed according to each cluster center point. Through the generation of candidate frames and their histogram representation, redundant frames are removed more thoroughly, and cluster centers are generated adaptively during clustering, without pre-setting the number of clusters and without an iterative process, which effectively improves the stability and adaptability of the clustering, reduces its time complexity, and thereby effectively improves the generation efficiency and quality of the static video summary.

Description

Method and apparatus for generating a static video summary
TECHNICAL FIELD
The present invention belongs to the field of computer technology, and in particular relates to a method and an apparatus for generating a static video summary.
BACKGROUND
In recent years, with the development of multimedia technology, watching one's favorite videos online has become an indispensable part of most people's daily life, but how to help people quickly find the videos they like and are interested in among a huge number of videos remains a technically challenging problem. Static video summarization is an effective and classic way to solve this problem: by removing redundant frames from a video, it produces a static video summary that briefly represents the video content. By watching the video summary, a user can grasp the approximate content of the video and decide whether it is worth watching in full.
At present, researchers have proposed a variety of methods for static video summarization. One method divides the video into multiple shots and, based on color histogram features, uses the k-means clustering algorithm to group the frames of each shot into clusters (with the number of clusters set in advance), taking the cluster centers of each shot as the static video summary result. Another method proposes three steps for static video summarization: first, shot boundary detection based on color and edge information; second, classification of the shots during clustering according to the type of motion and the scene within each shot; and finally, a shot-importance filter that determines the importance of each shot by computing motion energy and color variation and selects the important shots of each cluster. A third method first obtains candidate frames by eliminating some meaningless frames in the video, then uses k-means clustering to divide all candidate frames into clusters (the number of clusters being determined by the change in visual content between adjacent frames), and finally filters out similar frames within these clusters, the frames remaining after filtering being regarded as the static video summary result.
Among the above existing methods, since similar shots may appear several times in a video, the shot-based approaches used by the first and the second method both contain redundancy, and the number of clusters pre-set in the first method affects the generation of the best video summary result, while the de-redundancy work of the third method before clustering is not deep enough, as only a few simple, meaningless video frames are removed.
SUMMARY OF THE INVENTION
An object of the present invention is to provide a method and an apparatus for generating a static video summary, aiming to solve the problem that, because the prior art cannot provide an effective method for generating static video summaries, the degree of redundant-frame removal in the video is low and the number of clusters after clustering has to be specified manually when generating a static video summary, resulting in low efficiency of static video summary generation and unstable quality of the generated static video summary.
In one aspect, the present invention provides a method for generating a static video summary, the method comprising the following steps:
receiving a video to be processed input by a user;
pre-sampling the video to be processed by a singular value decomposition algorithm to extract candidate frames of the video to be processed;
generating histograms of all the candidate frames according to a bag-of-words model algorithm;
clustering all the histograms by a high-density peak search algorithm based on video representation, and obtaining the clustered cluster center points;
generating a static video summary of the video to be processed according to each cluster center point.
In another aspect, the present invention provides an apparatus for generating a static video summary, the apparatus comprising:
a video receiving module, configured to receive a video to be processed input by a user;
a candidate frame extraction module, configured to pre-sample the video to be processed by a singular value decomposition algorithm to extract candidate frames of the video to be processed;
a histogram representation module, configured to generate histograms of all the candidate frames according to a bag-of-words model algorithm;
a clustering operation module, configured to cluster all the histograms by a high-density peak search algorithm based on video representation and to obtain the clustered cluster center points; and
a video summary generating module, configured to generate a static video summary of the video to be processed according to each cluster center point.
The present invention first adopts a singular value decomposition algorithm to pre-sample the video to be processed and obtain its candidate frames, then uses a bag-of-words model to generate histograms representing these candidate frames, then clusters all the histograms with a high-density peak search algorithm based on video representation, and finally generates a static video summary of the video to be processed according to each cluster center point obtained after clustering. This not only effectively improves the removal of redundant frames in the video, but also makes it unnecessary to set the number of cluster centers in advance before clustering: a suitable number of cluster centers can be generated adaptively according to the content of the video, which effectively improves the stability and adaptability of the clustering, reduces its time complexity, and thereby effectively improves the generation efficiency and quality of the static video summary.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a flowchart of an implementation of the method for generating a static video summary according to Embodiment 1 of the present invention;
FIG. 2 is a schematic structural diagram of the apparatus for generating a static video summary according to Embodiment 2 of the present invention;
FIG. 3 is a schematic structural diagram of the candidate frame extraction module in the apparatus for generating a static video summary according to Embodiment 2 of the present invention;
FIG. 4 is a schematic structural diagram of the histogram representation module in the apparatus for generating a static video summary according to Embodiment 2 of the present invention;
FIG. 5 is a schematic structural diagram of the clustering operation module in the apparatus for generating a static video summary according to Embodiment 2 of the present invention; and
FIG. 6 is a schematic structural diagram of the video summary generating module in the apparatus for generating a static video summary according to Embodiment 2 of the present invention.
DETAILED DESCRIPTION OF THE EMBODIMENTS
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present invention and are not intended to limit it.
The specific implementation of the present invention is described in detail below in connection with specific embodiments.
Embodiment 1:
FIG. 1 shows the implementation flow of the method for generating a static video summary provided by Embodiment 1 of the present invention. For convenience of description, only the parts related to the embodiment of the present invention are shown, detailed as follows.
In step S101, a video to be processed input by a user is received.
The embodiment of the present invention is applicable to platforms or smart devices capable of video processing. When a user needs to extract a static video summary of a video, that video can be taken as the video to be processed and input into a platform or smart device currently capable of video processing.
In step S102, the video to be processed is pre-sampled by the singular value decomposition algorithm to extract candidate frames of the video to be processed.
In the embodiment of the present invention, there is a considerable amount of repeated information between the images of different frames in a video. By pre-sampling all the input frames of the video, some repeated (or redundant) frames can be removed, yielding a number of candidate frames. These candidate frames serve as the objects of the subsequent clustering operation.
Through the singular value decomposition algorithm, the singular values and the rank of the decomposed matrix can be obtained. Specifically, the process of pre-sampling the video to be processed by the singular value decomposition algorithm to extract its candidate frames may be implemented by the following steps:
(1) Generate a time-varying feature vector for each input frame of the video to be processed.
In the embodiment of the present invention, the input frames are all the video frames of the video to be processed. The time-varying feature vector corresponding to an input frame can be generated from the three color channels of the hue-saturation-value (HSV) color space of that input frame. Specifically, the time-varying feature vector is a row vector.
As an example, in the video to be processed, the time-varying feature vector corresponding to the input frame at time t is xt = [hH hS hV], where hH, hS and hV are histograms built for the three channels of the HSV color space, with lengths lH, lS and lV respectively, so the length of the time-varying feature vector is L = lH + lS + lV.
(2) According to the time-varying feature vectors, construct feature matrices for all input frames in turn, each feature matrix containing the time-varying feature vectors of a preset window size of consecutive input frames.
In the embodiment of the present invention, the window size is equal to the number of frames in the window. A feature matrix is formed from the time-varying feature vectors corresponding to window-size consecutive input frames.
As an example, in the video to be processed, the feature matrix corresponding to the input frame at time t is Xt = [xt-N+1; xt-N+2; ...; xt], that is, the N consecutive time-varying feature vectors up to time t stacked as rows.
The size of the feature matrix is N × L, where N is the window size and T is the number of all input frames in the video to be processed.
As an example, the feature matrix XN is composed of the window-size consecutive time-varying feature vectors x1, x2, ..., xN, and the adjacent feature matrix XN+1 is composed of the window-size consecutive time-varying feature vectors x2, x3, ..., xN+1.
(3) Perform singular value decomposition on all the feature matrices to obtain the singular value matrix corresponding to each feature matrix, and determine the rank of each feature matrix according to its singular value matrix.
In the embodiment of the present invention, the formula for performing singular value decomposition on a feature matrix may be:
X = UΣV^T, where X is the feature matrix, U is the matrix of output orthogonal singular vectors, V^T is the matrix of input orthogonal singular vectors, and Σ is the singular value matrix. After singular value decomposition, the feature matrix X yields the singular value matrix Σ, which is a diagonal matrix whose diagonal elements are the singular values, arranged in descending order. As an example, when the diagonal elements of the singular value matrix are q1, q2, ..., qN, then q1, q2, ..., qN are all singular values and q1 is the largest singular value among them.
The rank of the corresponding feature matrix can be determined from the singular value matrix. Specifically, a threshold is set in advance, the singular values in the singular value matrix are compared with this threshold in turn, and the number of singular values exceeding the threshold is counted; this number is the rank of the feature matrix corresponding to that singular value matrix.
(4) Compare the ranks of adjacent feature matrices in turn; when the rank of the second feature matrix is greater than the rank of the first feature matrix, set the last input frame corresponding to the second feature matrix as a candidate frame.
In the embodiment of the present invention, when the rank of the second feature matrix exceeds the rank of the first feature matrix, the input frame corresponding to the last time-varying feature vector in the second feature matrix can be considered different in visual content from the previous input frame, so the input frame corresponding to the last time-varying feature vector in the second feature matrix is set as a candidate frame. After the ranks of all adjacent feature matrices have been compared one by one, a number of candidate frames are obtained.
Specifically, the first feature matrix is any one of all the feature matrices, and the second feature matrix is the next feature matrix adjacent to the first feature matrix among all the feature matrices; that is, when the first feature matrix is the first of the current pair of adjacent feature matrices, the second feature matrix is the second of that pair.
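To make the pre-sampling procedure above concrete, the following Python sketch outlines one possible implementation. It is only an illustration under simplifying assumptions, not the patented implementation: the input frames are assumed to be already available as HSV arrays, and the helper names (hsv_feature_vector, extract_candidate_frames), the histogram bin counts, the window size and the rank threshold are all assumptions of this sketch.

    import numpy as np

    def hsv_feature_vector(frame_hsv, bins=(16, 16, 16)):
        """Concatenate per-channel HSV histograms into one row vector x_t."""
        hists = [np.histogram(frame_hsv[..., c], bins=bins[c], range=(0, 256))[0]
                 for c in range(3)]
        return np.concatenate(hists).astype(np.float64)

    def rank_by_singular_values(matrix, rank_threshold):
        """Rank = number of singular values exceeding the preset threshold."""
        singular_values = np.linalg.svd(matrix, compute_uv=False)  # descending order
        return int(np.sum(singular_values > rank_threshold))

    def extract_candidate_frames(frames_hsv, window_size=10, rank_threshold=1.0):
        """Keep every frame whose sliding-window feature matrix has a higher rank
        than the previous (adjacent) window's feature matrix."""
        features = np.array([hsv_feature_vector(f) for f in frames_hsv])  # T x L
        candidates, prev_rank = [], None
        for t in range(window_size, len(features) + 1):
            X = features[t - window_size:t]                 # N x L feature matrix
            rank = rank_by_singular_values(X, rank_threshold)
            if prev_rank is not None and rank > prev_rank:
                candidates.append(t - 1)                    # last frame of this window
            prev_rank = rank
        return candidates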
In step S103, histograms of all candidate frames are generated according to the bag-of-words model algorithm.
In the embodiment of the present invention, using the bag-of-words model to represent the candidate frames can effectively reduce the redundancy of frames in the video.
Specifically, generating the histograms of all candidate frames with the bag-of-words model may be implemented by the following steps:
(1) Extract the image features of all candidate frames.
Specifically, the image features of the candidate frames are extracted by an image feature extraction algorithm. Preferably, the image feature extraction algorithm uses the Scale-Invariant Feature Transform (SIFT) feature extraction algorithm, which can effectively extract a large number of SIFT descriptors from a candidate frame.
(2) Generate a feature codebook for each candidate frame by clustering all the image features.
Specifically, all the image features of all candidate frames are clustered by a clustering algorithm to select representative image features, and these representative image features are set as the feature codebook. Optionally, the clustering algorithm uses the commonly used k-means clustering algorithm.
(3) Generate a histogram representing each candidate frame according to the feature distribution over the feature codebook.
Specifically, according to the distribution of features over the feature codebook, a histogram can be generated for each candidate frame, so that each candidate frame is represented by its corresponding histogram.
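A minimal sketch of this bag-of-words representation is given below, assuming that the local descriptors of each candidate frame (for example SIFT descriptors) have already been extracted by some feature extractor; the codebook size, the use of scikit-learn's KMeans and the histogram normalisation are assumptions of this illustration rather than details fixed by the invention.

    import numpy as np
    from sklearn.cluster import KMeans

    def build_codebook(descriptors_per_frame, codebook_size=128, seed=0):
        """Cluster all local descriptors of all candidate frames with k-means;
        the cluster centres play the role of the feature codebook (visual words)."""
        all_descriptors = np.vstack(descriptors_per_frame)
        return KMeans(n_clusters=codebook_size, n_init=10, random_state=seed).fit(all_descriptors)

    def frame_histograms(descriptors_per_frame, codebook):
        """Represent each candidate frame by a normalised visual-word histogram."""
        k = codebook.n_clusters
        histograms = []
        for descriptors in descriptors_per_frame:
            words = codebook.predict(descriptors)          # assign each descriptor to a word
            hist, _ = np.histogram(words, bins=np.arange(k + 1))
            histograms.append(hist / max(hist.sum(), 1))
        return np.array(histograms)                        # one row per candidate frame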
In step S104, all the histograms are clustered by the high-density peak search algorithm based on video representation, and the clustered cluster center points are obtained.
In the embodiment of the present invention, a high-density peak search algorithm based on video representation is proposed, which is better suited to the clustering task of frames in the video summary generation process.
Clustering all the histograms by the high-density peak search algorithm based on video representation and obtaining the clustered cluster center points may be implemented by the following steps:
(1) Compute the distance between every two candidate frames according to all the histograms.
Specifically, each histogram can be regarded as a data point; the distance between two candidate frames is the Euclidean distance between the histograms corresponding to those two candidate frames.
(2) Compute the local density of each candidate frame according to the distance between every two candidate frames and a preset cutoff distance.
Specifically, the formula for computing the local density is:
ρi = Σj χ(dij - dc), where χ(dij - dc) = 1 when dij - dc < 0 and χ(dij - dc) = 0 otherwise, ρi is the local density of the i-th candidate frame, dij is the distance between the i-th candidate frame and the j-th candidate frame, and dc is the preset cutoff distance. It can be seen that the local density ρi of a candidate frame is the number of candidate frames whose distance to that candidate frame is smaller than the cutoff distance dc.
(3) Compute the high-density point distance of each candidate frame according to all the local densities.
Specifically, the high-density point distance of a candidate frame is the distance between that candidate frame and the candidate frames having a higher local density. The formula for computing the high-density point distance of the i-th candidate frame is:
δi = min{dij : ρj > ρi}, where δi is the high-density point distance of the i-th candidate frame and dij is the distance between the i-th candidate frame and the j-th candidate frame.
Specifically, when the local density ρi of the i-th candidate frame is the highest local density (in which case the i-th candidate frame is the highest-density point, the point with the largest local density value), the maximum distance between the i-th candidate frame and the remaining candidate frames is computed, and that maximum distance is set as the high-density point distance δi of the i-th candidate frame.
When the local density ρi of the i-th candidate frame is not the highest local density, the candidate frames whose local density is larger than that of the i-th candidate frame are obtained, the minimum distance between the i-th candidate frame and those candidate frames is computed, and that minimum distance is set as the high-density point distance δi of the i-th candidate frame.
(4) Obtain the cluster center points according to the local density and the high-density point distance of each candidate frame.
Specifically, in the high-density peak search algorithm based on video representation, a new strategy is proposed for generating the cluster center points, so that the clustering algorithm can better capture the essence of the video content. This new strategy is a weighted peak search strategy, and its formula is:
γ = α*(ρ*δ) + (1-α)*δ, where α is a preset parameter whose value ranges from 0 to 0.5, ρ is the local density, δ is the high-density point distance, and γ is the clustering value.
In the process of obtaining a video summary, candidate frames with a lower local density and a larger high-density point distance are more important. This new strategy makes such candidate frames more likely to be regarded as cluster center points of the video summary.
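The following sketch shows how the local density ρ, the high-density point distance δ and the weighted clustering value γ = α*(ρ*δ) + (1-α)*δ could be computed from the frame histograms; the default value α = 0.3 is only an assumed example within the 0 to 0.5 range stated above, and the cutoff distance is likewise an assumed parameter.

    import numpy as np

    def density_peak_scores(histograms, cutoff_distance, alpha=0.3):
        """Compute rho, delta and gamma for every candidate frame histogram."""
        n = len(histograms)
        # Pairwise Euclidean distances between frame histograms.
        diff = histograms[:, None, :] - histograms[None, :, :]
        d = np.sqrt((diff ** 2).sum(axis=2))

        # rho_i: number of other frames closer than the cutoff distance.
        rho = (d < cutoff_distance).sum(axis=1) - 1    # subtract the frame itself

        # delta_i: distance to the nearest frame with higher density; for the
        # highest-density frame, the largest distance to any other frame.
        delta = np.zeros(n)
        for i in range(n):
            higher = np.where(rho > rho[i])[0]
            delta[i] = d[i].max() if higher.size == 0 else d[i, higher].min()

        gamma = alpha * (rho * delta) + (1 - alpha) * delta
        return rho, delta, gamma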
In step S105, a static video summary of the video to be processed is generated according to each cluster center point.
In the embodiment of the present invention, among the cluster center points obtained by clustering, not every cluster center point can serve as a frame of the static video summary, so these cluster center points have to be filtered.
Specifically, the clustering values of the cluster center points are arranged to obtain a scatter plot of all clustering values. From this scatter plot, the clustering value at which the growth, or the slope, suddenly increases sharply is obtained, and this clustering value is set as a threshold. The clustering values of all cluster center points are then compared with the threshold one by one; when a clustering value exceeds the threshold, the candidate frame of the cluster center point corresponding to that clustering value is retained as one frame of the static video summary. Finally, a complete static video summary is generated.
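One way to realise this filtering step is sketched below; treating the largest jump between adjacent sorted clustering values as the "sudden sharp increase" is an assumption of this illustration, since the description above does not fix the exact rule for locating that point.

    import numpy as np

    def select_summary_frames(gamma, candidate_indices):
        """Keep the candidate frames whose clustering value exceeds the threshold
        found at the largest jump in the sorted clustering values."""
        sorted_gamma = np.sort(gamma)                  # ascending clustering values
        if sorted_gamma.size < 2:
            return list(candidate_indices)
        jumps = np.diff(sorted_gamma)                  # growth between adjacent values
        threshold = sorted_gamma[np.argmax(jumps)]     # value just before the largest jump
        return [candidate_indices[i] for i in np.where(gamma > threshold)[0]]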
In the implementation of the present invention, a singular value decomposition algorithm is first used to extract the candidate frames of the video to be processed, and histograms representing these candidate frames are generated with the bag-of-words model, which effectively reduces the redundancy of frames in the video. Then the high-density peak search clustering algorithm based on video representation clusters all the candidate frames, adaptively producing a suitable number of cluster centers according to the histograms of the video frames, which avoids pre-setting the number of cluster centers before clustering and requires no iterative process, effectively improving the stability and adaptability of the clustering and reducing its time complexity. Finally, the cluster centers are filtered with a preset strategy to generate a more representative static video summary, thereby effectively improving the generation efficiency and quality of the static video summary.
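Chaining the sketches above gives a compact, illustrative pipeline; all default parameter values are assumptions, and the helper descriptors_for(frame_index), which would return the local descriptors (e.g. SIFT) of one candidate frame, is a hypothetical placeholder rather than part of the invention.

    def static_video_summary(frames_hsv, descriptors_for, window_size=10,
                             rank_threshold=1.0, cutoff_distance=0.2, alpha=0.3):
        """Illustrative end-to-end pipeline reusing the helpers sketched above
        (extract_candidate_frames, build_codebook, frame_histograms,
        density_peak_scores, select_summary_frames)."""
        candidates = extract_candidate_frames(frames_hsv, window_size, rank_threshold)
        descriptors_per_frame = [descriptors_for(i) for i in candidates]
        codebook = build_codebook(descriptors_per_frame)
        histograms = frame_histograms(descriptors_per_frame, codebook)
        _, _, gamma = density_peak_scores(histograms, cutoff_distance, alpha)
        return select_summary_frames(gamma, candidates)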
A person of ordinary skill in the art will understand that all or some of the steps of the method in the above embodiment may be implemented by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, such as a ROM/RAM, a magnetic disk or an optical disc.
Embodiment 2:
FIG. 2 shows the structure of the apparatus for generating a static video summary provided by Embodiment 2 of the present invention. For convenience of description, only the parts related to the embodiment of the present invention are shown, including:
a video receiving module 21, configured to receive a video to be processed input by a user;
a candidate frame extraction module 22, configured to pre-sample the video to be processed by a singular value decomposition algorithm to extract candidate frames of the video to be processed;
a histogram representation module 23, configured to generate histograms of all candidate frames according to the bag-of-words model algorithm;
a clustering operation module 24, configured to cluster all the histograms by the high-density peak search algorithm based on video representation and to obtain the clustered cluster center points; and
a video summary generating module 25, configured to generate a static video summary of the video to be processed according to each cluster center point.
Preferably, as shown in FIG. 3, the candidate frame extraction module 22 further includes a vector generation module 321, a feature matrix construction module 322, a singular value decomposition module 323 and a candidate frame determination module 324, wherein:
the vector generation module 321 is configured to generate a time-varying feature vector for each input frame of the video to be processed;
the feature matrix construction module 322 is configured to construct feature matrices for all input frames in turn according to the time-varying feature vectors, each feature matrix containing the time-varying feature vectors of a preset window size of consecutive input frames;
the singular value decomposition module 323 is configured to perform singular value decomposition on all feature matrices to obtain the singular value matrices after decomposition, and to determine the rank of each feature matrix from its singular value matrix; and
the candidate frame determination module 324 is configured to compare the ranks of adjacent feature matrices in turn and, when the rank of the second feature matrix is greater than the rank of the first feature matrix, set the last input frame corresponding to the second feature matrix as a candidate frame, the first feature matrix being any one of all the feature matrices and the second feature matrix being the next feature matrix adjacent to the first feature matrix among all the feature matrices.
Preferably, as shown in FIG. 4, the histogram representation module 23 further includes a feature extraction module 431, a codebook generation module 432 and a histogram generation module 433, wherein:
the feature extraction module 431 is configured to extract the image features of all candidate frames;
the codebook generation module 432 is configured to generate the feature codebook of each candidate frame by clustering all the image features; and
the histogram generation module 433 is configured to generate a histogram representing each candidate frame according to the feature distribution over the feature codebook.
Preferably, as shown in FIG. 5, the clustering operation module 24 further includes a candidate frame distance calculation module 541, a local density calculation module 542, a high-density point distance calculation module 543 and a cluster center point acquisition module 544, wherein:
the candidate frame distance calculation module 541 is configured to calculate the distance between every two candidate frames according to all the histograms;
the local density calculation module 542 is configured to calculate the local density of each candidate frame according to the distance between every two candidate frames and a preset cutoff distance;
the high-density point distance calculation module 543 is configured to calculate the high-density point distance of each candidate frame according to all the local densities; and
the cluster center point acquisition module 544 is configured to obtain the cluster center points according to the local density and the high-density point distance of each candidate frame.
Preferably, the cluster center point acquisition module 544 further includes a clustering value calculation module 5441, wherein:
the clustering value calculation module 5441 is configured to calculate the clustering value corresponding to each candidate frame by using a weighted peak search clustering strategy according to the local density and the high-density point distance of each candidate frame, the formula of the weighted peak search clustering strategy being:
γ = α*(ρ*δ) + (1-α)*δ, where γ is the clustering value, α is a preset parameter, ρ is the local density and δ is the high-density point distance.
Preferably, as shown in FIG. 6, the video summary generating module 25 further includes:
a threshold setting module 651, configured to arrange the clustering values of the cluster center points, obtain from all the clustering values the clustering value at which the growth or the slope suddenly increases sharply, and set that clustering value as the threshold; and
a video summary frame setting module 652, configured to compare each clustering value with the threshold and, when a clustering value exceeds the threshold, set the candidate frame of the cluster center point corresponding to that clustering value as a video frame in the static video summary.
In the embodiment of the present invention, a singular value decomposition algorithm is first used to extract the candidate frames of the video to be processed, histograms representing these candidate frames are then generated with the bag-of-words model, and the high-density peak search clustering algorithm based on video representation then clusters all video frames, the cluster centers being selected during clustering with a preset strategy so as to generate a more representative static video summary. This not only effectively reduces the redundancy of frames in the video, but also adaptively produces a suitable number of cluster centers from the histograms of the video frames during clustering, without pre-setting the number of cluster centers and without an iterative process, which effectively improves the stability and adaptability of the clustering, reduces its time complexity, and thereby effectively improves the generation efficiency and quality of the static video summary.
In the embodiment of the present invention, each module of the apparatus for generating a static video summary may be implemented by corresponding hardware or software modules, and the modules may be independent software and hardware modules or may be integrated into a single software and hardware module; this is not intended to limit the present invention. For the specific implementation of each module in this embodiment, reference may be made to the description of the steps in Embodiment 1 above, which is not repeated here.
The above description is only preferred embodiments of the present invention and is not intended to limit the present invention. Any modification, equivalent replacement and improvement made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (12)

  1. A method for generating a static video summary, characterized in that the method comprises the following steps:
    receiving a video to be processed input by a user;
    pre-sampling the video to be processed by a singular value decomposition algorithm to extract candidate frames of the video to be processed;
    generating histograms of all the candidate frames according to a bag-of-words model algorithm;
    clustering all the histograms by a high-density peak search algorithm based on video representation, and obtaining the clustered cluster center points; and
    generating a static video summary of the video to be processed according to each cluster center point.
  2. The method according to claim 1, characterized in that the step of pre-sampling the video to be processed by a singular value decomposition algorithm to extract candidate frames of the video to be processed comprises:
    generating a time-varying feature vector for each input frame of the video to be processed;
    constructing feature matrices for all the input frames in turn according to the time-varying feature vectors, each feature matrix containing the time-varying feature vectors of a preset window size of consecutive input frames;
    performing singular value decomposition on all the feature matrices to obtain the singular value matrix corresponding to each feature matrix, and determining the rank of each feature matrix according to its singular value matrix; and
    comparing the ranks of adjacent feature matrices in turn and, when the rank of a second feature matrix is greater than the rank of a first feature matrix, setting the last input frame corresponding to the second feature matrix as a candidate frame, the first feature matrix being any one of all the feature matrices and the second feature matrix being the next feature matrix adjacent to the first feature matrix among all the feature matrices.
  3. The method according to claim 1, characterized in that the step of generating histograms of all the candidate frames according to a bag-of-words model algorithm comprises:
    extracting image features of all the candidate frames;
    generating a feature codebook of each candidate frame by clustering all the image features; and
    generating a histogram representing each candidate frame according to the feature distribution over the feature codebook.
  4. The method according to claim 1, characterized in that the step of clustering all the histograms by a high-density peak search algorithm based on video representation and obtaining the clustered cluster center points comprises:
    calculating the distance between every two candidate frames according to all the histograms;
    calculating the local density of each candidate frame according to the distance between every two candidate frames and a preset cutoff distance;
    calculating the high-density point distance of each candidate frame according to all the local densities; and
    obtaining the cluster center points according to the local density and the high-density point distance of each candidate frame.
  5. The method according to claim 4, characterized in that the step of obtaining the cluster center points according to the local density and the high-density point distance of each candidate frame comprises:
    calculating the clustering value corresponding to each candidate frame by using a weighted peak search clustering strategy according to the local density and the high-density point distance of each candidate frame, the formula of the weighted peak search clustering strategy being:
    γ = α*(ρ*δ) + (1-α)*δ, where γ is the clustering value, α is a preset parameter, ρ is the local density, and δ is the high-density point distance.
  6. The method according to claim 1, characterized in that the step of generating a static video summary of the video to be processed according to each cluster center point comprises:
    arranging the clustering values of the cluster center points, obtaining from all the clustering values the clustering value at which the growth or the slope suddenly increases sharply, and setting that clustering value as a threshold; and
    comparing each clustering value with the threshold and, when a clustering value exceeds the threshold, setting the candidate frame of the cluster center point corresponding to that clustering value as a video frame in the static video summary.
  7. An apparatus for generating a static video summary, characterized in that the apparatus comprises:
    a video receiving module, configured to receive a video to be processed input by a user;
    a candidate frame extraction module, configured to pre-sample the video to be processed by a singular value decomposition algorithm to extract candidate frames of the video to be processed;
    a histogram representation module, configured to generate histograms of all the candidate frames according to a bag-of-words model algorithm;
    a clustering operation module, configured to cluster all the histograms by a high-density peak search algorithm based on video representation and to obtain the clustered cluster center points; and
    a video summary generating module, configured to generate a static video summary of the video to be processed according to each cluster center point.
  8. The apparatus according to claim 7, characterized in that the candidate frame extraction module comprises:
    a vector generation module, configured to generate a time-varying feature vector for each input frame of the video to be processed;
    a feature matrix construction module, configured to construct feature matrices for all the input frames in turn according to the time-varying feature vectors, each feature matrix containing the time-varying feature vectors of a preset window size of consecutive input frames;
    a singular value decomposition module, configured to perform singular value decomposition on all the feature matrices to obtain the singular value matrix corresponding to each feature matrix, and to determine the rank of each feature matrix according to its singular value matrix; and
    a candidate frame determination module, configured to compare the ranks of adjacent feature matrices in turn and, when the rank of a second feature matrix is greater than the rank of a first feature matrix, to set the last input frame corresponding to the second feature matrix as a candidate frame, the first feature matrix being any one of all the feature matrices and the second feature matrix being the next feature matrix adjacent to the first feature matrix among all the feature matrices.
  9. The apparatus according to claim 7, characterized in that the histogram representation module comprises:
    a feature extraction module, configured to extract image features of all the candidate frames;
    a codebook generation module, configured to generate a feature codebook of each candidate frame by clustering all the image features; and
    a histogram generation module, configured to generate a histogram representing each candidate frame according to the feature distribution over the feature codebook.
  10. The apparatus according to claim 7, characterized in that the clustering operation module comprises:
    a candidate frame distance calculation module, configured to calculate the distance between every two candidate frames according to all the histograms;
    a local density calculation module, configured to calculate the local density of each candidate frame according to the distance between every two candidate frames and a preset cutoff distance;
    a high-density point distance calculation module, configured to calculate the high-density point distance of each candidate frame according to all the local densities; and
    a cluster center point acquisition module, configured to obtain the cluster center points according to the local density and the high-density point distance of each candidate frame.
  11. The apparatus according to claim 10, characterized in that the cluster center point acquisition module comprises:
    a clustering value calculation module, configured to calculate the clustering value corresponding to each candidate frame by using a weighted peak search clustering strategy according to the local density and the high-density point distance of each candidate frame, the formula of the weighted peak search clustering strategy being:
    γ = α*(ρ*δ) + (1-α)*δ, where γ is the clustering value, α is a preset parameter, ρ is the local density, and δ is the high-density point distance.
  12. The apparatus according to claim 7, characterized in that the video summary generating module comprises:
    a threshold setting module, configured to arrange the clustering values of the cluster center points, obtain from all the clustering values the clustering value at which the growth or the slope suddenly increases sharply, and set that clustering value as the threshold; and
    a video summary frame setting module, configured to compare each clustering value with the threshold and, when a clustering value exceeds the threshold, to set the candidate frame of the cluster center point corresponding to that clustering value as a video frame in the static video summary.
PCT/CN2017/072416 2017-01-24 2017-01-24 Method and apparatus for generating a static video summary WO2018137126A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2017/072416 WO2018137126A1 (zh) 2017-01-24 2017-01-24 Method and apparatus for generating a static video summary
CN201780000556.2A CN107223344A (zh) 2017-01-24 2017-01-24 Method and apparatus for generating a static video summary

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2017/072416 WO2018137126A1 (zh) 2017-01-24 2017-01-24 Method and apparatus for generating a static video summary

Publications (1)

Publication Number Publication Date
WO2018137126A1 true WO2018137126A1 (zh) 2018-08-02

Family

ID=59955073

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/072416 WO2018137126A1 (zh) 2017-01-24 2017-01-24 Method and apparatus for generating a static video summary

Country Status (2)

Country Link
CN (1) CN107223344A (zh)
WO (1) WO2018137126A1 (zh)


Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108848422B (zh) * 2018-04-19 2020-06-02 清华大学 A video summary generation method based on object detection
CN111510724A (zh) * 2019-01-31 2020-08-07 北京小犀智能科技中心(有限合伙) Equivalent video compression and storage method and system based on image feature extraction
CN109819338B (zh) * 2019-02-22 2021-09-14 影石创新科技股份有限公司 An automatic video editing method and apparatus, and portable terminal
CN109934142B (zh) * 2019-03-04 2021-07-06 北京字节跳动网络技术有限公司 Method and apparatus for generating feature vectors of a video
CN110223380B (zh) * 2019-06-11 2021-04-23 中国科学院自动化研究所 Scene modeling method, system and apparatus fusing aerial and ground-view images
CN110996183B (zh) * 2019-07-12 2022-01-21 北京达佳互联信息技术有限公司 Video summary generation method, apparatus, terminal and storage medium
CN110650379B (zh) * 2019-09-26 2022-04-01 北京达佳互联信息技术有限公司 Video summary generation method and apparatus, electronic device and storage medium
CN112883782B (zh) * 2021-01-12 2023-03-24 上海肯汀通讯科技有限公司 Delivery behavior recognition method, apparatus, device and storage medium
CN112861852A (zh) * 2021-01-19 2021-05-28 北京金山云网络技术有限公司 Sample data screening method and apparatus, electronic device and storage medium
CN113038142B (zh) * 2021-03-25 2022-11-01 北京金山云网络技术有限公司 Video data screening method and apparatus, and electronic device
CN114786039B (zh) * 2022-04-25 2024-03-26 海信电子科技(武汉)有限公司 Server and method for producing a video preview image
CN116233569B (zh) * 2023-05-06 2023-07-11 石家庄铁道大学 A video summary generation method assisted by motion information


Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100468967B1 (ko) * 2001-12-28 2005-01-29 엘지전자 주식회사 Apparatus and method for generating thumbnail images
CN101404030B (zh) * 2008-11-05 2011-07-20 中国科学院计算技术研究所 Method and system for detecting periodic structural segments in a video
CN104063883B (zh) * 2014-07-07 2018-03-16 杭州银江智慧医疗集团有限公司 Surveillance video summary generation method based on a combination of objects and key frames

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20060023944A1 (en) * 2002-09-27 2006-02-02 Lionel Oisel Method and device for measuring similarity between images
CN102098449A (zh) * 2010-12-06 2011-06-15 北京邮电大学 Method for automatic internal segmentation of television programs using logo detection
CN103150373A (zh) * 2013-03-08 2013-06-12 北京理工大学 A method for generating high-satisfaction video summaries

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
., 30 April 2013 (2013-04-30), ISSN: 1674-3229 *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528025A (zh) * 2020-12-16 2021-03-19 平安科技(深圳)有限公司 Density-based text clustering method and apparatus, device and storage medium
CN112580563A (zh) * 2020-12-25 2021-03-30 北京百度网讯科技有限公司 Video summary generation method and apparatus, electronic device and storage medium
CN112580563B (zh) * 2020-12-25 2024-02-06 北京百度网讯科技有限公司 Video summary generation method and apparatus, electronic device and storage medium
CN112579823A (zh) * 2020-12-28 2021-03-30 山东师范大学 Video summary generation method and system based on feature fusion and incremental sliding window
CN112579823B (zh) * 2020-12-28 2022-06-24 山东师范大学 Video summary generation method and system based on feature fusion and incremental sliding window

Also Published As

Publication number Publication date
CN107223344A (zh) 2017-09-29

Similar Documents

Publication Publication Date Title
WO2018137126A1 (zh) Method and apparatus for generating a static video summary
US8467610B2 (en) Video summarization using sparse basis function combination
US9665775B2 (en) Identifying scene boundaries using group sparsity analysis
Picard et al. Improving image similarity with vectors of locally aggregated tensors
US9076043B2 (en) Video summarization using group sparsity analysis
Wei et al. Saliency inside: Learning attentive CNNs for content-based image retrieval
US8913835B2 (en) Identifying key frames using group sparsity analysis
Oneata et al. Efficient action localization with approximately normalized fisher vectors
CN106851437A (zh) A method for extracting a video summary
US8165983B2 (en) Method and apparatus for resource allocation among classifiers in classification systems
CN105389590B (zh) A video clustering recommendation method and apparatus
CN111460961A (zh) A static video summarization method based on CDVS similar-graph clustering
CN110381392B (zh) A video summary extraction method and system, apparatus and storage medium therefor
Zhang et al. Automatic discrimination of text and non-text natural images
Fu et al. Image aesthetics assessment using composite features from off-the-shelf deep models
CN110188625B (zh) A fine-grained video structuring method based on multi-feature fusion
CN107886109B (zh) A video summarization method based on supervised video segmentation
CN108966042B (zh) A video summary generation method and apparatus based on shortest paths
Blažek et al. Video retrieval with feature signature sketches
CN111414958B (zh) A multi-feature image classification method and system based on a visual bag-of-words pyramid
Hao et al. Improvement of word bag model based on image classification
Himeur et al. A fast and robust key-frames based video copy detection using BSIF-RMI
Zhong et al. Prediction system for activity recognition with compressed video
Foroughi et al. Joint Feature Selection with Low-rank Dictionary Learning.
Zhang et al. You Talkin'to Me? Recognizing Complex Human Interactions in Unconstrained Videos

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17893944

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 05.11.2019)

122 Ep: pct application non-entry in european phase

Ref document number: 17893944

Country of ref document: EP

Kind code of ref document: A1