WO2020029883A1 - Video fingerprint generation method and apparatus - Google Patents

Video fingerprint generation method and apparatus Download PDF

Info

Publication number
WO2020029883A1
WO2020029883A1 PCT/CN2019/099051 CN2019099051W
Authority
WO
WIPO (PCT)
Prior art keywords
shot
time slice
sequence
difference
current
Prior art date
Application number
PCT/CN2019/099051
Other languages
English (en)
French (fr)
Inventor
陈长国
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司 filed Critical 阿里巴巴集团控股有限公司
Priority to EP19846876.1A priority Critical patent/EP3835974B1/en
Publication of WO2020029883A1 publication Critical patent/WO2020029883A1/zh
Priority to US17/170,447 priority patent/US11961299B2/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/70Information retrieval; Database structures therefor; File system structures therefor of video data
    • G06F16/71Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/46Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/48Matching video sequences
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/49Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes

Definitions

  • the present invention relates to the technical field of video processing, and in particular, to a method and a device for generating a video fingerprint.
  • video websites need to deduplicate video files uploaded by users, so that when displaying search results to users, deduplication processing can be performed on video files with the same content.
  • An existing method for determining duplicate video files is picture hashing: features are extracted from both the training images and the query images and then compressed by a hash function into coding sequences of a fixed length, a process called hash encoding. The Hamming distances between the resulting hash codes are then computed; candidate samples within a set Hamming distance threshold are re-ranked by Euclidean distance, and the retrieved images are finally returned. This method has certain drawbacks: when video content is tampered with, the generated hash codes differ greatly, so similar content cannot be retrieved. That is, in video file deduplication, pictures with the same content are not treated as duplicate video files, which leads to a large number of missed detections when long-video picture content has been cropped, rotated, and so on.
  • the application provides a method and a device for generating a video fingerprint, which can quickly and effectively solve the problem of detecting duplicate video content.
  • the present invention provides a video fingerprint generation method, including:
  • Video fingerprint information is obtained from the time slice sequence.
  • obtaining video fingerprint information from the time slice sequence includes:
  • obtaining video fingerprint information of the current shot includes:
  • the index item of the time slice element of the current shot, the video sequence number of the video, and the shot sequence number of the time slice of the shot are used as the fingerprint information of the current shot.
  • performing shot boundary detection on video content includes:
  • the image corresponding to the current window is the position of the shot boundary.
  • calculating the quantized difference between the time slice element of the current shot, the time slice element of the previous shot and the time slice element of the subsequent shot in the time slice sequence respectively includes:
  • floor () means round down
  • n is a positive integer
  • the manner of obtaining the shot serial number of the time slice of the shot includes:
  • the shot sequence number of the current shot's time slice is determined according to the order of the current shot's duration in the shot boundary time slice sequence.
  • the present invention provides a video fingerprint generation device, including:
  • Boundary detection module configured to perform shot boundary detection on video content
  • the time calculation module is configured to determine the duration of each shot according to the position point of the shot boundary, and form the shot boundary time slice sequence by using the duration of each shot;
  • the fingerprint information module is configured to obtain video fingerprint information from the time slice sequence.
  • the fingerprint information module obtaining video fingerprint information from the time slice sequence includes:
  • the fingerprint information module includes:
  • a quantization difference unit configured to calculate a quantization difference between a time slice element of a current shot and a time slice element of a previous shot and a time slice element of a subsequent shot in the time slice sequence
  • a first-level index unit configured to use the two quantized difference values corresponding to the time slice element of the current shot as the index item of the element
  • the secondary index unit is configured to use the index item of the time slice element of the shot, the video serial number of the video, and the shot serial number of the time slice of the shot as the fingerprint information of the current shot.
  • the boundary detection module performs shot boundary detection on video content includes:
  • the image corresponding to the current window is the position of the shot boundary.
  • the quantization difference unit respectively calculates the quantization difference between the time slice element of the current shot and the time slice element of the previous shot and the time slice element of the next shot in the time slice sequence, including:
  • floor () means round down
  • n is a positive integer
  • the manner in which the secondary index unit obtains the shot serial number of the time slice of the shot includes:
  • the shot sequence number of the current shot's time slice is determined according to the order of the duration of the current shot in the shot boundary time slice sequence.
  • the duration of each shot is used as the basis, a normalized difference quantization formula is applied, and the resulting quantized difference sequence consists of integers; a two-level inverted index structure is then used.
  • the generated video fingerprint information codeword is short and is highly resistant to cropping and rotation. It is also very robust against other common video-editing attacks.
  • the shot detection technique used in this solution has a direct impact on the final result, but the solution still tolerates shot detection errors to a considerable degree.
  • FIG. 1 is a flowchart of a video fingerprint generation method according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of a video fingerprint generation process according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a shot boundary time slice sequence according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of a normalized difference quantization process according to an embodiment of the present invention.
  • FIG. 5 is a flowchart of a video fingerprint retrieval method according to an embodiment of the present invention.
  • FIG. 6 is a schematic structural diagram of a video fingerprint generation device according to an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a fingerprint information module according to an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of a video fingerprint retrieval device according to an embodiment of the present invention.
  • a device that performs video fingerprint generation and retrieval may include one or more processors (CPUs), input / output interfaces, network interfaces, and memory.
  • Memory may include non-persistent memory, random access memory (RAM), and / or non-volatile memory in computer-readable media, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium. Memory may include one or more modules.
  • Computer-readable media includes permanent and non-permanent, removable and non-removable storage media, and information storage can be accomplished by any method or technology.
  • Information may be computer-readable instructions, data structures, modules of a program, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that may be used to store information accessible by a computing device.
  • an embodiment of the present invention provides a video fingerprint generation method, including:
  • S102 Determine the duration of each shot according to the position point of the shot boundary, and compose the duration of each shot into a shot boundary time slice sequence.
  • the embodiment of the present invention detects the positions of scene changes by performing shot boundary detection.
  • the above assumption is premised on accurately finding the positions of the scene changes of a film.
  • shot boundary detection is generally not greatly affected by cropping or rotation of the image content, which means that the method of the embodiment of the present invention is highly robust to cropping and rotation.
  • in step S103, obtaining video fingerprint information from the time slice sequence includes:
  • in step S101, performing shot boundary detection on video content includes:
  • Shot boundary detection is performed on the video content based on a cumulative histogram.
  • performing shot boundary detection on the video content in a manner based on a cumulative histogram includes:
  • the video content is detected, and the obtained shot boundary positions are indicated by the pulse arrows on the coordinate axis of FIG. 3, while the duration of each shot is indicated by the double-headed arrows below the coordinate axis of FIG. 3.
  • the purpose of shot detection is to determine, for the input video content, when the shot changes.
  • in step S103, obtaining video fingerprint information of the current shot according to the difference between the time slice element of the current shot and the time slice element of an adjacent shot in the time slice sequence includes:
  • in step S1031 of this embodiment, calculating the quantized differences between the time slice element of the current shot and the time slice element of the previous shot and the time slice element of the next shot in the time slice sequence, respectively, includes:
  • floor () means round down
  • n is a positive integer
  • the method for obtaining the shot serial number of the time slice of the shot includes:
  • the shot sequence number of the current shot's time slice is determined according to the order of the duration of the current shot in the shot boundary time slice sequence.
  • a shot boundary detection method based on a cumulative histogram is adopted, and the specific process is as follows:
  • the frame image of the video content is normalized to a 256x256 grayscale picture
  • each pixel is quantized into 6 bits during the calculation of the histogram
  • this embodiment uses three consecutive time slice elements to generate a feature.
  • the three time slice elements are considered as a time window. Then, the time window slides forward. Two adjacent time windows overlap.
  • the approach adopted in this embodiment can cope with missed and falsely detected shots.
  • This embodiment uses the normalized difference quantization feature, and the specific calculation formula is as follows:
  • floor () means round down.
  • the output is a 6-bit unsigned integer.
  • time slice elements of three consecutive shots are calculated according to the above formula to obtain a 12-bit integer.
  • the 12-bit unsigned integer is the index item that is constructed.
  • the present invention provides a video fingerprint retrieval method, including:
  • the video fingerprint retrieval method further includes:
  • multiple 12-bit unsigned integers are generated as feature sequences according to the above process.
  • the fingerprint information of each shot is accompanied by the shot serial number of the shot and the corresponding video number.
  • a 12-bit integer can be used as an index into a hash table corresponding to 4096 hash values. Each feature is scattered into this hash table.
  • The specific memory structure is shown in Table 1 below:
  • the retrieval task can be completed quickly. That is, all video serial numbers and corresponding shot serial numbers associated with the current feature are first obtained through the 12-bit integer. If multiple features are generated by the same video, and the feature values of that video have already been stored in the database, their corresponding video serial numbers are identical and their shot serial numbers are increasing. Following this rule, the final desired result can be filtered out quickly.
  • this embodiment provides a video fingerprint generation device, including:
  • the boundary detection module 100 is configured to perform shot boundary detection on video content
  • the time calculation module 200 is configured to determine the duration of each shot according to the position point of the shot boundary, and form the shot boundary time slice sequence by using the duration of each shot;
  • the fingerprint information module 300 is configured to obtain video fingerprint information from the time slice sequence.
  • the fingerprint information module 300 obtaining video fingerprint information from the time slice sequence includes:
  • the boundary detection module 100 performs shot boundary detection on video content includes:
  • Shot boundary detection is performed on the video content based on a cumulative histogram.
  • the boundary detection module 100 performs shot boundary detection on the video content in a manner based on a cumulative histogram, including:
  • the image corresponding to the current window is the position of the shot boundary.
  • the fingerprint information module 300 includes:
  • a quantization difference unit configured to calculate a quantization difference between a time slice element of a current shot and a time slice element of a previous shot and a time slice element of a subsequent shot in the time slice sequence
  • a first-level index unit configured to use the two quantized difference values corresponding to the time slice element of the current shot as the index item of the element
  • the secondary index unit is configured to use the index item of the time slice element of the shot, the video serial number of the video, and the shot serial number of the time slice of the shot as the fingerprint information of the current shot.
  • this embodiment provides a video fingerprint retrieval device, including:
  • the obtaining module 400 is configured to obtain video fingerprint information of input video content
  • the deduplication module 500 is configured to compare the video fingerprint information of the input video content with the video fingerprint information of each video in the database. When the video fingerprint information of the input video content is identical to the video fingerprint information of a video in the database, the input video content is determined to be a duplicate video.
  • the above device further includes a marking module 600, configured to:

Abstract

This application proposes a video fingerprint generation method and apparatus. The method includes: performing shot boundary detection on video content; determining the duration of each shot according to the position points of the shot boundaries, and composing the durations of the shots into a shot boundary time slice sequence; and obtaining video fingerprint information from the time slice sequence.

Description

Video fingerprint generation method and apparatus
This application claims priority to Chinese Patent Application No. 201810905169.X, filed on August 9, 2018 and entitled "Video fingerprint generation method and apparatus", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the technical field of video processing, and in particular to a video fingerprint generation method and apparatus.
Background
On a website, different users may upload video files with the same content, and even the same user may upload video files with the same content multiple times, so the video files on the website suffer from a serious duplication problem. In practical applications, a video website needs to deduplicate the video files uploaded by users, so that video files with the same content can be deduplicated when search results are displayed to users.
One existing method for identifying duplicate video files is picture hashing: features are extracted from both the training images and the query images and then compressed by a hash function into coding sequences of a fixed length, a process called hash encoding. The Hamming distances between the resulting hash codes are then computed; the candidate samples within a preset Hamming distance threshold are re-ranked by Euclidean distance, and the retrieved images are finally returned. This method has certain drawbacks: when the video content has been tampered with, the generated hash codes differ greatly, so similar content cannot be retrieved. That is, in video file deduplication, pictures with the same content are not treated as duplicate video files, which causes a large number of missed detections when the picture content of a long video has been cropped, rotated, and so on.
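As a simple illustration of the matching step in this prior-art picture-hashing pipeline (not part of the claimed method), the Hamming distance between two equal-length hash codes can be computed as follows; the 64-bit code length and the example values are assumptions chosen only for this sketch.

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Number of differing bits between two equal-length hash codes."""
    return bin(hash_a ^ hash_b).count("1")

# Example: two hypothetical 64-bit perceptual hash codes.
h1 = 0x9F3A5C7E1B2D4F60
h2 = 0x9F3A5C7E1B2D0F61
print(hamming_distance(h1, h2))  # a small distance suggests visually similar images
```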
Summary of the Invention
This application provides a video fingerprint generation method and apparatus to quickly and effectively solve the problem of detecting duplicate video content.
The technical solutions adopted are as follows:
In a first aspect, the present invention provides a video fingerprint generation method, including:
performing shot boundary detection on video content;
determining the duration of each shot according to the position points of the shot boundaries, and composing the durations of the shots into a shot boundary time slice sequence;
obtaining video fingerprint information from the time slice sequence.
Preferably, obtaining the video fingerprint information from the time slice sequence includes:
obtaining video fingerprint information of the current shot according to the difference between the time slice element of the current shot and the time slice elements of its adjacent shots in the time slice sequence.
Preferably, obtaining the video fingerprint information of the current shot according to the difference between the time slice element of the current shot and the time slice elements of its adjacent shots in the time slice sequence includes:
calculating the quantized differences between the time slice element of the current shot in the time slice sequence and the time slice element of the previous shot and the time slice element of the next shot, respectively;
using the two quantized differences corresponding to the time slice element of the current shot as the index item of the element; and
using the index item of the time slice element of the current shot, the video serial number of the video, and the shot serial number of the time slice of the shot as the fingerprint information of the current shot.
Preferably, performing shot boundary detection on the video content includes:
processing each frame image of the video content to obtain a grayscale image of a preset size;
calculating the normalized histogram of each frame image;
calculating the normalized cumulative histogram of each frame image;
calculating the difference between the cumulative histograms of every two adjacent frame images to form a difference sequence;
smoothing the difference sequence with a window of a preset size;
calculating the standard deviation of the difference sequence within a time window of a preset length; and
if the gap between the value at the current window position and the standard deviation satisfies a preset condition, determining that the image corresponding to the current window position is the position of a shot boundary.
Preferably, calculating the quantized differences between the time slice element of the current shot in the time slice sequence and the time slice element of the previous shot and the time slice element of the next shot, respectively, includes:
calculating the quantized difference f(T_i, T_{i-1}) between the time slice element T_i of the current shot in the time slice sequence and the time slice element T_{i-1} of its previous shot with the following formula:
f(T_i, T_{i-1}) = floor( ( (T_i - T_{i-1}) / (T_i + T_{i-1}) + 1 ) / 2 × 2^n )
where floor() denotes rounding down, and n is a positive integer with 4 ≤ n ≤ 9.
Preferably, the manner of obtaining the shot serial number of the time slice of the shot includes:
determining the shot serial number of the time slice of the current shot according to the rank of the duration of the current shot in the shot boundary time slice sequence.
In a second aspect, the present invention provides a video fingerprint generation apparatus, including:
a boundary detection module, configured to perform shot boundary detection on video content;
a time calculation module, configured to determine the duration of each shot according to the position points of the shot boundaries, and compose the durations of the shots into a shot boundary time slice sequence; and
a fingerprint information module, configured to obtain video fingerprint information from the time slice sequence.
Preferably, the fingerprint information module obtaining video fingerprint information from the time slice sequence includes:
obtaining video fingerprint information of the current shot according to the difference between the time slice element of the current shot and the time slice elements of its adjacent shots in the time slice sequence.
Preferably, the fingerprint information module includes:
a quantized difference unit, configured to calculate the quantized differences between the time slice element of the current shot in the time slice sequence and the time slice element of the previous shot and the time slice element of the next shot, respectively;
a first-level index unit, configured to use the two quantized differences corresponding to the time slice element of the current shot as the index item of the element; and
a second-level index unit, configured to use the index item of the time slice element of the shot, the video serial number of the video, and the shot serial number of the time slice of the shot as the fingerprint information of the current shot.
Preferably, the boundary detection module performing shot boundary detection on the video content includes:
processing each frame image of the video content to obtain a grayscale image of a preset size;
calculating the normalized histogram of each frame image;
calculating the normalized cumulative histogram of each frame image;
calculating the difference between the cumulative histograms of every two adjacent frame images to form a difference sequence;
smoothing the difference sequence with a window of a preset size;
calculating the standard deviation of the difference sequence within a time window of a preset length; and
if the gap between the value at the current window position and the standard deviation satisfies a preset condition, determining that the image corresponding to the current window position is the position of a shot boundary.
Preferably, the quantized difference unit calculating the quantized differences between the time slice element of the current shot in the time slice sequence and the time slice element of the previous shot and the time slice element of the next shot, respectively, includes:
calculating the quantized difference f(T_i, T_{i-1}) between the time slice element T_i of the current shot in the time slice sequence and the time slice element T_{i-1} of its previous shot with the following formula:
f(T_i, T_{i-1}) = floor( ( (T_i - T_{i-1}) / (T_i + T_{i-1}) + 1 ) / 2 × 2^n )
where floor() denotes rounding down, and n is a positive integer with 4 ≤ n ≤ 9.
Preferably, the manner in which the second-level index unit obtains the shot serial number of the time slice of the shot includes:
determining the shot serial number of the time slice of the current shot according to the rank of the duration of the current shot in the shot boundary time slice sequence.
Compared with the prior art, this application has the following beneficial effects:
This application uses the durations of shots as the basis and applies a normalized difference quantization formula, so the resulting quantized difference sequence consists of integers; a two-level inverted index structure is then used. The generated video fingerprint information has short codewords and is highly resistant to cropping and rotation, and it is also very robust against other common video-editing attacks. The shot detection technique used in this solution has a direct influence on the final result, but the solution still tolerates shot detection errors to a considerable degree.
Brief Description of the Drawings
FIG. 1 is a flowchart of a video fingerprint generation method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a video fingerprint generation process according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a shot boundary time slice sequence according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a normalized difference quantization process according to an embodiment of the present invention;
FIG. 5 is a flowchart of a video fingerprint retrieval method according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a video fingerprint generation apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a fingerprint information module according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of a video fingerprint retrieval apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions of this application are described in more detail below with reference to the accompanying drawings and embodiments.
It should be noted that, provided there is no conflict, the embodiments of this application and the features in the embodiments may be combined with each other, and all such combinations fall within the protection scope of this application. In addition, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed in an order different from the one described here.
In one configuration, a device that performs video fingerprint generation and retrieval may include one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include non-persistent memory, random access memory (RAM), and/or non-volatile memory in a computer-readable medium, such as read-only memory (ROM) or flash memory (flash RAM). The memory is an example of a computer-readable medium. The memory may include one or more modules.
Computer-readable media include permanent and non-permanent, removable and non-removable storage media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
Embodiment 1
As shown in FIG. 1 and FIG. 2, an embodiment of the present invention provides a video fingerprint generation method, including:
S101: performing shot boundary detection on video content;
S102: determining the duration of each shot according to the position points of the shot boundaries, and composing the durations of the shots into a shot boundary time slice sequence;
S103: obtaining video fingerprint information from the time slice sequence.
A piece of video content that has gone through post-production editing contains many scene changes. These scene changes take many forms, including abrupt shot cuts as well as scene changes through special-effect transitions; common types include fade-in/fade-out, mosaic transitions, blind-style transitions, and so on. Different video content differs in the positions and frequency of its scene changes. If the time positions of all the scene changes of a piece of video content are found accurately, a time sequence is obtained; correspondingly, any such time sequence uniquely corresponds to one piece of video content.
The embodiment of the present invention performs shot boundary detection to detect the positions of scene changes. The above assumption is premised on accurately finding the positions of the scene changes of a film. Furthermore, shot boundary detection is generally not greatly affected by cropping or rotation of the image content, which means that the method of the embodiment of the present invention is highly robust to cropping and rotation.
In this embodiment, in step S103, obtaining the video fingerprint information from the time slice sequence includes:
obtaining video fingerprint information of the current shot according to the difference between the time slice element of the current shot and the time slice elements of its adjacent shots in the time slice sequence.
In this embodiment, in step S101, performing shot boundary detection on the video content includes:
performing shot boundary detection on the video content in a manner based on cumulative histograms.
Specifically, performing shot boundary detection on the video content in a manner based on cumulative histograms includes:
S1: processing each frame image of the video content to obtain a grayscale image of a preset size;
S2: calculating the normalized histogram of each frame image;
S3: calculating the normalized cumulative histogram of each frame image;
S4: calculating the difference between the cumulative histograms of every two adjacent frame images to form a difference sequence;
S5: smoothing the difference sequence with a window of a preset size;
S6: calculating the standard deviation of the difference sequence within a time window of a preset length;
S7: if the gap between the value at the current window position and the standard deviation satisfies a preset condition, determining that the image corresponding to the current window position is the position of a shot boundary.
As shown in FIG. 3, taking an input piece of video content as an example, the video content is detected; the obtained shot boundary positions are indicated by the pulse arrows on the coordinate axis in FIG. 3, and the duration of each shot is indicated by the double-headed arrows below the coordinate axis in FIG. 3. The purpose of shot detection is to determine, for the input video content, the time points at which the shot changes.
As shown in FIG. 4, in this embodiment, step S103 of obtaining the video fingerprint information of the current shot according to the difference between the time slice element of the current shot and the time slice elements of its adjacent shots in the time slice sequence includes:
S1031: calculating the quantized differences between the time slice element of the current shot in the time slice sequence and the time slice element of the previous shot and the time slice element of the next shot, respectively;
S1032: using the two quantized differences corresponding to the time slice element of the current shot as the index item of the element;
S1033: using the index item of the time slice element of the shot, the video serial number of the video, and the shot serial number of the time slice of the shot as the fingerprint information of the current shot.
In step S1031 of this embodiment, calculating the quantized differences between the time slice element of the current shot in the time slice sequence and the time slice element of the previous shot and the time slice element of the next shot, respectively, includes:
calculating the quantized difference f(T_i, T_{i-1}) between the time slice element T_i of the current shot in the time slice sequence and the time slice element T_{i-1} of its previous shot with the following formula:
f(T_i, T_{i-1}) = floor( ( (T_i - T_{i-1}) / (T_i + T_{i-1}) + 1 ) / 2 × 2^n )
where floor() denotes rounding down, and n is a positive integer with 4 ≤ n ≤ 9.
In this embodiment, the value of (T_i - T_{i-1}) / (T_i + T_{i-1}) lies in the range (-1, 1); after adding 1 it lies in (0, 2); after dividing by 2 it lies in (0, 1); and after multiplying by 2^n it lies in (0, 2^n), so it can be represented as an n-bit binary number. Preferably, n = 6.
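To make the quantization concrete, consider a worked example with assumed values (the shot durations below are illustrative and not taken from the embodiment): with n = 6, a previous shot of duration T_{i-1} = 4 seconds and a current shot of duration T_i = 6 seconds give
f(6, 4) = floor( ( (6 - 4) / (6 + 4) + 1 ) / 2 × 2^6 ) = floor( 1.2 / 2 × 64 ) = floor(38.4) = 38,
which fits in the 6 bits allotted to each quantized difference.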
The manner of obtaining the shot serial number of the time slice of the shot includes:
determining the shot serial number of the time slice of the current shot according to the rank of the duration of the current shot in the shot boundary time slice sequence.
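A minimal sketch of one way to assign such shot serial numbers is given below; the choice of ranking by descending duration and the tie-breaking by temporal order are assumptions, since the embodiment only states that the serial number follows the ordering of the durations in the shot boundary time slice sequence.

```python
def shot_serial_numbers(durations: list[float]) -> list[int]:
    """Assign each shot a serial number based on the rank of its duration.

    Assumption: rank 0 goes to the longest shot; ties keep temporal order.
    """
    order = sorted(range(len(durations)), key=lambda i: -durations[i])
    serial = [0] * len(durations)
    for rank, shot_idx in enumerate(order):
        serial[shot_idx] = rank
    return serial

# Example: durations (in seconds) of four consecutive shots.
print(shot_serial_numbers([2.0, 5.5, 3.1, 5.5]))  # -> [3, 0, 2, 1]
```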
Embodiment 2
This embodiment describes the shot boundary detection process for the picture frames:
This embodiment adopts a shot boundary detection method based on cumulative histograms, and the specific procedure is as follows:
1.1. Each frame image of the video content is normalized to a 256x256 grayscale picture.
1.2. The normalized histogram is calculated; during the histogram calculation each pixel is quantized to 6 bits.
1.3. The normalized cumulative histogram is calculated.
1.4. The difference between the cumulative histograms of two adjacent frames is calculated.
1.5. The difference sequence is smoothed with a Gaussian filter with a smoothing window size of 3.
1.6. Within a time window of 1 second, the standard deviation of the difference sequence is calculated; if a value in the sequence is more than 8 times the standard deviation, it is regarded as the position of a shot boundary.
1.7. The sliding window is moved backward by one frame position, and step 1.6 is executed again until the window reaches the final position.
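The following is a minimal Python sketch of steps 1.1 to 1.7, offered only as an illustration of the procedure described above; it assumes OpenCV and NumPy, and the function names, the L1 distance between cumulative histograms, the frame-rate parameter, and the choice of testing the centre value of each 1-second window are assumptions not specified by this embodiment.

```python
import cv2
import numpy as np

def cumulative_histogram(frame: np.ndarray) -> np.ndarray:
    """Normalized cumulative histogram of a 256x256 grayscale frame, pixels quantized to 6 bits (64 bins)."""
    gray = cv2.resize(cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY), (256, 256))
    hist, _ = np.histogram(gray >> 2, bins=64, range=(0, 64))  # 6-bit quantization per pixel
    return np.cumsum(hist / hist.sum())                        # normalized, then cumulative

def detect_shot_boundaries(video_path: str, fps: float = 25.0, k: float = 8.0) -> list[int]:
    """Return the frame indices regarded as shot boundary positions."""
    cap, hists, ok = cv2.VideoCapture(video_path), [], True
    while ok:
        ok, frame = cap.read()
        if ok:
            hists.append(cumulative_histogram(frame))
    cap.release()
    # Difference of cumulative histograms between adjacent frames (L1 distance is an assumption).
    diffs = np.array([np.abs(hists[i + 1] - hists[i]).sum() for i in range(len(hists) - 1)])
    diffs = np.convolve(diffs, cv2.getGaussianKernel(3, -1).ravel(), mode="same")  # Gaussian smoothing, window 3
    boundaries, win = [], int(fps)  # 1-second sliding window, moved one frame at a time
    for start in range(len(diffs) - win + 1):
        std = diffs[start:start + win].std()
        center = start + win // 2
        if std > 0 and diffs[center] > k * std:  # value exceeds 8x the window's standard deviation
            boundaries.append(center + 1)        # boundary between frame `center` and frame `center + 1`
    return boundaries
```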
Embodiment 3
This embodiment describes the process of normalized difference quantization calculation using the time slice sequence:
Because shot detection suffers from false detections and missed detections, the two time sequences obtained by shot detection of two videos cannot be compared directly. Moreover, in real scenarios, of two identical videos one may have been cropped on the time axis to only half its length, that is, they are not aligned on the time axis. Therefore, this embodiment uses three consecutive time slice elements to generate a feature. These three time slice elements are regarded as one time window; the time window then slides forward, and two adjacent time windows overlap. The approach adopted in this embodiment can cope with missed and falsely detected shots.
This embodiment uses a normalized difference quantization feature; the specific calculation formula is as follows:
For the time slice elements of any two adjacent shots,
f(T_i, T_{i-1}) = floor( ( (T_i - T_{i-1}) / (T_i + T_{i-1}) + 1 ) / 2 × 2^n )
where floor() denotes rounding down.
In this embodiment, the output is a 6-bit unsigned integer.
For the time slice elements of three consecutive adjacent shots, a 12-bit integer is obtained by calculation according to the above formula and used as the feature value; this 12-bit unsigned integer is the index item to be built.
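A minimal sketch of this quantization and of assembling the 12-bit index item is given below, assuming n = 6 and assuming that the 6-bit difference with the previous shot occupies the high bits and the 6-bit difference with the next shot the low bits (the bit order is an assumption; the embodiment only states that the two quantized differences form a 12-bit unsigned integer).

```python
import math

def quantized_difference(t_cur: float, t_prev: float, n: int = 6) -> int:
    """Normalized difference quantization f(T_i, T_{i-1}): an n-bit unsigned integer."""
    ratio = (t_cur - t_prev) / (t_cur + t_prev)      # lies in (-1, 1)
    return math.floor((ratio + 1) / 2 * (1 << n))    # scaled into [0, 2^n)

def index_item(t_prev: float, t_cur: float, t_next: float, n: int = 6) -> int:
    """12-bit index item for the current shot, built from its two quantized differences."""
    d_prev = quantized_difference(t_cur, t_prev, n)  # difference with the previous shot
    d_next = quantized_difference(t_cur, t_next, n)  # difference with the next shot
    return (d_prev << n) | d_next                    # assumed bit layout: previous-shot value in the high bits

# Example with assumed shot durations (seconds): previous 4.0, current 6.0, next 3.0.
print(index_item(4.0, 6.0, 3.0))  # 2474, a value in [0, 4096)
```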
Embodiment 4
As shown in FIG. 5, the present invention provides a video fingerprint retrieval method, including:
S201: obtaining video fingerprint information of input video content;
S202: comparing the video fingerprint information of the input video content with the video fingerprint information of each video in a database, and when the video fingerprint information of the input video content is identical to the video fingerprint information of a video in the database, determining the input video content to be a duplicate video.
In this embodiment, the above video fingerprint retrieval method further includes:
marking the input video content and the duplicate video in the database with the same marker, so that when videos are displayed to a user, one of the videos having the same marker is selected for display according to a preset selection requirement.
For the input video content, multiple 12-bit unsigned integers are generated as a feature sequence according to the above process. In the retrieval structure, the fingerprint information of each shot includes, in addition to the feature sequence, the shot serial number of that shot and the corresponding video number. The 12-bit integer can be used as the index of a hash table corresponding to 4096 hash values, and every feature is scattered into this hash table. The specific memory structure is shown in Table 1 below:
Table 1
[Table 1 is reproduced as an image in the original publication; it shows the inverted-index memory structure in which each 12-bit index value maps to the list of video serial numbers and shot serial numbers of the features hashed to it.]
With the above inverted hash structure, the retrieval task can be completed quickly: all video serial numbers and corresponding shot serial numbers associated with the current feature are first obtained through the 12-bit integer. If multiple features were generated by the same video, and the feature values of that video have already been stored in the database, then their corresponding video serial numbers are identical and their shot serial numbers are increasing. Following this rule, the final desired result can be filtered out quickly.
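A minimal sketch of such a two-level inverted index and of the retrieval rule described above is shown below; the class and method names are illustrative assumptions, the `min_hits` threshold is an assumed parameter, and `index_item` refers to the quantization sketch given earlier in this description.

```python
from collections import defaultdict

class FingerprintIndex:
    """Inverted index: 12-bit index item -> list of (video serial number, shot serial number)."""

    def __init__(self) -> None:
        self.table: dict[int, list[tuple[int, int]]] = defaultdict(list)  # up to 4096 buckets

    def add(self, feature: int, video_id: int, shot_id: int) -> None:
        self.table[feature].append((video_id, shot_id))

    def query(self, features: list[int], min_hits: int = 2) -> list[int]:
        """Return video serial numbers matched with identical video serial numbers and increasing shot serial numbers."""
        last_shot: dict[int, int] = {}          # video_id -> last matched shot serial number
        hits: dict[int, int] = defaultdict(int)
        for feature in features:                # query features in temporal order
            for video_id, shot_id in self.table.get(feature, []):
                if video_id not in last_shot or shot_id > last_shot[video_id]:
                    last_shot[video_id] = shot_id
                    hits[video_id] += 1
        return [vid for vid, count in hits.items() if count >= min_hits]

# Usage: index the shots of stored videos, then query with the features of the input video.
index = FingerprintIndex()
index.add(2474, video_id=7, shot_id=0)
index.add(1031, video_id=7, shot_id=1)
print(index.query([2474, 1031]))  # -> [7]
```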
Embodiment 5
As shown in FIG. 6, this embodiment provides a video fingerprint generation apparatus, including:
a boundary detection module 100, configured to perform shot boundary detection on video content;
a time calculation module 200, configured to determine the duration of each shot according to the position points of the shot boundaries, and compose the durations of the shots into a shot boundary time slice sequence; and
a fingerprint information module 300, configured to obtain video fingerprint information from the time slice sequence.
In this embodiment, the fingerprint information module 300 obtaining video fingerprint information from the time slice sequence includes:
obtaining video fingerprint information of the current shot according to the difference between the time slice element of the current shot and the time slice elements of its adjacent shots in the time slice sequence.
In this embodiment, the boundary detection module 100 performing shot boundary detection on the video content includes:
performing shot boundary detection on the video content in a manner based on cumulative histograms.
The boundary detection module 100 performing shot boundary detection on the video content in a manner based on cumulative histograms includes:
processing each frame image of the video content to obtain a grayscale image of a preset size;
calculating the normalized histogram of each frame image;
calculating the normalized cumulative histogram of each frame image;
calculating the difference between the cumulative histograms of every two adjacent frame images to form a difference sequence;
smoothing the difference sequence with a window of a preset size;
calculating the standard deviation of the difference sequence within a time window of a preset length;
if the gap between the value at the current window position and the standard deviation satisfies a preset condition, determining that the image corresponding to the current window position is the position of a shot boundary.
As shown in FIG. 7, the fingerprint information module 300 includes:
a quantized difference unit, configured to calculate the quantized differences between the time slice element of the current shot in the time slice sequence and the time slice element of the previous shot and the time slice element of the next shot, respectively;
a first-level index unit, configured to use the two quantized differences corresponding to the time slice element of the current shot as the index item of the element; and
a second-level index unit, configured to use the index item of the time slice element of the shot, the video serial number of the video, and the shot serial number of the time slice of the shot as the fingerprint information of the current shot.
Embodiment 6
As shown in FIG. 8, this embodiment provides a video fingerprint retrieval apparatus, including:
an obtaining module 400, configured to obtain video fingerprint information of input video content; and
a deduplication module 500, configured to compare the video fingerprint information of the input video content with the video fingerprint information of each video in a database, and when the video fingerprint information of the input video content is identical to the video fingerprint information of a video in the database, determine the input video content to be a duplicate video.
In this embodiment, the above apparatus further includes a marking module 600, configured to:
mark the input video content and the duplicate video in the database with the same marker, so that when videos are displayed to a user, one of the videos having the same marker is selected for display according to a preset selection requirement.
Although the embodiments disclosed in the present invention are described above, the content is only an implementation adopted to facilitate understanding of the technical solutions of the present invention and is not intended to limit the present invention. Any person skilled in the art to which the present invention belongs may make modifications and changes in the form and details of implementation without departing from the core technical solutions disclosed by the present invention, but the protection scope defined by the present invention shall still be subject to the scope defined by the appended claims.

Claims (12)

  1. A video fingerprint generation method, characterized by comprising:
    performing shot boundary detection on video content;
    determining the duration of each shot according to the position points of the shot boundaries, and composing the durations of the shots into a shot boundary time slice sequence; and
    obtaining video fingerprint information from the time slice sequence.
  2. The method according to claim 1, wherein obtaining the video fingerprint information from the time slice sequence comprises:
    obtaining video fingerprint information of a current shot according to the difference between the time slice element of the current shot and the time slice elements of its adjacent shots in the time slice sequence.
  3. The method according to claim 2, wherein obtaining the video fingerprint information of the current shot according to the difference between the time slice element of the current shot and the time slice elements of its adjacent shots in the time slice sequence comprises:
    calculating quantized differences between the time slice element of the current shot in the time slice sequence and the time slice element of the previous shot and the time slice element of the next shot, respectively;
    using the two quantized differences corresponding to the time slice element of the current shot as the index item of the element; and
    using the index item of the time slice element of the current shot, the video serial number of the video, and the shot serial number of the time slice of the shot as the fingerprint information of the current shot.
  4. The method according to claim 1, wherein performing shot boundary detection on the video content comprises:
    processing each frame image of the video content to obtain a grayscale image of a preset size;
    calculating the normalized histogram of each frame image;
    calculating the normalized cumulative histogram of each frame image;
    calculating the difference between the cumulative histograms of every two adjacent frame images to form a difference sequence;
    smoothing the difference sequence with a window of a preset size;
    calculating the standard deviation of the difference sequence within a time window of a preset length; and
    if the gap between the value at the current window position and the standard deviation satisfies a preset condition, determining that the image corresponding to the current window position is the position of a shot boundary.
  5. The method according to claim 3, wherein calculating the quantized differences between the time slice element of the current shot in the time slice sequence and the time slice element of the previous shot and the time slice element of the next shot, respectively, comprises:
    calculating the quantized difference f(T_i, T_{i-1}) between the time slice element T_i of the current shot in the time slice sequence and the time slice element T_{i-1} of its previous shot with the following formula:
    f(T_i, T_{i-1}) = floor( ( (T_i - T_{i-1}) / (T_i + T_{i-1}) + 1 ) / 2 × 2^n )
    where floor() denotes rounding down, and n is a positive integer with 4 ≤ n ≤ 9.
  6. The method according to claim 3, wherein the manner of obtaining the shot serial number of the time slice of the shot comprises:
    determining the shot serial number of the time slice of the current shot according to the rank of the duration of the current shot in the shot boundary time slice sequence.
  7. A video fingerprint generation apparatus, characterized by comprising:
    a boundary detection module, configured to perform shot boundary detection on video content;
    a time calculation module, configured to determine the duration of each shot according to the position points of the shot boundaries, and compose the durations of the shots into a shot boundary time slice sequence; and
    a fingerprint information module, configured to obtain video fingerprint information from the time slice sequence.
  8. The apparatus according to claim 7, wherein the fingerprint information module obtaining video fingerprint information from the time slice sequence comprises:
    obtaining video fingerprint information of a current shot according to the difference between the time slice element of the current shot and the time slice elements of its adjacent shots in the time slice sequence.
  9. The apparatus according to claim 8, wherein the fingerprint information module comprises:
    a quantized difference unit, configured to calculate quantized differences between the time slice element of the current shot in the time slice sequence and the time slice element of the previous shot and the time slice element of the next shot, respectively;
    a first-level index unit, configured to use the two quantized differences corresponding to the time slice element of the current shot as the index item of the element; and
    a second-level index unit, configured to use the index item of the time slice element of the shot, the video serial number of the video, and the shot serial number of the time slice of the shot as the fingerprint information of the current shot.
  10. The apparatus according to claim 7, wherein the boundary detection module performing shot boundary detection on the video content comprises:
    processing each frame image of the video content to obtain a grayscale image of a preset size;
    calculating the normalized histogram of each frame image;
    calculating the normalized cumulative histogram of each frame image;
    calculating the difference between the cumulative histograms of every two adjacent frame images to form a difference sequence;
    smoothing the difference sequence with a window of a preset size;
    calculating the standard deviation of the difference sequence within a time window of a preset length; and
    if the gap between the value at the current window position and the standard deviation satisfies a preset condition, determining that the image corresponding to the current window position is the position of a shot boundary.
  11. The apparatus according to claim 9, wherein the quantized difference unit calculating the quantized differences between the time slice element of the current shot in the time slice sequence and the time slice element of the previous shot and the time slice element of the next shot, respectively, comprises:
    calculating the quantized difference f(T_i, T_{i-1}) between the time slice element T_i of the current shot in the time slice sequence and the time slice element T_{i-1} of its previous shot with the following formula:
    f(T_i, T_{i-1}) = floor( ( (T_i - T_{i-1}) / (T_i + T_{i-1}) + 1 ) / 2 × 2^n )
    where floor() denotes rounding down, and n is a positive integer with 4 ≤ n ≤ 9.
  12. The apparatus according to claim 9, wherein the manner in which the second-level index unit obtains the shot serial number of the time slice of the shot comprises:
    determining the shot serial number of the time slice of the current shot according to the rank of the duration of the current shot in the shot boundary time slice sequence.
PCT/CN2019/099051 2018-08-09 2019-08-02 Video fingerprint generation method and apparatus WO2020029883A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP19846876.1A EP3835974B1 (en) 2018-08-09 2019-08-02 Method and device for generating video fingerprint
US17/170,447 US11961299B2 (en) 2018-08-09 2021-02-08 Method and apparatus for generating video fingerprint

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810905169.X 2018-08-09
CN201810905169.XA CN110826365B (zh) 2018-08-09 Video fingerprint generation method and apparatus

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/170,447 Continuation US11961299B2 (en) 2018-08-09 2021-02-08 Method and apparatus for generating video fingerprint

Publications (1)

Publication Number Publication Date
WO2020029883A1 true WO2020029883A1 (zh) 2020-02-13

Family

ID=69413380

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/099051 WO2020029883A1 (zh) 2018-08-09 2019-08-02 Video fingerprint generation method and apparatus

Country Status (4)

Country Link
US (1) US11961299B2 (zh)
EP (1) EP3835974B1 (zh)
CN (1) CN110826365B (zh)
WO (1) WO2020029883A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113190713A (zh) * 2021-05-06 2021-07-30 百度在线网络技术(北京)有限公司 视频搜索方法及装置、电子设备和介质
CN113139094B (zh) * 2021-05-06 2023-11-07 北京百度网讯科技有限公司 视频搜索方法及装置、电子设备和介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102073864A (zh) * 2010-12-01 2011-05-25 北京邮电大学 四层结构的体育视频中足球项目检测系统及实现
CN102750339A (zh) * 2012-06-05 2012-10-24 北京交通大学 一种基于视频重构的重复片段定位方法
CN104867161A (zh) * 2015-05-14 2015-08-26 国家电网公司 一种视频处理方法及装置
CN108010044A (zh) * 2016-10-28 2018-05-08 央视国际网络无锡有限公司 一种视频边界检测的方法
WO2018102014A1 (en) * 2016-11-30 2018-06-07 Google Inc. Determination of similarity between videos using shot duration correlation

Family Cites Families (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8600113B2 (en) 2004-11-12 2013-12-03 The University Court Of The University Of St. Andrews System, method and computer program product for video fingerprinting
WO2007080133A2 (en) * 2006-01-16 2007-07-19 Thomson Licensing Method for determining and fingerprinting a key frame of a video sequence
US8009861B2 (en) 2006-04-28 2011-08-30 Vobile, Inc. Method and system for fingerprinting digital video object based on multiresolution, multirate spatial and temporal signatures
WO2007148264A1 (en) 2006-06-20 2007-12-27 Koninklijke Philips Electronics N.V. Generating fingerprints of video signals
DE102007013811A1 (de) * 2007-03-22 2008-09-25 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Verfahren zur zeitlichen Segmentierung eines Videos in Videobildfolgen und zur Auswahl von Keyframes für das Auffinden von Bildinhalten unter Einbeziehung einer Subshot-Detektion
US8094872B1 (en) 2007-05-09 2012-01-10 Google Inc. Three-dimensional wavelet based video fingerprinting
US8611422B1 (en) * 2007-06-19 2013-12-17 Google Inc. Endpoint based video fingerprinting
EP2193420A4 (en) 2007-07-27 2010-10-06 Synergy Sports Technology Llc SYSTEM AND METHOD FOR USING A WEBSITE CONTAINING VIDEOS READING LISTS AS INTRODUCED ON A DOWNLOAD MANAGER
US9177209B2 (en) 2007-12-17 2015-11-03 Sinoeast Concept Limited Temporal segment based extraction and robust matching of video fingerprints
US8259177B2 (en) 2008-06-30 2012-09-04 Cisco Technology, Inc. Video fingerprint systems and methods
US8498487B2 (en) 2008-08-20 2013-07-30 Sri International Content-based matching of videos using local spatio-temporal fingerprints
US8422731B2 (en) * 2008-09-10 2013-04-16 Yahoo! Inc. System, method, and apparatus for video fingerprinting
CN101620629A (zh) * 2009-06-09 2010-01-06 中兴通讯股份有限公司 一种提取视频索引的方法、装置及视频下载系统
US8345990B2 (en) * 2009-08-03 2013-01-01 Indian Institute Of Technology Bombay System for creating a capsule representation of an instructional video
US8229219B1 (en) 2009-08-06 2012-07-24 Google Inc. Full-length video fingerprinting
EP2437498A1 (en) 2010-09-30 2012-04-04 British Telecommunications Public Limited Company Digital video fingerprinting
CN102685398B (zh) * 2011-09-06 2014-08-13 天脉聚源(北京)传媒科技有限公司 一种新闻视频场景生成方法
US8538239B2 (en) * 2011-12-22 2013-09-17 Broadcom Corporation System and method for fingerprinting video
US8989376B2 (en) 2012-03-29 2015-03-24 Alcatel Lucent Method and apparatus for authenticating video content
US8818037B2 (en) * 2012-10-01 2014-08-26 Microsoft Corporation Video scene detection
US9514502B2 (en) * 2015-01-21 2016-12-06 Interra Systems Inc. Methods and systems for detecting shot boundaries for fingerprint generation of a video
US10313710B1 (en) * 2017-07-31 2019-06-04 Amazon Technologies, Inc. Synchronizing encoding between encoders

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102073864A (zh) * 2010-12-01 2011-05-25 北京邮电大学 四层结构的体育视频中足球项目检测系统及实现
CN102750339A (zh) * 2012-06-05 2012-10-24 北京交通大学 一种基于视频重构的重复片段定位方法
CN104867161A (zh) * 2015-05-14 2015-08-26 国家电网公司 一种视频处理方法及装置
CN108010044A (zh) * 2016-10-28 2018-05-08 央视国际网络无锡有限公司 一种视频边界检测的方法
WO2018102014A1 (en) * 2016-11-30 2018-06-07 Google Inc. Determination of similarity between videos using shot duration correlation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3835974A4

Also Published As

Publication number Publication date
CN110826365B (zh) 2023-06-23
US11961299B2 (en) 2024-04-16
EP3835974A1 (en) 2021-06-16
CN110826365A (zh) 2020-02-21
EP3835974A4 (en) 2022-05-04
US20210166036A1 (en) 2021-06-03
EP3835974B1 (en) 2024-01-24

Similar Documents

Publication Publication Date Title
US20210374386A1 (en) Entity recognition from an image
WO2020211624A1 (zh) 对象追踪方法、追踪处理方法、相应的装置、电子设备
EP1519343A2 (en) Method and apparatus for summarizing and indexing the contents of an audio-visual presentation
WO2022143688A1 (zh) 视频抽帧处理方法、装置、设备及介质
Dekel et al. Revealing and modifying non-local variations in a single image
WO2020029883A1 (zh) 一种视频指纹生成方法和装置
CN110009662B (zh) 人脸跟踪的方法、装置、电子设备及计算机可读存储介质
CN109409321B (zh) 一种镜头运动方式的确定方法及装置
CN111666442B (zh) 一种图像检索方法、装置及计算机设备
US20130343618A1 (en) Searching for Events by Attendants
WO2021175040A1 (zh) 视频处理方法及相关装置
US20190311744A1 (en) Comparing frame data to generate a textless version of a multimedia production
US10924637B2 (en) Playback method, playback device and computer-readable storage medium
CN110826461A (zh) 视频内容识别方法、装置、电子设备及存储介质
CN112446361A (zh) 一种训练数据的清洗方法及设备
CN113269854B (zh) 一种智能生成访谈类综艺节目的方法
Ghosal et al. A geometry-sensitive approach for photographic style classification
CN114372169A (zh) 一种同源视频检索的方法、装置以及存储介质
CN106203244B (zh) 一种镜头类型的确定方法及装置
CN114969089A (zh) 数据获得方法、装置、电子设备及存储介质
Bhaumik et al. Real-time storyboard generation in videos using a probability distribution based threshold
Shah et al. Video to text summarisation and timestamp generation to detect important events
CN111008301B (zh) 一种以图搜视频的方法
Fu et al. A novel shot boundary detection technique for illumination and motion effects
KR20190060027A (ko) 주요 등장인물의 감성에 기반한 비디오 자동 편집 방법 및 장치

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19846876

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2019846876

Country of ref document: EP

Effective date: 20210309