WO2017114211A1 - Method and device for detecting video scene switching - Google Patents

Method and device for detecting video scene switching Download PDF

Info

Publication number
WO2017114211A1
WO2017114211A1 PCT/CN2016/110717
Authority
WO
WIPO (PCT)
Prior art keywords
video
detection
frames
switching
frame
Prior art date
Application number
PCT/CN2016/110717
Other languages
English (en)
French (fr)
Inventor
谢雨来
张杨
Original Assignee
Hitachi, Ltd.
谢雨来
张杨
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi, Ltd., 谢雨来, 张杨 filed Critical Hitachi, Ltd.
Publication of WO2017114211A1 publication Critical patent/WO2017114211A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/142Detection of scene cut or scene change
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/85Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression
    • H04N19/87Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using pre-processing or post-processing specially adapted for video compression involving scene cut or scene change detection in combination with video compression
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/83Generation or processing of protective or descriptive data associated with content; Content structuring
    • H04N21/845Structuring of content, e.g. decomposing content into time segments
    • H04N21/8456Structuring of content, e.g. decomposing content into time segments by decomposing the content in the time domain, e.g. in time segments
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N5/00Details of television systems
    • H04N5/14Picture signal circuitry for video frequency region

Definitions

  • the invention relates to a method and a device for detecting video scene switching, which can detect the switching frames of video scenes by using the matching relationship between the feature points of frames, obtain an index frame of each video scene according to the detection result, and perform coarse detection and fine detection simultaneously by analyzing the degree of change of the video scene, thereby effectively detecting the switching frame in both gradual and abrupt scene switching.
  • video segmentation technology that can segment video content according to different video scenes is receiving more and more attention from the media industry.
  • the video scene segmentation technology can effectively improve the efficiency of video storage, management and search, and is being used more widely.
  • the so-called video scene usually refers to the video content captured in a single continuous camera shot, which is coherent. Therefore, it is often possible to use the video content of one frame in a video scene as a representative of the entire video scene.
  • a frame that is representative of the video scene is usually referred to as an index frame.
  • the so-called video scene switching refers to the process of switching video content from one video scene to another. Video content inconsistency often occurs during video scene switching. Therefore, there is a need to segment different video scenes of video content.
  • the splitting of the video scene refers to dividing the video content into video clips of multiple single video scenes according to different video scenes according to the switching position of the video scene.
  • the significance of segmenting a video scene is that the management of video segments of a single video scene is more efficient than the management of the overall video. For example, video clips of the same type of scene can be centrally managed, so when a certain type of video clip is needed, the search can be performed directly in that type's video scene library, avoiding a search of the larger and more complex overall video library.
  • video clips of a single video scene can often be represented by the content of one of the frames.
  • this frame is referred to as an index frame (or representative frame).
  • using the index frame for indexing can effectively manage the video clip of the video scene and quickly retrieve the desired video scene.
  • for example, in an overall video content of 10000 frames and 100 scenes, video segmentation technology can be used to acquire 100 single-scene video segments; selecting one frame from each segment as its index frame yields a total of 100 index frames.
  • then, in a search for a video scene, only these 100 index frames need to be retrieved instead of the overall 10000 frames, greatly improving the efficiency.
  • the key to segmenting a video scene is to find the switching position of the video scene (eg, switching frames).
  • Switching of video scenes is usually divided into two categories: mutation switching and gradual switching.
  • the mutation switching means that the switching position of the video scene lies between two adjacent frames, while the gradual switching means that the video scene switches progressively across more than two frames.
  • one of the objectives of the present invention is to provide a method and apparatus for detecting video scene switching, which can detect the switching frames of video scenes by using the matching relationship of inter-frame feature points, and can perform coarse detection and fine detection simultaneously by analyzing the degree of change of the video scene, so that the switching frame is effectively detected in both gradual and abrupt scene switching.
  • a method for detecting a video scene switching is proposed, comprising: an extracting step of extracting each video frame from the video content of a video clip including a plurality of video scenes; and a detecting step of determining two video frames separated by a prescribed interval as detection frames, and detecting the switching frame at the switching position of the video scene by using the feature point matching rate between the two detection frames.
  • the detecting step comprises: performing fine detection using two adjacent video frames as detection frames; and performing coarse detection using two non-adjacent video frames separated by a prescribed interval as detection frames.
  • the detecting step further comprises: combining the switching frames detected by the fine detection and the switching frames detected by the coarse detection as the final switching frames.
  • the prescribed interval employed in the coarse detection is determined based on a statistical value of the degree of change between adjacent frames of the video content.
  • the feature point matching rate is represented by a ratio of matching feature points between two detection frames to a total number of feature points in the second detection frame.
  • the second detection frame where the feature point matching rate is zero is determined as the switching frame.
  • the method of the present invention further comprises determining, for each video scene, an index frame that is representative of a video segment of the video scene.
  • an apparatus for detecting a video scene switch comprising: an extracting unit that extracts each video frame from video content of a video clip including a plurality of video scenes; and a detecting unit that Two video frames of a prescribed interval are determined as detection frames, and a switching frame at a switching position of the video scene is detected using a feature point matching ratio between the two detection frames.
  • the present invention can automatically detect video scene switching, and can effectively detect the gradual switching and the mutation switching by using the analysis of the degree of change of the video scene and the matching relationship between the feature points of the frame.
  • FIG. 1 is a schematic diagram of a video retrieval system for explaining video segmentation of video content into a single video scene and storing and managing the index frame.
  • FIG. 2 is a schematic diagram showing a piece of video content having different video scenes.
  • FIG. 3 is a schematic diagram showing an application scenario of a method of detecting a video scene switching according to the present invention.
  • FIG. 4 is a schematic diagram showing the acquisition of an index frame for each video scene in accordance with the present invention.
  • FIG. 5 is a diagram showing a feature point matching relationship between two frames utilized in a method of detecting video scene switching according to the present invention.
  • FIG. 6 is a schematic diagram showing switching of two types of video scenes, abrupt switching and gradual switching.
  • FIG. 7 is a schematic diagram showing fine detection and coarse detection utilized in a method of detecting video scene switching according to the present invention.
  • FIG. 8 is a schematic diagram showing the result of the detection of the video scene switching position detected by the fine detection and the video scene switching position detected by the coarse detection as the final detection result.
  • FIG. 9 is a flow chart showing a method for detecting a video scene switch in accordance with the present invention.
  • FIG. 10 is a schematic diagram showing two different application scenarios of a method for detecting video scene switching in accordance with the present invention.
  • FIG. 1 is a schematic diagram of a video retrieval system for explaining video segmentation of video content into a single video scene and storing and managing the index frame.
  • the left side of FIG. 1 shows several different video contents.
  • each video content includes video segments of a plurality of single video scenes.
  • a video segment of each single video scene may be represented by a frame that is representative of the video scene, that is, an index frame, as shown in the middle of FIG. 1.
  • there are many known methods for determining an index frame for each video scene; for example, the middle frame of a video scene, or its first or last frame, may be used as the index frame, which is not described in detail herein.
  • the index frames representing the video scenes are stored in a database, as shown on the right side of FIG. 1. In this way, the retrieval of the video scene can be performed using the database.
  • FIG. 2 is a schematic diagram showing a piece of video content having different video scenes.
  • video content is composed of video segments of a plurality of single video scenes.
  • the so-called video scene generally refers to the video content captured in a single continuous camera shot, which is coherent. Since the management of a video segment of a single video scene is more efficient than the management of the overall video, it is necessary to segment the video content into video segments of a plurality of single video scenes.
  • the method for detecting video scene switching according to the present invention described below can effectively segment the video scenes to determine an index frame of each video scene.
  • FIG. 3 is a schematic diagram showing an application scenario of a method of detecting a video scene switch according to the present invention.
  • one video content can be segmented into video segments of a plurality of single video scenes.
  • the existing method can be used to determine its index frame.
  • the index frames of the video content are then stored in a database.
  • retrieval of a scene such as a video can be accomplished by retrieving the stored index frames in a database.
  • FIG. 4 is a schematic diagram showing the acquisition of an index frame for each video scene in accordance with the present invention.
  • the method of detecting the video scene switching may be used to first determine the switching frames of the video scenes constituting the video content, that is, the frames at the positions where switching between the video scenes occurs. Then, between two adjacent switching frames, that is, for each video scene, the index frame is acquired by a known method. In the example shown in FIG. 4, a total of seven different scenes are shown. For each video scene, a corresponding index frame is obtained as a representative of the video scene, for example, index frames 1-7 shown in FIG. 4.
  • FIG. 5 is a diagram showing a feature point matching relationship between two frames utilized in a method of detecting video scene switching according to the present invention.
  • a detection frame is a frame used for detection in a video, generally one of two adjacent frames or of two frames separated by a certain interval.
  • the feature points are points, acquired by some existing feature point extraction algorithms, on the different objects contained in the image of each detection frame.
  • the so-called feature points are pixels with certain characteristics extracted from the image by algorithms, such as corners or intersections at image edges, or pixels having particular statistical properties within a neighborhood.
  • a feature point has a multi-dimensional feature vector that characterizes the nature of the feature.
  • Feature point extraction algorithms include algorithms such as SIFT or SURF. The extraction of feature points for each frame image is already a well-known technique and therefore will not be described in detail herein.
  • the matching relationship of feature points between two detected frames is utilized.
  • the so-called feature point matching refers to: calculating the Euclidean distance between the feature vectors of the two feature points and comparing it with a certain threshold value; if the distance is less than the threshold value, the two feature points match, and otherwise they do not.
  • the feature points extracted on the two detection frames are used to perform matching operations on the feature points, thereby determining the matching ratio of the feature points between the two detection frames.
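As a concrete illustration, the matching test and matching rate described above can be sketched in Python (a minimal sketch, not the patent's implementation: descriptors are plain NumPy arrays such as those produced by SIFT or SURF, and the 0.6 distance threshold is an assumed value, since the source does not fix one):

```python
import numpy as np

def match_rate(desc_a, desc_b, threshold=0.6):
    """Feature point matching rate between two detection frames.

    desc_a, desc_b: (n, d) arrays of feature descriptors (e.g. from SIFT
    or SURF).  Two feature points match when the Euclidean distance
    between their descriptors is below `threshold` (an assumed value).
    The rate is the number of matched points in the second frame divided
    by the total number of feature points in the second frame.
    """
    if len(desc_b) == 0:
        return 0.0
    # Pairwise Euclidean distances, shape (n_b, n_a).
    dists = np.linalg.norm(desc_b[:, None, :] - desc_a[None, :, :], axis=2)
    # A point in the second frame matches if its nearest neighbor in the
    # first frame is closer than the threshold.
    matched = (dists.min(axis=1) < threshold).sum()
    return matched / len(desc_b)
```

A rate of 0 would then indicate no shared image content between the two detection frames, i.e. a candidate scene switch.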
  • the second detection frame where the feature point matching rate is zero is determined as the switching frame, that is, the switching position of the video scene.
  • FIG. 6 is a schematic diagram showing switching of two types of video scenes, abrupt switching and gradual switching.
  • the so-called mutation switching means that the switching position of the video scene lies between two adjacent frames, and the switching of the video scene is relatively sharp.
  • in the example shown in the upper part of FIG. 6, the switching of the video scene occurs rapidly between two frames.
  • the gradual switching means that the switching of the video scene proceeds progressively across more than two frames.
  • in the example shown in the lower part of FIG. 6, the switching of the video scene occurs progressively between 5 frames. Since the detection of video scene switching in the prior art is performed between two immediately adjacent detection frames (see the fine detection described below), the switching position of a gradual switch cannot be detected by the prior-art detection method. This is because, in the example of the gradual switching shown in the lower part of FIG. 6, similar image content exists between any two adjacent detection frames; therefore, to detect the switching position of a gradual switch, the interval between the two detection frames needs to be adjusted.
  • for example, in the gradual-switching example shown in the lower part of FIG. 6, the interval between the two detection frames can be adjusted to 5 frames, that is, the coarse detection described below is performed.
  • by adjusting the detection to once every 5 frames, since there is no similar image content (feature objects or feature points) between the 1st frame and the 5th frame, the gradual switch can be split into two scenes by the detection.
  • FIG. 7 is a schematic diagram showing fine detection and coarse detection utilized in a method of detecting video scene switching according to the present invention.
  • the detection frames of the fine detection are two immediately adjacent frames.
  • the detection frames of the coarse detection are two frames separated by a certain interval. Both the fine detection and the coarse detection are decided by the feature point matching rate between the two detection frames.
  • the feature point matching rate may be represented by the ratio of matching feature points between the two detection frames to the total number of feature points in the second detection frame. If the feature point matching rate is 0, a video scene transition is detected; at this time, the second detection frame is determined as the switching frame.
  • the interval of the detection frames of the coarse detection can be calculated by formulas (1)-(4), where formulas (1)-(3) appear as images in the original and formula (4) is interval = αN:
  • interval is the detection interval of the coarse detection;
  • Δh is the difference of the gray histograms of two adjacent frames;
  • N is a parameter that characterizes the average degree of change of the gray histogram in a video;
  • T is the total number of frames of a video;
  • α is a parameter that characterizes the relationship between N and interval;
  • i is the bin index of the gray histogram (from 0 to 255);
  • Δi is the difference between the numbers of pixels whose gray histogram value falls in the i-th bin in two adjacent frames.
  • the interval of the coarse detection can be determined based on the statistical value of the degree of change between adjacent frames of the video content.
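This interval computation can be sketched under stated assumptions: formulas (1)-(3) are only available as images in the original, so the code below assumes the natural reading that Δh sums the absolute per-bin histogram differences, that N averages Δh over the video, and that interval = αN with α = 0.25 taken from the worked example in the description (interval = 0.25 × 32 = 8). The normalization of N is therefore a guess, not the patent's exact formula:

```python
import numpy as np

def coarse_interval(frames, alpha=0.25, bins=256):
    """Coarse-detection interval from the average gray-histogram change.

    frames: sequence of 2-D uint8 grayscale images.  For each adjacent
    pair of frames, delta_h sums the absolute per-bin gray-histogram
    differences (an assumed reading of formulas (1)-(3)); N averages
    delta_h over the whole video, and the interval is alpha * N.
    """
    deltas = []
    prev_hist = None
    for frame in frames:
        hist, _ = np.histogram(frame, bins=bins, range=(0, bins))
        if prev_hist is not None:
            # Sum of |delta_i| over all histogram bins i.
            deltas.append(np.abs(hist - prev_hist).sum())
        prev_hist = hist
    n = float(np.mean(deltas)) if deltas else 0.0
    # Never fall below an interval of 1 frame (fine detection).
    return max(1, int(round(alpha * n)))
```

A nearly static video yields a small N and hence a short coarse interval, while a fast-changing video yields a longer one, matching the intent of adapting the interval to the degree of change.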
  • FIG. 8 is a schematic diagram showing the result of the detection of the video scene switching position detected by the fine detection and the video scene switching position detected by the coarse detection as the final detection result.
  • since a gradual switch cannot be found by the fine detection but only by the coarse detection, the method for detecting the video scene switching according to the present invention needs to combine the video scene switching positions detected by the fine detection and those detected by the coarse detection as the final detection result.
  • as shown in FIG. 8, the switching frames of the video scenes detected by the fine detection and those detected by the coarse detection are combined together as the resulting switching frames. Thereby, the switching frames for dividing the individual video scenes can be obtained.
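The merging of the two detection results can be sketched as follows (min_gap is an assumed de-duplication parameter, not specified by the source; it collapses fine and coarse detections that flag nearly the same scene boundary):

```python
def merge_switch_frames(fine_ids, coarse_ids, min_gap=1):
    """Union of switching-frame IDs from fine and coarse detection.

    Duplicates are dropped, and IDs closer than `min_gap` frames are
    collapsed to the earlier one, since fine and coarse detection may
    flag the same scene boundary at slightly different positions
    (`min_gap` is an assumption, not taken from the patent).
    """
    merged = []
    for fid in sorted(set(fine_ids) | set(coarse_ids)):
        if not merged or fid - merged[-1] >= min_gap:
            merged.append(fid)
    return merged
```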
  • FIG. 9 is a flow chart showing a method for detecting a video scene switch in accordance with the present invention.
  • in step 901, individual frames are extracted from the input video image.
  • in step 903, the degree of change between adjacent frames is used to determine the interval of the coarse detection.
  • in step 905, coarse detection is performed according to the determined coarse detection interval, and the feature point matching rate between the two detection frames is calculated.
  • in step 911, fine detection is performed between two adjacent detection frames, and the feature point matching rate between the two adjacent detection frames is calculated.
  • in step 907, it is judged whether the feature point matching rate of the coarse detection or that of the fine detection is zero. If it is judged to be zero ("Y" in step 907), the process proceeds to step 913.
  • in step 913, the detection frame at which the feature point matching rate is zero is determined as the switching frame, and the ID of the switching frame is recorded. If the rate is judged not to be zero ("N" in step 907), the process proceeds to step 909, where it is determined whether the current frame is the last frame. If it is the last frame ("Y" in step 909), the detection process ends. If it is not the last frame, the process returns to steps 905 and 911, and the determination of switching frames continues.
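The flow of steps 901-913 can be condensed into a single loop (a sketch only: `match_rate` is assumed to be any function returning the feature point matching rate between two frames, and the fine and coarse passes, which FIG. 9 runs in parallel, are folded into one iteration here):

```python
def detect_switch_frames(frames, interval, match_rate):
    """Detection loop following the flow of FIG. 9 (a sketch).

    `match_rate(a, b)` is assumed to return the feature point matching
    rate between two frames.  Fine detection compares each frame with
    its immediate neighbor; coarse detection compares frames that are
    `interval` apart.  A frame is recorded as a switching frame when
    either rate is zero.
    """
    switch_ids = []
    for i in range(1, len(frames)):
        fine_zero = match_rate(frames[i - 1], frames[i]) == 0
        coarse_zero = (i >= interval
                       and match_rate(frames[i - interval], frames[i]) == 0)
        if fine_zero or coarse_zero:
            switch_ids.append(i)  # record the ID of the switching frame
    return switch_ids
```

Note that near an abrupt cut the coarse pass can flag a frame the fine pass already found at a nearby position, which is why the combined results are merged and de-duplicated before segmenting the scenes.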
  • FIG. 10 is a schematic diagram showing two different application scenarios of a method for detecting video scene switching in accordance with the present invention.
  • Example 1: in a video compression application, first, a video is segmented into video segments of a plurality of single video scenes, and then key frames (i.e., index frames) are extracted for each video segment according to its length. Moreover, each scene has at least one key frame, which avoids missing scenes, as can happen when key frames are simply extracted at fixed time intervals before compression.
  • Example 2: in a video retrieval application, first, a video is segmented into video segments of single video scenes, and an index frame is extracted for each video segment. This index frame can then be used to represent the video segment. In this way, the video clip can be quickly found through the index frame without having to traverse all the frames of the video.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Television Signal Processing For Recording (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Proposed is a method for detecting video scene switching, comprising: an extraction step of extracting each video frame from the video content of a video clip that includes multiple video scenes; and a detection step of determining two video frames separated by a prescribed interval as detection frames and using the feature point matching rate between the two detection frames to detect the switching frame at the switching position of a video scene.

Description

Method and device for detecting video scene switching
Technical Field
The present invention relates to a method and a device for detecting video scene switching, which can detect the switching frames of video scenes using the matching relationship between inter-frame feature points, obtain the index frame of each video scene from the detection result, and perform coarse detection and fine detection simultaneously by analyzing the degree of change of the video scene, thereby effectively detecting switching frames in both gradual and abrupt scene switching.
Background Art
In the media industry, massive amounts of video material are produced every day and must be stored. How to store and manage these video data effectively is a pressing technical problem.
In the course of storing and managing video data, video scene segmentation technology, which can divide video content according to its different video scenes, is receiving more and more attention from the media industry. This technology can effectively improve the efficiency of video storage, management and retrieval, and is finding increasingly wide application.
A video scene usually refers to the video content captured in a single continuous camera shot, which is coherent. Therefore, the content of one frame in a video scene can often serve as a representative of the entire scene. A frame that represents a video scene is usually called an index frame. Video scene switching refers to the process in which the video content switches from one video scene to another. Discontinuity in the video content usually appears during a scene switch; a need therefore sometimes arises to segment the video content into its different video scenes. Segmenting video scenes means dividing the video content, according to the switching positions of the video scenes, into video clips each containing a single video scene.
The significance of segmenting video scenes is that managing video clips of single video scenes is more efficient than managing the video as a whole. For example, video clips of the same type of scene can be managed centrally, so that when a clip of a certain type is needed, the search can be carried out directly in that type's scene library instead of in the larger and more complex overall video library.
Moreover, a video clip of a single video scene can often be represented by the content of one of its frames. This frame is usually called the index frame (or representative frame). Indexing with index frames makes it possible to manage the video clips of video scenes effectively and to retrieve the desired video scene quickly. For example, in an overall video of 10000 frames and 100 scenes, video scene segmentation can produce 100 single-scene video clips; selecting one frame from each clip as its index frame yields 100 index frames in total. A search for a video scene then only needs to examine these 100 index frames rather than all 10000 frames, which greatly improves efficiency.
The key to segmenting video scenes is finding the switching positions of the video scenes (e.g., the switching frames). Video scene switching generally falls into two categories: abrupt switching and gradual switching. Abrupt switching means that the switching position lies between two adjacent frames, whereas gradual switching means that the scene switches progressively across more than two frames.
At present, several techniques exist for segmenting video scenes. The patent application with application number CN201110405542 proposes a clustering-based scene detection method. Another patent application, with application number CN201410831291, proposes a region-segmentation scene detection method based on color information. In addition, US Patent No. US8913872 proposes a scene detection method based on region segmentation and gray-level means.
However, these techniques rely mainly on global information of the video content, such as color information. They therefore have difficulty detecting gradual scene switches, and may falsely detect a switch in scenes that do not actually switch but exhibit large local changes.
Summary of the Invention
The present invention is proposed to overcome the above drawbacks of the prior art. Accordingly, one object of the present invention is to provide a method and a device for detecting video scene switching that can detect the switching frames of video scenes using the matching relationship of inter-frame feature points, and can perform coarse detection and fine detection simultaneously by analyzing the degree of change of the video scene, thereby effectively detecting switching frames in both gradual and abrupt scene switching.
To achieve the above object, according to the present invention, a method for detecting video scene switching is proposed, comprising: an extraction step of extracting each video frame from the video content of a video clip that includes multiple video scenes; and a detection step of determining two video frames separated by a prescribed interval as detection frames and using the feature point matching rate between the two detection frames to detect the switching frame at the switching position of a video scene.
Preferably, the detection step comprises: performing fine detection using two adjacent video frames as detection frames; and performing coarse detection using two non-adjacent video frames separated by a prescribed interval as detection frames.
Preferably, the detection step further comprises: combining the switching frames detected by fine detection and the switching frames detected by coarse detection as the final switching frames.
Preferably, the prescribed interval used in the coarse detection is determined from a statistic of the degree of change between adjacent frames of the video content.
Preferably, the feature point matching rate is expressed as the proportion of matched feature points between the two detection frames relative to the total number of feature points in the second detection frame.
Preferably, the second detection frame where the feature point matching rate is zero is determined to be the switching frame.
Preferably, the method of the present invention further comprises: determining, for each video scene, an index frame that represents the video clip of that video scene.
In addition, according to the present invention, a device for detecting video scene switching is proposed, comprising: an extraction unit that extracts each video frame from the video content of a video clip including multiple video scenes; and a detection unit that determines two video frames separated by a prescribed interval as detection frames and uses the feature point matching rate between the two detection frames to detect the switching frame at the switching position of a video scene.
It can thus be seen that the present invention can detect video scene switching automatically and, by analyzing the degree of change of the video scene and using the matching relationship of inter-frame feature points, can effectively detect both gradual and abrupt switching.
Brief Description of the Drawings
The above objects, advantages and features of the present invention will become more apparent from the following detailed description of preferred embodiments taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic diagram of a video retrieval system that segments video content into video clips of single video scenes and uses index frames for storage and management.
FIG. 2 is a schematic diagram showing a piece of video content containing different video scenes.
FIG. 3 is a schematic diagram showing an application scenario of the method of detecting video scene switching according to the present invention.
FIG. 4 is a schematic diagram showing the acquisition of an index frame for each video scene according to the present invention.
FIG. 5 is a schematic diagram showing the feature point matching relationship between two frames used in the method of detecting video scene switching according to the present invention.
FIG. 6 is a schematic diagram showing the two types of video scene switching: abrupt switching and gradual switching.
FIG. 7 is a schematic diagram showing the fine detection and coarse detection used in the method of detecting video scene switching according to the present invention.
FIG. 8 is a schematic diagram showing the video scene switching positions detected by fine detection and those detected by coarse detection combined as the final detection result.
FIG. 9 is a flowchart of the method for detecting video scene switching according to the present invention.
FIG. 10 is a schematic diagram showing two different application scenarios of the method for detecting video scene switching according to the present invention.
Detailed Description of the Embodiments
Preferred embodiments of the present invention will be described below with reference to the accompanying drawings. In the drawings, identical elements are denoted by the same reference symbols or numerals. In the following description, detailed descriptions of known functions and configurations are omitted so as not to obscure the subject matter of the present invention.
FIG. 1 is a schematic diagram of a video retrieval system that segments video content into video clips of single video scenes and uses index frames for storage and management.
The left side of FIG. 1 shows several different video contents, each of which includes video clips of multiple single video scenes. Each single-scene video clip can be represented by a frame that represents that scene, i.e., an index frame, as shown in the middle of FIG. 1. Many known methods exist for determining the index frame of each video scene; for example, the middle frame of a scene, or its first or last frame, may be chosen as the index frame. These are not described further here. In the video retrieval system, to store and manage the video data, the index frames representing the video scenes are stored in a database, as shown on the right side of FIG. 1. Retrieval of video scenes can then be carried out using this database.
FIG. 2 is a schematic diagram showing a piece of video content containing different video scenes.
As shown in FIG. 2, video content generally consists of video clips of multiple single video scenes. As described above, a video scene usually refers to the video content captured in a single continuous camera shot, which is coherent. Since managing single-scene video clips is more efficient than managing the video as a whole, the video content needs to be segmented into video clips of multiple single video scenes. With the method of detecting video scene switching according to the present invention described below, these video scenes can be segmented effectively and the index frame of each scene determined.
FIG. 3 is a schematic diagram showing an application scenario of the method of detecting video scene switching according to the present invention.
As shown in FIG. 3, with the method of detecting video scene switching according to the present invention, a video content can be segmented into video clips of multiple single video scenes. For each video scene, an existing method can be used to determine its index frame. The index frames of the video content are then stored in a database, so that retrieval of, for example, a video scene can be accomplished by searching the stored index frames in the database.
FIG. 4 is a schematic diagram showing the acquisition of an index frame for each video scene according to the present invention.
To extract the index frame of each video scene, the switching positions of the scenes must first be determined. In the present invention, the method of detecting video scene switching can be used first to determine the switching frames of the video scenes that make up the video content, i.e., the frames at the positions where the scenes switch. Then, between two adjacent switching frames, i.e., for each video scene, the index frame is obtained by a known method. The example shown in FIG. 4 contains 7 different scenes; for each, a corresponding index frame is obtained as the representative of that scene, e.g., index frames 1 to 7 in FIG. 4.
FIG. 5 is a schematic diagram showing the feature point matching relationship between two frames used in the method of detecting video scene switching according to the present invention.
As described in the background section, the prior art detects the switching positions of video scenes using global information of the video content, such as color information, and therefore has difficulty detecting gradual scene switches and may even produce false detections.
In the method of detecting video scene switching of the present invention, the switching positions of video scenes are detected using feature point matching between two detection frames. Detection frames are the frames in a video used for detection, generally two adjacent frames or two frames separated by a certain interval. Feature points are points, obtained by existing feature point extraction algorithms, on the different objects contained in the image of each detection frame. A feature point is a pixel with certain characteristics extracted from an image by an algorithm, for example a corner or intersection at an image edge, or a pixel with particular statistical properties within a neighborhood. A feature point has a multi-dimensional feature vector characterizing its properties. Feature point extraction algorithms include SIFT and SURF; since the extraction of feature points from frame images is a well-known technique, it is not described further here.
The method of the present invention uses the matching relationship of feature points between two detection frames. Feature point matching means computing the Euclidean distance between the feature vectors of two feature points and comparing it with a threshold: if the distance is below the threshold, the two feature points match; otherwise they do not. In the present invention, when detecting scene switching positions, the feature points extracted from the two detection frames are tested for matches, yielding the feature point matching rate between the two detection frames. The second detection frame where the matching rate is zero is determined to be the switching frame, i.e., the switching position of the video scene.
FIG. 6 is a schematic diagram showing the two types of video scene switching: abrupt switching and gradual switching.
Abrupt switching means that the video scene switches between two adjacent frames; the switch is relatively sharp. In the example shown in the upper part of FIG. 6, the scene switches rapidly between two frames. Gradual switching means that the scene switches progressively across more than two frames. In the example shown in the lower part of FIG. 6, the switch occurs progressively over 5 frames. Since the prior art detects video scene switching between two immediately adjacent detection frames (see the fine detection described below), it cannot detect the switching position of a gradual switch. This is because, in the gradual-switching example in the lower part of FIG. 6, similar image content exists between any two adjacent detection frames. To detect the switching position of a gradual switch, the interval between the two detection frames must therefore be adjusted. For example, in the gradual-switching example in the lower part of FIG. 6, the interval can be set to 5 frames, i.e., the coarse detection described below is performed. Clearly, by detecting once every 5 frames, since no similar image content (feature objects or feature points) exists between the 1st and the 5th frame, the detection can split the gradual switch into two scenes.
FIG. 7 is a schematic diagram showing the fine detection and coarse detection used in the method of detecting video scene switching according to the present invention.
As mentioned in the description of FIG. 6, the detection frames for fine detection are two immediately adjacent frames, while the detection frames for coarse detection are two frames separated by a certain interval. Both fine detection and coarse detection are decided by the feature point matching rate between the two detection frames. For example, the feature point matching rate may be expressed as the proportion of matched feature points between the two detection frames relative to the total number of feature points in the second detection frame. If the matching rate is 0, a video scene change is detected, and the second detection frame is determined to be the switching frame.
The interval of the detection frames for coarse detection can be calculated by the following formulas:
[Formulas (1)-(3) appear as images in the original.]
interval=αN      (4)
interval is the detection interval of the coarse detection;
Δh is the difference between the gray histograms of two adjacent frames;
N is a parameter characterizing the average degree of change of the gray histogram in a video;
T is the total number of frames of the video;
α is a parameter characterizing the relationship between N and interval;
i is the bin index of the gray histogram (from 0 to 255);
Δi is the difference between the numbers of pixels whose gray histogram value falls in the i-th bin in two adjacent frames.
For example, for a video with N = 32, formula (1) determines a value (shown as an image in the original), and the coarse detection interval is therefore interval = 0.25 × 32 = 8.
Through formulas (1), (2), (3) and (4), the coarse detection interval can be determined from a statistic of the degree of change between adjacent frames of the video content.
FIG. 8 is a schematic diagram showing the video scene switching positions detected by fine detection and those detected by coarse detection combined as the final detection result.
Since gradual switches cannot be found by fine detection but only by coarse detection, the method of detecting video scene switching according to the present invention combines the switching positions detected by fine detection with those detected by coarse detection as the final detection result. As shown in FIG. 8, the switching frames detected by fine detection and those detected by coarse detection are merged into the final set of switching frames, from which the switching frames used to segment the individual video scenes are obtained.
FIG. 9 is a flowchart of the method for detecting video scene switching according to the present invention.
In step 901, individual frames are extracted from the input video. In step 903, the degree of change between adjacent frames is used to determine the coarse detection interval. Then, in step 905, coarse detection is performed at the determined interval and the feature point matching rate between the two detection frames is calculated. In parallel, in step 911, fine detection is performed between two adjacent detection frames and their feature point matching rate is calculated. In step 907, it is judged whether the matching rate of the coarse detection or the fine detection is zero. If it is zero ("Y" in step 907), the flow proceeds to step 913, where the detection frame with zero matching rate is determined to be a switching frame and its ID is recorded. If it is not zero ("N" in step 907), the flow proceeds to step 909, where it is judged whether the current frame is the last frame. If it is the last frame ("Y" in step 909), the detection process ends; otherwise the flow returns to steps 905 and 911 and the determination of switching frames continues.
FIG. 10 is a schematic diagram showing two different application scenarios of the method for detecting video scene switching according to the present invention.
As shown in Example 1, in a video compression application, the video is first segmented into video clips of multiple single video scenes, and then a key frame (i.e., an index frame) is extracted from each clip according to its length. Moreover, each scene has at least one key frame, which avoids missing scenes, as can happen when key frames are extracted simply at fixed time intervals before compression.
As shown in Example 2, in a video retrieval application, the video is first segmented into single-scene video clips and an index frame is extracted for each clip. The index frame can then represent that clip, so the clip can be found quickly through its index frame without traversing all frames of the video.
The specific embodiments above have been presented to explain the present invention in detail. They merely illustrate the principles of the invention and the ways of implementing it, and do not limit the invention; those skilled in the art can make various modifications and improvements without departing from the spirit and scope of the invention. Accordingly, the invention should not be limited by the above embodiments but by the appended claims and their equivalents.

Claims (8)

  1. A method for detecting video scene switching, comprising:
    an extraction step of extracting each video frame from the video content of a video clip that includes multiple video scenes; and
    a detection step of determining two video frames separated by a prescribed interval as detection frames and using the feature point matching rate between the two detection frames to detect the switching frame at the switching position of a video scene.
  2. The method according to claim 1, wherein
    the detection step comprises: performing fine detection using two adjacent video frames as detection frames; and performing coarse detection using two non-adjacent video frames separated by a prescribed interval as detection frames.
  3. The method according to claim 1, wherein
    the detection step further comprises:
    combining the switching frames detected by fine detection and the switching frames detected by coarse detection as the final switching frames.
  4. The method according to claim 1, wherein
    the prescribed interval used in the coarse detection is determined from a statistic of the degree of change between adjacent frames of the video content.
  5. The method according to claim 1, wherein
    the feature point matching rate is expressed as the proportion of matched feature points between the two detection frames relative to the total number of feature points in the second detection frame.
  6. The method according to claim 1, wherein
    the second detection frame where the feature point matching rate is zero is determined to be the switching frame.
  7. The method according to claim 1, further comprising:
    determining, for each video scene, an index frame that represents the video clip of that video scene.
  8. A device for detecting video scene switching, comprising:
    an extraction unit that extracts each video frame from the video content of a video clip including multiple video scenes; and
    a detection unit that determines two video frames separated by a prescribed interval as detection frames and uses the feature point matching rate between the two detection frames to detect the switching frame at the switching position of a video scene.
PCT/CN2016/110717 2015-12-30 2016-12-19 Method and device for detecting video scene switching WO2017114211A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201511024969.3A CN106937114B (zh) 2015-12-30 2015-12-30 Method and device for detecting video scene switching
CN201511024969.3 2015-12-30

Publications (1)

Publication Number Publication Date
WO2017114211A1 true WO2017114211A1 (zh) 2017-07-06

Family

ID=59225634

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/110717 WO2017114211A1 (zh) 2015-12-30 2016-12-19 Method and device for detecting video scene switching

Country Status (2)

Country Link
CN (1) CN106937114B (zh)
WO (1) WO2017114211A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108549859A (zh) * 2018-04-09 2018-09-18 深圳市鹰硕技术有限公司 Multi-screen interactive network teaching method and device
CN109543511A (zh) * 2018-10-09 2019-03-29 广州市诚毅科技软件开发有限公司 Video recognition method, system and device based on pattern abrupt-change frames and feature calculation
CN111246126A (zh) * 2020-03-11 2020-06-05 广州虎牙科技有限公司 Broadcast-directing switching method, system, apparatus, device and medium based on a live streaming platform
WO2021121237A1 (zh) * 2019-12-19 2021-06-24 维沃移动通信有限公司 Video stream cropping method and electronic device

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108108699A (zh) * 2017-12-25 2018-06-01 重庆邮电大学 Human action recognition method fusing a deep neural network model and binary hashing
CN108307248B (zh) * 2018-02-01 2019-10-29 Tencent Technology (Shenzhen) Co., Ltd. Video playback method, apparatus, computing device and storage medium
CN108769458A (zh) * 2018-05-08 2018-11-06 东北师范大学 Deep video scene analysis method
CN111383201B (zh) * 2018-12-29 2024-03-12 深圳Tcl新技术有限公司 Scene-based image processing method and apparatus, intelligent terminal and storage medium
CN110430443B (zh) * 2019-07-11 2022-01-25 平安科技(深圳)有限公司 Video shot cutting method and apparatus, computer device and storage medium
CN111491180B (zh) * 2020-06-24 2021-07-09 Tencent Technology (Shenzhen) Co., Ltd. Key frame determination method and apparatus
CN112165621B (zh) * 2020-09-24 2024-01-19 北京金山云网络技术有限公司 Scene switching frame detection method and apparatus, storage medium, and electronic device
CN112203092B (zh) * 2020-09-27 2024-01-30 深圳市梦网视讯有限公司 Bitstream analysis method, system and device for global motion scenes

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000235639A (ja) * 1999-02-15 2000-08-29 Canon Inc Scene change detection method using two-dimensional DP matching and image processing apparatus implementing the same
CN102685398A (zh) * 2011-09-06 2012-09-19 天脉聚源(北京)传媒科技有限公司 News video scene generation method
CN104243769A (zh) * 2014-09-12 2014-12-24 刘鹏 Video scene change detection method based on an adaptive threshold

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP3648199B2 (ja) * 2001-12-27 2005-05-18 NTT DATA Corporation Cut detection apparatus and program therefor
CN101072342B (zh) * 2006-07-01 2010-08-11 Tencent Technology (Shenzhen) Co., Ltd. Scene switching detection method and detection system
CN101360184B (zh) * 2008-09-22 2010-07-28 Tencent Technology (Shenzhen) Co., Ltd. System and method for extracting video key frames
CN101620629A (zh) * 2009-06-09 2010-01-06 ZTE Corporation Method and apparatus for extracting a video index, and video download system
CN102333174A (zh) * 2011-09-02 2012-01-25 深圳市万兴软件有限公司 Video image processing method and apparatus
CN102800095B (zh) * 2012-07-17 2014-10-01 南京来坞信息科技有限公司 Shot boundary detection method
CN102945549B (zh) * 2012-10-15 2015-04-15 山东大学 Shot segmentation method based on manifold learning
CN105049875B (zh) * 2015-07-24 2018-07-20 上海上大海润信息系统有限公司 Accurate key frame extraction method based on hybrid features and abrupt-change detection

Non-Patent Citations (3)

Title
LI, XIA ET AL.: "Video Shot Boundary Detection Based on Key Points Matching", JOURNAL OF QUFU NORMAL UNIVERSITY, vol. 41, no. 1, 31 January 2015 (2015-01-31) *
TANG, JIANQI ET AL.: "Shot Boundary Detection Algorithm Based on ORB", JOURNAL ON COMMUNICATIONS, vol. 34, no. 11, 30 November 2013 (2013-11-30) *
YAN, YAN: "Research on Shot Boundary Detection Algorithm", ELECTRONIC TECHNOLOGY & INFORMATION SCIENCE, CHINA MASTER'S THESES FULL-TEXT DATABASE (MONTHLY), 15 December 2011 (2011-12-15), pages 15-17 *

Also Published As

Publication number Publication date
CN106937114A (zh) 2017-07-07
CN106937114B (zh) 2020-09-25

Similar Documents

Publication Publication Date Title
WO2017114211A1 (zh) Method and apparatus for detecting video scene switching
CN111327945B (zh) Method and apparatus for segmenting a video
US8073208B2 (en) Method and system for classifying scene for each person in video
JP4553650B2 (ja) Method for representing a group of images, descriptors derived by the representation method, search method, apparatus, computer program, and storage medium
JP4139615B2 (ja) Event clustering of images using foreground/background segmentation
US8316301B2 (en) Apparatus, medium, and method segmenting video sequences based on topic
KR101369915B1 (ko) Video identifier extraction apparatus
US8467611B2 (en) Video key-frame extraction using bi-level sparsity
EP1580757A2 (en) Extracting key-frames from a video
US20110085734A1 (en) Robust video retrieval utilizing video data
KR100729660B1 (ko) Digital video recognition system and method using scene change length
Angadi et al. Entropy based fuzzy C means clustering and key frame extraction for sports video summarization
Omidyeganeh et al. Video keyframe analysis using a segment-based statistical metric in a visually sensitive parametric space
CN108966042B (zh) Video summary generation method and apparatus based on shortest path
Chatzigiorgaki et al. Real-time keyframe extraction towards video content identification
US20070061727A1 (en) Adaptive key frame extraction from video data
Patel et al. Shot detection using pixel wise difference with adaptive threshold and color histogram method in compressed and uncompressed video
JP5644505B2 (ja) Matching weight information extraction apparatus
Fernando et al. Fade-in and fade-out detection in video sequences using histograms
Barbieri et al. KS-SIFT: a keyframe extraction method based on local features
KR101323369B1 (ko) Apparatus and method for clustering video frames
KR102004929B1 (ko) Multimedia file similarity search system and method
Shah et al. Shot boundary detection using logarithmic intensity histogram: An application for video retrieval
Vashistha et al. 2PASCD: an efficient 2-pass abrupt scene change detection algorithm
KR100438304B1 (ko) Real-time progressive news video indexing method and system

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 16881000

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16881000

Country of ref document: EP

Kind code of ref document: A1