JPH11266428A - Method and device for picture division and recording medium with picture division program recorded - Google Patents

Method and device for picture division and recording medium with picture division program recorded

Info

Publication number
JPH11266428A
Authority
JP
Japan
Prior art keywords
video
voice
sound information
music
dividing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
JP10068160A
Other languages
Japanese (ja)
Other versions
JP3488626B2 (en)
Inventor
Kenichi Minami
憲一 南
Akito Akutsu
明人 阿久津
Yoshinobu Tonomura
佳伸 外村
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nippon Telegraph and Telephone Corp
Original Assignee
Nippon Telegraph and Telephone Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nippon Telegraph and Telephone Corp
Priority to JP06816098A
Publication of JPH11266428A
Application granted
Publication of JP3488626B2
Anticipated expiration
Expired - Lifetime

Landscapes

  • Television Signal Processing For Recording (AREA)
  • Electrophonic Musical Instruments (AREA)

Abstract

PROBLEM TO BE SOLVED: To divide pictures roughly by extracting feature quantities from sections of the sound information of the pictures that include neither music nor voice, calculating the degrees of similarity of the feature quantities, and dividing off, as one segment, the pictures that include the sections having high degrees of similarity and the sound information interposed between those sections.

SOLUTION: Pictures are stored in a picture storage part 102. A music detection part 103 discriminates whether the sound information of the pictures is music; if it is not music, a voice detection part 104 discriminates whether it is voice; if it is not voice, the sound information in that period is judged to be background sound including neither music nor voice, and a feature extraction part 105 extracts features of the segments corresponding to background sound. The feature extraction part 105 obtains long-term average spectra of the sound information, and a picture division part 106 obtains the correlation between the preceding long-term average spectrum and the current one. If the correlation is high, the two are regarded as the same scene and labeled accordingly, and the label information is stored in the picture storage part 102 together with the time information of the segments.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to a video dividing method and apparatus that analyze the background sound in the sound information contained in a video and divide the video based on the similarity of its feature quantities, and to a recording medium on which a video dividing program is recorded.

[0002]

[Prior art] Methods for dividing video mainly use image information. For example, one such method detects cut points, where the camera switches, and divides the video into shots.

[0003]

[Problem to be solved by the invention] One application of dividing image information by cut-point detection is a video presentation method in which the first image of each shot is taken as a still image representing that shot (a representative image), and the representative images are arranged spatially so that the content of the video can be viewed at a glance. However, because cut points occur frequently, the number of representative images becomes too large when the video is long. To reduce the number of representative images, the video must be divided more coarsely.

[0004] From the viewpoint of video production, a set of shots forms a scene, so the video could instead be divided scene by scene. However, a scene is usually a succession of shots of the same setting, and dividing video into scenes automatically has been difficult.

[0005] An object of the present invention is to divide video coarsely by exploiting the fact that background sounds are likely to be similar within the same scene.

[0006]

[Means for solving the problem] To achieve the above object, the present invention inputs a video, stores the input video, detects music and voice from the sound information of the video, extracts feature quantities from those sections of the sound information that contain neither music nor voice, calculates the similarity of the extracted feature quantities, and groups the sections with high similarity, together with the video containing the sound information sandwiched between those sections, into one segment, thereby dividing the video coarsely.

[0007] In addition, the long-term average spectrum of the sound information is used, so that the similarity of video segments is obtained from the background sound.

[0008]

[Embodiments of the invention] Embodiments of the present invention are described below with reference to the drawings. FIG. 1 is a block diagram showing the schematic configuration of a video dividing apparatus according to one embodiment of the present invention. The video dividing apparatus of this embodiment comprises a video input unit 101 that inputs video, a video storage unit 102 that stores the video, a music detection unit 103 that detects music, a voice detection unit 104 that detects voice, a feature extraction unit 105 that extracts feature quantities from sections containing neither music nor voice, and a video division unit 106 that calculates the similarity of the extracted feature quantities and divides off, as one segment, the video containing the sections with high similarity and the sound information sandwiched between those sections.

[0009] FIG. 2 is a flowchart showing the processing flow of the video dividing apparatus according to one embodiment of the present invention. The processing flow is the same when the present invention is implemented in software. Each pass through the loop processes a video segment of about one second.

[0010] First, the video is stored in video storage process 201, and music detection process 202 is applied to the sound information of the video. Decision 203 determines whether the sound is music; if it is, processing jumps to decision 208. If it is not music, voice detection process 204 is applied. Decision 205 determines whether the sound is voice; if it is, processing jumps to decision 208. Music detection uses the characteristic that the peaks of the frequency spectrum of the sound information are temporally stable in the frequency direction. For voice detection, methods such as one using a comb filter (Minami et al., "Video Indexing by Sound Analysis," IEICE General Conference, D-12-64, 1997) are effective.
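
The branching in decisions 203 and 205 can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `classify_segments`, the string tags, and the representation of a segment as a list of samples are all assumptions, and the two detectors are passed in as hypothetical callables because the text does not specify them in detail.

```python
from typing import Callable, List, Sequence

# Assumed representation: one segment is about one second of audio samples.
Segment = Sequence[float]


def classify_segments(
    segments: List[Segment],
    is_music: Callable[[Segment], bool],
    is_voice: Callable[[Segment], bool],
) -> List[str]:
    """Tag each segment as 'music', 'voice', or 'background'.

    Mirrors the order of the flowchart: music is tested first (decision 203),
    voice is tested only when the segment is not music (decision 205), and
    everything else is treated as background sound for feature extraction.
    """
    tags: List[str] = []
    for seg in segments:
        if is_music(seg):
            tags.append("music")
        elif is_voice(seg):
            tags.append("voice")
        else:
            tags.append("background")
    return tags
```

Only the segments tagged as background go on to feature extraction; music and voice segments skip straight to the end-of-video check.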

[0011] The "Video Indexing by Sound Analysis" method automatically detects voice and music from the sound information contained in a video and summarizes the video by extracting only the portions that contain them. It is useful, for example, when a viewer wants to hear only the songs in a music program without listening to the talk. When music is present, the peaks of the frequency spectrum are temporally stable in the frequency direction, so music can be detected by detecting the peaks and calculating their temporal persistence.
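
One simplified way to measure the temporal persistence of spectral peaks is sketched below. It tracks only the single strongest peak per frame, which is a deliberate simplification; the function names, the one-bin tolerance, and the 0.8 threshold are assumptions chosen for illustration and do not come from the patent or the cited paper.

```python
from typing import List, Sequence


def dominant_bin(spectrum: Sequence[float]) -> int:
    """Index of the strongest frequency bin in one short-time spectrum."""
    return max(range(len(spectrum)), key=lambda k: spectrum[k])


def peak_persistence(spectra: List[Sequence[float]], tolerance: int = 1) -> float:
    """Fraction of consecutive frame pairs whose dominant peak stays
    within `tolerance` bins of its previous position."""
    if len(spectra) < 2:
        return 0.0
    bins = [dominant_bin(s) for s in spectra]
    stable = sum(1 for a, b in zip(bins, bins[1:]) if abs(a - b) <= tolerance)
    return stable / (len(bins) - 1)


def looks_like_music(
    spectra: List[Sequence[float]],
    min_persistence: float = 0.8,  # assumed threshold, not from the source
    tolerance: int = 1,            # assumed tolerance in bins
) -> bool:
    """Treat a segment as music when its dominant peak is stable over time."""
    return peak_persistence(spectra, tolerance) >= min_persistence
```

A sustained note keeps its peak in the same bin from frame to frame and scores close to 1, whereas noise-like background sound tends to move its peak around and scores low.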

[0012] If the sound is not voice, the period is regarded as background sound containing neither music nor voice, that is, as a segment corresponding to background sound, and feature extraction process 206 is applied. In feature extraction process 206, the sound information is frequency-analyzed to obtain the long-term average spectrum. The long-term average spectrum is the temporal average of the spectral power at each frequency.
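
The long-term average spectrum defined above can be computed by splitting a segment into short frames, taking the power spectrum of each frame, and averaging bin by bin. The sketch below uses only the standard library with a direct DFT to stay self-contained; the frame length of 64 samples, the non-overlapping frames, and the absence of a window function are assumptions made for brevity.

```python
import cmath
import math
from typing import List, Sequence


def power_spectrum(frame: Sequence[float]) -> List[float]:
    """Power at each non-negative frequency bin, using a direct DFT.

    This is O(n^2) and only suitable for short frames; a real system
    would use an FFT library instead.
    """
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def long_term_average_spectrum(
    samples: Sequence[float], frame_len: int = 64
) -> List[float]:
    """Average the per-frame power spectra over the whole segment."""
    frames = [
        samples[i:i + frame_len]
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]
    if not frames:
        raise ValueError("segment is shorter than one frame")
    spectra = [power_spectrum(f) for f in frames]
    # zip(*spectra) groups the values of the same frequency bin across frames.
    return [sum(column) / len(spectra) for column in zip(*spectra)]
```

Averaging over about one second smooths out momentary fluctuations, so the result describes the steady character of the background sound rather than individual events.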

[0013] Next, in video division process 207, the correlation between the long-term average spectrum calculated one loop earlier and the current long-term average spectrum is obtained. If the correlation is high, the two segments are regarded as the same scene and are labeled accordingly. Music or voice segments lying between the two segments for which the correlation was obtained are also labeled as belonging to the same scene. The label information is stored in the video storage unit 102 together with the time information of the segments.
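
The labeling step can be illustrated as below, under several assumptions. The text does not name a specific correlation measure or threshold, so the sketch uses the Pearson correlation coefficient and an arbitrary threshold of 0.9. It also compares each background segment with the most recent earlier background segment, and leaves music or voice segments unlabeled (`None`) when the background segments on either side do not match, since the text does not say how to handle that case.

```python
import math
from typing import List, Optional, Sequence


def correlation(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length spectra."""
    if len(a) != len(b) or not a:
        raise ValueError("spectra must be non-empty and of equal length")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a flat spectrum carries no shape to correlate
    return cov / math.sqrt(var_a * var_b)


def label_scenes(
    features: List[Optional[Sequence[float]]],
    threshold: float = 0.9,  # assumed value, not from the source
) -> List[Optional[int]]:
    """Assign a scene label to each segment.

    features[i] is the long-term average spectrum of segment i when it is
    background sound, or None when it was detected as music or voice.
    """
    labels: List[Optional[int]] = [None] * len(features)
    scene = 0
    prev: Optional[int] = None  # index of the previous background segment
    for i, feat in enumerate(features):
        if feat is None:
            continue
        if prev is not None:
            if correlation(features[prev], feat) >= threshold:
                # Same scene: the music/voice segments in between join it too.
                for j in range(prev + 1, i):
                    labels[j] = scene
            else:
                scene += 1
        labels[i] = scene
        prev = i
    return labels
```

With this scheme a scene boundary appears wherever two successive background spectra differ in shape, which gives a much coarser division than cut-point detection.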

[0014] Although the division of video has been described above, the manner of division can be held in the form of a program executable by a data processing apparatus, and the present invention also covers a recording medium on which that program is recorded.

[0015]

[Effects of the invention] (1) The inventions of claims 1, 3 and 5 input a video, store the input video, detect music and voice from the sound information of the video, extract feature quantities from those sections of the sound information that contain neither music nor voice, and calculate the similarity of the extracted feature quantities. This makes it possible to group the sections with high similarity, together with the video containing the sound information sandwiched between those sections, into one segment, and thus to divide the video coarsely.

[0016] (2) The inventions of claims 2, 4 and 6 use the long-term average spectrum of the sound information, which makes it possible to obtain the similarity of video segments from the background sound.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the schematic configuration of a video dividing apparatus according to one embodiment of the present invention.

FIG. 2 is a flowchart showing the processing flow of the video dividing apparatus according to one embodiment of the present invention, which is also the processing flow when the present invention is implemented in software.

[Explanation of symbols]

101 video input unit
102 video storage unit
103 music detection unit
104 voice detection unit
105 feature extraction unit
106 video division unit
201 video storage process
202 music detection process
203 music determination process
204 voice detection process
205 voice determination process
206 feature extraction process
207 video division process
208 video end determination process

Claims (6)

[Claims]

1. A video dividing method for dividing a given video in correspondence with scenes, comprising: a video input step of inputting a video; a video storage step of storing the video; a music detection step of detecting music from sound information in the video; a voice detection step of detecting voice from the sound information; a feature extraction step of extracting feature quantities from sections of the sound information that contain neither music nor voice; and a video division step of calculating the similarity of the extracted feature quantities and grouping, into one segment, the video of sections with high similarity and the video containing the sound information sandwiched between those sections, thereby dividing the video into a plurality of segments.
2. The video dividing method according to claim 1, wherein in the feature extraction step a long-term average spectrum of the sound information is extracted as the feature quantity.
3. A video dividing apparatus for dividing a given video in correspondence with scenes, comprising: a video input unit that inputs a video; a video storage unit that stores the video; a music detection unit that detects music from sound information in the video; a voice detection unit that detects voice from the sound information; a feature extraction unit that extracts feature quantities from sections of the sound information that contain neither music nor voice; and a video division unit that calculates the similarity of the extracted feature quantities and groups, into one segment, the video of sections with high similarity and the video containing the sound information sandwiched between those sections, thereby dividing the video into a plurality of segments.
4. The video dividing apparatus according to claim 3, wherein the feature extraction unit extracts a long-term average spectrum of the sound information as the feature quantity.
5. A recording medium on which a program for dividing a given video in correspondence with scenes is recorded, the recording medium having recorded thereon a video dividing program for causing a computer to execute: a video input process of inputting a video; a video storage process of storing the video; a music detection process of detecting music from sound information in the video; a voice detection process of detecting voice from the sound information; a feature extraction process of extracting feature quantities from sections of the sound information that contain neither music nor voice; and a video division process of calculating the similarity of the extracted feature quantities and grouping, into one segment, the video of sections with high similarity and the video containing the sound information sandwiched between those sections, thereby dividing the video into a plurality of segments.
6. The recording medium according to claim 5, wherein the feature extraction process extracts a long-term average spectrum of the sound information as the feature quantity.
JP06816098A 1998-03-18 1998-03-18 Video division method, apparatus and recording medium recording video division program Expired - Lifetime JP3488626B2 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP06816098A JP3488626B2 (en) 1998-03-18 1998-03-18 Video division method, apparatus and recording medium recording video division program

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP06816098A JP3488626B2 (en) 1998-03-18 1998-03-18 Video division method, apparatus and recording medium recording video division program

Publications (2)

Publication Number Publication Date
JPH11266428A true JPH11266428A (en) 1999-09-28
JP3488626B2 JP3488626B2 (en) 2004-01-19

Family

ID=13365737

Family Applications (1)

Application Number Title Priority Date Filing Date
JP06816098A Expired - Lifetime JP3488626B2 (en) 1998-03-18 1998-03-18 Video division method, apparatus and recording medium recording video division program

Country Status (1)

Country Link
JP (1) JP3488626B2 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021085105A1 (en) * 2019-10-28 2021-05-06 ソニー株式会社 Information processing device, proposal device, information processing method, and proposal method

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102255152B1 (en) * 2014-11-18 2021-05-24 삼성전자주식회사 Contents processing device and method for transmitting segments of variable size and computer-readable recording medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021085105A1 (en) * 2019-10-28 2021-05-06 ソニー株式会社 Information processing device, proposal device, information processing method, and proposal method
US11895288B2 (en) 2019-10-28 2024-02-06 Sony Group Corporation Information processing device, proposal device, information processing method, and proposal method

Also Published As

Publication number Publication date
JP3488626B2 (en) 2004-01-19

Similar Documents

Publication Publication Date Title
RU2440606C2 (en) Method and apparatus for automatic generation of summary of plurality of images
EP1416490B1 (en) Systems and methods for automatically editing a video
CN110381366B (en) Automatic event reporting method, system, server and storage medium
JP3175632B2 (en) Scene change detection method and scene change detection device
JP4778231B2 (en) System and method for indexing video sequences
JP2002238027A (en) Video and audio information processing
JP4426743B2 (en) Video information summarizing apparatus, video information summarizing method, and video information summarizing processing program
JP2011223325A (en) Content retrieval device and method, and program
US8630532B2 (en) Video processing apparatus and video processing method
JP2000350156A (en) Method for storing moving picture information and recording medium recording the information
Chen et al. Scene change detection by audio and video clues
JP3517349B2 (en) Music video classification method and apparatus, and recording medium recording music video classification program
JP2009123095A (en) Image analysis device and image analysis method
JP3785068B2 (en) Video analysis apparatus, video analysis method, video analysis program, and program recording medium
JP2002281457A (en) Replaying video information
CN101241553A (en) Method and device for recognizing customizing messages jumping-off point and terminal
JPH11266428A (en) Method and device for picture division and recording medium with picture division program recorded
CN107689229A (en) A kind of method of speech processing and device for wearable device
JP2002344872A (en) Information signal processor, information signal processing method and information signal recording medium
CN115379290A (en) Video processing method, device, equipment and storage medium
JP4019945B2 (en) Summary generation apparatus, summary generation method, summary generation program, and recording medium recording the program
JP2007060606A (en) Computer program comprised of automatic video structure extraction/provision scheme
CN115460462A (en) Method for automatically cutting audio-visual data set containing anchor in Guangdong language news video
Peker et al. An extended framework for adaptive playback-based video summarization
JP3434195B2 (en) Music video management method and apparatus, and recording medium storing music video management program

Legal Events

Date Code Title Description
FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20071031

Year of fee payment: 4

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20081031

Year of fee payment: 5

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20091031

Year of fee payment: 6

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20101031

Year of fee payment: 7

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20101031

Year of fee payment: 7

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20111031

Year of fee payment: 8

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20111031

Year of fee payment: 8

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20121031

Year of fee payment: 9

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20121031

Year of fee payment: 9

FPAY Renewal fee payment (event date is renewal date of database)

Free format text: PAYMENT UNTIL: 20131031

Year of fee payment: 10

S531 Written request for registration of change of domicile

Free format text: JAPANESE INTERMEDIATE CODE: R313531

R350 Written notification of registration of transfer

Free format text: JAPANESE INTERMEDIATE CODE: R350

EXPY Cancellation because of completion of term