JP2009086016A

JP2009086016A - Music detecting device and music detecting method

Info

Publication number: JP2009086016A
Application number: JP2007252163A
Authority: JP
Inventors: Yuji Takao; 祐治高尾
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2007-09-27
Filing date: 2007-09-27
Publication date: 2009-04-23
Anticipated expiration: 2027-09-27
Also published as: JP4864847B2

Abstract

PROBLEM TO BE SOLVED: To provide a music detecting device that accurately detects a music section.

SOLUTION: A music section correction part 20 corrects the start position and end position of a music detection section 100, detected by a music section detecting part 10, to silent parts detected by a sound volume analysis part 21. For recorded information such as a recorded music program, in which silent parts frequently appear at the start and end positions of music sections, this improves the detection precision of the music section with high probability. In addition, registering chapter numbers at the start position and end position of the music section lets users easily retrieve the music section.

COPYRIGHT: (C)2009, JPO&INPIT

Description

The present invention relates to a music detection device and a music detection method.

In recent years, video/audio recording apparatuses equipped with a storage device such as a large-capacity hard disk have become widespread. As a result, the data volume of recorded information tends to grow, and such apparatuses provide functions for efficiently searching the recorded information for desired information, for example, a music section in which a song is sung or a musical instrument is played. Specifically, chapter numbers are set at the start position and end position of each music section so that the user can easily find the desired music section.

As a conventional technique, there is a music detection device that detects a music section based on audio information included in recorded information (for example, Patent Document 1).

The music detection device of Patent Document 1 extracts the two-channel audio information from the recorded information, calculates the sum of the power of the two channels and the difference in power between the two channels, obtains the ratio of these calculated powers, and compares the ratio with a threshold value to determine whether a section is a music section, thereby detecting music sections.

Patent Document 1: JP 2006-301134 A

However, although the conventional music detection device can detect music sections whose audio information has a stereo effect, it also detects program sections and CM (commercial) sections, other than music sections, that have a stereo effect, so errors arise in the start position and end position of the detected music section.

Accordingly, an object of the present invention is to provide a music detection device that accurately detects a music section.

(1) In order to achieve the above object, the present invention provides a music detection device comprising: a music section detection unit that detects a music section in processing target information based on audio information included in the processing target information; a volume analysis unit that detects silent parts in the audio information; and a music section correction unit that corrects the music section by taking the silent part closest to the start position of the music section as a new start position and the silent part closest to the end position of the music section as a new end position.

With this configuration, the start position and end position of the music section detected by the music section detection unit are corrected to silent parts detected by the volume analysis unit, so the detection accuracy of the music section can be improved when a music program or the like is recorded in which silent parts frequently appear at the start and end positions of music sections.

(2) In order to achieve the above object, the present invention also provides a music detection device comprising: a music section detection unit that detects a music section in processing target information based on audio information included in the processing target information; a volume analysis unit that detects silent parts in the audio information; a video analysis unit that detects a talk part in the processing target information based on video information included in the processing target information; and a music section correction unit that corrects the music section by taking, among the silent parts near the start position of the music section that do not belong to the talk part, the silent part closest to the start position as a new start position, and taking, among the silent parts near the end position of the music section that do not belong to the talk part, the silent part closest to the end position as a new end position.

With this configuration, the music section correction unit excludes the silent parts belonging to the talk part when correcting the start position and end position of the music section detected by the music section detection unit to silent parts. For recorded information of a music program or the like in which talk parts appear frequently, the detection accuracy of the music section can therefore be improved by excluding the talk parts, which have a high probability of containing silent parts.

(3) In order to achieve the above object, the present invention also provides a music detection device comprising: a music section detection unit that detects a music section in processing target information based on audio information included in the processing target information; a video analysis unit that detects video change parts in the processing target information based on video information included in the processing target information; and a music section correction unit that corrects the music section by taking the video change part closest to the start position of the music section as a new start position and the video change part closest to the end position of the music section as a new end position.

With this configuration, the music section correction unit corrects the start position and end position of the music section detected by the music section detection unit based on the video change parts, so the detection accuracy of the music section can be improved for recorded information of a music program or the like in which the music section is staged on a set different from that of the talk part.

(4) In order to achieve the above object, the present invention also provides a music detection device comprising: a music section detection unit that detects a music section in processing target information based on audio information included in the processing target information; a subtitle detection unit that detects a music subtitle part in the processing target information based on subtitle information included in the processing target information; and a music section correction unit that corrects the music section by taking the point where the music subtitle part appears near the start position of the music section as a new start position and the point where the music subtitle part disappears near the end position of the music section as a new end position.

With this configuration, the music section correction unit corrects the start position and end position of the music section detected by the music section detection unit based on the music subtitle part, so the detection accuracy of the music section can be improved for recorded information of a music program or the like in which lyrics, titles, or lyricist/composer names are displayed during the music section.

According to the present invention, a music section can be detected with high accuracy.

Embodiments of the music detection device according to the present invention will be described below in detail with reference to the drawings.

[First Embodiment]
(Configuration of music detection device)
FIG. 1 is a schematic diagram showing the configuration of a music detection device according to the first embodiment of the present invention.

The music detection device 1 includes: a music section detection unit 10 that receives the audio information 30 included in the recorded information 60 and detects a music detection section 100; a music section correction unit 20 that corrects the music detection section 100 detected by the music section detection unit 10 and outputs music section information 70; a volume analysis unit 21 that analyzes the audio information 30 and outputs the analysis result to the music section correction unit 20; a video analysis unit 22 that analyzes the video information 40 included in the recorded information 60 and outputs the analysis result to the music section correction unit 20; and a subtitle/telop detection unit 23 that analyzes the subtitle information 50 or the video information 40 included in the recorded information 60 and outputs the analysis result to the music section correction unit 20.

The music detection device 1 is built into, for example, a hard disk recorder or a personal computer. Each unit may be implemented in hardware or installed as software, or some units may be implemented in hardware and the others in software.

The recorded information 60, which includes the audio information 30, the video information 40, and the subtitle information 50, consists of compressed information such as an MPEG (Moving Picture Experts Group) movie. The audio information 30 carries two independent audio channels, left and right. The recorded information 60 may be a recording of a digital television broadcast received via a tuner (not shown), information recorded on a DVD or the like, or a digital television broadcast received in real time.

The music section detection unit 10 divides the input audio information 30 into predetermined sections, calculates for each section the volume difference between the channels of the audio information 30 and the total volume of both channels, and detects music sections by comparing the ratio between the volume difference and the total volume with a threshold.

In addition, to distinguish actual music sections from CM sections, which frequently overlap with detected music sections, the music section detection unit 10 includes means (not shown) that compares the sections where a CM section and a music section overlap with the actual music sections, calculates the appearance ratio of music sections, and thereby treats the overlapping sections as non-music sections.

The volume analysis unit 21 receives the audio information 30, analyzes the volume amplitude of each audio channel, and defines a part where the amplitude is at or below a predetermined threshold as a silent part. The silent part may instead be detected by spectral analysis of the audio stream, that is, the audio information 30 before decoding.
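The amplitude-threshold rule for silent parts can be sketched as follows. This is only an illustration under stated assumptions: the audio is taken to be two lists of decoded sample amplitudes, a sample counts as silent only when both channels are at or below the threshold, and the function name and the `min_len` parameter are hypothetical and do not appear in the embodiment.

```python
def find_silent_parts(left, right, threshold, min_len=1):
    """Return (start, end) sample-index ranges (end exclusive) in which
    both channels stay at or below `threshold`.

    Assumption: `min_len` (minimum run length in samples) is a hypothetical
    parameter added to ignore very short dips; the source only specifies
    the amplitude threshold.
    """
    parts = []
    start = None
    n = min(len(left), len(right))
    # Iterate one step past the end so a trailing silent run is closed.
    for i in range(n + 1):
        quiet = (
            i < n
            and abs(left[i]) <= threshold
            and abs(right[i]) <= threshold
        )
        if quiet and start is None:
            start = i
        elif not quiet and start is not None:
            if i - start >= min_len:
                parts.append((start, i))
            start = None
    return parts
```

In practice the threshold would be tuned to the noise floor of the broadcast audio; the value is left open here because the source does not specify it.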

The music section correction unit 20 outputs music section information 70 obtained by correcting the music detection section 100. The music section information 70 is input to a chapter number registration unit (not shown), which registers chapter numbers in the recorded information 60 so that the user can easily search for music sections in the recorded information 60.

FIG. 2 is a schematic diagram showing the operation of the music detection device according to the first embodiment of the present invention.

The music section detection unit 10 detects the music detection section 100 (t_2 to t_4) by analyzing the audio information 30. Next, the volume analysis unit 21 analyzes the audio information 30 and extracts silent parts from it. Next, the music section correction unit 20 takes the silent part closest to the start time (t_2) of the music detection section 100 as a silence detection part 210 (t_1) and corrects the start time of the music detection section 100 to it, and takes the silent part closest to the end time (t_4) of the music detection section 100 as a silence detection part 210 (t_3) and corrects the end time of the music detection section 100 to it.

FIG. 3 is a flowchart showing the operation of the music section detection unit according to the first embodiment of the present invention.

First, the music section detection unit 10 receives the audio information 30 (S10). Next, it divides the input audio information 30 into sections of a predetermined length in time (S11). Next, for a given section, it calculates the volume difference V_d between the two audio channels of the audio information 30 (S12). Next, it calculates the total volume V_s of the two audio channels (S13). Next, it calculates the ratio V_f between V_d and V_s (S14).

When the state in which the ratio V_f is greater than a predetermined threshold continues for a predetermined time, for example, one minute or longer (S15; Yes), a music section flag is assigned to the section (S16). When the processing of S12 to S16 has been completed for all sections (S17; Yes), the sections with the music section flag are merged and output to the music section correction unit 20 as the music detection section 100.
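The flow of S10 to S17 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment. It assumes that "volume" is the sum of absolute sample amplitudes, that V_d is computed from the sample-wise difference between the channels, and that V_f = V_d / V_s. The function name and the parameters `segment_len`, `ratio_threshold`, and `min_run` are hypothetical.

```python
def detect_music_sections(left, right, segment_len, ratio_threshold, min_run):
    """Return merged (first_segment, end_segment) index ranges, end exclusive,
    for runs of segments whose ratio V_f exceeds the threshold.

    Assumptions (not from the source): volume is the sum of absolute
    amplitudes, and V_f is V_d divided by V_s.
    """
    n_segments = min(len(left), len(right)) // segment_len  # S11
    above = []
    for i in range(n_segments):
        lo, hi = i * segment_len, (i + 1) * segment_len
        pairs = list(zip(left[lo:hi], right[lo:hi]))
        v_d = sum(abs(a - b) for a, b in pairs)        # S12: volume difference
        v_s = sum(abs(a) + abs(b) for a, b in pairs)   # S13: total volume
        v_f = v_d / v_s if v_s > 0 else 0.0            # S14: ratio
        above.append(v_f > ratio_threshold)

    # S15 to S17: keep runs lasting at least `min_run` segments and merge
    # each run into a single section. A trailing False closes the last run.
    sections = []
    run_start = None
    for i, flag in enumerate(above + [False]):
        if flag and run_start is None:
            run_start = i
        elif not flag and run_start is not None:
            if i - run_start >= min_run:
                sections.append((run_start, i))
            run_start = None
    return sections
```

With one-second segments, `min_run=60` would correspond to the one-minute duration given as an example in the text.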

FIG. 4 is a flowchart showing the operations of the volume analysis unit and the music section correction unit according to the first embodiment of the present invention.

First, the music section correction unit 20 receives the music detection section 100 from the music section detection unit 10 and acquires the start position (t_2) and end position (t_4) of the music detection section 100 (S20). Next, the volume analysis unit 21 receives the audio information 30 and detects silent parts near the start position of the music detection section 100 (S21). It also detects silent parts near the end position of the music detection section 100 (S22).

Next, among the silent parts detected by the volume analysis unit 21, the music section correction unit 20 defines the silent part closest to the start position of the music detection section 100 and the silent part closest to the end position as silence detection parts 210 (t_1, t_3), respectively (S23). Next, it corrects the start position and end position of the music detection section 100 to the respective silence detection parts 210 (S24). Next, it outputs the music section information 70 for the corrected music detection section 100 (t_1 to t_3) (S25).
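The correction of S23 and S24 amounts to snapping each boundary to its nearest silent part. The sketch below assumes each silent part is represented by a single time in seconds. The function name and the fallback of leaving the section unchanged when no usable silent part exists are hypothetical choices, not taken from the source.

```python
def correct_section(section, silences):
    """Snap the (start, end) of a detected music section to the nearest
    silent parts.

    Assumptions: each silent part is a single time point, and the section
    is returned unchanged if there are no silent parts or if the snapped
    boundaries would collapse or cross.
    """
    start, end = section
    if not silences:
        return section
    new_start = min(silences, key=lambda t: abs(t - start))  # S23 (start)
    new_end = min(silences, key=lambda t: abs(t - end))      # S23 (end)
    if new_start >= new_end:
        return section
    return (new_start, new_end)                              # S24
```

For example, a section detected at 20.0 s to 95.0 s with silent parts at 18.5 s and 92.0 s nearby is corrected to 18.5 s to 92.0 s, which mirrors the t_2 to t_1 and t_4 to t_3 corrections described for FIG. 2.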

(Effects of the first embodiment)
According to the first embodiment described above, the music section correction unit 20 corrects the start position and end position of the music detection section 100 detected by the music section detection unit 10 to the silence detection parts 210 detected by the volume analysis unit 21. For recorded information 60 of a music program or the like in which silent parts frequently appear at the start and end positions of music sections, the detection accuracy of the music section can be improved with high probability. In addition, registering chapter numbers at the start position and end position of the music section lets the user easily search for the music section.

[Second Embodiment]
FIG. 5 is a schematic diagram showing the operation of a music detection device according to the second embodiment of the present invention. In the following description, parts having the same configuration and function as in the first embodiment are denoted by the same reference numerals.

The music section detection unit 10 detects the music detection section 100 (t_14 to t_17) by analyzing the audio information 30. Next, the volume analysis unit 21 analyzes the audio information 30 and extracts silent parts from it. Next, the video analysis unit 22 analyzes the video information 40 and detects a talk detection part 220 (t_11 to t_13). Next, the music section correction unit 20 takes the silent part that is closest to the start time of the music detection section 100 and does not belong to the talk detection part 220 as a silence detection part 210 (t_15) and corrects the start time of the music detection section 100 to it, and takes the silent part closest to the end time of the music detection section 100 as a silence detection part 210 (t_16) and corrects the end time of the music detection section 100 to it.

The talk detection part 220 is, for example, a scene in a music program on digital television in which the host and an artist converse, with shots of the host and shots of the artist appearing alternately.

FIG. 6 is a flowchart showing the operation of the video analysis unit according to the second embodiment of the present invention.

First, the video information 40 around each of the start position (t_14) and the end position (t_17) of the music detection section 100, for example, 10 seconds before and after each position, is input to the video analysis unit 22 (S30). Next, the video analysis unit 22 periodically acquires still images from the input video information 40 at predetermined intervals (S31). Next, it compares the acquired still images and assigns flags to similar still images (S32).

When the still images are arranged in time order and the sequence of flags shows a characteristic regularity, for example, similar images 1 and images 2 appearing alternately (S33; Yes), the section showing the regularity is defined as the talk detection part 220 (t_11 to t_13) (S34).
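One way to picture S32 to S34 is sketched below. It is an illustration only: each still image is reduced to a single hypothetical numeric signature (for instance a mean brightness), similar stills receive the same label, and a talk part is reported as the longest run in which two labels alternate. The source does not specify the similarity measure or the regularity test, so both functions and the parameters `tol` and `min_len` are assumptions.

```python
def label_similar(signatures, tol):
    """S32: give the same label to stills whose signatures lie within
    `tol` of an earlier representative (hypothetical similarity rule)."""
    reps, labels = [], []
    for sig in signatures:
        for idx, rep in enumerate(reps):
            if abs(sig - rep) <= tol:
                labels.append(idx)
                break
        else:
            reps.append(sig)
            labels.append(len(reps) - 1)
    return labels


def find_talk_part(labels, min_len=4):
    """S33 to S34: return (first, last) indices of the longest run in which
    two labels alternate (A, B, A, B, ...), or None if no run reaches
    `min_len` stills."""
    best = None
    run_start = 0
    n = len(labels)
    for i in range(1, n + 1):
        continues = (
            i < n
            and labels[i] != labels[i - 1]
            and (i - run_start < 2 or labels[i] == labels[i - 2])
        )
        if continues:
            continue
        length = i - run_start
        if length >= min_len and (best is None or length > best[1] - best[0] + 1):
            best = (run_start, i - 1)
        # A differing pair (i-1, i) can seed the next alternating run.
        if i < n and labels[i] != labels[i - 1]:
            run_start = i - 1
        else:
            run_start = i
    return best
```

The returned indices refer to the sampled stills, so they would still have to be mapped back to times using the sampling interval of S31.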

FIG. 7 is a flowchart showing the operation of the music section correction unit according to the second embodiment of the present invention.

First, the music section correction unit 20 receives the music detection section 100 from the music section detection unit 10 and acquires the start position (t_14) and end position (t_17) of the music detection section 100 (S40). Next, the volume analysis unit 21 receives the audio information 30 and detects silent parts near the start position of the music detection section 100 (S41). It also detects silent parts near the end position of the music detection section 100 (S42).

Next, from the silent parts detected by the volume analysis unit 21, the music section correction unit 20 excludes the silent part 210a (t_12) belonging to the talk detection part 220 (t_11 to t_13) defined in the operation of FIG. 6 (S43). Next, it defines the silent part closest to the start position of the music detection section 100 and the silent part closest to the end position as silence detection parts 210 (t_15, t_16), respectively (S44). Next, it corrects the start position and end position of the music detection section 100 to the respective silence detection parts 210 (S45). Next, it outputs the music section information 70 for the corrected music detection section 100 (t_15 to t_16) (S46).
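The exclusion step S43 followed by the nearest-silence correction of S44 and S45 might look like the sketch below. It assumes silent parts are single time points in seconds and talk parts are closed (start, end) intervals. The function name and the fallback to the uncorrected section are hypothetical.

```python
def correct_excluding_talk(section, silences, talk_parts):
    """Drop silent parts that fall inside any talk part (S43), then snap
    the section boundaries to the nearest remaining silent parts
    (S44 to S45).

    Assumption: the section is returned unchanged when no candidate
    remains or the snapped boundaries would collapse or cross.
    """
    def in_talk(t):
        return any(a <= t <= b for a, b in talk_parts)

    candidates = [t for t in silences if not in_talk(t)]
    if not candidates:
        return section
    start, end = section
    new_start = min(candidates, key=lambda t: abs(t - start))
    new_end = min(candidates, key=lambda t: abs(t - end))
    if new_start >= new_end:
        return section
    return (new_start, new_end)
```

With a talk part at 10 s to 30 s, silent parts at 28 s, 36 s, and 170 s, and a detected section of 31 s to 175 s, the pause at 28 s inside the conversation is ignored and the section is corrected to 36 s to 170 s.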

(Effects of the second embodiment)
According to the second embodiment described above, in addition to the effects of the first embodiment, the music section correction unit 20 excludes the silent part 210a belonging to the talk detection part 220 when correcting the start position and end position of the music detection section 100 detected by the music section detection unit 10. In recorded information of a music program or the like in which talk detection parts 220 appear frequently, a talk detection part 220 has a high probability of containing silent parts, so excluding the silent parts of the talk detection part 220 improves the detection accuracy of the music section with high probability.

Note that the talk detection part 220 may also be detected by analyzing the audio information 30 to detect conversation.

[Third Embodiment]
FIG. 8 is a schematic diagram showing the operation of a music detection device according to the third embodiment of the present invention.

The music section detection unit 10 detects the music detection section 100 (t_23 to t_27) by analyzing the audio information 30. Next, the video analysis unit 22 analyzes the video information 40 and extracts from it change parts 221 in which a feature amount changes greatly. Next, the music section correction unit 20 takes the change part 221 closest to the start time of the music detection section 100 as a video switching detection part 222 (t_22) and corrects the start time of the music detection section 100 to it, and takes the change part 221 closest to the end time of the music detection section 100 as a video switching detection part 222 (t_26) and corrects the end time of the music detection section 100 to it.

FIG. 9 is a flowchart showing the operations of the video analysis unit and the music section correction unit according to the third embodiment of the present invention.

First, the video information 40 around each of the start position (t_23) and the end position (t_27) of the music detection section 100, for example, 10 seconds before and after each position, is input to the video analysis unit 22 (S50). Next, the video analysis unit 22 monitors the video information 40 and measures the luminance value and hue value of the video (S51). As a result of the measurement, it detects parts where the luminance value or hue value changes greatly as change parts (S52).

Next, among the change parts detected by the video analysis unit 22, the music section correction unit 20 defines the change part closest to the start position of the music detection section 100 and the change part closest to the end position as video switching detection parts 222 (t_22, t_26), respectively (S53). Next, it corrects the start position and end position of the music detection section 100 to the respective video switching detection parts 222 (S54). Next, it outputs the music section information 70 for the corrected music detection section 100 (t_22 to t_26) (S55).
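Steps S51 to S54 can be illustrated with the sketch below. It assumes per-frame mean luminance and mean hue have already been measured, that hue is expressed in degrees on a 0 to 360 circle, and that "changes greatly" means a frame-to-frame jump at or above a fixed threshold. The function names and the thresholds `luma_jump` and `hue_jump` are hypothetical.

```python
def hue_distance(a, b):
    """Shortest distance between two hues on a 0 to 360 degree circle
    (assumption: hue is measured in degrees)."""
    d = abs(a - b) % 360
    return min(d, 360 - d)


def find_change_parts(times, luma, hue, luma_jump, hue_jump):
    """S52: return the times at which luminance or hue jumps by at least
    the given threshold relative to the previous frame."""
    changes = []
    for i in range(1, len(times)):
        if (abs(luma[i] - luma[i - 1]) >= luma_jump
                or hue_distance(hue[i], hue[i - 1]) >= hue_jump):
            changes.append(times[i])
    return changes


def snap_to_nearest(section, points):
    """S53 to S54: move each boundary to the nearest change part.
    The section is left unchanged if no usable change part exists."""
    if not points:
        return section
    start, end = section
    new_start = min(points, key=lambda t: abs(t - start))
    new_end = min(points, key=lambda t: abs(t - end))
    return (new_start, new_end) if new_start < new_end else section
```

Treating hue as circular avoids reporting a false change when the hue merely wraps around, for example from 350 degrees to 11 degrees, which is a difference of only 21 degrees.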

(Effects of the third embodiment)
According to the third embodiment described above, the music section correction unit 20 corrects the start position and end position of the music detection section 100 detected by the music section detection unit 10 based on the video switching detection parts 222. In recorded information of a music program or the like in which the music section is staged on a set different from that of the talk part, the music section is likely to contain scenes whose hue and luminance differ from those of the talk part because of lighting effects and the like. Correcting the music section to the parts where the luminance value or hue value changes therefore improves the detection accuracy of the music section with high probability.

Note that the video switching detection part 222 may also be detected by analyzing feature amounts other than the luminance value and hue value.

[Fourth Embodiment]
FIG. 10 is a schematic diagram showing the operation of a music detection device according to the fourth embodiment of the present invention.

The music section detection unit 10 detects the music detection section 100 (t_32 to t_34) by analyzing the audio information 30. Next, the subtitle/telop detection unit 23 analyzes the subtitle information 50 and extracts from it a music subtitle detection part 232 in which character strings related to music are detected, for example, symbols representing musical notes or the words 「作詞」 (lyrics by) and 「作曲」 (music by). Next, the music section correction unit 20 takes the start position of the music subtitle detection part 232 as a music subtitle detection appearance part 230 (t_31) and corrects the start time of the music detection section 100 to it, and takes the end position of the music subtitle detection part 232 as a music subtitle detection disappearance part 231 (t_33) and corrects the end time of the music detection section 100 to it.

FIG. 11 is a flowchart showing the operations of the subtitle/telop detection unit and the music section correction unit according to the fourth embodiment of the present invention.

First, the subtitle information 50 for a period before and after each of the start position (t₃₂) and the end position (t₃₄) of the music detection section 100, for example 10 seconds before and after, is input to the subtitle/telop detection unit 23 (S60). Next, the subtitle/telop detection unit 23 monitors the subtitle information 50 (S61). As a result of the monitoring, the part in which a specific character string is extracted is detected as the music subtitle detection part 232 (t₃₁ to t₃₃) (S62).

Next, the music section correction unit 20 corrects the start position of the music detection section 100 to the start position of the music subtitle detection part 232 detected by the subtitle/telop detection unit 23, that is, to the music subtitle detection appearance part 230 (t₃₁) (S63). Next, the music section correction unit 20 corrects the end position of the music detection section 100 to the end position of the music subtitle detection part 232 detected by the subtitle/telop detection unit 23, that is, to the music subtitle detection disappearance part 231 (t₃₃) (S64). Next, the music section information 70 is output for the corrected music detection section 100 (t₃₁ to t₃₃) (S65).
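The flow of FIG. 11 can be sketched as follows, under several assumptions that are not part of the specification: subtitle information is modeled as a list of cues with start and end times, the note symbol is taken to be "♪", the appearance part is the earliest music cue start within the window around the detected start, and the disappearance part is the latest music cue end within the window around the detected end. The names `Cue`, `is_music_cue`, and `correct_by_subtitles` are hypothetical. The keywords 作詞 ("lyrics") and 作曲 ("composition") and the 10-second window come from the description above.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

# Note symbol (assumed to be "♪"), "lyrics" and "composition".
MUSIC_KEYWORDS = ("♪", "作詞", "作曲")


@dataclass
class Cue:
    start: float  # time at which the subtitle appears, in seconds
    end: float    # time at which the subtitle disappears, in seconds
    text: str


def is_music_cue(cue: Cue, keywords: Sequence[str] = MUSIC_KEYWORDS) -> bool:
    """S61-S62: a cue counts as a music subtitle if it contains a keyword."""
    return any(k in cue.text for k in keywords)


def correct_by_subtitles(section: Tuple[float, float],
                         cues: List[Cue],
                         window: float = 10.0) -> Tuple[float, float]:
    """Correct a detected music section using music subtitles found within
    `window` seconds of its start and end (S60)."""
    start, end = section
    music = [c for c in cues if is_music_cue(c)]

    # Appearance candidates near the detected start position.
    starts = [c.start for c in music if abs(c.start - start) <= window]
    # Disappearance candidates near the detected end position.
    ends = [c.end for c in music if abs(c.end - end) <= window]

    new_start = min(starts) if starts else start  # S63
    new_end = max(ends) if ends else end          # S64
    return new_start, new_end                     # S65: corrected section


# Hypothetical subtitle data around a section detected as 102.0 s to 218.0 s.
cues = [
    Cue(95.0, 99.0, "♪ 作詞 A 作曲 B"),
    Cue(100.0, 110.0, "♪ la la"),
    Cue(200.0, 214.0, "♪ la la la"),
    Cue(216.0, 220.0, "Thank you"),
]
print(correct_by_subtitles((102.0, 218.0), cues))  # (95.0, 214.0)
```

When no music subtitle is found near a boundary, this sketch leaves that boundary unchanged. The specification does not state the behavior for that case.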

(Effect of the fourth embodiment)
According to the fourth embodiment described above, the music section correction unit 20 corrects the start position and end position of the music detection section 100 detected by the music section detection unit 10 based on the music subtitle detection part 232. In recorded information of a music program or the like in which lyrics, or a title and the names of the lyricist and composer, are displayed during the music section, correcting the music detection section 100 to the music subtitle detection part 232 can improve the detection accuracy of the music section with high probability.

Note that the music subtitle detection part 232 may be detected not only by analyzing the subtitle information 50 but also by character analysis of telops or the like included in the video information 40.

FIG. 1 is a schematic diagram showing the configuration of the music detection device according to the first embodiment of the present invention.
FIG. 2 is a schematic diagram showing the operation of the music detection device according to the first embodiment of the present invention.
FIG. 3 is a flowchart showing the operation of the music section detection unit according to the first embodiment of the present invention.
FIG. 4 is a flowchart showing the operations of the volume analysis unit and the music section correction unit according to the first embodiment of the present invention.
FIG. 5 is a schematic diagram showing the operation of the music detection device according to the second embodiment of the present invention.
FIG. 6 is a flowchart showing the operation of the video analysis unit according to the second embodiment of the present invention.
FIG. 7 is a flowchart showing the operation of the music section correction unit according to the second embodiment of the present invention.
FIG. 8 is a schematic diagram showing the operation of the music detection device according to the third embodiment of the present invention.
FIG. 9 is a flowchart showing the operations of the video analysis unit and the music section correction unit according to the third embodiment of the present invention.
FIG. 10 is a schematic diagram showing the operation of the music detection device according to the fourth embodiment of the present invention.
FIG. 11 is a flowchart showing the operations of the subtitle/telop detection unit and the music section correction unit according to the fourth embodiment of the present invention.

Explanation of symbols

1 ... music detection device
10 ... music section detection unit
20 ... music section correction unit
21 ... volume analysis unit
22 ... video analysis unit
23 ... subtitle/telop detection unit
30 ... audio information
40 ... video information
50 ... subtitle information
60 ... recorded information
70 ... music section information
100 ... music detection section
210 ... silence detection part
210a ... silent part
220 ... talk detection part
221 ... change part
222 ... video switching detection part
230 ... music subtitle detection appearance part
231 ... music subtitle detection disappearance part
232 ... music subtitle detection part

Claims

A music detection device comprising:
a music section detection unit that detects a music section in processing target information based on audio information included in the processing target information;
a volume analysis unit that detects silent portions in the audio information; and
a music section correction unit that corrects the music section by setting the silent portion closest to the start position of the music section as a new start position and the silent portion closest to the end position of the music section as a new end position.

A music detection device comprising:
a music section detection unit that detects a music section in processing target information based on audio information included in the processing target information;
a volume analysis unit that detects silent portions in the audio information;
a video analysis unit that detects a talk portion in the processing target information based on video information included in the processing target information; and
a music section correction unit that corrects the music section by setting, among the silent portions near the start position of the music section, the silent portion that does not belong to the talk portion and is closest to the start position as a new start position, and setting, among the silent portions near the end position of the music section, the silent portion that does not belong to the talk portion and is closest to the end position as a new end position.

The music detection device according to claim 2, wherein the video analysis unit periodically acquires still images from the video information and detects the talk portion based on a rule governing the appearance of the still images.

A music detection device comprising:
a music section detection unit that detects a music section in processing target information based on audio information included in the processing target information;
a video analysis unit that detects video change portions in the processing target information based on video information included in the processing target information; and
a music section correction unit that corrects the music section by setting the video change portion closest to the start position of the music section as a new start position and the video change portion closest to the end position of the music section as a new end position.

The music detection device according to claim 4, wherein the video analysis unit acquires at least one of a hue and a luminance of the video information and detects the video change portion based on at least one of the hue and the luminance.

A music detection device comprising:
a music section detection unit that detects a music section in processing target information based on audio information included in the processing target information;
a subtitle detection unit that detects a music subtitle portion in the processing target information based on subtitle information included in the processing target information; and
a music section correction unit that corrects the music section by setting the appearance portion of the music subtitle portion near the start position of the music section as a new start position and the disappearance portion of the music subtitle portion near the end position of the music section as a new end position.

The music detection device according to claim 6, wherein the subtitle detection unit detects the music subtitle portion when a specific character string is extracted from the subtitle information.

The music detection device according to claim 6, wherein the subtitle detection unit detects the music subtitle portion from a telop in video information included in the processing target information.