JP2001147697A

JP2001147697A - Method and device for acoustic data analysis

Info

Publication number: JP2001147697A
Application number: JP32948999A
Authority: JP
Inventors: Toshiaki Akimoto; 俊昭秋元
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1999-11-19
Filing date: 1999-11-19
Publication date: 2001-05-29
Anticipated expiration: 2019-11-19
Also published as: JP3757719B2

Abstract

PROBLEM TO BE SOLVED: To detect speech sections with high precision even in mixed sound signals such as broadcast programs. SOLUTION: The sound signal of a program is input to an acoustic data input part 101 and divided into suitable sections by a section division part 102 using feature quantities and the like. The acoustic data of each divided section is classified into three patterns, 'voice', 'non-voice', and 'mixture', by a discrimination part 103, and the genre of the program is determined from the pattern of the classified acoustic data by a discrimination part 104. Based on the determined genre, the start time, end time, degree of importance, and so on are attached to each section, and the analysis result is output by an output part 107.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field] The present invention relates to a technique for automatically extracting the additional information needed to implement skimming, searching, and filtering of digital content such as digital broadcast programs.

[0002]

[Description of the Related Art] In recent years, advances in digital technology have made it possible to transmit and store large amounts of digital content such as audio and video. Skimming, searching, and filtering are known methods for efficiently extracting target information from large amounts of digital content. To implement them, the structure of the digital content must be analyzed in advance and additional information corresponding to that structure must be attached. Because attaching such information manually would be too costly, techniques that automatically analyze the structure of digital content and attach suitable additional information are being researched and developed.

[0003] For example, USP-5,918,223 (Method and article of manufacture for content-based analysis, storage, retrieval, and segmentation of audio information, June 29, 1999) patents a sound analysis method and a search method. This prior art statistically processes features such as volume, pitch, brightness, bandwidth, and mel-frequency cepstral coefficients (MFCC) to determine the type of acoustic information. It is an effective technique for classifying and searching single-source audio files (or segments), such as those found on the Internet. The patent also presents three methods for segmenting continuous sound using this technique.

[0004] The first method segments the sound into regions whose similarity to a representative value of a user-supplied sound (a statistic of its feature vectors) is at or above a threshold, and all remaining regions; the continuous sound is thus divided into two kinds of regions. The second method segments the sound into regions that exactly match the user-supplied sound and regions that do not. The key difference is that the first method computes similarity using feature-vector statistics as the representative value, whereas the second computes it using the feature vectors themselves. The third method compares adjacent regions of the continuous sound and detects points (scene changes) at which their similarity is at or below a threshold.
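As a rough illustration of the third prior-art method, the Python sketch below compares the mean feature vectors of the windows just before and just after each frame and flags frames where their cosine similarity is at or below a threshold. The function name, the window length, and the use of cosine similarity are assumptions for illustration, not details taken from USP-5,918,223.

```python
import numpy as np


def detect_scene_changes(features, window, threshold):
    """Hypothetical sketch: return frame indices t where the mean feature vector of
    frames [t - window, t) and of frames [t, t + window) have cosine similarity
    at or below `threshold`."""
    features = np.asarray(features, dtype=float)
    changes = []
    for t in range(window, len(features) - window + 1):
        before = features[t - window:t].mean(axis=0)  # preceding region
        after = features[t:t + window].mean(axis=0)   # following region
        denom = np.linalg.norm(before) * np.linalg.norm(after)
        sim = float(before @ after / denom) if denom > 0 else 0.0
        if sim <= threshold:  # low similarity between adjacent regions -> scene change
            changes.append(t)
    return changes
```

With longer windows, noisier features, or a looser threshold, a single abrupt change can push several neighboring frames under the threshold, which is the over-detection problem raised in [0006].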

[0005]

[Problem to Be Solved by the Invention] However, analysis of the acoustic data of broadcast programs and similar content shows that single-source sound is rare and that several sounds are usually mixed, for example speech mixed with music, or crowd noise mixed with an announcer's voice. Because mixed sounds are so common in practice, techniques that only determine the type of a sound are impractical for content such as broadcast programs.

[0006] Furthermore, because acoustic data is so diverse, detecting scene changes solely from the similarity between adjacent regions produces excessive detections. It is also difficult to determine which segments are important within a program.

[0007]

[Means for Solving the Problem] To solve these problems, the present invention provides an acoustic data analysis apparatus that divides acoustic data into meaningful sections and determines the importance of those sections. The apparatus comprises: acoustic data dividing means for dividing the acoustic data into a plurality of sections; section type determining means for determining the type of each section; section type pattern discriminating means for discriminating the genre of the acoustic data using the result of statistical processing of the section types; and a control information management unit that manages control information according to the genre.

[0008] According to the present invention, sound sections are first classified into simple types (loudness, sound containing speech, sound not containing speech, and silence), and the genre of the program is estimated from the pattern of these section types, so a mixed-sound model suited to the genre can be set. This enables detailed type determination as well as importance determination suited to the genre. Moreover, excessive section detection can be suppressed by using, instead of the similarity between the feature values of adjacent sections, the change pattern (increase/decrease) of the feature values of adjacent sections, the feature value before the change, the feature value after the change, and the magnitude of the change.

[0009]

[Embodiments of the Invention] The invention according to claim 1 comprises acoustic data dividing means for dividing acoustic data into a plurality of sections, section type determining means for determining the type of each section, section type pattern discriminating means for discriminating the genre of a program using the result of statistical processing of the section types, and a control information management unit that manages control information according to the genre. It has the effect that the genre of a program can be estimated and the importance of sections can be determined in a manner suited to that genre.

[0010] The invention according to claim 2 is the acoustic data analysis apparatus according to claim 1, further comprising section integrating means for integrating consecutive sections of the same type. It has the effect that consecutive sections with the same classification can be merged into one.
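A minimal sketch of the section integrating means, assuming each section is a hypothetical `(start_s, end_s, type)` tuple and the list is sorted by time with no gaps between sections:

```python
from typing import List, Tuple

# Hypothetical section layout: (start time in s, end time in s, section type label).
Section = Tuple[float, float, str]


def merge_same_type(sections: List[Section]) -> List[Section]:
    """Merge runs of consecutive sections that share the same type."""
    merged: List[Section] = []
    for start, end, kind in sections:
        if merged and merged[-1][2] == kind:
            # Same type as the previous section: extend it instead of starting a new one.
            prev_start = merged[-1][0]
            merged[-1] = (prev_start, end, kind)
        else:
            merged.append((start, end, kind))
    return merged
```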

[0011] The invention according to claim 3 is characterized in that the acoustic data dividing means divides the acoustic data into a plurality of sections using the autocorrelation coefficient of the acoustic data. It has the effect that speech sections can be discriminated simply and accurately.
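Beyond calling it, in [0018], the normalized first-order autocorrelation coefficient, the text does not define this coefficient. A common definition, assumed here, is the lag-1 autocorrelation of a short frame divided by its lag-0 energy, r1 = sum(x[n] * x[n+1]) / sum(x[n]^2). It tends to be close to 1 for frames dominated by low frequencies, such as voiced speech, and near or below 0 for noise-like or high-frequency frames, such as unvoiced consonants. The frame length and hop below are hypothetical parameters.

```python
import numpy as np


def normalized_r1(frame) -> float:
    """Lag-1 autocorrelation normalized by lag-0 energy (assumed definition)."""
    x = np.asarray(frame, dtype=float)
    energy = float(np.dot(x, x))
    if energy == 0.0:
        return 0.0  # all-zero frame: define r1 as 0 (assumption)
    return float(np.dot(x[:-1], x[1:]) / energy)


def frame_r1(signal, frame_len: int, hop: int) -> np.ndarray:
    """Per-frame r1 sequence over a mono signal (hypothetical framing)."""
    signal = np.asarray(signal, dtype=float)
    starts = range(0, len(signal) - frame_len + 1, hop)
    return np.array([normalized_r1(signal[s:s + frame_len]) for s in starts])
```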

[0012] The invention according to claim 4 is the acoustic data analysis apparatus according to claim 1, comprising section type determining means that determines each section to be one of a sound section containing speech, a sound section not containing speech, or a silent section. It has the effect that the type determination preceding genre discrimination can be performed with high accuracy, producing patterns that are also effective for discriminating the genre.

[0013] The invention according to claim 5 comprises acoustic data dividing means for dividing acoustic data into a plurality of sections, section type determining means for determining the type of each section, and a control information management unit that manages control information according to the genre. It has the effect that the importance of the divided sound sections can be estimated using genre information.

[0014] The invention according to claim 6 comprises acoustic data dividing means for dividing acoustic data into a plurality of sections and section type determining means for determining the type of each section. It has the effect that the genre of a program can be estimated.

[0015] Embodiments of the present invention are described below with reference to the drawings.

[0016] (Embodiment 1) System that performs genre determination. FIG. 1 is a configuration diagram of the acoustic data analysis apparatus according to Embodiment 1 of the present invention. The apparatus comprises: an acoustic data input unit 101 that reads acoustic data in various formats, one program at a time; a section dividing unit 102 that divides the read acoustic data using change information of acoustic features; a section type determination unit 103 that, while the genre is unknown, determines each section to be a speech section, a mixed-sound section with speech, a sound section without speech, or a silent section, and, once the genre is known, determines section types adapted to that genre; a standard pattern storage unit 105 for genre discrimination that stores the section type determination results of programs whose genres are known; a section type pattern discrimination unit 104 that discriminates the genre of a program from its section type pattern; a program-genre-specific control information management unit 106 that manages type calculation methods and importance calculation methods suited to each genre; and an analysis result output unit 107 that outputs the analysis results of the acoustic data.

[0017] FIG. 2 shows the processing flow of the acoustic data analysis apparatus described above. First, acoustic data is input through the acoustic data input unit 101, and the section dividing unit 102 divides it into a plurality of sections; this corresponds to the section dividing means (step 201). Next, the section type determination unit 103 determines each section to be a speech section, a mixed-sound section with speech, a sound section without speech, or a silent section, and outputs the result to the section type pattern discrimination unit 104; this corresponds to the section type determining means (step 202). Unit 104 compares the section type pattern output by unit 103 with the section type patterns of programs of known genre stored in the standard pattern storage unit 105 for genre discrimination, and selects the K programs with the highest similarity. The most frequent genre among the selected programs is taken as the genre of the acoustic data input through unit 101, and this genre estimate is passed to the program-genre-specific control information management unit 106; this corresponds to the section type pattern discriminating means (step 203). Unit 106 outputs section type information and important-section determination information suited to the program genre to the section type determination unit 103 as control information. Unit 103 uses this genre-specific control information to calculate section types and section importance, and outputs them to the analysis result output unit 107; this corresponds to the genre-adaptive section type determining means (step 204). The analysis result output unit 107 converts the analysis result into a suitable format and outputs it, for example one row per section containing the section start time, section end time, section importance value, and section type number.
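The output step specifies only the row contents. A minimal sketch, assuming tab-separated rows, times in seconds, and a hypothetical `AnalyzedSection` record:

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class AnalyzedSection:
    # Hypothetical record for one analyzed section.
    start_s: float
    end_s: float
    importance: float
    type_no: int


def format_rows(sections: Iterable[AnalyzedSection]) -> str:
    """One row per section: start time, end time, importance, section type number."""
    return "\n".join(
        f"{s.start_s:.2f}\t{s.end_s:.2f}\t{s.importance:.2f}\t{s.type_no}"
        for s in sections
    )
```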

[0018] The section dividing means (step 201) is described in detail with reference to the flowchart of FIG. 3. Autocorrelation analysis is performed on the input acoustic data, and the normalized first-order autocorrelation coefficient is output (step 301). Using this coefficient, division positions in the acoustic data are detected and ranked (step 310), and the acoustic data is divided according to the ranks of the division positions (step 320). In more detail, the division position detecting means (step 310) computes, at every position in the acoustic data, the maximum normalized first-order autocorrelation coefficient in the preceding and following sections (step 311), computes the difference between the maximum values of adjacent sections (step 312), and finds patterns in which the maximum value increases along the time axis (rising positions) (step 313). It then ranks the rising positions into N levels according to the magnitude of the change (step 314), checks whether a division position at or below a certain rank lies within a few seconds (for example, 2 seconds) before or after a division position at or above a certain rank (step 315), and, if one is found, lowers the rank of that lower-ranked division position by one (step 316).
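Steps 311 to 316 and the split in step 320 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: `r1` is a per-frame sequence of normalized first-order autocorrelation coefficients, the window length `win`, the rank boundaries `edges`, the rank limits `high_rank`/`low_rank`, and `min_rank` are hypothetical parameters, and the peak-picking rule used for step 313 is an assumption.

```python
import numpy as np


def detect_division_positions(r1, frame_rate, win, edges, high_rank, low_rank, near_s=2.0):
    """Sketch of steps 311-316: return {frame_index: rank} for rising positions.

    frame_rate: frames per second; edges: ascending rise thresholds, giving
    N = len(edges) + 1 ranks (both hypothetical parameters).
    """
    r1 = np.asarray(r1, dtype=float)
    n = len(r1)
    rise = np.zeros(n)
    for t in range(win, n - win + 1):
        before_max = r1[t - win:t].max()  # step 311: max over the preceding window
        after_max = r1[t:t + win].max()   # step 311: max over the following window
        rise[t] = after_max - before_max  # step 312: difference of the two maxima
    # Step 313: keep positions where the maximum rises, taking the last frame of
    # each local maximum of the rise (assumed peak-picking rule).
    cands = [t for t in range(1, n - 1)
             if rise[t] > 0 and rise[t] >= rise[t - 1] and rise[t] > rise[t + 1]]
    # Step 314: rank 1..N by the magnitude of the rise.
    ranks = {t: 1 + int(np.searchsorted(edges, rise[t], side="right")) for t in cands}
    # Steps 315-316: a weak position within near_s seconds of a strong one loses one rank.
    near = int(round(near_s * frame_rate))
    weak = [t for t in cands
            if ranks[t] <= low_rank
            and any(ranks[u] >= high_rank and 0 < abs(u - t) <= near for u in cands)]
    for t in weak:
        ranks[t] -= 1
    return ranks


def split_points(ranks, min_rank):
    """Step 320: divide only at positions whose final rank is at least min_rank."""
    return sorted(t for t, r in ranks.items() if r >= min_rank)
```

Demoting rather than deleting weak positions keeps them available if a lower `min_rank` is chosen later; a position whose rank drops to 0 never passes any positive `min_rank`.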

[0019] The section type determining means (step 202) is described in detail with reference to the flowchart of FIG. 4. For each section, acoustic features are computed, such as the proportion of first-order autocorrelation coefficients at or below a threshold, the mean first-order autocorrelation coefficient, and the peak volume (step 401). It is then determined whether the proportion of first-order autocorrelation coefficients at or below the threshold is at least α and at most β (step 402); if so, the section is regarded as a speech section. Next, it is determined whether this proportion is at most α (step 403); if so, the section is regarded as a mixed-sound section with speech. Among the sections that satisfied neither condition, it is determined whether the volume peak is at least γ (step 404); if so, the section is regarded as a sound section without speech. Sections that satisfy none of the above conditions are regarded as silent sections.
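A sketch of the decision sequence in steps 401 to 404 follows. The text gives no values for α, β, γ or for the per-frame r1 threshold, so they are parameters here; the mean r1, also computed in step 401, is not used by the rules as described and is omitted. The order of the tests matters: step 404 is reached only by sections whose low-r1 proportion exceeds β.

```python
import numpy as np


def classify_section(r1, volume, r1_thresh, alpha, beta, gamma):
    """Steps 401-404 sketch. r1: per-frame normalized first-order autocorrelation
    coefficients of one section; volume: per-frame volume values (hypothetical input)."""
    r1 = np.asarray(r1, dtype=float)
    low_ratio = float(np.mean(r1 <= r1_thresh))  # step 401: share of low-r1 frames
    peak = float(np.max(volume))                 # step 401: volume peak
    if alpha <= low_ratio <= beta:               # step 402
        return "speech"
    if low_ratio <= alpha:                       # step 403
        return "mixed_with_speech"
    if peak >= gamma:                            # step 404
        return "sound_without_speech"
    return "silence"                             # matched no condition
```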

[0020] The section type pattern discriminating means (step 203) is described in detail with reference to the flowchart of FIG. 5. For each program, program type statistics are computed (step 501), such as the proportion of each section type, the proportion of sections 10 seconds or longer / 1 minute or longer / 3 minutes or longer, the proportion of sections 10 seconds or shorter / 1 minute or shorter / 3 minutes or shorter, and the proportion of total section length taken up by each section type. Using these statistics, the similarity to each standard pattern for genre discrimination is computed, and the N most similar programs are selected (step 502). The most frequent genre among the N selected programs is taken as the genre of the acoustic data; if genres are tied, the genre of the most similar program is used (step 503).
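Steps 501 to 503 amount to a nearest-neighbor vote over per-program statistics. The sketch below assumes sections are given as hypothetical `(length_s, type)` pairs, uses Euclidean distance as the similarity measure (the text does not specify one), and breaks ties with the most similar program among the tied genres.

```python
import math
from collections import Counter


def program_stats(sections, types):
    """Step 501 sketch: per-program statistics from (length_s, type) pairs."""
    count = len(sections)
    total_len = sum(length for length, _ in sections)
    vec = [sum(1 for _, t in sections if t == k) / count for k in types]  # share of each type
    for limit in (10, 60, 180):  # share of sections >= 10 s / 1 min / 3 min
        vec.append(sum(1 for length, _ in sections if length >= limit) / count)
    for limit in (10, 60, 180):  # share of sections <= 10 s / 1 min / 3 min
        vec.append(sum(1 for length, _ in sections if length <= limit) / count)
    # Share of total length taken up by each type.
    vec += [sum(length for length, t in sections if t == k) / total_len for k in types]
    return vec


def genre_vote(query, references, n):
    """Steps 502-503 sketch: references are (genre, stats_vector) pairs of known-genre programs."""
    top = sorted(references, key=lambda ref: math.dist(query, ref[1]))[:n]  # n most similar
    counts = Counter(genre for genre, _ in top)
    best = max(counts.values())
    tied = {genre for genre, c in counts.items() if c == best}
    # Majority genre; on a tie, the genre of the most similar program among the tied genres.
    return next(genre for genre, _ in top if genre in tied)
```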

[0021] The genre-adaptive section type determining means (step 204) is described in detail with reference to the flowchart of FIG. 6. Control information suited to the genre (section type information and important-section information), prepared in advance, is read (step 601); features suited to the genre are computed (step 602); and the section type is determined and an importance corresponding to that section type is computed (step 603). For a music program, for example, the section type information consists of the section types (music sections and talk sections), the features and thresholds for identifying music sections, and the features and thresholds for identifying talk sections. As features for determining the section type, the autocorrelation coefficient is used in addition to volume, pitch, brightness, bandwidth, and the mel-frequency cepstrum. For example, to identify music sections in the music-program genre, the condition is set that, within each divided section, the proportion of sub-sections whose normalized first-order autocorrelation coefficient is 0.01 or less is 0.1 or less, and the normalized first-order autocorrelation coefficient is 0.1 or more. The important-section determination information for a music program corresponds, for example, to the information that music sections are of high importance and talk sections of low importance.
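The music-program rule quoted above can be expressed as genre-specific control information plus a small classifier. In this sketch the dictionary layout, the importance values, and the reading of "the normalized first-order autocorrelation coefficient is 0.1 or more" as the section mean are assumptions; sections failing the music test are labeled talk only because the text names no other section types for this genre.

```python
import numpy as np

# Hypothetical control information for the "music program" genre. The three
# thresholds are the ones quoted in the text; the importance values are illustrative.
MUSIC_PROGRAM_CONTROL = {
    "low_r1": 0.01,        # a sub-section counts as "low" if its normalized r1 <= 0.01
    "max_low_ratio": 0.1,  # low sub-sections may make up at most 10% of the section
    "min_r1": 0.1,         # section-level normalized r1 (assumed: the mean) >= 0.1
    "importance": {"music": 1.0, "talk": 0.2},
}


def classify_music_program_section(sub_r1, ctrl=MUSIC_PROGRAM_CONTROL):
    """Return (section type, importance) for one divided section."""
    sub_r1 = np.asarray(sub_r1, dtype=float)
    low_ratio = float(np.mean(sub_r1 <= ctrl["low_r1"]))
    is_music = low_ratio <= ctrl["max_low_ratio"] and float(sub_r1.mean()) >= ctrl["min_r1"]
    kind = "music" if is_music else "talk"
    return kind, ctrl["importance"][kind]
```

Keeping the thresholds and importance table in data rather than code mirrors the role of the control information management unit: a different genre would supply a different dictionary.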

[0022] (Embodiment 2) Case in which genre information is input. FIG. 7 is a system configuration diagram of the acoustic data analysis apparatus according to an embodiment in which genre information is input together with the acoustic data. The acoustic data analysis apparatus of this embodiment comprises: an acoustic data input unit 701 that reads acoustic data in various formats, one program at a time; a section dividing unit 702 that divides the read acoustic data using change information of acoustic features; a section type determination unit 703 that determines section types adapted to the genre; a program-genre-specific control information management unit 704 that manages type calculation methods and importance calculation methods suited to each genre; and an analysis result output unit 705 that outputs the analysis results of the acoustic data.

[0023] In the acoustic data analysis apparatus described above, acoustic data is first input through the acoustic data input unit 701, and the section dividing unit 702 divides it into a plurality of sections. Next, the section type determination unit 703 outputs the genre to the program-genre-specific control information management unit 704. Unit 704 outputs a section type calculation method and an important-section determination method suited to the genre to unit 703 as control information. Unit 703 uses this genre-specific control information to calculate section types and section importance, and outputs them to the analysis result output unit 705, which converts the analysis result into a suitable format and outputs it.

[0024] The difference from Embodiment 1 is whether the genre information is known in advance. Embodiment 2 assumes that it is, so its configuration is that of Embodiment 1 without the means for estimating the genre.

[0025]

[Effects of the Invention] As described above, the acoustic data analysis apparatus and method of the present invention first determine simple types of sound sections (loudness, sound containing speech, sound not containing speech, and silence) and estimate the genre of the program from this result. This has the effect that the types of mixed sound in the divided sections, and their importance, can be determined.

[Brief description of the drawings]

FIG. 1 is a configuration diagram of the acoustic data analysis apparatus according to Embodiment 1 of the present invention.

FIG. 2 is a flowchart showing the overall processing of the acoustic data analysis apparatus.

FIG. 3 is a flowchart of the section dividing means.

FIG. 4 is a flowchart of the section type determining means (1).

FIG. 5 is a flowchart of the section type pattern discriminating means.

FIG. 6 is a flowchart of the genre-adaptive section type determining means.

FIG. 7 is a system configuration diagram of the acoustic data analysis apparatus according to Embodiment 2 of the present invention.

[Explanation of symbols]

Reference Signs List
101 Acoustic data input unit
102 Section dividing unit
103 Section type determination unit
104 Section type pattern discrimination unit
105 Standard pattern storage unit for genre discrimination
106 Program-genre-specific control information management unit
107 Analysis result output unit

Claims


1. An acoustic data analysis apparatus comprising: acoustic data dividing means for dividing acoustic data into a plurality of sections; section type determining means for determining the type of each section; section type pattern discriminating means for discriminating the genre of a program using the result of statistical processing of the section types; and a control information management unit that manages control information according to the program genre.

2. The acoustic data analysis apparatus according to claim 1, further comprising section integrating means for integrating consecutive sections of the same type.

3. An acoustic data dividing apparatus that divides acoustic data into a plurality of sections using an autocorrelation coefficient of the acoustic data.

4. The acoustic data analysis apparatus according to claim 1, comprising section type determining means that determines each section to be one of a sound section containing speech, a sound section not containing speech, and a silent section.

5. An acoustic data analysis apparatus comprising: acoustic data dividing means for dividing acoustic data into a plurality of sections; section type determining means for determining the type of each section; and a control information management unit that manages control information according to the program genre.

6. A section type pattern discriminating apparatus comprising: a sound data dividing unit that divides sound data into a plurality of sections; and a section type determining unit that determines a type of the section.

7. An acoustic data analysis method comprising: an acoustic data dividing step of dividing acoustic data into a plurality of sections; a section type determining step of determining the type of each section; a section type pattern discriminating step of discriminating the genre of the acoustic data using the result of statistical processing of the section types; and a genre-adaptive section type determining step of determining section types using control information corresponding to the genre.

8. The acoustic data analysis method according to claim 6, further comprising a section integration step of integrating successive sections of the same type.

9. A method for dividing sound data, wherein the sound data is divided into a plurality of sections using an autocorrelation coefficient of the sound data.

10. The acoustic data analysis method, wherein, in the section type determining step, each section is determined to be one of a sound section containing speech, a sound section not containing speech, and a silent section.

11. An acoustic data analysis method comprising: an acoustic data dividing step of dividing acoustic data into a plurality of sections; a section type determining step of determining the type of each section; and a control information managing step of managing control information according to the genre.

12. A section type pattern discriminating method comprising: an acoustic data dividing step of dividing acoustic data into a plurality of sections; and a section type determining step of determining the type of each section.