JPH08249343A

JPH08249343A - Device and method for speech information acquisition

Info

Publication number: JPH08249343A
Application number: JP7049656A
Authority: JP
Inventors: Shozo Abe; 省三阿部
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 1995-03-09
Filing date: 1995-03-09
Publication date: 1996-09-27

Abstract

PURPOSE: To perform information retrieval and the like efficiently by extracting, from external real-time speech data or the like, information characteristic of that speech data. CONSTITUTION: A speech signal from a sound source such as an external real-time broadcast is input to a speech signal input part 1 and converted into digital speech data, which is stored as a speech file in a speech data storage part 3. A news item extraction part 4 extracts speech data for each news item from the stored speech data on the basis of wording patterns characteristic of news, held in a news broadcast pattern storage part 5, and saves them in a news item storage part 6. The saved news item data are converted into text data by a speech data recognition part 7, and a keyword appearance frequency extraction part 8 interactively extracts keywords and determines their appearance frequencies. An information retrieval part 10 searches an external database using keywords of high appearance frequency as retrieval keys.

Description

Detailed Description of the Invention

【０００１】[0001]

[Field of Industrial Application] The present invention relates to an audio information acquisition device and an audio information acquisition method suitable for extracting, from external audio data such as a news program or from the audio data of an existing audio file (audio data file), information characteristic of that audio data and using it for information retrieval and the like.

【０００２】[0002]

[Prior Art] In recent years, various computer-based systems have become widespread at the individual level. In particular, multimedia systems on personal computers have created many uses at the individual level.

[0003] For example, the market for personal computer software aimed at individuals, such as in the amusement field, has expanded remarkably. Even low-priced personal computers can now handle, in addition to conventional text information, media such as graphic animation and video, as well as audio data.

[0004] With audio capture hardware now standard equipment, audio data can be taken in from existing audio files stored on a CD-ROM or the like (for example, digital audio data files in formats typified by *.wav) or from a microphone. Such audio data is used as-is by various software. In this way, a wide variety of application software using various media has become widespread on the market in recent years.

【０００５】[0005]

[Problems to Be Solved by the Invention] Conventionally, however, audio data has been used only in the form of existing static audio files stored on a CD-ROM or the like. Audio data whose content changes in real time (real-time audio data) can be input as-is, but using it according to its content has not been considered, and even if such use were attempted, it would be difficult to realize.

[0006] Moreover, even for static audio files, once their number becomes enormous, searching for a required audio file by file name alone, for example, is burdensome and laborious.

[0007] The present invention has been made in view of the above circumstances, and its object is to provide an audio information acquisition device and an audio information acquisition method that enable efficient information retrieval and the like by extracting, from external real-time audio data such as a news program or from the audio data of an existing audio file, information characteristic of that audio data.

[0008] Another object of the present invention is to provide an audio information acquisition device and an audio information acquisition method that enable efficient retrieval of information on current topics by extracting audio data for each news item from scheduled news program information and obtaining, from the extracted per-item audio data, keywords to be used for information retrieval.

[0009] Still another object of the present invention is to provide an audio information acquisition device and an audio information acquisition method that, by extracting attributes of the individual data in a vast collection of existing audio files and attaching the attribute information to the corresponding audio files, give a clue as to what each audio file contains and allow the candidate audio files to be searched to be narrowed down on the basis of that attribute information.

【００１０】[0010]

[Means for Solving the Problems and Operation] An audio information acquisition device according to a first aspect of the present invention comprises: audio signal input means for taking in an audio signal from a sound source such as an external real-time broadcast and converting it into digital audio data; audio data storage means for storing the various digital audio data taken in and converted by the audio signal input means and made into files; news broadcast pattern storage means for storing broadcast pattern information of scheduled news; news item extraction means for extracting, from the audio data stored in the audio data storage means, audio data for each news item as news item data on the basis of the broadcast pattern information stored in the news broadcast pattern storage means; news item storage means for storing the news item data extracted by the news item extraction means; audio data recognition means for performing speech recognition on the news item data stored in the news item storage means and converting it into text; keyword appearance frequency extraction means for interactively extracting keywords from the news item data recognized by the audio data recognition means and obtaining the appearance frequency of the extracted keywords; and information retrieval means for determining search keywords on the basis of the keywords extracted by the keyword appearance frequency extraction means and their appearance frequencies, and performing information retrieval.

[0011] In the audio information acquisition device according to the first aspect, the audio signal input means takes in an audio signal from a sound source such as an external real-time broadcast and converts it into digital audio data. This audio data is made into a file and saved in the audio data storage means. Therefore, by setting the time period during which the audio signal input means takes in the audio signal, the audio signal of a scheduled news broadcast, which is aired for a short duration at fixed times of day, can be captured, made into a file, and saved in the audio data storage means.

[0012] Taking as its target the real-time audio data of the scheduled news thus made into a file, the news item extraction means detects audio pattern regions that match the wording patterns peculiar to scheduled news registered in the news broadcast pattern storage means, together with silent portions lasting at least a predetermined time, and thereby extracts each audio region between adjacent audio pattern regions (excluding the silent portions) as one block region of a news item. The audio data of each extracted block region is made into a file as the audio data of an independent news item (news item data) and saved in the news item storage means.

[0013] The audio data corresponding to each news item saved in the news item storage means is converted into text data by the speech recognition processing of the audio data recognition means. The keyword appearance frequency extraction means extracts keywords from this text data through interaction with the user, and then obtains the appearance frequency of each extracted keyword.
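The frequency step can be pictured with a minimal sketch. It assumes the speech recognition result is already available as a plain string and that the user has chosen the keywords; the function name `keyword_frequencies` and the sample transcript are hypothetical and do not come from the patent.

```python
from collections import Counter

def keyword_frequencies(text, keywords):
    """Count non-overlapping, case-insensitive occurrences of each keyword."""
    lowered = text.lower()
    return Counter({kw: lowered.count(kw.lower()) for kw in keywords})

# Hypothetical transcript standing in for recognized news item text.
transcript = (
    "The election results were announced today. "
    "Turnout in the election was high. The budget vote follows the election."
)

# Keywords the user is assumed to have picked interactively.
freq = keyword_frequencies(transcript, ["election", "budget", "turnout"])
print(freq.most_common())
```

Plain substring counting is used only to keep the sketch short; Japanese text would normally need word segmentation before counting.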

[0014] Because the appearance frequency of each keyword broadcast (spoken) in the news program is obtained in this way, the user can learn what is currently topical in the world, for example through display output or audio output.

[0015] Furthermore, by using keywords with a high appearance frequency as search keys to perform information retrieval against, for example, an external large-capacity database, that is, by performing information retrieval that exploits real-time news information, it also becomes possible to obtain information related to events that are topical in the real world, as well as information from media other than the news source.

[0016] An audio information acquisition device according to a second aspect of the present invention comprises: audio data storage means for storing the digital audio data of various existing audio files; audio attribute information extraction means for extracting attribute information of the digital audio data of each audio file stored in the audio data storage means and attaching it to that audio file; and information retrieval means for performing information retrieval on those audio files, among the audio files stored in the audio data storage means, to which externally specified attribute information is attached.

[0017] In the audio information acquisition device according to the second aspect, the audio attribute information extraction means extracts, for an existing audio file, attributes of its audio data that indicate what the content is: for example, whether it is a "speech" sound or a "music" sound and, if it is a "speech" sound, whether the voice is male or female.

[0018] As methods for extracting these attributes, the distinction between a "speech" sound and a "music" sound exploits the fact that the level of a "speech" sound fluctuates more sharply than that of a "music" sound, and the distinction between a male and a female voice exploits the fact that, in spoken utterances, the frequency is roughly constant at around 100 Hz for male voices and around 200 Hz for female voices. It is also possible to extract other attributes, such as whether a "speech" sound is a child's voice or an adult's voice.
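The level-fluctuation cue for separating "speech" from "music" can be illustrated with a rough sketch. The patent does not say how the fluctuation is measured, so the frame-wise RMS level, the coefficient of variation, the 0.5 threshold, and all function names below are assumptions chosen for illustration.

```python
import math

def frame_levels(samples, frame_len):
    """RMS level of consecutive non-overlapping frames."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        levels.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return levels

def level_variation(samples, frame_len=400):
    """Coefficient of variation (std / mean) of the frame levels."""
    levels = frame_levels(samples, frame_len)
    mean = sum(levels) / len(levels)
    if mean == 0:
        return 0.0
    variance = sum((v - mean) ** 2 for v in levels) / len(levels)
    return math.sqrt(variance) / mean

def classify_speech_or_music(samples, threshold=0.5):
    # The threshold is an assumed value, not one given in the source.
    return "speech" if level_variation(samples) > threshold else "music"

RATE = 8000  # assumed sampling rate in Hz
# Steady tone: a stand-in for a sound with a stable level.
tone = [math.sin(2 * math.pi * 440 * n / RATE) for n in range(RATE)]
# The same tone gated on and off every 0.1 s, imitating syllables and pauses.
bursty = [x if (n // 800) % 2 == 0 else 0.0 for n, x in enumerate(tone)]

print(classify_speech_or_music(tone), classify_speech_or_music(bursty))
```

Real recordings are far less clean than these synthetic signals, so a practical classifier would need a threshold tuned on actual data.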

[0019] Attaching the attribute information extracted in this way to the corresponding audio file provides a clue as to what the individual data in a vast collection of existing audio files contain. Thus, for example, by specifying the desired attribute information, the user can narrow down the files to be searched from among a large number of audio files, which improves operability in future multimedia systems.

[0020] By combining the configuration of the audio information acquisition device according to the first aspect with that of the device according to the second aspect, information retrieval exploiting real-time news information can also be performed only on audio files having the attributes specified by the user, making the required information easier to find. For information retrieval on audio files, the audio data of each target audio file may be converted into text data by speech recognition, and a full-text search may then be performed on that text data.

【００２１】[0021]

[Embodiment] An embodiment of the present invention will be described below with reference to the drawings. FIG. 1 is a block diagram showing the overall configuration of an audio information acquisition device according to one embodiment of the present invention.

[0022] In FIG. 1, reference numeral 1 denotes an audio signal input unit that takes in an external audio signal whose content changes in real time (a real-time audio signal), such as an audio signal from a radio broadcast, a television broadcast, or a microphone, and converts it into digital audio data. Reference numeral 2 denotes a time period setting unit with a clock function that sets the time period (consisting of a start time and a capture duration) during which the audio signal input unit 1 takes in the audio signal. If the target of real-time audio capture by the audio signal input unit 1 is, for example, public scheduled news, the capture period set by the time period setting unit 2 is a period of fixed length beginning at a fixed time such as 7 a.m., noon, or 7 p.m.
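A capture period made of a start time and a duration can be sketched as follows. The 15-minute durations, the `SCHEDULE` table, and the function names are assumptions for illustration; the source gives only the example start times (7 a.m., noon, 7 p.m.).

```python
from datetime import datetime, time, timedelta

def in_capture_window(now, start, duration):
    """True if `now` lies in [start, start + duration) on the same date."""
    window_start = datetime.combine(now.date(), start)
    return window_start <= now < window_start + duration

# Start times follow the example in the text; the durations are assumed.
SCHEDULE = [
    (time(7, 0), timedelta(minutes=15)),
    (time(12, 0), timedelta(minutes=15)),
    (time(19, 0), timedelta(minutes=15)),
]

def should_capture(now, schedule=SCHEDULE):
    """True if any scheduled capture period is active at `now`."""
    return any(in_capture_window(now, start, dur) for start, dur in schedule)

print(should_capture(datetime(1995, 3, 9, 19, 5)))
```

The sketch does not handle periods that cross midnight, which would need the previous day's start time to be checked as well.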

[0023] During the fixed period set by the time period setting unit 2 with a clock function, the audio signal input unit 1 takes in the audio signal from a radio or television broadcast and converts it into digital audio data. This audio data, for example the audio data of the 7 p.m. public scheduled news captured and digitized over a fixed time from its start, is saved in the audio data storage unit 3 as one independent audio file.

[0024] Here, the audio signal input unit 1 can vary the required file size by changing the sampling interval used during capture according to the free space in the storage area of the audio data storage unit 3. Naturally, lengthening the sampling interval to reduce the file size required for a given duration degrades the sound quality of the audio data, so the setting should be chosen to suit the intended use of the audio data.
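The trade-off between sampling interval and file size is simple arithmetic: for uncompressed PCM, size is duration times sampling rate times bytes per sample times channels, and the sampling rate is the reciprocal of the sampling interval. The sketch below assumes 16-bit mono PCM and a hypothetical list of candidate rates; none of these values come from the patent, and file header overhead is ignored.

```python
def required_bytes(duration_s, sample_rate_hz, bytes_per_sample=2, channels=1):
    """Uncompressed PCM payload size in bytes (header overhead ignored)."""
    return duration_s * sample_rate_hz * bytes_per_sample * channels

def pick_sample_rate(duration_s, free_bytes,
                     candidates=(44100, 22050, 11025, 8000)):
    """Highest candidate rate whose recording fits in the free space, else None."""
    for rate in candidates:  # candidates are listed from highest to lowest
        if required_bytes(duration_s, rate) <= free_bytes:
            return rate
    return None

# A 10-minute bulletin with about 20 MB free (illustrative figures).
print(required_bytes(600, 22050))          # 26460000
print(pick_sample_rate(600, 20_000_000))   # 11025
```

Halving the rate halves the file size, at the cost of halving the highest frequency that can be represented.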

[0025] In addition to the real-time audio data (audio files) captured by the audio signal input unit 1, the audio data storage unit 3 stores the data (digital audio data) of various existing static audio files read from media such as CD-ROMs. These audio files are audio data used by various software. The audio data storage unit 3 also stores files of audio data having particular audio attributes captured by the audio attribute information extraction unit 13 described later.

[0026] The real-time audio data (file) captured by the audio signal input unit 1 during the set time period and stored in the audio data storage unit 3 is the extraction target of the news item extraction unit 4. That is, the news item extraction unit 4 extracts data for each predetermined news item from the real-time audio data (file) stored in the audio data storage unit 3. For this extraction, the device of FIG. 1 is provided with a news broadcast pattern storage unit 5 in which wording patterns peculiar to scheduled news are registered.

[0027] The news item extraction unit 4 extracts data for each news item by checking the real-time audio data (file) stored in the audio data storage unit 3 against the various wording patterns registered in the news broadcast pattern storage unit 5 and for silent periods.

[0028] The extraction processing performed by the news item extraction unit 4 will be described with reference to FIG. 2. FIG. 2 shows the audio waveform of the 7 p.m. public scheduled news captured by the audio signal input unit 1; the character strings enclosed in quotation marks to the right of the waveform indicate the wording patterns in that news broadcast.

[0029] In the news broadcast pattern storage unit 5, a plurality of wording patterns frequently used in public scheduled news are registered as dictionary patterns. Specifically, audio patterns such as "This is the 7 o'clock news" (「７時のニュースです」), "First of all" (「まず、はじめに」), "Next" (「次に」), "This is the sports information" (「スポーツ情報です」), and "This was Sakurai reporting" (「担当は桜井でした」) are registered.

[0030] From the audio file captured from the 7 p.m. public scheduled news by the audio signal input unit 1 and stored in the audio data storage unit 3, the news item extraction unit 4 extracts the audio pattern regions R1, R2, R3, R4, and R5 that match the above audio patterns registered in the news broadcast pattern storage unit 5. Various techniques for extracting audio that matches a specific audio pattern are already known, so their description is omitted here.

[0031] Next, the news item extraction unit 4 extracts the audio (audio data) regions between the extracted audio pattern regions, each as one block region of a news item: specifically, the region RF1 between the region R2 of the "First of all" pattern and the region R3 of the "Next" pattern, and the region RF2 between the region R4 of the "This is the sports information" pattern and the region R5 of the "This was Sakurai reporting" pattern.

[0032] Between the regions R2 and R3 there is also a silent region S1 following the region RF1, that is, a relatively long silence between one news story and the next, but (the audio data of) this silent region S1 is discarded. In other words, it is not RF1 + S1 but RF1 alone, as described above, that is extracted as one block region of a news item.
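The relation between the pattern regions, the silent region, and the extracted blocks can be sketched as interval arithmetic. The sketch assumes that pattern matching and silence detection have already produced time intervals in seconds; all timestamps and function names are invented for illustration, and only the labels R1 to R5 and S1 follow the text.

```python
def subtract_silence(start, end, silences):
    """Return the parts of [start, end) not covered by any silent interval."""
    parts = []
    cursor = start
    for s_start, s_end in sorted(silences):
        if s_end <= cursor or s_start >= end:
            continue  # this silence does not overlap the remaining span
        if s_start > cursor:
            parts.append((cursor, s_start))
        cursor = max(cursor, s_end)
    if cursor < end:
        parts.append((cursor, end))
    return parts

def extract_items(pattern_regions, silences):
    """Audio between adjacent pattern regions, with silent parts discarded."""
    regions = sorted(pattern_regions, key=lambda r: r[1])
    items = []
    for (_, _, prev_end), (_, next_start, _) in zip(regions, regions[1:]):
        if next_start > prev_end:
            items.extend(subtract_silence(prev_end, next_start, silences))
    return items

# (label, start_s, end_s); the times are hypothetical.
patterns = [("R1", 0.0, 2.0), ("R2", 2.0, 3.5), ("R3", 30.0, 31.0),
            ("R4", 31.0, 33.0), ("R5", 60.0, 62.0)]
silences = [(27.0, 30.0)]  # S1: the long pause after the first story

print(extract_items(patterns, silences))  # [(3.5, 27.0), (33.0, 60.0)]
```

Only silences of at least the predetermined length should be passed in, since a silence in the middle of a gap would split that gap into two blocks in this sketch.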

[0033] When extracting the block regions described above, it is preferable, as preprocessing, to apply smoothing to the target audio data (audio waveform data) so that steep portions of the waveform are smoothed out. The block regions are then extracted by binarizing the smoothed audio data with a predetermined threshold.
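Smoothing followed by binarization can be sketched on a short amplitude envelope. The patent does not name the smoothing method, so the centered moving average, the window size, the threshold, and the sample values below are all assumptions.

```python
def moving_average(values, window):
    """Centered moving average; edges use only the samples available."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def binarize(values, threshold):
    """1 where the value reaches the threshold, otherwise 0."""
    return [1 if v >= threshold else 0 for v in values]

def runs_of_ones(bits):
    """(start, end) index pairs, end exclusive, for each run of 1s."""
    runs, start = [], None
    for i, bit in enumerate(bits):
        if bit and start is None:
            start = i
        elif not bit and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(bits)))
    return runs

# Hypothetical amplitude envelope with a brief dip (index 3) inside the first block.
envelope = [0, 0, 5, 1, 6, 5, 0, 0, 0, 4, 5, 4, 0, 0]
THRESHOLD = 1.8  # assumed value

raw_blocks = runs_of_ones(binarize(envelope, THRESHOLD))
smooth_blocks = runs_of_ones(binarize(moving_average(envelope, 3), THRESHOLD))
print(raw_blocks)     # [(2, 3), (4, 6), (9, 12)]
print(smooth_blocks)  # [(2, 6), (9, 12)]
```

Without smoothing, the brief dip splits the first block in two; with smoothing, it is kept as a single block, which is the reason for the preprocessing step.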

[0034] The (externally sourced) audio data extracted for each news item by the news item extraction unit 4 is saved in the news item storage unit 6 as the data of news item audio files (here, files A and B). In this example, the extracted data are the audio data of news item #1, between the "First of all" pattern and the "Next" pattern in the scheduled news, and the audio data of news item #2, between the "This is the sports information" pattern and the "This was Sakurai reporting" pattern.

[0035] The news item audio file data saved in the news item storage unit 6 is subjected to speech recognition by the audio data recognition unit 7. That is, the audio data recognition unit 7 performs speech recognition processing on the news item audio file data saved in the news item storage unit 6 and converts it into text data (consisting of a string of character codes). This text data is saved in the news item storage unit 6 separately from the original news item audio file.

[0036] The text data of the news item audio file data produced by the audio data recognition unit 7 is the target of keyword appearance frequency extraction by the keyword appearance frequency extraction unit 8. That is, the keyword appearance frequency extraction unit 8 displays the text data on a display unit 9, such as a CRT display or a liquid crystal display, interacts with the user to extract the keywords the user specifies from the text data, and determines the appearance frequency of each keyword. In this embodiment, this processing is performed in response to an instruction given by the user through a keyboard, mouse, or the like (not shown) when, for example, the user needs to extract information from an external large-capacity database system.

[0037] For this purpose, the device of FIG. 1 is provided with an information retrieval unit 10. When the user instructs information extraction, the information retrieval unit 10 receives, from the audio data recognition unit 7, the keywords extracted by the keyword appearance frequency extraction unit 8 and, using keywords with a high appearance frequency (for example, the three keywords ranked first to third in appearance frequency) as search keys, retrieves information from an external large-capacity database (not shown), thereby obtaining information on (news-related) events that are topical in the real world. The retrieval results of the information retrieval unit 10 are displayed on the display unit 9.
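Choosing the top-ranked keywords as search keys can be sketched as below. The frequency table is invented, and `build_query` with its `AND` syntax is purely hypothetical, since the source does not describe the query language of the external database.

```python
from collections import Counter

def select_search_keys(frequencies, top_n=3):
    """Keywords ranked by appearance frequency, keeping at most top_n."""
    ranked = Counter(frequencies).most_common(top_n)
    return [keyword for keyword, count in ranked if count > 0]

def build_query(keys):
    """Hypothetical query format: all keys joined with AND."""
    return " AND ".join(keys)

# Invented keyword frequencies for illustration.
frequencies = {"election": 7, "budget": 4, "typhoon": 5, "baseball": 2}

keys = select_search_keys(frequencies)
print(keys)               # ['election', 'typhoon', 'budget']
print(build_query(keys))  # election AND typhoon AND budget
```

Passing a different `top_n` corresponds to letting the user choose how many keywords are used for retrieval.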

[0038] The number of keywords used for retrieval may also be specified by the user. Alternatively, the keywords extracted by the keyword appearance frequency extraction unit 8 may be displayed on the screen in order of appearance frequency so that the user can select the keywords to be used for retrieval from among them.

[0039] The device of FIG. 1 is also provided with an audio output unit 11 and a speaker 12 for audibly outputting the data of the audio files stored in the audio data storage unit 3 and the news item storage unit 6, as well as the keywords extracted by the keyword appearance frequency extraction unit 8. The audio output unit 11 is activated in response to an instruction from the user and, on the basis of the audio file data stored in the audio data storage unit 3 or the news item storage unit 6, or the keywords extracted by the keyword appearance frequency extraction unit 8, outputs the corresponding audio signal (acoustic signal) to the speaker 12 for audible output. To output the text data stored in the news item storage unit 6 and the keyword data extracted by the keyword appearance frequency extraction unit 8 as audio, however, processing to convert that data into audio data is required.

[0040] The device of FIG. 1 is further provided with an audio attribute information extraction unit 13 for extracting attribute information from the data of the audio files (including existing audio files) stored in the audio data storage unit 3, and an audio environment dictionary unit 14 in which information for determining that attribute information (a kind of dictionary data) is registered. Candidate attributes to be extracted include whether the target audio data is speech or music (speech/music distinction); if speech, whether the voice is male or female (male/female distinction); and, further, whether it is a child's or an adult's voice (child/adult distinction).

[0041] That is, the audio attribute information extraction unit 13 extracts, on the basis of the information registered in the audio environment dictionary unit 14, the attribute information of the audio data (audio files) stored in the audio data storage unit 3: it distinguishes a "speech" sound from a "music" sound and, for a "speech" sound, distinguishes a male voice from a female voice and a child's voice from an adult's voice. The speech/music distinction, for example, exploits the fact that the level of spoken audio ("speech" sound) fluctuates more sharply than that of music ("music" sound). The male/female distinction exploits the fact that, in spoken utterances, the frequency is roughly constant at around 100 Hz for male voices and around 200 Hz for female voices.
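The male/female cue (around 100 Hz versus around 200 Hz) can be illustrated with a rough pitch estimate. The patent does not describe how the frequency is measured; the autocorrelation estimator, the 60 to 400 Hz search range, the 150 Hz decision boundary, and the pure-tone test signals below are all assumptions, and real speech would need a more robust estimator.

```python
import math

def estimate_f0(samples, rate, fmin=60.0, fmax=400.0):
    """Fundamental frequency from the lag with the highest autocorrelation."""
    min_lag = int(rate / fmax)
    max_lag = int(rate / fmin)
    best_lag, best_score = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        score = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return rate / best_lag

def classify_voice(f0_hz, boundary_hz=150.0):
    # Assumed boundary halfway between the 100 Hz and 200 Hz figures in the text.
    return "male" if f0_hz < boundary_hz else "female"

RATE = 8000  # assumed sampling rate in Hz
# Pure tones standing in for voiced speech at the two nominal pitches.
low_voice = [math.sin(2 * math.pi * 100 * n / RATE) for n in range(2000)]
high_voice = [math.sin(2 * math.pi * 200 * n / RATE) for n in range(2000)]

print(classify_voice(estimate_f0(low_voice, RATE)))
print(classify_voice(estimate_f0(high_voice, RATE)))
```

Because the unnormalized autocorrelation sums fewer terms at longer lags, the shortest matching period tends to win, which helps avoid picking a multiple of the true period for these clean signals.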

[0042] In this way, attribute information is extracted both from the individual data in the vast collection of existing audio files and from external audio data captured by the audio signal input unit 1 and stored as files: whether the data is a "speech" sound or a "music" sound; if a "speech" sound, whether the voice is male or female; and, further, whether it is a child's or an adult's voice. By attaching the extracted attribute information to the corresponding audio file, it can serve as a clue to what the individual data in the vast collection of audio files contain. For example, the candidates can be narrowed down efficiently by treating only the female-voice audio files among them as the required files. This makes it possible to improve operability in future multimedia systems.
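Narrowing down candidates by attached attributes can be sketched with a small catalog. The `AudioFile` record, the attribute keys (`kind`, `gender`), and the file names are hypothetical; the source does not specify how attribute information is stored with a file.

```python
from dataclasses import dataclass, field

@dataclass
class AudioFile:
    name: str
    # Attribute information attached to the file, e.g. {"kind": "speech"}.
    attributes: dict = field(default_factory=dict)

def filter_by_attributes(files, **wanted):
    """Files whose attached attributes match every requested key/value pair."""
    return [f for f in files
            if all(f.attributes.get(k) == v for k, v in wanted.items())]

# Hypothetical catalog with already-extracted attributes.
catalog = [
    AudioFile("news_item_a.wav", {"kind": "speech", "gender": "female"}),
    AudioFile("news_item_b.wav", {"kind": "speech", "gender": "male"}),
    AudioFile("theme.wav", {"kind": "music"}),
]

candidates = filter_by_attributes(catalog, kind="speech", gender="female")
print([f.name for f in candidates])  # ['news_item_a.wav']
```

Files with a missing attribute simply fail the match, so music files drop out of a gender-based query without special handling.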

[0043] The narrowed-down file candidates can also be made the target of information retrieval by the information retrieval unit 10, in the same way as the external database described above. In this case, the voice data of each target audio file is read from the voice data storage unit 3 into the voice data recognition unit 7, recognized, and converted into text data. The text data is stored in, for example, the news item storage unit 6, after which the information retrieval unit 10 performs a full-text search over the corresponding text data in the news item storage unit 6.

[0044] In addition to extracting attribute information from the audio file data stored in the voice data storage unit 3 (that is, voice data that has already been filed), the voice attribute information extraction unit 13 can also take in the real-time voice data that is input moment by moment through the voice signal input unit 1 directly from that unit, and extract attribute information from, for example, a fixed duration of voice data. This is applied, for example, to extracting only the music sound data, or only the male voice, from data captured in real time from a radio or television broadcast and storing it in the voice data storage unit 3 as an audio file.

[0045] As an example, the following describes, with reference to FIG. 3, the case where the function of distinguishing "voice/music" as attribute information is applied to a real-time sound signal from a radio or television broadcast, so that when a "music" sound is identified in the constantly input sound signal, it is made into a file as music sound data.

[0046] First, FIG. 3(a) shows the real-time sound signal captured by the voice signal input unit 1 during the interval from the sound data capture start time t0, set by the time zone setting unit 2 with a clock function, to time t1. Based on the registration information in the voice environment dictionary unit 14, the voice attribute information extraction unit 13 starts attribute information extraction processing to identify "music" sound in the real-time sound signal (voice data) input continuously from the capture start time t0.

[0047] If, at a time ts between time t0 and time t1, the sound is judged to be "music" for the first time, the voice attribute information extraction unit 13 holds the elapsed time Ts from time t0 to time ts and, based on this time information Ts, starts filing the "music" sound.

[0048] The filing process works as follows. In this embodiment, the voice data input continuously from time t0 through the voice signal input unit 1 is fed into and held in a shift buffer (not shown) of the voice attribute information extraction unit 13 on a first-in, first-out (FIFO) basis. This shift buffer can hold, for example, 10 seconds of voice data.

[0049] Accordingly, when the voice attribute information extraction unit 13 first judges the sound to be "music" after time Ts has elapsed from time t0, as described above, it starts filing the output data of the shift buffer from the point at which an amount of voice data corresponding to time Ts has been output from the shift buffer.

[0050] Assume that the voice attribute information extraction unit 13 continues to judge the real-time sound input after time ts to be "music" and eventually, at time te shown in FIG. 3(b), judges that it is no longer "music". In this case, the voice attribute information extraction unit 13 takes the point at which it made that judgment (time te) as the end of the "music" sound.

[0051] The voice attribute information extraction unit 13 therefore holds the elapsed time Te from time t0 to time te, and ends the filing of the shift buffer's output data at the point when an amount of voice data corresponding to time Te has been output from the shift buffer.

[0052] On judging the end of the "music" sound at time te, the voice attribute information extraction unit 13 takes that time te as the new sound data capture start time t0 for the real-time sound signal, and repeats the identification check for any "music" sound that may follow.
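The filing procedure of paragraphs [0047] to [0052] can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the embodiment itself: the input is modeled as a sequence of fixed-size chunks rather than a continuous signal, `is_music` stands in for the attribute judgment, the names `file_music_segments` and `buffer_chunks` are hypothetical, and elapsed amounts are counted from the first t0 throughout instead of resetting t0 to te after each segment.

```python
from collections import deque

def file_music_segments(chunks, is_music, buffer_chunks=10):
    """Return a list of "files", each a list of the chunks judged to be music."""
    fifo = deque()       # shift buffer: the oldest chunk leaves first
    segments = deque()   # entries [Ts, Te]; Te is None while music continues
    files = []
    current = []         # chunks of the file currently being written
    in_music = False
    elapsed_in = 0       # chunks input so far
    elapsed_out = 0      # chunks output from the shift buffer so far

    def output_oldest():
        nonlocal elapsed_out, current
        chunk = fifo.popleft()
        # End filing once an amount of data equal to Te has left the buffer.
        if segments and segments[0][1] is not None and elapsed_out >= segments[0][1]:
            files.append(current)
            current = []
            segments.popleft()
        # File the chunk once an amount of data equal to Ts has left the buffer.
        if segments and elapsed_out >= segments[0][0]:
            current.append(chunk)
        elapsed_out += 1

    for chunk in chunks:
        music = is_music(chunk)
        if music and not in_music:
            segments.append([elapsed_in, None])   # remember Ts
        elif not music and in_music:
            segments[-1][1] = elapsed_in          # remember Te
        in_music = music
        fifo.append(chunk)
        elapsed_in += 1
        if len(fifo) > buffer_chunks:
            output_oldest()

    if in_music:                                  # input ended during music
        segments[-1][1] = elapsed_in
    while fifo:                                   # flush the shift buffer
        output_oldest()
    if current:
        files.append(current)
    return files
```

Because filing is driven by how much data has left the buffer, the data written to each file begins exactly at the chunk where "music" was first judged and ends just before the chunk where it was judged to have stopped, regardless of the buffer length.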

[0053] In this way, only the "music" sound (voice data with the desired attribute) can be automatically captured from a sound source such as an external real-time broadcast and made into a file. The filed data is saved in the voice data storage unit 3 with its attribute information attached. The filed data, that is, the "music" sound captured from the external sound source, is retrieved to the voice output unit 11 on the user's instruction and played back through the speaker 12.

[0054] In the same way, only voice with the attribute designated by the user, for example only "male" voice, only "female" voice, or only "child" voice, can be automatically captured from a sound source such as an external real-time broadcast, made into a file, and saved in the voice data storage unit 3.

[0055] In this case, for example, only the voice data of files with the "male" attribute extracted from a scheduled news program is recognized and converted into text by the voice data recognition unit 7, keywords are extracted from that text by the keyword appearance frequency extraction unit 8, and the appearance frequency of each keyword is obtained. This makes it possible to narrow down the search keys used for information retrieval by the information retrieval unit 10, that is, to narrow down the search target.
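The step of obtaining keyword appearance frequencies and using them to narrow the search keys might be sketched as below. The function names, the use of plain substring counting, and the `top_n` cutoff are assumptions for illustration; the keywords themselves are assumed to have been chosen interactively by the user, and the texts are assumed to be the output of voice recognition.

```python
from collections import Counter

def keyword_frequencies(texts, keywords):
    """Count how often each chosen keyword appears across the recognized texts."""
    counts = Counter()
    for text in texts:
        for keyword in keywords:
            counts[keyword] += text.count(keyword)
    return counts

def pick_search_keys(counts, top_n=3):
    """Use the most frequent keywords that actually appeared as search keys."""
    return [keyword for keyword, n in counts.most_common(top_n) if n > 0]

# Example with made-up recognized text.
texts = [
    "election results: the election was close",
    "weather today, election tomorrow",
]
counts = keyword_frequencies(texts, ["election", "weather", "sports"])
search_keys = pick_search_keys(counts, top_n=2)
```

In this example "election" appears three times and "weather" once, so `search_keys` is `["election", "weather"]`, while "sports", which never appears, is never selected.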

[0056] The present invention is not limited to the above embodiment. For example, although the embodiment assumes that the external voice signal captured by the voice signal input unit 1 is a scheduled news program, any program may be used. As the information source, in addition to voice signals from radio and television broadcasts, voice data from recording media handled by VTR (video tape recorder) devices, CD (compact disc) devices, and the like, or from telephone speech, can also be used.

【００５７】[0057]

[Effects of the Invention] As described in detail above, according to the present invention, voice data for each news item is extracted by detecting wording patterns peculiar to news in scheduled news program information, the extracted voice data for each news item is converted into text, keywords for use in information retrieval are interactively extracted from that text, and the appearance frequency of each keyword is obtained. With this configuration, information retrieval on an external database or the like can be performed using, for example, frequently appearing, timely keywords as search keys, so that information currently in the news, and further information related to it, can be obtained efficiently.

[0058] Further, according to the present invention, the attributes of each item of data, such as those in a huge body of existing audio files, are extracted and the attribute information is added to the corresponding audio file, which provides a clue to what that audio file contains. Based on the attribute information attached to the audio files, the candidates for the audio files to be searched can therefore be narrowed down from among the huge body of existing audio files, improving operability in future multimedia systems.

[0059] Thus, according to the present invention, information peculiar to the voice data is extracted from external real-time voice data such as a news program, or from the voice data of existing audio files, and using that information allows information retrieval and the like to be performed efficiently.

[Brief description of drawings]

FIG. 1 is a block diagram showing the overall configuration of a voice information acquisition device according to an embodiment of the present invention.

FIG. 2 is a diagram for explaining the extraction processing of the news item extraction unit 4 in the embodiment, showing how news items contained in scheduled news are extracted using wording patterns commonly used in scheduled news.

FIG. 3 is a diagram for explaining the operation of the voice attribute information extraction unit 13 in the embodiment, for the case where "music" sound is identified in the real-time sound signal input through the voice signal input unit 1 and made into a file.

[Explanation of symbols]

1 ... voice signal input unit, 2 ... time zone setting unit with clock function (input time zone setting means), 3 ... voice data storage unit, 4 ... news item extraction unit, 5 ... news broadcast pattern storage unit, 6 ... news item storage unit, 7 ... voice data recognition unit, 8 ... keyword appearance frequency extraction unit, 9 ... display unit, 10 ... information retrieval unit, 11 ... voice output unit, 12 ... speaker, 13 ... voice attribute information extraction unit, 14 ... voice environment dictionary unit.

Claims

[Claims]

1. A voice information acquisition device comprising: voice signal input means for taking in a voice signal from an external sound source such as a real-time broadcast and converting it into digital voice data; voice data storage means for storing the various digital voice data taken in and converted by the voice signal input means and made into files; news broadcast pattern storage means for storing broadcast pattern information of scheduled news; news item extraction means for extracting, from the digital voice data stored in the voice data storage means, voice data for each news item as news item data on the basis of the broadcast pattern information stored in the news broadcast pattern storage means; news item storage means for storing the news item data extracted by the news item extraction means; voice data recognition means for performing voice recognition on the news item data stored in the news item storage means and converting it into text; keyword appearance frequency extraction means for interactively extracting keywords from the news item data recognized by the voice data recognition means and obtaining the appearance frequency of each extracted keyword; and information retrieval means for determining a search keyword on the basis of the keywords extracted by the keyword appearance frequency extraction means and their appearance frequencies, and performing information retrieval.

2. A voice information acquisition device comprising: voice data storage means for storing digital voice data of various existing audio files; voice attribute information extraction means for extracting attribute information of the digital voice data of each audio file stored in the voice data storage means and adding it to that audio file; and information retrieval means for performing information retrieval on those audio files, among the audio files stored in the voice data storage means, to which externally designated attribute information has been added.

3. A voice information acquisition device comprising: voice signal input means for taking in a voice signal from an external sound source such as a real-time broadcast and converting it into digital voice data; voice data storage means for storing the various digital voice data taken in and converted by the voice signal input means and made into files, as well as digital voice data of various existing audio files; voice attribute information extraction means for extracting attribute information of at least the digital voice data of the existing audio files, among the digital voice data stored in the voice data storage means, and adding it to the corresponding audio file; news broadcast pattern storage means for storing broadcast pattern information of scheduled news; news item extraction means for extracting, from the voice data stored in the voice data storage means that was taken in and converted by the voice signal input means, voice data for each news item as news item data on the basis of the broadcast pattern information stored in the news broadcast pattern storage means; news item storage means for storing the news item data extracted by the news item extraction means; voice data recognition means for performing voice recognition on the news item data stored in the news item storage means; keyword appearance frequency extraction means for interactively extracting keywords from the news item data recognized by the voice data recognition means and obtaining the appearance frequency of each extracted keyword; and information retrieval means for performing information retrieval, using a search keyword determined on the basis of the keywords extracted by the keyword appearance frequency extraction means and their appearance frequencies, on an external database or on those audio files, among the audio files stored in the voice data storage means, to which externally designated attribute information has been added.

4. The voice information acquisition device according to claim 3, wherein, when a search keyword is given from outside, the information retrieval means uses that keyword as the search key to perform the information retrieval on those audio files, among the audio files stored in the voice data storage means, to which externally designated attribute information has been added.

5. The voice information acquisition device according to claim 2 or 3, wherein the voice attribute information extraction means identifies, from the voice data, at least whether the sound is a "voice" sound or a "music" sound and, in the case of a "voice" sound, whether it is a male voice or a female voice, and extracts the corresponding attribute information.

6. The voice information acquisition device according to claim 3, wherein the voice attribute information extraction means receives the digital voice data taken in and converted by the voice signal input means from that input means, extracts from the input voice data the portion having an externally designated attribute, adds the attribute information to the extracted voice data, and stores it in the voice data storage means.

7. The voice information acquisition device according to claim 1, further comprising input time zone setting means for setting the time zone in which the voice signal input means captures the voice signal.

8. A voice information acquisition method comprising: taking in a voice signal from an external sound source such as an external real-time broadcast, converting it into digital voice data, and saving it as a file; extracting voice data for each news item by detecting wording patterns peculiar to news in the saved voice data, and saving it as files; converting each item of voice data into text data by performing voice recognition on the saved voice data for each news item; interactively extracting keywords from each item of text data and obtaining the appearance frequency of each extracted keyword; and determining a search keyword on the basis of the extracted keywords and their appearance frequencies, and performing information retrieval.

9. A voice information acquisition method comprising: extracting attribute information of the digital voice data of various existing audio files and adding it to the corresponding audio file; and performing information retrieval on those audio files to which externally designated attribute information has been added.

10. A voice information acquisition method comprising: extracting attribute information of the digital voice data of various existing audio files and adding it to the corresponding audio file; meanwhile taking in a voice signal from an external sound source such as a real-time broadcast, converting it into digital voice data, and saving it as a file; extracting voice data for each news item by detecting wording patterns peculiar to news in the saved voice data, and saving it as files; converting each item of voice data into text data by performing voice recognition on the saved voice data for each news item; interactively extracting keywords from each item of text data and obtaining the appearance frequency of each extracted keyword; and determining a search keyword on the basis of the extracted keywords and their appearance frequencies, and performing information retrieval on an external database or on those audio files to which externally designated attribute information has been added.