JPH10312389A

JPH10312389A - Voice data base system and recording medium

Info

Publication number: JPH10312389A
Application number: JP9122264A
Authority: JP
Inventors: Hiroshi Shibazaki; 博柴崎
Original assignee: Dainippon Screen Manufacturing Co Ltd
Current assignee: Dainippon Screen Manufacturing Co Ltd
Priority date: 1997-05-13
Filing date: 1997-05-13
Publication date: 1998-11-24

Abstract

PROBLEM TO BE SOLVED: To obtain the voice data that an operator desires quickly, efficiently, and reliably. SOLUTION: Key phrases are set for all voice data registered in the voice database system. A key phrase is section voice information on a characteristic part of the sound that the voice data represents. The voice database system then retrieves voice data under arbitrary conditions. For each item of voice data extracted by the retrieval, voice waveforms WAV4 and WAV5 are generated by repeating its key phrases. A voice waveform WAV6 is then synthesized by combining all the voice waveforms with repeated key phrases. Further, the level of the voice waveform WAV6 is adjusted to generate a voice waveform WAV7. When the retrieval result is displayed, the sound corresponding to the voice waveform WAV7 is reproduced, which makes it easy to identify the voice data.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a voice database system, and to a recording medium, for a computer that handles voice data, in which a large amount of voice data is stored, desired voice data is retrieved from it, and the extracted voice data is displayed and reproduced.

[0002]

2. Description of the Related Art

When a large amount of voice data is stored, managed, and used in a computer, the larger the number of stored items, the greater the need for a voice database system that allows efficient retrieval and quick confirmation.

[0003] However, because voice data represents sound that is perceived by hearing, an operator cannot recognize its content visually merely by having it shown on a display device, as is possible with text data or image data.

[0004] Conventional voice database systems are therefore configured so that attribute information such as keywords is associated with the voice data for efficient retrieval, and a search over a large amount of voice data is performed on the basis of that attribute information. Besides keywords, the attribute information includes the title, creator/recorder, recording location, recording date and time, and recording duration. The voice data extracted by the search is displayed on the screen of the display device in list form.

[0005] FIG. 16 is an explanatory diagram showing the operating procedure of such a conventional voice database system by means of the screens displayed on the display device. As shown in FIG. 16, to obtain the desired voice data the operator enters attribute information such as a keyword into the search condition input area 25 of the search screen P1. For example, an operator who wants voice data of a cat's meow enters a word such as "cat" as the keyword and has the computer execute the search.

[0006] When the computer finishes the search, the search result is displayed on the display device. The search result display screen P2 shown in FIG. 16 illustrates a case in which six items of voice data have been extracted. On the search result display screen P2, each item of voice data is represented by an "icon" and a "file name". Here, a "file name" is a name associated with data or a program, and an "icon" is a pictorial symbol that represents a file on the screen. A picture of a speaker is used as the icon indicating voice data.

[0007]

PROBLEMS TO BE SOLVED BY THE INVENTION

However, an operator generally often cannot tell what kind of file a file name set by someone else refers to. Even when the search result is displayed as described above, the operator cannot finally identify the voice data unless he or she already recognizes the file name.

[0008] Therefore, to determine the desired voice data, the operator must display the attribute information and check it in detail. Identifying voice data by checking attribute information in this way is not an easy task and is very inefficient. Moreover, because it does not involve actually reproducing the sound, the operator may be left feeling uncertain.

[0009] If uncertainty remains after checking the attribute information, the operator starts a voice data playback application and confirms by reproducing the actual sound. This requires deliberately starting the playback application from the search result display screen P2 and then operating the playback application screen P3, which is laborious. In addition, when the voice data is long, it may take time to reach the characteristic part of the sound even when the voice data is played back, so the data cannot be identified efficiently.

[0010] The present invention has been made in view of the above problems, and its object is to provide a voice database system and a recording medium with which the voice data desired by an operator can be obtained quickly, efficiently, and reliably.

[0011]

MEANS FOR SOLVING THE PROBLEMS

To achieve the above object, the invention according to claim 1 is a computer that handles voice data, comprising: (a) storage means for storing section voice information on an arbitrary characteristic part of the sound represented by voice data in association with that voice data; (b) search means for searching for voice data among a plurality of items of voice data on the basis of predetermined information; and (c) output means for reproducing the voice data extracted by the search means on the basis of the section voice information and outputting it as sound.

[0012] The invention according to claim 2 is the system according to claim 1, wherein the storage means is capable of storing a plurality of items of section voice information in association with a single item of voice data.

[0013] The invention according to claim 3 is the system according to claim 1 or 2, wherein the display corresponding to the voice data being output by the output means changes as that voice data is reproduced on the basis of its section voice information.

[0014] The invention according to claim 4 is the system according to any one of claims 1 to 3, wherein, for the voice data extracted by the search means, a voice data recognition word is displayed that is expressed in characters including an onomatopoeic word for the sound of the characteristic part or the objective subject of that voice data.

[0015] The invention according to claim 5 is a recording medium on which is recorded a voice database program for causing a computer to function as: (a) storage means for storing section voice information on an arbitrary characteristic part of the sound represented by voice data in association with that voice data; (b) search means for searching for voice data among a plurality of items of voice data on the basis of predetermined information; and (c) output means for reproducing the voice data extracted by the search means on the basis of the section voice information and outputting it as sound.

[0016]

BEST MODE FOR CARRYING OUT THE INVENTION

<1. Configuration of Apparatus> First, an outline of the voice database system according to an embodiment of the present invention is described. FIG. 1 is a schematic diagram showing the configuration of the voice database system according to the embodiment. As shown in FIG. 1, in this apparatus an input/output device 11, a CPU 12, a memory 13, a storage unit 14, and interfaces 15, 16, and 17 are connected to one another via a bus line BL. The input/output device 11 reads data from, and writes data to, a computer-readable portable recording medium D such as a flexible disk, a magneto-optical disk, or a CD-ROM. The CPU 12 is a processing unit that performs arithmetic processing. The memory 13 is a device for temporarily storing data, and the storage unit 14 is a computer-readable fixed recording medium such as a magnetic disk. A display device 18 such as a CRT or a liquid crystal display is connected to the interface 15, and a keyboard 19 and a mouse 20 are connected to the interface 16. A speaker 21 that produces the sound of the voice data is connected to the interface 17. The voice database system can also be connected to a network as needed and can acquire voice data from other devices connected to that network.

[0017] Thus, the voice database system of this embodiment is an apparatus realized on a single general-purpose computer by the internal CPU 12 executing a voice database program. The voice database program may be read from the portable recording medium D or may be stored in the storage unit 14 in advance. In other words, the program may be stored on either a portable recording medium or a fixed recording medium.

[0018] <2. Registration of Voice Data> In the voice database system of this embodiment, the operator sets attribute information and a voice data recognition word when registering voice data. The attribute information includes the title of the voice data, the creator/recorder, the recording location, the recording date and time, the recording duration, keywords, and comments. A plurality of keywords can be set. This attribute information is used when searching for voice data. A "voice data recognition word" is any word that indicates, specifically and concisely, what sound the voice data represents. For example, for voice data of a bell cricket (suzumushi) chirping, the voice data recognition word entered may be an onomatopoeic word such as "riin-riin", a specific description such as "chirping of a bell cricket", or the name of the objective subject that is the source of the sound, "bell cricket". Because the voice data recognition word is character data used when displaying search results, it is preferable to enter a word, such as a song title or an onomatopoeia, that lets anyone recognize the content of the voice data.

[0019] The operator enters the attribute information and the voice data recognition word from the keyboard 19 or the mouse 20, and the entered attribute information and voice data recognition word are saved in the storage unit 14. When a large amount of voice data is stored on a recording medium such as a CD-ROM, the attribute information and the voice data recognition word are saved in the storage unit 14 in a state in which they are associated with each other and with the voice data on the recording medium. This makes it unnecessary to hold the voice data, which requires a large capacity, in the storage unit 14.

[0020] When registering voice data, the operator also sets a key phrase for the voice data. A "key phrase" is section voice information for an arbitrary characteristic part of the sound represented by the voice data. A plurality of key phrases can be set for a single item of voice data, and doing so makes identification of the voice data more reliable. A key phrase is set by entering the start point and end point of a distinctive, characteristic part of the voice data.

[0021] FIG. 2 is an explanatory diagram of the method for setting a key phrase for voice data in this embodiment. As shown in FIG. 2, reproduction of the voice data that starts at time t1 ends at time t2. The voice waveform WAV1 in FIG. 2 is the waveform of the sound produced when the voice data is reproduced. The voice waveform WAV1 is displayed on the display device 18 when a key phrase is set, so that the voice data can be recognized visually, and a playback position indicator 22 moves from the position of time t1 to the position of time t2 as the voice data is reproduced. If the characteristic part of the voice data is the sound between time ta and time tb, the operator starts reproducing the voice data, enters the start point from the keyboard 19 or the like when playback reaches time ta, and enters the end point from the keyboard 19 or the like when playback reaches time tb. In this way, the characteristic part of the sound between times ta and tb is set as the key phrase for the voice data.

[0022] A plurality of such key phrases can be specified for a single item of voice data. To set a plurality of key phrases, the entry of a key phrase section by its start point and end point is simply repeated. If no key phrase is set, the entire duration of the voice data is automatically set as the key phrase. The key phrase is also stored in the storage unit 14 in association with the attribute information, the voice data recognition word, and the voice data. What is stored in the storage unit 14 may be information indicating the start and end points of the key phrase, or the actual voice data of the section set as the key phrase may be extracted and stored.
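The registered record described above can be pictured with a small data structure. The following is only a sketch under stated assumptions: the class name `VoiceRecord`, its field names, and the choice to keep key phrases as (start, end) pairs in seconds are illustrative and do not come from the patent, which also allows storing the extracted audio itself.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class VoiceRecord:
    """One registered entry; all names here are hypothetical."""
    file_name: str
    duration: float                 # total length of the recording in seconds
    attributes: dict                # title, creator, keywords, comment, ...
    recognition_word: str           # word shown under the icon in search results
    key_phrases: List[Tuple[float, float]] = field(default_factory=list)

    def add_key_phrase(self, start: float, end: float) -> None:
        """Store one key phrase as a (start, end) pair in seconds."""
        if not (0.0 <= start < end <= self.duration):
            raise ValueError("key phrase must lie inside the recording")
        self.key_phrases.append((start, end))

    def effective_key_phrases(self) -> List[Tuple[float, float]]:
        """Fall back to the whole recording when no key phrase was set."""
        return self.key_phrases or [(0.0, self.duration)]
```

Calling `add_key_phrase` more than once mirrors the repeated start/end entry for multiple key phrases, and `effective_key_phrases` mirrors the rule that the entire duration becomes the key phrase when none is set.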

[0023] Next, the procedure for registering voice data in this voice database system is described. FIG. 3 is a flowchart showing the voice data registration process in the voice database system of this embodiment.

[0024] First, in response to an instruction from the operator, the CPU 12 starts the part of the voice database program that handles registration of voice data (step S11). Next, in step S12, the operator selects unregistered voice data from among the voice data held on a recording medium such as a CD-ROM or in the storage unit 14. The operator makes this selection while referring to a list of voice data displayed on the display device 18. In step S13, the CPU 12 reproduces the voice data selected by the operator and produces the sound from the speaker 21. The operator listens to the sound from the speaker 21, decides whether the voice data should be registered in the voice database system, and makes an input corresponding to "YES" or "NO" from the keyboard 19 or the mouse 20 (step S14). If the operator's input in step S14 indicates that registration should proceed, the process advances to step S15. If not, the process returns to step S12 and other voice data is selected.

[0025] In step S15, the attribute information, the voice data recognition word, and the like are entered for the voice data being registered. The operator makes this input as well from the keyboard 19 or the like while referring to the display device 18. Next, the key phrase is set (step S16). As described above, the key phrase is set by entering the start point and end point of the characteristic part of the voice data on the basis of the content displayed on the display device 18 and the sound heard from the speaker 21. In step S17, the operator gives an instruction to register the content entered and set in steps S15 and S16. On receiving this instruction, the CPU 12 associates the voice data, the attribute information, the voice data recognition word, and the key phrase with one another and saves them in the storage unit 14. In step S18, the operator decides whether to end the registration process and makes the corresponding input. Based on the input in step S18, the CPU 12 advances the process to step S12 or to step S19. In step S19, the CPU 12 terminates the program for registering voice data.

[0026] Through the above processing, the voice database system of this embodiment allows attribute information, a voice data recognition word, and key phrases to be set for voice data.

[0027] <3. Search for Voice Data and Display of Search Results> Next, the search for voice data and the display of search results are described.

[0028] As in conventional searches, voice data is searched on the basis of attribute information, such as the title, keywords, and comments, and the file name. For example, an operator who wants voice data of a cat's meow can enter a search term such as "animal" or "cat" to obtain voice data whose attribute information or file name contains that term. The search is performed by the CPU 12 on the basis of the attribute information and other data saved in the storage unit 14, and the voice data associated with it at registration can thereby be identified.
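An attribute-based search of this kind can be sketched as a substring match over the file name and attribute values. This is an illustration only: the record layout (plain dictionaries with `file_name` and `attributes` keys) and the case-insensitive substring matching are assumptions, since the text does not specify the matching rule.

```python
def search(records, term):
    """Return records whose file name or attribute values contain `term`."""
    term = term.lower()
    hits = []
    for rec in records:
        haystack = [rec["file_name"]]
        for value in rec["attributes"].values():
            # Keywords may be a list; every other value is treated as one string.
            haystack.extend(value if isinstance(value, list) else [value])
        if any(term in str(text).lower() for text in haystack):
            hits.append(rec)
    return hits
```

The returned records carry their recognition words and key phrases with them, which is what the result display and playback stages need.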

[0029] The voice data obtained by the search is displayed on the display device 18. FIG. 4 shows an example of a search result display in the voice database system of this embodiment. As shown in FIG. 4, icons IC1 to IC4 for four items of voice data are displayed on the search result display screen P2, and the voice data recognition word is displayed below each of the icons IC1 to IC4: "riin-riin" for icon IC1, "wan-wan" (woof-woof) for icon IC2, "a stone rolling" for icon IC3, and "Polonaise" for icon IC4. Because the voice data recognition word is registered as a word from which it is easy to judge what sound the voice data represents, it is easy to infer what sound each of the four items of voice data on the search result display screen P2 in FIG. 4 represents.

[0030] Further, in this embodiment, to make it easier to identify the voice data the operator wants, the display of the search results and the sound produced from the speaker 21 are controlled on the basis of two states, the "non-selected state" and the "tentatively selected state". The non-selected or tentatively selected state can be set for each of the four items of voice data displayed as in FIG. 4, for example. Each item can be switched between the non-selected state and the tentatively selected state by, for example, moving the mouse pointer with the mouse 20 to a position overlapping one of the icons IC1 to IC4 and clicking the mouse 20 there. In addition, when the mouse pointer is positioned over the icon of voice data in the non-selected state, the voice data corresponding to that icon enters the tentatively selected state. Deliberate switching by the operator and dynamic switching thus coexist. FIG. 5 shows the non-selected state and the tentatively selected state on the search result display screen P2. The icons IC1, IC3, and IC4 in FIG. 5 indicate voice data in the non-selected state, and the icon IC2 indicates voice data in the tentatively selected state. That is, when voice data enters the tentatively selected state through operation of the mouse 20 or the like, the frame of its icon is displayed as a thick frame.

[0031] The "non-selected state" and the "tentatively selected state" are described below.

[0032] a) Control of display and the like in the non-selected state

The case in which all the voice data matching the search conditions is in the non-selected state when the search results are displayed is described first. For this case, two playback modes, "mode 1" and "mode 2", are provided, and either mode can be chosen in the initial settings of the voice database system.

[0033] a-1) Mode 1

"Mode 1" is a function that repeats the key phrase section set for each item of voice data matching the search conditions, sums the repeated key phrases of all the voice data, adjusts the level to half of the averaged volume, reproduces the result, and produces the synthesized sound from the speaker 21. This is explained with reference to FIG. 6 and FIG. 7.

[0034] FIG. 6 shows the key phrases of two items of voice data. The key phrase set for the voice waveform WAV2 of the voice data in FIG. 6(a) is the voice data in the range of section ka. The key phrase set for the voice waveform WAV3 of the voice data in FIG. 6(b) is the voice data in the range of section kb.

[0035] FIG. 7 is an explanatory diagram of the summation synthesis of key phrases. The voice waveform WAV4 in FIG. 7 shows the key phrase section ka of the voice waveform WAV2 in FIG. 6(a) repeated. The voice waveform WAV5 in FIG. 7 shows the key phrase section kb of the voice waveform WAV3 in FIG. 6(b) repeated. When a plurality of key phrases are set for a single item of voice data, they are repeated in sequence. If the search extracted other items of voice data, voice data with the key phrase section repeated is generated for them as well. When key phrases are repeated, level adjustment is applied at the joins between key phrases to produce fade-in and fade-out effects. "Fade-in" means gradually raising the sound level at the beginning of a key phrase, and "fade-out" means gradually lowering the sound level at the end of a key phrase.

[0036] The voice data in which the key phrases are repeated is then summed over all the voice data extracted by the search, and an averaged voice waveform WAV6 is generated. The sound level of the averaged voice waveform WAV6 is adjusted to 1/2 to generate a voice waveform WAV7. The voice waveform WAV7, generated by the CPU 12 through summation synthesis, averaging, and level adjustment of all the extracted voice data, is reproduced, and the speaker 21 produces sound based on the voice waveform WAV7.
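The mode 1 signal path (repeat with fades, sum, average, halve) can be sketched on plain lists of samples. This is a minimal illustration, not the patented implementation: the function names, the linear fade shape, the fade length, and the fixed output length `total_len` are all assumptions, and samples are taken to be floats.

```python
def apply_fades(samples, fade_len):
    """Ramp the level up at the start and down at the end of one key phrase."""
    n = len(samples)
    fade_len = min(fade_len, n // 2)
    out = list(samples)
    for i in range(fade_len):
        gain = i / fade_len          # 0.0 at the edge, rising linearly toward 1.0
        out[i] *= gain               # fade-in
        out[n - 1 - i] *= gain       # fade-out
    return out


def loop_key_phrases(phrases, total_len, fade_len=4):
    """Cycle through one item's faded key phrases until total_len samples exist."""
    faded = [apply_fades(p, fade_len) for p in phrases if p]
    if not faded:
        return [0.0] * total_len
    out, i = [], 0
    while len(out) < total_len:
        out.extend(faded[i % len(faded)])
        i += 1
    return out[:total_len]


def mix_mode1(per_item_phrases, total_len, fade_len=4, level=0.5):
    """Loop each item's key phrases, sum the loops, average, then scale."""
    tracks = [loop_key_phrases(p, total_len, fade_len) for p in per_item_phrases]
    if not tracks:
        return []
    mixed = []
    for i in range(total_len):
        total = sum(track[i] for track in tracks)   # summation synthesis
        averaged = total / len(tracks)              # corresponds to WAV6
        mixed.append(averaged * level)              # corresponds to WAV7
    return mixed
```

Here `loop_key_phrases` plays the role of WAV4 and WAV5, and an item with several key phrases cycles through them in order, as the text describes.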

[0037] Thus, in "mode 1", when the search extracts only a few items of voice data, the operator can infer from the sound based on the voice waveform WAV7 what kind of voice data has been retrieved. When the search extracts many items of voice data, the key phrases of all of them are reproduced simultaneously, and the result is a sound close to noise. The display screen in "mode 1" is the same as that shown in FIG. 4.

[0038] a-2) Mode 2

"Mode 2" is a function that joins the key phrase sections set for all the voice data extracted as matching the search conditions one after another, adjusts the level to half volume, reproduces the result, and produces the synthesized sound from the speaker 21. This is explained with reference to FIG. 8.

[0039] FIG. 8 is an explanatory diagram of the voice data playback mode "mode 2" and illustrates a case in which the search has extracted three items of voice data. The key phrase set for the voice data in FIG. 8(a) is the voice data in the range of section ka, the key phrase set for the voice data in FIG. 8(b) is the voice data in the range of section kb, and the key phrase set for the voice data in FIG. 8(c) is the voice data in the range of section kc.

【００４０】そして、これら図８（ａ）〜（ｃ）に示す
キーフレーズを順次に連続して再生するために、図８
（ｄ）に示す音声波形ＷＡＶ８を生成する。音声波形Ｗ
ＡＶ８は単に図８（ａ）〜（ｃ）に示すそれぞれのキー
フレーズ区間ｋａ，ｋｂ，ｋｃを連続してつなげたもの
である。各キーフレーズのつなぎの部分には、レベル調
整が施され、フェードインやフェードアウトの効果が効
かされる。To reproduce the key phrases shown in FIGS. 8(a) to 8(c) one after another, the audio waveform WAV8 shown in FIG. 8(d) is generated. The audio waveform WAV8 is simply the key phrase sections ka, kb, and kc of FIGS. 8(a) to 8(c) joined end to end. Level adjustment is applied at each join between key phrases to produce fade-in and fade-out effects.

【００４１】そして、得られた音声波形ＷＡＶ８の音声
レベルを、「１／２」になるようにレベル調整し、音声
波形ＷＡＶ９を生成する。このようにしてＣＰＵ１２に
よって抽出された全ての音声データのキーフレーズの連
続化，レベル調整が行われて生成された音声波形ＷＡＶ
９が再生され、スピーカ２１により音声波形ＷＡＶ９に
基づいた音声を発生させる。すなわち、「モード２」に
おいては、抽出された音声データのキーフレーズが一つ
ずつ順次に繰り返し再生されることとなる。The audio level of the resulting audio waveform WAV8 is then adjusted to "1/2" to generate the audio waveform WAV9. The audio waveform WAV9, generated by the CPU 12 by concatenating the key phrases of all the extracted audio data and adjusting the level in this way, is reproduced, and the speaker 21 emits sound based on the audio waveform WAV9. That is, in "mode 2" the key phrases of the extracted audio data are reproduced one at a time, in order, and repeatedly.
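The construction of WAV8 and WAV9 can be illustrated with a minimal sketch. It assumes that each key phrase is already available as a list of float samples and that the fades are linear over a fixed number of samples; the function names, the `fade_len` parameter, and the linear fade shape are illustrative assumptions and are not specified in this description.

```python
from typing import List, Sequence


def fade_edges(segment: Sequence[float], fade_len: int) -> List[float]:
    """Apply a linear fade-in and fade-out to one key phrase (assumed fade shape)."""
    out = list(segment)
    n = len(out)
    k = min(fade_len, n // 2)  # keep the fade-in and fade-out from overlapping
    for i in range(k):
        gain = i / k
        out[i] *= gain          # fade-in at the start of the phrase
        out[n - 1 - i] *= gain  # fade-out at the end of the phrase
    return out


def build_mode2_waveform(key_phrases: Sequence[Sequence[float]],
                         fade_len: int = 4,
                         level: float = 0.5) -> List[float]:
    """Join key phrases end to end (WAV8), then scale the level to 1/2 (WAV9)."""
    wav8: List[float] = []
    for phrase in key_phrases:
        wav8.extend(fade_edges(phrase, fade_len))
    return [level * s for s in wav8]


# Three key phrases standing in for sections ka, kb, and kc.
ka, kb, kc = [1.0] * 10, [1.0] * 6, [1.0] * 8
wav9 = build_mode2_waveform([ka, kb, kc])
```

In this sketch the joined waveform has the combined length of the three sections, and no sample exceeds half of the original amplitude.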

【００４２】そして、キーフレーズが再生されている音
声データについて、表示装置１８で表示されているアイ
コンの色が変化するとともに、そのアイコンの大きさが
音量に応じてダイナミックに変化する。これにより、
スピーカ２１より再生されているキーフレーズが表示装
置１８に表示されている音声データのアイコンのうちで
どの音声データを再生しているかの認識が視覚的にも容
易となる。For the audio data whose key phrase is being reproduced, the color of its icon displayed on the display device 18 changes, and the size of the icon changes dynamically according to the volume. This makes it easy to recognize visually which of the audio data icons displayed on the display device 18 corresponds to the key phrase being reproduced from the speaker 21.

【００４３】図９は、キーフレーズの再生に伴う音声デ
ータのアイコンの変化を示す図であり、音声データ認識
ワードが「わんわん」と設定されている音声データのキ
ーフレーズが再生されている場合を示している。図９
（ａ）は再生されるキーフレーズの音量が大きいときの
アイコンＩＣ２を示しており、図９（ｂ）は音量が小さ
いときのアイコンＩＣ２を示している。また、図９
（ａ），（ｂ）に示すアイコンＩＣ２は、他のアイコン
と比べると色が異なり、再生されているアイコンを特定
しやすくしている。このようにスピーカ２１から発せら
れるキーフレーズの音量に応じてアイコンの大きさがダ
イナミックに変化するとともに、アイコンの色も変化さ
せるため、再生している音声データの特定を視覚的に容
易に認識できるように実現されている。FIG. 9 shows how the icon of an audio data item changes as its key phrase is reproduced, for the case where the key phrase of audio data whose audio data recognition word is set to "wanwan" (わんわん) is being reproduced. FIG. 9(a) shows the icon IC2 when the volume of the key phrase being reproduced is high, and FIG. 9(b) shows the icon IC2 when the volume is low. The icon IC2 shown in FIGS. 9(a) and 9(b) also differs in color from the other icons, which makes the icon being reproduced easier to identify. Because the size of the icon changes dynamically with the volume of the key phrase emitted from the speaker 21 and its color changes as well, the audio data being reproduced can easily be identified visually.
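One way to picture the volume-dependent icon size is the following minimal sketch. It assumes that "volume" is measured as the RMS of a short window of samples and that the icon is scaled linearly between a minimum and a maximum factor; the RMS measure, the scale range, and all names are illustrative assumptions, since this description does not state how the size is computed.

```python
import math
from typing import Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level of a window of samples (assumed volume measure)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def icon_scale(samples: Sequence[float],
               min_scale: float = 1.0,
               max_scale: float = 1.5,
               full_scale: float = 1.0) -> float:
    """Map the current volume to an icon scale factor (hypothetical linear mapping)."""
    level = min(rms(samples) / full_scale, 1.0)  # clamp to the range 0..1
    return min_scale + (max_scale - min_scale) * level


# A loud window gives a larger icon (cf. FIG. 9(a)), a quiet one a smaller icon (cf. FIG. 9(b)).
loud = icon_scale([1.0, -1.0, 1.0, -1.0])
quiet = icon_scale([0.1, -0.1, 0.1, -0.1])
```

Calling `icon_scale` once per display refresh on the samples currently being played would make the icon grow and shrink with the sound.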

【００４４】さらに、この実施の形態における「モード
２」では、再生しているキーフレーズに対応する音声デ
ータについて設定されている音声データ認識ワードが流
れ表示になる。「流れ表示」とは、表示装置１８に表示
されている文字などが画面上を流れるように移動するこ
とをいう。この例を図１０に示す。図１０は、この実施
の形態における音声データ認識ワードの流れ表示を示す
図である。図１０（ａ）は、音声データ認識ワードが流
れ表示となる第１段階を示しており、図１０（ｂ）は第
２段階を示している。そして、図１０（ｃ）は第３段階
を示している。まず、第１段階では、音声データ認識ワ
ードとして設定されている「リーンリーン」が全て表示
されている。第２段階では、音声データ認識ワードが左
に１文字分移動し、「リーンリーン」の最初の「リ」が
消えている。さらに、第３段階では、第２段階からさら
に左に１文字分移動し、「リーンリーン」の最初の「リ
ー」が消えるとともに、右欄に「リ」が現れている。以
下同様に音声データ認識ワードが左に少しずつ移動し、
左端から文字が消えていく一方で右端から文字が出現す
るように実現されている。このように音声データ認識ワ
ードを流れ表示とすることによっても再生されているキ
ーフレーズの音声データがどれであるかを視覚的に特定し
やすくなっている。Further, in "mode 2" of this embodiment, the audio data recognition word set for the audio data corresponding to the key phrase being reproduced is shown as a flow display. "Flow display" means that characters or the like displayed on the display device 18 move across the screen as if flowing. An example is shown in FIG. 10, which shows the flow display of an audio data recognition word in this embodiment. FIG. 10(a) shows the first stage of the flow display, FIG. 10(b) the second stage, and FIG. 10(c) the third stage. In the first stage, the whole of "リーンリーン", which is set as the audio data recognition word, is displayed. In the second stage, the word has moved one character to the left and the first "リ" has disappeared. In the third stage, the word has moved a further character to the left, the leading "リー" has disappeared, and "リ" has appeared at the right edge. The audio data recognition word continues to move to the left little by little in the same way, with characters disappearing at the left edge while characters appear at the right edge. Showing the audio data recognition word as a flow display in this way also makes it easier to identify visually which audio data the key phrase being reproduced belongs to.
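The three stages of FIG. 10 can be reproduced with a minimal sketch of a cyclic text scroll. The display width of seven characters and the two-character gap between repetitions are assumptions chosen so that the frames match the stages described above; the function name and parameters are likewise hypothetical.

```python
from typing import List


def marquee_frames(word: str, width: int, gap: int = 2, steps: int = 3) -> List[str]:
    """Return successive frames of a word scrolling left inside a fixed-width field."""
    cycle = word + " " * gap  # the word followed by a blank gap, repeated endlessly
    if not cycle:
        return ["" for _ in range(steps)]
    frames = []
    for shift in range(steps):
        # Read `width` characters from the cyclic text, starting `shift` characters in.
        frames.append("".join(cycle[(shift + i) % len(cycle)] for i in range(width)))
    return frames


for frame in marquee_frames("リーンリーン", width=7):
    print(repr(frame))
```

The first frame shows the whole word, the second has lost the leading "リ", and the third has lost "リー" while a new "リ" enters at the right edge.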

【００４５】このように「モード２」によれば、検索に
よって抽出された音声データのキーフレーズを順次に再
生するとともに、再生されている音声データを特定する
ことを視覚的に容易となるように実現したため、抽出さ
れた複数の音声データの中から確実にオペレータの所望
する音声データを特定することができる。しかし、「モ
ード２」において抽出された音声データが多い場合は、
全ての音声データのキーフレーズを再生するのに要する
時間が長くなるということがある。As described above, "mode 2" reproduces the key phrases of the audio data extracted by the search one after another and makes it visually easy to identify the audio data being reproduced, so the operator can reliably pick out the desired audio data from among the extracted items. However, when many audio data items are extracted, reproducing the key phrases of all of them in "mode 2" can take a long time.

【００４６】a-3）モードの切り替えについて先述したように、「モード１」と「モード２」の切換
は、当該音声データベースシステムの初期設定において
任意のモードを選択することも可能であるが、自動でモ
ードを切り替えることも可能である。自動でモードを切
り替える方法としては、検索の結果抽出された音声デー
タが予め設定されている指定個数以上である場合は「モ
ード１」による再生・表示となり、指定個数未満である
場合は「モード２」による再生・表示となる。指定個数
は予めオペレータが設定することが可能である。a-3) Mode switching: As described earlier, either "mode 1" or "mode 2" can be selected in the initial settings of the audio database system, but the mode can also be switched automatically. With automatic switching, reproduction and display use "mode 1" when the number of audio data items extracted by the search is equal to or greater than a preset designated number, and "mode 2" when it is less than the designated number. The designated number can be set in advance by the operator.
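The automatic switching rule amounts to a single threshold comparison, sketched below. The function and parameter names are illustrative assumptions; only the rule itself (designated number or more gives "mode 1", fewer gives "mode 2") comes from the description.

```python
def choose_playback_mode(num_hits: int, designated_number: int) -> int:
    """Return 1 for "mode 1" (simultaneous mix) or 2 for "mode 2" (sequential)."""
    # At or above the operator-set designated number, play all key phrases at once;
    # below it, play them one after another.
    return 1 if num_hits >= designated_number else 2


# With a designated number of 5, five hits select mode 1 and four select mode 2.
mode_for_five = choose_playback_mode(5, designated_number=5)
mode_for_four = choose_playback_mode(4, designated_number=5)
```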

【００４７】ｂ）仮選択状態での表示などの制御検索結果の表示の際に、検索条件と一致した音声データ
の中に少なくとも１つの仮選択された音声データがある
場合について説明する。非選択状態の場合の再生におい
てスピーカ２１から発せられる音量は「１／２」にレベ
ル調整された音量であったが、この仮選択状態の場合の
再生においてスピーカ２１から発せられる音量にはレベ
ル調整を施さずに本来の音声データの示す音量で出力さ
れる。すなわち、仮選択状態における表示や再生は、非
選択状態の「モード２」で説明した内容と同様であり、
異なる点はレベル調整を行わないことである。b) Control of display and the like in the temporarily selected state: The following describes the case where, when the search results are displayed, at least one of the audio data items matching the search condition is temporarily selected. In reproduction in the non-selected state, the volume emitted from the speaker 21 was level-adjusted to "1/2". In reproduction in the temporarily selected state, no level adjustment is applied and the sound is output at the volume indicated by the original audio data. That is, display and reproduction in the temporarily selected state are the same as described for "mode 2" of the non-selected state, except that no level adjustment is performed.

【００４８】オペレータの操作によって設定された仮選
択状態の音声データの全てについてのキーフレーズが順
番に連続した状態で再生される。一つの音声データにつ
いて複数のキーフレーズが設定されている場合には、そ
れらは順に繰り返される。また、キーフレーズとキーフ
レーズのつなぎの部分にはフェードインとフェードアウ
トの効果が効かされている。そして、再生しているキー
フレーズに対応した音声データのアイコンの色が変化す
るとともに、そのアイコンの大きさが音量に応じてダイ
ナミックに変化するように実現されている。さらに、再
生しているキーフレーズに対応した音声データの音声デ
ータ認識ワードが流れ表示になる。The key phrases of all the audio data items that the operator has put into the temporarily selected state are reproduced one after another in order. When a plurality of key phrases are set for one audio data item, they are repeated in order. Fade-in and fade-out effects are applied at the joins between key phrases. The color of the icon of the audio data corresponding to the key phrase being reproduced changes, and the size of that icon changes dynamically according to the volume. In addition, the audio data recognition word of the audio data corresponding to the key phrase being reproduced is shown as a flow display.

【００４９】このように、仮選択状態とされて表示・再
生されると、キーフレーズが順次に連続して再生される
とともに、音声データに対応するアイコンの表示や音声
データ認識ワードの表示が変化するため、再生されてい
る音声データを視覚的に容易に認識することが可能とな
る。なお、仮選択状態とする音声データは複数個設定す
ることも可能である。As described above, in display and reproduction in the temporarily selected state, the key phrases are reproduced one after another while the icon and the audio data recognition word of the corresponding audio data change, so the audio data being reproduced can easily be recognized visually. A plurality of audio data items can be put into the temporarily selected state at the same time.

【００５０】また、例えば、非選択状態において検索の
結果抽出された音声データが「モード２」で表示・再生
されている場合において、任意の音声データをマウス操
作によって仮選択とすることにより、オペレータの所望
する音声データを絞り込んでいくことが可能となる。For example, while the audio data extracted by a search is being displayed and reproduced in "mode 2" in the non-selected state, the operator can temporarily select any audio data item by a mouse operation and thereby narrow down the candidates to the desired audio data.

【００５１】これまで説明した非選択状態と仮選択状態
とでオペレータが所望の音声データを確定できない場合
には、さらに、属性情報を表示装置１８に表示させるこ
とも可能である。図４に示すような検索結果が表示され
ている画面の任意の音声データのアイコンをマウス２０
でダブルクリックすることにより、その音声データにつ
いて、図１１に示すような属性情報表示画面Ｐ４を表示
装置１８に表示させることができる。オペレータは、図
１１の属性情報表示画面Ｐ４から当該音声データの属
性情報を確認することができる。また、マウス２０の操
作によって属性情報表示画面Ｐ４に表示された再生ボタ
ンＢ２１をクリックすることにより当該音声データの全
区間が再生され、スピーカ２１から音声が発せられる。
非選択状態および仮選択状態における再生は、設定され
たキーフレーズの区間のみの再生であったが、再生ボタ
ンＢ２１をクリックすることにより音声データの全てが
再生されることとなる。When the operator cannot settle on the desired audio data using the non-selected and temporarily selected states described so far, attribute information can additionally be displayed on the display device 18. Double-clicking the icon of any audio data item with the mouse 20 on a screen showing search results, such as that in FIG. 4, displays an attribute information display screen P4 for that audio data, as shown in FIG. 11, on the display device 18. The operator can check the attribute information of the audio data on the attribute information display screen P4 of FIG. 11. Clicking the play button B21 displayed on the attribute information display screen P4 with the mouse 20 reproduces the entire section of the audio data, and the sound is emitted from the speaker 21. Reproduction in the non-selected and temporarily selected states covers only the set key phrase sections, whereas clicking the play button B21 reproduces the whole of the audio data.

【００５２】そして、オペレータは、属性情報の確認や
音声データの再生によって当該音声データが所望する音
声データであることを認識すると、取出しボタンＢ２２
をクリックすることにより、当該音声データを取り出す
ことができる。「音声データを取り出す」とは、音声デ
ータベースから音声データをコピーし、他のプログラム
などでその音声データを活用することができるようにす
ることをいう。また、オペレータは、属性情報の確認や
音声データの再生によって当該音声データが所望する音
声データでないことを認識した場合は、キャンセルボタ
ンＢ２３をクリックして属性情報表示画面Ｐ４を終了さ
せて検索結果表示画面などに戻り、再び所望の音声デー
タの特定作業を行うこととなる。When the operator confirms, by checking the attribute information or reproducing the audio data, that the audio data is the desired audio data, the operator can extract it by clicking the extract button B22. "Extracting audio data" means copying the audio data from the audio database so that it can be used by another program or the like. When the operator finds, by checking the attribute information or reproducing the audio data, that it is not the desired audio data, the operator clicks the cancel button B23 to close the attribute information display screen P4, returns to the search result display screen or the like, and resumes the work of identifying the desired audio data.

【００５３】このように、この実施の形態の音声データ
ベースシステムの音声データの検索結果の表示を行う際
には、検索によって抽出された音声データの再生を自動
的に行うため、従来のように再生を伴わない検索結果の
表示に比して容易に音声データの特定を行うことが可能
であり、オペレータに不安感が残ることもない。また、
検索結果の表示の際には、ファイル名ではなくて、音声
データがどのような音声についてのデータであるかを具
体的かつ簡潔に示した音声データ認識ワードを音声デー
タのアイコンの下欄に表示しているため、従来に比較し
て容易に音声データの内容を推定することができ、作業
の効率化を図ることができる。さらに、検索結果の表示
の際に行う再生については、音声データの全てを再生す
るのではなく、音声データについて設定されたキーフレ
ーズを繰り返し再生するため、短時間で所望の音声デー
タを特定することが可能となる。このように、オペレー
タの所望する音声データを迅速かつ効率的に確実に得る
ことができる。As described above, when the audio database system of this embodiment displays the results of an audio data search, it automatically reproduces the audio data extracted by the search. The audio data can therefore be identified more easily than with a conventional search result display that is not accompanied by reproduction, and the operator is not left feeling uneasy. In addition, when search results are displayed, an audio data recognition word that indicates concretely and concisely what kind of sound the audio data represents is shown below the audio data icon instead of a file name, so the content of the audio data can be inferred more easily than before and the work becomes more efficient. Furthermore, the reproduction performed when search results are displayed does not play the whole of the audio data but repeatedly plays the key phrases set for it, so the desired audio data can be identified in a short time. In this way, the audio data desired by the operator can be obtained quickly, efficiently, and reliably.

【００５４】＜４．フローチャート＞次に、この実施の
形態の音声データベースシステムにおける検索から音声
データを特定するまでの処理について説明する。<4. Flowchart> Next, a description will be given of processing from search to identification of voice data in the voice database system of the present embodiment.

【００５５】図１２は、この実施の形態の音声データベ
ースシステムにおける検索から音声データを特定するま
での処理を示すフローチャートである。まず、ステップ
Ｓ２１では、ＣＰＵ１２においてオペレータの指示によ
り音声データベースプログラムのうちの音声データの検
索／取り出しに関するプログラムを起動する。そして、
初期設定の画面で、非選択状態での再生モードとしてモ
ード１とモード２のどちらか一方を選択し、設定する
（ステップＳ２２）。そして、ステップＳ２３におい
て、音声データの検索を行うための条件（例えば、キー
ワードなど）を入力する。そしてオペレータは、ステッ
プＳ２４において検索開始の指示を入力する。これによ
り、ＣＰＵ１２は入力された検索条件に一致する音声デ
ータの検索を開始する。そして、検索の結果抽出された
音声データは、表示装置１８に表示されるとともに、非
選択状態における「モード１」または「モード２」の再
生モードで抽出された音声データのキーフレーズが再生
される（ステップＳ２５）。そして、オペレータは特定
の音声データのみを確認する必要があるか否かの判断を
行う（ステップＳ２６）。ここで「ＹＥＳ」と判断した
場合はステップＳ２７に進み、「ＮＯ」と判断した場合
はステップＳ２９に進む。FIG. 12 is a flowchart showing the processing from search to identification of audio data in the audio database system of this embodiment. First, in step S21, the CPU 12 starts, on the operator's instruction, the part of the audio database program that handles search and extraction of audio data. On the initial settings screen, either mode 1 or mode 2 is selected and set as the reproduction mode for the non-selected state (step S22). In step S23, the conditions for searching the audio data (for example, keywords) are entered. In step S24, the operator enters an instruction to start the search, and the CPU 12 begins searching for audio data that matches the entered search conditions. The audio data extracted by the search is displayed on the display device 18, and the key phrases of the extracted audio data are reproduced in the "mode 1" or "mode 2" reproduction mode of the non-selected state (step S25). The operator then decides whether only specific audio data needs to be checked (step S26). If "YES", the process proceeds to step S27; if "NO", it proceeds to step S29.

【００５６】ステップＳ２７では、オペレータは特定の
音声データのアイコンに対してマウス操作によるクリッ
クやマウスポインタの移動を行い、非選択状態から仮選
択状態に変更させる。そして、ステップＳ２８では、仮
選択状態での表示・再生が行われる。In step S27, the operator clicks on the icon of the specific voice data by mouse operation or moves the mouse pointer to change the non-selected state to the temporary selected state. Then, in step S28, display / reproduction in the temporary selection state is performed.

【００５７】そして、所望する音声データに該当する候
補の音声データがある場合はステップＳ３０に進み、候
補の音声データがない場合にはステップＳ３６に進む
（ステップＳ２９）。そして、オペレータは候補の音声
データを選択し、マウス操作を行ってその音声データに
ついての属性情報表示画面を表示させる（ステップＳ３
０）。そして、属性情報表示画面により音声データの最
終確認を行う（ステップＳ３１）。そしてステップＳ３
２においては音声データの全ての再生を行う場合にはス
テップＳ３３の処理を行う。ステップＳ３３では、属性
情報表示画面の再生ボタンをクリックすることにより再生指
示を行う。そして、音声データの確認の結果、当該音声
データを取り出すか否かの判断を行う（ステップＳ３
４）。そして当該音声データを取り出す場合には取り出
しの操作を行う（ステップＳ３５）。If there is candidate audio data that may be the desired audio data, the process proceeds to step S30; if there is none, it proceeds to step S36 (step S29). The operator selects the candidate audio data and, by a mouse operation, displays the attribute information display screen for that audio data (step S30). The operator then makes a final check of the audio data on the attribute information display screen (step S31). In step S32, if the whole of the audio data is to be reproduced, the processing of step S33 is performed. In step S33, the operator issues a reproduction instruction by clicking the play button on the attribute information display screen. Based on this check, the operator decides whether to extract the audio data (step S34) and, if so, performs the extraction operation (step S35).

【００５８】そして、次の検索を行う場合は、ステップ
Ｓ２２からの処理を繰り返し、行わない場合はステップ
Ｓ３７に進み、音声データの検索／取り出しに関するプ
ログラムを終了する。If the next search is to be performed, the process from step S22 is repeated. If not, the process proceeds to step S37 to terminate the program related to voice data search / extraction.

【００５９】次に、非選択状態の「モード１」での自動
再生処理について説明する。図１３は、この実施の形態
における音声データベースシステムの非選択状態の「モ
ード１」での再生処理を示すフローチャートである。ま
ず、ステップＳ４１において検索の結果抽出された音声
データのリストを作成し、メモリ１３に記憶する。そし
てステップＳ４１で作成したリストに基づいて、音声デ
ータのキーフレーズの再生プロセスを検索の結果抽出さ
れた音声データの個数分起動する。Next, automatic reproduction in "mode 1" of the non-selected state is described. FIG. 13 is a flowchart showing the reproduction processing in "mode 1" of the non-selected state of the audio database system in this embodiment. First, in step S41, a list of the audio data extracted by the search is created and stored in the memory 13. Then, based on the list created in step S41, one key phrase reproduction process is started for each audio data item extracted by the search.

【００６０】例えば、検索の結果抽出された音声データ
の数がＮ個（ただし、Ｎは任意の整数）であったとする
と、抽出された音声データのそれぞれのキーフレーズを
「１／（２・Ｎ）」の音量にレベル調整して再生プロセ
スを起動する。これにより、抽出された全ての音声デー
タのキーフレーズが総和合成されるとともに、「１／
２」の音量レベルにレベル調整することができる。For example, if the search extracts N audio data items (where N is an arbitrary integer), each reproduction process is started with the key phrase of its audio data level-adjusted to a volume of "1/(2·N)". In this way, the key phrases of all the extracted audio data are summed and, at the same time, the overall volume level is adjusted to "1/2".
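The effect of giving each of the N reproduction processes a gain of 1/(2·N) can be checked with a minimal sketch that mixes the key phrases in a single function instead of separate processes. Looping each key phrase for a fixed output length, and the names used, are illustrative assumptions rather than the method of this embodiment.

```python
from typing import List, Sequence


def mix_mode1(key_phrases: Sequence[Sequence[float]], length: int) -> List[float]:
    """Sum N repeating key phrases, each scaled by 1/(2N), into one waveform."""
    n = len(key_phrases)
    mixed = [0.0] * length
    if n == 0:
        return mixed
    gain = 1.0 / (2 * n)  # per-phrase gain; N such gains add up to 1/2
    for phrase in key_phrases:
        if not phrase:
            continue
        for t in range(length):
            # Each key phrase repeats from its start once it reaches its end.
            mixed[t] += gain * phrase[t % len(phrase)]
    return mixed


# Two full-scale key phrases of different lengths mix to a constant level of 1/2.
mixed = mix_mode1([[1.0, 1.0], [1.0, 1.0, 1.0]], length=6)
```

Because the gains of the N phrases add up to 1/2, the mixed signal stays at or below half of full scale regardless of how many items the search returns.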

【００６１】そして、ステップＳ４３で「ＮＯ」と判断
されるまで「モード１」による再生を繰り返す。再生を
終了する場合は、ステップＳ４４で各再生プロセスを停
止させることにより行われる。The reproduction in "mode 1" is repeated until "NO" is determined in step S43. When the reproduction is completed, the reproduction is stopped by stopping each reproduction process in step S44.

【００６２】次に、非選択状態の「モード２」での自動
再生処理について説明する。図１４は、この実施の形態
における音声データベースシステムの非選択状態の「モ
ード２」での再生処理を示すフローチャートである。ま
ず、ステップＳ５１において検索の結果抽出された音声
データのリストを作成し、メモリ１３に記憶する。そし
て、以前に再生していたキーフレーズが終了したか否か
の判断が行われる（ステップＳ５２）。再生中である場
合は、ステップＳ５８に進み、終了している場合にはス
テップＳ５３に進む。ステップＳ５３では、再生が終了
した音声データのキーフレーズの再生にかかわっていた
各プロセスを停止させる。ステップＳ５４では、ステッ
プＳ５１で作成したリストに基づいて次に再生する音声
データを特定する。ステップＳ５５では、ステップＳ５
４で特定された音声データの音声データ認識ワードを流
れ表示にするために流れ表示プロセスを起動する。ステ
ップＳ５６では、ステップＳ５４で特定された音声デー
タのキーフレーズの再生を行うための再生プロセスを起
動する。ステップＳ５７では、ステップＳ５４で特定さ
れた音声データのアイコン表示をダイナミックに変化す
るようにアイコン表示プロセスを起動する。そしてステ
ップＳ５８に進む。なお、ステップＳ５５〜Ｓ５７につ
いては、他の順序で行われても良い。Next, automatic reproduction in "mode 2" of the non-selected state is described. FIG. 14 is a flowchart showing the reproduction processing in "mode 2" of the non-selected state of the audio database system in this embodiment. First, in step S51, a list of the audio data extracted by the search is created and stored in the memory 13. It is then determined whether the key phrase that was being reproduced has finished (step S52). If it is still being reproduced, the process proceeds to step S58; if it has finished, the process proceeds to step S53. In step S53, the processes involved in reproducing the key phrase of the audio data whose reproduction has finished are stopped. In step S54, the audio data to be reproduced next is identified from the list created in step S51. In step S55, a flow display process is started so that the audio data recognition word of the audio data identified in step S54 is shown as a flow display. In step S56, a reproduction process is started to reproduce the key phrase of the audio data identified in step S54. In step S57, an icon display process is started so that the icon of the audio data identified in step S54 changes dynamically. The process then proceeds to step S58. Steps S55 to S57 may be performed in a different order.

【００６３】ステップＳ５８では、「モード２」での再
生を継続するか否かを決定する。オペレータによる入力
がない場合はステップＳ５２に進み、再生を継続する。
「モード２」での再生を終了する場合は、ステップＳ５
９で音声データのキーフレーズの再生にかかわっていた
各プロセス（ステップＳ５５〜Ｓ５７で起動したプロセ
ス）を停止させて処理を終了する。In step S58, it is decided whether to continue reproduction in "mode 2". If there is no input from the operator, the process returns to step S52 and reproduction continues. To end reproduction in "mode 2", the processes involved in reproducing the key phrase of the audio data (the processes started in steps S55 to S57) are stopped in step S59, and the processing ends.
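The loop of steps S52 to S59 can be pictured with a minimal simulation. It replaces the separate flow display, reproduction, and icon display processes with log entries, measures key phrase length in abstract "ticks", and stands in for the operator's stop input with a fixed tick count; wrapping around to the start of the list is also an assumption, based on the statement that key phrases are reproduced repeatedly. All names are hypothetical.

```python
from typing import Dict, List, Tuple


def run_mode2(hit_list: List[str],
              phrase_ticks: Dict[str, int],
              total_ticks: int) -> List[Tuple[str, str]]:
    """Simulate sequential key phrase playback and return a start/stop log."""
    log: List[Tuple[str, str]] = []
    if not hit_list:
        return log
    current = None   # audio data whose key phrase is playing
    remaining = 0    # ticks left in the current key phrase
    next_index = 0
    for _ in range(total_ticks):                 # one pass = S52 through S58
        if remaining == 0:                       # S52: previous key phrase finished?
            if current is not None:
                log.append(("stop", current))    # S53: stop its processes
            current = hit_list[next_index]       # S54: pick the next list entry
            next_index = (next_index + 1) % len(hit_list)
            log.append(("start", current))       # S55-S57: start the three processes
            remaining = max(1, phrase_ticks[current])
        remaining -= 1
    if current is not None:
        log.append(("stop", current))            # S59: stop on exit
    return log


log = run_mode2(["a", "b"], {"a": 2, "b": 1}, total_ticks=4)
```

In this run the log alternates between start and stop entries for "a", "b", and then "a" again, which mirrors the one-at-a-time repetition described for "mode 2".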

【００６４】次に、仮選択状態での自動再生処理につい
て説明する。図１５は、この実施の形態における音声デ
ータベースシステムの仮選択状態での再生処理を示すフ
ローチャートである。まず、ステップＳ６１において検
索の結果抽出された音声データのリストを作成し、メモ
リ１３に記憶する。そして、以前に再生していたキーフ
レーズが終了したか否かの判断が行われる（ステップＳ
６２）。再生中である場合は、ステップＳ６８に進み、
終了している場合にはステップＳ６３に進む。ステップ
Ｓ６３では、再生が終了した音声データのキーフレーズ
の再生にかかわっていた各プロセスを停止させる。ステ
ップＳ６４では、ステップＳ６１で作成したリストに基
づいて次に再生する音声データを特定する。ステップＳ
６５では、ステップＳ６４で特定された音声データの音
声データ認識ワードを流れ表示にするために流れ表示プ
ロセスを起動する。ステップＳ６６では、ステップＳ６
４で特定された音声データのキーフレーズの再生を行う
ための再生プロセスを起動する。ステップＳ６７では、
ステップＳ６４で特定された音声データのアイコン表示
をダイナミックに変化するようにアイコン表示プロセス
を起動する。そしてステップＳ６８に進む。なお、ステ
ップＳ６５〜Ｓ６７については、他の順序で行われても
良い。Next, automatic reproduction in the temporarily selected state is described. FIG. 15 is a flowchart showing the reproduction processing in the temporarily selected state of the audio database system in this embodiment. First, in step S61, a list of the audio data extracted by the search is created and stored in the memory 13. It is then determined whether the key phrase that was being reproduced has finished (step S62). If it is still being reproduced, the process proceeds to step S68; if it has finished, the process proceeds to step S63. In step S63, the processes involved in reproducing the key phrase of the audio data whose reproduction has finished are stopped. In step S64, the audio data to be reproduced next is identified from the list created in step S61. In step S65, a flow display process is started so that the audio data recognition word of the audio data identified in step S64 is shown as a flow display. In step S66, a reproduction process is started to reproduce the key phrase of the audio data identified in step S64. In step S67, an icon display process is started so that the icon of the audio data identified in step S64 changes dynamically. The process then proceeds to step S68. Steps S65 to S67 may be performed in a different order.

【００６５】ステップＳ６８では、仮選択状態での再生
を継続するか否かを決定する。オペレータによる入力が
ない場合はステップＳ６２に進み、再生を継続する。仮
選択状態での再生を終了する場合は、ステップＳ６９で
音声データのキーフレーズの再生にかかわっていた各プ
ロセス（ステップＳ６５〜Ｓ６７で起動したプロセス）
を停止させて処理を終了する。In step S68, it is decided whether to continue reproduction in the temporarily selected state. If there is no input from the operator, the process returns to step S62 and reproduction continues. To end reproduction in the temporarily selected state, the processes involved in reproducing the key phrase of the audio data (the processes started in steps S65 to S67) are stopped in step S69, and the processing ends.

【００６６】＜５．変形例＞上記の実施の形態で示した
音声データのアイコンは、スピーカの絵柄で示したがこ
れに限定するものではなく、オペレータが自由に音声デ
ータごとに設定することが可能である。例えば、音声デ
ータの内容に応じた絵柄をアイコンとして設定すれば、
視覚的な効果が高まり、より効率的に音声データの特定
を行うことが可能となる。<5. Modifications> The icons of the audio data shown in the above-described embodiment are shown by the picture of the speaker. However, the present invention is not limited to this, and the operator can freely set each audio data. For example, if a picture corresponding to the content of audio data is set as an icon,
The visual effect is enhanced, and the audio data can be specified more efficiently.

【００６７】また、非選択状態における再生では、音量
が「１／２」となるようにレベル調整していたが、これ
に限定するものでもない。仮選択状態がオペレータが意
図的に特定の音声データの音声を出力させるものである
ため、音量を大きくして良い。しかし、非選択状態にお
ける再生は検索結果の表示とほぼ同時に自動的に行われ
るため、音量が大きいとオペレータに不快感を与える可
能性がある。そこで、意図的な再生でない非選択状態で
の再生の音量を小さくすることにより、そのような問題
を解決している。従って、音量を小さくするのであれ
ば、「１／２」以外の数値でも良い。In the reproduction in the non-selection state, the level is adjusted so that the sound volume becomes "1/2". However, the present invention is not limited to this. Since the temporary selection state is a state where the operator intentionally outputs the sound of the specific sound data, the volume may be increased. However, the reproduction in the non-selected state is automatically performed almost simultaneously with the display of the search result. Therefore, if the volume is large, the operator may feel uncomfortable. Therefore, such a problem is solved by reducing the volume of reproduction in a non-selected state that is not intentional reproduction. Therefore, if the volume is to be reduced, a numerical value other than "1/2" may be used.

【００６８】[0068]

【発明の効果】以上説明したように、請求項１に記載の
発明によれば、音声データの示す音声の任意の特徴的な
部分についての開始点と終了点に基づく区間音声情報を
音声データに対応付けて記憶し、複数の音声データの中
から所定の情報に基づいて音声データの検索を行い、検
索によって抽出された音声データを区間音声情報に基づ
いて再生して音声として出力するため、容易に音声デー
タの特定を行うことが可能であり、オペレータに不安感
が残ることもないとともに、短時間で所望の音声データ
を特定することが可能となり、オペレータの所望する音
声データを迅速かつ効率的に確実に得ることができる。As described above, according to the invention of claim 1, section audio information based on the start point and end point of an arbitrary characteristic portion of the sound indicated by audio data is stored in association with that audio data, audio data is searched for among a plurality of audio data based on predetermined information, and the audio data extracted by the search is reproduced based on the section audio information and output as sound. The audio data can therefore be identified easily without leaving the operator feeling uneasy, the desired audio data can be identified in a short time, and the audio data desired by the operator can be obtained quickly, efficiently, and reliably.

【００６９】請求項２に記載の発明によれば、１つの音
声データについて複数の区間音声情報を対応付けて記憶
することが可能であるため、音声データの特定をより確
実かつ容易なものとすることができる。According to the second aspect of the present invention, it is possible to store a plurality of section voice information in association with one voice data, so that the voice data can be specified more reliably and easily. be able to.

【００７０】請求項３に記載の発明によれば、出力手段
で出力されている音声データに対応する表示が、当該音
声データの区間音声情報に基づく再生に伴って変化する
ため、再生されている音声データを視覚的に特定するこ
とが容易となり、オペレータの所望する音声データを迅
速かつ効率的に確実に得ることができる。According to the third aspect of the present invention, the display corresponding to the audio data output by the output means changes along with the reproduction based on the section audio information of the audio data, so that the reproduction is performed. It becomes easy to visually identify the voice data, and voice data desired by the operator can be obtained quickly, efficiently and reliably.

【００７１】請求項４に記載の発明によれば、検索手段
によって抽出された音声データについて、特徴的な部分
の発音の擬音語または当該音声データについての客観的
対象物を含む文字で表現した音声データ認識ワードを表
示するため、検索の結果抽出された音声データがそれぞ
れどのような音声であるかを推定することが容易とな
る。According to the invention of claim 4, for the audio data extracted by the search means, an audio data recognition word is displayed that is expressed in characters including an onomatopoeic word for the sound of the characteristic portion or the objective object of the audio data, so it is easy to infer what kind of sound each audio data item extracted by the search represents.

【００７２】請求項５に記載の発明によれば、コンピュ
ータ読み取り可能な記録媒体に記録された音声データベ
ースプログラムをコンピュータが読み取り実行すること
により、容易に音声データの特定を行うことが可能であ
り、オペレータに不安感が残ることもないとともに、短
時間で所望の音声データを特定することが可能となり、
オペレータの所望する音声データを迅速かつ効率的に確
実に得ることができる音声データベースシステムを実現
することが可能となる。According to the fifth aspect of the present invention, the audio data can be easily specified by the computer reading and executing the audio database program recorded on the computer-readable recording medium, Anxiety does not remain for the operator, and desired voice data can be specified in a short time.
It is possible to realize a voice database system that can reliably and quickly obtain voice data desired by an operator.

[Brief description of the drawings]

【図１】この発明の実施の形態である音声データベース
システムの構成を示す概略図である。FIG. 1 is a schematic diagram showing a configuration of a voice database system according to an embodiment of the present invention.

【図２】この発明の実施の形態における音声データのキ
ーフレーズの設定方法を説明するための説明図である。FIG. 2 is an explanatory diagram for explaining a method of setting a key phrase of audio data according to the embodiment of the present invention.

【図３】この発明の実施の形態の音声データベースシス
テムにおける音声データの登録の処理を示すフローチャ
ートである。FIG. 3 is a flowchart showing a process of registering audio data in the audio database system according to the embodiment of the present invention;

【図４】この発明の実施の形態の音声データべースシス
テムにおける検索結果の表示の一例を示す図である。FIG. 4 is a diagram showing an example of a display of a search result in the audio database system according to the embodiment of the present invention;

【図５】この発明の実施の形態の音声データベースシス
テムの非選択状態と仮選択状態と示す図である。FIG. 5 is a diagram showing a non-selected state and a tentatively selected state of the voice database system according to the embodiment of the present invention;

【図６】２つの音声データのキーフレーズを示す図であ
る。FIG. 6 is a diagram showing key phrases of two audio data.

【図７】この発明の実施の形態の音声データベースシス
テムのキーフレーズの総和合成を示す説明図である。FIG. 7 is an explanatory diagram showing the sum total synthesis of key phrases in the voice database system according to the embodiment of the present invention.

【図８】この発明の実施の形態の音声データベースシス
テムの「モード２」についての音声データの再生モード
の説明図である。FIG. 8 is an explanatory diagram of an audio data reproduction mode in “mode 2” of the audio database system according to the embodiment of the present invention;

【図９】この発明の実施の形態の音声データベースシス
テムにおけるキーフレーズの再生に伴う音声データのア
イコンの変化を示す図である。FIG. 9 is a diagram showing a change in an icon of audio data accompanying reproduction of a key phrase in the audio database system according to the embodiment of the present invention.

【図１０】この発明の実施の形態における音声データ認
識ワードの流れ表示を示す図である。FIG. 10 is a diagram showing a flow display of speech data recognition words in the embodiment of the present invention.

【図１１】この発明の実施の形態における属性情報表示
画面を示す概念図である。FIG. 11 is a conceptual diagram showing an attribute information display screen according to the embodiment of the present invention.

【図１２】この発明の実施の形態の音声データベースシ
ステムにおける検索から音声データを特定するまでの処
理を示すフローチャートである。FIG. 12 is a flowchart showing a process from searching to specifying audio data in the audio database system according to the embodiment of the present invention;

【図１３】この発明の実施の形態における音声データベ
ースシステムの非選択状態の「モード１」での再生処理
を示すフローチャートである。FIG. 13 is a flowchart showing a reproduction process in “mode 1” in a non-selected state of the audio database system according to the embodiment of the present invention.

【図１４】この発明の実施の形態における音声データベ
ースシステムの非選択状態の「モード２」での再生処理
を示すフローチャートである。FIG. 14 is a flowchart showing a reproduction process in “mode 2” in a non-selected state of the audio database system according to the embodiment of the present invention.

【図１５】この発明の実施の形態における音声データベ
ースシステムの仮選択状態での再生処理を示すフローチ
ャートである。FIG. 15 is a flowchart showing a reproduction process in a temporarily selected state of the audio database system according to the embodiment of the present invention.

FIG. 16 is an explanatory diagram showing the operating procedure of a conventional audio database system by means of the screens displayed on the display device.

[Explanation of symbols]

11 Input/output device
12 CPU
13 Memory
14 Storage unit
15, 16, 17 Interface
18 Display device
19 Keyboard
20 Mouse
21 Speaker
D Portable recording medium

Claims

1. An audio database system in a computer that handles audio data, comprising: (a) storage means for storing, in association with the audio data, section audio information on an arbitrary characteristic portion of the audio indicated by the audio data; (b) search means for searching for audio data, based on predetermined information, from among said audio data; and (c) output means for reproducing the audio data extracted by the search means based on the section audio information and outputting it as audio.

2. The audio database system according to claim 1, wherein said storage means is capable of storing a plurality of pieces of said section audio information in association with one piece of audio data.

3. The audio database system according to claim 1, wherein a display corresponding to the audio data output by the output means changes as the audio data is reproduced based on the section audio information.

4. The audio database system according to claim 1, wherein, for the audio data extracted by the search means, an audio data recognition word is displayed, the word being expressed in characters and including an onomatopoeic word for the sound of a characteristic portion of the audio data or the object of the audio data.

5. A computer-readable recording medium on which is recorded an audio database program for causing a computer to function as: (a) storage means for storing, in association with audio data, section audio information on an arbitrary characteristic portion of the audio indicated by the audio data; (b) search means for searching for audio data, based on predetermined information, from among said audio data; and (c) output means for reproducing the audio data extracted by the search means based on the section audio information and outputting it as audio.
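
As a rough illustration of the arrangement recited in claims 1 and 2, the following Python sketch associates one or more key-phrase sections with each piece of audio data, searches by attribute, and returns the key-phrase samples that would be handed to a playback device. The names AudioRecord, AudioDatabase, the attributes dictionary, and the (start, end) sample-index representation of a section are assumptions made for illustration and are not taken from the specification; actual output to a speaker is omitted.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class AudioRecord:
    """One piece of audio data plus its associated section audio information."""
    name: str
    samples: List[float]                      # the audio waveform (hypothetical format)
    attributes: Dict[str, str]                # searchable information (assumed)
    # Each key phrase is a (start, end) pair of sample indices, end exclusive.
    key_phrases: List[Tuple[int, int]] = field(default_factory=list)


class AudioDatabase:
    """Minimal stand-in for the storage, search, and output means."""

    def __init__(self) -> None:
        self._records: List[AudioRecord] = []

    def register(self, record: AudioRecord) -> None:
        # Storage: keep the key-phrase sections together with the audio data,
        # rejecting sections that fall outside the waveform.
        for start, end in record.key_phrases:
            if not 0 <= start < end <= len(record.samples):
                raise ValueError(f"invalid key phrase section: {(start, end)}")
        self._records.append(record)

    def search(self, **conditions: str) -> List[AudioRecord]:
        # Search: return records whose attributes match every given condition.
        return [
            r for r in self._records
            if all(r.attributes.get(k) == v for k, v in conditions.items())
        ]

    @staticmethod
    def key_phrase_samples(record: AudioRecord) -> List[List[float]]:
        # Output: extract only the key-phrase sections for reproduction.
        return [record.samples[start:end] for start, end in record.key_phrases]
```

In this sketch a record may carry several key-phrase sections, which corresponds to claim 2; a caller would pass each returned sample list to whatever audio playback facility is available.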