JPH0749695A

JPH0749695A - Time sequential data recording and reproducing device

Info

Publication number: JPH0749695A
Application number: JP5325231A
Authority: JP
Inventors: Shigenobu Seto; 重宣瀬戸; Yoichi Takebayashi; 洋一竹林; Yasutsugu Kawakura; 康嗣川倉; Hiroshi Mizoguchi; 博溝口; Hisako Tanaka; 久子田中; Hideaki Shinchi; 秀昭新地
Original assignee: Toshiba Corp; Toshiba Software Engineering Corp
Current assignee: Toshiba Corp; Toshiba Software Engineering Corp
Priority date: 1993-06-03
Filing date: 1993-12-22
Publication date: 1995-02-21
Anticipated expiration: 2018-08-18
Also published as: JP3437617B2

Abstract

PURPOSE: To reproduce only meaningful information by performing recognition processing on time-series data to detect key data, recording the structure information generated by that processing together with the time-series data, and retrieving the data based on the key data. CONSTITUTION: Time-series data are input through a time-series data input means 1, stored in a time-series data storage means 2, and sent to a structure analysis means 3. The means 3 performs recognition processing on the time-series data, detects key data, and generates structure information, which is stored in a structure information storage means 4. A user inputs key data into a retrieval instruction input means 5, and the instruction is sent to a retrieval means 6. The means 6 retrieves all structure information corresponding to the key data from the means 4, retrieves the linked time-series data from the means 2, and passes them to an information output means 7. In addition, a key data input means 8 accepts corrections and additions, so that the user can not only correct the key data detected by the means 3 but also change the structure information.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a time-series data recording and reproducing device that stores time-series data and reproduces the stored time-series data.

[0002]

[Prior Art] In recent years, improvements in computer processing power have made it possible to record and retrieve multimedia data such as voice and images in various forms, in order to store, reproduce, and edit such data.

[0003] Large-capacity recording media have become relatively easy to use, and applications that input, output, process, and edit multimedia data have been developed. However, these applications merely provide editing functions based on putting multimedia data in and taking it out.

[0004] Multimedia data of enormous volume can be recorded by using a recording medium of sufficient capacity. However, retrieval and output of the recorded multimedia data cannot be said to be realized effectively, because the larger the amount of data, the more computation is needed to find the required portion among the stored data.

[0005] In particular, when the data handled are time-series data, the time required for retrieval grows in proportion to the duration of the data. Moreover, the information the user wants to retrieve and output is not necessarily the enormous multimedia data itself; more often it is a meaningful portion of the multimedia data, or the meaning itself, that is, key data.

[0006] Such key data cannot be obtained by conventional recording, reproduction, and editing of multimedia data alone. For example, when the utterance "Um, I want one, no, three orange juices." is input as voice data, it is easier for the user to understand if meaningful information to the effect of "I want three orange juices." is output than if the input voice data is output as is.

[0007] It is therefore necessary to perform recognition processing on the multimedia data to detect key data, to create structure information consisting of at least the detected key data and link information that associates the key data with the time-series data, and to output a combination of parts of the structure information, including the key data, as a meaningful portion.

[0008] In structure analysis processing that generates structure information automatically, such as this recognition processing, the first problems are real-time processing and errors in the detected key data. As for real-time processing, improved computing power is making it possible to recognize voice, images, and the like fast enough to run in real time.

[0009] As for errors in the structure analysis results, more accurate key data can be created by assuming in advance that the key data may contain errors and performing update processing in which the user corrects the key data generated automatically by the key data generation processing or directly adds key data. However, no multimedia data recording and retrieval device yet exists that uses key data as a search key in later retrieval.

[0010]

[Problems to Be Solved by the Invention] Conventional multimedia time-series data recording and retrieval devices such as those described above could provide editing functions based on simply putting multimedia time-series data in and taking it out, such as recording, reproduction, and editing.

[0011] Recognition processing of multimedia time-series data is also becoming feasible in real time, but key data meaningful to the user could not be obtained.

[0012] An object of the present invention is to provide a time-series data recording and reproducing device that can extract and reproduce only the meaningful portions a user needs from multimedia time-series data, and can output meaningful information contained in the time-series data.

[0013] The system also needs to be improved in order to raise the capability of the structure analysis processing that generates structure information, and a further object is to provide a system that is improved on the basis of the structure analysis results.

[0014]

[Means for Solving the Problems] To solve the above problems, the present invention provides a time-series data recording and reproducing device that stores multimedia time-series data and reproduces the stored time-series data, characterized by comprising: structure information analysis means that performs recognition processing on input time-series data to detect key data and creates structure information consisting of at least the detected key data, time information of the key data, and information linking the time-series data with the key data; structure information storage means that stores the structure information; and search means that, using the key data as a search key, retrieves the time-series data from the information in the structure information that links to the time-series data.

[0015]

[Operation] According to the time-series data recording and reproducing device of the present invention, when time-series data or information the user needs is to be retrieved from the time-series data and structure information, it can be extracted by using key data as the search key.

[0016] Furthermore, because the user can input key data directly, even when the structure information generated automatically by the structure analysis means is wrong or insufficient, the user can correct or add key data serving as search keys at any time while checking the structure information recorded in the structure information storage means on the screen of the information output means.

[0017] In addition, even when the structure information is currently insufficient, structure information containing the required key data can be rebuilt by creating new structure information from the time-series data already recorded in the time-series data storage means.

[0018]

[Embodiments] First, an embodiment of the present invention applied to a system that recognizes and processes time-series data such as voice and moving images (hereinafter called a recognition and understanding system) is described. The system need not be limited to recognition and understanding; it may be a response system or dialogue system that produces some response or output based on the results.

[0019] Key data are, for example, the results of processing such as recognition and understanding. Link information indicates the correspondence between a processing result and the time-series data from which it was obtained, that is, from which part of the time-series data the key data was obtained.

[0020] In general, processing such as recognition and understanding passes through multiple stages. Speech recognition and understanding, for example, involves voice section detection, speech analysis, pattern recognition, language processing, and so on. Speech recognition based on word spotting (Tsuboi, Hashimoto, Takebayashi: "Continuous Speech Understanding Based on Keyword Spotting," IEICE Technical Report, SP-91-95, pp. 33-40 (1991.12)) involves word detection, syntactic analysis, word analysis, semantic analysis, and so on.

[0021] When processing passes through multiple stages in this way, the key data may be not only the final processing result but also the intermediate results at each stage.

[0022] In this case, the link information may indicate the correspondence with the original input time-series data, or the correspondence with the processing result of the preceding stage. For example, in the speech understanding based on word spotting described above, the detected words, the syntax tree composed of the word sequence, and the meaning obtained by analysis can each be key data. The link information may then directly indicate the correspondence with the original voice data (for example, a time, a sample point, or a data address), or it may point to an intermediate result such as the word sequence or syntax tree and thereby correspond to the input voice data indirectly.
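
The direct and indirect forms of link information can be illustrated with a minimal sketch. The record layout, the field names (`kind`, `target`, `data_id`), and the IDs "SX-1" and "SM-1" are assumptions made for illustration and do not appear in the specification; a real syntax tree would typically link to several words rather than one.

```python
def resolve_link(info_id, records):
    """Follow link information until it points directly at the input data."""
    seen = set()
    while info_id not in seen:
        seen.add(info_id)
        link = records[info_id]["link"]
        if link["kind"] == "data":
            # Direct link: data identifier plus a time span in the input.
            return link["data_id"], link["start"], link["end"]
        # Indirect link: continue from the earlier-stage result.
        info_id = link["target"]
    raise ValueError("link information forms a cycle")


# Hypothetical records: a detected word, a syntax tree, and a meaning.
records = {
    "WD-7": {"key": "yes",
             "link": {"kind": "data", "data_id": "SP-129",
                      "start": 0.4, "end": 0.9}},
    "SX-1": {"key": "syntax tree", "link": {"kind": "stage", "target": "WD-7"}},
    "SM-1": {"key": "affirmation", "link": {"kind": "stage", "target": "SX-1"}},
}

span = resolve_link("SM-1", records)
print(span)
```

Here the meaning record reaches the voice data only through the syntax tree and the word record, which is the indirect correspondence the paragraph describes.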

[0023] When multiple types of input time-series data are handled, as with multimedia data, the correspondence becomes clearer if the link information includes a data identifier indicating which time-series data the key data corresponds to.

[0024] Retrieval is realized as follows: when key data of the kind the recognition and understanding system can output is specified, the structure information whose key data matches it is retrieved, and the original time-series data is accessed through the link information recorded in that structure information.
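
A minimal sketch of this retrieval path follows, assuming in-memory stores and an exact string match on the key data. The class name `StructureInfo`, its fields, the function `retrieve`, and the toy sampling rate are illustrative assumptions, not part of the specification.

```python
from dataclasses import dataclass


@dataclass
class StructureInfo:
    info_id: str      # structure information ID
    data_id: str      # ID of the linked time-series data
    name: str         # structure information name
    start: float      # start time in seconds
    end: float        # end time in seconds
    key_data: str     # detected key data


def retrieve(key, structure_store, series_store, rate_hz):
    """Return (info_id, samples) for every record whose key data matches."""
    hits = []
    for info in structure_store:
        if info.key_data == key:
            samples = series_store[info.data_id]
            lo = int(info.start * rate_hz)
            hi = int(info.end * rate_hz)
            hits.append((info.info_id, samples[lo:hi]))
    return hits


# Toy data: 10 samples per second so the slices are easy to read.
series_store = {"SP-129": list(range(100))}
structure_store = [
    StructureInfo("WD-5", "SP-129", "word detection result", 1.0, 2.0, "yes"),
    StructureInfo("WD-6", "SP-129", "word detection result", 3.0, 4.5,
                  "orange juice"),
]

result = retrieve("yes", structure_store, series_store, rate_hz=10)
print(result)
```

Only the portion of the stored data linked to the matching key data is returned, rather than the whole recording.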

[0025] An embodiment of the present invention is now described concretely with reference to the drawings. The procedure from input to storage of time-series data is outlined using the block diagram of FIG. 1, which shows an embodiment of the time-series data recording and reproducing device of the present invention, and the flowchart of FIG. 2.

[0026] As in step 200, "input time-series data," time-series data are input from the time-series data input means 1, such as a microphone or camera. As in step 201, "send time-series data to time-series data storage means," the time-series data input means 1 sends the input time-series data to the time-series data storage means 2.

[0027] As in step 202, "store time-series data in time-series data storage means," the time-series data storage means 2 stores the time-series data it receives. Likewise, as in step 203, "send time-series data to structure information analysis means," the time-series data input means 1 sends the input time-series data to the structure analysis means 3.

[0028] As in step 204, "generate structure information," the structure analysis means 3 performs recognition processing on the received time-series data to detect key data, and generates structure information consisting of at least the detected key data, time information of the key data, and information linking the time-series data with the key data.

[0029] As in step 205, "store structure information in structure information storage means," the structure information generated by the structure analysis means 3 is stored in the structure information storage means 4. Key data here means information meaningful to the user that is extracted from the input time-series data and can give meaning to a part of that data, such as the result of pattern recognition processing like speech recognition.
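
The recording flow of FIG. 2 can be sketched as follows, assuming dictionary-based stores and a pluggable analyzer. The function names and the toy analyzer, which merely marks the loudest sample as key data, are hypothetical stand-ins for the recognition processing and are not taken from the specification.

```python
def record(series_id, samples, analyzer, series_store, structure_store):
    """Store the input data, analyze it, and store the resulting records."""
    series_store[series_id] = samples                 # send to and keep in storage
    for info in analyzer(series_id, samples):         # generate structure information
        structure_store[info["id"]] = info            # store structure information


def toy_analyzer(series_id, samples):
    """Hypothetical stand-in for recognition: mark the loudest sample."""
    peak = max(range(len(samples)), key=lambda i: abs(samples[i]))
    return [{"id": f"{series_id}/peak", "data_id": series_id,
             "key": "peak", "start": peak, "end": peak + 1}]


series_store, structure_store = {}, {}
record("SP-129", [0, 3, -9, 2], toy_analyzer, series_store, structure_store)
print(structure_store["SP-129/peak"])
```

Keeping the analyzer as a parameter reflects the later point that structure information can be regenerated from data already held in the time-series data storage means.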

[0030] The procedure for outputting, at the user's request, the time-series data stored in the time-series data storage means 2 and the meaningful information about the time-series data stored in the structure information storage means 4 is outlined using the flowchart of FIG. 3.

[0031] As in step 300, "input search command," the user inputs key data to be used as the search key into the search command input means 5. As in step 301, "send search command to search means," the search command input means 5 sends the search command to the search means 6.

[0032] As in step 302, "perform search," the search means 6 follows the search command input to the search command input means 5, retrieves all structure information corresponding to the key data from the structure information storage means 4, and retrieves the time-series data from the time-series data storage means 2 based on the link information in the retrieved structure information.

[0033] Likewise, based on the retrieved structure information, the search means 6 retrieves meaningful information, such as combinations of keywords, from the structure information storage means 4. As in step 303, "send retrieved data to information output means," the search means 6 sends the retrieved time-series data and meaningful information to the information output means 7.

[0034] As in step 304, "output retrieved data," the information output means 7 presents the retrieved time-series data and meaningful information to the user visually or audibly.

[0035] When presenting time-series data to the user, the information output means 7 can output the meaningful portions so that they are distinguished from the other portions. The key data input means 8 is an input device such as a keyboard or a pointing device such as a mouse, pen, or touch panel.

[0036] The key data input means 8 accepts corrections and additions of key data from the user. When these input devices are used, the information output means 7 displays the key data and structure information on the screen, and if the user makes an input, the structure information corresponding to the key data is changed.

[0037] In this way, the user can not only correct the key data detected by the structure analysis means 3 but also change and add structure information. The user inputs key data while checking, through the information output means 7, the contents recorded in the structure information storage means 4 and the time-series data storage means 2, so the structure information can be refined to suit the user's needs and its quality improved.
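
As a rough illustration of correction and addition by the user, the sketch below assumes the structure information is held in a dictionary keyed by structure information ID. The function names, the `source` field, the ID "WD-9", and the time values are hypothetical.

```python
def correct_key_data(store, info_id, new_key):
    """Replace the key data of an existing record, keeping its link and times."""
    record = dict(store[info_id])
    record["key"] = new_key
    record["source"] = "user"
    store[info_id] = record


def add_key_data(store, info_id, data_id, start, end, key):
    """Add a record the analyzer missed; refuses to overwrite an existing ID."""
    if info_id in store:
        raise KeyError(f"{info_id} already exists")
    store[info_id] = {"data_id": data_id, "start": start, "end": end,
                      "key": key, "source": "user"}


# A word the analyzer detected wrongly, then fixed and supplemented by the user.
store = {"WD-8": {"data_id": "SP-129", "start": 1.2, "end": 1.5,
                  "key": "dai", "source": "analyzer"}}
correct_key_data(store, "WD-8", "hai")
add_key_data(store, "WD-9", "SP-129", 2.0, 2.8, "orange juice")
print(store)
```

Recording who produced each entry is a design choice of this sketch; it makes it possible to tell user-supplied key data from automatically detected key data later.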

[0038] The case of inputting and outputting voice data as the time-series data is described concretely below. Voice data are input by the time-series data input means 1, which has an input device such as a microphone.

[0039] As shown in FIG. 4(a), the time-series data input means 1 adds identification data, such as the data type and time information, to the input voice data. The data type is the information needed to reproduce the input data, such as the input device, channel number, media type, and data discretization information such as the sampling frequency. The input data together with this information is hereinafter called additional time-series data.

[0040] Several input devices of each kind can be used at the same time; for example, multi-channel input using multiple microphones is possible. In that case, the data type also includes a channel number indicating the channel from which the data was input.

[0041] The input start time is the time at which the voice data was input to the system, but if the input multimedia data already contains time information, that information may be used as is.

[0042] FIG. 4(b) shows, in table form, additional time-series data in which the time-series data input means 1 has attached to the voice data the information that the data was input with a microphone, the microphone channel is 2, the media type is voice, the sampling frequency is 12 kHz, the quantization precision is 16 bits, and the voice starts at time ts and ends at time te.
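
One possible way to hold the identification data of FIG. 4(b) is a small record type, sketched below. The class and field names are assumptions, and the numeric values chosen for ts and te are examples only; the specification gives them only symbolically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AdditionalTimeSeries:
    series_id: str
    input_device: str
    channel: int
    media: str
    sampling_hz: int
    quantization_bits: int
    start_time: float   # ts, in seconds (example value below)
    end_time: float     # te, in seconds (example value below)

    def expected_samples(self):
        """Number of samples implied by the time span and sampling frequency."""
        return round((self.end_time - self.start_time) * self.sampling_hz)


header = AdditionalTimeSeries(
    series_id="SP-129", input_device="microphone", channel=2, media="voice",
    sampling_hz=12_000, quantization_bits=16, start_time=5.0, end_time=7.0,
)
print(header.expected_samples())
```

Keeping the discretization information next to the samples is what allows the data to be reproduced and allows sample positions to be converted to times later.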

[0043] The time-series data input means 1 sends the additional time-series data of FIG. 4, to which the identification data has been added, to the time-series data storage means 2 and the structure analysis means 3. The time-series data storage means 2 stores the additional time-series data it receives together with information indicating its storage location, which associates the ID of the time-series data with an address in the time-series data storage means 2. FIG. 5(a) shows that the time-series data with time-series data ID "SP-129" is stored in the storage area at address "××××" of the time-series data storage means 2. FIG. 5(b) shows, in table form, the additional time-series data with time-series data ID "SP-129" stored at address "××××".

[0044] The structure analysis means 3 performs recognition processing on the additional time-series data to detect key data, and creates structure information consisting of at least the detected key data, time information of the key data, and information linking the time-series data with the key data. In addition to a structure information name or structure information ID indicating the type of structure information, the structure information always includes time information indicating from which part of the voice data it was obtained. This time information can be derived from the start time of the additional time-series data and the data discretization information such as the sampling frequency. The structure information is represented as in FIG. 6.
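
The derivation of time information from the start time and sampling frequency amounts to simple arithmetic, sketched here. The function name and the convention that the end time falls just after the last sample are assumptions of this sketch.

```python
def sample_span_to_time(start_time, sampling_hz, first_sample, last_sample):
    """Convert an inclusive sample-index span into (t1, t2) in seconds."""
    t1 = start_time + first_sample / sampling_hz
    t2 = start_time + (last_sample + 1) / sampling_hz
    return t1, t2


# Data that started at 10.0 s, sampled at 12 kHz; key data found in
# samples 6000 through 17999.
t1, t2 = sample_span_to_time(10.0, 12_000, 6000, 17_999)
print(t1, t2)
```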

[0045] The structure information of FIG. 6 consists of the structure information ID "WD-5", the ID of the source time-series data "SP-129", the structure information name "word detection result", the start time "t1" and end time "t2" as time information, and the keyword that is the detected key data. Keyword detection by recognition processing is described in detail later.

[0046] The structure analysis means 3 sends the generated structure information to the structure information storage means 4, which stores the structure information it receives together with information indicating its storage location, which associates the structure information ID with an address in the structure information storage means. FIG. 7(a) shows that the structure information with structure information ID "WD-5" is stored in the storage area at address "○○○○" of the structure information storage means 4. FIG. 7(b) shows, in table form, the structure information with structure information ID "WD-5" stored at address "○○○○".

[0047] The recognition processing of voice data in the structure analysis means 3 is described concretely below. As an example of structure information, the description covers detection of key data such as voice sections, uttered words, utterance meaning, and utterance environment information, obtained through speech analysis, word detection, syntactic and semantic analysis, dialogue structure understanding, and so on.

[0048] Here the structure analysis means 3 consists of a voice section detection unit 81, an acoustic analysis unit 82, a word detection unit 83, and a syntactic and semantic analysis unit 84, as shown in FIG. 8. The voice section detection unit 81 detects, as key data, the sections of the voice data in which voice is present. It examines the power of the voice signal: when the power stays above a threshold for a fixed time (dt1), the point at which that period began is taken as the start of a voice section, and when the power stays at or below a threshold for a fixed time (dt2), the point at which that period began is taken as the end of the voice section. FIG. 9 plots voice power against time for clarity; the unit detects the start time t1 (start point) and end time t2 (end point) of the voice in FIG. 9. Detection accuracy can be improved by setting different power thresholds and durations for start detection and end detection. The structure information for voice section detection is represented as in FIG. 10 and has the structure information ID "VP-013", the ID of the source time-series data "SP-129", the structure information name "voice section", the start time "t1" and end time "t2" as time information, and the voice section "t1 to t2" as key data. Because the key data here expresses the voice section in terms of time, it resembles the time information, but key data and time information must be clearly distinguished in the structure information.

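The start and end rule for voice sections can be sketched as a small state machine over a sequence of per-frame power values. The function name, the use of frame counts for dt1 and dt2, and the example thresholds are assumptions made for illustration; the specification does not fix any of these values.

```python
def detect_sections(power, start_thresh, end_thresh, dt1, dt2):
    """Return (start, end) frame pairs; end is the first frame after the voice."""
    sections = []
    in_voice = False
    run = 0        # length of the current qualifying run of frames
    start = None
    for i, p in enumerate(power):
        if not in_voice:
            # Count consecutive frames above the start threshold.
            run = run + 1 if p > start_thresh else 0
            if run >= dt1:
                start = i - run + 1      # beginning of the loud run
                in_voice = True
                run = 0
        else:
            # Count consecutive frames at or below the end threshold.
            run = run + 1 if p <= end_thresh else 0
            if run >= dt2:
                sections.append((start, i - run + 1))  # beginning of the quiet run
                in_voice = False
                run = 0
    if in_voice:
        sections.append((start, len(power)))   # input ended during voice
    return sections


power = [0, 0, 5, 6, 7, 1, 6, 6, 0, 0, 0, 0]
sections = detect_sections(power, start_thresh=4, end_thresh=2, dt1=2, dt2=3)
print(sections)
```

The single quiet frame at index 5 does not end the section because it is shorter than dt2, which shows why separate durations for start and end detection help.
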
[0049] The acoustic analysis unit 82 performs acoustic analysis of the voice data. It performs spectrum analysis by a method such as the FFT (fast Fourier transform), smoothing in the frequency domain, and logarithmic conversion, and obtains the acoustic analysis result, that is, a frequency spectrum pattern, for example from a 16-channel band-pass filter at 8 ms intervals.
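
A simplified sketch of such an analysis is given below. It swaps in a naive DFT for the FFT to stay dependency-free, uses non-overlapping 8 ms frames at the 12 kHz sampling frequency of the earlier example, and sums power over 16 equal-width bands as a crude stand-in for frequency-domain smoothing and the band-pass filter bank. All function names and these choices are assumptions, not the method of the specification.

```python
import cmath
import math


def log_band_spectrum(frame, n_bands=16):
    """Log power in n_bands equal-width bands of one frame (naive DFT)."""
    n = len(frame)
    half = n // 2
    power = []
    for k in range(half):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                  for t in range(n))
        power.append(abs(acc) ** 2)
    width = half // n_bands
    # Summing neighboring bins smooths the spectrum; the small constant
    # avoids taking the logarithm of zero.
    return [math.log(sum(power[b * width:(b + 1) * width]) + 1e-12)
            for b in range(n_bands)]


def analyze(samples, rate_hz=12_000, step_ms=8, n_bands=16):
    """Return one 16-value spectrum pattern per 8 ms frame."""
    step = rate_hz * step_ms // 1000      # 96 samples per frame at 12 kHz
    return [log_band_spectrum(samples[i:i + step], n_bands)
            for i in range(0, len(samples) - step + 1, step)]


# A 1 kHz test tone lasting 16 ms (two frames).
tone = [math.sin(2 * math.pi * 1000 * t / 12_000) for t in range(192)]
pattern = analyze(tone)
loudest_band = max(range(16), key=lambda b: pattern[0][b])
print(len(pattern), len(pattern[0]), loudest_band)
```

With 96-sample frames each band covers 375 Hz, so the 1 kHz tone falls in the third band (index 2).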

[0050] This acoustic analysis result, that is, the frequency spectrum pattern, is sent to the word detection unit 83, which performs word detection. The word detection unit 83 can perform word detection by, for example, the method disclosed in the literature (Kanazawa, Tsuboi, Takebayashi, "Word Detection in Continuous Speech Containing Unnecessary Words," IEICE Technical Report, SP91-22, pp. 33-39 (1991.6)). The frequency spectrum sequence pattern of the input voice is matched against the standard frequency spectrum sequence patterns of the words to be detected, and words with high scores are obtained as word detection result information.

[0051] The word detection result information obtained by the word detection unit 83 handles each word together with key data such as its start point, end point, and similarity to the standard pattern, so that key data such as the time information, word name, and likelihood with respect to the registered word can be combined with the detected word and treated as structure information. As shown in FIG. 11(a), the structure information for word detection here consists of the structure information ID "WD-7", the ID of the source time-series data "SP-129", the structure information name "word detection result", the start time "t1" and end time "t2" as time information, the word name (keyword) "yes" (hai), and the likelihood "0.82".

【００５２】The words to be detected are decided by the user in advance and can be changed as needed, for example by adding or deleting words. The processing of the word detection unit 83 was described above as holding one standard frequency spectrum sequence pattern per detected word, but standard patterns can of course also be held per phoneme. Word-level matching and phoneme-level matching can also be used together.

【００５３】The word detection result information produced by the word detection unit 83 is not necessarily the final result of word recognition. Besides the correct words that were actually contained in the utterance, the keywords obtained as word detection result information may include words that were not in the utterance but were falsely detected because their frequency spectrum sequences are similar. For example, the word 「はい」 ("hai", "yes") and the word 「大」 (when read "dai", "large") sound alike, so one may be falsely detected for the other. If 「大」 is detected as key data and structural information is generated, that structural information consists, as shown in FIG. 11(b), of the structural information ID "WD-8", the ID of the source time-series data "SP-129", the structural information name "word detection result", the start time "t1" and end time "t2" as time information, the word name (keyword) 「大」, and the likelihood "0.75".
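The two records of FIG. 11 can be pictured as a small data structure. The following is a minimal sketch, not the patent's implementation: the class name and field names are illustrative assumptions, the words are romanized, and the numeric times stand in for the symbolic t1 and t2.

```python
from dataclasses import dataclass

@dataclass
class WordDetectionInfo:
    """One word-detection structural-information record (field names are illustrative)."""
    info_id: str       # structural information ID, e.g. "WD-7"
    source_id: str     # ID of the source time-series data, e.g. "SP-129"
    info_name: str     # structural information name
    start: float       # start time (t1 in the text)
    end: float         # end time (t2 in the text)
    word: str          # detected keyword
    likelihood: float  # score against the registered word

# t1 and t2 are symbolic in the source; 1.20 and 1.55 are placeholder values.
wd7 = WordDetectionInfo("WD-7", "SP-129", "word detection result", 1.20, 1.55, "hai", 0.82)
wd8 = WordDetectionInfo("WD-8", "SP-129", "word detection result", 1.20, 1.55, "dai", 0.75)

# Both candidates cover the same interval; ranking by likelihood puts "hai" first.
best = max([wd7, wd8], key=lambda record: record.likelihood)
```

Ranking by likelihood is only a first ordering here. The text leaves the final choice between competing detections to the later syntactic and semantic analysis.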

【００５４】The syntactic and semantic analysis unit 84 performs syntactic and semantic analysis on every possible word sequence (every combination of words that do not overlap in time) formed from the words detected by the word detection unit 83, and takes the meaning of the utterance yielded by each acceptable word sequence as a semantic expression candidate.
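The enumeration of word sequences that do not overlap in time can be sketched as follows. This is a brute-force illustration under assumed data (tuples of romanized word, start, end); the source does not say how the unit 84 enumerates sequences, and the acceptance test by syntactic and semantic analysis is omitted.

```python
from itertools import combinations

def non_overlapping_sequences(words):
    """Return every time-ordered combination of detections whose intervals do not overlap.

    `words` is a list of (word, start, end) tuples.
    """
    ordered = sorted(words, key=lambda w: w[1])
    result = []
    for n in range(1, len(ordered) + 1):
        for combo in combinations(ordered, n):
            # Adjacent checks suffice because the detections are sorted by start time.
            if all(a[2] <= b[1] for a, b in zip(combo, combo[1:])):
                result.append([w[0] for w in combo])
    return result

# Hypothetical detections: "hai" and "dai" compete for the same stretch of speech.
detections = [("hai", 0.0, 0.4), ("dai", 0.1, 0.4), ("soudesu", 0.5, 1.0)]
sequences = non_overlapping_sequences(detections)
```

The number of combinations grows exponentially with the number of detections, so this form is suitable only for short utterances with few candidates.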

【００５５】The semantic expression candidates obtained here are not necessarily narrowed down to one; there may be several acceptable word sequence candidates. The meaning of the utterance is finally determined by selecting an appropriate semantic expression from these candidates in light of the history of the meanings of the preceding utterances. The final result of word recognition is therefore the word sequence that makes up the semantic expression selected here. The processing that determines the meaning of the utterance can be carried out by the method disclosed in Tsuboi, Hashimoto, and Takebayashi, "Continuous speech understanding based on keyword spotting," IEICE Technical Report SP91-95, pp. 33-40 (December 1991), or in Sadamoto, Shinchi, Tsuboi, and Takebayashi, "Dialogue processing of the speaker-independent spoken dialogue system TOSBURG," Proceedings of the Acoustical Society of Japan, 1-P-17, pp. 137-138 (March 1992).

【００５６】These are methods for obtaining a semantic expression from an expected word sequence, and the expected word sequences and their semantic expressions can be decided in advance. The user can also change them as needed, for example by adding or deleting entries. Addition and deletion are described in detail later.

【００５７】The semantic expression candidates described above, and the word sequences that constitute them, can be treated as key data about candidates for the meaning of the utterance. FIG. 12 shows structural information created from this key data. It consists of the structural information ID "SR-5", the ID of the source time-series data "SP-129", the structural information name "utterance meaning", the start time "t1", the end time "t2", the utterance meaning "affirmation", the number of constituent words (two), the words 「はい」 ("yes") and 「そうです」 ("that's right"), and an indication of whether the candidate was selected as the semantic expression. The word sequence information can be linked to the structural information for the word detection results by holding pointers to that structural information. Likewise, the finally selected semantic expression and its constituent word sequence can be treated as structural information about the meaning of the utterance.

【００５８】The environment information extraction unit 85 extracts key data about the surrounding environment at the time the voice data was captured and creates structural information from it. Adding the environment information extraction unit 85 makes more detailed structural information available.

【００５９】As an example, the case where speech uttered by several speakers is handled as input data is described with reference to FIG. 13. FIG. 13(a) is a graph of the power of the speech input from speaker A's microphone, and FIG. 13(b) is a graph of the power of the speech input from speaker B's microphone.

【００６０】Even without a microphone for each attendee, a microphone array (directional microphones) can provide a large gain in a particular direction, so that speech arriving from the speaker's direction can be emphasized and extracted and the speaker can be identified. With these methods, key data indicating which speaker produced an utterance can be extracted and used in the structural information.

【００６１】By comparing the speech segments of all speakers, periods in which no speaker is speaking can be detected as silence. Conversely, these methods also make it possible to extract the ambient sound components other than the speakers. That is, the parts of the voice data outside the speech segments detected by the speech segment detection unit 81 can be treated as ambient sound. FIG. 13(c) is a graph of the ambient sound input from speaker A's microphone, and FIG. 13(d) is a graph of the ambient sound input from speaker B's microphone. Here, the parts of FIG. 13(a) and FIG. 13(b) that are not speech segments constitute the ambient sound.
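Detecting silence by comparing the speech segments of all speakers amounts to finding the gaps left after merging every speaker's intervals. A minimal sketch, assuming the segments are already available as (start, end) pairs in seconds that lie within the recording; the function name and the sample times are hypothetical:

```python
def silence_intervals(speech_by_speaker, total_start, total_end):
    """Return the intervals within [total_start, total_end] in which no speaker is talking.

    `speech_by_speaker` maps a speaker name to a list of (start, end) speech segments,
    all assumed to lie inside [total_start, total_end].
    """
    merged = sorted(iv for ivs in speech_by_speaker.values() for iv in ivs)
    silences = []
    cursor = total_start
    for start, end in merged:
        if start > cursor:
            silences.append((cursor, start))  # nobody spoke between cursor and start
        cursor = max(cursor, end)
    if cursor < total_end:
        silences.append((cursor, total_end))
    return silences

speech = {"A": [(0.0, 2.0), (5.0, 6.0)], "B": [(1.5, 3.0)]}
gaps = silence_intervals(speech, 0.0, 8.0)
```

With these sample segments, the silent periods are 3.0 to 5.0 and 6.0 to 8.0.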

【００６２】This ambient sound contains, besides non-speech noise from the surroundings, speech that was not treated as a speech segment because its power was low or its duration was short. By examining the power of the ambient sound, the degree of quietness of the surroundings, such as background murmuring, can therefore be used in the structural information. Because this information about the surroundings differs from meaningful key data, it is defined here as environment information.

【００６３】This ambient sound contains, besides non-speech noise from the surroundings, speech that was not treated as a speech segment because its power was low or its duration was short. By examining the power of the ambient sound, the degree of quietness of the surroundings, such as background murmuring, can therefore be used in the structural information as environment information.
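Examining the power of the ambient sound could look like the sketch below. It assumes non-empty lists of samples normalized to the range -1 to 1 and reports mean power in decibels relative to full scale; the source specifies neither the normalization nor the scale, and the sample values are made up.

```python
import math

def ambient_level_db(samples):
    """Mean power of non-speech samples in dB relative to full scale (1.0)."""
    power = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(power) if power > 0 else float("-inf")

quiet_room = [0.01, -0.01, 0.01, -0.01]  # hypothetical low-level background
murmuring = [0.1, -0.1, 0.1, -0.1]       # hypothetical background chatter
is_noisier = ambient_level_db(murmuring) > ambient_level_db(quiet_room)
```

The resulting level, or a label derived from it, would then be attached to the structural information as environment information.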

【００６４】By recognizing characteristic sounds in the ambient sound that convey the atmosphere of the occasion, such as laughter and applause, the atmosphere at the time of use can be used in the structural information as environment information. By combining the key data for the start and end points of the speech segments output by the speech segment detection unit 81, it is also possible to judge, for example, that the shorter the time between one speaker finishing and another speaker starting, the livelier the dialogue, and this too can be used in the structural information as environment information. In FIG. 13, the shorter the time from t2, when speaker A finishes speaking, to t3, when speaker B starts speaking, the livelier the dialogue.
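The gap between one speaker finishing and another starting (t3 minus t2 in FIG. 13) can be computed from the segment boundaries. A minimal sketch with hypothetical utterance times; using the mean gap as a liveliness indicator is an assumption, since the source states only that shorter gaps indicate a livelier dialogue.

```python
def turn_gaps(utterances):
    """Return the pause before each change of speaker.

    `utterances` is a list of (speaker, start, end) tuples in seconds.
    """
    ordered = sorted(utterances, key=lambda u: u[1])
    gaps = []
    for prev, nxt in zip(ordered, ordered[1:]):
        if prev[0] != nxt[0]:
            # Overlapping speech counts as a zero-length gap.
            gaps.append(max(0.0, nxt[1] - prev[2]))
    return gaps

utterances = [("A", 0.0, 2.0), ("B", 2.5, 4.0), ("A", 6.0, 7.0)]
gaps = turn_gaps(utterances)
mean_gap = sum(gaps) / len(gaps)
```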

【００６５】By creating a dictionary of standard reverberation patterns, the location can be estimated from the ambient sound. In particular, when the places where the device is used are limited to some extent, such a dictionary can be built by collecting the reverberation characteristics of each place of use, such as the user's own room, a meeting room, a corridor, or outdoors. As shown in FIG. 14, the test sound generation unit 140 generates as a test signal the system beep that is output, for example, at power-on, and the signal is input to the usage location estimation unit 141 from a voice input device such as a microphone. The usage location estimation unit 141 matches it against the usage location data stored in the reverberation characteristic dictionary 142.
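Matching a measured response against the reverberation characteristic dictionary 142 can be sketched as a nearest-pattern lookup. The feature vectors, their values, and the use of Euclidean distance are all assumptions made for illustration; the source does not state how reverberation characteristics are represented or compared.

```python
import math

# Hypothetical reverberation dictionary: one reference feature vector per usage location.
REVERB_DICTIONARY = {
    "own room": [0.30, 0.10, 0.05],
    "meeting room": [0.60, 0.35, 0.20],
    "corridor": [0.90, 0.70, 0.50],
    "outdoors": [0.05, 0.01, 0.00],
}

def estimate_location(measured, dictionary=REVERB_DICTIONARY):
    """Return the location whose reference vector is nearest to `measured`."""
    return min(dictionary, key=lambda place: math.dist(measured, dictionary[place]))

# Features measured from the recorded test beep (made-up values).
location = estimate_location([0.58, 0.30, 0.22])
```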

【００６６】The location estimated in this way is used as environment information: information indicating where the data was input is added and stored as structural information in the structural information storage unit 4. In addition, the meanings of the utterances exchanged in the dialogue can be kept as a dialogue history, and the current state of the dialogue can be determined from the meaning of a newly input utterance and the dialogue history. This can also be used as key data in the structural information.

【００６７】In this embodiment, the time-series data to be structurally analyzed is input through the time-series data input means 1. Besides analyzing it online in this way, the device can be configured as shown in FIG. 15 (using the same reference numerals as FIG. 1) so that time-series data already stored in the time-series data storage means 2 can be sent to the structure analysis means 3, allowing stored time-series data to be analyzed and new key data to be generated.

【００６８】Generating new key data makes it possible to redo recognition processing on time-series data that has already been analyzed. For example, words different from those recognized when the data was input can be set as the recognition vocabulary, and the structure analysis can be performed again.

【００６９】The same applies to semantic expressions as well as word recognition. When the dialogue goes beyond the word sequences and semantic expression candidates expected for the scene, the user can set the correct word sequences and semantic expressions and run the structure analysis again to generate appropriate structural information.

【００７０】Next, several examples are given of key data that can be derived from the key data described above. For simplicity, it is assumed that the voice data has already been separated by speaker, for example by providing a microphone for each speaker.

【００７１】Suppose the voice data of each speaker has been analyzed, structural information has been generated from the key data indicating the speech segments, and that structural information has been stored in the structural information storage means 4. This key data indicates when a given speaker spoke, so it shows which speaker spoke when, for how long, and how often. In addition, how heated a dialogue or discussion is can be gauged from how often the speaker changes, how many speakers speak within a given period, and how long it takes for another speaker to begin after one speaker finishes.
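Per-speaker statistics of this kind follow directly from the speech-segment key data. A minimal sketch over hypothetical (speaker, start, end) tuples; the function name and return shape are illustrative assumptions.

```python
from collections import Counter

def speaker_statistics(utterances):
    """Return (talk time per speaker, utterance count per speaker, number of speaker changes)."""
    ordered = sorted(utterances, key=lambda u: u[1])
    talk_time = Counter()
    counts = Counter()
    for speaker, start, end in ordered:
        talk_time[speaker] += end - start
        counts[speaker] += 1
    changes = sum(1 for a, b in zip(ordered, ordered[1:]) if a[0] != b[0])
    return dict(talk_time), dict(counts), changes

utterances = [("A", 0.0, 2.0), ("B", 2.5, 4.0), ("A", 6.0, 7.0)]
talk_time, counts, changes = speaker_statistics(utterances)
```

Dividing `changes` by the length of the recording gives a speaker-change rate that can be compared across meetings.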

【００７２】Similarly, suppose the voice data has been analyzed, structural information has been generated from the key data indicating the word recognition results, and that structural information has been stored in the structural information storage means 4. For example, when the voice data of meeting attendees is input and the words that occur frequently for each topic of the meeting are registered as key data for the words to be recognized, those words give a rough indication of which topic was being discussed and when. If a particular word spoken by a certain person is registered as key data for the words to be recognized, the parts containing important remarks can be searched for and extracted.

【００７３】As described above, voice data already recorded in the time-series data storage unit 2 can also be analyzed, so the words to be recognized can be reset after the voice data has been captured, and important remarks that were not anticipated beforehand can also be searched for.

【００７４】Further, suppose structural information has been generated from the key data indicating the semantic expressions of utterances obtained by analyzing the voice data, and that structural information has been stored in the structural information storage means 4. The key data for the semantic expressions then gives a rough indication of which topic was being discussed and when. It also shows who said what and whose remarks were close in content to whose. Charting how the topics shifted makes it possible to organize and illustrate the flow of the discussion.

【００７５】When several sets of time-series data are stored and they share common structural information, that information can be linked and searched across them. For example, when several sets of voice data have a speaker in common, that speaker can be picked out.

【００７６】When the voice data of several meetings has been input, the structural information indicating the speakers makes it possible to retrieve what a person who attended one meeting said at another meeting.

【００７７】When several sets of voice data have a word in common, the structural information indicating that word can be used to look up how a word that was a main topic of one meeting was used in earlier meetings.

【００７８】FIG. 16 shows an example of a screen display of who said what, how the topics shifted, and how close the remarks were in content. FIG. 16(a) illustrates the situation when a vote was taken at a meeting held by five participants A, B, C, D, and E. It presents the utterance meanings, that is, who said what, obtained by structural analysis of the voice data recorded when the vote was taken: A expressed approval, B opposition, C opposition, D conditional approval, and E approval.

【００７９】FIG. 16(b) illustrates the passage of time in the meeting and the closeness of the remarks in content. It shows that at first there were D's proposal 1 and B's proposal 2, which were far apart, and the meeting stalled with no one else speaking. After A later put forward proposal 3, the discussion became lively and D also moved closer to A's opinion.

【００８０】Items of structural information are related to one another on the basis of the structural information name or structural information ID and the time information, and new structural information is generated. Searching, output, and the entry of key data by the user are briefly described below.

【００８１】As described above, key data is used as the search key. The user enters key data into the search command input means 5; here, the word 「はい」 ("yes") is entered as key data. The search command input means 5 sends the entered key data to the search means 6, which retrieves from the structural information storage means 4 all structural information having the key data 「はい」. The search means 6 sends the retrieved structural information to the information output means 7, which presents it. Referring to this, the user enters, through the search command input means 5, key data for the structural information to be searched further.

【００８２】The search command input means 5 then accepts a further search key made up of key data and sends a search command to the search means 6. For example, to play back the time-series data for an affirmative remark made between time t1 and time t2, the user enters the key data start time "t1", end time "t2", and utterance meaning "affirmation" into the search command input means 5.

【００８３】The search means 6 retrieves from the structural information storage means 4 the structural information that matches the key data entered by the user. Here, the structural information "SR-5" shown in FIG. 12 is retrieved. Because the key data 「はい」 has already been entered, the structural information "WD-7", which has the key data 「はい」, is then retrieved from the structural information having the key data 「はい」 and the structural information "SR-5".
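The two-step retrieval (first by time and utterance meaning, then narrowed to the word entered earlier) can be sketched over an in-memory list. The dictionary layout, the field names, and the `search` helper are assumptions for illustration, and the words are romanized. Only the pointer to "WD-7" is listed under "SR-5" because the text does not give the ID of the second constituent word.

```python
# Hypothetical in-memory stand-in for the structural information storage means 4.
STRUCTURE_STORE = [
    {"id": "WD-7", "source": "SP-129", "name": "word detection result",
     "start": "t1", "end": "t2", "word": "hai", "likelihood": 0.82},
    {"id": "WD-8", "source": "SP-129", "name": "word detection result",
     "start": "t1", "end": "t2", "word": "dai", "likelihood": 0.75},
    {"id": "SR-5", "source": "SP-129", "name": "utterance meaning",
     "start": "t1", "end": "t2", "meaning": "affirmation", "words": ["WD-7"]},
]

def search(store, **key_data):
    """Return every record whose fields match all of the given key data."""
    return [r for r in store if all(r.get(k) == v for k, v in key_data.items())]

# Second query from the text: start time, end time, and utterance meaning.
meanings = search(STRUCTURE_STORE, start="t1", end="t2", meaning="affirmation")
# Follow the pointers to word-detection records, keeping those for the earlier key "hai".
linked = [r for m in meanings for r in search(STRUCTURE_STORE, word="hai")
          if r["id"] in m["words"]]
# The source IDs identify the stored voice data to play back.
audio_ids = {r["source"] for r in linked}
```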

【００８４】When only meaningful information is to be output, the retrieved structural information is sent to the information output means 7, which outputs the meaningful information contained in it.

【００８５】When the voice data itself is to be output, the search means 6 also accesses the voice data stored in the time-series data storage means 2, using the ID of the source voice data held in the structural information "WD-7" that has already been retrieved.

【００８６】The information output means 7 consists of, for example, a speaker and a display device such as a CRT, and plays back the source voice data retrieved by the search means 6. Outputting time-series data corresponds to playing back all or part of the input data. Outputting structural information means presenting the time-series data compactly through a visual display or outputting sounds audibly as metaphors. For example, structural information obtained by word recognition can be displayed by showing the recognized words as text or icons, and structural information obtained by utterance understanding can be displayed by expressing the meaning as keyword text. In a display of meeting records, icons and text describing the meaning are shown according to the speakers' seating order, the time of each remark, and its meaning, giving a record of which speaker made remarks with what meaning, and of what the others said in response to whose remark. As in FIG. 16(b), information such as whether the discussion was lively or stalled can be output more effectively by expressing it visually through color tone or shading.

【００８７】The key data input means 8 consists of an input device such as a keyboard or a pointing device such as a mouse, pen, or touch panel. Through these input devices, the key data input means 8 accepts corrections and additions to the key data from the user. When these input devices are used, the key data and structural information are displayed on the screen of the information output means 7, the key data to be corrected or added is indicated by the cursor position, and when the user makes an entry through the key data input means 8, the corresponding structural information is changed.

【００８８】By entering key data in this way, the user can not only correct the key data and structural information generated automatically by the structure analysis means 3 but also add key data and structural information.

【００８９】The user enters key data while checking, through the information output means 7, the contents stored in the structural information storage means 4 and the time-series data storage means 2. The structural information can thus be refined to suit the user's needs, raising its quality.

【００９０】The key data entered by the user can include the user's evaluation of the voice data or structural information. For example, suppose each speaker's remarks are rated with ranks such as ☆, ○, △, and ×, and the rating is added to the structural information as key data: ☆ for utterances judged extremely important, ○ for content judged important, △ for utterances that are not important but bear on the progress of the topic, and × for utterances that have no bearing on the progress of the topic. Later, the ☆ parts can be searched to review the key points, the ○ parts can be searched to produce a summary, and the △ parts can be searched to follow the flow of the topic when time permits.
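Retrieval by user rating reduces to filtering on one more item of key data. A minimal sketch; the record layout and the utterance IDs "U-1" to "U-4" are hypothetical.

```python
# Hypothetical rated utterances; "rating" is key data entered by the user.
rated_utterances = [
    {"id": "U-1", "speaker": "A", "rating": "☆"},
    {"id": "U-2", "speaker": "B", "rating": "×"},
    {"id": "U-3", "speaker": "D", "rating": "○"},
    {"id": "U-4", "speaker": "A", "rating": "△"},
]

def by_rating(records, rating):
    """Return the records carrying exactly the given rating mark."""
    return [r for r in records if r["rating"] == rating]

key_points = by_rating(rated_utterances, "☆")  # review the key points
summary = by_rating(rated_utterances, "○")     # material for a summary
```

A cumulative variant that also returns every higher rank would be a small extension, but the text describes searching each rank separately.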

【００９１】Next, an example of application to a system for evaluating a recognition and understanding system (referred to here as a recognition/understanding evaluation system) is described. In general, improving the performance of a recognition and understanding system requires collecting a large amount of real data from actual use of the system, and managing the large volume of collected data takes time and effort. If, as in this embodiment, every recognition and understanding result is treated as key data and link information to the time-series data from which it was obtained is retained, then only the time-series data that produced a given recognition or understanding result can be output selectively.

【００９２】In this case, the input time-series data consists of the time-series data input to the recognition and understanding system and the data of the recognition and understanding results. When the recognition and understanding system goes through several processing stages, as in the example above, the intermediate results of each stage can also be handled as time-series data.

【００９３】In the evaluation system, the evaluation scale for assessing recognition and understanding performance corresponds to the key data; that is, the key data is (A) the desired recognition/understanding result (or intermediate result) or the correct recognition/understanding result (or intermediate result). Alternatively, the key data may be (B) the result of comparing the recognition/understanding result (or intermediate result) of the recognition and understanding system with the desired or correct result (or intermediate result). Alternatively, the key data may be (C) a label for time-series data that the current recognition and understanding system does not handle but that should be handled in the future.

【００９４】The link information may be (a) information indicating the correspondence with the processing results of the recognition and understanding system, or (b) information indicating the correspondence with the time-series data input to the recognition and understanding system (for example, a time, a sample point, or a data address).

【００９５】This information can be entered by the user. Searching with the desired recognition result (A) as the key data and (b) as the link information collects the actual input data for a given recognition target, and this can be used as training data to obtain a more accurate recognition dictionary.

【００９６】Tallying the key data (B) yields the recognition and understanding performance, and following the link information collects the input time-series data for a given recognition target together with the correct/incorrect judgment results.
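Tallying comparison results into a performance figure, and following the links back to the input data, can be sketched as below. The record layout is an assumption, the words are romanized, and the source IDs "SP-130" and "SP-131" are hypothetical additions alongside the "SP-129" that appears in the text.

```python
from collections import defaultdict

def accuracy_by_target(comparisons):
    """Aggregate comparison records into the fraction recognized correctly per target word."""
    totals = defaultdict(lambda: [0, 0])  # target -> [correct count, total count]
    for rec in comparisons:
        totals[rec["target"]][1] += 1
        if rec["recognized"] == rec["target"]:
            totals[rec["target"]][0] += 1
    return {target: correct / total for target, (correct, total) in totals.items()}

# Each record pairs the correct result with the system's result and links to the input data.
comparisons = [
    {"target": "hai", "recognized": "hai", "source": "SP-129"},
    {"target": "hai", "recognized": "dai", "source": "SP-130"},
    {"target": "soudesu", "recognized": "soudesu", "source": "SP-131"},
]
accuracy = accuracy_by_target(comparisons)
# Link information for the misrecognized inputs, e.g. for later inspection or retraining.
error_sources = [r["source"] for r in comparisons if r["recognized"] != r["target"]]
```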

【００９７】Searching with the key data (C) collects only the real data that the current system cannot handle. Below, an example in which the time-series data recording system of the present invention is applied to a voice dialogue evaluation system, which evaluates the dialogue between a voice dialogue system and its user, is described with reference to FIG. 17.

【００９８】As shown in FIG. 17, the voice dialogue evaluation system 170 consists of a voice dialogue system 171, which recognizes and understands the speech uttered by the user and returns a spoken or on-screen response, and a dialogue evaluation system 172, which has functions such as assessing the accuracy of the recognition and understanding results of the voice dialogue system 171 and creating a dialogue speech database that takes the dialogue situation into account. In other words, the voice dialogue system 171 recognizes voice data and outputs structural information in the manner of the structure analysis means 3, and the dialogue evaluation system 172 is the time-series data recording and reproducing device of the present invention, which evaluates the speech recognition results of the voice dialogue system 171 from the structural information supplied by the voice dialogue system 171.

【００９９】 In the following, the dialogue evaluation system 172 is assumed to have the same configuration as that shown in FIG. 1 and is described using the same reference numerals as in FIG. 1. The voice dialogue system 171 sends, to the time-series data input means 1 of the dialogue evaluation system 172 (the time-series data recording/reproducing apparatus), the PCM data of the dialogue speech together with additional time-series data, key data, and structure information containing dialogue information such as recognition/understanding results and response contents.

【０１００】 FIG. 18 shows an example of the file output of dialogue information, such as recognition/understanding results and response contents, that the voice dialogue system 171 sends to the time-series data input means 1 of the dialogue evaluation system 172. This file output of the voice dialogue system includes an utterance number indicating the ordinal position of the utterance, the speech sections of the utterances of the voice dialogue system and its user, the word detection results, a plurality of semantic representation candidates obtained by syntactic/semantic analysis, the semantic representation (understanding result) selected from those candidates in consideration of the dialogue history, the response contents, and so on. The dialogue evaluation system 172 separates this input data into voice data and structure information.

【０１０１】 To examine the accuracy of the recognition/understanding results of the voice dialogue system 171, a human must supply the correct key data for the word detection/recognition results and for the understanding results of the utterance meaning. The user enters this additionally using the key data input means 8 of the dialogue evaluation system 172.

【０１０２】 FIG. 19 takes a voice dialogue system that accepts orders at a hamburger shop as an example and shows how part of the key data input means 8 and the information output means 7 can be realized using a window interface, that is, a library of a window system. The user of the dialogue evaluation system can supply the text of a correct word by clicking, with a mouse or a pen, the icon of a recognition word displayed on the screen. In this example, in addition to the recognition words, a delimiter icon indicating that one utterance has ended is also provided. When the key data input means 8 of the dialogue evaluation system 172 is built with a window interface in this way, the text of the correct recognition words can be entered and corrected easily. For instance, to enter the meaning of ordering one small orange juice, the user touches, with a pen or the like, the icon 190 indicating "orange juice", the icon 191 indicating "small", the icon 192 indicating "one", the icon 193 indicating "please", and the icon 194 indicating "end of utterance" on the screen shown in FIG. 19.

【０１０３】 The text of the correct recognition words entered through the window interface of FIG. 19 contains no time information such as the start and end points of each word. To obtain the time information, the user of the dialogue evaluation system 172 has the information output means 7 partially reproduce the section of the dialogue voice data in which the recognition word exists, listens to it, and determines the times while checking them. Although the section in which a recognition word exists is entered and determined by the user of the dialogue evaluation system 172, the input burden on that user is greatly reduced if the start and end points of the recognition words obtained as recognition results by the voice dialogue system 171 are used as initial values.
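The use of recognized start and end points as initial values can be sketched as follows. This is a hedged illustration only: the tuple layout `(label, start, end)`, the function name `propose_initial_spans`, and the rule of taking the first unused recognized word with the same label are assumptions made for the example; the text does not prescribe how the initial values are looked up.

```python
def propose_initial_spans(correct_words, recognized):
    """Propose an initial (start, end) span for each correct word.

    correct_words: word texts entered by the evaluator, in utterance order.
    recognized:    (label, start, end) tuples output by the dialogue system.
    A word with no recognized counterpart gets None and must be timed by hand.
    """
    remaining = list(recognized)
    proposals = []
    for word in correct_words:
        span = None
        for i, (label, start, end) in enumerate(remaining):
            if label == word:
                span = (start, end)
                del remaining[i]  # each recognized word seeds at most one proposal
                break
        proposals.append((word, span))
    return proposals


# Toy values (assumed) for the orange juice order of FIG. 19.
correct_words = ["orange juice", "small", "one", "please"]
recognized = [
    ("orange juice", 0.30, 0.95),
    ("one", 1.40, 1.62),
    ("please", 1.70, 2.10),
]
proposals = propose_initial_spans(correct_words, recognized)
```

Here only "small", which the system failed to recognize, would have to be located entirely by listening; the other spans need only be confirmed or adjusted.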

【０１０４】 Entering the correct semantic representation of an utterance can likewise be realized by providing a window interface. In addition, if the correct word sequence has already been obtained, a semantic representation of the utterance content can be obtained by performing semantic analysis on it. Correcting this semantic representation reduces the amount of data that the user of the evaluation system has to enter directly.

【０１０５】 The structure information obtained from the correct key data entered by the user in this way is collated with the key data and structure information output by the voice dialogue system 171, that is, the key data and structure information input to the dialogue evaluation system 172, to construct structure information indicating the word detection performance of the voice dialogue system 171.

【０１０６】 For example, the user enters the key data "Yes" of a correct word with the key data input means 8, and structure information indicating the correct word, as shown in FIG. 20(a), is created. Based on the time information of the structure information "WC-5" indicating the correct word shown in FIG. 20(a), that is, the start time "t1" and the end time "t2", the structure analysis means 3 collates it with the structure information indicating the word detection results shown in FIG. 11 and checks whether the correct word has been detected, whether any word has gone undetected (deletion), and whether any word not in the correct answer has been detected (insertion). From each of these results it generates structure information indicating the word detection performance.
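The collation of correct words against detection results can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the invention: the text does not define the matching criterion, so the sketch assumes that a detection matches a correct word when the labels are equal and the time spans overlap. The class `WordInfo`, the function names, the IDs "WC-6" and "WD-8", and all time values are hypothetical; only the IDs "WC-5" and "WD-7" appear in the source.

```python
from dataclasses import dataclass


@dataclass
class WordInfo:
    """Hypothetical structure information for one word: ID, label and time span."""
    info_id: str
    word: str
    start: float
    end: float


def overlaps(a, b):
    """True when the two time spans share a non-empty interval."""
    return a.start < b.end and b.start < a.end


def evaluate_detection(references, detections):
    """Label each correct word as correct or deletion, and leftover detections as insertion.

    Assumed criterion: same label and overlapping spans. Each detection is
    matched to at most one correct word.
    """
    used = set()
    results = []
    for ref in references:
        match = next(
            (d for d in detections
             if d.info_id not in used and d.word == ref.word and overlaps(ref, d)),
            None,
        )
        if match is None:
            results.append({"reference": ref.info_id, "detected": None, "result": "deletion"})
        else:
            used.add(match.info_id)
            results.append({"reference": ref.info_id, "detected": match.info_id, "result": "correct"})
    for det in detections:
        if det.info_id not in used:
            results.append({"reference": None, "detected": det.info_id, "result": "insertion"})
    return results


references = [WordInfo("WC-5", "yes", 1.0, 1.4), WordInfo("WC-6", "cola", 2.0, 2.5)]
detections = [WordInfo("WD-7", "yes", 1.05, 1.35), WordInfo("WD-8", "potato", 3.0, 3.4)]
report = evaluate_detection(references, detections)
```

Each entry of `report` plays the role of one piece of structure information on word detection performance, holding the ID of the correct word together with the ID of the matching detection result where one exists.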

【０１０７】 Here, when the structure information "WD-7" of FIG. 11(a) exists, structure information indicating the word detection performance of the voice dialogue system 171, as shown in FIG. 20(b), can be created. It contains a pointer to, or the structure information ID of, the structure information "WD-7" indicating the word detection result, combined with the information of the structure information "WC-5" indicating the correct word.

【０１０８】 Furthermore, the structure information indicating the word detection performance is collated with the structure information indicating the word recognition results to generate structure information indicating the word recognition performance. As described above, the structure information indicating a word recognition result holds information linking it to the structure information indicating the word detection result, so whether that word is correct can be determined. It can therefore be judged whether each word in the word string contained in a word recognition result is correct. In this way, the word recognition performance of the voice dialogue system 171 can be evaluated.

【０１０９】 Likewise, the structure information indicating the correct semantic representation of an utterance is collated with the structure information indicating the semantic representation candidates and the selected semantic representation, and structure information indicating the understanding performance is generated. It includes information on whether the semantic representation candidate that should have been selected was correctly selected, and on whether there are cases in which none of the candidates is the one that should have been selected.
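The two checks named here can be sketched as a small classification function. The sketch is an assumption-laden illustration: semantic representations are modelled as plain tuples of (item, count) pairs, and the function name `evaluate_understanding` and the three result labels are hypothetical, not terms from the source.

```python
def evaluate_understanding(correct, candidates, selected):
    """Compare the evaluator's correct semantic representation with the system output.

    Returns one of three hypothetical labels:
      "selected_correctly" - the system chose the correct candidate
      "wrong_selection"    - the correct one was among the candidates but not chosen
      "not_in_candidates"  - no candidate matched the correct representation
    """
    if correct not in candidates:
        return "not_in_candidates"
    if selected == correct:
        return "selected_correctly"
    return "wrong_selection"


# Toy semantic representations (assumed encoding): tuples of (item, count).
correct = (("hamburger", 2), ("large potato", 1), ("cola", 3))
other = (("hamburger", 2), ("large potato", 1), ("coffee", 3))

case_ok = evaluate_understanding(correct, [correct, other], correct)
case_wrong = evaluate_understanding(correct, [correct, other], other)
case_missing = evaluate_understanding(correct, [other], other)
```

Separating the second and third labels distinguishes selection errors from cases where the candidate generation itself never produced the right meaning.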

【０１１０】 The structure information described above can be used to improve the performance of the voice dialogue system 171, as explained below, taking the improvement of word detection performance as an example. From the structure information indicating the correct words entered by the user of the dialogue evaluation system 172, the word voice data occurring in actual dialogues can be listed. This yields a voice database that collects the voice data of a particular word only. Retraining the word detection dictionary with this voice data can improve the word recognition performance. Furthermore, the structure information indicating the word detection performance gives the detection accuracy of each word, so retraining the words with the lowest detection performance first improves performance quickly.
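The per-word detection accuracy and the resulting retraining order can be sketched as follows. This is a minimal sketch under assumptions: the performance records are reduced to `(word, result)` pairs with the hypothetical result labels "correct", "deletion", and "insertion", and accuracy is taken to be the fraction of correct-word occurrences that were detected, which is one possible definition and not one given in the source.

```python
from collections import defaultdict


def detection_accuracy_by_word(performance_records):
    """Return {word: fraction of correct-word occurrences that were detected}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for word, result in performance_records:
        if result == "insertion":
            continue  # an insertion has no correct-word occurrence to count
        totals[word] += 1
        if result == "correct":
            hits[word] += 1
    return {word: hits[word] / totals[word] for word in totals}


def retraining_order(accuracy):
    """Words sorted from lowest to highest detection accuracy (ties by name)."""
    return sorted(accuracy, key=lambda w: (accuracy[w], w))


# Toy performance records (assumed values).
records = [
    ("potato", "correct"), ("potato", "deletion"),
    ("potato", "deletion"), ("potato", "correct"),
    ("cola", "correct"), ("cola", "correct"),
    ("yes", "deletion"),
]
accuracy = detection_accuracy_by_word(records)
order = retraining_order(accuracy)
```

With these toy records the word "yes" would be retrained first, followed by "potato", while "cola" needs no attention.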

【０１１１】 An output example in which the information output means 7 of the dialogue evaluation system 172 displays the structure information in association with the changes of topic and the progress of the dialogue over time is described with reference to FIG. 21. It displays the file output example of FIG. 18.

【０１１２】 The conversation voice waveform display unit 210 shows the voice waveform of the conversation, that is, the time-series voice data reproduced as a waveform. The text display unit 211 outputs the time-series data, with its time information, rendered as text.

【０１１３】 The word voice waveform display unit 212 outputs the voice waveform of the portion that is blinking in the conversation content display unit 211. The word detection result display unit 213 displays the words detected from the voice waveform shown by the word voice waveform display unit 212. Here, words with similar waveforms, words with similar speech sections, and the like are detected.

【０１１４】 The word recognition result display unit 214 displays, from among the detected words, the words recognized as correct. Here it shows that the word uttered from 4.74 seconds to 5.12 seconds was recognized as "potato".

【０１１５】 The syntax/meaning candidate display unit 215 displays candidates for the meaning of the sentence derived from the detected words and the recognized words. Here, five candidates meaning an order are output from the detected words, and the first candidate has been selected on the basis of the recognized words.

【０１１６】 The syntax/meaning candidate information display unit 216 displays the meaning of the candidate selected in the syntax/meaning candidate display unit 215: an order for "two hamburgers", "one large potato", and "three colas".

【０１１７】 The syntax/meaning correction information display unit 217 displays the corrected meaning candidate information when the meaning candidate information displayed in the syntax/meaning candidate information display unit 216 contains an error. As described above, by using the structure information on the dialogue history and displaying it in association with the changes of topic and the progress of the dialogue over time, it is possible to show visually whether the dialogue is stalling or proceeding smoothly.

【０１１８】 The handling of the evaluation dialogue data (time-series data) in the voice dialogue evaluation system described above is summarized below. The time-series data here is:
・the voice data recorded during the dialogue (at least two channels: the system response and the user utterance).

【０１１９】 Items that can be handled both as time-series data and as structure information include, as in the recognition results of the system shown in FIG. 18, for example:
・the start and end times of the system responses and user utterances (in general there are several)
・the (word) speech recognition results extracted from the user utterances (including the start and end times of each word)
・the understanding result of the meaning of the user's utterance (for each utterance)
・the internal state of the system
・the response contents of the system
(These correspond roughly to the processing results of the speech section detection unit, the speech recognition unit (word detection unit), the speech understanding unit, the dialogue management unit, and the response generation unit of the voice dialogue system, respectively.) In addition, there is the "correct answer" that the system developer (evaluator) enters as the "desired processing".

【０１２０】 For evaluation, a dialogue using speech (with the screen display also used) is conducted between the voice dialogue system and a user. When the user's utterance is input to the voice dialogue system, the analysis, recognition and understanding of this voice data, the dialogue processing, and the response generation determine and generate, respectively, the voice data, the start and end times of the utterance, the speech recognition result, the utterance meaning understanding result, the internal state, and the response contents listed above.

【０１２１】 To improve system performance and refine the user interface, the voice dialogue evaluation system records each of the above processing data and processing results and compares them with the corresponding "desired processing results" entered by the evaluator (the user of the voice dialogue evaluation system). It thereby provides information useful for improving the system, such as evaluation information on the performance of the current voice dialogue system and link information to the input data (time-series data) on which each process was based.

【０１２２】 Examples of evaluating performance on the basis of the information obtained are as follows:
・Speech recognition performance is evaluated by comparing the speech recognition results of the system's processing with the "correct answer" entered by the system developer as the "desired processing".
・Speech understanding performance is evaluated by comparing the utterance meaning understanding results of the system's processing with the "correct answer" entered by the system developer as the "desired processing".
In each case the evaluation is made by comparing the processing result with the "desired processing".

【０１２３】 Examples of using the information obtained as information useful for improvement are as follows:
・To improve speech recognition performance, training data for the recognition dictionary can be output in which actual dialogue voice data is labelled with whether the processing result was correct or incorrect. Vocabulary that should newly be added can also be listed.
・To improve speech understanding performance, unexpected utterances can be listed and grammar added.
・To improve speech section detection performance, speech section detection errors can be listed.

【０１２４】 In this way, by accumulating the structure information that is the recognition result of each recognition means and improving the performance of each recognition means, the performance of the voice dialogue evaluation system as a whole can be improved.

【０１２５】 Voice data has been used so far as the example of time-series data. An example that handles image data as the time-series data is now described briefly. Image data is input from the time-series data input means 1, which consists of a camera or the like. The time-series data input means 1 adds identification data to the image data and sends it to the time-series data storage means 2 and the structure analysis means 3. The time-series data storage means 2, which consists of a device for recording image data such as a VTR, records the received image data with the identification data added.

【０１２６】 The structure analysis means 3, which performs image recognition processing, detects key data and generates structure information. Current image recognition devices are capable of recognition on the level of "a human moved between time ta and time tb". Key data such as "human" and "movement" can therefore be detected. Structure information such as "time information: ta to tb; recognition result: human; ID of the image data" and "time information: ta to tb; recognition result: movement; ID of the image data" can also be created.

【０１２７】 To search this image data, key data such as "human" and "movement" is entered in the search command input means 5. The search means 6 then retrieves the structure information having the key data "human" and "movement", and retrieves the image data linked to that structure information. The information output means 7 reproduces the retrieved image data.
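The key data search over image structure information can be sketched as follows. This is a hedged example rather than the search means of the invention: the class `ImageStructureInfo` follows the fields named in the text (time information, recognition result, image data ID), but its name, the function `search_images`, the numeric times standing in for ta and tb, and the IDs "IMG-1" and "IMG-2" are assumptions. The sketch also assumes that entering several keys means all of them must hold for the same span, which the text does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageStructureInfo:
    """Hypothetical record following the fields named in the text."""
    start: float     # time information: ta
    end: float       # time information: tb
    result: str      # recognition result, used as key data
    image_id: str    # ID of the linked image data


def search_images(store, keys):
    """Return sorted (image_id, start, end) spans for which every key is recorded."""
    found = {}
    for info in store:
        if info.result in keys:
            span = (info.image_id, info.start, info.end)
            found.setdefault(span, set()).add(info.result)
    wanted = set(keys)
    return sorted(span for span, results in found.items() if results == wanted)


# Toy structure information store (assumed values).
store = [
    ImageStructureInfo(10.0, 15.0, "human", "IMG-1"),
    ImageStructureInfo(10.0, 15.0, "movement", "IMG-1"),
    ImageStructureInfo(20.0, 22.0, "human", "IMG-2"),
]
moving_humans = search_images(store, ["human", "movement"])
humans = search_images(store, ["human"])
```

The returned spans are the link information that a playback component would use to locate and reproduce only the matching portion of the recorded images.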

【０１２８】 In addition, by using the key data input means 8 to change the key data "human" to the specific name of a person, more precise key data and structure information can be generated.

【０１２９】 For image data accompanied by voice data, such as a movie or video, key data detection and structure information generation are performed for both the sound and the images. Using both makes it possible to realize a high-quality database.

【０１３０】[0130]

[Effects of the invention]

As described above, in recording and reproducing time-series data that takes multimedia data as input, the present invention performs recognition processing on the time-series data to detect key data, records the structure information generated from the key data together with the time-series data, and lets the user search by key data, so that only the time-series data and the information meaningful to the user are reproduced. In addition, allowing key data to be corrected and added lets the structure information be regenerated, providing a database that is closer to the user's needs and of higher quality for the user. Furthermore, recognition and understanding can be evaluated on the basis of the structure information of the time-series data, improving the performance of the system.

[Brief description of drawings]

FIG. 1 is a diagram showing the overall configuration of an embodiment of the present invention.

FIG. 2 is a diagram showing the procedure from input to storage of time-series data in an embodiment of the present invention.

FIG. 3 is a diagram showing the procedure for searching stored data and outputting it in an embodiment of the present invention.

FIG. 4 is a diagram showing additional time-series data to which a data type and time information have been added in an embodiment of the present invention.

FIG. 5 is a diagram showing address information indicating the storage position of time-series data and the corresponding additional time-series data in an embodiment of the present invention.

FIG. 6 is a diagram showing an example of structure information in an embodiment of the present invention.

FIG. 7 is a diagram showing address information indicating the storage position of structure information and the corresponding structure information in an embodiment of the present invention.

FIG. 8 is a diagram showing the internal configuration of the structure analysis means 3 in an embodiment of the present invention.

FIG. 9 is a graph showing the relationship between time and voice power in an embodiment of the present invention.

FIG. 10 is a diagram showing structure information on the detection of speech sections in an embodiment of the present invention.

FIG. 11 is a diagram showing structure information on word detection in an embodiment of the present invention.

FIG. 12 is a diagram showing structure information on candidates for the semantic content of an utterance in an embodiment of the present invention.

FIG. 13 is a diagram showing the relationship between time and the voice power of two speakers in an embodiment of the present invention.

FIG. 14 is a diagram showing the configuration of a system for estimating the place of use in an embodiment of the present invention.

FIG. 15 is a block diagram of a configuration in which already stored time-series data can be sent to the structure analysis means 3 in an embodiment of the present invention.

FIG. 16 is a diagram showing a screen display of the transitions of utterance contents and topics and the closeness of utterance contents in an embodiment of the present invention.

FIG. 17 is a configuration diagram of an application to a voice dialogue evaluation system in an embodiment of the present invention.

FIG. 18 is a diagram showing an example of the file output of dialogue structure information, such as the recognition/understanding results of time-series data and response contents, in an embodiment of the present invention.

FIG. 19 is a diagram showing an example in which the key data input means is realized using a window interface in an embodiment of the present invention.

FIG. 20 is a diagram showing structure information on correct words in an embodiment of the present invention.

FIG. 21 is a diagram showing an example in which structure information is displayed in association with the changes of topic and the progress of the dialogue over time in an embodiment of the present invention.

[Explanation of symbols]

1 time-series data input means
2 time-series data storage means
3 structure analysis means
4 structure information storage means
5 search command input means
6 search means
7 information output means
8 key data input means
81 speech section detection unit
82 acoustic analysis unit
83 word detection unit
84 syntax/semantic analysis unit
85 environment information extraction unit
140 test sound generation unit
141 usage location estimation unit
142 reverberation characteristic dictionary
170 voice dialogue evaluation system
171 voice dialogue system
172 dialogue evaluation system
190, 191, 192, 193, 194 icons
210 conversation voice waveform display unit
211 text display unit
212 word voice waveform display unit
213 word detection result display unit
214 word recognition result display unit
215 syntax/meaning candidate display unit
216 syntax/meaning candidate information display unit
217 syntax/meaning correction information display unit

Continuation of the front page

(72) Inventor: Yoichi Takebayashi, c/o Toshiba Research and Development Center, 1 Komukai Toshiba-cho, Saiwai-ku, Kawasaki-shi, Kanagawa
(72) Inventor: Yasutsugu Kawakura, c/o Toshiba Research and Development Center, 1 Komukai Toshiba-cho, Saiwai-ku, Kawasaki-shi, Kanagawa
(72) Inventor: Hiroshi Mizoguchi, c/o Toshiba Research and Development Center, 1 Komukai Toshiba-cho, Saiwai-ku, Kawasaki-shi, Kanagawa
(72) Inventor: Hisako Tanaka, c/o Toshiba Research and Development Center, 1 Komukai Toshiba-cho, Saiwai-ku, Kawasaki-shi, Kanagawa
(72) Inventor: Hideaki Shinchi, c/o Toshiba Software Engineering Co., Ltd., 1385 Shinmachi, Ome-shi, Tokyo

Claims

[Claims]

1. A time-series data recording/reproducing apparatus for storing time-series data and reproducing the stored time-series data, comprising: structure information analysis means for detecting key data from input time-series data and generating structure information composed of at least the detected key data and link information associating the key data with the time-series data; structure information storage means for storing the structure information generated by the structure information analysis means; and retrieval means for searching the structure information stored in the structure information storage means using the key data as a search key and retrieving the time-series data based on the link information of the retrieved structure information, wherein the time-series data retrieved by the retrieval means is reproduced.

2. The time-series data recording/reproducing apparatus according to claim 1, wherein, when the time-series data retrieved by the retrieval means is reproduced, only a predetermined portion of the time-series data including the key data is reproduced.

3. A time-series data recording/reproducing apparatus for storing a plurality of time-series data and reproducing the stored time-series data, comprising: structure information analysis means for detecting key data from the input time-series data and generating structure information composed of at least the detected key data and link information associating the key data with the time-series data; structure information storage means for storing the structure information generated by the structure information analysis means; and retrieval means for searching the structure information stored in the structure information storage means using the key data as a search key and retrieving predetermined time-series data from the plurality of time-series data based on the link information of the retrieved structure information, wherein the predetermined time-series data retrieved by the retrieval means is reproduced.

4. A time-series data recording/reproducing apparatus comprising: structure information analysis means for detecting key data from input time-series data and generating structure information composed of at least the detected key data, link information associating the key data with the time-series data, and environment information representing the environment of the time-series data; structure information storage means for storing the structure information generated by the structure information analysis means; and retrieval means for searching the structure information stored in the structure information storage means using the key data as a search key and retrieving the environment information of the portion corresponding to the key data of the retrieved structure information, wherein the environment information retrieved by the retrieval means is reproduced in a form visible to the user.

5. The time-series data recording/reproducing apparatus according to claim 1, further comprising key data input means for allowing a user to directly input key data in order to create new structure information.

6. The time-series data recording/reproducing apparatus according to claim 1, further comprising key data input means for allowing a user to directly input structure information.