JPH1153385A

JPH1153385A - Conference information recording reproducing device

Info

Publication number: JPH1153385A
Application number: JP9210291A
Authority: JP
Inventors: Eriko Tamaru; 恵理子田丸
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 1997-08-05
Filing date: 1997-08-05
Publication date: 1999-02-26
Anticipated expiration: 2017-08-05
Also published as: JP3879786B2

Abstract

PROBLEM TO BE SOLVED: The invention concerns a conference information recording/reproducing device that visualizes the structure of the statements made in a conference as a speaker chart and uses that chart as an index for accessing the recorded conference information. Its aim is to let the user reach the desired information as efficiently as possible. SOLUTION: The structure of the statements made by a plurality of conference participants is extracted from recorded audio data and displayed as a speaker chart on a display screen. Using instruction input means 12, a searcher designates on the speaker chart a statement or a reproduction section to be played. Reproduction means 13 and 14 reproduce the audio data corresponding to the designated statement or reproduction section. Intention extraction means 17 extracts the intention behind the searcher's instruction operation from the values of attributes predetermined for that purpose, such as the speaker name, speaking time, previous speaker name, and next speaker name. Similar statement candidates, whose intention resembles that of the designated statement, are detected and displayed visually on the display device.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to an apparatus for recording and reproducing conference information, such as the audio or video information of a conference. More particularly, it relates to an apparatus that, when audio and/or video information of a particular situation is retrieved and reproduced on the basis of the statement structure of the conference participants, can efficiently find the access points suited to the searcher's intention with as few omissions as possible.

[0002]

2. Description of the Related Art

In a conference, a great deal of information is produced as spoken conversation. Only a small part of it is recorded as text on whiteboards or in minutes, so much important information is either never recorded or cannot be recalled accurately.

[0003] To address this problem, there are conference recording devices that record all the information generated in a conference; one example is described in Japanese Patent Laid-Open No. Hei 6-343146. That device records every kind of multimedia information without exception, including audio input from microphones, video input from video cameras, and text and graphics entered with a pen.

[0004] With such a conference recording device, a key issue is how a user who wants to recall the content of a conference can reach the right place in the record. It is extremely difficult, however, for participants to attach an index to each conference scene in real time. Effective indexing is possible if appropriate indexes are assigned manually after the conference.

[0005] However, such manual indexing requires an enormous amount of work. Moreover, the information needed later often differs from one searcher to another and changes over time, so a predetermined index seldom supports an adequate search. Methods are therefore being studied that automatically provide effective indexes from the various cues generated during a conference, without human effort.

[0006] Japanese Patent Laid-Open No. Hei 6-343146 provides a means of retrieving audio and video information using, as an index, the times at which text or gestures were entered with a pen input means. Conference participants often take handwritten notes when important remarks are made, so using the times of those notes as an index gives effective access to the important information of a conference.

[0007] However, participants absorbed in the discussion cannot take notes. Indexes that depend on active instructions and actions by participants are therefore often effective but also leave many gaps. Creating a sufficient index would require participants to take many notes, increasing their burden. There is also a contradiction: if sufficient written notes existed, the need for a multimedia record would itself diminish.

[0008] Several other methods have been studied for automatically extracting sufficient indexes while placing as little burden as possible on participants. In Japanese Patent Laid-Open No. Hei 2-113790, search scenes are extracted from moving images by image feature extraction and displayed as a menu, from which the searcher interactively selects the desired scene; this enables efficient access to the needed data within a large volume of video. Conferences do contain situations where such a technique is effective, for example "when a particular person went up to the blackboard and spoke." In general, however, conference video changes little, and it is difficult to extract enough cues from it to recall the content of the conference.

[0009] The most important information in a conference is the spoken audio of the conversation, and attempts have been made to extract retrieval cues from it. Japanese Patent Laid-Open No. Hei 3-250481 describes a method for reaching, within video of a user operating a tool, the point at which the user ran into trouble: keywords frequently uttered at times of trouble are used to access the location where the corresponding data is recorded. The situation there is quite specific, however, and such keywords cannot serve as general-purpose cues.

[0010] Japanese Patent Laid-Open No. Hei 6-236410 also uses audio information. It performs linguistic analysis of the speaker's utterances, identifies the topic of the utterance content and its field, and automatically selects a group of information suited to that topic from a database. A dictionary of utterance expressions is used to detect topic-change points and candidate topics at those points. Topic-change points are very important cues for accessing conference records.

[0011] Although topic-change points are important, they are too coarse-grained as access cues to allow fine-grained access. Furthermore, finding practical topic-change points is difficult: current speech recognition technology cannot adequately handle natural, spontaneous speech, and building a sufficiently rich dictionary of utterance expressions is hard.

[0012] Japanese Patent Laid-Open No. Hei 8-317365, on the other hand, discloses a technique for graphing the audio data of each conference speaker in time series, with each segment's length corresponding to the amount of stored data. This makes it possible to visualize as a graph who spoke, in what order, and for how long. Hereinafter in this specification, this diagram of the statement structure is called a speaker chart.

[0013] From the speaker chart, participants can recall to some extent the content of a conference they attended, even after it has ended, and can access the locations where important or needed information is recorded. The advantages of this technique are that it requires no advanced speech recognition or dictionaries, requires no explicit instructions from participants, and can be created automatically from the recorded information alone.

[0014]

PROBLEMS TO BE SOLVED BY THE INVENTION

However, searching a conference record with a speaker chart has the following problems.

[0015] The first set of problems stems from accessing "partial information" within the recorded conference information. Specifically, there is the loss of the absolute access position: the user no longer knows where the information currently being accessed came from. There is also the loss of the relative position: the user cannot tell where in the conference as a whole the current access point lies. Further, there is vulnerability to reversals in the line of argument: the user trusts the accessed partial information, draws a conclusion from it, and later misses the part of the conference where that conclusion was overturned.

[0016] The second problem is that when the user accesses the wrong playback location, there is no indication of where else the needed information might be.

[0017] Japanese Patent Laid-Open No. Hei 8-317365 does not address these problems. In contrast, the Audio Browsing Tool of Xerox PARC (Donald G. Kimber, Lynn D. Wilcox, Francine R. Chen, and Thomas Moran, "Speaker Segmentation for Browsing Recorded Audio," CHI '95 Proceedings (short paper), pp. 212-213) displays two pieces of information: the currently accessed location, marked explicitly on the speaker chart, and which portion of the whole conference the speaker chart is showing. Of the problems caused by accessing "partial information," this solves the loss of the absolute and relative access positions.

[0018] The other two problems, however, remain. In a conference, the line of argument may reverse two or three times, and a user who mistakenly accesses the first conclusion tends to overlook the correct information that follows. Support is therefore needed so that such shifts in the argument do not lead to missed accesses.

[0019] Moreover, the speaker chart itself is not necessarily an index that leads to the exact location of the needed information on the first try. In practice, accuracy can be improved by combining it with handwritten notes and the like. As noted above, however, handwritten notes place a heavy load on participants, so what matters is how to help the user reach the appropriate information from an inherently ambiguous speaker chart. That is, even when the user accesses a wrong location, information is needed that shows where else the required information may lie.

[0020] In view of the above problems, an object of the present invention is to provide a conference information recording/reproducing apparatus that visualizes the statement structure of a conference and can use it as an index for accessing the recorded conference information, while imposing little load on participants, reducing search omissions, and letting the user reach the desired information as efficiently as possible.

[0021]

MEANS FOR SOLVING THE PROBLEMS

To solve the above problems, the conference information reproducing apparatus according to claim 1 comprises: recording means for recording audio data while a plurality of participants hold a conference; statement structure extraction means for extracting, from the audio data, the statement structure of the participants; visualization information generating means for generating visualization information for visualizing the statement structure; a display device that visualizes the statement structure based on the visualization information; instruction input means for making instructions within the statement structure visualized on the display device; reproducing means for reproducing the audio data corresponding to the position or portion designated through the instruction input means; intention extraction means for extracting the intention behind the searcher's instruction operation through the instruction input means, using the values of attributes predetermined for intention extraction; similar candidate detection means for detecting audio data sections whose intention is similar to that extracted by the intention extraction means; and similar candidate display means for visualizing on the display device the similar candidates detected by the similar candidate detection means.

[0022] The conference information reproducing apparatus according to claim 2 comprises: a voice input device provided for each participant to input the audio data of the conference information; storage means for storing the audio data; utterance data extraction means for extracting statements from the audio data; statement structure table generating means for generating a statement structure table from the extracted statement data and a timer; storage means for storing the statement structure table; storage means for storing a participant table that holds the correspondence between the voice input devices and the participants; speaker chart generating means for generating a speaker chart that visualizes the statement structure table on a display device; speaker chart display means for displaying the generated speaker chart on the display device; instruction input means with which the searcher designates, on the speaker chart, any statement he or she intends to reproduce; statement specifying means for identifying the statement designated through the instruction input means; intention extraction means for extracting the searcher's instruction intention regarding the identified statement, using the values of attributes predetermined for intention extraction; similar statement detection means for detecting, from the extracted instruction intention, similar statement candidates whose intention resembles that of the designated statement; and similar statement candidate display means for visualizing the detected similar statement candidates on the display device.

[0023] The conference information reproducing apparatus according to claim 3 is the conference information recording/reproducing apparatus of claim 2 in which the instruction intention extraction means extracts the searcher's intention from the values of four attributes of the designated statement: speaker name, speaking time, previous speaker name, and next speaker name.

[0024] The conference information reproducing apparatus according to claim 4 is the apparatus of claim 2 in which the similar statement detection means has statement similarity calculating means, which calculates the similarity between the intention of the designated statement extracted by the instruction intention extraction means and each other statement in the statement structure table, and statement similarity determining means, which determines whether the calculated similarity is equal to or greater than a predetermined value. The similar statement candidates are detected based on the result of this determination.

[0025] The conference information reproducing apparatus according to claim 5 is the apparatus of claim 2 in which the instruction input means allows the searcher to designate a reproduction section, and the intention extraction means has reproduction operation monitoring means for monitoring the searcher's playback actions and reproduction intention extraction means for extracting the searcher's reproduction intention from predetermined attribute values of the series of statements in the reproduced audio data section.

[0026] The conference information reproducing apparatus according to claim 6 is the apparatus of claim 5 in which the attribute values used by the reproduction intention extraction means are, for the series of statements in the reproduced audio data section: the instruction intention of the statement at which playback started, the name of the speaker at which playback stopped, the total number of statements, the total speaking time, the set of speakers, and the statement transition matrix.

[0027] The conference information reproducing apparatus according to claim 7 is the apparatus of claim 5 in which the similar statement detection means has statement structure similarity calculating means, which uses the reproduction intention from the reproduction intention extraction means to calculate the similarity of statement structure for the other series of statements in the statement structure table, and statement structure similarity determining means, which determines whether the calculated similarity is equal to or greater than a predetermined value. Similar statement structure candidates are detected based on the result of this determination.

[0028] The conference information reproducing apparatus according to claim 8 is the apparatus of claim 5 in which the similar statement detection means has similarity determination method selecting means that automatically selects between similar statement detection and similar statement structure detection according to the situation of the reproduced statements.
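Claim 8 does not say which features of the reproduced statements drive the choice. The Python sketch below assumes the simplest possible rule, which is hypothetical, as are the names `select_similarity_method` and `min_for_structure`: compare single statements when playback covered only one statement, and compare statement structures otherwise.

```python
from typing import Literal, Sequence

Method = Literal["statement", "structure"]


def select_similarity_method(
    played_speakers: Sequence[str], min_for_structure: int = 2
) -> Method:
    """Choose how to search for similar candidates after playback stops.

    Hypothetical rule: with fewer than `min_for_structure` reproduced statements
    there is no structure (speaker set, transitions) worth comparing, so fall back
    to statement-by-statement similarity (claim 4); otherwise compare statement
    structures (claim 7).
    """
    if len(played_speakers) < min_for_structure:
        return "statement"
    return "structure"
```

A real selector could also weigh playback duration or how many speakers took part; the threshold on statement count is only a placeholder.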

[0029] The conference information reproducing apparatus according to claim 9 is the apparatus of claim 2 in which the similar statement candidate display means has two display areas: an overall conference time display area that visualizes the conference time in time series, and a similar candidate thumbnail area that displays reduced views of a plurality of statement structures. It comprises: means for displaying in the overall conference time display area, as partial display areas on the time axis, the reproduction section determined by the searcher's input through the instruction input device and the sections in which similar candidates of that reproduction section exist; list display means for listing in the similar candidate thumbnail area a reduced view of the statement structure of each partial display area, one per partial display area; and means for detecting that the searcher has selected one of the listed reduced views and reproducing the audio data of the selected section.
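As a rough illustration of the overall conference time display area, the Python sketch below draws the reproduction section and each similar-candidate section as marked cells on a single text timeline. The function name `render_timeline`, its parameters, and the one-character marks are hypothetical; a graphical implementation would draw rectangles instead.

```python
from typing import List, Tuple

# (start_sec, end_sec, one-character mark), e.g. "P" for the played section
Section = Tuple[float, float, str]


def render_timeline(total_sec: float, sections: List[Section], width: int = 40) -> str:
    """Draw the whole conference as `width` cells, marking each section.

    '-' is time with no highlighted section. Every section occupies at least
    one cell so short candidates stay visible; later sections overwrite earlier ones.
    """
    if total_sec <= 0 or width <= 0:
        raise ValueError("total_sec and width must be positive")
    row = ["-"] * width
    for start, end, mark in sections:
        lo = max(0, int(start * width / total_sec))
        hi = min(width, max(lo + 1, int(end * width / total_sec)))
        for i in range(lo, hi):
            row[i] = mark
    return "".join(row)
```

For a 100-second conference shown in 10 cells, a played section at 0 to 10 s and a candidate at 50 to 60 s render as `P----S----`.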

[0030]

OPERATION

In the conference information recording/reproducing apparatus of claim 1, the statement structure is extracted from the audio input data of the conference information and recorded. The statement structure can be extracted, for example, by extracting statements from the audio input data, identifying the speaker, start time, and end time of each statement, and also identifying the order of the statements. The statement structure is then visualized on the display device using the visualization information generated by the visualization information generating means.
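The specification leaves the extraction method open. As an illustration only, the Python sketch below assumes one microphone per participant (as in claim 2) and treats a statement as a run of consecutive audio frames whose energy reaches a threshold. The function `extract_statements`, the `threshold` and `min_frames` parameters, and the per-frame energy representation are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class Statement:
    speaker: str
    start: float  # seconds from the start of the conference
    end: float    # seconds from the start of the conference


def extract_statements(
    energy: Dict[str, Sequence[float]],
    frame_sec: float,
    threshold: float,
    min_frames: int = 1,
) -> List[Statement]:
    """Build a time-ordered statement table from per-microphone frame energies.

    Hypothetical model: `energy` maps each participant (already resolved from
    microphone to person, as a participant table would do) to one energy value
    per frame. A statement is a run of at least `min_frames` consecutive frames
    whose energy is >= `threshold`.
    """
    statements: List[Statement] = []
    for speaker, frames in energy.items():
        run_start: Optional[int] = None
        # A trailing -inf sentinel closes a run that lasts until the last frame.
        for i, e in enumerate(list(frames) + [float("-inf")]):
            if e >= threshold and run_start is None:
                run_start = i
            elif e < threshold and run_start is not None:
                if i - run_start >= min_frames:
                    statements.append(
                        Statement(speaker, run_start * frame_sec, i * frame_sec)
                    )
                run_start = None
    # Statement order is taken to be the order of start times.
    statements.sort(key=lambda s: (s.start, s.speaker))
    return statements
```

Energy thresholding ignores cross-talk picked up by neighbouring microphones; a practical system would need some form of channel comparison or voice activity detection.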

[0031] When any position on the visualization is designated through the instruction input means, for example a pointing device such as a mouse, the conference information data recorded as audio and video is played from that position. At the same time, the searcher's search actions are monitored, and the searcher's search intention is extracted automatically from them. The apparatus then detects whether other parts of the conference contain statements with an intention similar to the extracted one, and displays the detected similar candidates on the display device.

[0032] Similar candidates can thus be presented to the searcher automatically. When a search fails, this information shows what to access next, supporting an efficient search. Even when a search succeeds, it tells the searcher that other correct candidates exist, which reduces search omissions.

[0033] In the conference information recording/reproducing apparatus of claim 2, the statement structure is extracted from the audio input data of the conference information, and the statement structure data is recorded. To visualize the statement structure data, a speaker chart is used that displays statement structure information, such as speakers, speaking times, and statement transitions, in time series.
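A speaker chart of this kind can be approximated in text. The Python sketch below is a hypothetical rendering, not the layout used in the specification: one row per speaker, one column per fixed time bin, with `#` wherever that speaker's statement overlaps the bin. The flat `(speaker, start_sec, end_sec)` tuple form of the statement structure table is an assumption.

```python
from typing import List, Tuple

# (speaker, start_sec, end_sec); a hypothetical flat form of the statement structure table
Stmt = Tuple[str, float, float]


def speaker_chart(statements: List[Stmt], total_sec: float, cols: int) -> List[str]:
    """Render a text speaker chart with one row per speaker and one column per time bin.

    A cell is '#' when one of that speaker's statements overlaps the bin and '.'
    otherwise. Rows appear in the order in which speakers first speak.
    """
    if total_sec <= 0 or cols <= 0:
        raise ValueError("total_sec and cols must be positive")
    bin_sec = total_sec / cols
    speakers: List[str] = []
    for spk, _, _ in sorted(statements, key=lambda s: s[1]):
        if spk not in speakers:
            speakers.append(spk)
    label_width = max((len(s) for s in speakers), default=0)
    rows = []
    for spk in speakers:
        cells = []
        for c in range(cols):
            b0, b1 = c * bin_sec, (c + 1) * bin_sec
            hit = any(s == spk and start < b1 and end > b0 for s, start, end in statements)
            cells.append("#" if hit else ".")
        rows.append(f"{spk:<{label_width}} |{''.join(cells)}")
    return rows
```

Printing each row of `speaker_chart([("A", 0, 2), ("B", 2, 4), ("A", 4, 6)], 6, 6)` shows A speaking in the first and last thirds and B in between; the statement transitions are readable as the hand-offs between rows.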

[0034] When the searcher designates any position on the speaker chart, the searcher's instruction intention is extracted automatically. This instruction intention is the search intention regarding the specific statement the searcher designated and played, and it consists of the characteristic values of several attributes of that statement. After the intention of the designated statement has been extracted, the other statements in the statement structure data file are evaluated to determine whether any of them has a similar intention. If similar statements are detected, they are visualized at the corresponding positions on the speaker chart.
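Before any intention can be extracted, the designated position has to be mapped to a statement (the statement specifying means of claim 2). A minimal Python sketch, assuming a click yields a time on the chart's axis and optionally the speaker row that was clicked; the name `statement_at` and the half-open interval convention are hypothetical.

```python
from typing import List, Optional, Tuple

Stmt = Tuple[str, float, float]  # (speaker, start_sec, end_sec)


def statement_at(
    statements: List[Stmt], t: float, speaker: Optional[str] = None
) -> Optional[int]:
    """Return the index of the statement under a click at time `t`, or None.

    If `speaker` is given (the chart row that was clicked), only that speaker's
    statements are considered. Intervals are half-open, [start, end), so a
    boundary click selects the statement that begins there.
    """
    for i, (spk, start, end) in enumerate(statements):
        if start <= t < end and (speaker is None or spk == speaker):
            return i
    return None
```

Returning `None` for a click on silence leaves the fallback (nearest statement, or no action) to the caller.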

[0035] Statements whose structure resembles the searcher's search intention can thus be extracted automatically, without additional input from the searcher. Visually presenting the similar statement candidates also makes the searcher aware that they exist.

[0036] In the conference information recording/reproducing apparatus of claim 3, the intention behind the searcher's instruction input is computed by extracting four attribute values of the statement the searcher designated: speaker name, speaking time, previous speaker name, and next speaker name. By singling out these four representative attributes from the complex structure of the searcher's intention, an appropriate instruction intention can be extracted from a small amount of information.
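The four-attribute instruction intention can be sketched as a small record built from the neighbours of the designated statement. In this Python sketch, "speaking time" is read as the statement's duration and "previous/next speaker" as the speakers of the adjacent statements in the table; both readings, and the names `Intent` and `extract_intent`, are assumptions.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Stmt = Tuple[str, float, float]  # (speaker, start_sec, end_sec), in time order


@dataclass(frozen=True)
class Intent:
    speaker: str
    duration: float               # "speaking time", read here as duration in seconds
    prev_speaker: Optional[str]   # None for the first statement of the conference
    next_speaker: Optional[str]   # None for the last statement of the conference


def extract_intent(statements: List[Stmt], i: int) -> Intent:
    """Four-attribute instruction intention of the designated statement `i`."""
    spk, start, end = statements[i]
    prev_spk = statements[i - 1][0] if i > 0 else None
    next_spk = statements[i + 1][0] if i + 1 < len(statements) else None
    return Intent(spk, end - start, prev_spk, next_spk)
```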

[0037] In the conference information recording/reproducing apparatus of claim 4, the similarity to the designated statement is calculated for every other statement made during the conference. Similar statements are extracted by checking whether this similarity satisfies a given criterion. This makes it possible to automatically extract statements similar to the one the searcher chose to play.
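The specification gives no similarity formula. As one possible reading, the Python sketch below scores a weighted agreement of the four attributes and keeps statements at or above a threshold. The weights in `WEIGHTS`, the shorter-to-longer duration ratio, and the 0.7 default threshold are all hypothetical choices, not values from the patent.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Intent:
    speaker: str
    duration: float              # "speaking time", read here as duration in seconds
    prev_speaker: Optional[str]
    next_speaker: Optional[str]


# Hypothetical attribute weights; the specification gives no formula.
WEIGHTS: Dict[str, float] = {"speaker": 0.4, "duration": 0.2, "prev": 0.2, "next": 0.2}


def similarity(a: Intent, b: Intent) -> float:
    """Weighted agreement of the four attributes, in the range [0, 1]."""
    longer = max(a.duration, b.duration)
    # Duration agreement is the shorter/longer ratio (1.0 when both are zero).
    dur = min(a.duration, b.duration) / longer if longer > 0 else 1.0
    return (
        WEIGHTS["speaker"] * float(a.speaker == b.speaker)
        + WEIGHTS["duration"] * dur
        + WEIGHTS["prev"] * float(a.prev_speaker == b.prev_speaker)
        + WEIGHTS["next"] * float(a.next_speaker == b.next_speaker)
    )


def similar_candidates(target: Intent, others: List[Intent], threshold: float = 0.7) -> List[int]:
    """Indices of statements whose similarity to `target` is >= `threshold`."""
    return [i for i, o in enumerate(others) if similarity(target, o) >= threshold]
```

Giving the speaker name the largest weight reflects the guess that "who spoke" is usually the strongest cue; tuning the weights against real search logs would be necessary before relying on them.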

[0038] In the conference information recording/reproducing apparatus of claim 5 or claim 6, the search intention is extracted automatically not only from the searcher's instruction input but also from the searcher's playback actions.

[0039] The searcher designates an arbitrary statement on the speaker chart to reproduce the audio and video data in which the conference was recorded. After playback has run for a while, the searcher can stop it, which is the reproduction action. When the reproduction-stop instruction is input, the apparatus identifies the reproduced section. It then automatically extracts both the instruction input intention and the reproduction intention from that section. Extracting the intention from the reproduced section means the search intention comes from the whole series of reproduced statements and their statement structure, not merely from a single statement.

[0040] In the sixth aspect, the reproduction intention is calculated from six attributes relating to the statement structure:

- the instruction intention of the start statement,
- the name of the speaker at whom playback stopped,
- the total number of statements,
- the total speaking time,
- the speaker set,
- the statement transition matrix.

This makes it possible to infer the searcher's search intention more accurately than when only the instruction intention is used.
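The six attributes of [0040] can be illustrated with the short Python sketch below. It makes several assumptions:

- The statement table is ordered by start time.
- Each row carries a statement number, a speaker name, and start/end times in seconds.
- The "speaking time" of a statement is its duration.
- The transition matrix holds counts indexed by a fixed list of speakers.

These representations, and every name in the code, are illustrative assumptions rather than the patent's own data format.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Statement:
    number: int    # statement number, assigned in time order
    speaker: str
    start: float   # seconds since recording began
    end: float


def start_intention(table: list[Statement], i: int) -> tuple:
    """Four-attribute instruction intention of table[i] (hypothetical encoding)."""
    s = table[i]
    prev: Optional[str] = table[i - 1].speaker if i > 0 else None
    nxt: Optional[str] = table[i + 1].speaker if i + 1 < len(table) else None
    return (s.speaker, s.end - s.start, prev, nxt)


def reproduction_intention(table: list[Statement], first: int, last: int,
                           speakers: list[str]) -> dict:
    """Summarize the played section table[first..last] (inclusive)."""
    played = table[first:last + 1]
    index = {name: k for k, name in enumerate(speakers)}
    # matrix[a][b] counts transitions from speaker a to speaker b
    matrix = [[0] * len(speakers) for _ in speakers]
    for a, b in zip(played, played[1:]):
        matrix[index[a.speaker]][index[b.speaker]] += 1
    return {
        "start_intention": start_intention(table, first),
        "stop_speaker": played[-1].speaker,
        "total_statements": len(played),
        "total_time": sum(s.end - s.start for s in played),
        "speaker_set": frozenset(s.speaker for s in played),
        "transition_matrix": matrix,
    }
```

Indexing every matrix by the same speaker list keeps matrices from different sections directly comparable.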

[0041] In the conference information recording/reproducing apparatus according to the seventh aspect, the apparatus calculates the similarity between the reproduced section and each other statement structure that occurred during the conference. The apparatus then judges whether each similarity satisfies a given condition, and it detects those that do as similar statement structure candidates. In this way, series of statements whose structure resembles the searcher's reproduction intention can be extracted automatically.
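This passage of the seventh aspect does not fix a similarity measure, so the Python sketch below rests on two assumptions:

- Candidate sections are windows with the same number of statements as the reproduced section.
- Similarity is a weighted Jaccard score between the speaker-to-speaker transition counts of the two sections.

Windows that overlap the reproduced section are skipped. A threshold (hypothetical default 0.5) stands in for the "given condition".

```python
from __future__ import annotations

from collections import Counter


def transitions(speakers: list[str]) -> Counter:
    """Count speaker-to-speaker transitions in a sequence of statements."""
    return Counter(zip(speakers, speakers[1:]))


def weighted_jaccard(a: Counter, b: Counter) -> float:
    keys = set(a) | set(b)
    if not keys:
        return 1.0  # two sections without transitions are treated as identical
    return sum(min(a[k], b[k]) for k in keys) / sum(max(a[k], b[k]) for k in keys)


def similar_sections(sequence: list[str], start: int, stop: int,
                     threshold: float = 0.5) -> list[tuple[int, int, float]]:
    """Find windows of the same length as sequence[start:stop] that resemble it.

    sequence holds the speaker of each statement in time order; stop is exclusive.
    """
    width = stop - start
    target = transitions(sequence[start:stop])
    hits = []
    for s in range(len(sequence) - width + 1):
        if s < stop and s + width > start:
            continue  # overlaps the reproduced section itself
        score = weighted_jaccard(target, transitions(sequence[s:s + width]))
        if score >= threshold:
            hits.append((s, s + width, score))
    return hits
```

For example, for speakers `["A", "B", "A", "C", "A", "B", "A", "D"]` with the first three statements reproduced, only the window covering positions 4 to 6 repeats the A→B→A exchange exactly.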

[0042] In the conference information recording/reproducing apparatus according to the eighth aspect, the apparatus uses the searcher's search actions to determine whether the searcher's intention concerns a specific statement or a series of statements. It then automatically selects the similarity determination method appropriate to each case. A suitable means of judging similarity can therefore be chosen without additional input from the searcher, and more appropriate similar candidates can be presented.

[0043] The ninth aspect concerns how the detected similar candidates are displayed to the searcher. The display uses two areas:

- One area shows the entire conference in chronological order. It lets the searcher see, on the time axis, where each similar candidate lies within the conference.
- The other area lists the statement structures of the similar candidates as reduced-size views. It lets the searcher survey their details at a glance.

The details of the statements and their relative positions in time can thus be linked and displayed together.

[0044] This display helps the searcher recognize the statement structure, so searching becomes more efficient. Listening to or viewing the reproduced information while referring to this display also helps the searcher understand the reproduced content.

[0045]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

An embodiment of the conference information recording/reproducing apparatus according to the present invention is described below with reference to the drawings.

[0046] FIG. 1 is a block diagram showing the system configuration of a conference information recording/reproducing apparatus according to an embodiment of the present invention. The apparatus of this embodiment records audio and video data as conference information. It has access means for reaching any position in the recorded audio and video data files, and it reproduces the audio and video data at the position reached through that access means.

[0047] The apparatus of this embodiment presupposes an access index that visualizes the statement structure, such as a speaker chart. This index lets the searcher's reproduction instruction reach any position in the audio and video data files recorded as conference information. When the searcher gives a reproduction instruction through the speaker chart, the apparatus reproduces the audio and video data at the designated position. It also extracts the intention behind the reproduction instruction, detects whether any search candidates resemble that intention, and displays those it finds. This reduces the number of relevant passages the searcher overlooks.

[0048] As shown in FIG. 1, the apparatus of this embodiment comprises the following:

- a plurality of audio input devices 1a,
- a video input device 1b,
- an A/D converter 2 for the audio signals from the audio input devices 1a,
- an audio data synthesizer 3,
- a file storage unit 4,
- a speaker chart generation control unit 5,
- a display device 11,
- an instruction input device 12,
- a video playback device 13,
- an audio playback device 14.

[0049] The speaker chart generation control unit 5 comprises an utterance data extraction unit 6, a timer 7, a statement structure table generation unit 8, a speaker chart generation unit 9, and part of the speaker chart display unit 10.

The speaker chart search control unit 15 comprises a statement identification unit 16, a searcher intention extraction unit 17, a similar candidate detection unit 18, a similar candidate display unit 19, and part of the speaker chart display unit 10.

[0050] In this embodiment, the speaker chart generation control unit 5 and the speaker chart search control unit 15 are implemented on a computer. Each of their component units is a functional unit realized by computer software.

[0051] Each audio input device 1a, such as a microphone, captures the voice of a conference participant. One device is assigned to each participant. The A/D converter 2 converts the output audio signal of each audio input device 1a into a digital signal. The audio data synthesizer 3 combines these digital audio streams into audio data covering all conference participants. The combined data is stored in the file storage unit 4 as an audio data file.

[0052] The video input device 1b is, for example, a digital video camera. Its digital video data is stored in the file storage unit 4 as a video data file. The video input device 1b may consist of one digital video camera or several.

[0053] FIG. 2 illustrates the data files stored in the file storage unit 4. In this example, the file storage unit 4 holds four data files. The statement structure table 41 is generated by extracting, from the input audio data, the structure of the statements made by the conference participants. It holds the information that serves as an index into the audio data file 43 and the video data file 44. It is also the data from which the speaker chart is generated. The statement structure table 41 is described in detail later.

[0054] The audio data file 43 and the video data file 44 hold the audio and video data recorded as conference information. Both files are linked to the statement structure table 41. The conference participant table 42 is a data file for identifying conference participants. It holds the correspondence between each conference participant's name and the input device number assigned to that participant's audio input device 1a.

[0055] FIG. 3 illustrates the data structure of the conference participant table 42, a data file holding the correspondence between conference participants and input device numbers.

- Field 42a is the input device number, the identifier held by each audio input device 1a.
- Field 42b is the conference participant name. It stores, as text data, the name of the participant assigned to each audio input device 1a.
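As a minimal sketch, the conference participant table of FIG. 3 can be modeled as rows of (input device number, participant name) loaded into a lookup keyed by device number. The class and function names are hypothetical. Rejecting a device number that appears twice is an added assumption that the text does not specify.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ParticipantRow:
    device_number: int      # field 42a: identifier of an audio input device 1a
    participant_name: str   # field 42b: name stored as text


def build_participant_lookup(rows: list[ParticipantRow]) -> dict[int, str]:
    """Map each input device number to the participant assigned to it."""
    lookup: dict[int, str] = {}
    for row in rows:
        if row.device_number in lookup:
            # Hypothetical rule: one device per participant, so duplicates are errors.
            raise ValueError(f"device {row.device_number} is assigned twice")
        lookup[row.device_number] = row.participant_name
    return lookup
```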

[0056] The A/D converter 2 passes the digital audio data for each audio input device 1a to the speaker chart generation control unit 5 for processing. The speaker chart generation control unit 5 generates the speaker chart. The speaker chart is one of the access means for reaching any position in the audio data file stored in the file storage unit 4. The speaker chart generation process is described in detail later.

[0057] The display device 11 visually displays on its screen the speaker chart generated by the speaker chart generation control unit 5. It may also display the video reproduced by the video playback device 13. The video playback device 13 has its own display unit for the reproduced video, but the video may instead be shown on the screen of the display device 11. Alternatively, the display device 11 may show only the speaker chart, with the video shown on the display unit of the video playback device 13.

[0058] The instruction input device 12 is a mouse or other pointing device. It is used to designate statements or statement structures in the speaker chart shown on the screen of the display device 11.

[0059] The video playback device 13 reproduces the portion of the video data file in the file storage unit 4 that the user designates on the speaker chart. Likewise, the audio playback device 14 reproduces the user-designated portion of the audio data file in the file storage unit 4. Through the speaker chart, the video playback device 13 can also reproduce any part of the video data in synchronization with the audio data.

[0060] The speaker chart search control unit 15 retrieves and reproduces the audio data and image data corresponding to a position on the speaker chart. That position can be any point the user designates with the instruction input device 12 on the screen of the display device 11.

[0061] For simplicity, the following description covers retrieval of designated audio data from the audio data file. Reproduction works the same way for the video data in the conference information data files.

[0062] The processing performed by the speaker chart generation control unit 5 is described first.

[0063] The A/D converter 2 supplies digital audio data for each audio input device 1a to the utterance data extraction unit 6. For each input audio stream, this unit treats as an utterance any period in which the volume stays at or above a given level for at least a given length of time. It extracts each such period as a statement section and passes the statement section data to the statement structure table generation unit 8. Each statement section record consists of:

- the input device number, indicating which audio input device 1a the audio data came from,
- the statement start timing,
- the statement end timing.
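The rule in [0063] treats a volume at or above a given level, sustained for at least a given time, as an utterance. It can be sketched in Python as below. The sketch assumes the audio for one input device has already been reduced to per-frame volume levels. The frame length, threshold, and minimum duration are hypothetical parameters, because the text gives no values.

```python
from __future__ import annotations


def extract_statement_sections(levels: list[float], device_number: int,
                               frame_sec: float, threshold: float,
                               min_sec: float) -> list[tuple[int, float, float]]:
    """Return (input device number, start, end) for each detected utterance.

    levels[i] is the volume of frame i; times are seconds from recording start.
    """
    sections = []
    run_start = None  # index of the first loud frame in the current run
    # A trailing -inf sentinel closes a run that lasts until the final frame.
    for i, level in enumerate([*levels, float("-inf")]):
        if level >= threshold:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if (i - run_start) * frame_sec >= min_sec:
                sections.append((device_number, run_start * frame_sec, i * frame_sec))
            run_start = None
    return sections
```

The minimum-duration check discards short bursts, such as a single loud frame, that would otherwise register as statements.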

[0064] The statement structure table generation unit 8 generates the statement structure table. This table is the access index into the audio data file that records the conference statements. The unit combines the statement section data from the utterance data extraction unit 6 with the time information from the timer 7. From these it extracts information about each participant's statement sections, such as the input device number, statement start time, and statement end time. It then generates the statement structure table and stores it in the file storage unit 4.

[0065] FIG. 4 illustrates the data structure of the statement structure table. This data file holds the structure of the participants' statements in the conference. It serves as the access index into the audio data file and video data file in which the conference information is recorded.

[0066] The fields of the statement structure table in FIG. 4 are as follows:

- Field 51 is the statement number. Identifiers are assigned in the chronological order of the statements.
- Field 52 is the input device number, identifying the audio input device 1a at which the statement was detected.
- Field 53 is the statement start time, recorded as the time elapsed since recording began.
- Field 54 is the statement end time, recorded in the same way.
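Fields 51 to 54 can be pictured with the Python sketch below. It merges the statement sections detected on all input devices, sorts them by start time, and numbers them in that order. The record type and function name are hypothetical. Breaking start-time ties by device number is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class StatementRecord:
    number: int         # field 51: assigned in chronological order
    device_number: int  # field 52: audio input device 1a that detected it
    start: float        # field 53: seconds elapsed since recording began
    end: float          # field 54: seconds elapsed since recording began


def build_statement_table(
        sections: list[tuple[int, float, float]]) -> list[StatementRecord]:
    """sections: (device_number, start, end) tuples from every input device."""
    ordered = sorted(sections, key=lambda sec: (sec[1], sec[0]))
    return [StatementRecord(n, dev, start, end)
            for n, (dev, start, end) in enumerate(ordered, start=1)]
```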

[0067] As mentioned above, the audio data file 43 and the statement structure table are linked. Take statement number 7 in FIG. 4 as an example. Reference numeral 56 denotes the location in the audio data file 43 where statement number 7 is recorded. Link 55a points to the start of that recorded segment, and link 55b points to its end.

[0068] The speaker chart generation unit 9 reads the statement structure table stored in the file storage unit 4 and generates speaker chart information that visualizes the table for display. It passes this information to the speaker chart display unit 10, which displays the speaker chart on the display device 11.

[0069] FIG. 5 shows one embodiment of the speaker chart. Reference numeral 101 denotes the speaker chart display area. The speaker chart consists of two areas:

- a total conference time display area 102, which gives an overview of the whole conference;
- a statement structure display area 103, which shows the detailed statement structure for the portion marked by the detail display location 104 inside area 102.

[0070] The total conference time display area 102 has a time scale. The scale sets the conference start time to "00:00:00" and shows the times up to the end of the conference relative to that start. In the example of FIG. 5, the only intermediate time shown is the one exactly at the midpoint. The detail display location 104 marks a particular time section within the total conference time.

[0071] The statement structure display area 103 shows the details of the statement structure in the time section marked by the detail display location 104. In other words, the position of location 104 within area 102 tells the user which part of the conference the statement structure in area 103 belongs to.

[0072] The statement structure display area 103 consists of two parts:

- a speaker name area 106, which displays the names that identify the speakers;
- a statement transition display area 107, which visually shows how the statements pass from one speaker to the next.

As shown in FIG. 5, area 107 also displays the start and end times of the section it shows in detail. These times indicate which part of the total conference time is being displayed.

[0073] Each speaker has a row in the statement transition display area 107. In that row, the position and length of each statement section bar VB show when, and for how long, that conference participant (speaker) spoke. Area 107 shows the succession of statement section bars for all participants. By reading it, the user can follow the statement transition structure of the time section marked by the detail display location 104, that is, whose statement was followed by whose.

[0074] In FIG. 5, the triangle mark 105a in the total conference time display area 102 and the broken line 105b in the statement transition display area 107 show the current playback position. They indicate the time on the speaker chart that corresponds to the audio data now being reproduced.

[0075] The user can designate any position on the speaker chart shown on the display device 11 with the instruction input device 12. Playback of the recorded conference audio then starts from that position. The speaker chart search control unit 15 retrieves and reproduces the audio data at the designated position.

[0076] The statement identification unit 16 of the speaker chart search control unit 15 takes the position designated on the display device 11. From that position it identifies the corresponding statement (statement section) in the statement structure table 41 in the file storage unit 4. Then, as shown in FIG. 4, the apparatus finds the corresponding location in the audio data file 43 using the index recorded in the statement structure table 41. It extracts the audio data for the identified statement (statement section) from the audio data file 43, and the audio playback device 14 reproduces it.

[0077] The searcher intention extraction unit 17 extracts the intention behind the searcher's instruction input (the instruction intention). A searcher who wants to reproduce some position in the audio and video data designates a statement to play. The instruction intention is the search intention behind that designation. In this embodiment, the instruction intention is extracted from four attributes of the statement designated for reproduction:

(1) the name of its speaker,
(2) its speaking time,
(3) the name of the previous speaker,
(4) the name of the following speaker.

Attributes (3) and (4) relate to the statement transition structure. The searcher intention extraction unit 17 searches the file storage unit 4 using the information about the statement identified by the statement identification unit 16. It obtains attributes (1) to (4) and thereby extracts the instruction intention.
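The four-attribute extraction of [0077] can be sketched in Python as follows. The sketch makes these assumptions:

- The statement structure table has already been joined with the participant table, so each row carries a speaker name.
- Rows are in time order.
- "Speaking time" means the statement's duration.
- The first and last statements get None for their missing neighbor.

All names are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import NamedTuple, Optional


@dataclass(frozen=True)
class Statement:
    number: int    # field 51
    speaker: str   # participant name resolved from the input device number
    start: float   # seconds since recording began
    end: float


class InstructionIntention(NamedTuple):
    speaker: str                       # attribute (1)
    speaking_time: float               # attribute (2), taken here as the duration
    previous_speaker: Optional[str]    # attribute (3)
    following_speaker: Optional[str]   # attribute (4)


def extract_instruction_intention(table: list[Statement],
                                  statement_id: int) -> InstructionIntention:
    """Return attributes (1)-(4) for the statement numbered statement_id."""
    pos = next((i for i, s in enumerate(table) if s.number == statement_id), None)
    if pos is None:
        raise KeyError(statement_id)
    s = table[pos]
    prev = table[pos - 1].speaker if pos > 0 else None
    nxt = table[pos + 1].speaker if pos + 1 < len(table) else None
    return InstructionIntention(s.speaker, s.end - s.start, prev, nxt)
```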

[0078] The similar candidate detection unit 18 receives the instruction intention extracted by the searcher intention extraction unit 17. It then searches for similar candidates, that is, statements similar to that instruction intention. If any exist, it sends their information to the similar candidate display unit 19, which displays them on the display device 11.
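This passage does not define the similarity measure used by the similar candidate detection unit 18; its details are deferred to the flowchart of FIG. 13. The Python sketch below therefore substitutes a simple, hypothetical score. A statement earns one point for each of the following:

- the same speaker,
- the same previous speaker,
- the same following speaker,
- a duration within 50% of the longer of the two.

Statements scoring at least 3 of 4 are reported. Both the scoring rule and the cutoff are assumptions for illustration only.

```python
from __future__ import annotations

from typing import NamedTuple, Optional


class Intention(NamedTuple):
    speaker: str
    duration: float
    previous_speaker: Optional[str]
    following_speaker: Optional[str]


def similarity(a: Intention, b: Intention, tolerance: float = 0.5) -> int:
    """Hypothetical score from 0 to 4: one point per matching attribute."""
    score = int(a.speaker == b.speaker)
    score += int(a.previous_speaker == b.previous_speaker)
    score += int(a.following_speaker == b.following_speaker)
    longer = max(a.duration, b.duration)
    if longer == 0 or abs(a.duration - b.duration) <= tolerance * longer:
        score += 1
    return score


def detect_similar(target_id: int, intentions: dict[int, Intention],
                   min_score: int = 3) -> list[int]:
    """Statement numbers (other than target_id) whose score meets min_score."""
    target = intentions[target_id]
    return sorted(
        sid for sid, other in intentions.items()
        if sid != target_id and similarity(target, other) >= min_score
    )
```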

[0079] FIG. 6 illustrates how the searcher designates a statement to be reproduced; it shows an enlarged part of the speaker chart. The searcher designates the area corresponding to the statement to be reproduced. This is done with the pointing device, such as a mouse, that makes up the instruction input device 12.

[0080] FIG. 6 shows the statement section bar of the speaker "Sato", labeled 108 in FIGS. 5 and 7. The arrow cursor 110 marks the position pointed to with the instruction input device 12. The user can give an instruction at the cursor position with the instruction input device 12, for example by clicking a mouse button. The audio data corresponding to the statement section bar 108 is then reproduced, as described later.

[0081] FIG. 7 illustrates the relative coordinate position of the searcher's instruction input position within the speaker chart display area 101. In this embodiment, the instruction input position is handled as relative coordinates within the speaker chart display area 101, not as coordinates on the display device 11. In FIG. 7, 121 indicates the origin (0, 0) of the speaker chart.

[0082] The following symbols are used:

- Torigin is the conference time at the origin (coordinates (0, 0)) of the section displayed in the statement transition display area 107.
- ΔTm is the time width of the conference section displayed in area 107. It depends on the conference section currently shown in the statement structure display area 103.
- ΔXm is the display width of area 107. It varies with the current size of the display frame of the speaker chart display area 101.

[0083] In FIG. 7, 122 indicates the position where the searcher made the instruction input with the instruction input device 12. The conference time corresponding to this position is called the instruction input time, Tpoint. Δx is the relative x-direction (horizontal) coordinate of position 122, measured from the origin 121 of the speaker chart display area 101.

[0084] The instruction input time Tpoint is calculated as

Tpoint = Torigin + ΔTm × (Δx / ΔXm)   … (1)
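Equation (1) maps horizontal position linearly to conference time. Here is a worked example with hypothetical numbers. Suppose the statement transition display area is 600 pixels wide and shows the 300 seconds that begin at Torigin = 1200 s. A click 150 pixels from the origin then maps to 1200 + 300 × (150 / 600) = 1275 s. The same computation in Python:

```python
def instruction_input_time(t_origin: float, delta_t_m: float,
                           delta_x: float, delta_x_m: float) -> float:
    """Equation (1): Tpoint = Torigin + dTm * (dx / dXm)."""
    if delta_x_m <= 0:
        raise ValueError("display width must be positive")
    return t_origin + delta_t_m * (delta_x / delta_x_m)
```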

[0085] FIG. 8 is a flowchart of the processing in the speaker chart search control unit 15.

[0086] In step 201, the apparatus monitors for a reproduction instruction input from the user (the searcher). In step 202, it determines whether an instruction has been input. If not, it returns to step 201 and continues monitoring.

[0087] When the user has made an instruction input, step 203 acquires the user's instruction input coordinates Ppoint. These are absolute coordinates on the display screen. Step 204 then identifies the statement corresponding to the instruction input position. Step 204 also converts the coordinates Ppoint from step 203 into a relative position within the statement transition display area 107 described above. The statement identification unit 16 performs this processing. The details of step 204 are described later with reference to the flowchart in FIG. 9.

[0088] In step 205, the intention behind the identified statement is extracted. This corresponds to the processing performed by the searcher intention extraction unit 17. The details of step 205 are described later with reference to the flowchart in FIG. 11.

[0089] In the next step, 206, the apparatus detects statement candidates similar to the extracted instruction intention. The similar candidate detection unit 18 performs this step. Its details are described later with reference to the flowchart in FIG. 13.

[0090] The statement identification processing of step 204 is now described with reference to the flowchart in FIG. 9. In step 251, the input coordinate position Ppoint is converted into a relative position within the statement transition display area 107, giving the x coordinate Δx of the instruction input position. In step 252, the instruction input time Tpoint is calculated from equation (1) above.

[0091] In the next step 253, one record is read from the comment structure table 41 of the file storage unit 4 and assigned to the variable R1. This record corresponds to a single utterance. In the next step 254, the values of the utterance start time field and the utterance end time field of record R1 are assigned to the variables T(start) and T(end), respectively.

[0092] In the next step 255, it is determined whether the instruction input time Tpoint lies between the utterance start time and the utterance end time of record R1. If it does, the instructed utterance is judged to have been identified, and in step 256 the value of the utterance number field of record R1 in the comment structure table 41 is obtained, assigned to the variable ID, and returned. If step 255 determines that Tpoint does not lie between the start and end times of record R1, the process returns to step 253, reads the next record, and processes the next utterance.
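Steps 251 to 256 amount to mapping a click position to a time and then scanning the table for the utterance whose interval contains that time. The sketch below illustrates this under stated assumptions: `UtteranceRecord` is a hypothetical layout for a row of the comment structure table 41, and `input_time` is only a linear stand-in for equation (1), whose exact form is not reproduced here.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class UtteranceRecord:
    # Hypothetical layout of one row of the comment structure table 41.
    number: int   # utterance number field
    start: float  # utterance start time, in seconds
    end: float    # utterance end time, in seconds


def input_time(dx: float, t_origin: float, seconds_per_pixel: float) -> float:
    # Assumed linear stand-in for equation (1): converts the relative x
    # coordinate in the utterance transition display area into a time.
    return t_origin + dx * seconds_per_pixel


def identify_utterance(table: Sequence[UtteranceRecord], t_point: float) -> Optional[int]:
    # Steps 253-256: read records one by one until the instruction input
    # time falls inside an utterance's [start, end] interval.
    for r1 in table:                       # step 253: read the next record
        t_start, t_end = r1.start, r1.end  # step 254
        if t_start <= t_point <= t_end:    # step 255
            return r1.number               # step 256: return the ID
    return None  # no utterance covers t_point (case not described in the text)
```

For example, with records (1, 0 to 40 s) and (2, 40 to 105 s), a click at Δx = 50 with one second per pixel maps to 50 s and identifies utterance 2.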

[0093] Next, the instruction intention extraction processing will be described. As described above, the instruction intention is defined by four attributes of an utterance: the speaker name, the speech time, the previous speaker name, and the subsequent speaker name. Using these attributes, the instruction intention is written in this specification as Iinst(speaker name, speech time, previous speaker name, subsequent speaker name).

[0094] FIG. 10 shows part of the speaker chart. In FIG. 10, as indicated by the arrow cursor 110, the searcher has designated an utterance by the conference participant "Tanaka". The searcher's instruction intention in this case is defined as Iinst(Tanaka, 65 seconds, Suzuki, Sato). This means that Tanaka's utterance lasted 65 seconds, that Tanaka spoke after Suzuki, and that Sato spoke after Tanaka. In this embodiment, the searcher is interpreted as having designated the particular utterance with the intention expressed by these four attributes.

[0095] When an individual attribute of the instruction intention is meant, rather than the entire instruction intention for an utterance, that attribute is written inside the parentheses of Iinst(). For example, the speaker name attribute of the instruction intention is written Iinst(speaker name). The speech time, previous speaker name, and subsequent speaker name attributes are written in the same form.

[0096] Next, the instruction intention extraction processing of step 205 will be described with reference to the flowchart of FIG. 11.

[0097] FIG. 11 is a flowchart explaining the processing for extracting the instruction intention. Step 311 is initialization: the utterance number of the utterance identified by the utterance identification processing is assigned to the variable ID. In the next step 312, the record with the utterance number indicated by ID is read from the comment structure table 41 and assigned to the variable Ri. Similarly, the records of the utterances immediately before and after that utterance number are read and assigned to the variables Rp and Rn, respectively.

[0098] In the next step 313, the instruction intention for the speaker name attribute, Iinst(speaker name), is derived from Ri. In the next step 314, the instruction intention for the speech time attribute, Iinst(speech time), is derived in the same way.

[0099] In the next step 315, the instruction intention relating to the utterance transition structure is calculated. First, the conference participant name corresponding to the input device number in Rp is looked up in the conference participant table 42 of the file storage unit 4, giving the instruction intention for the previous speaker name attribute, Iinst(previous speaker name). Similarly, the conference participant name corresponding to the input device number in Rn is looked up in the conference participant table 42, giving the instruction intention for the subsequent speaker name attribute, Iinst(subsequent speaker name).

[0100] In the next step 316, the values of the identified instruction intention Iinst(speaker name, speech time, previous speaker name, subsequent speaker name) are sent to the similar candidate detecting unit 18.
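Steps 311 to 316 can be sketched as follows, reproducing the FIG. 10 example. The record layout, the participant mapping standing in for the conference participant table 42, and the use of `None` when an utterance has no neighbour are all illustrative assumptions; the sketch also assumes the table rows are ordered in time, so Rp and Rn are the adjacent rows.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class UtteranceRecord:
    number: int   # utterance number
    device: int   # input device number (keys the participant table)
    start: float  # start time, seconds
    end: float    # end time, seconds


@dataclass(frozen=True)
class Intention:
    # Iinst(speaker name, speech time, previous speaker name, subsequent speaker name)
    speaker: str
    speech_time: float
    previous: Optional[str]
    subsequent: Optional[str]


def extract_intention(table: List[UtteranceRecord],
                      participants: Dict[int, str],
                      utt_id: int) -> Intention:
    # Step 312: find Ri and its neighbours Rp and Rn (raises StopIteration
    # if utt_id is not in the table).
    idx = next(i for i, r in enumerate(table) if r.number == utt_id)
    ri = table[idx]
    rp = table[idx - 1] if idx > 0 else None
    rn = table[idx + 1] if idx + 1 < len(table) else None
    # Steps 313-315: derive each attribute; names come from the participant table.
    return Intention(
        speaker=participants[ri.device],
        speech_time=ri.end - ri.start,
        previous=participants[rp.device] if rp else None,
        subsequent=participants[rn.device] if rn else None,
    )
```

With Suzuki speaking from 0 to 30 s, Tanaka from 30 to 95 s and Sato from 95 to 120 s, selecting Tanaka's utterance yields Intention("Tanaka", 65.0, "Suzuki", "Sato"), matching Iinst(Tanaka, 65 seconds, Suzuki, Sato).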

[0101] Next, the similar utterance detection processing of step 206 will be described. In the following description, the similarity between utterances is written DIarti. The utterance similarity DIarti is defined as a composite function of the similarities of the four attributes that make up the intention Iinst. When the similarity for an individual attribute is meant, that attribute is written inside the parentheses of DIarti(). For example, the speaker name component of the similarity is written DIarti(speaker name). The speech time, previous speaker name, and subsequent speaker name components are written in the same form.

[0102] DIarti takes a smaller value the more similar two utterances are. DIarti(A, B) denotes the similarity between the instruction intentions of utterance A and utterance B. The per-attribute similarity between A and B is written with the attribute in parentheses after DIarti(A, B), for example DIarti(A, B)(speaker name).

[0103] The speaker name similarity DIarti(A, B)(speaker name) is 0 when utterances A and B have the same speaker name. Otherwise it is assigned DImax, an extremely large value. In other words, when similarity is evaluated, two utterances whose speaker name similarity is not 0 are judged to be not similar at all. The speech time similarity DIarti(A, B)(speech time) is the absolute value of the difference between the two speech times. The previous speaker name similarity and the subsequent speaker name similarity are each 0 when the names match and 1 when they do not.

[0104] The utterance similarity DIarti is expressed as a weighted composite function of the similarities of the other attributes, with the speaker name attribute serving as the condition part. It is defined as follows.

[0105] That is:

$$
DI_{\mathrm{arti}} =
\begin{cases}
w_1\,DI_{\mathrm{arti}}(\text{speech time}) + w_2\,DI_{\mathrm{arti}}(\text{previous speaker name}) + w_3\,DI_{\mathrm{arti}}(\text{subsequent speaker name}), & \text{if } DI_{\mathrm{arti}}(\text{speaker name}) = 0,\\
DI_{\max}, & \text{if } DI_{\mathrm{arti}}(\text{speaker name}) > 0,
\end{cases}
\tag{2}
$$

where $w_1$, $w_2$, and $w_3$ are weighting coefficients.

[0106] FIG. 12 summarizes the definition of the utterance similarity DIarti and the definitions of the per-attribute similarities between the instruction intentions of utterances A and B.

[0107] As equation (2) shows, the speaker name similarity DIarti(A, B)(speaker name) is the condition part of the utterance similarity DIarti: a matching speaker name is a necessary condition. When DIarti(speaker name) = 0, that is, when the speaker names match, DIarti is defined as a composite function of the other three attributes: speech time, previous speaker name, and subsequent speaker name. The similarities of these three attributes are weighted by w1, w2, and w3, respectively, and summed to give DIarti. When the speaker names do not match, DIarti takes the effectively infinite value DImax, meaning that the utterances are not similar at all.
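Equation (2) and the per-attribute definitions of FIG. 12 can be sketched directly. Modelling DImax as floating-point infinity and defaulting all weights to 1.0 are assumptions made for illustration; the text does not give values for w1, w2, w3 or DImax.

```python
import math
from dataclasses import dataclass
from typing import Optional

DI_MAX = math.inf  # stand-in for the "effectively infinite" DImax


@dataclass(frozen=True)
class Intention:
    speaker: str
    speech_time: float         # seconds
    previous: Optional[str]
    subsequent: Optional[str]


def di_arti(a: Intention, b: Intention,
            w1: float = 1.0, w2: float = 1.0, w3: float = 1.0) -> float:
    # Condition part: the speaker names must match, otherwise DImax.
    if a.speaker != b.speaker:
        return DI_MAX
    d_time = abs(a.speech_time - b.speech_time)           # |difference of speech times|
    d_prev = 0.0 if a.previous == b.previous else 1.0      # 0 on match, 1 on mismatch
    d_next = 0.0 if a.subsequent == b.subsequent else 1.0  # 0 on match, 1 on mismatch
    return w1 * d_time + w2 * d_prev + w3 * d_next          # smaller means more similar
```

Because the speech-time term is measured in seconds while the two speaker-name terms are 0 or 1, the weights largely decide which attribute dominates; with all weights equal, a few seconds of duration difference outweighs a mismatched neighbour.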

[0108] FIG. 13 is a flowchart explaining the processing for detecting similar utterances. Step 351 is initialization: the variable List, which holds the list of similar utterance candidates, is set to the empty list (). Steps 352 to 356 repeat a series of operations, including similarity calculation and determination, for each record in the comment structure table 41, that is, for each utterance.

[0109] In step 352, one record is read from the comment structure table 41 and assigned to the variable R1. If R1 is not nil in step 353, that is, if there is a record to process, the utterance similarity calculation of step 354 is performed. Next, in step 355, the utterance similarity determination is performed to evaluate whether the similarity satisfies a fixed criterion for judging two utterances to be similar. In the next step 356, the location in the data file (the position in the audio data file or the video data file) corresponding to each utterance candidate judged to be similar is detected. If step 353 determines that there is no record left to read, the processing ends.

[0110] FIG. 14 is a flowchart explaining the utterance similarity calculation of step 354 in FIG. 13. Step 401 shows the initial values of the variables: the variable input holds the utterance number of the utterance identified by the utterance identification processing, and the variable R1 holds the number of the utterance currently being processed, which is the comparison target for input.

[0111] In step 402, the instruction intentions Iinst(input) and Iinst(R1) of the two utterances numbered input and R1 are calculated. In the next step 403, the speaker name similarity DIarti(input, R1)(speaker name) between the instructed utterance input and the similar utterance candidate R1 is calculated according to definition (2).

[0112] In the next step 404, it is determined whether the speaker name similarity DIarti(input, R1)(speaker name) is 0. If it is not 0, that is, if the speaker names do not match, no further similarities are calculated; in step 407, the extremely large value DImax described above is assigned as the value of DIarti(input, R1)(speaker name), and the processing ends.

[0113] On the other hand, if step 404 determines that the speaker names match, the flow moves to step 405. In step 405, the similarities for the remaining three attributes, DIarti(input, R1)(speech time), DIarti(input, R1)(previous speaker name), and DIarti(input, R1)(subsequent speaker name), are calculated individually. Then, in step 406, the similarity DIarti(input, R1) between the instructed utterance input and the similar utterance candidate R1 is calculated according to definition (2), and the value is passed to the utterance similarity determination processing.

[0114] FIG. 15 is a flowchart explaining the utterance similarity determination processing. In step 451, as initialization, the similarity between the instructed utterance input and the candidate utterance R1, obtained by the utterance similarity calculation described above, is taken as given. In the next step 452, it is determined whether the calculated similarity DIarti(input, R1) is smaller than DIlimit, the threshold for judging two utterances to be similar. If it is smaller than DIlimit, the two utterances are judged to be similar and "True" is returned in step 453. Otherwise, the two utterances are judged not to be similar and "False" is returned in step 454.

[0115] FIG. 16 is a flowchart explaining the processing of step 356 in FIG. 13, which detects the location in the data file corresponding to each similar utterance candidate.

[0116] In step 471, initialization is performed: the variable R1 holds the record of the comment structure table 41 currently being processed. In step 472, the result of the similarity determination processing is checked. If the return value is "True", then in step 473 the location in the audio data file of the utterance judged similar to the instructed utterance is expressed as the interval between that utterance's start time and end time, and is added to the variable List. If the return value in step 472 is "False", the processing simply ends.
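The loop of FIG. 13, together with the threshold test of FIG. 15 and the interval collection of FIG. 16, can be sketched as below. The function name, the `(intention, start, end)` tuple layout, and passing the similarity function as a parameter are illustrative assumptions, not the patent's own interfaces.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

I = TypeVar("I")  # any representation of an instruction intention


def detect_similar(
    target: I,
    candidates: Sequence[Tuple[I, float, float]],  # (intention, start, end) per record
    similarity: Callable[[I, I], float],            # e.g. DIarti per equation (2)
    di_limit: float,                                # threshold DIlimit
) -> List[Tuple[float, float]]:
    found: List[Tuple[float, float]] = []           # step 351: List = ()
    for intention, start, end in candidates:        # steps 352-353: until no record is left
        d = similarity(target, intention)           # step 354 (FIG. 14)
        if d < di_limit:                            # step 452: "True" if smaller (FIG. 15)
            found.append((start, end))              # step 473: add the interval (FIG. 16)
    return found
```

As written, the scan covers every record, so the designated utterance itself (similarity 0) is also returned; the text does not say whether it is excluded, so a caller may wish to filter it out.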

[0117] As described above, the apparatus is a multimedia conference recording/reproducing apparatus that records the voice information of participants in a conference or the like, extracts utterance structure data as index information for accessing the audio data file, and visualizes that data as a speaker chart. When the user searching the conference record designates any utterance position on the speaker chart with a pointing device or the like, the apparatus extracts the intention of that designation and detects utterance candidates similar to it. Consequently, when the user watches or listens to the reproduced audio or video and decides that it is not what was intended, the user can easily find utterances similar to the intended one.

[0118] [Second Embodiment] In the embodiment above, the user's search intention is extracted from an instruction input that designates a specific utterance. By instead extracting the user's reproduction intention from the user's reproduction actions, the information the user needs can be extracted more faithfully.

[0119] In the second embodiment, to reproduce a particular utterance section, the user can not only designate the desired utterance (utterance section bar) on the speaker chart as described above, but also give a reproduction start instruction on the speaker chart and then give a reproduction end instruction while viewing the reproduced information. In other words, the user can designate a reproduction section spanning multiple utterance sections. In the second embodiment, the reproduction intention is extracted from this reproduction instruction action, and the information the user needs is extracted on that basis.

[0120] FIG. 17 is a block diagram showing the details of the searcher intention extracting unit 17 in the second embodiment. The searcher intention extracting unit 17 consists of an instruction intention extracting unit 17a, which extracts the intention of an instruction input, and a reproduction intention extracting unit 17b, which extracts the reproduction intention.

[0121] The instruction intention extracting unit 17a extracts, from the instruction input information, the instruction intention for a specific designated utterance, as described in the first embodiment. The reproduction intention extracting unit 17b, in contrast, extracts the user's reproduction intention regarding the information to be searched from the utterance structure of the series of utterances contained in the section from the start to the end of reproduction.

[0122] FIG. 18 is a block diagram showing the details of the similar candidate detecting unit 18 in the second embodiment. In this embodiment, the similar candidate detecting unit 18 consists of a similarity determination method selecting unit 18a, a similar utterance candidate detecting unit 18b, and a similar utterance structure candidate detecting unit 18f.

[0123] Based on the searcher's instruction input information and the reproduction information, the similarity determination method selecting unit 18a selects which of two similarity determination methods is appropriate: that of the similar utterance candidate detecting unit 18b or that of the similar utterance structure candidate detecting unit 18f. In this embodiment, as described later, the selecting unit 18a selects the similar utterance candidate detecting unit 18b when the reproduction section specified by the user's instruction input contains only one utterance, and selects the similar utterance structure candidate detecting unit 18f when the reproduction section contains multiple utterances.
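The selection rule of unit 18a reduces to counting the utterances in the reproduction section. In this minimal sketch the return labels are hypothetical names for the two detectors, and rejecting an empty section is an added assumption, since the text only covers the one-utterance and multiple-utterance cases.

```python
from typing import Sequence


def select_method(section_utterances: Sequence[int]) -> str:
    # section_utterances: utterance numbers inside the reproduction section.
    if not section_utterances:
        raise ValueError("reproduction section contains no utterance")
    if len(section_utterances) == 1:
        return "utterance"  # similar utterance candidate detecting unit 18b
    return "structure"      # similar utterance structure candidate detecting unit 18f
```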

[0124] The similar utterance candidate detecting unit 18b operates in the same way as the similar candidate detecting unit of the first embodiment described with reference to FIG. 13, and consists of three components: an utterance similarity calculating unit 18c, an utterance similarity determining unit 18d, and a similar utterance detecting unit 18e. The processing of the utterance similarity calculating unit 18c, the utterance similarity determining unit 18d, and the similar utterance detecting unit 18e is the same as that described with reference to FIGS. 14, 15, and 16.

[0125] The similar utterance structure candidate detecting unit 18f consists of three parts: an utterance structure similarity calculating unit 18g, an utterance structure similarity determining unit 18h, and a similar utterance structure detecting unit 18i. The two detectors differ as follows: the similar utterance candidate detecting unit 18b detects similarity with respect to the single utterance designated by the instruction input, whereas the similar utterance structure candidate detecting unit 18f also uses the reproduction information and detects similarity with respect to a series of utterances.

[0126] FIG. 19 is a diagram for explaining how the user designates a reproduction section on the speaker chart; it shows part of the speaker chart.

[0127] As with the instruction input of the first embodiment, the reproduction instruction input positions are expressed as relative coordinates within the utterance transition display area 107. In FIG. 19, the leftmost point of the utterance transition display area 107 in the x direction is taken as the origin 501, with relative coordinates (0, 0). The x coordinate 502 of the reproduction start point designated by the user is denoted Δxstart, and the x coordinate 503 of the reproduction end point designated by the user is denoted Δxstop.

[0128] The time corresponding to the origin (0, 0) is denoted the origin time Torigin; the time at which the user inputs the reproduction start instruction is denoted the reproduction start instruction time Tstart; and the time at which the user inputs the reproduction end instruction is denoted the reproduction end instruction time Tstop. The interval between Tstart and Tstop is the reproduction section. The searcher's reproduction intention is extracted for the series of utterances contained in this reproduction section.

[0129] FIG. 20 is a flowchart explaining the processing for detecting similar utterance structure candidates.

[0130] In step 601, the apparatus monitors whether the searching user has input a reproduction start instruction. In step 602, it is determined whether there has been an instruction input; if not, the process returns to step 601 and monitoring of the user's instruction input continues.

[0131] If step 602 determines that the user has input a reproduction start instruction, then in step 603 the user's reproduction start instruction input coordinates are acquired and stored in the variable Pstart. Utterance identification processing (step 604) is then performed on Pstart to identify the utterance at the instructed position. This utterance identification processing is the same as that described with reference to FIG. 9.

[0132] Next, in step 605, monitoring of the user's instruction input continues, and in the next step 606 it is checked whether a reproduction end instruction has been input; if not, monitoring continues at step 605. If step 606 determines that a reproduction end instruction has been input, then in step 607 the reproduction end time is assigned to the variable Tstop. Next, in step 608, reproduction section identification processing is performed, which identifies the reproduction section and the series of utterances it contains. Details of the reproduction section identification processing will be described later with reference to FIG. 21.

[0133] After the searcher inputs the reproduction end instruction, similarity determination is performed. First, in step 609, similarity determination processing is performed to select the similarity determination method. Details of this processing will be described later with reference to FIG. 22.

[0134] If step 610 determines, from the result of the processing in step 609, that similarity is to be judged for an individual utterance, the process moves to step 611, where instruction intention extraction is performed, and then to step 612, where similar utterance detection is performed. Steps 611 and 612 correspond to the series of processes of the first embodiment described with reference to FIGS. 11 to 16.

[0135] If, on the other hand, step 610 determines that similarity is to be judged for an utterance structure, processing to extract the reproduction intention is performed in step 613, and similar utterance structures are detected in the next step 614. The reproduction intention extraction of step 613 will be described later with reference to FIG. 24. The similar utterance structure detection of step 614 will be described later with reference to FIG. 27 onward.

[0136] The reproduction section identification processing of step 608 will be described with reference to the flowchart of FIG. 21.

[0137] Step 651 shows the initial setting of the variables IDstart and IDstop. IDstart is assigned the utterance number identified from the reproduction start instruction input position Pstart by the utterance identification processing of step 604. Similarly, IDstop is assigned the utterance number identified from the input time Tstop of the reproduction stop instruction. The utterance identification processing in this case refers to steps 253 to 256 shown in FIG. 9.

[0138] This yields the reproduction section designated by the user. However, the user may input the reproduction end instruction only after the next utterance has been played, even without intending to reproduce it. Therefore, to extract the reproduction section the user intended as accurately as possible, it is preferable to correct for the excess portion of reproduction.

[0139] In general, when reproduction of an utterance begins and the user finds that it is unrelated to the section he or she intended to reproduce, the user is expected to input the reproduction end instruction fairly promptly. In the second embodiment, the utterance at the position where the user input the reproduction end instruction is called the stop utterance. If the time from the start of the stop utterance to the reproduction end instruction input time is shorter than a predetermined time ΔTlimit, the stop utterance is treated as unrelated to the reproduction intention. It is then excluded from the reproduction section intended by the user.

[0140] Specifically, in step 652 the reproduction end instruction time is assigned to the variable Tstop. In the next step 653, the record of the utterance structure table 41 that corresponds to the utterance number IDstop of the currently identified stop utterance is read and assigned to the variable R1. Then, in step 654, the start time field of the record in R1 is assigned to the variable T(start time).

[0141] In the next step 655, it is determined whether the difference between two times is smaller than the fixed time ΔTlimit. The first is the actual time Tstop at which the reproduction end instruction was input. The second is the start time T(start time) of the utterance IDstop identified as the stop utterance. If the difference is smaller, the process moves to step 656. The searcher is deemed to have reproduced the excess portion unintentionally, and the stop utterance is not included in the reproduction section. That is, in step 656 the last utterance of the reproduction section is taken to be the one immediately before the stop utterance, and 1 is subtracted from IDstop.

[0142] If step 655 determines that the difference between Tstop and the start time T(start time) of the stop utterance is larger than ΔTlimit, nothing is done. The section up to the time specified by the reproduction end instruction input is used as the reproduction section as it is. Then, in step 657, the reproduction section (IDstart, IDstop) obtained as described above is returned.
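Steps 651 to 657 can be illustrated with a minimal Python sketch. It assumes the utterance structure table is a dict keyed by utterance number and that the start instruction arrives as a time, whereas the source uses a chart position Pstart. The names `Utterance`, `identify_utterance`, and `specify_playback_section` are hypothetical. `identify_utterance` only stands in for steps 253 to 256 of FIG. 9, which are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    number: int   # utterance number (key in the utterance structure table)
    start: float  # start time of the utterance, in seconds
    end: float    # end time of the utterance, in seconds


def identify_utterance(table, t):
    """Hypothetical stand-in for steps 253-256: the utterance playing at time t."""
    for u in table.values():
        if u.start <= t < u.end:
            return u.number
    raise ValueError(f"no utterance at time {t}")


def specify_playback_section(table, t_start, t_stop, dt_limit):
    """Steps 651-657: return (IDstart, IDstop), trimming an unintended overshoot."""
    id_start = identify_utterance(table, t_start)  # step 651
    id_stop = identify_utterance(table, t_stop)
    start_of_stop = table[id_stop].start           # steps 653-654: T(start time)
    # Step 655: stopped soon after the stop utterance began, so treat it as overshoot.
    # The id_stop > id_start guard is an addition of this sketch, not in the source.
    if t_stop - start_of_stop < dt_limit and id_stop > id_start:
        id_stop -= 1                               # step 656
    return id_start, id_stop                       # step 657
```

Consider utterances 205 to 207, each lasting 10 s. A stop at 21 s falls 1 s into utterance 207. With ΔTlimit = 3 s, the section is trimmed to (205, 206).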

[0143] Next, the process for selecting a similarity determination method is described with reference to the flowchart of FIG. 22.

[0144] First, in step 671, the reproduction section (IDstart, IDstop) has been identified by the reproduction section identification process described above. In the next step 672, it is determined whether the reproduction start utterance IDstart equals the reproduction stop utterance IDstop. If they are equal, the reproduction section is a single utterance rather than a span of utterances. In that case "utterance" is returned, and similarity is judged on the utterance. If they are not equal, the reproduction section contains several utterances. In that case "utterance structure" is returned, and similarity is judged on the utterance structure.

[0145] FIG. 23 is a diagram for explaining the reproduction intention. It shows part of a speaker chart.

[0146] FIG. 24 shows the definition and notation of the reproduction intention. In this embodiment, the reproduction intention is defined by six attributes of the utterance structure of the group of utterances in the reproduction section:
- the instructed utterance
- the stop speaker name
- the total number of utterances
- the total utterance time
- the speaker set
- the utterance transition matrix

[0147] Using these attributes, the reproduction intention is written Ireplay(instructed utterance, stop speaker name, total number of utterances, total utterance time, speaker set, utterance transition matrix). To refer to a single attribute rather than the whole reproduction intention, the attribute is written inside the parentheses of Ireplay(). For example, the speaker name attribute of the reproduction intention is written Ireplay(speaker name). The stop speaker name, total number of utterances, total utterance time, speaker set, and utterance transition matrix attributes are written in the same form.

[0148] The six attributes are defined as follows.
- Instructed utterance: for a reproduction section instruction, the utterance (utterance section) at the reproduction start instruction position, so that Ireplay(instructed utterance) = Iinst(instructed utterance).
- Stop speaker name: the name of the speaker of the stop utterance.
- Total number of utterances: the number of utterances contained in the reproduction section (IDstart, IDstop).
- Total utterance time: the sum of the durations of the utterances in the reproduction section (IDstart, IDstop).
- Speaker set: the list of speaker names appearing in the reproduction section (IDstart, IDstop), with duplicates removed.

[0149] The utterance transition matrix represents the transitions of the floor among the speakers in the speaker set. If the speaker set contains n speakers, the matrix has n rows and n columns. The n speakers are ordered by input device number along both the rows and the columns. When the floor passes from a speaker A to a speaker B, 1 is added to the element at the row for A's input device number and the column for B's input device number. The matrix thus records how many transitions occurred from each speaker to each other speaker.
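The matrix construction can be shown with a minimal Python sketch. The hypothetical `transition_matrix` takes the input device numbers of consecutive utterances in order. It sizes the matrix by the speaker set, as this paragraph does; step 712 later sizes it by all conference participants. The source does not say whether two consecutive utterances by the same speaker count as a transition. This sketch counts every consecutive pair, so such pairs land on the diagonal.

```python
def transition_matrix(devices):
    """Build the utterance transition matrix from the input device numbers of
    consecutive utterances. Rows and columns follow ascending device number."""
    order = sorted(set(devices))                    # one row/column per speaker
    index = {d: i for i, d in enumerate(order)}
    matrix = [[0] * len(order) for _ in order]
    for a, b in zip(devices, devices[1:]):          # every hand-over A -> B
        matrix[index[a]][index[b]] += 1
    return order, matrix
```

With hypothetical device numbers Tanaka = 1, Suzuki = 2, and Sato = 3, the FIG. 23 sequence Suzuki, Tanaka, Suzuki, Sato, Suzuki is `[2, 1, 2, 3, 2]`. It yields one transition in each of four cells.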

[0150] FIG. 25 shows an example description of the reproduction intention for the reproduction section of the speaker chart shown in FIG. 23.

[0151] The example attributes are as follows.
- Instructed utterance: the utterance at which reproduction was instructed, so utterance number 205 is identified.
- Stop speaker name: the speaker of the stop utterance of the identified reproduction section. In the example of FIG. 23 this is "Suzuki", the speaker of utterance 209.
- Total number of utterances: the number of utterances in the reproduction section, five in this example.
- Total utterance time: the sum of the durations of the utterances in the reproduction section. The reproduction instruction time Tstart and the reproduction stop time Tstop are not taken into account. The time runs from the beginning of utterance 205 to the end of utterance 209, for example 3 minutes 20 seconds.
- Speaker set: (Tanaka, Suzuki, Sato) in this example. Suzuki speaks three times but is counted only once, because duplicates are removed.

[0152] In the example of FIG. 23, the utterance transition matrix records four transitions, one each:
- from "Suzuki" to "Tanaka"
- from "Tanaka" to "Suzuki"
- from "Suzuki" to "Sato"
- from "Sato" to "Suzuki"

[0153] FIGS. 26 and 27 are flowcharts explaining the process of extracting the reproduction intention.

[0154] Steps 711 and 712 perform initialization. First, in step 711, the reproduction section identification process assigns values to IDstart and IDstop. IDstart receives the utterance number of the utterance at which reproduction start was instructed, and IDstop receives that of the utterance at which reproduction end was instructed.

[0155] In the next step 712, initial values are set for various variables. The variable time holds the total utterance time. The variable List is a list holding the speaker set. The variable id is initialized to the instructed utterance (start utterance). The variable transfer holds the utterance transition matrix. It is initialized to an n × n zero matrix, where n is the number of conference participants.

[0156] In step 713, the record of the utterance structure table that corresponds to the utterance number IDstop of the reproduction stop utterance is read and assigned to R1. In the next step 714, the name of the conference participant for the input device number in R1 is obtained from the conference participant table 42. It is assigned to the variable name-stop and is the stop speaker name.

[0157] In the next step 715, the record in the utterance structure table 41 for the utterance immediately preceding the reproduction start utterance is read and assigned to R1. This prepares the loop of steps 716 to 721, which is the reproduction intention extraction process. The loop is repeated for each utterance in the reproduction section.

[0158] First, in step 716, the record whose utterance number matches the variable id (initially the reproduction start utterance) is read from the utterance structure table 41 and assigned to R2. R1 and R2 thus hold the records of two consecutive utterances. Within the loop, the main processing target is R2.

[0159] In step 717, it is determined whether the utterance number in R2 is smaller than the utterance number of the stop utterance IDstop, that is, whether it lies within the reproduction section. If it does, the attributes of the reproduction intention are computed in steps 718, 719, and 720.

[0160] First, in step 718, the utterance time in the R2 record is added to the total utterance time time, and the total utterance count number is incremented by 1. Step 719 processes the speaker set. The conference participant name for the input device number in R2 is taken from the conference participant table 42 and assigned to the variable name; this is the speaker of the utterance currently being processed. If that name is not yet in the speaker set List, it is added to List.

[0161] In step 720, the utterance transition matrix is updated. The matrix is n × n, where n is the number of conference participants. The row number is the input device number of the preceding utterance R1, and the column number is the input device number of R2. 1 is added to that element, recording a transition of the floor from R1 to R2.

[0162] In step 721, post-processing for the next iteration is performed. The variable id is incremented by 1 to prepare for the next utterance. R2, which becomes the preceding utterance in the next iteration, is assigned to R1.

[0163] If step 717 determines that the utterance number id is not within the reproduction section, the process moves to step 722. The overall reproduction intention is derived from the computed attributes and returned as Ireplay(IDstart, name-stop, number, time, List, transfer).

[0164] FIG. 28 explains the definition and notation of the similarity of utterance structures. As with the similarity of utterances, the intention is denoted I and the similarity DI. DI is the similarity between utterance structures A and B. Here too, the similarity takes a smaller value the more similar the structures are.

[0165] The similarity of utterance structures is defined by the equation shown in the figure. The similarity DIa-stru is the weighted sum of the similarity DIarti of the instructed utterances and the similarity DIstru of the utterance structures:

$$DI_{a\text{-}stru} = \alpha_1 \cdot DI_{arti} + \alpha_2 \cdot DI_{stru} \tag{3}$$

where α1 and α2 are weighting coefficients.

[0166] The similarity of instructed utterances has already been defined. The following therefore defines the similarity for the other five of the six attributes that make up the reproduction intention.

[0167] The stop speaker name similarity DIstru(A, B)(stop speaker name) checks whether the last speaker in the utterance section of structure A is the same as that of structure B. It is 0 if the stop speaker names match and takes a large value DImax if they differ. As with the similarity of instructed utterances, a mismatch in stop speaker names makes the similarity value arbitrarily large. The structures are then judged dissimilar.

[0168] The total utterance count similarity DIstru(A, B)(total number of utterances) is the absolute value of the difference between the total utterance counts.

[0169] Similarly, the total utterance time similarity DIstru(A, B)(total utterance time) is the absolute value of the difference between the total utterance times.

[0170] The speaker set similarity DIstru(A, B)(speaker set) starts from the union of the speaker sets of structures A and B. The speakers in that union that do not appear in both A and B are collected; this is the symmetric difference of the two sets. The similarity is the number of elements in this set. The more speakers the two sets fail to share, the larger the value.

[0171] The utterance transition structure similarity DIstru(A, B)(utterance transition matrix) is the sum, over all elements, of the absolute differences between the two transition matrices. It expresses how closely the patterns of transitions from a speaker X to a speaker Y agree. The more identical transition patterns there are, the smaller the value, which is interpreted as greater similarity.
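The five attribute similarities of [0167] to [0171] can be written as small Python functions. This is a minimal sketch with hypothetical names. DImax is represented by `float("inf")`, an assumption; the source only calls it a large value. The transition matrices are assumed to share the same n × n participant indexing, as in step 712.

```python
DI_MAX = float("inf")  # stand-in for the "large value" DImax; the source gives no number


def sim_stop_speaker(name_a, name_b):
    """[0167]: 0 when the stop speaker names match, DImax otherwise."""
    return 0 if name_a == name_b else DI_MAX


def sim_abs_diff(x_a, x_b):
    """[0168]-[0169]: used for both total utterance count and total utterance time."""
    return abs(x_a - x_b)


def sim_speaker_set(set_a, set_b):
    """[0170]: number of speakers not shared by both sets (symmetric difference)."""
    return len(set(set_a) ^ set(set_b))


def sim_transition(mat_a, mat_b):
    """[0171]: sum of element-wise absolute differences of two n x n matrices."""
    return sum(abs(x - y) for row_a, row_b in zip(mat_a, mat_b)
               for x, y in zip(row_a, row_b))
```

For example, the speaker sets (Tanaka, Suzuki, Sato) and (Suzuki, Ito) differ in Tanaka, Sato, and Ito, giving a speaker set similarity of 3.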

[0172] As definition (4) shows, the utterance structure similarity is a weighted combination of the per-attribute similarities of the other attributes, with the stop speaker name attribute serving as the condition. That is, the utterance structure similarity DIstru is defined as

$$
DI_{stru} =
\begin{cases}
w_1 \cdot DI_{stru}(\text{total utterances}) + w_2 \cdot DI_{stru}(\text{total time}) + w_3 \cdot DI_{stru}(\text{speaker set}) + w_4 \cdot DI_{stru}(\text{transition matrix}), & \text{if } DI_{stru}(\text{stop speaker name}) = 0 \\
DI_{max}, & \text{if } DI_{stru}(\text{stop speaker name}) > 0
\end{cases}
\tag{4}
$$

where w1, w2, w3, and w4 are weighting coefficients.

[0173] As equation (4) shows, the stop speaker name attribute acts as a condition in the utterance structure similarity, and a match is required. When the stop speaker names match, the similarity is defined as a combination of the other four attributes. In other words, two utterance structures can be similar only if their instructed utterances are similar and their stop speaker names also match. If the names do not match, the similarity takes an effectively infinite value, meaning the structures are not similar at all.

[0174] The combination covers four attributes: total number of utterances, total utterance time, speaker set, and utterance transition matrix. Their individual similarities are weighted by w1, w2, w3, and w4 and summed to give the similarity.
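Equations (3) and (4) can be evaluated with a minimal Python sketch. It takes precomputed attribute similarities as inputs; the function names are hypothetical. The default weights of 1.0 are placeholders, since the source gives no values. Because total utterance time and utterance counts live on different scales, real weights would need tuning.

```python
DI_MAX = float("inf")  # stand-in for DImax


def structure_similarity(d_stop, d_count, d_time, d_set, d_trans,
                         w=(1.0, 1.0, 1.0, 1.0)):
    """Equation (4): stop speaker match is the condition, then a weighted sum."""
    if d_stop > 0:                    # stop speaker names differ
        return DI_MAX
    w1, w2, w3, w4 = w
    return w1 * d_count + w2 * d_time + w3 * d_set + w4 * d_trans


def overall_similarity(di_arti, di_stru, alpha=(1.0, 1.0)):
    """Equation (3): DIa-stru = alpha1 * DIarti + alpha2 * DIstru."""
    a1, a2 = alpha
    return a1 * di_arti + a2 * di_stru
```

With weights (1.0, 0.5, 1.0, 0.25) and attribute similarities (1, 20, 1, 4), equation (4) gives 1 + 10 + 1 + 1 = 13.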

[0175] FIG. 29 is a flowchart explaining the process for detecting similar utterance structures.

[0176] Step 781 is initialization. The variable List, which holds the locations where similar utterance structures exist, is set to the empty list (). In step 782, one record is read from the utterance structure table 41 and assigned to R1. In the next step 783, it is checked whether R1 is nil. If it is not, a record remains to be processed: candidate sections with a similar utterance structure are extracted in step 784, and the utterance structure similarity calculation of step 785 is then performed.

[0177] In the next step 786, an utterance structure similarity determination evaluates whether the calculated similarity meets a fixed criterion for being similar. In step 787, the locations of the data files for the candidates judged similar are detected. If step 783 finds no record to read, the process ends.
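The outer loop of FIG. 29 can be sketched in Python by passing in the work of steps 784 to 787 as functions. The parameters `extract_candidates`, `compute_similarity`, and `locate` are hypothetical placeholders for the processes of FIGS. 30 to 32 and for the data-file lookup. The threshold test mirrors step 874, where a smaller value means more similar.

```python
def detect_similar_structures(records, extract_candidates, compute_similarity,
                              di_limit, locate):
    """Steps 781-787 of FIG. 29, with the per-step work passed in as functions."""
    found = []                                      # step 781: List = ()
    for r1 in records:                              # steps 782-783: until nil
        candidates = extract_candidates(r1)         # step 784
        values = compute_similarity(candidates)     # step 785
        for section, d in zip(candidates, values):  # step 786: judge each one
            if d < di_limit:                        # smaller value = more similar
                found.append(locate(section))       # step 787
    return found
```

Passing the steps in as functions keeps the control flow of FIG. 29 separate from the details of each sub-process.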

[0178] FIG. 30 is a flowchart explaining the process for extracting the utterance sections of candidates with a similar utterance structure.

[0179] In step 801, two initial values are obtained. The reproduction section A comes from the reproduction section identification process. The reproduction intention Ireplay(A)(instructed utterance, stop speaker name, total number of utterances, total utterance time, speaker set, utterance transition matrix) comes from the reproduction intention extraction process.

[0180] In the next step 802, the empty list () is assigned to the variable KList, which collects the detected similar utterance structure candidates. In step 803, the utterance number of the record R1 currently being processed is extracted from the utterance structure table and assigned to id. In step 804, the similarity between utterance IDstart, the start utterance of the reproduction section, and utterance id is calculated. It is then determined whether this similarity is smaller than a fixed similarity DIlimit. Similar instruction intentions of the start utterances are a necessary condition for similar utterance structures. If the similarity is larger than the fixed value, the utterances are judged not similar. The process then moves to step 813, KList is returned, and the process ends.

[0181] If step 804 judges the start utterance of the reproduction section and utterance id similar, steps 805 to 812 identify the section of the utterance structure.

[0182] In step 805, id + 1 is assigned as the initial value of the counter variable n, so processing begins from the utterance following the current one. The counter variable m for stop speaker processing is initialized to 1. The variable M is the maximum number of iterations of the stop speaker loop. It is set to the number of times the stop speaker speaks within the reproduction section. The number of appearances of the stop speaker thus serves as one criterion for limiting the range searched for sections with a similar utterance structure.

[0183] In the next step 806, the record with utterance number n is read from the utterance structure table 41 and assigned to R2. In the next step 807, it is determined whether R2 is nil. If it is, there is no record to read, so no suitable utterance structure could be extracted. The process then moves to step 813, KList is returned, and the process ends.

[0184] If step 807 determines that R2 is not nil, the process moves to step 808. Step 808 checks whether m has exceeded the maximum for the stop speaker loop. If it has, processing for utterance id ends, KList is returned in step 813, and the process ends. Otherwise, the process proceeds to step 809.

[0185] In step 809, it is determined whether the stop speaker name attribute of the reproduction intention matches the conference participant name for utterance R2. If the names do not match, 1 is added to the counter n in step 811, and the process returns to step 806 for the next utterance. If they match, a candidate similar section has been identified. The process moves to step 810, where the candidate section (id, n) is added to KList. Then, in step 812, 1 is added to the stop speaker counter m, and the search for the next utterance structure starting at utterance id continues.
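Steps 802 to 813 can be sketched in Python for a single utterance id. It assumes the utterance structure table is a dict keyed by utterance number and that a caller-supplied `speaker_of` returns the participant name of a record. `start_similarity` stands in for the utterance similarity of step 804. All of these names are hypothetical. The flowchart does not say whether n advances after a match in step 812. This sketch advances it, since otherwise the same utterance would be matched again.

```python
def extract_candidate_sections(table, speaker_of, uid, id_start, stop_speaker,
                               max_m, start_similarity, di_limit):
    """Steps 802-813 of FIG. 30 for the utterance numbered uid.

    table: utterance number -> record; speaker_of(record) -> participant name;
    max_m: M, how often the stop speaker speaks inside the reproduction section.
    """
    klist = []                                       # step 802
    if start_similarity(id_start, uid) >= di_limit:  # step 804: starts not similar
        return klist                                 # step 813
    n, m = uid + 1, 1                                # step 805
    while True:
        r2 = table.get(n)                            # step 806
        if r2 is None:                               # step 807: no more records
            return klist
        if m > max_m:                                # step 808: loop limit reached
            return klist
        if speaker_of(r2) == stop_speaker:           # step 809
            klist.append((uid, n))                   # step 810: candidate (id, n)
            m += 1                                   # step 812
        n += 1  # step 811; also applied after a match, an assumption of this sketch
```

Suppose M = 2 and, starting at utterance 10, the stop speaker "Suzuki" reappears at utterances 12, 14, and 15. The sketch returns the candidates (10, 12) and (10, 14) and then stops at the loop limit.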

[0186] FIG. 31 is a flowchart explaining the utterance structure similarity calculation. First, in step 851, the list of sections extracted by the similar utterance structure section extraction process is assigned to KList as initialization. Next, in step 852, the reproduction section is assigned to the variable A.

[0187] Steps 853 to 858 compute the similarity for each element of KList. In step 853, one candidate section (IDstart, IDstop) is taken from KList and assigned to the variable B. Step 854 checks whether all utterance structures in KList have been processed. If they have, the process moves to step 859.

[0188] If step 854 determines that a reproduction section remains to be processed, step 855 calculates, according to the definition, the similarity of the instruction intentions of the start utterances of section A and section B. Next, step 856 calculates the similarity for each attribute defining the utterance structure. The stop speaker name similarity was already determined during section extraction. Only four attributes are therefore calculated here: total number of utterances, total utterance time, speaker set, and utterance transition matrix.

[0189] In the next step 857, the similarity of the utterance structures of sections A and B is calculated according to the definition. In step 858, the overall utterance structure similarity is calculated. It combines the similarity of the start utterances' instruction intentions with the utterance structure similarity. The result is appended to the variable DList, which holds the list of similarities. The process then returns to step 853 and repeats.

[0190] Finally, in step 859, the process ends, returning the list KList of similar utterance candidates and the list DList of similarities.
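The loop of steps 853 to 859 can be sketched as a function that walks KList and builds DList in parallel. The two comparison functions are passed in as parameters because their definitions (FIGS. 12 and 28) are outside this section; the weighted average in step 858 is a hypothetical stand-in for the actual combination formula, and `similarity_list` and `w_intention` are illustrative names.

```python
from typing import Callable, List, Sequence, Tuple

Section = Tuple[int, int]  # (IDstart, IDstop) utterance IDs of a playback section


def similarity_list(
    a: Section,
    k_list: Sequence[Section],
    start_intention_distance: Callable[[int, int], float],
    structure_distance: Callable[[Section, Section], float],
    w_intention: float = 0.5,
) -> Tuple[List[Section], List[float]]:
    """Sketch of steps 853-859: returns (KList, DList), where DList[i] is the
    overall value for KList[i] and smaller values mean 'more similar'."""
    d_list: List[float] = []
    for b in k_list:                                      # steps 853-854: next candidate B
        d_start = start_intention_distance(a[0], b[0])    # step 855: start utterances
        d_struct = structure_distance(a, b)               # steps 856-857: utterance structure
        # step 858: combine both values (hypothetical weighted average)
        d_list.append(w_intention * d_start + (1.0 - w_intention) * d_struct)
    return list(k_list), d_list                           # step 859
```

Returning the two lists index-aligned mirrors the text, where KList and DList are later consumed element by element in step 872.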

[0191] FIG. 32 is a flowchart explaining the process of judging the similarity of the similar utterance structure candidates and detecting the location of the corresponding audio data file.

[0192] In step 871, initialization is performed: the list of similar utterance candidate sections returned by the similarity calculation process is assigned to the variable KList, and the list of similarity values with respect to the reproduction intention is assigned to the variable DList.

[0193] Steps 872 to 875 perform the similarity judgment for each element of the lists. First, in step 872, one element is taken from each of DList and KList and assigned to variables D and K, respectively. In step 873, it is determined whether all elements have been processed. If so, the process proceeds to step 876, where the value of the variable List, which holds the sections of similar utterance structures, is returned and the process ends.

[0194] If it is determined in step 873 that elements remain to be processed, then in step 874 it is determined whether the similarity value is smaller than a fixed limit value DIlimit. If it is smaller, the sections are judged to be similar and the process proceeds to step 875, where the section K corresponding to variable D is added to List, which holds the similar utterance structure candidates. The process then returns to step 872 and repeats for the next element. If, in step 874, the value of variable D is larger than DIlimit, the sections are judged not to be similar, and the process returns to step 872 to handle the next element.
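Steps 871 to 876 amount to filtering KList by the parallel values in DList. A minimal sketch follows; the name `select_similar_sections` is hypothetical, and the length check is an added safeguard not described in the text.

```python
from typing import List, Sequence, Tuple

Section = Tuple[int, int]  # (IDstart, IDstop)


def select_similar_sections(
    k_list: Sequence[Section],
    d_list: Sequence[float],
    di_limit: float,
) -> List[Section]:
    """Sketch of steps 871-876: keep each section K whose value D is below DIlimit."""
    if len(k_list) != len(d_list):
        raise ValueError("KList and DList must have the same length")
    similar: List[Section] = []                  # the variable List
    for k, d in zip(k_list, d_list):             # steps 872-873: take K and D together
        if d < di_limit:                         # step 874: smaller value = similar
            similar.append(k)                    # step 875
    return similar                               # step 876
```

The text does not say what happens when D equals DIlimit exactly; this sketch treats that case as not similar.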

[0195] FIG. 33 is a diagram explaining one example of a method for displaying the detected similar utterance structure candidates.

[0196] Reference numeral 901 denotes a similar utterance structure candidate display area. Area 901 consists of two areas: an entire conference time display area 902 and a similar utterance structure candidate reduced-view display area 903. When similar utterance structures are detected, the places where the candidates exist are marked in the entire conference time display area 902 by vertical bars such as 904 and 905. Vertical bar 904 indicates the playback section.

[0197] Vertical bar 905 indicates the location of a similar utterance candidate. Because the locations of the similar utterance structure candidates are marked in the entire conference time display area 902, the searcher can see at a glance where in the whole conference the similar utterance candidates lie.

[0198] The similar utterance structure candidate reduced-view display area 903 consists of a plurality of rectangular areas, each displaying a reduced view of an utterance structure. The reduced view displayed in area 906 is the utterance structure corresponding to the playback section marked by 904. The other rectangular areas, starting with rectangular area 907, display in chronological order reduced views of the utterance structures of the similar utterance structure candidates found elsewhere in the conference, such as the one marked by vertical bar 905. By clicking a displayed reduced view with a pointing device such as a mouse, the searcher can select that similar candidate and play it back.
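The layout described above needs two small computations: placing each section's marker proportionally along the whole-conference timeline (area 902), and ordering the reduced views with the playback section first and the candidates chronologically (area 903). The sketch below assumes a horizontal timeline measured in pixels and sections expressed in seconds; `bar_span`, `thumbnail_order`, and the 1-pixel minimum width are hypothetical choices.

```python
from typing import List, Sequence, Tuple

TimeSpan = Tuple[float, float]  # (start, stop) in seconds from conference start


def bar_span(span: TimeSpan, conference_len_s: float, area_width_px: int) -> Tuple[int, int]:
    """Horizontal pixel extent of a vertical-bar marker in the time display area."""
    start, stop = span
    left = int(start * area_width_px / conference_len_s)
    right = int(stop * area_width_px / conference_len_s)
    return left, max(right, left + 1)  # keep very short sections at least 1 px wide


def thumbnail_order(playback: TimeSpan, candidates: Sequence[TimeSpan]) -> List[TimeSpan]:
    """Playback section first (as in area 906), then candidates in chronological order."""
    return [playback] + sorted(candidates, key=lambda span: span[0])
```

For a one-hour conference drawn 720 pixels wide, a section from 15:00 to 20:00 maps to pixels 180 through 240.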

[0199] In the entire conference time display area 902, the degree of similarity can also be presented, in addition to the location, by changing the display color of the rectangular areas that mark the candidates. Although a display example is shown here for similar utterance candidates, similar utterances can be displayed in the same way, covering the portion that includes the similar utterance together with the transition utterance structure before and after it.
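One way to encode the degree of similarity as a display color is a linear ramp over the judged range [0, DIlimit). The palette below (pure blue for the most similar candidates, fading to white at the limit) is an assumption for illustration; the text does not specify colors, and `similarity_color` is a hypothetical name.

```python
def similarity_color(d: float, di_limit: float) -> str:
    """Hex RGB color for a candidate marker: pure blue at d == 0 (most similar),
    fading linearly to white as d reaches di_limit. The palette is hypothetical."""
    if di_limit <= 0:
        raise ValueError("di_limit must be positive")
    t = min(max(d / di_limit, 0.0), 1.0)   # 0.0 = most similar, 1.0 = at the limit
    level = round(255 * t)                  # red and green rise together toward white
    return f"#{level:02x}{level:02x}ff"
```

Clamping `t` keeps the output valid even if a value slightly outside the range is passed in.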

[0200]

[Effects of the Invention] As described above, the conference recording/reproducing apparatus of the present invention automatically extracts the searcher's search intention from the search instruction inputs and playback actions, detects similar utterances and series of utterances, and visualizes them on the display screen. As a result, utterances whose structure is similar to the searcher's search intention can be extracted automatically, without any additional input from the searcher. Furthermore, visually presenting the similar utterance candidates makes the searcher aware that they exist.

[0201] Presenting similar utterances and similar utterance structure candidates also helps a searcher who wants to reach specific conference information but starts without enough clues and fails to find the right place: other candidates similar to the search intention are presented automatically, so the searcher can reach the correct location efficiently.

[0202] Conversely, when the searcher, relying on a vague memory, wrongly concludes that a playback location is the right one, showing that other similar candidates exist lets the searcher know that other possibly correct candidates are available, which reduces missed results.

[0203] In addition, the display screen of similar candidates simultaneously shows each candidate's relative position within the whole conference timeline and a list of reduced views detailed enough to grasp each candidate's content, so the relative position information and the information about the absolute content are linked together. This improves recognition of the utterance structure, guides the searcher's search appropriately, and makes searching efficient. Listening to or watching the reproduced information while referring to this display also promotes understanding of the reproduced content.

[Brief description of the drawings]

FIG. 1 is a block diagram showing the system configuration of a conference information recording/reproducing apparatus according to an embodiment of the present invention.

FIG. 2 is a diagram explaining the data files stored in the file storage unit of the conference information recording/reproducing apparatus according to an embodiment of the present invention.

FIG. 3 is a diagram explaining the data structure of the conference participant table in the file storage unit of FIG. 2.

FIG. 4 is a diagram explaining the data structure of the utterance structure table in the file storage unit of FIG. 2.

FIG. 5 is a diagram showing an example of a speaker chart.

FIG. 6 is a diagram explaining how a searcher designates an utterance to be reproduced.

FIG. 7 is a diagram explaining the relationship between the searcher's instruction input position and the relative coordinate position in the speaker chart display area.

FIG. 8 is a flowchart outlining the process for detecting similar utterance candidates in the conference information recording/reproducing apparatus according to an embodiment of the present invention.

FIG. 9 is a flowchart explaining the utterance identification process in the conference information recording/reproducing apparatus according to an embodiment of the present invention.

FIG. 10 is a diagram explaining the instruction intention in the conference information recording/reproducing apparatus according to an embodiment of the present invention.

FIG. 11 is a flowchart explaining the process of extracting the instruction intention in the conference information recording/reproducing apparatus according to an embodiment of the present invention.

FIG. 12 is a diagram explaining the definition and notation of the similarity between utterances in the conference information recording/reproducing apparatus according to an embodiment of the present invention.

FIG. 13 is a flowchart explaining the process for detecting similar utterances in the conference information recording/reproducing apparatus according to an embodiment of the present invention.

FIG. 14 is a flowchart explaining the utterance similarity calculation process in the conference information recording/reproducing apparatus according to an embodiment of the present invention.

FIG. 15 is a flowchart explaining the utterance similarity judgment process in the conference information recording/reproducing apparatus according to an embodiment of the present invention.

FIG. 16 is a flowchart explaining the process of detecting the location of the data file corresponding to a similar utterance candidate in the conference information recording/reproducing apparatus according to an embodiment of the present invention.

FIG. 17 is a block diagram explaining the details of the searcher intention extraction unit in the conference information recording/reproducing apparatus according to an embodiment of the present invention.

FIG. 18 is a block diagram explaining the details of the similar candidate detection unit in the conference information recording/reproducing apparatus according to an embodiment of the present invention.

FIG. 19 is a diagram explaining a playback section in the speaker chart in the conference information recording/reproducing apparatus according to an embodiment of the present invention.

FIG. 20 is a flowchart explaining the process for detecting similar utterance structure candidates in the conference information recording/reproducing apparatus according to an embodiment of the present invention.

FIG. 21 is a flowchart explaining the process of specifying a playback section in the conference information recording/reproducing apparatus according to an embodiment of the present invention.

FIG. 22 is a flowchart explaining the process for selecting the similarity judgment method in the conference information recording/reproducing apparatus according to an embodiment of the present invention.

FIG. 23 is a diagram explaining the reproduction intention in the conference information recording/reproducing apparatus according to an embodiment of the present invention.

FIG. 24 is a diagram explaining the reproduction intention in the conference information recording/reproducing apparatus according to an embodiment of the present invention.

FIG. 25 is a diagram explaining the reproduction intention in the conference information recording/reproducing apparatus according to an embodiment of the present invention.

FIG. 26 is a diagram showing part of a flowchart that explains the process of extracting the reproduction intention in the conference information recording/reproducing apparatus according to an embodiment of the present invention.

FIG. 27 is a diagram showing part of a flowchart that explains the process of extracting the reproduction intention in the conference information recording/reproducing apparatus according to an embodiment of the present invention.

FIG. 28 is a diagram explaining the definition and notation of the similarity of utterance structures in the conference information recording/reproducing apparatus according to an embodiment of the present invention.

FIG. 29 is a flowchart explaining the process for detecting similar utterance structures in the conference information recording/reproducing apparatus according to an embodiment of the present invention.

FIG. 30 is a flowchart explaining the process for extracting the utterance sections of similar utterance structure candidates in the conference information recording/reproducing apparatus according to an embodiment of the present invention.

FIG. 31 is a flowchart explaining the utterance structure similarity calculation process in the conference information recording/reproducing apparatus according to an embodiment of the present invention.

FIG. 32 is a flowchart explaining the process of judging the similarity of similar utterance structure candidates and detecting the location of the corresponding audio data file in the conference information recording/reproducing apparatus according to an embodiment of the present invention.

FIG. 33 is a diagram explaining an example of a method for displaying the detected similar utterance structure candidates in the conference information recording/reproducing apparatus according to an embodiment of the present invention.

[Explanation of symbols]

1a Voice input device
2 A/D converter
4 File storage unit
5 Speaker chart generation control unit
6 Utterance data extraction unit
7 Timer
8 Utterance structure table generation unit
9 Speaker chart generation unit
10 Speaker chart display unit
11 Display device
12 Instruction input device
13 Video playback device
14 Audio playback device
15 Speaker chart search control unit
16 Utterance identification unit
17 Searcher intention extraction unit
18 Similar candidate detection unit
19 Similar candidate display unit

Claims

1. A conference information recording/reproducing apparatus comprising: recording means for recording voice data when a plurality of conference participants hold a conference; utterance structure extracting means for extracting the utterance structure of the plurality of conference participants from the voice data; visualization information generating means for generating visualization information for visualizing the utterance structure; a display device for visualizing the utterance structure based on the visualization information; instruction input means for inputting an instruction on the utterance structure visualized on the display device; reproducing means for reproducing the audio data corresponding to the position or portion indicated through the instruction input means; intention extraction means for extracting the intention of the searcher's instruction operation through the instruction input means, using attribute values determined in advance for intention extraction; similar candidate detection means for detecting voice data sections having an intention similar to the intention extracted by the intention extraction means; and similar candidate display means for visualizing the similar candidates detected by the similar candidate detection means on the display device.

2. A conference information recording/reproducing apparatus comprising: a voice input device provided for each conference participant for inputting the voice data of conference information; storage means for storing the voice data; utterance data extraction means for extracting utterances from the voice data; utterance structure table generation means for generating an utterance structure table from the extracted utterance data and a timer; storage means for storing the utterance structure table; storage means for storing a conference participant table that holds the correspondence between the voice input devices and the conference participants; speaker chart generation means for generating a speaker chart that visualizes the utterance structure table on a display device; speaker chart display means for displaying the speaker chart generated by the speaker chart generation means on the display device; instruction input means with which a searcher designates, on the speaker chart, any utterance the searcher wants to reproduce; utterance identification means for identifying the utterance designated through the instruction input means; intention extraction means for extracting the searcher's instruction intention regarding the identified utterance, using attribute values determined in advance for intention extraction; similar utterance detection means for detecting, using the information on the searcher's instruction intention extracted by the intention extraction means, similar utterance candidates having an intention similar to the utterance designated through the instruction input means; and similar utterance candidate display means for visualizing the similar utterance candidates detected by the similar utterance detection means on the display device.

3. The conference information recording/reproducing apparatus according to claim 2, wherein the instruction intention extracting means extracts the searcher's intention from the values of four attributes: the speaker name, the speech time, the preceding speaker name, and the following speaker name.

4. The conference information recording/reproducing apparatus according to claim 2, wherein the similar utterance detection means comprises: utterance similarity calculating means for calculating, from the intention of the utterance designated by instruction input and extracted by the instruction intention extracting means, its similarity to other utterances; and utterance similarity determining means for determining whether the similarity calculated by the utterance similarity calculating means is greater than or equal to a predetermined value, the similar utterance candidates being detected based on the determination result of the utterance similarity determining means.

5. The conference information recording/reproducing apparatus according to claim 2, wherein the searcher can designate a playback section through the instruction input means, and the intention extraction means comprises: playback operation monitoring means for monitoring the searcher's playback actions; and reproduction intention extraction means for extracting the searcher's search intention from predetermined attribute values relating to the series of utterances in the reproduced voice data section.

6. The conference information recording/reproducing apparatus according to claim 5, wherein the attribute values used by the reproduction intention extraction means are the instruction intention of the reproduction start utterance of the series of utterances in the reproduced audio data section, the stop speaker name, the total number of utterances, the total utterance time, the speaker set, and the utterance transition matrix.

7. The conference information recording/reproducing apparatus according to claim 5, wherein the similar utterance detection means comprises: utterance structure similarity calculating means for calculating, using the reproduction intention from the reproduction intention extraction means, the utterance structure similarity of other series of utterances in the utterance structure table; and utterance structure similarity determining means for determining whether the utterance structure similarity calculated by the utterance structure similarity calculating means is greater than or equal to a predetermined value, similar utterance structure candidates being detected based on the determination result of the utterance structure similarity determining means.

8. The conference information recording/reproducing apparatus according to claim 5, wherein the similar utterance detection means comprises similarity determination method selecting means for automatically selecting between similar utterance detecting means and similar utterance structure detecting means according to the state of the reproduced utterances.

9. The conference information recording/reproducing apparatus according to claim 2, wherein the similar utterance candidate display means comprises an entire conference time display area that visualizes the chronological information of the conference time and a similar candidate reduced-view display area that displays reduced views of a plurality of utterance structures, and further comprises: means for displaying in the entire conference time display area, as partial display areas on the time series, the playback section determined by the searcher's input instruction from the instruction input device and the sections in which similar candidates of that playback section exist; list display means for displaying in the similar candidate reduced-view display area a list of reduced similar candidate views, obtained by reducing the utterance structures of the sections shown as partial display areas in the entire conference time display area, as many as there are partial display areas; and means for reproducing the audio data of the selected section upon detecting that the searcher has selected one of the reduced similar candidate views displayed in the list.
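Claims 3 and 4 describe an utterance-level instruction intention made of four attributes and a threshold test on similarity. The sketch below is illustrative only: the similarity definition of FIG. 12 is not reproduced in this section, so the matching rule (fraction of matching attributes, with speech time compared within a tolerance) is an assumption, as is reading "speech time" as utterance duration. It uses a higher-is-more-similar score to follow the "greater than or equal to a predetermined value" wording of claim 4, whereas step 874 of the description compares a value that decreases as similarity increases. `InstructionIntention`, `intention_similarity`, `similar_utterances`, `time_tolerance_s`, and `threshold` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class InstructionIntention:
    """Hypothetical record of the four attributes named in claim 3."""
    speaker: str
    speech_time: float            # assumed: utterance duration in seconds
    prev_speaker: Optional[str]   # None if no utterance precedes this one
    next_speaker: Optional[str]   # None if no utterance follows this one


def intention_similarity(a: InstructionIntention, b: InstructionIntention,
                         time_tolerance_s: float = 5.0) -> float:
    """Fraction of the four attributes that match (1.0 = all match)."""
    matches = [
        a.speaker == b.speaker,
        abs(a.speech_time - b.speech_time) <= time_tolerance_s,
        a.prev_speaker == b.prev_speaker,
        a.next_speaker == b.next_speaker,
    ]
    return sum(matches) / len(matches)


def similar_utterances(target: InstructionIntention,
                       others: Sequence[InstructionIntention],
                       threshold: float = 0.75) -> List[int]:
    """Indices of utterances whose similarity is >= threshold (claim 4 wording)."""
    return [i for i, u in enumerate(others)
            if intention_similarity(target, u) >= threshold]
```

With the default threshold of 0.75, an utterance must agree with the designated one on at least three of the four attributes to become a similar utterance candidate.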