JP2004279897A

JP2004279897A - Method, device, and program for voice communication record generation

Info

Publication number: JP2004279897A
Application number: JP2003073455A
Authority: JP
Inventors: Eriko Sano; 恵利子佐野; Yoshihiko Hirakawa; 義彦平川; Akio Kameda; 明男亀田; Shinichiro Takagi; 伸一郎高木
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2003-03-18
Filing date: 2003-03-18
Publication date: 2004-10-07

Abstract

PROBLEM TO BE SOLVED: To provide minutes of a remote communication conference in which the main points are summarized according to the conditions desired by the person who requires the minutes.

SOLUTION: A method for generating a voice communication record includes the steps of: speech-recognizing speech information held in a speech storage means, which stores speech information from two or more points together with the time information at which the speech information was input, and converting it into text information; determining the time information at which the text information includes at least one topic keyword, or a synonym thereof, from a topic keyword storage means that stores topic keywords, and recording the time information and the topic keyword in a recording means in association with each other; determining, for input retrieval request information, the time information at which the topic keyword or its synonym is recorded in the recording means; and outputting, from the speech storage means, the speech information corresponding to the determined time information or to a specified time section including that time information.

COPYRIGHT: (C)2005,JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、音声通信記録生成方法、装置及びプログラムに関する。特に、インターネットを利用した遠隔通信会議において、取得された音声情報等の内容を、利用者が容易に検索できる記録を作成するものに関する。
【０００２】
【従来の技術】
従来、音声又は画像等のマルチメディア情報を用いた会議システムにおいて、会議中の重要な項目を短時間で簡便に編集できる装置があった（例えば、特許文献１参照）。
【０００３】
この装置は、会議によって得られたマルチメディア情報の時間関係を解析し、発話者を識別し、マルチメディア情報からキーワードを抽出する。特に、音声情報に対しては音声認識処理によりキーワードを抽出する。また、発話者の識別は、音声情報が入力された装置識別番号及び特徴抽出処理によって行う。更に、この装置は、そのキーワード及びその発話者等に重み付けすることにより、会議の重要度を時系列的に表す検索用ファイルを蓄積する。これら処理は、会議中にリアルタイムに行われる。これにより、会議途中に現れたキーワード、参照資料、資料参照者又は発話者、会議の時間帯等に基づいて、重要な項目を含むシーンを絞り込んでいくことが可能となり、会議録作成に要する時間が大幅に削減できる。
【０００４】
また、議事録作成を支援する仕組みとしては、複数の発話者の音声情報が混在している混在データを、発話者毎の音声情報に分離し、発話者を特定して音声議事録を作成する方法もあった（例えば、特許文献２参照）。
【０００５】
更に、音声や映像のみならず、グラフィック情報をも記録し、要求に応じて要点だけを送り出す、コンサルテーション・カンファレンスシステムも実現されている（例えば、特許文献３参照）。
【０００６】
【特許文献１】
特許第３１８５５０５号公報
【特許文献２】
特開２００３−５７９０号公報（特願２００１−１９１２８９号）
【特許文献３】
特開平５−２９８３４０号公報（特願平４−９６７８３号）
【０００７】
【発明が解決しようとする課題】
特許文献１によれば、会議録の作成は、会議録作成者による検索条件の入力により行うことを想定している。特に、所望するデータに到達できなかった場合、更なる検索条件の入力を会議録作成者に要求する。従って、会議録作成者による唯一の詳細な会議録を作成することはできる。しかし、会議参加者それぞれによって、所望される会議録の要点は異なる場合も多い。即ち、特許文献１によれば、それぞれの会議参加者に対して異なる会議録を提供することはできなかった。
【０００８】
尚、特許文献２によれば発話者特定に精度の障壁があり、特許文献３によればインデクッス機能がなく利便性に欠けるという難点もある。
【０００９】
即ち、従来の技術によれば、会議録作成者のみの便宜を図るものであり、会議録を必要とする者の所望条件によって要点がまとめられた会議録を提供するものではなかった。
【００１０】
そこで、本発明は、会議録を提供するに際し、会議録を必要とする者の所望条件によって要点がまとめられた会議録を提供することを目的とする。
【００１１】
【課題を解決するための手段】
本発明は、遠隔通信会議について、それぞれの参加者によって用いられる２以上の端末と、該会議における音声情報を記録し且つ会議録を生成する音声通信記録生成装置とを用いた音声通信記録生成方法、装置及プログラムに関する。
【００１２】
本発明による音声通信記録生成方法によれば、
少なくとも２以上の各地点からの音声情報と、当該音声情報が入力された時刻情報とともに蓄積する音声蓄積手段から、音声情報を音声認識してテキスト情報に変換する過程と、
話題キーワードを蓄積する話題キーワード蓄積手段からの話題キーワードの少なくとも１つ又はその類義語をテキスト情報が含む時刻情報を判定し、当該時刻情報と話題キーワードとを対応付けて記録手段に記録する過程と、
入力された検索要求情報のうち記録手段に話題キーワード又はその類義語が記録されている時刻情報を判定する過程と、
当該判定された時刻情報又は当該時刻情報を含む所定の時刻区間に対応する音声情報を音声蓄積手段から出力する過程と、を有する。
【００１３】
また、本発明の音声通信記録生成方法における他の実施形態によれば、音声情報の発話区間及び無音区間を検出する過程を有するものであってもよい。
【００１４】
更に、本発明の音声通信記録生成方法における他の実施形態によれば、話題キーワードに対する類義語を予め蓄積した類義語蓄積手段を更に用いて類義語を検索し、話題キーワードと共に進行位置をテキスト情報にマーク付けすることも好ましい。
【００１５】
更に、本発明の音声通信記録生成方法における他の実施形態によれば、音声情報に会議特定情報が付与されており、音声情報に他の情報が紐付けられていることも好ましい。
【００１６】
本発明の音声通信記録生成装置によれば、
少なくとも２以上の各地点からの音声情報と、当該音声情報が入力された時刻情報とともに蓄積する音声蓄積手段から、音声情報を音声認識してテキスト情報に変換する手段と、
話題キーワードを蓄積する話題キーワード蓄積手段からの話題キーワードの少なくとも１つ又はその類義語をテキスト情報が含む時刻情報を判定し、当該時刻情報と話題キーワードとを対応付けて記録手段に記録する手段と、
入力された検索要求情報のうち記録手段に話題キーワード又はその類義語が記録されている時刻情報を判定する手段と、
当該判定された時刻情報又は当該時刻情報を含む所定の時刻区間に対応する音声情報を音声蓄積手段から出力する手段と、を有する。
【００１７】
また、本発明の音声通信記録生成装置における他の実施形態によれば、音声情報の発話区間及び無音区間を検出する手段を有するものであってもよい。
【００１８】
更に、本発明の音声通信記録生成装置における他の実施形態によれば、話題キーワードに対する類義語を予め蓄積した類義語蓄積手段を更に用いて類義語を検索し、話題キーワードと共に進行位置をテキスト情報にマーク付けすることも好ましい。
【００１９】
更に、本発明の音声通信記録生成装置における他の実施形態によれば、音声情報に会議特定情報が付与されており、音声情報に他の情報が紐付けられているものであってもよい。
【００２０】
本発明の音声通信記録生成プログラムによれば、
少なくとも２以上の各地点からの音声情報と、当該音声情報が入力された時刻情報とともに蓄積する音声蓄積手段から、音声情報を音声認識してテキスト情報に変換する過程と、
話題キーワードを蓄積する話題キーワード蓄積手段からの話題キーワードの少なくとも１つ又はその類義語をテキスト情報が含む時刻情報を判定し、当該時刻情報と話題キーワードとを対応付けて記録手段に記録する過程と、
入力された検索要求情報のうち記録手段に話題キーワード又はその類義語が記録されている時刻情報を判定する過程と、
当該判定された時刻情報又は当該時刻情報を含む所定の時刻区間に対応する音声情報を音声蓄積手段から出力する過程としてコンピュータを実行させる。
【００２１】
また、本発明の音声通信記録生成プログラムにおける他の実施形態によれば、音声情報の発話区間及び無音区間を検出する過程を有するようにコンピュータを実行させるものであってもよい。
【００２２】
更に、本発明の音声通信記録生成プログラムにおける他の実施形態によれば、話題キーワードに対する類義語を予め蓄積した類義語蓄積手段を更に用いて類義語を検索し、話題キーワードと共に進行位置をテキスト情報にマーク付けするようにコンピュータを実行させることも好ましい。
【００２３】
更に、本発明の音声通信記録生成プログラムにおける他の実施形態によれば、音声情報に会議特定情報が付与されており、音声情報に他の情報が紐付けられているようにコンピュータを実行させることも好ましい。
【００２４】
【発明の実施の形態】
以下で、図面を用いて、本発明の実施の形態を説明する。
【００２５】
図１は、本発明におけるシステム構成図である。
【００２６】
図１によれば、本発明における音声通信記録サーバ１と、参加者の端末２、３及び４とが、インターネット５を介して接続されている。会議参加者は、それぞれの端末２、３及び４を用いて、インターネット５を介して遠隔通信会議を行うことができる。ここでは、参加者ＩＤｘｘｘ、ｙｙｙ及びｚｚｚの参加者が、遠隔通信会議に参加している。
【００２７】
端末２、３及び４それぞれには、音声情報を取得するマイク２１、３１及び４１と、映像情報を取得するビデオカメラ２２、３２及び４２とが備えられている。
【００２８】
会議における端末間での音声情報等の交換は、音声通信記録サーバ１を経由して配信されるものであってもよいし、音声通信記録サーバ１と会議参加者の端末とに同報的に配信されるものであってもよい。
【００２９】
また、音声情報等は、会議の開始から終了まで、各端末によって音声情報等がファイル形式で蓄積されるものであってもよい。この場合、一方では、会議終了後、端末２、３及び４それぞれによって記録された音声情報のファイルを、一度に、音声通信記録サーバ１へ送信する方法がある。他方では、会議中に、発言毎にファイル形式にした音声情報等を、逐次、音声通信記録サーバ１へ送信する方法もある。音声情報等が発言毎に区分されることにより、細かい検索条件に対応することが可能となる。このとき、各端末で無音区間を検出し、有音区間の音声情報のみを、音声通信記録サーバ１へ送信することも好ましい。
【００３０】
これに対し、音声情報等は、会議の開始から終了まで、各端末から音声通信記録サーバ１へストリーミング形式で送信されるものであってもよい。この場合、音声通信記録サーバ１において、発話者毎及び／又は発言毎等によって音声情報等が区分される。このとき、音声通信記録サーバ１において、無音区間を検出し、有音区間の音声情報のみを抽出することも好ましい。
【００３１】
尚、本発明は、会議終了後に会議内容の編集及び検索等を行うことを意図するものである。従って、実施形態においては、複数の端末による通信会議を想定しているが、１箇所の装置に音声情報等を集約して、計算機による処理を行うことも現実的である。
【００３２】
図２は、本発明における音声通信記録サーバ１の機能構成図である。
【００３３】
図２によれば、音声通信記録サーバ１は、音声情報データベース１０（音声蓄積手段）と、映像情報等データベース１１と、タイムスタンプ１２と、会議ＩＤ決定部１３と、音声認識処理部１４と、議事録情報データベース１５（記録手段）と、要点検索部１６と、会議録生成部１７と、類義語データベース１８と、通信インタフェース１９とを有する。
【００３４】
インターネット５には、通信インタフェース１９を介して接続される。
【００３５】
会議ＩＤ決定部１３は、その会議の「会議名」「参加者名」等を含む会議開始登録メッセージを受信し、それに対応する「会議ＩＤ」「参加者ＩＤ」（会議特定情報）を決定し、参加者の端末へ配信する。これらＩＤを、端末から受信する音声情報及び映像情報に付することができる。
【００３６】
タイムスタンプ部１２は、通信インタフェース１９によって受信された音声情報及び映像情報に時刻を付する。
【００３７】
受信された音声情報は、参加者ＩＤ毎に、音声情報データベース１０に蓄積される。また、映像情報は、参加者ＩＤ毎に、映像情報等データベース１１に蓄積される。ここで、音声情報と映像情報とは、タイムスタンプにより、時間で紐付けられている。尚、映像情報とは、参加者の用いる端末に備えられたビデオカメラから取得されたものに限られず、会議録で用いる文書資料データ又は投影資料データ等の、視覚的効果を有する資料データであってもよい。
【００３８】
また、音声情報データベース１０は、当該音声情報における発話区間と無音区間とを検出し、発話区間のみの音声情報を蓄積する。これを実現する方法としては、例えば、特許２５９０１９３号公報がある。
【００３９】
音声情報データベース１０に蓄積された音声情報は、次に、音声認識処理部１４において、テキスト情報に変換される。このとき、テキスト情報について会議の話題の進行位置を意味する「特徴語」を検索する。更に、その「特徴語」に相当する「話題キーワード」を検索する。そして、話題キーワードと共に進行位置をテキスト情報にマーク付けする。例えば、発言の開始と終了とのタイミングにマーク付けをする。議事録情報データベース１５には、「話題キーワード」と、その議題を導く「特徴語」とを含む文型データが記録される。
【００４０】
例えば、「特徴語」として「始めます」「お話します」「次は」が登録されている場合、以下のテキスト情報が得られたとする。
（１）現在の研究開発についての議論を「始めます」。
（２）「次は」、今後の研究開発について「お話します」。
（３）「次は」、研究費についての議論を「始めます」。
このとき、「特徴語」に相当する「話題キーワード」として、「現在の研究開発」「今後の研究開発」「研究費」が得られる。
【００４１】
音声認識処理部１４によって抽出された、マーク付きのテキスト情報は、議事録情報データベース１５に蓄積される。
【００４２】
端末は、音声通信記録サーバ１に対して、「話題キーワード」に基づく要点のみを記録した会議録を要求することができる。このとき、当該端末は、音声通信記録サーバ１に対して、「話題キーワード」を含む会議録要求メッセージを送信する。
【００４３】
要点検索部１６は、通信インタフェース１９によって受信された会議録要求メッセージを取得する。そして、要点検索部１６は、「話題キーワード」に基づいて、議事録情報データベース１５を検索する。例えば、要求された「話題キーワード」が「今後の研究開発」であれば、議事録情報データベース１５から、「今後の研究開発」に相当するマークを検索し、その開始及び終了等の進行位置の情報を得ることができる。
【００４４】
会議録要求メッセージには、会議録の要点の条件として、「話題キーワード」、「参加者ＩＤ（発話者毎）」及び「発話時刻」に限られず、「自由テキスト情報」又は「自由音声情報」であってもよい。「自由テキスト情報」とは、会議録要求者が取得したいと思う内容を記載した文章のテキスト情報をいう。音声通信記録サーバ１によって、そのテキスト情報から話題キーワード等を抽出することができる。また、「自由音声情報」とは、会議録要求者が取得したいと思う内容を録音した音声情報をいう。音声通信記録サーバ１によって、その音声情報を音声認識処理することでテキスト情報を抽出し、そのテキスト情報から話題キーワード等を抽出することができる。
【００４５】
要点検索部１６は、類義語データベース１８を参照することもできる。例えば、特徴語「始めます」は、「開始します」「開きます」「行います」と類義であるとする。これら情報が、類義語データベース１８に体系的に構成されて蓄積されている。従って、特徴語の類義語についても、要点検索部１６は、議事録情報データベース１５を検索することができる。もちろん、類義語データベース１８は、話題キーワードの類義語についても検索できることが好ましい。
【００４６】
会議録生成部１７は、要点検索部１６によって検索されたテキスト情報に基づいて、マーク位置及びその時刻情報に基づいて、音声情報データベース１０と、映像情報等データベース１１とを検索する。会議録生成部１７は、検索されたテキスト情報に付された時刻情報に相当する音声情報及び映像情報を取得する。そして、ＨＴＭＬ形式のＡＶ（ＡｕｄｉｏａｎｄＶｉｓｕａｌ）会議録を生成し、その会議録を、会議録要求メッセージを送信した端末へ返送する。
【００４７】
これにより、会議録要求メッセージを送信した端末は、要求した「話題キーワード」に基づく会議録のみを受信することができ、その端末によって、その会議状況における音声及び映像を再生することができる。
【００４８】
図３は、本発明による音声通信記録サーバと端末との間のシーケンス図である。
【００４９】
図３によれば、以下のシーケンスで進行する。
（Ｓ３０）遠隔通信会議を開始する際に、ある端末が、会議開始登録メッセージを、音声通信記録サーバ１へ送信する。この会議開始登録メッセージには、これから始める会議の「会議名」「参加者名」「開始時刻」等が含まれている。音声通信記録サーバ１は、会議ＩＤ決定部１３によって、これら情報から会議ＩＤ及び参加者ＩＤを決定する。
（Ｓ３１）音声通信記録サーバ１は、会議ＩＤ及び参加者ＩＤを、参加者の各端末２、３及び４へ配信する。
（Ｓ３２）各端末２、３及び４は、最初に、音声通信記録サーバ１との間で時刻合わせを行う。尚、端末及び音声通信記録サーバ１のそれぞれが、インターネットにおけるＮＴＰ（ＮｅｔｗｏｒｋＴｉｍｅＰｒｏｔｏｃｏｌ）サーバ又はＳＮＴＰ（ＳｉｍｐｌｅＮｅｔｗｏｒｋＴｉｍｅＰｒｏｔｏｃｏｌ）サーバにアクセスして、内蔵時計を一致させるものであってもよい。
（Ｓ３３）端末２、３及び４は、インターネット５を介して会議を始める。この会議における音声情報は、各端末のマイクから取得され、映像情報は、各端末のビデオカメラによって取得される。会議ＩＤ及び参加者ＩＤが付与された音声情報及び映像情報は、音声通信記録サーバ１へ配信される。
（Ｓ３４）遠隔通信会議を終了する際に、ある端末が、会議終了登録メッセージを、音声通信記録サーバ１へ送信する。
（Ｓ３５）その後、ある参加者が、所望の「話題キーワード」に基づく会議録を必要とする場合がある。このとき、その参加者の操作によって、当該端末が、音声通信記録サーバ１へ、会議録要求メッセージを送信する。
尚、会議録要求メッセージを送信する端末は、必ずしも会議の参加者の端末（音声情報等を送信する端末）に限られない。
（Ｓ３６）音声通信記録サーバ１は、会議録要求メッセージに含まれる「キーワード」に基づいて、音声情報及び映像情報を組み合わせた、ＨＴＭＬ形式のＡＶ会議録を作成する。そして、その会議録を、会議録要求メッセージを送信した端末へ送信する。
【００５０】
図４は、本発明における音声通信記録サーバが、会議の音声情報及び映像情報を受信した際のフローチャートである。
【００５１】
図４によれば、以下のシーケンスで進行する。
（Ｓ４０）端末から、会議開始登録メッセージを受信する。
（Ｓ４１）会議開始登録メッセージに含まれる「会議名」及び「参加者名」等の情報に基づいて、「会議ＩＤ」及び「参加者ＩＤ」を決定し、これらＩＤを参加者の端末へ配信する。
（Ｓ４２）端末から、会議中の音声情報及び映像情報を受信する。
（Ｓ４３）その音声情報及び映像情報に、現在の時刻をスタンプする。これは、ストリーミング形式の場合に有効である。これに対し、ファイル形式の場合、端末において時刻がファイルにスタンプされていれば、ここで時刻をスタンプする必要はない。
（Ｓ４４）受信した音声・映像情報を、音声情報と映像情報とで区別する。
（Ｓ４５）音声情報は、音声情報データベース１０に、参加者毎に蓄積される。
（Ｓ４６）映像情報は、映像情報等データベース１１に、参加者毎に蓄積される。
（Ｓ４７）音声情報は、音声認識処理によって、テキスト情報に変換される。このとき、テキスト情報について会議の話題の進行位置を意味する特徴語を検索し、その特徴語に相当する話題キーワードを検索し、話題キーワードと共に進行位置をテキスト情報にマーク付けする。このとき、特徴語に対する類義語を予め蓄積した類義語データベース１８を用いる。これを用いて、その特徴語に対する類義語を検索し、その類義語に相当する話題キーワードを検索する。
（Ｓ４８）そのテキスト情報は、議事録情報データベース１５に、参加者毎に蓄積される。
【００５２】
図５は、本発明における音声通信記録サーバが、会議録要求メッセージを受信した際のフローチャートである。
【００５３】
図５によれば、以下のシーケンスで進行する。
（Ｓ５１）端末から、会議録要求メッセージを受信する。この会議録要求メッセージには、「話題キーワード」が含まれている。
（Ｓ５２）議事録情報データベース１５から、「話題キーワード」に基づくテキスト情報を検索する。
（Ｓ５３）検索されたテキスト情報の時刻情報を特定する。
（Ｓ５４）特定された時刻情報に相当する音声情報及び映像情報を、音声情報データベース１０及び映像情報等データベース１１から取得する。
（Ｓ５５）取得された音声情報及び映像情報からなる会議録を生成する。ここで、会議録は、ＨＴＭＬ形式のものである。これにより、マルチメディア会議録を提供することができる。
（Ｓ５６）生成された会議録を、会議録要求メッセージを送信した端末へ、送信する。
【００５４】
本発明の音声通信記録再生方法における各過程は、計算機に内蔵された記録媒体を用い、ＣＰＵ等の制御手段を用いて実行可能である。また、計算機読み取り可能なプログラムをＣＤ等の記録媒体若しくは通信回線を介してインストールして当該計算機に実行させることもできる。これらプログラムは、主に、インターネットにおけるサーバの一機能として、サーバに搭載されるプログラムによって実現されてもよい。もちろん、これら機能は、端末に搭載されるプログラムによっても実現され、Ｐｅｅｒ−ｔｏ−Ｐｅｅｒ型で利用することもできる。
【００５５】
【発明の効果】
以上、詳細に説明したように、本発明の音声通信記録生成方法、装置及びプログラムによれば、会議録を提供するに際し、会議録を必要とする者の所望する条件（話題キーワード又は発話者等）によって要点がまとめられた会議録を提供することができる。特に、インターネットを利用した遠隔通信会議において、本発明における議事録の作成は、既存のサービスに付加価値を加えることになる。
【００５６】
これにより、会議終了後、聞き直したい特定発話者の発言（例えば社長の発言等）、遅刻したために聞き逃した会議の頭部の討議内容、又は、会議の総括として各議題のまとめ部分のみのレビュー等を取得することができる。特に、会議の全容を取得する必要なく、所望の条件に応じた必要箇所のみのコンパクトな会議録を取得することができる。
【図面の簡単な説明】
【図１】本発明におけるシステム構成図である。
【図２】本発明における音声通信記録サーバの機能構成図である。
【図３】本発明による音声通信記録サーバと端末との間のシーケンス図である。
【図４】本発明における音声通信記録サーバが、会議の音声情報及び映像情報を受信した際のフローチャートである。
【図５】本発明における音声通信記録サーバが、会議録要求メッセージを受信した際のフローチャートである。
【符号の説明】
１音声通信記録サーバ、音声通信記録装置
１０音声情報データベース、音声蓄積手段
１１映像情報等データベース
１２タイムスタンプ部
１３会議ＩＤ決定部
１４音声認識処理部
１５議事録情報データベース、記憶手段
１６要件検索部
１７会議録生成部
１８類義語データベース、類義語蓄積手段
１９通信インタフェース部
２、３、４端末
２１、３１、４１マイク
２２、３２、４２ビデオカメラ
５インターネット
[0001]
[Technical field of the invention]
The present invention relates to a voice communication record generation method, an apparatus, and a program. In particular, the present invention relates to a method for creating a record that allows a user to easily retrieve the contents of acquired voice information and the like in a telecommunication conference using the Internet.
[0002]
[Prior art]
Conventionally, among conference systems that use multimedia information such as audio or images, there has been a device that allows the important items of a conference to be edited simply and in a short time (for example, see Patent Document 1).
[0003]
This device analyzes the temporal relationships within the multimedia information obtained from a conference, identifies the speakers, and extracts keywords from the multimedia information. For voice information in particular, keywords are extracted by voice recognition processing. Speakers are identified from the identification number of the device to which the voice information was input and by feature extraction processing. The device further weights the keywords, speakers, and so on, and accumulates a search file that represents the importance of the conference as a time series. These processes are performed in real time during the conference. This makes it possible to narrow down the scenes containing important items on the basis of the keywords that appeared during the conference, the reference materials, the persons who referred to the materials or the speakers, the time period of the conference, and the like, so that the time required to create the minutes can be greatly reduced.
[0004]
As a mechanism for supporting the creation of minutes, there has also been a method in which mixed data containing the voice information of a plurality of speakers is separated into voice information for each speaker, the speakers are identified, and voice minutes are created (for example, see Patent Document 2).
[0005]
Further, a consultation conference system that records not only audio and video but also graphic information and sends out only essential points in response to requests has been realized (for example, see Patent Document 3).
[0006]
[Patent Document 1]
Japanese Patent No. 3185505
[Patent Document 2]
JP-A-2003-5790 (Japanese Patent Application No. 2001-191289)
[Patent Document 3]
JP-A-5-298340 (Japanese Patent Application No. 4-96783)
[0007]
[Problems to be solved by the invention]
Patent Document 1 assumes that the minutes are created by the minutes creator entering search conditions. In particular, if the desired data cannot be reached, the minutes creator is asked to enter further search conditions. It is therefore possible to create a single, detailed set of minutes reflecting the minutes creator's choices. However, the points that each conference participant wants from the minutes often differ. That is, Patent Document 1 cannot provide different minutes to each conference participant.
[0008]
In addition, Patent Document 2 is limited by the accuracy of speaker identification, and Patent Document 3 has no index function and is therefore inconvenient.
[0009]
That is, the conventional techniques serve only the convenience of the minutes creator and do not provide minutes in which the main points are summarized according to the conditions desired by the person who needs the minutes.
[0010]
In view of the above, an object of the present invention is to provide a conference record in which the main points are summarized according to desired conditions of a person who needs the conference record.
[0011]
[Means for Solving the Problems]
The present invention relates to a voice communication record generation method, device, and program for a telecommunication conference, using two or more terminals used by the respective participants and a voice communication record generation device that records the voice information of the conference and generates the minutes.
[0012]
The voice communication record generation method according to the present invention includes:
a step of voice-recognizing voice information and converting it into text information, the voice information being read from a voice storage means that stores voice information from at least two points together with the time information at which the voice information was input;
a step of determining the time information at which the text information includes at least one topic keyword, or a synonym thereof, from a topic keyword storage means that stores topic keywords, and recording the time information and the topic keyword in a recording means in association with each other;
a step of determining, for input search request information, the time information at which the topic keyword or a synonym thereof is recorded in the recording means; and
a step of outputting, from the voice storage means, the voice information corresponding to the determined time information or to a predetermined time section including that time information.
[0013]
Further, according to another embodiment of the voice communication record generation method of the present invention, the method may include a step of detecting a speech section and a silent section of voice information.
[0014]
Further, according to another embodiment of the voice communication record generation method of the present invention, it is also preferable to search for synonyms by additionally using a synonym storage means in which synonyms for the topic keywords are stored in advance, and to mark the progress position on the text information together with the topic keyword.
[0015]
Further, according to another embodiment of the voice communication record generation method of the present invention, it is also preferable that conference-specifying information is attached to the voice information and that the voice information is linked to other information.
[0016]
The voice communication record generation device of the present invention includes:
means for voice-recognizing voice information and converting it into text information, the voice information being read from a voice storage means that stores voice information from at least two points together with the time information at which the voice information was input;
means for determining the time information at which the text information includes at least one topic keyword, or a synonym thereof, from a topic keyword storage means that stores topic keywords, and recording the time information and the topic keyword in a recording means in association with each other;
means for determining, for input search request information, the time information at which the topic keyword or a synonym thereof is recorded in the recording means; and
means for outputting, from the voice storage means, the voice information corresponding to the determined time information or to a predetermined time section including that time information.
[0017]
According to another embodiment of the voice communication record generation device of the present invention, the voice communication record generation device may include a unit for detecting a speech section and a silent section of voice information.
[0018]
Further, according to another embodiment of the voice communication record generation device of the present invention, it is also preferable to search for synonyms by additionally using a synonym storage means in which synonyms for the topic keywords are stored in advance, and to mark the progress position on the text information together with the topic keyword.
[0019]
Further, according to another embodiment of the voice communication record generation device of the present invention, conference-specifying information may be attached to the voice information, and the voice information may be linked to other information.
[0020]
The voice communication record generation program of the present invention causes a computer to execute:
a step of voice-recognizing voice information and converting it into text information, the voice information being read from a voice storage means that stores voice information from at least two points together with the time information at which the voice information was input;
a step of determining the time information at which the text information includes at least one topic keyword, or a synonym thereof, from a topic keyword storage means that stores topic keywords, and recording the time information and the topic keyword in a recording means in association with each other;
a step of determining, for input search request information, the time information at which the topic keyword or a synonym thereof is recorded in the recording means; and
a step of outputting, from the voice storage means, the voice information corresponding to the determined time information or to a predetermined time section including that time information.
[0021]
According to another embodiment of the voice communication record generation program of the present invention, the program may cause the computer to execute a step of detecting the speech sections and silent sections of the voice information.
[0022]
Further, according to another embodiment of the voice communication record generation program of the present invention, it is also preferable that the program causes the computer to search for synonyms by additionally using a synonym storage means in which synonyms for the topic keywords are stored in advance, and to mark the progress position on the text information together with the topic keyword.
[0023]
Furthermore, according to another embodiment of the voice communication record generation program of the present invention, it is also preferable that the program causes the computer to operate such that conference-specifying information is attached to the voice information and the voice information is linked to other information.
[0024]
[Embodiments of the invention]
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
[0025]
FIG. 1 is a system configuration diagram according to the present invention.
[0026]
According to FIG. 1, a voice communication recording server 1 according to the present invention is connected to the participants' terminals 2, 3 and 4 via the Internet 5. The conference participants can hold a telecommunication conference via the Internet 5 using their respective terminals 2, 3 and 4. Here, the participants with participant IDs xxx, yyy and zzz are taking part in the telecommunication conference.
[0027]
The terminals 2, 3 and 4 are equipped with microphones 21, 31 and 41, respectively, for acquiring voice information, and with video cameras 22, 32 and 42, respectively, for acquiring video information.
[0028]
The voice information and the like exchanged between terminals during the conference may be relayed through the voice communication recording server 1, or may be broadcast simultaneously to the voice communication recording server 1 and to the terminals of the conference participants.
[0029]
The voice information and the like may be stored by each terminal in file format from the start to the end of the conference. In this case, one approach is for the terminals 2, 3 and 4 to transmit the files of voice information they have recorded to the voice communication recording server 1 all at once after the conference ends. Another approach is to transmit the voice information and the like, put into file format for each utterance, to the voice communication recording server 1 sequentially during the conference. Dividing the voice information and the like by utterance makes it possible to handle fine-grained search conditions. In this case, it is also preferable that each terminal detects silent sections and transmits only the voice information of the voiced sections to the voice communication recording server 1.
[0030]
Alternatively, the voice information and the like may be transmitted from each terminal to the voice communication recording server 1 in streaming format from the start to the end of the conference. In this case, the voice communication recording server 1 divides the voice information and the like by speaker and/or by utterance. It is also preferable that the voice communication recording server 1 detects silent sections and extracts only the voice information of the voiced sections.
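The silent-section detection mentioned above can be illustrated with a minimal energy-threshold sketch. This is not the detection method the patent relies on; the function name `voiced_sections`, the frame length, and the threshold are all assumptions chosen for illustration, and the input is assumed to be a list of samples normalized to the range -1.0 to 1.0.

```python
def voiced_sections(samples, frame_len=160, threshold=0.01):
    """Return (start, end) sample-index pairs covering runs of voiced frames.

    A frame counts as voiced when its mean squared amplitude exceeds
    `threshold`; both parameters are illustrative assumptions.
    """
    sections = []
    start = None  # start index of the voiced run currently being tracked
    for i in range(0, len(samples), frame_len):
        frame = samples[i:i + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy > threshold:
            if start is None:
                start = i
        elif start is not None:
            # The run of voiced frames ended at the start of this silent frame.
            sections.append((start, i))
            start = None
    if start is not None:
        sections.append((start, len(samples)))
    return sections


# Two silent frames, two voiced frames, one silent frame.
signal = [0.0] * 320 + [0.5] * 320 + [0.0] * 160
print(voiced_sections(signal))
```

Only the samples inside the returned sections would then be stored or transmitted; a practical detector would likely add smoothing so that short pauses inside an utterance do not split it.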
[0031]
Note that the present invention is intended for editing and searching the contents of a conference after the conference has ended. Therefore, although the embodiment assumes a communication conference among a plurality of terminals, it is also practical to collect the voice information and the like in a single device and process it by computer.
[0032]
FIG. 2 is a functional configuration diagram of the voice communication recording server 1 according to the present invention.
[0033]
According to FIG. 2, the voice communication recording server 1 includes a voice information database 10 (voice storage means), a video information database 11, a time stamp unit 12, a conference ID determination unit 13, a voice recognition processing unit 14, a minutes information database 15 (recording means), a key point search unit 16, a minutes generation unit 17, a synonym database 18, and a communication interface 19.
[0034]
The server is connected to the Internet 5 via the communication interface 19.
[0035]
The conference ID determination unit 13 receives a conference start registration message containing the “conference name”, “participant names”, and the like of the conference, determines the corresponding “conference ID” and “participant IDs” (conference-specifying information), and distributes them to the participants' terminals. These IDs can be attached to the voice information and video information received from the terminals.
[0036]
The time stamp unit 12 attaches a time to the audio information and the video information received by the communication interface 19.
[0037]
The received voice information is stored in the voice information database 10 for each participant ID. The video information is stored in the video information database 11 for each participant ID. Here, the voice information and the video information are linked in time by their time stamps. Note that the video information is not limited to that obtained from the video camera of a participant's terminal; it may also be material data with a visual effect, such as document data or projection data used in the minutes.
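As a rough illustration of how time stamps can link voice and video stored per participant ID, the sketch below keeps both kinds of media as timestamped segments and selects them with the same time window. The `Segment` class, the `overlapping` function, the file names, and the times are hypothetical and are not taken from the patent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    participant_id: str
    start: float  # seconds since the start of the conference
    end: float
    uri: str      # location of the stored media (hypothetical path)


def overlapping(segments, start, end, participant_id=None):
    """Return segments overlapping [start, end), optionally for one participant."""
    return [
        s for s in segments
        if s.start < end and s.end > start
        and (participant_id is None or s.participant_id == participant_id)
    ]


audio_db = [
    Segment("xxx", 0.0, 12.0, "audio/xxx-0.wav"),
    Segment("yyy", 12.0, 30.0, "audio/yyy-0.wav"),
]
video_db = [
    Segment("xxx", 0.0, 30.0, "video/xxx-0.mpg"),
    Segment("yyy", 0.0, 30.0, "video/yyy-0.mpg"),
]

# The same time window selects the voice and the video that belong together.
window = (10.0, 15.0)
linked_audio = overlapping(audio_db, *window)
linked_video = overlapping(video_db, *window, participant_id="yyy")
```

Because the link is purely temporal, document or projection data can be stored in the same way as camera video, as long as each item carries the time span during which it was shown.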
[0038]
Further, the voice information database 10 detects the utterance sections and silent sections in the voice information and stores the voice information of the utterance sections only. One method for realizing this is described, for example, in Japanese Patent No. 2590193.
[0039]
The voice information stored in the voice information database 10 is then converted into text information by the voice recognition processing unit 14. At this time, the text information is searched for “feature words”, which indicate the progress position of the conference topics. The “topic keyword” corresponding to each “feature word” is then searched for, and the progress position is marked on the text information together with the topic keyword. For example, marks are placed at the start and end timings of an utterance. The minutes information database 15 records sentence pattern data that include a “topic keyword” and the “feature word” that introduces that agenda item.
[0040]
For example, suppose that “we will begin” (始めます), “I will talk about” (お話します), and “next” (次は) are registered as “feature words”, and that the following text information is obtained.
(1) We “will begin” the discussion on current research and development.
(2) “Next”, I “will talk about” future research and development.
(3) “Next”, we “will begin” the discussion on research expenses.
In this case, “current research and development”, “future research and development”, and “research expenses” are obtained as the “topic keywords” corresponding to the “feature words”.
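The feature-word example above can be sketched as a small pattern-matching routine. The patent does not specify how sentence patterns are matched, so the regular expressions below, the English renderings of the Japanese feature words, the function names, and the utterance times are all assumptions made for illustration.

```python
import re

# Hypothetical sentence patterns: a feature word followed by the topic it introduces.
FEATURE_PATTERNS = [
    re.compile(r"will begin the discussion on (?P<topic>[^.]+)\."),
    re.compile(r"will talk about (?P<topic>[^.]+)\."),
]


def extract_topic(sentence):
    """Return the topic keyword introduced by a feature word, or None."""
    for pattern in FEATURE_PATTERNS:
        match = pattern.search(sentence)
        if match:
            return match.group("topic").strip()
    return None


def mark_topics(utterances):
    """Attach time marks to topic keywords.

    `utterances` is an iterable of (start_time, end_time, text) tuples,
    standing in for recognized text with its time information.
    """
    marks = []
    for start, end, text in utterances:
        topic = extract_topic(text)
        if topic is not None:
            marks.append({"topic": topic, "start": start, "end": end})
    return marks


utterances = [
    (0.0, 4.0, "We will begin the discussion on current research and development."),
    (4.0, 9.0, "Our prototype passed its first field test."),
    (600.0, 605.0, "Next, I will talk about future research and development."),
    (1200.0, 1204.0, "Next, we will begin the discussion on research expenses."),
]
marks = mark_topics(utterances)
```

Each mark pairs a topic keyword with the time of the utterance that introduced it, which is the association the recording means is described as holding.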
[0041]
The marked text information extracted by the voice recognition processing unit 14 is stored in the minutes information database 15.
[0042]
A terminal can request from the voice communication recording server 1 minutes that record only the key points relating to a “topic keyword”. To do so, the terminal transmits a minutes request message containing the “topic keyword” to the voice communication recording server 1.
[0043]
The key point search unit 16 acquires the minutes request message received by the communication interface 19. The key point search unit 16 then searches the minutes information database 15 based on the “topic keyword”. For example, if the requested “topic keyword” is “future research and development”, the mark corresponding to “future research and development” is retrieved from the minutes information database 15, and information on its progress position, such as its start and end, can be obtained.
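A minimal sketch of this lookup, assuming the marks are held as simple records with a topic and start and end times. The record layout, the `find_sections` name, and the optional `margin` (one way to realize a "predetermined time section including the time information") are assumptions, not details given in the patent.

```python
def find_sections(marks, topic_keyword, margin=0.0):
    """Return (start, end) time sections whose mark matches the topic keyword.

    `margin` widens each section by the given number of seconds on both
    sides, clamped so that the start never goes below zero.
    """
    wanted = topic_keyword.casefold()
    return [
        (max(0.0, m["start"] - margin), m["end"] + margin)
        for m in marks
        if m["topic"].casefold() == wanted
    ]


# Illustrative marks; the times are made up for the example.
marks = [
    {"topic": "current research and development", "start": 0.0, "end": 600.0},
    {"topic": "future research and development", "start": 600.0, "end": 1200.0},
    {"topic": "research expenses", "start": 1200.0, "end": 1500.0},
]
sections = find_sections(marks, "Future research and development", margin=5.0)
```

The resulting time sections are what a later step would use to pull the matching voice and video information out of storage.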
[0044]
The conditions that specify the key points of the minutes in a minutes request message are not limited to “topic keyword”, “participant ID (per speaker)”, and “utterance time”; they may also be “free text information” or “free voice information”. “Free text information” is the text of a passage in which the minutes requester describes the content he or she wants to obtain. The voice communication recording server 1 can extract topic keywords and the like from that text information. “Free voice information” is voice information in which the minutes requester has recorded the content he or she wants to obtain. The voice communication recording server 1 can extract text information by applying voice recognition processing to that voice information, and can then extract topic keywords and the like from the text information.
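One way to picture such a request is as a record with optional conditions, where a free-text condition is reduced to known topic keywords by simple substring matching. The `MinutesRequest` fields, the function names, and the matching rule are assumptions for illustration; the patent does not define a message format or an extraction algorithm, and the free-voice case (which would first need speech recognition) is left out.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class MinutesRequest:
    conference_id: str
    topic_keyword: Optional[str] = None
    participant_id: Optional[str] = None
    time_range: Optional[Tuple[float, float]] = None
    free_text: Optional[str] = None


def keywords_from_free_text(free_text, known_keywords):
    """Return the known topic keywords that occur in the free text."""
    text = free_text.casefold()
    return [k for k in known_keywords if k.casefold() in text]


def resolve_keywords(request, known_keywords):
    """Prefer an explicit topic keyword; otherwise fall back to the free text."""
    if request.topic_keyword:
        return [request.topic_keyword]
    if request.free_text:
        return keywords_from_free_text(request.free_text, known_keywords)
    return []


known = [
    "current research and development",
    "future research and development",
    "research expenses",
]
request = MinutesRequest(
    conference_id="conf-001",
    free_text="I missed the part about research expenses and the budget.",
)
resolved = resolve_keywords(request, known)
```

Participant ID and time range conditions would be applied as additional filters on the retrieved sections.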
[0045]
The key point search unit 16 can also refer to the synonym database 18. For example, suppose that the feature word “we will begin” (始めます) is synonymous with “we will start” (開始します), “we will open” (開きます), and “we will hold” (行います). Such information is systematically organized and stored in the synonym database 18. The key point search unit 16 can therefore also search the minutes information database 15 for synonyms of the feature words. It is of course preferable that the synonym database 18 can also be used to search for synonyms of the topic keywords.
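The synonym lookup can be sketched with groups of interchangeable phrases. The English phrases below are assumed renderings of the Japanese feature words 始めます, 開始します, 開きます and 行います, and the group structure and function names are hypothetical rather than the patent's data model.

```python
# Each set is one synonym group; phrases are stored in lower case.
SYNONYM_GROUPS = [
    {"we will begin", "we will start", "we will open", "we will hold"},
]


def expand(term, groups=SYNONYM_GROUPS):
    """Return the term together with all of its registered synonyms."""
    key = term.casefold()
    for group in groups:
        if key in group:
            return set(group)
    return {key}


def contains_feature_word(text, feature_word):
    """True if the text contains the feature word or any of its synonyms."""
    lowered = text.casefold()
    return any(variant in lowered for variant in expand(feature_word))


hit = contains_feature_word("Now we will start the budget review.", "we will begin")
miss = contains_feature_word("Thank you all for coming.", "we will begin")
```

The same expansion could be applied to topic keywords before searching the stored marks.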
[0046]
Based on the text information retrieved by the key point search unit 16, the minutes generation unit 17 searches the voice information database 10 and the video information database 11 using the mark positions and their time information. The minutes generation unit 17 acquires the voice information and video information corresponding to the time information attached to the retrieved text information. It then generates AV (Audio and Visual) minutes in HTML format and returns them to the terminal that transmitted the minutes request message.
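A minimal sketch of assembling HTML minutes from retrieved sections follows. The patent does not describe the HTML it produces; the use of `<audio>` and `<video>` elements with `#t=start,end` temporal media fragments is a present-day assumption, and the function name, dictionary keys, and URLs are hypothetical.

```python
from html import escape


def build_minutes_html(title, sections):
    """Build an HTML page with one video and one audio player per section.

    Each section is a dict with the keys topic, start, end (seconds),
    audio_url and video_url; all of these names are illustrative.
    """
    lines = [
        "<!DOCTYPE html>",
        '<html><head><meta charset="utf-8">',
        f"<title>{escape(title)}</title></head><body>",
        f"<h1>{escape(title)}</h1>",
    ]
    for section in sections:
        # "#t=start,end" is a temporal media fragment limiting playback to the section.
        fragment = "#t={:g},{:g}".format(section["start"], section["end"])
        audio_src = escape(section["audio_url"] + fragment)
        video_src = escape(section["video_url"] + fragment)
        lines.append(f"<h2>{escape(section['topic'])}</h2>")
        lines.append(f'<video controls src="{video_src}"></video>')
        lines.append(f'<audio controls src="{audio_src}"></audio>')
    lines.append("</body></html>")
    return "\n".join(lines)


page = build_minutes_html(
    "Minutes <R&D>",
    [{
        "topic": "future research and development",
        "start": 600.0,
        "end": 1200.0,
        "audio_url": "audio/yyy.wav",
        "video_url": "video/yyy.mp4",
    }],
)
```

Referencing the stored media by URL keeps the returned page small; the requesting terminal fetches and plays only the sections it was given.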
[0047]
As a result, the terminal that transmitted the minutes request message receives only the minutes relating to the requested “topic keyword”, and can play back the audio and video of that part of the conference.
[0048]
FIG. 3 is a sequence diagram between the voice communication recording server and the terminal according to the present invention.
[0049]
According to FIG. 3, it proceeds in the following sequence.
(S30) When a telecommunication conference is to be started, one of the terminals transmits a conference start registration message to the voice communication recording server 1. This conference start registration message contains the “conference name”, “participant names”, “start time”, and the like of the conference about to begin. The voice communication recording server 1 determines the conference ID and the participant IDs from this information by means of the conference ID determination unit 13.
(S31) The voice communication recording server 1 distributes the conference ID and the participant ID to the terminals 2, 3, and 4 of the participants.
(S32) Each of the terminals 2, 3 and 4 first synchronizes its clock with the voice communication recording server 1. Alternatively, the terminals and the voice communication recording server 1 may each access an NTP (Network Time Protocol) server or an SNTP (Simple Network Time Protocol) server on the Internet to synchronize their built-in clocks.
(S33) The terminals 2, 3 and 4 start the conference via the Internet 5. The voice information of the conference is acquired from the microphone of each terminal, and the video information is acquired by the video camera of each terminal. The voice information and video information, with the conference ID and participant ID attached, are delivered to the voice communication recording server 1.
(S34) When ending the telecommunication conference, a certain terminal transmits a conference end registration message to the voice communication recording server 1.
(S35) Thereafter, a participant may need a conference record based on a desired “topic keyword”. In that case, the terminal transmits a conference record request message to the voice communication recording server 1 in response to the participant's operation.
Note that the terminal transmitting the conference record request message is not necessarily limited to the terminal of a conference participant (a terminal that transmitted audio information or the like).
(S36) Based on the “topic keyword” included in the conference record request message, the voice communication recording server 1 creates an HTML-format AV conference record combining audio information and video information. The conference record is then transmitted to the terminal that transmitted the conference record request message.
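Steps S30 and S31 above can be sketched as follows. This is a minimal illustration only: the message field names, the hash-based ID scheme, and the ID format are assumptions, since the specification does not say how the conference ID determination unit 13 derives the IDs.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class StartRegistration:
    """Conference start registration message (field names are assumed)."""
    conference_name: str
    participant_names: tuple[str, ...]
    start_time: str  # e.g. an ISO 8601 string


def determine_ids(msg: StartRegistration) -> tuple[str, dict[str, str]]:
    """Derive a conference ID and one participant ID per participant name.

    Hashing the conference name and start time is an illustrative choice,
    not the scheme described in the specification.
    """
    key = f"{msg.conference_name}|{msg.start_time}".encode("utf-8")
    conference_id = hashlib.sha256(key).hexdigest()[:8]
    participant_ids = {
        name: f"{conference_id}-{index:02d}"
        for index, name in enumerate(msg.participant_names, start=1)
    }
    return conference_id, participant_ids
```

The returned IDs would then be distributed to each participant's terminal (S31) and attached to the audio and video information sent during the conference (S33).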
[0050]
FIG. 4 is a flowchart of the processing performed when the voice communication recording server according to the present invention receives audio information and video information of a conference.
[0051]
According to FIG. 4, processing proceeds in the following sequence.
(S40) A conference start registration message is received from the terminal.
(S41) Based on information such as "conference name" and "participant name" included in the conference start registration message, a "conference ID" and "participant IDs" are determined, and these IDs are distributed to the participants' terminals.
(S42) Audio information and video information during the conference are received from the terminal.
(S43) The current time is stamped on the audio information and the video information. This is effective when the information arrives in a streaming format. In the case of a file format, on the other hand, if the terminal has already stamped the time on the file, there is no need to stamp it here.
(S44) The received information is separated into audio information and video information.
(S45) The voice information is stored in the voice information database 10 for each participant.
(S46) The video information is stored in the video information database 11 for each participant.
(S47) The voice information is converted into text information by voice recognition processing. The text information is then searched for feature words that indicate the progress position of the conference topic, the topic keyword corresponding to each feature word found is looked up, and the progress position is marked on the text information together with that topic keyword. The synonym database 18, in which synonyms for the feature words are stored in advance, is also used here: synonyms of the feature words are searched for, and the topic keyword corresponding to each synonym found is looked up.
(S48) The text information is stored in the minutes information database 15 for each participant.
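The marking in step S47 can be sketched as follows, assuming the recognized text is already available. The two lookup tables, their contents, and the `Utterance` structure are hypothetical; the specification does not give the contents of the feature word table or of the synonym database 18, and simple substring matching stands in for whatever search is actually used.

```python
from dataclasses import dataclass, field

# Hypothetical tables; the real contents are not given in the specification.
FEATURE_TO_TOPIC = {"in conclusion": "conclusion", "next agenda": "agenda"}
# synonym -> feature word (stands in for the synonym database 18)
SYNONYMS = {"to sum up": "in conclusion", "next item": "next agenda"}


@dataclass
class Utterance:
    participant_id: str
    timestamp: float  # time stamped on receipt (S43), in seconds
    text: str         # output of voice recognition (S47)
    marks: list[str] = field(default_factory=list)  # topic keywords marked on the text


def mark_topics(utt: Utterance) -> Utterance:
    """Mark the utterance with topic keywords found via feature words or their synonyms."""
    lowered = utt.text.lower()
    # Direct hits on feature words.
    for feature_word, topic in FEATURE_TO_TOPIC.items():
        if feature_word in lowered and topic not in utt.marks:
            utt.marks.append(topic)
    # Hits on synonyms are resolved to the feature word, then to its topic keyword.
    for synonym, feature_word in SYNONYMS.items():
        topic = FEATURE_TO_TOPIC[feature_word]
        if synonym in lowered and topic not in utt.marks:
            utt.marks.append(topic)
    return utt
```

Because each utterance keeps its timestamp alongside the marked topic keywords, the stored text information can later be used to look up the matching time sections.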
[0052]
FIG. 5 is a flowchart of the processing performed when the voice communication recording server according to the present invention receives a conference record request message.
[0053]
According to FIG. 5, the process proceeds in the following sequence.
(S51) A conference record request message is received from the terminal. The conference record request message includes a “topic keyword”.
(S52) The minutes information database 15 is searched for text information based on "topic keywords".
(S53) The time information of the searched text information is specified.
(S54) The audio information and the video information corresponding to the specified time information are acquired from the audio information database 10 and the video information database 11.
(S55) A conference record including the acquired audio information and video information is generated. The conference record is in HTML format, so that a multimedia conference record can be provided.
(S56) The generated conference record is transmitted to the terminal that transmitted the conference record request message.
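Steps S52 to S54 above can be sketched as a lookup from topic keyword to time information to stored audio segments. The data structures, the in-memory lists standing in for the databases 15 and 10, and the `margin` parameter defining the time section around each matched time are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class MinutesEntry:
    timestamp: float    # seconds from conference start (assumed unit)
    topic_keyword: str  # topic keyword marked on the text information
    text: str


@dataclass
class AudioSegment:
    start: float
    end: float
    uri: str  # where the stored audio can be fetched (assumed)


def find_times(minutes: list[MinutesEntry], topic_keyword: str) -> list[float]:
    """S52-S53: time information of text information marked with the topic keyword."""
    return [e.timestamp for e in minutes if e.topic_keyword == topic_keyword]


def fetch_segments(
    audio_db: list[AudioSegment], times: list[float], margin: float = 5.0
) -> list[AudioSegment]:
    """S54: segments overlapping [t - margin, t + margin] for any matched time t."""
    hits = []
    for seg in audio_db:
        if any(seg.start <= t + margin and seg.end >= t - margin for t in times):
            hits.append(seg)
    return hits
```

The same lookup would apply to the video information database 11; the selected segments are what step S55 then assembles into the HTML-format conference record.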
[0054]
Each step of the voice communication recording/reproducing method of the present invention can be executed by a computer using its built-in recording medium and control means such as a CPU. A computer-readable program can also be installed via a recording medium such as a CD, or via a communication line, and executed by the computer. Such a program may be realized mainly as a function of a server on the Internet, by a program installed on that server. Of course, these functions can also be realized by a program installed in a terminal and used in a peer-to-peer configuration.
[0055]
[Effect of the invention]
As described above in detail, according to the voice communication record generation method, apparatus, and program of the present invention, a conference record can be provided under the conditions (topic keyword, speaker, etc.) desired by the person who needs it. In particular, for remote communication conferences using the Internet, the conference record creation of the present invention adds value to existing services.
[0056]
As a result, after the conference, it is possible to obtain, for example, only the statements of a specific speaker that one wants to hear again (such as the president's remarks), the discussion at the beginning of the conference that was missed because of late arrival, or only the summary of each agenda item as a digest of the conference. In particular, a compact conference record containing only the necessary portions that match the desired conditions can be acquired without acquiring the entire conference.
[Brief description of the drawings]
FIG. 1 is a system configuration diagram according to the present invention.
FIG. 2 is a functional configuration diagram of a voice communication recording server according to the present invention.
FIG. 3 is a sequence diagram between a voice communication recording server and a terminal according to the present invention.
FIG. 4 is a flowchart when the voice communication recording server according to the present invention receives audio information and video information of a conference.
FIG. 5 is a flowchart when the voice communication recording server according to the present invention receives a conference record request message.
[Explanation of symbols]
1 Voice communication recording server, voice communication recording device
10 Voice information database, voice storage means
11 Video information etc. database
12 Time stamp section
13 Meeting ID determination section
14 Voice recognition processing section
15 Minutes information database, storage means
16 Requirement search section
17 Conference record generation unit
18 Synonym database, synonym storage means
19 Communication interface unit
2, 3, 4 Terminals
21, 31, 41 Microphones
22, 32, 42 Video cameras
5 Internet

Claims

1. A voice communication record generation method comprising:
a step of converting voice information into text information by voice recognition, the voice information being read from voice storage means that stores voice information from at least two points together with time information indicating when the voice information was input;
a step of determining time information at which the text information includes at least one topic keyword stored in a topic keyword storage unit, or a synonym thereof, and recording the time information and the topic keyword in a recording unit in association with each other;
a step of determining the time information recorded in the recording unit for a topic keyword, or a synonym thereof, included in input search request information; and
a step of outputting the determined time information, or outputting from the voice storage means the voice information corresponding to a predetermined time section that includes the time information.

2. The voice communication record generation method according to claim 1, further comprising a step of detecting an utterance section and a silent section of the voice information.

3. The voice communication record generation method according to claim 1, wherein a synonym is further searched for using a synonym storage unit that previously stores synonyms for the topic keyword, and the progress position is marked on the text information together with the topic keyword.

4. The voice communication record generation method according to claim 1, wherein conference information is added to the voice information, and other information is linked to the voice information.

5. A voice communication record generation device comprising:
means for converting voice information into text information by voice recognition, the voice information being read from voice storage means that stores voice information from at least two points together with time information indicating when the voice information was input;
means for determining time information at which the text information includes at least one topic keyword stored in topic keyword storage means, or a synonym thereof, and recording the time information and the topic keyword in recording means in association with each other;
means for determining the time information recorded in the recording means for a topic keyword, or a synonym thereof, included in input search request information; and
means for outputting the determined time information, or outputting from the voice storage means the voice information corresponding to a predetermined time section that includes the time information.

6. The voice communication record generation device according to claim 5, further comprising means for detecting an utterance section and a silent section of the voice information.

7. The voice communication record generation device according to claim 5, wherein a synonym is further searched for using a synonym storage unit that previously stores synonyms for the topic keyword, and the progress position is marked on the text information together with the topic keyword.

8. The voice communication record generation device according to claim 5, wherein conference specific information is added to the voice information, and other information is linked to the voice information.

9. A voice communication record generation program for causing a computer to execute:
a step of converting voice information into text information by voice recognition, the voice information being read from voice storage means that stores voice information from at least two points together with time information indicating when the voice information was input;
a step of determining time information at which the text information includes at least one topic keyword stored in a topic keyword storage unit, or a synonym thereof, and recording the time information and the topic keyword in a recording unit in association with each other;
a step of determining the time information recorded in the recording unit for a topic keyword, or a synonym thereof, included in input search request information; and
a step of outputting the determined time information, or outputting from the voice storage means the voice information corresponding to a predetermined time section that includes the time information.

10. The voice communication record generation program according to claim 9, further causing the computer to execute a step of detecting an utterance section and a silent section of the voice information.

11. The voice communication record generation program according to claim 9, further causing the computer to search for a synonym using a synonym storage unit that previously stores synonyms for the topic keyword, and to mark the progress position on the text information together with the topic keyword.

12. The voice communication record generation program according to claim 9, further causing the computer to add conference specific information to the voice information and to link other information to the voice information.