JP2005345508A - Listening support program, listening support device, and listening support method

Info

Publication number: JP2005345508A
Application number: JP2004161622A
Authority: JP
Inventors: Takeshi Mizunashi; 豪水梨
Original assignee: Fuji Xerox Co Ltd
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2004-05-31
Filing date: 2004-05-31
Publication date: 2005-12-15

Abstract

PROBLEM TO BE SOLVED: To notify a conference participant when the content of a conference includes a part of interest to that participant.

SOLUTION: A computer is caused to function as keyword storage means that stores keyword data indicating a keyword in association with registrant data identifying the registrant who registered the keyword data; a data acquisition unit 91 that acquires audio data as a processing target; a speech recognition unit 89 that recognizes, as a character string, the speech represented by the audio data acquired by the data acquisition unit 91; and a notification unit 82 that, when the character string includes a keyword indicated by keyword data stored in the keyword storage means, performs predetermined notification processing for the registrant indicated by the registrant data stored in the keyword storage means in association with that keyword data.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a listening support program, a listening support device, and a listening support method, and in particular to a technique for notifying a conference participant when, for example in a remote conference, the content of the conference includes a part of interest to that participant.

Conventionally, in a remote conference held over a network, the speaker and the listeners are at remote locations. Particularly in multipoint conferences, many people therefore take part in the conference as a secondary task while carrying out their main work.

However, a person who takes part in a remote conference as a secondary task tends to simply leave the conference running in the background, and may miss a part of interest or lose an opportunity to speak.

The present invention has been made in view of the above problem, and one of its objects is to provide a listening support program, a listening support device, and a listening support method that make it possible to notify a conference participant when the content of the conference includes a part of interest to that participant.

To solve the above problem, a listening support program according to the present invention causes a computer to function as: keyword storage means for storing keyword data indicating a keyword in association with registrant data identifying the registrant who registered the keyword data; processing target acquisition means for acquiring audio data as a processing target; speech recognition means for recognizing, as a character string, the speech represented by the audio data acquired by the processing target acquisition means; and notification means for performing, when the character string includes a keyword indicated by keyword data stored by the keyword storage means, predetermined notification processing for the registrant indicated by the registrant data stored by the keyword storage means in association with that keyword data.

This makes it possible to notify a conference participant when the content of the conference includes a part of interest to that participant.
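The arrangement of keyword storage, recognition result, and notification can be illustrated with a minimal Python sketch. It is not the implementation described in this publication: the class name `KeywordStore`, the function `notify`, and the `send` callback are hypothetical, and the speech recognition step is assumed to have already produced the character string `recognized_text`.

```python
from collections import defaultdict


class KeywordStore:
    """Hypothetical keyword storage: maps each keyword to its registrants."""

    def __init__(self):
        self._registrants = defaultdict(set)

    def register(self, keyword, registrant):
        # Store the keyword data in association with the registrant data.
        self._registrants[keyword].add(registrant)

    def registrants_for(self, recognized_text):
        """Return {registrant: [keywords found in the recognized text]}."""
        hits = defaultdict(list)
        for keyword, registrants in self._registrants.items():
            if keyword in recognized_text:
                for registrant in registrants:
                    hits[registrant].append(keyword)
        return dict(hits)


def notify(store, recognized_text, send):
    """Call send(registrant, keywords) for each registrant with a match."""
    for registrant, keywords in store.registrants_for(recognized_text).items():
        send(registrant, keywords)
```

Here `send` stands in for whatever notification processing is configured, such as blinking an icon on the registrant's screen.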

The above listening support program may further cause the computer to function as speaker storage means for storing speaker data indicating a speaker in association with registrant data indicating the registrant who registered the speaker data, and as speaker identification means for identifying the speaker of the speech represented by the audio data acquired by the processing target acquisition means. In this case, when speaker data indicating the speaker identified as the speaker of the speech recognized by the speech recognition means as the character string constituting the keyword is included in the speaker data stored by the speaker storage means, the notification means performs the predetermined notification processing for the registrant indicated by the registrant data stored by the speaker storage means in association with that speaker data.

In this way, the notification processing can be performed with the speaker specified in addition to the keyword.

Alternatively, the above listening support program may further cause the computer to function as speaker identification means for identifying the speaker of the speech represented by the audio data acquired by the processing target acquisition means. In this case, the keyword storage means stores the keyword data, speaker data indicating a speaker, and the registrant data in association with one another, and the notification means performs the predetermined notification processing when speaker data indicating the identified speaker is included in the speaker data stored by the speaker storage means and the character string includes a keyword indicated by keyword data stored by the keyword storage means in association with that speaker data.

In this way as well, the notification processing can be performed with the speaker specified in addition to the keyword.

In the above listening support program, the audio data may be acquired by the processing target acquisition means together with video that includes an image of the speaker of the speech represented by the audio data, and the speaker identification means may identify the speaker of the speech by performing pattern recognition on the image of the speaker included in the video.

In the above listening support program, the speaker identification means may identify the speaker of the speech by performing pattern recognition on the voiceprint of the speech.

The above listening support program may further cause the computer to function as: audio signal storage means for storing the audio acquired by the processing target acquisition means; timing information storage means for storing, when the character string includes a keyword indicated by keyword data stored by the keyword storage means, information indicating the timing at which the speech recognized by the speech recognition means as the character string constituting the keyword occurs within the audio; and audio signal reproduction means for reproducing the audio stored by the audio signal storage means in accordance with the information indicating the occurrence timing.

In this way, past processing target data can be reproduced.

A listening support device according to the present invention includes: keyword storage means for storing keyword data indicating a keyword in association with registrant data identifying the registrant who registered the keyword data; processing target acquisition means for acquiring audio data representing speech as a processing target; speech recognition means for recognizing, as a character string, the speech represented by the audio data acquired by the processing target acquisition means; and notification means for performing, when the character string includes a keyword indicated by keyword data stored by the keyword storage means, predetermined notification processing for the registrant indicated by the registrant data stored by the keyword storage means in association with that keyword data.

A listening support method according to the present invention includes: a keyword storage step of storing keyword data indicating a keyword in association with registrant data identifying the registrant who registered the keyword data; a processing target acquisition step of acquiring audio data representing speech as a processing target; a speech recognition step of recognizing, as a character string, the speech represented by the audio data acquired in the processing target acquisition step; and a notification step of performing, when the character string includes a keyword indicated by keyword data stored in the keyword storage step, predetermined notification processing for the registrant indicated by the registrant data stored in the keyword storage step in association with that keyword data.

An embodiment of the present invention will be described with reference to the drawings.

As shown in FIG. 1, the listening support system 1 according to the present embodiment includes a plurality of communication devices 3 connected to one another via a communication network 2, a server computer 4, a recording device 5, a display 6, a client computer 7, a camera 8, and a microphone 9.

When the listening support system 1 is used as part of a video conference system, for example, the communication network 2 may be a multipoint videophone network, or an IP telephone network, that is, the Internet. When the listening support system 1 is used as part of a videophone system, the communication network 2 may be a telephone network, or an IP telephone network, that is, the Internet. Likewise, when the listening support system 1 is used as part of a telephone system, the communication network 2 may be a telephone network, or an IP telephone network, that is, the Internet.

The communication device 3 is a device that serves as an interface with the communication network 2. Each communication device 3 may be part of one of the other devices described below, or may be an independent device. When the communication network 2 is a telephone network, the communication device 3 may have a dialing function. When the communication network 2 is the Internet, the communication device 3 is, for example, a LAN card or a modem for dial-up connection.

As shown in FIG. 2, the server computer 4 includes a bus 40, a control unit 41, a main storage unit 42, an input/output control unit 43, a communication unit 44, a database 45, and a secondary storage unit 46. The control unit 41, the main storage unit 42, and the input/output control unit 43 are connected to one another via the bus 40 and exchange data. The communication unit 44, the database 45, and the secondary storage unit 46 are connected to the input/output control unit 43 and likewise exchange data. The server computer 4 can also be used as a multipoint video conference server.

The control unit 41 controls each unit of the server computer 4. The main storage unit 42 operates as a work memory for the control unit 41 and holds programs and parameters related to the various processes performed by the control unit 41. The input/output control unit 43 controls each unit connected to it in accordance with instructions from the control unit 41, outputs data input from those units to the control unit 41, and outputs data input from the control unit 41 to those units. The communication unit 44 transmits and receives data to and from the communication network 2. It outputs received data to the input/output control unit 43 and transmits data input from the input/output control unit 43 to the communication network 2. The database 45 may be, for example, a conventionally known relational database. The secondary storage unit 46 is a hard disk or an external storage medium, and stores various data used by the server computer 4 as well as the program according to the present embodiment for operating the control unit 41. The control unit 41 loads the program into the main storage unit 42 and executes it.

When the server computer 4 is used as a multipoint video conference server, the control unit 41 combines the processing target data, that is, the video data and audio data from the plurality of cameras 8 and microphones 9 received by the communication unit 44, and transmits the result to the plurality of displays 6.

The recording device 5 may be a conventionally known recording means such as a video cassette recorder or a DVD recorder. Alternatively, recording may be performed in the secondary storage unit 77 of the client computer 7 described later. The recording device 5 is connected to the communication device 3, can record the audio data and video data received by the communication device 3, and can output the recorded data to the display 6.

The display 6 may be a conventionally known image display means such as a television or a CRT, and generally includes a speaker. The display 6 is connected to the communication device 3, outputs the audio data and video data received by the communication device 3, and can also output recorded data input from the recording device 5.

As shown in FIG. 3, the client computer 7 includes a bus 70, a control unit 71, a main storage unit 72, an input/output control unit 73, a communication unit 74, an operation unit 75, a display unit 76, and a secondary storage unit 77. The control unit 71, the main storage unit 72, and the input/output control unit 73 are connected to one another via the bus 70 and exchange data. The communication unit 74, the operation unit 75, the display unit 76, and the secondary storage unit 77 are connected to the input/output control unit 73 and likewise exchange data.

The control unit 71 controls each unit of the client computer 7. The main storage unit 72 operates as a work memory for the control unit 71 and holds programs and parameters related to the various processes performed by the control unit 71. Processing such as display on the display 6 and changing the volume is also performed. The input/output control unit 73 controls each unit connected to it in accordance with instructions from the control unit 71, outputs data input from those units to the control unit 71, and outputs data input from the control unit 71 to those units. The communication unit 74 transmits and receives data to and from the communication network 2. It outputs received data to the input/output control unit 73 and transmits data input from the input/output control unit 73 to the communication network 2. The operation unit 75 includes, for example, a mouse and a keyboard, accepts input from the user of the client computer 7, and outputs it to the input/output control unit 73. The display unit 76 includes, for example, a liquid crystal display device, and displays information in accordance with signals input from the input/output control unit 73. The display 6 may be the display unit 76. The secondary storage unit 77 is a hard disk or an external storage medium, and stores various data used by the client computer 7 as well as, for example, the program according to the present embodiment for operating the control unit 71. The control unit 71 loads the program into the main storage unit 72 and executes it.

FIG. 4 is a functional block diagram of the listening support system 1 when it is used as a video conference system. As shown in FIG. 4, the listening support system 1 includes a communication start unit 80, a reproduction instruction unit 81, a notification unit 82, a speaker setting unit 83, a keyword setting unit 84, a reproduction operation setting unit 85, a notification setting unit 86, a reproduction unit 87, a matching unit 88, a speech recognition unit 89, a recording medium 90, a data acquisition unit 91, the camera 8, and the microphone 9. All of these units may reside in a single client computer 7, or some of them may reside in the server computer 4. The description below first assumes that all of them reside in a single client computer 7, and then covers the processing when some are moved to the server computer 4.

The communication start unit 80 starts communication between the client computer 7 and the communication network 2, and also starts communication with the communication device 3 of the other party to the video conference. Once communication has started, the data acquisition unit 91 begins acquiring the processing target data, that is, the video data and audio data from the camera 8 and the microphone 9 connected to the communication device of the other party. FIG. 5 shows an example of the video data in this case. The video data includes an upper-body image 20 of the other party to the video conference.

The recording medium 90 is the recording medium used in the recording device 5, and sequentially records the processing target data acquired by the data acquisition unit 91.

The speech recognition unit 89 performs speech recognition on the audio data acquired by the data acquisition unit 91 and obtains it as a character string. Any of the various commercially available speech recognition software products can be used as the speech recognition means. The speech recognition unit 89 then outputs the video data and the audio data obtained as a character string to the matching unit 88.

The matching unit 88 determines whether a keyword indicated by the keyword data set in the keyword setting unit 84, described later, is included in the character string input from the speech recognition unit 89. Specifically, the character string may be divided into words at predetermined time intervals and obtained as a word set, and the matching unit 88 may determine whether the word set contains a keyword. Alternatively, words may be extracted one after another from the sequentially acquired audio data, and the matching unit 88 may determine whether each word is one of the keywords. The matching unit 88 may also store the timing at which the character string constituting the keyword occurs in the audio data. For example, the timing may be stored as the number of seconds elapsed since the start of the video conference, or as a clock time.
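The word-set matching and timing storage described above can be sketched as follows. This is an illustration under stated assumptions rather than the patented implementation: the recognizer is assumed to deliver text in fixed time windows as hypothetical `(start_seconds, text)` pairs, and words are split on whitespace, which would need to be replaced by a morphological analyzer for Japanese speech. The function name `find_keyword_hits` is also an assumption.

```python
def find_keyword_hits(segments, keywords):
    """Return (keyword, start_seconds) for every keyword found in a window.

    segments: iterable of (start_seconds, recognized_text) pairs, one per
              time window, with start_seconds measured from the start of
              the conference (assumed input format).
    keywords: collection of registered keyword strings.
    """
    hits = []
    for start_seconds, text in segments:
        # Build the word set for this time window.
        words = set(text.lower().split())
        for keyword in sorted(keywords):
            if keyword.lower() in words:
                # Remember when the keyword occurred, for later playback.
                hits.append((keyword, start_seconds))
    return hits
```

Storing the offset in seconds corresponds to the first of the two timing formats mentioned in the text; a clock time could be stored in the same position instead.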

The matching unit 88 may further determine whether a speaker indicated by the speaker data set in the speaker setting unit 83, described later, appears in the video data or audio data input from the speech recognition unit 89. In other words, it determines whether the keyword was uttered by a speaker indicated by the speaker data. Specifically, the matching unit 88 may extract the image of the speaker by performing pattern recognition on the video data and judge whether that image can be identified as a speaker indicated by the speaker data learned by a neural network. Alternatively, it may extract the voiceprint of the speaker by performing pattern recognition on the audio data and judge whether that voiceprint can be identified as a speaker indicated by the speaker data learned by a neural network.

In this way, the matching unit 88 can obtain the speaker and the keyword contained in the processing target data. The speaker check and the keyword check may be performed in either order. For example, the matching unit 88 may check whether a keyword is included in the character string only when the speaker has been identified as a speaker indicated by the speaker data, or it may check whether the speaker is a speaker indicated by the speaker data only when a keyword is included in the character string.

The processing in the speaker setting unit 83, the keyword setting unit 84, the reproduction operation setting unit 85, and the notification setting unit 86 will now be described in detail.

The speaker setting unit 83, the keyword setting unit 84, the reproduction operation setting unit 85, and the notification setting unit 86 are functional units with which the user of the client computer 7 sets, through the operation unit 75, the data described below. Specifically, a menu screen such as the one shown in FIG. 6 may be displayed on the display unit 76 so that the user can set each item through a GUI. The four setting units are described below using FIG. 6 as an example.

The speaker setting unit 83 is a functional unit for setting the speaker data to be registered. In FIG. 6, speaker data is set by entering it in the member input field 22. Here, the name of the speaker is used as the speaker data. A plurality of speaker data items may be set.

The keyword setting unit 84 is a functional unit for setting the keyword data to be registered. In FIG. 6, keyword data is set by entering it in the utterance keyword input field 23. Here, the character string used as the keyword is itself used as the keyword data. A plurality of keyword data items may be set.

The notification setting unit 86 sets, based on user input, the method used when the notification unit 82, described later, performs notification processing. In this example the setting is "icon blinking", which specifies that the user is notified by, for example, blinking the icon 21 in FIG. 5. Other options include notifying the user by increasing the volume or by ringing a pager. Several of these notification processes may be selected together.

The reproduction operation setting unit 85 sets, based on user input, the method used when the reproduction unit 87, described later, reproduces the processing target data sequentially recorded on the recording medium 90. The reproduction method can specify the point from which reproduction starts, for example "reproduce from one minute before the timing at which the keyword occurred (the time of the utterance)". It can also specify how the reproduction is displayed, for example "reproduce in a sub-window on the display 6 while the live conference continues to be shown on the display 6 in real time". A setting of "do not reproduce" is of course also possible.
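The "one minute before the keyword" setting amounts to a simple offset calculation on the stored keyword timing. A minimal sketch, assuming the timing is stored as seconds from the start of the conference; the function name `playback_start` and its parameters are hypothetical:

```python
def playback_start(keyword_seconds, lead_seconds=60.0, enabled=True):
    """Return the position (seconds from conference start) to play from.

    keyword_seconds: stored timing of the keyword utterance.
    lead_seconds:    how far before the utterance playback should begin.
    enabled:         False models the "do not reproduce" setting.
    """
    if not enabled:
        return None
    # Clamp to the start of the recording when the keyword occurs early.
    return max(0.0, keyword_seconds - lead_seconds)
```

Clamping at zero is a design choice made for this sketch; the source does not say what happens when the keyword occurs within the first minute.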

The speaker setting unit 83, the keyword setting unit 84, the reproduction operation setting unit 85, and the notification setting unit 86 may each perform setting processing when the icon 26 in FIG. 6 is pressed. Setting processing means storing each item of set data in the main storage unit 72 so that the matching unit 88 can refer to the keyword data and speaker data, the reproduction instruction unit 81 can refer to the reproduction method when the reproduction unit 87 performs reproduction, and the notification unit 82 can refer to the notification method when it performs notification processing. In this case, keywords and speakers may, for example, be stored in association with each other. That is, when a plurality of speakers are set, different keywords may be stored for each speaker. The matching unit 88 can then determine whether a keyword is included in the character string according to which of the plurality of speakers is speaking. In other words, the keywords that trigger a notification can be set for each speaker.
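Per-speaker keywords can be illustrated with a small lookup keyed by speaker. This is a sketch under assumptions: speaker identification is taken as already done, speakers are represented by name as in the text, and the names `matched_keywords` and `keywords_by_speaker` are hypothetical.

```python
def matched_keywords(speaker, recognized_text, keywords_by_speaker):
    """Return the keywords registered for this speaker that occur in the text.

    keywords_by_speaker: dict mapping a speaker name to a list of keywords.
    An empty list means no notification should be issued.
    """
    keywords = keywords_by_speaker.get(speaker)
    if keywords is None:
        # The speaker was not registered, so no keyword can match.
        return []
    return [keyword for keyword in keywords if keyword in recognized_text]
```

For example, with `{"Sato": ["budget"], "Suzuki": ["schedule"]}`, an utterance by Sato containing "schedule" but not "budget" produces no match.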

Next, the processing of the reproduction instruction unit 81 and the reproduction unit 87 will be described. When the matching unit 88 determines that a keyword is included and that the speaker is a speaker set as speaker data, the reproduction instruction unit 81 instructs the reproduction unit 87 to reproduce the processing target data recorded on the recording medium 90 in accordance with the reproduction method set in the reproduction operation setting unit 85. FIG. 7 shows a specific example of this reproduction. As in FIG. 5, FIG. 7 shows an upper-body image 27 of the other party to the video conference, and an icon 28 is additionally displayed. When the user clicks the icon 28, reproduction ends and the display returns to the normal video conference video. During reproduction, the processing target data acquired by the data acquisition unit 91 may continue to be recorded, or the call may be temporarily disconnected during reproduction and reconnected when the icon 28 is clicked.

The notification unit 82 performs predetermined notification processing when the collation unit 88 determines that a keyword is included and that the speaker is a speaker set as speaker data. As described above, the predetermined notification processing is performed using the notification method set in the notification setting unit 86. Through this notification processing, the user can know that the other party to the videophone call has spoken the keyword, and furthermore that the person who spoke it is the person the user has set.

The above description assumes that each unit resides in the client computer 7. Next, the case where part of the processing is performed in the server computer 4 will be described.

When processing is performed in the server computer 4, the data set by the speaker setting unit 83, the keyword setting unit 84, the reproduction operation setting unit 85, and the notification setting unit 86 of each of the plurality of client computers 7 is stored in the database 45. Each item of data is stored in association with registrant data that identifies the registrant of that data, namely the user of the corresponding client computer 7. The registrant data that is stored is chosen to suit the notification processing in the notification unit 82. For example, when the notification processing is blinking an icon or increasing the volume, the registrant data can be the IP address of the client computer 7. When the notification processing is ringing a pager, it can be the address of the pager. Furthermore, when the server computer 4 is used as a multipoint video conference server, it can be data for compositing notification video onto the video sent to each display 6. A specific example of the table stored in the database 45 is shown in FIG. 9. As shown in FIG. 9, the registrant data, keyword data, speaker data, notification method, and reproduction method can be stored in association with one another. When they are stored in this way, it becomes possible to notify by a specific notification method and reproduce by a specific reproduction method when a specific speaker speaks a specific keyword. However, it is not necessary to associate all of these items with one another; they may instead be stored in separate, individual associations.
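As a rough illustration of a table of the kind described for the database 45, the following sketch stores one record per association and looks up the actions for a given speaker and keyword. The field names, the sample rows, and the function `find_actions` are hypothetical; the actual contents of FIG. 9 are not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Registration:
    registrant: str    # e.g. a client IP address or a pager address
    keyword: str
    speaker: str
    notification: str  # notification method
    reproduction: str  # reproduction method


# Hypothetical rows; 192.0.2.x is a documentation-only address range.
TABLE = [
    Registration("192.0.2.10", "budget", "Sato", "blink_icon", "one_minute_before"),
    Registration("192.0.2.11", "budget", "Suzuki", "raise_volume", "one_minute_before"),
    Registration("pager:0001", "patent", "Sato", "ring_pager", "none"),
]


def find_actions(table, speaker, keyword):
    """Return (registrant, notification, reproduction) for every matching row."""
    return [
        (r.registrant, r.notification, r.reproduction)
        for r in table
        if r.speaker == speaker and r.keyword == keyword
    ]


print(find_actions(TABLE, "Sato", "budget"))
```

Because every row carries its own registrant data, two registrants can watch for the same keyword from different speakers and receive different notifications.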

The server computer 4 may also be provided with a data acquisition unit 91 to acquire the processing target data, and the above processing may be performed on that data by a voice recognition unit 89 and a collation unit 88 likewise installed in the server computer 4. The secondary storage unit 46 of the server computer 4 may be used as the recording medium 90. Furthermore, the reproduction instruction unit 81, the reproduction unit 87, and the notification unit 82 may be installed in the server computer 4. In the following description, the voice recognition unit 89, the collation unit 88, the recording medium 90, the reproduction instruction unit 81, the reproduction unit 87, and the notification unit 82 are assumed to be installed in the server computer 4, and the server computer 4 is assumed to be used as a multipoint video conference server.

The voice recognition unit 89 converts the processing target data acquired by the data acquisition unit 91 into a character string as described above. The collation unit 88 then determines whether a keyword stored in the database 45 is included in the character string. If it is, the collation unit 88 may further determine, by the processing described above, whether the speaker indicated by the speaker data stored in the database 45 in association with the registrant data that is stored in association with that keyword appears in the video data or audio data input from the voice recognition unit 89.

In this way, a keyword can be detected and the speaker of the keyword obtained for each item of registrant data. The notification unit 82 then performs notification processing for the registrant data using the notification method stored in association with that registrant data. Similarly, when the reproduction instruction unit 81 receives an instruction from the registrant identified by the registrant data (for example, a click on the icon 21) from the client computer 7 used by that registrant, it composites the video recorded on the recording medium 90 and reproduced by the reproduction unit 87, in accordance with the reproduction method stored in association with the registrant data, onto the video transmitted to the display 6 corresponding to that client computer, and transmits the result.
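The server-side order of checks (keyword first, then speaker, then notification by the registrant's own method) might be sketched as follows. The record layout, the handler table, and the function name `notify_registrants` are assumptions for illustration; real handlers would blink an icon, ring a pager, and so on.

```python
def notify_registrants(records, text, speaker, handlers):
    """Call the notification handler of each registrant whose keyword and speaker match.

    records:  list of dicts with keys "registrant", "keyword", "speaker", "method"
    handlers: dict mapping a method name to a callable taking the registrant data
    """
    notified = []
    for rec in records:
        if rec["keyword"] not in text:
            continue  # keyword not present in the recognized character string
        if rec["speaker"] != speaker:
            continue  # keyword spoken by someone this registrant did not set
        handlers[rec["method"]](rec["registrant"])
        notified.append(rec["registrant"])
    return notified


# Hypothetical usage: handlers here only record what they were asked to do.
events = []
handlers = {
    "blink": lambda who: events.append(("blink", who)),
    "pager": lambda who: events.append(("pager", who)),
}
records = [
    {"registrant": "192.0.2.10", "keyword": "budget", "speaker": "Sato", "method": "blink"},
    {"registrant": "pager:0001", "keyword": "patent", "speaker": "Sato", "method": "pager"},
]
print(notify_registrants(records, "the budget is tight", "Sato", handlers))
```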

By the above, the server computer 4 can be used to detect a keyword for each item of registrant data, obtain the speaker of the keyword, execute notification processing, and transmit the reproduced video.

The above processing will now be described, for the case where it is performed by a single client computer 7, with reference to a flowchart.

FIG. 8 is a flowchart showing an example of the processing performed in the client computer 7.

First, processing target data consisting of audio data and video data is acquired (S100). Here, the processing target data is assumed to be acquired in segments divided by time. The speaker contained in the data is identified (S102), and the audio data is converted into a character string by voice recognition processing (S104). The character string is then divided into words, and each word is stored in association with its speaker and its utterance time (S106). Alternatively, the time-divided processing target data may be stored in association with its start time; even in this case, it is desirable that the speaker be associated with each word. Phrases may also be stored instead of words. Because a phrase has coherent acoustic characteristics of its own, it is considered better suited to voice recognition.
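Step S106 can be pictured with the following minimal sketch, which assumes that the recognized text is already available as a string. Splitting on whitespace is an assumption that suits English text only (Japanese would need a morphological analyzer), and stamping every word with the segment start time is a simplification; `WordRecord` and `store_words` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class WordRecord:
    word: str
    speaker: str
    time: float  # utterance time in seconds from the start of the conference


def store_words(log, text, speaker, start_time):
    """Split recognized text into words and append one record per word (S106)."""
    for word in text.split():
        log.append(WordRecord(word, speaker, start_time))
    return log


log = store_words([], "budget review today", "Sato", 120.0)
print(log)
```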

Next, it is determined whether each of the above words is among the stored keywords (S108). When phrases are stored instead of words as described above, a keyword entered as a word forms only part of a phrase, so it is also necessary to determine whether the keyword is contained in part of the phrase. If it is, the phrase is treated as matching the stored keyword. If a match is found, processing proceeds to S110; otherwise, processing returns to S100 and the next processing target data is acquired.
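The difference between word matching and phrase matching in S108 can be sketched as below. The function name `contains_keyword` and the `phrase_mode` flag are hypothetical; the sketch only contrasts exact membership with substring search.

```python
def contains_keyword(unit, keywords, phrase_mode=False):
    """Decide whether a stored unit matches any registered keyword (S108).

    In word mode the unit must equal a keyword exactly.
    In phrase mode it is enough for a keyword to appear inside the phrase.
    """
    if phrase_mode:
        return any(kw in unit for kw in keywords)
    return unit in keywords


keywords = {"budget", "patent"}
print(contains_keyword("budget", keywords))
print(contains_keyword("next year's budget plan", keywords, phrase_mode=True))
```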

In S110, it is determined whether the speaker stored in association with the word containing the keyword is among the stored speakers. That is, it is determined whether the person who spoke the keyword is a speaker registered by the registrant. If so, processing proceeds to S112; otherwise, processing returns to S100 and the next processing target data is acquired. Since more than one keyword may of course be present, this processing is performed for each keyword, and if at least one keyword was spoken by a stored speaker, processing proceeds to S112.
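The "at least one keyword spoken by a stored speaker" rule of S110 reduces to a single `any` test in the following sketch. The pair layout of `hits` and the name `should_notify` are assumptions for illustration.

```python
def should_notify(hits, registered_speakers):
    """Return True if any keyword hit was spoken by a registered speaker (S110).

    hits: iterable of (keyword, speaker) pairs found in S108
    """
    return any(speaker in registered_speakers for _keyword, speaker in hits)


hits = [("budget", "Tanaka"), ("patent", "Sato")]
print(should_notify(hits, {"Sato"}))
```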

In S112, the predetermined notification processing for the registrant is performed. As described above, this is processing such as blinking an icon or increasing the volume.

Next, it is determined, by monitoring, whether the registrant has requested a replay (S114). That is, it is determined whether reproduction by the reproduction method set in advance by the registrant has been requested. If it has, processing proceeds to S116; if it has not (that is, on timeout), processing returns to S100 and the next processing target data is acquired.

In S116, reproduction starts from the time one minute before the utterance time stored in association with the word that contained the keyword spoken by the speaker. If new processing target data continues to be recorded on the recording medium 90 during this time, it can be read out successively, so that processing target data delayed by one minute is reproduced continuously until the registrant requests that reproduction stop in the next step.
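The start-time calculation in S116 can be sketched as follows. The one-minute lead comes from the description; clamping the result at zero for keywords spoken in the first minute, the `(start, end, data)` segment layout, and the function names are assumptions added for illustration.

```python
def playback_start(utterance_time, lead=60.0):
    """Return the time in seconds at which replay should begin (S116)."""
    # Assumption: never start before the beginning of the recording.
    return max(0.0, utterance_time - lead)


def segments_from(segments, start):
    """Yield the data of every recorded segment that ends after `start`.

    segments: iterable of (start_time, end_time, data) in recording order
    """
    for _seg_start, seg_end, data in segments:
        if seg_end > start:
            yield data


recorded = [(0.0, 60.0, "a"), (60.0, 120.0, "b"), (120.0, 180.0, "c")]
start = playback_start(150.0)
print(start, list(segments_from(recorded, start)))
```

Because `segments_from` is a generator over the recording, segments appended while replay is in progress can be consumed in the same way, which corresponds to the continuously delayed reproduction mentioned above.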

Next, it is monitored and determined whether the end of reproduction has been requested or reproduction has run to the end (S118). If the determination is affirmative, reproduction ends and the normal video conference is resumed (S120). Processing then returns to S100 and the next processing target data is acquired. If the determination is negative, it is repeated until it becomes affirmative or a timeout occurs.

By the above, when the content of a conference includes a part of interest to a participant, that participant can be notified. The part of interest can be identified by a keyword included in a speaker's remarks. The processing can also be varied according to the speaker who spoke the keyword.

Note that the present invention is not limited to the above embodiment.

For example, the present invention can be applied to uses other than video conferencing. That is, it can be applied wherever keywords need to be extracted from speech in real time. It is also possible to identify only the speaker, without extracting keywords, and perform the predetermined notification processing. In this way, for example, the predetermined notification processing can be performed when a specific speaker starts speaking in a video conference.

FIG. 1 is a block diagram showing the configuration of a listening support system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the hardware configuration of a server computer according to the embodiment of the present invention.
FIG. 3 is a block diagram showing the hardware configuration of a client computer according to the embodiment of the present invention.
FIG. 4 is a functional block diagram of the listening support system according to the embodiment of the present invention.
FIG. 5 is a display example on a display according to the embodiment of the present invention.
FIG. 6 is a display example on a display according to the embodiment of the present invention.
FIG. 7 is a display example on a display according to the embodiment of the present invention.
FIG. 8 is a flowchart of processing according to the embodiment of the present invention.
FIG. 9 is a table stored in a database according to the embodiment of the present invention.

Explanation of symbols

1 listening support system, 2 communication network, 3 communication device, 4 server computer, 5 recording device, 6 display, 7 client computer, 8 camera, 9 microphone, 40 bus, 41, 71 control unit, 42, 72 main storage unit, 43, 73 input/output storage unit, 44, 74 communication unit, 45 database, 46, 77 secondary storage unit, 75 operation unit, 76 display unit, 80 communication start unit, 81 reproduction instruction unit, 82 notification unit, 83 speaker setting unit, 84 keyword setting unit, 85 reproduction operation setting unit, 86 notification setting unit, 87 reproduction unit, 88 collation unit, 89 voice recognition unit, 90 recording medium, 91 data acquisition unit.

Claims

Keyword storage means for storing keyword data indicating a keyword and registrant data specifying a registrant who registered the keyword data in association with each other;
Processing target acquisition means for acquiring voice data as a processing target;
Voice recognition means for recognizing a voice represented by the voice data acquired by the processing target acquisition means as a character string; and
Notification means for performing, when the character string includes a keyword indicated by the keyword data stored by the keyword storage means, a predetermined notification process for the registrant indicated by the registrant data stored by the keyword storage means in association with the keyword data;
A listening support program characterized by causing a computer to function as the above means.

In the listening support program according to claim 1,
The computer is further caused to function as:
Speaker storage means for storing speaker data indicating a speaker and registrant data indicating a registrant who registered the speaker data in association with each other; and
Speaker identification means for identifying the speaker of the voice represented by the voice data acquired by the processing target acquisition means,
The notification means performs the predetermined notification process for the registrant indicated by the registrant data stored by the speaker storage means in association with the speaker data, when speaker data indicating the speaker identified as the speaker of the voice recognized by the voice recognition means as the character string that is the keyword is included in the speaker data stored by the speaker storage means.
Listening support program characterized by this.

In the listening support program according to claim 1,
A speaker identification unit that identifies a speaker of the voice represented by the voice data acquired by the processing target acquisition unit;
Further functioning the computer as
The keyword storage means stores the keyword data, speaker data indicating a speaker, and the registrant data in association with each other,
The notification means performs the predetermined notification process when speaker data indicating the identified speaker is included in the stored speaker data and the character string includes the keyword indicated by the keyword data stored in association with that speaker data.
Listening support program characterized by this.

In the listening support program according to claim 2 or 3,
The voice data is acquired by the processing target acquisition means together with video that includes an image of the speaker of the voice represented by the voice data,
The speaker specifying means specifies the speaker of the voice by pattern recognition of the video of the speaker included in the video.
Listening support program characterized by this.

In the listening support program according to claim 2 or 3,
The speaker specifying means specifies the speaker of the voice by pattern recognition of the voice print of the voice;
Listening support program characterized by this.

In the listening support program according to any one of claims 1 to 5,
Audio signal storage means for storing the voice acquired by the processing target acquisition means;
Timing information storage means for storing, when the character string includes the keyword indicated by the keyword data stored by the keyword storage means, information indicating the timing at which the voice recognized by the voice recognition means as the character string that is the keyword occurs within the stored voice; and
Audio signal reproduction means for reproducing the voice stored by the audio signal storage means in accordance with the information indicating the timing;
A listening support program characterized by further causing the computer to function as the above means.

Keyword storage means for storing keyword data indicating a keyword and registrant data for specifying a registrant who registered the keyword data in association with each other;
Processing target acquisition means for acquiring voice data representing voice as a processing target;
Voice recognition means for recognizing a voice represented by the voice data acquired by the processing target acquisition means as a character string;
When the character string includes a keyword indicated by the keyword data stored by the keyword storage means, a predetermined notification to the registrant indicated by the registrant data stored by the keyword storage means in association with the keyword data Notification means for processing;
A listening support apparatus comprising the above means.

A keyword storage step for storing keyword data indicating a keyword and registrant data for identifying a registrant who has registered the keyword data in association with each other;
A processing target acquisition step for acquiring voice data representing a voice as a processing target;
A voice recognition step for recognizing the voice represented by the voice data acquired in the processing target acquisition step as a character string;
When the character string includes a keyword indicated by the keyword data stored in the keyword storage step, a predetermined notification to the registrant indicated by the registrant data stored in the keyword storage step in association with the keyword data A notification step for processing;
A listening support method characterized by including the above steps.