JP7471979B2 - Meeting Support System

Info

Publication number: JP7471979B2
Application number: JP2020164423A
Authority: JP
Inventors: 崇資山内; 直亮住田; 雅樹中塚; 雄一吉田; 一博中臺; 一也眞浦; 恭佑日根野; 昇三横尾
Original assignee: Honda Motor Co Ltd; Honda Sun Co Ltd
Current assignee: Honda Motor Co Ltd; Honda Sun Co Ltd
Priority date: 2020-09-30
Filing date: 2020-09-30
Publication date: 2024-04-22
Anticipated expiration: 2040-09-30
Also published as: JP2022056593A

Description

本発明は、会議支援システム、会議支援方法およびプログラムに関する。 The present invention relates to a conference support system, a conference support method, and a program.

健聴者と聴覚障がい者とが一緒に会議に参加する会議において、各発表者の発話内容と聴覚障がい者が端末を操作して入力したテキスト化して、テキストを表示装置と各利用者が所有する端末に表示することが提案されている（例えば特許文献１参照）。特許文献１に記載の技術では、聴覚障がい者がテキストによって会話に参加した場合に、他の会議参加者の発話を待機させるように親機を制御し、発言を待機させる情報を聴覚障がい者が使用する子機に送信するようにしている。 For conferences in which hearing people and hearing-impaired people participate together, it has been proposed to convert each presenter's speech into text and to display that text, together with text that a hearing-impaired person enters by operating a terminal, on a display device and on the terminal owned by each user (see, for example, Patent Document 1). In the technology described in Patent Document 1, when a hearing-impaired person joins the conversation by text, the parent unit is controlled so that the other conference participants hold their speech, and information indicating that speech is on hold is sent to the child unit used by the hearing-impaired person.

特開２０１９－１７９４８０号公報JP 2019-179480 A

しかしながら、従来技術では、健聴者が発表者として聴衆の方を向いて発言しているときには、表示装置の画面を確認することなく話してしまうことがあった。 However, with conventional technology, when a hearing person is presenting and facing the audience, they may speak without checking the display screen.

本発明は、上記の問題点に鑑みてなされたものであって、発表者および聴衆に対して分かりやすくテキスト入力が行われていることを気づかせることができる会議支援システム、会議支援方法およびプログラムを提供することを目的とする。 The present invention has been made in consideration of the above problems, and aims to provide a conference support system, a conference support method, and a program that can make the presenter and the audience aware, in an easy-to-understand way, that text input is taking place.

（１）上記目的を達成するため、本発明の一態様に係る会議支援システムは、テキスト入力部を備えた端末と、画像を表示する表示装置と、前記端末と接続し、前記端末から入力されたテキスト情報および収音部から入力された音声信号をテキスト化した情報を前記表示装置に表示させる会議支援装置と、を備え、前記会議支援装置は、前記端末によってテキスト入力が開始された際、前記表示装置の全体の表示を所定時間変更する。 (1) In order to achieve the above object, a conference support system according to one aspect of the present invention includes a terminal having a text input unit, a display device that displays images, and a conference support device that is connected to the terminal and causes the display device to display text information input from the terminal and information that is a text version of an audio signal input from an audio pickup unit, and when text input is started by the terminal, the conference support device changes the overall display of the display device for a predetermined period of time.

（２）また、本発明の一態様に係る会議支援システムにおいて、前記会議支援装置は、前記端末によってテキスト入力が開始された際、前記表示装置の全体の表示の輝度を変更する、表示色を変更する、彩度を変更する、およびコントラストを変更するうちのすくなくとも１つによって、前記表示装置の全体の表示を所定時間変更する、ようにしてもよい。 (2) In addition, in a conference support system according to one aspect of the present invention, when text input is started by the terminal, the conference support device may change the overall display of the display device for a predetermined period of time by at least one of changing the brightness of the overall display of the display device, changing the display color, changing the saturation, and changing the contrast.

（３）また、本発明の一態様に係る会議支援システムにおいて、前記会議支援装置は、前記音声信号に対して発話区間を検出し、前記表示装置の全体の表示を所定時間変更した後も発話が継続している場合、再度、前記表示装置の全体の表示を、所定時間変更する、ようにしてもよい。 (3) In addition, in a conference support system according to one aspect of the present invention, the conference support device may detect a speech section in the audio signal, and if speech continues after changing the overall display of the display device for a predetermined period of time, change the overall display of the display device again for a predetermined period of time.

（４）上記目的を達成するため、本発明の一態様に係る会議支援方法は、テキスト入力部を備えた端末と、画像を表示する表示装置と、前記端末と接続し、前記端末から入力されたテキスト情報および収音部から入力された音声をテキスト化した情報を前記表示装置に表示させる会議支援装置と、を有する会議支援システムにおける会議支援方法であって、前記会議支援装置が、前記端末によってテキスト入力が開始された際、前記表示装置の全体の表示を所定時間変更する。 (4) In order to achieve the above object, a conference support method according to one aspect of the present invention is a conference support method in a conference support system having a terminal equipped with a text input unit, a display device that displays images, and a conference support device that is connected to the terminal and causes the display device to display text information input from the terminal and information that has been converted from audio input from an audio pickup unit into text, and when text input is started by the terminal, the conference support device changes the overall display of the display device for a predetermined period of time.

（５）上記目的を達成するため、本発明の一態様に係るプログラムは、テキスト入力部を備えた端末と、画像を表示する表示装置と、前記端末と接続し、前記端末から入力されたテキスト情報および収音部から入力された音声をテキスト化した情報を前記表示装置に表示させる会議支援装置と、を有する会議支援システムにおける会議支援装置のコンピュータに、前記端末によってテキスト入力が開始された際、前記表示装置の全体の表示を所定時間変更させる。 (5) In order to achieve the above object, a program according to one aspect of the present invention is for a conference support system having a terminal equipped with a text input unit, a display device that displays images, and a conference support device that is connected to the terminal and causes the display device to display text information input from the terminal and information that has been converted from audio input from an audio pickup unit into text. The program causes a computer of the conference support device in the conference support system to change the overall display of the display device for a predetermined period of time when text input is started by the terminal.

（１）～（５）によれば、発表者および聴衆に対して分かりやすくテキスト入力が行われていることを気づかせることができる。 According to (1) to (5), the presenter and the audience can be made aware, in an easy-to-understand way, that text input is taking place.
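
The timing behavior described in aspects (1) and (3) can be illustrated with a minimal sketch. It is not the patented implementation: the class name `DisplayChangeController`, its method names, and the use of seconds as the time unit are hypothetical, and the caller is assumed to supply the current time and the result of speech section detection.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DisplayChangeController:
    """Tracks whether the whole-screen display change is currently active.

    Hypothetical helper: the name and interface are assumptions for illustration.
    """

    change_duration_s: float  # the "predetermined period of time"; the value is an assumption
    _change_until: Optional[float] = None

    def on_text_input_started(self, now_s: float) -> None:
        # Aspect (1): when text input starts, change the overall display
        # for a predetermined period of time.
        self._change_until = now_s + self.change_duration_s

    def update(self, now_s: float, speech_active: bool) -> bool:
        """Return True while the overall display should be in its changed state."""
        if self._change_until is None:
            return False
        if now_s < self._change_until:
            return True
        if speech_active:
            # Aspect (3): speech is still continuing after the period ended,
            # so change the overall display again for the same period.
            self._change_until = now_s + self.change_duration_s
            return True
        self._change_until = None
        return False
```

In this sketch the display returns to normal only once a change period has elapsed and no speech is detected, which matches the intent that a speaker who keeps talking is notified again.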

実施形態に係る発話が不自由な人が端末に入力を行っていない場合を示す図である。 FIG. 1 is a diagram illustrating a case where a person with a speech impediment is not making an input to a terminal according to the embodiment.
実施形態に係る発話が不自由な人が端末に入力を行った場合を示す図である。 FIG. 2 is a diagram illustrating a case where a person with a speech impediment makes an input to a terminal according to the embodiment.
実施形態に係る会議支援システムの構成例を示すブロック図である。 FIG. 3 is a block diagram showing an example of the configuration of a conference support system according to the embodiment.
実施形態に係る端末に表示させる画像例を示す図である。 FIG. 4 is a diagram showing an example of an image displayed on a terminal according to the embodiment.
実施形態に係る第１表示装置に表示される画像例を示す図である。 FIG. 5 is a diagram showing an example of an image displayed on a first display device according to the embodiment.
実施形態に係るテキスト入力が開始された際の第１表示装置に表示される画像の変更例を示す図である。 FIG. 6 is a diagram showing an example of a change to the image displayed on the first display device when text input is started according to the embodiment.
実施形態に係る第１表示装置の表示変更期間の例を示す図である。 FIG. 7 is a diagram showing an example of a display change period of the first display device according to the embodiment.
実施形態に係る会議支援装置の処理手順のフローチャートである。 FIG. 8 is a flowchart of a processing procedure of the conference support device according to the embodiment.
会議支援システムの変形例の構成例を示すブロック図である。 FIG. 9 is a block diagram showing a configuration example of a modified example of the conference support system.

以下、本発明の実施の形態について図面を参照しながら説明する。なお、以下の説明に用いる図面では、各部材を認識可能な大きさとするため、各部材の縮尺を適宜変更している。 The following describes an embodiment of the present invention with reference to the drawings. Note that in the drawings used in the following description, the scale of each component has been appropriately altered so that each component is of a recognizable size.

［会議支援システムの概要、本実施形態の概要］
まず、会議支援システムの概要、本実施形態の概要を説明する。
図１は、本実施形態に係る発話が不自由な人が端末に入力を行っていない場合を示す図である。
本実施形態の会議支援システムは、例えば２人以上が参加して行われる会議で用いられる。参加者ＵＳ１～ＵＳ３のうち、発話が不自由な人（例えば聴覚障がい者）ＵＳ２が会議に参加していてもよい。なお、参加者は、全員が同じ会議室にいなくてもよく、例えばネットワークＮＷを介して、別の会議室や自宅等から参加してもよい。 [Overview of the conference support system, overview of this embodiment]
First, an overview of the conference support system and the present embodiment will be described.
FIG. 1 is a diagram showing a case in which a person with a speech disability according to the present embodiment is not making any input to a terminal.
The conference support system of this embodiment is used in a conference with, for example, two or more participants. Among the participants US1 to US3, a person with a speech impediment (e.g., a person with a hearing impairment) US2 may also participate in the conference. Note that the participants do not all need to be in the same conference room, and may participate from another conference room or from home, for example, via a network NW.

発話可能な参加者ＵＳ１、ＵＳ３は、参加者毎に収音部１１を装着する。また、発話が不自由な人ＵＳ２は、端末（スマートフォン、タブレット端末、パーソナルコンピュータ等）２０を所持している。会議支援装置３０は、参加者の発話した音声信号に対して音声認識を行いテキスト化して、第１表示装置６０（表示装置）と端末２０にテキストを表示させる。また、第２表示装置７０には、説明用の資料を表示させるＰＣ（パーソナルコンピュータ等）８０が接続されている。参加者ＵＳ２は、例えばテーブルＴｂに置いて端末２を利用する。 Each of the participants US1 and US3 who are able to speak wears a sound collection unit 11. The person with a speech impediment US2 carries a terminal (smartphone, tablet terminal, personal computer, etc.) 20. The conference support device 30 performs voice recognition on the voice signals spoken by the participants, converts them into text, and displays the text on the first display device 60 (display device) and the terminal 20. A PC (personal computer, etc.) 80 that displays explanatory materials is connected to the second display device 70. Participant US2 uses the terminal 20, for example, by placing it on a table Tb.

図２は、本実施形態に係る発話が不自由な人が端末に入力を行った場合を示す図である。
図２のように、発話が不自由な人ＵＳ２が、端末２０を使ってテキスト入力を開始すると、第１表示装置６０の画面の輝度または色が第１所定時間（所定時間）変化する。
これにより、本実施形態よれば、発話している参加者ＵＳ１に、端末２０によって入力が行われていることを気づかせることができる。参加者ＵＳ１は、例えば、現在発話しているフレーズの発話が完了した時点で、端末２０によって入力が終了するまで発話を中断する。これにより、本実施形態によれば、端末２０による入力を待機することができる。 FIG. 2 is a diagram showing a case in which a person with a speech disability inputs information into a terminal according to this embodiment.
As shown in FIG. 2, when the speech-impaired person US2 starts inputting text using the terminal 20, the brightness or color of the screen of the first display device 60 changes for a first predetermined time (predetermined period of time).
As a result, according to this embodiment, the speaking participant US1 can be made aware that input is being made on the terminal 20. For example, participant US1 finishes the phrase currently being spoken and then pauses until the input on the terminal 20 is complete. In this way, according to this embodiment, the participants can wait for the input from the terminal 20.

［会議支援システムの構成例］
次に、会議支援システムの構成例を説明する。
図３は、本実施形態に係る会議支援システム１の構成例を示すブロック図である。図３に示すように、会議支援システム１は、収音装置１０、端末２０、会議支援装置３０、音響モデル・辞書ＤＢ４０、議事録・音声ログ記憶部５０、第１表示装置６０、第２表示装置７０、およびＰＣ８０を備える。また、端末２０は、端末２０－１、端末２０－２、・・・を備える。以下、端末２０－１、端末２０－２のうち１つを特定しない場合は、「端末２０」という。 [Example of a conference support system configuration]
Next, an example of the configuration of a conference support system will be described.
FIG. 3 is a block diagram showing an example of the configuration of a conference support system 1 according to this embodiment. As shown in FIG. 3, the conference support system 1 includes a sound collection device 10, a terminal 20, a conference support device 30, an acoustic model/dictionary DB 40, a minutes/audio log storage unit 50, a first display device 60, a second display device 70, and a PC 80. The terminal 20 includes terminals 20-1, 20-2, and so on. Hereinafter, when the terminals 20-1, 20-2, and so on need not be distinguished, each is referred to as "terminal 20".

収音装置１０は、収音部１１－１、収音部１１－２、収音部１１－３、・・・を備える。以下、収音部１１－１、収音部１１－２、収音部１１－３、・・・のうち１つを特定しない場合は、「収音部１１」という。 The sound collection device 10 includes a sound collection unit 11-1, a sound collection unit 11-2, a sound collection unit 11-3, etc. Hereinafter, when one of the sound collection units 11-1, 11-2, 11-3, etc. is not specified, it will be referred to as "sound collection unit 11".

端末２０は、入力部２０１、処理部２０２、表示部２０３、および通信部２０４を備える。 The terminal 20 includes an input unit 201, a processing unit 202, a display unit 203, and a communication unit 204.

会議支援装置３０は、取得部３０１、音声認識部３０２、テキスト変換部３０３（音声認識部）、係り受け解析部３０４、議事録作成部３０６、通信部３０７、認証部３０８、操作部３０９、および処理部３１０を備える。 The conference support device 30 includes an acquisition unit 301, a voice recognition unit 302, a text conversion unit 303 (voice recognition unit), a dependency analysis unit 304, a minutes creation unit 306, a communication unit 307, an authentication unit 308, an operation unit 309, and a processing unit 310.

収音装置１０と会議支援装置３０とは、有線または無線によって接続されている。端末２０と会議支援装置３０とは、有線または無線のネットワークＮＷによって接続されている。 The sound collection device 10 and the conference support device 30 are connected by wire or wirelessly. The terminal 20 and the conference support device 30 are connected by a wired or wireless network NW.

［収音装置］
収音装置１０は、利用者が発話した音声信号を収音し、収音した音声信号を会議支援装置３０に出力する。なお、収音装置１０は、１つのマイクロフォンアレイであってもよい。この場合、収音装置１０は、それぞれ異なる位置に配置されたＰ個のマイクロフォンを有する。そして、収音装置１０は、収音した音からＰチャネル（Ｐは、２以上の整数）の音声信号を生成し、生成したＰチャネルの音声信号を会議支援装置３０に出力する。 [Sound collection device]
The sound collection device 10 collects a voice signal uttered by a user and outputs the collected voice signal to the conference support device 30. The sound collection device 10 may be a single microphone array. In this case, the sound collection device 10 has P microphones arranged at different positions. The sound collection device 10 generates P channel voice signals (P is an integer equal to or greater than 2) from the collected sound and outputs the generated P channel voice signals to the conference support device 30.

収音部１１は、マイクロフォンである。収音部１１は、利用者の音声信号を収音し、収音した音声信号をアナログ信号からデジタル信号に変換して、デジタル信号に変換した音声信号を会議支援装置３０に出力する。なお、収音部１１は、アナログ信号の音声信号を会議支援装置３０に出力するようにしてもよい。 The sound collection unit 11 is a microphone. The sound collection unit 11 collects the user's voice signal, converts the collected voice signal from an analog signal to a digital signal, and outputs the converted digital voice signal to the conference support device 30. Note that the sound collection unit 11 may also be configured to output the analog voice signal to the conference support device 30.

［端末］
端末２０は、例えばスマートフォン、タブレット端末、パーソナルコンピュータ等である。端末２０は、音声出力部、モーションセンサー、ＧＰＳ（ＧｌｏｂａｌＰｏｓｉｔｉｏｎｉｎｇＳｙｓｔｅｍ；全地球測位システム）等を備えていてもよい。 [Terminal]
The terminal 20 is, for example, a smartphone, a tablet terminal, a personal computer, etc. The terminal 20 may include an audio output unit, a motion sensor, a Global Positioning System (GPS), etc.

入力部２０１は、例えば表示部２０３上に設けられたタッチパネル式のセンサー（含むタッチパネル用のペンシル）、またはキーボードである。入力部２０１は、利用者が入力した入力を検出し、検出した結果を処理部２０２に出力する。なお、端末２０の利用者は、テキストを、例えばタッチパネル用のペンシル等で手書きで入力するか、表示部２０３に表示されるソフトウェアキーボードを操作して入力するか、機械式のキーボードを操作して入力する。 The input unit 201 is, for example, a touch panel sensor provided on the display unit 203 (including a touch panel pencil), or a keyboard. The input unit 201 detects input entered by the user and outputs the detected result to the processing unit 202. Note that the user of the terminal 20 inputs text by handwriting it, for example, with a touch panel pencil, by operating a software keyboard displayed on the display unit 203, or by operating a mechanical keyboard.

処理部２０２は、入力部２０１が出力した結果に基づいて、テキスト入力が開始されたことを検出し、テキストが入力開始されたことを示すテキスト入力開始情報を生成し、生成したテキスト入力開始情報を通信部２０４に出力する。処理部２０２は、テキストが入力開始されたことを、例えば、文字が例えば１文字手書きされたことを検出したとき、またはキーボードで文字が一文字入力されたときに検出するようにしてもよい。なお、処理部２０２は、手書きでテキスト入力された場合、手書き文字を周知の手法で認識してテキスト化する。
処理部２０２は、入力部２０１が出力した出力した結果に応じて送信情報を生成し、生成した送信情報を通信部２０４に出力する。送信情報には、入力されたテキスト情報と端末２０を識別するための識別情報が含まれている。
処理部２０２は、通信部２０４が出力するテキスト情報を取得し、取得したテキスト情報を画像データに変換し、変換した画像データを表示部２０３に出力する。なお、表示部２０３上に表示される画像については、図４を用いて後述する。 Processing unit 202 detects that text input has started based on the result output by input unit 201, generates text input start information indicating that text input has started, and outputs the generated text input start information to communication unit 204. Processing unit 202 may detect that text input has started, for example, when it detects that one character has been handwritten or when one character has been input on a keyboard. When text has been input by handwriting, processing unit 202 recognizes the handwritten character using a known method and converts it into text.
The processing unit 202 generates transmission information in accordance with the result output by the input unit 201, and outputs the generated transmission information to the communication unit 204. The transmission information includes the input text information and identification information for identifying the terminal 20.
The processing unit 202 acquires text information output by the communication unit 204, converts the acquired text information into image data, and outputs the converted image data to the display unit 203. Note that the image displayed on the display unit 203 will be described later with reference to FIG. 4.
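
As a minimal sketch of the terminal-side behavior described above, the following assumes that text input start is detected when the first character is entered and that messages are queued for the communication unit as dictionaries. The class name `TerminalProcessor`, the `outbox` list, and the message field names are hypothetical and do not appear in the specification.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TerminalProcessor:
    """Hypothetical stand-in for the processing unit 202 of a terminal 20."""

    terminal_id: str  # identification information of the terminal (format is an assumption)
    _buffer: str = ""
    outbox: List[Dict[str, str]] = field(default_factory=list)  # messages for the communication unit

    def on_character(self, ch: str) -> None:
        # The first character of a new message marks the start of text input,
        # so text input start information is generated exactly once per message.
        if not self._buffer:
            self.outbox.append(
                {"type": "text_input_start", "terminal_id": self.terminal_id}
            )
        self._buffer += ch

    def on_send(self) -> None:
        # Transmission information carries the entered text together with
        # the identification information of the terminal.
        if not self._buffer:
            return
        self.outbox.append(
            {"type": "text", "terminal_id": self.terminal_id, "text": self._buffer}
        )
        self._buffer = ""
```

Handwriting recognition is left out of the sketch; a handwritten character would be converted to text by a known method before being passed to `on_character`.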

表示部２０３は、処理部２０２が出力した画像データを表示する。表示部２０３は、例えば、液晶表示装置、有機ＥＬ（エレクトロルミネッセンス）表示装置、電子インク表示装置等である。 The display unit 203 displays the image data output by the processing unit 202. The display unit 203 is, for example, a liquid crystal display device, an organic EL (electroluminescence) display device, an electronic ink display device, etc.

通信部２０４は、テキスト情報または議事録の情報を会議支援装置３０から受信し、受信した受信情報を処理部２０２に出力する。通信部２０４は、処理部２０２が出力したテキスト入力開始情報、送信情報を会議支援装置３０に送信する。 The communication unit 204 receives text information or minutes information from the conference supporting device 30 and outputs the received information to the processing unit 202. The communication unit 204 transmits the text input start information and transmission information output by the processing unit 202 to the conference supporting device 30.

［音響モデル・辞書ＤＢ、議事録・音声ログ記憶部］
音響モデル・辞書ＤＢ４０には、例えば音響モデル、言語モデル、単語辞書等が格納されている。音響モデルとは、音の特徴量に基づくモデルであり、言語モデルとは、単語とその並び方の情報のモデルである。また、単語辞書とは、多数の語彙による辞書であり、例えば大語彙単語辞書である。なお、会議支援装置３０は、音声認識辞書１３に格納されていない単語等を、音響モデル・辞書ＤＢ４０に格納して更新するようにしてもよい。 [Acoustic model/dictionary DB, minutes/speech log storage unit]
The acoustic model/dictionary DB 40 stores, for example, an acoustic model, a language model, a word dictionary, etc. An acoustic model is a model based on sound features, and a language model is a model of information on words and their arrangement. A word dictionary is a dictionary with a large vocabulary, for example a large vocabulary word dictionary. The conference support device 30 may store words and the like that are not stored in the speech recognition dictionary 13 in the acoustic model/dictionary DB 40 for updating.

議事録・音声ログ記憶部５０は、議事録（含む音声信号）を記憶する。 The minutes/audio log storage unit 50 stores minutes (including audio signals).

第１表示装置６０は、会議支援装置３０が出力した画像データを表示する。第１表示装置６０は、例えば液晶表示装置、有機ＥＬ（エレクトロルミネッセンス）表示装置、電子インク表示装置等である。なお、第１表示装置６０は、会議支援装置３０が備えていてもよい。 The first display device 60 displays the image data output by the conference support device 30. The first display device 60 is, for example, a liquid crystal display device, an organic EL (electroluminescence) display device, an electronic ink display device, etc. The first display device 60 may be provided in the conference support device 30.

第２表示装置７０は、ＰＣ８０が出力する画像データを表示する。第２表示装置７０は、例えば液晶表示装置、有機ＥＬ（エレクトロルミネッセンス）表示装置、電子インク表示装置等である。なお、第２表示装置７０は、ＰＣ８０が備えていてもよい。 The second display device 70 displays image data output by the PC 80. The second display device 70 is, for example, a liquid crystal display device, an organic EL (electroluminescence) display device, an electronic ink display device, or the like. The second display device 70 may be provided in the PC 80.

ＰＣ８０は、例えば、パーソナルコンピュータ、スマートフォン、タブレット端末等のうちのいずれか１つである。 The PC 80 is, for example, one of a personal computer, a smartphone, a tablet terminal, etc.

［会議支援装置］
会議支援装置３０は、例えばパーソナルコンピュータ、サーバ、スマートフォン、タブレット端末等のうちのいずれかである。なお、会議支援装置３０は、収音装置１０がマイクロフォンアレイの場合、音源定位部、音源分離部、および音源同定部をさらに備える。会議支援装置３０は、参加者によって発話された音声信号を、例えば所定の期間毎に音声認識してテキスト化する。そして、会議支援装置３０は、テキスト化した発話内容のテキスト情報を、参加者の端末２０それぞれに送信する。 [Conference Support Device]
The conference support device 30 is, for example, any one of a personal computer, a server, a smartphone, a tablet terminal, etc. When the sound collection device 10 is a microphone array, the conference support device 30 further includes a sound source localization unit, a sound source separation unit, and a sound source identification unit. The conference support device 30 converts the voice signals uttered by the participants into text by performing voice recognition, for example, at predetermined intervals. Then, the conference support device 30 transmits the text information of the converted utterances to each of the terminals 20 of the participants.

取得部３０１は、収音部１１が出力する音声信号を取得し、取得した音声信号を音声認識部３０２に出力する。なお、取得した音声信号がアナログ信号の場合、取得部３０１は、アナログ信号をデジタル信号に変換し、デジタル信号に変換した音声信号を音声認識部３０２に出力する。 The acquisition unit 301 acquires the audio signal output by the sound collection unit 11, and outputs the acquired audio signal to the voice recognition unit 302. If the acquired audio signal is an analog signal, the acquisition unit 301 converts the analog signal into a digital signal, and outputs the converted digital audio signal to the voice recognition unit 302.

音声認識部３０２は、収音部１１が複数の場合、収音部１１を使用する話者毎に音声認識を行う。
音声認識部３０２は、取得部３０１が出力する音声信号を取得する。音声認識部３０２は、取得部３０１が出力した音声信号から発話区間の音声信号を検出する。発話区間の検出は、例えば所定のしきい値以上の音声信号を発話区間として検出してもよく、収音部１１のオン状態とオフ状態を検出してもよい。なお、音声認識部３０２は、発話区間の検出を周知の他の手法を用いて行ってもよい。音声認識部３０２は、検出した発話区間の音声信号に対して、音響モデル・辞書ＤＢ４０を参照して、周知の手法を用いて音声認識を行う。なお、音声認識部３０２は、例えば特開２０１５－６４５５４号公報に開示されている手法等を用いて音声認識を行う。音声認識部３０２は、認識した認識結果と音声信号をテキスト変換部３０３に出力する。なお、音声認識部３０２は、認識結果と音声信号とを、例えば１文毎、または発話句間毎、または話者毎に対応つけて出力する。 When there are multiple sound collection units 11, the voice recognition unit 302 performs voice recognition for each speaker who uses the sound collection unit 11.
The voice recognition unit 302 acquires the voice signal output by the acquisition unit 301 and detects the voice signal of each speech section in it. A speech section may be detected, for example, as a span in which the voice signal is equal to or greater than a predetermined threshold value, or from the on and off states of the sound collection unit 11. The voice recognition unit 302 may also detect speech sections using other known methods. The voice recognition unit 302 performs voice recognition on the voice signal of each detected speech section using a known method, referring to the acoustic model/dictionary DB 40; for example, it uses the method disclosed in Japanese Patent Application Laid-Open No. 2015-64554. The voice recognition unit 302 outputs the recognition result and the voice signal to the text conversion unit 303, associating them with each other, for example, for each sentence, for each speech phrase, or for each speaker.
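
Threshold-based detection of speech sections can be sketched as follows. The specification only states that a signal at or above a predetermined threshold may be treated as a speech section; the frame-wise mean absolute amplitude used here, the function name `detect_speech_sections`, and its parameters are assumptions made for illustration.

```python
from typing import List, Optional, Sequence, Tuple


def detect_speech_sections(
    samples: Sequence[float], frame_len: int, threshold: float
) -> List[Tuple[int, int]]:
    """Return (start, end) sample indices (end exclusive) of detected speech sections.

    A frame counts as speech when its mean absolute amplitude is at or above
    the threshold. Trailing samples that do not fill a frame are ignored.
    """
    sections: List[Tuple[int, int]] = []
    start: Optional[int] = None
    n_frames = len(samples) // frame_len
    for i in range(n_frames):
        frame = samples[i * frame_len:(i + 1) * frame_len]
        level = sum(abs(x) for x in frame) / frame_len
        if level >= threshold:
            if start is None:
                start = i * frame_len  # a speech section begins at this frame
        elif start is not None:
            sections.append((start, i * frame_len))  # the section ended before this frame
            start = None
    if start is not None:
        sections.append((start, n_frames * frame_len))  # speech ran to the end of the signal
    return sections
```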

テキスト変換部３０３は、音声認識部３０２が出力した認識結果に基づいて、テキストに変換する。テキスト変換部３０３は、変換したテキスト情報と音声信号を係り受け解析部３０４に出力する。なお、テキスト変換部３０３は、「あー」、「えーと」、「えー」、「まあ」等の間投詞を削除してテキストに変換するようにしてもよい。 The text conversion unit 303 converts the speech into text based on the recognition result output by the speech recognition unit 302. The text conversion unit 303 outputs the converted text information and the speech signal to the dependency analysis unit 304. Note that the text conversion unit 303 may convert the speech into text by deleting interjections such as "ah," "um," "eh," and "well."
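
Interjection removal can be sketched as a simple filter. The sketch assumes the recognition result is already segmented into tokens; the function name `remove_interjections` and the fixed set of fillers (taken from the examples above) are illustrative only.

```python
from typing import Iterable, List

# Fillers listed as examples in the description; a real list would likely be longer.
INTERJECTIONS = frozenset({"あー", "えーと", "えー", "まあ"})


def remove_interjections(tokens: Iterable[str]) -> List[str]:
    """Drop tokens that exactly match a known interjection, keeping the order."""
    return [t for t in tokens if t not in INTERJECTIONS]
```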

係り受け解析部３０４は、テキスト変換部３０３が出力したテキスト情報に対して形態素解析と係り受け解析を行う。係り受け解析には、例えば、Ｓｈｉｆｔ－ｒｅｄｕｃｅ法や全域木の手法やチャンク同定の段階適用手法においてＳＶＭ（ＳｕｐｐｏｒｔＶｅｃｔｏｒＭａｃｈｉｎｅｓ）を用いる。係り受け解析部３０４は、解析した結果に基づいて、修正が必要な場合は、音声認識したテキスト情報を音響モデル・辞書ＤＢ４０を参照して修正して議事録・音声ログ記憶部５０に記憶させる。係り受け解析部３０４は、係り受け解析した結果のテキスト情報と音声信号を議事録作成部３０６に出力する。 The dependency analysis unit 304 performs morphological analysis and dependency analysis on the text information output by the text conversion unit 303. For dependency analysis, for example, SVM (Support Vector Machines) is used in the shift-reduce method, the spanning tree method, or the stepwise application method of chunk identification. If correction is required based on the analysis results, the dependency analysis unit 304 corrects the voice-recognized text information by referring to the acoustic model and dictionary DB 40 and stores it in the minutes and voice log storage unit 50. The dependency analysis unit 304 outputs the text information and voice signal resulting from the dependency analysis to the minutes creation unit 306.

議事録作成部３０６は、係り受け解析部３０４が出力したテキスト情報と音声信号に基づいて、発表者等である発話者毎に分けて、議事録を作成する。議事録作成部３０６は、作成した議事録と対応する音声信号を議事録・音声ログ記憶部５０に記憶させる。なお、議事録作成部３０６は、「あー」、「えーと」、「えー」、「まあ」等の間投詞を削除して議事録を作成するようにしてもよい。 The minutes creation unit 306 creates minutes, dividing them into groups for each speaker, such as a presenter, based on the text information and audio signals output by the dependency analysis unit 304. The minutes creation unit 306 stores the created minutes and the audio signals corresponding to them in the minutes/audio log storage unit 50. The minutes creation unit 306 may also create the minutes by deleting interjections such as "ah," "um," "eh," and "well."
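
Grouping utterances by speaker when creating minutes might look like the following sketch. The tuple layout (speaker, time, text) and the function name `build_minutes` are hypothetical; the specification does not define a data format for the minutes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Utterance = Tuple[str, str, str]  # (speaker, time, text); this layout is an assumption


def build_minutes(utterances: Iterable[Utterance]) -> Dict[str, List[Tuple[str, str]]]:
    """Group (time, text) entries under each speaker, preserving utterance order."""
    minutes: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for speaker, time, text in utterances:
        minutes[speaker].append((time, text))
    return dict(minutes)
```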

通信部３０７は、端末２０と情報の送受信を行う。端末２０から受信する情報には、例えば、会議への参加要請、テキスト入力開始情報、テキスト送信情報、過去の議事録の送信を要請する指示情報等が含まれている。通信部３０７は、端末２０から受信した参加要請から、例えば、端末２０を識別するための識別情報を抽出し、抽出した識別情報を認証部３０８に出力する。識別情報は、例えば、端末２０のシリアル番号、ＭＡＣアドレス（ＭｅｄｉａＡｃｃｅｓｓＣｏｎｔｒｏｌａｄｄｒｅｓｓ）、ＩＰ（ＩｎｔｅｒｎｅｔＰｒｏｔｏｃｏｌ）アドレス等である。通信部３０７は、認証部３０８が通信参加を許可する指示を出力した場合、会議に参加要請した端末２０との通信を行う。通信部３０７は、認証部３０８が通信参加を許可しない指示を出力した場合、会議に参加要請した端末２０との通信を行わない。通信部３０７は、受信した情報を処理部３１０に出力する。通信部３０７は、処理部３１０が出力したテキスト情報または過去の議事録情報等を、参加要請のあった端末２０に送信する。 The communication unit 307 transmits and receives information to and from the terminal 20. The information received from the terminal 20 includes, for example, a request to participate in a conference, text input start information, text transmission information, and instruction information requesting transmission of past minutes. The communication unit 307 extracts, for example, identification information for identifying the terminal 20 from the participation request received from the terminal 20, and outputs the extracted identification information to the authentication unit 308. The identification information is, for example, the serial number of the terminal 20, a Media Access Control address (MAC address), an Internet Protocol (IP) address, etc. When the authentication unit 308 outputs an instruction to permit communication participation, the communication unit 307 communicates with the terminal 20 that requested participation in the conference. When the authentication unit 308 outputs an instruction not to permit communication participation, the communication unit 307 does not communicate with the terminal 20 that requested participation in the conference. The communication unit 307 outputs the received information to the processing unit 310. The communication unit 307 transmits the text information or past minutes information output by the processing unit 310 to the terminal 20 that has requested participation.

認証部３０８は、通信部３０７が出力した識別情報を受け取り、通信を許可するか否か判別する。なお、会議支援装置３０は、例えば、会議への参加者が使用する端末２０の登録を受け付け、認証部３０８に登録しておく。認証部３０８は、判別結果に応じて、通信参加を許可する指示か、通信参加を許可しない指示を通信部３０７に出力する。 The authentication unit 308 receives the identification information output by the communication unit 307 and determines whether or not to permit communication. The conference support device 30, for example, accepts registration of the terminals 20 used by participants in the conference and registers them in the authentication unit 308. Depending on the result of the determination, the authentication unit 308 outputs to the communication unit 307 an instruction to permit participation in communication or an instruction not to permit participation in communication.
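
The permit-or-refuse decision of the authentication unit can be sketched as a lookup against pre-registered identification information. The class name `AuthenticationRegistry` and its methods are hypothetical, and the identifier is treated as an opaque string (for example a serial number or MAC address).

```python
class AuthenticationRegistry:
    """Hypothetical stand-in for the authentication unit 308."""

    def __init__(self) -> None:
        self._registered = set()  # identification information of registered terminals

    def register(self, identifier: str) -> None:
        # Terminals used by conference participants are registered in advance.
        self._registered.add(identifier)

    def is_permitted(self, identifier: str) -> bool:
        # Communication is permitted only for terminals registered beforehand.
        return identifier in self._registered
```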

操作部３０９は、例えばキーボード、マウス、第１表示装置６０上に設けられているタッチパネルセンサー等である。操作部３０９は、利用者の操作結果を検出して、検出した操作結果を処理部３１０に出力する。 The operation unit 309 is, for example, a keyboard, a mouse, a touch panel sensor provided on the first display device 60, etc. The operation unit 309 detects the result of the user's operation and outputs the detected operation result to the processing unit 310.

処理部３１０は、端末２０からテキスト入力開始情報を受信したとき、第１表示装置６０に表示させる画像を変更する。処理部３１０は、表示の変更を、例えばバックライトを暗くする（輝度を落とす）か、背景を黒等に変更して画面を暗くするか明るくするか、彩度を高くするか低くするか、コントラストを変更して暗くするか明るくする等によって行う。処理部３１０は、第１表示装置６０に表示させる画像を変更した後、例えば第１所定時間の経過後に、第１表示装置６０の画像変更を終了する。
処理部３１０は、過去の議事録の送信を要請する指示情報に応じて議事録・音声ログ記憶部５０から議事録を読み出し、読み出した議事録の情報を通信部３０７に出力する。なお、議事録の情報には、話者を示す情報、係り受け解析した結果を示す情報等が含まれていてもよい。 When the processing unit 310 receives text input start information from the terminal 20, it changes the image to be displayed on the first display device 60. The processing unit 310 changes the display, for example, by darkening the backlight (reducing the brightness), changing the background to black or the like to make the screen darker or brighter, increasing or decreasing the saturation, changing the contrast to make it darker or brighter, etc. After changing the image to be displayed on the first display device 60, the processing unit 310 ends the image change of the first display device 60, for example, after a first predetermined time has elapsed.
The processing unit 310 reads the minutes from the minutes/voice log storage unit 50 in response to instruction information requesting transmission of past minutes, and outputs the read information of the minutes to the communication unit 307. Note that the information of the minutes may include information indicating the speaker, information indicating the result of dependency analysis, etc.
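
The kinds of display change listed above (brightness, saturation, contrast) can be illustrated per pixel. This is a sketch under stated assumptions, not how the device is specified to work: pixels are 8-bit RGB tuples, saturation uses a plain channel mean as the gray level, contrast pivots around mid-gray 128, and all function names are hypothetical.

```python
from typing import Tuple

RGB = Tuple[int, int, int]


def _clamp(value: float) -> int:
    """Round to the nearest integer and limit the result to the 8-bit range."""
    return max(0, min(255, round(value)))


def change_brightness(pixel: RGB, factor: float) -> RGB:
    # factor < 1 darkens the pixel, factor > 1 brightens it.
    r, g, b = pixel
    return (_clamp(r * factor), _clamp(g * factor), _clamp(b * factor))


def change_saturation(pixel: RGB, factor: float) -> RGB:
    # Move each channel toward (factor < 1) or away from (factor > 1) the gray level.
    gray = sum(pixel) / 3  # simple mean; an assumption, not a perceptual luminance
    r, g, b = pixel
    return (
        _clamp(gray + (r - gray) * factor),
        _clamp(gray + (g - gray) * factor),
        _clamp(gray + (b - gray) * factor),
    )


def change_contrast(pixel: RGB, factor: float, pivot: int = 128) -> RGB:
    # Scale the distance of each channel from the pivot value.
    r, g, b = pixel
    return (
        _clamp(pivot + (r - pivot) * factor),
        _clamp(pivot + (g - pivot) * factor),
        _clamp(pivot + (b - pivot) * factor),
    )
```

Applying one of these functions to every pixel of the image sent to the first display device 60 for the first predetermined time, and then restoring the original image, would correspond to changing the overall display for a predetermined period.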

なお、収音装置１０がマイクロフォンアレイの場合、会議支援装置３０は、音源定位部、音源分離部、および音源同定部をさらに備える。この場合、会議支援装置３０は、取得部３０１が取得した音声信号に対して予め生成した伝達関数を用いて音源定位部が音源定位を行う。そして、会議支援装置３０は、音源定位部が定位して結果を用いて話者同定を行う。会議支援装置３０は、音源定位部が定位して結果を用いて、取得部３０１が取得した音声信号に対して音源分離を行う。そして、会議支援装置３０の音声認識部３０２は、分離された音声信号に対して発話区間の検出と音声認識を行う（例えば特開２０１７－９６５７号公報参照）。また、会議支援装置３０は、残響音抑圧処理を行うようにしてもよい。 When the sound collection device 10 is a microphone array, the conference support device 30 further includes a sound source localization unit, a sound source separation unit, and a sound source identification unit. In this case, the conference support device 30 has the sound source localization unit perform sound source localization using a transfer function generated in advance for the audio signal acquired by the acquisition unit 301. Then, the conference support device 30 performs speaker identification using the localization result of the sound source localization unit. The conference support device 30 performs sound source separation on the audio signal acquired by the acquisition unit 301 using the localization result of the sound source localization unit. Then, the voice recognition unit 302 of the conference support device 30 performs speech section detection and voice recognition on the separated audio signal (see, for example, JP 2017-9657 A). The conference support device 30 may also perform reverberation suppression processing.

［端末に表示される画像例］
図４は、本実施形態に係る端末２０に表示させる画像例を示す図である。
テキスト表示欄ｇ１１～ｇ１３は、既に発話された発話を音声認識したテキスト、または入力されたテキストの画像である。
テキスト入力欄ｇ１４は、端末２０の利用者がテキスト入力を行う領域である。
送信ボタン画像ｇ１５は、端末２０の利用者が、テキスト入力を終了し、入力したテキストを送信する際に選択するボタン画像である。 [Example of image displayed on terminal]
FIG. 4 is a diagram showing an example of an image displayed on the terminal 20 according to the present embodiment.
The text display fields g11 to g13 are images of text obtained by voice recognition of speech that has already been spoken, or input text.
The text input field g14 is an area where the user of the terminal 20 inputs text.
The send button image g15 is a button image that is selected by the user of the terminal 20 when completing text input and sending the inputted text.

なお、図４のように、表示画像には、発話者に対応するアイコンｇ１６、発話者の名前画像ｇ１７、発話時刻画像ｇ１８、テキスト入力を開始する際に利用者が選択するテキスト入力開始ボタン画像ｇ１９等が含まれていてもよい。なお、テキスト入力開始ボタン画像ｇ１９と送信ボタン画像ｇ１５とは、切り替えて表示するようにしてもよい。 As shown in FIG. 4, the display image may include an icon g16 corresponding to the speaker, a speaker name image g17, a speaking time image g18, and a text input start button image g19 selected by the user when starting text input. The text input start button image g19 and the send button image g15 may be displayed in a switched manner.

［第１表示装置に表示される画像例］
図５は、本実施形態に係る第１表示装置６０に表示される画像例を示す図である。 [Example of image displayed on first display device]
FIG. 5 is a diagram showing an example of an image displayed on the first display device 60 according to this embodiment.

領域ｇ１００の画像は、参加者情報編集を行う領域である。
領域ｇ１０１は、参加者情報の領域である。アイコン画像ｇ１０２は、参加者に対応した画像である。名前画像ｇ１０３は、参加者の名前の画像である。マイクロフォン番号画像ｇ１０５は、参加者が使用する収音部１１の番号（または識別情報）の画像である。端末番号画像ｇ１０６は、参加者が使用する端末２０の番号（または識別情報）の画像である。 The image in area g100 is an area where participant information is edited.
The area g101 is an area for participant information. The icon image g102 is an image corresponding to a participant. The name image g103 is an image of the name of the participant. The microphone number image g105 is an image of the number (or identification information) of the sound pickup unit 11 used by the participant. The terminal number image g106 is an image of the number (or identification information) of the terminal 20 used by the participant.

領域ｇ２００の画像は、発話されたテキストまたは入力されたテキストを表示、または議事録を表示する領域である。なお、図５では、ログイン中の状態を示している。
ボタン画像ｇ２０１は、ログイン／ログアウトの画像である。
ボタン画像ｇ２０２は、会議支援システム１の開始／終了のボタンである。表示画像ｇ２０３は、会議支援システム１の使用中に点灯する画像である。
ボタン画像ｇ２０４は、議事録・音声ログ記憶部５０が記憶する議事録の表示や音声信号の再生を行う画像である。 The image in the area g200 is an area for displaying spoken text or input text, or for displaying the minutes. Note that FIG. 5 shows the logged-in state.
The button image g201 is a login/logout image.
The button image g202 is a button for starting/ending the conference support system 1. The display image g203 is an image that lights up while the conference support system 1 is being used.
The button image g204 is an image for displaying the minutes stored in the minutes/audio log storage unit 50 and playing back audio signals.

アイコン画像ｇ２１１は、参加者のうち、発話した人に対応する画像である。
符号ｇ２１２、ｇ２２１、ｇ２３１は、参加者が発話した内容を音声認識したテキスト情報である。
発話時刻画像ｇ２１４は、参加者が発話またはテキスト入力した時刻を示す情報である。
名前画像ｇ２１５は、第１の参加者の名前の画像である。 Icon image g211 is an image corresponding to the participant who has spoken.
Reference characters g212, g221, and g231 denote text information obtained by voice recognition of the contents of speech by the participants.
The speech time image g214 is information indicating the time when the participant spoke or input text.
Name image g215 is an image of the name of the first participant.

なお、図５に示した画像は一例であり、第１表示装置６０上に表示される画像は、これに限らない。例えば、会議中には、領域ｇ２００の画像のみを表示するようにしてもよい。 Note that the image shown in FIG. 5 is an example, and the image displayed on the first display device 60 is not limited to this. For example, during a meeting, only the image of area g200 may be displayed.

［第１表示装置の画像例］
次に、テキスト入力が開始された際の第１表示装置６０に表示される画像の変更例を説明する。
図６は、本実施形態に係るテキスト入力が開始された際の第１表示装置６０に表示される画像の変更例を示す図である。なお、図６では、表示されている画像のうち、１つのテキスト画像を示し、アイコン画像やボタン画像等を省略している。 [Example of image on first display device]
Next, an example of changing the image displayed on the first display device 60 when text input is started will be described.
Fig. 6 is a diagram showing an example of a change in the image displayed on the first display device 60 when text input according to this embodiment is started. Note that Fig. 6 shows one text image among the displayed images, and omits icon images, button images, etc.

テキスト入力が開始される前の画像ｇ５０１では、例えば白地に黒色でテキスト画像が表示される。
これに対して、テキスト入力が開始されたときの第１の画像ｇ５０２では、第１表示装置６０の画面全体が変更されて表示される。
または、テキスト入力が開始されたときの第２の画像ｇ５０３では、第１表示装置６０の画面全体を暗くされ、さらにテキスト画像を黒から白に変更されて表示される。 In an image g501 before text input starts, for example, a text image is displayed in black on a white background.
In contrast, in the first image g502 when text input is started, the entire screen of the first display device 60 is changed and displayed.
Alternatively, in the second image g503 when text input is started, the entire screen of the first display device 60 is darkened, and further the text image is changed from black to white and displayed.

なお、図６に示した変更例は一例であり、これに限らない。例えば、画面の２／３に対して、画面を暗くするなど、画面の一部を変更するようにしてもよい。
このように、画面を変更することで、発話している司会者等に、テキスト入力が開始されたことを気づかせることができる。 The modification example shown in Fig. 6 is merely an example, and is not limited thereto. For example, a part of the screen may be modified, such as by darkening 2/3 of the screen.
By changing the screen in this way, the moderator or other person who is speaking can be made aware that text input has started.
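The display change described for FIG. 6 can be sketched as a small style transformation. This is only an illustrative sketch: the names `ScreenStyle` and `on_text_input_started`, and the concrete color values, are assumptions and do not appear in the specification. The sketch models the second variant (image g503: darkened screen, text switched from black to white) and the optional partial change such as 2/3 of the screen.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ScreenStyle:
    """Hypothetical description of how the first display device is drawn."""
    background: str = "white"      # image g501: white background
    text_color: str = "black"      # image g501: black text
    changed_fraction: float = 0.0  # share of the screen drawn in the changed style


def on_text_input_started(normal: ScreenStyle, fraction: float = 1.0) -> ScreenStyle:
    """Return a darkened style with white text, in the manner of image g503.

    fraction=1.0 changes the entire screen; a smaller value such as 2/3
    models the partial change mentioned as an alternative.
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return replace(
        normal,
        background="black",   # assumed value for "darkened"
        text_color="white",
        changed_fraction=fraction,
    )


normal_style = ScreenStyle()
full_change = on_text_input_started(normal_style)
partial_change = on_text_input_started(normal_style, fraction=2 / 3)
```

Keeping the normal style immutable means that restoring the original display is simply a matter of drawing `normal_style` again.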

［第１表示装置の表示変更期間］
次に、第１表示装置の表示変更期間の例を説明する。
図７は、本実施形態に係る第１表示装置６０の表示変更期間の例を示す図である。
図７において横軸は時刻（秒）である。
第１の表示変更期間は、テキスト入力開始が検出された時刻ｔ１から、第１所定時間経過の時刻ｔ２までの期間である。第１所定時間は、例えば数秒である。
また、第２の表示変更期間は、テキスト入力開始が検出された時刻ｔ１から、送信ボタンのアイコン画像が端末２０によって送信され、すなわちテキスト入力が終了した時刻ｔ３までの期間（第２所定時間）である。 [Display change period of the first display device]
Next, an example of the display change period of the first display device will be described.
FIG. 7 is a diagram showing an example of a display change period of the first display device 60 according to the present embodiment.
In FIG. 7, the horizontal axis represents time (seconds).
The first display change period is a period from time t1 when the start of text input is detected to time t2 when a first predetermined time has elapsed. The first predetermined time is, for example, several seconds.
The second display change period is the period (second predetermined time) from time t1, when the start of text input is detected, to time t3, when the send button is operated on the terminal 20 and the text is sent, i.e., when text input ends.

なお、図７に示した表示変更期間は一例であり、これに限らない。例えば、第１の表示変更期間が経過して、表示変更を例えば図６の画像ｇ５０２からｇ５０１に戻した後も発話が終了しない場合、発話者が表示変更に気づかなかった可能性がある。このような場合は、再度、第１の表示変更期間の表示変更を行うようにしてもよく、この場合、第１所定時間より長い時間、表示変更を行うようにしてもよい。なお処理部３１０は、発話終了の確認を、変更を終了した後、第２所定時間経過後に行うようにしてもよい。
または、第１の表示変更期間が経過して、表示変更を戻した後も発話が終了しない場合は、第２の表示変更期間を行うようにしてもよい。 The display change period shown in FIG. 7 is an example, and is not limited thereto. For example, if the speech does not end even after the first display change period has elapsed and the display change is returned, for example, from image g502 to image g501 in FIG. 6, the speaker may not have noticed the display change. In such a case, the display change may be performed again for the first display change period, and in this case, the display change may be performed for a time longer than the first predetermined time. The processing unit 310 may check the end of the speech after the second predetermined time has elapsed after the change is completed.
Alternatively, if the speech does not end even after the first display change period has elapsed and the display has been returned to its original state, the display change of the second display change period may be performed instead.
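The two display change periods and the optional repeated change can be illustrated with a short calculation. The following is a minimal sketch under stated assumptions: the function names, the 3-second default for the first predetermined time, and the doubling factor for the repeated change are all hypothetical. The specification only says "several seconds" and "a time longer than the first predetermined time".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangePeriod:
    start: float  # seconds; when the changed display begins
    end: float    # seconds; when the original display is restored

    @property
    def duration(self) -> float:
        return self.end - self.start


def first_change_period(t1: float, first_predetermined: float = 3.0) -> ChangePeriod:
    """First pattern: from t1 (text input start detected) to t2 = t1 + fixed time."""
    return ChangePeriod(t1, t1 + first_predetermined)


def second_change_period(t1: float, t3: float) -> ChangePeriod:
    """Second pattern: from t1 until t3, the time at which text input ends."""
    if t3 < t1:
        raise ValueError("text input cannot end before it starts")
    return ChangePeriod(t1, t3)


def repeat_change_period(
    previous: ChangePeriod,
    now: float,
    speech_ongoing: bool,
    factor: float = 2.0,  # assumed; any factor above 1 makes the repeat longer
) -> Optional[ChangePeriod]:
    """Schedule a longer change if the speaker kept talking after the previous one."""
    if now < previous.end or not speech_ongoing:
        return None
    return ChangePeriod(now, now + previous.duration * factor)


first = first_change_period(t1=10.0, first_predetermined=3.0)
again = repeat_change_period(first, now=14.0, speech_ongoing=True)
```

Here `first` covers 10 s to 13 s; because the speech is still ongoing at 14 s, `again` schedules a second, longer change.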

［処理手順例］
次に、会議支援装置３０の処理手順例を説明する。図８は、本実施形態に係る会議支援装置３０の処理手順のフローチャートである。なお、以下の処理例は、テキスト入力開始の際、所定時間、第１表示装置６０の画面表示を変更する例である。 [Example of processing procedure]
Next, a description will be given of an example of a processing procedure of the conference supporting device 30. Fig. 8 is a flowchart of the processing procedure of the conference supporting device 30 according to the present embodiment. Note that the following processing example is an example in which the screen display of the first display device 60 is changed for a predetermined time when text input is started.

（ステップＳ１）会議支援装置３０の処理部３１０は、利用者が操作部３０９を操作した操作結果等に基づいて、会議で使用する収音部１１や端末２０を登録する。 (Step S1) The processing unit 310 of the conference support device 30 registers the audio pickup unit 11 and the terminal 20 to be used in the conference based on the operation result of the user operating the operation unit 309, etc.

（ステップＳ２）処理部３１０は、利用者が操作部３０９を操作した操作結果等に基づいて、会議開始を検出する。 (Step S2) The processing unit 310 detects the start of the conference based on the operation result of the user operating the operation unit 309, etc.

（ステップＳ３）処理部３１０は、発話を検出する。
（ステップＳ４）会議支援装置３０の音声認識部３０２、テキスト変換部３０３等は、収音部１１によって収音された音声信号に対して、音声認識処理、テキスト変換等の処理を行う。 (Step S3) The processing unit 310 detects an utterance.
(Step S4) The voice recognition unit 302, the text conversion unit 303, etc. of the conference support device 30 perform processes such as voice recognition processing and text conversion on the voice signal collected by the sound collection unit 11.

（ステップＳ５）処理部３１０は、発話区間検出等の結果に基づいて、発話が終了したか否かを判別する。処理部３１０は、発話が終了したと判別した場合（ステップＳ５；ＹＥＳ）、ステップＳ３の処理に戻す。処理部３１０は、発話が終了していないと判別した場合（ステップＳ５；ＮＯ）、ステップＳ６の処理に進める。 (Step S5) The processing unit 310 determines whether the speech has ended based on the results of speech section detection, etc. If the processing unit 310 determines that the speech has ended (Step S5; YES), the processing unit 310 returns to the processing of Step S3. If the processing unit 310 determines that the speech has not ended (Step S5; NO), the processing unit 310 proceeds to the processing of Step S6.

（ステップＳ６）処理部３１０は、端末２０によってテキスト入力が開始されたか否かを判別する。処理部３１０は、テキスト入力が開始されたと判別した場合（ステップＳ６；ＹＥＳ）、ステップＳ７の処理に進める。処理部３１０は、テキスト入力が開始されていないと判別した場合（ステップＳ６；ＮＯ）、ステップＳ５の処理に戻す。 (Step S6) The processing unit 310 determines whether or not text input has been started by the terminal 20. If the processing unit 310 determines that text input has been started (Step S6; YES), the processing unit 310 proceeds to the process of Step S7. If the processing unit 310 determines that text input has not been started (Step S6; NO), the processing unit 310 returns to the process of Step S5.

（ステップＳ７）処理部３１０は、例えば、画面全体を暗くするように第１表示装置６０に表示させる画像変更を開始する。 (Step S7) The processing unit 310 starts changing the image displayed on the first display device 60, for example, by darkening the entire screen.

（ステップＳ８）処理部３１０は、画像変更を開始した後、所定時間が経過したか否かを判別する。処理部３１０は、所定時間が経過したと判別した場合（ステップＳ８；ＹＥＳ）、ステップＳ９の処理に進める。処理部３１０は、所定時間が経過していないと判別した場合（ステップＳ８；ＮＯ）、ステップＳ８の処理を繰り返す。 (Step S8) The processing unit 310 determines whether a predetermined time has elapsed after starting the image change. If the processing unit 310 determines that the predetermined time has elapsed (Step S8; YES), the processing unit 310 proceeds to the processing of Step S9. If the processing unit 310 determines that the predetermined time has not elapsed (Step S8; NO), the processing unit 310 repeats the processing of Step S8.

（ステップＳ９）処理部３１０は、第１表示装置６０の画面変更を終了し、元の表示に戻す。 (Step S9) The processing unit 310 ends the screen change on the first display device 60 and returns to the original display.

（ステップＳ１０）処理部３１０は、端末２０によるテキスト入力が終了したか否かを判別する。処理部３１０は、テキスト入力が終了したと判別した場合（ステップＳ１０；ＹＥＳ）、ステップＳ３の処理に戻す。処理部３１０は、テキスト入力が終了していないと判別した場合（ステップＳ１０；ＮＯ）、ステップＳ１１の処理に進める。 (Step S10) The processing unit 310 determines whether or not text input by the terminal 20 has ended. If the processing unit 310 determines that text input has ended (Step S10; YES), the processing unit 310 returns to the processing of Step S3. If the processing unit 310 determines that text input has not ended (Step S10; NO), the processing unit 310 proceeds to the processing of Step S11.

（ステップＳ１１）処理部３１０は、発話が終了したか否かを判別する。処理部３１０は、発話が終了したと判別した場合（ステップＳ１１；ＹＥＳ）、ステップＳ３の処理に戻す。処理部３１０は、発話が終了していないと判別した場合（ステップＳ１１；ＮＯ）、ステップＳ７の処理に戻す。 (Step S11) The processing unit 310 determines whether the speech has ended. If the processing unit 310 determines that the speech has ended (Step S11; YES), the processing unit 310 returns to the processing of Step S3. If the processing unit 310 determines that the speech has not ended (Step S11; NO), the processing unit 310 returns to the processing of Step S7.

ステップＳ１０とＳ１１の処理によって、テキスト入力をしていることを第１表示装置６０の表示を変更して提示しても、話者が気づいていなかった場合であっても、本実施形態によれば、再度、第１表示装置６０の表示を変更して提示することができる。 Through the processing of steps S10 and S11, even if the speaker did not notice the change in the display of the first display device 60 indicating that text is being input, this embodiment can change the display of the first display device 60 again to present that indication.
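The branching in steps S3 to S11 can be summarized as a transition function. The sketch below is an illustration only, not the implementation of the processing unit 310: the names `Step` and `next_step` and the boolean flags are assumptions, and step S3 is assumed to advance to S4 as soon as an utterance has been detected.

```python
from enum import Enum


class Step(Enum):
    S3 = "detect utterance"
    S4 = "voice recognition and text conversion"
    S5 = "has the speech ended?"
    S6 = "has text input started?"
    S7 = "start image change"
    S8 = "has the predetermined time elapsed?"
    S9 = "restore original display"
    S10 = "has text input ended?"
    S11 = "has the speech ended? (after display change)"


def next_step(
    step: Step,
    *,
    speech_ended: bool = False,
    text_started: bool = False,
    text_ended: bool = False,
    timer_expired: bool = False,
) -> Step:
    """Return the step that follows `step` for the given conditions."""
    if step is Step.S3:
        return Step.S4
    if step is Step.S4:
        return Step.S5
    if step is Step.S5:
        return Step.S3 if speech_ended else Step.S6
    if step is Step.S6:
        return Step.S7 if text_started else Step.S5
    if step is Step.S7:
        return Step.S8
    if step is Step.S8:
        return Step.S9 if timer_expired else Step.S8
    if step is Step.S9:
        return Step.S10
    if step is Step.S10:
        return Step.S3 if text_ended else Step.S11
    if step is Step.S11:
        # Speech continues while text is still being typed: change the display again.
        return Step.S3 if speech_ended else Step.S7
    raise ValueError(f"unknown step: {step}")


# Walk through the case in which the speaker misses the first display change.
path = [Step.S6]
path.append(next_step(path[-1], text_started=True))   # S7
path.append(next_step(path[-1]))                      # S8
path.append(next_step(path[-1], timer_expired=True))  # S9
path.append(next_step(path[-1]))                      # S10
path.append(next_step(path[-1], text_ended=False))    # S11
path.append(next_step(path[-1], speech_ended=False))  # back to S7
```

The last transition in `path` is the loop from S11 back to S7, which is what produces the repeated display change.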

なお、上述した処理手順は一例であり、これに限らない。例えば、テキスト入力開始の検出は、割り込み処理によって行うようにしてもよい。
また、上述した例では、発話されている際にテキスト入力が開始された場合に画面表示を変更する例を説明したが、これに限らない。処理部３１０は、発話が行われていない場合にも、第１表示装置６０の画面全体の表示を変更して、テキスト入力が開始されたことを報知するようにしてもよい。さらに、処理部３１０は、発話開始前に画面変更を行わずに、発話開始された後に、画面の変更を開始するようにしてもよい。 The above-described processing procedure is merely an example, and is not limiting. For example, the start of text input may be detected by an interrupt process.
In the above example, the screen display is changed when text input is started while speaking, but the present invention is not limited to this. Even when no speech is being made, the processing unit 310 may change the display of the entire screen of the first display device 60 to notify that text input has started. Furthermore, the processing unit 310 may not change the screen before the speech starts, but may start changing the screen after the speech starts.

以上のように、本実施形態では、端末２０によってテキスト入力が開始されたとき、第１表示装置６０の画面を変更するようにした。
また、本実施形態では、１回目の画面変更を見逃しても、発話が継続されかつテキスト入力が継続されている場合に、さらに第１表示装置６０の画面の変更を行うようにした。 As described above, in this embodiment, when text input via the terminal 20 is started, the screen of the first display device 60 is changed.
Furthermore, in this embodiment, even if the first screen change is missed, if speech and text input continue, the screen of the first display device 60 is further changed.

これにより、本実施形態によれば、発表者および聴衆に対して分かりやすくテキスト入力が行われていることを気づかせることができる。また、本実施形態によれば、テキスト入力が行われていることに気づき、発表者が発話を区切って、テキスト入力の終了を待つことができる。この結果、本実施形態によれば、聴覚障がい者等は、端末２０を使用して、発話を気にせずに、落ち着いてテキスト入力を行うことができる。
さらに、本実施形態によれば、１回目の画面変更を見逃しても、再度、第１表示装置６０の画面の変更が行われるので、テキスト入力が開始されたことを気づくことができる。 As a result, according to this embodiment, it is possible to make the presenter and the audience aware that text input is being performed in an easily understandable manner. Also, according to this embodiment, the presenter is aware that text input is being performed, and can pause his/her speech and wait for the end of text input. As a result, according to this embodiment, the hearing impaired person or the like can calmly perform text input using the terminal 20 without worrying about the speech.
Furthermore, according to this embodiment, even if the first screen change is missed, the screen of the first display device 60 is changed again, so that the user can notice that text input has started.

［変形例］
次に、変形例を説明する。
図９は、会議支援システムの変形例の構成例を示すブロック図である。図９に示すように、会議支援システム１Ａは、収音装置１０Ａ、端末２０、会議支援装置３０Ａ、音響モデル・辞書ＤＢ４０、議事録・音声ログ記憶部５０、第１表示装置６０、第２表示装置７０、およびＰＣ８０を備える。 [Modification]
Next, a modified example will be described.
FIG. 9 is a block diagram showing a configuration example of a modified example of the conference support system. As shown in FIG. 9, the conference support system 1A includes a sound collection device 10A, a terminal 20, a conference support device 30A, an acoustic model and dictionary DB 40, a minutes and voice log storage unit 50, a first display device 60, a second display device 70, and a PC 80.

収音装置１０Ａは、収音部１１Ａ－１、収音部１１Ａ－２、収音部１１Ａ－３、・・・を備える。以下、収音部１１Ａ－１、収音部１１Ａ－２、収音部１１Ａ－３、・・・のうち１つを特定しない場合は、「収音部１１Ａ」という。 The sound collection device 10A includes a sound collection unit 11A-1, a sound collection unit 11A-2, a sound collection unit 11A-3, etc. Hereinafter, when one of the sound collection units 11A-1, 11A-2, 11A-3, etc. is not specified, it will be referred to as "sound collection unit 11A".

会議支援装置３０Ａは、取得部３０１、音声認識部３０２、テキスト変換部３０３（音声認識部）、係り受け解析部３０４、議事録作成部３０６、通信部３０７、認証部３０８、操作部３０９、および処理部３１０Ａを備える。 The conference support device 30A includes an acquisition unit 301, a voice recognition unit 302, a text conversion unit 303 (voice recognition unit), a dependency analysis unit 304, a minutes creation unit 306, a communication unit 307, an authentication unit 308, an operation unit 309, and a processing unit 310A.

収音部１１Ａは、報知部１１１Ａ（１１１Ａ－１、、・・・）を備える。報知部１１１Ａは、例えば、振動装置または発光装置のうちの少なくとも１つである。報知部１１１Ａは、会議支援装置３０Ａの処理部３１０Ａの制御に応じて、例えば所定の時間、報知する。この報知は、端末２０によって、テキスト入力が開始されていることを表している。 The sound collection unit 11A includes a notification unit 111A (111A-1, ...). The notification unit 111A is, for example, at least one of a vibration device and a light-emitting device. The notification unit 111A issues a notification for, for example, a predetermined period of time under the control of the processing unit 310A of the conference support device 30A. This notification indicates that text input has been started on the terminal 20.

処理部３１０Ａは、端末２０からテキスト入力開始情報を受信したとき、収音部１１Ａの報知部１１１Ａに報知する。処理部３１０Ａは、報知後、例えば第１所定時間の経過後に、報知部１１１Ａへの報知を終了する。 When the processing unit 310A receives text input start information from the terminal 20, it causes the notification unit 111A of the sound collection unit 11A to issue a notification. After the notification has started, the processing unit 310A ends the notification by the notification unit 111A, for example once the first predetermined time has elapsed.

なお、処理部３１０Ａは、報知後、第１所定時間経過しても発話が終了せず、かつテキスト入力が継続されている場合、再度、報知させるようにしてもよい。
また、処理部３１０Ａは、全ての参加者が所持する収音部１１Ａに報知してもよく、発話されている参加者の収音部１１Ａのみに報知してもよい。
なお、本実施形態においても、第１表示装置６０の画面を変更する。 Note that, if the speech does not end even after the first predetermined time has elapsed after the notification and text input is still being continued, processing unit 310A may issue another notification.
Furthermore, the processing unit 310A may notify the sound collection units 11A carried by all participants, or may notify only the sound collection unit 11A of the participant who is speaking.
In this embodiment, the screen of the first display device 60 is also changed.
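The choice of which sound collection units to notify, and whether to notify again, can be sketched as follows. This is a minimal illustration with hypothetical names (`notification_targets`, `should_renotify`) and hypothetical string identifiers for the sound collection units 11A; the specification does not define such an interface.

```python
from typing import Iterable, List


def notification_targets(
    all_units: Iterable[str],
    speaking_units: Iterable[str],
    notify_all: bool,
) -> List[str]:
    """Return the sound collection units whose notification unit should fire.

    notify_all=True  : every participant's unit is notified.
    notify_all=False : only the units of participants who are speaking.
    """
    units = list(all_units)
    if notify_all:
        return units
    speaking = set(speaking_units)
    return [unit for unit in units if unit in speaking]


def should_renotify(speech_ended: bool, text_input_ongoing: bool) -> bool:
    """After the first predetermined time: notify again only if the speech
    has not ended and text input is still being continued."""
    return (not speech_ended) and text_input_ongoing


units = ["11A-1", "11A-2", "11A-3"]
only_speaker = notification_targets(units, ["11A-2"], notify_all=False)
everyone = notification_targets(units, ["11A-2"], notify_all=True)
```

Returning a list in the original unit order keeps the result deterministic, which makes the behavior easy to check.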

以上のように、変形例では、少なくとも発話している参加者が使用している収音部１１Ａに、振動等によって報知するようにした。 As described above, in this modified example, the sound pickup unit 11A used by at least the participant who is speaking is notified by vibration or the like.

これにより、変形例によれば、第１表示装置６０による報知に気づかなくても、収音部１１Ａへの報知によって、テキスト入力が開始されたことを気づくことができる。 As a result, according to the modified example, even if the user does not notice the notification from the first display device 60, the user can notice that text input has started by the notification from the sound collection unit 11A.

なお、本発明における会議支援装置３０（３０Ａ）の機能の全てまたは一部を実現するためのプログラムをコンピュータ読み取り可能な記録媒体に記録して、この記録媒体に記録されたプログラムをコンピュータシステムに読み込ませ、実行することにより会議支援装置３０（３０Ａ）が行う処理の全てまたは一部を行ってもよい。なお、ここでいう「コンピュータシステム」とは、ＯＳや周辺機器等のハードウェアを含むものとする。また、「コンピュータシステム」は、ホームページ提供環境（あるいは表示環境）を備えたＷＷＷシステムも含むものとする。また、「コンピュータ読み取り可能な記録媒体」とは、フレキシブルディスク、光磁気ディスク、ＲＯＭ、ＣＤ－ＲＯＭ等の可搬媒体、コンピュータシステムに内蔵されるハードディスク等の記憶装置のことをいう。さらに「コンピュータ読み取り可能な記録媒体」とは、インターネット等のネットワークや電話回線等の通信回線を介してプログラムが送信された場合のサーバやクライアントとなるコンピュータシステム内部の揮発性メモリ（ＲＡＭ）のように、一定時間プログラムを保持しているものも含むものとする。 In addition, a program for realizing all or part of the functions of the conference support device 30 (30A) in the present invention may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read into a computer system and executed to perform all or part of the processing performed by the conference support device 30 (30A). Note that the term "computer system" here includes hardware such as an OS and peripheral devices. The term "computer system" also includes a WWW system equipped with a home page providing environment (or display environment). The term "computer-readable recording medium" refers to portable media such as flexible disks, optical magnetic disks, ROMs, and CD-ROMs, and storage devices such as hard disks built into computer systems. The term "computer-readable recording medium" also includes those that hold a program for a certain period of time, such as volatile memory (RAM) inside a computer system that becomes a server or client when a program is transmitted via a network such as the Internet or a communication line such as a telephone line.

また、上記プログラムは、このプログラムを記憶装置等に格納したコンピュータシステムから、伝送媒体を介して、あるいは、伝送媒体中の伝送波により他のコンピュータシステムに伝送されてもよい。ここで、プログラムを伝送する「伝送媒体」は、インターネット等のネットワーク（通信網）や電話回線等の通信回線（通信線）のように情報を伝送する機能を有する媒体のことをいう。また、上記プログラムは、前述した機能の一部を実現するためのものであってもよい。さらに、前述した機能をコンピュータシステムにすでに記録されているプログラムとの組み合わせで実現できるもの、いわゆる差分ファイル（差分プログラム）であってもよい。 The above program may also be transmitted from a computer system in which the program is stored in a storage device or the like to another computer system via a transmission medium, or by transmission waves in the transmission medium. Here, the "transmission medium" that transmits the program refers to a medium that has the function of transmitting information, such as a network (communication network) such as the Internet or a communication line (communication line) such as a telephone line. The above program may also be one that realizes part of the above-mentioned functions. Furthermore, it may be one that can realize the above-mentioned functions in combination with a program already recorded in the computer system, a so-called difference file (difference program).

以上、本発明を実施するための形態について実施形態を用いて説明したが、本発明はこうした実施形態に何等限定されるものではなく、本発明の要旨を逸脱しない範囲内において種々の変形および置換を加えることができる。 The above describes the form for carrying out the present invention using an embodiment, but the present invention is not limited to such an embodiment, and various modifications and substitutions can be made without departing from the spirit of the present invention.

１，１Ａ…会議支援システム、１０，１０Ａ…収音装置、１１…収音部、２０…端末、３０，３０Ａ…会議支援装置、４０…音響モデル・辞書ＤＢ４０、５０…議事録・音声ログ記憶部、６０…第１表示装置、７０…第２表示装置、８０…ＰＣ、１１１Ａ…報知部 1, 1A...conference support system, 10, 10A...sound collection device, 11...sound collection section, 20...terminal, 30, 30A...conference support device, 40...acoustic model/dictionary DB 40, 50...minutes/audio log storage section, 60...first display device, 70...second display device, 80...PC, 111A...notification section

Claims

A meeting support system comprising:
a terminal equipped with a text input unit;
a display device for displaying an image; and
a conference support device that is connected to the terminal and displays, on the display device, text information input from the terminal and information obtained by converting an audio signal input from a sound pickup unit into text,
wherein the conference support device:
changes the entire display of the display device for a first predetermined time when text input is started by the terminal,
the first predetermined time being either a predetermined time from a start of the text input or a time from the start of the text input to an end of the text input; and
detects a speech section in the audio signal of a second person different from a first person who is inputting the text and, if the speech of the second person is continuing even after the display has been changed for the first predetermined time, changes the entire display of the display device for a second predetermined time that is longer than the first predetermined time.

A meeting support system comprising:
a terminal equipped with a text input unit;
a display device for displaying an image; and
a conference support device that is connected to the terminal and displays, on the display device, text information input from the terminal and information obtained by converting an audio signal input from a sound pickup unit into text,
wherein the conference support device:
changes the entire display of the display device for a first predetermined time when text input is started by the terminal; and
detects a speech section in the audio signal of a second person different from a first person who is inputting the text and, if the speech of the second person continues after the first predetermined time has elapsed, changes the entire display of the display device for a second predetermined time that is longer than the first predetermined time.