JP2013153411A

JP2013153411A - Conference server, conference system, control method and program of conference server, and recording medium

Info

Publication number: JP2013153411A
Application number: JP2012215311A
Authority: JP
Inventors: Yuta Kawamura; 優太川村; Shinichi Hara; 新一原; Yumi Asanaga; 優美朝長; Tomokazu Kaneko; 智一金子
Original assignee: Canon Marketing Japan Inc; Canon MJ IT Group Holdings Inc; Canon Software Inc
Current assignee: Canon Marketing Japan Inc; Canon IT Solutions Inc; Canon MJ IT Group Holdings Inc
Priority date: 2011-12-26
Filing date: 2012-09-27
Publication date: 2013-08-08
Anticipated expiration: 2032-09-27
Also published as: JP2016136746A; JP6135782B2; JP5892021B2; JP6020685B2; JP2015233337A

Abstract

PROBLEM TO BE SOLVED: To provide a conference system that, when utterances are made at the same time and delivered using speech output and character output, clearly presents on a screen the utterances that have been converted into character information. SOLUTION: Speech data transmitted from a client device is converted into characters as a character output. If the image data of the speaker is displayed, the character output is combined with that image data; otherwise, image data is generated for a character utterance frame. The image data combined with the character output, the image data received from the client device, and the image data of the character utterance frame are synthesized into whole image data, which is transmitted to the client device.

Description

The present invention relates to a technique for presenting utterance content to conference participants in an easily understandable manner when multiple utterances overlap in a conference system that connects multiple sites via a network.

Conventionally, conference systems are known in which a conference is held using a plurality of terminal devices connected via a network. With a conference system, users in remote locations can enter a single "meeting room", send and receive captured images of the participants and audio data, and hold a meeting without wasting travel time.

However, compared with actually gathering in one place for a meeting, a conference system has the problems that it is hard to tell who is speaking and that, when multiple participants speak at the same time, it is difficult to pick out only the utterances one needs.

Patent Document 1 describes a technique in which participants are prioritized, for example by assigning priorities in advance or by giving a higher priority to the participant who started speaking first. The utterance of the participant with the highest priority is output as audio in real time, and the utterances of the other participants are converted into character information and output.

Patent Document 1: JP 2006-229903 A

However, the technique described in Patent Document 1 does not describe a specific display configuration for multiple utterances, in particular for utterances converted into character information, and it does not describe a configuration for displaying such converted utterances on the screen in an easily understandable manner.

The present invention solves the above problem, and an object thereof is to provide a conference system that, using voice output and character output, presents on the screen in an easily understandable manner those utterances, among utterances made at the same time, that have been converted into character information.

Another object of the present invention is to provide a conference system that switches between voice output and character output for participants' utterances in an easily understandable manner.

The present invention is a conference server that can communicate with a client device via a network and can convert voice data from the client device into characters for display, the conference server comprising: voice receiving means for receiving voice data transmitted from the client device; voice data registration means for registering the voice data received by the voice receiving means in voice data storage means in association with the client device; voice acquisition means for acquiring the voice data corresponding to the client device from the voice data storage means; image receiving means for receiving image data transmitted from the client device; image data registration means for registering the image data received by the image receiving means in image data storage means in association with the client device; voice data character conversion means for converting the voice data acquired by the voice acquisition means into characters; image synthesizing means that, when the image data corresponding to the converted characters is set to be displayed on the client device, combines the characters with that image data as a character output, that, when the image data is not set to be displayed on the client device, generates image data based on a character utterance frame for displaying the character output, and that further synthesizes, as whole image data, all of the image data combined with the character output, the image data received from the client device, and, when the character utterance frame has been generated, the image data based on the generated character utterance frame; and whole image data transmission means for transmitting the whole image data synthesized by the image synthesizing means to the client device.
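The branching performed by the image synthesizing means can be illustrated with a minimal Python sketch. It is not the implementation described in this publication: the names `ClientFrame`, `Tile`, and `compose_whole_image` are hypothetical, image data is represented by plain strings, and the "whole image data" is modeled as an ordered list of tiles rather than actual pixels.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ClientFrame:
    client_id: str
    image: str        # stand-in for the image data received from the client
    displayed: bool   # whether this client's image is set to be displayed


@dataclass
class Tile:
    kind: str                   # "image", "image+text", or "text_frame"
    client_id: str
    text: Optional[str] = None  # character output, if any


def compose_whole_image(frames: List[ClientFrame],
                        text_by_client: Dict[str, str]) -> List[Tile]:
    """Build the whole image as a list of tiles (hypothetical model)."""
    tiles: List[Tile] = []
    for frame in frames:
        text = text_by_client.get(frame.client_id)
        if frame.displayed:
            if text is not None:
                # Image is displayed: combine the character output with it.
                tiles.append(Tile("image+text", frame.client_id, text))
            else:
                # Image is displayed and there is no character output.
                tiles.append(Tile("image", frame.client_id))
        elif text is not None:
            # Image is not displayed: generate a character utterance frame.
            tiles.append(Tile("text_frame", frame.client_id, text))
    return tiles


frames = [
    ClientFrame("102a", "imgA", True),
    ClientFrame("102b", "imgB", True),
    ClientFrame("102c", "imgC", False),
]
whole = compose_whole_image(frames, {"102b": "I agree", "102c": "One question"})
```

In this sketch a client whose image is hidden and who has no character output contributes nothing to the whole image; that behavior is an assumption, since the passage above does not address that case.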

The conference system of the present invention makes it possible to present on the screen, in an easily understandable manner and using voice output and character output, those utterances among simultaneous utterances that have been converted into character information.

The conference system of the present invention also makes it possible to switch between voice output and character output for participants' utterances in an easily understandable manner.

A diagram showing an example of the system configuration according to the embodiment of the present invention.
A block diagram showing an example of the hardware configuration of the conference server and the client device according to the embodiment.
A diagram showing the components of the conference system according to the embodiment.
An example of a screen, displayed on the client device, for registering a conference room.
An example of a screen, displayed on the client device, for reserving a conference room.
An example of a screen, displayed on the client device, for joining a conference.
An example of a screen displayed on the client device during a conference.
A diagram showing an example of the configuration of the end condition table.
A conceptual diagram showing an example of the voice buffer.
A conceptual diagram showing an example of the variables that manage authority and the states of those variables.
An example of a flowchart showing the processing for using a conference room.
An example of a flowchart showing the processing in which the process corresponding to each client device acquires the authority for voice or character utterance.
An example of a flowchart showing the processing in which the process that has acquired voice utterance authority outputs audio.
An example of a flowchart showing the processing in which a process outputs characters.
An example of a flowchart showing the processing for transmitting the audio input to each client device to the conference server.
An example of a flowchart showing the processing for transmitting the voice and character utterances output by each process to each client device.
An example of a screen displayed on the client device during a conference.
An example of a flowchart showing the processing for generating the display screen for character utterances.
An example of a flowchart showing the processing for generating the display screen for character utterances according to the second embodiment of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a diagram showing an example of the system configuration according to the embodiment of the present invention.

The client device 102 of the present invention is, for example, a personal computer with a network communication function, and can exchange data with the conference server 101 (conference support server, video conference server) via the network 103 (LAN, WAN, etc.).

The conference server 101 is a server for realizing a conference among the client devices 102. By accessing the conference server 101, a client device 102 can obtain the conference screen (interface) and hold a conference.

The client device 102 is a personal computer, tablet terminal, mobile terminal, or the like operated by a user of the conference system (conference support system, video conference system), on which a client-side application for accessing the conference server 101 and a dedicated module (application) are installed. When the conference system is a web conference system using the web, the client-side application may be configured to use a web browser. That is, the dedicated module may be, for example, an ActiveX (registered trademark) component downloaded from the conference server 101 via the web browser. In the conference system, an imaging device (not shown) is connected when the client device 102 distributes moving images through the conference server 101 to the other client devices 102 used by the conference participants (which may include the distributing device itself), and a microphone (not shown) is connected when audio is transmitted. A speaker (not shown) is connected in order to listen to the other party's audio.

The conference server 101 and a client device 102 may also be implemented in the same housing, so that one of the client devices 102 doubles as the conference server 101.

FIG. 2 is a block diagram showing an example of the hardware configuration of an information processing apparatus applicable to the conference server 101 and the client device 102.

As shown in FIG. 2, the conference server 101 and the client device 102 each have a configuration in which a CPU (Central Processing Unit) 201, a RAM (Random Access Memory) 203, a ROM (Read Only Memory) 202, an input controller 205, a video controller 206, a memory controller 207, a communication I/F controller 208, and the like are connected via a system bus 204.

The CPU 201 comprehensively controls the devices and controllers connected to the system bus 204.

The ROM 202 or the external memory 211 stores the BIOS (Basic Input/Output System) and the OS (Operating System), which are control programs of the CPU 201, as well as the various programs, described later, that are needed to realize the functions executed by each server or PC. It also stores the information necessary for carrying out the present invention. The external memory may be a database.

The RAM 203 functions as the main memory, work area, and the like of the CPU 201. When executing processing, the CPU 201 loads the necessary programs and the like from the ROM 202 or the external memory 211 into the RAM 203 and realizes various operations by executing the loaded programs.

The input controller 205 controls input from a keyboard (KB) 209 and from a pointing device such as a mouse (not shown).

The video controller 206 controls display on a display device such as the display 210. The display device may be, for example, a liquid crystal display. These are used by the administrator as needed.

The memory controller 207 controls access to the external memory 211, such as an external storage device (hard disk (HD)) or flexible disk (FD) that stores a boot program, various applications, font data, user files, edit files, and various data, or a CompactFlash (registered trademark) memory connected to a PCMCIA (Personal Computer Memory Card International Association) card slot via an adapter.

The communication I/F controller 208 connects to and communicates with external devices via the network 103 and executes communication control processing on the network. For example, communication using TCP/IP (Transmission Control Protocol/Internet Protocol) is possible.

The CPU 201 enables display on the display 210 by, for example, rasterizing outline fonts into a display information area in the RAM 203. The CPU 201 also enables user instructions by means of a mouse cursor (not shown) or the like on the display 210.

The various programs, described later, for realizing the present invention are recorded in the external memory 211 and are executed by the CPU 201 after being loaded into the RAM 203 as needed. The definition files and various information tables used when these programs run are also stored in the external memory 211; they are described in detail later.

FIG. 3 is a diagram showing the components of the conference system according to the embodiment of the present invention. As shown in FIG. 3, all users and rooms (conference rooms) belong to a group (301, 303).

The system administrator 302 is the administrator of the entire conference system, who registers, changes, and deletes groups and group administrators and makes various settings for the conference system.

The group administrator 304 is an administrator within the conference system, who registers, changes, and deletes the general users, observer users, rooms, and tags in a group.

The general user 305 is a user who actually holds conferences within the group, and can reserve conference rooms and change or delete reservations, as well as join reserved conferences and free spaces. A group administrator can, like a general user, reserve conference rooms and join conferences.

In addition, there are, for example, "guest users", who can join only conferences to which they have been invited by e-mail, and "observer users", who observe conferences held within the group. To hold a conference, a room must be registered; rooms are registered by the group administrator.

There are two types of rooms: the conference room 306, which restricts the date and time and the participants and must be reserved before the conference, and the "free space" 307, where a conference can be held at any time without a reservation, regardless of date, time, or participants. Users can choose between a "conference room" and a "free space" according to the content of the conference.

FIG. 4 is an example of a screen, displayed on the client device according to the embodiment of the present invention, for registering a conference room.

A method for setting up a virtual conference room (referred to as a "room" in this specification) in which a conference is held in the embodiment of the present invention will now be described. FIG. 4 shows the room registration screen used in this embodiment to configure a room for holding a conference. The room registration screen is displayed when an administrator user of the conference system operating the client device 102 logs in to the system with an administrator account and presses the room management area on the left side of the screen.

The room name 401 accepts input of an arbitrary room name. The room type 402 accepts a selection of whether to set the room up as a "conference room" or a "free space". In the embodiment of the present invention, a "conference room" is a room that must be reserved by specifying the conference participants, start time, and so on. A "free space" is a room that requires no reservation to hold a conference, needs no particular settings such as participants, and lets anyone join the conference.

The capacity 403 accepts the room capacity setting. The observer function 404 accepts a setting for whether to allow observer users in the conference in addition to the users invited to it. A user observing a conference can enter the room if the capacity allows, but permissions such as the right to speak are restricted.

The recording function 405 accepts a setting for whether to permit recording of conferences held in the room. In a room where recording is permitted, a conference can be recorded by pressing the "start recording" button during the conference.

The profile 406 accepts the selection of a profile to apply to the room. Specifically, when the call button on the right is pressed, a list of the profiles registered on the profile registration screen of FIG. 3 described above is displayed, and the profile to apply to the room currently being configured can be selected. When the registration button 407 is pressed, the room and its detailed settings are registered in a management table (not shown) of the conference server 101.
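The settings collected on the room registration screen could be modeled as a simple record, as in the following Python sketch. The field names, the `RoomType` enum, the validation rules, and the dictionary used as the management table are all assumptions made for illustration; the publication does not show the layout of the management table.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class RoomType(Enum):
    CONFERENCE_ROOM = "conference_room"  # reservation required
    FREE_SPACE = "free_space"            # no reservation required


@dataclass(frozen=True)
class Room:
    name: str                       # room name 401
    room_type: RoomType             # room type 402
    capacity: int                   # capacity 403
    allow_observers: bool           # observer function 404
    allow_recording: bool           # recording function 405
    profile: Optional[str] = None   # profile 406


def register_room(table: Dict[str, Room], room: Room) -> None:
    """Store the room in a (hypothetical) management table keyed by name."""
    if not room.name:
        raise ValueError("room name is required")
    if room.capacity < 1:
        raise ValueError("capacity must be at least 1")
    table[room.name] = room


management_table: Dict[str, Room] = {}
register_room(
    management_table,
    Room("Room A", RoomType.CONFERENCE_ROOM, 10,
         allow_observers=True, allow_recording=False),
)
```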

FIG. 5 is an example of a screen, displayed on the client device 102 according to the embodiment of the present invention, for reserving a conference room.

A conference room can be reserved in order to hold a conference. Of the two room types assumed in the embodiment of the present invention, the "conference room" and the "free space" described above, this reservation processing is needed when using a "conference room" type room, which must be reserved by deciding the conference participants, start time, and so on.

The reservation screen of the conference system is displayed on the display device of the client device 102 when a user logs in to the system with a general user account of the conference system of this embodiment and presses the "conference room reservation 501" area on the left side of the screen.

The conference room name 502 accepts the selection of a room (conference room or free space) registered on the room registration screen of FIG. 4 described above, and the conference title 503 sets the conference name. The usage date 504 and usage time 505 set the usage schedule for the room. The participants 506 item calls up the user information registered in the conference system of this embodiment and selects the conference participants (the persons invited to the conference). When "Limit" is selected in 506, a selection screen for invited (participating) users is displayed, and the users to invite are selected by searching the user information. When participants are not limited, no participant settings are accepted. A conference without limited participants is, for example, an opinion exchange session held in a free-space-type room.

The guest participant 507 invitation setting is used to send an invitation notice (sent, for example, to the e-mail address associated with the user information) to a user whose participation in the conference is desired. When inviting a guest participant, a user selection screen (not shown) is displayed, and the users to invite are selected by searching the user information.

The conference recording setting 508 accepts a setting for whether to permit recording of the conference being reserved. If the recording function 405 was set to "not permitted" on the room registration screen of FIG. 4, each item of 508 is grayed out and cannot be selected.

When the reservation button 509 is pressed, the conference reservation is completed. When the reservation is completed, the conference server 101 stores the reservation information in its storage unit and updates the conference reservation data table (not shown). In addition, when the reservation button 509 is pressed, a conference reservation e-mail may be sent to the e-mail addresses of the user who made the reservation and the users invited to the conference.
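As a rough illustration of the reservation step, the sketch below stores a reservation record and applies the rule that recording can only be enabled when the room itself permits it. The names `Reservation` and `reserve`, the list used as the reservation data table, and the string formats for date and time are hypothetical; on the actual screen the recording items are simply grayed out, which is modeled here as forcing the flag to false.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reservation:
    room_name: str            # conference room name 502
    title: str                # conference title 503
    date: str                 # usage date 504 (format is an assumption)
    time_range: str           # usage time 505 (format is an assumption)
    participants: List[str]   # participants 506; empty means "not limited"
    record: bool              # recording setting 508


def reserve(table: List[Reservation], room_allows_recording: bool,
            room_name: str, title: str, date: str, time_range: str,
            participants: List[str], record_requested: bool) -> Reservation:
    """Append a reservation to a (hypothetical) reservation data table."""
    reservation = Reservation(
        room_name, title, date, time_range, list(participants),
        # Recording is only effective when the room permits it.
        record=record_requested and room_allows_recording,
    )
    table.append(reservation)
    return reservation


reservations: List[Reservation] = []
r = reserve(reservations, room_allows_recording=False,
            room_name="Room A", title="Weekly sync",
            date="2012-09-27", time_range="10:00-11:00",
            participants=["userA", "userB"], record_requested=True)
```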

FIG. 6 is an example of a screen, displayed on the client device 102 according to the embodiment of the present invention, for joining a conference. Through this screen, the user of the client device 102 can join a reserved conference. In this embodiment, the conference participation screen is displayed when a user logs in to the system with a general user account of the conference system and presses the "conference participation 601" area on the left side of the screen. The conference rooms displayed in FIG. 6 are a list of the conferences for which the logged-in user account is registered as a participant. In FIG. 6, two conference rooms, room A and room B, are reserved, and the user can join the conference in the selected room by pressing the join button 602.

FIG. 7 is an example of a screen displayed during a conference on the client device according to the embodiment of the present invention.

The conference screen 700 is controlled so as to be displayed on the client device 102, and is divided into three participant screens 701. The participant screen 701a displays the image of participant A from the client device 102a, the participant screen 701b displays the image of participant B from the client device 102b, and the participant screen 701c displays the image of participant C from the client device 102c.

At this point, participant A has acquired the authority of voice speaker. The utterance content of the voice speaker (participant A), that is, the voice data acquired from the client device 102a, is controlled (by a voice data playback control unit) so as to be played back as audio on the client device 102, and is not displayed as character information. Participants B and C, on the other hand, do not have voice speaker authority and have acquired the authority of character speaker. The utterance content of the character speakers (participants B and C), that is, the voice data acquired from the client devices 102b and 102c, is converted into characters, and each utterance is displayed as characters on the participant screens 701b and 701c, respectively.
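The split between voice speaker and character speakers in FIG. 7 can be sketched as a small routing function. This is an illustration only: `route_utterances` and `speech_to_text` are hypothetical names, and the recognizer is a placeholder that decodes bytes as UTF-8, standing in for whatever speech recognition the actual system uses.

```python
from typing import Dict, List, Tuple


def speech_to_text(audio: bytes) -> str:
    # Placeholder recognizer (assumption): treat the bytes as UTF-8 text.
    return audio.decode("utf-8")


def route_utterances(audio_by_client: Dict[str, bytes],
                     voice_holder: str) -> Tuple[List[bytes], Dict[str, str]]:
    """Split utterances into audio to play back and captions to display."""
    playback: List[bytes] = []
    captions: Dict[str, str] = {}
    for client_id, audio in audio_by_client.items():
        if client_id == voice_holder:
            # Voice speaker: played back as audio, not shown as characters.
            playback.append(audio)
        else:
            # Character speaker: converted and shown in that participant's pane.
            captions[client_id] = speech_to_text(audio)
    return playback, captions


playback, captions = route_utterances(
    {"102a": b"hello", "102b": b"I agree", "102c": b"one question"},
    voice_holder="102a",
)
```

Here participant A's audio ends up in `playback`, while B and C appear only in `captions`, matching the state shown in FIG. 7.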

In the embodiment of the present invention, the voice speaker is described as being limited to one participant (voice data from one client device 102), but multiple voice utterance authorities may be provided, as with the character speakers described later.

In FIG. 7, all of participants A to C are speaking, and each utterance is either played back as audio or displayed as characters. When there are more participants and the number of character displays could become large, the number of character speakers may also be limited, as described later. That is, like the voice speaker, the character speakers need not be limited to one; multiple character speakers may be allowed, with an upper limit on their number.

FIG. 8 is a diagram showing an example of the configuration of the end condition table according to the embodiment of the present invention. In the embodiment, the voice utterance authority is acquired in the order in which speech starts (first come, first served) among the voice data sent from the client devices 102. When a speaker stops speaking (when no voice data remains in the voice buffer), the voice utterance authority is released, and if there is another speaker (if there is voice data sent from another client device 102), that speaker is allowed to acquire the voice utterance authority.

However, it would be inappropriate for the voice utterance authority to pass to another speaker merely because of a very short pause, such as a breath taken while speaking (that is, merely because the voice data in the voice buffer contains a very short section without voice data), since the voice speaker and a character speaker would then be swapped abruptly. Therefore, when the voice speaker interrupts his or her speech, the system judges how long a pause must be before the speech is regarded as having stopped, and releases the voice utterance authority only when the speech is regarded as stopped. This prevents the voice speaker and a character speaker from being swapped abruptly at a point where it is inappropriate to break the sentence.

Similarly, it may not be appropriate for the speech of a participant who has been a character speaker to switch to voice abruptly the moment the voice speaker's authority is released. For example, depending on where the speech is split between characters and voice, the parts before and after the split (characters before, voice after) may each be meaningless (for example, when 「資料をせつめいします」 ("I will explain the materials") is split into 「資料をせ」 and 「つめいします」). In that case, the other participants listening to the speech must switch from reading characters to listening to voice at exactly the right moment in order to understand it. Furthermore, because character utterance involves converting the voice data accumulated in the voice buffer into characters, a time lag arises after the voice data is accumulated. As a result, either the character utterance corresponding to this lag is dropped, or the character utterance and the subsequent voice utterance overlap and the other participants must follow both at the same time.

Therefore, as in the case where voice utterance switches to character utterance, the system appropriately judges how long the character speaker must pause before the speech may be switched to voice utterance, and only then lets the speaker acquire the voice utterance authority. This prevents the switch from character utterance to voice utterance from occurring abruptly at a point where it is inappropriate to break the speech as a sentence.

The setting of the end conditions will be described with reference to FIG. 8. The condition type 801 consists of a voice output end condition 802 (voice output end condition storage unit), which sets the interval (elapsed time) from when the voice speaker interrupts his or her speech until the voice utterance authority is released, and a character output end condition 803 (character output end condition storage unit), which sets the interval (elapsed time) from when a character speaker interrupts his or her speech until that speaker may attempt to acquire the voice utterance authority.

In the terminal 804, terminal information for identifying the client device 102 to be configured is set. A common 805 entry (default value) can also be set. It is used when no terminal information is set for the terminal 804, or when terminal information is set but the voice output end condition 802 or the character output end condition 803 is not. In the example of FIG. 8, "0.5 seconds" is set as the voice output end condition 802 and "0.3 seconds" as the character output end condition 803, and these values are used when no end condition is set for an individual terminal.

Reference numeral 806 lists the end conditions corresponding to each terminal. For example, for the client device 102a, "1.0 second" is set as the voice output end condition 802. Because its character output end condition 803 is shown as "-" (not set), the common 805 value of "0.3 seconds" is used.
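
The fallback from a per-terminal entry to the common value can be sketched as follows. This is a minimal illustration only: the dictionary layout, the key names `voice_end` and `char_end`, and the function `end_condition` are assumptions made for the example and are not part of the embodiment. `None` stands in for the "-" (not set) entry of FIG. 8.

```python
from typing import Dict, Optional

# Hypothetical in-memory form of the end condition table of FIG. 8 (seconds).
COMMON: Dict[str, float] = {"voice_end": 0.5, "char_end": 0.3}  # common 805
PER_TERMINAL: Dict[str, Dict[str, Optional[float]]] = {
    # Entry 806 for client device 102a; None models "-" (not set).
    "102a": {"voice_end": 1.0, "char_end": None},
}


def end_condition(terminal: str, kind: str) -> float:
    """Return the end condition for a terminal, falling back to common 805."""
    row = PER_TERMINAL.get(terminal, {})
    value = row.get(kind)
    return COMMON[kind] if value is None else value
```

With these assumed values, client device 102a uses its own voice output end condition and the common character output end condition, while a terminal without any entry uses both common values.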

The setting values of the end condition table may be entered each time a new participant is added, for example by displaying a setting screen on the client device 102 so that the user acting as "chairperson" can set them. When the participants are limited in advance by the participants 506 in FIG. 5, the end conditions may be set at that time.

Because the end condition values can be set individually for each terminal, the system can be operated to suit the situation. For example, the end conditions for terminals used by the "chairperson" or by a "customer" can be made longer so that their voice utterance authority is less likely to be released.

FIG. 9 is a conceptual diagram showing an example of the voice buffers according to the embodiment of the present invention. 900 in FIG. 9 denotes voice buffers that the system manages in, for example, the memory of the conference server 101. In the example of FIG. 9, the voice data of the client devices 102a to 102c (participants A to C) is stored in separate voice buffers (three voice buffers). The voice data is received by the conference server 101 from the client devices 102 by, for example, the method described later with reference to FIG. 15.

As described above, 900 contains three voice buffers, one for each of the client devices 102a to 102c (participants A to C). The state in which a participant at one of the client devices 102a to 102c is speaking (voice data is present in the voice buffer) is represented by 901 (solid line), and the state in which the participant is not speaking (no voice data in the voice buffer) is represented by 902 (dotted line). 901 and 902 represent the passage of time from left to right, and 903 (arrow) represents the current time (that is, the head of the buffer). In other words, 903 (arrow) moves from left to right as time passes. For example, in the voice buffer corresponding to the client device 102a in 900a, "speech is interrupted" refers to a state such as the one in which 903 (arrow) is at the left end of 906a, at the boundary between 901 (solid line) and 902 (dotted line). The head of the voice buffer therefore does not mean the left end of the line in FIG. 9 but the position that moves to the right with time, following the movement of 903 (arrow) described above. The size of a voice buffer is in practice finite, but the buffer remains in use until the participant leaves the conference room, because voice data that is no longer needed is repeatedly deleted and replaced by voice data newly received from the client device.

In 900a, the client device 102a initially holds the voice utterance authority, but its voice data runs out at the left end of 906a. Voice data then reappears within the voice output end condition 802 of the client device 102a in the end condition table (FIG. 8), that is, before the right end of 906a, so the speech is regarded as not having stopped, and the client device 102a continues its voice utterance without releasing the voice utterance authority. The voice buffers of the client devices 102b and 102c also contain voice data, but these devices do not acquire the voice utterance authority.

In 900b, the client device 102a initially holds the voice utterance authority, but its voice data runs out at the left end of 906a. Voice data does not reappear within the voice output end condition 802 of the client device 102a in the end condition table (FIG. 8), that is, before the right end of 906a, so the voice utterance authority of the client device 102a is released. After the release, the voice data of both client devices 102b and 102c runs out for a time, and neither resumes for longer than its character output end condition 803 in the end condition table (FIG. 8), so both become able to acquire the voice utterance authority. In the end, the voice data of the client device 102c reappears first, so the client device 102c acquires the voice utterance authority. When the client device 102b resumes speaking, its speech is again output as character utterance.

In 900c, as in 900b, the voice data of both client devices 102b and 102c runs out for a time after the voice utterance authority of the client device 102a is released. The voice data of the client device 102b resumes before that of the client device 102c, but it resumes after a pause too short to satisfy the character output end condition 803 in the end condition table (FIG. 8), so the client device 102b cannot acquire the voice utterance authority and its speech resumes as character utterance. The voice data of the client device 102c, in contrast, resumes after the character output end condition 803 has been satisfied, so the client device 102c acquires the voice utterance authority.
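
The difference between cases 900b and 900c can be illustrated with a small selection function. This is a simplified sketch under stated assumptions: the function name `first_voice_candidate`, the tuple layout, and the timestamps used in any call are hypothetical, and the sketch assumes the voice utterance authority is already free when the character speakers resume. A character speaker is treated as eligible only if its pause exceeded its character output end condition 803, and the earliest eligible resumption is chosen.

```python
from typing import Dict, List, Optional, Tuple


def first_voice_candidate(
    resumes: List[Tuple[str, float, float]],
    char_end: Dict[str, float],
) -> Optional[str]:
    """Pick the terminal that would take over the voice utterance authority.

    resumes: (terminal, paused_at, resumed_at) for each character speaker.
    char_end: character output end condition per terminal, in seconds.
    """
    eligible = [
        (resumed_at, terminal)
        for terminal, paused_at, resumed_at in resumes
        # The pause must be longer than the character output end condition.
        if resumed_at - paused_at > char_end[terminal]
    ]
    # The earliest eligible resumption wins (first come, first served).
    return min(eligible)[1] if eligible else None
```

In a 900c-like input, 102b resumes first but after too short a pause, so it is filtered out and 102c is selected. In a 900b-like input, both are eligible and 102c is selected because it resumes earlier.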

FIG. 10 is a conceptual diagram showing an example of the variables that manage the authorities and of the states of those variables according to the embodiment of the present invention. In the following description the client device 102 is written as the acting subject for convenience, but the processing is actually performed by the CPU 201 of the conference server 101 executing the process generated in the conference server 101 for each client device 102, as described in the flowcharts of FIG. 11 onward.

First, acquisition and release of the voice utterance authority will be described. When no client device 102 has acquired the voice utterance authority, the voice lock variable 1001a is 1, as in 1000a. In addition, the voice output terminal 1002a is "not specified", which shows that no client device 102 has acquired the voice utterance authority.

When a client device 102 attempts to acquire the voice utterance authority, it locks the voice lock variable. Once locked, the voice lock variable 1001b becomes 0, as in 1000b, and no other client device 102 can acquire the voice utterance authority. If another client device 102 already holds the lock, the locking operation fails with an error. Even when the lock is obtained, the voice utterance authority cannot be acquired if other terminal information is already registered in the voice output terminal 1002. In 1002b no terminal information is registered yet, so the terminal information is registered.

1000c shows the state in which the client device 102a has been registered as terminal information in the voice output terminal 1002c. Releasing the lock sets the voice lock variable 1001c to 1, which completes the processing by which the client device 102a acquires the voice utterance authority, and voice utterance starts.

When the voice utterance is finished, the client device 102a locks the voice lock variable 1001 again (voice lock variable 1001d = 0), deletes "client device 102a" from the voice output terminal 1002d, and releases the lock. This results in the state 1000e.
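
The sequence 1000a to 1000e can be sketched with a non-blocking lock and a single registration slot. This is an illustrative sketch, not the embodiment's implementation: the class name `VoiceAuthority` and its methods are hypothetical, Python's `threading.Lock` stands in for the voice lock variable 1001, and the `terminal` attribute stands in for the voice output terminal 1002. Using a blocking lock on release is an assumption of the sketch.

```python
import threading
from typing import Optional


class VoiceAuthority:
    """Single voice utterance authority guarded by a lock (cf. FIG. 10)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()         # stands in for lock variable 1001
        self.terminal: Optional[str] = None   # stands in for output terminal 1002

    def try_acquire(self, terminal: str) -> bool:
        # Locking fails immediately if another process holds the lock.
        if not self._lock.acquire(blocking=False):
            return False
        try:
            if self.terminal is not None:
                return False                  # another terminal is registered
            self.terminal = terminal          # register terminal information
            return True
        finally:
            self._lock.release()              # the lock is released either way

    def release(self, terminal: str) -> None:
        # Lock again, delete the registration, then unlock (1000d to 1000e).
        with self._lock:
            if self.terminal == terminal:
                self.terminal = None
```

The lock is held only while the registration slot is inspected or changed, so it protects the update itself, while the slot records who currently holds the authority.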

As mentioned above, the number of client devices that can acquire authority may also be limited for character utterance. 1010f to 1010h illustrate how the character utterance authority is acquired in that case. In this figure, two client devices 102 can acquire the character utterance authority. A detailed description is omitted because the processing is the same as for voice utterance, except that there are a plurality of variables in which character output terminals are registered.

There may be one or a plurality of each of the voice utterance authority and the character utterance authority. The number need not be fixed and may be set by the user who registered the conference room using FIG. 4 or by the user who reserved the conference room using FIG. 5.

The number of blocks in 1012 of FIG. 10 (two in the example) is the limit, and this can be realized by storing the limit in a character output authority count storage unit (not shown). Likewise, a plurality of voice utterance authorities can be allowed (by providing a plurality of blocks for 1002), with their number stored in a voice output authority count storage unit (not shown).

次に、図１１から図１６のフローチャートにより、会議システムの処理の詳細について説明する。 Next, details of the process of the conference system will be described with reference to the flowcharts of FIGS. 11 to 16.

図１１は、本発明の実施形態に係わる会議室使用の処理を示すフローチャートの一例である。図１１の各ステップは、会議サーバ１０１のＣＰＵ２０１によって実行される。 FIG. 11 is an example of a flowchart showing a process of using a conference room according to the embodiment of the present invention. Each step in FIG. 11 is executed by the CPU 201 of the conference server 101.

ステップＳ１１０１においては、会議室使用を開始する。会議室使用の開始（すなわち本フローチャートの開始）は、図５において会議室の予約をする際に利用日付５０４、利用時間５０５により設定された開始時間に自動的に開始されてもよいし、クライアント装置１０２から会議使用の開始の指示を受け付けて開始されてもよい。 In step S1101, use of the conference room is started. The start of use of the conference room (that is, the start of this flowchart) may be started automatically at the start time set by the use date 504 and the use time 505 when the conference room is reserved in FIG. It may be started upon receiving an instruction to start using the conference from the device 102.

ステップＳ１１０２においては、新たな参加者（クライアント装置１０２）が、会議室に入室したか否かをチェックする。 In step S1102, it is checked whether a new participant (client device 102) has entered the conference room.

ステップＳ１１０３においては、ステップＳ１１０２における新たな参加者のチェックに基づき、新たな参加者があると判定した場合（“ＹＥＳ”）にはステップＳ１１０４に進む。新たな参加者がない場合（“ＮＯ”）にはステップＳ１１０５に進む。 If it is determined in step S1103 that there is a new participant based on the new participant check in step S1102 ("YES"), the process proceeds to step S1104. If there is no new participant (“NO”), the process proceeds to step S1105.

In step S1104, a process for handling the speech of the new participant (client device 102) is generated and started. Details of the processing in the process are described with reference to FIGS. 12 to 14.

ステップＳ１１０５においては、ステップＳ１１０４において生成したいずれかのプロセスにおいて、参加者（クライアント装置１０２）からの退出操作を受け付けたか否かを判定する。退出操作を受け付けたプロセスがある場合（“ＹＥＳ”）には、ステップＳ１１０６に進む。退出操作を受け付けたプロセスがない場合（“ＮＯ”）には、ステップＳ１１０７に進む。 In step S1105, it is determined whether an exit operation from the participant (client device 102) has been accepted in any of the processes generated in step S1104. If there is a process that has accepted the exit operation (“YES”), the process proceeds to step S1106. If there is no process that has accepted the exit operation (“NO”), the process proceeds to step S1107.

ステップＳ１１０６においては、ステップＳ１１０５において退出操作を受け付けたプロセスを終了し、削除する。 In step S1106, the process that has accepted the exit operation in step S1105 is terminated and deleted.

In step S1107, it is determined whether the conference has ended. Specifically, the conference is determined to have ended when a conference end operation performed by a user is received from a client device 102. The system may also be implemented so that the conference is determined to have ended when all participants have left. If it is determined that the conference has ended ("YES"), the process proceeds to step S1108. If it is determined that the conference has not ended ("NO"), the process returns to step S1102.

In step S1108, processing to end use of the conference room is performed.

This completes the description of the flowchart of FIG. 11.
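
The control flow of FIG. 11 can be summarized in a short event-driven sketch. The function `run_conference_room`, the event tuples, and the use of a plain dictionary in place of real per-client processes are all assumptions made for illustration. Each loop iteration corresponds to one pass through steps S1102 to S1107.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def run_conference_room(events: Iterable[Tuple[str, Optional[str]]]) -> List[str]:
    """Process 'enter', 'leave' and 'end' events; return terminals still present."""
    processes: Dict[str, str] = {}            # terminal -> placeholder process
    for kind, terminal in events:
        if kind == "enter" and terminal is not None:    # S1103 YES -> S1104
            processes[terminal] = "running"
        elif kind == "leave" and terminal is not None:  # S1105 YES -> S1106
            processes.pop(terminal, None)
        elif kind == "end":                             # S1107 YES -> S1108
            break
    remaining = sorted(processes)
    processes.clear()                         # S1108: end use of the room
    return remaining
```

In an actual server each entry would be a running process or thread that executes the flow of FIG. 12. The placeholder string only marks that such a process exists.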

図１２は、本発明の実施形態に係わる各クライアント装置に対応するプロセスが音声または文字による発話の権限を取得する処理を示すフローチャートの一例である。前記プロセスとは、図１１のフローチャートで生成されたプロセスである。図１２の各ステップは、会議サーバ１０１のＣＰＵ２０１によって実行される。 FIG. 12 is an example of a flowchart showing a process in which a process corresponding to each client apparatus according to the embodiment of the present invention acquires the right to speak by voice or text. The process is a process generated in the flowchart of FIG. Each step in FIG. 12 is executed by the CPU 201 of the conference server 101.

In step S1201, it is determined, based on an operation on the client device 102, whether an event to leave the conference room has been received. If a leave event has been received ("YES"), this flowchart ends. On ending, the process notifies the processing described with reference to FIG. 11 that it is ending because it received a leave event. If no leave event has been received ("NO"), the process proceeds to step S1202.

ステップＳ１２０２においては、本プロセスに対応する音声バッファに音声データがあるか否かを判定する。具体的には、図９で説明した音声バッファに実線部分として示された音声データが格納されているか否かを判定する。なお、音声データは、図１５のフローチャートで説明するように、クライアント装置１０２から受け取り、音声バッファ（音声データ記憶部）に格納される（音声データ登録部）。音声データがある場合（“ＹＥＳ”）には、ステップＳ１２０３に進む。ない場合（“ＮＯ”）にはステップＳ１２０１に進む。 In step S1202, it is determined whether there is audio data in the audio buffer corresponding to this process. Specifically, it is determined whether or not audio data shown as a solid line portion is stored in the audio buffer described in FIG. Note that the audio data is received from the client device 102 and stored in the audio buffer (audio data storage unit) (audio data registration unit), as described with reference to the flowchart of FIG. If there is audio data (“YES”), the process proceeds to step S1203. If not (“NO”), the process proceeds to step S1201.

ステップＳ１２０３においては、音声用ロック変数にロックをかける（図１０における１００１）。 In step S1203, the voice lock variable is locked (1001 in FIG. 10).

ステップＳ１２０４においては、ステップＳ１２０３におけるロックをかける処理が成功したか否かを判定する。成功した場合（“ＹＥＳ”）にはステップＳ１２０５に進む。成功しなかった場合（“ＮＯ”）には、ステップＳ１２１４に進む。 In step S1204, it is determined whether or not the process for locking in step S1203 has succeeded. If successful (“YES”), the process advances to step S1205. If not successful (“NO”), the process proceeds to step S1214.

In step S1205, it is determined whether the voice output authority can be acquired (voice output authority grant determination unit). Specifically, it is determined whether the terminal registration 1002 in FIG. 10 has a free slot so that registration is possible. If the voice output authority can be acquired ("YES"), the process proceeds to step S1206. If it cannot be acquired ("NO"), the process proceeds to step S1213.

ステップＳ１２０６においては、本プロセスのために、音声出力権限を取得する（音声出力権限付与部）。具体的には、図１０における１００２に、本プロセスに対応するクライアント装置１０２の端末情報を登録する。 In step S1206, an audio output authority is acquired for this process (audio output authority assigning unit). Specifically, terminal information of the client apparatus 102 corresponding to this process is registered in 1002 in FIG.

ステップＳ１２０７においては、音声出力権限の取得処理が終了したため、音声出力用ロック変数にかけたロックを解除する。 In step S1207, since the sound output authority acquisition process is completed, the lock applied to the sound output lock variable is released.

ステップＳ１２０８においては、本プロセスが音声発話権限を取得したため、音声バッファ内の音声データを、音声として出力する処理を実行する。詳細は、図１３のフローチャートにより説明する。 In step S1208, since this process has acquired the voice utterance authority, a process of outputting the voice data in the voice buffer as voice is executed. Details will be described with reference to the flowchart of FIG.

図１３は、本発明の実施形態に係わる音声による発言の権限を取得したプロセスが音声出力する処理を示すフローチャートの一例である。図１３の各ステップは、会議サーバ１０１のＣＰＵ２０１によって実行される。 FIG. 13 is an example of a flowchart illustrating a process of outputting a voice by a process that has acquired the right to speak by voice according to the embodiment of the present invention. Each step of FIG. 13 is executed by the CPU 201 of the conference server 101.

In step S1301, it is determined whether there is voice data at the head of the voice buffer. Specifically, in the conceptual diagram of the voice buffer in FIG. 9, the determination is based on whether voice data is present (a solid line in the figure) immediately after 903 (arrow), that is, to the right of the arrow, which marks the position in the voice buffer corresponding to the current time. If there is voice data ("YES"), the process proceeds to step S1302. If there is no voice data ("NO"), the process proceeds to step S1304.

ステップＳ１３０２においては、音声バッファの音声データを、音声として出力する。本プロセスが音声発話権限を有しており、かつ音声データが存在するためである。 In step S1302, the audio data in the audio buffer is output as audio. This is because this process has a voice utterance authority and voice data exists.

In step S1303, "0" is assigned (initialization) to the "voice non-output start time", the variable that records the time at which the voice buffer ran out of voice data and the voice utterance was interrupted, to indicate that no such time is set. The process then returns to step S1301, and voice utterance is repeated for as long as there is new voice data in the voice buffer.

ステップＳ１３０４においては、“音声非出力開始時刻”が“０”であるか否かを判定する。“０”である場合とは、それまでステップＳ１３０４以降の処理を実行しておらず、初めて音声データがない状態（クライアント装置１０２における発言が中断された直後である状態）である。“音声非出力開始時刻”が“０”である場合（“ＹＥＳ”）には、ステップＳ１３０５に進む。“音声非出力開始時刻”が０でない場合（“ＮＯ”）には、（音声非出力開始時刻に現在時刻を設定する必要がないため）ステップＳ１３０６に進む。 In step S1304, it is determined whether or not “voice non-output start time” is “0”. The case of “0” means a state in which the processing after step S1304 has not been executed so far and there is no audio data for the first time (a state immediately after the speech in the client apparatus 102 is interrupted). If the “voice non-output start time” is “0” (“YES”), the process proceeds to step S1305. If the “voice non-output start time” is not 0 (“NO”), the process proceeds to step S1306 (because it is not necessary to set the current time as the voice non-output start time).

ステップＳ１３０５においては、“音声非出力開始時刻”に現在の時刻を設定する。すなわち、クライアント装置１０２における参加者の発言が中断され、本プロセスに対応する音声バッファに音声データがなくなった時刻を“音声非出力開始時刻”に設定する。 In step S1305, the current time is set as the “voice non-output start time”. That is, the time when the speech of the participant in the client apparatus 102 is interrupted and no audio data is stored in the audio buffer corresponding to this process is set as the “audio non-output start time”.

In step S1306, it is determined whether "current time − voice non-output start time" is greater than the value of the "voice output end condition" (the comparison may also include equality), that is, greater than the value of the voice output end condition 802 stored in the table of FIG. 8. If it is ("YES"), the process proceeds to step S1307 and sets the "end condition" (the return value to the flowchart of FIG. 12) to "true", then sets the "voice non-output start time" to "0" (initialization) in step S1308 and ends this flowchart. Otherwise, the "end condition" is set to "false" and this flowchart ends.
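
One pass through FIG. 13 can be sketched as a single function. This is a minimal sketch under assumptions: the function `voice_output`, the `state` dictionary, the `deque` used as the voice buffer, and passing the current time in as an argument are all choices made for the example. As in the flowchart, a "voice non-output start time" of 0 means "not set", so the sketch assumes real timestamps are never 0.

```python
from collections import deque
from typing import Callable, Deque, Dict


def voice_output(
    buffer: Deque[bytes],
    state: Dict[str, float],
    voice_end: float,
    now: float,
    play: Callable[[bytes], None],
) -> bool:
    """Return the 'end condition': True when the pause exceeded voice_end."""
    while buffer:                              # S1301: data at the buffer head?
        play(buffer.popleft())                 # S1302: output as voice
        state["silence_start"] = 0.0           # S1303: clear the start time
    if state["silence_start"] == 0.0:          # S1304: first pass without data?
        state["silence_start"] = now           # S1305: remember when it began
    if now - state["silence_start"] > voice_end:   # S1306
        state["silence_start"] = 0.0           # S1308: initialize again
        return True                            # S1307: end condition is true
    return False                               # end condition is false
```

The caller is expected to invoke the function repeatedly, as step S1209 does, so a pause shorter than `voice_end` simply yields `False` until either new data arrives or the pause grows long enough.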

以上により、図１３のフローチャートの説明を完了する。説明を図１２のフローチャートに戻す。この時点で、ステップＳ１２０８が完了している。 Thus, the description of the flowchart in FIG. 13 is completed. The description returns to the flowchart of FIG. At this point, step S1208 is complete.

In step S1209, it is determined whether the return value from the processing of the flowchart of FIG. 13 is "false" or "true" (voice output authority release determination unit). If it is "false", the process returns to step S1208 (that is, the processing of the flowchart of FIG. 13). If it is "true", the process proceeds to step S1210. The value is "false" when the voice data in the voice buffer was interrupted during the processing of FIG. 13 but the interruption is shorter than the value of the voice output end condition 802 in FIG. 8, so the voice utterance authority need not be released. The process therefore returns to the processing of the flowchart of FIG. 13, and voice utterance resumes if there is new voice data in the voice buffer.

一方、図１３のフローチャートの処理に戻ったものの、音声バッファに新たに音声データがなければ、ステップＳ１３０６の判定が繰り返される。この時点で“現在時刻”は前回の判定時より進んでいくため、何度も繰り返されるといずれ“終了条件”が“ｔｒｕｅ”となる（ステップＳ１３０７）。すなわち、いずれは発言を中断している時間が音声出力終了条件８０２の値より長くなる。 On the other hand, if the process returns to the process of the flowchart of FIG. 13 but there is no new audio data in the audio buffer, the determination in step S1306 is repeated. At this time, the “current time” advances from the previous determination, so that the “end condition” eventually becomes “true” when repeated many times (step S1307). That is, in any case, the time during which speech is interrupted becomes longer than the value of the audio output end condition 802.

ステップＳ１２１０においては、音声発話権限を解除するため図１０の音声用ロック変数１００１にロックをかける。 In step S1210, the voice lock variable 1001 in FIG. 10 is locked to release the voice utterance authority.

ステップＳ１２１１においては、図１０の音声出力端末に登録してあった、本プロセスに対応する端末情報を削除する（音声出力権限解除部）。 In step S1211, the terminal information corresponding to this process registered in the voice output terminal of FIG. 10 is deleted (voice output authority release unit).

ステップＳ１２１２においては、音声用ロック変数１００１のロックを解除する。 In step S1212, the lock of the sound lock variable 1001 is released.

ステップＳ１２１３においては、音声用ロック変数１００１のロックを解除する。本ステップは、ロックをかけたものの音声出力権限を取得できなかった場合の解除である。 In step S1213, the lock of the sound lock variable 1001 is released. This step is a release when the sound output authority cannot be acquired although the lock is applied.

ステップＳ１２１４においては、本プロセスは、音声バッファ内の音声データを、文字として出力する処理を実行する。詳細は、図１４のフローチャートにより説明する。 In step S1214, the process executes a process of outputting the audio data in the audio buffer as characters. Details will be described with reference to the flowchart of FIG.
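
Steps S1203 to S1214 can be put together in one sketch. The names `Room`, `handle_utterance`, `voice_step` and `text_step` are hypothetical. `voice_step` stands in for one pass of FIG. 13 and returns the "end condition", and `text_step` stands in for the character output of FIG. 14, which is called once here. A real implementation would wait between passes instead of spinning in a tight loop.

```python
import threading
from typing import Callable, Optional


class Room:
    def __init__(self) -> None:
        self.lock = threading.Lock()                # voice lock variable 1001
        self.voice_terminal: Optional[str] = None   # voice output terminal 1002


def handle_utterance(
    room: Room,
    terminal: str,
    voice_step: Callable[[], bool],
    text_step: Callable[[], None],
) -> str:
    """Try to speak by voice; fall back to character output otherwise."""
    if not room.lock.acquire(blocking=False):   # S1203, S1204 "NO"
        text_step()                             # S1214
        return "text"
    if room.voice_terminal is not None:         # S1205 "NO"
        room.lock.release()                     # S1213
        text_step()                             # S1214
        return "text"
    room.voice_terminal = terminal              # S1206
    room.lock.release()                         # S1207
    while not voice_step():                     # S1208, S1209
        pass
    with room.lock:                             # S1210 (S1212 on exit)
        room.voice_terminal = None              # S1211
    return "voice"
```

The sketch returns which output path was taken so that the two branches are easy to observe.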

図１４は、本発明の実施形態に係わるプロセスが文字出力する処理を示すフローチャートの一例である。図１４の各ステップは、会議サーバ１０１のＣＰＵ２０１によって実行される。 FIG. 14 is an example of a flowchart showing processing for outputting characters by the process according to the embodiment of the present invention. Each step in FIG. 14 is executed by the CPU 201 of the conference server 101.

In step S1401, it is determined whether there is voice data at the head of the voice buffer. Specifically, in the conceptual diagram of the voice buffer in FIG. 9, the determination is based on whether voice data is present (a solid line in the figure) immediately after 903 (arrow), that is, to the right of the arrow, which marks the position in the voice buffer corresponding to the current time. If there is voice data ("YES"), the process proceeds to step S1402. If there is no voice data ("NO"), the process proceeds to step S1404.

ステップＳ１４０２においては、音声バッファの音声データを、文字に変換し文字として出力する（音声データ文字変換部）。 In step S1402, the voice data in the voice buffer is converted into characters and output as characters (voice data character conversion unit).

ステップＳ１４０３においては、音声バッファに音声データがなく、文字発話が中断された時刻に対応する“文字非出力開始時刻”が設定されていないことを示すため“０”を代入する（初期化）。さらにステップＳ１４０１に戻り、新たに音声バッファに音声データがあれば、文字発話を繰り返す。 In step S1403, “0” is substituted to indicate that there is no voice data in the voice buffer and “character non-output start time” corresponding to the time when the character utterance was interrupted is set (initialization). Further, returning to step S1401, if there is new voice data in the voice buffer, the character utterance is repeated.

In step S1404, it is determined whether the “character non-output start time” is “0”. A value of “0” means that the processing from step S1404 onward has not yet been executed and that voice data is absent for the first time (that is, speech at the client device 102 has just been interrupted). If the “character non-output start time” is “0” (“YES”), the process proceeds to step S1405. If it is not “0” (“NO”), the process proceeds to step S1406, because there is no need to set the current time as the character non-output start time.

In step S1405, the current time is set as the “character non-output start time”. That is, the time at which the participant's speech at the client device 102 was interrupted and the voice buffer corresponding to this process ran out of voice data is recorded as the “character non-output start time”.

In step S1406, it is determined whether “current time − character non-output start time” is greater than (or, optionally, equal to) the value of the “character output end condition”, that is, the character output end condition 803 stored in FIG. 8. If so (“YES”), the process proceeds to step S1407, where the “end condition” (the return value to the flowchart of FIG. 12) is set to “true”; then, in step S1408, the “character non-output start time” is reset to “0” (initialization) and this flowchart ends. Otherwise, in step S1409, the “end condition” is set to “false” and this flowchart ends.
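
One pass through the flow of FIG. 14 might look like the sketch below. It is an illustration under stated assumptions, not the implementation of the embodiment: the voice buffer is modelled as a plain list, speech recognition is replaced by a placeholder, and the names `CharOutputProcess`, `run_once`, `to_text`, and the injectable `clock` are hypothetical. The sketch keeps the text's convention that “0” means “not set”, which assumes the clock never returns exactly 0 while silence is being timed.

```python
import time


class CharOutputProcess:
    """Hypothetical per-terminal state for the character output flow of FIG. 14."""

    def __init__(self, end_condition_sec, clock=time.monotonic):
        self.end_condition_sec = end_condition_sec  # character output end condition 803
        self.non_output_start = 0   # "character non-output start time"; 0 means unset
        self.clock = clock          # injectable clock (an assumption, for testability)
        self.buffer = []            # pending voice data, standing in for FIG. 9
        self.output = []            # characters output so far

    def to_text(self, chunk):
        # Placeholder for speech recognition; a real system would call a recognizer.
        return str(chunk)

    def run_once(self):
        """Run FIG. 14 once and return the "end condition" (True or False)."""
        while self.buffer:                                        # S1401
            self.output.append(self.to_text(self.buffer.pop(0)))  # S1402
            self.non_output_start = 0                             # S1403
        now = self.clock()
        if self.non_output_start == 0:                            # S1404
            self.non_output_start = now                           # S1405
        if now - self.non_output_start > self.end_condition_sec:  # S1406
            self.non_output_start = 0                             # S1408
            return True                                           # S1407
        return False                                              # S1409
```

Because the silence timer is reset every time voice data is converted, a short pause followed by more speech never reaches the end condition; only an uninterrupted silence longer than the configured value does.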

This completes the description of the flowchart of FIG. 14. The description now returns to the flowchart of FIG. 12, at which point step S1214 has been completed.

In step S1215, it is determined whether the return value from the processing of the flowchart of FIG. 14 is “false” or “true” (character output authority release determination unit). If “false”, the process returns to step S1214 (that is, the processing of the flowchart of FIG. 14). If “true”, the process proceeds to step S1201. The “false” case is one in which the voice data in the voice buffer was interrupted during the processing of FIG. 14, but the interruption was shorter than the value of the character output end condition 803 in FIG. 8, so the character utterance continues. The process therefore returns to the flowchart of FIG. 14 and, if there is new voice data in the voice buffer, resumes the character utterance.

If, on the other hand, the process returns to the flowchart of FIG. 14 but there is no new voice data in the voice buffer, the determination in step S1406 is repeated. Because the “current time” has advanced since the previous determination, the “end condition” eventually becomes “true” after enough repetitions (step S1407). In other words, the time during which speech is interrupted eventually exceeds the value of the character output end condition 803.
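
The repetition between steps S1214 and S1215 amounts to a polling loop. The sketch below is illustrative only. The function name `character_output_loop`, the callable `run_fig14` (assumed to perform one pass of FIG. 14 and return its “end condition”), and the `on_wait` hook are hypothetical.

```python
def character_output_loop(run_fig14, on_wait=lambda: None):
    """Repeat step S1214 until it returns True (S1215), then return the pass count."""
    passes = 0
    while True:
        passes += 1
        if run_fig14():     # S1214: one pass of FIG. 14
            return passes   # S1215 "true": control goes back to S1201
        on_wait()           # hypothetical hook, e.g. a short sleep between polls
```

In practice some delay between passes, supplied here through `on_wait`, would avoid busy-waiting while the speaker is silent.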

Note that in FIG. 12 no authority is acquired when a character utterance is made (the character output in step S1214). Character utterance authority may nevertheless be requested in the same way as voice utterance authority (1000f to 1000h in FIG. 10). This limits the number of client devices 102 (participants) making character utterances and makes the remarks easier to follow. The flowchart would be the same as for acquiring and releasing voice utterance authority, so its description is omitted; a character output authority grant determination unit, a character output authority grant unit, and a character output authority release unit can be implemented.
This completes the description of the flowchart of FIG. 12.

In the end condition table of FIG. 8, the voice output end condition 802 and the character output end condition 803 are specified as times (the length of time for which voice data in the voice buffer has been interrupted), and in the description of the flowcharts of FIGS. 13 and 14, voice output or character output ends when this time condition is satisfied.

However, the condition may be specified in other ways. For example, a “keyword” may be settable as the condition. In that case, in step S1302 of FIG. 13 and step S1402 of FIG. 14, when the remark matches a keyword pattern such as “以上、終了します。” (“That is all; I am finished.”) or “終わります。どうぞ” (“I am done. Go ahead.”), the end condition may be set to “true” and the process may return to the flowchart of FIG. 12.

The time-based end condition of FIG. 8, a keyword end condition, and other end conditions may also be combined.
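
A keyword condition and its combination with the time-based condition could be sketched as follows. The specification does not say how the conditions are combined; the logical OR used here is an assumption. The function names and the regular-expression form of the patterns are likewise hypothetical.

```python
import re


def matches_end_keyword(text, patterns):
    """Return True if the recognized text matches any end-of-utterance pattern."""
    return any(re.search(p, text) for p in patterns)


def end_condition(text, silence_sec, limit_sec, patterns):
    """Combine the time condition of FIG. 8 with a keyword condition (assumed OR)."""
    return silence_sec > limit_sec or matches_end_keyword(text, patterns)
```

For example, with the patterns `r"以上、終了します"` and `r"終わります。?\s*どうぞ"`, a remark ending in “以上、終了します。” would end the utterance at once, without waiting for the silence timer.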

FIG. 15 is an example of a flowchart showing processing in which voice input at each client device according to the embodiment of the present invention is transmitted to the conference server. Steps S1501 to S1504 in FIG. 15 are executed by the CPU 201 of the client device 102, and steps S1511 to S1514 by the CPU 201 of the conference server 101.

In step S1501, the client device 102 accepts voice data input from a user participating in the conference through a connected microphone.

In step S1502, the client device 102 transmits the voice data accepted from the user to the conference server 101 (voice data transmission unit).

In step S1511, the conference server 101 receives the voice data from the client device 102 (voice reception unit).

In step S1512, the conference server 101 stores the received voice data in the voice buffer (FIG. 9). When the user is not speaking and there is no voice data, silent data of the same duration may be stored in the voice buffer. Alternatively, a time may be associated with the voice data stored in the voice buffer so that, when voice data is taken out of the buffer, the consuming application can analyze and handle the silent intervals between pieces of voice data.
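
The second alternative in step S1512, associating a time with each piece of buffered voice data so that the consumer can derive the silent intervals, can be sketched as follows. The `Chunk` record and the `silence_gaps` function are hypothetical illustrations, not structures named in the specification.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Chunk:
    start: float      # capture start time in seconds
    duration: float   # length of the chunk in seconds
    samples: bytes    # raw voice payload


def silence_gaps(chunks: List[Chunk], min_gap: float = 0.0) -> List[Tuple[float, float]]:
    """Return (gap_start, gap_length) for each silent interval between chunks."""
    gaps = []
    ordered = sorted(chunks, key=lambda c: c.start)
    for prev, cur in zip(ordered, ordered[1:]):
        gap_start = prev.start + prev.duration
        gap = cur.start - gap_start
        if gap > min_gap:
            gaps.append((gap_start, gap))
    return gaps
```

Compared with storing explicit silent data, timestamps keep the buffer small, at the cost of requiring the consumer to compute the gaps itself.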

Steps S1503 and S1504 in the client device 102, described below, are executed asynchronously with steps S1501 and S1502. Likewise, steps S1513 and S1514 in the conference server 101, described below, are executed asynchronously with steps S1511 and S1512.

In step S1503, the client device 102 accepts, through a connected imaging device, a moving image showing, for example, the user of the client device 102.

In step S1504, the client device 102 transmits the moving image as image data to the conference server 101.

In step S1513, the conference server 101 receives the image data from the client device 102 (image reception unit).

In step S1514, the conference server 101 stores the received image data in an image buffer (image data storage unit) (image data registration unit).
This completes the description of the flowchart of FIG. 15.

FIG. 16 is an example of a flowchart showing processing in which the voice and character utterances output by each process according to the embodiment of the present invention are transmitted to each client device. Steps S1601 to S1604 in FIG. 16 are executed by the CPU 201 of the client device 102, and steps S1611 to S1615 by the CPU 201 of the conference server 101.

In step S1611, the conference server 101 acquires voice data (voice acquisition unit). Specifically, it acquires the voice data output in step S1302 of FIG. 13.

In step S1612, the conference server 101 transmits the voice data to the client device 102 (voice transmission unit).

In step S1601, the client device 102 receives the voice data from the conference server 101 (voice data reception unit).

In step S1602, the client device 102 controls output of the voice data. Specifically, it outputs the voice data from a speaker or the like connected to the client device 102.

Steps S1603 and S1604 in the client device 102, described below, are executed asynchronously with steps S1601 and S1602. Likewise, steps S1613 to S1615 in the conference server 101, described below, are executed asynchronously with steps S1611 and S1612.

In step S1613, the conference server 101 acquires image data. Specifically, it acquires the image data of the client device 102 that was recorded in the image buffer in step S1514 of FIG. 15.

In step S1614, the conference server 101 combines character information with image data (image composition unit). Specifically, it obtains the character information that was converted from voice data and output in step S1402 of FIG. 14, and combines it with the image transmitted from the client device 102 corresponding to that voice data (701b, 701c, and so on in FIG. 7). When there are multiple client devices 102 and therefore multiple combined images (701a to 701c in FIG. 7, although 701a has no character information combined), the images are further assembled into a single piece of whole image data in a form to be displayed on the client device 102 (700 in FIG. 7).

If the client device 102 is able to combine character information with the whole image data, the character information may instead be transmitted to the client device 102 separately from the whole image data.

In step S1615, the conference server 101 transmits the whole image data composed in step S1614 to the client device 102 (whole image data transmission unit).

In step S1603, the client device 102 receives the whole image data from the conference server 101.

In step S1604, the client device 102 controls display of the received whole image data. Specifically, it displays the data on a display device or the like connected to the client device 102.
This completes the description of the flowchart of FIG. 16.

The foregoing has described a scheme in which utterances are presented to the user as voice utterances and character utterances, and in which voice utterance and character utterance are switched at the boundary of each utterance in a way that is easy for the user to follow.

Next, with reference to FIGS. 17 and 18 (and again to FIG. 7), the following describes where on the screen (of the client-side application of the client device 102, for example a Web browser) the characters of a “character utterance” are displayed so that they are easy to follow.

In FIG. 7, three people (A, B, and C) speak at three client devices (102a to 102c), and the utterances of B and C are character utterances. Because the participant screens corresponding to the terminals making character utterances are displayed on the client, the characters are displayed on the corresponding participant screens. This screen layout makes it easy to see who is speaking.

FIG. 17 is an example of a screen displayed on the client device during a conference according to the embodiment of the present invention. Screen 1701 shows the participant screen from only one terminal. In this case, unlike FIG. 7, there are no participant screens for B and C, so their characters are displayed in an area other than the participant screens corresponding to the speakers. This area may be composited onto the image of the one participant screen that is displayed. Alternatively, when there is no participant screen corresponding to a speaker, a frame dedicated to character utterances (a character utterance frame) may be provided and the characters displayed in it (screen 1702). The characters are drawn flowing from left to right in chronological order according to the timing of the utterances, but this is only an example; they may run vertically or in any other direction or orientation.

Although not illustrated, all utterances may be displayed in the character utterance frame regardless of whether the participant screen corresponding to the character utterance is displayed.

Whether an utterance is displayed within the corresponding participant screen when one exists, or in the character utterance frame even when a participant screen exists, is set by the “utterance frame display flag”. When the flag is “true”, the characters are displayed in the character utterance frame even if a participant screen corresponding to the utterance exists; when it is “false”, the characters are displayed within the participant screen if one exists.

Furthermore, when the utterance frame display flag is “false”, an utterance may be left undisplayed if no participant screen is displayed for it. Whether a participant screen is displayed is determined by the screen data display determination unit, based on whether the conference server 101 is controlling the display so that an on-screen frame (display frame) corresponding to the participant (in practice, the client device 102) is shown. Specifically, when the “participant utterance display flag” is “true”, the character utterance frame is used: if the participant screen corresponding to a character utterance is not displayed, the character utterance is displayed in the character utterance frame. When it is “false”, the character utterance frame is not used if the corresponding participant screen is not displayed, and the content of the character utterance may be left off the screen.

The utterance frame display flag and the participant utterance display flag are stored in an “utterance character display setting storage unit” (not shown).
This completes the description of FIG. 17.

FIG. 18 is an example of a flowchart showing processing for generating a display screen for character utterances according to the embodiment of the present invention. Each step in FIG. 18 is executed by the CPU 201 of the conference server 101. Specifically, it describes in detail step S1613, a processing step on the conference server side in FIG. 16.

In the loop from step S1801 to step S1808, processing is executed for each piece of voice data corresponding to a client device 102 that has character utterance authority.

In step S1802, attention is focused on the voice data of one speaker (in practice, one terminal) whose voice data has actually been recognized in the voice buffer and is about to be presented as a character utterance. For convenience, this is referred to below as the “utterance of interest”, as in “the utterance of interest is not displayed as a character utterance”.

In step S1803, it is determined whether the utterance of interest is to be displayed in the character utterance frame as a character utterance (character utterance frame display determination unit). Specifically, the determination is based on whether the “utterance frame display flag” is true or false. If true, all utterances are displayed in the character utterance frame, and the process proceeds to step S1807. If false, the process proceeds to step S1804.

In step S1804, it is determined whether the participant screen corresponding to the utterance of interest (that is, to the client device 102 corresponding to the utterance of interest) is displayed (image data display determination unit). If it is displayed (YES), the process proceeds to step S1805. If it is not displayed (NO), the process proceeds to step S1806.

In step S1805, the utterance of interest is displayed as a character utterance on the corresponding participant screen.

In step S1806, it is determined whether the utterance of interest is to be displayed in the character utterance frame as a character utterance (participant utterance display determination unit). Specifically, the determination is based on whether the “participant utterance display flag” is true or false. If true, the setting is to display the utterance of interest in the character utterance frame as a character utterance, and the process proceeds to step S1807. If false, the process proceeds to step S1808.

In step S1808, the characters of the utterance of interest are composited into the character utterance frame as a character utterance. If characters from another character speaker are already present, the new characters are added so as not to erase them.
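
The branching of FIG. 18 reduces to a small decision function. The sketch below is an illustration only: it assumes the two flags are plain booleans and that the set of displayed participant screens is known. The name `place_character_utterance` and its return values are hypothetical.

```python
def place_character_utterance(terminal_id, displayed_terminals,
                              utterance_frame_flag, participant_utterance_flag):
    """Return where a character utterance is drawn: "frame", "participant", or None."""
    if utterance_frame_flag:                  # S1803: every utterance goes to the frame
        return "frame"
    if terminal_id in displayed_terminals:    # S1804: participant screen displayed?
        return "participant"                  # S1805
    if participant_utterance_flag:            # S1806: fall back to the frame?
        return "frame"
    return None                               # the utterance is not shown
```

A return value of `None` corresponds to the case described for FIG. 17 in which the content of the character utterance is left off the screen.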

As shown by the horizontal offset between character utterances D and E at 1704 in FIG. 17, a character utterance made later in time may be displaced chronologically. In the conference server, as shown in the processing on the conference server 101 side in FIG. 16, step S1614 (that is, the processing described in the flowchart of FIG. 18) is repeated asynchronously. This can be realized by compositing the screen on each repetition so that the display positions of the character utterances are displaced chronologically according to their utterance times, as described above.
This completes the description of the flowchart of FIG. 18.
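
The chronological displacement of character utterances in the character utterance frame could be computed as in the sketch below. The linear mapping from utterance time to horizontal offset, the `pixels_per_second` parameter, and the function name `layout_offsets` are assumptions made for illustration; the specification only states that later utterances may be shifted.

```python
def layout_offsets(utterances, pixels_per_second=40.0):
    """Map (speaker, start_time, text) tuples to (speaker, x_offset, text) rows.

    The offset grows with the utterance start time, measured from the
    earliest utterance in the list.
    """
    if not utterances:
        return []
    t0 = min(start for _, start, _ in utterances)
    ordered = sorted(utterances, key=lambda u: u[1])
    return [(speaker, round((start - t0) * pixels_per_second), text)
            for speaker, start, text in ordered]
```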

Next, a second embodiment is described with reference to the flowchart of FIG. 19. Unlike the processing described with the flowchart of FIG. 18, it decides how to display character output without using preset flags.

FIG. 19 is an example of a flowchart showing processing for generating a display screen for character utterances according to the second embodiment of the present invention. Each step in FIG. 19 is executed by the CPU 201 of the conference server 101.

The loop from step S1901 to step S1906 repeats the processing for every character output.

In step S1902, attention is focused on one character output.

In step S1903, it is determined whether the participant screen corresponding to the character output of interest is included in the whole image data. If it is included, the process proceeds to step S1904. If it is not, the process proceeds to step S1905.

In step S1904, the character output data of interest is composited onto the participant screen corresponding to the character output of interest.

In step S1905, the character output of interest is displayed in the character utterance frame. If no character utterance frame exists, one is generated.
This completes the description of the flowchart of FIG. 19.
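
The flag-free flow of FIG. 19 can be sketched as follows. The representation of the whole image data as a dictionary with a `"participants"` map and an optional `"frame"` list is a hypothetical stand-in for actual image compositing, and the function name is likewise an assumption.

```python
def compose_character_outputs(outputs, whole_image):
    """Attach each (terminal_id, text) output to a participant tile or to the frame."""
    tiles = whole_image["participants"]
    for terminal_id, text in outputs:                    # loop S1901 to S1906, S1902
        if terminal_id in tiles:                         # S1903
            tiles[terminal_id]["captions"].append(text)  # S1904
        else:
            # S1905: use the character utterance frame, creating it if absent
            frame = whole_image.setdefault("frame", [])
            frame.append((terminal_id, text))
    return whole_image
```

Creating the frame lazily with `setdefault` mirrors the statement that the character utterance frame is generated only when it does not yet exist.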

In the flowcharts of FIGS. 18 and 19, it was explained that when there is a participant screen corresponding to a character output, the participant screen and the output characters are combined. In practice, the characters may be composited as image data within the image data area of the participant screen, or displayed in an adjacent area (for example, written horizontally below the image data or vertically to its right). With either method, it suffices that the correspondence between the character output and the participant screen is clearly recognizable.

The structure and content of the various data described above are not limited to those shown; needless to say, they may take various structures and contents depending on the application and purpose.

Although one embodiment has been described above, the present invention can be embodied as, for example, a system, an apparatus, a method, a program, or a recording medium. Specifically, it may be applied to a system composed of a plurality of devices or to an apparatus consisting of a single device.

The program of the present invention is a program that enables a computer to execute the processing methods of the flowcharts shown in FIGS. 11 to 16 and 18, and the storage medium of the present invention stores a program that enables a computer to execute the processing methods of FIGS. 11 to 16 and 18. The program of the present invention may be a separate program for the processing method of each device in FIGS. 11 to 16 and 18.

As described above, it goes without saying that the object of the present invention is also achieved by supplying a recording medium on which a program implementing the functions of the above-described embodiments is recorded to a system or apparatus, and having a computer (or CPU or MPU) of that system or apparatus read and execute the program stored on the recording medium.

In this case, the program itself read from the recording medium realizes the novel functions of the present invention, and the recording medium storing the program constitutes the present invention.

As a recording medium for supplying the program, for example, a flexible disk, hard disk, optical disk, magneto-optical disk, CD-ROM, CD-R, DVD-ROM, magnetic tape, nonvolatile memory card, ROM, EEPROM, silicon disk, or solid state drive can be used.

It also goes without saying that the invention covers not only the case where the functions of the above-described embodiments are realized by the computer executing the program it has read, but also the case where an OS (operating system) or the like running on the computer performs part or all of the actual processing based on the instructions of the program, and the functions of the above-described embodiments are realized by that processing.

Likewise, it goes without saying that the invention covers the case where the program read from the recording medium is written to a memory provided on a function expansion board inserted into the computer or in a function expansion unit connected to the computer, after which a CPU or the like provided on that board or unit performs part or all of the actual processing based on the instructions of the program code, and the functions of the above-described embodiments are realized by that processing.

The present invention may be applied to a system composed of a plurality of devices or to an apparatus consisting of a single device. Needless to say, the present invention is also applicable where it is achieved by supplying a program to a system or apparatus. In this case, the system or apparatus can enjoy the effects of the present invention by reading into it a recording medium storing a program for achieving the present invention. Furthermore, the system or apparatus can enjoy the effects of the present invention by downloading and reading a program for achieving the present invention from a server, database, or the like on a network by means of a communication program.

All configurations that combine the above-described embodiments and their modifications are also included in the present invention.

101 conference server
102 client device
103 network
301 conference system
302 system administrator
303 group
304 group administrator
305 general user
306 conference room
307 free space

Claims

A conference server capable of communicating with a client device via a network, capable of converting voice data from the client device into characters and displaying the characters,
Voice receiving means for receiving voice data transmitted from the client device;
Voice data registration means for registering the voice data received by the voice reception means in the voice data storage means in association with the client device;
Voice acquisition means for acquiring voice data corresponding to the client device from the voice data storage means;
Image receiving means for receiving image data transmitted from the client device;
Image data registration means for registering image data received by the image reception means in an image data storage means in association with the client device;
Voice data character conversion means for converting the voice data acquired by the voice acquisition means into characters;
Image combining means for: when the image data corresponding to the converted characters is set to be displayed on the client device, combining the characters with that image data as a character output; when that image data is not set to be displayed on the client device, generating image data based on a character utterance frame for displaying the character output; and combining, as whole image data, all of the image data combined with the character output, the image data received from the client device, and, when the character utterance frame has been generated, the image data based on the generated character utterance frame;
Whole image data transmitting means for transmitting the whole image data synthesized by the image synthesizing means to the client device;
A conference server.
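
The composition rule recited in this claim can be illustrated with a minimal Python sketch. It is not the implementation of the embodiment: the names `Participant` and `compose_whole_image` are hypothetical, and real image compositing is replaced by plain dictionaries ("tiles") that stand in for image data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Participant:
    client_id: str
    image_displayed: bool        # is this client's image set to be shown?
    text: Optional[str] = None   # characters converted from voice data, if any


def compose_whole_image(participants):
    """Return a list of tiles standing in for the whole image data (hypothetical model)."""
    tiles = []
    frame_lines = []
    for p in participants:
        if p.image_displayed:
            # Image is shown: attach the character output (possibly None) to that image.
            tiles.append({"kind": "image", "client": p.client_id, "caption": p.text})
        elif p.text is not None:
            # Image is hidden: route the character output to the character utterance frame.
            frame_lines.append(f"{p.client_id}: {p.text}")
    if frame_lines:
        # The frame is generated only when there is character output to place in it.
        tiles.append({"kind": "utterance_frame", "lines": frame_lines})
    return tiles
```

In this sketch a participant whose image is hidden and who has no converted text contributes nothing to the whole image; that behavior is an assumption, since the claim does not address that case.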

The conference server according to claim 1, further comprising:
Utterance character display setting storage means for storing, as the value of an utterance frame display flag, whether or not to display the converted characters as a character output in a character utterance frame, which is an area for displaying the character output; and
Character utterance frame display determination means for determining, based on the utterance frame display flag, whether or not to display the character output in the character utterance frame,
wherein the image combining means, when the character utterance frame display determination means determines that the character output is to be displayed in the character utterance frame, generates the image data based on the character utterance frame and combines, as whole image data, all of the image data received from the client device or the generated image data based on the character utterance frame.

The utterance character display setting storage means further stores, as the value of a participant utterance display flag, whether or not to display the characters in the corresponding image data as the character output,
the conference server further comprising:
Participant utterance display determination means for determining, based on the participant utterance display flag, whether to display the character output in the image data; and
Image data display determination means for determining whether or not the image data is included in the whole image data transmitted from the conference server to the client device,
wherein the image combining means further, when the character utterance frame display determination means determines that the character output is not to be displayed in the character utterance frame:
combines the character output with the image data when the image data display determination means determines that the image data is included in the whole image data; and
generates the image data based on the character utterance frame when it is determined that the image data is not included in the whole image data and the participant utterance display determination means determines that the character output is to be displayed in the image data, and combines, as whole image data, all of the image data received from the client device or the generated image data based on the character utterance frame. Conference server.
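
One possible reading of the placement decision described across the flag-based claims can be sketched as follows. The function `choose_placement`, the `Placement` enum, and the `NONE` outcome for the case the claims leave unstated are all hypothetical and serve only to make the branching explicit.

```python
from enum import Enum


class Placement(Enum):
    UTTERANCE_FRAME = "utterance_frame"   # show text in the character utterance frame
    ON_IMAGE = "on_image"                 # combine text with the speaker's image
    NONE = "none"                         # assumed outcome; not stated in the claims


def choose_placement(frame_flag, image_in_whole, participant_flag):
    """Decide where the character output goes (one interpretation of the claims)."""
    if frame_flag:
        # The utterance frame display flag is on: always use the frame.
        return Placement.UTTERANCE_FRAME
    if image_in_whole:
        # The speaker's image is part of the whole image: combine text with it.
        return Placement.ON_IMAGE
    if participant_flag:
        # The image is absent but display on the image was requested: fall back to the frame.
        return Placement.UTTERANCE_FRAME
    return Placement.NONE
```

The fallback in the third branch is the point of the claim: a participant whose image is not on screen can still have the converted text shown, because a frame is generated for it.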

The conference server according to any one of claims 1 to 3, further comprising:
Voice output authority grant determination means for determining whether or not to output, as voice, the voice data corresponding to the client device, based on the order of the times at which the voice data transmitted from the client device was registered in the voice data storage means;
Voice output authority granting means for granting a voice output authority to the client device when the voice output authority grant determination means determines that voice is to be output;
Voice output end condition storage means for storing a voice output end condition for releasing the voice output authority of the client device to which the voice output authority has been granted;
Voice output authority release determination means for determining, based on the voice output end condition stored in the voice output end condition storage means, whether or not to release the voice output authority granted to the client device;
Voice output authority release means for releasing the voice output authority from the client device when the voice output authority release determination means determines that the voice output authority is to be released; and
Voice data transmitting means for acquiring, by the voice acquisition means, the voice data of the client device to which the voice output authority has been granted and transmitting that voice data to the client device.

The conference server according to claim 4, further comprising:
Character output authority grant determination means for determining, based on the order of the times at which the voice data transmitted from the client device was registered in the voice data storage means, whether or not to convert the voice data corresponding to the client device into characters and output the characters;
Character output authority granting means for granting a character output authority to the client device when the character output authority grant determination means determines that characters are to be output;
Character output end condition storage means for storing a character output end condition for releasing the character output authority of the client device to which the character output authority has been granted;
Character output authority release determination means for determining, based on the character output end condition stored in the character output end condition storage means, whether or not to release the character output authority granted to the client device; and
Character output authority release means for releasing the character output authority from the client device when the character output authority release determination means determines that the character output authority is to be released,
wherein the voice data corresponding to the client device to which the character output authority has been granted by the character output authority granting means is acquired from the voice data storage means and converted into characters, and
the image combining means, when the character output authority is granted, generates the image data based on the character utterance frame and combines, as whole image data, all of the image data received from the client device or the generated image data based on the character utterance frame.

The conference server according to claim 4, wherein
the voice output end condition is specified as a length of time that elapses, after the voice data corresponding to the client device registered in the voice data storage means is interrupted, without the voice data resuming, and
the voice output authority release determination means determines whether or not to release the voice output authority granted to the client device based on the time for which the voice data corresponding to the client device registered in the voice data storage means has been interrupted and on the voice output end condition specified as that length of time.
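
The time-based end condition can be illustrated with a short sketch. The function name `should_release_authority`, the use of seconds as the unit, and the choice to release exactly at the threshold (rather than strictly after it) are assumptions made for illustration.

```python
def should_release_authority(last_voice_at, now, end_condition_seconds):
    """Return True once the silence has lasted at least the configured time.

    last_voice_at: time (seconds) at which the client's voice data was last registered.
    now: current time (seconds) on the same clock.
    end_condition_seconds: stored end condition, i.e. the allowed length of silence.
    """
    silence = now - last_voice_at
    return silence >= end_condition_seconds
```

The same comparison would apply to the character output authority, with the character output end condition supplied in place of the voice output end condition.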

The conference server according to claim 5, wherein
the character output end condition is specified as a length of time that elapses, after the voice data corresponding to the client device registered in the voice data storage means is interrupted, without the voice data resuming, and
the character output authority release determination means determines whether or not to release the character output authority granted to the client device based on the time for which the voice data corresponding to the client device registered in the voice data storage means has been interrupted and on the character output end condition specified as that length of time.

The conference server according to claim 4, further comprising
Voice output authority number storage means for storing the number of client devices to which the voice output authority is granted,
wherein the voice output authority grant determination means further determines that voice data is to be output for as many client devices as the number stored in the voice output authority number storage means.

The character output authority grant determination means further
determines that the character output authority is to be granted to the client device for which the voice output authority grant determination means has determined not to grant the voice output authority. The conference server described in the section.

The conference server according to claim 5, further comprising
Character output authority number storage means for storing the number of client devices to which the character output authority is granted,
wherein the character output authority grant determination means further determines that voice data is to be output as characters for as many client devices as the number stored in the character output authority number storage means.
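
Taken together, the grant rules based on registration order and on the stored numbers can be sketched as below. This is an illustrative reading only: `assign_authorities`, the tuple layout of `registrations`, and the rule that character output goes to the next clients in time order after the voice slots are filled are assumptions, not details confirmed by the claims.

```python
def assign_authorities(registrations, voice_limit, character_limit):
    """Split clients into voice output and character output groups.

    registrations: list of (client_id, registered_at) tuples, one per speaking client.
    voice_limit: stored number of clients that may hold the voice output authority.
    character_limit: stored number of clients that may hold the character output authority.
    Both limits are assumed to be non-negative integers.
    """
    # Earliest registered voice data comes first.
    ordered = sorted(registrations, key=lambda r: r[1])
    voice = [cid for cid, _ in ordered[:voice_limit]]
    # Clients not granted voice output are candidates for character output.
    character = [cid for cid, _ in ordered[voice_limit:voice_limit + character_limit]]
    return voice, character
```

For example, with four simultaneous speakers, a voice limit of 1, and a character limit of 2, the earliest speaker is heard, the next two are shown as text, and the fourth receives neither authority in this sketch.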

The conference server according to any one of claims 4 to 10, wherein
the voice output end condition can specify a keyword that signifies the end of an utterance among the utterances of a participant taking part in the conference at the client device, and
the voice output authority release determination means further determines whether or not to release the voice output authority granted to the client device based on the keyword serving as the voice output end condition in the voice data corresponding to the client device.
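
A keyword-based end condition might be checked as in the following sketch. It assumes the voice data has already been converted into text and that the keyword must appear at the end of the utterance; the function name `keyword_ends_utterance` and both of those choices are hypothetical.

```python
def keyword_ends_utterance(transcript, end_keywords):
    """Return True if the transcript ends with one of the configured end keywords."""
    # Ignore surrounding whitespace, trailing sentence punctuation, and letter case.
    normalized = transcript.strip().rstrip(".!?。").lower()
    # Empty keywords are skipped because every string ends with the empty string.
    return any(normalized.endswith(k.lower()) for k in end_keywords if k)
```

Matching only at the end of the utterance avoids releasing the authority when the keyword merely occurs in the middle of a sentence.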

The conference server according to any one of claims 5 to 11, wherein
the character output end condition can specify a keyword that signifies the end of an utterance among the utterances of a participant taking part in the conference at the client device, and
the character output authority release determination means further determines whether or not to release the character output authority granted to the client device based on the keyword serving as the character output end condition in the voice data corresponding to the client device.

A conference system in which a client device and a conference server can communicate via a network,
the client device comprising:
Voice data transmitting means for transmitting, to the conference server as voice data, the utterance of a participant taking part in the conference that is input to the client device;
Voice data receiving means for receiving voice data transmitted from the conference server; and
Voice data reproduction control means for reproducing the voice data,
and the conference server comprising:
Voice receiving means for receiving voice data transmitted from the client device;
Voice data registration means for registering the voice data received by the voice reception means in the voice data storage means in association with the client device;
Voice acquisition means for acquiring voice data corresponding to the client device from the voice data storage means;
Image receiving means for receiving image data transmitted from the client device;
Image data registration means for registering image data received by the image reception means in an image data storage means in association with the client device;
Voice data character conversion means for converting the voice data acquired by the voice acquisition means into characters;
Image combining means for: when the image data corresponding to the converted characters is set to be displayed on the client device, combining the characters with that image data as a character output; when that image data is not set to be displayed on the client device, generating image data based on a character utterance frame for displaying the character output; and combining, as whole image data, all of the image data combined with the character output, the image data received from the client device, and, when the character utterance frame has been generated, the image data based on the generated character utterance frame;
Whole image data transmitting means for transmitting the whole image data synthesized by the image synthesizing means to the client device;
A conference system.

A control method for a conference server that is communicable with a client device via a network and capable of converting voice data from the client device into characters and displaying the characters,
A voice receiving step for receiving voice data transmitted from the client device;
A voice data registration step in which the voice data registration means registers the voice data received in the voice reception step in the voice data storage means in association with the client device;
A voice acquisition step in which voice acquisition means acquires voice data corresponding to the client device from the voice data storage means;
An image receiving step for receiving image data transmitted from the client device;
An image data registration step in which image data registration means registers the image data received in the image receiving step in the image data storage means in association with the client device;
A voice data character conversion step in which voice data character conversion means converts the voice data acquired in the voice acquisition step into characters;
An image combining step in which image combining means: when the image data corresponding to the converted characters is set to be displayed on the client device, combines the characters with that image data as a character output; when that image data is not set to be displayed on the client device, generates image data based on a character utterance frame for displaying the character output; and combines, as whole image data, all of the image data combined with the character output, the image data received from the client device, and, when the character utterance frame has been generated, the image data based on the generated character utterance frame;
A whole image data transmitting step in which whole image data transmitting means transmits the whole image data combined in the image combining step to the client device;
A method for controlling a conference server, comprising:

A program for causing a computer that can communicate with a client device via a network to function as a conference server capable of converting voice data from the client device into characters and displaying the characters,
the program causing the computer to function as:
Voice receiving means for receiving voice data transmitted from the client device;
Voice data registration means for registering voice data received by the voice reception means in voice data storage means in association with the client device;
Voice acquisition means for acquiring voice data corresponding to the client device from the voice data storage means;
Image receiving means for receiving image data transmitted from the client device;
Image data registration means for registering image data received by the image receiving means in an image data storage means in association with the client device;
Voice data character conversion means for converting voice data acquired by the voice acquisition means into characters;
Image combining means for: when the image data corresponding to the converted characters is set to be displayed on the client device, combining the characters with that image data as a character output; when that image data is not set to be displayed on the client device, generating image data based on a character utterance frame for displaying the character output; and combining, as whole image data, all of the image data combined with the character output, the image data received from the client device, and, when the character utterance frame has been generated, the image data based on the generated character utterance frame;
Whole image data transmitting means for transmitting the whole image data synthesized by the image synthesizing means to the client device;
A program for causing a server to function as a conference server.

A computer-readable recording medium on which the program according to claim 15 is recorded.