JP2017201737A

JP2017201737A - Information processing apparatus, information processing method, and program

Info

Publication number: JP2017201737A
Application number: JP2016092386A
Authority: JP
Inventors: 雅樹下野; Masaki Shimono
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2016-05-02
Filing date: 2016-05-02
Publication date: 2017-11-09

Abstract

PROBLEM TO BE SOLVED: To enable an easy-to-understand display that reflects the occurrence of an event when displaying the result of converting speech into text.
SOLUTION: A distribution apparatus 100 acquires data based on an utterance of a person that is converted into text for display, and determines that a predetermined event corresponding to an action of the person making the utterance has occurred. The distribution apparatus 100 then switches the output of the acquired data based on the result of determining that the event has occurred.
SELECTED DRAWING: Figure 4

Description

The present invention relates to a technique for outputting data based on an utterance that is converted into text.

Conventionally, there are multipoint conference systems that distribute video and audio of the conference participants at each site to the other sites so that a conference can be held between remote sites. There are also conference systems that convert the participants' statements into text and show it on a display or the like so that the content of the conference can be followed visually. Patent Document 1 describes converting statements made during a conference into text and creating a minutes file in which the text is associated with information indicating the time and the speaker of each statement.

JP 2007-180828 A

However, the method of Patent Document 1 cannot display the result of converting statements into text in a way that reflects the occurrence of an event, so the display may be difficult for the user to understand. For example, consider a conference held over a multipoint conference system in which several participants are present at each of several sites. If statements directed at all participants across the sites and statements directed only at participants within the same site are converted into text together without being distinguished, the user may have difficulty understanding the content of the conference from the resulting text. The same problem arises not only in a multipoint conference system but also when a conference is held between two sites, and when statements made in a conference held within a single site are converted into text.

The present invention has been made in view of the above problem, and an object thereof is to provide a technique that enables an easy-to-understand display reflecting the occurrence of an event when displaying the result of converting statements into text.

In order to solve the above problem, an information processing apparatus according to the present invention has, for example, the following configuration. That is, the apparatus has: acquisition means for acquiring data based on an utterance of a person that is converted into text for display; determination means for determining that a predetermined event corresponding to an action of the person making the utterance has occurred; and switching means for switching the output of the data acquired by the acquisition means, based on the determination result of the determination means.

According to the present invention, the result of converting statements into text can be displayed in an easy-to-understand manner that reflects the occurrence of an event.

FIG. 1 is a diagram illustrating the overall configuration of a multipoint conference system 10 according to an embodiment.
FIG. 2 is a block diagram illustrating the functional configuration of a distribution apparatus 100 according to the embodiment.
FIG. 3 is a block diagram illustrating the hardware configuration of the distribution apparatus 100.
FIG. 4 is a flowchart for explaining the operation of the distribution apparatus 100.
FIG. 5 is a diagram illustrating an example of the display contents of a display device 104.

Embodiments of the present invention will be described below with reference to the drawings. Note that not all configurations described in the following embodiments are essential to the present invention.

[System configuration]
The overall configuration of the multipoint conference system 10 according to the present embodiment will be described with reference to FIG. 1. The multipoint conference system 10 includes a distribution apparatus 100, and a communication apparatus 101, a microphone 102, a camera 103, and a display device 104 installed at each of site A, site B, and site C. The microphone 102, the camera 103, and the display device 104 are each connected to the communication apparatus 101 at the same site. The communication apparatus 101 at each site is connected to the distribution apparatus 100 via a network. By using the multipoint conference system 10, users can share video, audio, data, and the like between sites in real time. The multipoint conference system 10 also converts the statements of persons at each site into text so that they can be displayed.

The present embodiment is described mainly for a multipoint conference system 10 in which the distribution apparatus 100 controls communication among three sites: site A, site B, and site C. However, the configuration is not limited to this. The distribution apparatus 100 may be connected to communication apparatuses 101 installed at four or more sites and control communication among those sites, or it may control communication between two sites. At least one of the communication apparatuses 101 installed at the sites may have the function of the distribution apparatus 100, and the communication apparatuses 101 may be connected directly to one another to communicate. At least one of the microphone 102, the camera 103, and the display device 104 may be integrated with the communication apparatus 101. In this case, the microphone 102 functions as a sound collection unit of the communication apparatus 101, the camera 103 as an imaging unit, and the display device 104 as a display unit. Furthermore, two or more of any of the communication apparatus 101, the microphone 102, the camera 103, and the display device 104 may be installed within the same site, and there may be a site where at least one of the microphone 102, the camera 103, and the display device 104 is absent.

Next, the functions of each device included in the multipoint conference system 10 will be described. The microphone 102 collects sound within the site, such as the statements of conference participants, and outputs it to the communication apparatus 101. The camera 103 captures a predetermined area within the site, such as a conference room where the conference participants are present, and outputs the captured image to the communication apparatus 101. In the present embodiment, the captured image output by the camera 103 is a moving image, but the camera 103 may output a still image. The display device 104 displays image data input from the communication apparatus 101. This image data includes, for example, captured images taken at other sites and characters obtained by converting statements made during the conference into text. FIG. 5 shows an example of the contents displayed by the display device 104. In the present embodiment, the display device 104 also has a speaker and outputs sound based on audio data input from the communication apparatus 101. Specific examples of the display device 104 include a liquid crystal display and a projector. Note that the speaker may be installed separately from the display device 104.

The communication apparatus 101 generates audio data based on the sound input from the microphone 102 and transmits it to the distribution apparatus 100. The communication apparatus 101 also transmits the captured image input from the camera 103 to the distribution apparatus 100. The communication apparatus 101 receives, from the distribution apparatus 100, audio data based on the sound collected at each site and outputs it to the display device 104, which has a speaker. Furthermore, the communication apparatus 101 receives, from the distribution apparatus 100, character data obtained by converting the sound collected at each site into text together with the images captured at each site, generates image data containing the statements and the captured images of each site, and outputs that image data to the display device 104.

As illustrated in FIG. 2, the distribution apparatus 100 includes a conversion unit 300 and a session management unit 301. The conversion unit 300 performs speech recognition on the audio data that the session management unit 301 receives from the communication apparatus 101 at each site, thereby converting the statements contained in the audio data into text and generating character data. Specifically, the conversion unit 300 calculates feature amounts from the audio data and compares them with a database of feature amounts stored in the distribution apparatus 100 to convert the statements contained in the audio data into characters. Note that the speech recognition method is not limited to this, and another method may be used. The session management unit 301 receives audio data and captured images from the communication apparatus 101 at each site. The session management unit 301 then transmits the received audio data and captured images to the communication apparatus 101 at each site, together with the character data that the conversion unit 300 generated from that audio data. Here, the session management unit 301 controls the session used for data transmission according to the occurrence of events at each site. Details of this session control will be described later.
A session in the present embodiment is an identifiable unit of a series of communications, such as an HTTP session managed using a session ID in HTTP (Hypertext Transfer Protocol). The distribution apparatus 100 transmits and receives data to and from different communication apparatuses 101 using different sessions. In addition, when the distribution apparatus 100 transmits data to the same communication apparatus 101 using a plurality of sessions, the communication apparatus 101 can process the received data separately for each session.
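The notion of a session as an identifiable unit of communication can be illustrated with a minimal Python sketch. The names `Session`, `SessionRegistry`, and `group_by_session`, as well as the use of sequential integer IDs, are assumptions made for illustration and do not appear in the embodiment. The sketch only shows that several sessions may exist for one site and that a receiver can separate incoming data by session ID.

```python
from collections import defaultdict
from dataclasses import dataclass
from itertools import count


@dataclass(frozen=True)
class Session:
    """Hypothetical record for one identifiable unit of communication."""
    session_id: int
    site: str  # site whose communication apparatus is the peer


class SessionRegistry:
    """Hypothetical bookkeeping on the distribution apparatus side."""

    def __init__(self):
        self._ids = count(1)               # sequential IDs are an assumption
        self._by_site = defaultdict(list)

    def open(self, site):
        # A new session gets an ID that no other session shares.
        session = Session(next(self._ids), site)
        self._by_site[site].append(session)
        return session

    def sessions_for(self, site):
        return list(self._by_site[site])


def group_by_session(received):
    """Receiver side: split (session_id, payload) pairs by session ID."""
    grouped = defaultdict(list)
    for session_id, payload in received:
        grouped[session_id].append(payload)
    return dict(grouped)
```

With two sessions open to the same site, the receiver can keep the payloads of each apart and handle them differently, which is the property the later session control relies on.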

[Hardware configuration]
FIG. 3 is a block diagram illustrating the hardware configuration of the distribution apparatus 100 according to the present embodiment. The communication apparatus 101 has the same configuration as the distribution apparatus 100. The distribution apparatus 100 includes a CPU 201, a ROM 202, a RAM 203, an auxiliary storage device 204, an operation unit 205, a communication unit 206, and a bus 207.

The CPU 201 controls the entire distribution apparatus 100 using computer programs and data stored in the ROM 202 and the RAM 203. The ROM 202 stores programs and parameters that do not need to be changed. The RAM 203 temporarily stores programs and data supplied from the auxiliary storage device 204, data supplied from outside via the communication unit 206, and the like. The auxiliary storage device 204 is, for example, a hard disk drive and stores content data such as image data and audio data.

The operation unit 205 includes, for example, a keyboard and a mouse, and inputs various instructions to the CPU 201 in response to user operations. The communication unit 206 communicates with external devices such as the communication apparatus 101. For example, when the distribution apparatus 100 is connected to an external device by wire, a LAN cable or the like is connected to the communication unit 206. When the distribution apparatus 100 has a function for wireless communication with an external device, the communication unit 206 includes an antenna. The bus 207 connects the units of the distribution apparatus 100 and carries information between them.

In the present embodiment, the CPU 201 executes a program to control communication between the distribution apparatus 100 and external devices via the communication unit 206. However, the communication unit 206 may control at least part of that communication by hardware processing. Also, although the operation unit 205 is located inside the distribution apparatus 100 in the present embodiment, it may exist outside the distribution apparatus 100 as a separate device. In this case, the CPU 201 operates as an operation control unit that controls the operation unit 205. When the distribution apparatus 100 is connected to an external display device, the CPU 201 operates as a display control unit that controls the display device.

[Operation flow]
Next, the operation flow of the distribution apparatus 100, which controls a conference among a plurality of sites, will be described with reference to FIG. 4. The processing shown in FIG. 4 starts when the distribution apparatus 100 receives a conference start request from the communication apparatus 101 at any of the sites. However, the start timing of the processing shown in FIG. 4 is not limited to this. The processing shown in FIG. 4 is realized by the CPU 201 loading a program stored in the ROM 202 into the RAM 203 and executing it. Note that at least part of the processing shown in FIG. 4 may be realized by dedicated hardware different from the CPU 201.

In S401, the session management unit 301 instructs the communication apparatus 101 at each site to set up a session for starting the multipoint conference. Each communication apparatus 101 that receives the instruction sets up a session for communicating with the distribution apparatus 100 and responds. Note that the configuration of the present embodiment is also effective when a conference is held between two sites and when a conference is held within a single site. An application example for a conference within a single site will be described later. In S402, the session management unit 301 determines whether a predetermined event has occurred at any of the sites. If it determines that no predetermined event has occurred, the processing proceeds to S404. If it determines that a predetermined event has occurred, the processing proceeds to S403, where the session management unit 301 performs session control based on the determination result. The content of the predetermined event and the details of the session control will be described later.

In S404, the session management unit 301 receives audio data transmitted from a plurality of communication apparatuses 101 (the communication apparatuses 101 at sites A, B, and C). This audio data is generated by the transmitting communication apparatus 101 from the sound collected by the microphone 102, and includes the statements in the conference that are to be converted into text. In S405, the conversion unit 300 performs text conversion based on the audio data acquired by the session management unit 301 in S404, thereby generating and acquiring character data that represents the statements of persons in the conference as characters.

In S406, the session management unit 301 transmits the audio data received in S404 and the character data acquired by the conversion unit 300 in S405 to the communication apparatus 101 at each site. If session control was performed in S403, the data is transmitted over the session determined by that session control. The communication apparatus 101 at each site produces output on the display device 104 based on the received data. Specifically, the communication apparatus 101 outputs the audio data to the display device 104, causing the display device 104 to output the sound collected at other sites. The communication apparatus 101 also outputs image data based on the received character data to the display device 104, causing the display device 104 to display the result of converting the sound collected at its own site or other sites into text, that is, the content of the statements made in the conference.

In S407, the session management unit 301 determines whether to end the multipoint conference. If it determines that the conference is not to end, the distribution apparatus 100 executes the processing from S402 again. If it determines that the conference is to end, the session management unit 301 instructs the communication apparatus 101 at each site to end the sessions used for communication, and the processing shown in FIG. 4 ends. The conference is determined to end when, for example, the distribution apparatus 100 receives a conference end request from the communication apparatus 101.
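The loop from S401 to S407 can be sketched in Python as follows. This is only an outline under stated assumptions: `run_conference`, the callback parameters, the `frames` iterable, and the per-site session labels are all hypothetical names introduced for illustration. Event detection, speech recognition, and transmission are passed in as functions because the embodiment does not fix their implementation.

```python
def run_conference(sites, frames, detect_event, control_session, transcribe, send):
    """Outline of the S401-S407 loop with pluggable (hypothetical) callbacks.

    frames yields one dict per iteration with an "audio" mapping of
    site -> audio data and an optional "end" flag.
    """
    # S401: set up a session per site (modelled here as a label per site).
    sessions = {site: "main" for site in sites}

    for frame in frames:
        # S402: determine whether a predetermined event has occurred.
        event = detect_event(frame)
        if event is not None:
            # S403: session control based on the determination result.
            control_session(sessions, event)

        # S404: receive the audio data from the sites.
        audio = frame["audio"]
        # S405: convert the audio data into character data.
        text = {site: transcribe(data) for site, data in audio.items()}
        # S406: transmit audio and character data over the current sessions.
        send(sessions, audio, text)

        # S407: leave the loop when the conference is to end.
        if frame.get("end"):
            break

    # End the sessions used for communication.
    sessions.clear()
```

Note that S403 runs before S404 to S406 within the same iteration, so data handled in that iteration already goes out over the switched session.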

This concludes the description of the operation flow of the distribution apparatus 100. The description above focused on the transmission of audio data and character data from the distribution apparatus 100 to the communication apparatus 101. In addition, from the start of the conference in S401 until its end in S407, the distribution apparatus 100 in the present embodiment receives captured images from the communication apparatus 101 at each site and transmits them to the communication apparatuses 101 at the other sites. A communication apparatus 101 that receives a captured image from the distribution apparatus 100 outputs image data based on that captured image to the display device 104, causing the display device 104 to display images of, for example, the conference participants at other sites. Note that the distribution apparatus 100 does not have to transmit captured images. The distribution apparatus 100 may also receive digitizer input and other data from the communication apparatus 101 at any site and forward it to the communication apparatus 101 at each site.

[Session control method]
Next, the determination of event occurrence in S402 of FIG. 4 and the session control in S403 will be described in detail. In S402, the session management unit 301 first acquires the captured image transmitted from the communication apparatus 101. The session management unit 301 then analyzes the acquired captured image to determine that a predetermined event corresponding to an action of the person making the statements to be converted into text has occurred. In the present embodiment, the predetermined event is, for example, that the person making the statements (a person in the captured image) turns in a predetermined direction, or that the person touches a predetermined area. However, the events determined by the session management unit 301 are not limited to these and may include, for example, a user operating a predetermined switch. In S403, the session management unit 301 switches the session used for outputting (transmitting) data to the communication apparatus 101 in S406, based on the result of determining that the event has occurred. For example, the session management unit 301 sets up a new session with the communication apparatus 101 according to the determination result and uses that new session for data output. As another example, the session management unit 301 ends one of a plurality of sessions set up with the communication apparatus 101 according to the determination result and outputs data using the remaining sessions.

A specific example of session switching in response to an event is described below. The microphone 102 at site A collects the statements of the conference participants at site A, and the communication apparatus 101 transmits audio data containing the collected sound to the distribution apparatus 100. The camera 103 at site A captures an area that includes the conference participants at site A and the display surface of the display device 104, and the communication apparatus 101 transmits the captured image to the distribution apparatus 100.

FIG. 5 is a diagram illustrating an example of the display contents of the display device 104 at site A. The display area 500 shows a thread 501 containing the text of statements made in the conference, a thread 504 containing the text of conversation within site B, and a thread 505 containing the text of conversation within site C. In addition, the display area 500 shows a captured image 502 of the conference participants at site B and a captured image 503 of the conference participants at site C.

When addressing the conference participants at site B and site C, a conference participant at site A speaks while looking at the display device 104. On the other hand, the participants at site A may also talk among themselves about matters unrelated to the conference with the other sites. During such a conversation within the site, a participant at site A speaks while facing a direction other than the display device 104 (for example, toward another participant at the same site).

When the session management unit 301 analyzes the captured image received from the communication apparatus 101 at site A and determines that an event has occurred in which a conference participant faces a direction other than the display device 104, it switches the session used for data transmission. That is, the distribution apparatus 100 uses different sessions to transmit, to the communication apparatus 101 at each site, the character data for statements made while the participant at site A is looking at the display device 104 and the character data for statements made while the participant is looking in another direction.
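The gaze-based switching can be sketched as follows. The `Gaze` enumeration, the session label format, and the function names are hypothetical and chosen for illustration. The sketch assumes that image analysis has already classified each utterance by whether the speaker was looking at the display device 104, which is the part the embodiment leaves to the analysis of the captured image.

```python
from enum import Enum


class Gaze(Enum):
    """Assumed result of analyzing the captured image for one utterance."""
    AT_DISPLAY = "at_display"
    ELSEWHERE = "elsewhere"


def session_for_utterance(site, gaze):
    """Pick the (hypothetical) session label for an utterance from a site."""
    # Statements made while facing the display are treated as addressed to
    # the whole conference; others as conversation within the site.
    kind = "conference" if gaze is Gaze.AT_DISPLAY else "local"
    return f"{site}:{kind}"


def tag_utterances(site, observations):
    """Attach a session label to each (gaze, text) pair, preserving order."""
    return [(session_for_utterance(site, gaze), text)
            for gaze, text in observations]
```

For example, `tag_utterances("A", [(Gaze.AT_DISPLAY, "Let's begin"), (Gaze.ELSEWHERE, "Do you have the file?")])` assigns the first utterance to `"A:conference"` and the second to `"A:local"`.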

Because the communication apparatus 101 at each site can distinguish character data received over different sessions, it can easily process that data in different ways. For example, the text of statements made before a predetermined event occurs and the text of statements made after the event occurs can be displayed on the display device 104 so that they are distinguishable. Because the statements before and after the event are displayed distinguishably, the user can easily understand the context of the statements in the conference from the display on the display device 104.

For example, in FIG. 5, the text of statements made while a conference participant at site B is looking at the display device 104 is displayed in the thread 501, and the text of statements made while the participant at site B is looking in another direction is displayed in the thread 504. Note that the way of distinguishably displaying statements made before and after an event is not limited to displaying the text at different positions according to the determination result. For example, the communication apparatus 101 may switch the color or font of the characters displayed on the display device 104 according to the determination result.
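On the receiving side, sorting character data into the threads of FIG. 5 might look like the sketch below. The thread numbers 501, 504, and 505 come from FIG. 5 as displayed at site A; the `(site, kind, text)` message shape, the function names, and the choice to skip in-site conversation for which FIG. 5 shows no thread are assumptions made for illustration.

```python
from collections import defaultdict

# Thread numbers as shown on the display device 104 at site A in FIG. 5.
CONFERENCE_THREAD = 501
LOCAL_THREADS = {"B": 504, "C": 505}


def thread_for(site, kind):
    """Return the thread for a message, or None if FIG. 5 shows none."""
    if kind == "conference":
        return CONFERENCE_THREAD
    return LOCAL_THREADS.get(site)


def build_threads(messages):
    """Group (site, kind, text) messages into display threads."""
    threads = defaultdict(list)
    for site, kind, text in messages:
        thread = thread_for(site, kind)
        if thread is not None:  # assumption: unmapped messages are not shown
            threads[thread].append(f"{site}: {text}")
    return dict(threads)
```

The same grouping key could instead select a text color or font, which is the alternative the embodiment mentions.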

Next, another specific example of session switching in response to an event will be described. A conference participant at site A may wish to discuss with the conference participants at site C something that the participants at site B should not learn. To hold such a conversation with site C only, the participant at site A speaks while touching, for example, the image of the conference participants at site B displayed on the display device 104 (the captured image 502 in FIG. 5).

When the session management unit 301 analyzes the captured image received from the communication apparatus 101 at site A and determines that an event has occurred in which the conference participant touches the image of the conference participant at site B, it switches the session used for data transmission. Specifically, the session management unit 301 transmits the character data transcribed from utterances made while the participant at site A is touching the image of the participant at site B using only the sessions for transmission to site A and site C, and does not transmit it to site B. In other words, the distribution apparatus 100 switches, based on the determination that the predetermined event has occurred, which of the plurality of communication apparatuses 101 (those at sites A, B, and C) is the output destination of the character data transcribed from the utterance. The session management unit 301 also switches the output destination of the audio data based on the sound collected at site A in the same way as the output destination of the character data. As a result, for utterances directed from site A to site C only, neither the transcribed characters nor the voice reaches site B. This makes it possible, within a conference among multiple sites, to hold a conversation that reaches only specific sites and to display its content as text only on the display devices 104 of those sites.
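The destination switching described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: the names `Event`, `Determination`, and `select_destinations` are hypothetical, and sessions are simplified to a set of site labels.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Iterable, Optional, Set


class Event(Enum):
    """Hypothetical event labels (the source describes the actions, not these names)."""
    NONE = auto()
    TOUCHING_SITE_IMAGE = auto()


@dataclass(frozen=True)
class Determination:
    """Result of the event determination for the speaker's site."""
    event: Event
    touched_site: Optional[str] = None  # site whose image the speaker is touching


def select_destinations(all_sites: Iterable[str],
                        determination: Determination) -> Set[str]:
    """Return the sites whose sessions carry the speaker's character and audio data."""
    destinations = set(all_sites)
    if (determination.event is Event.TOUCHING_SITE_IMAGE
            and determination.touched_site is not None):
        # While the speaker touches a site's image, that site receives nothing.
        destinations.discard(determination.touched_site)
    return destinations
```

Under these assumptions, an utterance made at site A while the image of site B is being touched would be routed to sites A and C only, and the same set would be applied to both the character data and the audio data.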

The determination of event occurrence in the distribution apparatus 100 and the session control according to the determination result have been described above. In that description, the session management unit 301 determines that a predetermined event has occurred by analyzing the captured image received from the communication apparatus 101. This allows the user to switch the data output destination without performing a complicated operation. However, the method by which the distribution apparatus 100 determines event occurrence is not limited to this. For example, the communication apparatus 101 may detect the occurrence of a predetermined event by analyzing the image captured by the camera 103 and transmit a notification indicating the occurrence to the distribution apparatus 100. The session management unit 301 may then receive the notification from the communication apparatus 101 and determine, based on it, that the predetermined event has occurred. This method reduces the processing load on the distribution apparatus 100. Furthermore, the determination is not limited to analysis of captured images; the occurrence of an event may be determined by, for example, analyzing the sound collected by the microphone 102.
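The two determination paths (analysis in the distribution apparatus 100, or a notification sent by the communication apparatus 101) can be sketched as follows. This is a minimal sketch under stated assumptions: the class `SessionManager`, its methods, and the `analyze_image` callback are hypothetical names, and the image analysis itself is left as a caller-supplied function because the source does not specify it.

```python
from typing import Callable, Optional, Set


class SessionManager:
    """Hypothetical stand-in for the session management unit 301."""

    def __init__(self, analyze_image: Optional[Callable[[bytes], bool]] = None):
        # analyze_image is an assumed callback that returns True when the
        # predetermined event is visible in a captured image.
        self._analyze_image = analyze_image
        self._notified_sites: Set[str] = set()

    def on_notification(self, site: str, occurred: bool) -> None:
        """Record a notification sent by the communication apparatus at `site`."""
        if occurred:
            self._notified_sites.add(site)
        else:
            self._notified_sites.discard(site)

    def event_occurred(self, site: str,
                       captured_image: Optional[bytes] = None) -> bool:
        """Determine the event from a notification first, then from local analysis."""
        if site in self._notified_sites:
            return True
        if self._analyze_image is not None and captured_image is not None:
            return self._analyze_image(captured_image)
        return False
```

When no analysis callback is supplied, the determination relies entirely on notifications, which corresponds to the variant that shifts the analysis load to the communication apparatus 101.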

In the present embodiment, the distribution apparatus 100 receives from the communication apparatus 101 audio data generated by collecting sound at each site, and generates character data by converting the utterances contained in that audio data into text through speech recognition. However, the method by which the distribution apparatus 100 acquires character data is not limited to this. For example, the distribution apparatus 100 may transmit the audio data received from the communication apparatus 101 to an external conversion apparatus, the conversion apparatus may generate character data from the received audio data, and the distribution apparatus 100 may acquire the generated character data from the conversion apparatus. Alternatively, the distribution apparatus 100 may output sound based on the audio data received from the communication apparatus 101 through a speaker, and an operator who hears the sound may enter the character data into the distribution apparatus 100 using an input device such as a keyboard.

The present embodiment has mainly described the case where the distribution apparatus 100 switches the session used for outputting character data based on an utterance, based on the determination result of event occurrence. However, the distribution apparatus 100 may switch the data output based on the determination result by a method other than session switching. Switching of output by the distribution apparatus 100 includes switching the output destination, switching whether or not to output, and switching the output content.

The data whose output is switched need only be data based on an utterance that is converted into text for display, and is not limited to character data generated by converting the utterance into text. For example, the distribution apparatus 100 may acquire from the communication apparatus 101 audio data generated by collecting sound that includes an utterance, and switch the output of that audio data to the communication apparatus 101 based on the determination result of event occurrence. The communication apparatus 101 that receives the audio data output from the distribution apparatus 100 may then display on the display device 104 the result of converting the utterance contained in the audio data into text. With this configuration as well, the transcribed content of the conference can be displayed on the display device 104 in a way that is easy for the user to understand.

As described above, the distribution apparatus 100 according to the present embodiment acquires data based on an utterance of a person that is converted into text for display, and determines that a predetermined event corresponding to an action of the person who makes the utterance has occurred. The distribution apparatus 100 then switches the output of the acquired data based on the determination result of event occurrence. As a result, when the communication apparatus 101 that receives the output from the distribution apparatus 100 displays the transcribed utterance on the display device 104 and presents it to the user, it can provide an easy-to-understand display that reflects the occurrence of the event.

The present embodiment has mainly described the case where the distribution apparatus 100, which relays communication among the plurality of communication apparatuses 101, operates as an information processing apparatus that switches the output of data based on an utterance to be converted into text, based on the determination result of event occurrence. Having the distribution apparatus 100 control the communication among the plurality of communication apparatuses 101 collectively in this way makes it possible to configure a multipoint conference system 10 with simple processing. However, the configuration is not limited to this. The same effects as in the present embodiment can be obtained even when each communication apparatus 101 operates as an information processing apparatus having the same functions as the distribution apparatus 100 described above and the communication apparatuses 101 communicate with each other directly without going through the distribution apparatus 100. Two cases are described below: the case where the communication apparatus 101 that transmits data based on an utterance to another site switches the data output, and the case where the communication apparatus 101 that receives such data from another site switches the data output. In the multipoint conference system 10 described below, the communication apparatus 101 at each site has a conversion unit 300 and a session management unit 301, and the communication apparatuses 101 are directly connected to each other.

First, the case where the communication apparatus 101 on the data transmitting side switches the output is described. When a conference participant at site A speaks, the voice is collected by the microphone 102 at site A, and the communication apparatus 101 at site A acquires data based on the collected utterance to be converted into text and outputs (transmits) it to the communication apparatuses 101 at site B and site C. The data output from the communication apparatus 101 at site A may be audio data generated by collecting the sound, or character data generated by converting the utterance into text. When character data is output, the communication apparatuses 101 at site B and site C that receive it generate image data containing the characters indicated by the character data and display it on the display device 104. When audio data is output, the communication apparatuses 101 at site B and site C that receive it convert the utterance contained in the audio data into text and display image data containing the resulting characters on the display device 104.

When the communication apparatus 101 at site A determines that a predetermined event has occurred, it switches the output of the data based on the utterance to be converted into text. For example, when it is determined that the conference participant at site A has touched the image of the conference participant at site B displayed on the display device 104, the communication apparatus 101 at site A switches the data output destination to the communication apparatus 101 at site C only. This prevents data about a conversation that the participant at site B should not learn from being transmitted to the communication apparatus 101 at site B. As another example, the communication apparatus 101 at site A switches, based on the determination result of event occurrence, whether or not to output the acquired data to an external apparatus. Specifically, when it is determined that the conference participant at site A is facing a direction other than the display device 104, the communication apparatus 101 at site A judges that a conversation within the site is taking place and does not output the acquired data to other apparatuses. This reduces the communication bandwidth consumed in transmitting data about conversations that other sites do not need.
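The two sender-side examples above (narrowing the destination and suppressing output altogether) can be combined in one small sketch. The function name and parameters are hypothetical, and the precedence between the two events (not facing the display suppresses output even if an image is being touched) is an assumption that the source does not state.

```python
from typing import List, Optional, Sequence


def sender_destinations(other_sites: Sequence[str],
                        touched_site: Optional[str] = None,
                        facing_display: bool = True) -> List[str]:
    """Return the remote sites to which the sending apparatus outputs its data.

    other_sites    -- remote sites directly connected to the sender
    touched_site   -- site whose displayed image the speaker is touching, if any
    facing_display -- False when the speaker faces away from the display device
    """
    if not facing_display:
        # Treated as an in-site conversation: nothing is sent (assumed precedence).
        return []
    if touched_site is not None:
        # Exclude the site whose image is being touched.
        return [site for site in other_sites if site != touched_site]
    return list(other_sites)
```

For site A with remote sites B and C, touching the image of site B would leave only site C as a destination, and facing away from the display would leave no destination at all.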

Next, the case where the communication apparatus 101 on the data receiving side switches the output is described. When a conference participant at site A speaks, the communication apparatus 101 at site A transmits data based on the utterance to be converted into text to the communication apparatuses 101 at site B and site C. The communication apparatuses 101 at site B and site C acquire the data transmitted from the communication apparatus 101 at site A and output image data containing the characters obtained by converting the utterance into text to the display device 104. In addition, when the communication apparatus 101 at site A detects that a predetermined event has occurred, it transmits a notification indicating the occurrence of the event to the communication apparatuses 101 at site B and site C.

The communication apparatuses 101 at site B and site C determine that the predetermined event has occurred by receiving this notification, and switch the output to the display device 104 of the data acquired from the communication apparatus 101 at site A. For example, the communication apparatus 101 at site B switches the output content of the acquired data based on the determination result of event occurrence. Specifically, the communication apparatus 101 at site B switches the content of the image data it outputs so that the utterances made before the event and the utterances made after the event are displayed in different areas (for example, separate windows or separate threads) on the display device 104 at site B. This allows the user to easily distinguish the utterances made before the event from those made after it, which makes the content of the conference easier to understand. The communication apparatus 101 may also switch the output content according to the determination result so that utterances that were displayed in different areas are displayed together in a single area.
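The receiver-side behavior can be sketched as a small model that starts a new display area whenever an event notification arrives. The class `TranscriptView` and its method names are hypothetical, and rendering to the display device 104 is reduced to lists of strings for illustration.

```python
from typing import List


class TranscriptView:
    """Hypothetical model of the display areas (windows or threads) at a receiving site."""

    def __init__(self) -> None:
        # Each inner list holds the transcribed utterances shown in one area.
        self.areas: List[List[str]] = [[]]

    def on_event_notification(self) -> None:
        """Start a new area so later utterances are shown apart from earlier ones."""
        if self.areas[-1]:  # avoid creating consecutive empty areas
            self.areas.append([])

    def on_text(self, text: str) -> None:
        """Append a transcribed utterance to the current area."""
        self.areas[-1].append(text)

    def merge_areas(self) -> None:
        """Show all utterances together in a single area again."""
        self.areas = [[text for area in self.areas for text in area]]
```

A fuller version might key the areas by the kind of event rather than by arrival order; the sketch keeps only the before/after separation that the paragraph describes.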

Note that the communication apparatus 101 at site B may output the data based on the utterance to be converted into text to a storage device or the like, and may switch the content output to the storage device based on the determination result of event occurrence. In this way, the result of converting utterances into text, such as meeting minutes, can be stored in the storage device so that it can later be displayed in a format that is easy for the user to understand.

The configuration described above, in which the output content of data based on an utterance to be converted into text is switched based on the determination result of event occurrence, may also be applied to an output apparatus that converts the utterances of a conference held within a single site into text and outputs them. For example, the output apparatus converts the voice collected within the site into text to generate character data, and outputs image data based on that character data to the display device 104 connected to the output apparatus. When the output apparatus determines that a predetermined event has occurred, it switches the content of the image data it outputs. With this configuration as well, as in the case of a conference among multiple sites described above, the result of converting utterances into text can be displayed in a way that is easy for the user to understand.

The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiment is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of the system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions. The program may also be provided recorded on a computer-readable recording medium.

10 Multipoint conference system
100 Distribution apparatus
101 Communication apparatus
104 Display device

Claims

An information processing apparatus comprising:
acquisition means for acquiring data based on an utterance of a person, the utterance being converted into text for display;
determination means for determining that a predetermined event corresponding to an action of the person who makes the utterance to be converted into text has occurred; and
switching means for switching output of the data based on the utterance to be converted into text acquired by the acquisition means, based on a determination result by the determination means.

The information processing apparatus according to claim 1, further comprising second acquisition means for acquiring a captured image,
wherein the determination means determines that the predetermined event has occurred by analyzing the captured image acquired by the second acquisition means.

The information processing apparatus according to claim 1, further comprising receiving means for receiving a notification indicating the occurrence of the predetermined event,
wherein the determination means determines that the predetermined event has occurred based on the notification received by the receiving means.

The information processing apparatus according to any one of claims 1 to 3, wherein the switching means switches the output of the data based on the utterance to be converted into text acquired by the acquisition means so that characters transcribed from an utterance made before the determination means determines that the predetermined event has occurred and characters transcribed from an utterance made after the determination means determines that the predetermined event has occurred are displayed distinguishably on a display unit.

The information processing apparatus according to claim 1, wherein the switching means switches, based on the determination result by the determination means, a session used for outputting the data based on the utterance to be converted into text acquired by the acquisition means.

The information processing apparatus according to any one of claims 1 to 3, wherein the switching means switches, based on the determination result by the determination means, whether or not the information processing apparatus outputs the data based on the utterance to be converted into text acquired by the acquisition means to an external apparatus.

The information processing apparatus according to claim 1, wherein the switching means switches, based on the determination result by the determination means, the apparatus that is the output destination of the data based on the utterance to be converted into text acquired by the acquisition means.

The information processing apparatus according to claim 1, wherein the switching means switches, based on the determination result by the determination means, the output content of the data based on the utterance to be converted into text acquired by the acquisition means.

The information processing apparatus according to claim 1, wherein the data based on the utterance to be converted into text acquired by the acquisition means includes audio data generated by collecting sound that includes the utterance.

The information processing apparatus according to any one of claims 1 to 9, wherein the data based on the utterance to be converted into text acquired by the acquisition means includes character data generated by speech recognition processing performed on audio data generated by collecting sound that includes the utterance.

The information processing apparatus according to any one of claims 1 to 10, wherein the predetermined event includes at least one of the person who makes the utterance to be converted into text facing a predetermined direction and the person who makes the utterance to be converted into text touching a predetermined area.

The information processing apparatus according to claim 1, further comprising second receiving means for receiving, from a plurality of communication apparatuses, audio data generated by collecting sound that includes an utterance to be converted into text,
wherein the acquisition means acquires, as the data based on the utterance to be converted into text, character data generated by text conversion based on the audio data received by the second receiving means, and
wherein the switching means switches, based on the determination result by the determination means, which of the plurality of communication apparatuses is the output destination of the character data acquired by the acquisition means.

An information processing method comprising:
an acquisition step of acquiring data based on an utterance of a person, the utterance being converted into text for display;
a determination step of determining that a predetermined event corresponding to an action of the person who makes the utterance to be converted into text has occurred; and
a switching step of switching output of the data based on the utterance to be converted into text acquired in the acquisition step, based on a determination result in the determination step.

The information processing method according to claim 13, further comprising a second acquisition step of acquiring a captured image,
wherein the determination step determines that the predetermined event has occurred by analyzing the captured image acquired in the second acquisition step.

A program for causing a computer to operate as the information processing apparatus according to any one of claims 1 to 12.