JP7095356B2

JP7095356B2 - Communication terminal and conference system

Info

Publication number: JP7095356B2
Application number: JP2018064176A
Authority: JP
Inventors: 未来袴谷
Original assignee: Ricoh Co Ltd
Current assignee: Ricoh Co Ltd
Priority date: 2018-03-29
Filing date: 2018-03-29
Publication date: 2022-07-05
Anticipated expiration: 2038-03-29
Also published as: JP2019176386A

Description

The present invention relates to a communication terminal and a conference system.

In recent years, conference systems for holding conferences between remote locations via a communication network such as the Internet have become widespread.

In such a conference system, communication terminals are installed in a plurality of conference rooms that are remote from one another. Each communication terminal captures images of its conference room, including the conference participants, and collects voice such as their remarks. Each communication terminal then converts the captured images and collected voice into digital data and transmits the data to the other communication terminals installed in the other conference rooms. When a communication terminal receives images and voice from another communication terminal, it displays the images on a display and outputs the voice from a speaker. A technique for conducting a conference between remote locations in a state close to that of an actual conference in this way is already known.

A technique is also already known in which a communication terminal transmits the collected voice to a speech recognition engine, and the speech recognition engine automatically converts the remarks of the participants into text and automatically generates minutes.

One example of the above technique is the conference system of Patent Document 1. In that conference system, each of a plurality of conference terminals (communication terminals) transmits the voice of the conference room in which it is installed (its own site) to a recording server (speech recognition engine), and the recording server creates minutes by combining the voices received from the conference terminals in chronological order.

However, the acoustic processing, such as noise removal, that makes voice easy for a speech recognition engine (a machine) to recognize differs from the (natural-sounding) acoustic processing that makes voice easy for a person at the opposite site of a remote conference to hear. A communication terminal therefore has to perform two different kinds of acoustic processing, one for creating minutes and one for the conference. This requires two audio processing chips and increases cost.

The present invention has been made in view of the above background. Its object is to provide a communication terminal and a conference system that make it possible both to create minutes and to output voice that is easy for participants to hear, even when each communication terminal has only one acoustic processing unit.

The invention according to claim 1, made to solve the above problem, is a communication terminal connectable to a conference server that relays at least voice between a plurality of communication terminals and to a minutes server that creates minutes using the voice. The communication terminal includes: a detection unit that detects linking with another communication terminal; a connection destination setting unit that sets the connection destination to the conference server when the detection unit has not detected the linking, and sets the connection destination to the minutes server when the detection unit has detected the linking; a microphone; an acoustic processing unit that processes a first voice, which is an utterance of a participant at the own site input to the microphone, and a second voice, which is an utterance of a participant at another site output from a speaker of the other communication terminal and input to the microphone, executing acoustic processing for creating minutes when the connection destination is set to the minutes server and acoustic processing for the conference when the connection destination is set to the conference server; and a transmission unit that transmits the first voice and the second voice, after the acoustic processing for creating minutes or the acoustic processing for the conference has been executed by the acoustic processing unit, to the minutes server when the connection destination is set to the minutes server and to the conference server when the connection destination is set to the conference server.

As described above, according to the invention of claim 1, even when the communication terminal at each site has only one acoustic processing unit, it is possible both to create minutes and to output voice that is easy for participants to hear.

FIG. 1 is a diagram showing an embodiment of a conference system incorporating a conference terminal as the communication terminal of the present invention.

FIG. 2 is a functional block diagram of the conference terminal shown in FIG. 1.

FIG. 3 is an external view of the conference terminal shown in FIG. 1.

FIG. 4 is a flowchart showing a processing procedure of the conference terminal shown in FIG. 1.

FIG. 5 is a flowchart showing a processing procedure of the conference terminal shown in FIG. 1.

Hereinafter, an embodiment of the present invention will be described with reference to the drawings.

First, the configuration of the conference system will be described with reference to FIG. 1. As shown in the figure, the conference system 1 includes a conference reservation server 11, a conference server 12, a minutes server 13, and a plurality of conference terminals 141 to 14n and 15 that communicate with the servers 11 to 13 via the Internet N.

The conference reservation server 11 communicates with the conference terminals 141 to 14n and 15, and conference information (conference date and time, conference participants, roles, conference terminals to be used, and so on) is input to it in advance from the conference terminals 141 to 14n and 15. Each of the conference terminals 141 to 14n and 15 queries the conference reservation server 11 at startup and, when it finds a corresponding conference, controls the conference based on the information set in advance.

The conference server 12 is a server that relays voice, video, and the like between the plurality of conference terminals 141 to 14n. The conference server 12 also monitors whether each of the conference terminals 141 to 14n is connected to the conference server 12 and performs call control for the conference server 12 at the start of a conference.

The minutes server 13 includes a speech recognition engine and the like, and creates minutes by, for example, converting the voice received from the conference terminal 15 into text.

The plurality of conference terminals 141 to 14n are each placed in a conference room at a respective site and connected to the conference server 12 for communication, so that the video, voice, and the like of each conference room can be shared. In the example shown in FIG. 1, the conference terminals 141 to 14n are placed at the respective sites and connected to the conference server 12, while the conference terminal 15 is linked to the conference terminal 141, which is one of the plurality of conference terminals 141 to 14n, and is connected to the minutes server 13.

Next, the configuration of the conference terminals 141 to 14n and 15 will be described with reference to FIGS. 2 and 3.

Each of the conference terminals 141 to 14n and 15 of the present embodiment is a so-called interactive whiteboard. As shown in FIG. 2 and elsewhere, each of the conference terminals 141 to 14n and 15 includes a camera 21, a touch panel display 22, a microphone 23, a speaker 24, a CPU 25, a storage device 26, a memory 27, a LAN I/F unit 28 serving as a transmission unit, an operation unit 29, and a linking unit 30 serving as a detection unit.

The camera 21 has a function of capturing the surroundings as moving images and transmitting them to the CPU 25. As shown in FIG. 3, the camera 21 is fixed at a specific position on each of the conference terminals 141 to 14n and 15. Providing the camera 21 on the conference terminals 141 to 14n and 15 in this way allows the video of the conference participants at each site captured by the camera 21 to be shared among the plurality of conference terminals 141 to 14n and 15. Although only one camera 21 is provided in the example shown in FIG. 3, two or more cameras 21 may be provided so that the camera 21 in use can be switched according to the position of the speaker.

The touch panel display 22 has a function of displaying video received from the CPU 25 on its screen. The touch panel display 22 also has a function of displaying on the screen the position touched by the user with a finger or a pen and of transmitting the coordinates of the touched position to the CPU 25. Providing the touch panel display 22 on the conference terminals 141 to 14n and 15 in this way allows data written by the user with a pen or a finger to be shared among the plurality of conference terminals 141 to 14n and 15. An example of the touch panel display 22 is a display-integrated capacitive touch panel. Examples of the display include an LCD and an electronic paper display.

The microphone 23 has a function of acquiring the voices of the conference participants and transmitting them to the CPU 25. The microphone 23 is fixed at a specific position on each of the conference terminals 141 to 14n and 15. In the present embodiment, a plurality of microphones 23 are arranged side by side. This allows the conference terminals 141 to 14n and 15 to detect the direction of the speaker from the voices picked up by the plurality of microphones 23 and to remove noise arriving from other directions. Providing the microphones 23 on the conference terminals 141 to 14n and 15 in this way allows the voices of the conference participants at each site acquired by the microphones 23 to be shared among the plurality of conference terminals 141 to 14n and 15. Although a plurality of microphones 23 are provided in the example shown in FIG. 3, only one microphone 23 may be provided.
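The embodiment does not say how the speaker direction is derived from the plurality of microphones 23. As one hedged illustration, the following Python sketch estimates the arrival delay between two microphones by brute-force cross-correlation and converts it to an angle. The function names, the two-microphone arrangement, the 0.1 m spacing, the 16 kHz sampling rate, and the 343 m/s speed of sound are all assumptions made for illustration and are not taken from the patent.

```python
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value, not from the patent


def estimate_delay_samples(mic_a, mic_b, max_lag):
    """Return the lag (in samples) at which mic_b best matches mic_a.

    A positive result means the sound reached mic_b later than mic_a.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, a in enumerate(mic_a):
            j = i + lag
            if 0 <= j < len(mic_b):
                score += a * mic_b[j]  # cross-correlation at this lag
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_to_angle_deg(delay_samples, sample_rate_hz, mic_spacing_m):
    """Convert an inter-microphone delay to an arrival angle from broadside."""
    path_difference_m = SPEED_OF_SOUND_M_S * delay_samples / sample_rate_hz
    ratio = max(-1.0, min(1.0, path_difference_m / mic_spacing_m))
    return math.degrees(math.asin(ratio))


# Hypothetical capture: the same short pulse arrives two samples later at mic B.
mic_a = [0.0, 0.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
mic_b = [0.0, 0.0, 0.0, 0.0, 1.0, 0.5, 0.0, 0.0]
delay = estimate_delay_samples(mic_a, mic_b, max_lag=3)
angle = delay_to_angle_deg(delay, sample_rate_hz=16000, mic_spacing_m=0.1)
```

Once a direction estimate of this kind is available, sound arriving from other directions could be attenuated, which is the noise removal the paragraph above refers to.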

The CPU 25 controls the whole of each of the conference terminals 141 to 14n and 15. The CPU 25 has a CODEC 25A and a DSP 25B. The CODEC 25A encodes the video, drawings, and voice acquired from the camera 21, the touch panel display 22, and the microphone 23, and outputs them to the LAN I/F unit 28 described later. The CODEC 25A also decodes the video, drawings, and voice received from the LAN I/F unit 28 and outputs them to the touch panel display 22 and the speaker 24. Examples of the CODEC 25A include VP8, VP9, H.264/AVC, H.264/SVC, and H.265.

The DSP 25B, serving as the acoustic processing unit, executes video processing on the video acquired from the camera 21 and acoustic processing on the voice acquired from the microphone 23.

The speaker 24 has a function of outputting the voice received from the CPU 25.

The storage device 26 stores programs executed by the CPU 25, such as those for device control and video conference control. An example is a volatile memory such as a DDR memory.

The LAN I/F unit 28 connects to the conference server 12 via the Internet N or the like, and transmits and receives images and voice. Examples of the LAN I/F unit 28 include a wired LAN interface that supports 10Base-T and 100Base-T and connects to Ethernet, and a wireless LAN interface that supports 802.11a/b/g/n/ac.

The operation unit 29 has a keyboard, buttons, and the like, with which the user can control the conference terminals 141 to 14n and 15.

The linking unit 30 includes a connector and the like, and can be connected (linked) to another of the conference terminals 141 to 14n and 15. The linking unit 30 detects linking with another of the conference terminals 141 to 14n and 15, and outputs the detection result to the CPU 25.

The conference terminals 141 to 14n and 15 configured as described above transmit the images and voice acquired from the camera 21 and the microphone 23 to the conference server 12. The conference server 12 transmits the received images and voice to the other conference terminals 141 to 14n and 15.

For example, when a conference is held among the conference terminals 141, 142, and 143, data transmitted by the conference terminal 141 is sent via the conference server 12 to the other conference terminals 142 and 143, and is not sent to the conference terminals 144 to 14n and 15, which are not participating. Similarly, data from the conference terminals 142 and 143 is sent via the conference server 12 to the participating conference terminals 141 to 143, and is not sent to the conference terminals 144 to 14n and 15, which are not participating in the conference.
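The relay rule in the example above can be sketched in Python as follows. This is only an illustration under stated assumptions: the class `ConferenceRelay`, its methods, the conference identifier `"weekly"`, and the choice to skip the sender when forwarding are hypothetical and are not described in the patent.

```python
from typing import Dict, Set


class ConferenceRelay:
    """Hypothetical stand-in for the relay role of the conference server 12."""

    def __init__(self) -> None:
        # conference id -> ids of the terminals taking part in that conference
        self._participants: Dict[str, Set[str]] = {}

    def join(self, conference_id: str, terminal_id: str) -> None:
        self._participants.setdefault(conference_id, set()).add(terminal_id)

    def relay(self, conference_id: str, sender_id: str, payload: bytes) -> Dict[str, bytes]:
        """Return {recipient id: payload} for the other participants only."""
        members = self._participants.get(conference_id, set())
        if sender_id not in members:
            return {}  # a non-participating terminal cannot send into the conference
        return {t: payload for t in sorted(members) if t != sender_id}


relay = ConferenceRelay()
for terminal in ("141", "142", "143"):
    relay.join("weekly", terminal)

# Data from terminal 141 reaches 142 and 143, and nothing goes to 144 or 15.
delivered = relay.relay("weekly", "141", b"audio-frame")
```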

Performing the control described above makes it possible to hold a conference among a plurality of conference terminals (multiple sites).

Next, the operation of the conference system configured as described above will be described with reference to the flowcharts of FIGS. 4 and 5. The CPU 25 of each of the conference terminals 141 to 14n (hereinafter simply referred to as the conference terminals 141 to 14n) executes the process shown in FIG. 4 when the power is turned on. First, the conference terminals 141 to 14n initialize the acoustic processing executed by the DSP 25B, the microphone 23, and the speaker 24 (step S1).

The conference terminals 141 to 14n then determine whether the linking unit 30 has detected linking (step S2). When a conference terminal 141 to 14n determines that the linking unit 30 has detected linking (Y in step S2), it sets itself as terminal 1 (main) (step S3) and then proceeds to step S4. When it determines that the linking unit 30 has not detected linking (N in step S2), it proceeds directly to step S4.

In step S4, the conference terminals 141 to 14n set the speaker 24 to output at normal volume. Also in step S4, the conference terminals 141 to 14n turn on the echo canceller, which removes the second voice, that is, the utterance of a participant at another site that is output from the speaker 24 and input to the microphone 23. Further, in step S4, the conference terminals 141 to 14n set the noise suppressor, which removes ambient noise from the voice input to the microphone 23, to its conference setting.

After that, when the user operates the operation unit 29 to start the conference (Y in step S5), the conference terminals 141 to 14n connect to the conference server 12 (step S6). The conference terminals 141 to 14n then apply the acoustic processing set in step S4 to the voice input through the microphone 23 and transmit the result to the conference server 12. The conference terminals 141 to 14n also receive, via the conference server 12, the images and voice acquired by the other conference terminals 141 to 14n, and output them to the touch panel display 22 and the speaker 24.

In step S4, the echo canceller is turned on. Therefore, in step S6, the conference terminals 141 to 14n apply the echo canceller to the voice input to the microphone 23. As a result, the second voice, which is the utterance of a participant at another site output from the speaker 24 and input to the microphone 23, is removed, and only the first voice, which is the utterance input directly to the microphone 23 from a participant at the own site, is transmitted to the conference server 12.
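The patent does not specify how the echo canceller is implemented. As a hedged illustration of the general idea, the sketch below uses a normalized least-mean-squares (NLMS) adaptive filter, a common textbook approach that is swapped in here as an assumption. The function name, the filter length, the step size, and the synthetic signals are all hypothetical.

```python
import random


def nlms_echo_cancel(mic, far_end, taps=4, step=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the far-end echo from the mic signal.

    mic:     samples picked up by the microphone (near-end voice plus echo)
    far_end: samples sent to the loudspeaker (the other site's voice)
    Returns the residual signal that would be sent onward.
    """
    weights = [0.0] * taps
    residual = []
    for n in range(len(mic)):
        # Most recent far-end samples, newest first, zero-padded at the start.
        recent = [far_end[n - k] if n - k >= 0 else 0.0 for k in range(taps)]
        echo_estimate = sum(w * x for w, x in zip(weights, recent))
        error = mic[n] - echo_estimate
        power = sum(x * x for x in recent) + eps
        # NLMS update: step along the input direction, normalized by its power.
        weights = [w + step * error * x / power for w, x in zip(weights, recent)]
        residual.append(error)
    return residual


# Synthetic check: the microphone hears only a delayed, attenuated copy of the
# far-end signal (pure echo, nobody speaking at the own site).
rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(400)]
mic = [0.0] + [0.6 * x for x in far_end[:-1]]

residual = nlms_echo_cancel(mic, far_end)
early_energy = sum(e * e for e in residual[:50])
late_energy = sum(e * e for e in residual[-50:])
```

In this synthetic case the residual energy drops sharply once the filter has adapted, which corresponds to the second voice being removed before transmission to the conference server 12.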

In step S4, the noise suppressor is also set to a stronger level for the conference. Therefore, in step S6, the conference terminals 141 to 14n remove, as noise, the utterances of participants at other sites that are output from the speaker 24 and input to the microphone 23, which further enhances the removal of the second voice. The other conference terminals 141 to 14n receive voice to which the echo canceller and the highly effective noise suppressor have been applied and output it from the speaker 24, so the utterances of participants at other sites become easier to hear.

After that, when the conference ends (Y in step S7), the conference terminals 141 to 14n determine whether the linking unit 30 has detected unlinking (step S8).

When it is determined that the linking unit 30 has detected unlinking (Y in step S8), the conference terminals 141 to 14n return to step S1. When it is determined that the linking unit 30 has not detected unlinking (N in step S8), the conference terminals 141 to 14n return to step S4.
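The flow of FIG. 4 described above (steps S1 to S8) can be summarized as a small transition function. The following Python sketch is illustrative only. The function name and keyword arguments are hypothetical, and the assumption that the terminal waits at steps S5 and S7 until the conference starts or ends is not stated in the text.

```python
def next_step(step, *, linked=False, start_requested=False,
              conference_ended=False, unlinked=False):
    """Return the step that follows `step` in the FIG. 4 flow."""
    if step == "S1":   # initialize acoustic processing, microphone, speaker
        return "S2"
    if step == "S2":   # has the linking unit detected linking?
        return "S3" if linked else "S4"
    if step == "S3":   # set itself as terminal 1 (main)
        return "S4"
    if step == "S4":   # normal volume, echo canceller on, conference suppressor
        return "S5"
    if step == "S5":   # wait for the user to start the conference (assumed loop)
        return "S6" if start_requested else "S5"
    if step == "S6":   # connect to the conference server 12
        return "S7"
    if step == "S7":   # wait for the conference to end (assumed loop)
        return "S8" if conference_ended else "S7"
    if step == "S8":   # has the linking unit detected unlinking?
        return "S1" if unlinked else "S4"
    raise ValueError(f"unknown step: {step}")


# Walk one conference on a terminal that has another terminal linked to it.
path = ["S1"]
for _ in range(7):
    path.append(next_step(path[-1], linked=True, start_requested=True,
                          conference_ended=True, unlinked=False))
```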

Meanwhile, the CPU 25 of the conference terminal 15 (hereinafter simply referred to as the conference terminal 15) executes the process shown in FIG. 5 when the power is turned on. First, the conference terminal 15 initializes the acoustic processing executed by the DSP 25B, the microphone 23, and the speaker 24 (step S10).

The conference terminal 15 then determines whether the linking unit 30 has detected linking (step S11). When the conference terminal 15 determines that the linking unit 30 has detected linking (Y in step S11), it sets itself as terminal 2 (sub) (step S12) and then proceeds to step S13.

In step S13, the conference terminal 15 mutes the speaker 24 so that no voice is output from it. Also in step S13, the conference terminal 15 turns off the echo canceller. Further, in step S13, the conference terminal 15 sets the noise suppressor, which removes ambient noise from the voice input to the microphone 23, to its minutes-creation setting.

After that, when the user operates the operation unit 29 to start the conference (Y in step S14), the conference terminal 15 functions as the connection destination setting unit and connects to the minutes server 13 (step S15). The conference terminal 15 then applies the acoustic processing set in step S13 to the voice input through the microphone 23 and transmits the result to the minutes server 13.

In step S13, the echo canceller is turned off. Therefore, in step S15, the conference terminal 15 does not apply the echo canceller to the voice input to the microphone 23. As a result, the second voice, which is the utterance of a participant at another site output from the speaker 24 of the linked conference terminal 141 and input to the microphone 23, is not removed, and both the first voice, which is the utterance input directly to the microphone 23 from a participant at the own site, and the second voice are transmitted to the minutes server 13.

In step S13, the noise suppressor is also set to a weaker level for creating minutes. Therefore, in step S15, the utterances of participants at other sites that are output from the speaker 24 and input to the microphone 23 are not removed as noise by the conference terminal 15. As a result, the voices acquired by the plurality of conference terminals 141 to 14n are output from the single conference terminal 15 to the minutes server 13.
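The effect of a stronger versus a weaker suppressor setting can be illustrated with a deliberately crude stand-in. The Python sketch below uses a simple amplitude gate, not a real noise suppressor, and the threshold values, names, and sample values are assumptions chosen only to show why a strong setting can discard quiet loudspeaker-played voice while a weak setting keeps it.

```python
def noise_gate(samples, threshold):
    """Zero every sample whose magnitude is below `threshold` (crude gate)."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]


CONFERENCE_THRESHOLD = 0.2   # assumed "strong" setting
MINUTES_THRESHOLD = 0.02     # assumed "weak" setting

# Hypothetical capture: loud own-site voice (0.8, 0.5), quieter voice played by
# the linked terminal's speaker (0.1, -0.05), and faint background noise (0.01).
captured = [0.8, 0.1, -0.05, 0.5, 0.01]

for_conference = noise_gate(captured, CONFERENCE_THRESHOLD)
for_minutes = noise_gate(captured, MINUTES_THRESHOLD)
```

With these assumed values, the strong gate keeps only the own-site voice, whereas the weak gate also keeps the quieter second voice so that it can reach the minutes server 13.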

After that, when the conference ends (Y in step S16), the conference terminal 15 determines whether the linking unit 30 has detected unlinking (step S17).

When it is determined that the linking unit 30 has detected unlinking (Y in step S17), the conference terminal 15 returns to step S1. When it is determined that the linking unit 30 has not detected unlinking (N in step S17), the conference terminal 15 returns to step S13 and keeps the acoustic processing settings.

In contrast, when the conference terminal 15 determines that the linking unit 30 has not detected linking (N in step S11), it proceeds to step S18. In step S18, as in step S4 shown in FIG. 4, the conference terminal 15 sets the speaker 24 to normal volume, turns on the echo canceller, and sets the noise suppressor to its conference setting.

After that, when the conference starts (Y in step S19), the conference terminal 15 connects to the conference server 12 (step S20). Thus, when the conference terminal 15 is not linked to any of the other conference terminals 141 to 14n, it transmits to the conference server 12, in the same way as the other conference terminals 141 to 14n, images and only the first voice of the speaker at the own site input through the microphone 23.

After that, when the conference ends (Y in step S21), the conference terminal 15 determines whether the linking unit 30 has detected unlinking (step S22).

When it is determined that the linking unit 30 has detected unlinking (Y in step S22), the conference terminal 15 returns to step S1. When it is determined that the linking unit 30 has not detected unlinking (N in step S22), the conference terminal 15 returns to step S18 and keeps the acoustic processing settings.
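The branch at step S11 of FIG. 5 selects both the acoustic settings and the connection destination of the conference terminal 15. A minimal Python sketch of that selection follows. The dataclass, its field names, and the string labels are hypothetical, and only the setting values themselves come from steps S13, S15, S18, and S20.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TerminalSettings:
    speaker_muted: bool
    echo_canceller_on: bool
    noise_suppressor: str   # "conference" (stronger) or "minutes" (weaker)
    destination: str        # which server the terminal connects to


def configure_terminal_15(link_detected: bool) -> TerminalSettings:
    """Pick settings for conference terminal 15 from the link detection result."""
    if link_detected:
        # S11: Y -> S12 (terminal 2, sub), S13 (settings), S15 (minutes server 13)
        return TerminalSettings(
            speaker_muted=True,
            echo_canceller_on=False,
            noise_suppressor="minutes",
            destination="minutes_server_13",
        )
    # S11: N -> S18 (same settings as S4), S20 (conference server 12)
    return TerminalSettings(
        speaker_muted=False,
        echo_canceller_on=True,
        noise_suppressor="conference",
        destination="conference_server_12",
    )


linked_settings = configure_terminal_15(link_detected=True)
standalone_settings = configure_terminal_15(link_detected=False)
```

Keeping both the settings and the destination in one value reflects the point of the embodiment that a single DSP 25B runs one processing mode at a time, chosen by whether the terminal is linked.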

According to the embodiment described above, the conference terminal 15 executes acoustic processing for creating minutes on the first voice, which is the utterance of a participant at the own site input to the microphone 23, and on the second voice, which is the utterance of a participant at another site output from the speaker 24 of the linked conference terminal 141 and input to the microphone 23. The conference terminal 15 then transmits the first voice and the second voice, after the acoustic processing for creating minutes has been executed, to the minutes server 13. Consequently, the conference terminals 141 to 14n placed at the respective sites do not need to transmit the voice of the speakers at their sites to the minutes server 13, and do not need to execute acoustic processing for creating minutes. Therefore, simply by linking the conference terminal 15 to one of the conference terminals 141 to 14n, it is possible both to create minutes and to output voice that is easy for participants to hear, even when the conference terminals 141 to 14n at the respective sites each have only one DSP 25B.

また、上述した実施形態によれば、会議端末１５は、連結が検出されていない場合、接続先を会議サーバ１２に設定し、連結が検出された場合、接続先を議事録サーバ１３に設定している。また、会議端末１５は、送信先が議事録サーバ１３に設定されている場合、議事録作成用の音響処理を実行し、送信先が会議サーバ１２に設定されている場合、会議用の音響処理を実行する。これにより、会議端末１５は、他の会議端末１４１～１４ｎと連結していないときは、他の会議端末１４１～１４ｎと同様に画像やマイク２３により入力された自拠点の発話者の第１の音声のみを会議サーバ１２に送信する。 Further, according to the above-described embodiment, the conference terminal 15 sets the connection destination to the conference server 12 when the connection is not detected, and sets the connection destination to the minutes server 13 when the connection is detected. ing. Further, the conference terminal 15 executes acoustic processing for creating minutes when the destination is set to the minutes server 13, and acoustic processing for conferences when the destination is set to the conference server 12. To execute. As a result, when the conference terminal 15 is not connected to the other conference terminals 141 to 14n, the first speaker of the own base input by the image or the microphone 23 is the same as the other conference terminals 141 to 14n. Only the voice is transmitted to the conference server 12.

また、上述した実施形態によれば、会議用の音響処理は、マイク２３に入力された音声から第２の音声を除去するエコーキャンセラを含み、議事録作成用の音響処理は、エコーキャンセラを含まない。これにより、より精度よく、議事録の作成と、参加者に聞き取りやすい音声出力と、を両立できる。 Further, according to the above-described embodiment, the sound processing for the conference includes an echo canceller that removes the second sound from the sound input to the microphone 23, and the sound processing for creating the minutes includes an echo canceller. do not have. As a result, it is possible to more accurately create minutes and output audio that is easy for participants to hear.

Further, according to the embodiment described above, the acoustic processing for conferences and the acoustic processing for minutes creation each include a noise suppressor that removes ambient noise, and the noise suppressor for minutes creation and the noise suppressor for conferences are set so that their removal effects differ. This makes it possible to achieve, with higher accuracy, both minutes creation and audio output that is easy for participants to hear.
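As a rough illustration of two noise suppressors with different removal effects, the sketch below uses a simple threshold-based attenuator. The algorithm, the threshold, and the strength values are assumptions chosen for illustration only; the embodiment does not specify them. The only property taken from this document is that the setting for minutes creation is weaker than the setting for conferences, as recited in the claims.

```python
# Hypothetical strength settings (1.0 = remove low-level samples entirely).
CONFERENCE_STRENGTH = 1.0
MINUTES_STRENGTH = 0.5  # weaker removal effect for minutes creation


def suppress_noise(samples, threshold, strength):
    """Attenuate samples whose magnitude is below `threshold`.

    `strength` ranges from 0.0 (no suppression) to 1.0 (full removal);
    samples at or above the threshold pass through unchanged.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    gain = 1.0 - strength
    return [s * gain if abs(s) < threshold else s for s in samples]


if __name__ == "__main__":
    samples = [0.5, 0.125, -0.0625]  # speech peak followed by low-level noise
    print(suppress_noise(samples, 0.25, CONFERENCE_STRENGTH))
    print(suppress_noise(samples, 0.25, MINUTES_STRENGTH))
```

A weaker setting leaves more of the low-level signal intact, which is one plausible reason for treating audio destined for speech recognition differently from audio intended for listeners.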

In the embodiment described above, the conference terminal 15 can switch between the acoustic processing for minutes creation and the acoustic processing for conferences, but the invention is not limited to this configuration. The conference terminal 15 may instead be a terminal that performs only the acoustic processing for minutes creation.

Likewise, in the embodiment described above, the conference terminals 141 to 14n perform only the acoustic processing for conferences, but the invention is not limited to this configuration. Like the conference terminal 15, the conference terminals 141 to 14n may be able to switch between the acoustic processing for minutes creation and the acoustic processing for conferences, so that any of the conference terminals 141 to 14n can serve as the terminal connected to the minutes server 13.

The present invention is not limited to the embodiment described above. Various modifications can be made without departing from the gist of the present invention.

12 Conference server
13 Minutes server
15 Conference terminal (communication terminal)
23 Microphone
24 Speaker
25 CPU (connection setting unit)
25B DSP (acoustic processing unit)
28 Transmission unit
30 Connection unit (detection unit)
141 Conference terminal (another communication terminal arranged at the same site)
142 to 14n Conference terminals (communication terminals arranged at other sites)

Japanese Unexamined Patent Application Publication No. 2013-183182

Claims

A communication terminal connectable to at least a conference server that relays voice between a plurality of communication terminals and a minutes server that creates minutes using the voice, the communication terminal comprising:
a detection unit that detects a connection with another communication terminal arranged at its own site;
a connection setting unit that sets a connection destination to the conference server when the detection unit does not detect the connection, and sets the connection destination to the minutes server when the detection unit detects the connection;
a microphone;
a speaker;
an acoustic processing unit that, for a first voice that is an utterance of a participant at the own site input to the microphone and a second voice that is an utterance of a participant at another site output from the speaker, or from a speaker of the other communication terminal arranged at the own site, and input to the microphone, executes acoustic processing for minutes creation when the connection destination is set to the minutes server and executes acoustic processing for conferences when the connection destination is set to the conference server; and
a transmission unit that transmits the first voice and the second voice, after the acoustic processing for minutes creation or the acoustic processing for conferences has been executed by the acoustic processing unit, to the minutes server when the connection destination is set to the minutes server and to the conference server when the connection destination is set to the conference server.

The communication terminal according to claim 1, wherein the acoustic processing for conferences includes an echo canceller that removes the second voice from the voice input to the microphone, and
the acoustic processing for minutes creation does not include the echo canceller.

The communication terminal according to claim 1 or 2, wherein the acoustic processing for conferences and the acoustic processing for minutes creation each include a noise suppressor that removes ambient noise, and
the noise suppressor for minutes creation is set to have a weaker removal effect than the noise suppressor for conferences.

A conference system comprising:
the conference server;
the minutes server;
the communication terminal according to claim 1, connected to the minutes server;
another communication terminal arranged at the same site as the communication terminal and connected to the conference server; and
a communication terminal arranged at another site that transmits and receives voice to and from the other communication terminal via the conference server.