JP2024031682A

JP2024031682A - Processing equipment, processing method and program

Info

Publication number: JP2024031682A
Application number: JP2022135384A
Authority: JP
Inventors: 修山田
Original assignee: Oki Electric Industry Co Ltd
Current assignee: Oki Electric Industry Co Ltd
Priority date: 2022-08-26
Filing date: 2022-08-26
Publication date: 2024-03-07

Abstract

[Problem] To provide a processing device, a processing method, and a program that can suppress howling without imposing a burden on the user. [Solution] A first PC 20A, serving as a processing device used by a user, includes: a howling check unit 27a, serving as a first processing unit, that compares input audio information based on a signal input from a microphone 32 used by the user with received audio information received from a second PC 20B serving as another processing device, and determines whether audio information satisfying a first predetermined condition is present; and an output audio level control unit 27b, serving as a second processing unit and a third processing unit, that compares volume information in the input audio information with volume information in the received audio information and, if the volume information in the input audio information is determined to satisfy a second predetermined condition, performs processing to lower the output of the received audio information. [Selected drawing] FIG. 4

Description

The present invention relates to a processing device, a processing method, and a program.

WebRTC (Web Real-Time Communication) is one of the technologies used in web conference systems. WebRTC is one of the APIs (Application Programming Interfaces) of HTML (Hyper Text Markup Language) and is an open standard whose source code is publicly available. In addition to sending and receiving large volumes of data such as video and audio in real time, WebRTC provides a mechanism that allows an unspecified number of people to send and receive files and the like (see Non-Patent Document 1).

When a web conference system is used, howling can become a problem. Howling occurs when sound in a certain frequency band is amplified by an acoustic feedback loop, in which sound emitted from a speaker is picked up again by a microphone, amplified, and output from the speaker. For example, howling occurs when several people in the same space (for example, a conference room) each join the web conference from their own PC. Known effective countermeasures include "(1) keep participants apart from each other", "(2) move away from walls to prevent reverberation", "(3) keep the volume down", and "(4) mute when not speaking" (see Non-Patent Document 2).

Non-Patent Document 1: "What is a WebRTC SFU? Explaining the mechanism that enables simultaneous multi-site connections", [online], エイネット株式会社, [retrieved July 20, 2022], Internet <https://www.freshvoice.net/knowledge/word/6867/>
Non-Patent Document 2: "[Preventing howling] Tabletop concentration partition for web interviews", [online], 株式会社イプロス, [retrieved July 20, 2022], Internet <https://www.ipros.jp/product/detail/2000616359>

However, the conventional means of suppressing howling leave the countermeasures to the users of the web conference system, so users may find them bothersome.

The present invention has been made in view of the above problem, and provides a processing device, a processing method, and a program that can suppress howling without imposing a burden on the user.

To solve the above problem, a processing device according to the present invention is a processing device used by a user, comprising: a first processing unit that compares input audio information based on a signal input from a microphone used by the user with received audio information received from another processing device, and determines whether audio information satisfying a first predetermined condition is present; a second processing unit that, if the first processing unit determines that audio information satisfying the first predetermined condition is present, compares volume information in the input audio information with volume information in the received audio information and determines whether the volume information in the input audio information satisfies a second predetermined condition; and a third processing unit that, if the second processing unit determines that the volume information in the input audio information satisfies the second predetermined condition, performs processing to lower the output of the received audio information.

A processing method according to the present invention is a processing method for a processing device used by a user, comprising: a first processing step of comparing input audio information based on a signal input from a microphone used by the user with received audio information received from another processing device, and determining whether audio information satisfying a first predetermined condition is present; a second processing step of, if it is determined in the first processing step that audio information satisfying the first predetermined condition is present, comparing volume information in the input audio information with volume information in the received audio information and determining whether the volume information in the input audio information satisfies a second predetermined condition; and a third processing step of, if it is determined in the second processing step that the volume information in the input audio information satisfies the second predetermined condition, performing processing to lower the output of the received audio information.

A program according to the present invention causes a computer used by a user to function as: a first processing unit that compares input audio information based on a signal input from a microphone used by the user with received audio information received from another processing device, and determines whether audio information satisfying a first predetermined condition is present; a second processing unit that, if the first processing unit determines that audio information satisfying the first predetermined condition is present, compares volume information in the input audio information with volume information in the received audio information and determines whether the volume information in the input audio information satisfies a second predetermined condition; and a third processing unit that, if the second processing unit determines that the volume information in the input audio information satisfies the second predetermined condition, performs processing to lower the output of the received audio information.

According to the present invention, howling can be suppressed without imposing a burden on the user.

FIG. 1 is a schematic configuration diagram of a web conference system according to an embodiment of the present invention.
FIG. 2 is a diagram for explaining the howling suppression functions of the web conference system according to the embodiment of the present invention.
FIG. 3 shows an example of transmission and reception of packets for delay measurement.
FIG. 4 shows a configuration example of the howling suppression processing unit.
FIG. 5 shows an example of audio matching processing.
FIG. 6 is a conceptual diagram of the processing performed by the howling suppression processing unit.

Hereinafter, embodiments of the present invention are described in detail with reference to the drawings as appropriate. Each figure is merely a schematic illustration, sufficient for the invention to be fully understood. The invention is therefore not limited to the illustrated examples. In this embodiment, descriptions of configurations not directly related to the invention and of well-known configurations may be omitted. In the figures, common or similar components are given the same reference numerals, and redundant descriptions are omitted.

≪Configuration of the web conference system according to the embodiment≫
The configuration of a web conference system 1 according to the embodiment is described with reference to FIG. 1. FIG. 1 is a schematic configuration diagram of the web conference system 1. The web conference system 1 is a system for making video calls (exchanging video and audio) and voice calls (exchanging audio) over the Internet. The web conference system 1 may also have a function for sharing materials. With the web conference system 1, a conference can be held in real time with, for example, a party at a remote location. When the web conference system 1 is used as a system for voice calls (exchanging audio), its configuration is, for example, Configuration 1 or Configuration 2. In Configuration 1, only the audio exchange function of a video call (exchanging video and audio) is used. Configuration 2 has only a voice call (audio exchange) function. The description of the web conference system 1 below is based on Configuration 1 as an example.

The web conference system 1 of this embodiment uses WebRTC technology. WebRTC can send and receive large volumes of data such as video and audio in real time, and also provides a mechanism that allows an unspecified number of people to send and receive files and the like.

As shown in FIG. 1, the web conference system 1 includes a server 10, a first PC 20A, and a second PC 20B. FIG. 1 shows two PCs, but the number of PCs is not particularly limited and may be three or more. The first PC 20A and the second PC 20B may have the same functional configuration. When the first PC 20A and the second PC 20B are described without distinction, they may be referred to collectively as "PC 20".

As shown in FIG. 1, the server 10 can communicate with the first PC 20A and the second PC 20B via a network (not shown). For example, the server 10 receives video and audio from the PCs other than the first PC 20A (the second PC 20B in FIG. 1) and transmits them to the first PC 20A. Likewise, the server 10 receives video and audio from the PCs other than the second PC 20B (the first PC 20A in FIG. 1) and transmits them to the second PC 20B.

The web conference system 1 of this embodiment assumes the SFU (Selective Forwarding Unit) method. In the SFU method, the server 10 only routes the video transmitted from the PCs 20; it does not decode, combine, or re-encode video or audio. In other words, the server 10 forwards the video and audio sent from a PC 20, unchanged, to the other PCs 20 that need them. This gives lower delay than the MCU method, in which the server decodes, combines, and re-encodes the video and audio.
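The forwarding behavior described above can be sketched as follows. This is only an illustration under assumptions: the function name `forward_sfu`, the client identifiers, and the in-memory list of routes are hypothetical and do not come from the patent. The point it shows is that the payload is passed through untouched and is never sent back to its sender.

```python
from typing import List, Tuple


def forward_sfu(sender_id: str, packet: bytes, clients: List[str]) -> List[Tuple[str, bytes]]:
    """Return (destination, packet) pairs for one incoming packet.

    The packet is forwarded as-is (no decoding, mixing, or re-encoding)
    to every connected client except the one that sent it.
    """
    return [(dest, packet) for dest in clients if dest != sender_id]


# Hypothetical example with three participants.
routes = forward_sfu("PC20A", b"encoded-audio", ["PC20A", "PC20B", "PC20C"])
```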

The PC 20 (the first PC 20A and the second PC 20B) is an example of a "processing device" and a "client terminal"; the "processing device" and the "client terminal" are not limited to the PC 20.
The server 10 is an example of a "relay device"; the "relay device" is not limited to the server 10.

As shown in FIG. 1, the PC 20 is connected to a camera 31, a microphone 32, a display unit 33, and a speaker 34 so that data can be exchanged with them. The PC 20 may have some or all of the camera 31, the microphone 32, the display unit 33, and the speaker 34 built in. The PC 20 transmits to the server 10 the video it has acquired (from the camera 31) and the audio it has acquired (from the microphone 32). A PC 20 (for example, the PC 20A) also receives from the server 10 the video and audio transmitted by the other PCs 20 (for example, the PC 20B).

The PC 20 outputs the received video to the display unit 33. For example, when one video is received, the PC 20 outputs that video to the display unit 33. When several videos are received, the PC 20 arranges the videos received from the respective PCs 20 side by side and outputs them to the display unit 33. When a PC 20 (for example, the PC 20A) displays its own video on the display unit 33, it outputs that video to the display unit 33 together with the video received from the other PCs 20 (for example, the PC 20B). The PC 20 also outputs the received audio to the speaker 34. The processing related to audio output is described in detail later.

Next, the functional configuration of the server 10 related to video calls is described. As shown in FIG. 1, the server 10 includes a receiving unit 11 and a transmitting unit 12. The server 10 includes a CPU (Central Processing Unit), ROM (Read Only Memory), storage, and RAM (Random Access Memory). Each functional component (the receiving unit 11 and the transmitting unit 12) is realized by the CPU reading a processing program stored in the ROM or the storage, loading it into the RAM, and executing it.

The receiving unit 11 receives video and audio from the PCs 20 (the PCs 20A and 20B in FIG. 1).
The transmitting unit 12 transmits to the first PC 20A, out of the video and audio of the PCs 20 (the PCs 20A and 20B in FIG. 1) received by the receiving unit 11, the video and audio from the PCs other than the first PC 20A (that is, from the second PC 20B). Likewise, the transmitting unit 12 transmits to the second PC 20B, out of the video and audio received by the receiving unit 11, the video and audio from the PCs other than the second PC 20B (that is, from the first PC 20A).

Next, the functional configuration of the PC 20 related to video calls is described. As shown in FIG. 1, the PC 20 includes an encoding unit 21, a transmitting unit 22, a receiving unit 23, a decoding unit 24, and an output processing unit 25. The PC 20 includes a CPU, ROM, storage, and RAM. Each functional component (the encoding unit 21, the transmitting unit 22, the receiving unit 23, the decoding unit 24, and the output processing unit 25) is realized by the CPU reading a processing program stored in the ROM or the storage, loading it into the RAM, and executing it.

The encoding unit 21 encodes the video acquired by the camera 31 connected to the PC itself. The encoding unit 21 also encodes the audio acquired by the microphone 32 connected to the PC itself.
The transmitting unit 22 transmits the encoded video and the encoded audio to the server 10.

The receiving unit 23 receives video and audio encoded by the other PCs. For example, the receiving unit 23 of the PC 20A receives, via the server 10, the video and audio encoded by the PC 20B, which is another PC. Likewise, the receiving unit 23 of the PC 20B receives, via the server 10, the video and audio encoded by the PC 20A, which is another PC.
The decoding unit 24 decodes the encoded video and the encoded audio.

When there is one decoded video (that is, one other PC), the output processing unit 25 places the decoded video in a predetermined layout and outputs the result to the display unit 33. When there are several decoded videos (that is, several other PCs), the output processing unit 25 places each decoded video in a predetermined layout and outputs the result to the display unit 33.
When there is one decoded audio stream (that is, one other PC), the output processing unit 25 outputs the decoded audio to the speaker 34. When there are several decoded audio streams (that is, several other PCs), the output processing unit 25 mixes the streams and outputs the mixed audio to the speaker 34.
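The patent does not say how the mixing is performed. As a minimal sketch, assuming decoded audio is held as floating-point samples in the range -1 to +1, one common approach is to sum the streams sample by sample and clip the result. The function name `mix_streams` and the clipping strategy are assumptions made for illustration only.

```python
from typing import List, Sequence


def mix_streams(streams: Sequence[Sequence[float]]) -> List[float]:
    """Mix decoded audio streams by summing samples and clipping to [-1, 1].

    Streams may differ in length; a stream that has ended contributes silence.
    """
    if not streams:
        return []
    length = max(len(s) for s in streams)
    mixed: List[float] = []
    for i in range(length):
        total = sum(s[i] for s in streams if i < len(s))
        mixed.append(max(-1.0, min(1.0, total)))  # hard clip to avoid overflow
    return mixed


# Two hypothetical decoded streams from two other PCs.
mixed = mix_streams([[0.5, 0.25], [0.75, -0.5, 0.125]])
```

With a single stream the function returns that stream unchanged (apart from clipping), which matches the one-PC case described above.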

Next, the howling suppression function (howling suppression system 1a) of the web conference system 1 according to the embodiment is described with reference to FIG. 2. FIG. 2 is a diagram for explaining the howling suppression function (howling suppression system 1a).

The howling suppression system 1a shown in FIG. 2 determines from the input audio of a PC 20 whether another PC 20 is nearby and, if there is a nearby PC 20, performs control to suppress howling. The howling suppression system 1a includes a delay measurement processing unit 26 and a howling suppression processing unit 27. Each functional component (the delay measurement processing unit 26 and the howling suppression processing unit 27) is realized by the CPU reading a processing program stored in the ROM or the storage, loading it into the RAM, and executing it. The delay measurement processing unit 26 and the howling suppression processing unit 27 are provided in each PC 20.

The delay measurement processing unit 26 is an example of a "fourth processing unit"; the "fourth processing unit" is not limited to the delay measurement processing unit 26.
The howling suppression processing unit 27 is an example of a "first processing unit", a "second processing unit", and a "third processing unit"; the "first processing unit", the "second processing unit", and the "third processing unit" are not limited to the howling suppression processing unit 27.

The delay measurement processing unit 26 shown in FIG. 2 measures the communication delay between PCs 20 via the server 10 by sending and receiving delay measurement packets. For example, the delay measurement processing unit 26 sends a delay measurement packet to another PC 20. The PC 20 that receives the packet returns it unchanged to the sending PC 20 as a response. Delay measurement packets are relayed through the same server 10 that relays the audio, and the server 10 simply relays them. A delay measurement packet stores, for example, the transmission time, the reception time, and identification information of the source and destination PCs 20 (the PCs 20A and 20B in FIG. 1). The protocol used to send and receive delay measurement packets is not particularly limited, provided that the packets can be relayed through the same server 10 that relays the audio. For example, the "Ping" command may be used to send and receive delay measurement packets.

FIG. 3 shows the flow of sending and receiving delay measurement packets, taking as an example the case where the first PC 20A sends the delay measurement packet. Although not shown, the second PC 20B can send delay measurement packets in the same way.

As shown in FIG. 3, the delay measurement processing unit 26 of the first PC 20A sends a delay measurement packet to the server 10 via the transmitting unit 22. The server 10 forwards the packet to the second PC 20B via the receiving unit 11 and the transmitting unit 12. The second PC 20B receives the packet at its receiving unit 23 and returns it unchanged to the first PC 20A via its transmitting unit 22. The returned packet is forwarded to the first PC 20A via the server 10, as on the outbound path. The delay measurement processing unit 26 of the first PC 20A receives the packet via the receiving unit 23 and obtains the communication delay time td between the PCs via the server 10.

The delay measurement processing unit 26 obtains the communication delay time td between the PCs via the server 10 using, for example, the following formula.
・Communication delay time td = (time at which the delay measurement packet was received - time at which the delay measurement packet was sent) / 2
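The formula above halves the round-trip time to estimate the one-way delay. A minimal sketch, assuming both timestamps are taken from the sending PC's own clock and expressed in seconds (the function name `one_way_delay` is hypothetical):

```python
def one_way_delay(sent_at: float, received_at: float) -> float:
    """Estimate the one-way delay td as half of the measured round trip.

    Both timestamps are read on the sending PC, so no clock
    synchronization between the PCs is needed.
    """
    if received_at < sent_at:
        raise ValueError("the response cannot arrive before the packet was sent")
    return (received_at - sent_at) / 2.0


# A packet sent at t = 10.0 s whose response arrives at t = 10.5 s.
td = one_way_delay(10.0, 10.5)
```

Halving the round trip implicitly assumes that the outbound and return paths take about the same time.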

The delay measurement processing unit 26 may instead obtain the communication delay time td by the following Procedures 1 to 4.
<Procedure 1> The delay measurement processing unit 26 measures the communication delay between the PCs via the server 10 multiple times (N times, where N is a positive integer).
<Procedure 2> The delay measurement processing unit 26 obtains the communication delay time td for each measurement (from the first to the N-th) using the formula above.
<Procedure 3> The delay measurement processing unit 26 calculates the average of the communication delay times td from the first to the N-th measurement.
<Procedure 4> The delay measurement processing unit 26 uses the calculated average as the communication delay time td.
The delay measurement processing unit 26 passes the obtained communication delay time td to the howling suppression processing unit 27.
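Procedures 1 to 4 can be sketched as follows. The representation of each measurement as a `(sent_at, received_at)` pair in seconds and the function name `average_delay` are assumptions for illustration; the patent only specifies that N per-measurement delays are averaged.

```python
from statistics import fmean
from typing import Iterable, Tuple


def average_delay(round_trips: Iterable[Tuple[float, float]]) -> float:
    """Average the one-way delay over N measurements (Procedures 1 to 4).

    Each element is a (sent_at, received_at) pair in seconds.
    """
    # Procedure 2: apply td = (received - sent) / 2 to each measurement.
    delays = [(received_at - sent_at) / 2.0 for sent_at, received_at in round_trips]
    if not delays:
        raise ValueError("at least one measurement is required")
    # Procedures 3 and 4: the mean becomes the delay td that is used.
    return fmean(delays)


# Three hypothetical measurements (N = 3).
td = average_delay([(0.0, 0.5), (1.0, 1.25), (2.0, 2.75)])
```

Averaging smooths out single outliers, although a median or a sliding window might be preferable on a jittery network; that choice is outside what the patent describes.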

The delay measurement processing unit 26 preferably sends delay measurement packets periodically, so that it can follow communication conditions that change from moment to moment. Such changes include, for example, changes in the speed of the Wi-Fi used in a conference room, changes in the speed of the network within a company site (in-house LAN), and changes in the speed of the network (IP network) used for communication between the PCs.

The howling suppression processing unit 27 shown in FIG. 2 lowers the output audio level (including setting it to "0 (zero)") when the audio output to the speaker 34 contains audio that could cause howling. For example, the howling suppression processing unit 27 compares the input audio of the microphone 32 connected to its own PC with the received audio, that is, the input audio of a microphone 32 connected to another PC, and, if the same audio is present (differences in volume are tolerated), lowers the level of the audio output to the speaker 34 connected to its own PC. The comparison is preferably performed on unencoded digital audio signals.

FIG. 4 shows a configuration example of the howling suppression processing unit 27, using the howling suppression processing unit 27 of the first PC 20A as the example. The second PC 20B has the same configuration.
The howling suppression processing unit 27 includes a howling check unit 27a and an output audio level control unit 27b.

The howling check unit 27a is an example of a "first processing unit" and a "second processing unit"; the "first processing unit" and the "second processing unit" are not limited to the howling check unit 27a.
The output audio level control unit 27b is an example of a "third processing unit"; the "third processing unit" is not limited to the output audio level control unit 27b.

The howling check unit 27a shown in FIG. 4 compares the input audio SA from the microphone 32 of the first PC 20A with the received audio SBa, which is the input audio SB from the microphone 32 of the second PC 20B as received via the server 10, and determines whether they contain the same sound (differences in volume are tolerated). To do so, the howling check unit 27a stores the input audio SA from the microphone 32 of the first PC 20A in a buffer for a certain period (for example, the most recent 1 to 10 seconds up to the current time) and compares the buffered input audio SA with the received audio SBa received via the server 10. The period for which the input audio SA is buffered is determined by the communication delay time td obtained by the delay measurement processing unit 26. In other words, the period of the input audio SA and the received audio SBa to be compared is determined based on the communication delay time td between the PCs 20.
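The buffering described above can be sketched with a fixed-capacity queue whose length is derived from the measured delay td. Everything specific here is an assumption for illustration: the class name `InputAudioBuffer`, the extra `margin_s` added on top of td, and the representation of audio as a list of float samples are not taken from the patent.

```python
from collections import deque
from typing import Deque, Iterable, List


class InputAudioBuffer:
    """Keeps only the most recent microphone samples (input audio SA)."""

    def __init__(self, sample_rate_hz: int, delay_s: float, margin_s: float = 1.0) -> None:
        # Retention period = measured delay td plus a hypothetical safety margin.
        self.sample_rate_hz = sample_rate_hz
        capacity = int(round((delay_s + margin_s) * sample_rate_hz))
        self._samples: Deque[float] = deque(maxlen=capacity)

    def push(self, samples: Iterable[float]) -> None:
        """Append new samples; the oldest ones are dropped automatically."""
        self._samples.extend(samples)

    def segment(self, delay_s: float, length: int) -> List[float]:
        """Return `length` samples that ended `delay_s` seconds before the newest one."""
        data = list(self._samples)
        end = len(data) - int(round(delay_s * self.sample_rate_hz))
        start = end - length
        if length < 0 or start < 0:
            raise ValueError("requested segment is no longer buffered")
        return data[start:end]


# Toy example: 10 samples per second, td = 0.5 s, 0.5 s margin.
buf = InputAudioBuffer(sample_rate_hz=10, delay_s=0.5, margin_s=0.5)
buf.push(float(i) for i in range(25))
aligned = buf.segment(delay_s=0.5, length=3)
```

The segment returned by `segment` is the part of SA that would line up in time with audio arriving now from the other PC, which is what a comparison against SBa needs.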

The howling check unit 27a compares the input audio SA with the received audio SBa by, for example, matching features of the two audio signals. The matching method is not particularly limited. FIG. 5 shows one example of the audio matching process.

As shown in FIG. 5, the howling check unit 27a takes into account the communication delay introduced by passing through the server 10 and compares the waveform of the input audio SA from the microphone 32 of the first PC 20A, whose input started at time "t_A", with the waveform of the input audio SB from the microphone 32 of the second PC 20B, whose reception started at time "t_A + td1" (that is, the waveform of the received audio SBa) (step S1). Here, "td1" is the communication delay time between the PCs as relayed by the server 10.
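One way to picture step S1 is as selecting two equally long windows whose start times differ by td1. The following is a minimal sketch, assuming both streams are lists of samples at the same sample rate with known start timestamps; the function name `aligned_windows` and all of its parameters are hypothetical.

```python
def aligned_windows(sa, sa_start, sba, sba_start, rate_hz, t_a, td1, duration):
    """Return (SA window starting at t_a, SBa window starting at t_a + td1).

    sa, sba:             sample lists for input audio SA and received audio SBa
    sa_start, sba_start: timestamps (seconds) of the first sample in each list
    Returns None when either stream does not cover the requested window.
    """
    n = int(round(duration * rate_hz))
    i = int(round((t_a - sa_start) * rate_hz))
    j = int(round((t_a + td1 - sba_start) * rate_hz))
    if i < 0 or j < 0 or i + n > len(sa) or j + n > len(sba):
        return None
    return sa[i:i + n], sba[j:j + n]


# Toy usage: 10 samples per second, 0.3 s delay, 0.5 s comparison window.
sa = list(range(100))
sba = list(range(100))
print(aligned_windows(sa, 0.0, sba, 0.0, 10, t_a=1.0, td1=0.3, duration=0.5))
```

With these toy values the SA window starts at sample 10 and the SBa window at sample 13, i.e. three samples (0.3 s) later.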

The howling check unit 27a (see FIG. 4) normalizes the volume of the waveforms of the input audio SA and the received audio SBa to the range "-1" to "+1" (step S2).
Next, the howling check unit 27a converts the normalized volume values of both waveforms into absolute values (step S3).
Next, the howling check unit 27a takes the first through n-th samples of the absolute-valued (normalized) volume of the input audio SA and the received audio SBa (step S4).
Then, the howling check unit 27a calculates the correlation value between the volumes of the input audio SA and the received audio SBa over those first through n-th samples (step S5).
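Steps S2 to S5 can be sketched as follows. The description does not say which correlation measure is used or how the n samples are chosen, so this sketch assumes peak normalization, evenly spaced sampling, and the Pearson correlation coefficient; all function names are hypothetical.

```python
import math


def normalize(waveform):
    """S2: scale so the largest magnitude becomes 1 (values lie in -1..+1)."""
    peak = max((abs(x) for x in waveform), default=0.0)
    if peak == 0:
        return [0.0] * len(waveform)
    return [x / peak for x in waveform]


def to_absolute(waveform):
    """S3: convert each normalized value to its absolute value."""
    return [abs(x) for x in waveform]


def take_samples(values, n):
    """S4: pick n evenly spaced values (assumed sampling scheme)."""
    if n <= 0 or not values:
        return []
    step = len(values) / n
    return [values[min(len(values) - 1, int(k * step))] for k in range(n)]


def correlation(a, b):
    """S5: Pearson correlation (assumed measure); 0.0 if undefined."""
    n = min(len(a), len(b))
    if n == 0:
        return 0.0
    a, b = a[:n], b[:n]
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / math.sqrt(var_a * var_b)


def match_score(sa, sba, n=64):
    """Run S2-S5 on two already time-aligned waveforms."""
    env_a = take_samples(to_absolute(normalize(sa)), n)
    env_b = take_samples(to_absolute(normalize(sba)), n)
    return correlation(env_a, env_b)


# Toy usage: the same waveform at 30% volume still scores close to 1.
sa = [math.sin(0.05 * k) * math.sin(0.9 * k) for k in range(400)]
quieter = [0.3 * x for x in sa]
print(round(match_score(sa, quieter), 3))
```

Normalizing before correlating is what makes the comparison tolerant of volume differences, as the description requires.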

The howling check unit 27a compares the correlation value with a predetermined value (for example, "0.5") and, when the correlation value is greater than the predetermined value, determines that the input audio SA and the received audio SBa contain the same sound.
When it determines that the input audio SA and the received audio SBa contain the same sound, the howling check unit 27a further compares the volume of the input audio SA with the volume of the received audio SBa. If the volume of the input audio SA is lower than the volume of the received audio SBa, the howling check unit 27a determines that the microphone is picking up distant audio (audio input to the microphone 32 of another PC) (step S6).
Note that the input audio SA and the received audio SBa containing the same audio information is an example of "satisfying the first predetermined condition".
Likewise, the volume of the input audio SA being lower than the volume of the received audio SBa is an example of "satisfying the second predetermined condition".
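The two-stage decision of step S6 can be sketched as below. Using RMS as the volume measure is an assumption made for illustration (the description does not define how volume is computed), and the function names are hypothetical; the threshold default of 0.5 is the example value given in the text.

```python
import math


def rms(samples):
    """Root-mean-square level, used here as an assumed measure of volume."""
    if not samples:
        return 0.0
    return math.sqrt(sum(x * x for x in samples) / len(samples))


def picks_up_distant_audio(correlation_value, sa, sba, threshold=0.5):
    """Step S6: True when both predetermined conditions hold.

    First condition:  correlation value > threshold (same sound in SA and SBa).
    Second condition: volume of input audio SA < volume of received audio SBa.
    """
    if not correlation_value > threshold:
        return False
    return rms(sa) < rms(sba)


# Toy usage: same sound, and the local microphone hears it more quietly.
print(picks_up_distant_audio(0.8, [0.1, -0.1], [0.5, -0.5]))
```

The volume comparison is only evaluated after the correlation check passes, mirroring the order in which the second processing unit acts on the result of the first.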

When the howling check unit 27a determines that distant audio (audio input to the microphone 32 of another PC) is being picked up, the output audio level control unit 27b shown in FIG. 4 lowers the output audio level of the received audio SBa (or sets the output audio level to "0" (zero)). Even with this processing, because the first PC 20A and the second PC 20B are located close to each other, the user of the first PC 20A can directly hear the sound entering the microphone 32 of the second PC 20B (for example, the voice of the user of the second PC 20B).
Conversely, when the howling check unit 27a determines that distant audio (audio input to the microphone 32 of another PC) is not being picked up, the output audio level control unit 27b leaves the output audio level of the received audio SBa unchanged (step S7).

As shown in FIG. 5, the output audio level control unit 27b lowers the output audio level of the first PC 20A (or sets it to "0" (zero)) when, for example, the correlation value is greater than the predetermined value and the volume of the input audio SA is lower than the volume of the received audio SBa. When three or more users are in the web conference, it is preferable to lower (or set to "0" (zero)) only the output audio level of the received audio SBa from the relevant user (the nearby user).
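The per-user level control of step S7 can be sketched as a gain applied separately to each remote participant's received audio. This is an illustrative sketch only: the peer identifiers, the `reduced_gain` parameter, and the function names are hypothetical, and the description does not specify by how much the level is lowered.

```python
def output_gains(base_gains, detections, reduced_gain=0.0):
    """Return the gain to apply to each peer's received audio.

    base_gains:   dict of peer id -> normal output gain
    detections:   dict of peer id -> True if step S6 flagged that peer as nearby
    reduced_gain: gain used for flagged peers (0.0 mutes them; assumed value)
    Peers that are not flagged keep their normal gain (level unchanged).
    """
    return {
        peer: reduced_gain if detections.get(peer, False) else gain
        for peer, gain in base_gains.items()
    }


def apply_gain(samples, gain):
    """Scale received samples before they are sent to the speaker."""
    return [x * gain for x in samples]


# Toy usage: three-party conference where only "pc_b" sits nearby.
gains = output_gains({"pc_b": 1.0, "pc_c": 1.0}, {"pc_b": True, "pc_c": False})
print(gains)
print(apply_gain([0.5, -0.25], gains["pc_b"]))
```

Keeping one gain per peer means a distant participant's audio stays audible while only the nearby participant's relayed audio is attenuated.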

FIG. 6 illustrates the processing of the howling suppression processing unit 27 described so far. As shown in FIG. 6, the voice of the user of the second PC 20B is input to the second PC 20B through the microphone 32 connected to the second PC 20B and is transmitted to the first PC 20A via the server 10. In addition, when the first PC 20A and the second PC 20B are located close to each other, the voice of the user of the second PC 20B also travels through the air in real space and is input to the first PC 20A through the microphone 32 connected to the first PC 20A. In this case, the howling suppression processing unit 27 lowers the output audio level of the received audio SBa received via the server 10 (or sets the output audio level to "0" (zero)).

As described above, when the web conference system 1 according to the embodiment determines that a user's own PC 20 is picking up distant audio (audio input to the microphone 32 of another PC 20), it lowers the output audio level of the received audio SBa received via the server 10. As a result, the audio is not amplified to the point where howling occurs, so howling can be suppressed.

Although an embodiment of the present invention has been described above, the present invention is not limited to this embodiment and can be implemented with modifications that do not depart from the spirit of the claims.

1 Web conference system
1a Howling suppression system
10 Server
11 Receiving unit
12 Transmitting unit
20, 20A, 20B PC
21 Encoding unit
22 Transmitting unit
23 Receiving unit
24 Decoding unit
25 Output processing unit
26 Delay measurement processing unit
27 Howling suppression processing unit
27a Howling check unit
27b Output audio level control unit
31 Camera
32 Microphone
33 Display unit
34 Speaker

Claims

A processing device used by a user, comprising:
a first processing unit that compares input audio information, based on a signal input from a microphone used by the user, with received audio information received from another processing device, and determines whether there is audio information that satisfies a first predetermined condition;
a second processing unit that, when the first processing unit determines that there is audio information that satisfies the first predetermined condition, compares volume information in the input audio information with volume information in the received audio information, and determines whether the volume information in the input audio information satisfies a second predetermined condition; and
a third processing unit that performs a process of lowering the output of the received audio information when the second processing unit determines that the volume information in the input audio information satisfies the second predetermined condition.

The processing device according to claim 1, wherein,
in comparing the input audio information and the received audio information, the periods of the input audio information and the received audio information to be compared are determined based on a communication delay time between the processing devices.

The processing device according to claim 2, further comprising
a fourth processing unit that transmits and receives delay measurement packets via a relay device that relays the received audio information, and calculates the communication delay time from the difference between the transmission time and the reception time.

The processing device according to claim 3, wherein
the fourth processing unit periodically transmits the delay measurement packets and receives the responses.

The processing device according to claim 1, wherein
the processing device is a client terminal in the SFU (Selective Forwarding Unit) method of WebRTC (Web Real-Time Communication).

A processing method of a processing device used by a user, comprising:
a first processing step of comparing input audio information, based on a signal input from a microphone used by the user, with received audio information received from another processing device, and determining whether there is audio information that satisfies a first predetermined condition;
a second processing step of, when it is determined in the first processing step that there is audio information that satisfies the first predetermined condition, comparing volume information in the input audio information with volume information in the received audio information, and determining whether the volume information in the input audio information satisfies a second predetermined condition; and
a third processing step of performing a process of lowering the output of the received audio information when it is determined in the second processing step that the volume information in the input audio information satisfies the second predetermined condition.

A program for causing a computer used by a user to function as:
a first processing unit that compares input audio information, based on a signal input from a microphone used by the user, with received audio information received from another processing device, and determines whether there is audio information that satisfies a first predetermined condition;
a second processing unit that, when the first processing unit determines that there is audio information that satisfies the first predetermined condition, compares volume information in the input audio information with volume information in the received audio information, and determines whether the volume information in the input audio information satisfies a second predetermined condition; and
a third processing unit that performs a process of lowering the output of the received audio information when the second processing unit determines that the volume information in the input audio information satisfies the second predetermined condition.