JP2023081469A

JP2023081469A - Program, apparatus, method, and system

Info

Publication number: JP2023081469A
Application number: JP2021195194A
Authority: JP
Inventors: Katsuyoshi Yamagami; Shigeki Matsuda; Masaaki Tsuchida
Original assignee: Cotoba Design Inc
Current assignee: Cotoba Design Inc
Priority date: 2021-12-01
Filing date: 2021-12-01
Publication date: 2023-06-13

Abstract

PROBLEM TO BE SOLVED: To make it easy to grasp the state of a call partner's voice. SOLUTION: A program for operating a terminal device 10 that includes a processor 19 and a memory 15 and can transmit and receive voice data to and from another terminal device 10 causes the processor 19 to execute: a step of receiving voice data transmitted from the other terminal device 10; a step of evaluating the state of the received voice data and outputting it as a first evaluation result; a step of returning the first evaluation result to the other terminal device 10; and a step of outputting voice based on the received voice data to a user. SELECTED DRAWING: Figure 1

Description

The present disclosure relates to a program, an apparatus, a method, and a system.

During a call over a web conferencing system or a voice communication application, the speaker's voice may fail to reach the other party, or may arrive in a state that is hard to hear. Possible causes include voice input problems on the speaking side, communication problems on the speaking side, and/or communication problems on the receiving side. When the voice is hard to hear for these reasons, it is difficult for speakers to notice this themselves; typically they notice only when the other party points it out.

Techniques related to this problem are disclosed, for example, in Patent Documents 1 and 2.

Patent Document 1 discloses a system and method for automatically verifying that messages received from users are intelligible. As an example, the system computes an estimate of the intelligibility of the input speech and compares it with an intelligibility threshold; if the computed estimate is determined to be below the threshold, the user is prompted to repeat at least part of the message.

Patent Document 2 discloses a voice control apparatus that indicates to the user when voice input is likely not to be processed by the device because of a poor signal-to-noise ratio.

Patent Document 1: U.S. Pat. No. 7,660,716
Patent Document 2: U.S. Pat. No. 9,558,758

When multiple users are on a voice call, the state of a speaker's voice as it is input at the speaker's own terminal is not necessarily the same as its state when it reaches the other party. Consequently, a user talking with several people cannot easily tell whether a voice problem lies on their own side or on the other party's side.

The present disclosure has been made to solve this problem, and its object is to make it easy to grasp the state of the other party's voice.

A program for operating a terminal device that includes a processor and a memory and can transmit and receive voice data to and from another terminal device. The program causes the processor to execute: a step of receiving voice data transmitted from the other terminal device; a step of evaluating the state of the received voice data and outputting it as a first evaluation result; a step of returning the first evaluation result to the other terminal device; and a step of outputting voice based on the received voice data to the user.

According to the present disclosure, the state of the other party's voice can be grasped easily.

FIG. 1 is a diagram showing the overall configuration of the system of an embodiment.
FIG. 2 is a diagram showing the functional configuration of the terminal device of the embodiment.
FIG. 3 is a block diagram showing an example of the functional configuration of the voice determination unit and the presentation control unit of the embodiment.
FIG. 4 is a diagram showing the functional configuration of the server of the embodiment.
FIG. 5 is a diagram showing the data structure of a database of the embodiment.
FIG. 6 is a diagram showing the data structure of a database of the embodiment.
FIG. 7 is a flowchart showing an example of the processing flow in the system of the embodiment.
FIG. 8 is a flowchart showing another example of the processing flow in the system of the embodiment.
FIG. 9 is a flowchart showing another example of the processing flow in the system of the embodiment.
FIG. 10 is a sequence diagram showing an example of the processing flow in the system of the embodiment.
FIG. 11 is a schematic diagram showing an example of a screen displayed on the terminal device of the embodiment.
FIG. 12 is a schematic diagram showing another example of a screen displayed on the terminal device of the embodiment.
FIG. 13 is a schematic diagram showing an example of a screen displayed on the terminal device of the embodiment.
FIG. 14 is a schematic diagram showing another example of a screen displayed on the terminal device of the embodiment.
FIG. 15 is a schematic diagram showing another example of a screen displayed on the terminal device of the embodiment.

Embodiments of the present disclosure are described below with reference to the drawings. Throughout the drawings, common components are given the same reference numerals, and repeated descriptions are omitted. The following embodiments do not unduly limit the content of the present disclosure as set forth in the claims, and not all components shown in the embodiments are necessarily essential to the present disclosure. Each figure is schematic and is not necessarily drawn precisely.

In the following description, a "processor" means one or more processors. At least one of them is typically a microprocessor such as a CPU (Central Processing Unit), but may be another type of processor such as a GPU (Graphics Processing Unit). At least one processor may be single-core or multi-core.

At least one processor may also be a processor in the broad sense, such as a hardware circuit (for example, an FPGA (Field-Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit)) that performs part or all of the processing.

In the following description, information that yields an output for a given input is sometimes described with an expression such as "xxx table". This information may be data of any structure, or a learned model, such as a neural network, that generates an output for an input. An "xxx table" can therefore also be called "xxx information".

The configuration of each table described below is an example: one table may be divided into two or more tables, and all or part of two or more tables may be combined into one table.

In the following description, processing is sometimes described with a "program" as the subject. Because a program performs predetermined processing by being executed by a processor, using a storage unit and/or an interface unit as appropriate, the subject of the processing may instead be the processor (or a device, such as a controller, that has the processor).

A program may be installed on a device such as a computer, or may reside, for example, on a program distribution server or on a computer-readable (for example, non-transitory) recording medium. In the following description, two or more programs may be implemented as one program, and one program may be implemented as two or more programs.

In the following description, identification numbers are used as identification information for various objects, but other kinds of identification information (for example, identifiers containing letters or symbols) may be used instead.

In the following description, when elements of the same type are described without distinguishing between them, a reference sign (or the common part of the reference signs) is used; when they are distinguished, the element identification numbers (or reference signs) may be used.

In the following description, the control lines and information lines shown are those considered necessary for the explanation; not all control lines and information lines of an actual product are necessarily shown. All components may be interconnected.

<0 System Overview>
In the system of the present disclosure, between terminal devices (for example, smartphones and tablet terminals) that can call each other over a network such as a mobile communication network, a user can easily grasp how the voice they utter is being received at the other party's terminal device, that is, the state of the voice on the other party's side. To this end, a terminal device receives voice data transmitted from another terminal device that is the other party of the call, evaluates the state of the received voice data, outputs it as a first evaluation result, and returns this first evaluation result to the other terminal device.
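As an illustration of the receiving side described above, the following Python sketch evaluates a block of received voice samples and returns a first evaluation result to the sender before playing the audio. It is not the disclosed implementation: the metric (RMS level in dBFS), the -40 dBFS threshold, the `EvaluationResult` type, and the `reply`/`play` callbacks standing in for the network and the speaker are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class EvaluationResult:
    """Hypothetical 'first evaluation result' sent back to the speaking terminal."""
    rms_dbfs: float  # level of the received audio in dB relative to full scale
    label: str       # coarse verdict the speaker will see


def evaluate_received(samples: Sequence[float], quiet_dbfs: float = -40.0) -> EvaluationResult:
    """Rate received audio given as floats in [-1, 1]; the threshold is an assumption."""
    if not samples:
        return EvaluationResult(float("-inf"), "no audio")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    level = 20 * math.log10(rms) if rms > 0 else float("-inf")
    return EvaluationResult(level, "too quiet" if level < quiet_dbfs else "ok")


def on_voice_data(
    samples: Sequence[float],
    reply: Callable[[EvaluationResult], None],
    play: Callable[[Sequence[float]], None],
) -> EvaluationResult:
    """Receiving terminal: evaluate, return the result to the sender, then play the audio."""
    result = evaluate_received(samples)
    reply(result)   # return the first evaluation result to the other terminal device
    play(samples)   # output voice based on the received voice data to the local user
    return result
```

Passing callbacks keeps the evaluation logic independent of the transport; in a real terminal, `reply` would send the result back over the call connection.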

The system of the present disclosure also lets the user of a terminal device easily grasp the state of the voice they utter. To this end, the terminal device evaluates the quality of voice data based on the voice uttered by the user and presents the result of this quality evaluation, a third evaluation result, to the user.
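One way to picture the third evaluation is a signal-to-noise check on the user's own microphone signal. The sketch below assumes float samples in [-1, 1] at 8 kHz, treats the loudest 10% of 20 ms frames as speech and the quietest 10% as noise, and applies a 15 dB pass threshold; the frame length, percentiles, threshold, and function names are illustrative assumptions, not values from the disclosure.

```python
import math
from typing import List, Sequence


def frame_powers(samples: Sequence[float], frame_len: int = 160) -> List[float]:
    """Mean-square power per non-overlapping frame (160 samples = 20 ms at 8 kHz)."""
    return [
        sum(s * s for s in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def estimate_snr_db(samples: Sequence[float], frame_len: int = 160) -> float:
    """Crude SNR: loudest 10% of frames taken as speech, quietest 10% as noise."""
    powers = sorted(frame_powers(samples, frame_len))
    if not powers:
        raise ValueError("need at least one full frame")
    k = max(1, len(powers) // 10)
    noise = sum(powers[:k]) / k
    speech = sum(powers[-k:]) / k
    if noise == 0:
        return float("inf")
    return 10 * math.log10(speech / noise)


def third_evaluation(samples: Sequence[float], min_snr_db: float = 15.0) -> str:
    """Hypothetical verdict presented to the speaker about their own input voice."""
    snr = estimate_snr_db(samples)
    return "good" if snr >= min_snr_db else "noisy input: check microphone or surroundings"
```

A percentile-based noise floor avoids needing a separate voice activity detector, at the cost of assuming the recording contains both pauses and speech.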

Furthermore, the system of the present disclosure evaluates the variation in voice power among users who are talking to each other via terminal devices and presents the result of this evaluation to the users. To this end, the server that mediates calls between the terminal devices detects the voice power of the input voice for each terminal device, calculates the mean and variance of the voice power from the detection results, determines, on the basis of the calculated mean and variance, whether the deviation between each speaker's voice power and the mean is equal to or greater than a predetermined value, and presents a fourth determination result to each speaker whose deviation is determined to be equal to or greater than the predetermined value.
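The server-side check can be sketched as follows. The dictionary of per-speaker average voice power, the speaker IDs, and the choice to express the "predetermined value" as k population standard deviations (k = 1 by default) are assumptions made for illustration; the disclosure states only that the mean and variance are used.

```python
import statistics
from typing import Dict, List


def flag_power_outliers(power_db: Dict[str, float], k: float = 1.0) -> List[str]:
    """Return speakers whose voice power deviates from the mean by >= k standard deviations.

    power_db maps a hypothetical speaker/terminal ID to that speaker's average voice
    power in dB. Modelling the 'predetermined value' as k * std is an assumption.
    """
    if len(power_db) < 2:
        return []
    values = list(power_db.values())
    mean = statistics.fmean(values)
    std = statistics.pstdev(values, mu=mean)  # square root of the population variance
    if std == 0:
        return []
    return [who for who, p in power_db.items() if abs(p - mean) >= k * std]
```

Scaling the threshold by the standard deviation makes it adapt to how spread out the participants already are; a fixed dB margin around the mean would be an equally plausible reading of "predetermined value".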

<1 Overall System Configuration>
FIG. 1 is a diagram showing an example of the overall configuration of a system 1 according to an embodiment. As shown in FIG. 1, the system 1 includes terminal devices 10 and a server 20. The terminal devices 10 and the server 20 are communicably connected to each other via a network 80 using a wired or wireless communication standard (including mobile communication standards). In the illustrated example, the system 1 includes a plurality of terminal devices 10.

The network 80 is composed of the Internet, a LAN, various mobile communication systems built with wireless base stations, and the like. For example, the network includes 3G, 4G, and 5G mobile communication systems, LTE (Long Term Evolution), and wireless networks (for example, Wi-Fi (registered trademark)) that can connect to the Internet through predetermined access points. For wireless connections, communication protocols include, for example, Z-Wave (registered trademark), ZigBee (registered trademark), and Bluetooth (registered trademark). For wired connections, the network also includes direct connections via a USB (Universal Serial Bus) cable or the like.

Although FIG. 1 shows the server 20 as a single computer, the server 20 may be implemented by a combination of multiple computers. Likewise, although FIG. 1 shows three terminal devices 10, the number of terminal devices 10 in the system 1 is not limited; it may be two, or four or more.

The terminal devices 10 are terminals configured to communicate with each other via the server 20. Each terminal device 10 can also transmit and receive voice data to and from other terminal devices 10; in other words, the users of the terminal devices 10 can talk to each other by voice. For example, the terminal device 10 is an information processing device, such as a smartphone or a tablet terminal, that supports calls over a network such as a mobile communication network. Alternatively, the terminal device 10 may be an information processing device such as a desktop PC (Personal Computer), a laptop PC, or a head-mounted display on which a predetermined voice conversation application is installed.

As shown in FIG. 1, the terminal device 10 includes a communication IF (Interface) 12, an input device 13, an output device 14, a memory 15, a storage 16, and a processor 19, which are communicably connected to each other via, for example, a bus.

The communication IF 12 is an interface for transmitting and receiving signals, including voice data, so that the terminal device 10 can communicate with external devices, including by voice. The input device 13 accepts input operations from the user and includes, for example, a touch panel, a touch pad, a pointing device such as a mouse, and a keyboard. The output device 14 presents information to the user and includes, for example, a display and speakers.

The memory 15 temporarily stores programs and the data they process, and is implemented by volatile memory such as DRAM (Dynamic Random Access Memory). The storage 16 is a storage device for saving data and is implemented by non-volatile memory such as flash memory or an HDD (Hard Disk Drive). The processor 19 is hardware for executing the instruction set described in a program and consists of arithmetic units, registers, peripheral circuits, and the like.

The server 20 is an information processing device that manages information about services and provides the services by referring to the managed information. Furthermore, the server 20 provides a service that lets a plurality of terminal devices 10 exchange voice data so that their users can talk to each other; this works not only between two terminal devices 10 but also, as shown in FIG. 1, among three. The server 20 is, for example, a computer connected to the network 80.

In addition to the voice call service, the server 20 may provide a service for mutual data communication (including text-based messages). Examples include a service that allows the terminal device 10 to access sites on the Internet and a service that conveys messages in text form, that is, a chat service.

As shown in FIG. 1, the server 20 includes a communication IF 22, an input/output IF 23, a memory 25, a storage 26, and a processor 29, which are communicably connected to each other via, for example, a bus.

The communication IF 22 is an interface for transmitting and receiving signals so that the server 20 can communicate with external devices. The input/output IF 23 serves as an interface to an input device that accepts input operations from the user and to an output device that presents information to the user. The memory 25 temporarily stores programs and the data they process, and is implemented by volatile memory such as DRAM.

The storage 26 is a storage device for saving data and is implemented by non-volatile memory such as flash memory or an HDD. The storage 26 need not be a single circuit; it may be implemented by, for example, multiple storage circuits. The processor 29 is hardware for executing the instruction set described in a program and consists of arithmetic units, registers, peripheral circuits, and the like.

<1.1 Configuration of the Terminal Device>
FIG. 2 is a block diagram showing the configuration of the terminal device 10 included in the system 1 of this embodiment. As shown in FIG. 2, the terminal device 10 includes a communication unit 121, an input device 13, an output device 14, a camera 160, a storage unit 170, and a control unit 180.

The communication unit 121 performs processing that allows the terminal device 10 to communicate with other devices. It applies transmission processing to signals generated by the control unit 180 and sends them to the outside (for example, the server 20), and applies reception processing to signals received from the outside and outputs them to the control unit 180.

In addition, the communication unit 121 performs processing for transmitting and receiving voice data to and from other terminal devices 10, and therefore communicates in compliance with mobile communication network standards. Primarily, the communication unit 121 communicates with the base station assigned to the area in which the terminal device 10 is located, and the mobile communication system, including the base station and the server 20, carries voice data between the terminal devices 10. The applicable standard is whichever standard the services of the carrier operating the mobile communication system comply with at the time the system 1 of this embodiment is in operation. Examples include IMT-Advanced, the standard for fourth-generation mobile communication systems, and IMT-2020, the standard for fifth-generation mobile communication systems. In areas where third-generation mobile communication systems are still in operation, IMT-2000, the third-generation standard, may also be used. Because these standards cover both voice and data communication, the communication unit 121 can perform both voice calls and data communication.

The input device 13 is a device through which the user who owns the terminal device 10 inputs instructions. It is implemented by, for example, a mouse 131, a keyboard 132, and a touch-sensitive device 133 that accepts instructions when its operation surface is touched, as well as a microphone 134. The input device 13 converts instructions or voice input by the user into electrical signals and outputs them to the control unit 180. The input device 13 is not limited to physical operation devices such as the mouse 131 and the keyboard 132; it may include, for example, a reception port that accepts electrical signals from an external input device. Depending on the specifications of the terminal device 10, the mouse 131 and the keyboard 132 may be omitted, in which case text and other input are entered via the touch-sensitive device 133.

The output device 14 is a device for presenting information to the user who owns the terminal device 10 and is implemented by, for example, a display 141 and a speaker 142. The display 141 shows data such as images, videos, and text under the control of the control unit 180, and is implemented by, for example, an LCD (Liquid Crystal Display) or an organic EL (Electro-Luminescence) display. The speaker 142 outputs sound under the control of the control unit 180.

In FIG. 2, the microphone 134 and/or the speaker 142 of the terminal device 10 may be ones that are used apart from the other components of the terminal device 10 and connected by wireless communication such as Bluetooth (registered trademark). For example, the microphone 134 and the speaker 142 may be configured as a separate unit, a so-called headset.

The camera 160 is a device that receives light with a light-receiving element and outputs it as an image signal. For example, the lens of the camera 160 is placed next to the display 141, so that the camera 160 can capture the user looking at the display 141.

The storage unit 170 is implemented by, for example, the memory 15 and the storage 16, and stores data and programs used by the terminal device 10. Specifically, the storage unit 170 stores, for example, an application 171. When executed by the control unit 180, the application 171 implements functional units of the control unit 180, such as the operation reception unit 181 described later. The storage unit 170 also stores detection result data 172, which holds the evaluation and determination results of the voice determination unit 185, and an evaluation table 173, which serves as the criterion for the determinations made by the voice determination unit 185.
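The evaluation table 173 is described only as the criterion for the voice determination unit's judgments. A minimal table-driven sketch, assuming the table holds ascending SNR boundaries in dB with a label for each band (the boundaries, labels, and the `rate` helper are hypothetical), could look like this:

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical evaluation table: ascending upper bounds (dB) and the label below each bound.
SNR_TABLE: List[Tuple[float, str]] = [
    (10.0, "poor"),  # SNR < 10 dB
    (20.0, "fair"),  # 10 dB <= SNR < 20 dB
]
TOP_LABEL = "good"   # SNR >= 20 dB


def rate(value: float, table: List[Tuple[float, str]] = SNR_TABLE, top: str = TOP_LABEL) -> str:
    """Look up the band containing `value`; a value equal to a bound falls in the higher band."""
    bounds = [bound for bound, _ in table]
    i = bisect_right(bounds, value)
    return table[i][1] if i < len(table) else top
```

Keeping the thresholds in data rather than in code means the criteria can be tuned or replaced without changing the determination logic.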

The control unit 180 is implemented by the processor 19 reading a program (the application 171) stored in the storage unit 170 and executing the instructions contained in it. The control unit 180 controls the operation of the terminal device 10. Specifically, for example, the control unit 180 functions as an operation reception unit 181, a communication control unit 182, a voice input unit 183, a voice output unit 184, a voice determination unit 185, and a presentation control unit 186.

The operation reception unit 181 performs processing for accepting user operations input from the input device 13.

The communication control unit 182 performs processing that allows the terminal device 10 to communicate, according to a communication protocol, with other terminal devices 10 with which it is in mutual communication (including voice calls). For example, the communication control unit 182 transmits voice data, obtained when the voice input unit 183 converts voice input from the microphone 134, to the other terminal devices 10 on the call. The communication control unit 182 also receives voice data transmitted from other terminal devices 10 and passes it to the voice output unit 184, which converts it into sound and outputs it from the speaker 142.

The communication control unit 182 also transmits text data entered via, for example, the keyboard 132 or the touch-sensitive device 133 to the other terminal devices 10 in communication, and receives data transmitted from other terminal devices 10, converts it into text data, and displays it on the display 141.

The voice input unit 183 converts the voice uttered by the user of the terminal device 10 and input from the microphone 134 into voice data and sends it to the communication control unit 182. The voice output unit 184 converts voice data output by the communication control unit 182 into sound and outputs it from the speaker 142.

The voice determination unit 185 performs various kinds of signal processing on the voice data output by the voice input unit 183 and the voice output unit 184, evaluates and judges the quality of the voice data, and stores the evaluation and determination results in the detection result data 172. Details are described later.

The presentation control unit 186 controls the output device 14 to present various kinds of information to the user. Specifically, for example, the presentation control unit 186 causes the display 141 to show the evaluation results of the voice determination unit 185. The presentation control unit 186 may also, for example, convert an evaluation result into speech, send it to the voice output unit 184, and have it output through the voice output unit 184 and the speaker 142.

FIG. 3 is a block diagram showing an example of the functional configuration of the terminal device 10, including the communication control unit 182, the voice determination unit 185, and the presentation control unit 186.

The communication control unit 182 includes an encoding processing unit 1821, a transmission unit 1822, a reception unit 1823, and a decoding processing unit 1824. The voice determination unit 185 includes a voice power detection unit 1851, an SN (signal-to-noise) ratio detection unit 1852, a microphone characteristic detection unit 1853, an input voice evaluation unit 1854, and a received voice evaluation unit 1855. The presentation control unit 186 includes an input voice state presentation unit 1861, a call partner received voice state presentation unit 1862, and a voice power state presentation unit 1863.

The encoding processing unit 1821 encodes the voice data output by the voice input unit 183 to compress it, and sends the compressed data to the transmission unit 1822. The encoding conforms to mobile communication standards. In doing so, the encoding processing unit 1821 converts the encoded voice data into packets and assigns a number to each packet. If the mobile communication of the terminal device 10 is based on the LTE standard, this series of processing is known as VoLTE (Voice over Long Term Evolution). More generally, it is known as VoIP (Voice over Internet Protocol).
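
The packet numbering step can be pictured with the following minimal Python sketch. It assumes fixed-size payloads and a plain integer counter. Actual VoLTE/VoIP stacks packetize codec frames inside RTP, whose 16-bit sequence numbers wrap around, and this sketch does not model that. `VoicePacket` and `packetize` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoicePacket:
    seq: int        # sequence number assigned on the sending side
    payload: bytes  # one slice of the encoded (compressed) voice data


def packetize(encoded: bytes, payload_size: int, start_seq: int = 0) -> list[VoicePacket]:
    """Split encoded voice data into fixed-size packets numbered consecutively."""
    if payload_size <= 0:
        raise ValueError("payload_size must be positive")
    return [
        VoicePacket(seq=start_seq + i, payload=encoded[offset:offset + payload_size])
        for i, offset in enumerate(range(0, len(encoded), payload_size))
    ]
```

For example, `packetize(b"abcdefg", 3)` yields packets numbered 0, 1 and 2, and the last packet carries only `b"g"`.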

The transmission unit 1822 wirelessly transmits the voice data encoded by the encoding processing unit 1821 through an antenna (not shown) included in the transmission unit 1822, sending it to the transmission/reception unit 2031 of the server 20. This wireless communication also conforms to the mobile communication network standards already described.

When wirelessly transmitting voice data, the transmission unit 1822 attaches an identifier unique to its own terminal device 10. The transmission unit 1822 also superimposes two results on the voice data before sending it to the call partner's terminal device 10:

- the evaluation score of the received voice output by the received voice evaluation unit 1855 (the first evaluation result);
- the power detection result of the input voice detected by the voice power detection unit 1851 (the third evaluation result).
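
The source does not specify how the identifier and scores are carried alongside the voice data. The following sketch therefore assumes a made-up fixed header: a 16-byte null-padded terminal identifier followed by two signed 8-bit scores. It shows only one way the sending side could superimpose the results and the receiving side could recover them. `HEADER`, `attach_scores` and `detach_scores` are hypothetical.

```python
import struct

# Hypothetical frame header: a 16-byte terminal identifier (null-padded),
# then two signed 8-bit scores, in network byte order.
HEADER = struct.Struct("!16sbb")


def attach_scores(terminal_id: bytes, received_score: int, power_score: int,
                  voice_payload: bytes) -> bytes:
    """Prefix voice data with the sender's identifier and its two scores."""
    if len(terminal_id) > 16:
        raise ValueError("terminal_id must fit in 16 bytes")
    # struct pads the 16s field with null bytes when terminal_id is shorter.
    return HEADER.pack(terminal_id, received_score, power_score) + voice_payload


def detach_scores(frame: bytes) -> tuple[bytes, int, int, bytes]:
    """Inverse of attach_scores: split a frame back into its fields."""
    raw_id, received_score, power_score = HEADER.unpack_from(frame)
    return raw_id.rstrip(b"\x00"), received_score, power_score, frame[HEADER.size:]
```

Round trip: `detach_scores(attach_scores(b"term-A", -1, 0, b"voice"))` gives back `(b"term-A", -1, 0, b"voice")`.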

The reception unit 1823 receives, through an antenna (not shown) included in it, the radio signal sent from the transmission/reception unit 2031 of the server 20, and passes it to the decoding processing unit 1824. This radio signal carries the identifier unique to the call partner's terminal device 10, attached by that device's transmission unit 1822.

The reception unit 1823 also receives the second evaluation result: the evaluation score of the received voice, superimposed on the radio signal from the call partner's terminal device 10. It sends this second evaluation result, together with the call partner's identifier, to the call partner received voice state presentation unit 1862.

Furthermore, the reception unit 1823 receives from the server 20 the fourth determination result and sends it to the voice power state presentation unit 1863. This is the result of determining whether the voice power of the voice data uttered by the user of the terminal device 10 is equal to or greater than a predetermined value relative to the average voice power. The average is calculated from the voice powers of the plurality of terminal devices 10 currently in the call.

The decoding processing unit 1824 decodes the data received by the reception unit 1823 into voice data and outputs the voice data to the voice output unit 184. It also calculates the packet loss rate observed while decoding the signal received from the terminal device 10 currently in the call. It sends this rate, tagged with the identifier of the call partner's terminal device 10, to the received voice evaluation unit 1855. Methods of calculating the packet loss rate are well known, and some are specified in communication standards, so they are not described in detail here.
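
Because the source leaves the calculation to known methods, the sketch below is only one simple possibility. It assumes each packet carries the consecutive number assigned by the encoding processing unit 1821 and counts gaps between the lowest and highest number seen. It does not reproduce any standard's definition; RTCP receiver reports, for instance, keep their own fraction-lost and cumulative-loss fields. `packet_loss_rate` is a hypothetical name.

```python
def packet_loss_rate(received_seqs: list[int]) -> float:
    """Share of packets missing between the lowest and highest sequence number seen.

    Assumes sequence numbers do not wrap around, and ignores losses before the
    first or after the last received packet.
    """
    if not received_seqs:
        return 0.0  # nothing received yet: no basis for a loss estimate
    unique = set(received_seqs)  # a duplicated packet must not hide a lost one
    expected = max(unique) - min(unique) + 1
    return (expected - len(unique)) / expected
```

With packets 0, 1, 2, 4 and 5 received, one of six expected packets is missing, so the rate is 1/6.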

The voice power detection unit 1851 detects the voice power of the voice data that the voice input unit 183 produced from the speech of the user of the terminal device 10 (the speaker). It outputs the detection result to the input voice evaluation unit 1854 and the transmission unit 1822. The voice power detection unit 1851 quantizes the detection result and outputs it as a score. The quantization method is not particularly limited; in this embodiment, the score is 0 if the voice power is good, -1 if it is slightly bad, and -2 if it is bad.
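
As an illustration of the three-level quantization, the sketch below measures the RMS level of a block of samples in dBFS and maps it to 0, -1 or -2. The thresholds (-30 dBFS and -45 dBFS) and the choice to penalize only a level that is too low are assumptions made for this example; the source does not give any threshold. `rms_dbfs` and `quantize_power` are hypothetical.

```python
import math


def rms_dbfs(samples: list[float]) -> float:
    """RMS level in dBFS for samples normalised to the range [-1.0, 1.0]."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def quantize_power(level_db: float, good_min: float = -30.0,
                   fair_min: float = -45.0) -> int:
    """Map a level to the score: 0 good, -1 slightly bad, -2 bad (hypothetical thresholds)."""
    if level_db >= good_min:
        return 0
    if level_db >= fair_min:
        return -1
    return -2
```

A square wave of amplitude 0.5 measures about -6 dBFS and scores 0. Silence measures negative infinity and scores -2.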

The SN ratio detection unit 1852 detects the SN ratio of the voice data that the voice input unit 183 produced from the speaker's voice, and outputs the detection result to the input voice evaluation unit 1854. It quantizes the detection result and outputs it as a score. The quantization method is not particularly limited; in this embodiment, the score is 0 if the SN ratio is good, -1 if it is slightly bad, and -2 if it is bad. The SN ratio detection unit 1852 may instead be implemented with a trained model. Such a model is trained with SN ratios as input data and the evaluation of each SN ratio as the correct output data.
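
The source does not say how the SN ratio is measured. A crude estimate treats the quietest frames as background noise and the loudest frames as speech, and the sketch below takes that approach. It then quantizes the result with hypothetical thresholds of 20 dB and 10 dB. This is a swapped-in heuristic for illustration only. `estimate_snr_db` and `quantize_snr` are hypothetical names, and `fraction` controls how many frames count as noise or speech.

```python
import math


def estimate_snr_db(frame_energies: list[float], fraction: float = 0.1) -> float:
    """Crude SNR estimate: quietest frames ~ noise floor, loudest frames ~ speech."""
    if not frame_energies:
        raise ValueError("no frames")
    ordered = sorted(frame_energies)
    k = max(1, int(len(ordered) * fraction))
    noise = sum(ordered[:k]) / k
    speech = sum(ordered[-k:]) / k
    if noise <= 0:
        return float("inf")  # no measurable noise floor
    return 10 * math.log10(speech / noise)


def quantize_snr(snr_db: float, good_min: float = 20.0, fair_min: float = 10.0) -> int:
    """0 good, -1 slightly bad, -2 bad (hypothetical thresholds)."""
    if snr_db >= good_min:
        return 0
    if snr_db >= fair_min:
        return -1
    return -2
```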

The microphone characteristic detection unit 1853 detects the characteristics of the microphone of the voice input unit 183 (microphone 134) from the voice data that the voice input unit 183 produced from the speaker's voice. It outputs the detection result to the input voice evaluation unit 1854, after quantizing it into a score. The quantization method is not particularly limited; in this embodiment, the score is 0 if the microphone characteristics are good, -1 if they are slightly bad, and -2 if they are bad. The microphone characteristic detection unit 1853 may instead be implemented with a trained model. Such a model is trained with microphone characteristics as input data and the evaluation of those characteristics as the correct output data.

Preferably, the voice power detection unit 1851, the SN ratio detection unit 1852, and the microphone characteristic detection unit 1853 perform detection each time the user of the terminal device 10 speaks, that is, each time voice is input to the voice input unit 183. Alternatively, they may perform detection in real time (that is, continuously) or at predetermined intervals (for example, every second).

The voice power detection unit 1851, the SN ratio detection unit 1852, and the microphone characteristic detection unit 1853 then store their detection results in the detection result data 172. The stored results are updated each time these units perform detection, and therefore each time voice is input to the voice input unit 183. When voice input stops, the detection result data 172 is no longer updated (overwritten), so it retains the last detected values.

In this embodiment, the voice power detection unit 1851, the SN ratio detection unit 1852, and the microphone characteristic detection unit 1853 quantize their detection results into three scores: 0, -1, and -2. The number of levels is not limited to three, however; two levels, or four or more levels, may be used.

The input voice evaluation unit 1854 treats the detection results from the voice power detection unit 1851, the SN ratio detection unit 1852, and the microphone characteristic detection unit 1853 as quantized evaluation values (scores). From these it determines an integrated score, which it outputs to the input voice state presentation unit 1861 and the call partner received voice state presentation unit 1862. The integrated score passed to the input voice state presentation unit 1861 is the third evaluation result: the evaluation of the quality of the voice data based on the voice uttered by the user of the terminal device 10.

The method by which the input voice evaluation unit 1854 calculates the integrated score is not particularly limited. In this embodiment, the unit adds the three scores (the quantized detection results) from the voice power detection unit 1851, the SN ratio detection unit 1852, and the microphone characteristic detection unit 1853, and determines the integrated score from the sum. A sum of 0 is rated good, -1 slightly bad, and -2 bad.

For example, if all three scores are good (0), their sum is 0, so the integrated score is good (0). If the voice power score is good (0) and the SN ratio and microphone characteristic scores are both slightly bad (-1), the sum is -2, so the integrated score is bad (-2). When the sum is -2 or less, the input voice evaluation unit 1854 may treat the integrated score as bad (-2). For instance, if all three scores are slightly bad (-1), the sum is -3, but the integrated score may still be rated bad (-2).

The integrated score is also not limited to the three levels 0, -1, and -2; it may be quantized into two levels, or four or more levels. Methods other than a simple sum of the three scores are also possible. For example, the scores may be added with predetermined weights, or combined by arithmetic operations and functions based on a predetermined relational expression.
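
The summing rule above, including the floor at -2 and the optional weighting, can be written compactly as in the following sketch. The function name `integrated_score` is hypothetical. Rounding a weighted total to the nearest integer level is an assumption made here; with the default weights of 1 the total is already an integer and the result is the plain sum described in the text.

```python
def integrated_score(power: int, snr: int, mic: int,
                     weights: tuple[float, float, float] = (1.0, 1.0, 1.0),
                     floor: int = -2) -> int:
    """Combine the three detector scores (each 0, -1 or -2) into one score.

    With the default weights this is the plain sum; any total below `floor`
    is treated as `floor` (bad).
    """
    total = weights[0] * power + weights[1] * snr + weights[2] * mic
    return max(floor, round(total))  # round() maps a weighted total to an integer level
```

This reproduces the worked examples: all good gives 0, the combination (0, -1, -1) gives -2, and (-1, -1, -1) sums to -3 but is floored to -2.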

The received voice evaluation unit 1855 calculates an evaluation score for the received voice. The score is based on the packet loss rate during decoding, which the decoding processing unit 1824 outputs. The unit quantizes the result and sends it to the transmission unit 1822 as a score. The quantization method is not particularly limited; in this embodiment, the score is 0 if the received voice is good, -1 if it is slightly bad, and -2 if it is bad. Before sending the score to the transmission unit 1822, the received voice evaluation unit 1855 attaches the identifier unique to the terminal device 10 that output the evaluated voice data.
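
A minimal sketch of this step follows. It assumes loss-rate thresholds of 1% and 5%, which are illustrative values not given in the source, and represents the tagged score as a plain dictionary. `received_voice_score` and `tag_score` are hypothetical names.

```python
def received_voice_score(loss_rate: float, good_max: float = 0.01,
                         fair_max: float = 0.05) -> int:
    """0 good, -1 slightly bad, -2 bad, from a packet loss rate in [0, 1]."""
    if loss_rate <= good_max:
        return 0
    if loss_rate <= fair_max:
        return -1
    return -2


def tag_score(terminal_id: str, loss_rate: float) -> dict:
    """Attach the identifier of the terminal whose voice data was evaluated."""
    return {"terminal_id": terminal_id, "score": received_voice_score(loss_rate)}
```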

The voice determination unit 185 may use a speech segment (voice activity) detection technique to detect and evaluate only the speech segments. In this case, it can output an evaluation result based on how much the received voices overlap. For example, suppose three or more people are in a voice call at the same time and one user speaks over another user's utterance. The unit may then tell that user that what they said may have been difficult for the other users to hear.
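
Given speech segments already produced by some voice activity detector, the overlap check described above could look like the following sketch. The rule that a user is flagged only when they start speaking while someone else is already speaking is an interpretation of "spoke over another user's utterance". The 0.5-second minimum overlap is a hypothetical parameter. `overlap_seconds` and `users_who_talked_over` are hypothetical names.

```python
Segment = tuple[float, float]  # (start, end) of one speech segment, in seconds


def overlap_seconds(a: Segment, b: Segment) -> float:
    """Length of the time interval shared by two segments (0 if disjoint)."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))


def users_who_talked_over(segments: dict[str, list[Segment]],
                          min_overlap: float = 0.5) -> set[str]:
    """Users who started speaking while another user was already speaking."""
    flagged: set[str] = set()
    for user, own in segments.items():
        for other, theirs in segments.items():
            if other == user:
                continue
            for s in own:
                for o in theirs:
                    # The other user's segment began first and overlaps long enough.
                    if o[0] < s[0] and overlap_seconds(s, o) >= min_overlap:
                        flagged.add(user)
    return flagged
```

For example, if A speaks from 0 s to 5 s and B from 3 s to 6 s, only B is flagged, because B started while A was still speaking.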

In addition, the microphone 134 of the call partner's terminal device 10 may be picking up a large amount of ambient noise. In that case, the voice determination unit 185 may detect this and notify the user of the terminal device 10 of the detection result. For example, it may output a result indicating that the call partner may have been hard to hear because their surroundings are noisy.

Furthermore, the terminal devices 10 in a call may exchange the speech segment detection times of the voice from each terminal device 10, and the voice determination unit 185 may output an evaluation result based on this information. For example, the sending side may detect a speech segment time for the transmitted voice that differs significantly from the time the receiving side detects for the received voice. The unit may then output an evaluation result indicating that the voice may have been hard to hear on the receiving side.

More specifically, suppose the speech segment detection time of the transmitted voice was 5 seconds on the sending side, but that of the received voice was 3 seconds on the receiving side. It can then be inferred that some problem occurred in the processing by the reception unit 1823 and/or the decoding processing unit 1824. Such problems can occur independently of packet loss, so detecting a mismatch between the speech segment detection times on the sending and receiving sides is valuable.
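
The mismatch test can be expressed as a relative-difference check, as in this sketch. The 20% tolerance is a hypothetical value, since the source only speaks of a "significant difference". `duration_mismatch` is a hypothetical name.

```python
def duration_mismatch(sent_seconds: float, received_seconds: float,
                      tolerance: float = 0.2) -> bool:
    """True if the receiver's detected speech time differs from the sender's
    by more than `tolerance`, expressed as a fraction of the sender's value."""
    if sent_seconds <= 0:
        return False  # no speech detected on the sending side: nothing to compare
    return abs(sent_seconds - received_seconds) / sent_seconds > tolerance
```

The example in the text (5 seconds sent, 3 seconds received) is a 40% shortfall and is reported as a mismatch. A 4.9-second reception is not.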

The input voice state presentation unit 1861 displays the integrated score output by the input voice evaluation unit 1854 on the display 141 of the terminal device 10.

The call partner received voice state presentation unit 1862 calculates a score for the call partner's received voice state and displays it on the display 141 of the terminal device 10. The score is based on the integrated score output by the input voice evaluation unit 1854 and the received voice evaluation score output by the reception unit 1823. The unit quantizes the call partner's received voice state and outputs it as a score. The quantization method is not particularly limited. In this embodiment, both input scores are quantized as 0 if good, -1 if slightly bad, and -2 if bad. A sum of the two of 0 is therefore rated good, -1 slightly bad, and -2 bad. When the sum is -2 or less, the call partner received voice state presentation unit 1862 may treat the score as bad (-2).
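
The combination rule above is a floored sum of two scores. In the sketch below, `integrated` is this terminal's own integrated score and `received` is the score reported back by the call partner. The function name `partner_received_state` and the `LABELS` mapping are hypothetical.

```python
LABELS = {0: "good", -1: "slightly bad", -2: "bad"}


def partner_received_state(integrated: int, received: int) -> int:
    """Score for how the call partner receives this user's voice, floored at -2."""
    return max(-2, integrated + received)
```

For instance, a good input voice (0) with a slightly bad reception report (-1) gives `LABELS[partner_received_state(0, -1)] == "slightly bad"`.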

Preferably, the call partner received voice state presentation unit 1862 also displays the received voice evaluation score output by the reception unit 1823 on the display 141 of the terminal device 10.

The voice power state presentation unit 1863 receives a determination result from the server 20. The result states whether the voice power of the voice data uttered by the user of the terminal device 10 is equal to or greater than a predetermined value relative to the average voice power, calculated from the voice powers of the plurality of terminal devices 10 currently in the call. Based on this result, the unit displays the power state of the user's own voice on the display 141 of the terminal device 10. While any user in the call has not yet spoken, the voice power state presentation unit 1863 does not display the voice power state on the display 141.
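
The display gating in the last sentence can be sketched as follows. The boolean `determination` stands for the server's result. The two indicator strings are hypothetical, since the source leaves the display form open, and so is the function name `power_state_to_show`.

```python
from typing import Optional


def power_state_to_show(participants: set[str], have_spoken: set[str],
                        determination: bool) -> Optional[str]:
    """Text for the voice power indicator, or None while it must stay hidden."""
    if not participants <= have_spoken:
        return None  # at least one participant has not spoken yet
    # Hypothetical wording: the source does not fix the display form.
    return "voice power: check level" if determination else "voice power: OK"
```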

The display format is not particularly limited for any of the following: the integrated score from the input voice state presentation unit 1861, the received voice evaluation score and call partner received voice state score from the call partner received voice state presentation unit 1862, and the determination result from the voice power state presentation unit 1863. Examples include displaying the score itself, displaying it on a scale, and changing the display color according to the score. Alternatively, nothing may be displayed on the display 141 while the score is good (0), that is, while the user's own voice or the voice received at the call partner's terminal device 10 is in good condition. A display corresponding to the score is then shown only when the score is slightly bad (-1) or bad (-2).

The received voice evaluation score carries the identifier unique to the terminal device 10 at which the score was generated (evaluated). The input voice state presentation unit 1861 and the call partner received voice state presentation unit 1862 may therefore link this identifier in advance with the phone book of call partners stored in the storage unit 170 of the terminal device 10. The phone book contains phone numbers and names, such as user names registered by the user, that identify each call partner. Each score can then be displayed with an icon showing the user name. The integrated score output by the input voice evaluation unit 1854 obviously concerns the user who owns the terminal device 10. It may therefore be displayed with an icon showing the user name that the user registered, for example, during initial registration of the terminal device 10.
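
The name lookup could be as simple as the following sketch. It assumes the identifier-to-phone-book link has already been established, so that `phone_book` maps terminal identifiers directly to registered names. Falling back to the raw identifier for unknown terminals is an added assumption. `score_label` is a hypothetical name.

```python
def score_label(terminal_id: str, own_id: str, own_name: str,
                phone_book: dict[str, str]) -> str:
    """Name shown next to a score: the user's own registered name, the linked
    phone-book entry, or the raw identifier when no entry is linked."""
    if terminal_id == own_id:
        return own_name
    return phone_book.get(terminal_id, terminal_id)
```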

When icons showing the user name associated with each score are used in this way, no icon need be displayed on the display 141 while the score is good (0). That is the case when the user's own voice or the voice received at the call partner's terminal device 10 is in good condition. When the score is slightly bad (-1) or bad (-2), the icon may be displayed on the display 141 together with a display corresponding to the score. In addition, for those scores the icon may be made more transparent so that it is harder to see, or the whole icon may be gradually tinted with a specific color (for example, red).
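
The icon behavior just described maps each score to either no icon or a faded, tinted icon. The sketch below encodes that mapping. The opacity values and hex colors are hypothetical choices, and so are `IconStyle` and `icon_style`.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class IconStyle:
    label: str      # user name shown on the icon
    opacity: float  # 1.0 = fully opaque; lower values are more transparent
    tint: str       # overlay colour as an RGB hex string


def icon_style(user_name: str, score: int) -> Optional[IconStyle]:
    """Hide the icon for a good score; fade and redden it as the score drops."""
    if score >= 0:
        return None
    if score == -1:
        return IconStyle(user_name, opacity=0.6, tint="#ff8080")
    return IconStyle(user_name, opacity=0.3, tint="#ff0000")
```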

Furthermore, when the user taps (touches) an icon or a score scale, the information on which the score is based may be displayed on the display 141 of the terminal device 10. Evaluation results and the like may likewise be displayed as text. Such text is preferably shown on the display 141 for a fixed period (for example, a few seconds), and cleared when the user performs an input operation such as tapping the display 141. Specific examples of the display are described later.

In addition to or instead of the display on the display 141 described above, the presentation control unit 186 may present information by sound through the speaker 142 of the terminal device 10. For example, the presentation control unit 186 may use speech synthesis to turn an evaluation result into voice data and present (notify) it to the user. The synthesized content may be the evaluation result itself, as input from the voice determination unit 185 or the reception unit 1823, or a message prepared in advance for that result. Alternatively, the presentation control unit 186 may notify the user with simple sound data such as an alarm, buzzer, or chime. The sound (voice) data generated by the presentation control unit 186 is input to the voice output unit 184 and presented to the user through the voice output unit 184 and the speaker 142.

The form of voice notification by the presentation control unit 186 is not particularly limited either. The determination result itself and/or the information on which the determination result (score) is based may be announced. As described above, the users currently in the voice call are known. The announcement may therefore also say which user the received voice state information concerns, for example by the name registered for that user in the phone book. Information about the user's own input voice may be announced together with the user's own user name, or the name may be omitted. The voice notification method may be chosen as appropriate from known means, such as a speech synthesis module, and is not particularly limited.

<1.2 Server configuration>

FIG. 4 is a block diagram showing the functional configuration of the server 20 included in the system 1 of this embodiment. As shown in FIG. 4, the server 20 functions as a communication unit 201, a storage unit 202, and a control unit 203.

The communication unit 201 performs the processing that allows the server 20 to communicate with external devices.

The storage unit 202 is implemented by, for example, the memory 25 and the storage 26, and stores the data and programs used by the server 20. Specifically, the storage unit 202 stores, for example, an application 2021. When executed by the control unit 203, the application 2021 implements the functional units of the control unit 203 described below, such as the transmission/reception unit 2031. The storage unit 202 also stores calculation result data 2022, which holds the results of the voice power distribution calculation unit 2034.

制御部２０３は、プロセッサ２９が記憶部２０２に記憶されるプログラム（アプリケーション２０２１）を読み込み、プログラムに含まれる命令を実行することにより実現される。制御部２０３は、サーバ２０の動作を制御する。具体的には、例えば、制御部２０３は、送受信部２０３１、記憶制御部２０３２、通信制御部２０３３、音声パワー分布計算部２０３４、及び音声パワー判定部２０３５としての機能を発揮する。 The control unit 203 is implemented by the processor 29 reading a program (application 2021) stored in the storage unit 202 and executing the instructions included in the program. The control unit 203 controls the operation of the server 20. Specifically, for example, the control unit 203 functions as a transmission/reception unit 2031, a storage control unit 2032, a communication control unit 2033, a voice power distribution calculation unit 2034, and a voice power determination unit 2035.

送受信部２０３１は、サーバ２０が、端末装置１０等の外部の装置と、通信プロトコルに従ってデータを送受信する処理を制御する。 The transmitting/receiving unit 2031 controls processing for transmitting/receiving data between the server 20 and an external device such as the terminal device 10 according to a communication protocol.

記憶制御部２０３２は、記憶部２０２への情報の記憶を制御する。 The storage control unit 2032 controls the storage of information in the storage unit 202.

通信制御部２０３３は、端末装置１０間で、所定の通信プロトコルに従った移動体通信を実現するための処理を行う。 The communication control unit 2033 performs processing for realizing mobile communication between the terminal devices 10 according to a predetermined communication protocol.

音声パワー分布計算部２０３４は、サーバ２０を介して現在相互通話中の複数の端末装置１０の音声パワー検出部１８５１で検出された発話者毎の音声パワーに基づいて、相互通話中の複数の発話者の音声パワーの平均と分散を算出し、音声パワー判定部２０３５に送出する。また、音声パワー分布計算部２０３４は、算出結果を計算結果データ２０２２に格納する。 Based on the voice power of each speaker detected by the voice power detection units 1851 of the terminal devices 10 currently in a mutual call via the server 20, the voice power distribution calculation unit 2034 calculates the mean and variance of the voice power of the speakers in the call and sends them to the voice power determination unit 2035. The voice power distribution calculation unit 2034 also stores the calculation results in the calculation result data 2022.

音声パワー判定部２０３５は、音声パワー分布計算部２０３４により算出された音声パワーの平均と分散とに基づいて、発話者毎の音声パワーと音声パワーの平均とのズレを算出し、このズレが所定値以上であるか否かを判定し、判定結果を相互通話中の端末装置１０に通知する。 Based on the mean and variance of the voice power calculated by the voice power distribution calculation unit 2034, the voice power determination unit 2035 calculates the deviation between each speaker's voice power and the mean voice power, determines whether this deviation is equal to or greater than a predetermined value, and notifies the terminal devices 10 in the mutual call of the determination result.
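A minimal sketch of this server-side calculation and determination, assuming each terminal reports one voice power value per speaker in dB and that the predetermined value is a threshold in dB (both are assumptions; the text specifies neither units nor values). The function names are hypothetical. The variance is returned for use by other criteria but is not needed for this absolute-threshold test. Besides flagging a deviation, the sketch records whether the speaker is above or below the mean.

```python
from statistics import fmean, pvariance
from typing import Dict, Tuple


def power_distribution(powers_db: Dict[str, float]) -> Tuple[float, float]:
    """Mean and population variance of the speakers' voice power."""
    values = list(powers_db.values())
    return fmean(values), pvariance(values)


def judge_deviation(powers_db: Dict[str, float],
                    threshold_db: float) -> Dict[str, str]:
    """Per speaker: "ok", or "louder"/"quieter" when |power - mean| >= threshold."""
    mean, _variance = power_distribution(powers_db)
    result = {}
    for user, power in powers_db.items():
        diff = power - mean
        if abs(diff) < threshold_db:
            result[user] = "ok"
        elif diff > 0:
            result[user] = "louder"
        else:
            result[user] = "quieter"
    return result
```

With speakers at -20, -22, and -35 dB and a 6 dB threshold, the mean is about -25.7 dB and only the third speaker is reported, as quieter than the mean.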

音声パワー分布計算部２０３４及び音声パワー判定部２０３５は、所定の間隔、好ましくは少なくとも１秒以上の時間間隔を置いて音声パワーの平均及び分散の算出動作及び判定動作を行うことが好ましい。これは、あまり短い時間間隔で音声パワーの平均及び分散の算出動作及び判定動作を行うと、端末装置１０における音声パワー状態表示（通知）が不安定になるからである。同様に、音声パワー分布計算部２０３４及び音声パワー判定部２０３５は、音声パワーの平均値・分散の計算結果を安定化させるため、所定回数の過去の計算結果を用いて移動平均を取ることで、平滑化した計算結果を用いても良い。 The voice power distribution calculation unit 2034 and the voice power determination unit 2035 preferably calculate the mean and variance of the voice power and perform the determination at predetermined intervals, preferably at least one second apart. If the calculation and determination are performed at intervals that are too short, the voice power state display (notification) on the terminal devices 10 becomes unstable. Similarly, to stabilize the calculated mean and variance of the voice power, the voice power distribution calculation unit 2034 and the voice power determination unit 2035 may use results smoothed by taking a moving average over a predetermined number of past calculation results.

また、音声パワー分布計算部２０３４及び音声パワー判定部２０３５は、相互通話中のユーザに変更があった場合、音声パワーの平均及び分散の算出動作及び判定動作をやり直す。つまり、新たに音声パワーの平均及び分散の算出動作及び判定動作を行う。 In addition, when the users in the mutual call change, the voice power distribution calculation unit 2034 and the voice power determination unit 2035 redo the calculation of the mean and variance of the voice power and the determination; that is, they start the calculation and determination afresh.
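The interval, smoothing, and reset rules in the two paragraphs above could be combined roughly as follows. This is a sketch under assumptions: the one-second minimum interval comes from the text, while the class name, the default window of 5 results, and averaging the variance in the same way as the mean are illustrative choices.

```python
from collections import deque
from statistics import fmean, pvariance
from typing import Deque, Dict, FrozenSet, Optional, Tuple


class SmoothedPowerStats:
    """Rate-limited, moving-average-smoothed mean/variance of voice power."""

    def __init__(self, window: int = 5, min_interval_s: float = 1.0) -> None:
        self.min_interval_s = min_interval_s
        # Last `window` raw (mean, variance) results; older ones drop out.
        self._history: Deque[Tuple[float, float]] = deque(maxlen=window)
        self._participants: FrozenSet[str] = frozenset()
        self._last_time: Optional[float] = None

    def update(self, now_s: float,
               powers_db: Dict[str, float]) -> Optional[Tuple[float, float]]:
        """Return the smoothed (mean, variance), or None if called too soon."""
        participants = frozenset(powers_db)
        if participants != self._participants:
            # The users in the call changed: discard history and start over.
            self._history.clear()
            self._participants = participants
            self._last_time = None
        if (self._last_time is not None
                and now_s - self._last_time < self.min_interval_s):
            return None  # too soon; keeps the terminals' display stable
        self._last_time = now_s
        values = list(powers_db.values())
        self._history.append((fmean(values), pvariance(values)))
        mean = fmean(m for m, _ in self._history)
        variance = fmean(v for _, v in self._history)
        return mean, variance
```

Resetting on any change in the set of user IDs, rather than only on departures, keeps the smoothed figures from mixing statistics of two different groups of speakers.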

＜２データ構造＞ <2 Data structure>
図５及び図６は、端末装置１０が記憶するデータベースのデータ構造を示す図である。なお、図５及び図６は一例であり、記載されていないデータを除外するものではない。 FIGS. 5 and 6 are diagrams showing the data structures of the databases stored in the terminal device 10. FIGS. 5 and 6 are only examples and do not exclude data that is not shown in them.

図５は、評価テーブル１７３のデータ構造を示す図である。図５に示すように、評価テーブル１７３には、音声パワー、SN比及びマイク特性のそれぞれについて良好、やや悪い及び悪いと入力音声評価部１８５４が評価した際のスコアが格納されている。また、評価テーブル１７３は、入力音声評価部１８５４がこれらスコアを合算して統合スコアを算出した結果、最終的な評価結果を良好、やや悪い及び悪いのいずれにするかのテーブルである。 FIG. 5 is a diagram showing the data structure of the evaluation table 173. As shown in FIG. 5, the evaluation table 173 stores the scores that apply when the input voice evaluation unit 1854 rates each of the voice power, SN ratio, and microphone characteristics as good, somewhat poor, or poor. The evaluation table 173 also defines whether the final evaluation result is good, somewhat poor, or poor, according to the integrated score that the input voice evaluation unit 1854 obtains by summing these scores.
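The summing and look-up described for the evaluation table 173 can be sketched as follows. All score values and thresholds here are hypothetical placeholders, since the contents of FIG. 5 are not reproduced in the text; only the structure (three items, three levels, and a summed integrated score mapped to a final level) follows the description.

```python
from typing import Dict, Tuple

# Hypothetical per-item scores for each level (FIG. 5 values are not given).
SCORE_TABLE: Dict[str, Dict[str, int]] = {
    "voice_power": {"good": 2, "somewhat poor": 1, "poor": 0},
    "snr":         {"good": 2, "somewhat poor": 1, "poor": 0},
    "mic":         {"good": 2, "somewhat poor": 1, "poor": 0},
}
# Hypothetical minimum integrated score for each final level, highest first.
FINAL_LEVELS = [(5, "good"), (3, "somewhat poor")]


def integrated_rating(levels: Dict[str, str]) -> Tuple[int, str]:
    """Sum the item scores and map the integrated score to a final level."""
    total = sum(SCORE_TABLE[item][levels[item]] for item in SCORE_TABLE)
    for minimum, label in FINAL_LEVELS:
        if total >= minimum:
            return total, label
    return total, "poor"
```

Keeping the scores and thresholds in data rather than in branching code mirrors the table-driven description: changing the evaluation policy means editing the table only.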

図６は、検出結果データ１７２のデータ構造を示す図である。図６に示すように、検出結果データ１７２のレコードの各々は、例えば、項目「通話ＩＤ」と、項目「開始時刻」と、項目「終了時刻」と、項目「ユーザＩＤ」と、項目「音声パワースコア」と、項目「SN比スコア」と、項目「マイク特性スコア」と、項目「入力音声スコア」と、項目「受信音声スコア」とを含む。検出結果データ１７２に記憶された情報は、音声判定部１８５による判定動作がされる度に更新される。 FIG. 6 is a diagram showing the data structure of the detection result data 172. As shown in FIG. 6, each record of the detection result data 172 includes, for example, the items “call ID”, “start time”, “end time”, “user ID”, “voice power score”, “SN ratio score”, “microphone characteristic score”, “input voice score”, and “received voice score”. The information stored in the detection result data 172 is updated each time the voice determination unit 185 performs a determination operation.

項目「通話ＩＤ」は、端末装置１０が通話相手である他の端末装置１０との間で行った個々の通話を識別するための情報である。項目「開始時刻」は、項目「通話ＩＤ」により特定される通話が開始した時刻の情報である。項目「終了時刻」は、項目「通話ＩＤ」により特定される通話が終了した時刻の情報である。項目「ユーザＩＤ」は、項目「通話ＩＤ」により特定される通話の相手である他の端末装置１０を識別するための情報である。好ましくは、項目「ユーザＩＤ」は、受信部１８２３が受信した、他の端末装置１０を識別するための識別子である。なお、端末装置１０による通話は３人以上のユーザによる同時通話も可能であるので、項目「ユーザＩＤ」には複数の識別情報が格納されうる。項目「音声パワースコア」は、音声パワー検出部１８５１により検出された結果であるスコアである。項目「SN比スコア」は、SN比検出部１８５２により検出された結果であるスコアである。項目「マイク特性スコア」は、マイク特性検出部１８５３により検出された結果であるスコアである。項目「入力音声スコア」は、入力音声評価部１８５４により判定された結果であるスコアである。項目「受信音声スコア」は、受信音声評価部１８５５により評価された結果であるスコアである。 The item “call ID” is information identifying each call made between the terminal device 10 and another terminal device 10 acting as the call partner. The item “start time” is the time at which the call identified by the item “call ID” started. The item “end time” is the time at which that call ended. The item “user ID” is information identifying the other terminal device 10 that is the partner in the call identified by the item “call ID”; preferably, it is the identifier of the other terminal device 10 received by the receiving unit 1823. Because the terminal device 10 also supports simultaneous calls among three or more users, the item “user ID” may hold multiple identifiers. The item “voice power score” is the score detected by the voice power detection unit 1851. The item “SN ratio score” is the score detected by the SN ratio detection unit 1852. The item “microphone characteristic score” is the score detected by the microphone characteristic detection unit 1853. The item “input voice score” is the score determined by the input voice evaluation unit 1854. The item “received voice score” is the score evaluated by the received voice evaluation unit 1855.
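One record of the detection result data 172 could be modelled as below. The field names, the integer score type, and the `update_scores` helper are assumptions made for illustration; only the list of items follows the description of FIG. 6.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class DetectionRecord:
    call_id: str
    start_time: datetime
    end_time: Optional[datetime]                        # None while the call is in progress
    user_ids: List[str] = field(default_factory=list)   # several IDs in group calls
    voice_power_score: Optional[int] = None
    snr_score: Optional[int] = None
    mic_characteristic_score: Optional[int] = None
    input_voice_score: Optional[int] = None
    received_voice_score: Optional[int] = None

    def update_scores(self, **scores: int) -> None:
        """Overwrite score fields after each determination (unit 185)."""
        for name, value in scores.items():
            # Only the *_score fields may be updated through this helper.
            if not name.endswith("_score") or not hasattr(self, name):
                raise AttributeError(f"unknown score field: {name}")
            setattr(self, name, value)
```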

＜３動作例＞ <3 Operation example>
以下、端末装置１０及びサーバ２０の動作の一例について説明する。 An example of the operations of the terminal device 10 and the server 20 is described below.

図７は、端末装置１０の動作の一例を表すフローチャートである。図７は、端末装置１０のユーザが発話した入力音声に基づいて、音声判定部１８５が検出及び判定動作を行い、提示制御部１８６を介してユーザに通知する際の動作の例を表すフローチャートである。 FIG. 7 is a flowchart showing an example of the operation of the terminal device 10, namely the operation in which the voice determination unit 185 performs detection and determination on the input voice uttered by the user of the terminal device 10 and notifies the user via the presentation control unit 186.

なお、図７～図９のフローチャート及びシーケンス図に示す各種動作は並行して実行されうる。 Various operations shown in the flowcharts and sequence diagrams of FIGS. 7 to 9 can be executed in parallel.

ステップＳ１１において、端末装置１０の制御部１８０は、サーバ２０を経由して特定の端末装置１０宛に発呼通信を行う。具体的には、例えば、制御部１８０は、サーバ２０の送受信部２０３１及び通信制御部２０３３を経由して、通信制御部１８２により特定の端末装置１０宛に発呼動作を行う。通信制御部１８２による発呼動作については既知の動作であるので、これ以上の説明は行わない。なお、図７のフローチャートにおいては、端末装置１０から発呼動作を行う例を示しているが、端末装置１０が着呼動作を行う場合でも同様の動作が行われる。つまり、図７のフローチャートに示す動作において、端末装置１０が発呼動作を行うか着呼動作を行うかは任意である。 In step S11, the control unit 180 of the terminal device 10 places a call to a specific terminal device 10 via the server 20. Specifically, for example, the control unit 180 causes the communication control unit 182 to place a call to the specific terminal device 10 via the transmission/reception unit 2031 and the communication control unit 2033 of the server 20. Since the calling operation by the communication control unit 182 is known, it is not described further. Although the flowchart of FIG. 7 shows an example in which the terminal device 10 places the call, the same operation is performed when the terminal device 10 receives the call. That is, in the operation shown in the flowchart of FIG. 7, the terminal device 10 may either place or receive the call.

そして、ステップＳ１２において、制御部１８０は、ステップＳ１１で発呼通信を行った特定の端末装置１０が着呼するのを待ち、着呼したら（ステップＳ１２においてＹＥＳ）ステップＳ１３以降の処理を行う。 Then, in step S12, the control unit 180 waits for the specific terminal device 10 called in step S11 to answer, and once it answers (YES in step S12), performs the processing from step S13 onward.

この後、制御部１８０は、着呼をした特定の端末装置１０、すなわち通話相手の特定の端末装置１０との間で音声通話を行う。具体的には、例えば、制御部１８０は、音声入力部１８３により端末装置１０のユーザが発話した音声入力の処理を行って音声データに変換し、通信制御部１８２により処理が行われた音声データを、サーバ２０の送受信部２０３１を介して特定の端末装置１０に送信し、また、通信制御部１８２によりサーバ２０の送受信部２０３１を介して特定の端末装置１０から送信された音声データを受信して処理を行い、音声出力部１８４により音声に変換する。 Thereafter, the control unit 180 conducts a voice call with the specific terminal device 10 that answered, that is, the terminal device 10 of the other party. Specifically, for example, the control unit 180 causes the voice input unit 183 to process the voice uttered by the user of the terminal device 10 and convert it into voice data, and causes the communication control unit 182 to transmit the processed voice data to the specific terminal device 10 via the transmission/reception unit 2031 of the server 20. The control unit 180 also causes the communication control unit 182 to receive and process the voice data transmitted from the specific terminal device 10 via the transmission/reception unit 2031 of the server 20, and causes the voice output unit 184 to convert it into sound.

なお、本実施形態のシステム１では、３人以上のユーザによる相互通話も可能である。どの時点で相互通話を行うユーザを追加するか（発呼動作を行うか、着呼動作を行うか）についての詳細な説明はここでは行わない。 In the system 1 of the present embodiment, mutual calls among three or more users are also possible. When users are added to a mutual call (whether by an outgoing or an incoming call operation) is not described in detail here.

ステップＳ１３において、制御部１８０は、端末装置１０のユーザが発話した入力音声を受け入れる。具体的には、例えば、制御部１８０は、音声入力部１８３により、端末装置１０のユーザが発話した入力音声を受け入れる。そして、ステップＳ１４～Ｓ１６において、制御部１８０は、ステップＳ１３において受け入れた、端末装置１０のユーザが発話した入力音声についての検出処理を行う。具体的には、例えば、制御部１８０は、音声パワー検出部１８５１により入力音声の音声パワーを検出し、SN比検出部１８５２により入力音声のSN比を検出し、マイク特性検出部１８５３により入力音声のマイク特性を検出する。これらステップＳ１４～Ｓ１６に示す処理はいずれも並行して実行される。 In step S13, the control unit 180 accepts the input voice uttered by the user of the terminal device 10. Specifically, for example, the control unit 180 accepts this input voice through the voice input unit 183. Then, in steps S14 to S16, the control unit 180 performs detection processing on the input voice accepted in step S13. Specifically, for example, the control unit 180 detects the voice power of the input voice with the voice power detection unit 1851, the SN ratio of the input voice with the SN ratio detection unit 1852, and the microphone characteristics of the input voice with the microphone characteristic detection unit 1853. The processes of steps S14 to S16 are all executed in parallel.
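A sketch of steps S14 and S15 for one frame of input samples, run concurrently with a thread pool. Assumptions: samples are floats in [-1.0, 1.0]; voice power is expressed as an RMS level in dBFS; the SN ratio is estimated from a speech segment and a separate noise-only segment (for example, one found by voice activity detection). These are common choices, not the method stated in the patent. Step S16 (microphone characteristics) is omitted because the text here does not describe how it is measured.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Sequence


def voice_power_dbfs(samples: Sequence[float]) -> float:
    """RMS level in dB relative to full scale (samples in [-1.0, 1.0])."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(max(rms, 1e-10))  # floor avoids log10(0)


def snr_db(speech: Sequence[float], noise: Sequence[float]) -> float:
    """SN ratio from the mean power of a speech and a noise-only segment."""
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(s * s for s in noise) / len(noise)
    return 10.0 * math.log10(max(p_speech, 1e-20) / max(p_noise, 1e-20))


def detect_all(speech: Sequence[float],
               noise: Sequence[float]) -> Dict[str, float]:
    """Run the voice power (S14) and SN ratio (S15) detections concurrently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        power = pool.submit(voice_power_dbfs, speech)
        snr = pool.submit(snr_db, speech, noise)
        return {"voice_power_db": power.result(), "snr_db": snr.result()}
```

In CPython, threads give concurrency rather than true CPU parallelism for this pure-Python arithmetic; a real implementation would more likely use vectorized code or separate processes.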

ステップＳ１７において、制御部１８０は、ステップＳ１４～Ｓ１６において検出された音声パワー等に基づいて、入力音声の評価動作を行う。具体的には、例えば、制御部１８０は、入力音声評価部１８５４により、検出された音声パワー等に基づいて、入力音声の評価動作を行い、統合スコアを算出する。そして、入力音声評価部１８５４は、評価結果を提示制御部１８６に送出する。 In step S17, the control unit 180 evaluates the input voice based on the voice power and other values detected in steps S14 to S16. Specifically, for example, the control unit 180 causes the input voice evaluation unit 1854 to evaluate the input voice based on the detected voice power and other values and to calculate an integrated score. The input voice evaluation unit 1854 then sends the evaluation result to the presentation control unit 186.

ステップＳ１８において、制御部１８０は、ステップＳ１７における評価結果を端末装置１０のディスプレイ１４１に表示させる。具体的には、例えば、制御部１８０は、提示制御部１８６（入力音声状態提示部１８６１及び通話相手受信音声状態提示部１８６２）により、入力音声評価部１８５４が評価した評価結果（算出した統合スコア）に基づいて、端末装置１０のディスプレイ１４１に評価結果を表示させる。 In step S18, the control unit 180 causes the display 141 of the terminal device 10 to display the evaluation result obtained in step S17. Specifically, for example, the control unit 180 causes the presentation control unit 186 (the input voice state presentation unit 1861 and the other party's received voice state presentation unit 1862) to display the evaluation result on the display 141 of the terminal device 10, based on the result (calculated integrated score) from the input voice evaluation unit 1854.

ステップＳ１９において、制御部１８０は、端末装置１０から通話切断の指示があったか、あるいは、通話相手の端末装置１０が通話切断を行ったか否かを判定する。そして、まだ通話中であれば（ステップＳ１９においてＮＯ）ステップＳ１３の処理に戻り、通話切断があったと判定したら（ステップＳ１９においてＹＥＳ）、通話切断処理を行い、図７に示すプログラムを終了する。 In step S19, the control unit 180 determines whether there is an instruction to disconnect the call from the terminal device 10, or whether the terminal device 10 of the other party has disconnected the call. If the call is still in progress (NO in step S19), the process returns to step S13, and if it is determined that the call has been disconnected (YES in step S19), the call disconnection process is performed and the program shown in FIG. 7 ends.
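The loop of steps S13 to S19 can be sketched with injected callables, so the skeleton runs without real audio hardware. All parameter names are hypothetical, and the detectors are called one after another here for brevity, although the text states that steps S14 to S16 run in parallel. The disconnection check comes after the display step, matching the order of the flowchart.

```python
from typing import Callable, Dict, List

Frame = List[float]


def run_input_voice_loop(
    capture: Callable[[], Frame],                      # S13: one frame of input voice
    detectors: Dict[str, Callable[[Frame], float]],    # S14-S16: named detections
    evaluate: Callable[[Dict[str, float]], str],       # S17: integrated evaluation
    show: Callable[[str], None],                       # S18: display the result
    call_active: Callable[[], bool],                   # S19: False once disconnected
) -> int:
    """Process frames until the call ends; return the number of frames handled."""
    frames = 0
    while True:
        frame = capture()
        measurements = {name: fn(frame) for name, fn in detectors.items()}
        show(evaluate(measurements))
        frames += 1
        if not call_active():  # YES in S19: disconnect and end
            return frames
```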

図８は、端末装置１０の動作の他の例を表すフローチャートである。図８は、ユーザが相互通話を行っている相手からの受話音声に基づいて、音声判定部１８５が検出及び判定動作を行い、通信制御部１８２を介して通話相手である他の端末装置１０に判定結果を送信する際の動作の例を表すフローチャートである。 FIG. 8 is a flowchart showing another example of the operation of the terminal device 10, namely the operation in which the voice determination unit 185 performs detection and determination on the voice received from the party with whom the user is in a mutual call, and transmits the determination result via the communication control unit 182 to that party's terminal device 10.

ステップＳ２１において、端末装置１０の制御部１８０は、サーバ２０を経由して特定の端末装置１０からの呼び出しを受ける。具体的には、例えば、制御部１８０は、サーバ２０の送受信部２０３１及び通信制御部２０３３を経由して、通信制御部１８２により特定の端末装置１０からの着呼動作を行う。通信制御部１８２による着呼動作については既知の動作であるので、これ以上の説明は行わない。なお、図８のフローチャートにおいては、端末装置１０から着呼動作を行う例を示しているが、端末装置１０が発呼動作を行う場合でも同様の動作が行われる。つまり、図８のフローチャートに示す動作においても、端末装置１０が発呼動作を行うか着呼動作を行うかは任意である。 In step S21, the control unit 180 of the terminal device 10 receives a call from a specific terminal device 10 via the server 20. Specifically, for example, the control unit 180 causes the communication control unit 182 to receive the call from the specific terminal device 10 via the transmission/reception unit 2031 and the communication control unit 2033 of the server 20. Since the call-receiving operation by the communication control unit 182 is known, it is not described further. Although the flowchart of FIG. 8 shows an example in which the terminal device 10 receives the call, the same operation is performed when the terminal device 10 places the call. That is, in the operation shown in the flowchart of FIG. 8 as well, the terminal device 10 may either place or receive the call.

そして、ステップＳ２２において、制御部１８０は、ステップＳ２１で着呼動作を行った特定の端末装置１０との間で通話が成立するのを待ち、通話が成立したら（ステップＳ２２においてＹＥＳ）ステップＳ２３以降の処理を行う。この後、制御部１８０は、通話が成立した特定の端末装置１０との間で音声通話を行う。 Then, in step S22, the control unit 180 waits until a call is established with the specific terminal device 10 whose call was received in step S21, and once the call is established (YES in step S22), performs the processing from step S23 onward. Thereafter, the control unit 180 conducts a voice call with the specific terminal device 10 with which the call has been established.

ステップＳ２３において、制御部１８０は、通話相手である特定の端末装置１０から送信されて端末装置１０で受信した、特定の端末装置１０からの音声データを受信する。具体的には、例えば、制御部１８０は、通話相手である特定の端末装置１０から送信され、サーバ２０の送受信部２０３１を介して送信された音声データを通信制御部１８２により受信する。 In step S23, the control unit 180 receives the voice data transmitted from the specific terminal device 10 of the other party. Specifically, for example, the control unit 180 causes the communication control unit 182 to receive the voice data that was transmitted from the specific terminal device 10 of the other party and relayed via the transmission/reception unit 2031 of the server 20.

次いで、ステップＳ２４において、制御部１８０は、ステップＳ２３で受信した受信音声データの評価動作を行う。具体的には、例えば、制御部１８０は、受信音声評価部１８５５により、ステップＳ２３で受信した受信音声データの評価動作を行い、評価スコアを算出する。 Next, in step S24, the control section 180 performs an evaluation operation of the received audio data received in step S23. Specifically, for example, the control unit 180 causes the received voice evaluation unit 1855 to perform an evaluation operation on the received voice data received in step S23, and calculates an evaluation score.

さらに、ステップＳ２５において、制御部１８０は、ステップＳ２４で行った受信音声評価結果を、音声を送信した他の端末装置１０に送信する。具体的には、例えば、制御部１８０は、入力音声評価部１８５４により、受信音声評価結果を通信制御部１８２及びサーバ２０の送受信部２０３１を介して、音声を送信した特定の端末装置１０に送信する。 Furthermore, in step S25, the control unit 180 transmits the result of the received voice evaluation performed in step S24 to the other terminal device 10 that sent the voice. Specifically, for example, the control unit 180 causes the input voice evaluation unit 1854 to transmit the received voice evaluation result to the specific terminal device 10 that sent the voice, via the communication control unit 182 and the transmission/reception unit 2031 of the server 20.

そして、ステップＳ２６において、制御部１８０は、ステップＳ２３で受信した音声データを出力する。具体的には、例えば、制御部１８０は、通信制御部１８２により受信され、復号化された音声データを、音声出力部１８４及びスピーカ１４２により音声として出力する。 Then, in step S26, control unit 180 outputs the audio data received in step S23. Specifically, for example, the control unit 180 outputs the audio data received and decoded by the communication control unit 182 as audio through the audio output unit 184 and the speaker 142 .
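Steps S23 to S26 amount to a fixed per-chunk order of operations: evaluate, report back to the sender, then play. The callables below are hypothetical stand-ins; in particular, how the received voice is scored in step S24 is not specified in this passage, so the evaluator is injected rather than implemented.

```python
from typing import Callable


def handle_received_audio(
    audio: bytes,                                 # S23: decoded voice data
    sender_id: str,                               # terminal that sent the voice
    evaluate_received: Callable[[bytes], int],    # S24: scoring method (assumed)
    send_result: Callable[[str, int], None],      # S25: return result to sender
    play: Callable[[bytes], None],                # S26: output through the speaker
) -> int:
    """Evaluate, report, then play one received chunk; return its score."""
    score = evaluate_received(audio)
    send_result(sender_id, score)
    play(audio)
    return score
```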

図９は、端末装置１０の動作の他の例を表すフローチャートである。図９は、ユーザが相互通話を行っている通話相手である他の端末装置１０から送出された受信音声評価結果を受信し、提示制御部１８６により端末装置１０のユーザに提示する際の動作の例を表すフローチャートである。 FIG. 9 is a flowchart showing another example of the operation of the terminal device 10, namely the operation of receiving the received voice evaluation result sent from the other terminal device 10 with which the user is in a mutual call, and presenting it to the user of the terminal device 10 through the presentation control unit 186.

ステップＳ３１～Ｓ３３の動作は、図８のステップＳ２１～Ｓ２３と同一である。なお、図９のフローチャートにおいては、端末装置１０から着呼動作を行う例を示しているが、端末装置１０が発呼動作を行う場合でも同様の動作が行われる。つまり、図９のフローチャートに示す動作においても、端末装置１０が発呼動作を行うか着呼動作を行うかは任意である。 The operations of steps S31 to S33 are the same as those of steps S21 to S23 in FIG. 8. Although the flowchart of FIG. 9 shows an example in which the terminal device 10 receives the call, the same operation is performed when the terminal device 10 places the call. That is, in the operation shown in the flowchart of FIG. 9 as well, the terminal device 10 may either place or receive the call.

ステップＳ３４において、制御部１８０は、通話相手である特定の端末装置１０から送信された、特定の端末装置１０における受信音声の評価結果（この評価結果は、端末装置１０のユーザが発話した音声が他の端末装置１０においてどのように受信されたかを評価した評価結果である）を受信する。具体的には、例えば、制御部１８０は、通信制御部１８２（受信部１８２３）により受信した、特定の端末装置１０における受信音声の評価結果を受信する。そして、受信部１８２３は、評価結果を提示制御部１８６に送出する。 In step S34, the control unit 180 receives, from the specific terminal device 10 of the other party, the evaluation result of the voice received at that terminal device (this result evaluates how the voice uttered by the user of the terminal device 10 was received at the other terminal device 10). Specifically, for example, the control unit 180 receives, through the communication control unit 182 (receiving unit 1823), the evaluation result of the voice received at the specific terminal device 10. The receiving unit 1823 then sends the evaluation result to the presentation control unit 186.

ステップＳ３５において、制御部１８０は、ステップＳ３４において受信した評価結果を端末装置１０のディスプレイ１４１に表示させる。具体的には、例えば、制御部１８０は、提示制御部１８６（通話相手受信音声状態提示部１８６２）により、受信部１８２３が受信した評価結果（算出した統合スコア）に基づいて、端末装置１０のディスプレイ１４１に評価結果を表示させる。 In step S35, the control unit 180 causes the display 141 of the terminal device 10 to display the evaluation result received in step S34. Specifically, for example, the control unit 180 causes the presentation control unit 186 (the other party's received voice state presentation unit 1862) to display the evaluation result on the display 141 of the terminal device 10, based on the evaluation result (calculated integrated score) received by the receiving unit 1823.
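The evaluation result returned in step S25 and received in step S34 needs some wire format. The JSON layout, field names, and display wording below are hypothetical, shown only to make the round trip from the evaluating terminal to the presenting terminal concrete.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ReceivedVoiceReport:
    call_id: str
    reporter_id: str   # terminal that evaluated the voice it received
    score: int         # received voice score
    level: str         # e.g. "good", "somewhat poor", "poor"


def encode_report(report: ReceivedVoiceReport) -> bytes:
    """Serialize on the evaluating terminal (step S25)."""
    return json.dumps(asdict(report)).encode("utf-8")


def decode_report(payload: bytes) -> ReceivedVoiceReport:
    """Parse on the receiving terminal (step S34)."""
    return ReceivedVoiceReport(**json.loads(payload.decode("utf-8")))


def display_text(report: ReceivedVoiceReport) -> str:
    """Text for the other party's received voice state display (step S35)."""
    return f"Your voice as heard by {report.reporter_id}: {report.level} (score {report.score})"
```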

ステップＳ３６の動作は図８のステップＳ２６と同一である。 The operation of step S36 is the same as that of step S26 in FIG. 8.

ステップＳ３７において、制御部１８０は、端末装置１０から通話切断の指示があったか、あるいは、通話相手の端末装置１０が通話切断を行ったか否かを判定する。そして、まだ通話中であれば（ステップＳ３７においてＮＯ）ステップＳ３３の処理に戻り、通話切断があったと判定したら（ステップＳ３７においてＹＥＳ）、通話切断処理を行い、図９に示すプログラムを終了する。 In step S37, the control unit 180 determines whether an instruction to disconnect the call has been given at the terminal device 10 or whether the terminal device 10 of the other party has disconnected the call. If the call is still in progress (NO in step S37), the process returns to step S33; if it is determined that the call has been disconnected (YES in step S37), call disconnection processing is performed and the program shown in FIG. 9 ends.

図１０は、端末装置１０及びサーバ２０の動作の一例を表すシーケンス図である。図１０は、ユーザが２台以上の端末装置１０を用いて相互通話を行っている際に端末装置１０が発話した入力音声に基づいて、サーバ２０の音声パワー分布計算部２０３４及び音声パワー判定部２０３５が計算及び判定動作を行い、端末装置１０の提示制御部１８６を介してユーザに通知する際の動作の例を表すフローチャートである。なお、図１０において端末装置１０を第１の端末装置１０と第２の端末装置１０として表しているが、これは単に端末装置１０を区別するための表記である。また、２台以上の端末装置１０による通話動作においても図１０と同様の動作が行われる。 FIG. 10 is a sequence diagram showing an example of the operations of the terminal devices 10 and the server 20, namely the operation in which, while users are in a mutual call using two or more terminal devices 10, the voice power distribution calculation unit 2034 and the voice power determination unit 2035 of the server 20 perform calculation and determination based on the input voice uttered at the terminal devices 10 and notify the users via the presentation control units 186 of the terminal devices 10. Although FIG. 10 labels the terminal devices 10 as a first terminal device 10 and a second terminal device 10, this notation merely distinguishes them. The same operation as in FIG. 10 is also performed in calls involving two or more terminal devices 10.

ステップＳ４１において、第１の端末装置１０の制御部１８０は、第２の端末装置１０宛に発呼通信を行うためにサーバ２０に発呼動作を行う。具体的には、例えば、制御部１８０は、サーバ２０に対して、通信制御部１８２により第２の端末装置１０宛に発呼動作を行う。なお、図１０のフローチャートにおいては、第１の端末装置１０が発呼動作を行い、第２の端末装置１０が着呼動作を行う例を示していたが、第２の端末装置１０が発呼動作を行う場合でも同様の動作が行われる。つまり、図１０のフローチャートに示す動作においても、第１の端末装置１０が発呼動作を行うか第２の端末装置１０が発呼動作を行うかは任意である。 In step S41, the control unit 180 of the first terminal device 10 sends a call request to the server 20 in order to call the second terminal device 10. Specifically, for example, the control unit 180 causes the communication control unit 182 to send the server 20 a call request addressed to the second terminal device 10. Although FIG. 10 shows an example in which the first terminal device 10 places the call and the second terminal device 10 receives it, the same operation is performed when the second terminal device 10 places the call. That is, in the operation shown in FIG. 10 as well, either the first terminal device 10 or the second terminal device 10 may place the call.

ステップＳ４２において、サーバ２０は、ステップＳ４１で受信した発呼動作に係る第２の端末装置１０に対して呼び出し動作を行う。具体的には、例えば、サーバ２０の制御部２０３は、送受信部２０３１及び通信制御部２０３３により、第２の端末装置１０に対して呼び出し動作を行う。送受信部２０３１及び通信制御部２０３３による呼び出し動作については既知の動作であるので、これ以上の説明は行わない。 In step S42, the server 20 calls the second terminal device 10 that is the destination of the call request received in step S41. Specifically, for example, the control unit 203 of the server 20 calls the second terminal device 10 by means of the transmission/reception unit 2031 and the communication control unit 2033. Since this calling operation by the transmission/reception unit 2031 and the communication control unit 2033 is known, it is not described further.

ステップＳ４３において、ステップＳ４２における呼び出しの対象である第２の端末装置１０の制御部１８０は、ステップＳ４２における呼び出しに対して着呼動作をする。具体的には、例えば、制御部１８０は、通信制御部１８２により端末装置１０からの着呼動作を行う。これにより、ステップＳ４４において、第１の端末装置１０と第２の端末装置１０との間で通話が成立する。 In step S43, the control unit 180 of the second terminal device 10, which is the target of the call in step S42, performs an incoming call operation for the call in step S42. Specifically, for example, the control unit 180 causes the communication control unit 182 to receive a call from the terminal device 10 . As a result, a call is established between the first terminal device 10 and the second terminal device 10 in step S44.

ステップＳ４４及びＳ４５において、第１の端末装置１０及び第２の端末装置１０の制御部１８０は、それぞれの端末装置１０に入力された入力音声を入力音声データに変換して、通話先である端末装置１０に送信し、また、通話先である端末装置１０から送信された音声データを受信し、音声に変換して出力する。具体的には、例えば、第１の端末装置１０及び第２の端末装置１０の制御部１８０は、音声入力部１８３により入力音声を受け入れ、通信制御部１８２により音声データに変換して、通話先である端末装置１０に送信し、また、通話先である端末装置１０から送信され、サーバ２０の送受信部２０３１を介して送信された音声データを通信制御部１８２により受信し、この通信制御部１８２により音声に変換して音声出力部１８４及びスピーカ１４２を介して音声として出力する。 In steps S44 and S45, the control units 180 of the first terminal device 10 and the second terminal device 10 each convert the voice input to their own terminal device 10 into input voice data and transmit it to the terminal device 10 at the other end of the call; they also receive the voice data transmitted from that terminal device 10, convert it into sound, and output it. Specifically, for example, the control unit 180 of each of the first and second terminal devices 10 accepts the input voice through the voice input unit 183 and causes the communication control unit 182 to convert it into voice data and transmit it to the terminal device 10 at the other end. It also causes the communication control unit 182 to receive the voice data transmitted from that terminal device 10 via the transmission/reception unit 2031 of the server 20 and convert it into sound, which is output through the voice output unit 184 and the speaker 142.

ステップＳ４７において、第１の端末装置の制御部１８０は、第１の端末装置１０のユーザから発声された音声の音声パワーを検出し、その結果をサーバ２０に送出する。具体的には、例えば、制御部１８０は、音声判定部１８５の音声パワー検出部１８５１により、第１の端末装置１０のユーザから発声された音声の音声パワーを検出し、その結果をサーバ２０に送出する。 At step S47 , the control unit 180 of the first terminal device detects the voice power of the voice uttered by the user of the first terminal device 10 and sends the result to the server 20 . Specifically, for example, the control unit 180 detects the voice power of the voice uttered by the user of the first terminal device 10 by the voice power detection unit 1851 of the voice determination unit 185, and sends the result to the server 20. Send out.

同様に、ステップＳ４８において、第２の端末装置の制御部１８０は、第２の端末装置１０のユーザから発声された音声の音声パワーを検出し、その結果をサーバ２０に送出する。具体的には、例えば、制御部１８０は、音声判定部１８５の音声パワー検出部１８５１により、第２の端末装置１０のユーザから発声された音声の音声パワーを検出し、その結果をサーバ２０に送出する。 Similarly, in step S48 , the control unit 180 of the second terminal device detects the voice power of the voice uttered by the user of the second terminal device 10 and sends the result to the server 20 . Specifically, for example, the control unit 180 detects the voice power of the voice uttered by the user of the second terminal device 10 by the voice power detection unit 1851 of the voice determination unit 185, and sends the result to the server 20. Send out.

ステップＳ４９において、サーバ２０は、ステップＳ４７、Ｓ４８で送出されてきた、第１の端末装置１０及び第２の端末装置１０のユーザから発声された音声の音声パワー検出結果に基づいて、これら音声パワーの平均値及び分散を計算する。具体的には、例えば、制御部２０３は、音声パワー分布計算部２０３４により、第１の端末装置１０及び第２の端末装置１０のユーザから発声された音声の音声パワー検出結果に基づいて、これら音声パワーの平均値及び分散を算出する。 In step S49, the server 20 calculates the mean and variance of the voice powers based on the voice power detection results, sent in steps S47 and S48, for the voices uttered by the users of the first terminal device 10 and the second terminal device 10. Specifically, for example, the control unit 203 causes the voice power distribution calculation unit 2034 to calculate the mean and variance of these voice powers based on those detection results.

次いで、ステップＳ５０において、サーバ２０は、ステップＳ４９で算出した音声パワーの平均値と音声パワー検出値との間のずれが所定値以上であるか否かを判定する。具体的には、例えば、制御部２０３は、音声パワー判定部２０３５により、ステップＳ４９において音声パワー分布計算部２０３４が計算した音声パワーの平均値に基づいて、第１の端末装置１０及び／または第２の端末装置１０が検出した音声パワーがこの平均値との間に所定値以上のずれがあるか否かを判定する。なお、ステップＳ５０において、サーバ２０の音声パワー判定部２０３５は、ステップＳ４９で音声パワー分布計算部２０３４が計算した音声パワーの平均値と分散に基づいて音声パワー検出値の標準偏差を求め、その標準偏差が所定値以上であるか否かで判定してもよい。 Next, in step S50, the server 20 determines whether the deviation between a detected voice power value and the mean voice power calculated in step S49 is equal to or greater than a predetermined value. Specifically, for example, the control unit 203 causes the voice power determination unit 2035 to determine, based on the mean voice power calculated by the voice power distribution calculation unit 2034 in step S49, whether the voice power detected by the first terminal device 10 and/or the second terminal device 10 deviates from this mean by the predetermined value or more. Alternatively, in step S50, the voice power determination unit 2035 of the server 20 may obtain the standard deviation of the detected voice power values from the mean and variance calculated by the voice power distribution calculation unit 2034 in step S49, and make the determination based on whether that standard deviation is equal to or greater than a predetermined value.
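The alternative criterion in the last sentence, comparing the standard deviation of the detected values with a predetermined value, might look like this. The dB unit, the threshold, and the function name are assumptions. Note that this test characterizes the spread of the group as a whole rather than singling out one terminal.

```python
import math
from statistics import fmean, pvariance
from typing import Dict


def spread_exceeds(powers_db: Dict[str, float], max_std_db: float) -> bool:
    """True when the standard deviation of the speakers' power >= max_std_db."""
    values = list(powers_db.values())
    mean = fmean(values)
    variance = pvariance(values, mu=mean)  # reuse the mean already computed
    return math.sqrt(variance) >= max_std_db
```

For two speakers at -20 dB and -30 dB the standard deviation is 5 dB, so a 5 dB threshold is met and a 6 dB threshold is not.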

そして、ステップＳ５１において、サーバ２０は、ステップＳ５０で所定値以上のずれがあると判定した音声パワー検出値を送出した第１の端末装置１０及び／または第２の端末装置１０に対して、所定値以上のずれがあることを通知する。具体的には、例えば、制御部２０３は、音声パワー判定部２０３５、送受信部２０３１及び通信制御部２０３３により、ステップＳ５０で所定値以上のずれがあると判定した音声パワー検出値を送出した第１の端末装置１０及び／または第２の端末装置１０に対して、所定値以上のずれがあることを通知する。図１０に示す例では、第１の端末装置１０において所定値以上のずれがあると判定されており、従って、音声パワー判定部２０３５、送受信部２０３１及び通信制御部２０３３は、第１の端末装置１０に対して通知を行う。ここで、所定値以上のずれがあると判定された場合、音声パワーが平均値より所定値以上大きい、あるいは所定値以上小さい場合がありうるので、音声パワー判定部２０３５は、平均値より大きい、または平均値より小さいという情報も第１の端末装置１０及び／または第２の端末装置１０に対して通知する。 Then, in step S51, the server 20 notifies the first terminal device 10 and/or the second terminal device 10 that sent a detected voice power value judged in step S50 to deviate by the predetermined value or more that such a deviation exists. Specifically, for example, the control unit 203 uses the voice power determination unit 2035, the transmission/reception unit 2031, and the communication control unit 2033 to send this notification to the first terminal device 10 and/or the second terminal device 10 concerned. In the example shown in FIG. 10, a deviation of the predetermined value or more is determined for the first terminal device 10, so the voice power determination unit 2035, the transmission/reception unit 2031, and the communication control unit 2033 notify the first terminal device 10. Because such a deviation may mean that the voice power is either above or below the mean by at least the predetermined value, the voice power determination unit 2035 also notifies the first terminal device 10 and/or the second terminal device 10 whether the voice power is above or below the mean.

In step S52, the first terminal device 10 receives the notification sent in step S51 and, based on it, displays that a deviation exists. Specifically, for example, in the control unit 180, the communication control unit 182 receives the notification and passes it to the voice power state presentation unit 1863, which displays on the display 141 of the first terminal device 10 that the notification has been received.

Although FIG. 10 shows an example of a call between two terminal devices 10 (the first terminal device 10 and the second terminal device 10), the voice power distribution calculation unit 2034 of the server 20 calculates the average and variance of the voice power, and the voice power determination unit 2035 judges deviation from that average. The processing shown in FIG. 10 therefore achieves its intended effect more fully in calls among three or more terminal devices 10. (With only two participants, the average lies exactly midway between them, so both always deviate from it by the same amount.)

<4 Screen Examples>
Examples of screens output by the terminal device 10 are described below with reference to FIGS. 11 to 13.

FIG. 11 shows a screen on the display 141 of a terminal device 10 indicating the state of the input voice, that is, the voice uttered by the user of this terminal device 10, and the state of the received voice, that is, the voice uttered by the user of a specific terminal device 10 in a mutual call with this terminal device 10 and received by it. The screen shown in FIG. 11 is the display screen of the terminal device 10 of a user making a call with it; hereinafter, the user of the terminal device 10 in FIG. 11 is called the speaker, and the (multiple) parties with whom the speaker is talking are called receivers. In FIG. 11 (and FIG. 12), the speaker's user name is Tanaka, and the receivers' user names are Sato and Yamada. In other words, FIG. 11 depicts a simultaneous voice call among three users: Tanaka, Sato, and Yamada.

As shown in FIG. 11, icons 900 to 902, one for each user, are displayed on the display 141 of the terminal device 10. In the examples of FIG. 11 (and FIGS. 12 to 15), the icons 900 to 902 are pictograms, but their form is not particularly limited; for example, they may be images of the respective users (speaker and receivers), or simply their user names. Below each of the icons 900 to 902, bars 903 indicate the score obtained by determining the voice state. A bar is one form of icon and may also be called an indicator. In the example of FIG. 11, three bars 903 are displayed for a score of 0 (good), two bars for a score of -1 (somewhat poor), and one bar for a score of -2 (poor). The color of the bars also changes with the score. Although FIG. 11 is drawn in black and white, as an example, the bars shown as white rectangles are displayed in green, those shown as hatched rectangles in yellow, and those shown as solid black rectangles in red.
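The score-to-indicator rule of FIG. 11 is a simple lookup. A sketch, assuming scores are the integers 0, -1, and -2 described above; the type and function names and the color strings are illustrative.

```python
from typing import NamedTuple


class BarIndicator(NamedTuple):
    bars: int   # number of bars 903 drawn under the icon
    color: str  # display color (example colors from FIG. 11)


# Score -> indicator, following the example of FIG. 11.
_INDICATORS = {
    0: BarIndicator(3, "green"),    # good
    -1: BarIndicator(2, "yellow"),  # somewhat poor
    -2: BarIndicator(1, "red"),     # poor
}


def indicator_for(score: int) -> BarIndicator:
    """Return the bar count and color for a voice-state score of 0, -1 or -2."""
    try:
        return _INDICATORS[score]
    except KeyError:
        raise ValueError(f"unexpected voice-state score: {score}") from None
```

Raising on an unknown score, rather than silently falling back to some default, keeps a mismatch between the evaluating and displaying sides visible.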

In the icons 900 to 902 shown in FIG. 11(a), Tanaka's voice state is determined to be good, Sato's somewhat poor, and Yamada's poor. In the icons 900 to 902 shown in FIG. 11(b), the voice states of Tanaka, Sato, and Yamada are all determined to be poor.

Next, FIG. 12 shows screens giving a detailed view of the voice states displayed in FIG. 11.

When the user of the terminal device 10 taps the icon 900 on the display 141, a detailed view such as that shown in the upper part of FIG. 12 appears on the display 141. Because the icon 900 shown in FIG. 12(a) is the speaker's icon, the detailed view shows the integrated score of the input voice and the determination results on which that score is based, such as the input voice power.

Similarly, when the user of the terminal device 10 taps the icon 902 on the display 141, a detailed view such as that shown in FIG. 12(b) appears on the display 141. Because the icon 902 shown in FIG. 12(b) is a receiver's icon, the detailed view shows the score of the received voice and the packet loss rate on which that score is based.

FIG. 13 shows the screen displayed on the display 141 of a terminal device 10 when the voice power of the input voice uttered by its user is determined to deviate by the predetermined value or more from the average voice power of the input voices of the users currently in the mutual call. In the example of FIG. 13, when the voice power of the user's input voice is lower than the average by the predetermined value or more, a dialog 1100 describing the determination result is displayed on the display 141.

Although FIG. 13 displays the determination result regarding the variation in the users' input voice power, a solution based on the determination result may also be presented (for example, "Please speak more quietly" or "Please speak louder").
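A sketch of how a terminal might turn the direction reported by the server into dialog text with an optional suggested fix. The message wording and function name are hypothetical, and the `direction` values assume the server reports whether the user's power is "above" or "below" the average.

```python
def deviation_dialog(direction: str, suggest: bool = True) -> str:
    """Compose dialog text for a deviation notice whose direction is "above" or "below"."""
    if direction == "below":
        message = "Your voice is quieter than the average of all participants."
        suggestion = "Please speak louder."
    elif direction == "above":
        message = "Your voice is louder than the average of all participants."
        suggestion = "Please speak more quietly."
    else:
        raise ValueError(f"unknown direction: {direction!r}")
    # Optionally append a suggested fix, as the text allows.
    return f"{message} {suggestion}" if suggest else message
```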

<5 Effects of the Embodiment>
As described in detail above, in the system 1 of this embodiment, the received voice evaluation unit 1855 of the voice determination unit 185 of a terminal device 10 evaluates the state of the voice data received by that terminal device 10 and returns this first evaluation result to the other terminal device 10 on the call. The user of the other terminal device 10 can therefore learn from the first evaluation result how his or her own input voice sounds to the other party. Thus, the system 1 of this embodiment makes it easy to grasp the state of the voice of the other party.
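Elsewhere in this document, the received voice evaluation unit 1855 is described as scoring received voice from the packet loss rate observed while decoding (Appendix 9). A minimal sketch of that idea, assuming packets carry increasing sequence numbers without wraparound; the thresholds that map the loss rate onto the 0/-1/-2 scores are hypothetical, since the text does not give them.

```python
from typing import Iterable


def packet_loss_rate(received_seqs: Iterable[int]) -> float:
    """Fraction of packets missing between the lowest and highest sequence number seen.

    Simplifications: duplicates are ignored, packets lost after the last one
    received are not counted, and sequence-number wraparound (as with 16-bit
    RTP sequence numbers) is not handled.
    """
    seqs = set(received_seqs)
    if not seqs:
        raise ValueError("no packets received")
    expected = max(seqs) - min(seqs) + 1
    return 1.0 - len(seqs) / expected


def loss_score(rate: float, warn: float = 0.01, bad: float = 0.05) -> int:
    """Map a loss rate to 0 (good), -1 (somewhat poor) or -2 (poor); thresholds are hypothetical."""
    if rate < warn:
        return 0
    if rate < bad:
        return -1
    return -2


# 100 packets sent (0..99), three lost.
seqs = [s for s in range(100) if s not in (10, 20, 30)]
rate = packet_loss_rate(seqs)
print(round(rate, 3), loss_score(rate))  # 0.03 -1
```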

Furthermore, in the system 1 of this embodiment, the voice determination unit 185 evaluates the quality of the input voice uttered by the user of the terminal device 10 and presents the resulting third evaluation result to the user, so the user can grasp the state of his or her own input voice together with the state of the voice of the other party.
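The third evaluation result combines per-characteristic checks into one score (units 1851 to 1854; Appendices 17 and 19). The sketch below covers only voice power and SN ratio. It assumes float samples in [-1, 1] and a separately estimated noise level, and it uses hypothetical thresholds. Taking the worst per-characteristic score as the integrated score is one plausible rule, not one stated in the text. Microphone-characteristic detection is left out of this sketch.

```python
import math
from typing import Sequence


def rms_dbfs(samples: Sequence[float]) -> float:
    """RMS level of a frame in dB relative to full scale (samples in [-1, 1])."""
    if not samples:
        raise ValueError("empty frame")
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(mean_square) if mean_square > 0 else float("-inf")


def level_score(level_dbfs: float) -> int:
    """Hypothetical thresholds: >= -35 dBFS good, >= -45 somewhat poor, else poor."""
    if level_dbfs >= -35.0:
        return 0
    return -1 if level_dbfs >= -45.0 else -2


def snr_score(snr_db: float) -> int:
    """Hypothetical thresholds: >= 20 dB good, >= 10 dB somewhat poor, else poor."""
    if snr_db >= 20.0:
        return 0
    return -1 if snr_db >= 10.0 else -2


def integrated_score(speech_dbfs: float, noise_dbfs: float) -> int:
    """Worst of the per-characteristic scores (0, -1 or -2)."""
    return min(level_score(speech_dbfs), snr_score(speech_dbfs - noise_dbfs))
```

For instance, speech at -20 dBFS over a noise floor of -30 dBFS passes the level check but has only a 10 dB SN ratio, so `integrated_score(-20.0, -30.0)` returns -1.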

Furthermore, the system 1 of this embodiment allows users to grasp the variation in input voice power among the users in a mutual call.

That is, in a multi-party call system such as a web conference, when the input voice power varies widely among participants, a participant whose voice is much quieter than the average input voice power of all participants is hard to hear, and a participant whose voice is much louder than that average is noisy and grating.

When such variation occurs, it becomes difficult to adjust the output volume on the receiving side. Raising the volume to make a too-quiet participant easier to hear makes a too-loud participant even louder and more grating; lowering the volume to tame a too-loud participant makes a too-quiet participant even quieter and harder to hear. It is therefore important that each speaking user recognize when his or her voice power is too low or too high compared with the average voice power of all participants and adjust it toward the average, thereby eliminating such variation in input voice power.
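The dilemma can be made concrete in the decibel domain. Assuming the receiver has a single output-volume control that applies the same gain to every participant's voice, the gain adds the same number of dB to each level, so the spread between the quietest and loudest participant is unchanged; only a speaker-side adjustment shrinks it. The levels below are illustrative numbers, not measurements.

```python
from typing import Dict


def apply_gain(levels_db: Dict[str, float], gain_db: float) -> Dict[str, float]:
    """Receiver-side volume change: the same gain is added to every participant."""
    return {name: level + gain_db for name, level in levels_db.items()}


def spread_db(levels_db: Dict[str, float]) -> float:
    """Difference between the loudest and quietest participant."""
    return max(levels_db.values()) - min(levels_db.values())


levels = {"quiet": -40.0, "typical": -25.0, "loud": -10.0}
print(spread_db(levels))                    # 30.0
print(spread_db(apply_gain(levels, 10.0)))  # 30.0: louder overall, same spread
# Speaker-side fix: the quiet participant raises their own level by 15 dB.
adjusted = dict(levels, quiet=-25.0)
print(spread_db(adjusted))                  # 15.0
```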

One technique related to variation in the input voice power of users in a mutual call is disclosed in WO 2008/011901. In that technique, at least one voice terminal is associated with a first group; the voice data provided over the respective receive channels is decoded into individual voice signals, the individual voice signals associated with the first group are superimposed to form an aggregate voice signal, and the aggregate voice signal is encoded into first aggregate voice data. Furthermore, at least two voice terminals are associated with a second group, and the first aggregate voice data is supplied to the voice terminals associated with the second group for each output of the first aggregate voice data.

However, when multiple users are in a voice call, even if each user's input voice level exceeds a threshold, variation in the input voice levels among the participating users makes the call hard to listen to.

The technique described above likewise does not consider providing feedback on the variation in input voice levels among the multiple users participating in a call.

In contrast, in the system 1 of this embodiment, the voice power distribution calculation unit 2034 calculates the variation in the input voice levels of the multiple users, that is, the average and variance of the detected input voice power values of the terminal devices 10; the voice power determination unit 2035 determines whether each value deviates from the calculated average by the predetermined value or more; and the resulting fourth determination result is presented to the user (speaker) of the terminal device 10. The user presented with the fourth determination result can thus grasp the variation in input voice power.

<6 Supplementary Notes>
The embodiments above are described in detail to explain the present disclosure clearly, and the disclosure is not necessarily limited to configurations having all of the described elements. Part of the configuration of each embodiment may also be added to, removed, or replaced with another configuration.

As an example, the terminal device 10 may have only the received voice evaluation unit 1855 out of the voice determination unit 185. A configuration having only the received voice evaluation unit 1855 and the call partner received voice state presentation unit 1862 is also possible. Furthermore, the server 20 may lack the voice power distribution calculation unit 2034 and the voice power determination unit 2035; in that case, the terminal device 10 does not have the voice power state presentation unit 1863.

In the embodiment described above, quality evaluation, that is, evaluation from the viewpoints of voice power, SN ratio, and microphone characteristics, is performed on the input voice uttered by the user of the terminal device 10. Alternatively, the voice received from another terminal device 10 on the call may be evaluated: the voice power detection unit 1851, the SN ratio detection unit 1852, and the microphone characteristic detection unit 1853 perform detection and evaluation, and the input voice evaluation unit 1854 calculates an integrated score. The quality evaluation result from the input voice evaluation unit 1854 is then sent via the transmission unit 1822 to the specific terminal device 10 of the call partner, together with the packet-loss-based evaluation result of the received voice evaluation unit 1855. Like the evaluation result of the received voice evaluation unit 1855, the quality evaluation result from the input voice evaluation unit 1854 is sent with an identifier for identifying the terminal device 10 attached.
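The report a receiving terminal sends back can be sketched as a small serializable record carrying the evaluating terminal's identifier alongside both results. The field names, the JSON encoding, and the use of the 0/-1/-2 score scale of FIG. 11 are all assumptions; the source does not specify a message format.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass(frozen=True)
class EvaluationReport:
    evaluator_id: str                    # identifier of the terminal that evaluated the voice
    packet_loss_score: int               # received-voice result (unit 1855)
    quality_score: Optional[int] = None  # integrated score (unit 1854), if evaluated

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "EvaluationReport":
        return cls(**json.loads(text))


report = EvaluationReport("terminal-B", packet_loss_score=-1, quality_score=0)
wire = report.to_json()
print(wire)  # {"evaluator_id": "terminal-B", "packet_loss_score": -1, "quality_score": 0}
assert EvaluationReport.from_json(wire) == report
```

Making `quality_score` optional lets the same record serve the base embodiment, where only the packet-loss-based result is returned.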

FIG. 14 shows a screen in which the speaker's input voice has been quality-evaluated by a receiver's terminal device 10, and the result, received by the speaker's terminal device 10, is displayed on its display 141.

The screen shown in FIG. 14 is similar to that shown in FIG. 11. However, because the speaker's terminal device 10 does not evaluate the quality of the speaker's input voice, the display 141 of the terminal device 10 shows the icons 901 and 902 and the bars 903 for the receivers. Although the screen example in FIG. 14 displays the bars 903, only the icons 901 and 902 may be displayed; the same applies to the screen examples in FIGS. 11 and 12. In addition, although FIG. 14 shows an example in which the speaker's icon 900 is not displayed, the speaker's icon 900 may be displayed.

Further, FIG. 15, like FIG. 12, shows a detailed view of the voice states displayed in FIG. 14. In FIG. 15, tapping a receiver's icon (icon 902 in FIG. 14) displays the score of the speaker's input voice as received by that receiver's terminal device 10 and the packet loss rate on which it is based, as well as the integrated score of the input voice obtained by the quality evaluation on the receiver's side and the determination results, such as input voice power, on which that score is based.

When three or more users are in a simultaneous call, evaluation results concerning the received voice are sent separately from each of the terminal devices 10 on the call. Accordingly, when the call partner received voice state presentation unit 1862 presents evaluation results, it may present all the evaluation results sent from the call partners' terminal devices 10, or it may select and present, based on the evaluation results, those sent from at least one terminal device 10. For example, evaluation results sent from a communication partner whose received voice evaluation is low may be discarded. As another example, because receiving a poor evaluation result means that some call partner finds one's speech hard to hear, only the "somewhat poor" and "poor" evaluation results may be presented. The quality evaluation results may also be presented together with, for example, a user name indicating which terminal device 10 produced them. Evaluation results sent from communication partners whose received voice evaluation is high may also be displayed. In addition, when, for example, five users are in a mutual call, the user name of the user with the highest received voice evaluation or of the user with the lowest may be displayed, or the evaluations may be displayed in ranked order.
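The presentation options above (showing only problem reports, ranking, naming the best and worst partner) can be sketched over a list of per-partner results. The record type, the score scale, and the user names in the usage lines are illustrative assumptions; "User C" is a hypothetical third partner.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

LABELS = {0: "good", -1: "somewhat poor", -2: "poor"}


@dataclass(frozen=True)
class ReceivedVoiceReport:
    user_name: str  # call partner who sent the evaluation
    score: int      # 0 good, -1 somewhat poor, -2 poor


def problems_only(reports: Sequence[ReceivedVoiceReport]) -> List[ReceivedVoiceReport]:
    """Keep only 'somewhat poor' and 'poor' results (partners who hear you badly)."""
    return [r for r in reports if r.score < 0]


def ranked(reports: Sequence[ReceivedVoiceReport]) -> List[ReceivedVoiceReport]:
    """Best evaluation first; sorting is stable, so ties keep their arrival order."""
    return sorted(reports, key=lambda r: r.score, reverse=True)


def best_and_worst(reports: Sequence[ReceivedVoiceReport]) -> Tuple[str, str]:
    """User names with the highest and lowest received-voice evaluation."""
    if not reports:
        raise ValueError("no reports")
    order = ranked(reports)
    return order[0].user_name, order[-1].user_name


reports = [
    ReceivedVoiceReport("Sato", -1),
    ReceivedVoiceReport("Yamada", -2),
    ReceivedVoiceReport("User C", 0),  # hypothetical third partner
]
print([LABELS[r.score] for r in problems_only(reports)])  # ['somewhat poor', 'poor']
print(best_and_worst(reports))                            # ('User C', 'Yamada')
```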

Each of the above configurations, functions, processing units, processing means, and the like may be implemented partly or wholly in hardware, for example by designing them as integrated circuits. The present invention can also be implemented by software program code that realizes the functions of the embodiments. In this case, a storage medium on which the program code is recorded is provided to a computer, and a processor of the computer reads the program code stored in the storage medium. The program code read from the storage medium then itself realizes the functions of the embodiments described above, and the program code and the storage medium storing it constitute the present invention. Examples of storage media for supplying such program code include flexible disks, CD-ROMs, DVD-ROMs, hard disks, SSDs, optical disks, magneto-optical disks, CD-Rs, magnetic tapes, non-volatile memory cards, and ROMs.

The program code that implements the functions described in these embodiments can be written in a wide range of programming or scripting languages, such as assembler, C/C++, Perl, Shell, PHP, and Java (registered trademark).

Furthermore, the software program code that realizes the functions of the embodiments may be distributed over a network and stored in storage means such as a computer's hard disk or memory, or in a storage medium such as a CD-RW or CD-R, and a processor of the computer may read and execute the program code stored in that storage means or storage medium.

The matters described in the above embodiments are appended below.

(Appendix 1)
A program (171) for operating a terminal device (10) that comprises a processor (19) and a memory (15) and is capable of transmitting and receiving voice data to and from another terminal device (10), the program (171) causing the processor (19) to execute: a step (S23) of receiving voice data transmitted from the other terminal device (10); a step (S24) of evaluating the state of the received voice data and outputting it as a first evaluation result; a step (S25) of returning the first evaluation result to the other terminal device (10); and a step (S26) of outputting voice based on the received voice data to a user.
(Appendix 2)
The program according to Appendix 1, further causing execution of: a step (S33) of receiving, from the other terminal device (10), a second evaluation result obtained by evaluating the state of voice data transmitted from the terminal device (10) and received by the other terminal device (10); and a step (S35) of presenting the received second evaluation result to the user.
(Appendix 3)
The program according to Appendix 2, further causing execution of a step (S35) of presenting icons (900 to 902) corresponding to the second evaluation result to the user.
(Appendix 4)
The program according to Appendix 3, wherein, in the step (S35) of presenting the icons (900 to 902) to the user, the display mode of an image relating to the user of the other terminal device (10) is changed according to the second evaluation result.
(Appendix 5)
The program according to Appendix 2, wherein, in the step (S35) of presenting the second evaluation result to the user, a sound corresponding to the second evaluation result is generated.
(Appendix 6)
The program according to any one of Appendices 2 to 5, wherein, in the step (S35) of presenting the second evaluation result to the user, the second evaluation result is not presented when it is better than a predetermined state.
(Appendix 7)
The program according to any one of Appendices 2 to 6, wherein the second evaluation result carries an identifier specifying the other terminal device (10) that received the voice data, and, in the step (S35) of presenting the second evaluation result to the user, the second evaluation result is presented to the user such that the other terminal device (10) can be identified.
(Appendix 8)
The program according to Appendix 7, wherein, in the step (S35) of presenting the second evaluation result to the user, the second evaluation result is presented in association with an image of the user of the other terminal device (10).
(Appendix 9)
The program according to any one of Appendices 2 to 8, wherein, in the step (S23) of receiving voice data, voice data that has been encoded and converted into packets is received and decoded, and, in the step (S24) of outputting the first evaluation result, the state of the voice data is evaluated based on the packet loss rate during decoding.
(Appendix 10)
The program according to Appendix 9, wherein, in the step (S35) of presenting the second evaluation result to the user, the packet loss rate in communication with the other terminal device (10) is presented to the user as the second evaluation result.
(Appendix 11)
The program according to any one of Appendices 2 to 8, wherein, in the step (S24) of outputting the first evaluation result, the state of the voice data is evaluated using voice activity detection.
(Appendix 12)
The program according to any one of Appendices 2 to 8, wherein, in the step (S24) of outputting the first evaluation result, the state of the voice data is evaluated based on the quality of the voice data.
(Appendix 13)
The program according to Appendix 12, wherein, in the step (S35) of presenting the second evaluation result to the user, the quality of voice data based on the voice uttered by the user is presented to the user as the second evaluation result.
(Appendix 14)
The program according to Appendix 13, wherein, in the step (S33) of receiving the second evaluation result, a plurality of second evaluation results are received from a plurality of other terminal devices (10), and, in the step (S35) of presenting the second evaluation results to the user, all but at least one of the second evaluation results are removed.
(Appendix 15)
The program according to any one of Appendices 1 to 14, wherein, in the step (S23) of receiving voice data, voice data to which a first identifier specifying the other terminal device (10) is attached is received; in the step (S24) of outputting the first evaluation result, a second identifier specifying the terminal device (10) is attached to the first evaluation result; and, in the step (S25) of returning the first evaluation result to the other terminal device (10), the first evaluation result bearing the second identifier is transmitted with the other terminal device (10) specified by the first identifier as its destination.
(Appendix 16)
The program according to any one of Appendices 1 to 15, further causing the processor to execute a step (S17) of evaluating the quality of voice data based on the voice uttered by the user of the terminal device (10), and a step (S18) of presenting to the user a third evaluation result, which is the evaluation result of the quality of that voice data.
(Appendix 17)
The program according to Appendix 16, wherein in the step of evaluating the quality of the voice data (S17), the quality of the voice data is evaluated based on detection of characteristics of the voice uttered by the user of the terminal device (10), namely the voice power, the signal-to-noise (SN) ratio, the microphone characteristics, or a combination of at least two of these.
(Appendix 18)
The program according to Appendix 17, wherein in the step of evaluating the quality of voice data (S17), the characteristics are detected each time voice data is input from the user of the terminal device (10), and the last detected values of the characteristics are held when the input of voice data stops.
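The hold behavior in Appendix 18 can be modeled as a tracker that overwrites its stored characteristics whenever new voice data yields a detection and otherwise returns the last stored values. A sketch with hypothetical names, assuming the detector reports `None` for intervals without voice input:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Characteristics:
    power_db: float  # detected voice power
    snr_db: float    # detected signal-to-noise ratio


class CharacteristicTracker:
    """Holds the most recently detected characteristics while the user is silent."""

    def __init__(self) -> None:
        self.last: Optional[Characteristics] = None

    def update(self, detected: Optional[Characteristics]) -> Optional[Characteristics]:
        # `detected` is None when no voice data was input during this interval.
        if detected is not None:
            self.last = detected
        return self.last
```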
(Appendix 19)
The program according to Appendix 17 or 18, wherein in the step of evaluating the quality of the audio data (S17), the quality of the audio data is evaluated by calculating a score based on the characteristic detection result.
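Appendix 19 calculates a score from the detection results but gives no formula. The sketch below assumes a weighted sum of normalized voice power and SN ratio plus a pass/fail microphone check, scaled to 0 to 100. The weights, ranges, and the boolean `mic_ok` flag are all assumptions made for illustration.

```python
def _normalize(value: float, low: float, high: float) -> float:
    """Linearly map value to [0, 1]: 0 at or below `low`, 1 at or above `high`."""
    return max(0.0, min(1.0, (value - low) / (high - low)))


def quality_score(power_db: float, snr_db: float, mic_ok: bool,
                  power_range: tuple[float, float] = (-40.0, -10.0),
                  snr_range: tuple[float, float] = (10.0, 30.0)) -> int:
    """Combine detected characteristics into a 0-100 score (weights and ranges are hypothetical)."""
    p = _normalize(power_db, *power_range)
    s = _normalize(snr_db, *snr_range)
    m = 1.0 if mic_ok else 0.0
    return round(100 * (0.4 * p + 0.4 * s + 0.2 * m))
```

A voice at -25 dB with a 20 dB SN ratio and an acceptable microphone lands halfway on both ranges and scores 60.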
(Appendix 20)
The program according to any one of Appendices … to 19, wherein in the step (S18) of presenting the third evaluation result to the user of the terminal device (10), the third evaluation result is not presented unless voice data is input from the user of the terminal device (10).
(Appendix 21)
The program according to any one of Appendices 17 to 19, wherein in the step (S18) of presenting the third evaluation result to the user of the terminal device (10), the detection result of the voice power, the SN ratio, the microphone characteristics, or a combination of at least two of these is presented to the user of the terminal device (10) as the third evaluation result.
(Appendix 22)
A program (2021) for operating a server (20) that comprises a processor (29) and a memory (25) and transmits and receives voice data between a plurality of terminal devices (10), the program (2021) causing the processor (29) to execute: a step (S47, S48) of detecting the voice power of the input voice for each terminal device (10); a step (S49) of calculating the average value and variance of the voice power based on the detection results; a step (S50) of determining, based on the calculated average value and variance, whether the difference between the voice power of each speaker using the plurality of terminal devices (10) and the average value is equal to or greater than a predetermined value; and a step (S51) of presenting a fourth determination result to each speaker whose voice power is determined to differ from the average value by the predetermined value or more.
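Appendix 22 compares each speaker's voice power with the group average "based on the calculated average value and variance" without fixing how the variance enters the comparison. One plausible reading, assumed here, expresses the predetermined value as k standard deviations, which makes the check independent of the absolute level. The function name, the labels, and the choice of k are hypothetical; speakers who are connected but silent (Appendix 27) are assumed to be left out of the input.

```python
import math
import statistics


def speakers_to_notify(power_db: dict[str, float], k: float = 1.0) -> dict[str, str]:
    """Flag speakers whose voice power deviates from the average by at least k standard deviations.

    Treating the "predetermined value" as k * sqrt(variance) is an assumption of this sketch.
    """
    if len(power_db) < 2:
        return {}
    values = list(power_db.values())
    mean = statistics.fmean(values)
    std = math.sqrt(statistics.pvariance(values, mu=mean))
    if std == 0.0:
        return {}  # everyone is equally loud; nothing to report
    flagged = {}
    for speaker, power in power_db.items():
        diff = power - mean
        if abs(diff) >= k * std:
            flagged[speaker] = "too_loud" if diff > 0 else "too_quiet"
    return flagged
```

With only two speakers, both always sit exactly one standard deviation from the average, so k should be chosen with the group size in mind.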
(Appendix 23)
The program according to Appendix 22, wherein in the step of detecting voice power (S47, S48), the voice power is detected each time voice is input from a terminal device (10), and the last detected value is held when voice input is interrupted.
(Appendix 24)
The program according to Appendix 22 or 23, wherein in the step of calculating the average value and variance of the voice power (S49), the average value and variance of the voice power are calculated at intervals of seconds.
(Appendix 25)
The program according to any one of Appendices 22 to 24, wherein in the step (S50) of determining whether or not the difference between the voice power and the average value is equal to or greater than a predetermined value, the determination is made at intervals of seconds.
(Appendix 26)
The program according to any one of Appendices 22 to 25, wherein the voice data is attached with an identifier specifying the terminal device (10) that sent the voice data, and in the step of calculating the average value and variance of the voice power (S49), the average value and variance are recalculated when it is detected, based on the identifier, that the current speaker has changed.
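Appendices 23 and 26 together suggest storing the last detected voice power per terminal and recalculating the statistics whenever the sender identifier shows that the active speaker has changed. A sketch under that reading, with hypothetical class and method names:

```python
import statistics
from typing import Optional


class PowerStatistics:
    """Per-terminal voice power, with average and variance refreshed when the speaker changes."""

    def __init__(self) -> None:
        self.last_power: dict[str, float] = {}  # last detected value, held while a terminal is silent
        self.current_speaker: Optional[str] = None
        self.mean: Optional[float] = None
        self.variance: Optional[float] = None

    def on_voice(self, terminal_id: str, power_db: float) -> bool:
        """Record a detection tagged with the sender's identifier; return True if stats were recalculated."""
        self.last_power[terminal_id] = power_db
        if terminal_id == self.current_speaker:
            return False  # same speaker: keep the current statistics
        self.current_speaker = terminal_id
        values = list(self.last_power.values())
        self.mean = statistics.fmean(values)
        self.variance = statistics.pvariance(values, mu=self.mean)
        return True
```

Appendices 24 and 25 also allow periodic recalculation; a timer calling the same computation could be added alongside the speaker-change trigger.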
(Appendix 27)
The program according to any one of Appendices 22 to 26, wherein in the step of presenting the fourth determination result (S51), the fourth determination result is not presented if there is a speaker who is connected to the communication line but does not speak.
(Appendix 28)
The program according to any one of Appendices 22 to 27, wherein in the step of presenting the fourth determination result (S51), a solution is presented together with the fourth determination result.
(Appendix 29)
A device (10) comprising a processor (19) and a memory (15) and capable of transmitting and receiving audio data to and from another device (10), wherein the processor (19) executes: a step (S23) of receiving voice data transmitted from the other device (10); a step (S24) of evaluating the state of the received voice data and outputting it as a first evaluation result; a step (S25) of returning the first evaluation result to the other device (10); and a step (S26) of outputting voice based on the received voice data to the user.
(Appendix 30)
A method performed by a computer (10) comprising a processor (19) and a memory (15) and capable of transmitting and receiving audio data to and from another computer (10), wherein the processor (19) executes: a step (S23) of receiving voice data transmitted from the other computer (10); a step (S24) of evaluating the state of the received voice data and outputting it as a first evaluation result; a step (S25) of returning the first evaluation result to the other computer (10); and a step (S26) of outputting voice based on the received voice data to the user.
(Appendix 31)
A system (1) capable of transmitting and receiving voice data between a terminal device (10) and another terminal device (10), wherein the terminal device (10) comprises: means (1823) for receiving voice data transmitted from the other terminal device (10); means (1855) for evaluating the state of the received voice data and outputting it as a first evaluation result; means (1822) for returning the first evaluation result to the other terminal device (10); and means (184) for outputting voice based on the received voice data to the user.
(Appendix 32)
A device (20) comprising a processor (29) and a memory (25), for transmitting and receiving voice data between a plurality of terminal devices (10), wherein the processor (29) executes: a step (S47, S48) of detecting the voice power of the input voice for each terminal device (10); a step (S49) of calculating the average value and variance of the voice power based on the detection results; a step (S50) of determining, based on the calculated average value and variance, whether the difference between the voice power of each speaker using the plurality of terminal devices (10) and the average value is equal to or greater than a predetermined value; and a step (S51) of presenting a fourth determination result to each speaker whose voice power is determined to differ from the average value by the predetermined value or more.
(Appendix 33)
A method performed by a computer (20) comprising a processor (29) and a memory (25), for transmitting and receiving audio data between a plurality of terminal devices (10), wherein the processor (29) executes: a step (S47, S48) of detecting the voice power of the input voice for each terminal device (10); a step (S49) of calculating the average value and variance of the voice power based on the detection results; a step (S50) of determining, based on the calculated average value and variance, whether the difference between the voice power of each speaker using the plurality of terminal devices (10) and the average value is equal to or greater than a predetermined value; and a step (S51) of presenting a fourth determination result to each speaker whose voice power is determined to differ from the average value by the predetermined value or more.
(Appendix 34)
A system (1) having a plurality of terminal devices (10) and a server (20) for transmitting and receiving voice data between the plurality of terminal devices (10), wherein the server (20) comprises: means (2031, 2033) for detecting the voice power of the input voice for each terminal device (10); means (2034) for calculating the average value and variance of the voice power based on the detection results; means (2035) for determining, based on the calculated average value and variance, whether the difference between the voice power of each speaker using the plurality of terminal devices (10) and the average value is equal to or greater than a predetermined value; and means (2031, 2033) for presenting a fourth determination result to each speaker whose voice power is determined to differ from the average value by the predetermined value or more.

DESCRIPTION OF SYMBOLS: 1... System, 10... Terminal device, 20... Server, 141... Display, 142... Speaker, 170... Storage unit, 171... Application, 172... Detection result data, 173... Evaluation table, 180... Control unit, 182... Communication control unit, 183... Voice input unit, 184... Voice output unit, 185... Voice determination unit, 186... Presentation control unit, 202... Storage unit, 203... Control unit, 900 to 902... Icons, 903... Bar, 1100... Dialog, 1821... Encoding processing unit, 1822... Transmission unit, 1823... Reception unit, 1824... Decoding processing unit, 1851... Voice power detection unit, 1852... SN ratio detection unit, 1853... Microphone characteristic detection unit, 1854... Input voice evaluation unit, 1855... Received voice evaluation unit, 1861... Input voice state presentation unit, 1862... Call partner received voice state presentation unit, 1863... Voice power state presentation unit, 1864... Input voice evaluation unit, 2021... Application, 2022... Calculation result data, 2031... Transmission/reception unit, 2032... Storage control unit, 2033... Communication control unit, 2034... Voice power distribution calculation unit, 2035... Voice power determination unit

Claims

1. A program for operating a terminal device comprising a processor and a memory and capable of transmitting and receiving audio data to and from another terminal device,
wherein the program causes the processor to execute:
a step of receiving the audio data transmitted from the other terminal device;
a step of evaluating the state of the received audio data and outputting it as a first evaluation result;
a step of returning the first evaluation result to the other terminal device; and
a step of outputting voice based on the received audio data to a user.

2. The program according to claim 1, further causing the processor to execute:
a step of receiving, from the other terminal device, a second evaluation result obtained by the other terminal device evaluating the state of the audio data transmitted from the terminal device and received by the other terminal device; and
a step of presenting the received second evaluation result to the user.

3. The program according to claim 2, further causing the processor to execute a step of presenting to the user an icon corresponding to the second evaluation result.

4. The program according to claim 3, wherein in the step of presenting the icon to the user, the display mode of an image associated with the user of the other terminal device is changed according to the second evaluation result.

5. The program according to claim 2, wherein in the step of presenting the second evaluation result to the user, a sound corresponding to the second evaluation result is generated.

6. The program according to claim …, wherein in the step of presenting the second evaluation result to the user, the second evaluation result is not presented if it is better than a predetermined condition.
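Claims 3 and 6 combine naturally: map the second evaluation result to an icon, but present nothing when the result is better than the predetermined condition. The sketch assumes a 0 to 100 score and treats "better than the predetermined condition" as a score at or above a threshold; the thresholds and icon names are hypothetical.

```python
from typing import Optional


def icon_for_result(score: float, ok_threshold: float = 80.0,
                    warn_threshold: float = 50.0) -> Optional[str]:
    """Return an icon name for a second evaluation result, or None when nothing should be shown.

    Scores at or above `ok_threshold` are treated as better than the predetermined
    condition, so no icon is presented. Thresholds and icon names are hypothetical.
    """
    if score >= ok_threshold:
        return None
    if score >= warn_threshold:
        return "warning"
    return "error"
```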

7. The program according to any one of claims 2 to 6, wherein the second evaluation result is attached with an identifier that identifies the other terminal device that received the audio data, and
in the step of presenting the second evaluation result to the user, the second evaluation result is presented to the user such that the other terminal device from which it originates can be identified.

8. The program according to claim 7, wherein in the step of presenting the second evaluation result to the user, the second evaluation result is presented in association with an image of the user of the other terminal device.

9. The program according to any one of claims 2 to 8, wherein the step of receiving the audio data includes receiving audio data that has been encoded and packetized and decoding the audio data, and
in the step of outputting as the first evaluation result, the state of the audio data is evaluated based on the packet loss rate during the decoding.

10. The program according to claim 9, wherein in the step of presenting the second evaluation result to the user, the packet loss rate in communication with the other terminal device is presented to the user as the second evaluation result.

11. The program according to any one of claims 2 to 8, wherein the step of outputting the first evaluation result uses speech segment detection to evaluate the state of the audio data.

12. The program according to any one of claims 2 to 8, wherein in the step of outputting the first evaluation result, the state of the audio data is evaluated based on the quality of the audio data.

13. The program according to claim 12, wherein in the step of presenting the second evaluation result to the user, the quality of the voice data based on the voice uttered by the user is presented to the user as the second evaluation result.

14. The program according to claim 13, wherein in the step of receiving the second evaluation result, evaluations of the state of the audio data based on the quality of the audio data are received from a plurality of the other terminal devices, and
in the step of presenting the second evaluation results to the user, all but at least one of the second evaluation results are removed.

15. The program according to any one of claims 1 to 14, wherein in the step of receiving the audio data, audio data to which a first identifier specifying the other terminal device is attached is received;
in the step of outputting as the first evaluation result, a second identifier specifying the terminal device is attached to the first evaluation result; and
in the step of returning the first evaluation result to the other terminal device, the first evaluation result with the second identifier attached is transmitted to the other terminal device specified by the first identifier.

16. The program according to any one of claims 1 to 15, further causing the processor to execute:
a step of evaluating the quality of the voice data based on the voice uttered by the user of the terminal device; and
a step of presenting to the user a third evaluation result, which is the evaluation result of the quality of that voice data.

17. The program according to claim 16, wherein in the step of evaluating the quality of the voice data, the quality of the voice data is evaluated based on detection of characteristics of the voice uttered by the user of the terminal device, namely the voice power, the signal-to-noise ratio, the microphone characteristics, or a combination of at least two of these.

18. The program according to claim 17, wherein in the step of evaluating the quality of the voice data, the characteristics are detected each time voice data is input from the user of the terminal device, and the last detected values of the characteristics are used for the evaluation when the input of voice data stops.

19. The program according to claim 17, wherein in the step of evaluating the quality of the audio data, the quality of the audio data is evaluated by calculating a score based on the detection result of the characteristics.

20. The program according to any one of claims … to 19, wherein in the step of presenting the third evaluation result to the user of the terminal device, the third evaluation result is not presented unless voice data is input from the user of the terminal device.

21. The program according to any one of claims 17 to 19, wherein in the step of presenting the third evaluation result to the user of the terminal device, the detection result of the voice power, the SN ratio, the microphone characteristics, or a combination of at least two of these is presented to the user of the terminal device as the third evaluation result.

22. A program for operating a server that includes a processor and a memory and transmits and receives audio data between a plurality of terminal devices,
wherein the program causes the processor to execute:
a step of detecting the voice power of the input audio for each of the terminal devices;
a step of calculating an average value and a variance of the voice power based on the detection results;
a step of determining, based on the calculated average value and variance, whether the difference between the voice power of each speaker using the plurality of terminal devices and the average value is equal to or greater than a predetermined value; and
a step of presenting a fourth determination result to each speaker whose voice power is determined to differ from the average value by the predetermined value or more.

23. The program according to claim 22, wherein in the step of detecting the voice power, the voice power is detected each time voice is input from the terminal device, and the last detected value is held when the voice input is stopped.

24. The program according to claim 22 or 23, wherein in the step of calculating the mean value and the variance of the voice power, the mean value and the variance of the voice power are calculated at intervals of seconds.

25. The program according to any one of claims 22 to 24, wherein in the step of determining whether or not the difference between said voice power and said average value is equal to or greater than a predetermined value, determination is made at intervals of seconds.

26. The program according to any one of claims 22 to 25, wherein the audio data is attached with an identifier that identifies the terminal device that sent the audio data, and
in the step of calculating the average value and the variance of the voice power, the average value and the variance are recalculated when it is detected, based on the identifier, that the speaker currently speaking has changed.

27. The program according to any one of claims 22 to 26, wherein in the step of presenting the fourth determination result, the fourth determination result is not presented if there is a speaker who is connected to the communication line but does not speak.

28. The program according to any one of claims 22 to 27, wherein in the step of presenting the fourth determination result, a solution is presented together with the fourth determination result.

29. A device comprising a processor and a memory and capable of transmitting and receiving audio data to and from another device,
wherein the processor executes:
a step of receiving the audio data transmitted from the other device;
a step of evaluating the state of the received audio data and outputting it as a first evaluation result;
a step of returning the first evaluation result to the other device; and
a step of outputting voice based on the received audio data to a user.

30. A method performed by a computer that comprises a processor and a memory and is capable of transmitting and receiving audio data to and from another computer,
wherein the processor executes:
a step of receiving the audio data transmitted from the other computer;
a step of evaluating the state of the received audio data and outputting it as a first evaluation result;
a step of returning the first evaluation result to the other computer; and
a step of outputting voice based on the received audio data to a user.

31. A system capable of transmitting and receiving audio data between a terminal device and another terminal device,
wherein the terminal device comprises:
means for receiving the audio data transmitted from the other terminal device;
means for evaluating the state of the received audio data and outputting it as a first evaluation result;
means for returning the first evaluation result to the other terminal device; and
means for outputting voice based on the received audio data to a user.

32. A device for transmitting and receiving voice data between a plurality of terminal devices, comprising a processor and a memory,
wherein the processor executes:
a step of detecting the voice power of the input audio for each of the terminal devices;
a step of calculating an average value and a variance of the voice power based on the detection results;
a step of determining, based on the calculated average value and variance, whether the difference between the voice power of each speaker using the plurality of terminal devices and the average value is equal to or greater than a predetermined value; and
a step of presenting a fourth determination result to each speaker whose voice power is determined to differ from the average value by the predetermined value or more.

33. A method performed by a computer that comprises a processor and a memory and transmits and receives audio data between a plurality of terminal devices,
wherein the processor executes:
a step of detecting the voice power of the input audio for each of the terminal devices;
a step of calculating an average value and a variance of the voice power based on the detection results;
a step of determining, based on the calculated average value and variance, whether the difference between the voice power of each speaker using the plurality of terminal devices and the average value is equal to or greater than a predetermined value; and
a step of presenting a fourth determination result to each speaker whose voice power is determined to differ from the average value by the predetermined value or more.

34. A system having a plurality of terminal devices and a server for transmitting and receiving voice data between the plurality of terminal devices,
wherein the server comprises:
means for detecting the voice power of the input audio for each terminal device;
means for calculating an average value and a variance of the voice power based on the detection results;
means for determining, based on the calculated average value and variance, whether the difference between the voice power of each speaker using the plurality of terminal devices and the average value is equal to or greater than a predetermined value; and
means for presenting a fourth determination result to each speaker whose voice power is determined to differ from the average value by the predetermined value or more.