JP5929810B2

JP5929810B2 - Voice analysis system, voice terminal apparatus and program

Info

Publication number: JP5929810B2
Application number: JP2013066881A
Authority: JP
Inventors: 米山 博人; 飯田 靖; 西野 洋平; 藤居 徹; 下谷 啓; 原田 陽雄
Original assignee: Fuji Xerox Co Ltd; Fujifilm Business Innovation Corp
Current assignee: Fujifilm Business Innovation Corp
Priority date: 2013-03-27
Filing date: 2013-03-27
Publication date: 2016-06-08
Anticipated expiration: 2033-03-27
Also published as: JP2014191201A

Description

The present invention relates to a voice analysis system, a voice terminal device, and a program.

Patent Literature 1 discloses a communication detection system that has a plurality of mobile stations and a base station that transmits and receives information to and from the mobile stations by wireless communication, and that detects the communication state of a plurality of employees each carrying one of the mobile stations. The system detects the position of each mobile station based on radio signals transmitted from the antennas of the mobile stations, records communication elements relating to the employee of each mobile station based on the detection results, and calculates the level of communication activity between employees based on the recorded contents.

JP 2010-21700 A

An object of the present invention is to improve determination accuracy when determining the dialogue relationship between wearers from information on the voices acquired by the voice acquisition means of a plurality of wearers.

The invention according to claim 1 is a voice analysis system comprising: a plurality of voice acquisition means that are arranged at positions at different distances from a wearer's utterance site and that acquire a speaker's voice; identification means for identifying, based on the sound pressure of the voice acquired by at least two of the voice acquisition means, whether the speaker is the wearer or a person other than the wearer; utterance signal transmitting means for transmitting, when the identification means identifies the speaker as the wearer, an utterance signal relating to the wearer's utterance at a radio wave intensity based on the sound pressure of the wearer's voice acquired by the voice acquisition means; utterance signal receiving means for receiving the utterance signal transmitted from the utterance signal transmitting means; and dialogue relationship determination means for determining the dialogue relationship of the wearer based on the reception status of the utterance signal at the utterance signal receiving means and the identification result of the identification means.
The invention according to claim 2 is the voice analysis system according to claim 1, further comprising: reception information transmitting means for transmitting, when the utterance signal is received by the utterance signal receiving means, reception information based on the reception of the utterance signal; and reception information acquisition means for acquiring the reception information transmitted from the reception information transmitting means, wherein the dialogue relationship determination means determines the dialogue relationship of the wearer based on the identification result of the identification means and the reception information acquired by the reception information acquisition means.

The invention according to claim 3 is a voice analysis system comprising: a plurality of voice terminal devices, each including a plurality of voice acquisition means that are arranged at positions at different distances from a wearer's utterance site and that acquire a speaker's voice, identification means for identifying, based on the sound pressure of the voice acquired by at least two of the voice acquisition means, whether the speaker is the wearer or a person other than the wearer, and communication means for communicating with the outside via a wireless communication line based on the identification result of the identification means; and a voice analysis device including dialogue relationship determination means for determining the dialogue relationship between the wearers of the voice terminal devices, wherein the communication means of each voice terminal device includes utterance signal transmitting means for transmitting, when the identification means identifies the speaker as the wearer, an utterance signal relating to the wearer's utterance at a radio wave intensity based on the sound pressure of the wearer's voice acquired by the voice acquisition means, and utterance signal receiving means for receiving an utterance signal transmitted from the utterance signal transmitting means of another voice terminal device, and wherein the dialogue relationship determination means of the voice analysis device determines the dialogue relationship based on the identification result of the identification means of each voice terminal device and the reception status of the utterance signal at the utterance signal receiving means of each voice terminal device.
The invention according to claim 4 is the voice analysis system according to claim 3, wherein the dialogue relationship determination means of the voice analysis device determines the dialogue relationship between, among the wearers of the plurality of voice terminal devices, the wearer of the voice terminal device that transmitted the utterance signal by the utterance signal transmitting means and the wearer of the voice terminal device that received the utterance signal by the utterance signal receiving means.
The invention according to claim 5 is the voice analysis system according to claim 4, wherein the dialogue relationship determination means of the voice analysis device determines the dialogue relationship by comparing the synchrony between the voice acquired by the voice acquisition means of the voice terminal device that transmitted the utterance signal by the utterance signal transmitting means and the voice acquired by the voice acquisition means of the voice terminal device that received the utterance signal by the utterance signal receiving means.

The invention according to claim 6 is a voice terminal device comprising: a plurality of voice acquisition means that are arranged at positions at different distances from a wearer's utterance site and that acquire a speaker's voice; identification means for identifying, based on the sound pressure of the voice acquired by at least two of the voice acquisition means, whether the speaker is the wearer or a person other than the wearer; utterance signal transmitting means for transmitting, when the identification means identifies the speaker as the wearer, an utterance signal relating to the wearer's utterance at a radio wave intensity based on the sound pressure of the wearer's voice acquired by the voice acquisition means; and utterance signal receiving means for receiving an utterance signal relating to another person's utterance.
The invention according to claim 7 is the voice terminal device according to claim 6, further comprising utterance information transmitting means for transmitting utterance information that includes information on the voice acquired by the voice acquisition means and the identification result of the identification means.
The invention according to claim 8 is the voice terminal device according to claim 6 or 7, wherein the utterance signal receiving means receives an utterance signal relating to another person's utterance transmitted by the utterance signal transmitting means of another device worn by that other person.
The invention according to claim 9 is the voice terminal device according to any one of claims 6 to 8, wherein the utterance signal transmitting means transmits the utterance signal at a higher radio wave intensity as the sound pressure of the wearer's voice acquired by the voice acquisition means is larger.
The invention according to claim 10 is a program that causes a computer to realize: a function of acquiring voice information from a plurality of voice acquisition means that are arranged at positions at different distances from a wearer's utterance site and that acquire a speaker's voice; a function of identifying, based on the sound pressure difference between the voices acquired by at least two of the voice acquisition means, whether the speaker is the wearer or a person other than the wearer; a function of transmitting, when the speaker is identified as the wearer, an utterance signal relating to the wearer's utterance at a radio wave intensity based on the sound pressure of the wearer's voice acquired by the voice acquisition means; and a function of receiving an utterance signal relating to another person's utterance.
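The relationship stated in claim 9 (a larger sound pressure yields a larger radio wave intensity) can be illustrated with a minimal sketch. The function name, the decibel ranges, and the linear interpolation below are assumptions chosen for illustration only; the claims require nothing beyond a monotonically increasing relationship.

```python
def tx_power_dbm(sound_pressure_db: float,
                 spl_min: float = 50.0, spl_max: float = 80.0,
                 power_min: float = -20.0, power_max: float = 0.0) -> float:
    """Map the wearer's sound pressure level to a transmit power.

    All four range parameters are hypothetical values, not taken from the patent.
    """
    # Clamp the input so out-of-range levels stay within the transmitter's limits.
    clamped = min(max(sound_pressure_db, spl_min), spl_max)
    # Linear interpolation: a louder voice gives a stronger utterance signal.
    fraction = (clamped - spl_min) / (spl_max - spl_min)
    return power_min + fraction * (power_max - power_min)
```

With a mapping of this kind, a quietly spoken utterance would be transmitted weakly and would tend to reach only nearby terminal devices, which is the property the reception status relies on.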

According to the invention of claim 1, compared with a case where this configuration is not adopted, determination accuracy can be improved when determining the dialogue relationship between wearers from information on the voices acquired by the voice acquisition means of a plurality of wearers.
According to the invention of claim 2, compared with a case where this configuration is not adopted, erroneous determinations can be suppressed when determining the dialogue relationship between wearers.
According to the invention of claim 3, compared with a case where this configuration is not adopted, determination accuracy can be improved when determining the dialogue relationship between wearers from information on the voices acquired by the voice acquisition means of a plurality of wearers.
According to the invention of claim 4, compared with a case where this configuration is not adopted, the processing for determining the dialogue relationship can be kept from becoming complicated.
According to the invention of claim 5, compared with a case where this configuration is not adopted, erroneous determinations can be suppressed when determining the dialogue relationship between wearers.

According to the invention of claim 6, compared with a case where this configuration is not adopted, determination accuracy can be improved when determining the dialogue relationship between wearers from information on the voices acquired by the voice acquisition means of a plurality of wearers.
According to the invention of claim 7, compared with a case where this configuration is not adopted, the dialogue relationship between wearers can be determined more easily.
According to the invention of claim 8, compared with a case where this configuration is not adopted, erroneous determinations can be suppressed when determining the dialogue relationship between wearers.
According to the invention of claim 9, compared with a case where this configuration is not adopted, the processing for determining the dialogue relationship can be kept from becoming complicated.
According to the invention of claim 10, compared with a case where this configuration is not adopted, determination accuracy can be improved when determining the dialogue relationship between wearers from information on the voices acquired by the voice acquisition means of a plurality of wearers.

FIG. 1 is a diagram showing a configuration example of a voice analysis system to which the present embodiment is applied.
FIG. 2 is a diagram showing a configuration example of the terminal device.
FIG. 3 is a diagram showing a situation in which a plurality of wearers, each wearing a terminal device of the present embodiment, are in dialogue.
FIG. 4 is a diagram showing an example of the utterance information of each terminal device in the dialogue situation of FIG. 3.
FIG. 5 is a flowchart showing processing executed by a terminal device to which the present embodiment is applied.
FIG. 6 is a diagram showing the relationship between the distance between two persons in dialogue and the sound pressure of the wearer's uttered voice, and the relationship between the distance between two persons in dialogue and the radio wave intensity of the utterance signal transmitted from the terminal device.
FIG. 7 is a flowchart showing processing executed by a host device to which the present embodiment is applied.
FIG. 8 is a diagram for explaining in detail the processing executed by the voice analysis system to which the present embodiment is applied.
FIG. 9 is a diagram showing the arrangement of wearers in Example 1 and Comparative Example 1.
FIG. 10 is a diagram showing the reception status of utterance signals in Example 1 and Comparative Example 1.

Embodiments of the present invention will be described below in detail with reference to the accompanying drawings.
<System configuration example>
FIG. 1 is a diagram showing a configuration example of a voice analysis system to which the present embodiment is applied.
As shown in FIG. 1, the voice analysis system 1 of the present embodiment includes a terminal device 10 as an example of a voice terminal device and a host device 20 as an example of a voice analysis device. The terminal device 10 and the host device 20 are connected via a wireless communication line. The wireless communication line may use an existing scheme such as Wi-Fi (Wireless Fidelity), Bluetooth (registered trademark), ZigBee, or UWB (Ultra Wideband). Although only one terminal device 10 is shown in the illustrated example, each user wears and uses a terminal device 10, as described in detail later, so in practice as many terminal devices 10 as there are users are prepared. Hereinafter, a user wearing a terminal device 10 is referred to as a wearer.

The terminal device 10 includes, as voice acquisition means for acquiring voice, a plurality of microphones (a first microphone 11 and a second microphone 12) and amplifiers (a first amplifier 13 and a second amplifier 14). The terminal device 10 also includes a voice analysis unit 15 that analyzes the acquired voice and a data transmission unit 16 that transmits the analysis result to the host device 20.
The terminal device 10 further includes a signal receiving unit 17 for receiving an utterance signal (described later) output from another terminal device 10.
In addition, the terminal device 10 includes a power supply unit 18 for supplying power to each unit of the terminal device 10.
In the terminal device 10 of the present embodiment, the data transmission unit 16 and the signal receiving unit 17 constitute the communication means.

The first microphone 11 and the second microphone 12 are arranged at positions at different distances from the wearer's mouth (utterance site). In the following description, they may be referred to as microphones 11 and 12 when they need not be distinguished. Here, the first microphone 11 is placed far from the wearer's mouth (for example, about 35 cm), and the second microphone 12 is placed near the wearer's mouth (for example, about 10 cm). Various existing types of microphone, such as dynamic and condenser types, may be used as the first microphone 11 and the second microphone 12 of the present embodiment. An omnidirectional MEMS (Micro Electro Mechanical Systems) microphone is particularly preferable.

The first amplifier 13 and the second amplifier 14 amplify the electrical signals (voice signals) that the first microphone 11 and the second microphone 12, respectively, output in response to the acquired voice. Existing operational amplifiers or the like may be used as the first amplifier 13 and the second amplifier 14 of the present embodiment.

The voice analysis unit 15 is an example of the identification means and analyzes the voice signals output from the first amplifier 13 and the second amplifier 14. It then identifies whether the voice acquired by the first microphone 11 and the second microphone 12 was uttered by the wearer of the terminal device 10 or by another person (self-other identification). The specific processing for this identification is described later.
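As a rough illustration of self-other identification from two microphones, the sketch below compares the sound pressure at the near and far microphones. It is an assumption-laden example, not the processing of the embodiment: the RMS amplitude stands in for sound pressure, the names `rms` and `is_wearer` are hypothetical, and the threshold of 2.0 is an arbitrary illustrative value. If sound pressure falls off roughly in inverse proportion to distance (a free-field assumption not stated in the text), microphones at about 10 cm and 35 cm from the wearer's mouth would give a near/far ratio of about 3.5 for the wearer's own voice, while a distant speaker would give a ratio close to 1.

```python
import math
from typing import Sequence

def rms(samples: Sequence[float]) -> float:
    """Root-mean-square amplitude, used here as a stand-in for sound pressure."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def is_wearer(far_mic: Sequence[float], near_mic: Sequence[float],
              ratio_threshold: float = 2.0) -> bool:
    """Return True when the near/far sound-pressure ratio suggests the wearer spoke.

    ratio_threshold is a hypothetical value chosen for illustration.
    """
    far = rms(far_mic)
    if far == 0.0:
        return False  # no signal on the far microphone, so nothing to compare
    return rms(near_mic) / far >= ratio_threshold
```

A ratio is used rather than an absolute level so that the decision does not depend on how loudly the person happens to speak.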

The data transmission unit 16 transmits acquired data, including the result of the utterance analysis by the voice analysis unit 15 (the result of self-other identification), together with the terminal ID of the terminal device 10 to the host device 20 via the wireless communication line described above. Depending on the processing performed in the host device 20, the information transmitted to the host device 20 may include, in addition to the analysis result, information such as the time at which the first microphone 11 and the second microphone 12 acquired the voice and the sound pressure of the acquired voice. In the present embodiment, the data transmitted from the data transmission unit 16 to the host device 20 is referred to as utterance information.
The terminal device 10 may also be provided with a data storage unit that accumulates utterance information such as the analysis results of the voice analysis unit 15, and the data (utterance information) stored over a certain period may be transmitted in a batch. The utterance information may also be transmitted from the data transmission unit 16 to the host device 20 over a wired line.
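The utterance information and its batch transmission can be pictured with a small data-structure sketch. The class names, field names, and units below are assumptions made for illustration; the text specifies only that the data may contain the terminal ID, the self-other identification result, the acquisition time, and the sound pressure.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class UtteranceInfo:
    terminal_id: str       # ID of the sending terminal device
    is_wearer: bool        # self-other identification result
    acquired_at: float     # acquisition time of the voice (seconds, assumed unit)
    sound_pressure: float  # sound pressure of the acquired voice

@dataclass
class UtteranceInfoBuffer:
    """Accumulates utterance information and releases it as one batch."""
    records: List[UtteranceInfo] = field(default_factory=list)

    def add(self, info: UtteranceInfo) -> None:
        self.records.append(info)

    def flush(self) -> List[UtteranceInfo]:
        # Hand over everything stored so far and start a fresh buffer.
        batch, self.records = self.records, []
        return batch
```

In such a design, the terminal would call `add` for each analyzed utterance and `flush` when the storage period ends, sending the returned batch to the host.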

Furthermore, when the voice analysis unit 15 identifies the voice acquired by the first microphone 11 and the second microphone 12 as the wearer's own uttered voice, the data transmission unit 16 of the present embodiment transmits an utterance signal to other terminal devices 10 via the wireless communication line described above.
In addition, when the signal receiving unit 17 receives an utterance signal, the data transmission unit 16 of the present embodiment transmits reception information to the host device 20.
In the terminal device 10 of the present embodiment, the data transmission unit 16 constitutes the utterance signal transmitting means, the utterance information transmitting means, and the reception information transmitting means.

Here, the utterance signal includes, for example, the ID of the terminal device 10 that transmits the utterance signal and the time at which the first microphone 11 and the second microphone 12 acquired the wearer's uttered voice.
The reception information includes, for example, the ID information of the terminal device 10 that transmits the reception information, as well as the ID information of the terminal device 10 that transmitted the utterance signal and information on the acquisition time of the uttered voice, both of which are contained in the received utterance signal.
The transmission processing of the utterance signal and the reception information in the data transmission unit 16 is described in detail later.
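The contents of the utterance signal and the reception information described above could be modeled as follows. This is a minimal sketch: the class names, field names, and the helper `to_reception_info` are hypothetical, and the text lists the fields only as examples.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UtteranceSignal:
    sender_id: str     # ID of the terminal device that sent the signal
    uttered_at: float  # acquisition time of the wearer's uttered voice

@dataclass(frozen=True)
class ReceptionInfo:
    receiver_id: str   # ID of the terminal device reporting the reception
    sender_id: str     # sender ID carried in the received utterance signal
    uttered_at: float  # acquisition time carried in the received utterance signal

def to_reception_info(own_id: str, signal: UtteranceSignal) -> ReceptionInfo:
    """Build the reception information a terminal would report to the host."""
    return ReceptionInfo(receiver_id=own_id,
                         sender_id=signal.sender_id,
                         uttered_at=signal.uttered_at)
```

Carrying the sender's ID and acquisition time through to the host lets the host associate each reception with a specific utterance of a specific wearer.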

The signal receiving unit 17 is an example of the utterance signal receiving means and, as described above, receives utterance signals transmitted from other terminal devices 10.
The power supply unit 18 supplies power to the first microphone 11, the second microphone 12, the first amplifier 13, the second amplifier 14, the voice analysis unit 15, the data transmission unit 16, and the signal receiving unit 17. An existing power source such as a dry battery or a rechargeable battery is used as the power source. The power supply unit 18 also includes well-known circuits such as a voltage conversion circuit and a charge control circuit as necessary.

The host device 20 includes a data receiving unit 21 that receives data transmitted from the terminal devices 10, a data storage unit 22 that stores the received data, a data analysis unit 23 that analyzes the stored data, and an output unit 24 that outputs the analysis result. The host device 20 is realized by an information processing device such as a personal computer. As described above, a plurality of terminal devices 10 are used in the present embodiment, and the host device 20 receives the data transmitted from each of them.

The data receiving unit 21 is an example of the reception information acquisition means. It supports the wireless communication line described above, receives data from each terminal device 10, and passes the data to the data storage unit 22. The data received by the data receiving unit 21 includes the utterance information and reception information transmitted from the data transmission unit 16 of each terminal device 10.
The data storage unit 22 is realized by a storage device such as the magnetic disk device of a personal computer, and stores the data contained in the utterance information and reception information acquired from the data receiving unit 21 separately for each speaker. Speakers are identified by collating the terminal ID contained in the utterance information and reception information transmitted from the terminal device 10 with the speaker names registered in advance in the host device 20. The terminal device 10 may also transmit the wearer state instead of the terminal ID.
As described in detail later, the data receiving unit 21 also receives the reception information transmitted from the data transmission unit 16 of each terminal device 10.
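Storing received data per speaker by collating terminal IDs with pre-registered speaker names can be sketched as below. The dictionary-based record format, the function name, and the decision to skip unregistered terminals are assumptions for illustration; the text does not say how unknown IDs are handled.

```python
from collections import defaultdict
from typing import Dict, List

def store_by_speaker(records: List[dict],
                     registry: Dict[str, str]) -> Dict[str, List[dict]]:
    """Group received records by speaker name.

    registry maps a terminal ID to the speaker name registered on the host.
    """
    by_speaker: Dict[str, List[dict]] = defaultdict(list)
    for record in records:
        name = registry.get(record["terminal_id"])
        if name is None:
            continue  # unregistered terminal: skipped in this sketch
        by_speaker[name].append(record)
    return dict(by_speaker)
```

For example, with a registry of `{"T01": "Alice", "T02": "Bob"}`, records from terminal `T01` would be stored under `"Alice"`.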

The data analysis unit 23 is an example of the dialogue relationship determination means. It is realized, for example, by the program-controlled CPU of a personal computer, and analyzes the data stored in the data storage unit 22. The specific content and method of the analysis may vary depending on the purpose and manner of use of the system of the present embodiment. For example, the frequency of dialogue between wearers of the terminal devices 10 and each wearer's tendencies in dialogue partners may be analyzed, or the relationship between the parties to a dialogue may be inferred from the length and sound pressure of the individual utterances in the dialogue.
As described in detail later, in the present embodiment the data analysis unit 23 determines the synchrony of the uttered voices based on the utterance information and reception information received by the data receiving unit 21 and stored in the data storage unit 22, and thereby determines the relationship between the parties to a dialogue.
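One way to picture this determination is in two steps: first limit the candidates to pairs of terminals where one received the other's utterance signal, then check how well their voice records line up in time. The sketch below is only an illustration under stated assumptions. The interval representation, the function names, and the particular synchrony measure (the share of the sender's own-speech time that the receiver logged as another person's speech) are all hypothetical, and the intervals within each list are assumed not to overlap one another.

```python
from typing import FrozenSet, Iterable, List, Set, Tuple

Interval = Tuple[float, float]  # (start, end) in seconds

def overlap(a: Interval, b: Interval) -> float:
    """Length of the time span shared by two intervals."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def candidate_pairs(receptions: Iterable[Tuple[str, str]]) -> Set[FrozenSet[str]]:
    """Each reception is (sender_id, receiver_id); returned pairs are unordered."""
    return {frozenset(r) for r in receptions if r[0] != r[1]}

def synchrony(self_speech: List[Interval], heard_as_other: List[Interval]) -> float:
    """Fraction of the sender's own-speech time that the receiver logged as another speaker."""
    total = sum(end - start for start, end in self_speech)
    if total == 0.0:
        return 0.0
    shared = sum(overlap(s, o) for s in self_speech for o in heard_as_other)
    return shared / total
```

Restricting the comparison to candidate pairs keeps the number of synchrony checks small, and a pair could then be judged to be in dialogue when the score exceeds some threshold, whose value would have to be chosen empirically.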

The output unit 24 outputs the analysis result of the data analysis unit 23 and outputs other data based on the analysis result. Depending on the purpose and manner of use of the system and on the content and format of the analysis result, various output means may be used, such as display on a screen, printing by a printer, and audio output.

<Configuration example of terminal device>
FIG. 2 is a diagram showing a configuration example of the terminal device 10.
As described above, the terminal device 10 is worn and used by each user (wearer). So that a user can wear it, the terminal device 10 of the present embodiment includes, as shown in FIG. 2, a device main body 30 and a strap 40 connected to the device main body 30. In the illustrated configuration, the user passes the strap 40 around the neck and wears the device main body 30 hanging from the neck.

The device main body 30 is configured by housing, in a thin rectangular parallelepiped case 31 made of metal, resin, or the like, at least the circuits that realize the first amplifier 13, the second amplifier 14, the voice analysis unit 15, the data transmission unit 16, the signal receiving unit 17, and the power supply unit 18, together with the power source (battery) of the power supply unit 18. In the present embodiment, the first microphone 11 is provided on the case 31. The case 31 may also be provided with a pocket for inserting an ID card or the like that displays ID information such as the wearer's name and affiliation. Such ID information may also be printed on the surface of the case 31 itself, or a sticker bearing the ID information may be affixed to it.

提げ紐４０には、第２マイクロフォン１２が設けられる。第２マイクロフォン１２は、提げ紐４０の内部を通るケーブル（電線等）により、装置本体３０のケース３１に収納された第２増幅器１４に接続される。提げ紐４０の材質としては、革、合成皮革、木綿その他の天然繊維や樹脂等による合成繊維、金属等、既存の種々の材質を用いて良い。また、シリコン樹脂やフッ素樹脂等を用いたコーティング処理が施されていても良い。 The strap 40 is provided with the second microphone 12. The second microphone 12 is connected to the second amplifier 14 accommodated in the case 31 of the apparatus main body 30 by a cable (electric wire or the like) passing through the inside of the strap 40. As the material of the strap 40, various existing materials such as leather, synthetic leather, cotton and other natural fibers and synthetic fibers such as resin, metal, etc. may be used. Moreover, the coating process using a silicon resin, a fluororesin, etc. may be given.

この提げ紐４０は、筒状の構造を有し、提げ紐４０の内部に第２マイクロフォン１２を収納している。第２マイクロフォン１２を提げ紐４０の内部に設けることにより、第２マイクロフォン１２の損傷や汚れを防ぎ、対話者が第２マイクロフォン１２の存在を意識することが抑制される。
なお、本実施の形態では、装着者の口（発声部位）から遠い位置に配置される第１マイクロフォン１１を装置本体３０に設けたが、第２マイクロフォン１２と同様に、第１マイクロフォン１１を提げ紐４０に設けてもよい。 The strap 40 has a cylindrical structure, and the second microphone 12 is housed inside the strap 40. By providing the second microphone 12 inside the strap 40, the second microphone 12 is prevented from being damaged or soiled, and the conversation person is prevented from being aware of the presence of the second microphone 12.
In the present embodiment, the first microphone 11, which is disposed at a position far from the wearer's mouth (speaking part), is provided in the apparatus main body 30; however, like the second microphone 12, the first microphone 11 may instead be provided on the strap 40.

本実施の形態では、端末装置１０にて取得した音声について、話者が装着者であるか装着者以外の他者であるかを識別（自他識別）し、自他識別の結果を利用して、端末装置１０から他の端末装置１０へ発話信号を送信している。そして、ホスト装置２０では、他の端末装置１０における発話信号の受信状況等に基づいて、それぞれの端末装置１０を装着する装着者の対話関係を判別している。
以下、自他識別の方法および対話関係の判別の方法について、順に説明する。 In the present embodiment, the voice acquired by the terminal device 10 is identified (self-other identification) whether the speaker is a wearer or another person other than the wearer, and the result of self-other identification is used. Thus, an utterance signal is transmitted from the terminal device 10 to another terminal device 10. Then, the host device 20 determines the dialogue relationship of the wearer wearing each terminal device 10 based on the reception status of the utterance signal in the other terminal devices 10.
Hereinafter, the self-other identification method and the interactive relationship determination method will be described in order.

＜話者が装着者であるか他者であるかを識別する方法の説明＞
続いて、以上の音声解析システム１において、端末装置１０の音声解析部１５にて話者が装着者であるか装着者以外のものである他者であるかを識別（自他識別）する方法について説明する。
本実施の形態の音声解析システム１では、端末装置１０に設けられた第１マイクロフォン１１および第２マイクロフォン１２にて取得した音声の情報を用いて、取得した音声が端末装置１０の装着者自身の発話音声であるか他者の発話音声であるかを識別する。言い換えれば、本実施の形態の音声解析システム１では、取得した音声の発話者に関して自他の別を識別する。また、本実施の形態では、取得した音声の情報のうち、形態素解析や辞書情報等を用いて得られる言語情報ではなく、音圧（第１マイクロフォン１１および第２マイクロフォン１２への入力音量）等の非言語情報に基づいて発話者を識別する。言い換えれば、本実施の形態では、言語情報により特定される発話内容ではなく、非言語情報により特定される発話状況から音声の発話者を識別する。 <Description of how to identify whether a speaker is a wearer or someone else>
Next, a method by which the voice analysis unit 15 of the terminal device 10 in the voice analysis system 1 described above identifies whether the speaker is the wearer or another person other than the wearer (self-other identification) will be described.
In the voice analysis system 1 according to the present embodiment, the voice information acquired by the first microphone 11 and the second microphone 12 provided in the terminal device 10 is used to identify whether the acquired voice is the speech of the wearer of the terminal device 10 or the speech of another person. In other words, the voice analysis system 1 of the present embodiment identifies whether the speaker of the acquired voice is the wearer or someone else. In the present embodiment, the speaker is identified based not on linguistic information obtained from the acquired voice information by morphological analysis, dictionary information, or the like, but on non-linguistic information such as sound pressure (the input sound volume to the first microphone 11 and the second microphone 12). In other words, in the present embodiment, the speaker of the voice is identified not from the utterance content specified by the linguistic information but from the utterance situation specified by the non-linguistic information.

図１および図２を参照して説明したように、本実施の形態の端末装置１０において、第１マイクロフォン１１は装着者の口（発声部位）から遠い位置に配置され、第２マイクロフォン１２は装着者の口（発声部位）に近い位置に配置される。すなわち、装着者の口（発声部位）を音源とすると、第１マイクロフォン１１と音源との間の距離と、第２マイクロフォン１２と音源との間の距離とが大きく異なる。例えば、第１マイクロフォン１１と音源との間の距離は、第２マイクロフォン１２と音源との間の距離の１．５倍〜４倍程度に設定することができる。
ここで、マイクロフォン１１、１２にて取得される音声の音圧は、マイクロフォン１１、１２と音源との間の距離が大きくなるに従って減衰（距離減衰）する。したがって、装着者の発話音声に関して、第１マイクロフォン１１にて取得した音声の音圧と、第２マイクロフォン１２にて取得した音声の音圧とは大きく異なる。 As described with reference to FIGS. 1 and 2, in the terminal device 10 of the present embodiment, the first microphone 11 is disposed at a position far from the wearer's mouth (speaking part), and the second microphone 12 is disposed at a position close to the wearer's mouth (speaking part). That is, when the wearer's mouth (speaking part) is the sound source, the distance between the first microphone 11 and the sound source differs greatly from the distance between the second microphone 12 and the sound source. For example, the distance between the first microphone 11 and the sound source can be set to about 1.5 to 4 times the distance between the second microphone 12 and the sound source.
Here, the sound pressure of the sound acquired by the microphones 11 and 12 is attenuated (distance attenuation) as the distance between the microphones 11 and 12 and the sound source increases. Therefore, regarding the voice of the wearer, the sound pressure of the sound acquired by the first microphone 11 and the sound pressure of the sound acquired by the second microphone 12 are greatly different.

一方、装着者以外の者（他者）の口（発声部位）を音源とした場合を考えると、通常その他者は装着者から離れているため、第１マイクロフォン１１と音源との間の距離と、第２マイクロフォン１２と音源との間の距離とは、大きく変わらない。装着者に対する他者の位置によっては、両距離の差は生じるが、装着者の口（発声部位）を音源とした場合のように、第１マイクロフォン１１と音源との間の距離が、第２マイクロフォン１２と音源との間の距離の数倍となることはない。したがって、他者の発話音声に関して、第１マイクロフォン１１にて取得した音声の音圧と、第２マイクロフォン１２にて取得した音声の音圧とは、装着者の発話音声の場合のように大きく異なることはない。 On the other hand, consider the case where the mouth (speaking part) of a person other than the wearer (another person) is the sound source. Since the other person is usually away from the wearer, the distance between the first microphone 11 and the sound source and the distance between the second microphone 12 and the sound source do not differ greatly. Depending on the position of the other person relative to the wearer, the two distances may differ, but the distance between the first microphone 11 and the sound source never becomes several times the distance between the second microphone 12 and the sound source, as it does when the wearer's mouth (speaking part) is the sound source. Therefore, for the speech of another person, the sound pressure of the sound acquired by the first microphone 11 and the sound pressure of the sound acquired by the second microphone 12 do not differ greatly, unlike the case of the wearer's speech.
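The contrast between the two cases can be made concrete with a small calculation. The sketch below assumes free-field inverse-distance attenuation (sound pressure proportional to 1/r), an idealisation that the description does not state; the function name and all distances are hypothetical values chosen for illustration.

```python
def pressure_ratio(near_distance_m: float, far_distance_m: float) -> float:
    """Ratio of the sound pressure at the nearer microphone to that at the farther one.

    Assumes sound pressure falls off as 1/r (an idealisation, not taken from the source).
    """
    if near_distance_m <= 0 or far_distance_m <= 0:
        raise ValueError("distances must be positive")
    # With p proportional to 1/r, p_near / p_far equals r_far / r_near.
    return far_distance_m / near_distance_m


# Wearer speaking: hypothetical 0.10 m to the near microphone and 0.30 m to the far one,
# i.e. a distance ratio of 3, inside the 1.5 to 4 range mentioned in the description.
wearer_ratio = pressure_ratio(0.10, 0.30)

# Another person speaking from about 1 m away: hypothetical 1.00 m and 1.10 m.
other_ratio = pressure_ratio(1.00, 1.10)
```

Under these assumed distances the ratio is about 3 for the wearer and about 1.1 for the other person, which is the kind of gap the identification relies on.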

そこで、本実施の形態では、第１マイクロフォン１１にて取得した音声の音圧と第２マイクロフォン１２にて取得した音声の音圧との比である音圧比を求める。そして、この音圧比を用いて、取得した音声が装着者自身の発話音声であるか他者の発話音声であるかを識別する。より具体的には、本実施の形態では、第２マイクロフォン１２にて取得した音声の音圧に対する第１マイクロフォン１１にて取得した音声の音圧の比（音圧比）について、閾値を設定する。そして、取得した音声の音圧比が閾値よりも大きい場合には、装着者自身の発話音声と判断し、音圧比が閾値よりも小さい場合には、他者の発話音声と判断する。
そして、上述した方法により得られた、音声の話者が装着者であるか他者であるかの識別結果は、発話情報に含まれて端末装置１０のデータ送信部１６からホスト装置２０へ送信される。 Therefore, in the present embodiment, a sound pressure ratio that is a ratio between the sound pressure of the sound acquired by the first microphone 11 and the sound pressure of the sound acquired by the second microphone 12 is obtained. Then, using this sound pressure ratio, it is identified whether the acquired voice is the wearer's own voice or the other person's voice. More specifically, in the present embodiment, a threshold is set for the ratio of the sound pressure of the sound acquired by the first microphone 11 to the sound pressure of the sound acquired by the second microphone 12 (sound pressure ratio). And when the sound pressure ratio of the acquired sound is larger than the threshold value, it is determined as the speech sound of the wearer itself, and when the sound pressure ratio is smaller than the threshold value, it is determined as the speech sound of the other person.
The identification result obtained by the above-described method, indicating whether the speaker of the voice is the wearer or another person, is included in the utterance information and transmitted from the data transmission unit 16 of the terminal device 10 to the host device 20.
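The threshold test can be sketched as follows. Which microphone goes in the numerator is an assumption here: the sketch divides the sound pressure at the microphone nearer the mouth by the sound pressure at the farther one, so that the wearer's own voice yields the larger ratio under distance attenuation. The function name and the default threshold of 2.0 are hypothetical; the description gives no numerical threshold.

```python
def is_wearer_speech(near_pressure: float, far_pressure: float,
                     threshold: float = 2.0) -> bool:
    """Classify a frame as the wearer's speech when the pressure ratio exceeds the threshold.

    near_pressure: average sound pressure at the microphone close to the mouth.
    far_pressure:  average sound pressure at the microphone far from the mouth.
    threshold:     hypothetical value; it would have to be tuned for a real device.
    """
    if far_pressure <= 0:
        raise ValueError("far_pressure must be positive")
    return near_pressure / far_pressure > threshold
```

For example, `is_wearer_speech(3.0, 1.0)` classifies the frame as the wearer's speech, while `is_wearer_speech(1.1, 1.0)` classifies it as another person's speech.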

なお、上述した例では、第１マイクロフォン１１および第２マイクロフォン１２にて取得した音声の音圧を基に自他識別の判断を行ったが、これに音声の位相差の情報を加味することも考えられる。つまり、装着者の口（発声部位）を音源とすると、上述したように、第１マイクロフォン１１と音源との間の距離と、第２マイクロフォン１２と音源との間の距離とは大きく異なる。そのため、第１マイクロフォン１１にて取得した音声と、第２マイクロフォン１２にて取得した音声との位相差は大きくなる。一方、装着者以外の他者の口（発声部位）を音源とした場合は、上述したように、他者が装着者から離れているため、第１マイクロフォン１１と音源との間の距離と、第２マイクロフォン１２と音源との間の距離とは、大きく変わらない。そのため、第１マイクロフォン１１にて取得した音声と、第２マイクロフォン１２にて取得した音声との位相差は小さくなる。
よって、第１マイクロフォン１１にて取得した音声と第２マイクロフォン１２にて取得した音声との位相差を考慮することで、自他識別の判断の精度を向上させることができる。 In the example described above, self-other identification is determined based on the sound pressure of the sound acquired by the first microphone 11 and the second microphone 12, but it is also conceivable to take information on the phase difference of the sound into account. That is, when the wearer's mouth (speaking part) is the sound source, as described above, the distance between the first microphone 11 and the sound source differs greatly from the distance between the second microphone 12 and the sound source. Therefore, the phase difference between the sound acquired by the first microphone 11 and the sound acquired by the second microphone 12 is large. On the other hand, when the mouth (speaking part) of a person other than the wearer is the sound source, as described above, the other person is away from the wearer, so the distance between the first microphone 11 and the sound source and the distance between the second microphone 12 and the sound source do not differ greatly. Therefore, the phase difference between the sound acquired by the first microphone 11 and the sound acquired by the second microphone 12 is small.
Therefore, by considering the phase difference between the sound acquired by the first microphone 11 and the sound acquired by the second microphone 12, the accuracy of the self-other identification determination can be improved.

＜装着者同士の対話関係の判定＞
続いて、ホスト装置２０のデータ解析部２３において、各端末装置１０から送信され、データ受信部２１にて受信した発話情報に基づいて複数の装着者同士の対話関係を判定する方法について説明する。本実施の形態では、各端末装置１０から受信した発話情報について、音声の同調性の有無を判別することにより、装着者同士の対話関係を判定している。
なお、以下で述べる方法は装着者同士の対話関係を判定する方法の一例であり、他の方法を採用しても構わない。 <Determination of dialogue between wearers>
Next, a method of determining the dialogue relationship between a plurality of wearers based on the utterance information transmitted from each terminal device 10 and received by the data receiving unit 21 in the data analysis unit 23 of the host device 20 will be described. In the present embodiment, the dialogue relationship between the wearers is determined by determining the presence / absence of audio synchronism with respect to the utterance information received from each terminal device 10.
Note that the method described below is an example of a method for determining a dialogue relationship between wearers, and other methods may be adopted.

図３は、本実施の形態の端末装置１０をそれぞれ装着した複数の装着者が対話している状況を示す図である。図４は、図３の対話状況における各端末装置１０Ａ、１０Ｂの発話情報の例を示す図である。
図３に示すように、端末装置１０Ａ、端末装置１０Ｂをそれぞれ装着した二人の装着者Ａ、装着者Ｂが対話している場合を考える。このとき、装着者Ａの発話音声は、装着者Ａの端末装置１０Ａと装着者Ｂの端末装置１０Ｂとの双方に捉えられる。同様に、装着者Ｂの発話音声は、装着者Ａの端末装置１０Ａと装着者Ｂの端末装置１０Ｂとの双方に捉えられる。 FIG. 3 is a diagram illustrating a situation in which a plurality of wearers each wearing the terminal device 10 of the present embodiment are interacting with each other. FIG. 4 is a diagram showing an example of utterance information of each terminal device 10A, 10B in the conversation state of FIG.
As shown in FIG. 3, a case is considered in which two wearers A and B who are respectively wearing the terminal device 10A and the terminal device 10B are interacting with each other. At this time, the voice of the wearer A is captured by both the terminal device 10A of the wearer A and the terminal device 10B of the wearer B. Similarly, the voice of the wearer B is captured by both the terminal device 10A of the wearer A and the terminal device 10B of the wearer B.

端末装置１０Ａおよび端末装置１０Ｂからは、それぞれ、独立に、発話情報がホスト装置２０に送られる。ここで、装着者Ａの端末装置１０Ａにおいて装着者の発話として認識される音声は、装着者Ｂの端末装置１０Ｂでは他者の発話として認識される。反対に、端末装置１０Ｂにおいて装着者の発話として認識される音声は、端末装置１０Ａでは他者の発話として認識される。このため、端末装置１０Ａから取得した発話情報と、端末装置１０Ｂから取得した発話情報とは、図４に示すように、発話時間の長さや発話者が切り替わったタイミング等の発話状況を示す情報は近似する。 The terminal device 10A and the terminal device 10B each send utterance information to the host device 20 independently. Here, a voice recognized as the wearer's utterance by the terminal device 10A of the wearer A is recognized as another person's utterance by the terminal device 10B of the wearer B. Conversely, a voice recognized as the wearer's utterance by the terminal device 10B is recognized as another person's utterance by the terminal device 10A. For this reason, as shown in FIG. 4, the utterance information acquired from the terminal device 10A and the utterance information acquired from the terminal device 10B are similar in the information indicating the utterance situation, such as the length of each utterance and the timing at which the speaker switches.

そこで、この例においてホスト装置２０は、端末装置１０Ａから取得した情報と端末装置１０Ｂから取得した情報とを比較することにより、これらの情報が同じ発話状況を示しているか否かを判断し、これに基づいて、装着者Ａと装着者Ｂとの対話の有無を認識する。
ここで、発話状況を示す情報としては、少なくとも、上述した発話者ごとの個々の発話における発話時間の長さ、個々の発話の開始時刻と終了時刻、発話者が切り替わった時刻（タイミング）等のように、発話に関する時間情報が用いられる。なお、特定の会話に係る発話状況を判断するために、これらの発話に関する時間情報の一部のみを用いてもよいし、他の情報を付加的に用いてもよい。 Therefore, in this example, the host device 20 compares the information acquired from the terminal device 10A and the information acquired from the terminal device 10B to determine whether or not these information indicate the same utterance situation. Based on the above, the presence or absence of dialogue between the wearer A and the wearer B is recognized.
Here, the information indicating the utterance state includes at least the length of the utterance time in each utterance for each utterer described above, the start time and end time of each utterance, and the time (timing) at which the utterer is switched. As described above, time information related to the utterance is used. In addition, in order to judge the utterance situation concerning a specific conversation, only a part of time information regarding these utterances may be used, or other information may be additionally used.

ここで、本実施の形態では、端末装置１０の音声解析部１５にて解析し、発話情報としてデータ送信部１６を介して送信された自他識別情報を使用して、複数の端末装置１０からマイクロフォン１１、１２にて受信した発話音声の音声信号の同調性を判別している。
すなわち、発話情報に自他識別情報が付与されることで、取得された音声が装着者自身によるものであるか、装着者以外の他者によるものであるかを予め判別できているため、発話者が切り替わったタイミング等を明確に把握できる。そして、このタイミングにおいて発話者が逆転する装着者同士を見つければ、発話音声の同調性があると判断し、この装着者同士が対話していると判定することができる。 Here, in the present embodiment, the synchronism of the audio signals of the speech received by the microphones 11 and 12 of the plurality of terminal devices 10 is determined using the self-other identification information that is analyzed by the voice analysis unit 15 of the terminal device 10 and transmitted as utterance information via the data transmission unit 16.
That is, because self-other identification information is attached to the utterance information, it is known in advance whether the acquired voice is from the wearer or from someone other than the wearer, so the timing at which the speaker switches and the like can be grasped clearly. If wearers whose speaker roles are reversed at this timing are found, it can be determined that their uttered voices are synchronized and that these wearers are conversing with each other.
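One way to picture this check is to compare the self-other labels that two terminals report over the same time frames and count how often they are mirrored. The sketch below is illustrative only: the per-frame label representation, the function names, and the 0.8 acceptance score are all assumptions, not details given in the description.

```python
from typing import Optional, Sequence

Label = Optional[str]  # "self", "other", or None for a silent frame

# A frame is "mirrored" when one terminal reports "self" and the other reports "other".
MIRROR = {"self": "other", "other": "self"}


def synchrony_score(labels_a: Sequence[Label], labels_b: Sequence[Label]) -> float:
    """Fraction of voiced frames in which the two terminals report mirrored labels."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must cover the same frames")
    voiced = 0
    mirrored = 0
    for a, b in zip(labels_a, labels_b):
        if a is None and b is None:
            continue  # silence on both terminals carries no evidence
        voiced += 1
        if a is not None and MIRROR[a] == b:
            mirrored += 1
    return mirrored / voiced if voiced else 0.0


def in_conversation(labels_a: Sequence[Label], labels_b: Sequence[Label],
                    min_score: float = 0.8) -> bool:
    """Hypothetical decision rule: accept the pair when enough frames are mirrored."""
    return synchrony_score(labels_a, labels_b) >= min_score
```

Two terminals whose labels swap roles at every speaker change score 1.0, whereas a terminal that hears both speakers as "other" scores much lower against either of them.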

ところで、従来、音声解析システム１では、端末装置１０を装着した装着者が複数人存在する場合、装着者同士の対話関係を判定するために、予め定められた空間内に存在する装着者の全てを対象として、上述したような対話関係の判定をしている。具体的には、予め定められた空間内に存在する複数の装着者の中から、予め基準となる装着者を定め、この基準となる装着者が装着する端末装置１０からの発話情報と、上記空間内に存在する他の装着者が装着する全ての端末装置１０からの発話情報とを順次比較することにより、発話音声の同調性の有無を判別し、基準となる装着者と他の装着者との対話関係を判定する。
したがって、予め定められた空間内に存在する装着者の数が多い場合には、基準となる装着者との発話音声の同調性の有無を判別する他の装着者の数が多くなる。これにより、複数の装着者同士の対話関係を判定するための処理が煩雑になり、実際には対話を行っていないのに対話していると判定される等の誤判定が発生しやすくなる等、対話関係の判定の精度が低下する懸念がある。 By the way, conventionally, in the voice analysis system 1, when there are a plurality of wearers wearing the terminal device 10, all of the wearers existing in a predetermined space are determined in order to determine the dialogue relationship between the wearers. As described above, the above-mentioned dialogue relation is determined. Specifically, a reference wearer is determined in advance from a plurality of wearers existing in a predetermined space, and the utterance information from the terminal device 10 worn by the reference wearer, By sequentially comparing the utterance information from all the terminal devices 10 worn by other wearers present in the space, the presence or absence of synchronism of the uttered voice is determined, and the reference wearer and other wearers The dialogue relationship with is determined.
Therefore, when the number of wearers existing in a predetermined space is large, the number of other wearers that determine the presence / absence of synchronism of speech with the wearer serving as a reference increases. This complicates the process for determining the interaction relationship between a plurality of wearers, and makes it easy to generate erroneous determinations such as determining that the user is interacting without actually performing the interaction. There is a concern that the accuracy of the determination of dialogue relations will be reduced.

このような問題を解決するために、予め定められた空間内に存在する他の装着者のうち、基準となる装着者からの距離が予め定められた距離よりも小さい範囲内にいる他の装着者を、対話関係の判定を行う対象とすることが考えられる。
しかし、基準となる装着者からの距離を大きく定めた場合には、対話関係の判定を行う範囲内に存在する他の装着者の数が多くなるため、上述した問題と同様の問題が生じ得る。
また、基準となる装着者からの距離を小さく定めた場合には、基準となる装着者と実際に対話を行っている他の装着者が、対話関係の判定を行う範囲内に存在しない場合がある。すなわち、対話を行う環境等によっては相手との距離が離れた状態で対話を行うことも考えられ、対話を行う基準となる装着者と他の装着者との間の距離が、予め定められた距離よりも大きくなる場合がある。このような場合には、この基準となる装着者と他の装着者との間で対話関係の判定が行われないため、装着者間の対話関係を正確に把握することが困難になる。 In order to solve such a problem, it is conceivable to limit the targets of the dialogue relationship determination to those other wearers, among the other wearers present in the predetermined space, whose distance from the reference wearer is smaller than a predetermined distance.
However, when the distance from the reference wearer is set large, the number of other wearers within the range subject to the dialogue relationship determination increases, and a problem similar to the one described above may occur.
In addition, when the distance from the reference wearer is set small, another wearer who is actually conversing with the reference wearer may not be within the range subject to the determination. That is, depending on the environment in which the conversation takes place, the parties may converse while far apart, so the distance between the reference wearer and the other wearer may be larger than the predetermined distance. In such a case, no dialogue relationship determination is made between the reference wearer and that other wearer, so it becomes difficult to accurately grasp the dialogue relationship between the wearers.

そこで、本実施の形態の端末装置１０は、端末装置１０にて取得した音声が装着者の発話音声であると識別した場合に、装着者の発話音声の音圧に基づいた電波強度で、上述した発話信号を他の端末装置１０に向けて送信している。そして、他の端末装置１０のうち発話信号を受信した端末装置１０が、ホスト装置２０へ受信情報を送信している。
さらに、本実施の形態のホスト装置２０では、端末装置１０から受信した発話情報および受信情報に基づいて、複数の端末装置１０のうち対話関係の判定を行う端末装置１０同士の組み合わせを限定し、上述した音声の同調性の判別、装着者同士の対話関係の判定を行っている。
以下、端末装置１０にて実行される処理およびホスト装置２０にて実行される処理について、順に説明する。 Therefore, when the terminal device 10 of the present embodiment identifies the voice it has acquired as the speech of the wearer, it transmits the above-described utterance signal toward other terminal devices 10 with a radio field intensity based on the sound pressure of the wearer's speech. Each of the other terminal devices 10 that receives the utterance signal then transmits reception information to the host device 20.
Furthermore, in the host device 20 of the present embodiment, based on the utterance information and reception information received from the terminal device 10, the combination of the terminal devices 10 that perform dialogue-related determination among the plurality of terminal devices 10 is limited, The above-described determination of the synchronism of the voice and the determination of the dialogue relationship between the wearers are performed.
Hereinafter, processing executed by the terminal device 10 and processing executed by the host device 20 will be described in order.
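The effect of limiting the combinations can be illustrated by comparing the number of pairs the host would otherwise have to examine with the pairs implied by the reception information. In this sketch the reception information is assumed to be a list of (receiver ID, sender ID) records; that layout and the function names are hypothetical.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set, Tuple


def all_pairs(terminal_ids: Iterable[str]) -> Set[FrozenSet[str]]:
    """Every unordered pair of terminals, i.e. the exhaustive comparison."""
    return {frozenset(pair) for pair in combinations(sorted(set(terminal_ids)), 2)}


def candidate_pairs(reception_records: Iterable[Tuple[str, str]]) -> Set[FrozenSet[str]]:
    """Unordered pairs in which one terminal received the other's utterance signal.

    reception_records: (receiver_id, sender_id) tuples, an assumed record layout.
    """
    return {frozenset((receiver, sender))
            for receiver, sender in reception_records
            if receiver != sender}


terminals = ["10A", "10B", "10C", "10D"]
records = [("10B", "10A"), ("10A", "10B"), ("10D", "10C")]
exhaustive = all_pairs(terminals)
limited = candidate_pairs(records)
```

With four terminals the exhaustive comparison covers six pairs, while the three assumed reception records narrow the synchronism check to two pairs.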

＜端末装置１０にて実行される処理＞
図５は、本実施の形態が適用される端末装置１０にて実行される処理を示すフローチャートである。続いて、図５および上述した図１を参照して、本実施の形態の端末装置１０にて実行される処理について説明する。
本実施の形態の端末装置１０では、まず、第１マイクロフォン１１、第２マイクロフォン１２にて音声を取得すると（ステップ５０１）、取得音声に基づいた音声信号が、第１増幅器１３および第２増幅器１４に送られる。
第１増幅器１３および第２増幅器１４では、第１マイクロフォン１１および第２マイクロフォン１２からの音声信号をそれぞれ取得すると、取得した音声信号を増幅して音声解析部１５に送る（ステップ５０２）。 <Processing executed by terminal device 10>
FIG. 5 is a flowchart showing processing executed by the terminal device 10 to which the exemplary embodiment is applied. Next, with reference to FIG. 5 and FIG. 1 described above, processing executed in the terminal device 10 of the present embodiment will be described.
In the terminal device 10 according to the present embodiment, first, when sound is acquired by the first microphone 11 and the second microphone 12 (step 501), audio signals based on the acquired sound are sent to the first amplifier 13 and the second amplifier 14, respectively.
When the first amplifier 13 and the second amplifier 14 acquire the audio signals from the first microphone 11 and the second microphone 12, respectively, the acquired audio signals are amplified and sent to the audio analysis unit 15 (step 502).

音声解析部１５は、第１増幅器１３および第２増幅器１４で増幅された音声信号の音圧を算出する（ステップ５０３）。
具体的には、例えば、まず第１マイクロフォン１１にて取得され第１増幅器１３で増幅された音声信号および第２マイクロフォン１２にて取得され第２増幅器１４で増幅された音声信号のそれぞれに対して、フィルタリング処理を行い、音声信号から環境音等の雑音（ノイズ）の成分を除去する。そして、音声解析部１５は、雑音成分が除かれたそれぞれの音声信号について、一定の時間単位（例えば、数十分の一秒〜数百分の一秒）毎に平均音圧を算出する。 The voice analysis unit 15 calculates the sound pressure of the voice signal amplified by the first amplifier 13 and the second amplifier 14 (step 503).
Specifically, for example, filtering is first applied to each of the audio signal acquired by the first microphone 11 and amplified by the first amplifier 13 and the audio signal acquired by the second microphone 12 and amplified by the second amplifier 14, in order to remove noise components such as environmental sounds from the audio signals. The voice analysis unit 15 then calculates the average sound pressure of each noise-removed audio signal for every fixed time unit (for example, from several tenths of a second to several hundredths of a second).
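The per-time-unit averaging can be sketched as below. The description does not say how the average sound pressure is computed; this example uses the root-mean-square of the samples in each frame as one plausible choice, omits the noise filtering, and uses a hypothetical function name and a hypothetical default frame length of 0.05 s.

```python
import math
from typing import List, Sequence


def frame_rms(samples: Sequence[float], sample_rate_hz: int,
              frame_seconds: float = 0.05) -> List[float]:
    """Root-mean-square level of each complete, non-overlapping frame.

    RMS stands in for "average sound pressure" here; that choice is an assumption.
    Samples left over after the last complete frame are ignored.
    """
    frame_len = max(1, int(round(sample_rate_hz * frame_seconds)))
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        levels.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return levels
```

Applying the same function to the signals from both microphones yields two level sequences on a common time base, which is the form the ratio comparison needs.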

続いて、音声解析部１５は、上述した自他識別の方法を用いて、ステップ５０３にて算出した音声信号の音圧に基づいて、第１マイクロフォン１１および第２マイクロフォン１２にて取得した音声が装着者によるものか装着者以外の他者による発話音声であるかを判断する（ステップ５０４）。 Subsequently, using the self-other identification method described above, the voice analysis unit 15 determines, based on the sound pressure of the audio signals calculated in step 503, whether the sound acquired by the first microphone 11 and the second microphone 12 is the speech of the wearer or the speech of another person other than the wearer (step 504).

そして、音声解析部１５により、第１マイクロフォン１１および第２マイクロフォン１２にて取得した音声が装着者によるものであると判断された場合（ステップ５０４にてＹＥＳ）には、ステップ５０３にて算出した音声の音圧に基づいて、発話信号を送信するための無線通信の電波強度を算出する（ステップ５０５）。
続いて、ステップ５０５にて算出した電波強度で、無線通信回線を介して他の端末装置１０に向けて発話信号を送信する（ステップ５０６）。
なお、ステップ５０５における電波強度の算出の仕方およびステップ５０６における発話信号の送信に関しては、後段にて詳細に説明する。 If the voice analysis unit 15 determines that the voice acquired by the first microphone 11 and the second microphone 12 is from the wearer (YES in step 504), the calculation is performed in step 503. Based on the sound pressure of the voice, the radio field intensity of wireless communication for transmitting the speech signal is calculated (step 505).
Subsequently, an utterance signal is transmitted to another terminal apparatus 10 through the wireless communication line with the radio wave intensity calculated in step 505 (step 506).
Note that the method of calculating the radio field intensity in step 505 and the transmission of the speech signal in step 506 will be described in detail later.

一方、音声解析部１５により、音声が装着者以外の他者によるものであると判断された場合（ステップ５０４にてＮＯ）には、ステップ５０５およびステップ５０６は実行せずに次のステップへ進む。 On the other hand, when the voice analysis unit 15 determines that the voice is from another person other than the wearer (NO in step 504), the process proceeds to the next step without executing steps 505 and 506.

続いて、端末装置１０は、信号受信部１７において他の端末装置１０から送信された発話信号を受信したか否かの判断を行う（ステップ５０７）。
信号受信部１７にて他の端末装置から送信された発話信号を受信した場合（ステップ５０７でＹＥＳ）には、データ送信部１６を介して、ホスト装置２０に向けて受信情報を送信する（ステップ５０８）。
一方、信号受信部１７にて他の端末装置１０からの発話信号を受信しない場合（ステップ５０７でＮＯ）には、ステップ５０８は実行せずに次のステップへ進む。 Subsequently, the terminal device 10 determines whether or not the utterance signal transmitted from the other terminal device 10 is received by the signal receiving unit 17 (step 507).
When the signal reception unit 17 has received an utterance signal transmitted from another terminal device 10 (YES in step 507), reception information is transmitted to the host device 20 via the data transmission unit 16 (step 508).
On the other hand, when the signal receiving unit 17 does not receive an utterance signal from another terminal device 10 (NO in step 507), the process proceeds to the next step without executing step 508.

続いて、データ送信部１６は、ホスト装置２０に向けて発話情報を送信する（ステップ５０９）。
以上のステップにより、本実施の形態の端末装置１０にて実行される処理が終了する。
なお、この例では、発話信号の送信等を行った後に、受信情報および発話情報の送信を行うものとしたが、これらの信号、情報の送信の順序はこれに限られず、例えば発話情報を送信した後に発話信号の送信等を行っても構わない。 Subsequently, the data transmission unit 16 transmits the utterance information to the host device 20 (step 509).
With the above steps, the process executed by the terminal device 10 of the present embodiment is completed.
In this example, the transmission of the reception information and the utterance information is performed after the transmission of the utterance signal. However, the order of the transmission of these signals and information is not limited to this. For example, the utterance information is transmitted. After that, an utterance signal may be transmitted.
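The branching of FIG. 5 from step 504 onward can be summarised in a short sketch. The function below only records which transmissions one terminal would make in a single pass; its name, the string labels, the near-to-far orientation of the pressure ratio, and the threshold of 2.0 are all hypothetical.

```python
from typing import List, Optional


def terminal_cycle(
    near_pressure: float,
    far_pressure: float,
    received_signal_from: Optional[str],
    ratio_threshold: float = 2.0,  # hypothetical threshold
) -> List[str]:
    """Return the transmissions one terminal makes in a single pass of FIG. 5."""
    if far_pressure <= 0:
        raise ValueError("far_pressure must be positive")
    actions: List[str] = []

    # Step 504: self-other identification from the sound pressure ratio.
    wearer_is_speaking = near_pressure / far_pressure > ratio_threshold
    if wearer_is_speaking:
        # Steps 505-506: choose a radio intensity and send the utterance signal.
        actions.append("utterance_signal_to_terminals")

    # Steps 507-508: report any utterance signal received from another terminal.
    if received_signal_from is not None:
        actions.append(f"reception_info_to_host(from={received_signal_from})")

    # Step 509: utterance information always goes to the host.
    actions.append("utterance_info_to_host")
    return actions
```

A terminal whose wearer is speaking and that hears no other terminal sends the utterance signal and the utterance information; a terminal whose wearer is silent but that received a signal from terminal 10B sends reception information and the utterance information.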

続いて、上述したステップ５０５における電波強度の算出およびステップ５０６における発話信号の送信に関して、詳細に説明する。
ステップ５０５では、上述したように、ステップ５０３において音声解析部１５により算出された装着者の発話音声の音圧に基づいて、発話信号を送信するための電波強度を算出している。 Next, the calculation of the radio wave intensity in step 505 and the transmission of the speech signal in step 506 will be described in detail.
In step 505, as described above, the radio field intensity for transmitting the speech signal is calculated based on the sound pressure of the wearer's speech sound calculated by the speech analysis unit 15 in step 503.

図６は、対話する２者間の距離（装着者と対話相手との間の距離）と、装着者の発話音声の音圧との関係、および、対話する２者間の距離と端末装置１０から送信する発話信号の電波強度との関係を示した図である。
なお、図６において、実線が、装着者の発生音声の音圧と２者間の距離との関係を示しており、破線が、発話信号の電波強度と２者間の距離との関係を示している。また、図６では、発生音声の音圧および発話信号の電波強度は、２者間の距離が１ｍの場合の値を１とした場合の相対値で表している。 FIG. 6 is a diagram showing the relationship between the distance between two conversing parties (the distance between the wearer and the conversation partner) and the sound pressure of the wearer's speech, and the relationship between the distance between the two parties and the radio field intensity of the utterance signal transmitted from the terminal device 10.
In FIG. 6, the solid line indicates the relationship between the sound pressure of the wearer's uttered voice and the distance between the two parties, and the broken line indicates the relationship between the radio field intensity of the utterance signal and the distance between the two parties. In FIG. 6, the sound pressure of the uttered voice and the radio field intensity of the utterance signal are expressed as relative values, with the value at a distance of 1 m between the two parties taken as 1.

ここで、一般に、複数の者が対話を行う際には、発話する者（発話者）の声の大きさ（発話音声の音圧）は、対話相手との間の距離によって異なる。すなわち、発話者は、通常、発話した音声が対話相手に届いて対話が成立するように、対話相手との間の距離に応じて発話音声の音圧を変化させている。
具体的には、図６に示すように、対話する２者間の距離が小さい場合には、発話者の発話音声の音圧は小さくなり、対話する２者間の距離が大きくなるにつれて、発話者の発話音声の音圧が大きくなる傾向がある。 Here, generally, when a plurality of persons have a conversation, the loudness of the person who speaks (speaker) (the sound pressure of the spoken voice) varies depending on the distance to the conversation partner. That is, the speaker usually changes the sound pressure of the uttered voice according to the distance from the conversation partner so that the spoken voice reaches the conversation partner and the conversation is established.
Specifically, as shown in FIG. 6, when the distance between the two conversing parties is small, the sound pressure of the speaker's speech is low, and as the distance between the two parties increases, the sound pressure of the speaker's speech tends to increase.

また、端末装置１０から無線通信回線を介して発話信号等の信号を送信する場合、信号を送信する端末装置１０（端末装置１０の装着者）と、他の端末装置１０（他の端末装置１０の装着者；対話相手）との間の距離によって、端末装置１０から送信された発話信号が他の端末装置１０に到達するために必要な信号の電波強度が異なる。すなわち、図６に示すように、２者間の距離（装着者と対話相手との距離）が小さい場合には、発話信号の電波強度が小さい場合であっても、端末装置１０から送信された発話信号が他の端末装置１０まで到達するが、２者間の距離が大きくなるにつれて、端末装置１０から送信された発話信号を他の端末装置１０まで到達させるために、発話信号の電波強度を大きくする必要がある。 Further, when a signal such as an utterance signal is transmitted from the terminal device 10 via a wireless communication line, the terminal device 10 (the wearer of the terminal device 10) that transmits the signal and another terminal device 10 (other terminal device 10). Depending on the distance between the wearer and the conversation partner, the signal strength of the signal required for the speech signal transmitted from the terminal device 10 to reach the other terminal device 10 differs. That is, as shown in FIG. 6, when the distance between the two persons (the distance between the wearer and the conversation partner) is small, the signal is transmitted from the terminal device 10 even when the radio field intensity of the speech signal is small. The utterance signal reaches the other terminal device 10, but in order to make the utterance signal transmitted from the terminal device 10 reach the other terminal device 10 as the distance between the two increases, the radio field intensity of the utterance signal is changed. It needs to be bigger.

以上の関係に基づいて、本実施の形態の音声解析部１５では、端末装置１０を装着した装着者の発話音声の音圧から、端末装置１０から発話信号を送信する際の電波強度を算出している。
すなわち、ステップ５０５では、ステップ５０４にて端末装置１０が取得した音声が装着者によるものと判断された場合に、ステップ５０３にて算出した装着者の発話音声の音圧に基づいて、図６に示した関係を利用して装着者と対話相手との間の距離を算出する。そして、算出した２者間の距離に基づいて、図６に示した関係を利用して発話信号を送信するための電波強度を算出している。
なお、ステップ５０５において電波強度の算出に用いる発話音声の音圧は、端末装置１０における第１マイクロフォン１１にて取得された発話音声の音圧であっても、第２マイクロフォン１２にて取得された発話音声の音圧であってもよく、またこれらの平均値や総和等であってもよい。
また、音声解析部１５は、装着者の発話音声の音圧と、発話信号の電波強度とを対応付けて記憶しておいてもよく、装着者の発話音声の音圧に基づいて、直接、発話信号の電波強度を算出してもよい。 Based on the above relationship, the voice analysis unit 15 according to the present embodiment calculates the radio field intensity when the speech signal is transmitted from the terminal device 10 from the sound pressure of the speech sound of the wearer wearing the terminal device 10. ing.
That is, in step 505, when the voice acquired by the terminal device 10 has been determined in step 504 to be that of the wearer, the distance between the wearer and the conversation partner is calculated from the sound pressure of the wearer's speech calculated in step 503, using the relationship shown in FIG. 6. Then, based on the calculated distance between the two parties, the radio field intensity for transmitting the utterance signal is calculated, again using the relationship shown in FIG. 6.
Note that the sound pressure of the utterance voice used for the calculation of the radio field intensity in step 505 is acquired by the second microphone 12 even if it is the sound pressure of the utterance voice acquired by the first microphone 11 in the terminal device 10. The sound pressure of the uttered voice may be used, or the average value or the sum of these may be used.
The voice analysis unit 15 may store the sound pressure of the utterance voice of the wearer and the radio wave intensity of the utterance signal in association with each other, and directly based on the sound pressure of the wearer's utterance voice. The radio field intensity of the speech signal may be calculated.
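The two-stage calculation of step 505 (sound pressure to distance, then distance to radio field intensity) can be sketched as follows. This is a minimal illustration, not the implementation of the voice analysis unit 15: the relationships of FIG. 6 are not given numerically in the text, so the calibration tables, the units (dB, m, dBm), the linear interpolation, and every function name below are assumptions made for the example.

```python
from bisect import bisect_left

# Hypothetical calibration tables standing in for the FIG. 6 relationships.
# The numbers are illustrative only and do not come from the patent.
# (sound pressure of the wearer's voice in dB, estimated distance to the partner in m)
PRESSURE_TO_DISTANCE = [(50.0, 0.5), (60.0, 1.5), (70.0, 4.0), (80.0, 7.0)]
# (distance in m, transmit power in dBm assumed to just reach that distance)
DISTANCE_TO_POWER = [(0.5, -30.0), (1.5, -20.0), (4.0, -12.0), (7.0, -6.0)]


def interpolate(table, x):
    """Piecewise-linear lookup in a sorted (x, y) table, clamped at both ends."""
    xs = [point[0] for point in table]
    if x <= xs[0]:
        return table[0][1]
    if x >= xs[-1]:
        return table[-1][1]
    i = bisect_left(xs, x)
    (x0, y0), (x1, y1) = table[i - 1], table[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def estimate_distance(pressure_db):
    """First stage of step 505: louder speech implies a more distant partner."""
    return interpolate(PRESSURE_TO_DISTANCE, pressure_db)


def tx_power_for_utterance(pressure_db):
    """Second stage of step 505: power that just covers the estimated distance."""
    return interpolate(DISTANCE_TO_POWER, estimate_distance(pressure_db))
```

Storing a single table that maps sound pressure directly to transmit power, as the last paragraph above allows, would amount to precomputing `tx_power_for_utterance` for each stored sound pressure.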

Then, in step 506, the utterance signal is transmitted by wireless communication at the radio field intensity calculated in step 505.
As a result, in the voice analysis system 1 of the present embodiment, the utterance signal transmitted from the terminal device 10 reaches the wearer of another terminal device 10 who is considered to be in dialogue with the wearer of the terminal device 10 (the conversation partner). It does not reach the wearer of another terminal device 10 who is located far from the wearer of the terminal device 10 and is considered not to be in dialogue with that wearer. Consequently, the other terminal device 10 worn by the conversation partner can receive the utterance signal transmitted from the terminal device 10.

<Processing executed by host device 20>
FIG. 7 is a flowchart showing the processing executed by the host device 20 to which the present embodiment is applied. The processing executed by the host device 20 of the present embodiment is described next with reference to FIG. 7 and FIG. 1 described above.
In the host device 20 of the present embodiment, the data reception unit 21 first receives, from a plurality of terminal devices 10, utterance information relating to the voice signals of uttered voices (step 701). The utterance information includes the ID information of the terminal device 10, the acquisition time of the uttered voice acquired by the terminal device 10, the result of the self/other identification obtained by the terminal device 10 in step 504 described above, and similar information.
The utterance information received by the data reception unit 21 is then sent to the data storage unit 22 and stored there (step 702).

Subsequently, when the data reception unit 21 receives reception information transmitted from a terminal device 10 (YES in step 703), the received reception information is sent to the data storage unit 22 and stored there (step 704).
When the data reception unit 21 does not receive reception information (NO in step 703), the subsequent steps are skipped and the process returns to step 701.

Next, based on the utterance information acquired in step 701 and stored in the data storage unit 22 in step 702, and on the reception information acquired in step 703 and stored in the data storage unit 22 in step 704, the data analysis unit 23 limits the combinations of terminal devices 10, among the plurality of terminal devices 10, for which the voice synchrony determination described above is executed (step 705).
The process in step 705 of limiting the combinations of terminal devices 10 subject to the voice synchrony determination is described in detail later.

Subsequently, the voice synchrony determination described above is performed for each combination of terminal devices 10 selected in step 705 (step 706).
When voice synchrony is judged to exist between the terminal devices 10 (YES in step 707), it is determined that a dialogue relationship exists between the wearers of the terminal devices 10 for which the synchrony determination was made (step 708).
When no voice synchrony is judged to exist between the terminal devices 10 (NO in step 707), the process returns to step 701.
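The control flow of steps 701 to 708 can be sketched as a single function. This is only an outline under stated assumptions: the record format (plain dictionaries), the in-memory `storage` dictionary standing in for the data storage unit 22, and the injected callables `limit_pairs` (step 705) and `is_synchronous` (steps 706 and 707) are hypothetical names introduced for the example, and their internals are deliberately left to the caller.

```python
from typing import Callable, Iterable

Pair = tuple[str, str]


def host_cycle(
    utterance_info: list[dict],
    reception_info: list[dict],
    limit_pairs: Callable[[list[dict], list[dict]], Iterable[Pair]],
    is_synchronous: Callable[[Pair, list[dict]], bool],
    storage: dict[str, list[dict]],
) -> list[Pair]:
    """Run one pass of steps 701-708; return the pairs judged to be in dialogue."""
    # Steps 701-702: receive the utterance information and store it.
    storage.setdefault("utterance", []).extend(utterance_info)

    # Step 703: with no reception information, skip the rest and return to step 701.
    if not reception_info:
        return []

    # Step 704: store the reception information.
    storage.setdefault("reception", []).extend(reception_info)

    # Step 705: limit the terminal combinations to be examined.
    candidates = limit_pairs(storage["utterance"], storage["reception"])

    # Steps 706-708: keep only the candidate pairs that show voice synchrony.
    return [pair for pair in candidates if is_synchronous(pair, storage["utterance"])]
```

An empty return value covers both "NO" branches of the flowchart (steps 703 and 707); the caller simply invokes `host_cycle` again with the next batch of received data.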

<Description of the program>
The processing performed by the terminal device 10 of the present embodiment described with reference to FIG. 5, and the processing performed by the host device 20 of the present embodiment described with reference to FIG. 7, are realized by software and hardware resources working together. That is, the CPUs inside the control computers provided in the terminal device 10 and the host device 20 execute programs that implement the functions of the terminal device 10 and the host device 20, and these functions are thereby realized.

<Specific example of processing executed in the voice analysis system>
A specific example of the processing executed in the voice analysis system 1 is described next.
FIG. 8 is a diagram for explaining in detail the processing executed in the voice analysis system 1 to which the present embodiment is applied.
In the example shown in FIG. 8, there are wearers X, Y, and Z wearing terminal devices 10X, 10Y, and 10Z, respectively. In this example, wearer X and wearer Y are in dialogue, and there is no dialogue between wearer X and wearer Z or between wearer Y and wearer Z. The distance m2 between wearer X and wearer Z and the distance m3 between wearer Y and wearer Z are both larger than the distance m1 between wearer X and wearer Y (m1 < m2, m1 < m3). Although not illustrated in FIG. 8, a host device 20 (see FIG. 1) is connected to the terminal devices 10X, 10Y, and 10Z via a wireless communication line.
In the example shown in FIG. 8, although there is no dialogue between wearer X and wearer Z or between wearer Y and wearer Z, the uttered voices of wearer X and wearer Y may still reach wearer Z (terminal device 10Z), in which case the microphones 11 and 12 of the terminal device 10Z acquire those uttered voices.

(Specific example of processing performed in terminal device 10X)
First, the processing performed in the terminal device 10X worn by wearer X when wearer X speaks to wearer Y, the conversation partner, in the example shown in FIG. 8 is described following the procedure explained with reference to FIG. 5.
When wearer X speaks in the example shown in FIG. 8, the first microphone 11 and the second microphone 12 (see FIG. 1 for both) of the terminal device 10X acquire the uttered voice (step 501).
The voice analysis unit 15 (see FIG. 1) then performs processing such as amplification of the voice signal on the acquired uttered voice (step 502), and calculates the sound pressure of the uttered voice from the amplified voice signal (step 503).
Next, self/other identification of the uttered voice is performed by the method described above (step 504). In this example, since wearer X is the one speaking, the uttered voice acquired by the terminal device 10X is judged to be that of wearer X (YES in step 504).

Subsequently, the voice analysis unit 15 of the terminal device 10X calculates the radio field intensity for outputting the utterance signal, based on the sound pressure of wearer X's own uttered voice calculated in step 503 and on the relationship shown in FIG. 6 (step 505).
In the example shown in FIG. 8, wearer X is speaking to wearer Y, and the sound pressure of wearer X's uttered voice is at a level suited to a dialogue with wearer Y, who is at a distance m1 from wearer X.
The voice analysis unit 15 can therefore calculate the distance between wearer X and wearer Y as m1 from the sound pressure of wearer X's uttered voice, using the relationship shown in FIG. 6. The voice analysis unit 15 then calculates the radio field intensity for transmitting the utterance signal from the calculated distance m1. That is, from the relationship shown in FIG. 6, a radio field intensity is calculated such that the utterance signal reaches the range within a distance m1 of the terminal device 10X and does not reach the range farther than m1 from the terminal device 10X.

Subsequently, the data transmission unit 16 (see FIG. 1) of the terminal device 10X transmits the utterance signal by wireless communication at the radio field intensity calculated in step 505 (step 506). The utterance signal includes information such as the ID information of the terminal device 10X and the acquisition time of wearer X's uttered voice acquired by the terminal device 10X.
In FIG. 8, the broken line indicates the range reached by the utterance signal transmitted from the terminal device 10X at the radio field intensity calculated in step 505 (that is, the range within a distance m1 of wearer X (terminal device 10X)).
Since the distance between wearer X (terminal device 10X) and wearer Y (terminal device 10Y) is m1, the utterance signal transmitted from the terminal device 10X reaches the terminal device 10Y, as shown in FIG. 8. The signal reception unit 17 of the terminal device 10Y then receives the utterance signal from the terminal device 10X.

On the other hand, since the distance between wearer X (terminal device 10X) and wearer Z (terminal device 10Z) is m2 (> m1), the utterance signal transmitted from the terminal device 10X does not reach the terminal device 10Z, as shown in FIG. 8, and the terminal device 10Z does not receive the utterance signal from the terminal device 10X.

Subsequently, the terminal device 10X transmits utterance information to the host device 20 (step 509) and ends the series of processes.
Because the host device 20 is usually installed at a position away from wearers X to Z, who wear the terminal devices 10X to 10Z, the radio field intensity for transmitting the utterance information to the host device 20 is set higher than the radio field intensity at which the utterance signal described above is transmitted.
The utterance information transmitted from the terminal device 10X to the host device 20 includes the analysis result of wearer X's uttered voice acquired by the terminal device 10X, the acquisition time of wearer X's uttered voice, information such as the sound pressure of wearer X's uttered voice, the ID information of the terminal device 10X, and the like.

In this example, wearer X is speaking to wearer Y, and neither wearer Y nor wearer Z is speaking. No utterance signal is therefore transmitted from the terminal device 10Y or the terminal device 10Z, and the terminal device 10X receives no utterance signal (NO in step 507). Accordingly, in this example, the terminal device 10X does not transmit reception information in step 508.
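The terminal-side sequence of FIG. 5 as walked through above (steps 504 to 509) can be sketched as one function per acquired utterance. This is a hedged outline: the `Outbox` container, the dictionary field names, and the `power_for` callable (standing in for the step 505 calculation) are hypothetical, and the self/other identification of step 504 is taken as an input flag rather than implemented.

```python
from dataclasses import dataclass, field


@dataclass
class Outbox:
    """Messages a terminal would send in one cycle (hypothetical container)."""
    utterance_signals: list = field(default_factory=list)  # short range, to nearby terminals
    reception_info: list = field(default_factory=list)     # to the host device
    utterance_info: list = field(default_factory=list)     # to the host device


def terminal_cycle(device_id, acquired_at, pressure_db, is_own_voice,
                   received_signals, power_for, outbox):
    """Steps 504-509 for one acquired utterance.

    Steps 501-503 (acquisition, amplification, sound pressure) are assumed done
    by the caller; `is_own_voice` is the result of step 504.
    """
    if is_own_voice:
        power = power_for(pressure_db)                      # step 505
        outbox.utterance_signals.append(                    # step 506
            {"sender": device_id, "time": acquired_at, "power": power})

    for signal in received_signals:                         # step 507 (YES branch)
        outbox.reception_info.append(                       # step 508
            {"receiver": device_id, "sender": signal["sender"], "time": signal["time"]})

    outbox.utterance_info.append(                           # step 509
        {"id": device_id, "time": acquired_at,
         "pressure_db": pressure_db, "own_voice": is_own_voice})
    return outbox
```

For the FIG. 8 scenario, calling this for 10X with `is_own_voice=True` and no received signals yields one utterance signal and one utterance-information record; calling it for a listening terminal with the received signal yields reception information instead.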

(Specific example of processing performed in terminal device 10Y)
Next, the processing performed in the terminal device 10Y worn by wearer Y when wearer X speaks to wearer Y, the conversation partner, in the example shown in FIG. 8 is described following the procedure explained with reference to FIG. 5.
When wearer X speaks in the example shown in FIG. 8, the first microphone 11 and the second microphone 12 (see FIG. 1 for both) of the terminal device 10Y acquire the uttered voice (step 501).
The voice analysis unit 15 (see FIG. 1) then performs processing such as amplification of the voice signal on the acquired uttered voice (step 502), and calculates the sound pressure of the uttered voice from the amplified voice signal (step 503).

Next, the voice analysis unit 15 of the terminal device 10Y performs self/other identification of the uttered voice by the method described above (step 504). In this example, since wearer X is the one speaking, the uttered voice acquired by the terminal device 10Y is judged to be that of someone other than wearer Y (NO in step 504). The terminal device 10Y therefore performs neither the calculation of the radio field intensity of the utterance signal in step 505 nor the transmission of the utterance signal in step 506.

Subsequently, the terminal device 10Y receives the utterance signal transmitted from the terminal device 10X, as described above (YES in step 507). As noted above, the utterance signal from the terminal device 10X includes information such as the ID information of the terminal device 10X and the time at which the terminal device 10X acquired the uttered voice.
The terminal device 10Y then transmits reception information to the host device 20 through the data transmission unit 16 (step 508). The reception information transmitted from the terminal device 10Y to the host device 20 includes the ID information of the terminal device 10Y together with the information contained in the utterance signal received from the terminal device 10X, such as the ID information of the terminal device 10X and the time at which the terminal device 10X acquired the uttered voice.

Next, the terminal device 10Y transmits utterance information to the host device 20 (step 509) and ends the series of processes.
The utterance information transmitted by the terminal device 10Y includes the analysis result of wearer X's uttered voice acquired by the terminal device 10Y, the acquisition time of wearer X's uttered voice, information such as the sound pressure of wearer X's uttered voice, the ID information of the terminal device 10Y, and the like.

(Processing performed in terminal device 10Z)
Next, the processing performed in the terminal device 10Z worn by wearer Z when wearer X speaks to wearer Y, the conversation partner, in the example shown in FIG. 8 is described following the procedure explained with reference to FIG. 5.
When wearer X speaks in the example shown in FIG. 8, the first microphone 11 and the second microphone 12 (see FIG. 1 for both) of the terminal device 10Z acquire the uttered voice (step 501).
The voice analysis unit 15 (see FIG. 1) then performs processing such as amplification of the voice signal on the acquired uttered voice (step 502), and calculates the sound pressure of the uttered voice from the amplified voice signal (step 503).

Next, the voice analysis unit 15 of the terminal device 10Z performs self/other identification of the uttered voice by the method described above (step 504). In this example, since wearer X is the one speaking, the uttered voice acquired by the terminal device 10Z is judged to be that of someone other than wearer Z (NO in step 504). The terminal device 10Z therefore performs neither the calculation of the radio field intensity of the utterance signal in step 505 nor the transmission of the utterance signal in step 506.

Subsequently, because the terminal device 10Z is at a distance m2 (> m1) from the terminal device 10X as described above, the utterance signal transmitted from the terminal device 10X does not reach it, and the terminal device 10Z does not receive the utterance signal from the terminal device 10X (NO in step 507). The terminal device 10Z therefore does not transmit reception information in step 508.

Next, the terminal device 10Z transmits utterance information to the host device 20 (step 509) and ends the series of processes.
The utterance information transmitted by the terminal device 10Z includes the analysis result of wearer X's uttered voice acquired by the terminal device 10Z, the acquisition time of wearer X's uttered voice, information such as the sound pressure of wearer X's uttered voice, the ID information of the terminal device 10Z, and the like.

The specific example above describes the processing when wearer X speaks to wearer Y while the two are in dialogue. The processing when, for example, wearer Y speaks to wearer X can be understood in the same way.
That is, when wearer Y speaks to wearer X, the terminal device 10Y acquires wearer Y's uttered voice and transmits an utterance signal at a radio field intensity based on the sound pressure of that uttered voice. The terminal device 10Y also transmits utterance information to the host device 20.

When wearer Y speaks to wearer X, the terminal device 10X acquires wearer Y's uttered voice, receives the utterance signal transmitted from the terminal device 10Y, and transmits reception information to the host device 20. The terminal device 10X also transmits utterance information to the host device 20.
Further, when wearer Y speaks to wearer X, the terminal device 10Z acquires wearer Y's uttered voice and transmits utterance information to the host device 20. Since the distance between the terminal device 10Y (wearer Y) and the terminal device 10Z (wearer Z) is m3 (> m1), the utterance signal transmitted from the terminal device 10Y does not reach the terminal device 10Z, and the terminal device 10Z does not receive the utterance signal.

(Specific example of processing performed in host device 20)
Next, the processing performed in the host device 20 (see FIG. 1), which is connected to the terminal devices 10X to 10Z via a wireless communication line, when the processing described above has been performed in the terminal devices 10X, 10Y, and 10Z is described following the procedure explained with reference to FIG. 7.

As described above, utterance information is transmitted to the host device 20 from each of the terminal devices 10X to 10Z, and the data reception unit 21 of the host device 20 receives it (step 701).
The host device 20 then stores the received utterance information in the data storage unit 22, separated by wearer (wearers X, Y, and Z) based on the ID information and other data contained in the utterance information (step 702).

Subsequently, the data reception unit 21 of the host device 20 acquires the reception information transmitted from the terminal device 10Y and the reception information transmitted from the terminal device 10X, as described above (YES in step 703). The acquired reception information is sent to the data storage unit 22 and stored there (step 704).
The reception information from the terminal device 10Y received by the data reception unit 21 is based on the utterance signal from the terminal device 10X that the terminal device 10Y received. It includes the ID information of the terminal device 10Y, together with the ID information of the terminal device 10X and the acquisition time of wearer X's uttered voice that were contained in the utterance signal from the terminal device 10X.
Similarly, the reception information from the terminal device 10X is based on the utterance signal from the terminal device 10Y that the terminal device 10X received. It includes the ID information of the terminal device 10X, together with the ID information of the terminal device 10Y and the acquisition time of wearer Y's uttered voice that were contained in the utterance signal from the terminal device 10Y.

Then, based on the acquired reception information, the data analysis unit 23 of the host device 20 limits the combinations of terminal devices 10 (combinations of wearers) for which the voice synchrony determination is performed (step 705).
Here, the reception information from the terminal device 10Y includes the ID information of the terminal device 10X, and the reception information from the terminal device 10X includes the ID information of the terminal device 10Y. The data analysis unit 23 of the host device 20 therefore treats the combination of wearer X of the terminal device 10X and wearer Y of the terminal device 10Y as a target of the voice synchrony determination.
On the other hand, the data reception unit 21 has acquired no reception information from the terminal device 10Z, and neither the reception information from the terminal device 10Y nor that from the terminal device 10X contains the ID information of the terminal device 10Z. The data analysis unit 23 of the host device 20 therefore does not treat wearer Z of the terminal device 10Z as a target of the voice synchrony determination with wearer X or wearer Y.
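The restriction of step 705 can be sketched as building a set of unordered terminal pairs from the reception information. The record fields `receiver` and `sender` are hypothetical names for the two IDs that the text says each piece of reception information carries. One further assumption: in this sketch a single reception record is enough to make a pair a candidate, whereas in the FIG. 8 example both directions happen to be present; requiring both directions would be a stricter variant.

```python
def candidate_pairs(reception_info):
    """Return the unordered terminal pairs linked by at least one reception record."""
    pairs = set()
    for record in reception_info:
        receiver, sender = record["receiver"], record["sender"]
        if receiver != sender:
            # frozenset makes (10X, 10Y) and (10Y, 10X) the same pair.
            pairs.add(frozenset((receiver, sender)))
    return pairs


# Reception information in the FIG. 8 example: 10Y heard 10X, and 10X heard 10Y.
fig8_reception = [
    {"receiver": "10Y", "sender": "10X"},
    {"receiver": "10X", "sender": "10Y"},
]
fig8_candidates = candidate_pairs(fig8_reception)
```

With three terminals there are three possible pairs; here only the pair 10X and 10Y remains, and every pair involving 10Z is excluded before any synchrony comparison is attempted.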

Subsequently, the host device 20 performs the voice synchrony determination for the combinations of terminal devices 10 (combinations of wearers) selected in step 705 (step 706).
That is, in this example, of the utterance information received in step 701 and stored in step 702, the utterance information from the terminal device 10X is compared with the utterance information from the terminal device 10Y. Since wearer X and wearer Y are in dialogue, the information indicating the utterance situation, such as the length of each utterance and the timing at which the speaker changes, is similar in the utterance information from the terminal device 10X and in that from the terminal device 10Y. Voice synchrony is therefore judged to exist between the utterance information from the terminal device 10X and that from the terminal device 10Y (YES in step 707), and a dialogue relationship is judged to exist between wearer X and wearer Y (step 708).
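As a rough illustration of what "similar speaker-change timing" could mean in practice, the sketch below compares the speaker-change timestamps reported by two terminals. It is not the synchrony determination of the embodiment, whose details are described elsewhere in the specification; the input format, the 0.5 s tolerance, and the function name are all assumptions made for the example.

```python
def is_synchronous(switch_times_a, switch_times_b, tolerance_s=0.5):
    """Judge synchrony from speaker-change timestamps (seconds) of two terminals.

    Hypothetical criterion: both terminals observed the same number of speaker
    changes, and each corresponding pair of timestamps differs by at most
    `tolerance_s`.
    """
    if not switch_times_a or len(switch_times_a) != len(switch_times_b):
        return False
    paired = zip(sorted(switch_times_a), sorted(switch_times_b))
    return all(abs(ta - tb) <= tolerance_s for ta, tb in paired)
```

Two terminals worn by people in the same conversation would report nearly identical change times, while a terminal picking up a different conversation would not, which is the intuition behind the comparison in step 706.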

On the other hand, no voice synchrony determination is performed between the utterance information from the terminal device 10X and that from the terminal device 10Z, or between the utterance information from the terminal device 10Y and that from the terminal device 10Z.
Compared with a case where this configuration is not adopted, this keeps the processing for determining dialogue relationships from becoming complicated and allows dialogue relationships to be determined accurately.
The configuration described above also suppresses erroneous determinations that a dialogue relationship exists between wearer X and wearer Z, or between wearer Y and wearer Z, when no such relationship actually exists.

The present invention is described next in more detail using examples. The present invention is not limited to the following examples.
(Example 1)
Wearer a wearing terminal device 10a, wearer b wearing terminal device 10b, wearer c wearing terminal device 10c, and wearer d wearing terminal device 10d were positioned as shown in FIG. 9. That is, wearers a to d were positioned so that the distance between wearer a and wearer b was 1.5 m, the distance between wearer b and wearer c was 4 m, the distance between wearer c and wearer d was 1.5 m, and the distance between wearer a and wearer d was 7 m. Wearer a and wearer b then held a dialogue for a predetermined period, and wearer c and wearer d held a dialogue during the same period. Wearers a to d were all located within range of one another's voices.
During this time, each terminal device 10 performed processing such as that shown in FIG. 5.
Under these conditions, the reception of utterance signals at each terminal device 10 was observed, and the results are shown in FIG. 10(a).

(Comparative Example 1)
The wearer a, the wearer b, the wearer c, and the wearer d were arranged in the same manner as in Example 1, and conversations were held between the wearer a and the wearer b and between the wearer c and the wearer d, as in Example 1. As described above, the wearers a to d were located within range of one another's voices.
In Comparative Example 1, unlike Example 1, each terminal device 10 transmits the utterance signal at a predetermined uniform radio wave intensity regardless of the sound pressure of the wearer's uttered voice. It is assumed that an utterance signal transmitted from a terminal device 10 at this radio wave intensity reaches other terminal devices 10 located 7 m or more away. That is, an utterance signal transmitted from the terminal device 10 of any one of the wearers a to d reaches the terminal devices 10 of all the other wearers.
Under these conditions, the acquisition status of utterance signals at each terminal device 10 was observed, and the results are shown in FIG. 10(b).
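The difference between the two set-ups can be pictured with a small reach-based toy model. The sketch below is illustrative only: the collinear layout is one placement consistent with the stated distances (the source gives only pairwise distances), and the reach values of 2 m and 7 m are hypothetical stand-ins for the sound-pressure-based intensity of Example 1 and the uniform intensity of Comparative Example 1. The function name `received_from` is likewise an assumption, not part of the disclosed system.

```python
from itertools import permutations

# Assumed collinear layout matching the stated distances:
# a-b 1.5 m, b-c 4 m, c-d 1.5 m, a-d 7 m.
POSITIONS_M = {"a": 0.0, "b": 1.5, "c": 5.5, "d": 7.0}


def received_from(positions, reach_m):
    """Map each terminal to the set of terminals whose utterance signal reaches it.

    Every wearer is assumed to speak at some point, and a signal is
    treated as received when the sender is within reach_m metres.
    """
    received = {name: set() for name in positions}
    for sender, receiver in permutations(positions, 2):
        if abs(positions[sender] - positions[receiver]) <= reach_m:
            received[receiver].add(sender)
    return received


# Hypothetical reach values (not taken from the source).
example_1 = received_from(POSITIONS_M, reach_m=2.0)
comparative_1 = received_from(POSITIONS_M, reach_m=7.0)

print(example_1)
print(comparative_1)
```

Under these assumptions, a short reach leaves each terminal hearing only its conversation partner, while a reach of 7 m lets every terminal hear all three others.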

(Observation Results)
As shown in FIG. 10(a), in Example 1 the terminal device 10a worn by the wearer a receives only the utterance signal transmitted from the terminal device 10b worn by the wearer b, who is the conversation partner of the wearer a. It does not receive utterance signals from the terminal device 10c of the wearer c or the terminal device 10d of the wearer d, neither of whom is conversing with the wearer a.
Similarly, the terminal device 10b receives only the utterance signal transmitted from the terminal device 10a, the terminal device 10c receives only the utterance signal transmitted from the terminal device 10d, and the terminal device 10d receives only the utterance signal transmitted from the terminal device 10c.

Based on this result, in Example 1 the data analysis unit 23 of the host device 20 can divide the wearers a to d (terminal devices 10a to 10d) into the pair of the wearer a and the wearer b (terminal devices 10a and 10b) and the pair of the wearer c and the wearer d (terminal devices 10c and 10d). The data analysis unit 23 then judges voice synchrony for the pair of the terminal devices 10a and 10b and for the pair of the terminal devices 10c and 10d. In this case, there is no need to judge voice synchrony for, for example, the pair of the terminal devices 10a and 10c or the pair of the terminal devices 10a and 10d.
As a result, compared with a case where this configuration is not adopted, the processing executed by the data analysis unit 23 is kept from becoming complicated, and the dialogue relationship can be determined more accurately.
Furthermore, because the wearers (terminal devices 10) are divided into groups before voice synchrony is judged, erroneous determination of the dialogue relationship is less likely than in a case where this configuration is not adopted.
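The pre-grouping step described above can be sketched as follows. This is a minimal illustration, assuming the reception status is available as a mapping from each terminal to the terminals whose utterance signals it received; the function name `candidate_pairs` and the data layout are hypothetical, and the synchrony judgment itself is outside the scope of the sketch.

```python
def candidate_pairs(received_from):
    """Return the unordered terminal pairs in which at least one side
    received the other's utterance signal."""
    pairs = set()
    for receiver, senders in received_from.items():
        for sender in senders:
            pairs.add(frozenset((receiver, sender)))
    return pairs


# Reception status reported for Example 1 (FIG. 10(a)).
example_1 = {
    "10a": {"10b"},
    "10b": {"10a"},
    "10c": {"10d"},
    "10d": {"10c"},
}

pairs = candidate_pairs(example_1)
for pair in sorted(sorted(p) for p in pairs):
    print(pair)
```

Only the pairs returned here would be passed on to the synchrony judgment, so pairs such as 10a and 10c never need to be examined.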

On the other hand, as shown in FIG. 10(b), in Comparative Example 1 the terminal device 10a worn by the wearer a receives not only the utterance signal transmitted from the terminal device 10b of the wearer b, who is conversing with the wearer a, but also the utterance signals transmitted from the terminal device 10c of the wearer c and the terminal device 10d of the wearer d, neither of whom is conversing with the wearer a.
Similarly, the terminal device 10b receives not only the utterance signal transmitted from the terminal device 10a but also the utterance signals transmitted from the terminal devices 10c and 10d. The terminal device 10c receives not only the utterance signal transmitted from the terminal device 10d but also the utterance signals transmitted from the terminal devices 10a and 10b. The terminal device 10d receives not only the utterance signal transmitted from the terminal device 10c but also the utterance signals transmitted from the terminal devices 10a and 10b.

As a result, in Comparative Example 1 it is difficult to separate out the persons whose utterance synchrony should be judged on the basis of the reception status of the utterance signals, the reception information transmitted to the host device 20 based on the received utterance signals, and the like. To determine the dialogue relationships of the wearers a to d, the data analysis unit 23 of the host device 20 must therefore judge synchrony for all of the following pairs: the terminal devices 10a and 10b, the terminal devices 10a and 10c, the terminal devices 10a and 10d, the terminal devices 10b and 10c, the terminal devices 10b and 10d, and the terminal devices 10c and 10d.
Consequently, the processing performed by the data analysis unit 23 becomes complicated, the accuracy of the dialogue relationship determination decreases, and erroneous determinations of the dialogue relationship may occur.
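The difference in workload between the two cases is simple arithmetic: with n terminals and no pre-grouping, n(n-1)/2 pairs must be judged. The short sketch below enumerates the pairs for the four terminals of the examples; the variable names are illustrative only.

```python
from itertools import combinations

terminals = ["10a", "10b", "10c", "10d"]

# Comparative Example 1: every terminal receives every other terminal's
# utterance signal, so no pair can be excluded in advance.
all_pairs = list(combinations(terminals, 2))

# Example 1: the reception status leaves only the two conversing pairs.
prefiltered_pairs = [("10a", "10b"), ("10c", "10d")]

print(len(all_pairs), len(prefiltered_pairs))
```

For four terminals this gives six synchrony judgments without pre-grouping against two with it, and the gap widens quadratically as more wearers are added.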

DESCRIPTION OF SYMBOLS 1 ... Voice analysis system, 10 ... Terminal device, 11 ... First microphone, 12 ... Second microphone, 13 ... First amplifier, 14 ... Second amplifier, 15 ... Voice analysis unit, 16 ... Data transmission unit, 17 ... Signal receiving unit, 20 ... Host device, 21 ... Data receiving unit, 22 ... Data storage unit, 23 ... Data analysis unit, 24 ... Output unit

Claims

1. A voice analysis system comprising:
a plurality of voice acquisition means that are arranged at positions at different distances from a wearer's utterance site and that acquire the voice of a speaker;
identification means for identifying whether the speaker is the wearer or another person other than the wearer, based on the sound pressures of the voices acquired by at least two of the voice acquisition means;
utterance signal transmission means for transmitting, when the speaker is identified as the wearer by the identification means, an utterance signal relating to the utterance of the wearer at a radio wave intensity based on the sound pressure of the wearer's voice acquired by the voice acquisition means;
utterance signal receiving means for receiving the utterance signal transmitted from the utterance signal transmission means; and
dialogue relationship determination means for determining a dialogue relationship of the wearer based on the reception status of the utterance signal by the utterance signal receiving means and the identification result by the identification means.

2. The voice analysis system according to claim 1, further comprising:
reception information transmission means for transmitting, when the utterance signal is received by the utterance signal receiving means, reception information based on the reception of the utterance signal; and
reception information acquisition means for acquiring the reception information transmitted from the reception information transmission means,
wherein the dialogue relationship determination means determines the dialogue relationship of the wearer based on the identification result by the identification means and the reception information acquired by the reception information acquisition means.

3. A voice analysis system comprising:
a plurality of voice terminal devices, each including a plurality of voice acquisition means that are arranged at positions at different distances from a wearer's utterance site and that acquire the voice of a speaker, identification means for identifying whether the speaker is the wearer or another person other than the wearer based on the sound pressures of the voices acquired by at least two of the voice acquisition means, and communication means for communicating with the outside via a wireless communication line based on the identification result by the identification means; and
a voice analysis device including dialogue relationship determination means for determining a dialogue relationship between the wearers of the respective voice terminal devices,
wherein the communication means of each voice terminal device includes utterance signal transmission means for transmitting, when the speaker is identified as the wearer by the identification means, an utterance signal relating to the utterance of the wearer at a radio wave intensity based on the sound pressure of the wearer's voice acquired by the voice acquisition means, and utterance signal receiving means for receiving the utterance signal transmitted from the utterance signal transmission means of another voice terminal device, and
the dialogue relationship determination means of the voice analysis device determines the dialogue relationship based on the identification result by the identification means of each voice terminal device and the reception status of the utterance signal by the utterance signal receiving means of each voice terminal device.

4. The voice analysis system according to claim 3, wherein the dialogue relationship determination means of the voice analysis device determines, among the wearers of the plurality of voice terminal devices, the dialogue relationship between the wearer of the voice terminal device that transmitted the utterance signal by the utterance signal transmission means and the wearer of the voice terminal device that received the utterance signal by the utterance signal receiving means.

5. The voice analysis system according to claim 4, wherein the dialogue relationship determination means of the voice analysis device determines the dialogue relationship by comparing the synchrony between the voice acquired by the voice acquisition means of the voice terminal device that transmitted the utterance signal by the utterance signal transmission means and the voice acquired by the voice acquisition means of the voice terminal device that received the utterance signal by the utterance signal receiving means.

6. A voice terminal device comprising:
a plurality of voice acquisition means that are arranged at positions at different distances from a wearer's utterance site and that acquire the voice of a speaker;
identification means for identifying whether the speaker is the wearer or another person other than the wearer, based on the sound pressures of the voices acquired by at least two of the voice acquisition means;
utterance signal transmission means for transmitting, when the speaker is identified as the wearer by the identification means, an utterance signal relating to the utterance of the wearer at a radio wave intensity based on the sound pressure of the wearer's voice acquired by the voice acquisition means; and
utterance signal receiving means for receiving an utterance signal relating to an utterance of another person.

7. The voice terminal device according to claim 6, further comprising utterance information transmission means for transmitting utterance information including information relating to the voice acquired by the voice acquisition means and the identification result by the identification means.

8. The voice terminal device according to claim 6, wherein the utterance signal receiving means receives the utterance signal relating to the utterance of the other person that is transmitted by the utterance signal transmission means of another device worn by the other person.

9. The voice terminal device according to claim 1, wherein the utterance signal transmission means transmits the utterance signal at a higher radio wave intensity as the sound pressure of the wearer's voice acquired by the voice acquisition means is larger.

10. A program causing a computer to realize:
a function of acquiring voice information from a plurality of voice acquisition means that are arranged at positions at different distances from a wearer's utterance site and that acquire the voice of a speaker;
a function of identifying whether the speaker is the wearer or another person other than the wearer, based on the sound pressure difference between the voices acquired by at least two of the voice acquisition means;
a function of transmitting, when the speaker is identified as the wearer, an utterance signal relating to the utterance of the wearer at a radio wave intensity based on the sound pressure of the wearer's voice acquired by the voice acquisition means; and
a function of receiving an utterance signal relating to an utterance of another person.