JP2019022131A

JP2019022131A - Speech voice collection method for receiving speech voices for each group, system, speech analysis server and program

Info

Publication number: JP2019022131A
Application number: JP2017140392A
Authority: JP
Inventors: Naoki Kawai (河合 直樹)
Original assignee: KDDI Corp
Current assignee: KDDI Corp
Priority date: 2017-07-19
Filing date: 2017-07-19
Publication date: 2019-02-07
Anticipated expiration: 2037-07-19
Also published as: JP6755220B2

Abstract

PROBLEM TO BE SOLVED: To provide a speech voice collection method and the like capable of receiving speech voices group by group so that the analysis of speech voices in group work does not become congested.

SOLUTION: A plurality of groups each consist of a plurality of users. The speech voice collection method includes: a first step in which each user terminal transmits, to a speech analysis server, voice interval detection information that associates a terminal identifier and a group identifier with one or more speech intervals, for each consecutive predetermined voice interval; a second step in which the speech analysis server receives a plurality of pieces of voice interval detection information from the plurality of user terminals of each group and determines a speech voice reception order of the groups based on a predetermined condition on the plurality of speech intervals included in the voice interval; and a third step in which the speech analysis server transmits, group by group in the speech voice reception order, a speech voice request corresponding to the speech intervals to the plurality of user terminals and receives the speech voices from the user terminals.

SELECTED DRAWING: Figure 2

Description

The present invention relates to a technique for analyzing user utterances in group work.

In recent years, fostering critical thinking and problem-solving skills has become important as part of the 21st-century skills required in education and business. For this reason, attention has shifted from the seminar format, which delivers knowledge one way, to the group work format based on active learning.
The group work format is based on collaborative learning in which, for example, three to four people form one group and exchange knowledge while continuing a creative discussion toward solving a problem. The teacher who manages the group work needs to offer effective advice to any group whose discussion is not active.

In the group work format, however, a single teacher cannot observe the state of discussion in every group. There is therefore a demand for a technique that uses ICT (Information and Communication Technology) to grasp the learning status of a plurality of groups at the same time.
For example, there is a technique in which an utterance analysis server collects the utterance voice of each user in all groups, converts the utterance voices into text by speech recognition processing, and provides the results of utterance analysis processing to the teacher. The teacher can observe the learning status while looking at the analysis results for each group.

FIG. 1 is a system configuration diagram for group work.

According to FIG. 1, a plurality of groups are formed, each consisting of three to four users who hold a discussion. In the system of FIG. 1, the utterance analysis server 1 is connected to a plurality of user terminals 2 and an administrator terminal 3 via an access point 4.

The utterance analysis server 1 receives utterance voices from the user terminals 2 and analyzes the utterance voices for each group.
The user terminal 2 is a microphone device held or worn by each user; specifically, it is a device such as a smartphone that records the utterance voice of that user.
The administrator terminal 3 acquires the analysis result for each group from the utterance analysis server 1 and presents it to the teacher.
The access point 4 is connected on one side to a WAN (Wide Area Network) and on the other side to a LAN (Local Area Network), and accommodates the plurality of user terminals 2 via the LAN. The network between the access point 4 and the user terminals 2 may be a wireless or wired LAN, Bluetooth (registered trademark), or a carrier access network.

The utterance analysis server 1 requires a certain amount of time for speech recognition and utterance analysis processing. Consequently, if the user terminals 2 upload utterance voices to the utterance analysis server 1 at random times, delays arise in two ways: from temporary spikes in the speech recognition load, and from the utterance analysis processing having to wait until the utterance voices of a whole group have arrived.

Conventionally, there is a technique that addresses congestion among a large number of mobile terminals by reducing how often processing delays occur because of increased service waiting time (see, for example, Patent Document 1). In this technique, a reception device is provided between the terminals and the service providing device. The reception device accepts terminal identification numbers into a queue and notifies the service providing device of the queue. Based on the identification numbers, the service providing device treats each mobile terminal as a slave and provides the service in a way that avoids congestion.

Patent Document 1: JP 2007-133912 A

In general, the timing at which a user terminal 2 uploads utterance voice is not controlled. Therefore, when utterance voices are transmitted simultaneously from a large number of user terminals 2, network congestion due to increased traffic, or congestion due to increased speech recognition load or waiting delays in utterance analysis processing, can occur, and connections may even be cut off.
In particular, when group work involving many users takes place at the same time in the same room, a large number of user terminals 2 are connected under the access point 4. In this case, the transmission capacity of the wireless LAN becomes insufficient and the utterance voice data sent from the user terminals 2 to the utterance analysis server 1 concentrates at the access point 4, so congestion is likely to occur.

In addition, the teacher who manages the group work looks at the analysis result for each group in order to find groups whose discussion is not active. The teacher therefore wants, as far as possible, to give advice based on the analysis results starting with the least active groups.

Accordingly, an object of the present invention is to provide an utterance voice collection method, a system, an utterance analysis server, and a program that prevent congestion in the analysis of utterance voices in group work and that receive utterance voices preferentially from groups whose discussion is not active.

According to the present invention, there is provided an utterance voice collection method for a system having a plurality of user terminals that each record the utterance voice of a user, and an utterance analysis server that receives utterance voices from the user terminals via a network,
wherein a plurality of groups each consisting of a plurality of users are formed,
the method being characterized by having:
a first step in which each user terminal transmits, to the utterance analysis server, voice section detection information in which a terminal identifier and a group identifier are associated with one or more utterance sections, for each consecutive predetermined voice section;
a second step in which the utterance analysis server receives a plurality of pieces of voice section detection information from the plurality of user terminals of each group, and determines an utterance voice reception order of the groups based on a predetermined condition on the plurality of utterance sections included in the voice section; and
a third step in which the utterance analysis server transmits, group by group in the utterance voice reception order, an utterance voice request corresponding to the utterance sections to the plurality of user terminals, and receives the utterance voice from each user terminal.

According to another embodiment of the utterance voice collection method of the present invention,
in the second step, the utterance voice reception order of the groups based on the predetermined condition is preferably the ascending order of the total time or average time of the utterance sections of the plurality of users in each group.

According to another embodiment of the utterance voice collection method of the present invention,
a probability distribution based on the total time or average time of the utterance sections of the plurality of users in a large number of past groups is preferably created in advance, and
in the second step, the utterance voice reception order of the groups based on the predetermined condition is preferably the order of how far the total time or average time of the utterance sections of the plurality of users in each group lies outside a predetermined probability range of the probability distribution.

According to another embodiment of the utterance voice collection method of the present invention,
in the second step, the utterance voice reception order of the groups based on the predetermined condition is preferably the ascending order of the "number of long utterance sections", that is, the number of utterance sections of the plurality of users in each group whose length is equal to or greater than a predetermined threshold time.

According to another embodiment of the utterance voice collection method of the present invention,
in the second step, the utterance voice reception order of the groups based on the predetermined condition is preferably the ascending order of the ratio of the "number of long utterance sections", whose length is equal to or greater than a predetermined threshold time, to the "number of short utterance sections", whose length is shorter than the predetermined threshold time, among the utterance sections of the plurality of users in each group.
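The two count-based criteria above (the number of long utterance sections, and the ratio of long to short utterance sections) can be sketched as follows. This is only an illustration under stated assumptions: utterance sections are taken to be `(start, end)` pairs in seconds, the function names are hypothetical, and the handling of a group with no short utterance sections (ratio treated as infinite, or as zero when the group has no utterances at all) is an assumption that the text does not specify.

```python
from typing import Dict, List, Tuple

Section = Tuple[float, float]  # (start_sec, end_sec); assumed representation


def count_long_short(sections: List[Section], threshold_sec: float) -> Tuple[int, int]:
    """Return (number of long sections, number of short sections)."""
    long_count = sum(1 for start, end in sections if end - start >= threshold_sec)
    return long_count, len(sections) - long_count


def order_by_long_count(groups: Dict[str, List[Section]], threshold_sec: float) -> List[str]:
    """Group IDs in ascending order of the number of long utterance sections."""
    return sorted(groups, key=lambda g: count_long_short(groups[g], threshold_sec)[0])


def order_by_long_short_ratio(groups: Dict[str, List[Section]], threshold_sec: float) -> List[str]:
    """Group IDs in ascending order of (long count) / (short count)."""
    def ratio(group_id: str) -> float:
        long_count, short_count = count_long_short(groups[group_id], threshold_sec)
        if short_count == 0:
            # Assumption: no short sections means an unbounded ratio,
            # except for a silent group, which is ranked first.
            return float("inf") if long_count else 0.0
        return long_count / short_count
    return sorted(groups, key=ratio)


# Each group's list pools the utterance sections of all its users.
example_groups = {
    "A": [(0, 10), (12, 30), (31, 33)],  # 2 long, 1 short
    "B": [(0, 2), (5, 6), (9, 10)],      # 0 long, 3 short
    "C": [(0, 20), (25, 27)],            # 1 long, 1 short
}
```

With a threshold of 5 seconds, both functions place group B first, since it has no long utterance sections.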

According to another embodiment of the utterance voice collection method of the present invention,
in the second step, the utterance voice reception order of the groups based on the predetermined condition is preferably determined by calculating, for each user, the absolute difference time between the total time or average time of that user's utterance sections and a predetermined threshold time, and ordering the groups in descending order of the sum of these absolute difference times.
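A minimal sketch of the absolute-difference criterion above, assuming each group is given as a list of per-user total utterance times in seconds. The function names and the sample values are hypothetical and serve only to illustrate the calculation.

```python
from typing import Dict, List


def absolute_difference_total(user_times: List[float], threshold_sec: float) -> float:
    """Sum over users of |user's utterance time - threshold|."""
    return sum(abs(t - threshold_sec) for t in user_times)


def order_by_absolute_difference(
    groups: Dict[str, List[float]], threshold_sec: float
) -> List[str]:
    """Group IDs in descending order of the summed absolute difference time."""
    return sorted(
        groups,
        key=lambda g: absolute_difference_total(groups[g], threshold_sec),
        reverse=True,
    )


# Per-user total utterance time in seconds (illustrative values only).
example_groups = {
    "1": [60.0, 60.0, 60.0],   # evenly balanced: difference sum 0
    "2": [170.0, 5.0, 5.0],    # one dominant speaker: difference sum 220
    "3": [90.0, 40.0, 50.0],   # mildly unbalanced: difference sum 60
}
```

With a threshold of 60 seconds, the group dominated by a single speaker is placed first, which matches the intent of prioritizing groups whose users deviate most from the threshold.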

According to another embodiment of the utterance voice collection method of the present invention,
in the second step, the utterance voice reception order of the groups based on the predetermined condition is preferably determined as follows:
a probability distribution based on the total time or average time of the utterance sections of the plurality of users in a large number of past groups is created in advance, and
for the users of each group, the sum of their probability values in the probability distribution is calculated, and the groups are ordered in ascending order of that sum.

According to another embodiment of the utterance voice collection method of the present invention,
an utterance section is preferably expressed by the start time and end time of a user's utterance.

According to the present invention, there is provided an utterance voice collection method for a system having
a plurality of user terminals that each record the utterance voice of a user,
an utterance analysis server that receives utterance voices from the user terminals via a network,
an administrator terminal that manages a plurality of groups, and
a notification server that transmits, to a plurality of user terminals, an utterance voice request corresponding to utterance sections,
wherein a plurality of groups each consisting of a plurality of users are formed,
the method being characterized by having:
a first step in which each user terminal transmits, to the utterance analysis server, voice section detection information in which a terminal identifier and a group identifier are associated with one or more utterance sections, for each consecutive predetermined voice section;
a second step in which the utterance analysis server receives a plurality of pieces of voice section detection information from the plurality of user terminals of each group, determines an utterance voice reception order of the groups based on a predetermined condition on the plurality of utterance sections included in the voice section, and transmits the utterance voice reception order to the administrator terminal;
a third step in which the administrator terminal, in response to an operation by the administrator, transmits a group identifier based on the utterance voice reception order to the notification server;
a fourth step in which the notification server transmits an utterance voice request corresponding to the utterance sections to the plurality of user terminals belonging to the group identifier received from the administrator terminal; and
a fifth step in which each user terminal that has received the utterance voice request transmits its utterance voice to the utterance analysis server.

According to the present invention, there is provided a system having a plurality of user terminals that each record the utterance voice of a user, and an utterance analysis server that receives utterance voices from the user terminals via a network,
wherein a plurality of groups each consisting of a plurality of users are formed,
each user terminal has:
voice section detection information transmitting means for transmitting, to the utterance analysis server, voice section detection information in which a terminal identifier and a group identifier are associated with one or more utterance sections, for each consecutive predetermined voice section; and
utterance voice transmitting means for transmitting the utterance voice to the utterance analysis server upon receiving an utterance voice request from the utterance analysis server,
and the utterance analysis server has:
utterance voice reception order determining means for receiving a plurality of pieces of voice section detection information from the plurality of user terminals of each group and determining an utterance voice reception order of the groups based on a predetermined condition on the plurality of utterance sections included in the voice section; and
utterance voice receiving means for transmitting, group by group in the utterance voice reception order, an utterance voice request corresponding to the utterance sections to the plurality of user terminals, and receiving the utterance voice from each user terminal.

According to the present invention, there is provided a system having
a plurality of user terminals that each record the utterance voice of a user,
an utterance analysis server that receives utterance voices from the user terminals via a network,
an administrator terminal that manages a plurality of groups, and
a notification server that transmits, to a plurality of user terminals, an utterance voice request corresponding to utterance sections,
wherein a plurality of groups each consisting of a plurality of users are formed,
each user terminal has:
voice section detection information transmitting means for transmitting, to the utterance analysis server, voice section detection information in which a terminal identifier and a group identifier are associated with one or more utterance sections, for each consecutive predetermined voice section; and
utterance voice transmitting means for transmitting the utterance voice to the utterance analysis server upon receiving an utterance voice request from the notification server,
the utterance analysis server has:
utterance voice reception order determining means for receiving a plurality of pieces of voice section detection information from the plurality of user terminals of each group and determining an utterance voice reception order of the groups based on a predetermined condition on the plurality of utterance sections included in the voice section;
utterance voice reception order transmitting means for transmitting the utterance voice reception order to the administrator terminal; and
utterance voice receiving means for receiving utterance voices from the user terminals,
the administrator terminal, in response to an operation by the administrator, transmits a group identifier based on the utterance voice reception order to the notification server, and
the notification server transmits an utterance voice request corresponding to the utterance sections to the plurality of user terminals belonging to the group identifier received from the administrator terminal.

According to the present invention, there is provided an utterance analysis server that receives utterance voices via a network from a plurality of user terminals that each record the utterance voice of a user,
wherein a plurality of groups each consisting of a plurality of users are formed,
the server being characterized by having:
voice section detection information receiving means for receiving, from each user terminal, voice section detection information in which a terminal identifier and a group identifier are associated with one or more utterance sections, for each consecutive predetermined voice section;
utterance voice reception order determining means for determining, using the plurality of pieces of voice section detection information, an utterance voice reception order of the groups based on a predetermined condition on the plurality of utterance sections included in the voice section; and
utterance voice receiving means for transmitting, group by group in the utterance voice reception order, an utterance voice request corresponding to the utterance sections to the plurality of user terminals, and receiving the utterance voice from each user terminal.

According to the present invention, there is provided a program that causes a computer mounted on an utterance analysis server, which receives utterance voices via a network from a plurality of user terminals that each record the utterance voice of a user, to function,
wherein a plurality of groups each consisting of a plurality of users are formed,
the program being characterized by causing the computer to function as:
voice section detection information receiving means for receiving, from each user terminal, voice section detection information in which a terminal identifier and a group identifier are associated with one or more utterance sections, for each consecutive predetermined voice section;
utterance voice reception order determining means for determining, using the plurality of pieces of voice section detection information, an utterance voice reception order of the groups based on a predetermined condition on the plurality of utterance sections included in the voice section; and
utterance voice receiving means for transmitting, group by group in the utterance voice reception order, an utterance voice request corresponding to the utterance sections to the plurality of user terminals, and receiving the utterance voice from each user terminal.

According to the utterance voice collection method, system, utterance analysis server, and program of the present invention, congestion in the analysis of utterance voices in group work can be prevented, and utterance voices can be received preferentially from groups whose discussion is not active.

A system configuration diagram for group work.
A first sequence diagram of the present invention.
An explanatory diagram showing the voice sections into which the group work time is divided.
An explanatory diagram showing the utterance sections of each user within a voice section.
A second sequence diagram of the present invention.
An explanatory diagram showing the method of determining the utterance voice reception order of the groups in the present invention.
An explanatory diagram showing the method of determining the utterance voice reception order based on the utterance patterns of the users in a group in the present invention.
A functional configuration diagram of the utterance analysis server and the user terminal of the present invention.
A system configuration diagram including the notification server of the present invention.

Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.

As in FIG. 1 described above, the system assumed by the present invention has a plurality of groups each consisting of a plurality of users, and the following devices are connected via a network.
Plurality of user terminals: record the utterance voice of each user
Utterance analysis server: receives utterance voices from the user terminals and performs utterance analysis for each group
According to the present invention, so that neither the upload of utterance voices nor the server processing becomes congested during group work, the utterance analysis server has the user terminals upload their utterance voices in order, starting with the terminals that belong to the group whose discussion is assumed to be least active. The utterance analysis server 1 executes speech recognition processing and utterance analysis processing on the utterance voices in that group order, and presents the analysis results on the administrator terminal operated by the teacher.

FIG. 2 is a first sequence diagram of the present invention.

(S1) Each user terminal 2 belongs to exactly one group and, from the utterance voice of its user during the group work time, detects one or more utterance sections for each consecutive predetermined voice section.

FIG. 3 is an explanatory diagram showing the voice sections into which the group work time is divided.
According to FIG. 3, a group work time of, for example, 30 minutes is divided into voice sections of, for example, 5 minutes each. Once the group work starts, the user terminal 2 becomes able to record utterance voice. Each user terminal then repeatedly starts and stops recording utterance voice for each voice section.
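The division of the group work time into consecutive voice sections can be sketched as below. The 30-minute and 5-minute values are the examples given in the text; the function name and the use of `timedelta` offsets from the start of the group work are assumptions made for illustration.

```python
from datetime import timedelta
from typing import List, Tuple


def split_into_voice_sections(
    total: timedelta, section: timedelta
) -> List[Tuple[timedelta, timedelta]]:
    """Return (start, end) offsets of consecutive voice sections."""
    sections = []
    start = timedelta(0)
    while start < total:
        # The last section is clipped if the total is not an exact multiple.
        end = min(start + section, total)
        sections.append((start, end))
        start = end
    return sections


voice_sections = split_into_voice_sections(timedelta(minutes=30), timedelta(minutes=5))
```

For the example values this yields six voice sections, each of which would be followed by one transmission of detection information from every terminal.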

FIG. 4 is an explanatory diagram showing the utterance sections of each user within a voice section.
According to FIG. 4, the voice section of each group contains the utterance sections of a plurality of users. An utterance section is expressed by the "start time and end time" between which voice was detected as the user's utterance.

For example, the user terminal 2 takes the moment at which the volume reaches or exceeds a predetermined threshold as the start time of an utterance, and, when the volume has stayed below the threshold for a predetermined time or longer, takes that as the end time of the utterance. Utterance sections are delimited in this way. When one user speaks several times within one voice section, voice section detection information containing an utterance section for each of those utterances is generated.
Although an utterance detection method based on volume (total power) has been described here as an embodiment, various existing techniques can be applied in other embodiments, such as methods based on low-frequency band power, the frequency spectrum, or zero crossings.
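A minimal sketch of the volume-based detection described above, assuming the audio has already been reduced to one volume value per fixed-length frame. The function name, the frame representation, and the choice to place the end time at the first quiet frame (rather than at the moment the quiet period is confirmed) are assumptions; the text only states the threshold and the required duration of quiet.

```python
from typing import List, Optional, Tuple


def detect_utterance_sections(
    volumes: List[float],
    frame_sec: float,
    threshold: float,
    quiet_sec: float,
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of detected utterance sections."""
    sections: List[Tuple[float, float]] = []
    start: Optional[float] = None  # start time of the utterance in progress
    quiet_frames = 0               # consecutive frames below the threshold
    quiet_limit = max(1, round(quiet_sec / frame_sec))

    for i, volume in enumerate(volumes):
        if volume >= threshold:
            if start is None:
                start = i * frame_sec  # volume reached the threshold: utterance starts
            quiet_frames = 0
        elif start is not None:
            quiet_frames += 1
            if quiet_frames >= quiet_limit:
                # Quiet lasted long enough: end at the first quiet frame.
                sections.append((start, (i - quiet_frames + 1) * frame_sec))
                start = None
                quiet_frames = 0

    if start is not None:
        # The voice section ended while an utterance was still open.
        sections.append((start, (len(volumes) - quiet_frames) * frame_sec))
    return sections


frames = [0, 0, 5, 5, 5, 0, 0, 0, 5, 5, 0, 0, 0, 0]
detected = detect_utterance_sections(frames, frame_sec=1.0, threshold=1.0, quiet_sec=2.0)
```

A pause shorter than `quiet_sec` does not split an utterance, so brief hesitations stay within a single utterance section.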

The voice section detection information is referred to as "VAD (Voice Activity Detection) information". VAD information is short text data containing, for example, the following items.
"Group ID" "User (terminal) ID" "Utterance section"
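One way to picture such a record is shown below. The text only says that VAD information is short text data with these three items; the JSON encoding, the field names, and the class name `VadInfo` are assumptions chosen for illustration.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple


@dataclass
class VadInfo:
    """Hypothetical container for one terminal's VAD information."""
    group_id: str
    terminal_id: str
    utterance_sections: List[Tuple[float, float]]  # (start_sec, end_sec)

    def to_text(self) -> str:
        """Serialize to compact text (JSON is an assumed encoding)."""
        return json.dumps(asdict(self), separators=(",", ":"))

    @classmethod
    def from_text(cls, text: str) -> "VadInfo":
        raw = json.loads(text)
        return cls(
            group_id=raw["group_id"],
            terminal_id=raw["terminal_id"],
            # JSON has no tuple type, so pairs are restored explicitly.
            utterance_sections=[tuple(s) for s in raw["utterance_sections"]],
        )


info = VadInfo(group_id="g2", terminal_id="t7", utterance_sections=[(2.0, 5.0), (8.0, 10.0)])
payload = info.to_text()
```

A record like this is tens of bytes, compared with the recorded audio for a whole voice section, which is why the detection information can be sent from every terminal without the upload ordering that the audio needs.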

Every user terminal 2 transmits its VAD information to the utterance analysis server 1 each time one voice section ends. Because VAD information is nothing more than short text data, it can be transmitted to the utterance analysis server 1 in a very short time.

(S2) The utterance analysis server 1 receives a plurality of pieces of VAD information from the plurality of user terminals. The utterance analysis server 1 refers to the group ID and classifies the VAD information by group.
The utterance analysis server 1 then determines the "utterance voice reception order" of the groups based on a predetermined condition on the plurality of utterance sections included in the voice section.
The utterance voice reception order is determined starting from the group whose discussion is least active, as described later with reference to FIG. 5 and the subsequent figures.
In FIG. 2, the "utterance voice reception order" is assumed to have been determined as group 2 -> 3 -> 1. Here, the discussion in group 2 has been judged to be the least active.
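The classification step in (S2) can be sketched as a simple grouping of received records by group ID. The dictionary keys used here are hypothetical field names; the text specifies only that each record carries a group ID, a user (terminal) ID, and utterance sections.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Section = Tuple[float, float]  # (start_sec, end_sec); assumed representation


def classify_by_group(vad_infos: List[dict]) -> Dict[str, Dict[str, List[Section]]]:
    """Map group ID -> terminal ID -> utterance sections for one voice section."""
    groups: Dict[str, Dict[str, List[Section]]] = defaultdict(dict)
    for info in vad_infos:
        groups[info["group_id"]][info["terminal_id"]] = list(info["utterance_sections"])
    return dict(groups)


received = [
    {"group_id": "1", "terminal_id": "a", "utterance_sections": [(0.0, 60.0)]},
    {"group_id": "2", "terminal_id": "c", "utterance_sections": [(0.0, 10.0)]},
    {"group_id": "1", "terminal_id": "b", "utterance_sections": [(70.0, 130.0)]},
    {"group_id": "2", "terminal_id": "d", "utterance_sections": []},
]
by_group = classify_by_group(received)
```

The resulting per-group structure is the input on which any of the ordering conditions can then be evaluated.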

(S3) The utterance analysis server 1 transmits an utterance voice request to the plurality of user terminals 2 group by group, following the utterance voice reception order. In FIG. 2, the utterance voice request is first transmitted to the user terminals 2 belonging to group 2.
In response, each user terminal 2 of group 2 that has received the utterance voice request transmits its utterance voice to the utterance analysis server 1.
The utterance analysis server 1 analyzes the utterances of group 2, whose utterance voice it has received, by speech recognition processing and utterance analysis processing.
When the analysis of group 2 is finished, the utterance analysis server 1 transmits an utterance voice request to the user terminals 2 of group 3, the next group in the utterance voice reception order.
In this way, the utterance analysis server 1 has the user terminals 2 upload the utterance voice of the voice section they recorded, one group at a time in the utterance voice reception order.
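The group-by-group flow of step S3 can be sketched as follows. This is a minimal illustration, not the patented implementation: `request_voice` and `analyze` are hypothetical callables standing in for the network request and for the speech recognition and utterance analysis processing.

```python
from typing import Callable, Dict, List


def collect_by_group(
    reception_order: List[int],
    terminals_by_group: Dict[int, List[str]],
    request_voice: Callable[[str], bytes],          # hypothetical: asks one terminal for its audio
    analyze: Callable[[int, Dict[str, bytes]], None],  # hypothetical: recognition + analysis
) -> List[int]:
    """Request and analyze utterance voice one group at a time."""
    processed = []
    for group_id in reception_order:
        # Only the terminals of the current group upload, so uploads from
        # different groups never compete for the LAN at the same time.
        voices = {t: request_voice(t) for t in terminals_by_group[group_id]}
        # The next group is requested only after this group's analysis ends.
        analyze(group_id, voices)
        processed.append(group_id)
    return processed
```

With the order 2 -> 3 -> 1 from FIG. 2, the terminals of group 2 are contacted first and those of group 1 last.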

According to the present invention, the utterance voice is uploaded to the utterance analysis server 1 via the access point one group at a time (that is, from all the user terminals in one group), so congestion does not arise in the LAN. Within a group, the user terminals 2 may upload their utterance voice one terminal at a time, or the plurality of user terminals 2 in the group may upload simultaneously.

FIG. 5 is an explanatory diagram showing methods for determining the utterance voice reception order of the groups in the present invention.

The utterance voice reception order is based on an index for judging how active the discussion within a group is. Seven embodiments of the method for determining the utterance voice reception order are described below.

<First Embodiment: Order Determination Method Based on Total Time of Utterance Sections>
According to FIG. 5(a), the utterance voice reception order is the ascending order of the total time of the utterance sections of the plurality of users in each group.
Even for the same voice section, the shorter the total time of the utterance sections in a group, the shorter the processing time for its speech recognition and utterance analysis.
According to the first embodiment, a group with a shorter total time of utterance sections, that is, a group whose discussion is presumably not active, has its utterance voice collected and analyzed earlier than the other groups. The teacher operating the administrator terminal 3 can therefore learn the utterance analysis results sooner, starting from the groups whose discussion is not active.
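As a rough sketch of the first embodiment, the ordering reduces to a sort on per-group totals. The representation of an utterance section as a `(start, end)` pair in seconds and the sample values are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

Section = Tuple[float, float]  # assumed representation: (start, end) in seconds


def order_by_total_time(sections_by_group: Dict[int, List[Section]]) -> List[int]:
    """Return group IDs sorted by ascending total utterance time."""
    totals = {
        group: sum(end - start for start, end in sections)
        for group, sections in sections_by_group.items()
    }
    return sorted(totals, key=lambda group: totals[group])


# Made-up sample: totals are 90 s, 10 s and 40 s for groups 1, 2 and 3.
sample = {
    1: [(0, 30), (40, 100)],
    2: [(5, 15)],
    3: [(0, 20), (50, 70)],
}
print(order_by_total_time(sample))  # -> [2, 3, 1]
```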

<Second Embodiment: Order Determination Method Based on Average Time of Utterance Sections>
According to FIG. 5(b), the utterance voice reception order is the ascending order of the average time of the utterance sections of the plurality of users in each group.
Average time = total time of utterance sections / number of utterances
The shorter the average time of the utterance sections, the more often the users are not stating their own opinions but only giving back-channel responses ("uh-huh"), connective words ("and so"), agreement ("yes"), or questions and denials ("why?"). Conversely, the longer the average time, the more often the users are stating their own opinions, giving explanations, or speaking with substance.
According to the second embodiment, a group with a shorter average time of utterance sections, that is, a group whose discussion is presumably not active, has its utterance voice collected and analyzed earlier than the other groups.
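The average-time criterion can be sketched in the same way. Treating a group with no utterances as having an average of 0 is an assumption of this sketch; the text does not address that case.

```python
from typing import Dict, List, Tuple

Section = Tuple[float, float]  # assumed representation: (start, end) in seconds


def average_time(sections: List[Section]) -> float:
    """Average time = total time of utterance sections / number of utterances."""
    if not sections:
        return 0.0  # assumption: a silent group counts as the least active
    return sum(end - start for start, end in sections) / len(sections)


def order_by_average_time(sections_by_group: Dict[int, List[Section]]) -> List[int]:
    """Return group IDs sorted by ascending average utterance time."""
    return sorted(sections_by_group, key=lambda g: average_time(sections_by_group[g]))


# Made-up sample: averages are 25 s, 2 s and 8 s for groups 1, 2 and 3.
sample = {
    1: [(0, 20), (30, 60)],
    2: [(0, 2), (5, 8), (10, 11)],
    3: [(0, 10), (20, 26)],
}
print(order_by_average_time(sample))  # -> [2, 3, 1]
```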

<Third Embodiment: Order Determination Method Based on Probability Distribution of Total Time / Average Time of Utterance Sections>
According to FIG. 5(c), a probability distribution based on the total time or average time of the utterance sections of the plurality of users in many past groups is created in advance.
The utterance voice reception order of the groups is then the order of how far the total time or average time of the utterance sections of the plurality of users in each group lies outside a predetermined probability range of the probability distribution.
When the probability distribution is a normal distribution, for example, a group whose deviation from the mean of the plurality of groups falls outside ±1σ can be judged to have an inactive discussion.
According to the third embodiment, a group lying farther outside the predetermined probability range from the mean of the probability distribution, that is, a group whose discussion is presumably not active, has its utterance voice collected and analyzed earlier than the other groups.
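One way to read the third embodiment is as a ranking by distance from the historical mean in units of σ. The sketch below assumes a normal distribution fitted to past group totals and ranks groups by that distance; the function name, the parameter `k`, and the sample values are illustrative assumptions.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def order_by_deviation(
    history: List[float], current: Dict[int, float], k: float = 1.0
) -> Tuple[List[int], List[int]]:
    """Rank groups by |value - mean| / sigma, largest deviation first.

    Returns the full ordering and the groups lying outside +/- k sigma.
    Assumes `history` has non-zero spread.
    """
    mu, sigma = mean(history), pstdev(history)
    z = {group: abs(value - mu) / sigma for group, value in current.items()}
    order = sorted(z, key=lambda group: z[group], reverse=True)
    outside = [group for group in order if z[group] > k]
    return order, outside


# Made-up history of total utterance times (seconds) from past groups.
past_totals = [100, 120, 80, 110, 90]      # mean 100, sigma about 14.1
today = {1: 105, 2: 40, 3: 125}
print(order_by_deviation(past_totals, today))  # -> ([2, 3, 1], [2, 3])
```

Note that under this reading a group far above the mean is flagged as well as one far below it, which follows the ±1σ wording of the text.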

<Fourth Embodiment: Order Determination Method Based on the Number of Long Utterance Sections>
According to FIG. 5(d), the utterance voice reception order of the groups is the ascending order of the "number of long utterance sections", that is, the number of utterance sections of the plurality of users in each group that are equal to or longer than a predetermined threshold time.
The fewer the long utterance sections, the more often the users are not stating their own opinions but only giving back-channel responses, connective words, agreement, or questions and denials. Conversely, the more long utterance sections there are, the more often the users are stating their own opinions, giving explanations, or speaking with substance.
According to the fourth embodiment, a group with fewer long utterance sections, that is, a group whose discussion is presumably not active, has its utterance voice collected and analyzed earlier than the other groups.
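A minimal sketch of the fourth embodiment follows. The input is assumed to be a list of utterance durations per group, and the 5-second threshold is an arbitrary illustrative value, not one given in the text.

```python
from typing import Dict, List


def long_count(durations: List[float], threshold: float) -> int:
    """Number of utterance sections lasting at least `threshold` seconds."""
    return sum(1 for d in durations if d >= threshold)


def order_by_long_count(
    durations_by_group: Dict[int, List[float]], threshold: float
) -> List[int]:
    """Return group IDs sorted by ascending number of long utterance sections."""
    return sorted(
        durations_by_group,
        key=lambda g: long_count(durations_by_group[g], threshold),
    )


# Made-up durations in seconds; with a 5 s threshold the counts are 2, 0 and 1.
sample = {1: [8, 6, 2], 2: [1, 2], 3: [7, 1]}
print(order_by_long_count(sample, threshold=5))  # -> [2, 3, 1]
```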

<Fifth Embodiment: Order Determination Method Based on the Ratio of the Number of Long Utterance Sections>
According to FIG. 5(e), the utterance voice reception order of the groups is the ascending order of the ratio of the "number of long utterance sections", which are equal to or longer than a predetermined threshold time, to the "number of short utterance sections", which are shorter than the predetermined threshold time, among the utterance sections of the plurality of users in each group.
Ratio = number of long utterance sections / number of short utterance sections
The smaller this ratio, the more often the users are not stating their own opinions but only giving back-channel responses, connective words, agreement, or questions and denials. Conversely, the larger this ratio, the more often the users are stating their own opinions, giving explanations, or speaking with substance.
According to the fifth embodiment, a group with a smaller ratio, that is, a group whose discussion is presumably not active, has its utterance voice collected and analyzed earlier than the other groups.
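The ratio criterion can be sketched as below. How to handle a group with no short utterance sections is not stated in the text; returning infinity (so that such a group sorts last) is an assumption of this sketch, as are the threshold and sample values.

```python
from typing import Dict, List


def long_short_ratio(durations: List[float], threshold: float) -> float:
    """Ratio = number of long utterance sections / number of short ones."""
    long_n = sum(1 for d in durations if d >= threshold)
    short_n = len(durations) - long_n
    if short_n == 0:
        return float("inf")  # assumption: no short sections sorts the group last
    return long_n / short_n


def order_by_ratio(durations_by_group: Dict[int, List[float]], threshold: float) -> List[int]:
    """Return group IDs sorted by ascending long/short ratio."""
    return sorted(
        durations_by_group,
        key=lambda g: long_short_ratio(durations_by_group[g], threshold),
    )


# Made-up durations in seconds; with a 5 s threshold the ratios are 2/1, 1/3 and 1/1.
sample = {1: [8, 6, 2], 2: [1, 2, 1, 6], 3: [7, 1]}
print(order_by_ratio(sample, threshold=5))  # -> [2, 3, 1]
```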

FIG. 6 is an explanatory diagram showing methods for determining the utterance voice reception order based on the utterance patterns of the users within a group in the present invention.

According to FIG. 6, groups in which the total time or average time of the utterance sections differs greatly from user to user are given priority in the utterance voice reception order.

<Sixth Embodiment: Order Determination Method Based on Difference from Predetermined Threshold Time>
According to FIG. 6(a), for each user an absolute difference time is calculated as the difference between the total time (or average time) of that user's utterance sections and a predetermined threshold time, and the utterance voice reception order of the groups is the descending order of the sum of these absolute difference times. The predetermined threshold time may be the average utterance time of the users in the group or any assumed time.
In the case of FIG. 6(a), for example, a pattern in which the total time of the utterance sections differs greatly from user to user in the group can be extracted.
As a pattern other than FIG. 6(a), a case can also be extracted in which the total time of the utterance sections of every user is shorter than the predetermined threshold time and the sum of the differences is large.
Likewise, a case can be extracted in which the total time of the utterance sections of every user is longer than the predetermined threshold time and the sum of the differences is large. This covers cases that are problematic as a discussion, for example a user spending a long time reading written text aloud, or a user asserting a position for so long that the others are no longer listening.
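The sixth embodiment can be sketched as a sum of absolute differences per group. Here the threshold defaults to the mean utterance time of the users in the group, which is one of the two options the text allows; the function names and sample values are illustrative assumptions.

```python
from statistics import mean
from typing import Dict, List, Optional


def imbalance_score(user_totals: List[float], threshold: Optional[float] = None) -> float:
    """Sum over users of |total utterance time - threshold|."""
    if threshold is None:
        threshold = mean(user_totals)  # option from the text: in-group average
    return sum(abs(t - threshold) for t in user_totals)


def order_by_imbalance(
    totals_by_group: Dict[int, List[float]], threshold: Optional[float] = None
) -> List[int]:
    """Return group IDs sorted by descending imbalance score."""
    return sorted(
        totals_by_group,
        key=lambda g: imbalance_score(totals_by_group[g], threshold),
        reverse=True,
    )


# Made-up per-user totals in seconds; scores are 10, 150 and 40.
sample = {1: [60, 55, 65], 2: [120, 5, 10], 3: [40, 80, 60]}
print(order_by_imbalance(sample))  # -> [2, 3, 1]
```

Passing a fixed `threshold` instead also captures the other two patterns in the text, where every user is far below or far above the assumed time.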

<Seventh Embodiment: Order Determination Method Based on Difference Using a Probability Distribution>
According to FIG. 6(b), a probability distribution based on the total time or average time of the utterance sections of the plurality of users in many past groups is created in advance. For each group, the sum of the probability values in the probability distribution over the users of the group is calculated, and the utterance voice reception order of the groups is the ascending order of that sum.
In the case of FIG. 6(b), for example, a pattern in which the total time (or average time) of utterance differs greatly from user to user in the group can be extracted.
As a pattern other than FIG. 6(b), a case can also be extracted in which the total time of the utterance sections of every user in the group is short and the sum of the probability values is small.
Likewise, a case can be extracted in which the total time of the utterance sections of every user in the group is long and the sum of the probability values is small. This covers cases that are problematic as a discussion, for example a user spending a long time reading written text aloud, or a user asserting a position for so long that the others are no longer listening.
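A possible reading of the seventh embodiment is sketched below. It assumes the pre-built distribution is a normal distribution and interprets the "probability value" of a user as the density at that user's total utterance time; both are assumptions, as are the parameters 60 s and 20 s and the sample data.

```python
from statistics import NormalDist
from typing import Dict, List


def probability_sum(user_totals: List[float], dist: NormalDist) -> float:
    """Sum of density values of each user's total utterance time."""
    return sum(dist.pdf(t) for t in user_totals)


def order_by_probability_sum(
    totals_by_group: Dict[int, List[float]], dist: NormalDist
) -> List[int]:
    """Return group IDs sorted by ascending sum of probability values."""
    return sorted(totals_by_group, key=lambda g: probability_sum(totals_by_group[g], dist))


# Assumed distribution built from past groups: mean 60 s, sigma 20 s.
past = NormalDist(mu=60, sigma=20)
sample = {1: [60, 55, 65], 2: [120, 5, 10], 3: [40, 80, 60]}
print(order_by_probability_sum(sample, past))  # -> [2, 3, 1]
```

Users whose times sit in the tails contribute almost nothing to the sum, so group 2, with one dominant speaker and two near-silent ones, is ranked first.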

FIG. 7 is a functional configuration diagram of the utterance analysis server and the user terminal in the present invention.

Each user terminal 2 has an utterance voice storage unit 20, a voice section detection information transmission unit 21, and an utterance voice transmission unit 22.
The utterance voice storage unit 20 records the utterance voice uttered by the user.
The voice section detection information transmission unit 21 transmits, for each consecutive predetermined voice section, VAD information (voice section detection information) in which a terminal ID and a group ID are associated with one or more utterance sections to the utterance analysis server 1 (see S1 in FIG. 2 described above).
On receiving an utterance voice request from the utterance analysis server 1, the utterance voice transmission unit 22 transmits the utterance voice to the utterance analysis server 1 (see S3 in FIG. 2 described above).
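The VAD information sent by the voice section detection information transmission unit 21 could be modeled along the following lines. The field names and the `voice_section_index` field are hypothetical; the text only states that a terminal ID and a group ID are associated with one or more utterance sections per voice section.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class VadInfo:
    """Voice section detection information for one voice section (field names assumed)."""

    terminal_id: str
    group_id: int
    voice_section_index: int                       # hypothetical: which voice section this covers
    utterance_sections: List[Tuple[float, float]]  # (start, end) of each utterance, in seconds

    def total_utterance_time(self) -> float:
        """Total time spoken within this voice section."""
        return sum(end - start for start, end in self.utterance_sections)


info = VadInfo("terminal-A", 2, 0, [(0.0, 1.5), (3.0, 4.0)])
print(info.total_utterance_time())  # -> 2.5
```

Because only these timestamps travel to the server in step S1, the ordering can be decided before any audio is uploaded.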

The utterance analysis server 1 has an utterance voice storage unit 10, an utterance voice reception order determination unit 11, and an utterance voice reception unit 12.
The utterance voice storage unit 10 stores the utterance voice received from the user terminals 2.
The utterance voice reception order determination unit 11 receives a plurality of pieces of VAD information (voice section detection information) from the plurality of user terminals of each group and determines the utterance voice reception order of the groups based on a predetermined condition on the plurality of utterance sections included in the voice section (see S2 in FIG. 2 described above).
The utterance voice reception unit 12 transmits, group by group in the utterance voice reception order, an utterance voice request corresponding to the utterance sections to the plurality of user terminals, and receives the utterance voice from each user terminal (see S3 in FIG. 2 described above).

FIG. 8 is a second sequence diagram in the present invention.

Compared with FIG. 2, the system of FIG. 8 further has an administrator terminal 3 and a notification server 5.
The administrator terminal 3 is operated by the teacher running the group work and manages the plurality of groups.
The notification server 5 transmits an utterance voice request corresponding to the utterance sections to the plurality of user terminals in response to an instruction from the administrator terminal 3.

(S1) Each user terminal transmits, for each consecutive predetermined voice section, VAD information (voice section detection information) in which a terminal identifier and a group identifier are associated with one or more utterance sections to the utterance analysis server 1.
(S21) The utterance analysis server 1 receives a plurality of pieces of voice section detection information from the plurality of user terminals of each group and determines the utterance voice reception order of the groups based on a predetermined condition on the plurality of utterance sections included in the voice section.
(S22) The utterance analysis server 1 transmits the utterance voice reception order to the administrator terminal 3.
(S23) In response to an operation by the administrator, the administrator terminal 3 transmits a group identifier based on the utterance voice reception order to the notification server 5.
(S3) The notification server 5 transmits an utterance voice request corresponding to the utterance sections to the plurality of user terminals belonging to the group identifier received from the administrator terminal.
In response, each user terminal that has received the utterance voice request transmits its utterance voice to the utterance analysis server.

Furthermore, the administrator terminal 3 may, in response to an operation by the administrator, transmit an utterance voice reception request to the notification server 5 for each terminal ID of the user terminals 2.
In this case, the notification server 5 pushes an utterance voice request corresponding to the utterance sections to the user terminal having the terminal ID designated by the administrator terminal 3.
The user terminal 2 that has received the utterance voice request transmits its utterance voice to the utterance analysis server 1.
The utterance analysis server 1 analyzes the utterance voice of the user terminal 2 and transmits the analysis result, together with the terminal ID, to the administrator terminal 3.
By repeating this, the administrator terminal 3 can present the utterance analysis result for each user terminal 2 to the administrator.

In the sequence of FIG. 2 described above, the utterance analysis server 1 transmits the utterance voice request to the user terminals 2. With HTTP (HyperText Transfer Protocol), however, communication is a request-response, pull-type exchange: the client (a browser application installed on the user terminal) sends a request to the server (the utterance analysis server), and the server returns a response to the client. It is therefore difficult for the utterance analysis server 1 to push a transmission to a user terminal 2 such as a smartphone.
In the sequence of FIG. 8, by contrast, the roles are divided between the utterance analysis server 1, which receives and analyzes the utterance voice from the user terminals 2, and the notification server 5, which transmits the utterance voice request to the user terminals 2.

Push-type transmission functions available to the notification server 5 include, for example, the JavaScript (registered trademark) Web Notification API, Google (registered trademark) Firebase Notification (registered trademark), and Apple (registered trademark) Push Notification. Server push can be realized by installing on the user terminals 2 an application that supports these APIs (Application Programming Interfaces).

FIG. 9 is a system configuration diagram including the notification server in the present invention.

The system of FIG. 9 corresponds to the sequence of FIG. 8.
Each user terminal 2 differs from FIG. 7 only in the utterance voice transmission unit 22. The utterance voice transmission unit 22 of FIG. 9 transmits the utterance voice to the utterance analysis server 1 on receiving an utterance voice request from the notification server 5.
The utterance analysis server 1 differs from FIG. 7 only in the utterance reception order transmission unit 13. The utterance reception order transmission unit 13 of FIG. 9 transmits the utterance voice reception order to the administrator terminal 3.

The administrator terminal 3 receives the utterance voice reception order from the utterance analysis server 1 and presents it to the administrator (teacher). Then, in response to an operation by the administrator, it transmits the group ID based on the utterance voice reception order, and the user terminal IDs belonging to that group ID, to the notification server 5.

The notification server 5 pushes an utterance voice request to the plurality of user terminals 2 belonging to the group ID received from the administrator terminal 3.

<Other Embodiments Using the Notification Server 5>
According to the systems of FIGS. 8 and 9, various kinds of control over the user terminals 2 can be executed through the notification server 5 on instruction from the administrator terminal 3.
Here, a dedicated utterance recording application is installed on each user terminal 2, and a dedicated teacher control application is installed on the administrator terminal 3.
The utterance recording application of the user terminal 2 generates a unique terminal ID at initial setup and registers that terminal ID and the group ID with the utterance analysis server 1 in advance. The utterance recording application can then decode instruction information received from the notification server 5 and invoke the corresponding function.
The teacher control application of the administrator terminal 3 presents the group IDs and terminal IDs received from the utterance analysis server 1 to the administrator on its display. The teacher control application can then transmit instruction information to the user terminals 2 via the notification server 5, per group ID or per terminal ID.

Specifically, the following other embodiments can also be realized.
[First Other Embodiment: Control of Start / End of Utterance Recording on User Terminals 2]
Based on instruction information from the administrator terminal 3, the notification server 5 synchronizes the start and end of the voice sections of the user terminals 2. To analyze the utterances of the group work accurately in time series, the voice sections of the plurality of user terminals 2 must be synchronized.
When the administrator operates the start/end button for utterance recording, the administrator terminal 3 transmits a voice section synchronization request (start/end) containing all the user terminal IDs to the notification server 5.
In response, the notification server 5 broadcasts the voice section synchronization request (start/end) to the plurality of user terminals 2.
A user terminal 2 that has received the voice section synchronization request (start/end) starts or ends recording of the utterance voice for the group work time.
As a result, the group work time and the voice sections can be synchronized across all the user terminals 2, recording time differences are eliminated, and the utterances can be analyzed accurately in time series.

[Second Other Embodiment: Control of Time Width / Delimiting Timing of Voice Sections on User Terminals 2]
Based on instruction information from the administrator terminal 3, the notification server 5 controls the time width or delimiting timing of the voice sections of the user terminals 2. The voice sections are thus not fixed but can be varied according to the content of the group work.
What can be controlled is the time width of a voice section (for example, 5 minutes -> 10 minutes) or the delimiting timing (for example, 11:05, 11:15, 11:20).
For example, the utterance situation of the group work is first analyzed with a long voice section, and afterwards with a shorter voice section. This allows the teacher to give advice based on the utterance voice analysis results as needed.

[Third Other Embodiment: Recognition of Activation State of User Terminals 2]
Based on instruction information from the administrator terminal 3, the notification server 5 checks whether each user terminal 2 is powered on or off.
When the administrator operates the activation check button, the administrator terminal 3 transmits an activation confirmation request containing all the user terminal IDs to the notification server 5.
In response, the notification server 5 broadcasts the activation confirmation request to the plurality of user terminals 2.
A user terminal 2 that has received the activation confirmation request transmits an activation response to the utterance analysis server 1.
The utterance analysis server 1 transmits a list of the terminal IDs from which it received an activation response to the administrator terminal 3.
The administrator terminal 3 presents this list of terminal IDs to the administrator, who can thereby see which terminals are running. This can also be used to check for absentees.
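The absentee check mentioned above amounts to a set difference between the registered terminal IDs and the IDs that returned an activation response. A minimal sketch, with a hypothetical function name and made-up terminal IDs:

```python
from typing import Iterable, List


def absent_terminals(registered: Iterable[str], responded: Iterable[str]) -> List[str]:
    """Terminal IDs that were registered but sent no activation response."""
    return sorted(set(registered) - set(responded))


registered_ids = ["t1", "t2", "t3"]
responded_ids = ["t3", "t1"]
print(absent_terminals(registered_ids, responded_ids))  # -> ['t2']
```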

As described in detail above, the utterance voice collection method, system, utterance analysis server, and program of the present invention keep the analysis of utterance voice in group work from becoming congested and allow utterance voice to be received preferentially from groups whose discussion is not active.

Various changes, modifications, and omissions within the technical idea and scope of the present invention can readily be made by those skilled in the art to the embodiments described above. The foregoing description is merely an example and is not intended to be restrictive. The present invention is limited only as defined by the claims and their equivalents.

DESCRIPTION OF SYMBOLS
1 Utterance analysis server
10 Utterance voice storage unit
11 Utterance voice reception order determination unit
12 Utterance voice reception unit
2 User terminal
20 Utterance voice storage unit
21 Voice section detection information transmission unit
22 Utterance voice transmission unit
3 Administrator terminal
4 Access point
5 Notification server

Claims

An utterance voice collection method for a system having a plurality of user terminals that each record the utterance voice of a user and an utterance analysis server that receives the utterance voice from each user terminal via a network, wherein
a plurality of groups each consist of a plurality of users, the method comprising:
a first step in which each user terminal transmits, to the utterance analysis server, for each consecutive predetermined voice section, voice section detection information in which a terminal identifier and a group identifier are associated with one or more utterance sections;
a second step in which the utterance analysis server receives a plurality of pieces of voice section detection information from the plurality of user terminals of each group and determines an utterance voice reception order of the groups based on a predetermined condition on the plurality of utterance sections included in the voice section; and
a third step in which the utterance analysis server transmits, group by group in the utterance voice reception order, an utterance voice request corresponding to the utterance sections to the plurality of user terminals and receives the utterance voice from each user terminal.

The utterance voice collection method of the system according to claim 1, wherein, in the second step, the utterance voice reception order of the groups based on the predetermined condition is the ascending order of the total time or average time of the utterance sections of the plurality of users in each group.

The utterance voice collection method of the system according to claim 1, wherein a probability distribution based on the total time or average time of the utterance sections of the plurality of users in many past groups is created in advance, and
in the second step, the utterance voice reception order of the groups based on the predetermined condition is the order of how far the total time or average time of the utterance sections of the plurality of users in each group lies outside a predetermined probability range of the probability distribution.

The utterance voice collection method of the system according to claim 1, wherein, in the second step, the utterance voice reception order of the groups based on the predetermined condition is the ascending order of the "number of long utterance sections" that are equal to or longer than a predetermined threshold time among the utterance sections of the plurality of users in each group.

The utterance voice collection method of the system according to claim 1, wherein, in the second step, the utterance voice reception order of the groups based on the predetermined condition is the ascending order of the ratio of the "number of long utterance sections" that are equal to or longer than a predetermined threshold time to the "number of short utterance sections" that are shorter than the predetermined threshold time, among the utterance sections of the plurality of users in each group.

The utterance voice collection method of the system according to claim 1, wherein, in the second step, the utterance voice reception order of the groups based on the predetermined condition is obtained by calculating, for each user, an absolute difference time that is the difference between the total time or average time of that user's utterance sections and a predetermined threshold time, and is the descending order of the sum of the absolute difference times.

The utterance voice collection method of the system according to claim 1, wherein, in the second step, the utterance voice reception order of the groups based on the predetermined condition is determined by:
creating in advance a probability distribution based on the total time or average time of the utterance sections of a plurality of users in a large number of past groups; and
calculating, for each group, the total of the probability values in the probability distribution for the users of the group,
the order being ascending order of the total value.
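By way of non-limiting illustration, the summed-probability ordering might be sketched as follows. The claim does not specify the distribution or what a "probability value" is; this sketch assumes a normal distribution fitted to past per-user total times and uses its density at each user's total time as that user's value.

```python
from statistics import NormalDist
from typing import Dict, List


def order_by_probability(past_user_times: List[float],
                         groups: Dict[str, Dict[str, float]]) -> List[str]:
    """Groups whose users have the least typical speaking times come first."""
    dist = NormalDist.from_samples(past_user_times)  # assumed distribution family

    def score(user_times: Dict[str, float]) -> float:
        # Assumed "probability value": density of the fitted distribution.
        return sum(dist.pdf(t) for t in user_times.values())

    return sorted(groups, key=lambda gid: score(groups[gid]))
```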

The utterance voice collection method of the system according to any one of claims 1 to 7, wherein each utterance section is expressed as a start time and an end time of a user's utterance.
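By way of non-limiting illustration, voice section detection information carrying such start and end times might be represented as follows. The class names, field names, and the JSON encoding are assumptions of this sketch; the claims only require that a terminal identifier and a group identifier be associated with one or more utterance sections.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class UtteranceSection:
    start: float  # start time of the utterance (seconds; unit assumed)
    end: float    # end time of the utterance


@dataclass
class VoiceSectionDetectionInfo:
    terminal_id: str
    group_id: str
    sections: List[UtteranceSection]

    def to_json(self) -> str:
        """Serialize for transmission; contains timestamps only, no audio."""
        return json.dumps(asdict(self))
```

Such a message is small compared with the recorded audio, which is why the terminals can send it for every voice section while the audio itself is sent only on request.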

An utterance voice collection method of a system having:
a plurality of user terminals that record utterance voice for each user;
an utterance analysis server that receives the utterance voice from each user terminal via a network;
an administrator terminal that manages a plurality of groups; and
a notification server that transmits an utterance voice request corresponding to the utterance section to a plurality of user terminals,
wherein each of the plurality of groups consists of a plurality of users, the method comprising:
a first step in which each user terminal transmits, to the utterance analysis server, voice section detection information in which a terminal identifier and a group identifier are associated with one or more utterance sections, for each predetermined continuous voice section;
a second step in which the utterance analysis server receives a plurality of pieces of voice section detection information from the plurality of user terminals of each group, determines an utterance voice reception order of the groups based on a predetermined condition on the plurality of utterance sections included in the voice section, and transmits the utterance voice reception order to the administrator terminal;
a third step in which the administrator terminal transmits a group identifier based on the utterance voice reception order to the notification server in response to an operation by the administrator;
a fourth step in which the notification server transmits an utterance voice request corresponding to the utterance section to the plurality of user terminals belonging to the group identifier received from the administrator terminal; and
a fifth step in which each user terminal that has received the utterance voice request transmits the utterance voice to the utterance analysis server.
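By way of non-limiting illustration, the five steps of this claim might be simulated in a single process as follows. The dictionary layout, the `choose` callback standing in for the administrator's operation, and the shortest-total-first condition in the second step are assumptions of this sketch; network transport and the per-section content of the request are omitted.

```python
from typing import Callable, Dict, List, Tuple


def run_flow(terminals: Dict[str, dict],
             choose: Callable[[List[str]], str]) -> Tuple[List[str], Dict[str, bytes]]:
    # Step 1: each terminal sends (terminal id, group id, utterance sections).
    infos = [(tid, t["group"], t["sections"]) for tid, t in terminals.items()]

    # Step 2: the analysis server ranks groups (here: shortest total time first)
    # and sends the order to the administrator terminal.
    totals: Dict[str, float] = {}
    for _tid, gid, sections in infos:
        totals[gid] = totals.get(gid, 0.0) + sum(e - s for s, e in sections)
    order = sorted(totals, key=lambda gid: totals[gid])

    # Step 3: the administrator selects a group from the presented order.
    selected = choose(order)

    # Step 4: the notification server sends requests to that group's terminals.
    requested = [tid for tid, gid, _sections in infos if gid == selected]

    # Step 5: each requested terminal sends its utterance voice to the server.
    received = {tid: terminals[tid]["audio"] for tid in requested}
    return order, received
```

For example, passing `choose=lambda order: order[0]` models an administrator who always picks the first group in the presented order.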

A system having a plurality of user terminals that record utterance voice for each user and an utterance analysis server that receives the utterance voice from each user terminal via a network,
wherein each of a plurality of groups consists of a plurality of users,
each user terminal comprises:
voice section detection information transmitting means for transmitting, to the utterance analysis server, voice section detection information in which a terminal identifier and a group identifier are associated with one or more utterance sections, for each predetermined continuous voice section; and
utterance voice transmitting means for transmitting the utterance voice to the utterance analysis server when an utterance voice request is received from the utterance analysis server, and
the utterance analysis server comprises:
utterance voice reception order determination means for receiving a plurality of pieces of voice section detection information from the plurality of user terminals of each group and determining an utterance voice reception order of the groups based on a predetermined condition on the plurality of utterance sections included in the voice section; and
utterance voice receiving means for transmitting, in units of groups in the utterance voice reception order, an utterance voice request corresponding to the utterance section to the plurality of user terminals and receiving the utterance voice from each user terminal.
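By way of non-limiting illustration, the terminal-side utterance voice transmitting means and the server-side group-by-group reception might be sketched as follows. The class and function names, the representation of recorded audio as a list of samples, and the fixed sample rate are assumptions of this sketch.

```python
from typing import List, Tuple

Section = Tuple[float, float]  # hypothetical: (start, end) in seconds


class UserTerminal:
    def __init__(self, terminal_id: str, group_id: str,
                 samples: List[int], rate: int, sections: List[Section]):
        self.terminal_id = terminal_id
        self.group_id = group_id
        self.samples = samples    # recorded audio samples
        self.rate = rate          # samples per second
        self.sections = sections  # detected utterance sections

    def handle_request(self, sections: List[Section]) -> List[List[int]]:
        """Return only the audio that falls inside the requested sections."""
        return [self.samples[int(s * self.rate):int(e * self.rate)]
                for s, e in sections]


def collect_by_group(terminals: List[UserTerminal], order: List[str]):
    """Request audio one group at a time, following the reception order."""
    received = []
    for gid in order:
        for t in terminals:
            if t.group_id == gid:
                received.append((gid, t.terminal_id, t.handle_request(t.sections)))
    return received
```

Processing one group at a time is what keeps uploads from all terminals from arriving at the server simultaneously.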

A system having:
a plurality of user terminals that record utterance voice for each user;
an utterance analysis server that receives the utterance voice from each user terminal via a network;
an administrator terminal that manages a plurality of groups; and
a notification server that transmits an utterance voice request corresponding to the utterance section to a plurality of user terminals,
wherein each of the plurality of groups consists of a plurality of users,
each user terminal comprises:
voice section detection information transmitting means for transmitting, to the utterance analysis server, voice section detection information in which a terminal identifier and a group identifier are associated with one or more utterance sections, for each predetermined continuous voice section; and
utterance voice transmitting means for transmitting the utterance voice to the utterance analysis server when an utterance voice request is received from the notification server,
the utterance analysis server comprises:
utterance voice reception order determination means for receiving a plurality of pieces of voice section detection information from the plurality of user terminals of each group and determining an utterance voice reception order of the groups based on a predetermined condition on the plurality of utterance sections included in the voice section;
utterance voice reception order transmitting means for transmitting the utterance voice reception order to the administrator terminal; and
utterance voice receiving means for receiving the utterance voice from the user terminals,
the administrator terminal transmits a group identifier based on the utterance voice reception order to the notification server in response to an operation by the administrator, and
the notification server transmits an utterance voice request corresponding to the utterance section to the plurality of user terminals belonging to the group identifier received from the administrator terminal.

An utterance analysis server that receives utterance voice via a network from a plurality of user terminals that record the utterance voice for each user,
wherein each of a plurality of groups consists of a plurality of users, the utterance analysis server comprising:
voice section detection information receiving means for receiving, from each user terminal, voice section detection information in which a terminal identifier and a group identifier are associated with one or more utterance sections, for each predetermined continuous voice section;
utterance voice reception order determination means for determining, using the plurality of pieces of voice section detection information, an utterance voice reception order of the groups based on a predetermined condition on the plurality of utterance sections included in the voice section; and
utterance voice receiving means for transmitting, in units of groups in the utterance voice reception order, an utterance voice request corresponding to the utterance section to the plurality of user terminals and receiving the utterance voice from each user terminal.

A program for causing a computer installed in an utterance analysis server to function, the utterance analysis server receiving utterance voice via a network from a plurality of user terminals that record the utterance voice for each user,
wherein each of a plurality of groups consists of a plurality of users, the program causing the computer to function as:
voice section detection information receiving means for receiving, from each user terminal, voice section detection information in which a terminal identifier and a group identifier are associated with one or more utterance sections, for each predetermined continuous voice section;
utterance voice reception order determination means for determining, using the plurality of pieces of voice section detection information, an utterance voice reception order of the groups based on a predetermined condition on the plurality of utterance sections included in the voice section; and
utterance voice receiving means for transmitting, in units of groups in the utterance voice reception order, an utterance voice request corresponding to the utterance section to the plurality of user terminals and receiving the utterance voice from each user terminal.