JP2017220807A

JP2017220807A - Voice data collection system

Info

Publication number: JP2017220807A
Application number: JP2016114027A
Authority: JP
Inventors: 和人橋本; Kazuto Hashimoto; 久美子小島; Kumiko Kojima
Original assignee: Hitachi Systems Ltd
Current assignee: Hitachi Systems Ltd
Priority date: 2016-06-08
Filing date: 2016-06-08
Publication date: 2017-12-14

Abstract

PROBLEM TO BE SOLVED: To collect voice data naturally, efficiently, and reliably. SOLUTION: A voice data collection system 1, which collects voice data of utterances of a user 2, includes: a voice data collection environment comprising a PBX 10, a CTI server 20, and an IVR server 30, which acquires, as voice data, the utterances of the user 2 in a telephone call received from the user 2; and a voice analysis server 40, which is connected to each IVR server 30 through a network, processes the voice data acquired from each IVR server 30, and makes the processing result viewable on an information processing terminal 53 of the user 2. During the telephone call received from the user 2, the voice data collection environment issues a question asking the user 2 for an answer, records the utterance of the answer from the user 2 as voice data, and repeats the issuing of questions and the acquisition of answer voice data until the number of voice data items reaches a predetermined number. SELECTED DRAWING: Figure 1

Description

The present invention relates to a technique for collecting data from a large number of specified or unspecified users, and more particularly to a technique that is effective when applied to a voice data collection system that collects users' voice data.

Voice pathology analysis is a technique for grasping a person's constantly changing mental state, which is hard even for that person to recognize, from the speech the person utters in daily life. A person's voice contains an involuntary component that the person produces unintentionally (true feelings) and a voluntary component that the person produces deliberately to convey something to the other party (public stance). The technique quantifies and visualizes the emotional state based on the involuntary component.

A system called MIMOSYS (Mind Monitoring Systems; Non-Patent Document 1; registered trademark, likewise hereinafter) has been developed that uses this technique to collect and analyze the speech a person utters in daily life, monitor the person's mental state, and visualize the analysis results (for details, see, for example, JP-A-2015-128579 (Patent Document 1)). With this system, a user's stress and mental state can be measured and expressed as numerical values, for example a state in which the mind is normal, improving, or active, or a state in which it is sometimes low and rest is needed. Wide use of this technique is expected to make it possible to detect mental and physical abnormalities such as a depressive state early, before the person becomes aware of them, and to prevent illness through appropriate treatment and countermeasures.

Patent Document 1: JP-A-2015-128579

Non-Patent Document 1: "MIMOSYS | PST Corporation, voice pathology analysis technology", [online], PST Corporation, [retrieved March 18, 2016], Internet <URL: http://medical-pst.com/products-2/mimosys>

With the conventional technique described above, everything from voice collection and analysis to visualization can be performed by, for example, an application program installed on a mobile terminal such as a smartphone.

On the other hand, when an application installed on a mobile terminal is used, the user must install and launch the application and then actively speak. Acquiring the content of ordinary calls that the user makes on a mobile phone is also considered effective, but it is uncertain when such calls will take place. With these methods alone, therefore, collecting a user's voice data is highly uncertain and not very efficient. To achieve the goal of preventing illness across the whole population, a mechanism for collecting voice data is needed that a broad range of users can use widely in various scenes and situations.

Accordingly, an object of the present invention is to provide a voice data collection system that can collect voice data more naturally, efficiently, and reliably.

The above and other objects and novel features of the present invention will become apparent from the description in this specification and the accompanying drawings.

A brief outline of a representative one of the inventions disclosed in this application is as follows.

A voice data collection system according to a representative embodiment of the present invention collects voice data of a user's utterances and includes: a voice data collection environment that is configured for each of one or more sites, has a PBX, a CTI server, and an IVR server, and acquires, as the voice data, the user's utterances in a telephone call received from the user; and a voice analysis server that is connected via a network to each IVR server in each voice data collection environment, processes the voice data acquired from each IVR server, and makes the processing result viewable on the user's information processing terminal.

In the telephone call received from the user, the voice data collection environment issues a question asking the user for an answer, records the user's utterance answering the question as the voice data, and repeats the issuing of questions and the acquisition of the voice data of the user's answers until the number of voice data items reaches a predetermined number.

The effect obtained by a representative one of the inventions disclosed in this application is briefly described as follows.

That is, according to a representative embodiment of the present invention, a voice data collection system that can collect voice data more naturally, efficiently, and reliably can be provided.

A diagram outlining a configuration example of a voice data collection system according to an embodiment of the present invention.
A diagram outlining an example of the flow of processing for collecting and analyzing voice data in an embodiment of the present invention.
(a) and (b) are diagrams outlining examples of screens displayed on an information processing terminal as analysis results from the voice analysis server in an embodiment of the present invention.
A diagram outlining another example of the flow of processing for collecting and analyzing voice data in an embodiment of the present invention.
A diagram outlining an example screen of an application that collects voice data with a smartphone in an embodiment of the present invention.
A diagram outlining an example of the data structure of the authentication master DB of the IVR server in an embodiment of the present invention.
A diagram outlining an example of the data structure of the voice data DB of the IVR server in an embodiment of the present invention.
A diagram outlining an example of the data structure of the voice data DB of the voice analysis server in an embodiment of the present invention.
A diagram outlining an example of the data structure of the user master DB of the voice analysis server in an embodiment of the present invention.
A diagram outlining an example of the data structure of the analysis result DB of the voice analysis server in an embodiment of the present invention.

Embodiments of the present invention are described in detail below with reference to the drawings. In all the drawings for describing the embodiments, the same parts are in principle denoted by the same reference numerals, and repeated description of them is omitted. On the other hand, a part described with a reference numeral in one drawing may be referred to by the same reference numeral in the description of another drawing without being illustrated again.

<System configuration>
FIG. 1 is a diagram outlining a configuration example of a voice data collection system according to an embodiment of the present invention. The voice data collection system 1 of this embodiment acquires and collects speech uttered by the user 2 via a telephone 51 or a smartphone 52, analyzes it with a voice pathology analysis engine such as the MIMOSYS described above, and visualizes the results so that the user 2 can view them.

The voice data collection system 1 is an information processing system made up of the following subsystems: a voice data collection environment consisting of, for example, a PBX (Private Branch eXchange) 10, a CTI (Computer Telephony Integration) server 20, and an IVR (Interactive Voice Response) server 30; and a voice analysis server 40. A plurality of voice data collection environments can be built independently, for example one for each site such as an office building.

The PBX 10 is an exchange that, for a predetermined telephone number, accepts calls to that number made by a plurality of users 2 from telephones 51. The telephone number may be, for example, a Free Dial (registered trademark) number or an extension number in an office building or the like. Any commonly used PBX equipment can be used as the PBX 10. The telephone 51 is not particularly limited and may be a fixed-line telephone, a mobile phone, a smartphone, or the like. It may also be an extension telephone.

The CTI server 20 is a server device that links access by calls from the telephone 51, accepted by the PBX 10, to information processing systems such as the IVR server 30. Any commonly used CTI equipment can be used as the CTI server 20. In this embodiment, the CTI server 20 has a collection control unit 21 implemented, for example, as a software program or script that runs on middleware such as an OS (Operating System), a DBMS (DataBase Management System), and CTI software (not shown). The collection control unit 21 cooperates with the IVR server 30, described later, to execute and control the series of processing flows for acquiring and collecting voice data from the utterances of the user 2.

The IVR server 30 is a server device that cooperates with the CTI server 20 and outputs predetermined voice guidance, based on preset content, in response to a call from the telephone 51 of the user 2. Unlike the usual use of CTI/IVR in voice guidance systems and call center systems, in this embodiment the voice guidance prompts the user 2 to speak, so that the voice data of the user 2 can be acquired from the call and recorded in a voice data DB 34.

The IVR server 30 has units such as an authentication unit 31 and a voice collection unit 32, implemented, for example, as software programs or scripts that run on middleware such as an OS, a DBMS, and IVR software (not shown). It also has data stores such as an authentication master database (DB) 33 and the voice data DB 34, implemented as databases or the like.

The authentication unit 31 authenticates the user 2 who is accessing the system by a call from the telephone 51. In this embodiment, as described later, the user 2 enters an ID code by operating the telephone 51 and pushing a series of digits, and the authentication unit 31 authenticates the user by checking whether the entered ID code is registered in the authentication master DB 33. The authentication method is not limited to this. For example, information other than the ID code may be entered and checked. Authentication may also be performed with a known voiceprint authentication technique or the like, based on the voice of the user 2 acquired by the voice collection unit 32 described later.
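The ID-code check described above can be sketched as a simple lookup. This is a minimal illustration, not the patented implementation: the `AuthRecord` fields, the `AUTH_MASTER` sample data, and the function name `authenticate` are all hypothetical stand-ins for the authentication master DB 33 and the authentication unit 31.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class AuthRecord:
    """One row of a stand-in for the authentication master DB 33 (fields are hypothetical)."""
    id_code: str
    user_label: str


# Hypothetical sample data; a real deployment would query a database instead.
AUTH_MASTER: Dict[str, AuthRecord] = {
    "12345678": AuthRecord("12345678", "user-a"),
    "87654321": AuthRecord("87654321", "user-b"),
}


def authenticate(pushed_digits: str, master: Dict[str, AuthRecord]) -> Optional[AuthRecord]:
    """Return the registered record for the pushed digits, or None if not registered."""
    id_code = pushed_digits.strip()
    if not id_code.isdigit():
        # Reject empty input and anything that is not a pure digit sequence.
        return None
    return master.get(id_code)
```

Returning the record rather than a boolean lets later steps, such as attribute-based question selection, reuse the same lookup.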

The voice collection unit 32 automatically provides voice guidance in response to a call from the telephone 51 of the user 2, based on voice guidance patterns registered in advance in voice guidance information 35. It then records, as digital data, the speech uttered by the user 2 in response to the guidance and stores it in the voice data DB 34.

For the voice analysis server 40, described later, to perform voice pathology analysis accurately, it is desirable to acquire a plurality of voice data items (for example, about seven or eight), each in a unit that can be delimited as an utterance. In this embodiment, as described later, a plurality of questions are issued to the user 2 as voice guidance so that a plurality of voice data items can be acquired, and each answer is recorded as a voice data item.

The questions may follow the same preset pattern every time, or the required number of questions may be selected from a pool of candidate questions prepared in advance, either at random or according to a predetermined criterion. Attribute information of the user may be acquired from the authentication master DB 33 or the like based on the ID code of the user 2, and the questions may be switched accordingly. The answers may also be analyzed immediately by speech recognition, and the questions switched according to their content.
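The two simplest selection strategies mentioned above, a fixed pattern and a random draw from a candidate pool, might look like the following sketch. The question texts in `QUESTION_POOL`, the function name `select_questions`, and the `mode` parameter are illustrative assumptions; the source does not list actual questions or an interface.

```python
import random
from typing import List, Optional, Sequence

# Illustrative questions only; the source does not give actual question texts.
QUESTION_POOL = [
    "What did you eat for breakfast today?",
    "How was the weather on your way here?",
    "What are your plans for this weekend?",
    "What time did you go to bed last night?",
    "What is your favorite season, and why?",
    "What did you do yesterday evening?",
    "Which food have you enjoyed recently?",
    "Where would you like to travel next?",
    "What kind of music do you listen to?",
]


def select_questions(
    pool: Sequence[str],
    count: int,
    mode: str = "fixed",
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Pick `count` questions, either the first ones in order or a random draw."""
    if count > len(pool):
        raise ValueError("not enough candidate questions")
    if mode == "fixed":
        # Same preset pattern on every call.
        return list(pool[:count])
    if mode == "random":
        chooser = rng if rng is not None else random.Random()
        # sample() draws without replacement, so no question repeats.
        return chooser.sample(list(pool), count)
    raise ValueError(f"unknown mode: {mode}")
```

Passing a seeded `random.Random` makes the random mode reproducible in tests. Attribute-based or answer-based switching would need further inputs that the text does not specify.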

The voice data collection environment consisting of the PBX 10, the CTI server 20, and the IVR server 30 described above provides a service that collects voice data when the user 2 makes a call using the telephone 51. In this embodiment, the means of collecting voice data is not limited to this. For example, voice data that the user 2 has recorded on the smartphone 52 can also be used. In that case, voice data can be acquired without being affected by the call quality of the telephone carrier. Voice data collected by these means is used in common, without distinction by collection means, for analysis by the voice analysis server 40 described later. Providing multiple means of collecting voice data in this way makes it possible to collect voice data by a simple and appropriate means suited to the situation of the user 2.

The voice analysis server 40 is a server device that acquires voice data from the IVR server 30 of each voice data collection environment, the smartphone 52 used by the user 2, and the like, records it in a voice data DB 45, analyzes it with a voice pathology analysis engine such as the MIMOSYS described above, and visualizes the results so that the user 2 can view them. For example, it can be configured as a virtual server built in a cloud computing environment that provides the voice pathology analysis function as a cloud service to the voice data collection environment at each site.

The voice analysis server 40 of this embodiment has units such as a voice data acquisition unit 41, a voice analysis unit 42, an analysis result processing unit 43, and a user interface (IF) unit 44, implemented, for example, as software programs that run on middleware such as an OS, a DBMS, and a Web server program (not shown). It also has data stores such as the voice data DB 45, a user master DB 46, and an analysis result DB 47, implemented as databases or the like.

The voice data acquisition unit 41 acquires voice data from the IVR server 30 of a voice data collection environment, the smartphone 52 used by the user 2, and the like, and records it in the voice data DB 45. For example, unprocessed voice data in the voice data DB 34 acquired by the IVR server 30 at each site is sent to the voice analysis server 40 by batch processing at fixed intervals (for example, every hour). If the size of the voice data, the network bandwidth, and so on allow transmission in real time or close to it, the voice data acquired by the IVR server 30 may instead be sent to the voice analysis server 40 in real time each time it is acquired.
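One way to picture the periodic batch transfer is a pass over the stored records that sends only those not yet transferred. The sketch below is an assumption-laden illustration: the `VoiceRecord` fields, the `sent` flag, and the `send` callback are hypothetical, and the actual schema of the voice data DB 34 and the transport to the voice analysis server 40 are not given in this passage.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class VoiceRecord:
    """Hypothetical row of the IVR-side voice data DB 34."""
    record_id: int
    id_code: str
    audio: bytes
    sent: bool = False  # assumed flag marking records already transferred


def run_batch(records: List[VoiceRecord], send: Callable[[VoiceRecord], bool]) -> int:
    """Send every record not yet transferred; return how many were delivered."""
    delivered = 0
    for record in records:
        if record.sent:
            continue
        # Mark the record only after the transfer succeeds, so a failed
        # transfer is picked up again by the next batch run.
        if send(record):
            record.sent = True
            delivered += 1
    return delivered
```

A scheduler such as cron could invoke `run_batch` once per hour. The real-time variant described above would instead call `send` immediately after each recording.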

Voice data sent in real time from an application (not shown) on the smartphone 52 may also be received and recorded in the voice data DB 45. The voice data acquisition unit 41 performs processing such as format conversion, as necessary, on the voice data sent from the IVR server 30 of a voice data collection environment or from the smartphone 52 of the user 2, and records it in the voice data DB 45.

The voice analysis unit 42 performs voice pathology analysis on the voice data of each user 2 recorded in the voice data DB 45 to grasp the user's mental state (the current state and how it changes over time). In this embodiment, it is implemented so as to include a voice pathology analysis engine such as the MIMOSYS described above. Unprocessed voice data may be analyzed in real time each time new voice data is recorded in the voice data DB 45, using that recording as a trigger, or the unprocessed voice data accumulated in the voice data DB 45 may be analyzed together at fixed intervals (for example, every hour).

The analysis result processing unit 43 acquires the results of the voice pathology analysis performed by the voice analysis unit 42, visualizes them for each user 2, and records them in the analysis result DB 47. Visualization here includes, for example, converting the output data of the analysis into text, numerical values, evaluation information, and the like that can be presented to the user 2, and generating screen data or image data that can be displayed on the information processing terminal 53. In doing so, the unit may refer to the user master DB 46, which holds attribute information and individual settings for each user 2, to tailor the visualization to the user 2.

The user IF unit 44 accepts requests from the user 2 via the information processing terminal 53 and displays the visualized data recorded in the analysis result DB 47 on the screen of the information processing terminal 53. It may also have a function for registering new account information and issuing an ID code to an unspecified user 2 who has not yet registered and has no account information such as an ID code.

A general-purpose information processing terminal such as a PC (Personal Computer), a smartphone, or a tablet can be used as the information processing terminal 53. A user who has the smartphone 52 for recording voice data may use it as the information processing terminal 53 as is. One user 2 may use several types of information processing terminal 53 depending on the situation. The user 2 accesses the voice analysis server 40 in the cloud computing environment using, for example, a Web browser (not shown) on the information processing terminal 53.

In the example of FIG. 1, the PBX 10, the CTI server 20, the IVR server 30, and the voice analysis server 40 are each configured as separate devices or server systems, but the configuration is not limited to this. The functions may be distributed further across a plurality of server systems, or conversely, the functions of a plurality of server systems may be consolidated into one server system.

<Processing flow (collecting voice data via a telephone call)>
FIG. 2 is a diagram outlining an example of the flow of processing for collecting and analyzing voice data in this embodiment. It shows the case in which the voice data collection environment acquires and collects voice data from a call made by the user 2 using the telephone 51. First, the user 2 calls a predetermined telephone number from the telephone 51 (S01). When the PBX 10 of the corresponding site, that is, of the corresponding voice data collection environment, receives this call, the collection control unit 21 of the corresponding CTI server 20 or the like first determines whether the call is within service hours (S02). If it is outside service hours, although not illustrated, voice guidance to that effect is returned to the telephone 51 of the user 2 in cooperation with the voice collection unit 32 of the corresponding IVR server 30, and the call is ended, which ends the entire process.
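The service-hours check in step S02 could be as simple as the following sketch. The 9:00 to 18:00 window and the function name `within_service_hours` are assumptions for illustration; the source does not state the actual service hours or how they are configured.

```python
from datetime import datetime, time

# Assumed service window; the source does not specify actual hours.
SERVICE_START = time(9, 0)
SERVICE_END = time(18, 0)


def within_service_hours(
    now: datetime,
    start: time = SERVICE_START,
    end: time = SERVICE_END,
) -> bool:
    """Return True if `now` falls inside the half-open window [start, end)."""
    current = now.time()
    if start <= end:
        return start <= current < end
    # Window that crosses midnight, for example 22:00 to 06:00.
    return current >= start or current < end
```

The collection control unit would call this once when the call arrives and branch to the out-of-hours guidance when it returns False.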

If the call is within service hours, authentication guidance is returned to the telephone 51 of the user 2 in cooperation with the voice collection unit 32 of the IVR server 30 (S03). This guidance is a voice message prompting the user to enter identification information such as an ID code so that the user can be authenticated. Here, for example, the user 2 is instructed to enter the series of digits that make up the ID code using the push buttons or numeric keypad of the telephone 51.

When the user 2 enters an ID code or the like in accordance with the authentication guidance (S04), the collection control unit 21 of the CTI server 20 cooperates with the authentication unit 31 of the IVR server 30 to authenticate the user according to whether the entered ID code or the like is registered in the authentication master DB 33 (S05). As described above, the authentication method is not limited to this. Instead of or in addition to it, another method may be used, such as prompting the user 2 to speak and authenticating by analyzing the voiceprint or the like of the acquired voice data.

If authentication fails, that is, if the entered ID code or the like is not registered in the authentication master DB 33, a predetermined number of retries (for example, three) is allowed, although this is not illustrated. If the predetermined number of retries is exceeded, voice guidance stating that authentication failed is returned to the telephone 51 of the user 2, and the call is ended.
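The bounded-retry behavior can be sketched as a small loop. This sketch assumes that "three retries" means three additional attempts after the first failure, so four attempts in total; the source does not say which reading is intended. The callback names `read_id_code` and `is_registered` are hypothetical.

```python
from typing import Callable, Optional


def authenticate_with_retries(
    read_id_code: Callable[[], str],
    is_registered: Callable[[str], bool],
    max_retries: int = 3,
) -> Optional[str]:
    """Prompt for an ID code until one is registered or the retries run out.

    Returns the accepted ID code, or None when the caller should play the
    "authentication failed" guidance and end the call.
    """
    # One initial attempt plus `max_retries` further attempts (an assumption).
    for _ in range(1 + max_retries):
        id_code = read_id_code()
        if is_registered(id_code):
            return id_code
    return None
```

For example, with `registered = {"9"}` and pushed inputs "1", "2", "3", "9" in turn, the fourth attempt succeeds and "9" is returned.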

If authentication succeeds, voice guidance asking the user 2 a predetermined question is returned to the telephone 51 of the user 2 in cooperation with the voice collection unit 32 of the IVR server 30 (S06). When the user 2 speaks an answer to the question on the telephone 51 (S07), the collection control unit 21 of the CTI server 20 cooperates with the voice collection unit 32 of the IVR server 30 to record the utterance as voice data (S08). The resulting voice data is stored in the voice data DB 34. If the voice data could not be acquired properly or could not be recorded, a predetermined number of retries (for example, three) is allowed, although this is not illustrated. If the number of retries is exceeded, voice guidance stating that voice data could not be acquired is returned to the telephone 51 of the user 2, and the call is ended.

If the voice data was recorded, it is determined whether voice data has been recorded the predetermined number of times during the call, that is, whether at least the predetermined number of voice data items has been acquired (S09). If the number falls short, the process returns to step S06 and the next (n-th) question is asked. The predetermined number of voice data items to acquire is set as appropriate according to the specifications of the voice pathology analysis engine used by the voice analysis unit 42 of the voice analysis server 40, the required accuracy, and so on. For MIMOSYS, for example, seven or more voice data items are currently considered sufficient.

When the predetermined number of voice data items has been acquired, closing guidance stating that the questions are finished and the call will end is played to the telephone 51 of the user 2, in cooperation with the voice collection unit 32 of the IVR server 30 (S10). The user 2 then ends the call on the telephone 51 (S11). Alternatively, the collection control unit 21 of the CTI server 20 may take the lead in ending the call. If the call with the user 2 is disconnected before the predetermined number of voice data items has been acquired, only the voice data acquired up to that point may be used.

In this way, the present embodiment repeats a predetermined number of questions to acquire voice data of the answers. As noted above, the same predetermined set of questions may be used every time, or the required number of questions may be selected from a pool of prepared candidate questions, either at random or according to a predetermined criterion. The questions may also be switched according to attribute information of the user obtained from the ID code of the user 2, or according to the content of the answers, which is analyzed immediately by speech recognition. Varying the questions in this way helps keep the user 2 from becoming bored.

If the questions are not all required to differ, one or more of them may have the same content as another. In addition, rather than having the user 2 speak solely for the purpose of collecting voice data, collection may be combined with another telephone-based task or service, such as a telephone reservation, and the content spoken on that occasion may be acquired and reused.

The voice data recorded by the above series of processes is transferred from the IVR server 30 to the voice analysis server 40 at fixed intervals, for example every hour (S21). In the voice analysis server 40, the voice data acquisition unit 41 records the acquired voice data in the voice data DB 45 (S22). Then, for example at fixed intervals, the voice analysis unit 42 performs voice pathological analysis on the unprocessed voice data (S23). It is desirable to delete voice data once it has been processed, in consideration of privacy and the like. The analysis result processing unit 43 then visualizes the content of the analysis results and records it in the analysis result DB 47 (S24). With this series of processes, the analysis results are ready for the user 2 to display and view. The target user 2 may be notified by e-mail, push notification, or the like that the analysis is complete.
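The periodic processing of steps S23 and S24, including deletion of processed audio for privacy, might look like the following sketch. The `VoiceRecord` fields, the `analyze` callback standing in for the voice pathological analysis engine, and the use of a plain dictionary as the analysis result store are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class VoiceRecord:
    id_code: str
    call_no: int
    data_no: int
    audio: Optional[bytes]   # cleared after analysis for privacy
    analyzed: bool = False   # plays the role of an "analysis status" flag


def analyze_pending(
    records: List[VoiceRecord],
    analyze: Callable[[bytes], float],
    results: Dict[Tuple[str, int, int], float],
) -> int:
    """Analyze every unprocessed record, store the result, drop the audio.

    `analyze` is a hypothetical stand-in for the analysis engine.
    Returns the number of records processed in this run.
    """
    processed = 0
    for rec in records:
        if rec.analyzed or rec.audio is None:
            continue  # already handled in an earlier run
        results[(rec.id_code, rec.call_no, rec.data_no)] = analyze(rec.audio)
        rec.audio = None      # delete the raw voice data once processed
        rec.analyzed = True
        processed += 1
    return processed
```

Because already analyzed records are skipped, the function can be called on a fixed schedule without reprocessing earlier data.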

At any time after the analysis on the voice analysis server 40 is complete, the user 2 accesses the user IF unit 44 of the voice analysis server 40 using a Web browser or the like on the information processing terminal 53, and logs in by entering account information (S31). The account information may be, for example, the ID code and password of the user 2, but other account information may be used, and authentication information for other methods such as biometric authentication may also be included.

The voice analysis server 40 performs user authentication based on the registered content of the user master DB 46 (S32). Although not illustrated, if authentication fails, that is, if the entered ID code, password, or the like is not registered in the user master DB 46 or does not match the registered content, a screen stating that authentication failed is displayed on the information processing terminal 53 and the process ends.

When authentication succeeds, the analysis result data for the target user 2 is retrieved from the analysis result DB 47, and a display screen (for example, HTML (HyperText Markup Language) data) is generated and output (S33). The information processing terminal 53 displays the output analysis result screen in a Web browser (not shown) (S34). The user 2 can thus grasp his or her mental state as the result of the voice pathological analysis performed on the voice data of his or her own utterances.

<Screen example>
FIG. 3 outlines examples of screens displayed on the information processing terminal 53 as analysis results from the voice analysis server 40. After the user 2 logs in by entering account information such as an ID code and password on a login screen (not shown), an analysis result screen such as the one in FIG. 3(a) is displayed. This screen presents, as the "mental activity value" (心の活量値), an evaluation based on how the measurements have changed over time (for example, over the most recent two weeks). Here, for example, the range of measured values is divided into several stages and the label of the stage corresponding to the measured value is shown ("キラキラ★" in the illustrated example), so that the user 2 can understand the result intuitively.
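The staging described above, in which a measured value is mapped to one of several labeled ranges, can be sketched as follows. The boundaries, the 0 to 1 scale, and every label other than "キラキラ★" are assumptions for illustration; the source does not state the actual ranges, nor which stage "キラキラ★" corresponds to (it is placed at the top here purely as a guess).

```python
from bisect import bisect_right

# Hypothetical stage boundaries on an assumed 0..1 scale.
THRESHOLDS = [0.25, 0.5, 0.75]
# Hypothetical labels; only "キラキラ★" appears in the source figure.
LABELS = ["stage 1", "stage 2", "stage 3", "キラキラ★"]


def stage_label(value: float) -> str:
    """Return the label of the stage that contains `value`.

    A value equal to a boundary falls into the higher stage.
    """
    return LABELS[bisect_right(THRESHOLDS, value)]
```

Keeping the boundaries in a sorted list means the number of stages can be changed by editing the two lists, without touching the lookup.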

When the user 2 switches screens, an analysis result screen such as the one in FIG. 3(b) is displayed. This screen presents, as "vitality" (元気圧), an evaluation of how energetic the user's mind was at the time of the call in question, that is, at the time the voice data was acquired, using a meter image so that the user 2 can grasp it intuitively. On both screens, the history of past analysis results can also be viewed. The screens shown in FIG. 3 are only examples, and other display formats may of course be used.

<Process flow (voice data collection by smartphone)>
FIG. 4 outlines another example of the flow of processing for collecting and analyzing voice data in the present embodiment. It shows the case in which the user 2 acquires and collects voice data using the voice recording function of the smartphone 52. First, the user 2 launches a dedicated application on the smartphone 52 (S41, S42). On launch, the application on the smartphone 52 first displays an authentication screen (S43).

When the user 2 enters account information such as a user ID and password as instructed on the authentication screen (S44), the application on the smartphone 52 sends an authentication request to the voice analysis server 40 (S45). The voice analysis server 40 performs user authentication based on the registered content of the user master DB 46 (S46). Although not illustrated, if authentication fails, that is, if the entered ID code, password, or the like is not registered in the user master DB 46 or does not match the registered content, a screen stating that authentication failed is displayed in the application on the smartphone 52 and the process ends.

When authentication succeeds, the application on the smartphone 52 displays a screen that asks the user 2 a predetermined question (S47). The content of the questions can be changed as needed by updating the application or by downloading from a server such as the voice analysis server 40. When the user 2 speaks an answer to the question into the microphone of the smartphone 52 (S48), the smartphone 52 records the utterance as voice data using its own voice recording function (S49). The recorded voice data is kept in memory or the like. Although not illustrated, if the voice data cannot be acquired properly or cannot be recorded, a predetermined number of retries (for example, three) is allowed. If the retry limit is exceeded, a screen stating that the voice data could not be acquired is displayed in the application on the smartphone 52, and the process ends.

If the voice data was recorded, it is determined whether voice data has been recorded a predetermined number of times while the application has been running, that is, whether at least the predetermined number of voice data items has been acquired (S50). If the number falls short, the process returns to step S47 and the next (n-th) question is asked. As in the case of the telephone 51 in FIG. 2, the predetermined number is set as appropriate according to the specifications of the voice pathological analysis engine used by the voice analysis unit 42 of the voice analysis server 40, the required accuracy, and so on. Basically, the difference between call quality on the telephone 51 and recording quality on the smartphone 52 does not change this number; for MIMOSYS, for example, seven or more voice data items are currently considered sufficient.

When the predetermined number of voice data items has been acquired, a screen stating that the questions are finished is displayed (S51). The voice data recorded by the above series of processes is then transferred to the voice analysis server 40 (S52). In the voice analysis server 40, the voice data acquisition unit 41 records the acquired voice data in the voice data DB 45 (S53). The voice analysis unit 42 then performs voice pathological analysis on the unprocessed voice data (S54), and the analysis result processing unit 43 visualizes the content of the analysis results and records it in the analysis result DB 47 (S55). With this series of processes, the analysis results are ready for the user 2 to display and view. Immediately or shortly after recording the voice data, the user 2 can use the application on the smartphone 52 to display and consult analysis results such as those shown in FIG. 3 (S56, S57).

In the example of FIG. 4, the voice data recorded on the smartphone 52 in step S49 is transferred to the voice analysis server 40 in a single batch once all the questions are finished (step S52), but the method of transferring the voice data is not limited to this. For example, the voice data recorded in step S49 may be sent to the voice analysis server 40 each time it is recorded, which reduces the processing load after the questions end and shortens the time until the analysis results are output.

<Screen example>
FIG. 5 outlines example screens of the application that collects voice data on the smartphone 52. The upper-left screen is the authentication screen displayed at launch, through which the user 2 enters account information such as a user ID and password to perform user authentication. Account information that has been entered once may be retained using a cookie or the like so that authentication is performed automatically from the next time onward. When user authentication succeeds, a start and guidance screen such as the one shown at the upper center is displayed. When the "Start" button is pressed here, a question screen such as the one shown at the upper right is displayed, and the voice data acquisition process begins.

ユーザ２は、画面の「マイク」ボタンをタップして、画面に表示されている質問事項に対する回答を発話する。このとき、画面は下段左に示すような録音中の画面となる。アプリケーションは、スマートフォン５２の音声録音機能を利用して、マイクロフォンから入力されたユーザ２の発話を音声データとして録音する。ユーザ２が下段左の画面における「停止」ボタンをタップすることで録音は終了する。 The user 2 taps the “microphone” button on the screen, and utters an answer to the question item displayed on the screen. At this time, the screen is a recording screen as shown in the lower left. The application uses the voice recording function of the smartphone 52 to record the speech of the user 2 input from the microphone as voice data. Recording is ended when the user 2 taps a “stop” button on the lower left screen.

When recording ends, the same process is repeated for the next question. The numbers of questions answered so far and questions remaining are indicated by the numbers of ● (answered) and ○ (unanswered) marks at the top of the screen. When answers to the predetermined number of questions have been recorded as voice data, an end screen such as the one shown at the lower center is displayed. At this point, the recorded voice data is automatically sent from the smartphone 52 to the voice analysis server 40, and the voice analysis server 40 performs the voice pathological analysis in real time.

その後、ユーザ２は、例えば、下段右に示すようなメニュー画面から「分析結果サイトへ」という項目を選択することで、上述の図３に示したような分析結果の画面をアプリケーションもしくは連携するＷｅｂブラウザ上に表示させ、即時に分析結果を確認することができる。 Thereafter, for example, the user 2 selects an item “go to analysis result site” from the menu screen as shown on the lower right side, so that the analysis result screen as shown in FIG. It can be displayed on the browser and the analysis result can be confirmed immediately.

<Data structure>
FIG. 6 outlines an example of the data structure of the authentication master DB 33 of the IVR server 30. The authentication master DB 33 is a master table that holds, for each user 2, the authentication data used to authenticate the user when a call is made from the telephone 51 to acquire voice data. It has items such as an ID code and a user name.

ＩＤコードの項目は、対象のユーザ２に対してユニークに割り当てられた数桁の数字からなるコード値である。電話機５１でのボタンのプッシュにより入力されるため、数字により構成されるものとしているが、入力が可能な場合には文字や記号を含んでいてもよい。ユーザ名の項目は、対象のユーザ２の属性情報としてのユーザ名や氏名の情報を保持する。例えば、ユーザ２に対する認証時に、公知のテキスト読み上げ機能によりユーザ名を音声として応答することで、対象のユーザ２として正しく認証されたことをユーザ２自身が確認できるようにしてもよい。 The ID code item is a code value composed of several digits uniquely assigned to the target user 2. Since it is input by pushing a button on the telephone 51, it is assumed to be composed of numbers. However, if input is possible, it may include characters and symbols. The user name item holds user name and name information as attribute information of the target user 2. For example, when the user 2 is authenticated, the user 2 may confirm that the user 2 has been correctly authenticated by responding with the user name as a voice by a known text-to-speech function.
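As a rough sketch of how an ID code entered by push-button could be checked against such a master table, the function below accepts only digit strings and returns the registered user name so that it can be read back to the caller. The function name `authenticate_dtmf` and the dictionary standing in for the authentication master DB are hypothetical, and the digits-only rule reflects the default case described above, not the variant that allows letters or symbols.

```python
from typing import Dict, Optional


def authenticate_dtmf(entered: str, auth_master: Dict[str, str]) -> Optional[str]:
    """Return the registered user name for `entered`, or None on failure.

    `auth_master` maps ID code -> user name (a stand-in for the master table).
    Only ASCII digits are accepted, as would be produced by telephone keys.
    """
    if not (entered.isascii() and entered.isdigit()):
        return None  # empty input or non-digit characters
    return auth_master.get(entered)
```

A non-`None` result could then be passed to a text-to-speech function to confirm the user's identity during the call.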

FIG. 7 outlines an example of the data structure of the voice data DB 34 of the IVR server 30. The voice data DB 34 is a table that holds the recorded voice data for each utterance in a call by the user 2. It has items such as an ID code, site code, call number, data number, voice data, and recording date and time.

The ID code item holds a code value that identifies the user 2 who made the utterance in the target voice data. This item is the same as the ID code item in the authentication master DB 33 of FIG. 6. The site code item holds an ID, code, or similar information that identifies the site where the target voice data was collected, that is, the voice data collection environment containing the PBX 10 and other equipment that received the call for the target voice data. It may instead be the telephone number corresponding to the PBX 10. The call number item holds a sequence number or similar information that uniquely identifies each call for a given user 2. The data number item holds a sequence number or similar information that uniquely identifies each utterance within a call. The combination of the ID code, call number, and data number items therefore uniquely identifies a target utterance (voice data item) across the entire voice data collection system 1.

音声データの項目は、対象の発話において録音された音声データの情報を保持する。音声データ自体を直接保持していてもよいし、音声データをファイルとして別途保持しておき、そのファイル名やパスの情報を保持するようにしてもよい。録音日時の項目は、対象の発話に係る音声データを録音したときのタイムスタンプの情報を保持する。 The audio data item holds information of audio data recorded in the target speech. The audio data itself may be held directly, or the audio data may be held separately as a file, and the file name and path information may be held. The recording date / time item holds time stamp information when the voice data related to the target speech is recorded.
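The table items described above, and the rule that the ID code, call number, and data number together identify one utterance, can be illustrated with the sketch below. The class and field names are hypothetical English renderings of the items; the audio is held as a file path, which is one of the two options the text mentions, and an in-memory dictionary stands in for the actual database.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Tuple

Key = Tuple[str, int, int]  # (ID code, call number, data number)


@dataclass(frozen=True)
class VoiceDataRow:
    id_code: str
    site_code: str
    call_no: int
    data_no: int
    audio_path: str        # path to the audio file rather than the audio itself
    recorded_at: datetime

    @property
    def key(self) -> Key:
        return (self.id_code, self.call_no, self.data_no)


class VoiceDataTable:
    """In-memory stand-in for the voice data table, keyed by the composite key."""

    def __init__(self) -> None:
        self._rows: Dict[Key, VoiceDataRow] = {}

    def insert(self, row: VoiceDataRow) -> None:
        if row.key in self._rows:
            raise KeyError(f"duplicate utterance key: {row.key}")
        self._rows[row.key] = row

    def get(self, id_code: str, call_no: int, data_no: int) -> VoiceDataRow:
        return self._rows[(id_code, call_no, data_no)]
```

Rejecting duplicate keys on insert is one simple way to enforce the uniqueness that the composite key is meant to provide.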

FIG. 8 outlines an example of the data structure of the voice data DB 45 of the voice analysis server 40. The voice data DB 45 is a table that records the voice data sent from the IVR server 30 at each site or from the smartphone 52 of each user 2, and basically has the same data structure as the voice data DB 34 of FIG. 7. Because it also holds voice data from the smartphone 52, it has a call/utterance number item in place of the call number item, so that it can hold not only the call number identifying a call on the telephone 51 but also the utterance number identifying each utterance session on the smartphone 52. In addition to the items of the voice data DB 34, it has items such as an analysis status. The analysis status item holds a code value, flag, or similar information indicating whether the target voice data has been analyzed.

FIG. 9 outlines an example of the data structure of the user master DB 46 of the voice analysis server 40. The user master DB 46 is a master table that holds the account information the user 2 needs in order to view the results of voice pathological analysis based on the voice data. It has items such as an ID code, password, user name, user attribute information, and user setting information.

ＩＤコードの項目は、対象のユーザ２を一意に識別するコード値を保持する。この項目は、上述の図６の認証マスタＤＢ３３におけるＩＤコードの項目と同じであってもよい。パスワードの項目は、対象のユーザ２のＩＤコードに対応するパスワードの情報を保持する。パスワードに加えて、もしくはこれに代えて、生体認証その他の認証手法に係る認証情報を保持していてもよい。さらに、パスワードの登録日時や更新日時等の情報を保持する項目を有していてもよい。 The ID code item holds a code value that uniquely identifies the target user 2. This item may be the same as the item of the ID code in the authentication master DB 33 of FIG. The password item holds password information corresponding to the ID code of the target user 2. In addition to or instead of the password, authentication information related to biometric authentication or other authentication methods may be held. Further, it may have an item for holding information such as password registration date and time and update date and time.

ユーザ名の項目は、対象のユーザの表示名の情報を保持する。ユーザ属性情報の項目は、対象のユーザ２の各種の属性情報を保持する。例えば、性別や年齢等が含まれ得る。これらの属性情報は、例えば、音声病態分析や分析結果の可視化等の処理を行う際の参照情報とすることができる。ユーザ設定情報の項目は、対象のユーザ２により設定された独自の設定情報の内容を保持する。例えば、音声病態分析や分析結果の可視化等の処理を行う際の条件のカスタマイズや、図３に示したような分析結果を表示する画面における表示項目や表示方法のカスタマイズ等の内容を各ユーザ２がそれぞれ設定することができる。ユーザ属性情報やユーザ設定情報のデータフォーマット等は特に限定されず、任意のものとすることができる。 The item of the user name holds information on the display name of the target user. The item of user attribute information holds various attribute information of the target user 2. For example, sex and age can be included. These attribute information can be used as reference information when performing processing such as voice pathological analysis and visualization of analysis results, for example. The user setting information item holds the contents of the unique setting information set by the target user 2. For example, each user 2 can customize the contents such as customization of conditions when performing processing such as voice pathological analysis and visualization of analysis results, and customization of display items and display methods on the screen displaying the analysis results as shown in FIG. Can be set individually. The data format of user attribute information and user setting information is not particularly limited, and can be arbitrary.

FIG. 10 outlines an example of the data structure of the analysis result DB 47 of the voice analysis server 40. The analysis result DB 47 is a table that holds, for each user 2, data visualizing the results of the voice pathological analysis performed on the voice data. It has items such as an ID code, site code, call/utterance number, analysis result data, and analysis date and time.

ＩＤコードの項目は、対象のユーザ２を一意に識別するコード値を保持する。この項目は、上述の図９のユーザマスタＤＢ４６におけるＩＤコードの項目と同じである。拠点コードの項目は、対象の分析結果に係る音声データが収集された拠点を特定するシーケンス番号やコード等の情報を保持する。この項目は、上述の図７の音声データＤＢ３４や図８の音声データＤＢ４５における拠点コードの項目と同じである。 The ID code item holds a code value that uniquely identifies the target user 2. This item is the same as the ID code item in the user master DB 46 of FIG. 9 described above. The item of the base code holds information such as a sequence number and a code for identifying the base from which the voice data related to the target analysis result is collected. This item is the same as the item of the base code in the voice data DB 34 of FIG. 7 and the voice data DB 45 of FIG.

通話・発話番号の項目は、対象の分析結果に係る音声データを収集した通話や発話を特定するシーケンス番号等である。この項目は、上述の図７の音声データＤＢ３４や図８の音声データＤＢ４５における通話・発話番号の項目と同じである。分析結果データの項目は、対象の分析結果が可視化された内容に係る情報を保持する。データ自体を保持していてもよいし、データの所在場所のパス等を保持していてもよい。データフォーマット等は特に限定されず、任意のものとすることができる。分析日時の項目は、音声分析部４２および分析結果処理部４３による分析処理および分析結果の可視化の処理が行われたときのタイムスタンプの情報を保持する。 The item of the call / speech number is a sequence number or the like for identifying a call or utterance that collects voice data related to the target analysis result. This item is the same as the call / speech number item in the voice data DB 34 of FIG. 7 and the voice data DB 45 of FIG. The item of the analysis result data holds information relating to the contents of the target analysis result visualized. The data itself may be held, or the path of the data location may be held. The data format and the like are not particularly limited and can be arbitrary. The item of analysis date and time holds time stamp information when the analysis processing and the analysis result visualization processing by the voice analysis unit 42 and the analysis result processing unit 43 are performed.

なお、上述の図６〜図１０で示した各テーブルのデータ構成（項目）はあくまで一例であり、同様のデータを保持・管理することが可能な構成であれば、他のテーブル構成やデータ構成であってもよい。 Note that the data configuration (items) of each table shown in FIGS. 6 to 10 is merely an example, and other table configurations and data configurations may be used as long as similar data can be held and managed. It may be.

以上に説明したように、本発明の一実施の形態である音声データ収集システム１によれば、不特定・特定の多数のユーザ２から、必要な数の音声データをより自然で効率的かつ確実に収集することができ、音声病態分析等の音声データを必要とする処理への入力とすることができる。 As described above, according to the voice data collection system 1 which is an embodiment of the present invention, a necessary number of voice data is obtained from a large number of unspecified / specific users 2 in a more natural, efficient and reliable manner. And can be used as an input to a process that requires voice data such as voice pathological analysis.

以上、本発明者によってなされた発明を実施の形態に基づき具体的に説明したが、本発明は上記の実施の形態に限定されるものではなく、その要旨を逸脱しない範囲で種々変更可能であることはいうまでもない。例えば、上記の実施の形態は本発明を分かりやすく説明するために詳細に説明したものであり、必ずしも説明した全ての構成を備えるものに限定されるものではない。また、上記の各実施の形態の構成の一部について、他の構成の追加・削除・置換をすることが可能である。 As mentioned above, the invention made by the present inventor has been specifically described based on the embodiments. However, the present invention is not limited to the above-described embodiments, and various modifications can be made without departing from the scope of the invention. Needless to say. For example, the above-described embodiment has been described in detail for easy understanding of the present invention, and is not necessarily limited to the one having all the configurations described. In addition, it is possible to add, delete, and replace other configurations for a part of the configuration of each of the above embodiments.

For example, in the above embodiment, a predetermined number of questions is asked in a single call or utterance session and the predetermined number of voice data items is acquired entirely from the answers, but the configuration is not limited to this. As long as the predetermined number of voice data items is obtained per day, it is not necessary to acquire all of them in a single call or utterance session; the predetermined number may be reached as the total over multiple calls or utterance sessions. The voice data in this case is also not limited to data obtained through the series of processes shown in FIG. 2 or FIG. 4. For example, telephone calls made in daily life or at work, independently of those processes, may be recorded automatically, and the resulting existing voice data may be reused so that the predetermined number of voice data items is secured overall.
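The per-day variant described above, where recordings from several calls are pooled until the required number is reached, can be sketched as a simple count per user and calendar day. The function name `days_meeting_quota`, the input shape (pairs of ID code and recording timestamp), and the default of seven are assumptions for illustration.

```python
from collections import Counter
from datetime import date, datetime
from typing import Iterable, Set, Tuple


def days_meeting_quota(
    recordings: Iterable[Tuple[str, datetime]],
    required: int = 7,
) -> Set[Tuple[str, date]]:
    """Return the (ID code, day) pairs with at least `required` recordings.

    Recordings from different calls on the same day are counted together.
    """
    counts = Counter((id_code, ts.date()) for id_code, ts in recordings)
    return {key for key, n in counts.items() if n >= required}
```

Only the pairs returned here would be passed on to analysis; days that fall short could be left to accumulate further recordings or be handled by a reminder.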

また、上記の実施の形態では、ユーザ２が能動的に電話機５１により電話を掛けたり、スマートフォン５２でアプリケーションを起動したり等の行為を行う必要があるが、これらの行為がより確実に行われるような仕組みを別途有していてもよい。例えば、毎日所定の時間に、音声データ収集環境の側からＣＴＩサーバ２０やＩＶＲサーバ３０の機能によりユーザ２に自動的に電話を掛けるようにしてもよい。当該電話に応答しない場合にはさらに一定時間後に電話を掛けるようにしてもよい。同様に、スマートフォン５２上の専用アプリケーションが所定の時間に実行を促す通知を行うようにしてもよい。 Further, in the above embodiment, the user 2 needs to actively make a call with the telephone 51 or start an application with the smartphone 52, but these actions are performed more reliably. Such a mechanism may be provided separately. For example, the user 2 may be automatically called by the function of the CTI server 20 or the IVR server 30 from the voice data collection environment side at a predetermined time every day. If the telephone is not answered, a telephone call may be made after a predetermined time. Similarly, a dedicated application on the smartphone 52 may make a notification prompting execution at a predetermined time.

また、毎日所定の時間に、音声分析サーバ４０による分析結果をレポートとしてユーザ２に対して電子メール等により送信し、リマインドするようにしてもよい。また、直近の利用日時（すなわち、音声データの録音と分析を行った日時）から所定の期間以上利用がない場合に電子メール等により通知するようにしてもよい。 Further, at a predetermined time every day, the analysis result by the voice analysis server 40 may be transmitted as a report to the user 2 by e-mail or the like for reminding. Alternatively, notification may be made by e-mail or the like when there is no use for a predetermined period from the most recent use date and time (that is, the date and time when voice data was recorded and analyzed).
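The inactivity notification mentioned above amounts to comparing each user's most recent use with a cutoff period. The sketch below shows one way to select the users to notify; the function name `users_to_remind`, the dictionary of last-use timestamps, and the three-day default are hypothetical, since the source leaves the period unspecified.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def users_to_remind(
    last_used: Dict[str, datetime],
    now: datetime,
    idle: timedelta = timedelta(days=3),  # assumed period; not given in the source
) -> List[str]:
    """Return the ID codes of users whose last use is at least `idle` ago."""
    return sorted(uid for uid, ts in last_used.items() if now - ts >= idle)
```

The returned ID codes could then be handed to whatever e-mail or push notification mechanism is in use.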

本発明は、ユーザの音声のデータを収集する音声データ収集システムに利用可能である。 The present invention is applicable to a voice data collection system that collects user voice data.

Description of symbols
1 ... voice data collection system, 2 ... user,
10 ... PBX,
20 ... CTI server, 21 ... collection control unit,
30 ... IVR server, 31 ... authentication unit, 32 ... voice collection unit, 33 ... authentication master DB, 34 ... voice data DB, 35 ... voice guidance information,
40 ... voice analysis server, 41 ... voice data acquisition unit, 42 ... voice analysis unit, 43 ... analysis result processing unit, 44 ... user IF unit, 45 ... voice data DB, 46 ... user master DB, 47 ... analysis result DB,
51 ... telephone, 52 ... smartphone, 53 ... information processing terminal

Claims

A voice data collection system for collecting voice data relating to a user's utterances, comprising:
a voice data collection environment, configured at each of one or more sites, that comprises a PBX, a CTI server, and an IVR server and acquires, as the voice data, the user's utterances in a telephone call received from the user; and
a voice analysis server that is connected via a network to the IVR server in each voice data collection environment, processes the voice data acquired from each IVR server, and makes the processing result viewable on an information processing terminal of the user,
wherein, in a telephone call received from the user, the voice data collection environment issues a question asking the user for an answer and records the user's spoken answer to the question as the voice data, repeating the issuing of the question and the acquisition of the voice data of the user's answer until the voice data reaches a predetermined number.

The voice data collection system according to claim 1,
wherein one or more of the questions issued by the voice data collection environment in a telephone call received from the user have the same content as another of the questions.

The voice data collection system according to claim 1,
wherein, upon receiving a telephone call from the user, the voice data collection environment issues the question when one or more numbers entered by the user via the telephone match one of one or more code values registered in advance.

The voice data collection system according to claim 1,
wherein the voice analysis server performs voice pathological analysis on the voice data acquired from the IVR server, generates screen information for display based on the analysis result, and, in response to a request from the information processing terminal of the user, causes the information processing terminal to display content based on the screen information.

The voice data collection system according to claim 1,
wherein the voice data collection environment transmits the acquired voice data to the voice analysis server at predetermined time intervals.

A voice data collection system for collecting voice data relating to a user's utterances, comprising:
an information processing terminal having a voice recording function; and
a voice analysis server that processes the voice data, acquired from the information processing terminal, in which the user's utterances are recorded, and makes the processing result viewable on the information processing terminal,
wherein the information processing terminal displays a question asking the user for an answer and records the user's spoken answer to the question as the voice data, repeating the display of the question and the acquisition of the voice data of the user's answer until the voice data reaches a predetermined number.

The voice data collection system according to claim 6,
wherein one or more of the questions displayed by the information processing terminal have the same content as another of the questions.

The voice data collection system according to claim 6,
wherein the voice analysis server performs voice pathological analysis on the voice data acquired from the information processing terminal, generates screen information for display based on the analysis result, and, in response to a request from the information processing terminal of the user, causes the information processing terminal to display content based on the screen information.