JP7000171B2

JP7000171B2 - Communication systems, communication methods and communication programs

Info

Publication number: JP7000171B2
Application number: JP2018005120A
Authority: JP
Inventors: Kentaro Ishii (石井健太郎)
Original assignee: NTT Communications Corp
Current assignee: NTT Communications Corp
Priority date: 2018-01-16
Filing date: 2018-01-16
Publication date: 2022-01-19
Anticipated expiration: 2038-01-16
Also published as: JP2019124815A

Description

The present invention relates to a communication system, a communication method, and a communication program.

In recent years, as voice-based UI (User Interface) / UX (User Experience) has become widespread, there are increasing cases in which users confirm numbers through a voice interface (IF). Examples of numbers output by voice include distances, times, counts, amounts of money, and speeds. For example, when a user asks the terminal a question about a number, the terminal outputs to the user an utterance of the numerical value that answers the question.

Japanese Unexamined Patent Application Publication No. 2014-215396

However, the conventional method has the problem that a specific utterance to the user can become long, which may prevent smooth communication. For example, when a user asks about their asset balance and the amount has many digits, reading the amount out as-is can make the utterance needlessly long and hinder smooth communication.

To solve the above problem and achieve the object, the communication system of the present invention includes: a generation unit that generates voice data to be output to a terminal as a response to input data received from the terminal; a determination unit that, when the voice data generated by the generation unit includes a specific utterance, determines whether the specific utterance satisfies a predetermined condition; and an output unit that, when the determination unit determines that the predetermined condition is satisfied, outputs to the terminal voice data in which part of the specific utterance is omitted.

Further, the communication method of the present invention is a communication method executed by a communication system, and includes: a generation step of generating voice data to be output to a terminal as a response to input data received from the terminal; a determination step of determining, when the voice data generated in the generation step includes a specific utterance, whether the specific utterance satisfies a predetermined condition; and an output step of outputting to the terminal, when the determination step determines that the predetermined condition is satisfied, voice data in which part of the specific utterance is omitted.

Further, the communication program of the present invention causes a computer to execute: a generation step of generating voice data to be output to a terminal as a response to input data received from the terminal; a determination step of determining, when the voice data generated in the generation step includes a specific utterance, whether the specific utterance satisfies a predetermined condition; and an output step of outputting to the terminal, when the determination step determines that the predetermined condition is satisfied, voice data in which part of the specific utterance is omitted.

According to the present invention, part of a specific utterance to the user can be omitted, which enables smooth communication.

FIG. 1 is a block diagram showing a configuration example of the communication system according to the first embodiment.
FIG. 2 is a diagram showing an example of data stored in the asset data storage unit.
FIG. 3 is a diagram showing an example of data stored in the determination data storage unit.
FIG. 4 is a diagram illustrating an example of outputting the exact number when the user requests a detailed value after an approximate value has been output.
FIG. 5 is a diagram illustrating a processing example for determining the number of lower digits to omit.
FIG. 6 is a flowchart showing an example of the flow of processing executed by the server.
FIG. 7 is a diagram showing a computer that executes the communication program.

Hereinafter, embodiments of the communication system, communication method, and communication program according to the present application are described in detail with reference to the drawings. The communication system, communication method, and communication program according to the present application are not limited to these embodiments.

[First Embodiment]
The following describes, in order, the configuration of the server 10 in the communication system 100 according to the first embodiment and the processing flow of the server 10, and finally the effects of the first embodiment.

[Communication System Configuration]
First, the configuration of the communication system 100 is described with reference to FIG. 1. FIG. 1 is a block diagram showing a configuration example of the communication system according to the first embodiment. As shown in FIG. 1, the communication system 100 includes a server 10 and a terminal 20, which are connected to each other via a network 30.

The server 10 is a computer that generates voice data as a response to input data received from the terminal 20 and transmits the voice data to the terminal 20. For example, the server 10 generates voice data responding to the user's utterance based on the user's utterance data sent from the terminal 20 and on information about the user, and transmits that voice data to the terminal 20. The terminal 20 is an information processing device such as a mobile phone, smartphone, smart speaker, PDA (Personal Digital Assistant), tablet PC, notebook PC, or desktop PC.

The server 10 includes a communication processing unit 11, a control unit 12, and a storage unit 13. The following description uses an example in which the server 10 has an application function that provides a service for managing users' asset balances; however, the invention is not limited to this and can be applied to any service that communicates with users by voice.

The communication processing unit 11 transmits and receives various data to and from the terminal 20 via the network 30. The communication processing unit 11 corresponds to, for example, a NIC, and exchanges the user's voice data with the terminal 20. The voice data exchanged with the terminal 20 may be the audio data of the utterance itself; alternatively, the communication processing unit 11 may receive text data of what the user said, or transmit text data of the speech to be uttered on the terminal 20 side.

The storage unit 13 stores the data and programs required for the various processes performed by the control unit 12; the components most closely related to the present invention are an asset data storage unit 13a and a determination data storage unit 13b. The storage unit 13 is, for example, a semiconductor memory element such as RAM (Random Access Memory) or flash memory, or a storage device such as a hard disk or an optical disc.

The asset data storage unit 13a stores each user's asset balance. For example, as illustrated in FIG. 2, the asset data storage unit 13a stores an asset balance in association with a user name: the asset balance "7,654,321" for user name "A" and the asset balance "1,541,562" for user name "B". Although this example stores only the asset balance as user information, various other information such as income and expenditures may also be stored.

The determination data storage unit 13b stores data used in the determination process performed by the determination unit 12b, described later. For example, the determination data storage unit 13b stores the "number of digits" of a number contained in an utterance in association with the "utterance time" required to speak that number.

Referring specifically to the example of FIG. 3, the determination data storage unit 13b stores an utterance time of "0.5 seconds" for "1" digit, meaning that speaking a one-digit number is set to take 0.5 seconds. Likewise, it stores an utterance time of "1.2 seconds" for 2 digits, "2.3 seconds" for 3 digits, and "3.0 seconds" for 4 digits.

In the example of FIG. 3, the utterance time is set per number of digits, but it may be set more finely. For instance, in FIG. 3 a one-digit number is assigned an utterance time of "0.5 seconds" whichever of "0" to "9" it is, although strictly speaking each of the numerals "0" to "9" takes a different time to say. The utterance time may therefore be set for each individual numeral.
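A minimal Python sketch of the FIG. 3 lookup follows. Only the four table entries come from the text; the linear extrapolation beyond four digits (`EXTRA_SECONDS_PER_DIGIT`) is a hypothetical stand-in for rows not given here. Counting only non-zero digits is a simplification (zeros are generally not voiced in Japanese number readings) that matches how "7,650,000" is treated as a three-digit utterance later in this description.

```python
# Utterance time per number of spoken digits, taken from the FIG. 3 example.
SPEECH_SECONDS_BY_DIGITS = {1: 0.5, 2: 1.2, 3: 2.3, 4: 3.0}

# Hypothetical: the text gives times only up to four digits, so longer
# numbers are extrapolated linearly at this assumed rate.
EXTRA_SECONDS_PER_DIGIT = 0.8


def count_spoken_digits(value: int) -> int:
    """Simplification: count non-zero digits, since zeros are not voiced in Japanese readings."""
    return max(1, sum(1 for ch in str(abs(value)) if ch != "0"))


def estimate_speech_seconds(value: int) -> float:
    """Look up (or extrapolate) the utterance time for the spoken digits of `value`."""
    digits = count_spoken_digits(value)
    if digits in SPEECH_SECONDS_BY_DIGITS:
        return SPEECH_SECONDS_BY_DIGITS[digits]
    longest = max(SPEECH_SECONDS_BY_DIGITS)
    return SPEECH_SECONDS_BY_DIGITS[longest] + (digits - longest) * EXTRA_SECONDS_PER_DIGIT
```

Under these assumptions "7,650,000" maps to the three-digit time of 2.3 s. A per-numeral table, as the paragraph above suggests, could replace the digit-count dictionary without changing callers of `estimate_speech_seconds`.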

The control unit 12 has an internal memory for storing programs that define various processing procedures and the required data, and uses them to execute various processes; the components most closely related to the present invention are a generation unit 12a, a determination unit 12b, and an output unit 12c. The control unit 12 is an electronic circuit such as a CPU (Central Processing Unit) or MPU (Micro Processing Unit), or an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or FPGA (Field Programmable Gate Array).

The generation unit 12a generates voice data to be output to the terminal 20 as a response to input data received from the terminal 20. For example, when the generation unit 12a receives from the terminal 20 a question about the asset balance as user A's utterance data, it reads user A's asset balance "7,654,321" from the asset data storage unit 13a and generates voice data that says "It is 7,654,321 yen" (nanahyaku rokujugo-man yonsen sanbyaku nijuichi-en desu).

The generation unit 12a also uses the approximate value "7,650,000", obtained by omitting the lower four digits of "7,654,321", to generate voice data that says "It is about 7.65 million yen" (ooyoso nanahyaku rokujugo-man-en desu). The number of lower digits to omit is set in advance, for example omitting the lower four digits when the number has seven digits. The voice data itself is generated by an existing method, whose description is omitted here.
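The preset truncation can be sketched as follows. `PRESET_OMITTED_DIGITS`, the function names, and the English output phrasing are hypothetical; only the "seven digits, omit the lower four" rule comes from the text. Truncation rather than rounding matches the example, where 7,654,321 becomes 7,650,000, and the result is phrased in units of man (10,000), as the Japanese reading "765-man" is.

```python
# Hypothetical preset: number of lower digits to omit, keyed by the number's length.
# Only the 7 -> 4 entry comes from the text; other lengths would be configured.
PRESET_OMITTED_DIGITS = {7: 4}


def truncate_lower_digits(value: int, omitted_digits: int) -> int:
    """Zero out the lowest `omitted_digits` digits of a non-negative value (truncation, not rounding)."""
    unit = 10 ** omitted_digits
    return (value // unit) * unit


def approximate_yen_text(value: int) -> str:
    """Build an exact or approximate phrase using the preset for this number's length."""
    omitted = PRESET_OMITTED_DIGITS.get(len(str(value)), 0)
    if omitted == 0:
        return f"{value:,} yen"  # no preset for this length: speak it exactly
    approx = truncate_lower_digits(value, omitted)
    if approx >= 10_000 and approx % 10_000 == 0:
        # Japanese counts large amounts in units of man (10,000): 7,650,000 -> "765-man".
        return f"about {approx // 10_000}-man yen"
    return f"about {approx:,} yen"
```

With this preset, user B's balance 1,541,562 would likewise be spoken as "about 154-man yen".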

When the voice data generated by the generation unit 12a includes a specific utterance, the determination unit 12b determines whether that utterance satisfies a predetermined condition. Specifically, when the generated voice data includes the utterance of a number as the specific utterance, the determination unit 12b determines whether the utterance of that number satisfies the predetermined condition.

For example, the determination unit 12b determines whether the ratio of the utterance time of the number with its lower digits omitted to the utterance time of the full number is less than a predetermined threshold n. The determination unit 12b then determines whether the difference between the value of the spoken number and the value with the lower digits omitted, taken relative to the original value, is less than a predetermined threshold m.
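The two conditions can be written compactly. In this notation, which is introduced here for illustration, x is the exact value, x̃ is the value with the lower digits omitted, and T(·) is the utterance time; dividing by x in the second condition follows the worked example below. For the asset-balance example, the second condition evaluates as shown on the last line.

$$\frac{T(\tilde{x})}{T(x)} < n \qquad\text{and}\qquad \frac{|x - \tilde{x}|}{|x|} < m$$

$$\frac{|7{,}654{,}321 - 7{,}650{,}000|}{7{,}654{,}321} = \frac{4{,}321}{7{,}654{,}321} \approx 0.00056 < 0.03$$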

Below, as a concrete example, the voice data output when speaking the exact number without omission is "It is 7,654,321 yen" (nanahyaku rokujugo-man yonsen sanbyaku nijuichi-en desu), and the voice data output when speaking an approximate value with digits omitted is "It is about 7.65 million yen" (ooyoso nanahyaku rokujugo-man-en desu).

The determination unit 12b identifies the utterance time of "7,654,321" (nanahyaku rokujugo-man yonsen sanbyaku nijuichi) and the utterance time of "7,650,000" (nanahyaku rokujugo-man) by referring to the data stored in the determination data storage unit 13b. Here, the determination unit 12b treats the utterance time of "7,654,321" as that of a seven-digit number and the utterance time of "7,650,000" as that of a three-digit number.

The determination unit 12b then divides the utterance time of "7,650,000" (nanahyaku rokujugo-man) by the utterance time of "7,654,321" (nanahyaku rokujugo-man yonsen sanbyaku nijuichi) and determines whether the quotient is less than the threshold n% (for example, 70%). Although the above describes the case in which the determination unit 12b identifies the utterance time of the numeric part by referring to the data stored in the determination data storage unit 13b, the invention is not limited to this. For example, the determination unit 12b may measure the actual utterance times of "It is 7,654,321 yen" (nanahyaku rokujugo-man yonsen sanbyaku nijuichi-en desu) and "It is about 7.65 million yen" (ooyoso nanahyaku rokujugo-man-en desu) and perform the above determination on those times.

Next, if the quotient is less than the threshold n, the determination unit 12b subtracts "7,650,000" from "7,654,321" and divides the difference "4,321" by "7,654,321". The determination unit 12b then determines whether this quotient is less than the threshold m (for example, 3%). If it is, the determination unit 12b instructs the output unit 12c to output to the terminal 20 the voice data in which the lower digits of the number are omitted, "It is about 7.65 million yen" (ooyoso nanahyaku rokujugo-man-en desu). Alternatively, as a simpler process that does not use utterance times, the determination unit 12b may determine whether the number in the utterance has at least a preset number of digits and, if so, decide to output voice data with the lower digits omitted.
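The two-threshold check can be sketched as a single predicate. This is an illustrative reading of the text, not the patent's implementation; the name `should_abbreviate` is hypothetical, and the defaults n = 0.70 and m = 0.03 are the example thresholds given above.

```python
def should_abbreviate(
    exact_value: float,
    approx_value: float,
    exact_seconds: float,
    approx_seconds: float,
    n: float = 0.70,
    m: float = 0.03,
) -> bool:
    """Return True when both conditions described for the determination unit 12b hold."""
    if exact_value == 0 or exact_seconds <= 0:
        return False  # guard against division by zero; speak the exact value
    # Condition 1: the abbreviated utterance is sufficiently shorter.
    time_ratio = approx_seconds / exact_seconds
    if not time_ratio < n:
        return False
    # Condition 2: the abbreviated value is close enough, relative to the exact one.
    relative_error = abs(exact_value - approx_value) / abs(exact_value)
    return relative_error < m
```

For example, with 2.3 s for the three-digit reading (from FIG. 3) and a hypothetical 5.4 s for the seven-digit reading, `should_abbreviate(7654321, 7650000, 5.4, 2.3)` holds, since 2.3 / 5.4 ≈ 0.43 and 4,321 / 7,654,321 ≈ 0.00056.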

When the determination unit 12b determines that the predetermined condition is satisfied, the output unit 12c outputs to the terminal 20 voice data in which part of the specific utterance is omitted. Specifically, in that case the output unit 12c outputs to the terminal 20 voice data in which the lower digits of the spoken number are omitted.

For example, when the determination unit 12b finds that the ratio of the utterance time of the number with its lower digits omitted to the utterance time of the full number is less than a predetermined threshold, the output unit 12c outputs to the terminal 20 voice data in which the lower digits of the number are omitted. Likewise, when the difference between the value of the spoken number and the value with the lower digits omitted is less than a predetermined threshold, the output unit 12c outputs to the terminal 20 voice data in which the lower digits are omitted.

Continuing the example above, when the output unit 12c receives an instruction to output the voice data "It is about 7.65 million yen" (ooyoso nanahyaku rokujugo-man-en desu) to the terminal 20, it outputs that voice data to the terminal 20.

The determination unit 12b may also determine whether the utterance time required to speak the number exceeds a predetermined time. In this case, when the determination unit 12b determines that the utterance time exceeds the predetermined time, the output unit 12c outputs to the terminal 20 voice data in which the lower digits of the number are omitted so that the utterance time falls within the predetermined time.

For example, the determination unit 12b determines whether the utterance time of "It is 7,654,321 yen" (nanahyaku rokujugo-man yonsen sanbyaku nijuichi-en desu) exceeds the predetermined time. If it does, the output unit 12c outputs to the terminal 20 the voice data "It is about 7.65 million yen" (ooyoso nanahyaku rokujugo-man-en desu), in which the lower digits are omitted so that the utterance time falls within the predetermined time. In other words, although it depends on the size of the balance, hearing the "4,321 yen" part precisely adds little when checking one's own asset balance and only wastes time, so in such cases an approximate value is output.
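A minimal sketch of this time-limit variant, assuming the caller supplies a speech-time estimator (for example, a lookup like the one in FIG. 3). The name `shorten_to_time_limit` is hypothetical; the function truncates one more lower digit at a time and stops at the first value whose estimated utterance time fits within the limit.

```python
from typing import Callable


def shorten_to_time_limit(
    value: int,
    limit_seconds: float,
    speech_seconds: Callable[[int], float],
) -> int:
    """Return `value` with the fewest lower digits truncated so its estimated time fits the limit."""
    total_digits = len(str(abs(value)))
    for omitted in range(total_digits):  # omit 0, 1, 2, ... lower digits
        unit = 10 ** omitted
        candidate = (value // unit) * unit
        if speech_seconds(candidate) <= limit_seconds:
            return candidate
    # Even the leading digit alone exceeds the limit; fall back to just that digit.
    unit = 10 ** (total_digits - 1)
    return (value // unit) * unit
```

With an illustrative estimator that charges a hypothetical 0.75 s per non-zero digit and a 2.5 s limit, 7,654,321 shortens to 7,650,000, matching the example above.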

Further, if, after outputting voice data with the lower digits omitted, the output unit 12c receives from the terminal 20 a request to speak the exact number, it outputs the utterance of the number without the lower digits omitted. Whether a request for the exact number has been made is determined, for example, by checking whether the utterance contains a preset word or phrase (for example, "in detail" or "detailed figures").

Here, an example of outputting the exact number when the user requests a detailed value after an approximate value has been output is described with reference to FIG. 4. FIG. 4 is a diagram illustrating an example of outputting the exact number when the user requests a detailed value after an approximate value has been output. As illustrated in FIG. 4, when the user says to the terminal 20, for example, "Tell me my current asset balance," the terminal 20 transmits the voice data of the user's utterance to the server 10. After performing the processing described above, the server 10 transmits to the terminal 20 voice data that says "It is about 7.65 million yen" (ooyoso nanahyaku rokujugo-man-en desu).

After speaking "It is about 7.65 million yen" (ooyoso nanahyaku rokujugo-man-en desu), the terminal 20 receives the utterance "In detail" from the user as a request for the exact number. In this case, the server 10 transmits to the terminal 20, as the voice data of the exact number, voice data that says "It is 7,654,321 yen" (nanahyaku rokujugo-man yonsen sanbyaku nijuichi-en desu).
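The follow-up exchange in FIG. 4 needs a little per-session state: the exact figure behind the last approximation. The sketch below is one hypothetical way to hold it; `NumberDialog` and its methods are invented for illustration, and the English keywords stand in for the preset phrases mentioned in the text.

```python
from typing import Optional

# Illustrative English stand-ins for the preset phrases in the text.
DETAIL_KEYWORDS = ("in detail", "detailed figure")


class NumberDialog:
    """Hypothetical per-session state remembering the exact figure behind the last approximation."""

    def __init__(self) -> None:
        self._pending_exact: Optional[str] = None

    def answer(self, exact_text: str, approx_text: str, abbreviate: bool) -> str:
        """Return the text to speak; remember the exact text if an approximation is used."""
        if abbreviate:
            self._pending_exact = exact_text
            return approx_text
        self._pending_exact = None
        return exact_text

    def follow_up(self, user_text: str) -> Optional[str]:
        """Return the exact figure once if the user asks for detail after an approximation."""
        lowered = user_text.lower()
        if self._pending_exact is not None and any(k in lowered for k in DETAIL_KEYWORDS):
            exact, self._pending_exact = self._pending_exact, None
            return exact
        return None
```

Clearing the stored value after it is spoken keeps a later, unrelated "in detail" from replaying an old figure; a real dialog system would likely also expire it after a timeout.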

Although the above description uses an amount of money as the number contained in the utterance, the invention is not limited to this and can also be applied to times, speeds, and the like. For example, the speed of sound (Mach 1) is generally given as 1,225 km/h; applying n = 70% and m = 3% as in the example above, it can be spoken as "about 1,200 km/h".
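As a quick check of the speed-of-sound example against both thresholds, assume (as with "7,650,000" above) that "1,200" is timed as a two-digit utterance and "1,225" as a four-digit one, using the FIG. 3 times:

$$\frac{T(2)}{T(4)} = \frac{1.2}{3.0} = 0.40 < 0.70 = n, \qquad \frac{1225 - 1200}{1225} = \frac{25}{1225} \approx 0.020 < 0.03 = m$$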

Also, although the above description assumes that the number of lower digits to omit is set in advance, that number may instead be determined according to the threshold m. This process is described with reference to FIG. 5, which is a diagram illustrating a processing example for determining the number of lower digits to omit. The example of FIG. 5 shows, for pi "3.14159265...", the lower digits omitted when m = 3% and when m = 5%.

When the approximate value of pi "3.14159265..." is taken to be "3", the server 10 subtracts "3" from "3.14159265..." and divides the difference "0.14159265..." by "3.14159265...". The quotient is "0.045..." (about 4.5%), which is below the threshold when m = 5%, so the determination unit 12b decides to omit the digits from the first decimal place onward.

On the other hand, when m = 3%, the quotient "0.045..." exceeds the threshold m. In this case, the server 10 takes the approximate value of pi to be "3.1", subtracts "3.1" from "3.14159265...", and divides the difference "0.04159265..." by "3.14159265...". The quotient is "0.013..." (about 1.3%), which is below m = 3%, so the determination unit 12b decides to omit the digits from the second decimal place onward. In this way, the number of lower digits to omit may be determined by this process.
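The FIG. 5 search can be sketched as "keep one more leading digit until the relative error drops below m". This is an illustrative generalization, not the patent's code: it truncates to significant digits, which covers both the decimal places of pi and the lower integer digits of amounts, and uses Python's `decimal` module so that values like 3.1 stay exact. The function names are hypothetical.

```python
from decimal import ROUND_DOWN, Decimal


def truncate_significant(value: Decimal, digits: int) -> Decimal:
    """Keep the leading `digits` significant digits of a positive value, dropping the rest."""
    quantum = Decimal(1).scaleb(value.adjusted() - digits + 1)
    return value.quantize(quantum, rounding=ROUND_DOWN)


def choose_approximation(value: Decimal, m: Decimal) -> Decimal:
    """Return the shortest truncation of `value` whose relative error is below m."""
    if value <= 0:
        raise ValueError("this sketch handles positive values only")
    for digits in range(1, len(value.as_tuple().digits) + 1):
        approx = truncate_significant(value, digits)
        if (value - approx) / value < m:
            return approx
    return value  # m <= 0: nothing can be omitted


pi = Decimal("3.14159265")
print(choose_approximation(pi, Decimal("0.05")))  # 3
print(choose_approximation(pi, Decimal("0.03")))  # 3.1
```

The same search turns 1,225 km/h into 1,200 with m = 3%, as in the speed-of-sound example. For the asset balance 7,654,321 it yields 7,600,000 rather than the preset 7,650,000, which illustrates how the threshold-driven choice can differ from a fixed "omit four digits" rule.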

For example, an exact amount of money can be conveyed visually by displaying it to the yen. In that case, to make it easier to read, devices such as inserting a comma every three digits are used so that the reader can grasp the approximate magnitude. A voice-based UI/UX has no concept equivalent to this comma, so reading the amount aloud to the yen does not necessarily convey the information correctly. Therefore, when the predetermined condition is satisfied, the server 10 outputs an approximate value, avoiding needlessly precise numbers in the voice UI/UX and making communication between people and computers smoother.

Although the above description covers omitting the lower digits of a number contained in an utterance, the invention is not limited to this, and part of another kind of specific utterance may be omitted. For example, when an utterance contains the address "Kasumigaseki ○-chome ..., Chiyoda-ku, Tokyo" (東京都千代田区霞が関○丁目...), the server 10 may transmit to the terminal 20 voice data that says only "Chiyoda-ku, Tokyo" for that part and omit the following "Kasumigaseki ○-chome ...".

[Server Processing Procedure]
Next, an example of the processing procedure performed by the server 10 according to the first embodiment is described with reference to FIG. 6. FIG. 6 is a flowchart showing an example of the flow of processing executed by the server.

First, as illustrated in FIG. 6, when the generation unit 12a of the server 10 receives input data from the terminal 20 (step S101: Yes), it generates voice data to be output to the terminal 20 (step S102). The determination unit 12b then determines whether the ratio of the utterance time with the lower digits omitted to the utterance time of the exact number is less than n% (step S103).

If the determination unit 12b determines that this ratio is less than n% (step S103: Yes), it next determines whether the difference between the exact number and the number with the lower digits omitted is less than m (step S104).

If the determination unit 12b determines that this difference is less than m (step S104: Yes), the output unit 12c outputs voice data with the lower digits omitted to the terminal 20 (step S105).

The determination unit 12b may instead determine that the ratio is n% or more (step S103: No), or that the difference is m or more (step S104: No). In either case, the output unit 12c outputs voice data of the exact number to the terminal 20 (step S106).
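Steps S103 to S106 can be sketched as two threshold checks. The source does not say how utterance time is measured, so `estimate_utterance_ms` below is a hypothetical proxy. It charges a fixed cost per non-zero digit, on the assumption that zero digits (as in 12,000,000) are largely silent when read aloud. The millisecond constants, the choice to truncate rather than round, and the sample values of n and m are also assumptions.

```python
# Hypothetical utterance-time model (not from the source).
BASE_MS = 300               # assumed fixed cost, e.g. for the unit word "yen"
MS_PER_NONZERO_DIGIT = 400  # assumed cost per spoken non-zero digit


def estimate_utterance_ms(value: int) -> int:
    """Crude estimate of how long reading `value` aloud takes."""
    nonzero = sum(1 for ch in str(abs(value)) if ch != "0")
    return BASE_MS + MS_PER_NONZERO_DIGIT * nonzero


def drop_lower_digits(value: int, keep: int) -> int:
    """Keep the `keep` most significant digits of a non-negative value; zero the rest."""
    digits = len(str(value))
    if digits <= keep:
        return value
    scale = 10 ** (digits - keep)
    return value // scale * scale


def choose_spoken_value(value: int, keep: int, n_percent: int, m: int) -> int:
    """Return the number to speak, following steps S103 to S106."""
    approx = drop_lower_digits(value, keep)
    # S103: is the shortened utterance under n% of the exact one?
    # Integer cross-multiplication avoids floating-point rounding.
    if estimate_utterance_ms(approx) * 100 < n_percent * estimate_utterance_ms(value):
        # S104: is the approximation error below m?
        if value - approx < m:
            return approx  # S105: speak with lower digits omitted
    return value  # S106: speak the exact number


print(choose_spoken_value(12345678, keep=2, n_percent=50, m=500_000))  # 12000000
print(choose_spoken_value(12345678, keep=2, n_percent=50, m=100_000))  # 12345678
```

The nested `if` mirrors the flowchart: S104 is only evaluated after S103 succeeds, and every "No" branch falls through to S106.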

[Effect of the first embodiment]
The server 10 of the communication system 100 according to the first embodiment generates voice data to be output to the terminal 20 in response to input data received from the terminal 20. When the generated voice data includes a specific utterance, the server 10 determines whether that utterance satisfies a predetermined condition. If the condition is satisfied, the server 10 outputs voice data with part of the specific utterance omitted to the terminal 20. The server 10 can therefore shorten specific utterances to the user and achieve smooth communication through the voice UI/UX.

Further, the generated voice data may include the utterance of a number as the specific utterance. In that case, the server 10 determines whether the utterance of the number satisfies a predetermined condition. If it does, the server 10 outputs voice data with the lower digits of the number omitted to the terminal 20. By omitting the lower digits of numbers with many digits, the server 10 shortens the total response time and achieves smooth communication.

In other words, a voice UI/UX can shorten the total response time and make responses easier for the user to grasp intuitively. Conventionally, accuracy was ensured by always reading out the exact value. In a voice UI/UX, however, an exact readout is not always desirable. In some cases, a rough approximate value is quicker to hear and easier to understand. Conversation between people also often runs more smoothly with vague expressions such as "roughly" and "approximately". As part of bringing computers closer to humans, the communication system 100 according to the first embodiment introduces this kind of "ambiguity" to facilitate communication.

The server 10 also compares the utterance time of the number with its lower digits omitted against the utterance time of the full number. If this ratio is less than a predetermined threshold, the server 10 outputs voice data with the lower digits omitted to the terminal 20. The lower digits are thus omitted only when doing so shortens the utterance by at least a certain proportion relative to speaking the exact number, so the total response time is reduced effectively.

The server 10 also determines whether the difference between the value in the utterance and the value with the lower digits omitted is less than a predetermined threshold. If it is, the server 10 outputs voice data with the lower digits omitted to the terminal 20. When the approximation would deviate from the exact value by a certain amount or more, the value is spoken in full. The lower digits are omitted only when the difference is below that amount, so communication is smoothed without making numbers vaguer than necessary.

The server 10 also determines whether the utterance time of the number exceeds a predetermined time. If it does, the server 10 outputs voice data with the lower digits omitted to the terminal 20, so that the utterance time falls within the predetermined time. The lower digits are omitted only when speaking the exact number would exceed the predetermined time. As a result, the total response time is reduced effectively in cases where reading out the number would take long.
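The time-limit variant can be sketched by dropping digits only as far as needed. As before, the utterance-time model and its millisecond constants are hypothetical stand-ins, since the source does not specify how utterance time is estimated. The search keeps as many leading digits as possible while still fitting within the limit.

```python
BASE_MS = 300               # hypothetical fixed cost per utterance
MS_PER_NONZERO_DIGIT = 400  # hypothetical cost per spoken non-zero digit


def estimate_utterance_ms(value: int) -> int:
    """Crude, assumed estimate: zero digits are treated as silent."""
    return BASE_MS + MS_PER_NONZERO_DIGIT * sum(1 for ch in str(abs(value)) if ch != "0")


def drop_lower_digits(value: int, keep: int) -> int:
    """Keep the `keep` most significant digits of a non-negative value; zero the rest."""
    digits = len(str(value))
    if digits <= keep:
        return value
    scale = 10 ** (digits - keep)
    return value // scale * scale


def fit_to_time_limit(value: int, limit_ms: int) -> int:
    """Omit lower digits only as far as needed to fit within `limit_ms`."""
    if estimate_utterance_ms(value) <= limit_ms:
        return value  # the exact number already fits
    # Try the most precise approximation first, then coarser ones.
    for keep in range(len(str(value)) - 1, 0, -1):
        approx = drop_lower_digits(value, keep)
        if estimate_utterance_ms(approx) <= limit_ms:
            return approx
    # Under this model even one significant digit is too long: best effort.
    return drop_lower_digits(value, 1)


print(fit_to_time_limit(12345678, 2000))  # 12340000
```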

Further, the server 10 may receive a request from the terminal 20 to speak the exact number after it has output voice data with the lower digits omitted. In that case, it outputs an utterance of the number with no digits omitted. The exact value is therefore output only when the user wants to know the detailed figure.
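The follow-up behaviour can be sketched as a small session object that remembers the exact value whenever it abbreviates one. The class name, the two-digit default, and especially the keyword check for "exact" are hypothetical. A real system would presumably use its own intent recognition, which the source does not describe.

```python
from typing import Optional


class NumberResponder:
    """Hypothetical session that remembers the last exact value it abbreviated."""

    def __init__(self, keep: int = 2) -> None:
        self.keep = keep
        self._last_exact: Optional[int] = None

    @staticmethod
    def _drop_lower_digits(value: int, keep: int) -> int:
        digits = len(str(value))
        if digits <= keep:
            return value
        scale = 10 ** (digits - keep)
        return value // scale * scale

    def answer(self, value: int) -> str:
        approx = self._drop_lower_digits(value, self.keep)
        if approx == value:
            self._last_exact = None  # nothing was omitted
            return f"{value:,} yen"
        self._last_exact = value  # keep the exact figure for a follow-up request
        return f"about {approx:,} yen"

    def answer_followup(self, request_text: str) -> Optional[str]:
        # Assumed intent check: a keyword match stands in for real language understanding.
        if self._last_exact is not None and "exact" in request_text.lower():
            return f"exactly {self._last_exact:,} yen"
        return None


r = NumberResponder()
print(r.answer(12345678))                            # about 12,000,000 yen
print(r.answer_followup("What is the exact amount?"))  # exactly 12,345,678 yen
```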

[System configuration, etc.]
Each component of each illustrated device is functional and conceptual and need not be physically configured as illustrated. That is, the specific form of distribution or integration of each device is not limited to the one illustrated. All or part of each device may be functionally or physically distributed or integrated in any unit according to various loads and usage conditions. For example, the functions of the generation unit 12a, the determination unit 12b, and the output unit 12c of the server 10 may be distributed across a plurality of servers. Furthermore, all or any part of each processing function performed by each device may be implemented in one of two ways. It may be a CPU or GPU (Graphics Processing Unit) together with a program analyzed and executed by that CPU or GPU, or it may be hardware using wired logic.

Of the processes described in this embodiment, all or part of those described as automatic may be performed manually. Conversely, all or part of those described as manual may be performed automatically by a known method. In addition, the processing procedures, control procedures, specific names, and information including various data and parameters shown in the above description and drawings may be changed arbitrarily unless otherwise specified.

[Program]
The processing executed by the server 10 in the above embodiment can also be written as a program in a computer-executable language, for example as a communication program. In this case, executing the communication program on a computer yields the same effects as the above embodiment. Alternatively, the communication program may be recorded on a computer-readable recording medium, and a computer may read and execute it from that medium to realize the same processing as the above embodiment.

FIG. 7 is a diagram showing a computer that executes the communication program. As illustrated in FIG. 7, the computer 1000 has, for example, a memory 1010, a CPU 1020, a hard disk drive interface 1030, a disk drive interface 1040, a serial port interface 1050, a video adapter 1060, and a network interface 1070. These parts are connected by a bus 1080.

As illustrated in FIG. 7, the memory 1010 includes a ROM (Read Only Memory) 1011 and a RAM 1012. The ROM 1011 stores a boot program such as a BIOS (Basic Input Output System). Also as illustrated in FIG. 7, the hard disk drive interface 1030 is connected to the hard disk drive 1090. The disk drive interface 1040 is connected to the disk drive 1100, into which a removable storage medium such as a magnetic disk or optical disk is inserted. The serial port interface 1050 is connected to, for example, a mouse 1110 and a keyboard 1120, and the video adapter 1060 is connected to, for example, a display 1130.

As illustrated in FIG. 7, the hard disk drive 1090 stores, for example, an OS 1091, an application program 1092, a program module 1093, and program data 1094. That is, the communication program described above is stored, for example on the hard disk drive 1090, as a program module describing instructions executed by the computer 1000.

The various data described in the above embodiment are stored as program data in, for example, the memory 1010 or the hard disk drive 1090. The CPU 1020 reads the program module 1093 and program data 1094 from the memory 1010 or hard disk drive 1090 into the RAM 1012 as needed and executes the various processing procedures.

The program module 1093 and program data 1094 for the communication program need not be stored on the hard disk drive 1090. They may instead be stored on, for example, a removable storage medium and read by the CPU 1020 via a disk drive or the like. Alternatively, they may be stored on another computer connected via a network (LAN (Local Area Network), WAN (Wide Area Network), etc.) and read by the CPU 1020 via the network interface 1070.

The above embodiments and their modifications are included in the technology disclosed in the present application. They are likewise included in the invention described in the claims and its equivalents.

10 Server
11 Communication processing unit
12 Control unit
12a Generation unit
12b Determination unit
12c Output unit
13 Storage unit
13a Asset data storage unit
13b Determination data storage unit
20 Terminal
30 Network
100 Communication system

Claims

1. A communication system comprising:
a generation unit that generates, as a response to input data received from a terminal, voice data to be output to the terminal;
a determination unit that, when the voice data generated by the generation unit includes a specific utterance, determines whether the specific utterance satisfies a predetermined condition; and
an output unit that, when the determination unit determines that the predetermined condition is satisfied, outputs to the terminal voice data in which part of the specific utterance is omitted,
wherein, when the voice data generated by the generation unit includes the utterance of a number as the specific utterance, the determination unit determines whether the utterance of the number satisfies a predetermined condition, and when the determination unit determines that the predetermined condition is satisfied, the output unit outputs to the terminal voice data in which the lower digits of the number are omitted.

2. The communication system according to claim 1, wherein the determination unit determines whether the ratio of the utterance time of the number with the lower digits omitted to the utterance time of the number is less than a predetermined threshold, and
when the determination unit determines that the ratio is less than the predetermined threshold, the output unit outputs to the terminal voice data in which the lower digits of the number are omitted.

3. The communication system according to claim 1, wherein the determination unit determines whether the difference between the value in the utterance of the number and the value with the lower digits omitted is less than a predetermined threshold, and
when the determination unit determines that the difference is less than the predetermined threshold, the output unit outputs to the terminal voice data in which the lower digits of the number are omitted.

4. The communication system according to claim 1, wherein the determination unit determines whether the utterance time of the number exceeds a predetermined time, and
when the determination unit determines that the utterance time of the number exceeds the predetermined time, the output unit outputs to the terminal voice data in which the lower digits of the number are omitted so that the utterance time falls within the predetermined time.

5. The communication system according to claim 1, wherein, when the output unit receives from the terminal a request to speak the exact number after outputting to the terminal voice data in which the lower digits of the number are omitted, the output unit outputs an utterance of the number with no digits omitted.

6. A communication method performed by a communication system, comprising:
a generation step of generating, as a response to input data received from a terminal, voice data to be output to the terminal;
a determination step of determining, when the voice data generated in the generation step includes a specific utterance, whether the specific utterance satisfies a predetermined condition; and
an output step of outputting to the terminal, when the determination step determines that the predetermined condition is satisfied, voice data in which part of the specific utterance is omitted,
wherein, when the voice data generated in the generation step includes the utterance of a number as the specific utterance, the determination step determines whether the utterance of the number satisfies a predetermined condition, and when the determination step determines that the predetermined condition is satisfied, the output step outputs to the terminal voice data in which the lower digits of the number are omitted.

7. A communication program that causes a computer to execute:
a generation step of generating, as a response to input data received from a terminal, voice data to be output to the terminal;
a determination step of determining, when the voice data generated in the generation step includes a specific utterance, whether the specific utterance satisfies a predetermined condition; and
an output step of outputting to the terminal, when the determination step determines that the predetermined condition is satisfied, voice data in which part of the specific utterance is omitted,
wherein, when the voice data generated in the generation step includes the utterance of a number as the specific utterance, the determination step determines whether the utterance of the number satisfies a predetermined condition, and when the determination step determines that the predetermined condition is satisfied, the output step outputs to the terminal voice data in which the lower digits of the number are omitted.