JP2019053473A

JP2019053473A - Pseudo-response transmitting device, nodding expression learning device, information terminal device, communication system, pseudo-response transmitting method, nodding expression learning method and pseudo-response transmitting program

Info

Publication number: JP2019053473A
Application number: JP2017176610A
Authority: JP
Inventors: Mitsuhiro Goto (後藤充裕); Masanori Yokoyama (横山正典); Takayoshi Mochizuki (望月崇由); Ayafumi Nunobiki (布引純史); Tomohiro Yamada (山田智広)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2017-09-14
Filing date: 2017-09-14
Publication date: 2019-04-04
Anticipated expiration: 2037-09-14
Also published as: JP6738781B2

Abstract

PROBLEM TO BE SOLVED: To achieve smooth real-time communication via a communication channel.
SOLUTION: A pseudo-response transmitting device according to an embodiment transmits nodding (back-channel) responses to message information transmitted from a transmitter. It includes: an input status obtaining unit that obtains the status of the input frequency of the message information; an utterance intent estimating unit that estimates the transmitter's utterance intent from the message information; a nodding expression searching unit that searches a storage device, which stores input-frequency statuses, utterance intents, and nodding expressions to the transmitter in association with one another, for the nodding expression associated with the obtained input-frequency status and the estimated utterance intent; and a transmitting unit that transmits the retrieved nodding expression to the transmitter when the response from the receiver of the message information to the transmitter is delayed in a way that satisfies a predetermined condition.
SELECTED DRAWING: Figure 1

Description

FIELD: Embodiments of the present invention relate to a pseudo-response transmission device, a back-channel expression (相づち, aizuchi) learning device, an information terminal device, a communication system, a pseudo-response transmission method, a back-channel expression learning method, and a pseudo-response transmission program.

In conventional real-time communication via a communication channel, communication is carried out using the modality defined by that channel, such as a voice modality (e.g., telephone), a text modality (e.g., chat), or an audiovisual modality (e.g., videophone).

In conventional real-time communication, the available modality is determined by the communication channel. To communicate, the sending user therefore explicitly selects a channel whose modality the receiving user can use. As a related technique, Non-Patent Document 1 discloses a method for reducing the discomfort felt by text-chat participants by lengthening their expected response time (the expected value of the time until a response to one's own utterance arrives).

Non-Patent Document 1: Kazuyoshi Murata, Osamu Kawaguchi, Yu Shibuya, Itaru Kuramoto, Yoshihiro Tsujino, "Information presentation and its timing for extending the expected response time of chat participants" (in Japanese), Transactions of the Human Interface Society, Vol. 8, No. 3, pp. 103–114 (2006).

In such real-time communication, if a user's input (text or voice) takes time, specifically if the input waiting time from the start to the end of input becomes long, the user's response to the communication partner is delayed and real-time communication cannot proceed smoothly.

It is also conceivable to convert modalities between the sending user and the receiving user in such real-time communication.
For example, when the sending user can use voice and the receiving user can use text, the voice spoken by the sending user is converted into text by modality conversion processing (here, speech recognition) and delivered to the receiving user. Conversely, when the sending user can use text and the receiving user can use voice, the text entered by the sending user is converted into voice by modality conversion processing (here, speech synthesis) and delivered to the receiving user.

However, this conversion processing itself takes time from start to finish, so the response is delayed by the processing time on top of the input waiting time described above, which hinders real-time communication.

An object of the present invention is to provide a pseudo-response transmission device, a back-channel expression learning device, an information terminal device, a communication system, a pseudo-response transmission method, a back-channel expression learning method, and a pseudo-response transmission program that enable smooth real-time communication via a communication channel.

To achieve the above object, a first aspect of the pseudo-response transmission device according to an embodiment of the present invention is a pseudo-response transmission device that transmits a back-channel expression, i.e., a pseudo-response, to message information transmitted by a sender. The device includes: an input state acquisition unit that acquires the input-frequency state of the message information; an utterance intention estimation unit that estimates the sender's utterance intention from the message information; a back-channel expression search unit that searches a storage device, which stores the input-frequency state of message information, the sender's utterance intention, and a back-channel expression to the sender in association with one another, for the back-channel expression associated with the input-frequency state acquired by the input state acquisition unit and the utterance intention estimated by the utterance intention estimation unit; a determination unit that determines whether the response from the recipient of the transmitted message information to the sender will be subject to a delay that satisfies a predetermined condition; and a transmission unit that transmits the retrieved back-channel expression to the sender when the determination unit determines that such a delay will occur.
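As a minimal sketch of how the units of the first aspect could fit together, the following Python assumes that the input-frequency state and the utterance intention have already been reduced to discrete labels, models the storage device as an in-memory dictionary, and treats the "predetermined condition" as a fixed delay threshold. The class name, the threshold, and the `send` callback are hypothetical illustrations, not part of the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass
class PseudoResponder:
    # Storage device: (input-frequency state, utterance intention) -> back-channel expression.
    table: Dict[Tuple[str, str], str]
    # Hypothetical "predetermined condition": the reply is expected to take at least this long.
    delay_threshold_s: float

    def search(self, input_state: str, intent: str) -> Optional[str]:
        """Search unit: exact-match lookup on both keys."""
        return self.table.get((input_state, intent))

    def delay_condition_met(self, expected_delay_s: float) -> bool:
        """Determination unit: compare the expected reply delay with the threshold."""
        return expected_delay_s >= self.delay_threshold_s

    def respond(self, input_state: str, intent: str, expected_delay_s: float,
                send: Callable[[str], None]) -> Optional[str]:
        """Transmission unit: send the retrieved expression only when the delay condition holds."""
        expression = self.search(input_state, intent)
        if expression is not None and self.delay_condition_met(expected_delay_s):
            send(expression)
            return expression
        return None


# Usage: a slow typist asking a question gets a filler reply if the answer will be late.
outbox: list = []
responder = PseudoResponder({("slow", "question"): "Hmm, let me see."}, delay_threshold_s=3.0)
responder.respond("slow", "question", expected_delay_s=5.0, send=outbox.append)
```

Exact-match lookup is the simplest reading of "associated with"; a tolerance-based match on input speed would be an equally plausible design.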

A second aspect of the pseudo-response transmission device configured as above is the device of the first aspect in which the message information from the sender is voice information, the device further including: a speech recognition unit that converts the voice information into text by performing speech recognition processing on it; and a speech synthesis unit that converts the text of the back-channel expression transmitted by the transmission unit into voice information by performing speech synthesis processing on that text.

An aspect of the back-channel expression learning device according to an embodiment of the present invention provides a device including: an input state acquisition unit that acquires the input-frequency state of message information from a sender; an utterance intention estimation unit that estimates, from the message information, the utterance intention of the sender's communication partner; and a back-channel expression extraction unit that extracts, from the message information, a back-channel expression directed to the communication partner and stores the extracted expression in a storage device as back-channel expression learning information, in association with the input-frequency state acquired by the input state acquisition unit and the utterance intention estimated by the utterance intention estimation unit.
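The learning side can be sketched the same way. In this hypothetical Python example, the extraction unit is passed in as a callable, and each extracted expression is appended to a list keyed by the (input-frequency state, utterance intention) pair under which it was observed; the class and method names are illustrative only.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Tuple


class BackChannelLearner:
    """Stores extracted back-channel expressions as learning information."""

    def __init__(self, extract: Callable[[str], Optional[str]]) -> None:
        # `extract` stands in for the back-channel expression extraction unit.
        self.extract = extract
        # Learning information: (input-frequency state, intention) -> observed expressions.
        self.store: Dict[Tuple[str, str], List[str]] = defaultdict(list)

    def learn(self, text: str, input_state: str, intent: str) -> Optional[str]:
        """Extract an expression from `text` and file it under the given conditions."""
        expression = self.extract(text)
        if expression is not None:
            self.store[(input_state, intent)].append(expression)
        return expression
```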

An aspect of the information terminal device according to an embodiment of the present invention provides a device including: an input state acquisition unit that acquires the input-frequency state of received message information; an utterance intention estimation unit that estimates the sender's utterance intention from the message information; a back-channel expression search unit that searches a storage device, which stores the input-frequency state of message information, the utterance intention of the sender of the message information, and a back-channel expression to the sender in association with one another, for the back-channel expression associated with the input-frequency state acquired by the input state acquisition unit and the utterance intention estimated by the utterance intention estimation unit; a determination unit that determines whether the response from the recipient of the sender's message information to the sender will be subject to a delay that satisfies a predetermined condition; and a transmission unit that transmits the retrieved back-channel expression when the determination unit determines that such a delay will occur.

An aspect of the communication system according to an embodiment of the present invention is a communication system including a server device and an information terminal device carried by a sender of message information. The server device includes: an input state acquisition unit that acquires the input-frequency state of received message information; an utterance intention estimation unit that estimates the sender's utterance intention from the message information; a back-channel expression search unit that searches a storage device, which stores the input-frequency state acquired by the input state acquisition unit, the utterance intention estimated by the utterance intention estimation unit, and the text of a back-channel expression to the sender in association with one another, for the back-channel expression associated with the acquired input-frequency state and the estimated utterance intention; a determination unit that determines whether the response from the recipient of the sender's message information to the sender will be subject to a delay that satisfies a predetermined condition; and a transmission unit that transmits the text of the retrieved back-channel expression to the sender when the determination unit determines that such a delay will occur.
The information terminal device includes: a speech recognition unit that, when the message information from the sender is voice information, converts the voice information into text by speech recognition processing and sends the text to the input state acquisition unit and the utterance intention estimation unit; and a speech synthesis unit that converts the text of the back-channel expression transmitted from the transmission unit into voice information by speech synthesis processing.

An aspect of the pseudo-response transmission method according to an embodiment of the present invention provides a method performed by a pseudo-response transmission device, including: acquiring the input-frequency state of message information transmitted by a sender; estimating the sender's utterance intention from the message information; searching a storage device, which stores the input-frequency state of message information, the sender's utterance intention, and a back-channel expression to the sender in association with one another, for the back-channel expression associated with the acquired input-frequency state and the estimated utterance intention; determining whether the response from the recipient of the transmitted message information to the sender will be subject to a delay that satisfies a predetermined condition; and transmitting the retrieved back-channel expression to the sender when it is determined that such a delay will occur.

An aspect of the back-channel expression learning method according to an embodiment of the present invention provides a method performed by a back-channel expression learning device, including: acquiring the input-frequency state of message information from a sender; estimating, from the message information, the utterance intention of the sender's communication partner; extracting, from the message information, a back-channel expression directed to the communication partner; and storing the extracted expression in a storage device as back-channel expression learning information, in association with the acquired input-frequency state and the estimated utterance intention.

An aspect of the pseudo-response transmission program according to an embodiment of the present invention provides a program that causes a processor to function as each of the units of the pseudo-response transmission device according to the first or second aspect.

According to the present invention, real-time communication via a communication channel can be carried out smoothly.

FIG. 1 is a diagram showing an example of the functional configuration of a real-time communication system according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of the contents of the back-channel expression storage unit of the real-time communication system according to the embodiment.
FIG. 3 is a flowchart showing an example of the back-channel expression learning process performed by the real-time communication system according to the embodiment.
FIG. 4 is a flowchart showing an example of the pseudo-response transmission process performed by the real-time communication system according to the embodiment.

An embodiment of the present invention will be described below with reference to the drawings.
FIG. 1 is a diagram showing an example of the functional configuration of a real-time communication system according to an embodiment of the present invention. Here, an example is described in which real-time communication is carried out with conversion between text and voice.
As shown in FIG. 1, the real-time communication system (communication system) according to the embodiment includes a plurality of information terminals (an information terminal 10-1 used by one of the sending user and the receiving user, and an information terminal 10-2 used by the other), a communication device 20, a back-channel expression learning device 30, and a pseudo-response transmission device 40. The information terminals 10-1 and 10-2, the communication device 20, the back-channel expression learning device 30, and the pseudo-response transmission device 40 can each be implemented by a computer device such as a personal computer (PC). Such a computer device includes, for example, a processor such as a CPU (Central Processing Unit), a memory connected to the processor, and a communication interface. The communication device 20, the back-channel expression learning device 30, and the pseudo-response transmission device 40 can also be implemented as, for example, a single server device.

The communication device 20 includes an information communication unit 21, a speech recognition unit 22, a speech synthesis unit 23, and an information output time storage unit 24.
The information communication unit 21 transmits and receives message information between the information terminal 10-1 and the information terminal 10-2 (here, text entered by the sending user with a keyboard and mouse (not shown) of the information terminal 10-1 or 10-2, or voice spoken by the sending user into a microphone (not shown) of the information terminal 10-1 or 10-2).

The information communication unit 21 also holds, in an internal memory, information indicating the type of message information (text or voice) that the information terminal 10-1 can transmit and receive, and the type (text or voice) that the information terminal 10-2 can transmit and receive. This type corresponds to the modality that the information terminal 10-1 or 10-2 can use in real-time communication.

When the message information that the information communication unit 21 receives from one of the information terminals 10-1 and 10-2 is voice, and the type of message information that can be sent to the other terminal is text, the speech recognition unit 22 converts the voice into text by speech recognition processing.
The speech recognition unit 22 also converts voice message information received by the information communication unit 21 from either terminal into text for processing by the back-channel expression learning device 30.

When the message information that the information communication unit 21 receives from one of the information terminals 10-1 and 10-2 is text, and the type of message information that can be sent to the other terminal is voice, the speech synthesis unit 23 converts the text into voice by speech synthesis processing.
In addition, when the type of message information that can be sent to the information terminal 10-1 or 10-2 that is the destination of a text message from the pseudo-response transmission device 40 is voice, the speech synthesis unit 23 converts that text into voice by speech synthesis processing.
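The conversion rules of the speech recognition unit 22 and speech synthesis unit 23 reduce to a decision on the message type and the destination terminal's modality. The sketch below is a hypothetical illustration; it covers only delivery to the other terminal (pseudo-responses, being text, follow the same rule), not the additional recognition of voice for the learning device.

```python
from enum import Enum
from typing import Optional


class Modality(Enum):
    TEXT = "text"
    VOICE = "voice"


def conversion_for(message_type: Modality, destination: Modality) -> Optional[str]:
    """Return the conversion the communication device would apply, or None to forward as-is."""
    if message_type is Modality.VOICE and destination is Modality.TEXT:
        return "speech_recognition"   # handled by speech recognition unit 22
    if message_type is Modality.TEXT and destination is Modality.VOICE:
        return "speech_synthesis"     # handled by speech synthesis unit 23
    return None                       # same modality on both sides
```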

The information output time storage unit 24 is a storage device such as a non-volatile memory and stores elapsed-time information based on the message information transmitted and received by the information communication unit 21. The elapsed time is the time from when the information communication unit 21 receives message information sent from one of the information terminals 10-1 and 10-2 to the other, until the information communication unit 21 receives message information sent from the receiving terminal back to the originating terminal.
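One way to record this elapsed time is to keep, per ordered pair of terminals, the reception time of the latest unanswered message, and to close the interval when a message arrives in the opposite direction. The class below is a hypothetical sketch under that assumption; times are plain seconds supplied by the caller.

```python
from typing import Dict, Optional, Tuple


class ElapsedTimeStore:
    """Per ordered terminal pair, time from a message's reception to the reply's reception."""

    def __init__(self) -> None:
        self._pending: Dict[Tuple[str, str], float] = {}   # (src, dst) -> reception time
        self.elapsed: Dict[Tuple[str, str], float] = {}    # (src, dst) -> last reply delay

    def on_message(self, src: str, dst: str, t: float) -> Optional[float]:
        """Record a message src -> dst received at time t; return the reply delay it closes."""
        started = self._pending.pop((dst, src), None)
        delay = None
        if started is not None:
            delay = t - started
            self.elapsed[(dst, src)] = delay
        # The most recent unanswered message from src to dst starts (or restarts) the clock.
        self._pending[(src, dst)] = t
        return delay

    def waiting_time(self, src: str, dst: str, now: float) -> Optional[float]:
        """Time since src's latest message to dst was received, or None if it was answered."""
        started = self._pending.get((src, dst))
        return None if started is None else now - started
```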

The back-channel expression learning device 30 includes an input analysis unit 31, an input state acquisition unit 31a, an utterance intention estimation unit 31b, a back-channel expression extraction unit 32, and a back-channel expression storage unit 33.
The input analysis unit 31 includes the input state acquisition unit 31a and the utterance intention estimation unit 31b.
When the message information from the communication device 20 is text output by the information communication unit 21 without passing through the speech recognition unit 22, the input state acquisition unit 31a analyzes the input-frequency state of the text (the text input speed, i.e., the number of characters entered per unit time; hereinafter also called the input state) and acquires the analyzed input state.

When the message information from the communication device 20 is text output by the speech recognition unit 22, the input state acquisition unit 31a instead analyzes the input state of the voice received by the information communication unit 21 before its conversion by the speech recognition unit 22 (the speech rate, or tempo, i.e., the number of characters uttered per unit time) and acquires the analyzed input state.
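Since the stored learning information is keyed on an input state, a continuous rate presumably has to be mapped to a discrete state before lookup. The sketch below assumes the terminal reports the first and last keystroke times, and uses arbitrary, hypothetical bin edges of 2 and 5 characters per second.

```python
from bisect import bisect_right
from typing import Sequence


def chars_per_second(text: str, start_s: float, end_s: float) -> float:
    """Text input speed: characters entered per second between first and last keystroke."""
    duration = end_s - start_s
    if duration <= 0:
        raise ValueError("end_s must be after start_s")
    return len(text) / duration


def discretize(rate: float, edges: Sequence[float] = (2.0, 5.0),
               labels: Sequence[str] = ("slow", "normal", "fast")) -> str:
    """Map a continuous rate to a discrete input state (hypothetical bins)."""
    return labels[bisect_right(edges, rate)]
```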

The text input speed is, for example, a speed measured by a text-input-speed analysis application installed on the information terminal 10-1 or 10-2.
The speech rate can be obtained, for example, by a function of the input analysis unit 31 that analyzes the number of morae in the speech (the number of morae, or beats, per second).
The utterance intention estimation unit 31b analyzes the text output by the information communication unit 21 or the speech recognition unit 22 to estimate the utterance intention (dialogue act) or topic of the sending user's communication partner. The utterance intention includes the intention behind text input.
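The mora-based speech rate can be illustrated on a kana reading of the utterance. This sketch assumes such a reading is available (the disclosure analyzes the speech itself), counts each kana as one mora except small ゃ/ゅ/ょ-type kana that merge with the preceding one, and counts っ, ん and the long-vowel mark ー as one mora each.

```python
# Small kana that merge with the preceding kana into one mora (e.g. きゃ, ファ).
_MERGING_SMALL_KANA = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")


def _is_moraic_char(ch: str) -> bool:
    """Hiragana ぁ-ゖ, katakana ァ-ヺ, or the long-vowel mark ー."""
    return "\u3041" <= ch <= "\u3096" or "\u30a1" <= ch <= "\u30fa" or ch == "\u30fc"


def count_morae(kana: str) -> int:
    """Count morae; っ/ッ, ん/ン and ー each count as one mora, punctuation is ignored."""
    return sum(1 for ch in kana
               if _is_moraic_char(ch) and ch not in _MERGING_SMALL_KANA)


def morae_per_second(kana: str, duration_s: float) -> float:
    """Speech rate (tempo) as morae per second."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return count_morae(kana) / duration_s
```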

The back-channel expression extraction unit 32 analyzes the text corresponding to the sending user's message information that was processed by the input state acquisition unit 31a or the utterance intention estimation unit 31b, and extracts the sending user's back-channel expression (interjection) to the receiving user by, for example, predetermined pattern matching. Back-channel expressions include, for example, those expressing approval, acceptance, surprise, admiration, realization, and agreement.
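Pattern matching of this kind might look like the sketch below. The categories follow the list above, but the regular expressions are hypothetical examples of Japanese back-channel forms, not patterns given in the disclosure, and only an utterance-initial match is tried.

```python
import re
from typing import Optional, Tuple

# Hypothetical patterns for utterance-initial back-channels; the first match wins.
BACKCHANNEL_PATTERNS = [
    ("surprise", re.compile(r"えっ|へえ+")),
    ("realization", re.compile(r"あっ|そうか")),
    ("admiration", re.compile(r"なるほど|すごい")),
    ("agreement", re.compile(r"そうですね|確かに")),
    ("acceptance", re.compile(r"うん|了解")),
    ("approval", re.compile(r"はい|いいですね")),
]


def extract_backchannel(text: str) -> Optional[Tuple[str, str]]:
    """Return (category, matched expression) if the text opens with a back-channel."""
    stripped = text.strip()
    for category, pattern in BACKCHANNEL_PATTERNS:
        m = pattern.match(stripped)  # Pattern.match anchors at the start of the string
        if m:
            return category, m.group(0)
    return None
```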

FIG. 2 is a diagram showing an example of the contents of the back-channel expression storage unit of the real-time communication system according to the embodiment of the present invention.
As shown in FIG. 2, the back-channel expression storage unit 33 stores, separately for each user, back-channel expression learning information for the message information exchanged between the information terminals 10-1 and 10-2, in which the following are associated with one another: (1) the text input speed or tempo (speech rate) of the sending user's input, acquired by the input state acquisition unit 31a; (2) the utterance intention or topic of the sending user's communication partner, estimated by the utterance intention estimation unit 31b; and (3) the back-channel expression.

The pseudo-response transmission device 40 includes an input analysis unit 41, a back-channel expression search unit 42, an information output control unit 43, and a synthesis/recognition processing time prediction unit 44.
The input analysis unit 41 includes an input state acquisition unit 41a (with the same function as the input state acquisition unit 31a) and an utterance intention estimation unit 41b (with the same function as the utterance intention estimation unit 31b).

The back-channel expression search unit 42 uses, as search conditions, the result acquired by the input state acquisition unit 41a for a given piece of message information and the result estimated by the utterance intention estimation unit 41b for the same message information, and searches the relevant user's back-channel expression learning information in the back-channel expression storage unit 33 for the back-channel expression matching these conditions.
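Because the learning information is kept separately per user and several expressions may have been learned under the same conditions, the search needs a tie-break. The sketch below assumes the most frequently learned expression is returned; that choice and the nested-dictionary layout are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Optional, Tuple


class BackChannelStore:
    """user -> (input state, intention or topic) -> Counter of learned expressions."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[Tuple[str, str], Counter]] = defaultdict(
            lambda: defaultdict(Counter))

    def add(self, user: str, input_state: str, intent: str, expression: str) -> None:
        """Record one learned occurrence of an expression for this user and condition."""
        self._data[user][(input_state, intent)][expression] += 1

    def search(self, user: str, input_state: str, intent: str) -> Optional[str]:
        """Return the most frequently learned expression, or None if nothing matches."""
        counts = self._data.get(user, {}).get((input_state, intent))
        if not counts:
            return None
        return counts.most_common(1)[0][0]
```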

The information output control unit 43 includes a determination unit 43a and a transmission unit 43b.
Based on the elapsed time stored in the information output time storage unit 24, the determination unit 43a determines whether the time elapsed from when the information communication unit 21 of the communication device 20 received the message information until the current time has reached a certain length.
The determination unit 43a also determines whether the predicted processing time required for speech recognition processing or speech synthesis processing of the message information exchanged between the information terminals 10-1 and 10-2 has reached a certain length.

When the determination unit 43a determines that the elapsed time has reached the certain length, or that the predicted processing time has reached the certain length, the transmission unit 43b transmits the back-channel expression retrieved by the back-channel expression search unit 42 to the communication device 20.
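The combined decision of the determination unit 43a and transmission unit 43b amounts to an OR of two threshold tests. In this sketch the two thresholds (3 s of waiting, 2 s of predicted processing) are hypothetical values, and a missing measurement simply does not trigger.

```python
from typing import Optional


def should_send_backchannel(waiting_s: Optional[float],
                            predicted_processing_s: Optional[float],
                            wait_limit_s: float = 3.0,
                            processing_limit_s: float = 2.0) -> bool:
    """True if either the elapsed wait or the predicted processing time reaches its limit."""
    if waiting_s is not None and waiting_s >= wait_limit_s:
        return True
    if predicted_processing_s is not None and predicted_processing_s >= processing_limit_s:
        return True
    return False
```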

The synthesis/recognition processing time prediction unit 44 predicts the processing time required for speech recognition processing or speech synthesis processing of the message information received by the information communication unit 21 of the communication device 20.
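The disclosure does not say how the processing time is predicted. Purely as an assumption, the sketch below uses a linear model (a fixed overhead plus a cost per unit of input, e.g. seconds of audio for recognition or characters for synthesis) fitted by ordinary least squares on logged timings.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class LinearTimeModel:
    overhead_s: float   # fixed cost per request (hypothetical)
    per_unit_s: float   # cost per second of audio or per character (hypothetical)

    def predict(self, units: float) -> float:
        return self.overhead_s + self.per_unit_s * units


def fit(units: Sequence[float], seconds: Sequence[float]) -> LinearTimeModel:
    """Ordinary least squares for seconds = overhead + per_unit * units."""
    n = len(units)
    if n < 2:
        raise ValueError("need at least two observations")
    mean_x = sum(units) / n
    mean_y = sum(seconds) / n
    sxx = sum((x - mean_x) ** 2 for x in units)
    if sxx == 0:
        raise ValueError("inputs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(units, seconds))
    slope = sxy / sxx
    return LinearTimeModel(overhead_s=mean_y - slope * mean_x, per_unit_s=slope)
```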

Next, the back-channel expression learning process will be described.
FIG. 3 is a flowchart showing an example of the back-channel expression learning process performed by the real-time communication system according to the embodiment of the present invention.
First, when the information terminal 10-1 used by the sending user transmits message information addressed to the information terminal 10-2 used by the receiving user to the communication device 20 (S11), and the transmitted message information is voice information rather than text (No in S12), the information communication unit 21 of the communication device 20 sends the voice information to the speech recognition unit 22. The speech recognition unit 22 converts the voice information into text by performing speech recognition processing on it and outputs the text to the input analysis unit 31 of the back-channel expression learning device 30 (S13).

If the message information transmitted in S11 is text (Yes in S12), the information communication unit 21 outputs the text directly to the input analysis unit 31 of the nodding expression learning device 30, bypassing the speech recognition unit 22.
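The S12 branch routes each message before input analysis. The sketch below models it under stated assumptions: a message is either a `str` (text) or a hypothetical `VoiceMessage` wrapper. The speech recognition unit 22 is stood in for by an injected `recognize` callable, since the description names no particular recognizer.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class VoiceMessage:
    """Hypothetical container for voice message information."""
    audio: bytes


def to_analysis_text(message: Union[str, VoiceMessage],
                     recognize: Callable[[bytes], str]) -> str:
    """Return the text handed to input analysis unit 31 (sketch of S12/S13)."""
    if isinstance(message, str):
        # Yes in S12: text goes straight to the input analysis unit.
        return message
    # No in S12: voice is converted to text by speech recognition first (S13).
    return recognize(message.audio)
```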

Suppose the message information output from the communication device 20 is text sent directly from the information communication unit 21, bypassing the speech recognition unit 22. In that case, the input state acquisition unit 31a of the input analysis unit 31 analyzes the input state of this text and acquires the result. The input state here is the text input speed, that is, the number of characters input per unit time.

When the information from the communication device 20 is text output by the speech recognition unit 22, the input state acquisition unit 31a works from the original speech. It analyzes the input state of that speech, as received by the information communication unit 21 before conversion, and acquires the result (S14). The input state here is the speech rate (tempo), that is, the number of characters uttered per unit time.
The utterance intention estimation unit 31b analyzes the text output from the information communication unit 21 or the speech recognition unit 22 to estimate the utterance intention or topic of the transmitting user's communication partner (S15).
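The input state acquired in S14 is a rate: characters typed, or characters uttered, per unit time. Because this state later serves as a search key, the sketch below also buckets the rate into a discrete label. The bucketing, the thresholds, and the label names are all hypothetical, since the description does not specify how rates are compared.

```python
def input_rate(char_count: int, duration_s: float, unit_s: float = 60.0) -> float:
    """Characters per unit time (default unit: one minute)."""
    if duration_s <= 0:
        raise ValueError("duration_s must be positive")
    return char_count * unit_s / duration_s


def classify_rate(rate: float, slow_below: float = 100.0,
                  fast_above: float = 300.0) -> str:
    """Hypothetical discretization of the rate into a search-key label."""
    if rate < slow_below:
        return "slow"
    if rate > fast_above:
        return "fast"
    return "normal"
```

For example, 50 characters entered over 30 seconds gives 100 characters per minute, which falls in the assumed "normal" band.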

The nodding expression extraction unit 32 analyzes the text already processed by the input state acquisition unit 31a and the utterance intention estimation unit 31b. From this text, it extracts nodding expressions, that is, interjections (S16).
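The description does not state how nodding expressions are identified in the text. One simple possibility is a lexicon match, which is what this sketch uses. The word list `NODDING_LEXICON` is entirely hypothetical, and a real system might instead use a trained tagger.

```python
import re
from typing import List

# Hypothetical lexicon of backchannel/interjection phrases (not from the source).
NODDING_LEXICON = ("uh-huh", "i see", "right", "really", "wow", "yeah")


def extract_nodding_expressions(text: str) -> List[str]:
    """Return lexicon phrases found in the text as whole words (sketch of S16)."""
    lowered = text.lower()
    found = []
    for phrase in NODDING_LEXICON:
        # Word boundaries avoid matching inside longer words such as "alright".
        if re.search(r"\b" + re.escape(phrase) + r"\b", lowered):
            found.append(phrase)
    return found
```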

Next, the nodding expression extraction unit 32 searches the nodding expression storage unit 33 (S17). It uses the input state acquired in S14 and the utterance intention estimated in S15 as search conditions. The search covers the corresponding user's nodding expression learning information and looks for the transmitting user's nodding expression toward the communication partner.

If a nodding expression matching these search conditions is already stored in the nodding expression storage unit 33 (Yes in S18), the nodding expression extraction unit 32 updates it to the nodding expression extracted in S16 (S19).

If no nodding expression for these search conditions is stored yet (No in S18), the nodding expression extraction unit 32 registers the extracted one as a new entry (S20). The entry is added to the corresponding user's nodding expression learning information in the nodding expression storage unit 33. It is associated with the search conditions, namely the input state and the utterance intention.
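Steps S17 to S20 amount to an "update if present, otherwise register" operation keyed on user, input state, and utterance intention. The sketch below is an in-memory stand-in for the nodding expression storage unit 33. The class name, the method names, and the tuple key layout are hypothetical.

```python
from typing import Dict, Optional, Tuple

# (user_id, input_state, utterance_intention); key layout is an assumption.
Key = Tuple[str, str, str]


class NoddingExpressionStore:
    """Hypothetical in-memory stand-in for nodding expression storage unit 33."""

    def __init__(self) -> None:
        self._table: Dict[Key, str] = {}

    def search(self, user_id: str, input_state: str, intention: str) -> Optional[str]:
        """S17: look up the expression for the given search conditions."""
        return self._table.get((user_id, input_state, intention))

    def learn(self, user_id: str, input_state: str, intention: str,
              expression: str) -> bool:
        """S18-S20: update an existing entry or register a new one.

        Returns True if an existing entry was updated (S19), False if a new
        entry was registered (S20).
        """
        key = (user_id, input_state, intention)
        existed = key in self._table   # S18
        self._table[key] = expression  # S19 (update) or S20 (new registration)
        return existed
```

Calling `learn` on every new message reproduces the repeated storage that lets the device accumulate expressions per input state and intention.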

In this way, nodding expressions toward the receiving user are extracted from the message information that the transmitting user sends to the receiving user, and are stored in the nodding expression storage unit 33. This storage step is repeated each time the transmitting user sends message information to the receiving user. As a result, the nodding expression learning device 30 learns which nodding expressions correspond to each combination of input state and utterance intention.

Next, the pseudo-response transmission process will be described. This process uses the nodding expressions learned in the nodding expression learning process.
FIG. 4 is a flowchart showing an example of the pseudo-response transmission process performed by the real-time communication system according to the embodiment of the present invention.
First, the information terminal 10-2 used by the receiving user receives message information from the communication device 20 (S21). This message information was sent by the information terminal 10-1, used by the transmitting user, and is addressed to the information terminal 10-2.

Suppose the receiving user of the information terminal 10-2 is still entering a reply to the transmitting user (communication partner) (Yes in S22). That is, the information communication unit 21 has not yet received the completed message information. Suppose also that the message information being entered is voice information rather than text (No in S23). That is, the information communication unit 21 holds the message information type that the information terminal 10-2 can send and receive, and that type is voice information. In this case, the information communication unit 21 of the communication device 20 sends the voice information to the speech recognition unit 22. The speech recognition unit 22 performs speech recognition processing to convert the voice information into text. It then outputs the text to the input analysis unit 41 of the pseudo-response transmission device 40 (S24).

If instead the message information being entered is text (Yes in S23), the information communication unit 21 outputs it directly to the input analysis unit 41 of the pseudo-response transmission device 40, bypassing the speech recognition unit 22. That is, the message information type held by the information communication unit 21 for the information terminal 10-2 is text.

Suppose the message information output from the communication device 20 is text sent directly from the information communication unit 21, bypassing the speech recognition unit 22. In that case, the input state acquisition unit 41a of the input analysis unit 41 analyzes the input state of this text and acquires the result. The input state here is the text input speed, that is, the number of characters input per unit time.
When the information from the communication device 20 is text output by the speech recognition unit 22, the input state acquisition unit 41a works from the original speech. It analyzes the input state of that speech, as received by the information communication unit 21 before conversion, and acquires the result (S25). The input state here is the speech rate (tempo), that is, the number of characters uttered per unit time.

The utterance intention estimation unit 41b analyzes the text output from the information communication unit 21 or the speech recognition unit 22 to estimate the utterance intention or topic of the receiving user's communication partner (S26).

The nodding expression search unit 42 then searches the nodding expression storage unit 33 for a nodding expression (S27). It uses the input state acquired in S25 and the utterance intention estimated in S26 as search conditions. The search covers the corresponding user's nodding expression learning information.

If a nodding expression matching these search conditions is stored in the corresponding user's nodding expression learning information (Yes in S28), the nodding expression extraction unit 32 sends it to the information output control unit 43. The information output control unit 43 prepares for transmission by temporarily holding this nodding expression in its internal memory (S29).

If no nodding expression for these search conditions is stored in the corresponding user's learning information (No in S28), the nodding expression extraction unit 32 reads out a nodding expression at random from that learning information. It sends this expression to the information output control unit 43. The information output control unit 43 prepares for transmission by temporarily holding it in its internal memory (S30).
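Steps S27 to S30 can be sketched as an exact lookup followed by a random fallback over the same user's learned expressions. This is an illustration only. It assumes one user's learning information is a dict keyed by (input state, utterance intention), and the function name is hypothetical. The case of an empty store is not covered by the description, so the sketch returns `None` there.

```python
import random
from typing import Dict, Optional, Tuple


def select_nodding_expression(learned: Dict[Tuple[str, str], str],
                              input_state: str,
                              intention: str,
                              rng: Optional[random.Random] = None) -> Optional[str]:
    """Hypothetical sketch of S27-S30 for one user's learning information."""
    hit = learned.get((input_state, intention))  # S27/S28: exact match
    if hit is not None:
        return hit  # S29: prepare the matched expression
    if not learned:
        return None  # nothing learned yet (case not covered by the source)
    rng = rng or random.Random()
    return rng.choice(list(learned.values()))  # S30: random fallback
```

Passing a seeded `random.Random` makes the fallback reproducible in tests.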

After S29 or S30, suppose the receiving user finishes the reply that was being entered in S22, and the reply is sent to the information communication unit 21. Suppose also that this message information requires speech recognition processing by the speech recognition unit 22 or speech synthesis processing by the speech synthesis unit 23 (Yes in S31). This is the case when the message information types held by the information communication unit 21 for the information terminal 10-1 and the information terminal 10-2 differ. The information communication unit 21 then instructs the synthesis/recognition processing time prediction unit 44 of the pseudo-response transmission device 40 to predict the processing time that this recognition or synthesis will require.
Upon receiving this instruction, the synthesis/recognition processing time prediction unit 44 calculates the predicted processing time (predicted synthesis/recognition processing time) and sends it to the information output control unit 43.

If the determination unit 43a of the information output control unit 43 determines that this predicted value is equal to or longer than the fixed time (Yes in S32), it sends the nodding expression prepared in S29 or S30 to the information communication unit 21. If the destination information terminal can send and receive text, the information communication unit 21 transmits the text to that terminal. If it cannot, the information communication unit 21 sends the text to the speech synthesis unit 23 and transmits the resulting speech to the destination terminal (S32 → S34).

Alternatively, after S29 or S30, the reply that the receiving user was entering in S22 may not yet have been sent to the information communication unit 21 (No in S31). In that case, the determination unit 43a of the information output control unit 43 checks the elapsed time from the reception time in S21 to the current time, as stored in the information output time storage unit 24. If this elapsed time is equal to or longer than the fixed time (Yes in S33), the determination unit 43a sends the nodding expression prepared in S29 or S30 to the information communication unit 21. If the destination information terminal can send and receive text, the information communication unit 21 transmits the text to that terminal. If it cannot, the information communication unit 21 sends the text to the speech synthesis unit 23 and transmits the resulting speech to the destination terminal (S33 → S34).
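Steps S31 to S34 combine both delay checks with delivery in the form the destination terminal supports. The sketch below models that flow under stated assumptions. All names are hypothetical, and the speech synthesis unit 23 is stood in for by an injected `synthesize` callable. The description does not cover a reply that is complete and needs no conversion, so the sketch sends no pseudo-response in that case.

```python
from typing import Callable, Optional, Union


def dispatch_pseudo_response(
    expression_text: str,
    reply_submitted: bool,
    needs_conversion: bool,
    predicted_s: Optional[float],
    elapsed_s: float,
    threshold_s: float,
    dest_accepts_text: bool,
    synthesize: Callable[[str], bytes],
) -> Optional[Union[str, bytes]]:
    """Return the payload to deliver (S34), or None if no pseudo-response is due."""
    if reply_submitted and needs_conversion:
        # Yes in S31: the finished reply still needs recognition/synthesis,
        # so compare the predicted processing time with the threshold (S32).
        due = predicted_s is not None and predicted_s >= threshold_s
    elif not reply_submitted:
        # No in S31: the reply is still being entered, so compare the time
        # elapsed since reception in S21 with the threshold (S33).
        due = elapsed_s >= threshold_s
    else:
        # Reply submitted and no conversion needed: not described in the
        # source, so this sketch sends nothing.
        due = False
    if not due:
        return None
    # S34: send text if the destination accepts it, otherwise synthesized speech.
    return expression_text if dest_accepts_text else synthesize(expression_text)
```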

In this way, a nodding expression corresponding to new message information can be transmitted based on what was learned in the nodding expression learning process. The timing depends on the elapsed time since reception or on the predicted processing time. Message information between the transmitting user and the receiving user can therefore be exchanged without delay, and smooth real-time communication over a communication channel is achieved.

(Configuration variations)
In the example above, the information terminals 10-1 and 10-2 of the real-time communication system are separate from the communication device 20, the nodding expression learning device 30, and the pseudo-response transmission device 40, and lack their functions. The invention is not limited to this arrangement. For example, the information terminals 10-1 and 10-2 may have the functions of at least one of the communication device 20, the nodding expression learning device 30, and the pseudo-response transmission device 40. The real-time communication system may then consist of the information terminals 10-1 and 10-2 (having the functions of the pseudo-response transmission device 40), the communication device 20, and the nodding expression learning device 30.

In the real-time communication system according to the embodiment, the work may also be split between the terminals and a server device. The functions of the speech recognition unit 22 and the speech synthesis unit 23 of the communication device 20 may be implemented in the information terminals 10-1 and 10-2. The server device would then implement the information communication unit 21 and the information output time storage unit 24 of the communication device 20, together with the nodding expression learning device 30 and the pseudo-response transmission device 40.

The present invention is not limited to the embodiment described above and can be modified in various ways at the implementation stage without departing from its gist. The embodiments may also be combined as appropriate, in which case the combined effects are obtained. Furthermore, the embodiment includes various inventions, and various inventions can be extracted by combining constituent elements selected from those disclosed. For example, some constituent elements may be removed from those shown in the embodiment. If the problem can still be solved and the effect obtained, the configuration without those elements can be extracted as an invention.

The methods described in each embodiment can be implemented as programs (software means) executable by a computer. Such programs can be stored on recording media and distributed, or transmitted and distributed over a communication medium. Examples of such media are magnetic disks (floppy (registered trademark) disks, hard disks, etc.), optical discs (CD-ROM, DVD, MO, etc.), and semiconductor memory (ROM, RAM, flash memory, etc.). The programs stored on the medium side also include a setting program. The setting program configures, in the computer, the software means to be executed, including not only execution programs but also tables and data structures. A computer that implements this device reads the program recorded on the recording medium and, where applicable, builds the software means with the setting program. It then executes the processing described above under the control of this software means. The recording media referred to in this specification are not limited to those used for distribution. They also include storage media such as magnetic disks and semiconductor memory provided inside the computer or in devices connected via a network.

10-1, 10-2: information terminal; 20: communication device; 21: information communication unit; 22: speech recognition unit; 23: speech synthesis unit; 24: information output time storage unit; 30: nodding expression learning device; 31, 41: input analysis unit; 31a, 41a: input state acquisition unit; 31b, 41b: utterance intention estimation unit; 32: nodding expression extraction unit; 33: nodding expression storage unit; 40: pseudo-response transmission device; 42: nodding expression search unit; 43: information output control unit; 43a: determination unit; 43b: transmission unit; 44: synthesis/recognition processing time prediction unit.

Claims

A pseudo-response transmission device that transmits a nodding expression as a pseudo-response to message information transmitted from a sender, the device comprising:
an input state acquisition unit that acquires a state of an input frequency of the message information;
an utterance intention estimation unit that estimates an utterance intention of the sender from the message information;
a nodding expression search unit that searches a storage device for the nodding expression associated with the state of the input frequency acquired by the input state acquisition unit and with the utterance intention estimated by the utterance intention estimation unit, the storage device storing the state of the input frequency of the message information, the utterance intention of the sender, and a nodding expression to the sender in association with one another;
a determination unit that determines whether a delay satisfying a predetermined condition occurs in a response from a receiver of the message information to the sender with respect to the transmitted message information; and
a transmission unit that transmits the retrieved nodding expression to the sender when the determination unit determines that a delay satisfying the predetermined condition occurs.

The pseudo-response transmission device according to claim 1, wherein the message information from the sender is voice information, the device further comprising:
a speech recognition unit that converts the voice information into text by performing speech recognition processing on the voice information; and
a speech synthesis unit that converts the text of the nodding expression transmitted by the transmission unit into voice information by performing speech synthesis processing on that text.

A nodding expression learning device comprising:
an input state acquisition unit that acquires a state of an input frequency of message information from a sender;
an utterance intention estimation unit that estimates, from the message information, an utterance intention of a communication partner of the sender; and
a nodding expression extraction unit that extracts a nodding expression to the communication partner from the message information and stores the extracted nodding expression in a storage device as learning information, in association with the state of the input frequency acquired by the input state acquisition unit and the utterance intention estimated by the utterance intention estimation unit.

An information terminal device comprising:
an input state acquisition unit that acquires a state of an input frequency of received message information;
an utterance intention estimation unit that estimates, from the message information, an utterance intention of a sender of the message information;
a nodding expression search unit that searches a storage device for the nodding expression associated with the state of the input frequency acquired by the input state acquisition unit and with the utterance intention estimated by the utterance intention estimation unit, the storage device storing the state of the input frequency of the message information, the utterance intention of the sender, and a nodding expression to the sender in association with one another;
a determination unit that determines whether a delay satisfying a predetermined condition occurs in a response from a receiver of the message information to the sender with respect to the message information from the sender; and
a transmission unit that transmits the retrieved nodding expression when the determination unit determines that a delay satisfying the predetermined condition occurs.

A communication system having a server device and an information terminal device possessed by a sender of message information, wherein
the server device comprises:
an input state acquisition unit that acquires a state of an input frequency of received message information;
an utterance intention estimation unit that estimates an utterance intention of the sender from the message information;
a nodding expression search unit that searches a storage device for the nodding expression associated with the state of the input frequency acquired by the input state acquisition unit and with the utterance intention estimated by the utterance intention estimation unit, the storage device storing the acquired state of the input frequency, the estimated utterance intention, and a text of a nodding expression to the sender in association with one another;
a determination unit that determines whether a delay satisfying a predetermined condition occurs in a response from a receiver of the message information to the sender with respect to the message information from the sender; and
a transmission unit that transmits the retrieved text of the nodding expression to the sender when the determination unit determines that a delay satisfying the predetermined condition occurs; and
the information terminal device comprises:
a speech recognition unit that, when the message information from the sender is voice information, converts the voice information into text by performing speech recognition processing on it and sends the text to the input state acquisition unit and the utterance intention estimation unit; and
a speech synthesis unit that converts the text of the nodding expression transmitted from the transmission unit into voice information by performing speech synthesis processing on that text.

A pseudo-response transmission method performed by a pseudo-response transmission device, the method comprising:
acquiring a state of an input frequency of message information transmitted from a sender;
estimating an utterance intention of the sender from the message information;
searching a storage device for the nodding expression associated with the acquired state of the input frequency and the estimated utterance intention, the storage device storing the state of the input frequency of the message information, the utterance intention of the sender, and a nodding expression to the sender in association with one another;
determining whether a delay satisfying a predetermined condition occurs in a response from a receiver of the message information to the sender with respect to the transmitted message information; and
transmitting the retrieved nodding expression to the sender when it is determined that a delay satisfying the predetermined condition occurs.

A nodding expression learning method performed by a nodding expression learning device, the method comprising:
acquiring a state of an input frequency of message information from a sender;
estimating, from the message information, an utterance intention of a communication partner of the sender; and
extracting a nodding expression to the communication partner from the message information, and storing the extracted nodding expression in a storage device as learning information in association with the acquired state of the input frequency and the estimated utterance intention.

A pseudo-response transmission program that causes a processor to function as each unit of the pseudo-response transmission device according to claim 1.