JP6867939B2

JP6867939B2 - Computers, language analysis methods, and programs

Info

Publication number: JP6867939B2
Application number: JP2017243880A
Authority: JP
Inventors: 雄太藤澤; 友春羽角; 恵理川井
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2017-12-20
Filing date: 2017-12-20
Publication date: 2021-05-12
Anticipated expiration: 2037-12-20
Also published as: JP2019109424A

Description

The present invention relates to a computer system that analyzes speech (language) uttered by a user and responds to the user based on the analysis result.

In recent years, technology that uses dialogue devices, such as robots installed in facilities, has attracted attention. A dialogue device acquires information on speech uttered by a user (a voice signal) and identifies the user's utterance intention by analyzing the voice signal. The dialogue device then communicates with the user, or provides a service to the user, according to that utterance intention.

To provide services or communicate appropriately, the user's utterance intention must be identified accurately. Known methods for understanding a user's utterance intention include, for example, the techniques described in Patent Document 1 and Patent Document 2.

Patent Document 1 states: "The voice processing device includes a voice input unit that acquires a voice signal; a voice recognition unit that performs voice recognition on the voice signal acquired by the voice input unit; an intention understanding unit that understands the user's intention based on the recognition result of the voice recognition unit; and a question unit that asks the user a question based on the understanding result of the intention understanding unit. The question unit changes the content of the question to the user according to the understanding result and a predetermined priority."

Patent Document 2 states: "The device includes a sequential analysis processing unit 10 that, each time part of a natural language sentence in an analyzable unit is input, executes analysis processing sequentially and in parallel in each analysis processing unit, and output units 3 and 4 that obtain output such as a dialogue response sentence based on the analysis results of the analysis processing units of the sequential analysis processing unit. Each processing unit provided in the sequential analysis processing unit acquires its own past analysis results (the immediately preceding result or earlier ones) and the past analysis results of the other processing units (likewise), and obtains its analysis result while looking ahead with reference to the acquired results."

Patent Document 1: Japanese Unexamined Patent Publication No. 2017-58545
Patent Document 2: Japanese Unexamined Patent Publication No. 2017-102771

The techniques described in Patent Documents 1 and 2 do not take fillers and pauses in the user's utterance into account. When a user makes an utterance containing fillers and pauses, the dialogue device cannot correctly recognize where the utterance breaks; that is, it cannot determine the voice signal (character string) that forms one processing unit. Consequently, when an utterance containing fillers and pauses is made, a conventional dialogue device cannot accurately identify the user's utterance intention.

The present invention provides a device, a method, and a program that realize language analysis taking fillers and pauses into account, in order to provide appropriate services or communication.

A representative example of the invention disclosed in the present application is as follows. A computer that processes a voice signal corresponding to speech uttered by a user includes an arithmetic device, a storage device connected to the arithmetic device, and a communication interface connected to the arithmetic device. The arithmetic device receives the voice signal via the communication interface; converts the received voice signal into text composed of a plurality of character strings; analyzes the converted text to determine whether it contains an utterance continuation character string, which indicates that the user's utterance is still in progress; if the utterance continuation character string is determined to be contained in the converted text, accumulates the converted text in the storage device; if the utterance continuation character string is determined not to be contained in the converted text, generates an output text using one or more of the converted texts; identifies, based on the output text, the utterance intention of the user who uttered the speech corresponding to the received voice signal; and transmits information indicating the identified utterance intention to a device that responds to the user's utterance.

According to the present invention, language analysis that takes fillers and pauses into account can be realized. The user's utterance intention can therefore be identified appropriately, and services can be provided or communication carried out accordingly. Problems, configurations, and effects other than those described above will become apparent from the following description of the examples.

FIG. 1 is a diagram showing a configuration example of the computer system of Example 1.
FIG. 2 is a diagram showing an example of the hardware configuration of the computer of Example 1.
FIG. 3A is a diagram showing an example of the data structure of the utterance continuation character string information held by the computer of Example 1.
FIG. 3B is a diagram showing an example of the data structure of the utterance continuation character string information held by the computer of Example 1.
FIG. 4 is a diagram showing an example of the data structure of the intention understanding information held by the computer of Example 1.
FIG. 5 is a diagram showing an example of the data structure of the answer generation information held by the computer of Example 1.
FIG. 6 is a flowchart illustrating an example of the processing executed by the text transmission determination unit of Example 1.
FIG. 7A is a sequence diagram showing an example of the flow of processing in the computer system of Example 1.
FIG. 7B is a sequence diagram showing an example of the flow of processing in the computer system of Example 1.
FIG. 8 is a flowchart illustrating an example of the processing executed by the text transmission determination unit of Example 2.

Examples of the present invention are described below with reference to the drawings. However, the present invention is not to be construed as limited to the content of the embodiments described below. Those skilled in the art will readily understand that the specific configuration can be changed without departing from the spirit or gist of the present invention.

In the configurations of the invention described below, identical or similar configurations or functions are denoted by the same reference signs, and redundant descriptions are omitted.

Notations such as "first", "second", and "third" in this specification and elsewhere are used to identify components and do not necessarily limit their number or order.

To make the invention easier to understand, the position, size, shape, range, and the like of each component shown in the drawings may not represent the actual position, size, shape, range, and the like. The present invention is therefore not limited to the positions, sizes, shapes, ranges, and the like disclosed in the drawings.

FIG. 1 is a diagram showing a configuration example of the computer system of Example 1.

The computer system includes a computer 100, a communication device 101, and a dialogue device 102. The computer 100 and the communication device 101 are connected to each other via a network 105. The communication device 101 and the dialogue device 102 are connected to each other via a wireless network (not shown); they may instead be connected via a wired network.

The network 105 is, for example, a LAN (Local Area Network) or a WAN (Wide Area Network), and the connection may be either wireless or wired.

The dialogue device 102 is a device that communicates with a user 103, for example a robot or a tablet terminal. The dialogue device 102 has a voice acquisition device (not shown) that acquires speech uttered by the user 103, a voice output device (not shown) that outputs speech to the user 103, and a network interface (not shown) for communicating with the communication device 101.

The communication device 101 controls communication between the computer 100 and the dialogue device 102, and is, for example, a router or a gateway device.

The computer 100 identifies the utterance intention of the user 103 and generates information (text) for communicating in accordance with that intention. The hardware configuration of the computer 100 is described with reference to FIG. 2. Here, text is data composed of one or more character strings.

The computer 100 has a voice processing unit 110 and a language processing unit 111, and holds utterance continuation character string information 130, intention understanding information 131, and answer generation information 132.

The utterance continuation character string information 130 is information for managing utterance continuation character strings. An utterance continuation character string is a character string used to detect an utterance that contains fillers and pauses. As described later, the computer 100 determines whether the utterance of the user 103 is continuing based on whether an utterance continuation character string is present in the text. The data structure of the utterance continuation character string information 130 is described with reference to FIGS. 3A and 3B.

The intention understanding information 131 is information for identifying the utterance intention of the user 103. Its data structure is described with reference to FIG. 4.

The answer generation information 132 is information for generating an answer to an utterance of the user 103. Its data structure is described with reference to FIG. 5.

The voice processing unit 110 converts a voice signal corresponding to speech uttered by the user 103 into text, and converts text generated by the computer 100 into a voice signal.

The language processing unit 111 identifies the utterance intention of the user 103 based on the result of analyzing the text, and generates an answer text so that the answer to the user 103 can be output as speech from the dialogue device 102. The language processing unit 111 includes a text receiving unit 120, a text transmission determination unit 121, an intention understanding unit 122, and an answer generation unit 123.

The text receiving unit 120 receives text transmitted by the voice processing unit 110 and passes the received text to the text transmission determination unit 121.

When the text transmission determination unit 121 receives text from the text receiving unit 120, it analyzes the received text and determines, based on the utterance continuation character string information 130 and the analysis result, when to transmit text to the intention understanding unit 122. When transmitting text to the intention understanding unit 122, the text transmission determination unit 121 generates an output text corresponding to one complete utterance to be processed by the intention understanding process, and transmits the output text to the intention understanding unit 122.

The intention understanding unit 122 executes an intention understanding process for identifying the utterance intention of the user 103, based on the output text received from the text transmission determination unit 121 and on the intention understanding information 131. As the processing result, the intention understanding unit 122 transmits intention information (see FIG. 4) indicating the utterance intention of the user 103 to the answer generation unit 123.

The answer generation unit 123 refers to the answer generation information 132 based on the intention information transmitted from the intention understanding unit 122, and generates the answer text of the answer to be output by the dialogue device 102.
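The data flow among the four units of the language processing unit 111 can be sketched as below. This is a minimal Python illustration, assuming each stage is a plain callable; the class name `LanguageProcessingUnit`, the method `receive_text`, and the type aliases are hypothetical and are not taken from the patent.

```python
from typing import Callable, Optional

# Hypothetical stage signatures for the units described above.
Determiner = Callable[[str], Optional[str]]  # text -> output text, or None while buffering
IntentFn = Callable[[str], Optional[str]]    # output text -> intention information
AnswerFn = Callable[[str], Optional[str]]    # intention information -> answer text


class LanguageProcessingUnit:
    """Wires text transmission determination, intention understanding,
    and answer generation behind a single text-receiving entry point."""

    def __init__(self, determine: Determiner, understand: IntentFn, answer: AnswerFn):
        self.determine = determine    # stands in for text transmission determination unit 121
        self.understand = understand  # stands in for intention understanding unit 122
        self.answer = answer          # stands in for answer generation unit 123

    def receive_text(self, text: str) -> Optional[str]:
        """Text receiving unit 120: forward recognized text down the pipeline.

        Returns an answer text, or None when no answer is produced yet."""
        output_text = self.determine(text)
        if output_text is None:   # utterance still continuing; keep waiting
            return None
        intent = self.understand(output_text)
        if intent is None:        # no matching intention information
            return None
        return self.answer(intent)
```

In use, the determination stage returns None while the user is still speaking, so intention understanding and answer generation run only once a complete output text is available.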

The computer system of this example outputs an answer (speech) in response to an utterance of the user 103, so that communication takes place between the user 103 and the dialogue device 102. The response to the utterance of the user 103 is not limited to this; various responses are conceivable, such as playing video or music, providing products, and assisting the user's actions.

The functional units may be distributed across a plurality of computers 100. For example, the computer system may consist of a first computer having the voice processing unit 110, the text receiving unit 120, the text transmission determination unit 121, and the intention understanding unit 122, and a second computer having the answer generation unit 123. The information held by the computer 100 may also be stored in a storage system accessible to a plurality of computers.

Among the functional units of the computer 100, a plurality of functional units may be combined into one, or one functional unit may be divided into a plurality of functional units by function.

FIG. 2 is a diagram showing an example of the hardware configuration of the computer 100 of Example 1.

The computer 100 has a processor 200, a memory 201, and a network interface 202, which are connected to one another via an internal bus. The computer 100 may also have a storage device such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive), input devices such as a keyboard, a mouse, and a touch panel, and an output device such as a display.

The processor 200 is an arithmetic device that executes programs stored in the memory 201. By executing processing according to a program, the processor 200 operates as a functional unit (module) that realizes a specific function. In the following description, when processing is described with a functional unit as the subject, this means that the processor 200 is executing the program that realizes that functional unit.

The memory 201 is a storage device that stores the programs executed by the processor 200 and the information those programs use. The memory 201 of this example stores programs that realize the voice processing unit 110 and the language processing unit 111, as well as the utterance continuation character string information 130, the intention understanding information 131, and the answer generation information 132. The memory 201 also includes a work area used by the programs and a buffer for accumulating text.

The network interface 202 is an interface for connecting to external devices via a network.

FIGS. 3A and 3B are diagrams showing examples of the data structure of the utterance continuation character string information 130 held by the computer 100 of Example 1.

In this example, utterance continuation character string information 130 exists for each language. FIG. 3A shows the Japanese utterance continuation character string information 130-1, and FIG. 3B shows the English utterance continuation character string information 130-2.

The utterance continuation character string information 130 includes one or more entries, each consisting of an utterance continuation character string 301 and a position 302.

The utterance continuation character string 301 is a field that stores an utterance continuation character string. The position 302 is a field that stores where, within the speech uttered by the user, the speech corresponding to the utterance continuation character string appears. In this example, the position 302 stores the position (detection range) of the utterance continuation character string within the text converted from the voice signal. Processing that uses the position 302 is described in Example 2.
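A per-language table in the style of FIGS. 3A and 3B could be held in memory as in the sketch below. The two fields mirror the utterance continuation character string 301 and the position 302, but the `Position` encoding and the filler entries ("えーと", "あの", "um", "uh") are assumptions for illustration only; the actual contents of FIGS. 3A and 3B are not reproduced in the text.

```python
from dataclasses import dataclass
from enum import Enum


class Position(Enum):
    # Hypothetical encodings of the detection range stored in position 302.
    END = "end"            # string is checked only at the end of the text
    ANYWHERE = "anywhere"  # string may appear anywhere in the text


@dataclass(frozen=True)
class ContinuationEntry:
    string: str          # utterance continuation character string 301
    position: Position   # position 302 (detection range within the text)


# One table per language (130-1 Japanese, 130-2 English).
# Entries are illustrative placeholders, not the contents of FIGS. 3A/3B.
CONTINUATION_INFO: dict[str, list[ContinuationEntry]] = {
    "ja": [
        ContinuationEntry("えーと", Position.END),
        ContinuationEntry("あの", Position.END),
    ],
    "en": [
        ContinuationEntry("um", Position.END),
        ContinuationEntry("uh", Position.END),
    ],
}


def entries_for(language: str) -> list[ContinuationEntry]:
    """Return the continuation-string table for a language (empty if none)."""
    return CONTINUATION_INFO.get(language, [])
```

Keying the tables by language code lets the determination step pick the table matching the speech recognizer's language before any matching is done.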

In this example, the utterance continuation character string information 130 is assumed to be set in advance. It may be set manually by an administrator or the like, or generated by machine learning on the history of conversations between the user 103 and the dialogue device 102.

FIG. 4 is a diagram showing an example of the data structure of the intention understanding information 131 held by the computer 100 of Example 1.

The intention understanding information 131 includes one or more entries, each consisting of an utterance content 401 and an intention 402.

The utterance content 401 is a field that stores text indicating the content of an utterance. The intention 402 is a field that stores information indicating the utterance intention of a user 103 who made an utterance corresponding to the utterance content 401. In the following description, a value stored in the intention 402 is referred to as intention information.

In this example, the intention understanding information 131 is assumed to be set in advance, manually by an administrator or the like.
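A minimal sketch of a lookup against a FIG. 4 style table, assuming the intention understanding unit 122 matches the output text against the utterance content 401 by exact comparison after light normalization. The patent does not specify the matching method, and the table entries, intention labels, and function names here are hypothetical.

```python
from typing import Optional

# Hypothetical intention understanding information 131:
# utterance content 401 -> intention 402.
INTENTION_INFO: dict[str, str] = {
    "where is the restroom": "ask_restroom_location",
    "what time do you close": "ask_closing_time",
}


def normalize(text: str) -> str:
    """Lower-case and drop surrounding whitespace and trailing punctuation."""
    return text.strip().lower().rstrip("?.! ")


def understand_intention(output_text: str) -> Optional[str]:
    """Return intention information for an output text, or None if unknown.

    Exact matching after normalization is an assumption of this sketch."""
    return INTENTION_INFO.get(normalize(output_text))
```

Exact matching only works if the output text is clean, which is why removing fillers before this step matters; a production system would likely use a more tolerant matcher.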

FIG. 5 is a diagram showing an example of the data structure of the answer generation information 132 held by the computer 100 of Example 1.

The answer generation information 132 includes one or more entries, each consisting of an intention 501 and an answer content 502.

The intention 501 is the same field as the intention 402. The answer content 502 is a field that stores the text of an answer (answer text) that the dialogue device 102 outputs as speech. A plurality of answer texts may be associated with one utterance intention. In that case, an answer text may be selected at random, or selected based on attributes of the user 103, for example.
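Selection among several answer texts could look like the sketch below, which implements only the random-selection option; attribute-based selection would replace `rng.choice` with a rule over the attributes of the user 103. The table contents and the function name `generate_answer` are hypothetical.

```python
import random
from typing import Optional

# Hypothetical answer generation information 132: intention 501 -> answer contents 502.
ANSWER_INFO: dict[str, list[str]] = {
    "ask_restroom_location": [
        "The restroom is at the end of this corridor.",
        "Please go straight ahead; the restroom is on your left.",
    ],
    "ask_closing_time": ["We close at 8 p.m."],
}


def generate_answer(intention: str, rng: Optional[random.Random] = None) -> Optional[str]:
    """Pick one answer text for the intention, at random if several exist."""
    candidates = ANSWER_INFO.get(intention)
    if not candidates:
        return None            # no answer registered for this intention
    rng = rng or random.Random()
    return rng.choice(candidates)
```

Passing a seeded `random.Random` makes the choice reproducible, which is convenient for testing dialogue scripts.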

FIG. 6 is a flowchart illustrating an example of processing executed by the text transmission determination unit 121 of Example 1.

The text transmission determination unit 121 receives text from the text receiving unit 120 (step S101). The received text is temporarily stored in the work area.

Next, the text transmission determination unit 121 performs morphological analysis on the text (step S102). Since a known technique can be used for morphological analysis, a detailed description is omitted.

Next, based on the result of the morphological analysis and the utterance continuation character string information 130, the text transmission determination unit 121 determines whether an utterance continuation character string is present at the end of the text corresponding to the speech uttered by the user 103 (step S103).

Specifically, the text transmission determination unit 121 compares the utterance continuation character string 301 of each entry of the utterance continuation character string information 130 with the character string appearing at the end of the text, and determines whether a character string matching any utterance continuation character string 301 is present at the end of the text.
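Step S103 can be sketched as a suffix comparison on the morpheme sequence from step S102. Comparing tokens rather than raw characters avoids false matches such as treating "forum" as ending with the filler "um". This sketch assumes morphological analysis has already produced a token list; the function name and the token-sequence encoding of continuation strings are assumptions.

```python
from typing import Sequence


def ends_with_continuation(tokens: Sequence[str],
                           continuation_strings: Sequence[Sequence[str]]) -> bool:
    """Step S103: does the tokenized text end with any continuation string?

    `tokens` is the morpheme sequence from step S102. Each continuation string
    is also a token sequence, so multi-word entries such as "you know" match."""
    for cont in continuation_strings:
        n = len(cont)
        if 0 < n <= len(tokens) and list(tokens[-n:]) == list(cont):
            return True
    return False
```

Because the comparison is anchored at the end of the token list, a filler in the middle of a sentence does not by itself mark the utterance as continuing.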

If it is determined that no utterance continuation character string is present at the end of the text, the text transmission determination unit 121 determines that the utterance has ended. It then generates an output text from the text stored in the memory 201 (the work area and the buffer) and transmits the output text to the intention understanding unit 122 (step S108), after which it ends the processing.

Specifically, the text transmission determination unit 121 deletes the utterance continuation character strings from each text stored in the work area and the buffer, and generates the output text by concatenating the texts in chronological order. After the output text is generated, the texts stored in the memory 201 are deleted. The text transmission determination unit 121 may delete the texts when generating the output text, or after the series of processing is completed.

If no text is stored in the buffer, the text transmission determination unit 121 uses the text stored in the work area as the output text.
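Output-text generation in step S108 can be sketched as follows, assuming texts are token lists passed in chronological order (buffered texts first, then the work-area text). The source says the continuation strings are deleted from each text; this sketch removes only a trailing occurrence, which is an assumption, and takes a `sep` argument so that Japanese text can be joined without spaces. Both function names are hypothetical.

```python
from typing import Sequence


def strip_trailing_continuation(tokens: Sequence[str],
                                continuation_strings: Sequence[Sequence[str]]) -> list[str]:
    """Remove one continuation string from the end of a tokenized text, if present."""
    for cont in continuation_strings:
        n = len(cont)
        if 0 < n <= len(tokens) and list(tokens[-n:]) == list(cont):
            return list(tokens[:-n])
    return list(tokens)


def build_output_text(texts: Sequence[Sequence[str]],
                      continuation_strings: Sequence[Sequence[str]],
                      sep: str = " ") -> str:
    """Step S108: strip continuation strings and join texts in arrival order.

    With an empty buffer, `texts` holds only the work-area text, so the
    output text is that text alone."""
    pieces: list[str] = []
    for text in texts:
        pieces.extend(strip_trailing_continuation(text, continuation_strings))
    return sep.join(pieces)
```

For example, the buffered Japanese tokens ["駅", "に", "えーと"] followed by ["行き", "たい"] with `sep=""` would yield "駅に行きたい".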

If it is determined that an utterance continuation character string is present at the end of the text, the text transmission determination unit 121 determines that the utterance is continuing and stores the received text in the buffer (step S104). That is, texts are accumulated in the memory 201 before the intention understanding process is executed on them.

Next, the text transmission determination unit 121 determines whether a timer is running (step S105). The timer of this example measures a waiting time used to adjust when text is output to the intention understanding unit 122.

If it is determined that the timer is not running, the text transmission determination unit 121 starts the timer (step S107) and returns to step S101, where it waits until text is received.

If it is determined that the timer is running, the text transmission determination unit 121 determines whether the waiting time measured by the timer is greater than a threshold (step S106).

If it is determined that the waiting time is equal to or less than the threshold, the text transmission determination unit 121 resets the measured time and continues measuring the waiting time. It then returns to step S101 and waits until the next text is received.

If it is determined that the waiting time is greater than the threshold, the text transmission determination unit 121 determines that the utterance has ended. It then generates an output text from the text stored in the memory 201 (the work area and the buffer), transmits the output text to the intention understanding unit 122 (step S108), and stops the timer, after which it ends the processing.
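The whole FIG. 6 flow, including the timer of steps S105 to S107, can be condensed into a small state machine. This is a sketch under stated assumptions: the class and method names are hypothetical, the clock is injectable so the behavior can be tested, only trailing continuation strings are stripped, and the timer is also stopped on the S103 path, which FIG. 6 does not state explicitly. As in the flowchart, the timer is checked only when a new text arrives, so the sketch never flushes on its own if no further text is received.

```python
import time
from typing import Callable, Optional, Sequence

Tokens = Sequence[str]


class TextTransmissionDeterminer:
    """Sketch of the FIG. 6 flow (steps S101 to S108) for one dialogue session."""

    def __init__(self, continuation_strings: Sequence[Tokens], threshold_s: float,
                 clock: Callable[[], float] = time.monotonic, sep: str = " "):
        self.continuation = [list(c) for c in continuation_strings]
        self.threshold_s = threshold_s
        self.clock = clock                        # injectable for testing
        self.sep = sep
        self.buffer: list[list[str]] = []         # texts awaiting output, in order
        self.timer_start: Optional[float] = None  # None means the timer is stopped

    def _trailing_len(self, tokens: Tokens) -> int:
        """Length of the continuation string at the end of tokens (0 if none)."""
        for cont in self.continuation:
            n = len(cont)
            if 0 < n <= len(tokens) and list(tokens[-n:]) == cont:
                return n
        return 0

    def _flush(self) -> str:
        """S108: strip trailing continuation strings, join, clear memory, stop timer."""
        pieces: list[str] = []
        for text in self.buffer:
            pieces.extend(text[:len(text) - self._trailing_len(text)])
        self.buffer.clear()
        self.timer_start = None
        return self.sep.join(pieces)

    def on_text(self, tokens: Tokens) -> Optional[str]:
        """S101: handle one recognized text; return an output text or None."""
        tokens = list(tokens)
        self.buffer.append(tokens)                     # work area / S104 buffer
        if self._trailing_len(tokens) == 0:            # S103: utterance ended
            return self._flush()                       # S108
        now = self.clock()
        if self.timer_start is None:                   # S105 -> S107: start timer
            self.timer_start = now
            return None
        if now - self.timer_start > self.threshold_s:  # S106: waited too long
            return self._flush()                       # S108
        self.timer_start = now                         # reset and keep measuring
        return None
```

With a 2-second threshold, two filler-terminated texts arriving 3 seconds apart are flushed together when the second arrives, which corresponds to the FIG. 7A sequence in which the output text is built from the two buffered texts.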

Next, the flow of processing in the computer system is described. FIGS. 7A and 7B are sequence diagrams showing an example of the flow of processing in the computer system of Example 1. FIG. 7A shows the flow when an utterance containing an utterance continuation character string is made, and FIG. 7B shows the flow when an utterance containing no utterance continuation character string is made.

First, the flow of processing shown in FIG. 7A is described.

The dialogue device 102 acquires speech uttered by the user 103 and generates a voice signal of that speech. The dialogue device 102 then communicates with the communication device 101 and transmits the voice signal to the computer 100, which is connected via the network 105 (step S201).

The voice processing unit 110 of the computer 100 converts the voice signal transmitted from the dialogue device 102 into text and transmits the text to the text receiving unit 120 (step S202), which passes it on to the text transmission determination unit 121. Here it is assumed that an utterance continuation character string is present at the end of the text.

On receiving the text, the text transmission determination unit 121 executes the process shown in FIG. 6. Because an utterance continuation character string is present at the end of the text, the text transmission determination unit 121 accumulates the received text in the memory 201 (step S203); that is, the text is stored in the buffer. Because the timer has not yet been started, the text transmission determination unit 121 also starts the timer (step S204).
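The end-of-text check that routes a text into the buffer in step S203 can be sketched as follows. The filler set is a hypothetical stand-in for the utterance continuation character string information 130, whose actual entries are not given in this section, and whitespace tokenization is a simplification: Japanese text would first need morphological analysis.

```python
# Hypothetical stand-in for utterance continuation character string information 130.
CONTINUATION_STRINGS = {"um", "uh", "er"}


def ends_with_continuation(text: str) -> bool:
    """Return True if the last word of the text is a registered continuation string."""
    # Drop trailing spaces, commas, and periods, then split on whitespace.
    tokens = text.lower().rstrip(" ,.").split()
    return bool(tokens) and tokens[-1] in CONTINUATION_STRINGS


print(ends_with_continuation("I'd like to go to, um,"))  # True
print(ends_with_continuation("Where is the exit?"))      # False
```

Comparing whole tokens rather than raw suffixes avoids false positives such as "album" matching "um".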

After the waiting time has become greater than the threshold, the dialogue device 102 acquires from the user 103 a new voice corresponding to a text that also ends with an utterance continuation character string, and generates a voice signal of that voice. The dialogue device 102 transmits the voice signal to the computer 100 (step S205).

The voice processing unit 110 converts the received voice signal into text and transmits the text to the text transmission determination unit 121 via the text receiving unit 120 (step S206).

Because an utterance continuation character string is present at the end of the text, the text transmission determination unit 121 accumulates the received text in the memory 201 (step S207). At this point the timer is running and the waiting time is greater than the threshold, so the text transmission determination unit 121 generates an output text from the two texts stored in the buffer and transmits the output text to the intention understanding unit 122 (step S208).

If no utterance continuation character string is present at the end of the text, the text transmission determination unit 121 generates the output text from the texts stored in the work area and in the buffer.
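One way to assemble the output text from the buffer and the work area, following the rule stated in the claims (delete the continuation strings from multiple texts and combine them; use a single text as-is), is sketched below. The filler set, the function names, and the choice to join texts with a space are assumptions for illustration.

```python
from typing import List, Optional

# Hypothetical continuation strings; the real list lives in information 130.
CONTINUATION_STRINGS = {"um", "uh", "er"}


def strip_continuation(text: str) -> str:
    """Remove continuation strings; punctuation attached to words is dropped too."""
    words = (w.strip(",.") for w in text.split())
    return " ".join(w for w in words if w and w.lower() not in CONTINUATION_STRINGS)


def build_output_text(buffer: List[str], work_area: Optional[str]) -> str:
    """Combine buffered texts (oldest first) with the work-area text, if any."""
    texts = list(buffer) + ([work_area] if work_area else [])
    if len(texts) == 1:
        return texts[0]  # a single text becomes the output text unchanged
    cleaned = (strip_continuation(t) for t in texts)
    return " ".join(c for c in cleaned if c)


print(build_output_text(["I'd like to go to, um,"], "the station."))
# I'd like to go to the station
```

Buffered texts are placed before the work-area text on the assumption that the buffer holds the earlier fragments of the utterance.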

On receiving the output text, the intention understanding unit 122 executes the intention understanding process (step S209).

In the intention understanding process, the intention understanding unit 122 searches the intention understanding information 131 for an entry whose utterance content 401 matches the output text, and acquires the value stored in the intention 402 of that entry as the processing result. The intention understanding unit 122 may also use information other than the intention understanding information 131, such as a similarity dictionary.
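The table lookup described above can be sketched as an exact match on normalized text. The table rows, the normalization step, and the word-level treatment of the similarity dictionary are all hypothetical; the section says only that an entry whose utterance content 401 matches the output text is searched for, optionally with extra resources.

```python
from typing import Dict, Optional

# Hypothetical rows of intention understanding information 131:
# utterance content 401 -> intention 402.
INTENT_TABLE: Dict[str, str] = {
    "where is the station": "ask_station_location",
    "i'd like to go to the station": "ask_station_location",
    "what time is it": "ask_time",
}


def normalize(text: str) -> str:
    """Lower-case, trim surrounding punctuation, and collapse whitespace."""
    return " ".join(text.lower().strip(" ?!.,").split())


def understand_intent(output_text: str,
                      similar: Optional[Dict[str, str]] = None) -> Optional[str]:
    """Return the intention of the matching entry, or None if nothing matches."""
    key = normalize(output_text)
    if key in INTENT_TABLE:
        return INTENT_TABLE[key]
    if similar:
        # Fallback: map words to canonical forms, then retry the exact match.
        key = " ".join(similar.get(w, w) for w in key.split())
        return INTENT_TABLE.get(key)
    return None


print(understand_intent("Where is the station?"))                      # ask_station_location
print(understand_intent("Where is the depot?", {"depot": "station"}))  # ask_station_location
print(understand_intent("Tell me a joke"))                             # None
```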

The intention understanding unit 122 transmits the intention information to the answer generation unit 123 (step S210).

On receiving the intention information, the answer generation unit 123 executes the answer generation process (step S211).

In the answer generation process, the answer generation unit 123 refers to the answer generation information 132, searches for an entry whose intention 501 matches the intention information, and acquires the answer text stored in the answer content 502 of that entry.
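The answer lookup mirrors the intention lookup: a key match from intention 501 to answer content 502. The table contents and the fallback answer returned when no entry matches are assumptions; this section does not say what happens when the search finds nothing.

```python
from typing import Dict, Optional

# Hypothetical rows of answer generation information 132:
# intention 501 -> answer content 502.
ANSWER_TABLE: Dict[str, str] = {
    "ask_station_location": "The station is a five-minute walk to the north.",
    "ask_time": "It is just past noon.",
}

# Assumed fallback when no entry matches; the section does not specify one.
FALLBACK_ANSWER = "Sorry, could you say that again?"


def generate_answer(intent: Optional[str]) -> str:
    """Look up the answer text whose intention matches the given intention."""
    if intent is None:
        return FALLBACK_ANSWER
    return ANSWER_TABLE.get(intent, FALLBACK_ANSWER)


print(generate_answer("ask_time"))  # It is just past noon.
print(generate_answer("unknown"))   # Sorry, could you say that again?
```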

The answer generation unit 123 transmits the answer text to the voice processing unit 110 (step S212).

The voice processing unit 110 converts the answer text into a voice signal and transmits the voice signal to the dialogue device 102 via the network 105 (step S213).

As shown in FIG. 7A, when an utterance continuation character string, which is used to detect utterances containing fillers and pauses, is present at the end of a text, the computer 100 determines that the utterance is still continuing and accumulates the text in the memory 201 (buffer). When it detects the end of the utterance, the computer 100 generates, from the one or more texts stored in the memory 201 (work area and buffer), an output text that serves as the processing unit of the intention understanding process.
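Combining the two paths of FIGS. 7A and 7B, the behavior of the text transmission determination unit 121 over a stream of recognized texts can be sketched as follows. Assumptions: whitespace tokens, a hypothetical filler set and threshold, timestamps passed in with each text, the timer being stopped whenever an output text is sent, and continuation strings left in the output (the claims describe deleting them, which is omitted here for brevity).

```python
from typing import Iterable, List, Optional, Tuple

FILLERS = {"um", "uh", "er"}  # hypothetical continuation strings
THRESHOLD_S = 3.0             # hypothetical waiting-time threshold, seconds


def ends_with_filler(text: str) -> bool:
    tokens = text.lower().rstrip(" ,.").split()
    return bool(tokens) and tokens[-1] in FILLERS


def segment(events: Iterable[Tuple[float, str]],
            threshold_s: float = THRESHOLD_S) -> Tuple[List[str], List[str]]:
    """Group (timestamp, text) pairs into output texts; also return unsent texts."""
    outputs: List[str] = []
    buffer: List[str] = []
    timer_start: Optional[float] = None  # None: timer is not running
    for t, text in events:
        if not ends_with_filler(text):
            # FIG. 7B path: send the buffered texts plus this text at once.
            outputs.append(" ".join(buffer + [text]))
            buffer, timer_start = [], None
            continue
        buffer.append(text)  # utterance continues: accumulate
        if timer_start is None:
            timer_start = t
        elif t - timer_start <= threshold_s:
            timer_start = t  # reset the waiting-time measurement
        else:
            outputs.append(" ".join(buffer))  # FIG. 7A: wait exceeded threshold
            buffer, timer_start = [], None
    return outputs, buffer


print(segment([(0.0, "Where is, um,"), (1.0, "the station?")]))
# (['Where is, um, the station?'], [])
```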

By generating the output text, the processing unit of the intention understanding process, with fillers and pauses taken into account, the computer 100 can accurately identify the utterance intention of the user 103 and can therefore generate an answer that matches that intention.

Next, the processing flow shown in FIG. 7B will be described.

The dialogue device 102 acquires the voice uttered by the user 103 and generates a voice signal of that voice. The dialogue device 102 then communicates with the communication device 101 and transmits the voice signal to the computer 100, which is connected via the network 105 (step S301).

The voice processing unit 110 of the computer 100 converts the voice signal transmitted from the dialogue device 102 into text and transmits the text to the text receiving unit 120 (step S302), which passes it on to the text transmission determination unit 121. Here it is assumed that no utterance continuation character string is present at the end of the text.

On receiving the text, the text transmission determination unit 121 executes the process shown in FIG. 6. Because no utterance continuation character string is present at the end of the text, the text transmission determination unit 121 uses the text stored in the work area as the output text and transmits it to the intention understanding unit 122 (step S303).

On receiving the output text, the intention understanding unit 122 executes the intention understanding process (step S304) and transmits the intention information to the answer generation unit 123 (step S305).

On receiving the intention information, the answer generation unit 123 executes the answer generation process (step S306) and transmits the answer text to the voice processing unit 110 (step S307).

The voice processing unit 110 converts the answer text into a voice signal and transmits the voice signal to the dialogue device 102 via the network 105 (step S308).

As shown in FIG. 7B, when no utterance continuation character string is present at the end of the text, the computer 100 follows the same processing procedure as the prior art.

In this embodiment, a buffer is provided as the storage area in which texts are accumulated. Alternatively, multiple texts may be stored in the work area, in which case the buffer can be omitted.

According to the first embodiment, even when the user makes an utterance containing fillers and pauses, the computer 100 can generate an output text corresponding to a single coherent utterance, which is the processing unit of the intention understanding process. By executing the intention understanding process with this output text as input, the computer 100 can correctly identify the utterance intention of the user 103, and the dialogue device 102 can therefore output an appropriate answer (voice) that matches that intention.

In the second embodiment, part of the processing executed by the text transmission determination unit 121 differs. The second embodiment is described below, focusing on the differences from the first embodiment.

The configuration of the computer system of the second embodiment is the same as that of the first embodiment. The hardware and software configurations of the computer 100, and the data structures of the information it holds, are also the same as in the first embodiment.

FIG. 8 is a flowchart illustrating an example of the processing executed by the text transmission determination unit 121 of the second embodiment.

The processes of steps S101 and S102 are the same as in the first embodiment.

After executing step S102, the text transmission determination unit 121 determines whether the text includes an utterance continuation character string (step S151).

Specifically, the text transmission determination unit 121 searches the text for utterance continuation character strings based on the result of the morphological analysis and the utterance continuation character strings 301 registered in the utterance continuation character string information 130.
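The search in step S151 can be sketched as a scan over the morpheme sequence produced by morphological analysis. The English token list below stands in for an analyzer's output (for Japanese, a tool such as MeCab would produce it), and the pattern list is a hypothetical version of the utterance continuation character strings 301. Each hit records its start index, which the position check in step S152 needs.

```python
from typing import List, Sequence, Tuple

# Hypothetical continuation strings 301, each split into morphemes.
PATTERNS: List[Tuple[str, ...]] = [("um",), ("you", "know")]


def find_continuations(morphemes: Sequence[str]) -> List[Tuple[int, Tuple[str, ...]]]:
    """Return (start index, pattern) for every continuation string in the text."""
    hits = []
    for i in range(len(morphemes)):
        for pattern in PATTERNS:
            if tuple(morphemes[i:i + len(pattern)]) == pattern:
                hits.append((i, pattern))
    return hits


tokens = ["i", "want", "um", "a", "coffee", "you", "know"]
print(find_continuations(tokens))  # [(2, ('um',)), (5, ('you', 'know'))]
```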

If the text is determined not to include an utterance continuation character string, the text transmission determination unit 121 generates an output text from the texts stored in the memory 201 (work area and buffer) and transmits the output text to the intention understanding unit 122 (step S108). The text transmission determination unit 121 then ends the process.

If the text is determined to include an utterance continuation character string, the text transmission determination unit 121 determines whether that string lies within the detection range (step S152).

Specifically, the text transmission determination unit 121 identifies the position of the utterance continuation character string in the text, reads the value of the position 302 of the entry corresponding to the utterance continuation character string found in step S151, and determines whether the position of that string in the text lies within the detection range set in the position 302 of the entry.
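This section does not specify how the detection range in position 302 is encoded, so the sketch below assumes one possible encoding: an inclusive range of relative end positions, where 1.0 means the continuation string ends at the last morpheme. The table, the function names, and the `hits` input (start index and pattern pairs from the step S151 search) are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pattern = Tuple[str, ...]

# Hypothetical position 302 values: an inclusive range of relative positions,
# where 1.0 means the continuation string ends at the last morpheme.
DETECTION_RANGES: Dict[Pattern, Tuple[float, float]] = {
    ("um",): (0.0, 1.0),          # a filler counts wherever it appears
    ("you", "know"): (0.8, 1.0),  # counts only near the end of the text
}


def in_detection_range(morphemes: Sequence[str], start: int, pattern: Pattern) -> bool:
    """Step S152 for one hit: is the string's end position inside its range?"""
    rel = (start + len(pattern)) / len(morphemes)  # relative end position, in (0, 1]
    lo, hi = DETECTION_RANGES[pattern]
    return lo <= rel <= hi


def should_buffer(morphemes: Sequence[str], hits: List[Tuple[int, Pattern]]) -> bool:
    """True: proceed to step S104 (buffer). False: send the output text (S108)."""
    return any(in_detection_range(morphemes, s, p) for s, p in hits)


m = ["i", "want", "coffee", "you", "know"]
print(should_buffer(m, [(3, ("you", "know"))]))  # True
```

A per-entry range like this is one way to realize the language-dependent criteria mentioned for the second embodiment, since each language can register its own strings and ranges.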

If the utterance continuation character string is determined not to lie within the detection range, the text transmission determination unit 121 generates an output text from the texts stored in the memory 201 (work area and buffer) and transmits the output text to the intention understanding unit 122 (step S108). The text transmission determination unit 121 then ends the process.

If the utterance continuation character string is determined to lie within the detection range, the text transmission determination unit 121 proceeds to step S104. The processes of steps S104 to S108 are the same as in the first embodiment.

The processing flow of the computer system of the second embodiment is the same as that of the first embodiment.

According to the second embodiment, adjusting the text output timing based on both the utterance continuation character string and the position at which it appears allows the utterance intention of the user 103 to be identified more accurately. The determination criteria can also be adjusted according to the language.

The present invention is not limited to the embodiments described above and includes various modifications. For example, the above embodiments are described in detail to explain the present invention clearly, and the invention is not necessarily limited to one that includes all of the described configurations. Part of the configuration of each embodiment can also be added to, deleted, or replaced with another configuration.

Each of the above configurations, functions, processing units, processing means, and the like may be realized partly or entirely in hardware, for example by designing them as an integrated circuit. The present invention can also be realized by software program code that implements the functions of the embodiments. In this case, a storage medium on which the program code is recorded is provided to a computer, and a processor of the computer reads the program code stored on the storage medium. The program code read from the storage medium itself then realizes the functions of the embodiments described above, and the program code and the storage medium storing it constitute the present invention. Storage media for supplying such program code include, for example, a flexible disk, CD-ROM, DVD-ROM, hard disk, SSD (Solid State Drive), optical disc, magneto-optical disk, CD-R, magnetic tape, non-volatile memory card, and ROM.

The program code that implements the functions described in this embodiment can be written in a wide range of programming or scripting languages, such as assembler, C/C++, Perl, shell, PHP, and Java (registered trademark).

Furthermore, the program code of the software that implements the functions of the embodiments may be distributed over a network and stored in a storage means such as the hard disk or memory of a computer, or on a storage medium such as a CD-RW or CD-R, and a processor of the computer may read and execute the program code stored in that storage means or storage medium.

In the embodiments described above, the control lines and information lines shown are those considered necessary for the explanation; not all control lines and information lines in a product are necessarily shown. All of the components may be interconnected.

100 Computer
101 Communication device
102 Dialogue device
103 User
105 Network
110 Voice processing unit
111 Language processing unit
120 Text receiving unit
121 Text transmission determination unit
122 Intention understanding unit
123 Answer generation unit
130 Utterance continuation character string information
131 Intention understanding information
132 Answer generation information
200 Processor
201 Memory
202 Network interface

Claims

A computer that processes a voice signal corresponding to a voice uttered by a user,
the computer comprising an arithmetic unit, a storage device connected to the arithmetic unit, and a communication interface connected to the arithmetic unit, wherein
the arithmetic unit:
receives the voice signal via the communication interface;
converts the received voice signal into a text composed of a plurality of character strings;
analyzes the converted text to determine whether the converted text includes an utterance continuation character string indicating that the user's utterance is continuing;
accumulates the converted text in the storage device when it determines that the converted text includes the utterance continuation character string;
generates an output text using one or more of the converted texts when it determines that the converted text does not include the utterance continuation character string;
identifies, based on the output text, the utterance intention of the user who uttered the voice corresponding to the received voice signal; and
transmits information indicating the identified utterance intention of the user to a device that responds to the user's utterance.

The computer according to claim 1, wherein
the computer holds utterance continuation character string information for managing the utterance continuation character string, and
the arithmetic unit:
refers to the utterance continuation character string information to determine whether the utterance continuation character string is present at the end of the converted text; and
accumulates the converted text in the storage device when it determines that the utterance continuation character string is present at the end of the converted text.

The computer according to claim 1, wherein
the computer holds utterance continuation character string information for managing the utterance continuation character string,
the utterance continuation character string information includes a plurality of entries, each consisting of an utterance continuation character string and the appearance position of that string in a text, and
the arithmetic unit:
refers to the utterance continuation character string information to determine whether the converted text includes the utterance continuation character string;
when it determines that the converted text includes the utterance continuation character string, determines whether the position of that string in the converted text matches the appearance position set in the entry corresponding to that string; and
accumulates the converted text in the storage device when it determines that the position of that string in the converted text matches the appearance position set in the corresponding entry.

The computer according to claim 1, wherein
the arithmetic unit:
after accumulating the converted text in the storage device, determines whether a timer for measuring a waiting time is running;
starts the timer when it determines that the timer is not running;
determines whether the waiting time is greater than a threshold when it determines that the timer is running;
initializes the waiting time and continues measuring the waiting time when it determines that the waiting time is equal to or less than the threshold; and
generates the output text when it determines that the waiting time is greater than the threshold.

The computer according to claim 1, wherein
the arithmetic unit:
when a plurality of the converted texts are stored in the storage device, generates the output text by deleting the utterance continuation character strings from the plurality of converted texts and combining the results; and
when one converted text is stored in the storage device, generates that converted text as the output text.

A language analysis method executed by a computer that processes a voice signal corresponding to a voice uttered by a user,
the computer including an arithmetic unit, a storage device connected to the arithmetic unit, and a communication interface connected to the arithmetic unit,
the language analysis method comprising:
a first step in which the arithmetic unit receives the voice signal via the communication interface and converts the received voice signal into a text composed of a plurality of character strings;
a second step in which the arithmetic unit analyzes the converted text to determine whether the converted text includes an utterance continuation character string indicating that the user's utterance is continuing;
a third step in which, when the arithmetic unit determines that the converted text includes the utterance continuation character string, the arithmetic unit accumulates the converted text in the storage device;
a fourth step in which, when the arithmetic unit determines that the converted text does not include the utterance continuation character string, the arithmetic unit generates an output text using one or more of the converted texts;
a fifth step in which the arithmetic unit identifies, based on the output text, the utterance intention of the user who uttered the voice corresponding to the received voice signal; and
a sixth step in which the arithmetic unit transmits information indicating the identified utterance intention of the user to a device that responds to the user's utterance.

The language analysis method according to claim 6, wherein
the computer holds utterance continuation character string information for managing the utterance continuation character string,
the second step includes a step in which the arithmetic unit refers to the utterance continuation character string information to determine whether the utterance continuation character string is present at the end of the converted text, and
in the third step, the arithmetic unit accumulates the converted text in the storage device when it determines that the utterance continuation character string is present at the end of the converted text.

The language analysis method according to claim 6, wherein
the computer holds utterance continuation character string information for managing the utterance continuation character string,
the utterance continuation character string information includes a plurality of entries, each consisting of an utterance continuation character string and the appearance position of that string in a text,
the second step includes:
a step in which the arithmetic unit refers to the utterance continuation character string information to determine whether the converted text includes the utterance continuation character string; and
a step in which, when the arithmetic unit determines that the converted text includes the utterance continuation character string, the arithmetic unit determines whether the position of that string in the converted text matches the appearance position set in the entry corresponding to that string, and
in the third step, the arithmetic unit accumulates the converted text in the storage device when it determines that the position of that string in the converted text matches the appearance position set in the corresponding entry.

The language analysis method according to claim 6, wherein
the third step includes:
a step in which, after accumulating the converted text in the storage device, the arithmetic unit determines whether a timer for measuring a waiting time is running;
a step in which the arithmetic unit starts the timer when it determines that the timer is not running;
a step in which the arithmetic unit determines whether the waiting time is greater than a threshold when it determines that the timer is running;
a step in which the arithmetic unit initializes the waiting time and continues measuring the waiting time when it determines that the waiting time is equal to or less than the threshold; and
a step in which the arithmetic unit generates the output text when it determines that the waiting time is greater than the threshold.

The language analysis method according to claim 6, wherein
the fourth step includes:
a step in which, when a plurality of the converted texts are stored in the storage device, the arithmetic unit generates the output text by deleting the utterance continuation character strings from the plurality of converted texts and combining the results; and
a step in which, when one converted text is stored in the storage device, the arithmetic unit generates that converted text as the output text.

A program that causes a computer to process a voice signal corresponding to a voice uttered by a user,
the computer including an arithmetic unit, a storage device connected to the arithmetic unit, and a communication interface connected to the arithmetic unit,
the program causing the computer to execute:
a first procedure of receiving the voice signal via the communication interface and converting the received voice signal into a text composed of a plurality of character strings;
a second procedure of analyzing the converted text to determine whether the converted text includes an utterance continuation character string indicating that the user's utterance is continuing;
a third procedure of accumulating the converted text in the storage device when it is determined that the converted text includes the utterance continuation character string;
a fourth procedure of generating an output text using one or more of the converted texts when it is determined that the converted text does not include the utterance continuation character string;
a fifth procedure of identifying, based on the output text, the utterance intention of the user who uttered the voice corresponding to the received voice signal; and
a sixth procedure of transmitting information indicating the identified utterance intention of the user to a device that responds to the user's utterance.

The program according to claim 11, wherein
the computer holds utterance continuation character string information for managing the utterance continuation character string,
the second procedure includes a procedure of referring to the utterance continuation character string information to determine whether the utterance continuation character string is present at the end of the converted text, and
the third procedure accumulates the converted text in the storage device when it is determined that the utterance continuation character string is present at the end of the converted text.

The program according to claim 11, wherein
the computer holds utterance continuation character string information for managing the utterance continuation character string,
the utterance continuation character string information includes a plurality of entries, each consisting of an utterance continuation character string and the appearance position of that string in a text,
the second procedure includes:
a procedure of referring to the utterance continuation character string information to determine whether the converted text includes the utterance continuation character string; and
a procedure of determining, when it is determined that the converted text includes the utterance continuation character string, whether the position of that string in the converted text matches the appearance position set in the entry corresponding to that string, and
the third procedure accumulates the converted text in the storage device when it is determined that the position of that string in the converted text matches the appearance position set in the corresponding entry.

The program according to claim 11, wherein
the third procedure includes:
a procedure of determining, after the converted text is accumulated in the storage device, whether a timer for measuring a waiting time is running;
a procedure of starting the timer when it is determined that the timer is not running;
a procedure of determining whether the waiting time is greater than a threshold when it is determined that the timer is running;
a procedure of initializing the waiting time and continuing to measure the waiting time when it is determined that the waiting time is equal to or less than the threshold; and
a procedure of generating the output text when it is determined that the waiting time is greater than the threshold.

The program according to claim 11, wherein
the fourth procedure includes:
a procedure of generating, when a plurality of the converted texts are stored in the storage device, the output text by deleting the utterance continuation character strings from the plurality of converted texts and combining the results; and
a procedure of generating, when one converted text is stored in the storage device, that converted text as the output text.