JP2018155896A

JP2018155896A - Control method and controller

Info

Publication number: JP2018155896A
Application number: JP2017052316A
Authority: JP
Inventors: Akira Maezawa (前澤 陽); Genichi Tamura (田邑 元一)
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2017-03-17
Filing date: 2017-03-17
Publication date: 2018-10-04
Anticipated expiration: 2037-03-17
Also published as: JP6819383B2

Abstract

PROBLEM TO BE SOLVED: To change the execution of sequence data in accordance with an omission, when an omission occurs during execution along the sequence data.
SOLUTION: An acquisition unit 11 acquires input data from the outside. A storage unit 14 stores sequence data having a branch structure that branches at least between data having an omitted portion and data not having that omitted portion. A detection unit 12 compares the sequence data stored in the storage unit 14 with the acquired input data and detects an omission state in the input data. A selection unit 13 refers to the contents of the storage unit 14 and, according to the detected omission state, selects one of the data in the group of data branched in the sequence data. A control unit 15 controls a voice dialogue robot 20 to perform an operation corresponding to the selected data.
SELECTED DRAWING: Figure 5

Description

The present invention relates to a control method and a control device for controlling execution along sequence data.

One way for a voice dialogue device to converse with a human is for the device to advance the dialogue by speaking in turn according to a predetermined scenario. For example, the dialogue proceeds by repeating a process in which the voice dialogue device asks the user a question and then responds further according to the user's reply. Patent Document 1, for example, discloses a mechanism that controls the dialogue with the user according to a scenario given in advance, based on the result of speech recognition of the user's utterance, generates a response sentence matching the content of the user's utterance as necessary, and applies speech synthesis to either a sentence of the reproduced scenario or the generated response sentence.

Patent Document 1: JP 2004-287016 A

In dialogue between humans, the subject, the object, or the end of a sentence, for example, is often omitted as appropriate. Replying to an utterance that contains such omissions with a response that omits nothing feels odd and unnatural, except in special cases.

Accordingly, an object of the present invention is to change the execution of sequence data in accordance with an omission when an omission occurs during execution along sequence data, for example sequence data in which utterance contents are linked in time series.

To solve the above problem, the present invention provides a control method comprising: a detection step of comparing input data acquired from the outside with sequence data having a branch structure that branches at least between data having an omitted portion and data not having that omitted portion, and detecting an omission state in the input data; and a selection step of selecting, according to the detected omission state, one of the data in the group of data branched in the sequence data.

In the detection step, the position of the omitted portion in the input data may be detected, and in the selection step, the data may be selected according to the detected position of the omitted portion.

In the detection step, the amount of the omitted portion in the input data may be detected, and in the selection step, the data may be selected according to the detected amount of the omitted portion.

The sequence data may include data corresponding to human actions and data corresponding to actions of a device.

In the selection step, data corresponding to an action of the device may be selected according to the detected omission state, and the method may further comprise a control step of controlling the device to perform an action corresponding to the selected data.

The present invention also provides a control device comprising: a detection unit that compares input data acquired from the outside with sequence data having a branch structure that branches at least between data having an omitted portion and data not having that omitted portion, and detects an omission state in the input data; and a selection unit that selects, according to the detected omission state, one of the data in the group of data branched in the sequence data.

The present invention also provides a program for causing a computer to function as: a detection unit that compares input data acquired from the outside with sequence data having a branch structure that branches at least between data having an omitted portion and data not having that omitted portion, and detects an omission state in the input data; and a selection unit that selects, according to the detected omission state, one of the data in the group of data branched in the sequence data.

According to the present invention, when an omission occurs during execution along sequence data, the execution of the sequence data can be changed in accordance with that omission.

FIG. 1 is a block diagram showing the overall configuration of a dialogue control system according to an embodiment of the present invention.
FIG. 2 is a block diagram showing the electrical hardware configuration of the voice dialogue robot.
FIG. 3 is a block diagram showing the electrical hardware configuration of the control device.
FIG. 4 is a diagram showing an example of the sequence data stored in the control device.
FIG. 5 is a block diagram showing the functional configuration of the control device.
FIG. 6 is a flowchart showing the operation of the control device.

[Configuration]
FIG. 1 is a block diagram showing the overall configuration of a dialogue control system according to an embodiment of the present invention. The dialogue control system 1 includes a control device 10, a voice dialogue robot 20, and a communication network 90 that connects the control device 10 and the voice dialogue robot 20 so that they can communicate with each other. The communication network 90 is a network that includes at least one of a wireless section conforming to a wireless communication standard and a wired section conforming to a wired communication standard. The voice dialogue robot 20 is a voice dialogue device that has an external appearance modeled on, for example, a human or an animal, and converses with a human under the control of the control device 10.

FIG. 2 is a diagram illustrating the hardware configuration of the voice dialogue robot 20. The voice dialogue robot 20 is a computer device having a CPU (Central Processing Unit) 201, a ROM (Read Only Memory) 202, a RAM (Random Access Memory) 203, an auxiliary storage device 204, a communication IF 205, a speaker 206, and a microphone 207. The CPU 201 is a processor that performs various computations. The RAM 203 is a volatile memory that serves as a work area when the CPU 201 executes a program. The ROM 202 is a non-volatile memory that stores, for example, the program and data used to start up the voice dialogue robot 20. The auxiliary storage device 204 is a non-volatile storage device such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive), and stores the programs and data used in the voice dialogue robot 20. The CPU 201 controls the operation of the voice dialogue robot 20 by executing this program. The communication IF 205 communicates with the control device 10 via the communication network 90. The speaker 206 is an utterance means that emits sound toward the human in accordance with control data transmitted from the control device 10. The microphone 207 is a sound collection means that picks up the voice of the human conversing with the voice dialogue robot 20.

FIG. 3 is a diagram illustrating the hardware configuration of the control device 10. The control device 10 is a computer device having a CPU 101, a ROM 102, a RAM 103, an auxiliary storage device 104, and a communication IF 105. The CPU 101 is a processor that performs various computations. The RAM 103 is a volatile memory that serves as a work area when the CPU 101 executes a program. The ROM 102 is a non-volatile memory that stores, for example, the program and data used to start up the control device 10. The auxiliary storage device 104 is a non-volatile storage device such as an HDD or an SSD, and stores the programs and data used in the control device 10. The functions shown in FIG. 5, described later, are realized by the CPU 101 executing this program. The communication IF 105 communicates with the voice dialogue robot 20 via the communication network 90.

This embodiment adopts a dialogue method in which the voice dialogue robot 20 advances the dialogue with a human by speaking in turn according to a predetermined scenario. For example, the dialogue proceeds by repeating a process in which the voice dialogue robot 20 asks the human a question and then responds further according to the human's reply.

Compared with the utterances assumed in a scenario prepared in advance, a human's actual utterance may omit some part. If the voice dialogue robot alone keeps speaking exactly as scripted, with nothing omitted, when part of the human's utterance has been omitted, the dialogue between the human and the robot feels odd and unnatural. In this embodiment, therefore, omission states are described in the human utterance parts defined by the scenario. Specifically, using a hidden Markov model, the successive utterances exchanged between the human and the voice dialogue robot are treated as a finite state automaton with probabilistic state transitions, and state transitions are considered in which the robot's utterance branches in multiple ways according to the omission state of the human's utterance.

On this premise, the auxiliary storage device 104 stores sequence data in which utterances of the voice dialogue robot 20 and utterances of the human are linked so that they alternate in time series. This sequence data has a branch structure in which the utterance of the voice dialogue robot branches in multiple ways according to the human's utterance. The branch structure branches at least between data having an omitted portion and data not having an omitted portion.

FIG. 4 is a diagram showing an example of the sequence data stored in the auxiliary storage device 104 of the control device 10. This sequence data is a set of utterance data for the human and the voice dialogue robot. In the sequence data, human utterance data and robot utterance data are associated with each other alternately in time series, forming a so-called link structure, and control of the utterances proceeds in the order of these links. FIG. 4 assumes an example in which the dialogue with the human starts from the utterance "こんにちは" (hello) by the voice dialogue robot 20 and proceeds toward the right. The example of FIG. 4 further assumes that several scenarios, each forming a coherent topic, follow one another in time series: first a scenario for an exchange of greetings between the human and the voice dialogue robot, then a scenario for a conversation about each other's hobbies, and so on. How the scenarios transition over time, for example scenario A branching into scenario B and scenario C and then converging on scenario D, can be designed freely by the designer of the sequence data.

In the example of FIG. 4, several possible human utterances are defined in response to the first utterance "こんにちは" (hello) of the voice dialogue robot 20, such as "こんにちは", "こんちは", "ちは", ... "おう！" (hey!), ... "あれ？久しぶり" (oh, long time no see), and so on. Within the group "こんにちは", "こんちは", and "ちは", the utterance "こんにちは" has no omission, whereas "こんちは" and "ちは" do. The amount omitted in "ちは" is greater than the amount omitted in "こんちは". The human utterance "こんにちは", which has no omission, is associated with the robot utterance "お元気ですか？" (how are you?), which also has no omission. The human utterance "こんちは", which has a small omission, is associated with the robot utterance "元気か？", which has a small omission. The human utterance "ちは！", which has a large omission, is associated with the robot utterance "元気？", which has a large omission.

In this way, the sequence data branches into multiple utterances of the voice dialogue robot 20 that differ in omission state, according to the omission state of the human's utterance. The term "omission state" covers not only whether any part has been omitted but also, when a part has been omitted, how much has been omitted. The amount omitted in the utterance of the voice dialogue robot 20 is defined according to the amount of omission detected in the human's utterance. For example, when the human's utterance omits a lot, an utterance that also omits a lot is defined for the voice dialogue robot 20, and when the human's utterance omits little, an utterance that omits little is defined for the voice dialogue robot 20.
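
The greeting part of FIG. 4 could be held in memory along the following lines. This is only a minimal sketch: the dictionary layout, the `omitted` field (a character count relative to the full form), and the helper `robot_reply` are assumptions made for illustration and are not taken from the patent.

```python
# Hypothetical encoding of the greeting exchange described for FIG. 4.
# "omitted" is the number of characters dropped relative to "こんにちは".
SEQUENCE = {
    "robot": "こんにちは",
    "branches": [
        {"human": "こんにちは", "omitted": 0, "robot": "お元気ですか？"},
        {"human": "こんちは", "omitted": 1, "robot": "元気か？"},
        {"human": "ちは", "omitted": 3, "robot": "元気？"},
    ],
}


def robot_reply(sequence, human_text):
    """Return the robot utterance linked to a human utterance, or None."""
    for branch in sequence["branches"]:
        if branch["human"] == human_text:
            return branch["robot"]
    return None
```

In this layout, a larger omission in the human utterance is linked to a shorter robot utterance, which mirrors the pairing described above.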

When the human replies "こんにちは" (hello) to the first utterance "こんにちは" of the voice dialogue robot 20, the human's utterance has no omission, so it is associated with the robot utterance "お元気ですか？" (how are you?), which also has no omission. In response to this robot utterance "お元気ですか？", several further human utterances are defined, such as "元気です", "元気だ", and "元気！" (all meaning "I'm fine"). Within this group, "元気です" has no omission, whereas "元気だ" and "元気！" do. The amount omitted in "元気！" is greater than the amount omitted in "元気だ".

When the human replies "こんちは" to the first utterance "こんにちは" (hello) of the voice dialogue robot 20, the human's utterance has an omission, so it is associated with the robot utterance "元気か？" (how are you?), which also has an omission. In response to this robot utterance "元気か？", several human utterances are defined, such as "元気です", "元気だ", and "元気！" (all meaning "I'm fine"). Within this group, "元気です" has no omission, whereas "元気だ" and "元気！" do.

In this way, the sequence data has a branch structure that branches according to whether or not the human's utterance has an omitted portion. By controlling the utterances of the voice dialogue robot 20 according to such sequence data, different sequence data is executed depending on the omission state of the human's utterance.

FIG. 5 is a block diagram showing the functional configuration of the control device 10. The control device 10 realizes the functions of an acquisition unit 11, a detection unit 12, a selection unit 13, a storage unit 14, and a control unit 15. The acquisition unit 11 is realized by the communication IF 105 of the control device 10, the detection unit 12 and the selection unit 13 are each realized by the CPU 101 of the control device 10, the storage unit 14 is realized by the auxiliary storage device 104 of the control device 10, and the control unit 15 is realized by the CPU 101 and the communication IF 105 of the control device 10.

The acquisition unit 11 acquires, as input data, the human voice data transmitted from the outside (here, the voice dialogue robot 20). The storage unit 14 stores sequence data, such as that illustrated in FIG. 4, having a branch structure that branches at least between data having an omitted portion and data not having that omitted portion. The detection unit 12 compares the sequence data stored in the storage unit 14 with the input data acquired from the outside and detects the omission state in the input data. The selection unit 13 refers to the contents of the storage unit 14 and, according to the detected omission state, selects one of the utterance data in the group of utterance data branched in the sequence data. The control unit 15 controls the voice dialogue robot 20 to perform an utterance operation corresponding to the selected data.

[Operation]
FIG. 6 is a flowchart showing the operation of the control device 10. The process of FIG. 6 starts when the voice dialogue robot 20 makes the first utterance defined by the sequence data. That is, the first utterance data defined by the sequence data is transmitted from the control device 10 to the voice dialogue robot 20, and the voice dialogue robot 20 speaks by emitting sound from the speaker 206 according to this utterance data. When the human speaks in response to this utterance, the microphone 207 of the voice dialogue robot 20 picks up the sound and transmits it to the control device 10 as voice data.

The communication IF 105 (acquisition unit 11) of the control device 10 acquires the voice data transmitted from the voice dialogue robot 20 as input data (step S11).

The CPU 101 (detection unit 12) of the control device 10 performs speech recognition on the input data acquired from the outside using a predetermined speech recognition algorithm, compares the recognition result with the sequence data stored in the auxiliary storage device 104 (storage unit 14), and detects the omission state in the input data (step S12). For example, in FIG. 4, if the human's utterance is "こんちは", the CPU 101 (detection unit 12) identifies it as "こんにちは" (hello), the human utterance in the sequence data, with "に" omitted.
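
One simple way to realize the comparison in step S12 is sketched below. It assumes speech recognition has already produced a text string and treats an omission as characters of the full form that are missing from the recognized text in order (a subsequence check). The function name `detect_omission` and this character-level matching rule are assumptions for illustration; the patent does not specify the comparison algorithm.

```python
def detect_omission(full, heard):
    """Compare recognized text with the full (non-omitted) form.

    Returns the indices of the characters of `full` that are missing
    from `heard`, or None if `heard` is not an abbreviation of `full`
    (i.e. not an in-order subsequence of it).
    """
    omitted = []
    j = 0  # position in `heard` that still has to be matched
    for i, ch in enumerate(full):
        if j < len(heard) and ch == heard[j]:
            j += 1
        else:
            omitted.append(i)
    return omitted if j == len(heard) else None
```

The returned list gives both aspects of the omission state at once: its length is the amount omitted, and its entries are the omitted positions.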

The CPU 101 (selection unit 13) of the control device 10 refers to the contents of the auxiliary storage device 104 (storage unit 14) and, according to the detected omission state, selects one of the utterance data in the group of utterance data of the voice dialogue robot 20 branched in the sequence data (step S13). For example, in FIG. 4, if the human's utterance is "こんちは", the character "に" has been omitted from "こんにちは" (hello), so the utterance data "元気か？" (how are you?), which is the utterance corresponding to "こんちは" with "に" omitted, is selected.
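
The selection in step S13 can be illustrated by keying the robot's candidate utterances on the amount of omission. In the sketch below, the table `REPLIES_BY_AMOUNT`, the function `select_reply`, and the nearest-amount rule (with ties resolved toward the smaller omission) are all assumptions added for illustration; the patent only states that the data is selected according to the detected omission state.

```python
# Hypothetical table: amount omitted by the human -> robot utterance.
REPLIES_BY_AMOUNT = {
    0: "お元気ですか？",  # no omission
    1: "元気か？",        # small omission
    3: "元気？",          # large omission
}


def select_reply(replies_by_amount, detected_amount):
    """Pick the reply whose omission amount is closest to the detected one.

    Ties are broken in favour of the smaller omission amount.
    """
    best = min(
        replies_by_amount,
        key=lambda amount: (abs(amount - detected_amount), amount),
    )
    return replies_by_amount[best]
```

A nearest-amount rule lets the table stay small even when the detected amount does not match any stored branch exactly.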

The CPU 101 and the communication IF 105 (control unit 15) of the control device 10 control the voice dialogue robot 20 to perform an operation corresponding to the selected utterance data (step S14). That is, the CPU 101 and the communication IF 105 (control unit 15) transmit the selected utterance data to the voice dialogue robot 20, and the voice dialogue robot 20 speaks by emitting sound from the speaker 206 according to this utterance data. When the human then speaks in response to this utterance, the microphone 207 of the voice dialogue robot 20 picks up the sound and transmits it to the control device 10 as voice data. The process then returns to step S11, and steps S12 to S14 are repeated.
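
The repetition of steps S11 to S14 can be sketched as a loop over a nested scenario. The sketch assumes the recognized text of each human turn is supplied as a list, replaces the comparison of step S12 with an exact lookup of the recognized text, and returns the robot's utterances instead of sending them over a network. The names `SCENARIO` and `run_dialogue` and the dictionary layout are hypothetical.

```python
# Hypothetical nested scenario: each node holds the robot's utterance and
# the human utterances that lead to the next node.
SCENARIO = {
    "say": "こんにちは",
    "branches": {
        "こんにちは": {"say": "お元気ですか？", "branches": {}},
        "こんちは": {"say": "元気か？", "branches": {}},
        "ちは": {"say": "元気？", "branches": {}},
    },
}


def run_dialogue(scenario, recognized_inputs):
    """Walk the scenario and return everything the robot would say."""
    node = scenario
    spoken = [node["say"]]                    # first utterance starts the process
    for heard in recognized_inputs:           # S11: acquire input data
        next_node = node["branches"].get(heard)  # S12/S13: match and select
        if next_node is None:
            break                             # no branch defined for this input
        spoken.append(next_node["say"])       # S14: have the robot speak
        node = next_node
    return spoken
```

For example, `run_dialogue(SCENARIO, ["こんちは"])` walks from the opening greeting to the branch with a small omission.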

In the embodiment described above, the human's utterance is compared with the utterances defined in the sequence data, and the utterance of the voice dialogue robot 20 is selected according to the result of the comparison. More specifically, the human's utterance is analyzed to determine which part of the utterance defined in the sequence data has been omitted and how much, and an utterance matching that omission state is selected and output from the voice dialogue robot 20. The utterance of the voice dialogue robot 20 can thus be changed according to the omission state of the human's utterance, so that a natural dialogue between the two is realized.

[Modifications]
The embodiment described above can be modified as follows. The following modifications may also be combined with one another.
[Modification 1]
The sequence data in the present invention only needs to include a branch structure in which data having an omitted portion and data not having that omitted portion branch logically. For example, the sequence data stored in the control device 10 may be sequence data without omitted portions, and the control device 10 may generate, on each occasion and based on this sequence data, a branch structure that branches logically between data having an omitted portion and data not having that omitted portion. In this case, the control device 10 stores the sequence data without omitted portions together with algorithm data for generating such a branch structure on each occasion from that sequence data, so the stored data as a whole can be regarded as data that includes a branch structure branching logically between data having an omitted portion and data not having that omitted portion. The control device 10 compares the input data with the sequence data having a branch structure that branches at least between data having an omitted portion and data not having that omitted portion, detects the omission state in the input data, and selects, according to the detected omission state, one of the data in the group of data branched in the sequence data.
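
As a rough illustration of Modification 1, the omitted forms could be derived on demand from the non-omitted utterance. The sketch below uses a deliberately simple, hypothetical rule: delete one contiguous run of characters and keep candidates of at least `min_len` characters. The function `generate_variants` and this rule are assumptions; the patent does not describe the generation algorithm, and a practical system would need linguistically motivated rules to avoid implausible candidates.

```python
def generate_variants(full, min_len=2):
    """Map each candidate form of `full` to the number of characters omitted.

    Candidates are produced by deleting a single contiguous run of
    characters; the full form itself is included with an amount of 0.
    """
    variants = {full: 0}
    n = len(full)
    for start in range(n):
        for length in range(1, n - start + 1):
            candidate = full[:start] + full[start + length:]
            if len(candidate) >= min_len:
                variants.setdefault(candidate, length)
    return variants
```

With this rule, `generate_variants("こんにちは")` contains both "こんちは" (one character omitted) and "ちは" (three characters omitted), so only the full form needs to be stored.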

[Modification 2]
The sequence data in the present invention is not limited to data indicating the utterance contents of the human and the voice dialogue robot 20 as illustrated in the embodiment, and may be data corresponding to human actions and data corresponding to actions of a device. For example, when a human and a robot perform together on stage in an act or a play, motion data indicating the actions of each, such as the movements, postures, facial expressions, or positions of the human and the robot on stage, may be used as the sequence data. In this case, the sequence data has a branch structure that branches at least between motion data in which part of the human's action is omitted and motion data in which the human's action has no such omission, and one of the motion data in the group of data branched in the sequence data is selected according to the omission state of the human's action. In this case, the device includes a drive unit that moves the device itself in place of the speaker 206 of the embodiment, and a camera for capturing and recognizing the human's actions in place of the microphone 207 of the embodiment.

[Modification 3]
The term "omission state" may also cover the position of the omitted portion in the input data, and the utterance of the voice dialogue robot 20 may be selected according to the position of the omitted portion detected in the human's utterance. For example, the utterance of the voice dialogue robot 20 omits the position that has the same attribute as the position omitted in the human's utterance. A human utterance in which the position corresponding to the subject is omitted is answered with a robot utterance in which the position corresponding to the subject is omitted. Likewise, a human utterance in which the position corresponding to the object is omitted is answered with a robot utterance in which the position corresponding to the object is omitted, and a human utterance in which the position corresponding to the end of the sentence is omitted is answered with a robot utterance in which the position corresponding to the end of the sentence is omitted.
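
Position-based selection as in Modification 3 can be sketched by splitting a robot utterance into labelled parts and dropping the parts whose role matches what the human omitted. The example sentence, the role labels, and the function `render_reply` are hypothetical and chosen only to illustrate the idea; detecting which role was omitted in the human's utterance is outside this sketch.

```python
# Hypothetical robot utterance split into (role, text) parts.
FULL_REPLY = [
    ("subject", "私は"),    # "I"
    ("object", "音楽が"),   # "music"
    ("predicate", "好き"),  # "like"
    ("ending", "です"),     # polite sentence ending
]


def render_reply(parts, omitted_roles):
    """Build the robot utterance, skipping parts whose role was omitted."""
    return "".join(text for role, text in parts if role not in omitted_roles)
```

If the human omitted the subject, `render_reply(FULL_REPLY, {"subject"})` yields a reply without a subject, and `{"ending"}` yields a reply without the sentence ending.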

[Modification 4]
The block diagram of FIG. 5 used in the description of the above embodiment shows blocks in functional units. Each of these functional blocks is realized by any combination of hardware and/or software, and the means for realizing each functional block is not particularly limited. That is, each functional block may be realized by one physically and/or logically coupled device, or by two or more physically and/or logically separate devices that are connected directly and/or indirectly (for example, by wire and/or wirelessly). Accordingly, the control device according to the present invention may be realized by a single device that integrally provides all of the functions, as described in the embodiment, or by a system in which the functions are distributed across a plurality of devices. Further, the order of the processing procedures described in the above embodiment may be changed as long as no contradiction arises. The method described in the embodiment presents the elements of each step in an exemplary order and is not limited to the specific order presented.
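As a rough illustration of why the functional blocks can be placed in one device or spread over several, the following Python sketch wires the acquisition, detection, selection, and control functions together as interchangeable callables. The class name `ControlDevice`, the method `run_once`, and all example data are assumptions made for this sketch; any of the callables could equally be a wrapper around a remote call.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class ControlDevice:
    """Composes the functional blocks; each one may be local or remote."""
    acquire: Callable[[], Sequence[str]]                    # acquisition unit
    detect: Callable[[Sequence[str], Sequence[str]], bool]  # detection unit
    select: Callable[[bool], str]                           # selection unit
    control: Callable[[str], None]                          # control unit
    expected: Sequence[str]                                 # stored sequence data

    def run_once(self) -> str:
        observed = self.acquire()
        omitted = self.detect(self.expected, observed)
        data = self.select(omitted)
        self.control(data)
        return data


if __name__ == "__main__":
    spoken = []  # stands in for the robot's output device
    device = ControlDevice(
        acquire=lambda: ["good", "morning"],
        detect=lambda expected, observed: len(observed) < len(expected),
        select=lambda omitted: "Morning." if omitted else "Good morning to you.",
        control=spoken.append,
        expected=["good", "morning", "to", "you"],
    )
    device.run_once()
    print(spoken)
```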

The present invention can also be implemented in the form of a control method performed by the control device, or a program for causing a computer to function as the control device. Such a program may be provided recorded on a recording medium such as an optical disc, or may be provided by being downloaded to a computer via a communication network such as the Internet and installed so that it can be used.

DESCRIPTION OF SYMBOLS: 1 ... dialogue control system, 10 ... control device, 11 ... acquisition unit, 12 ... detection unit, 13 ... selection unit, 14 ... storage unit, 15 ... control unit, 20 ... voice interactive robot, 90 ... communication network, 101 ... CPU, 102 ... RAM, 103 ... ROM, 104 ... auxiliary storage device, 105 ... communication IF, 201 ... CPU, 202 ... RAM, 203 ... ROM, 204 ... auxiliary storage device, 205 ... communication IF, 206 ... speaker, 207 ... microphone.

Claims

A control method comprising: a detection step of detecting an omission state in input data acquired from the outside by comparing the input data with sequence data having a branching structure that branches at least between data having an omitted part and data having no omitted part; and
a selection step of selecting one item of data from the data group branched in the sequence data according to the detected omission state.

The control method according to claim 1, wherein, in the detection step, the position of the omitted part in the input data is detected, and
in the selection step, the data is selected according to the detected position of the omitted part.

The control method according to claim 1 or 2, wherein, in the detection step, the amount of the omitted part in the input data is detected, and
in the selection step, the data is selected according to the detected amount of the omitted part.

The control method according to any one of claims 1 to 3, wherein the sequence data includes data corresponding to a human operation and data corresponding to a device operation.

The control method according to claim 4, wherein, in the selection step, data corresponding to the operation of the device is selected according to the detected omission state, and
the control method further comprises a control step of controlling the device to perform an operation according to the selected data.

A control device comprising: a detection unit that compares input data acquired from the outside with sequence data having a branching structure that branches at least between data having an omitted part and data having no omitted part, and detects an omission state in the input data; and
a selection unit that selects one item of data from the data group branched in the sequence data according to the detected omission state.