WO2022249222A1 - Dialogue device, dialogue method, and program - Google Patents
- Publication number: WO2022249222A1 (PCT/JP2021/019516)
- Authority: WIPO (PCT)
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
Definitions
- This invention relates to technology for interacting with humans using natural language.
- Dialog systems that use natural language to interact with humans are becoming more common.
- Dialogue systems are generally divided into task-oriented dialogue systems for accomplishing a given task (hereinafter also called "task dialogue systems") and non-task-oriented dialogue systems whose purpose is the interaction itself (generally also called "chat dialogue systems").
- There are various techniques for constructing a dialogue system, but in many cases either a scenario method or an example method is used.
- the scenario method is a technique mainly used in task dialogue systems.
- a scenario for achieving the purpose of dialogue is prepared in advance, and the dialogue system executes dialogue with the user according to the scenario.
- for example, the purpose of the dialogue may be to instruct the user about the tax return documents to be submitted so that the user can submit them properly.
- an expert with specialized knowledge creates a scenario. Therefore, it is often called an expert system (see, for example, Non-Patent Document 1).
- the example method is a technique mainly used in chat dialogue systems.
- simple utterance-response rules called examples ("when the user says this, the system replies like this") are prepared in advance, and the dialogue system interacts with the user by uttering responses to the user's utterances according to those rules.
- examples are prepared, for instance, by generating them automatically from conversations held on a social networking service (SNS), or by having multiple users create them while pretending to be a specific character (see, for example, Non-Patent Document 2).
- Dialogue systems that perform tasks requiring expert knowledge, such as expert systems, rely on the scenario method in which scenarios are manually created by experts, so they are very costly to build. Moreover, building a dialogue system that executes multiple tasks simultaneously requires appropriately combining scenarios created by multiple experts, so it costs even more than building a single-task dialogue system.
- in view of the technical problems described above, the purpose of this invention is to construct a dialogue system for accomplishing a given task at low cost.
- a dialogue apparatus includes: an example storage unit that stores a plurality of examples, each composed of an utterance sentence, a response sentence, and situation information; a selection rule storage unit that stores selection rules, each consisting of a dialogue state, the situation information usable in that dialogue state, and the transition-destination dialogue state when an example with that situation information is selected; an utterance reception unit that receives a user utterance uttered by a user; an example selection unit that uses the selection rules to select, from the plurality of examples, a selected example whose situation information corresponds to the situation information usable in the current dialogue state and whose utterance sentence corresponds to the user utterance; and an utterance presentation unit that presents to the user a system utterance based on the response sentence included in the selected example.
- a dialogue system for accomplishing a given task can be constructed at low cost.
- FIG. 1 is a diagram illustrating the functional configuration of the interactive device of the first embodiment.
- FIG. 2 is a diagram illustrating the processing procedure of the interaction method of the first embodiment.
- FIG. 3 is a diagram illustrating the functional configuration of the interactive device of the second embodiment.
- FIG. 4 is a diagram illustrating the processing procedure of the interaction method of the second embodiment.
- FIG. 5 is a diagram illustrating the functional configuration of a computer.
- the first embodiment of the present invention is a dialogue device and method that can perform various tasks simultaneously and can be constructed at low cost without requiring the labor of specialists.
- the present invention solves the above-mentioned problems by introducing two elemental technologies: (1) collection of examples with situation information provided by non-experts, and (2) situation-dependent response selection through dialogue control.
- if many non-experts each create examples about content they can answer with confidence (that is, topics on which their knowledge approaches expertise), a database that collectively gathers expert knowledge can be built. As a result, a dialogue system can be constructed at a lower cost than when an expert with specialized knowledge creates a scenario.
- Dialogue control is a technique used in slot-value task dialogue systems and is not normally used in example-based dialogue systems.
- to enable this dialogue control, we attach an additional attribute called situation information to the collected examples.
- the dialogue device 1 of the first embodiment includes, for example, an example storage unit 10-1, a dialogue state storage unit 10-2, a selection rule storage unit 10-3, an example collection unit 11, an utterance reception unit 12 , a dialogue state acquisition unit 13 , an example selection unit 14 , a dialogue state update unit 15 , and an utterance presentation unit 16 .
- the dialogue device 1 may include a speech recognition section 17 and a speech synthesis section 18 .
- the interaction method of the first embodiment is realized by the interaction device 1 executing the processing of each step shown in FIG.
- the dialogue device is, for example, a special device configured by loading a special program into a publicly known or dedicated computer having a central processing unit (CPU: Central Processing Unit), a main storage device (RAM: Random Access Memory), and the like.
- the dialogue device executes each process under the control of, for example, the central processing unit. Data input to the dialogue device and data obtained in each process are stored, for example, in the main storage device, and the stored data are read out to the central processing unit as necessary and used for other processing. At least part of the processing units of the dialogue device may be configured by hardware such as an integrated circuit.
- each storage unit provided in the dialogue device can be configured by, for example, a main storage device such as RAM (Random Access Memory), an auxiliary storage device composed of a hard disk, an optical disc, or a semiconductor memory device such as flash memory, or by middleware such as a relational database or a key-value store.
- the plurality of storage units included in the dialogue device may be implemented as physically separate storage devices, or by logically dividing a single storage device into multiple areas.
- the dialogue device 1 receives as input text representing the contents of user utterances, and outputs text representing the contents of system utterances for responding to the user utterances, thereby executing a dialogue with the user who is the dialogue partner.
- the dialogue executed by the dialogue device 1 may be text-based or speech-based.
- a dialogue screen displayed on a display unit such as a display provided in the dialogue device 1 is used to execute dialogue between the user and the dialogue device 1 .
- the display unit may be installed in the housing of the interactive device 1, or may be installed outside the housing of the interactive device 1 and connected to the interactive device 1 via a wired or wireless interface.
- the dialogue screen includes at least an input area for inputting user utterances and a display area for presenting system utterances.
- the dialogue screen may include a history area for displaying the history of the dialogue from the start of the dialogue to the present, or the history area may also serve as a display area.
- the user inputs text representing the contents of the user's utterance into the input area of the interactive screen.
- the dialogue device 1 displays text representing the content of the system utterance in the display area of the dialogue screen.
- when executing dialogue based on speech, the dialogue device 1 further includes a speech recognition unit 17 and a speech synthesis unit 18.
- the dialogue device 1 also has a microphone and a speaker (not shown).
- the microphone and speaker may be installed in the housing of the interactive device 1, or may be installed outside the housing of the interactive device 1 and connected to the interactive device 1 via a wired or wireless interface.
- the microphone and speaker may be installed in an android modeled after a human, or a robot modeled after an animal or a fictional character.
- an android or a robot may be provided with the speech recognition unit 17 and the speech synthesis unit 18, and the interactive device 1 may be configured to input/output text representing the contents of user utterances or system utterances.
- the microphone picks up an utterance uttered by the user and outputs a sound representing the content of the user's utterance.
- the voice recognition unit 17 receives as input a voice representing the content of the user's utterance, and outputs text representing the content of the user's utterance, which is the result of voice recognition of the voice.
- a text representing the content of the user's utterance is input to the utterance reception unit 12 .
- the text representing the content of the system utterance output by the utterance presentation unit 16 is input to the speech synthesis unit 18 .
- the speech synthesizing unit 18 receives a text representing the content of the system utterance, and outputs a voice representing the content of the system utterance obtained as a result of voice synthesis of the text.
- the speaker emits sound representing the content of the system utterance.
- a plurality of examples input by a plurality of example registrants are stored in the example storage unit 10-1.
- An example consists of an utterance sentence assumed to be uttered by a user, a response sentence for the system to respond to the utterance, and at least one piece of situation information corresponding to the combination of the utterance sentence and the response sentence.
- the status information is information representing the category of the topic being discussed in the current dialogue, such as "tourist information" or "administrative procedures".
- the situation information set in the example by the example registrant may be selected from predefined situation information, or may be arbitrarily created by the example registrant.
- the example registrant may be an expert with expertise or a non-expert without expertise.
- data can be collected using a website (see Non-Patent Document 2).
- a non-expert posts a combination of an utterance sentence representing the content of a user utterance, a response sentence representing the content of the system utterance responding to it, and situation information indicating the situation in which that exchange takes place.
- Non-Patent Document 2 describes many example registrants creating examples while pretending to be a specific character, but this is not a required configuration in the present invention. An example registrant may create examples without pretending to be a specific character, and examples created while pretending to be a character may be freely mixed with examples created without doing so.
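As a minimal sketch of the example format just described, each example pairs an assumed user utterance and a system response with at least one situation label. The field names and sample data below are illustrative only, not taken from the patent:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Example:
    """One example: an assumed user utterance, the system's response,
    and at least one situation label attached by the registrant."""
    utterance: str
    response: str
    situations: tuple  # e.g. ("tourist information",) or ("administrative procedure",)

# A tiny store of examples, as might be posted by non-expert registrants.
EXAMPLES = [
    Example("Hello", "What is your business?", ("greeting",)),
    Example("Is a resident card necessary for final tax return?",
            "If you have a My Number card, you do not need a resident card.",
            ("administrative procedure",)),
]
```

Keeping the situation labels as a tuple reflects that an example may carry more than one piece of situation information.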
- the dialogue state is information representing the current state of the dialogue, determined from the dialogue performed between its start and the most recent utterance. In practice, it is set by the dialogue state update unit 15, described later, when the immediately preceding system utterance is presented.
- the initial value of the dialogue state may be arbitrarily set from the situation information set in any of the examples stored in the example storage unit 10-1.
- alternatively, "dialogue start" may be set as the initial value. In this case, a formal example whose situation information is set to "dialogue start" is stored in advance in the example storage unit 10-1.
- Predefined selection rules are stored in the selection rule storage unit 10-3.
- a selection rule expresses the correspondence between a dialogue state, the situation information usable in that dialogue state, and the dialogue state to transition to when an example with that situation information is selected.
- for example, a selection rule defines that (1) in some dialogue state X, an example with situation information Y or Z can be selected, and (2) if example A with situation information Z is selected, the dialogue transitions to another dialogue state W or remains in dialogue state X.
- regarding (1), for example, when the dialogue state is "tourist information", the selectable situation information is "tourist information", "history", or "shrine".
- the selection rule for each piece of situation information (which dialogue state to transition to when it is selected, which situation information can be selected in the destination dialogue state, and so on) is also added manually.
- if example sentences whose situation information is set to "chat" are collected so that chat can be inserted during execution of a task, a selection rule is stored in advance so that, for example, when the dialogue state is "administrative procedure" or "tourist information", an example whose situation information is "chat" can also be selected and used as the response sentence.
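One minimal way to encode such selection rules is a table mapping each dialogue state to its selectable situation information and the corresponding transition destination. The state and label names here are only illustrative:

```python
# For each dialogue state: {selectable situation label: transition-destination state}.
SELECTION_RULES = {
    "dialogue start": {
        "greeting": "administrative procedure",  # after greeting, move to the task
    },
    "administrative procedure": {
        "administrative procedure": "administrative procedure",  # stay in the task
        "chat": "administrative procedure",  # chat may be inserted mid-task
        "window guidance": "window guidance",
    },
}

def selectable_situations(state):
    """Situation information usable in the given dialogue state."""
    return set(SELECTION_RULES.get(state, {}))

def transition(state, situation):
    """Destination dialogue state when an example with `situation` is selected."""
    return SELECTION_RULES[state][situation]
```

This is just one encoding; the patent leaves the storage format of the selection rule storage unit 10-3 open.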
- in step S11, the example collection unit 11 receives examples input by example registrants and stores them in the example storage unit 10-1.
- in step S12, the utterance reception unit 12 receives the text representing the content of the user utterance input to the dialogue device 1 (or output by the speech recognition unit 17), and outputs it to the dialogue state acquisition unit 13.
- in step S13, the dialogue state acquisition unit 13 receives the text representing the content of the user utterance from the utterance reception unit 12, and acquires, as the dialogue state at the time the text was received, the dialogue state stored in the dialogue state storage unit 10-2. The acquired dialogue state and the text representing the content of the user utterance are output to the example selection unit 14.
- in step S14, the example selection unit 14 receives the dialogue state and the text representing the content of the user utterance from the dialogue state acquisition unit 13, acquires from the example storage unit 10-1 an example for responding to the user utterance (hereinafter, the "selected example"), and outputs the acquired selected example to the dialogue state update unit 15.
- the example selection unit 14 acquires situation information that can be used in the current dialogue state based on the selection rule stored in the selection rule storage unit 10-3.
- the example selection unit 14 searches for examples stored in the example storage unit 10-1 based on the text representing the content of the user's utterance and the situation information that can be used in the current dialogue state.
- for example, when the user utterance is a question, examples containing a response sentence that answers that question are retrieved.
- a well-known method may be used as a search method.
- search for an example that has an uttered sentence that is highly similar to the content of the user's utterance.
- a well-known search method may also be used to search for an example having an uttered sentence with a high degree of similarity to the content of the user's utterance.
- the example selection unit 14 computes, for each retrieved example, a response selection score representing its suitability as a response, based on the search score (the degree of match with the search conditions), the correspondence between the utterance sentence and the response sentence set in the example, and so on. The example selection unit 14 then acquires the example with the highest response selection score as the selected example.
- a selection rule may define that, in dialogue state X, examples with situation information Y or Z can be selected, but this is just an example; the selection may also be weighted. For instance, the response selection score can be weighted by multiplying it by 0.8 for examples with situation information Y and by 0.2 for examples with situation information Z, and the example with the highest weighted score is acquired. Specifically, when the dialogue state is "sightseeing information" and the selectable situation information is "sightseeing information" or "shrine", a weight such as 0.8 may be set for examples with "sightseeing information" and 0.2 for examples with "shrine".
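The weighted selection above can be sketched as follows. The token-overlap similarity is a crude stand-in for whatever retrieval method is actually used, and all names are hypothetical:

```python
def similarity(a, b):
    """Token-overlap similarity; a placeholder for a real retrieval score."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / max(len(ta | tb), 1)

def select_example(user_utterance, examples, weights):
    """Among examples whose situation information carries a weight in the
    current dialogue state (e.g. 0.8 for "sightseeing information", 0.2 for
    "shrine"), return the one with the highest weighted selection score."""
    best, best_score = None, -1.0
    for ex in examples:
        usable = [s for s in ex["situations"] if s in weights]
        if not usable:
            continue  # situation not selectable in the current dialogue state
        score = max(weights[s] for s in usable) * similarity(user_utterance, ex["utterance"])
        if score > best_score:
            best, best_score = ex, score
    return best
```

An example carrying a highly weighted situation label thus wins even against a somewhat more similar example from a lightly weighted topic.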
- in step S15, the dialogue state update unit 15 receives the selected example from the example selection unit 14; if the selection rule used to select it specifies a dialogue state transition, the unit updates the dialogue state stored in the dialogue state storage unit 10-2. It then outputs the response sentence included in the selected example to the utterance presentation unit 16.
- a new dialog state is set according to selection rules based on the current dialog state and context information contained in the selection example.
- for example, if the situation information included in the selected example is "administrative procedure", the dialogue state stored in the dialogue state storage unit 10-2 is updated to "administrative procedure" in accordance with a selection rule whose transition destination for that case is set to "administrative procedure". If the current dialogue state is "administrative procedure" and the situation information of the selected example is also "administrative procedure", then, in accordance with a selection rule whose transition destination is set to "administrative procedure", the dialogue state remains "administrative procedure" (the dialogue state is not updated).
- if the current dialogue state is "greeting" and the situation information included in the selected example is "administrative procedure", the dialogue state stored in the dialogue state storage unit 10-2 is updated to "administrative procedure" in accordance with a selection rule whose transition destination for that case is set to "administrative procedure". Alternatively, priorities may be assigned to situation information, and when the current dialogue state differs from the situation information of the selected example received from the example selection unit 14, the one with the higher priority may become the new dialogue state. For example, when the dialogue state is "greeting" and both "chat" and "administrative procedure" are selectable as situation information, setting the priority of "administrative procedure" higher than that of "chat" makes it easier to update the dialogue state to "administrative procedure".
- the progress of the dialogue can thereby be controlled so that the example selection unit 14 more easily selects an "administrative procedure" example than a "chat" example as the next utterance.
- for examples such as "self-introduction" and "greeting", which are assumed to be uttered only once per dialogue, the priority can be lowered after the first selection so that they are not selected from the second time onward.
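The update step just described might be sketched like this; the priority scheme and all names are assumptions, not the patent's exact mechanism:

```python
def update_state(current_state, situation, transitions, priorities):
    """Follow an explicit transition rule if one exists; otherwise keep
    whichever of the current state and the selected example's situation
    information has the higher priority."""
    dest = transitions.get((current_state, situation))
    if dest is not None:
        return dest
    if priorities.get(situation, 0) > priorities.get(current_state, 0):
        return situation
    return current_state

def demote_after_use(priorities, situation, once_only=("greeting", "self-introduction")):
    """Lower the priority of once-per-dialogue situations after first use,
    so examples like greetings are not selected a second time."""
    if situation in once_only:
        priorities = dict(priorities)
        priorities[situation] = float("-inf")
    return priorities
```

Returning a fresh priorities dict keeps each dialogue's adjustments independent of the shared configuration.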
- in step S16, the utterance presentation unit 16 receives the response sentence from the dialogue state update unit 15 and presents it to the user, by a predetermined method, as text representing the content of the system utterance.
- the text representing the content of the system utterance is output to the display section of the dialogue device 1 .
- when the dialogue is executed on a speech basis, the text representing the content of the system utterance is input to the speech synthesis unit 18, and the speech representing the content of the system utterance output by the speech synthesis unit 18 is reproduced from a predetermined speaker.
- in step S100, the dialogue device 1 determines whether the current dialogue has ended. If it has ended (YES), processing terminates and the device waits for the next dialogue to start. If it has not ended (NO), the process returns to step S12 to receive the next user utterance.
- the end of dialogue may be determined by checking whether the current state is a predefined end state. Predefined end states include, for example, a state in which the situation information has been updated to "end", and a state in which the user or the system has uttered a predetermined closing phrase such as "That's all" or "Thank you very much".
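A minimal sketch of this termination check; the end state and closing phrases are illustrative placeholders:

```python
END_STATES = {"end"}
CLOSING_PHRASES = {"that's all", "thank you very much"}

def dialogue_ended(state, last_utterance):
    """Step S100 check: the dialogue ends when the state is a predefined
    end state or the last utterance is a predetermined closing phrase."""
    normalized = last_utterance.strip(" .!").lower()
    return state in END_STATES or normalized in CLOSING_PHRASES
```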
- Example 1: user utterance "Is a resident card necessary for final tax return?"; system response "If you have a My Number card, you do not need a resident card."
- when the current dialogue state is set to "dialogue start" as the initial value and the dialogue device passively waits for a user utterance, the device remains in the "dialogue start" state until the user speaks. If a selection rule allows examples whose situation information is "greeting" to be used when the dialogue state is "dialogue start", the dialogue device selects Example 4, which carries the situation information "greeting", and outputs the system utterance "What is your business?" to the user. If a selection rule further specifies that selecting a "greeting" example in the "dialogue start" state transitions the dialogue state to "administrative procedure", the dialogue state transitions to "administrative procedure".
- as in Example 5, in an example that assumes the dialogue device spontaneously speaks to the user, the utterance sentence need not be set.
- Example 5 is registered in advance as a formal example used for transitioning the state of dialogue in dialogue control.
- when the current dialogue state is set to "dialogue start" as the initial value and the dialogue device speaks spontaneously, Example 5, to which the situation information "dialogue start" is given, is selected, and the system utterance "What is your business?" is output to the user.
- if a selection rule specifies that the dialogue state transitions to "administrative procedure" when the current dialogue state is "dialogue start" and the situation information is "administrative procedure", then, since "administrative procedure" is also given as situation information of the selected example (Example 5), the dialogue state transitions to "administrative procedure".
- a selection rule may also be defined so that an example with the situation information "chat" can be selected, allowing the conversation to follow the topic of the user's utterance.
- the flow of task dialogue without inserting chat is as follows.
- User: "Hello" (dialogue state: dialogue start)
- System: (responds with an example whose situation information is "greeting") (situation information: greeting; dialogue state: transitions to waiting for dialogue)
- User: "Where is my number card?" (dialogue state: administrative procedure)
- System: (responds with an example whose situation information is "window guidance") (situation information: window guidance; dialogue state: transitions to window guidance)
- the flow of task dialogue when inserting a chat is as follows.
- User: "Hello" (dialogue state: dialogue start)
- System: (responds with an example whose situation information is "greeting") (situation information: greeting; dialogue state: transitions to waiting for dialogue)
- User: (makes a chat utterance) (dialogue state: chat)
- System: (responds with an example whose situation information is "chat") (situation information: chat; dialogue state: transitions to chat)
- while the dialogue state is "chat", examples whose situation information is "chat" are given a greater weight, so that such examples are more likely to be selected.
- the second embodiment of the present invention is a dialogue device and method capable of rephrasing the system utterances presented by the dialogue device 1 of the first embodiment into utterances impersonating a specific character.
- the dialogue device 2 of the second embodiment includes the example storage unit 10-1, dialogue state storage unit 10-2, selection rule storage unit 10-3, example collection unit 11, utterance reception unit 12, dialogue state acquisition unit 13, example selection unit 14, dialogue state update unit 15, and utterance presentation unit 16 of the first embodiment, and further includes an utterance conversion unit 21.
- the dialogue device 2 may include a speech recognition section 17 and a speech synthesis section 18, as in the first embodiment.
- the interaction method of the second embodiment is realized by the interaction device 2 executing the processing of each step shown in FIG.
- in step S11-2, the example collection unit 11 receives conversion examples input by example registrants and stores them in the example storage unit 10-1.
- a conversion example is an example for converting an utterance sentence into a paraphrased version of that utterance sentence.
- a paraphrased utterance sentence is, for example, the utterance sentence as it would be uttered while pretending to be a specific character.
- a conversion example consists of an utterance before conversion (that is, an utterance that an existing dialogue system can present), an utterance after conversion (that is, the pre-conversion utterance as a specific character would utter it), and situation information indicating the target character, such as "paraphrasing of <specific character>".
- in step S21, the utterance conversion unit 21 receives the response sentence from the dialogue state update unit 15, converts it into a paraphrased response sentence using the conversion examples stored in the example storage unit 10-1, and outputs the converted response sentence to the utterance presentation unit 16.
- the utterance presentation unit 16 of the second embodiment receives the converted response sentence from the utterance conversion unit 21 and presents it to the user, by a predetermined method, as text representing the content of the system utterance.
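A deliberately simple sketch of step S21: a table lookup over conversion examples. A real system would generalize beyond exact matches (e.g. with a learned paraphraser), and the character name in the usage below is invented:

```python
def convert_response(response, conversion_examples, character):
    """Convert a response sentence into its character-style paraphrase using
    conversion examples (pre-conversion text, post-conversion text, character).
    Falls back to the original response when no conversion example matches."""
    for pre, post, char in conversion_examples:
        if char == character and pre == response:
            return post
    return response
```

The fallback ensures the dialogue never stalls when no paraphrase has been registered for a given response.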
- the embodiments above describe a configuration in which the dialogue device includes the dialogue state storage unit 10-2 and the dialogue state acquisition unit 13 acquires the current dialogue state by reading it from the dialogue state storage unit 10-2. However, the dialogue state acquisition unit 13 may instead estimate the dialogue state based on the progress of the dialogue; in that case, the dialogue device need not include the dialogue state storage unit 10-2 or the dialogue state update unit 15. For example, in a slot-value dialogue system that executes a task by analyzing the contents of user utterances and filling values into predefined slots, the current state can be estimated from how the slot values are filled.
- if there is no example corresponding to the current combination of slot values, the device may output the example corresponding to the most similar combination, or make an utterance asking the user about the slot that differs between the most similar combination and the current one, confirm and update or add its content, and then try again to output the example corresponding to the combination. For example, if the slot values are filled like "asking: where, purpose: toilet", the dialogue state can be presumed to be "office guidance", and the corresponding response "The toilet is on the east side of each floor" can be selected. Among the predetermined slots, a response corresponding to already-filled slot values may be selected, or a response sentence asking about the content of a not-yet-filled slot may be selected.
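The slot-based state estimation in this modification could be sketched as follows; the patterns and slot names are invented for illustration:

```python
def estimate_state(slots, state_patterns):
    """Estimate the dialogue state from which slot values are filled,
    instead of reading it from a dialogue state storage unit: return the
    state whose slot pattern matches the most filled values."""
    best_state, best_matches = None, -1
    for state, pattern in state_patterns.items():
        matches = sum(1 for k, v in pattern.items() if slots.get(k) == v)
        if matches > best_matches:
            best_state, best_matches = state, matches
    return best_state
```

Counting matched slots rather than requiring an exact match also supports the "most similar combination" fallback described above.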
- a dialogue system for accomplishing a predetermined task can be constructed at low cost.
- by adding a situation-information attribute to examples and defining the situation information available in each dialogue state, it is possible to realize a scenario-like series of dialogues while still using examples.
- Scenario-based dialogue systems require dialogue control to appropriately combine multiple scenarios corresponding to each task created by experts.
- a dialogue system that can execute multiple tasks at the same time can be easily realized.
- Computer-readable recording media are, for example, non-transitory recording media such as magnetic recording devices and optical discs.
- this program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM on which the program is recorded.
- the program may be distributed by storing the program in the storage device of the server computer and transferring the program from the server computer to other computers via the network.
- a computer that executes such a program first stores, in the auxiliary storage unit 1050 that is its own non-transitory storage device, the program recorded on a portable recording medium or transferred from a server computer. When executing a process, the computer reads the program from the auxiliary storage unit 1050 into the storage unit 1020, which is a temporary storage device, and executes processing according to the read program. Alternatively, the computer may read the program directly from the portable recording medium and execute processing according to it, or it may sequentially execute processing according to the received program each time a portion of the program is transferred from the server computer.
- the above processing may also be executed by a so-called ASP (Application Service Provider) type service, which realizes the processing functions solely through execution instructions and result acquisition, without transferring the program from the server computer to the computer.
- the program in this embodiment includes information that is provided for processing by a computer and conforms to a program (such as data that is not a direct instruction to the computer but has the property of prescribing the computer's processing).
- in the embodiments described above, the device is configured by executing a predetermined program on a computer, but at least part of the processing may instead be implemented in hardware.
Abstract
Description
The first embodiment of the present invention is a dialogue device, and a corresponding method, that can execute various tasks simultaneously and can be built at low cost without requiring expert labor. The present invention solves the above problems by introducing two element technologies: (1) collection of examples annotated with situation information by non-experts, and (2) response selection according to the situation through dialogue control. By having many non-experts create examples covering content that each can answer with confidence (that is, content for which each has partial, near-expert knowledge), a database in which expert knowledge is collected as a whole can be constructed. This makes it possible to build a dialogue system at a lower cost than would be required for experts with specialized knowledge to create scenarios. In addition, dialogue control technology is introduced so that dialogue equivalent to the scenario method can be realized using examples, which are normally used for chat dialogue. Dialogue control is a technique used in slot-value task dialogue systems and is not normally used in example-based dialogue systems. To introduce dialogue control, an additional attribute called situation information is attached to the collected examples. This makes it possible, in an example-based dialogue system, to realize a scenario-like dialogue flow and highly accurate responses according to the situation.
In the following, assuming a task dialogue system that can selectively execute the two tasks "tourist guidance" and "administrative procedures", a concrete example of the dialogue realized by the dialogue device 1 of the first embodiment is described.
Example 1: User utterance: "Is a certificate of residence required for a final tax return?"
System response: "If you have a My Number card, a certificate of residence is not required. However, if you do not have a My Number card, you will need a certificate of residence or a certified copy of your family register with your My Number written on it."
Situation information: administrative procedures / final tax return
Example 2: User utterance: "Where can I get a My Number card issued?"
System response: "It can be issued at the general reception desk."
Situation information: administrative procedures / My Number
Example 3: User utterance: "Is there a local specialty?"
System response: "Keihanna has magnificent nature and delicious air."
Situation information: tourist guidance / specialty
Example 4: User utterance: "Hello"
System response: "Hello, how may I help you today?"
Situation information: greeting
Example 5: User utterance: "" (empty)
System response: "How may I help you?"
Situation information: administrative procedures / dialogue start
Example 6: User utterance: "Nothing in particular"
System utterance: "Understood. Thank you for using our service."
Situation information: end
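The examples above, each carrying a situation-information attribute, could be stored as plain records. The following is a minimal sketch (the field names and lookup logic are assumptions for illustration, not the patented implementation):

```python
# Hypothetical record layout for examples annotated with situation
# information, mirroring Examples 1-6 above. Situation information is
# written as "task/topic", with a bare label for task-independent cases.
EXAMPLES = [
    {"user": "Where can I get a My Number card issued?",
     "response": "It can be issued at the general reception desk.",
     "situation": "administrative procedures/My Number"},
    {"user": "Is there a local specialty?",
     "response": "Keihanna has magnificent nature and delicious air.",
     "situation": "tourist guidance/specialty"},
    {"user": "Hello",
     "response": "Hello, how may I help you today?",
     "situation": "greeting"},
]

def candidates(available_situations):
    """Return the examples whose situation information is available
    in the current dialogue state, matching either the full label or
    its task prefix."""
    return [e for e in EXAMPLES
            if e["situation"] in available_situations
            or e["situation"].split("/")[0] in available_situations]
```

With this layout, restricting selection to a dialogue state reduces to filtering the example list by the state's available situation information.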
User: "Hello" (dialogue state: dialogue start)
System: (responds with an example whose situation information is "greeting") (situation information: greeting; dialogue state: transitions to dialogue standby)
User: "Where is the My Number card?" (dialogue state: administrative procedures)
System: (responds with an example whose situation information is "desk guidance") (situation information: desk guidance; dialogue state: transitions to desk guidance)
User: "Where is the certificate of residence? What about the tax payment certificate?" (dialogue state: administrative procedures)
System: (responds with an example whose situation information is "desk guidance")
For example, when the dialogue state is "administrative procedures" or "desk guidance", weighting examples whose situation information is "administrative procedures" or "desk guidance" more heavily makes those examples more likely to be selected.
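One way the weighting just described could be realized is sketched below. This is an assumption-laden illustration: the word-overlap similarity and the weight value are stand-ins, since the specification does not fix a particular scoring function.

```python
# Hypothetical weighted example selection: examples whose situation
# information matches the current dialogue state receive a higher
# weight, so they are more likely to be selected.
def score(example, user_utterance, dialogue_state, weight=2.0):
    # Crude utterance similarity: word overlap between the user
    # utterance and the example's utterance sentence.
    overlap = len(set(user_utterance.lower().split())
                  & set(example["user"].lower().split()))
    # Boost examples whose situation information matches the state.
    w = weight if example["situation"].startswith(dialogue_state) else 1.0
    return overlap * w

def select(examples, user_utterance, dialogue_state):
    """Pick the highest-scoring example for the current state."""
    return max(examples, key=lambda e: score(e, user_utterance, dialogue_state))
```

Because the weight multiplies rather than filters, examples from other situations can still win when their utterance similarity is much higher, which matches the "more likely to be selected" behavior described above.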
User: Hello (dialogue state: dialogue start)
System: returns an example whose situation information is "greeting" (situation information: greeting; dialogue state: transitions to dialogue standby)
User: The weather is nice today. (dialogue state: chat)
System: responds with an example whose situation information is "chat" (situation information: chat; dialogue state: transitions to chat)
For example, when the dialogue state is "chat", setting a larger weight for examples whose situation information is "chat" makes such examples more likely to be selected.
The second embodiment of the present invention is a dialogue device, and a corresponding method, that can present the system utterances produced by the dialogue device 1 of the first embodiment after rephrasing them into utterances in the voice of a specific character. As shown in Fig. 3, the dialogue device 2 of the second embodiment includes the example storage unit 10-1, dialogue state storage unit 10-2, selection rule storage unit 10-3, example collection unit 11, utterance reception unit 12, dialogue state acquisition unit 13, example selection unit 14, dialogue state update unit 15, and utterance presentation unit 16 of the dialogue device 1 of the first embodiment, and further includes an utterance conversion unit 21. As in the first embodiment, the dialogue device 2 may also include a speech recognition unit 17 and a speech synthesis unit 18. The dialogue method of the second embodiment is realized by the dialogue device 2 executing the processing of each step shown in Fig. 4.
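A minimal sketch of how an utterance conversion unit like unit 21 might rephrase responses using conversion examples is given below. The substitution strategy is an assumption for illustration: the specification states only that conversion examples pair pre- and post-conversion sentences with a character, not how matching is performed.

```python
# Hypothetical character-style conversion using conversion examples:
# each conversion example maps a pre-conversion phrase to a
# post-conversion phrase for a given character.
CONVERSION_EXAMPLES = [
    {"character": "robot", "before": "How may I help you?",
     "after": "BEEP. State your request, human."},
    {"character": "robot", "before": "Thank you for using our service.",
     "after": "BEEP. Session terminated. Farewell."},
]

def convert(response, character):
    """Rewrite any part of the response matching a conversion example
    for the given character; text without a match passes through
    unchanged."""
    for ex in CONVERSION_EXAMPLES:
        if ex["character"] == character and ex["before"] in response:
            response = response.replace(ex["before"], ex["after"])
    return response
```

In this sketch the conversion sits between example selection and utterance presentation, leaving the first embodiment's selection logic untouched.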
In the above embodiments, a configuration was described in which the dialogue device includes the dialogue state storage unit 10-2 and the dialogue state acquisition unit 13 acquires the current dialogue state by reading the dialogue state stored in the dialogue state storage unit 10-2. However, the dialogue state acquisition unit 13 may instead be configured to estimate the dialogue state from the progress of the dialogue or the like. In this case, the dialogue device need not include the dialogue state storage unit 10-2 or the dialogue state update unit 15. For example, in a slot-value dialogue system that executes a task by analyzing the content of user utterances and filling values into predefined slots, the next state can be estimated from which slot values have been filled. For the estimation, language understanding based on sequence labeling, such as conditional random fields (CRF) or neural networks (NN), can be used. In this method, for an input sentence such as "Where can I get a My Number card issued?", it is estimated which part of the sentence corresponds to which slot. Specifically, the estimate takes the form "(where: what is being asked) can I get (My Number card: object) (issued: action)?". The estimated values '"where": what is being asked, "My Number": object, "issue": action' are then entered into the slot values. If an example corresponding to the combination '"where": what is being asked, "My Number": object, "issue": action' exists in the example storage unit 10-1, that example is output as the selected example. If no example corresponds to the combination, the system either outputs the example corresponding to the most similar combination, or makes an utterance asking the user about the slot that differs between the most similar combination and the current one, updates or adds the slot content, and tries again to output an example corresponding to the combination. For example, if the slot values are filled as "what is being asked: where, purpose: toilet", the dialogue state can be estimated to be "office guidance", and the corresponding response "The toilets are on the east side of each floor" can be selected. Alternatively, among the predefined slots, a response corresponding to slot values that are already filled may be selected, or a response sentence asking about the content of a slot that is not yet filled may be selected.
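The slot extraction step above can be sketched with a simple stand-in for the labeler. A CRF or neural sequence labeler would tag input spans with slot labels; here a keyword lookup plays that role purely for illustration, and the lexicon entries are hypothetical:

```python
# Hypothetical stand-in for the sequence-labeling step described
# above: a keyword lexicon maps surface phrases to slot labels.
SLOT_LEXICON = {
    "where": "what_is_asked",
    "my number card": "object",
    "issued": "action",
    "toilet": "purpose",
}

def extract_slots(utterance):
    """Map matched phrases in the utterance to slot labels, e.g.
    'Where can I get a My Number card issued?' yields the
    what_is_asked / object / action slots used in the description."""
    text = utterance.lower()
    return {slot: phrase for phrase, slot in SLOT_LEXICON.items()
            if phrase in text}
```

The resulting slot-value dictionary is what would then be matched against the example storage unit, exactly or by nearest combination as described.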
When the various processing functions of the devices described in the above embodiments are implemented by a computer, the processing content of the functions each device should have is described by a program. By loading this program into the storage unit 1020 of the computer shown in Fig. 5 and operating the arithmetic processing unit 1010, input unit 1030, output unit 1040, and so on, the various processing functions of each device are realized on the computer.
Claims (8)
- A dialogue device comprising:
an example storage unit that stores a plurality of examples, each consisting of an utterance sentence, a response sentence, and situation information;
a selection rule storage unit that stores a selection rule consisting of a dialogue state, the situation information available in that dialogue state, and the transition-destination dialogue state entered when an example with that situation information is selected;
an utterance reception unit that receives a user utterance spoken by a user;
an example selection unit that uses the selection rule to select, from the plurality of examples, a selected example whose situation information corresponds to the situation information available in the current dialogue state and whose utterance sentence corresponds to the user utterance; and
an utterance presentation unit that presents to the user a system utterance based on the response sentence included in the selected example.
- The dialogue device according to claim 1, wherein the example selection unit acquires from the selection rule the situation information available in the current dialogue state, and selects, as the selected example, an example to which the acquired situation information is set and which includes a response sentence that answers the user utterance.
- The dialogue device according to claim 1 or 2, wherein the example storage unit further stores conversion examples, each consisting of a pre-conversion utterance sentence, a post-conversion utterance sentence, and information indicating a character, the dialogue device further comprising an utterance conversion unit that uses the conversion examples to convert the response sentence included in the selected example into a response sentence spoken by a predetermined character.
- The dialogue device according to any one of claims 1 to 3, further comprising:
a dialogue state storage unit that stores the current dialogue state; and
a dialogue state update unit that updates the current dialogue state to the transition-destination dialogue state included in the selection rule,
wherein the example selection unit selects the selected example using the current dialogue state stored in the dialogue state storage unit.
- The dialogue device according to any one of claims 1 to 3, further comprising a dialogue state acquisition unit that estimates the current dialogue state based on the progress of the dialogue from its start to the present, wherein the example selection unit selects the selected example using the current dialogue state estimated by the dialogue state acquisition unit.
- The dialogue device according to any one of claims 1 to 5, wherein the example selection unit selects the selected example by weighting the situation information available in the current dialogue state.
- A dialogue method, wherein:
an example storage unit stores a plurality of examples, each consisting of an utterance sentence, a response sentence, and situation information;
a selection rule storage unit stores a selection rule consisting of a dialogue state, the situation information available in that dialogue state, and the transition-destination dialogue state entered when an example with that situation information is selected;
an utterance reception unit receives a user utterance spoken by a user;
an example selection unit uses the selection rule to select, from the plurality of examples, a selected example whose situation information corresponds to the situation information available in the current dialogue state and whose utterance sentence corresponds to the user utterance; and
an utterance presentation unit presents to the user a system utterance based on the response sentence included in the selected example.
- A program for causing a computer to function as the dialogue device according to any one of claims 1 to 6.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/JP2021/019516 WO2022249222A1 (ja) | 2021-05-24 | 2021-05-24 | 対話装置、対話方法、およびプログラム |
JP2023523707A JPWO2022249222A1 (ja) | 2021-05-24 | 2021-05-24 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022249222A1 true WO2022249222A1 (ja) | 2022-12-01 |
Family
ID=84229657
Country Status (2)
Country | Link |
---|---|
JP (1) | JPWO2022249222A1 (ja) |
WO (1) | WO2022249222A1 (ja) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2020067584A (ja) * | 2018-10-25 | 2020-04-30 | トヨタ自動車株式会社 | コミュニケーション装置およびコミュニケーション装置の制御プログラム |
JP2020119221A (ja) * | 2019-01-23 | 2020-08-06 | カシオ計算機株式会社 | 対話装置、対話方法、及びプログラム |
Non-Patent Citations (1)
Title |
---|
MIYAKE, SHINJI ET AL.: "Robot spoken dialog systems make in 10 days", PROCEEDINGS 2011 OF THE HUMAN INTERFACE SYMPOSIUM, vol. 11, 20 September 2011 (2011-09-20), pages 579 - 582, ISSN: 1345-0794 * |
Also Published As
Publication number | Publication date |
---|---|
JPWO2022249222A1 (ja) | 2022-12-01 |
Legal Events
Date | Code | Title | Description
---|---|---|---
121 | Ep: the epo has been informed by wipo that ep was designated in this application — Ref document number: 21942880; Country of ref document: EP; Kind code of ref document: A1
WWE | Wipo information: entry into national phase — Ref document number: 2023523707; Country of ref document: JP
WWE | Wipo information: entry into national phase — Ref document number: 18561788; Country of ref document: US
NENP | Non-entry into the national phase — Ref country code: DE
122 | Ep: pct application non-entry in european phase — Ref document number: 21942880; Country of ref document: EP; Kind code of ref document: A1