WO2016136208A1 - Voice interaction device, voice interaction system, control method of voice interaction device - Google Patents

Voice interaction device, voice interaction system, and control method of voice interaction device

Info

Publication number
WO2016136208A1
Authority
WO
WIPO (PCT)
Prior art keywords
term
voice
unit
dialogue
user
Prior art date
Application number
PCT/JP2016/000855
Other languages
French (fr)
Japanese (ja)
Inventor
釜井 孝浩
宇佐見 陽
中西 雅浩
Original Assignee
Panasonic Intellectual Property Management Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Panasonic Intellectual Property Management Co., Ltd.
Publication of WO2016136208A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 Sound input; Sound output
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00 Speech synthesis; Text to speech systems
    • G10L 15/00 Speech recognition
    • G10L 15/08 Speech classification or search
    • G10L 15/10 Speech classification or search using distance or distortion measures between unknown speech and reference templates
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • The present disclosure relates to a voice interaction apparatus, a voice interaction system, and a control method for the voice interaction apparatus.
  • Patent Document 1 discloses a dialogue sequence recognition device that, based on information input from the user, presents a vocabulary group expected to be input next so that the user can visually recognize it. This prevents the user from being at a loss due to erroneous recognition during the dialogue.
  • The present disclosure provides a voice interaction apparatus that corrects the content of dialogue with a user by a simple method.
  • The voice interaction apparatus is an apparatus that performs voice dialogue with a user, and includes an acquisition unit, a plurality of holding units, a storage unit, and a change unit.
  • The acquisition unit acquires utterance data indicating the content of the utterance by the user's voice.
  • Each of the plurality of holding units is associated with a term attribute and holds a term, included in the utterance data, that has that attribute.
  • The storage unit stores a history of the terms held by the plurality of holding units.
  • The change unit refers to the history stored in the storage unit and changes the terms held by the plurality of holding units to the terms that the plurality of holding units held at a past time point.
  • The past time point is a past time point specified by a control term included in the utterance data.
  • Such a voice interaction apparatus is effective for correcting the content of dialogue with the user by a simple method.
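  • As an illustrative aid (not part of the patent text), the structure above can be sketched in a few lines of Python: the holding units become attribute-keyed slots, the storage unit a list of slot snapshots, and the change unit a restore operation. All names below are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class VoiceDialogueDevice:
    """Minimal sketch: slots keyed by attribute, plus a history of snapshots."""
    slots: dict = field(default_factory=dict)    # holding units: attribute -> term
    history: list = field(default_factory=list)  # storage unit: past slot states

    def hold(self, attribute: str, term: str) -> None:
        # A new term replaces the term previously held for the same attribute.
        self.slots[attribute] = term
        self.history.append(dict(self.slots))    # remember this point in time

    def restore(self, index: int) -> None:
        # Change unit: revert all slots to the terms held at a past time point.
        self.slots = dict(self.history[index])

device = VoiceDialogueDevice()
device.hold("dish name", "Chinese cuisine")
device.hold("region", "Moriguchi")
device.hold("region", "Iritani")  # overwrites the region slot
device.restore(1)                 # back to the state where region was "Moriguchi"
print(device.slots)               # {'dish name': 'Chinese cuisine', 'region': 'Moriguchi'}
```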
  • FIG. 1 is a block diagram illustrating a configuration of a voice interaction apparatus and a voice interaction system according to an embodiment.
  • FIG. 2 is a diagram for explaining presentation by the voice interaction system according to the embodiment.
  • FIG. 3 is a diagram for explaining a dialogue sequence and history information according to the embodiment.
  • FIG. 4 is a flowchart of a main process performed by the voice interaction apparatus according to the embodiment.
  • FIG. 5 is a flowchart of restoration processing by the voice interaction apparatus according to the embodiment.
  • FIG. 6 is a flowchart of restoration point setting processing by the voice interaction apparatus according to the embodiment.
  • FIG. 7 is a diagram for explaining a dialogue sequence and history information according to the embodiment.
  • FIG. 8 is a block diagram showing a configuration of a voice interaction apparatus according to a modification of the embodiment.
  • FIG. 9 is a flowchart showing a control method of the voice interaction apparatus according to the modification of the embodiment.
  • The voice interaction device according to the present embodiment performs voice dialogue with the user, generates and corrects dialogue information indicating the content of the dialogue with the user, and outputs the dialogue information to an external processing device. Further, the voice interaction device acquires the processing result from the external processing device, presents it to the user, and continues the dialogue with the user. In this way, the voice interaction device sequentially presents processing results to the user while generating and correcting the dialogue information based on the dialogue with the user.
  • The voice interaction device is useful when an operation such as key input or touching a panel by the user is impossible or difficult.
  • For example, it is useful in applications such as a car navigation device that searches for information while sequentially receiving instructions by the user's voice while the user is driving a car or the like. It is also useful in a voice interaction device that has no user interface such as keys or a panel.
  • FIG. 1 is a block diagram showing a configuration of a voice interaction device 20 and a voice interaction system 1 according to the present embodiment.
  • The voice interaction system 1 includes a display device 10, a speaker 11, a voice synthesis unit 12, a microphone 13, a voice recognition unit 14, a voice interaction device 20, and a task processing unit 40.
  • The display device 10 is a display device having a display screen.
  • The display device 10 displays an image on the display screen based on the display data acquired from the voice interaction device 20.
  • The display device 10 is realized by, for example, a car navigation device, a smartphone (high-function mobile phone terminal), a mobile phone terminal, a mobile information terminal, a display, or a PC (Personal Computer).
  • Although the display device 10 is shown as an example of a device that presents information from the voice interaction device 20 as an image, a speaker that outputs the presented information as voice may be used instead of the display device 10. This speaker may be shared with the speaker 11 described later.
  • Speaker 11 is a speaker that outputs sound.
  • The speaker 11 outputs sound based on the sound signal acquired from the voice synthesis unit 12. The sound output from the speaker 11 is heard by the user.
  • The voice synthesis unit 12 is a processing unit that converts a response sentence into a voice signal.
  • The voice synthesis unit 12 acquires from the voice interaction device 20 a response sentence, which is information transmitted from the voice interaction device 20 to the user, and generates a voice signal to be output by the speaker 11 based on the acquired response sentence.
  • The speaker 11 and the voice synthesis unit 12 may be provided inside the voice interaction device 20 as one of its functions, or may be provided outside the voice interaction device 20. The voice synthesis unit 12 may also be realized as a so-called cloud server that communicates with the voice interaction device 20 via a network such as the Internet. In that case, the connection between the voice synthesis unit 12 and the voice interaction device 20, and between the voice synthesis unit 12 and the speaker 11, is made through a communication path via the Internet or the like.
  • the microphone 13 is a microphone that acquires sound.
  • the microphone 13 acquires the user's voice and outputs an audio signal based on the acquired voice.
  • The voice recognition unit 14 is a processing unit that generates utterance data by performing voice recognition on the user's voice.
  • the voice recognition unit 14 acquires the voice signal generated by the microphone 13 and performs voice recognition processing on the acquired voice signal, thereby generating utterance data of the user's utterance.
  • the utterance data is information transmitted from the user to the voice interaction device 20, and is expressed by characters (text) such as “I want to eat Chinese”. Note that since the speech recognition process converts a speech signal into text information, it can also be referred to as a text conversion process.
  • Like the voice synthesis unit 12, the microphone 13 and the voice recognition unit 14 may be provided inside the voice interaction device 20 as one of its functions, or may be provided outside the voice interaction device 20.
  • the voice recognition unit 14 may be realized as a cloud server like the voice synthesis unit 12.
  • the task processing unit 40 is a processing unit that performs processing based on the content of the dialogue between the user and the voice interaction device 20, and outputs information indicating the processing result or related information.
  • The processing by the task processing unit 40 may be any information processing based on the content of the dialogue.
  • For example, the task processing unit 40 may execute search processing that searches Web pages on the Internet for the Web page of a restaurant matching the content of the dialogue, and may output the search result. This case is described below.
  • A unit of execution of processing by the task processing unit 40 is also referred to as a task.
  • The task processing unit 40 corresponds to the processing unit according to the present disclosure.
  • Alternatively, the task processing unit 40 may execute processing for accumulating the content of the dialogue as data, and may output information indicating the success or failure of that processing.
  • The task processing unit 40 may also identify, based on the content of the dialogue, the electric device to be controlled among a plurality of electric devices, and may output information identifying the device or information on its operation.
  • The voice interaction device 20 is a processing device that performs voice dialogue with the user.
  • The voice interaction device 20 generates and corrects dialogue information indicating the content of the dialogue with the user, and outputs the dialogue information to the task processing unit 40.
  • The voice interaction device 20 acquires the processing result from the task processing unit 40, presents the acquired processing result to the user, and further continues the dialogue with the user.
  • The voice interaction device 20 includes a response sentence generation unit 21, an utterance data acquisition unit 22, a sequence control unit 23, a task control unit 24, an operation unit 25, an analysis unit 26, a memory 27, a task result analysis unit 28, and a presentation control unit 29.
  • The response sentence generation unit 21 is a processing unit that acquires a response instruction from the sequence control unit 23 and generates a response sentence based on the acquired response instruction.
  • The response sentence is information transmitted from the voice interaction device 20 to the user. Specifically, it is a sentence that prompts the user to speak, such as "Please specify a region", an acknowledgment of the user's utterance, such as "Acknowledged", or a sentence explaining the operation of the voice interaction device 20, such as "Searching". What kind of response instruction is given at what time will be described in detail later.
  • The utterance data acquisition unit 22 is a processing unit that acquires the utterance data of the user's utterance from the voice recognition unit 14.
  • When the microphone 13 and the voice recognition unit 14 generate utterance data indicating the content of an utterance, the utterance data acquisition unit 22 acquires the generated utterance data.
  • The utterance data acquired by the utterance data acquisition unit 22 may include a control term for changing the content of the dialogue to that at a past time point. Utterance data including a control term is also referred to as control utterance data.
  • The utterance data acquisition unit 22 corresponds to one function of the acquisition unit according to the present disclosure.
  • the sequence control unit 23 is a processing unit that realizes a dialogue with the user by controlling a dialogue sequence of the dialogue between the voice dialogue apparatus 20 and the user.
  • the dialogue sequence is data in which utterances by the user in the dialogue and responses by the voice dialogue apparatus 20 are arranged in time series. Note that the sequence control unit 23 corresponds to one function of the acquisition unit according to the present disclosure.
  • The sequence control unit 23 acquires the utterance data of the user's utterance from the utterance data acquisition unit 22. Then, based on the acquired utterance data, the preceding dialogue sequence with the user, or the processing result acquired from the task result analysis unit 28, it generates an instruction to create a response sentence to be presented to the user (hereinafter also referred to as a "response instruction") and sends it to the response sentence generation unit 21. What kind of response instruction the sequence control unit 23 generates in what case will be described specifically later.
  • The sequence control unit 23 also extracts terms (also referred to as utterance terms) from the acquired utterance data. Furthermore, the sequence control unit 23 stores each extracted term, via the operation unit 25, in the slot 31 associated with the attribute of that term.
  • A term here is a relatively short expression on the order of a single word; for example, one noun or one adjective corresponds to one term.
  • The task control unit 24 is a processing unit that outputs the content of the dialogue between the voice interaction device 20 and the user to the task processing unit 40, and causes the task processing unit 40 to execute processing based on the output content. Specifically, the task control unit 24 outputs the terms held in the plurality of slots 31 to the task processing unit 40. The task control unit 24 may also determine whether or not a predetermined condition regarding the state of the plurality of slots 31 is satisfied, and output the terms held by the plurality of slots 31 to the task processing unit 40 only when the predetermined condition is satisfied.
  • The task control unit 24 corresponds to one function of the external processing control unit according to the present disclosure.
  • The operation unit 25 is a processing unit that adds, deletes, or changes information indicating the content of the dialogue stored in the memory 27. Specifically, when the utterance data acquired by the utterance data acquisition unit 22 includes a control term for controlling the dialogue information, the operation unit 25 changes the terms held in the slots 31. That is, the operation unit 25 refers to the history table 32 and changes the term held in each of the plurality of slots 31 to the term that the slot 31 held at the past time point specified by the control term. The operation unit 25 may also set a restoration point on a predetermined record in the history table 32 in response to an instruction from the task result analysis unit 28. The operation unit 25 corresponds to one function of the acquisition unit and one function of the change unit according to the present disclosure.
  • The analysis unit 26 is a processing unit that analyzes the slots 31 or the history table 32 in the memory 27 and notifies the sequence control unit 23 according to the analysis result. Specifically, the analysis unit 26 determines whether or not each slot of the essential slot group among the slots 31 holds a term and, when all slots of the essential slot group hold terms, notifies the sequence control unit 23 to that effect.
  • The analysis unit 26 corresponds to one function of the change unit according to the present disclosure.
  • The analysis unit 26 uses the operation unit 25 to perform the restoration process that restores the content of the dialogue to that at a past time point.
  • The analysis unit 26 also determines whether a plurality of restoration points are set in the history table 32; if it determines that there are a plurality of restoration points, a condition for selecting one of them is sent to the sequence control unit 23. The specific content of the restoration process will be described in detail later.
  • The memory 27 is a storage device that stores the content of the dialogue. Specifically, the memory 27 has the slots 31 and the history table 32.
  • A slot 31 is a storage area for holding dialogue information indicating the content of the dialogue, and a plurality of slots 31 are provided in the voice interaction device 20.
  • Each of the plurality of slots 31 is associated with a term attribute and holds a term having the attribute associated with that slot 31.
  • The terms stored in the slots 31, taken as a whole, constitute the dialogue information.
  • Each slot 31 holds one term. When a new term is held in a slot 31 that already holds a term, the previously held term is deleted from the slot 31.
  • A term attribute is information indicating the nature, feature, or category of a term.
  • In the case of a restaurant search, for example, information such as the dish name, area, budget, presence of a private room, presence of a parking lot, walking time from the nearest station, whether the restaurant can be chartered, or whether a night view is visible can be used as an attribute.
  • Holding a term in a slot 31 can also be expressed as storing or registering the term in the slot 31.
  • The area of the slots 31 in the memory 27 corresponds to the holding units according to the present disclosure.
  • The slots 31 may be of two types: essential slots and optional slots.
  • An essential slot is a slot 31 such that the task control unit 24 does not output terms to the task processing unit 40 unless that slot holds a term.
  • An optional slot is a slot 31 such that the task control unit 24 outputs the terms to the task processing unit 40 as long as all essential slots hold terms, even if the optional slot holds no term. For example, when a search task is executed as the task processing, the task control unit 24 may output the terms held in all slots 31 to the task processing unit 40 only when all slots included in the essential slot group hold terms. Whether a slot 31 is an essential slot or an optional slot is predetermined for each slot 31. If the two types are not provided and there is only one type, all of the slots 31 may be essential slots, or all may be optional slots; which is appropriate may be determined based on the processing of the task processing unit 40 or the content of the dialogue.
  • The history table 32 is a table showing the history of the terms held by the plurality of slots 31. Specifically, the history table 32 stores, in time series, the terms held in the past by the plurality of slots 31 and the terms currently held. Even when a term held immediately before is deleted from a slot 31 because a new term is now held there, the deleted term remains in the history table 32.
  • The history table 32 may store information indicating the time of each entry (for example, a time stamp) together with the terms held by the plurality of slots 31 in the past.
  • Alternatively, the history table 32 may store only the terms held by the plurality of slots 31 in the past.
  • The area where the history table 32 is stored corresponds to the storage unit according to the present disclosure.
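  • To make the essential/optional slot distinction concrete, the following is a small hedged sketch (hypothetical attribute names, not the patent's implementation) of the gating condition under which terms are output to the task processing unit:

```python
ESSENTIAL = {"dish name", "region", "budget"}  # essential slot group
OPTIONAL = {"private room", "parking lot"}     # optional slot group

def ready_to_search(slots: dict) -> bool:
    """Terms are output to the task processor only when every essential
    slot holds a term; optional slots are allowed to stay empty."""
    return all(slots.get(attr) is not None for attr in ESSENTIAL)

slots = {"dish name": "Chinese cuisine", "region": "Moriguchi", "budget": None}
print(ready_to_search(slots))  # False: the budget slot is still empty
slots["budget"] = "3000 yen"
print(ready_to_search(slots))  # True: all essential slots hold terms
```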
  • The task result analysis unit 28 is a processing unit that acquires the processing result from the task processing unit 40 and analyzes the acquired result.
  • The task result analysis unit 28 analyzes the acquired processing result and passes the analysis result to the sequence control unit 23. This analysis result is used when the operation unit 25 determines whether or not to set a restoration point at the entry corresponding to the current time in the history table 32.
  • The task result analysis unit 28 corresponds to one function of the external processing control unit according to the present disclosure.
  • For example, as a result of the restaurant search processing by the task processing unit 40, the task result analysis unit 28 acquires the title and URL (Uniform Resource Locator) of each Web page on which the retrieved information is posted. The task result analysis unit 28 then analyzes the result of the search processing and counts the retrieved items. The task result analysis unit 28 may allow a restoration point to be set only when the number of retrieved items is suitable for browsing by the user (for example, about 1 to 30 items), and may prohibit setting a restoration point when the number of retrieved items is not suitable for browsing, such as 0 or 100 or more.
  • The task result analysis unit 28 may also decide that a restoration point is set when all slots of the essential slot group hold terms, or at a point in time when a slot 31 changes from holding one term to holding a different term.
  • The presentation control unit 29 is a processing unit that generates presentation data to be presented to the user by the display device 10 and outputs the presentation data to the display device 10.
  • The presentation control unit 29 acquires the processing result from the task processing unit 40, arranges it on the screen of the display device 10 so that the user can browse the processing result effectively, converts it into a data format suitable for the display device 10, and outputs it to the display device 10.
  • Part or all of the functions of the voice interaction device 20 and the task processing unit 40 may be realized as a cloud server, like the voice synthesis unit 12 and the like.
  • FIG. 2 is an explanatory diagram of presentation by the voice interaction system 1 according to the present embodiment.
  • the explanatory diagram shown in FIG. 2 is an example of an image displayed on the display screen when the display device 10 presents the processing result by the task processing unit 40 to the user.
  • Character strings 201 to 205 indicating attributes are displayed on the left side of the display screen. Character strings 201 to 205 are character strings indicating attributes of the plurality of slots 31.
  • the terms 211 to 215 are displayed on the right side of the display screen.
  • the terms 211 to 215 are terms held in the slots 31 associated with the attributes of the character strings 201 to 205, respectively.
  • a character string 206 and result information 216 are shown on the lower side of the display screen.
  • the character string 206 is a character string indicating that what is displayed below the character string 206 is a search result.
  • the result information 216 is information indicating a result of the restaurant search performed by the task processing unit 40 based on the terms 211 to 215.
  • In this way, the content of the dialogue and the result information, which is the processing result by the task processing unit 40 based on that content, are displayed on the display device 10, so the user can know the processing result in which the content of the dialogue is reflected.
  • The image displayed on the display screen is not limited to that shown in FIG. 2; the displayed information, its arrangement, and its display position may be changed arbitrarily.
  • FIG. 3 is a first explanatory diagram of a dialogue sequence and history information according to the present embodiment.
  • FIG. 3 shows a dialogue sequence 310, a history table 320, and search results 330, arranged along the time series of the dialogue sequence. Note that one row in FIG. 3 corresponds to one time point; a row is also called a record.
  • The dialogue sequence 310 is data in which the utterances by the user in the dialogue and the responses by the voice interaction device 20 are arranged in time series.
  • The time information 311 is time information (a time stamp) indicating the time of each utterance by the user or response by the voice interaction device 20.
  • The utterance 312 is utterance data indicating the utterance by the user at that time. Specifically, the utterance 312 is utterance data indicating the utterance by the user's voice, acquired by the utterance data acquisition unit 22 via the microphone 13 and the voice recognition unit 14.
  • The response 313 is a response sentence indicating the response by the voice interaction device 20 at that time. Specifically, the response 313 is generated by the response sentence generation unit 21 in response to a response instruction from the sequence control unit 23.
  • The history table 320 is an example of the history table 32, and includes information on an essential slot group 321, an option slot group 322, an action 323, and a restoration point 324. As shown in FIG. 3, the history table 320 is associated with the dialogue sequence 310 in time series.
  • The essential slot group 321 shows the terms held in the essential slots among the slots 31 at each time point.
  • The essential slot group 321 includes, for example, terms with the attributes "dish name", "region", and "budget".
  • The option slot group 322 shows the terms held in the optional slots among the slots 31 at each time point.
  • The option slot group 322 includes, for example, terms with the attributes "presence of a private room" and "presence of a parking lot".
  • The action 323 is information indicating the processing executed by the voice interaction device 20 at that time point; a plurality of items may be stored. For example, when a new term is held in a slot 31 with a certain attribute, the name of the attribute and the character string "register" are set at that time point to indicate this. When the task control unit 24 outputs terms to the task processing unit 40 to search for information, the character string "search" is set. When the operation unit 25 changes the terms held in the slots 31 to those at a past time point, the character string "restore" is set.
  • The restoration point 324 is information indicating whether or not a restoration point is set at that time point; "1" is set at a time point where a restoration point is set.
  • The task result analysis unit 28 determines whether a restoration point should be set at each time point. When the task result analysis unit 28 determines that one should be set, the operation unit 25 sets the restoration point in the restoration point 324 at that time point.
  • The search result 330 is the number of results of the search processing by the task processing unit 40 at that time point.
  • the search result 330 is set by the task result analysis unit 28.
  • FIG. 3 shows a dialogue sequence in which the user searches for a restaurant while successively changing the search conditions.
  • Specifically, FIG. 3 shows a dialogue sequence in which the content of the dialogue is changed to the content at a past time point intended by the user.
  • In this dialogue sequence, the terms included in the user's utterances are sequentially acquired by the utterance data acquisition unit 22 and the like, and each acquired term is stored in the slot 31 corresponding to the attribute of that term.
  • The first search processing based on the terms held in the slots 31 is performed by the task processing unit 40 when terms are stored in all the slots 31 included in the essential slot group, at the time corresponding to record R7.
  • Thereafter, search processing based on the terms held in the slots 31 is performed repeatedly.
  • The search processing is performed successively while the search terms are changed so that the search result desired by the user can be obtained.
  • At a later point, the user makes a control utterance to return the content of the dialogue to a past time point. This is because the search result at the time corresponding to record R14 or R16 was 0, and the user intends to return to the search conditions of a past time point before the number of hits became 0.
  • In this way, the voice interaction device 20 can return the content of the dialogue to a past time point based on the utterance by the user's voice, and can continue a new dialogue from that state.
  • Thus, the voice interaction device can correct the content of the dialogue with the user by a simple method.
  • FIG. 4 is a flowchart of main processing by the voice interaction apparatus 20 according to the present embodiment.
  • In step S101, the microphone 13 acquires the voice of the user's utterance and generates a voice signal based on the acquired voice.
  • The voice of the user's utterance may be a voice including terms for the restaurant search, such as "I want to eat Chinese", or a voice including a control term for changing the terms held in the slots 31 to those at a past time point, such as "Return to Moriguchi".
  • In step S102, the voice recognition unit 14 performs voice recognition processing on the voice signal generated by the microphone 13 in step S101, thereby generating utterance data of the user's utterance.
  • In step S103, the utterance data acquisition unit 22 acquires the utterance data generated by the voice recognition unit 14 in step S102.
  • In step S104, the sequence control unit 23 determines whether or not the utterance data acquired by the utterance data acquisition unit 22 in step S103 is empty.
  • When the sequence control unit 23 determines that the utterance data is empty ("Y" in step S104), the process proceeds to step S121; when it determines that the utterance data is not empty ("N" in step S104), the process proceeds to step S105.
  • In step S105, the sequence control unit 23 stores the terms included in the utterance data in the slots 31 using the operation unit 25. Specifically, the sequence control unit 23 determines the attribute of each term included in the utterance data and stores the term in the slot 31 whose attribute matches that of the term. For example, the sequence control unit 23 determines that the term "Chinese" included in the utterance data "I want to eat Chinese" is a term with the dish name attribute, and stores the term "Chinese" in the slot 31 with the dish name attribute. At this time, when a term to be stored in a slot 31 is an abbreviation or common name of an original name, the sequence control unit 23 may convert it into the original name and store that in the slot 31. Specifically, the sequence control unit 23 may determine that the term "Chinese" is an abbreviation of "Chinese cuisine" and store "Chinese cuisine" in the slot 31.
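  • As a rough illustration of step S105 (the dictionaries and names below are invented for the example, not taken from the patent), term classification and abbreviation normalization might look like this:

```python
# Toy classifier and normalizer for step S105: each term is assigned an
# attribute, converted to its original name if it is an abbreviation, and
# stored in the slot for that attribute.
ATTRIBUTE_OF = {"Chinese": "dish name", "Italian": "dish name",
                "Moriguchi": "region"}
CANONICAL = {"Chinese": "Chinese cuisine"}  # abbreviation -> original name

def store_terms(utterance_terms, slots):
    for term in utterance_terms:
        attribute = ATTRIBUTE_OF.get(term)
        if attribute is not None:
            slots[attribute] = CANONICAL.get(term, term)

slots = {}
store_terms(["Chinese"], slots)
print(slots)  # {'dish name': 'Chinese cuisine'}
```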
  • In step S106, the operation unit 25 and the presentation control unit 29 cause the display device 10 to display the terms held in the slots 31.
  • In step S107, the operation unit 25 and related units perform a restoration process that, when necessary, restores the content of the dialogue by changing it to the content at a past time point. Details of the restoration process will be described later.
  • In step S108, the analysis unit 26 determines whether terms are stored in all the slots 31 of the essential slot group, that is, whether all the slots 31 of the essential slot group hold terms.
  • When the analysis unit 26 determines that terms are stored in all the slots 31 ("Y" in step S108), the process proceeds to step S109. When the analysis unit 26 determines that not all the slots 31 store terms ("N" in step S108), that is, when at least one slot 31 of the essential slot group is empty, the process proceeds to step S122.
  • In step S109, the sequence control unit 23 gives the task control unit 24 an execution instruction for causing the task processing unit 40 to execute the task processing.
  • The operation unit 25 records in the history table 32 that the search task has been executed. Specifically, the operation unit 25 sets "search" in the action 323 of the history table 320 for the current time point.
  • In step S110, the task control unit 24 outputs the terms held in the slots 31 to the task processing unit 40 based on the execution instruction from the sequence control unit 23 in step S109, and causes the task processing unit 40 to execute the search processing.
  • The task processing unit 40 acquires the terms output by the task control unit 24, performs search processing using the acquired terms as search terms, and outputs the search result.
  • In step S111, the presentation control unit 29 acquires the search result output by the task processing unit 40 in step S110 and outputs it to the display device 10 in a display form that presents it to the user (for example, that of FIG. 2).
  • The display device 10 acquires the search result output by the presentation control unit 29 and displays it on the display screen.
  • In step S112, the task result analysis unit 28 acquires the search result output by the task processing unit 40 in step S110 and performs the restoration point setting process based on the acquired search result. Details of the restoration point setting process will be described later.
  • In step S113, the sequence control unit 23 gives the response sentence generation unit 21 a response instruction prompting the user for the next utterance.
  • In step S114, the response sentence generation unit 21 generates a response sentence based on the response instruction.
  • The response sentence generation unit 21 outputs the generated response sentence to the voice synthesis unit 12, which outputs the response sentence as sound from the speaker 11 for the user to hear.
  • Then, step S101 is executed again.
  • In step S121, the sequence control unit 23 gives the response sentence generation unit 21 a response instruction prompting the user to speak again (to make the same utterance as before).
  • Utterance data being determined to be empty in step S104 means that the voice recognition unit 14 could not obtain utterance data from the sound even though the microphone 13 acquired some sound. It is therefore expected that utterance data can be acquired by asking the user to repeat the previous utterance.
  • In step S122, the sequence control unit 23 gives the response sentence generation unit 21 a response instruction prompting the user for the next utterance. For example, when a slot 31 in the essential slot group holds no term, the sequence control unit 23 generates a response instruction so that a response sentence is generated that prompts the user to utter the term that the empty slot 31 should hold.
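  • Condensing the branches above, one pass of the FIG. 4 main loop could be sketched as follows (a hedged, heavily simplified stand-in: the parsing, search, and response helpers are stubs, not the patent's processing):

```python
ESSENTIAL = {"dish name", "region", "budget"}

def respond(text):
    print("device>", text)

def search(slots):
    # Stub for the task processing unit (step S110); a real system would
    # run an actual restaurant search here.
    return ["restaurant A", "restaurant B"]

def main_loop_once(utterance: str, slots: dict):
    """One condensed pass of the main process; step numbers in comments."""
    if not utterance:                                  # S104: utterance data empty?
        respond("Please repeat your last utterance.")  # S121
        return
    attribute, _, term = utterance.partition("=")      # toy stand-in for S105 parsing
    if term:
        slots[attribute.strip()] = term.strip()        # S105: hold the term in its slot
    if all(a in slots for a in ESSENTIAL):             # S108: essential slots filled?
        respond("Results: %s" % search(slots))         # S109-S111: search and present
        respond("Anything else?")                      # S113-S114
    else:
        respond("Please specify the remaining conditions.")  # S122

slots = {}
main_loop_once("dish name=Chinese cuisine", slots)
main_loop_once("region=Moriguchi", slots)
main_loop_once("budget=3000 yen", slots)  # all essential slots now hold terms
```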
  • FIG. 5 is a flowchart of restoration processing by the voice interaction apparatus according to the present embodiment.
  • The flowchart shown in FIG. 5 details the process of step S107 in FIG. 4; it shows the processing that, when the utterance data includes a control term, changes the terms held in the slots 31 to those at a past time point.
  • In outline, the operation unit 25 determines whether or not the utterance data acquired by the utterance data acquisition unit 22 includes a first term and a second term, described below. When the operation unit 25 determines that the first term and the second term are included, it refers to the history table and changes the term held in each of the plurality of slots 31 to the term that each slot 31 held at the past time point.
  • The past time point is the time point when the slot 31 (corresponding to the correspondence holding unit according to the present disclosure) that is associated with the attribute of the second term, among the plurality of slots 31, held the second term.
  • In step S201, the sequence control unit 23 determines whether the utterance data acquired from the utterance data acquisition unit 22 includes a restoration term (also referred to as a first term).
  • The restoration term is a predetermined term indicating that the dialogue information is to be changed to that at a past time point, for example, "return to ..." or "not ...".
  • When the sequence control unit 23 determines that a restoration term is included ("Y" in step S201), the process proceeds to step S202. When no restoration term is included ("N" in step S201), the restoration process ends and the flow returns to the main process.
  • In step S202, the analysis unit 26 acquires the term (also referred to as a second term) included in the portion of the utterance data excluding the restoration term, and extracts restoration points from the history table 32 based on the acquired term. Specifically, the analysis unit 26 determines the attribute of the acquired term and extracts, from the restoration points included in the history table 32, those at which the term held in the slot 31 corresponding to that attribute matches the acquired term. Utterance data including the first term and the second term can also be called control utterance data. A plurality of restoration points may be extracted.
  • In step S203, the analysis unit 26 determines whether exactly one restoration point was extracted in step S202.
  • When the analysis unit 26 determines that there is one restoration point ("Y" in step S203), the process proceeds to step S204. On the other hand, when the analysis unit 26 determines that there is not exactly one restoration point ("N" in step S203), the process proceeds to step S211.
  • In step S204, the operation unit 25 refers to the history table 32 and changes the terms held in the slots 31 to the terms that the slots 31 held at the restoration point extracted in step S202. That is, the operation unit 25 changes the terms held in the plurality of slots 31 so as to return them to the terms at the restoration point. The operation unit 25 also sets "restore" as the action in the history table 320 at the time of this change. Note that if a slot 31 held no term at the restoration point, the operation unit 25 deletes the term from that slot 31.
  • In step S211, the sequence control unit 23 gives the response sentence generation unit 21 a response instruction for a response prompting the user to narrow the extracted restoration points down to one.
  • For example, suppose two restoration points are extracted for a control utterance such as "Return to Moriguchi".
  • In that case, the sequence control unit 23 gives a response instruction for a response such as "Would you like to return to the point searched with the parking lot?".
  • After step S211, when the user makes an utterance specifying one of the two restoration points, exactly one restoration point is extracted in step S202 of the next execution of the main process (FIG. 4), and step S204 is executed.
  • Note that an attribute name may be used instead of the second term. That is, the operation unit 25 determines whether or not the utterance data acquired by the utterance data acquisition unit 22 includes the first term and an attribute name. When the operation unit 25 determines that the first term and the attribute name are included, it may refer to the history table and change the term held in each of the plurality of slots 31 to the term that each slot held at the past time point.
  • In this case, the past time point is the time immediately before the slot 31 (corresponding to the correspondence holding unit according to the present disclosure) that is associated with the attribute indicated by the attribute name, among the plurality of slots 31, began to hold the term it currently holds.
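  • Steps S201 to S211 can be summarized in a short hedged sketch (the string matching and data shapes are simplified assumptions, not the patent's method): a restoration point is modeled as a saved snapshot of the slots, and the second term selects among the snapshots.

```python
RESTORE_MARKER = "return to"  # first term (restoration term)

def restore(utterance, slots, restoration_points):
    if RESTORE_MARKER not in utterance:                          # S201
        return "no restoration term"
    second_term = utterance.replace(RESTORE_MARKER, "").strip()  # S202
    # Extract restoration points whose snapshot holds the second term.
    matches = [snap for snap in restoration_points if second_term in snap.values()]
    if len(matches) == 1:                                        # S203
        slots.clear()
        slots.update(matches[0])                                 # S204: revert the slots
        return "restored"
    return "please narrow the restoration points down to one"   # S211

points = [{"dish name": "Chinese cuisine", "region": "Moriguchi"},
          {"dish name": "Chinese cuisine", "region": "Umeda"}]
slots = {"dish name": "Italian", "region": "Iritani"}
print(restore("return to Moriguchi", slots, points))  # exactly one match -> restored
print(slots)  # {'dish name': 'Chinese cuisine', 'region': 'Moriguchi'}
```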
  • FIG. 6 is a flowchart of restoration point setting processing by the voice interaction apparatus according to the present embodiment.
  • The flowchart shown in FIG. 6 shows the details of the process in step S112 in FIG. 4.
  • In step S301, the operation unit 25 branches the process based on the condition used for setting restoration points.
  • When the condition is condition C ("condition C" in step S301), the process proceeds to step S302.
  • When the condition is condition D ("condition D" in step S301), the process proceeds to step S303.
  • Here, the case of two conditions is shown as an example, but the same processing is possible with three or more conditions.
  • In step S302, the operation unit 25 sets a restoration point at the current time point in the history table 320.
  • In step S303, the operation unit 25 acquires the search result, which is the analysis result of the task result analysis unit 28, and determines whether or not the number of retrieved items is zero.
  • When the number of retrieved items is 0 ("Y" in step S303), the operation unit 25 ends the series of processes without setting a restoration point at this time. That is, even when an information search result is acquired, the operation unit 25 prohibits setting a restoration point when the result contains zero items. On the other hand, when the number of retrieved items is not 0 ("N" in step S303), the process proceeds to step S302.
  • Likewise, when the number of retrieved items is too large for browsing (for example, 100 or more), the restoration point may be left unset at this time, as in the case of zero.
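  • The gist of FIG. 6 can be sketched as follows (a minimal illustration assuming a single count-based condition; the thresholds are hypothetical examples drawn from the description above):

```python
def maybe_set_restoration_point(slots, result_count, points, max_browsable=30):
    """Record a snapshot as a restoration point only when the search
    result count looks useful for browsing (roughly 1 to max_browsable)."""
    if result_count == 0:             # S303 "Y": nothing found, set no point
        return False
    if result_count > max_browsable:  # too many hits may likewise be skipped
        return False
    points.append(dict(slots))        # S302: set a point at the current time
    return True

points = []
print(maybe_set_restoration_point({"region": "Moriguchi"}, 12, points))  # True
print(maybe_set_restoration_point({"region": "Umeda"}, 0, points))       # False
```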
  • FIG. 7 is a second explanatory diagram of history information according to the present embodiment.
  • FIG. 7 shows a dialogue sequence in a dialogue in which a user sequentially searches for a restaurant under different search conditions while changing the search conditions. This is an example of a dialogue sequence when the dialogue content is changed to that at a past time when the dialogue content is different from the user's intention due to misrecognition of voice or the like.
  • FIG. 7 shows a dialogue sequence 310 and the like, as in FIG. 3.
  • In this dialogue sequence, the terms included in the user's utterances are sequentially acquired by the utterance data acquisition unit 22 and the like, and each acquired term is stored in the slot 31 corresponding to the attribute of that term.
  • The task processing unit 40 performs the initial search processing based on the terms held in the slots 31 when terms are stored in all the slots 31 included in the essential slot group, at the time corresponding to record R5.
  • Thereafter, search processing based on the terms stored in the slots 31 is performed repeatedly.
  • The search processing is performed successively while the search terms are changed so that the search result desired by the user can be obtained.
  • In this dialogue sequence, however, terms held in the slots 31 are changed to terms different from the user's intention due to erroneous recognition by the voice recognition unit 14.
  • For example, the user uttered "parking lot, too (chushajo-mo)" with the intention of adding a parking lot as a search condition, but the utterance was misrecognized.
  • As a result, the term "Chinese cuisine" is stored in the slot 31 with the dish name attribute at the time corresponding to record R12.
  • Similarly, the user uttered "not Chinese but Italian (chuka-janakute-itaria)" with the intention of correcting the search condition, but this utterance was also misrecognized.
  • As a result, "Iritani" is stored in the slot 31 with the region attribute at the time corresponding to record R15.
  • The user then makes an utterance for returning the content of the dialogue to a past time point. This is because the terms held in the slots 31 at the times corresponding to record R12 or R14 were changed contrary to the user's intention, and the user intends to return to the search conditions of a past time point before those changes were made.
  • the voice interaction apparatus can return the content of the conversation to a past time point based on the utterance by the user, and continuously execute a new conversation from that state.
  • the voice interaction device can correct the content of the dialogue with the user by a simple method.
  • FIG. 8 is a block diagram showing a configuration of a voice interactive apparatus 20A according to a modification of the present embodiment.
  • The voice interaction apparatus 20A that performs dialogue with the user by voice includes a plurality of holding units 103, a storage unit 104, an acquisition unit 101, and a change unit 102.
  • the plurality of holding units 103 hold dialogue information indicating the content of the dialogue.
  • Each of the plurality of holding units 103 is associated with a term attribute, and holds a term having the attribute.
  • the storage unit 104 stores a history of terms held by the plurality of holding units 103.
  • the acquisition unit 101 acquires utterance data indicating the contents of utterances by the user's voice.
  • the acquisition unit 101 holds the utterance term included in the acquired utterance data in the holding unit 103 associated with the attribute of the utterance term among the plurality of holding units 103.
  • When the acquisition unit 101 acquires control utterance data, that is, utterance data including a control term for controlling the dialogue information, the change unit 102 refers to the storage unit 104 and changes the terms held by the plurality of holding units 103. Specifically, the term held by each of the plurality of holding units 103 is changed to the term that the holding unit 103 held at the past time point specified by the control term.
  • The voice interaction device 20A may further include an external processing control unit 105 that outputs the terms held by each of the plurality of holding units 103, as dialogue information, to a processing unit that performs processing based on the dialogue information, and that acquires information indicating the result of the processing as a response to the output.
  • the processing unit may perform an information search using a term related to the acquired dialogue information as a search term, and the external processing control unit 105 may acquire a result of the information search as a response.
  • the voice interactive apparatus 20A may further include a presentation control unit 106 for presenting a result of the information search acquired by the external processing control unit 105 to the user.
  • FIG. 9 is a flowchart showing a control method of the voice interactive apparatus 20A according to a modification of the present embodiment.
  • The control method of the voice interaction apparatus 20A that performs voice dialogue with the user includes an acquisition step and a change step.
  • In the acquisition step, utterance data indicating the content of the utterance by the user's voice is acquired (step S401), and the utterance term included in the acquired utterance data is held by the holding unit 103, among the plurality of holding units 103, that is associated with the attribute of the utterance term (step S402).
  • In the change step, when the utterance data acquired in the acquisition step includes a control term for controlling the dialogue information, the terms held by the plurality of holding units 103 are changed with reference to the history stored in the storage unit 104. Specifically, the term held by each of the plurality of holding units 103 is changed to the term that the holding unit 103 held at the past time point specified by the control term (step S403).
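  • These two steps can be illustrated with a brief hedged sketch (hypothetical function and variable names) in which the control term is reduced to an index into the stored history:

```python
def control_method(utterance_terms, control_index, slots, history):
    # Acquisition step: hold each utterance term in the slot for its
    # attribute (S401-S402), then record a snapshot in the storage unit.
    for attribute, term in utterance_terms.items():
        slots[attribute] = term
    history.append(dict(slots))
    # Change step: if a control term specified a past time point, revert
    # every slot to the terms held at that point (S403).
    if control_index is not None:
        slots.clear()
        slots.update(history[control_index])

slots, history = {}, []
control_method({"region": "Moriguchi"}, None, slots, history)
control_method({"region": "Iritani"}, None, slots, history)
control_method({}, 0, slots, history)  # control term: back to the first point
print(slots)  # {'region': 'Moriguchi'}
```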
  • the voice interaction device 20A according to this modification has the same effect as the voice interaction device 20.
  • As described above, the voice interaction device 20 performs dialogue with a user by voice and includes a plurality of slots 31 for holding dialogue information indicating the content of the dialogue, a history table 32, an utterance data acquisition unit 22, and an operation unit 25.
  • Each of the plurality of slots 31 is associated with a term attribute, and holds a term having the associated attribute.
  • the history table 32 stores the history of terms held by the plurality of slots 31.
  • The utterance data acquisition unit 22 acquires utterance data indicating the content of the utterance by the user's voice, and holds each utterance term included in the acquired utterance data in the slot, among the plurality of slots 31, that is associated with the attribute of that term.
  • When the acquired utterance data includes a control term, the operation unit 25 refers to the history stored in the history table 32 and changes the terms held by the plurality of slots 31. Specifically, the term held in each of the plurality of slots 31 is changed to the term that the slot 31 held at the past time point specified by the control term.
  • Thereby, the voice interaction device 20 can change the dialogue information to that at a past time point based on the voice of the user, that is, the dialogue information can be returned to a past state.
  • The past time point is a time point determined by the user's voice. Therefore, by uttering a voice including the control term that specifies the past time point, the user can return the dialogue information, which is the content of the dialogue with the voice interaction device 20, to that at the past time point.
  • Thus, the voice interaction device 20 can correct the content of the dialogue with the user by a simple method.
  • The voice interaction device 20 is characterized in that it corrects the content of the dialogue with the user by a simple method, by performing control based on the voice of the user. In voice dialogue with a conventional voice dialogue apparatus, it is difficult for the user to grasp the content of the dialogue in time series, and therefore difficult to perform an operation that returns the content of the dialogue to a past time point desired by the user. Since the voice interaction device 20 according to the present embodiment performs control based on the user's voice, the content of the dialogue can be returned to a past time point desired by the user. The advantage of control based on the user's voice is considered to grow as the content of the dialogue becomes more complicated, that is, as the number of terms increases.
  • In particular, the above-described correction method is highly advantageous when the number of holding units is large. In the case of a voice interaction device with fewer than ten holding units, as in the present embodiment, it is practically possible, instead of returning the content of the dialogue to a past time point, to reset the terms held by the holding units and set them again from the beginning. However, when the voice interaction device includes several tens of holding units or more, setting the terms again from the beginning is cumbersome, places a heavy burden on the user, and can hardly be called practical. In such a case, since the voice interaction device 20 can return the content of the dialogue with the user to a past time point, it has the advantage that the dialogue can be restarted from a past time point desired by the user without resetting from the beginning.
  • The control term may include a first term, which is a predetermined term indicating that the dialogue information is to be changed to that at a past time point, and a second term different from the predetermined term.
  • In that case, the operation unit 25 determines whether or not both the first term and the second term are included in the utterance data acquired by the utterance data acquisition unit 22.
  • When it determines that both are included, the operation unit 25 refers to the history and changes the term held in each of the plurality of slots 31. Specifically, each term is changed to the term that the slot 31 held at the time point when the slot 31 associated with the attribute of the second term held the second term.
  • Thereby, the voice interaction device 20 can return the dialogue information to a specific point in the past desired by the user by recognizing the control term, including the first term and the second term, spoken by the user. In this way, the voice interaction device 20 can more easily identify, based on the user's voice, the past time point to be referred to when returning the content of the dialogue with the user to a past state.
  • The operation unit 25 may set a restoration point at a given time point when the state of the plurality of slots 31 at that time point, in the history stored in the history table 32, satisfies a predetermined condition. Based on the set restoration points, the operation unit 25 changes the terms held by the plurality of slots 31 to the terms that the slots 31 held at the past time point.
  • In this case, the past time point is, among the time points at which restoration points are set, the time point when the slot 31 associated with the attribute of the second term, among the plurality of slots 31, held the second term.
  • Thereby, the voice interaction device 20 determines, from the state of the plurality of slots 31 at each time point stored in the history table 32, whether or not a restoration point should be set at that time point.
  • By setting restoration points appropriately using predetermined conditions, the time points to which the held terms may later be changed can be narrowed down. Thereby, the past time point can be identified more easily when changing the terms held by the holding units.
  • the voice interaction device 20 may further include a task control unit 24.
  • the task control unit 24 outputs the term held in each of the plurality of slots 31 as dialogue information to the task processing unit 40 that performs processing based on the dialogue information.
  • the task processing unit 40 performs processing based on the output of the task control unit 24.
  • the task control unit 24 acquires information indicating the result of the processing by the task processing unit 40 as a response to the output to the task processing unit 40.
  • Thereby, the voice interaction device 20 presents to the user the result of processing, by the external processing unit, the terms held by the plurality of holding units. The user can therefore obtain a processing result reflecting the content of the dialogue with the voice interaction device 20.
  • The task processing unit 40 may perform an information search using the acquired terms as search terms, and the task control unit 24 may acquire the result of the information search as the response.
  • The voice interaction device 20 may further include the presentation control unit 29 for presenting the information search result acquired by the external processing control unit to the user.
  • the voice interaction device 20 can acquire the result of the search processing based on the content of the dialogue as a result of the processing by the external processing unit and present it to the user.
  • The operation unit 25 may set a restoration point, in the history, at the time point when the task control unit 24 acquires the information search result.
  • the voice interaction device 20 can return the term held by the holding unit to that at the time when the information search is performed by using the restoration point.
  • the time point when the information search is performed is also the time point when its result is obtained, and is a time point that the user can easily specify in the dialogue.
  • the voice interaction device 20 can therefore return the terms held by the holding units to those at a time point that the user can easily and intuitively specify.
  • a more appropriate point in time can be presented as a restoration point candidate.
  • the operation unit 25 may prohibit setting a restoration point at a time point when the information search yields zero results.
  • the voice interaction device 20 can thereby exclude time points at which the information search yields zero results from the time points at which restoration points are set.
  • the voice interaction device 20 can therefore return the content of the dialogue with the user to a time point useful for the user.
  • the operation unit 25 may change the terms using the one restoration point specified by the user from among two or more restoration points.
  • the voice interaction device 20 can return the content of the dialogue with the user to that at a past time point by using the one restoration point specified by the user from among the plurality of restoration points. The user can thereby select the time point that the user considers best from among the time points that the voice interaction device 20 has determined to be appropriate, and return to the dialogue information at the selected time point.
  • the voice interaction device 20 may further include a response sentence generation unit 21 that generates a response sentence for accepting from the user the one restoration point, out of two or more restoration points, to be used for changing the terms.
  • the user can know from the response sentence that there are multiple candidates when the voice interaction device 20 returns the content of the dialogue.
  • the user specifies the time point to which the content of the dialogue is returned by responding to the response sentence. That is, the voice interaction device 20 has the user specify one restoration point from among the multiple restoration points.
  • the voice interaction device 20 can thus concretely accept the designation of a restoration point from the user and return the content of the dialogue with the user to that at the past time point.
  • the control term may include the first term, which is a predetermined term indicating that the dialogue information is to be changed to that at a past time point, and an attribute name, which is the name of the attribute of the acquired term.
  • the operation unit 25 determines whether the utterance data acquired by the utterance data acquisition unit 22 includes the first term and an attribute name. When the operation unit 25 determines that the first term and the attribute name are included in the utterance data, it refers to the history and changes the term held in each of the plurality of slots 31. Specifically, each term is changed to the term that the slot 31 held immediately before the slot 31 associated with the attribute indicated by the attribute name came to hold the term it currently holds.
  • the voice interaction device 20 thus specifies a past time point concretely using the name of the attribute with which a holding unit is associated. Even without specifying a concrete condition, the user can specify the time point to which the content of the dialogue is returned simply by specifying the attribute name. In this way, the voice interaction device 20 can correct the content of the dialogue with the user by a more concrete method.
  • the voice dialogue system 1 performs a voice dialogue with the user.
  • the voice dialogue system 1 includes a plurality of slots 31 for holding dialogue information indicating the content of the dialogue, a history table 32, an utterance data acquisition unit 22, an operation unit 25, a microphone 13, a speech recognition unit 14, a task processing unit 40, a speech synthesis unit 12, a speaker 11, and a display device 10.
  • Each of the plurality of slots 31 is associated with a term attribute, and holds a term having the associated attribute.
  • the history table 32 stores the history of terms held by the plurality of slots 31.
  • the utterance data acquisition unit 22 acquires utterance data indicating the content of the utterance by the user's voice, and causes the slot 31 associated with the attribute of the utterance term, among the plurality of slots 31, to hold the utterance term included in the acquired utterance data.
  • the operation unit 25 changes the terms held by the plurality of slots 31. Specifically, the operation unit 25 refers to the history stored in the history table 32 and changes the term held in each of the plurality of slots 31 to the term that the slot 31 held at the past time point specified by the control term.
  • the microphone 13 acquires the user's voice and generates an audio signal.
  • the speech recognition unit 14 generates the utterance data to be acquired by the utterance data acquisition unit 22 by performing speech recognition processing on the audio signal generated by the microphone 13.
  • the task processing unit 40 acquires dialogue information held by the plurality of slots 31, performs predetermined processing on the acquired dialogue information, and outputs information indicating the processing result.
  • the speech synthesis unit 12 generates a response sentence for the utterance by the user's voice, and generates an audio signal by performing speech synthesis processing on the generated response sentence.
  • the speaker 11 outputs the audio signal generated by the speech synthesis unit 12 as sound.
  • the display device 10 displays the processing result output by the task processing unit 40.
  • the control method of the voice interaction device can be used for controlling the voice interaction device 20, which performs a voice dialogue with the user.
  • the voice interaction device 20 includes a plurality of slots 31 for holding dialogue information indicating the content of the dialogue and a history table 32.
  • Each of the plurality of slots 31 is associated with a term attribute, and holds a term having the associated attribute.
  • the history table 32 stores the history of terms held by the plurality of slots 31.
  • the control method of the voice interaction apparatus 20 according to the present embodiment includes an acquisition step and a change step.
  • in the acquisition step, utterance data indicating the content of the utterance by the user's voice is acquired, and the utterance term included in the acquired utterance data is held in the slot 31 associated with the attribute of the utterance term among the plurality of slots 31.
  • in the change step, when the utterance data acquired in the acquisition step includes a control term for controlling the dialogue information, the terms held by the plurality of slots 31 are changed. Specifically, referring to the history stored in the history table 32, the term held in each of the plurality of slots 31 is changed to the term that the slot 31 held at the past time point specified by the control term. A minimal code sketch of this slot-and-history mechanism appears after this list.
  • the present disclosure is useful as a voice interaction device that can correct the content of a dialogue with the user by a simple method.
  • the present disclosure can be applied to applications such as a car navigation device, a smartphone (high-function mobile phone terminal), a mobile phone terminal, a portable information terminal, or a PC (Personal Computer).
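Note: as an editorial illustration of the mechanism summarized above, the following is a non-normative Python sketch of the slot, history-table, and restoration-point data model. All names (DialogueState, HistoryRecord, register, restore_to, and so on) are invented for illustration and do not appear in the disclosure; the sketch assumes one term per attribute slot and an append-only history, as described.

from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass
class HistoryRecord:
    slots: Dict[str, Optional[str]]  # attribute name -> term held at this time point
    actions: List[str]               # e.g. ["area: register"], ["search"], ["restore"]
    restore_point: bool = False      # set when a predetermined condition is satisfied

class DialogueState:
    """Models the slots 31 (one term per attribute) and the history table 32."""

    def __init__(self, required: List[str], optional: List[str]) -> None:
        self.required = required
        self.optional = optional
        self.slots: Dict[str, Optional[str]] = {a: None for a in required + optional}
        self.history: List[HistoryRecord] = []

    def register(self, attribute: str, term: str) -> None:
        # A slot holds exactly one term: a new term overwrites the old one,
        # but the overwritten term survives in the history table.
        self.slots[attribute] = term
        self.history.append(HistoryRecord(dict(self.slots), [f"{attribute}: register"]))

    def required_filled(self) -> bool:
        # The predetermined condition checked before outputting to the task processor.
        return all(self.slots[a] is not None for a in self.required)

    def restore_to(self, attribute: str, second_term: str) -> bool:
        # Return the slots to the most recent restoration point at which the slot
        # for `attribute` held `second_term` (the user's "return to ..." request).
        for record in reversed(self.history):
            if record.restore_point and record.slots.get(attribute) == second_term:
                self.slots = dict(record.slots)
                self.history.append(HistoryRecord(dict(self.slots), ["restore"]))
                return True
        return False  # no matching restoration point in the history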

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

This voice interaction device (20A) for interacting with a user by voice is provided with: an acquisition unit (101) which acquires utterance data indicating the content of a speech utterance of a user; multiple storage units (103) which store words contained in the utterance data in association with the attributes of those words; a memory unit (104) which stores the history of the words stored by the storage units (103); and a modification unit (102). If prescribed control words are included in the utterance data, the modification unit (102) refers to the history stored in the memory unit (104) and modifies the multiple storage units (103) so that they store the words that they stored at a point in time in the past specified by the control words.

Description

Voice interaction device, voice interaction system, and control method of voice interaction device
The present disclosure relates to a voice interaction device, a voice interaction system, and a control method of a voice interaction device.
Patent Document 1 discloses a dialogue sequence recognition device that, based on information input from a user, presents the vocabulary group expected to be input next so that the user can visually recognize it. This prevents the inconvenience of the user being at a loss due to erroneous recognition in the dialogue.
JP 2001-34292 A
The present disclosure provides a voice interaction device that corrects the content of a dialogue with a user by a simple method.
The voice interaction device according to the present disclosure is a voice interaction device that performs a voice dialogue with a user, and includes an acquisition unit, a plurality of holding units, a storage unit, and a changing unit. The acquisition unit acquires utterance data indicating the content of an utterance made by the user's voice. Each of the plurality of holding units holds a term included in the utterance data in association with the attribute of that term. The storage unit stores a history of the terms held by the plurality of holding units. When the utterance data includes a predetermined control term, the changing unit refers to the history stored in the storage unit and changes the terms held by the plurality of holding units to the terms that the holding units held at a past time point. Here, the past time point is the past time point specified by the control term.
The voice interaction device according to the present disclosure is effective for correcting the content of a dialogue with a user by a simple method.
FIG. 1 is a block diagram showing the configuration of the voice interaction device and the voice interaction system according to the embodiment. FIG. 2 is a diagram for explaining presentation by the voice interaction system according to the embodiment. FIG. 3 is a diagram for explaining a dialogue sequence and history information according to the embodiment. FIG. 4 is a flowchart of main processing by the voice interaction device according to the embodiment. FIG. 5 is a flowchart of restoration processing by the voice interaction device according to the embodiment. FIG. 6 is a flowchart of restoration point setting processing by the voice interaction device according to the embodiment. FIG. 7 is a diagram for explaining a dialogue sequence and history information according to the embodiment. FIG. 8 is a block diagram showing the configuration of a voice interaction device according to a modification of the embodiment. FIG. 9 is a flowchart showing a control method of the voice interaction device according to the modification of the embodiment.
Hereinafter, embodiments will be described in detail with reference to the drawings as appropriate. However, descriptions more detailed than necessary may be omitted. For example, detailed descriptions of already well-known matters and redundant descriptions of substantially identical configurations may be omitted. This is to keep the following description from becoming unnecessarily redundant and to facilitate understanding by those skilled in the art.
Note that the inventors provide the accompanying drawings and the following description so that those skilled in the art can fully understand the present disclosure, and do not intend to limit the claimed subject matter by them.
(Embodiment)
In the present embodiment, a voice interaction device that corrects the content of a dialogue with a user by a simple method will be described. The voice interaction device according to the present embodiment performs a voice dialogue with a user, generates and corrects dialogue information indicating the content of the dialogue with the user, and outputs the dialogue information to an external processing device. The voice interaction device also acquires a processing result from the external processing device, presents it to the user, and then continues the dialogue with the user. In this way, the voice interaction device sequentially presents processing results to the user while generating and correcting the dialogue information based on the dialogue with the user.
Note that the voice interaction device is useful when operations such as key input or touching a panel are impossible or difficult for the user. For example, it may be used in applications such as a car navigation device that searches for information while sequentially receiving voice instructions from a user who is driving a car. It is also useful for a voice interaction device that has no user interface such as keys or a panel.
[1-1. Configuration]
FIG. 1 is a block diagram showing the configuration of the voice interaction device 20 and the voice interaction system 1 according to the present embodiment.
As shown in FIG. 1, the voice interaction system 1 includes a display device 10, a speaker 11, a speech synthesis unit 12, a microphone 13, a speech recognition unit 14, a voice interaction device 20, and a task processing unit 40.
The display device 10 is a display device having a display screen. The display device 10 displays video on the display screen based on display data acquired from the voice interaction device 20. The display device 10 is realized by, for example, a car navigation device, a smartphone (high-function mobile phone terminal), a mobile phone terminal, a portable information terminal, a display, or a PC (Personal Computer). Although the display device 10 is shown as an example of a device that displays video based on the information presented by the voice interaction device 20, a speaker that outputs the information presented by the voice interaction device 20 as sound may be used instead of the display device 10. This speaker may be shared with the speaker 11 described later.
The speaker 11 is a speaker that outputs sound. The speaker 11 outputs sound based on an audio signal acquired from the speech synthesis unit 12. The sound output from the speaker 11 is heard by the user.
The speech synthesis unit 12 is a processing unit that converts a response sentence into an audio signal. The speech synthesis unit 12 acquires from the voice interaction device 20 a response sentence, which is information to be conveyed from the voice interaction device 20 to the user, and generates, based on the acquired response sentence, an audio signal to be output from the speaker.
The speaker 11 and the speech synthesis unit 12 may be provided inside the voice interaction device 20 as functions of the voice interaction device 20, or may be provided outside the voice interaction device 20. The speech synthesis unit 12 may also be realized as a so-called cloud server capable of communicating with the voice interaction device 20 via a network such as the Internet. In that case, the connection between the speech synthesis unit 12 and the voice interaction device 20, and the connection between the speech synthesis unit 12 and the speaker 11, are made through a communication path via the Internet or the like.
The microphone 13 is a microphone that acquires sound. The microphone 13 acquires the user's voice and outputs an audio signal based on the acquired voice.
The speech recognition unit 14 is a processing unit that generates utterance data by performing speech recognition on the user's voice. The speech recognition unit 14 acquires the audio signal generated by the microphone 13 and performs speech recognition processing on the acquired audio signal, thereby generating utterance data of the user's utterance. The utterance data is information conveyed from the user to the voice interaction device 20 and is expressed as text, such as "I want to eat Chinese". Since the speech recognition processing converts an audio signal into text information, it can also be called text conversion processing.
Like the speech synthesis unit 12, the microphone 13 and the speech recognition unit 14 may be provided inside the voice interaction device 20 as functions of the voice interaction device 20, or may be provided outside the voice interaction device 20. The speech recognition unit 14 may also be realized as a cloud server, like the speech synthesis unit 12.
The task processing unit 40 is a processing unit that performs processing based on the content of the dialogue between the user and the voice interaction device 20 and outputs information indicating the processing result or related information. The processing by the task processing unit 40 may be any information processing based on the content of the dialogue. For example, the task processing unit 40 may execute a search process that searches Web pages on the Internet for the Web page of a restaurant matching the content of the dialogue, and output the search result. This case is described below. A unit of execution of processing by the task processing unit 40 is also called a task. The task processing unit 40 corresponds to the processing unit according to the present disclosure.
As other examples of the processing by the task processing unit 40, it may execute processing that accumulates the content of the dialogue as data and output information indicating the success or failure of that processing. The task processing unit 40 may also identify, based on the content of the dialogue, the electric appliance to be controlled among a plurality of electric appliances, and output specific information on that appliance or information on its operation.
The voice interaction device 20 is a processing device that performs a voice dialogue with the user. The voice interaction device 20 generates and corrects dialogue information indicating the content of the dialogue with the user, and outputs the dialogue information to the task processing unit 40. The voice interaction device 20 also acquires a processing result from the task processing unit 40, presents the acquired processing result to the user, and then continues the dialogue with the user.
The voice interaction device 20 includes a response sentence generation unit 21, an utterance data acquisition unit 22, a sequence control unit 23, a task control unit 24, an operation unit 25, an analysis unit 26, a memory 27, a task result analysis unit 28, and a presentation control unit 29.
The response sentence generation unit 21 is a processing unit that acquires a response instruction from the sequence control unit 23 and generates a response sentence based on the acquired response instruction. A response sentence is information conveyed from the voice interaction device 20 to the user. Specifically, it is a sentence that prompts the user to speak, such as "Please specify an area"; an acknowledgment of the user's utterance, such as "Understood"; or a sentence describing the operation of the voice interaction device 20, such as "Searching". What kind of response instruction is given in what situation will be described in detail later.
The utterance data acquisition unit 22 is a processing unit that acquires the utterance data of the user's utterance from the speech recognition unit 14. When the user speaks by voice, the microphone 13 and the speech recognition unit 14 generate utterance data indicating the content of the utterance, and the utterance data acquisition unit 22 acquires the generated utterance data. The utterance data acquired by the utterance data acquisition unit 22 may also include a control term for changing the content of the dialogue to that at a past time point. Utterance data including a control term is also called control utterance data. The utterance data acquisition unit 22 corresponds to one function of the acquisition unit according to the present disclosure.
The sequence control unit 23 is a processing unit that realizes the dialogue with the user by controlling the dialogue sequence of the dialogue between the voice interaction device 20 and the user. Here, the dialogue sequence is data in which the user's utterances and the responses of the voice interaction device 20 in the dialogue are arranged in time series. The sequence control unit 23 corresponds to one function of the acquisition unit according to the present disclosure.
Specifically, the sequence control unit 23 acquires the utterance data of the user's utterance from the utterance data acquisition unit 22. Then, based on the acquired utterance data, the dialogue sequence with the user so far, or the processing result acquired from the task result analysis unit 28, the sequence control unit 23 generates an instruction to create the response sentence to be presented to the user next (hereinafter also called a "response instruction") and sends it to the response sentence generation unit 21. What kind of response instruction the sequence control unit 23 generates in what situation will be described concretely later.
The sequence control unit 23 also extracts terms (also called utterance terms) from the acquired utterance data. Furthermore, the sequence control unit 23 stores, via the operation unit 25, each extracted term in the slot 31 associated with the attribute of that term. Here, a term is a relatively short expression, like a word; for example, one noun or one adjective corresponds to one term.
The task control unit 24 is a processing unit that outputs the content of the dialogue between the voice interaction device 20 and the user to the task processing unit 40 and causes the task processing unit 40 to execute processing based on the output content of the dialogue. Specifically, the task control unit 24 outputs the terms held by the plurality of slots 31 to the task processing unit 40. The task control unit 24 may also determine whether a predetermined condition on the state of the plurality of slots 31 is satisfied, and output the terms held by the plurality of slots 31 to the task processing unit 40 only when the predetermined condition is satisfied. The task control unit 24 corresponds to one function of the external processing control unit according to the present disclosure.
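Note: this gating behavior might be sketched as below, reusing the hypothetical DialogueState class introduced earlier; the task_processor callable stands in for the task processing unit 40 and is an assumption of this illustration, not part of the disclosure.

from typing import Callable, Dict, Optional

def maybe_run_task(state: "DialogueState",
                   task_processor: Callable[[Dict[str, str]], dict]) -> Optional[dict]:
    # Output the dialogue information only when the predetermined condition holds
    # (here: every required slot holds a term).
    if not state.required_filled():
        return None
    terms = {a: t for a, t in state.slots.items() if t is not None}
    return task_processor(terms)  # e.g. a restaurant-search callable; its result is returned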
The operation unit 25 is a processing unit that adds, deletes, or changes the information indicating the content of the dialogue stored in the memory 27. Specifically, when the utterance data acquired by the utterance data acquisition unit 22 includes a control term for controlling the dialogue information, the operation unit 25 changes the terms held by the slots 31. That is, the operation unit 25 refers to the history table 32 and changes the term held in each of the plurality of slots 31 to the term that the slot 31 held at the past time point specified by the control term. The operation unit 25 may also set a restoration point on a predetermined record in the history table 32 in response to an instruction from the task result analysis unit 28. The operation unit 25 corresponds to one function of the acquisition unit and one function of the changing unit according to the present disclosure.
The analysis unit 26 is a processing unit that analyzes the slots 31 or the history table 32 in the memory 27 and notifies the sequence control unit 23 according to the analysis result. Specifically, the analysis unit 26 determines whether each slot of the required slot group among the slots 31 holds a term, and when all slots of the required slot group hold terms, notifies the sequence control unit 23 to that effect. The analysis unit 26 corresponds to one function of the changing unit according to the present disclosure.
The analysis unit 26 also uses the operation unit 25 to perform restoration processing for restoring the content of the dialogue to that at a past time point. When performing the restoration processing, the analysis unit 26 determines whether there are multiple restoration points set in the history table 32, and when it determines that there are multiple restoration points, sends the sequence control unit 23 a condition for selecting one of them. The concrete content of the restoration processing will be described in detail later.
The memory 27 is a storage device that stores the content of the dialogue. Specifically, the memory 27 has the slots 31 and the history table 32.
A slot 31 is a storage area for holding dialogue information indicating the content of the dialogue, and the voice interaction device 20 is provided with a plurality of slots. Each of the plurality of slots 31 is associated with a term attribute and holds a term having the attribute associated with that slot 31. The terms stored in the slots 31 as a whole represent the dialogue information. Each slot 31 holds one term. When a slot 31 that already holds one term comes to hold a new term, the previously held term is erased from the slot 31.
Here, a term attribute is information indicating the nature, characteristic, or category of the term. For example, when the processing of the task processing unit 40 is a restaurant search, information such as the dish name, area, budget, availability of a private room, availability of a parking lot, walking time from the nearest station, whether the restaurant can be reserved for a private party, or whether a night view is visible can be used as attributes. A slot 31 holding a term can also be expressed as the term being stored or registered in the slot 31. The area of the slots 31 in the memory 27 corresponds to the holding units according to the present disclosure.
The slots 31 may also be divided into two types: required slots and optional slots. A required slot is a slot 31 such that, unless it holds a term, the task control unit 24 does not output the terms to the task processing unit 40. An optional slot is a slot 31 such that, even if it holds no term, the task control unit 24 outputs the terms to the task processing unit 40 as long as all required slots hold terms. For example, when a search task is executed as the task processing, the task control unit 24 may output the terms held by all the slots 31 to the task processing unit 40 only when all slots included in the required slot group hold terms. Whether a slot 31 is a required slot or an optional slot is predetermined for each slot 31. When these two types are not provided and there is only one type, all of the slots 31 may be required slots or all may be optional slots; which of these is used may be determined appropriately based on the processing of the task processing unit 40 or the content of the dialogue.
The history table 32 is a table showing the history of the terms held by the plurality of slots 31. Specifically, the history table 32 is a table in which the terms held in the past by the plurality of slots 31 and the terms currently held are stored in time series. Even when a slot 31 comes to hold a new term and the term held immediately before is erased from the slot 31, the erased term remains in the history table 32.
The history table 32 may store, together with the terms held by the plurality of slots 31 in the past, information indicating the time at each time point (for example, a time stamp). Alternatively, on the premise that records are stored additively as time progresses, the history table 32 may store only the terms that the plurality of slots 31 held in the past. The area of the memory 27 in which the history table 32 is stored corresponds to the storage unit according to the present disclosure.
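Note: continuing the hypothetical DialogueState sketch, the retention behavior described here can be exercised as follows; "Umeda" and "Moriguchi" are invented sample values. Overwriting a slot erases the old term from the slot itself but not from the history.

state = DialogueState(required=["dish name", "area", "budget"], optional=["private room"])
state.register("area", "Umeda")      # hypothetical first value
state.register("area", "Moriguchi")  # overwrites the slot...
assert state.slots["area"] == "Moriguchi"
assert state.history[0].slots["area"] == "Umeda"  # ...but the erased term remains in the history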
The task result analysis unit 28 is a processing unit that acquires the processing result of the task processing unit 40 and analyzes the acquired processing result. When the task result analysis unit 28 acquires a processing result from the task processing unit 40, it analyzes the acquired processing result and passes the analysis result to the sequence control unit 23. This analysis result is used when the operation unit 25 determines whether to set a restoration point at the time point corresponding to the current time in the history table 32. The task result analysis unit 28 corresponds to one function of the external processing control unit according to the present disclosure.
For example, the task result analysis unit 28 acquires, as the result of the restaurant search processing by the task processing unit 40, the titles and URLs (Uniform Resource Locators) of the Web pages on which the retrieved information is posted. The task result analysis unit 28 also analyzes the result of the search processing and calculates the number of retrieved items. The task result analysis unit 28 may then set a restoration point only when the number of retrieved items is suitable for browsing by the user (for example, about 1 to 30 items). The task result analysis unit 28 may also prohibit setting a restoration point when the number of retrieved items is unsuitable for browsing by the user, such as 0 items or 100 or more items.
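Note: the hit-count condition described in this paragraph might look like the following sketch; the 1-to-30 window is the example range given above, not a fixed part of the disclosure.

def should_set_restore_point(result_count: int, lower: int = 1, upper: int = 30) -> bool:
    # Set a restoration point only when the number of retrieved items is suitable
    # for browsing; zero hits (and excessive counts such as 100 or more) are excluded.
    return lower <= result_count <= upper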
The task result analysis unit 28 may also set a restoration point at the time point when all slots of the required slot group come to hold terms, or at the time point when a slot 31 that holds a term changes to holding a different term.
The presentation control unit 29 is a processing unit that generates presentation data to be presented to the user by the display device 10 and outputs it to the display device 10. The presentation control unit 29 acquires the processing result from the task processing unit 40, arranges its position on the screen of the display device 10 so that the user can view the processing result effectively, converts it into a data format suitable for output to the display device 10, and then outputs the presentation data to the display device 10.
Note that some or all of the functions of the voice interaction device 20, as well as the task processing unit 40, may be realized as a cloud server, as with the speech synthesis unit 12 and the like.
FIG. 2 is an explanatory diagram of presentation by the voice interaction system 1 according to the present embodiment. The explanatory diagram shown in FIG. 2 is an example of an image displayed on the display screen when the display device 10 presents the processing result of the task processing unit 40 to the user.
On the left side of the display screen, character strings 201 to 205 indicating attributes are displayed. The character strings 201 to 205 are character strings indicating the attributes of the respective slots 31.
On the right side of the display screen, terms 211 to 215 are displayed. The terms 211 to 215 are the terms held by the slots 31 associated with the attributes of the character strings 201 to 205, respectively.
On the lower side of the display screen, a character string 206 and result information 216 are shown. The character string 206 is a character string indicating that what is displayed below it is the search result. The result information 216 is information indicating the result of the restaurant search performed by the task processing unit 40 based on the terms 211 to 215.
In this way, the content of the dialogue and the result information, which is the processing result of the task processing unit 40 based on the content of the dialogue, are displayed on the display device 10, and the user can know the processing result reflecting the content of the dialogue.
Note that the image displayed on the display screen is not limited to that shown in FIG. 2; the displayed information, whether and how each element is arranged, and the display positions may be changed arbitrarily.
FIG. 3 is a first explanatory diagram of a dialogue sequence and history information according to the present embodiment.
FIG. 3 shows a dialogue sequence 310, a history table 320, and search results 330, arranged along the time series of the dialogue sequence. One row shown in FIG. 3 corresponds to one time point. Such a row is also called a record. The history table 320 is an example of the history table 32.
The dialogue sequence 310 is data in which the user's utterances and the responses of the voice interaction device 20 in the dialogue are arranged in time series.
The time information 311 is time information (a time stamp) indicating the time at which an utterance by the user or a response by the voice interaction device 20 occurred.
The utterance 312 is utterance data indicating the user's utterance at that time. Specifically, the utterance 312 is utterance data indicating the utterance by the user's voice that the utterance data acquisition unit 22 acquired via the microphone 13 and the speech recognition unit 14.
The response 313 is a response sentence indicating the response of the voice interaction device 20 at that time. Specifically, the response 313 is generated by the response sentence generation unit 21 in response to a response instruction from the sequence control unit 23.
The history table 320 is an example of the history table 32 and has the items of a required slot group 321, an optional slot group 322, an action 323, and a restoration point 324. As shown in FIG. 3, the history table 320 is associated with the dialogue sequence 310 in time series.
The required slot group 321 shows the terms held at the corresponding time point in the required slots among the slots 31. The required slot group 321 includes, for example, terms of the attributes "dish name", "area", and "budget".
The optional slot group 322 shows the terms held at the corresponding time point in the optional slots among the slots 31. The optional slot group 322 includes, for example, terms of the attributes "private room availability" and "parking availability".
The action 323 is information indicating the processing executed by the voice interaction device 20 at the corresponding time point, and may store multiple pieces of information. For example, when a new term comes to be held in the slot 31 of a certain attribute, the name of that attribute and the character string "register" are set at that time point to indicate this. At a time point when the task control unit 24 outputs terms to the task processing unit 40 to perform an information search, the character string "search" is set. At a time point when the operation unit 25 changes the terms held by the slots 31 to those at a past time point, the character string "restore" is set.
The restoration point 324 is information indicating whether a restoration point is set at the corresponding time point; "1" is set at a time point where a restoration point is set. Whether a restoration point is set at a given time point is determined by the task result analysis unit 28. When the task result analysis unit 28 determines that a restoration point is to be set at that time point, the operation unit 25 sets a restoration point in the restoration point 324 of that time point.
The search result 330 is the number of results of the search processing by the task processing unit 40 at the corresponding time point. The search result 330 is set by the task result analysis unit 28.
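Note: putting the columns of one row (record) of FIG. 3 together, a record could be represented as in the following sketch; all concrete values are invented for illustration.

record = {
    "time": "12:34:56",                       # time information 311 (time stamp)
    "required": {"dish name": "Chinese cuisine",
                 "area": "Moriguchi",
                 "budget": "3000 yen"},       # required slot group 321
    "optional": {"private room": "yes",
                 "parking": None},            # optional slot group 322
    "actions": ["search"],                    # action 323 (may hold several entries)
    "restore_point": 1,                       # restoration point 324 ("1" = set)
    "result_count": 12,                       # search result 330
}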
FIG. 3 shows a dialogue sequence in which the user sequentially performs restaurant searches under different search conditions while changing the conditions. FIG. 3 shows the dialogue sequence in the case where the content of the dialogue is changed to the content of the dialogue at a past time point intended by the user.
At the time points corresponding to records R1 to R7, the terms included in the user's utterances are sequentially acquired by the utterance data acquisition unit 22 and the like, and each acquired term is stored in the slot 31 corresponding to the attribute of that term.
At the time point corresponding to record R8, the first search processing based on the terms held by the slots 31 is performed by the task processing unit 40. This is triggered by the fact that terms had been stored in all the slots 31 included in the required slot group at the time point corresponding to record R7.
At the time points corresponding to records R9 to R16, search processing based on the terms held by the slots 31 is performed. Here, search processing is performed sequentially while the search terms are changed so that the search result desired by the user is obtained.
At the time point corresponding to record R17, the user makes a control utterance for returning the content of the dialogue to a past time point. This is done because the search at the time point corresponding to record R14 or R16 yielded zero results, and the user intends to return to the search conditions of a past time point before the number of results became zero.
In records R18 to R20, the terms held by the respective slots 31 are restored to those in record R10.
By doing so, the voice interaction device 20 can return the content of the dialogue to a past time point based on the user's voice utterance and continuously execute a new dialogue from that state. In this way, the voice interaction device can correct the content of the dialogue with the user by a simple method.
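Note: under the same hypothetical DialogueState sketch, the flow of records R1 to R20 could be exercised roughly as follows (names and values invented):

state = DialogueState(required=["dish name", "area", "budget"], optional=["private room"])
state.register("dish name", "Chinese cuisine")  # R1-R7: terms accumulate in the slots
state.register("area", "Moriguchi")
state.register("budget", "3000 yen")
state.history[-1].restore_point = True          # e.g. a search here returned a suitable hit count
state.register("area", "Umeda")                 # R9-R16: the user keeps changing conditions
# R17: "Return to Moriguchi" -> first term "return to", second term "Moriguchi"
state.restore_to("area", "Moriguchi")           # R18-R20: the slots revert to the earlier state
assert state.slots["area"] == "Moriguchi"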
[1-2. Operation]
The operation of the voice interaction device 20 and the voice interaction system 1 configured as described above will be described below.
FIG. 4 is a flowchart of the main processing by the voice interaction device 20 according to the present embodiment.
In step S101, the microphone 13 acquires the voice of the user's utterance and generates an audio signal based on the acquired voice. Here, the voice of the user's utterance may be voice including a term for a restaurant search, such as "I want to eat Chinese", or voice including a term for changing the terms held by the slots 31 to those at a past time point, such as "Return to Moriguchi".
In step S102, the speech recognition unit 14 performs speech recognition processing on the audio signal generated by the microphone 13 in step S101, thereby generating the utterance data of the user's utterance.
In step S103, the utterance data acquisition unit 22 acquires the utterance data generated by the speech recognition unit 14 in step S102.
In step S104, the sequence control unit 23 determines whether the utterance data acquired by the utterance data acquisition unit 22 in step S103 is empty.
When the sequence control unit 23 determines in step S104 that the utterance data is empty ("Y" in step S104), the processing proceeds to step S121. On the other hand, when it determines that the utterance data is not empty ("N" in step S104), the processing proceeds to step S105.
In step S105, the sequence control unit 23 stores the terms included in the utterance data in the slots 31 using the operation unit 25. Specifically, the sequence control unit 23 determines the attribute of each term included in the utterance data and stores each term in the slot 31 having an attribute matching that of the term. For example, the sequence control unit 23 determines that the term "Chinese" included in the utterance data "I want to eat Chinese" is a term having the dish-name attribute, and stores the term "Chinese" in the slot 31 having the dish-name attribute. At this time, when the term to be stored in a slot 31 is an abbreviation or a popular name of the original name, the sequence control unit 23 may convert it into the original name before storing it in the slot 31. Specifically, the sequence control unit 23 may determine that the term "Chinese" is a shortened name (abbreviation) of "Chinese cuisine" and store "Chinese cuisine" in the slot 31.
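Note: step S105 might be sketched as below, again reusing the hypothetical DialogueState; the alias table and the attribute lookup are assumptions of this illustration, since the disclosure does not specify how attributes are determined.

from typing import List

ALIASES = {"Chinese": "Chinese cuisine"}         # abbreviation -> original name (assumed table)
ATTRIBUTE_OF = {"Chinese cuisine": "dish name",  # term -> its attribute (assumed table)
                "Moriguchi": "area"}

def store_utterance_terms(state: "DialogueState", terms: List[str]) -> None:
    for term in terms:
        term = ALIASES.get(term, term)           # convert an abbreviation to the original name
        attribute = ATTRIBUTE_OF.get(term)
        if attribute is not None:
            state.register(attribute, term)      # hold it in the slot matching its attribute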
In step S106, the operation unit 25 and the presentation control unit 29 display the terms held by the slots 31 on the display device 10.
In step S107, the operation unit 25 and the like perform, when necessary, restoration processing for restoring the content of the dialogue by changing it to that at a past time point. The restoration processing will be described in detail later.
In step S108, the analysis unit 26 determines whether terms are stored in all slots 31 of the required slot group, that is, whether all slots 31 of the required slot group hold terms.
When the analysis unit 26 determines in step S108 that terms are stored in all the slots 31 ("Y" in step S108), the processing proceeds to step S109. On the other hand, when the analysis unit 26 determines that terms are not stored in all the slots 31 ("N" in step S108), that is, when at least one slot 31 of the required slot group is empty, the processing proceeds to step S122.
In step S109, the sequence control unit 23 instructs the task control unit 24 to cause the task processing unit 40 to execute the task processing. At this time, the operation unit 25 records in the history table 32 that the search task was executed. Specifically, the operation unit 25 sets "search" in the action 323 of the current time point in the history table 320.
In step S110, based on the execution instruction from the sequence control unit 23 in step S109, the task control unit 24 outputs the terms held by the slots 31 to the task processing unit 40 and causes the task processing unit 40 to execute the search processing. The task processing unit 40 acquires the terms output by the task control unit 24, performs the search processing using the acquired terms as search terms, and outputs the search result.
In step S111, the presentation control unit 29 acquires the search result output by the task processing unit 40 in step S110, puts the acquired search result into a format suitable for presentation to the user by the display device 10 (for example, the display form shown in FIG. 2), and outputs it to the display device 10. The display device 10 acquires the search result output by the presentation control unit 29 and displays it on the display screen.
In step S112, the task result analysis unit 28 acquires the search result output by the task processing unit 40 in step S110 and performs restoration point setting processing based on the acquired search result. The restoration point setting processing will be described in detail later.
In step S113, the sequence control unit 23 gives the response sentence generation unit 21 a response instruction for prompting the user's next utterance.
In step S114, the response sentence generation unit 21 generates a response sentence based on the response instruction. The response sentence generation unit 21 also outputs the generated response sentence to the speech synthesis unit 12, which outputs the response sentence as sound from the speaker 11 for the user to hear.
When the processing of step S114 ends, the processing of step S101 is executed again.
In step S121, the sequence control unit 23 gives the response sentence generation unit 21 a response instruction for prompting the user to speak again (that is, to make the same utterance as before). The determination in step S104 that the utterance data is empty means that, although the microphone 13 acquired some sound, the speech recognition unit 14 could not obtain utterance data from that sound. Therefore, by requesting the user to make the same utterance as before, it is expected that utterance data can be acquired.
In step S122, the sequence control unit 23 gives the response sentence generation unit 21 a response instruction for prompting the user's next utterance. For example, when some slot 31 included in the required slot group holds no term, the sequence control unit 23 issues a response instruction to generate a response sentence that makes the user utter the term that the empty slot 31 should hold.
FIG. 5 is a flowchart of the restoration processing performed by the voice dialogue device according to the present embodiment. The flowchart in FIG. 5 details the processing of step S107 in FIG. 4, that is, the processing that, when the utterance data contains a control term, changes the terms held in the slots 31 back to those held at a past point in time.
More specifically, the operation unit 25 determines whether the utterance data acquired by the utterance data acquisition unit 22 contains a first term and a second term, described below. When it determines that the first term and the second term are included, the operation unit 25 refers to the history table and changes the term held in each of the slots 31 to the term that the slot 31 held at a past point in time. Here, the past point in time is the point at which the slot 31 associated with the attribute of the second term (corresponding to the correspondence holding unit according to the present disclosure) held the second term.
In step S201, the sequence control unit 23 determines whether the utterance data acquired from the utterance data acquisition unit 22 contains a restoration term (also called a first term). Here, the restoration term is a predetermined term indicating that the dialogue information is to be changed back to a past point in time, for example "go back (to ...)" or "not (...)".
If the sequence control unit 23 determines in step S201 that a restoration term is included ("Y" in step S201), the processing proceeds to step S202. If it determines that no restoration term is included ("N" in step S201), the series of processing shown in FIG. 5 ends.
In step S202, the analysis unit 26 acquires the term (also called a second term) contained in the portion of the utterance data other than the restoration term, and extracts a restoration point from the history table 32 based on the acquired term. Specifically, the analysis unit 26 determines the attribute of the acquired term and extracts, from the restoration points contained in the history table 32, the point at which the term held by the slot 31 corresponding to that attribute matches the acquired term. Utterance data containing the first term and the second term can also be called control utterance data. Note that more than one restoration point may be extracted.
In step S203, the analysis unit 26 determines whether exactly one restoration point was extracted in step S202.
If the analysis unit 26 determines that there is exactly one restoration point ("Y" in step S203), the processing proceeds to step S204. If it determines that there is not exactly one restoration point ("N" in step S203), the processing proceeds to step S211.
In step S204, the operation unit 25 refers to the history table 32 and changes the term held in each slot 31 to the term that the slot 31 held at the single restoration point extracted in step S202. In other words, the operation unit 25 reverts the terms held by the slots 31 to those at the time of the restoration point. The operation unit 25 also sets "restore" as the action in the history table 320 at the point when the terms were changed back. If a slot 31 held no term at the time of the restoration point, the operation unit 25 puts that slot 31 into a state in which it holds no term.
In step S211, the sequence control unit 23 issues a response instruction to the response sentence generation unit 21 for a response that prompts the user to make an utterance narrowing the extraction down to a single restoration point. For example, when a control utterance such as "go back to Moriguchi" is acquired from the user and the history table 320 contains two restoration point candidates identified by this control utterance, the sequence control unit 23 issues a response instruction for a response such as "Shall I go back to the search that included a parking lot?" to prompt an utterance indicating which of the two restoration points the user intends.
After step S211, when the user makes an utterance identifying one of the two restoration points, exactly one restoration point is extracted in step S202 of the next main processing cycle (FIG. 4), and step S204 is then executed.
In the above, an attribute name, that is, the name of an attribute, may be used instead of the second term. In that case, the operation unit 25 determines whether the utterance data acquired by the utterance data acquisition unit 22 contains the first term and an attribute name. When it determines that the first term and the attribute name are included, the operation unit 25 may refer to the history table and change the term held in each slot 31 to the term that the slot 31 held at a past point in time. Here, the past point in time is the point immediately before the slot 31 associated with the attribute indicated by the attribute name (corresponding to the correspondence holding unit according to the present disclosure) came to hold the term it currently holds.
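The following self-contained Python sketch illustrates the restoration processing of FIG. 5 (steps S201 to S211). It is an assumption-laden illustration, not the disclosed implementation: slot contents are modeled as a dict mapping attribute to term, each restoration point is a saved snapshot of that dict, and the restoration term "go back to" is just one example of a predefined first term.

```python
# Self-contained sketch of the restoration processing in FIG. 5.
# Slot contents are modeled as {attribute: term}; each restoration point
# is a saved snapshot of that dict. All names are hypothetical.

RESTORE_TERM = "go back to"   # one example of a predefined first term

def restore(slots, restore_points, utterance):
    """Revert `slots` in place; return a prompt string if disambiguation is needed."""
    if RESTORE_TERM not in utterance:                     # S201: no first term
        return None
    second = utterance.split(RESTORE_TERM, 1)[1].strip()  # S202: e.g. "Moriguchi"
    matches = [p for p in restore_points
               if second in p.values()]                   # points where a slot held it
    if len(matches) == 1:                                 # S203 -> S204: unique point
        slots.clear()
        slots.update(matches[0])   # a slot empty at that point simply stays empty
        return None
    # S211: ambiguous -> ask the user to single out one restoration point
    return "There are %d matching points. Which one do you mean?" % len(matches)

# Example: two restoration points both have "Moriguchi", so the device asks back.
points = [{"region": "Moriguchi"}, {"region": "Moriguchi", "option": "parking"}]
slots = {"region": "Iriya", "dish": "Chinese"}
print(restore(slots, points, "go back to Moriguchi"))
```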
FIG. 6 is a flowchart of the restoration point setting processing performed by the voice dialogue device according to the present embodiment. The flowchart in FIG. 6 details the processing of step S112 in FIG. 4.
In step S301, the operation unit 25 branches the processing according to the condition for setting a restoration point. If the condition is "the point at which a search was executed" (condition C) ("condition C" in step S301), the processing proceeds to step S302. If the condition is "the point at which a search was executed and the search results are valid" (condition D) ("condition D" in step S301), the processing proceeds to step S303. Although two conditions are shown here as an example, the same processing is possible with three or more conditions.
In step S302, the operation unit 25 sets a restoration point at the current point in the history table 320.
In step S303, the operation unit 25 acquires the search results, which are the analysis results of the task result analysis unit 28, and determines whether the number of retrieved items is zero.
If the number of items retrieved in step S303 is zero ("Y" in step S303), the operation unit 25 ends the series of processing without setting a restoration point at this point. That is, even at a point where the results of an information search were acquired, the operation unit 25 is prohibited from setting a restoration point when the search results contained zero items. If the number of retrieved items is not zero ("N" in step S303), the processing proceeds to step S302.
Note that when the number of retrieved items is unsuitable for browsing by the user (for example, 100 or more), a restoration point may likewise not be set at that point, just as in the zero-item case.
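A minimal sketch of this restoration point setting processing, under the same hypothetical snapshot model as above, might look as follows. The 100-item browsing limit is the optional variant mentioned in the preceding paragraph, not a fixed rule of the disclosure.

```python
# Sketch of the restoration point setting processing in FIG. 6 (hypothetical names).

def set_restore_point(restore_points, slots, result_count, condition="D"):
    if condition == "C":                      # condition C: every executed search
        restore_points.append(dict(slots))    # S302: snapshot the current slot state
        return
    # condition D: the search was executed and its results are valid
    if result_count == 0:                     # S303: zero hits -> setting is prohibited
        return
    if result_count >= 100:                   # optional: too many hits to browse
        return
    restore_points.append(dict(slots))        # S302
```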
FIG. 7 is a second explanatory diagram of the history information according to the present embodiment. FIG. 7 shows a dialogue sequence in which the user searches for restaurants under successively different search conditions while changing those conditions. It is an example of a dialogue sequence in which, because the content of the dialogue has diverged from the user's intention due to voice misrecognition or the like, the content of the dialogue is changed back to that at a past point in time.
As in FIG. 3, FIG. 7 shows the dialogue sequence 310 and related elements.
At the points corresponding to records R1 to R5, the terms contained in the user's utterances are successively acquired by the utterance data acquisition unit 22 and related units, and each acquired term is stored in the slot 31 corresponding to its attribute.
At the point corresponding to record R6, the task processing unit 40 performs the first search based on the terms held in the slots 31. This is triggered by all the slots 31 in the mandatory slot group having been filled with terms at the point corresponding to record R5.
At the points corresponding to records R7 to R14, searches are performed based on the terms stored in the slots 31. These searches are performed successively while the search terms are changed so that the user can obtain the desired search results.
During this dialogue, misrecognition by the voice recognition unit 14 changes the terms held in the slots 31 to terms the user did not intend. Specifically, at the point corresponding to record R11, the user, intending to add a parking lot as a search condition, utters "Chushajomo" ("a parking lot, too"), but the voice recognition unit 14 misrecognizes this as "Chukaryori" ("Chinese cuisine"). Because of this misrecognition, the term "Chinese cuisine" is stored in the dish-name slot 31 at the point corresponding to record R12. Similarly, at the point corresponding to record R13, the user, intending to correct the search condition, utters "Chuka-janakute-itaria" ("not Chinese, Italian"), but the voice recognition unit 14 misrecognizes this as "Iriya", a place name. Because of this misrecognition, the term "Iriya" is stored in the region slot 31 at the point corresponding to record R14.
At the point corresponding to record R15, the user makes an utterance for returning the content of the dialogue to a past point in time. The user does this deliberately because the terms held in the slots 31 were changed contrary to the user's intention at the points corresponding to records R12 and R14, and the user wants to return to the search conditions of a past point before those changes were made.
At the points corresponding to records R15 to R16, the terms held in the slots 31 are restored to those of record R10.
In this way, the voice dialogue device can return the content of the dialogue to a past point in time based on the user's utterance and continue a new dialogue from that state. The voice dialogue device can thus correct the content of the dialogue with the user by a simple method.
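To tie this walkthrough to the sketches above, the following hypothetical trace reproduces the gist of the FIG. 7 sequence with the same snapshot model, simplified so that only one restoration point is recorded; the record numbers in the comments refer to FIG. 7 and the slot values are illustrative only.

```python
# Hypothetical, simplified trace of the FIG. 7 dialogue.

restore_points = []
slots = {"region": "Moriguchi", "dish": "Italian"}      # R1-R5: slots filled
restore_points.append(dict(slots))                       # around R10: search -> snapshot

slots["dish"] = "Chinese"      # R11-R12: "Chushajomo" misheard as "Chukaryori"
slots["region"] = "Iriya"      # R13-R14: "...itaria" misheard as "Iriya"

# R15: "go back to Moriguchi" -> revert to the snapshot in which the
# region slot held "Moriguchi" (record R10 in the figure)
slots = dict(next(p for p in reversed(restore_points)
                  if p.get("region") == "Moriguchi"))
print(slots)   # {'region': 'Moriguchi', 'dish': 'Italian'}
```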
[1-3. Modified Example]
FIG. 8 is a block diagram showing the configuration of a voice dialogue device 20A according to a modified example of the present embodiment.
As shown in FIG. 8, the voice dialogue device 20A, which conducts a spoken dialogue with the user, includes a plurality of holding units 103, a storage unit 104, an acquisition unit 101, and a changing unit 102. The plurality of holding units 103 hold dialogue information indicating the content of the dialogue; each holding unit 103 is associated with an attribute of a term and holds a term having that attribute. The storage unit 104 stores a history of the terms held by the plurality of holding units 103. The acquisition unit 101 acquires utterance data indicating the content of an utterance spoken by the user, and causes each utterance term contained in the acquired utterance data to be held by the holding unit 103 associated with the attribute of that term. When the acquisition unit 101 acquires control utterance data containing a control term for controlling the dialogue information, the changing unit 102 refers to the storage unit 104 and changes the terms held by the plurality of holding units 103. Specifically, it changes the term held by each holding unit 103 to the term that the holding unit 103 held at the past point in time specified by the control term.
The voice dialogue device 20A may further include an external processing control unit 105 that outputs the terms held by the plurality of holding units 103 as dialogue information to a processing unit that performs processing based on the dialogue information, and acquires, as a response to that output, information indicating the result of the processing.
The processing unit may execute an information search using the terms in the acquired dialogue information as search terms, and the external processing control unit 105 may acquire the result of the information search as the response. The voice dialogue device 20A may further include a presentation control unit 106 for presenting the result of the information search acquired by the external processing control unit 105 to the user.
FIG. 9 is a flowchart showing a control method of the voice dialogue device 20A according to the modified example of the present embodiment.
As shown in FIG. 9, the control method of the voice dialogue device 20A, which conducts a spoken dialogue with the user, includes an acquisition step and a changing step. In the acquisition step, utterance data indicating the content of an utterance spoken by the user is acquired (step S401), and each utterance term contained in the acquired utterance data is held by the holding unit 103 associated with the attribute of that term (step S402). In the changing step, when the utterance data acquired in the acquisition step contains a control term for controlling the dialogue information, the history stored in the storage unit 104 is referred to and the terms held by the plurality of holding units 103 are changed. Specifically, the term held by each holding unit 103 is changed to the term that the holding unit 103 held at the past point in time specified by the control term (step S403).
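A minimal class-level Python sketch of the structure in FIGS. 8 and 9 might look as follows; the names and the dict/list data structures are assumptions for illustration, since the modified example itself does not prescribe any particular implementation.

```python
# Minimal sketch of the modified example (FIGS. 8 and 9); names are hypothetical.

class VoiceDialogueDevice:
    def __init__(self):
        self.slots = {}        # holding units 103: attribute -> term
        self.history = []      # storage unit 104: snapshots of the slots

    def acquire(self, terms):
        """Acquisition step: S401 (get utterance terms) and S402 (store by attribute)."""
        self.slots.update(terms)
        self.history.append(dict(self.slots))

    def change(self, second_term):
        """Changing step (S403): revert to the snapshot in which a slot held second_term."""
        for snapshot in reversed(self.history):
            if second_term in snapshot.values():
                self.slots = dict(snapshot)
                return

device = VoiceDialogueDevice()
device.acquire({"region": "Moriguchi"})
device.acquire({"dish": "Italian"})
device.acquire({"region": "Iriya"})          # e.g. a misrecognized term
device.change("Moriguchi")                   # "go back to Moriguchi"
print(device.slots)                          # {'region': 'Moriguchi', 'dish': 'Italian'}
```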
The voice dialogue device 20A according to this modified example provides the same effects as the voice dialogue device 20.
[1-4. Effects]
As described above, the voice dialogue device 20 according to the present embodiment is a voice dialogue device that conducts a spoken dialogue with a user, and includes a plurality of slots 31 for holding dialogue information indicating the content of the dialogue, a history table 32, an utterance data acquisition unit 22, and an operation unit 25. Each of the plurality of slots 31 is associated with an attribute of a term and holds a term having the associated attribute. The history table 32 stores a history of the terms held by the plurality of slots 31. The utterance data acquisition unit 22 acquires utterance data indicating the content of an utterance spoken by the user, and causes each utterance term contained in the acquired utterance data to be held by the slot 31 associated with the attribute of that term. When the utterance data acquired by the utterance data acquisition unit 22 contains a control term for controlling the dialogue information, the operation unit 25 refers to the history stored in the history table 32 and changes the terms held by the plurality of slots 31. Specifically, it changes the term held in each slot 31 to the term that the slot 31 held at the past point in time specified by the control term.
According to this configuration, the voice dialogue device 20 can change the dialogue information back to that at a past point in time based on the user's voice, that is, it can return the dialogue information to a past state. Here, the past point in time is a point determined by the user's voice. Thus, by making a spoken utterance containing a control term that specifies a past point in time, the user can return the dialogue information, which is the content of the dialogue with the voice dialogue device 20, to that at the past point. In this way, the voice dialogue device 20 can correct the content of the dialogue with the user by a simple method.
In particular, the voice dialogue device 20 is characterized in that it corrects the content of the dialogue with the user by a simple method through control based on the user's voice. In a spoken dialogue with a conventional voice dialogue device, it is difficult for the user to grasp the content of the dialogue in chronological order, and therefore difficult to perform an operation that returns the content of the dialogue to a desired past point. Because the voice dialogue device 20 according to the present embodiment performs control based on the user's voice, it can return the content of the dialogue to the past point the user desires. Moreover, the more complex the content of the dialogue becomes, that is, the larger the number of terms, the greater the advantage of control based on the user's voice is considered to be.
Furthermore, when dialogue information becomes more complex with advances in technology, for example when there are several dozen or more holding units, the above correction method is highly advantageous. With a voice dialogue device that has fewer than ten holding units, as in the present embodiment, it is realistically possible to reset the terms held by the holding units and set them again from scratch instead of returning the content of the dialogue to a past point. However, when a voice dialogue device has several dozen or more holding units, setting the held terms again from scratch is cumbersome and places a heavy burden on the user, and can hardly be called realistic. In such a case, the voice dialogue device 20 can return the content of the dialogue with the user to a past point, and therefore has the advantage that the dialogue can be resumed from the past point the user desires without starting the settings over from the beginning.
The control term may include a first term, which is a predetermined term indicating that the dialogue information is to be changed to a past point in time, and a second term different from the predetermined term. In this case, the operation unit 25 determines whether the utterance data acquired by the utterance data acquisition unit 22 contains both the first term and the second term. When it determines that both are contained in the utterance data, the operation unit 25 refers to the history and changes the term held in each of the slots 31. Specifically, it changes each slot's term to the term that the slot held at the point in time when the slot 31 associated with the attribute of the second term held the second term.
According to this, by recognizing a control term consisting of the first term and the second term uttered by the user, the voice dialogue device 20 can return the dialogue information to the specific past point the user desires. In this way, when returning the content of the dialogue with the user to a past state based on the user's voice, the voice dialogue device 20 can more easily identify the past point to refer to.
The operation unit 25 may also set a restoration point at a point in time in the history stored in the history table 32 when the states of the plurality of slots 31 at that point satisfy a predetermined condition. Based on the set restoration points, the operation unit 25 changes the terms held by the plurality of slots 31 to the terms that the slots 31 held at a past point in time. Here, the past point in time is, among the points at which restoration points were set, the point at which the slot 31 associated with the attribute of the second term held the second term.
According to this, the voice dialogue device 20 determines, from the states of the plurality of slots 31 at each point stored in the history table 32, whether a restoration point should be set at that point. By setting restoration points appropriately using a predetermined condition, the device can narrow down in advance the points that can later become targets for changing the terms held by the holding units. As a result, when changing the terms held by the holding units, the voice dialogue device 20 can return the dialogue state to a more appropriate past point, narrowed down by the predetermined condition.
The voice dialogue device 20 may further include a task control unit 24. The task control unit 24 outputs the terms held in each of the plurality of slots 31 as dialogue information to the task processing unit 40, which performs processing based on the dialogue information. The task processing unit 40 performs processing based on the output of the task control unit 24. The task control unit 24 acquires, as a response to its output to the task processing unit 40, information indicating the result of the processing by the task processing unit 40.
According to this, the voice dialogue device 20 presents to the user the result of processing, by an external processing unit, the terms held by the plurality of holding units. The user can thus obtain a processing result that reflects the content of the dialogue with the voice dialogue device 20.
The task processing unit 40 may execute an information search using the acquired terms as search terms, the task control unit 24 may acquire the result of the information search as the response, and the voice dialogue device 20 may further include a presentation control unit 29 for presenting the result of the information search acquired by the external processing control unit to the user.
According to this, the voice dialogue device 20 can obtain, as the result of processing by the external processing unit, the result of a search based on the content of the dialogue, and present it to the user.
The operation unit 25 may also set a restoration point in the history at the point when the task control unit 24 acquired the result of an information search.
According to this, the voice dialogue device 20 can use restoration points to return the terms held by the holding units to those at the point when an information search was performed. The point when an information search was performed is also the point when its results were obtained, and is a point the user can easily identify within the dialogue. By setting restoration points in this way, the voice dialogue device 20 can return the terms held by the holding units to those at a point the user can intuitively identify. Moreover, when presenting the user with information that prompts an utterance specifying the point to which the dialogue information should be returned, the device can present more appropriate points as restoration point candidates.
The operation unit 25 may also be prohibited from setting a restoration point at a point in the history at which, even though the task control unit 24 acquired the result of an information search, that result contained zero items.
According to this, the voice dialogue device 20 can exclude points at which an information search returned zero items from the points at which restoration points are set. When the user wants to return the dialogue state, it is considered useful to return to a point at which the information search returned one or more items. The voice dialogue device 20 can therefore return the content of the dialogue with the user to a point that is useful to the user.
When changing the terms held in each of the plurality of slots 31, if there are two or more restoration points in the history, the operation unit 25 may change the terms using the restoration point specified by the user from among the two or more restoration points.
According to this, the voice dialogue device 20 can return the content of the dialogue with the user to that at a past point using the one restoration point the user specifies from among multiple restoration points. The user can thereby select, from among the points the voice dialogue device 20 has judged appropriate, the point the user considers best, and return to the dialogue information at the selected point.
The voice dialogue device 20 may further include a response sentence generation unit 21 that generates a response sentence for accepting from the user one restoration point, among the two or more restoration points, to be used for changing the terms.
According to this, the user can learn from the response sentence that there are multiple candidate points to which the voice dialogue device 20 can return the content of the dialogue. By responding to the response sentence, the user specifies the point to which the content of the dialogue should be returned. That is, the voice dialogue device 20 has the user identify one restoration point from among the multiple restoration points. In this way, the voice dialogue device 20 can concretely accept the designation of a restoration point from the user and return the content of the dialogue with the user to that at a past point.
The control term may also include a first term, which is a predetermined term indicating that the dialogue information is to be changed to a past point in time, and an attribute name, which is the name of the attribute of an acquired term. The operation unit 25 determines whether the utterance data acquired by the utterance data acquisition unit 22 contains the first term and an attribute name. When it determines that the first term and the attribute name are contained in the utterance data, the operation unit 25 refers to the history and changes the term held in each of the slots 31. Specifically, it changes each slot's term to the term that the slot held at the point immediately before the slot 31 associated with the attribute indicated by the attribute name came to hold the term it currently holds.
According to this, based on the control term acquired by the acquisition unit, the voice dialogue device 20 concretely identifies a past point using the name of the attribute with which a holding unit is associated. Even without specifying a concrete condition, the user can specify the point to which the content of the dialogue should be returned simply by naming the attribute. In this way, the voice dialogue device 20 can correct the content of the dialogue with the user by a more concrete method.
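As a sketch of this attribute-name variant, the following hedged Python example reverts a slot to its previous value when the user names the attribute rather than a term (for example, "not that region"); the snapshot-list history model is the same assumption used in the sketches above.

```python
# Sketch of the attribute-name variant (hypothetical names and history model).

def restore_by_attribute(slots, history, attr):
    """Revert to the snapshot immediately before `attr` took its current value."""
    current = slots.get(attr)
    for snapshot in reversed(history):
        if snapshot.get(attr) != current:   # first older state with a different value
            slots.clear()
            slots.update(snapshot)
            return

history = [{"region": "Moriguchi", "dish": "Italian"},
           {"region": "Iriya", "dish": "Italian"}]   # "Iriya" was a misrecognition
slots = dict(history[-1])
restore_by_attribute(slots, history, "region")       # user: "not that region"
print(slots)   # {'region': 'Moriguchi', 'dish': 'Italian'}
```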
The voice dialogue system 1 according to the present embodiment conducts a spoken dialogue with a user. The voice dialogue system 1 includes a plurality of slots 31 for holding dialogue information indicating the content of the dialogue, a history table 32, an utterance data acquisition unit 22, an operation unit 25, a microphone 13, a voice recognition unit 14, a task processing unit 40, a voice synthesis unit 12, a speaker 11, and a display device 10. Each of the plurality of slots 31 is associated with an attribute of a term and holds a term having the associated attribute. The history table 32 stores a history of the terms held by the plurality of slots 31. The utterance data acquisition unit 22 acquires utterance data indicating the content of an utterance spoken by the user and causes each utterance term contained in the acquired utterance data to be held by the slot 31 associated with the attribute of that term. When the utterance data acquired by the utterance data acquisition unit 22 contains a control term for controlling the dialogue information, the operation unit 25 changes the terms held by the plurality of slots 31; specifically, it refers to the history stored in the history table 32 and changes the term held in each slot 31 to the term that the slot 31 held at the past point in time specified by the control term. The microphone 13 acquires the user's voice and generates an audio signal. The voice recognition unit 14 generates the utterance data acquired by the utterance data acquisition unit 22 by performing voice recognition processing on the audio signal generated by the microphone 13. The task processing unit 40 acquires the dialogue information held by the plurality of slots 31, performs predetermined processing on the acquired dialogue information, and outputs information indicating the result of the processing. The voice synthesis unit 12 generates a response sentence to the user's spoken utterance and generates an audio signal by performing voice synthesis processing on the generated response sentence. The speaker 11 outputs the audio signal generated by the voice synthesis unit 12 as sound. The display device 10 displays the processing result output by the task processing unit 40.
This provides the same effects as the voice dialogue device 20 described above.
The control method for a voice dialogue device according to the present embodiment can be used to control the voice dialogue device 20, which conducts a spoken dialogue with a user. The voice dialogue device 20 includes a plurality of slots 31 for holding dialogue information indicating the content of the dialogue, and a history table 32. Each of the plurality of slots 31 is associated with an attribute of a term and holds a term having the associated attribute. The history table 32 stores a history of the terms held by the plurality of slots 31. The control method according to the present embodiment includes an acquisition step and a changing step. In the acquisition step, utterance data indicating the content of an utterance spoken by the user is acquired, and each utterance term contained in the acquired utterance data is held by the slot 31 associated with the attribute of that term. In the changing step, when the utterance data acquired in the acquisition step contains a control term for controlling the dialogue information, the terms held by the plurality of slots 31 are changed; specifically, the history stored in the history table 32 is referred to, and the term held in each slot 31 is changed to the term that the slot 31 held at the past point in time specified by the control term.
This likewise provides the same effects as the voice dialogue device 20 described above.
As described above, an embodiment has been presented as an illustration of the technology of the present disclosure, and the accompanying drawings and detailed description have been provided for that purpose.
Accordingly, the components described in the accompanying drawings and detailed description may include not only components essential to solving the problem but also components that are not essential to solving the problem and are included only to illustrate the above implementation. The mere fact that such non-essential components appear in the accompanying drawings or detailed description should therefore not be taken to mean that they are essential.
Furthermore, since the above embodiment is intended to illustrate the technology of the present disclosure, various changes, substitutions, additions, omissions, and the like can be made within the scope of the claims or their equivalents.
The present disclosure is useful as a voice dialogue device that can correct the content of a dialogue with a user by a simple method. For example, the present disclosure can be applied to a car navigation device, a smartphone (high-function mobile phone terminal), a mobile phone terminal, a mobile information terminal, or a PC (Personal Computer) application.
1 Voice dialogue system
10 Display device
11 Speaker
12 Voice synthesis unit
13 Microphone
14 Voice recognition unit
20, 20A Voice dialogue device
21 Response sentence generation unit
22 Utterance data acquisition unit
23 Sequence control unit
24 Task control unit
25 Operation unit
26 Analysis unit
27 Memory
28 Task result analysis unit
29, 106 Presentation control unit
31 Slot
32, 320 History table
40 Task processing unit
101 Acquisition unit
102 Changing unit
103 Holding unit
104 Storage unit
105 External processing control unit
310 Dialogue sequence
311 Time information
312 Utterance
313 Response
321 Mandatory slot group
322 Optional slot group
323 Action
324 Restoration point
330 Search results

Claims (13)

1. A voice dialogue device that conducts a spoken dialogue with a user, the device comprising:
    an acquisition unit that acquires utterance data indicating the content of an utterance spoken by the user;
    a plurality of holding units, each of which holds a term included in the utterance data in association with an attribute of the term;
    a storage unit that stores a history of the terms held by the plurality of holding units; and
    a changing unit that, when the utterance data includes a predetermined control term, refers to the history stored in the storage unit and changes the terms of the plurality of holding units to the terms that the plurality of holding units held at a past point in time specified by the control term.
2. The voice dialogue device according to claim 1, wherein
    the control term includes a first term, which is a predetermined term indicating that the dialogue information with the user is to be changed to a past point in time, and a second term different from the predetermined term, and
    the changing unit determines whether the utterance data includes the first term and the second term, and, when determining that the first term and the second term are included, refers to the history, identifies, among the plurality of holding units, a correspondence holding unit holding an attribute corresponding to the attribute of the second term, and changes the terms of the plurality of holding units to the terms that the plurality of holding units held at the point in time when the correspondence holding unit held the second term.
3. The voice dialogue device according to claim 2, wherein the changing unit
    sets a restoration point at a point in time in the history stored in the storage unit when the states of the plurality of holding units at that point satisfy a predetermined condition, and
    changes the terms of the plurality of holding units to the terms that the plurality of holding units held at the point in time, among the points at which restoration points were set, when the correspondence holding unit held the second term.
4. The voice dialogue device according to claim 3, further comprising:
    an external processing control unit that acquires a result of processing from a processing unit that processes the dialogue information including the terms held by the plurality of holding units, and acquires information indicating the result of the processing.
5. The voice dialogue device according to claim 4, wherein
    the processing unit executes an information search using a term in the dialogue information as a search term,
    the external processing control unit acquires a result of the information search as the result of the processing, and
    the voice dialogue device further comprises a presentation control unit for presenting the result of the information search acquired by the external processing control unit to the user.
6. The voice dialogue device according to claim 5, wherein the changing unit sets the restoration point, in the history, at the point in time when the external processing control unit acquired the result of the information search.
7. The voice dialogue device according to claim 6, wherein the changing unit prohibits setting the restoration point at a point in the history at which the result of the information search contained zero items, even though the external processing control unit acquired the result of the information search at that point.
8. The voice dialogue device according to claim 3, wherein, when there are two or more restoration points in the history, the changing unit changes the terms of the plurality of holding units to the terms that the plurality of holding units held at a restoration point specified by the user from among the two or more restoration points.
9. The voice dialogue device according to claim 8, further comprising:
    a response sentence generation unit that generates a response sentence for accepting, from the user, a selection of one restoration point to be used for changing the terms of the plurality of holding units from among the two or more restoration points.
10. The voice dialogue device according to claim 1, wherein
    the control term includes a first term, which is a predetermined term indicating that the dialogue information with the user is to be changed to a past point in time, and an attribute name, which is the name of the attribute, and
    the changing unit determines whether the utterance data includes the first term and the attribute name, and, when determining that the first term and the attribute name are included, refers to the history, identifies, among the plurality of holding units, a correspondence holding unit holding the attribute indicated by the attribute name, and changes the terms of the plurality of holding units to the terms that the plurality of holding units held at the point in time immediately before the correspondence holding unit held the term it currently holds.
11. A voice dialogue device that conducts a spoken dialogue with a user, the device comprising:
    a plurality of holding units for holding dialogue information indicating the content of the dialogue with the user; and
    a control circuit that controls the dialogue with the user, wherein
    each of the plurality of holding units is associated with an attribute of a term, and
    the control circuit:
    acquires utterance data indicating the content of an utterance spoken by the user;
    causes an utterance term included in the acquired utterance data to be held by, among the plurality of holding units, the holding unit associated with the attribute of the utterance term; and
    when the utterance data includes a predetermined control term, changes the terms of the plurality of holding units to the terms that the plurality of holding units held at a past point in time specified by the control term.
12. A voice dialogue system comprising:
    the voice dialogue device according to any one of claims 1 to 11;
    a microphone that acquires the user's voice and generates an audio signal;
    a voice recognition unit that generates the utterance data to be acquired by the acquisition unit by performing voice recognition processing on the audio signal generated by the microphone;
    a processing unit that acquires dialogue information including the terms held by the plurality of holding units, performs predetermined processing on the acquired dialogue information, and outputs a result of the processing;
    a voice synthesis unit that generates a response sentence to an utterance spoken by the user and generates an audio signal by performing voice synthesis processing on the generated response sentence;
    a speaker that outputs the audio signal generated by the voice synthesis unit as sound; and
    a display device that displays the result of the processing output by the processing unit.
13. A method for controlling a voice dialogue device that conducts a spoken dialogue with a user, wherein
    the voice dialogue device includes:
    a plurality of holding units, each of which holds a term included in utterance data indicating the content of an utterance spoken by the user in association with an attribute of the term; and
    a storage unit that stores a history of the terms held by the plurality of holding units, and
    the method comprises:
    an acquisition step of acquiring the utterance data; and
    a changing step of, when the utterance data acquired in the acquisition step includes a predetermined control term, referring to the history stored in the storage unit and changing the terms of the plurality of holding units to the terms that the plurality of holding units held at a past point in time specified by the control term.
PCT/JP2016/000855 2015-02-27 2016-02-18 Voice interaction device, voice interaction system, control method of voice interaction device WO2016136208A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2015-039573 2015-02-27
JP2015039573A JP2018063272A (en) 2015-02-27 2015-02-27 Voice dialogue apparatus, voice dialogue system, and control method of voice dialogue apparatus

Publications (1)

Publication Number Publication Date
WO2016136208A1 true WO2016136208A1 (en) 2016-09-01

Family

ID=56788243

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2016/000855 WO2016136208A1 (en) 2015-02-27 2016-02-18 Voice interaction device, voice interaction system, control method of voice interaction device

Country Status (2)

Country Link
JP (1) JP2018063272A (en)
WO (1) WO2016136208A1 (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH086955A (en) * 1994-06-16 1996-01-12 Canon Inc Device and method for retrieving information
JP2001022779A (en) * 1999-07-09 2001-01-26 Nissan Motor Co Ltd Interactive information retrieval device, method for interactive information retrieval using computer, and computer-readable medium where program performing interactive information retrieval is recorded
JP2003223187A (en) * 2001-11-20 2003-08-08 Koninkl Philips Electronics Nv Method of operating speech dialogue system
JP2008506156A (en) * 2004-07-06 2008-02-28 ボクシィファイ, インコーポレイテッド Multi-slot interaction system and method
US20120081371A1 (en) * 2009-05-01 2012-04-05 Inci Ozkaragoz Dialog design tool and method

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019220791A1 (en) * 2018-05-14 2019-11-21 株式会社Nttドコモ Dialogue device
JPWO2019220791A1 (en) * 2018-05-14 2021-02-12 株式会社Nttドコモ Dialogue device
JP7033195B2 (en) 2018-05-14 2022-03-09 株式会社Nttドコモ Dialogue device

Also Published As

Publication number Publication date
JP2018063272A (en) 2018-04-19

Similar Documents

Publication Publication Date Title
US10733983B2 (en) Parameter collection and automatic dialog generation in dialog systems
US11004444B2 (en) Systems and methods for enhancing user experience by communicating transient errors
JP6588637B2 (en) Learning personalized entity pronunciation
KR101418163B1 (en) Speech recognition repair using contextual information
US9002708B2 (en) Speech recognition system and method based on word-level candidate generation
CN105592343B (en) Display device and method for question and answer
US10860289B2 (en) Flexible voice-based information retrieval system for virtual assistant
US9484034B2 (en) Voice conversation support apparatus, voice conversation support method, and computer readable medium
JP6726354B2 (en) Acoustic model training using corrected terms
US10838954B1 (en) Identifying user content
WO2016136207A1 (en) Voice interaction device, voice interaction system, control method of voice interaction device, and program
JP2015176099A (en) Dialog system construction assist system, method, and program
CN109326284B (en) Voice search method, apparatus and storage medium
JP5616390B2 (en) Response generation apparatus, response generation method, and response generation program
JP6622165B2 (en) Dialog log analysis apparatus, dialog log analysis method and program
JPWO2016143131A1 (en) Dialog support apparatus, method and program, and terminal
WO2019035373A1 (en) Information processing device, information processing method, and program
US20210065708A1 (en) Information processing apparatus, information processing system, information processing method, and program
WO2016136208A1 (en) Voice interaction device, voice interaction system, control method of voice interaction device
JP2015143866A (en) Voice recognition apparatus, voice recognition system, voice recognition method, and voice recognition program
JPWO2005076259A1 (en) Voice input system, voice input method, and voice input program
WO2022271555A1 (en) Early invocation for contextual data processing
KR20230075386A (en) Method and apparatus for speech signal processing
JP2015096923A (en) Information processing device and method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16754959

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

NENP Non-entry into the national phase

Ref country code: JP

122 Ep: pct application non-entry in european phase

Ref document number: 16754959

Country of ref document: EP

Kind code of ref document: A1