US20170364323A1 - Dialogue support apparatus, method and terminal


Info

Publication number
US20170364323A1
Authority
US
United States
Prior art keywords: dialogue, user, state, display, target
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abandoned
Application number
US15/421,125
Inventor
Hiroko Fujii
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Toshiba Corp
Original Assignee
Toshiba Corp
Application filed by Toshiba Corp
Assigned to KABUSHIKI KAISHA TOSHIBA. Assignment of assignors interest (see document for details). Assignors: FUJII, HIROKO
Publication of US20170364323A1


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/167Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • G06F17/279
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/0482Interaction with lists of selectable items, e.g. menus
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • G06F40/35Discourse or dialogue representation
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems

Definitions

  • As a concrete example, in the first row of the table 300 shown in FIG. 3, the dialogue state identifier 301 “1”, the dialogue sequence identifier 302 “1”, the dialogue state name 303 “Search_spot (Kyoto)”, the display status flag 304 “0”, the time 305 “2014/07/07 12:05:11”, and the system response data 306 are associated with each other and stored as dialogue information 307.
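  • To make the structure of the dialogue information concrete, the following is a minimal sketch, in Python, of one row of the table 300; the class and field names mirror the reference numerals 301 to 306 but are illustrative assumptions, not structures defined by this disclosure:

        from dataclasses import dataclass
        from datetime import datetime

        @dataclass
        class DialogueInfo:
            state_id: int        # dialogue state identifier 301
            sequence_id: int     # dialogue sequence identifier 302
            state_name: str      # dialogue state name 303
            display_flag: int    # display status flag 304: 0=hidden, 1=normal, 2=thumbnail
            time: datetime       # time 305 the system response was acquired
            response_data: dict  # system response data 306 (response body)

        # The example row above, expressed with this structure:
        row = DialogueInfo(
            state_id=1,
            sequence_id=1,
            state_name="Search_spot (Kyoto)",
            display_flag=0,
            time=datetime(2014, 7, 7, 12, 5, 11),
            response_data={},  # body elided
        )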
  • In step S401, the dialogue receiver 202 receives a user's utterance and generates text as a result of speech recognition on the utterance.
  • In step S402, the target determiner 204 determines the dialogue state that is the target of the user's request.
  • Here, the dialogue state whose display status flag in the dialogue information is “1”, in other words, the dialogue state in a normal display, is determined to be the dialogue state targeted by the user, and the dialogue state with the display status flag “1” is therefore acquired as the target dialogue state.
  • In step S403, the request processor 205 sends, as a processing request, the text acquired by performing speech recognition on the utterance and the target dialogue state to the dialogue processing server 102.
  • In step S404, the request processor 205 receives from the dialogue processing server 102 a processing result in response to the processing request.
  • In step S405, the dialogue information updater 206 updates the dialogue information stored in the dialogue information storage 203 based on the processing result.
  • In step S406, the display updater 207 generates update information to update the screen based on the updated dialogue information. The operation of the dialogue support apparatus 200 when receiving a user's utterance is thus finished.
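  • The following is a minimal sketch, in Python, of how steps S402 through S406 might be strung together; it reuses the DialogueInfo record sketched above, and the injected process callable stands in for the exchange with the dialogue processing server 102 (an assumption for illustration, not an API defined by this disclosure):

        from typing import Callable, List

        def handle_utterance(
            text: str,                      # S401: recognized utterance text
            table: List[DialogueInfo],
            process: Callable[[str, List[DialogueInfo]], DialogueInfo],
        ) -> List[DialogueInfo]:
            # S402: dialogue states in a normal display (flag 1) are the target
            targets = [row for row in table if row.display_flag == 1]
            # S403/S404: send the text and target states; receive the new state
            new_row = process(text, targets)
            # S405: store the new dialogue information in the table
            table.append(new_row)
            # S406: the caller regenerates the screen from the updated table
            return table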
  • Next, the operation of the dialogue support apparatus 200 in the case of receiving an operation from a user will be explained with reference to the flowchart shown in FIG. 5.
  • This processing is performed every time a user's operation is acquired.
  • In step S501, the operation receiver 201 receives a user's operation and determines the dialogue state which is the operation target.
  • In step S502, the operation receiver 201 determines, as an operation detail, whether the user's operation is an operation associated with an enlarged display. If the operation is associated with an enlarged display, the process proceeds to step S503; if not, the process proceeds to step S504.
  • In step S503, the dialogue information updater 206 updates the display status flag to “1” for the dialogue information included in the dialogue state which is the operation target.
  • In step S504, the operation receiver 201 determines, as an operation detail, whether the user's operation is associated with a reduced display. If the operation is associated with a reduced display, the process proceeds to step S505; if not, the process is finished.
  • In step S505, the dialogue information updater 206 updates the display status flag to “2” for the dialogue information included in the dialogue state which is the operation target.
  • In step S506, the display updater 207 generates update information to update the screen based on the updated dialogue information.
  • The operation of the dialogue support apparatus 200 in the case of receiving an operation from the user is thus finished by the above processing.
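  • Under the same assumptions as the sketch above, steps S501 through S506 reduce to a small flag update; the operation strings here are simplified stand-ins for the operation details derived from the user interface:

        def handle_operation(target: DialogueInfo, operation: str) -> bool:
            if operation == "enlarge":      # S502 -> S503
                target.display_flag = 1     # show in a normal display
            elif operation == "reduce":     # S504 -> S505
                target.display_flag = 2     # show as a thumbnail
            else:
                return False                # nothing to update
            return True                     # S506: a screen update follows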
  • FIG. 6 is an example of a dialogue between the user and the dialogue system in which the user, who has not yet decided on specific details such as a destination, first searches for and compares sightseeing spots in candidate places, and then searches for hotels near a candidate place.
  • The user's utterances (U1, U2, . . . ) and the system responses (S1, S2, . . . ) are listed in the chronological order of the utterances. If there is a user's operation input to the user interface, it is added as (UI1, . . . ) to the same chronological order.
  • The user interface 700 includes the display region 701 and the speech recognition button 702.
  • The speech recognition button 702 is a button with which the user indicates when to start speech recognition, and it may be a physical button or may be displayed on the screen.
  • The speech recognition processing begins when the user presses or touches the speech recognition button 702. The interface may also be designed so that speech recognition begins when the user speaks at a volume louder than a predetermined volume, without the need to press the speech recognition button 702.
  • The user says the user's utterance U1, “Tell me sightseeing spots in Kyoto”.
  • The dialogue receiver 202 receives the user's utterance U1, performs speech recognition, and converts the user's utterance into text.
  • The target determiner 204 refers to the dialogue information and determines whether there are any dialogue states displaying a processing result on the user interface 700.
  • Since there are none yet, the target determiner 204 determines the dialogue state as “no target”.
  • The request processor 205 generates a processing request message including information indicating the user's utterance U1 and “no target” as a dialogue state, and sends the processing request message to the dialogue processing server 102.
  • The dialogue processing server 102 searches for a dialogue scenario that is capable of receiving the user's utterance.
  • Herein, it is assumed that a sightseeing spot search process is performed.
  • The dialogue processing server 102 sends a response message including a processing result of the sightseeing spot search process to the request processor 205.
  • The response message includes a dialogue state identifier, a dialogue sequence identifier, a dialogue state name, and a processing time, which are elements of dialogue information, in addition to the text of the user's utterance and the processing result of the sightseeing spot search process.
  • Here, the dialogue state is an initial state, the dialogue state identifier is “1”, and the dialogue state name is “Search_spot (Kyoto)”.
  • The request processor 205 receives the response message from the dialogue processing server 102.
  • The dialogue information updater 206 updates the dialogue information to be stored in the dialogue information storage 203, using the text of the user's utterance U1 included in the response message acquired from the request processor 205, the dialogue state identifier corresponding to the processing result for the user's utterance U1, the dialogue sequence identifier, and the processing time.
  • The processing result of the sightseeing spot search process is displayed, and the display status flag is set to “1”.
  • In other words, the dialogue state identifier 301 “1”, the dialogue sequence identifier 302 “1”, the dialogue state name “Search_spot (Kyoto)”, the display status flag 304 “1”, the time 305 “2014/07/07 12:05:11”, and the system response data 306 are associated with each other and stored as dialogue information 307.
  • The display updater 207 displays the processing result of the user's utterance U1 in a normal manner in the display region 701.
  • The system response S1 from the dialogue system, “Sightseeing spots in Kyoto are displayed”, may be output by audio, or may be displayed on the screen.
  • The processing result of the dialogue system for the user's utterance U1 is displayed on the screen as shown in FIG. 7, and the search result for the sightseeing spots in Kyoto is displayed in a normal manner.
  • The user speaks a new user's utterance U2, “Narrow down to the Arashiyama area”.
  • The dialogue receiver 202 receives the user's utterance U2, performs speech recognition, and converts the user's utterance into text.
  • The target determiner 204 refers to the dialogue information stored in the dialogue information storage 203, and determines a dialogue state as the target of the user's utterance U2.
  • The dialogue state to become a target is a dialogue state with the display status flag “1”; herein, the dialogue state with the dialogue state identifier “1” is determined as the target dialogue state.
  • The request processor 205 generates a processing request message including information indicating the user's utterance U2 and the dialogue information of the target dialogue state, and sends the processing request message to the dialogue processing server 102.
  • At the dialogue processing server 102, the intention of the user's utterance U2 is analyzed, and it is determined that the intention is a narrowed-down search request for the target dialogue state.
  • A general dialogue scenario determination for an audio dialogue may be used for this determination; a detailed explanation is thus omitted herein.
  • Because the narrowed-down search continues the same dialogue, the dialogue sequence identifier associated with the dialogue state to be newly generated (i.e., the dialogue state with the dialogue state identifier “2”) remains “1”.
  • The dialogue processing server 102 performs processing (search processing in this example) based on the intention analysis result for the utterance and the dialogue scenario corresponding thereto, and sends a response message including a processing result of the search processing to the request processor 205.
  • The dialogue information updater 206 adds and updates the text of the user's utterance U2 and the information related to the processing result for the user's utterance U2 as dialogue information, in accordance with the content of the response message acquired from the request processor 205.
  • In other words, the dialogue state identifier 301 “2”, the dialogue sequence identifier 302 “1”, the dialogue state name “Search_spot (Arashiyama)”, the display status flag 304 “1”, the time 305 “2014/07/07 12:05:45”, and the system response data 306 are associated with each other and stored as dialogue information 307.
  • The display status flag 304 associated with the user's utterance U1 is also updated accordingly. In other words, because the dialogue information with the dialogue state identifier 301 “1” will no longer be displayed on the screen, its display status flag 304 is updated from “1” to “0”.
  • The display updater 207 displays the processing result for the user's utterance U2 in a normal manner.
  • A display example of the processing result for the user's utterance U2 is shown in FIG. 8.
  • The system response S2, “These are sightseeing spots in the Arashiyama area”, is output, and the sightseeing spots in the Arashiyama area are displayed in a normal display.
  • The user speaks a new user's utterance U3, “Change the location to Izu”.
  • The dialogue receiver 202 receives the user's utterance U3, performs speech recognition, and converts the user's utterance into text.
  • The target determiner 204 refers to the dialogue information stored in the dialogue information storage 203, and determines, as the target dialogue state, the dialogue information with the dialogue state identifier “2”, which is associated with the display status flag “1”.
  • The request processor 205 generates a processing request message including information indicating the user's utterance U3 and the dialogue information of the target dialogue state, and sends the processing request message to the dialogue processing server 102.
  • At the dialogue processing server 102, the intention of the user's utterance U3 is analyzed, and the intention is determined to be a (new) sightseeing spot search request with a change of location, not a narrowed-down search.
  • A general dialogue scenario determination may be used to make this determination as well.
  • A new dialogue sequence identifier “2” is given to the dialogue state (the dialogue state identifier “3”) which is newly generated as a result of processing the user's utterance U3.
  • The dialogue processing server 102 performs processing based on the result of the intention analysis for the utterance and the dialogue scenario, and sends a response message including a processing result of the search process to the request processor 205.
  • The dialogue information updater 206 adds and updates the text of the user's utterance U3 and the information related to the processing result for the user's utterance U3 as dialogue information, in accordance with the content of the response message acquired from the request processor 205.
  • In other words, the dialogue state identifier 301 “3”, the dialogue sequence identifier 302 “2”, the dialogue state name “Search_spot (Izu)”, the display status flag 304 “1”, the time 305 “2014/07/07 12:06:20”, and the system response data 306 are associated with each other and stored as dialogue information 307.
  • The dialogue information updater 206 determines whether or not the dialogue sequence identifier of the dialogue state currently with the display status flag “1” is the same as the dialogue sequence identifier of the dialogue state which is about to be added. If they are not the same, the current display status flag “1” is changed to “2”, and the display status flag of the dialogue state which is newly added is set to “1”.
  • Herein, since the dialogue sequence identifier of the dialogue information with the dialogue state identifier “2”, which is associated with the display status flag “1”, is “1”, and the dialogue sequence identifier of the new dialogue state is “2”, the display status flag of the dialogue information with the dialogue state identifier “2” is changed from “1” to “2”, and the display status flag of the newly-added dialogue state is set to “1”.
  • The dialogue information table of the dialogue information storage 203 at the time of adding the dialogue information associated with the user's utterance U3 is shown in FIG. 9.
  • The display status flag 304 of the dialogue information with the dialogue state identifier 301 “2” is changed to “2”, and the display status flag 304 of the dialogue information with the dialogue state identifier 301 “3” is set to “1”.
  • The display updater 207 generates a thumbnail of the search result corresponding to the dialogue state (the dialogue state identifier “2”) for which the display status flag has been changed from “1” to “2”, and displays the thumbnail.
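  • This flag handling can be expressed compactly; the following sketch, again reusing the DialogueInfo record assumed above, demotes a displayed state to a thumbnail only when the new state starts a different dialogue sequence (as with U3), and hides it otherwise (as with U2):

        from typing import List

        def register_new_state(table: List[DialogueInfo], new_row: DialogueInfo) -> None:
            for row in table:
                if row.display_flag == 1:
                    if row.sequence_id != new_row.sequence_id:
                        row.display_flag = 2   # keep reachable as a thumbnail
                    else:
                        row.display_flag = 0   # same sequence: leaves the screen
            new_row.display_flag = 1           # the new state takes the normal display
            table.append(new_row)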
  • A display example of the processing result for the user's utterance U3 is shown in FIG. 10.
  • The processing result for the user's utterance U3 is displayed as a normal display in the display region 701, and the system response S3, “These are the sightseeing spots in Izu”, is output.
  • The search result corresponding to the user's utterance U2 is displayed as a thumbnail in the thumbnail display region 1001 at the top right of the display region 701.
  • The display method is not limited to a thumbnail display, and may be any display method as long as a search result is readily accessible, such as a tab display on the screen.
  • The user speaks a new user's utterance U4, “Narrow down the location to the Izu Kogen area”, and then drags the thumbnail toward the center of the screen.
  • The operation receiver 201 specifies that the operation target is the thumbnail, and generates information indicating that the thumbnail has been dragged to, or to the proximity of, the center of the screen. Based on the operation target (the thumbnail) and the information indicating that the thumbnail has been dragged, the drag operation is determined to be an enlarged display instruction.
  • The dialogue information updater 206 updates the display status flag of the dialogue information with the dialogue state identifier “2”, with which the display status flag “2” is associated, from “2” to “1”, based on the information from the operation receiver 201.
  • The display updater 207 updates the user interface so as to display, in a normal display, the search result of the dialogue information that has been in a thumbnail display, side by side with the search result which is currently displayed in the display region 701.
  • A display example in which two search results are displayed side by side is shown in FIG. 11.
  • The search result screen 1101 of “Sightseeing spots in Izu Kogen” and the search result screen 1102 of “Sightseeing spots in Arashiyama”, which had been in a thumbnail display, are horizontally displayed side by side in the same size. They may also be displayed side by side in a vertical direction. In a case where three or more search results are displayed side by side, such as a case where one search result is displayed and two thumbnails are dragged to the center, the search results may be displayed in a similar manner.
  • Displaying results in such a side-by-side manner makes it easy for a user to compare result screens. To change which search results are displayed side by side, a search result displayed in the display region 701 can be dragged toward the outside of the user interface. In that case, the operation receiver 201 determines that the drag operation is a reduced display instruction, and the dialogue information updater 206 updates the display status flag of the dialogue information which is the target of the drag operation from “1” to “2”. The display updater 207 may change the display of the dragged search result to a thumbnail display.
  • The user speaks a new user's utterance U5, “Show me hotels”.
  • The target determiner 204 refers to the dialogue information stored in the dialogue information storage 203, and determines, as target dialogue states, the dialogue information with the dialogue state identifiers “2” and “4”, which are associated with the display status flag “1”.
  • The request processor 205 generates a processing request message including information indicating the user's utterance U5 and the dialogue information with the dialogue state identifiers “2” and “4”, and sends the processing request message to the dialogue processing server 102.
  • At the dialogue processing server 102, the intention of the user's utterance U5 is analyzed, and it is determined that the intention of the utterance is a hotel search.
  • The dialogue processing server 102 searches for hotels for each of the dialogue states with the dialogue state identifiers “2” and “4”, and sends a response message including a processing result of the search processing to the request processor 205.
  • The dialogue information updater 206 adds and updates the text of the user's utterance U5 and the information related to the processing result for the user's utterance U5 as dialogue information, in accordance with the content of the response message acquired from the request processor 205.
  • In other words, the dialogue state identifier 301 “5”, the dialogue sequence identifier 302 “1”, the dialogue state name “Search_hotel (Arashiyama)”, the display status flag 304 “1”, the time 305 “2014/07/07 12:07:50”, and the system response data 306 are associated with each other and stored as dialogue information 307.
  • Likewise, the dialogue state identifier 301 “6”, the dialogue sequence identifier 302 “2”, the dialogue state name “Search_hotel (Izu Kogen)”, the display status flag 304 “1”, the time 305 “2014/07/07 12:07:50”, and the system response data 306 are associated with each other and stored as dialogue information 307.
  • The display status flag “1” of the dialogue information with the dialogue state identifier “2” and the display status flag “1” of the dialogue information with the dialogue state identifier “4” are changed to “0”.
  • The dialogue information table of the dialogue information storage 203 at the time of adding the dialogue information associated with the user's utterance U5 is shown in FIG. 12.
  • As shown in FIG. 13, the search result 1301 for “Hotels in Izu Kogen” and the search result 1302 for “Hotels in Arashiyama” are displayed.
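  • In the simplified terms of the sketches above, the fan-out of one utterance over several target dialogue states is just a per-target request; the process callable again stands in for the dialogue processing server:

        from typing import List

        def fan_out(process, text: str, targets: List[DialogueInfo]) -> List[DialogueInfo]:
            # One utterance such as "Show me hotels" yields one new dialogue
            # state per target, as with dialogue state identifiers "5" and "6".
            return [process(text, [t]) for t in targets]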
  • Determining the dialogue state in a normal display as the target dialogue state makes it easy to select the dialogue state which the user wants to continue, and inputting a single utterance makes it possible to obtain processing results for a plurality of dialogue states.
  • In the present embodiment, the target determiner 204 determines a dialogue state with the display status flag 304 “1” as the target dialogue state; however, a dialogue state designated by the user touching it on the screen may be determined as the target dialogue state, regardless of whether it is in a thumbnail display or a normal display.
  • In that case, a designated flag may be newly set in the dialogue information of the dialogue state selected by the user, and the dialogue state for which the designated flag is set is sent to the dialogue processing server.
  • A plurality of dialogue states designated by a user's utterance may also be determined as target dialogue states.
  • When a target dialogue state is designated by an utterance, a dialogue state whose dialogue state name includes a keyword that matches the text obtained by performing speech recognition on the utterance may be determined as the target dialogue state, for example (see the sketch below).
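  • A minimal sketch of such keyword-based target determination, under the same assumptions as above; a real implementation could use the intention analysis of the dialogue system rather than naive substring matching:

        from typing import List

        def states_matching_utterance(table: List[DialogueInfo], text: str) -> List[DialogueInfo]:
            # e.g. an utterance containing "Arashiyama" selects "Search_spot (Arashiyama)"
            keywords = [w for w in text.split() if w]
            return [row for row in table
                    if any(kw in row.state_name for kw in keywords)]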
  • By determining a target dialogue state based on a user's utterance and operation, one or more dialogue states for which the user wants to make a request can be easily designated. This improves the convenience of an audio dialogue, for example when searching for a program starring an actor whom the user wants to watch, or when comparing and deciding travel destinations, electric home appliances, clothing, and so on at an online store on the Internet, because there is no need to repeatedly input utterances restating the same conditions.
  • According to the first embodiment described above, a smoother dialogue can thus be supported.
  • The dialogue support apparatus according to the second embodiment has the same configuration as that of the first embodiment, except for the operation of the request processor 205; the explanation of the common elements is therefore omitted herein.
  • The request processor 205 according to the second embodiment operates in almost the same manner as that according to the first embodiment. It differs in that, when a user's utterance changes a condition of a dialogue state included in the dialogue history, it acquires from the dialogue processing server 102 a dialogue state in which the user's utterances made after the dialogue state targeted for the change are reflected, among the dialogue states included in the dialogue history.
  • A dialogue example according to the second embodiment is explained with reference to FIG. 14.
  • FIG. 14 is an example of a dialogue between a user and a dialogue system when the user schedules a recording of a TV program.
  • The user inputs an utterance to request scheduling a recording, and a dialogue is carried out along a dialogue scenario wherein the dialogue system asks the user about a broadcast date, a start time, a broadcast channel, and a program title in order to specify a program and schedule a recording of the program.
  • The user says the user's utterance U10, “I want to schedule a recording”.
  • The dialogue state identifier 301 “10”, the dialogue sequence identifier 302 “5”, the dialogue state name “Record_TV ( )”, the display status flag 304 “1”, the time 305 “2014/07/07 12:05:45”, and the system response data 306 are associated with each other and stored in the dialogue information storage 203 as dialogue information 307.
  • A result related to the preparation of the scheduled recording processing is displayed in the search result region, and the display status flag is set to “1”.
  • The display updater 207 displays the processing result related to the preparation of the scheduled recording processing in the display region 701, and outputs the system response S10, “When is the program broadcast?”.
  • The user speaks the user's utterance U11, “July 10th”, as a response to the system response S10, “When is the program broadcast?”.
  • The dialogue receiver 202 receives the user's utterance U11, performs speech recognition, and converts the user's utterance into text.
  • The target determiner 204 selects the dialogue state with the display status flag “1” as the target dialogue state.
  • The request processor 205 generates a request message which includes the user's utterance U11 and the dialogue information associated with the dialogue state identifier “10” as the target dialogue state.
  • At the dialogue processing server 102, the intention of the user's utterance U11 is analyzed, and it is determined that this utterance is a response to the inquiry from the system in the target dialogue state (the dialogue state identifier “10”).
  • The dialogue processing server 102 sets the dialogue sequence identifier of the newly-generated dialogue state (the dialogue state identifier “11”) to be the same as that of the dialogue information with the dialogue state identifier “10”, and sends a response message to the request processor 205.
  • The dialogue information updater 206 adds and updates the information related to the processing result for the user's utterance U11 as dialogue information, in accordance with the content of the response message acquired from the request processor 205.
  • In other words, the dialogue state identifier 301 “11”, the dialogue sequence identifier 302 “5”, the dialogue state name “Record_TV (date: July 10)”, the display status flag 304 “1”, the time 305 “2014/07/07 12:05:45”, and the system response data 306 are associated with each other and stored as dialogue information 307.
  • The display status flag 304 of the preceding dialogue state is also updated accordingly.
  • The display updater 207 displays the processing result for the user's utterance U11 in a normal manner, and outputs the system response S11, “From what time does it start?”.
  • The user's subsequent utterance U12, “From nine p.m.”, the system response S12, “What is the broadcast channel?”, the user's utterance U13, “Channel XX”, and the system response S13, “What is the program title?”, can be processed in a manner similar to the foregoing processing; the explanation is omitted herein.
  • The dialogue information table of the dialogue information storage 203 at the time of adding the dialogue information associated with the user's utterance U13 is shown in FIG. 15.
  • Each piece of dialogue information 307 is stored in the same manner as the dialogue information shown in FIG. 3; as the dialogue proceeds, dialogue information containing more conditions for the recording processing is stored.
  • At this point, the broadcast date, the start time, and the broadcast channel have been input to schedule a recording of a program.
  • The user then speaks a new user's utterance U14, “Change the broadcast date to July 11th”.
  • The target determiner 204 refers to the dialogue information stored in the dialogue information storage 203, and determines, as the target dialogue state, the dialogue information with the dialogue state identifier “13”, which is associated with the display status flag “1”.
  • The request processor 205 generates a request message which includes the user's utterance U14 and the dialogue information of the target dialogue state.
  • At the dialogue processing server 102, the intention of the user's utterance U14 is analyzed, and it is determined that this utterance is an utterance to change the broadcast date, not an utterance in response to the system response S13.
  • The dialogue processing server 102 interrupts the ongoing dialogue of recording scheduling for the broadcast date of July 10th, and generates a state of a new dialogue of recording scheduling for the broadcast date of July 11th.
  • In addition, the request processor 205 acquires from the dialogue processing server 102 a dialogue state utilizing the dialogue before the change.
  • Here, the first system's utterance after the system's utterance related to the broadcast date is the system response S11, “From what time does it start?”.
  • The request processor 205 therefore receives from the dialogue processing server 102 a response message which includes both a dialogue state in which only the broadcast date is changed to July 11th, and a dialogue state in which the broadcast date is changed to July 11th and the subsequent user's utterances, up to U13, “Channel XX”, are also set.
  • The dialogue information updater 206 acquires the response message which includes the two dialogue states, and updates the dialogue information.
  • The dialogue information table of the dialogue information storage 203 at the time of adding the dialogue information associated with the user's utterance U14 is shown in FIG. 17.
  • Two pieces of dialogue information, 1701 and 1702, are recorded as a processing result of the user's utterance U14, “Change the broadcast date to July 11th”.
  • In the dialogue information 1701, only the broadcast date is changed, and the other user's utterances are not utilized; thus, the dialogue information 1701 is stored with the dialogue state name “Record_TV (date: July 11)”.
  • The dialogue information 1702 is a dialogue state in which the broadcast date is changed and the usable utterances in the dialogue history are reused; thus, the dialogue information 1702 is stored with the dialogue state name “Record_TV (date: July 11/stime: 21:00/channel: Channel XX)”.
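  • The way the second embodiment reuses utterances made after the changed condition can be sketched as follows; each history entry models the condition set by one user's utterance, and all names are illustrative assumptions rather than structures defined by this disclosure:

        from typing import Dict, List, Tuple

        def change_condition(
            history: List[Dict[str, str]], slot: str, value: str
        ) -> Tuple[Dict[str, str], Dict[str, str]]:
            # e.g. history = [{"date": "July 10"}, {"stime": "21:00"},
            #                 {"channel": "Channel XX"}]
            changed_only = {slot: value}       # like dialogue information 1701
            reapplied = {slot: value}          # like dialogue information 1702
            past_change_point = False
            for step in history:
                if slot in step:
                    past_change_point = True   # the condition being replaced
                    continue
                if past_change_point:
                    reapplied.update(step)     # reuse the later utterances
            return changed_only, reapplied

        # change_condition(history, "date", "July 11") yields
        # ({"date": "July 11"},
        #  {"date": "July 11", "stime": "21:00", "channel": "Channel XX"})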
  • The display status flag “3” indicates that a dialogue state is replaceable with the dialogue state currently displayed in the display region 701.
  • A display example of the user interface when the dialogue information 1701 is displayed is shown in FIG. 18.
  • A dialogue state switching button 1801 is newly displayed at the top right of the screen.
  • The dialogue state switching button 1801 is a button for making the dialogue state corresponding to the dialogue information 1702 sent from the dialogue processing server 102 effective.
  • A display example of the user interface when the dialogue state switching button 1801 is pressed down is shown in FIG. 19.
  • The display updater 207 may delete the dialogue state switching button 1801 when a processing result of a user's new utterance is displayed on the screen in the state shown in either FIG. 18 or FIG. 19.
  • The instructions included in the steps described in the foregoing embodiments can be implemented based on a software program.
  • A general-purpose computer system may store the program beforehand and read the program in order to attain the same advantages as those of the above-described dialogue support apparatus.
  • The instructions described in the above embodiments are stored in a magnetic disc (flexible disc, hard disc, etc.), an optical disc (CD-ROM, CD-R, CD-RW, DVD-ROM, DVD±R, DVD±RW, Blu-ray (registered trademark) disc, etc.), a semiconductor memory, or a similar storage medium, as a program executable by a computer.
  • Any storage format can be used as long as the storage medium is readable by a computer or a built-in system.
  • An operation similar to the operation of the dialogue support apparatus of the above-described embodiments can be realized if a computer reads the program from the storage medium and causes a CPU to execute the instructions written in the program.
  • The computer may, of course, acquire or read the program by way of a network.
  • An operating system (OS) or middleware (MW) running on a computer may execute a part of the processes for realizing the present embodiments, based on instructions from a program installed from a storage medium onto the computer or a built-in system.
  • The storage medium according to the present embodiments is not limited to a medium independent from a computer or a built-in system; a storage medium storing or temporarily storing a program downloaded through a LAN or the Internet, etc., is also included as a storage medium according to the present embodiments.
  • The number of storage media is not limited to one; the case where the processes according to the present embodiments are carried out using multiple storage media is also included, and the media can take any configuration.
  • The computer or built-in system according to the present embodiments is used to execute each process disclosed herein based on a program stored in a storage medium, and may be a single apparatus such as a PC or a microcomputer, or a system in which a plurality of apparatuses are connected through a network.
  • The computer adopted in the present embodiments is not limited to a PC; it may be a calculation processing apparatus, processing circuitry, or a microcomputer included in an information processor, and broadly means a device or apparatus that can realize the functions disclosed in the present embodiments by a program.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Acoustics & Sound (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

According to one embodiment, a dialogue support apparatus includes a processing circuitry. The processing circuitry receives input information indicating a user's intention. The processing circuitry determines at least one target dialogue state among dialogue histories related to dialogue states indicating dialogue results between a user and a dialogue system which estimates the user's intention and performs processing in accordance with the user's intention, the target dialogue state being a target of instructions from the user. The processing circuitry acquires a processing result from the dialogue system by inputting the input information and the target dialogue state into the dialogue system. The processing circuitry generates first update information to update a display of the processing result.

Description

    CROSS-REFERENCE TO RELATED APPLICATIONS
  • This application is a Continuation application of PCT Application No. PCT/JP2015/057359, filed Mar. 12, 2015, the entire contents of which are incorporated herein by reference.
  • FIELD
  • Embodiments described herein relate generally to a dialogue support apparatus, method and terminal.
  • BACKGROUND
  • In recent years, small mobile terminals such as smartphones have come into widespread use, and with them, dialogue systems allowing natural speech input have received attention. Such dialogue systems interpret a user's intention without requiring users to adapt their speech to the system; users therefore do not have to use predefined phrases and can give instructions with natural expressions. In these dialogue systems, if conditions of a search performed during a past dialogue are to be changed in order to perform a new search, it is necessary to undo the dialogue.
  • To undo a dialogue, there is a method of selecting a dialogue state to be applied depending on whether or not the search conditions are exclusive, without a need for a user to say “undo”.
  • BRIEF DESCRIPTION OF THE DRAWINGS
  • FIG. 1 is a conceptual diagram showing an example of a dialogue system on which an embodiment is based.
  • FIG. 2 is a block diagram of a dialogue support apparatus according to the first embodiment.
  • FIG. 3 is a drawing showing an example of a dialogue information table stored in a dialogue information storage.
  • FIG. 4 is a flowchart showing the operation of the dialogue support apparatus when receiving a user's utterance.
  • FIG. 5 is a flowchart showing the operation of the dialogue support apparatus when receiving a user's operation.
  • FIG. 6 is a drawing showing a specific example of a dialogue between a user and the dialogue system according to the first embodiment.
  • FIG. 7 is a drawing showing an example of a user interface according to the first embodiment.
  • FIG. 8 is a drawing showing a display example of a processing result of a user's utterance U2.
  • FIG. 9 is a drawing showing a dialogue information table to which dialogue information related to a user's utterance U3 is added.
  • FIG. 10 is a drawing showing a display example of a processing result of a user's utterance U3.
  • FIG. 11 is a drawing showing a display example in which two search results are displayed side by side.
  • FIG. 12 is a drawing showing a dialogue information table to which dialogue information related to a user's utterance U5 is added.
  • FIG. 13 is a drawing showing a display example of a processing result of a user's utterance U5.
  • FIG. 14 is a drawing showing a specific example of a dialogue between a user and a dialogue system according to the second embodiment.
  • FIG. 15 is a drawing showing a dialogue information table to which dialogue information related to a user's utterance U13 is added.
  • FIG. 16 is a drawing showing a display example of a processing result of a user's utterance U13.
  • FIG. 17 is a drawing showing a dialogue information table to which dialogue information related to a user's utterance U14 is added.
  • FIG. 18 is a drawing showing an example of a user interface according to the second embodiment.
  • FIG. 19 is a drawing of a display example of the user interface when a dialogue state switching button is pressed down.
  • DETAILED DESCRIPTION
  • However, with the aforementioned procedure, the latest user's utterance is always applied to a single dialogue state. Accordingly, users need to input almost the same conditions many times when comparing and evaluating items, such as trips or clothing at an online store on the Internet, which is inconvenient for users.
  • In general, according to one embodiment, a dialogue support apparatus includes a processing circuitry. The processing circuitry receives input information indicating a user's intention. The processing circuitry determines at least one target dialogue state among dialogue histories related to dialogue states indicating dialogue results between a user and a dialogue system which estimates the user's intention and performs processing in accordance with the user's intention, the target dialogue state being a target of instructions from the user. The processing circuitry acquires a processing result from the dialogue system by inputting the input information and the target dialogue state into the dialogue system. The processing circuitry generates first update information to update a display of the processing result.
  • Hereinafter, a dialogue support apparatus, method, program and terminal according to the present embodiments will be described in detail with reference to the accompanying drawings. In the embodiments described below, elements assigned the same reference symbols perform the same operations, and redundant descriptions of such elements will be omitted as appropriate.
  • First Embodiment
  • An example of a dialogue system on which the embodiment is based will be explained with reference to the conceptual diagram shown in FIG. 1.
  • A dialogue system 100 shown in FIG. 1 includes a terminal 101 and a dialogue processing server 102. The terminal 101 is a mobile terminal such as a tablet computer or a smartphone used by a user 103, for example. In the present embodiment, it is assumed that the user 103 inputs speech to a client application installed onto the terminal 101 and speech recognition is performed to acquire a speech recognition result; however, similar processing can be performed for a case of text input. The terminal 101 sends the speech recognition result to the dialogue processing server 102.
  • The dialogue processing server 102 is connected to the terminal 101 via network 104, and receives the speech recognition result from the terminal 101. The dialogue processing server 102 operates as a dialogue system, which estimates the user's intention from the speech recognition result, performs processing in accordance with the intention, and performs dialogue processing of the speech recognition result. The dialogue processing server 102 sends a processing result of the dialogue processing to the terminal 101.
  • Next, the dialogue support apparatus according to the present embodiment will be described with reference to the block diagram shown in FIG. 2.
  • The dialogue support apparatus 200 according to the present embodiment includes an operation receiver 201 (a second receiver), a dialogue receiver 202 (a first receiver), a dialogue information storage 203, a target determiner 204, a request processor 205, a dialogue information updater 206, and a display updater 207.
  • In FIG. 1, the dialogue support apparatus 200 according to the present embodiment may be included in the dialogue processing server 102, or in a server different from the dialogue processing server 102. If the dialogue support apparatus 200 is included in the dialogue processing server 102 or in a different server, the terminal 101 should include a speech acquisition unit, such as a microphone, a display unit which displays a user interface, and a communication unit which performs data communication. Furthermore, the dialogue support apparatus 200 may be included in the terminal 101.
  • The operation receiver 201 receives a user's operation of the screen, and acquires an operation target and operation detail. As the process of acquiring the operation target and the operation detail, processing on a user interface which is common in a tablet device may be adopted, and a detailed description of such processing is therefore omitted herein.
  • The dialogue receiver 202 receives a user's utterance as an audio signal, performs speech recognition on the audio signal, and receives text of the speech recognition result as input information indicating a user's intention. For example, a user's utterance input to a microphone installed onto the terminal 101 shown in FIG. 1 may be received as an audio signal. The speech recognition processing may be performed using a speech recognition server (not shown in the drawings) on so-called cloud computing, or using a speech recognition engine installed onto the terminal 101. The dialogue receiver 202 may receive, as user input information, text that is directly input by the user using a keyboard.
  • The dialogue information storage 203 stores, as a form of table, dialogue information including a dialogue state indicating a dialogue result between the dialogue system and the user. The table also presents a dialogue history, which is a history of dialogue states, by identifying the dialogue information in a chronological manner. The details of the dialogue information will be described later with reference to FIG. 3.
  • The target determiner 204 receives input information from the dialogue receiver 202, refers to dialogue information stored in the dialogue information storage 203, and determines a dialogue state as a target dialogue state from the dialogue history, in accordance with an instruction from the user.
  • The request processor 205 receives the dialogue state and the input information from the target determiner 204, and inputs them to the dialogue system; in the example described herein, it sends them to the dialogue processing server 102 shown in FIG. 1. The request processor 205 then acquires a processing result of the dialogue processing from the dialogue processing server 102.
  • The dialogue information updater 206 acquires an operation target and operation details from the operation receiver 201 and a processing result from the request processor 205, and updates dialogue information stored in the dialogue information storage 203.
  • The display updater 207 receives the processing result, the operation target, and the operation details from the dialogue information updater 206, and generates update information to update the display of the processing result on the user interface. The user interface is a screen displayed on a display of the terminal 101, for example, and it is updated based on the update information, as needed.
  • Next, the dialogue information stored in the dialogue information storage 203 will be described with reference to FIG. 3.
  • In the table 300 shown in FIG. 3, a dialogue state identifier 301, a dialogue sequence identifier 302, a dialogue state name 303, a display status flag 304, a time 305, and system response data 306 are associated with each other and stored as dialogue information 307.
  • The dialogue state identifier 301 is an identifier given to each dialogue state. The dialogue sequence identifier 302 is an identifier given to a group of dialogue states related to each other in terms of a dialogue. It is assumed that a rule of giving the dialogue sequence identifier 302 is predefined by the dialogue processing server 102; however, a rule that is determined on the terminal side may be used. As a predefined rule, for example, the same identifier may be given to a dialogue state included in the same dialogue scenario.
  • The dialogue state name 303 is a name given to a dialogue state managed by the dialogue processing server 102. The display status flag 304 is a flag indicating how a dialogue state is displayed on the user interface of a client (herein, the screen of the terminal 101). In the present embodiment, the display status flag 304 "0" indicates that a processing result is not being displayed, the flag "1" indicates that a processing result is being displayed normally on the user interface, and the flag "2" indicates that a processing result is being displayed as a thumbnail on the user interface. A normal display means a display in a size that can be visually identified when the user considers the details of a processing result. A thumbnail display is a display smaller than the normal display; it may simply be a reduced-size display, or may show only character strings including search keywords, as long as the user can understand the intended conditions from it and it does not interfere with the normal display.
  • The time 305 is the time when a system response including a processing result is acquired from the dialogue processing server 102. The system response data 306 indicates the body of the response data from the dialogue processing server 102 in response to a request from the request processor 205. The response data body includes, for example, speech synthesis text and a search result in XML or JSON format.
  • Specifically, the dialogue state identifier 301 “1”, the dialogue sequence identifier 302 “1”, the dialogue state name 303 “Search_spot (Kyoto)”, the display status flag 304 “0”, the time 305 “2014/07/07 12:05:11” and the system response data 306 are associated with each other and stored as dialogue information 307.
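  • As a concrete illustration, one piece of dialogue information 307 in the table 300 might be represented as in the following minimal Python sketch; the names (DisplayStatus, DialogueInfo) and the use of a dataclass are assumptions made for illustration, not identifiers from the embodiment.

      from dataclasses import dataclass
      from enum import IntEnum

      class DisplayStatus(IntEnum):
          HIDDEN = 0     # processing result not displayed
          NORMAL = 1     # processing result in a normal display
          THUMBNAIL = 2  # processing result in a thumbnail display

      @dataclass
      class DialogueInfo:
          state_id: int           # dialogue state identifier 301
          sequence_id: int        # dialogue sequence identifier 302
          state_name: str         # dialogue state name 303
          display: DisplayStatus  # display status flag 304
          time: str               # time 305 the system response was acquired
          response: dict          # system response data 306 (e.g., parsed JSON)

      # The example row described above:
      row = DialogueInfo(1, 1, "Search_spot (Kyoto)", DisplayStatus.HIDDEN,
                         "2014/07/07 12:05:11", {})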
  • Next, the operation of the dialogue support apparatus 200 in the case of receiving utterances from a user will be explained with reference to the flowchart shown in FIG. 4. Herein, suppose the processing is performed every time a user's utterance is received.
  • In step S401, the dialogue receiver 202 receives a user's utterance and generates text as a result of the speech recognition on the utterance.
  • In step S402, the target determiner 204 specifies the dialogue state which is the target of the request from the user. In the present embodiment, a dialogue state whose display status flag in the dialogue information is "1", in other words, a dialogue state in a normal display, is regarded as a dialogue state targeted by the user; the dialogue state with the display status flag "1" is therefore acquired as the target dialogue state.
  • In step S403, the request processor 205 sends, as a processing request, text acquired by performing speech recognition on the utterance and the target dialogue state to the dialogue processing server 102.
  • In step S404, the request processor 205 receives from the dialogue processing server 102 a processing result in response to the processing request.
  • In step S405, the dialogue information updater 206 updates the dialogue information stored in the dialogue information storage 203 based on the processing result.
  • In step S406, the display updater 207 generates update information to update the screen based on the updated dialogue information. The operation of the dialogue support apparatus 200 is thus finished.
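  • The flow of steps S401 to S406 can be condensed into the following sketch, reusing the DialogueInfo and DisplayStatus types from the earlier sketch; the recognize, process_request, update_dialogue_info, and render callables are hypothetical stand-ins for the speech recognizer, the dialogue processing server 102, the dialogue information updater 206, and the display updater 207.

      def handle_utterance(audio, store, recognize, process_request,
                           update_dialogue_info, render):
          text = recognize(audio)                            # S401
          targets = [d for d in store
                     if d.display == DisplayStatus.NORMAL]   # S402
          result = process_request(text, targets)            # S403, S404
          update_dialogue_info(store, result)                # S405
          render(store)                                      # S406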
  • Next, the operation of the dialogue support apparatus 200 in the case of receiving an operation from a user will be explained with reference to the flowchart shown in FIG. 5. Herein, the processing is performed every time a user's operation is acquired.
  • In step S501, the operation receiver 201 receives a user's operation and determines a dialogue state which is an operation target.
  • In step S502, the operation receiver 201 determines, as an operation detail, whether the user's operation is an operation associated with an enlarged display. If the operation is associated with an enlarged display, the process proceeds to step S503; if not, the process proceeds to step S504.
  • In step S503, the dialogue information updater 206 updates the display status flag to “1” for the dialogue information included in the dialogue state which is an operation target.
  • In step S504, the operation receiver 201 determines, as an operation detail, whether the user's operation is associated with a reduced display. If the operation is associated with a reduced display, the process proceeds to step S505; if not, the process is finished.
  • In step S505, the dialogue information updater 206 updates the display status flag to “2” for the dialogue information included in a dialogue state which is an operation target.
  • In step S506, the display updater 207 generates update information to update the screen based on the updated dialogue information. The operation of the dialogue support apparatus 200 in the case of receiving an operation from the user is thus finished.
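  • Steps S501 to S506 amount to mapping an enlarge or reduce operation onto the display status flag of the targeted dialogue state, roughly as in the following sketch (op.target_id and op.kind are assumed fields of a hypothetical operation object):

      def handle_operation(op, store, render):
          # S501: find the dialogue state that is the operation target
          target = next(d for d in store if d.state_id == op.target_id)
          if op.kind == "enlarge":                       # S502
              target.display = DisplayStatus.NORMAL      # S503
          elif op.kind == "reduce":                      # S504
              target.display = DisplayStatus.THUMBNAIL   # S505
          render(store)                                  # S506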
  • Next, a specific example of the dialogue support processing in the dialogue support apparatus 200 will be explained with reference to FIG. 6 to FIG. 13.
  • FIG. 6 is an example of a dialogue between the user and the dialogue system in which the user, who has not yet decided on specific details such as a destination, first searches for and compares sightseeing spots in candidate places, and then searches for hotels near a candidate place. The user utterances (U1, U2, . . . ) and the system responses (S1, S2, . . . ) are listed in the chronological order of the utterances. If there is a user's operation input to the user interface, it is added as (UI1, . . . ) to the chronological order of the utterances.
  • In the following, processing in line with the order of the user's utterances shown in FIG. 6 will be specifically explained.
  • Herein, an example of the user interface for the user dialogue and operation will be described with reference to FIG. 7.
  • The user interface 700 includes a display region 701 and a speech recognition button 702. The speech recognition button 702 is a button with which the user notifies the system when to start speech recognition; the button may be a physical button, or may be displayed on the screen. Suppose the speech recognition processing begins when the user presses or touches the speech recognition button 702. The apparatus may also be designed so that speech recognition begins when the user speaks at a volume louder than a predetermined volume, without the need to press the speech recognition button 702.
  • In the following, suppose the display updater 207 directly updates the user interface based on generated update information.
  • <Processing of User's Utterance U1>
  • The user says the user's utterance U1, "Tell me sightseeing spots in Kyoto". The dialogue receiver 202 receives the user's utterance U1, performs speech recognition, and converts the user's utterance into text. Subsequently, the target determiner 204 refers to the dialogue information and determines whether there are any dialogue states whose processing results are displayed on the user interface 700. At the time of inputting the user's utterance U1, nothing is displayed on the user interface 700, and no dialogue information is stored in the dialogue information storage 203; thus, the target determiner 204 determines the dialogue state as "no target".
  • The request processor 205 generates a processing request message including information indicating the user utterance U1 and “no target” as a dialogue state, and sends the processing request message to the dialogue processing server 102.
  • Since there is no dialogue state as a target of the user utterance U1, the dialogue processing server 102 searches for a dialogue scenario that is capable of receiving a user's utterance. Herein, suppose that a sightseeing spot search process has been performed. The dialogue processing server 102 sends a response message including a processing result of the sightseeing spot search process to the request processor 205. The response message includes a dialogue state identifier, a dialogue sequence identifier, a dialogue state name, and a processing time, which are elements of dialogue information, in addition to the text of the user's utterance and the processing result of the sightseeing spot search process. Herein, since the dialogue state is an initial state, the dialogue state identifier is "1", the dialogue sequence identifier is "1", and the dialogue state name is "Search_spot (Kyoto)".
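  • One plausible shape for this exchange, assuming a JSON protocol (the embodiment only specifies that the response body may contain XML or JSON data), is sketched below; all field names are illustrative assumptions.

      request = {
          "utterance": "Tell me sightseeing spots in Kyoto",
          "target_states": [],  # "no target" for the first utterance
      }
      response = {
          "state_id": 1,
          "sequence_id": 1,
          "state_name": "Search_spot (Kyoto)",
          "time": "2014/07/07 12:05:11",
          "speech_text": "Sightseeing spots in Kyoto are displayed",
          "result": {},  # body of the sightseeing spot search result
      }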
  • The request processor 205 receives the response message from the dialogue processing server 102. The dialogue information updater 206 updates the dialogue information stored in the dialogue information storage 203, using the text of the user's utterance U1 included in the response message acquired from the request processor 205, together with the dialogue state identifier, the dialogue sequence identifier, and the processing time corresponding to the processing result for the user's utterance U1. Herein, the processing result of the sightseeing spot search process is displayed, and the display status flag is therefore set to "1". In other words, the dialogue state identifier 301 "1", the dialogue sequence identifier 302 "1", the dialogue state name "Search_spot (Kyoto)", the display status flag 304 "1", the time 305 "2014/07/07 12:05:11", and the system response data 306 are associated with each other and stored as dialogue information 307.
  • The display updater 207 displays the processing result for the user's utterance U1 in a normal manner in the display region 701. Along with the display, the system response S1 from the dialogue system, "Sightseeing spots in Kyoto are displayed", may be output as audio or displayed on the screen. Herein, the processing result of the dialogue system for the user's utterance U1 is displayed on the screen as shown in FIG. 7, and the search result for the sightseeing spots in Kyoto is displayed in a normal manner.
  • <Processing of User's Utterance U2>
  • Next, the user speaks a new user's utterance U2, “Narrow down to the Arashiyama area”.
  • The dialogue receiver 202 receives the user's utterance U2, performs speech recognition, and converts the user's utterance into text. The target determiner 204 refers to the dialogue information stored in the dialogue information storage 203, and determines a dialogue state as a target of the user's utterance U2. The dialogue state to become a target is a dialogue state with the display status flag "1"; herein, the dialogue state with the dialogue state identifier "1" is determined as the target dialogue state.
  • The request processor 205 generates a processing request message including information indicating the user utterance U2 and the dialogue information in the target dialogue state, and sends the processing request message to the dialogue processing server 102.
  • In the dialogue processing server 102, an intention of the user's utterance U2 is analyzed and it is determined that the intention is a narrowed-down search request for the target dialogue state. For this determination, a general dialogue scenario determination for an audio dialogue may be used; the detailed explanation of the determination is thus omitted herein. Accordingly, as the user's utterance U2 is in the same utterance group, the dialogue sequence identifier associated with a dialogue state to be newly generated (i.e., a dialogue state identifier “2”) is set to “1”, which is the same as the dialogue sequence identifier associated with the target dialogue state. The dialogue processing server 102 performs processing (search processing in this example) based on the intention analysis result for the utterance and the dialogue scenario corresponding thereto, and sends a response message including a processing result of the search processing to the request processor 205.
  • In the same manner as in the case of the user's utterance U1, the dialogue information updater 206 adds and updates the text of the user's utterance U2 and the information related to the processing result for the user's utterance U2 as dialogue information, in accordance with the content of the response message acquired from the request processor 205. In other words, the dialogue state identifier 301 “2”, the dialogue sequence identifier 302 “1”, the dialogue state name “Search_spot (Arashiyama)”, the display status flag 304 “1”, the time 305 “2014/07/07 12:05:45”, and the system response data 306 are associated with each other and stored as dialogue information 307.
  • Since the search result displayed in the display region 701 is a processing result for the user's utterance U2, the display status flag 304 associated with the user's utterance U1 is also updated accordingly. In other words, because the dialogue information with the dialogue state identifier 301 “1” will no longer be displayed on the screen, the display status flag 304 is updated from “1” to “0”.
  • The display updater 207 displays the processing result for the user's utterance U2 in a normal manner. A display example of the processing result for the user's utterance U2 is shown in FIG. 8. The system response S2 “These are sightseeing spots in the Arashiyama area” is output, and the sightseeing spots in the Arashiyama area are displayed in a normal display.
  • <Processing of User's Utterance U3>
  • Next, the user speaks a new user's utterance U3 “Change the location to Izu”.
  • The dialogue receiver 202 receives the user's utterance U3 and performs speech recognition, and converts the user's utterance into text. The target determiner 204 refers to dialogue information stored in the dialogue information storage 203, and determines, as a target dialogue state, the dialogue information with the dialogue state identifier “2” associated with the display status flag “1”.
  • The request processor 205 generates a processing request message including information indicating the user utterance U3 and the dialogue information of the target dialogue state, and sends the processing request message to the dialogue processing server 102.
  • In the dialogue processing server 102, the intention of the user's utterance U3 is analyzed, and the intention is determined as a (new) sightseeing spot search request with a change of location, not a narrowed-down search. A general dialogue scenario determination may be used to make this determination as well. In this case, as the user's utterance U3 is not in the same utterance group, a new dialogue sequence identifier “2” is given to a dialogue state (a dialogue state identifier “3”) which is newly generated as a result of processing the user's utterance U3. The dialogue processing server 102 performs processing based on the result of the intention analysis for the utterance and the dialogue scenario, and sends a response message including a processing result of the search process to the request processor 205.
  • The dialogue information updater 206 adds and updates the text of the user's utterance U3 and the information related to the processing result for the user's utterance U3 as dialogue information, in accordance with the content of the response message acquired from the request processor 205. In other words, the dialogue state identifier 301 “3”, the dialogue sequence identifier 302 “2”, the dialogue state name “Search_spot (Izu)”, the display status flag 304 “1”, the time 305 “2014/07/07 12:06:20”, and the system response data 306 are associated with each other and stored as dialogue information 307.
  • At this time, the dialogue information updater 206 determines whether or not the dialogue sequence identifier of the dialogue state currently with the display status flag “1” is the same as the dialogue sequence identifier of the dialogue state which is about to be added. If they are not the same, the current display status flag “1” is changed to “2”, and the display status flag of the dialogue state which is newly added is set to “1”.
  • Herein, since the dialogue sequence identifier of the dialogue information with the dialogue state identifier "2", which is associated with the display status flag "1", is "1", and the dialogue sequence identifier of the new dialogue state is "2", the display status flag of the dialogue information with the dialogue state identifier "2" is changed from "1" to "2", and the display status flag of the newly added dialogue state is set to "1".
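  • In code, the flag-update rule above might look like the following sketch (building on the earlier DialogueInfo sketch): when a newly added state belongs to a different dialogue sequence, the currently displayed state is demoted to a thumbnail before the new state is shown in a normal display.

      def add_dialogue_state(store, new):
          for d in store:
              if (d.display == DisplayStatus.NORMAL
                      and d.sequence_id != new.sequence_id):
                  d.display = DisplayStatus.THUMBNAIL
          new.display = DisplayStatus.NORMAL
          store.append(new)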
  • The dialogue information table of the dialogue information storage 203 at the time of adding the dialogue information associated with the user's utterance U3 is shown in FIG. 9.
  • As shown in the dialogue information 901 in FIG. 9, the display status flag 304 of the dialogue information with the dialogue state identifier 301 “2” is changed to “2”, and the display status flag 304 of the dialogue information with the dialogue state identifier 301 “3” is set to “1”.
  • The display updater 207 generates a thumbnail of the search result corresponding to the dialogue state (the dialogue state identifier "2") for which the display status flag has been changed from "1" to "2", and displays the thumbnail.
  • A display example of the processing result for the user's utterance U3 is shown in FIG. 10.
  • As shown in FIG. 10, the processing result for the user's utterance U3 is displayed as a normal display in the display region 701, and the system response S3 “These are the sightseeing spots in Izu” is output. The search result display corresponding to the user's utterance U2 is displayed as a thumbnail display in a thumbnail display region 1001 at the top right of the display region 701. A display method is not limited to a thumbnail display, and may be any display method as long as a search result is readily accessible, such as a tab display on the screen.
  • <Processing of User's Utterance U4>
  • Next, the user speaks a new user's utterance U4 “Narrow down the location to the Izu Kogen area”.
  • Since the processing for the user's utterance U4 is the same as the processing for the user's utterance U2 and the system response S2, except for the words, the explanation of the process is omitted herein.
  • <Processing of User's Utterance UI1>
  • Next, suppose a drag operation on the screen is performed by a user.
  • For the drag operation herein, an operation of dragging the thumbnail in the thumbnail display region 1001 on the user interface shown in FIG. 10 toward the center of the screen is assumed. When the drag operation is performed, the operation receiver 201 specifies that the operation target is the thumbnail, and generates information indicating that the thumbnail has been dragged to, or to the proximity of, the center of the screen. The drag operation is determined to be an enlarged display instruction based on the operation target and this information. The dialogue information updater 206 then updates the display status flag of the dialogue information with the dialogue state identifier "2", which is associated with the display status flag "2", from "2" to "1" based on the information from the operation receiver 201. The display updater 207 updates the user interface so as to display, in a normal display, the search result of the dialogue information that has been in a thumbnail display, side by side with the search result currently displayed in the display region 701.
  • The display example in which two search results are displayed side by side is shown in FIG. 11.
  • As shown in FIG. 11, the search result screen 1101 of "Sightseeing spots in Izu Kogen" and the search result display 1102 of "Sightseeing spots in Arashiyama", which has been in a thumbnail display, are horizontally displayed side by side in the same size. They may also be displayed side by side in a vertical direction. In a case where three or more search results are displayed side by side, such as a case where one search result is displayed and two thumbnails are dragged to the center, the search results may be displayed in a similar manner.
  • Displaying results in such a side-by-side manner makes it easy for a user to compare result screens. To change which search results are displayed side by side, a search result displayed in the display region 701 can be dragged toward the outside of the user interface. In that case, the operation receiver 201 determines that the drag operation is a reduced display instruction, and the dialogue information updater 206 updates the display status flag of the dialogue information which is a target of the drag operation from "1" to "2". The display updater 207 may change the display of the dragged search result to a thumbnail display.
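  • How drag gestures are classified is left to the implementation; a rough sketch under assumed thresholds might treat a drag ending near the center of the screen as an enlarged display instruction and a drag ending outside the user interface as a reduced display instruction.

      def classify_drag(end_x, end_y, screen_w, screen_h):
          near_center = (abs(end_x - screen_w / 2) < screen_w * 0.2 and
                         abs(end_y - screen_h / 2) < screen_h * 0.2)
          if near_center:
              return "enlarge"   # thumbnail dragged to (or near) the center
          if not (0 <= end_x <= screen_w and 0 <= end_y <= screen_h):
              return "reduce"    # result dragged off the user interface
          return "none"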
  • <Processing of User's Utterance U5>
  • Next, the user speaks a new user's utterance U5, “Show me hotels”.
  • The target determiner 204 refers to dialogue information stored in the dialogue information storage 203, and determines, as target dialogue states, the dialogue states with the dialogue state identifiers "2" and "4", which are associated with the display status flag "1".
  • The request processor 205 generates a processing request message including information indicating the user utterance U5 and the dialogue information with the dialogue state identifiers "2" and "4", and sends the processing request message to the dialogue processing server 102.
  • In the dialogue processing server 102, an intention of the user's utterance U5 is analyzed and it is determined that the intention of the utterance is a hotel search. The dialogue processing server 102 searches for hotels for each of the dialogue states with the dialogue state identifiers "2" and "4", and sends a response message including a processing result of the search processing to the request processor 205.
  • The dialogue information updater 206 adds and updates the text of the user's utterance U5 and the information related to the processing result for the user's utterance U5 as dialogue information, in accordance with the content of the response message acquired from the request processor 205. In other words, the dialogue state identifier 301 "5", the dialogue sequence identifier 302 "1", the dialogue state name "Search_hotel (Arashiyama)", the display status flag 304 "1", the time 305 "2014/07/07 12:07:50", and the system response data 306 are associated with each other and stored as dialogue information 307; likewise, the dialogue state identifier 301 "6", the dialogue sequence identifier 302 "2", the dialogue state name "Search_hotel (Izu Kogen)", the display status flag 304 "1", the time 305 "2014/07/07 12:07:50", and the system response data 306 are associated with each other and stored as dialogue information 307. The display status flags "1" of the dialogue information with the dialogue state identifiers "2" and "4" are changed to "0".
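  • Continuing the earlier JSON sketch, the request for the user's utterance U5 would carry both displayed states, and the response would carry one new state per target; all field names remain illustrative assumptions.

      request = {
          "utterance": "Show me hotels",
          "target_states": [
              {"state_id": 2, "state_name": "Search_spot (Arashiyama)"},
              {"state_id": 4, "state_name": "Search_spot (Izu Kogen)"},
          ],
      }
      response = {
          "states": [
              {"state_id": 5, "sequence_id": 1,
               "state_name": "Search_hotel (Arashiyama)"},
              {"state_id": 6, "sequence_id": 2,
               "state_name": "Search_hotel (Izu Kogen)"},
          ],
      }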
  • Herein, the dialogue information table of the dialogue information storage 203 at the time of adding the dialogue information associated with the user's utterance U5 is shown in FIG. 12.
  • Next, the display example of the processing result for the user's utterance U5 is shown in FIG. 13.
  • As shown in FIG. 13, the search result 1301 for "Hotels in Izu Kogen" and the search result 1302 for "Hotels in Arashiyama" are displayed. Determining the dialogue states in a normal display as target dialogue states in this way makes it easy to select the dialogue states which the user wants to continue, and inputting a single utterance makes it possible to obtain processing results for a plurality of dialogue states.
  • The target determiner 204 determines a dialogue state with the display status flag 304 "1" as a target dialogue state; however, a dialogue state designated by the user touching it on the screen may also be determined as a target dialogue state, regardless of whether it is in a thumbnail display or a normal display. In this case, a designated flag may be newly set in the dialogue information of the dialogue state selected by the user, and the dialogue state for which the designated flag is set should be sent to the dialogue processing server. A plurality of dialogue states designated by a user's utterance may also be determined as target dialogue states. When a target dialogue state is designated by an utterance, for example, a dialogue state whose dialogue state name includes a keyword matching the text obtained by speech recognition of the utterance may be determined as a target dialogue state, as in the sketch below.
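  • A minimal sketch of such keyword-based designation, assuming the dialogue state names embed their search keywords as in "Search_spot (Izu)":

      def designate_by_utterance(text, store):
          words = text.replace(",", " ").split()
          return [d for d in store
                  if any(w in d.state_name for w in words)]

      # e.g., "Compare Arashiyama and Izu" would match both
      # "Search_spot (Arashiyama)" and "Search_spot (Izu)".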
  • According to the foregoing first embodiment, one or more dialogue states for which a user wants to make a request can be easily designated by determining target dialogue states based on the user's utterances and operations. This improves the convenience of an audio dialogue when, for example, searching for a program starring an actor whom the user wants to watch, or comparing and deciding on travel destinations, home electric appliances, clothing, and so on at an online store on the Internet. For example, there is no need to input redundant utterances, as shown below, when searching for a drama:
  • User: “Show me a drama from yesterday.”
  • System: “There are 10 dramas from yesterday.” - - - (1)
  • User: “How about variety shows?”
  • System: “There are 25 programs.” - - - (2)
  • User: “Narrow down to a program starring [actor A].”
  • System: “There are two variety shows starring [actor A] from yesterday.” - - - (3)
  • User: “Search for a drama from yesterday.”
  • System: “There are 10 dramas from yesterday.” - - - (4)
  • User: “Narrow down to a program starring [actor A].”
  • System: “There is one variety show starring [actor A] from yesterday.” - - - (5)
  • (The user looks and compares the result screen (3) and the result screen (5).)
  • User: “Play XX.”
  • Thus, according to the first embodiment, a smoother dialogue can be supported.
  • Second Embodiment
  • In the second embodiment, suppose a user corrects content of a previous utterance during an audio dialogue.
  • As the dialogue support apparatus according to the second embodiment has the same configuration as that of the first embodiment, except for the operation of the request processor 205, the explanation is omitted herein.
  • The request processor 205 according to the second embodiment operates in almost the same manner as the request processor 205 according to the first embodiment. It differs in one respect: if a user's utterance changes a condition of a dialogue state included in the dialogue history, the request processor 205 according to the second embodiment acquires, from the dialogue processing server 102, a dialogue state in which the user's utterances made after the dialogue state targeted by the change are reflected, among the dialogue states included in the dialogue history.
  • A dialogue example according to the second embodiment is explained with reference to FIG. 14.
  • FIG. 14 is an example of a dialogue between a user and a dialogue system when a user schedules a recording of a TV program. The user inputs an utterance to request scheduling a recording, and a dialogue is carried out along a dialogue scenario wherein the dialogue system asks the user about a broadcast date, a start time, a broadcast channel, and a program title to specify a program and schedule a recording of the program.
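  • This scenario is a classic slot-filling dialogue; a minimal sketch, with slot names and prompts paraphrasing the figure, might be:

      SLOTS = [
          ("date", "When is the program broadcast?"),
          ("stime", "From what time does it start?"),
          ("channel", "What is the broadcast channel?"),
          ("title", "What is the program title?"),
      ]

      def next_prompt(filled):
          """Return the next question, or None once every slot is filled."""
          for slot, prompt in SLOTS:
              if slot not in filled:
                  return prompt
          return None  # all slots filled: the recording can be scheduled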
  • In the following, processing in line with the order of the user's utterances shown in FIG. 14 will be specifically explained.
  • <Processing of User's Utterance U10>
  • The user says the user's utterance U10, “I want to schedule a recording”.
  • At the time of inputting the user's utterance U10, nothing is displayed on the user interface 700, and the same processing as the processing of the user's utterance U1 in the first embodiment is performed; thus, the explanation is omitted herein.
  • The dialogue state identifier 301 "10", the dialogue sequence identifier 302 "5", the dialogue state name "Record_TV ( )", the display status flag 304 "1", the time 305 "2014/07/07 12:05:45", and the system response data 306 are associated with each other and stored in the dialogue information storage 203 as dialogue information 307. A result related to the preparation of the scheduled recording processing is displayed in the display region, and the display status flag is set to "1".
  • The display updater 207 displays a processing result related to the preparation of the scheduled recording processing in the display region 701, and outputs the system response S10, "When is the program broadcast?".
  • <Processing of User's Utterance U11>
  • Next, the user speaks a user's utterance U11 “July 10th” as a response to the system response S10, “When is the program broadcast?”.
  • The dialogue receiver 202 receives the user's utterance U11 and performs speech recognition, and converts the user's utterance into text. The target determiner 204 selects a dialogue state with the display status flag “1” as a target dialogue state.
  • The request processor 205 generates a request message which includes the user's utterance U11 and the dialogue information associated with the dialogue state identifier "10" as the target dialogue state. In the dialogue processing server 102, an intention of the user's utterance U11 is analyzed and it is determined that this utterance is a response to the inquiry from the system in the target dialogue state (the dialogue state identifier "10"). The dialogue processing server 102 sets the dialogue sequence identifier of the newly generated dialogue state (the dialogue state identifier "11") to be the same as that of the dialogue information with the dialogue state identifier "10", and sends a response message to the request processor 205.
  • The dialogue information updater 206 adds and updates the information related to the processing result for the user's utterance U11 as dialogue information, in accordance with the content of the response message acquired from the request processor 205. In other words, the dialogue state identifier 301 “11”, the dialogue sequence identifier 302 “5”, the dialogue state name “Record_TV (date: July 10)”, the display status flag 304 “1”, the time 305 “2014/07/07 12:05:45”, and the system response data 306 are associated with each other and stored as dialogue information 307.
  • Since the search result displayed on the user interface is a processing result for the user's utterance U11, the display status flag 304 is also updated accordingly.
  • The display updater 207 displays the processing result for the user's utterance U11 in a normal manner, and outputs the system response S11, “From what time does it start?”.
  • The user's subsequent utterance U12 “From nine p.m.”, the system response S12 “What is the broadcast channel?”, the user's utterance U13 “Channel XX”, and the system response S13 “What is the program title?” can be processed in a manner similar to the foregoing processing; the explanation is omitted herein.
  • Next, the dialogue information table of the dialogue information storage 203 at the time of adding the dialogue information associated with the user's utterance U13 is shown in FIG. 15.
  • As shown in FIG. 15, each dialogue information 307 is stored similar to the dialogue information shown in FIG. 3. As the dialogue proceeds, the dialogue information containing more conditions for recording processing is stored.
  • Next, the display example of the processing result for the user's utterance U13 is shown in FIG. 16.
  • As shown in FIG. 16, the broadcast date, the start time, and the broadcast channel are input to schedule a recording of a program.
  • Herein, suppose the user inputs the user's utterance U14 “Change the broadcast date to July 11th” in response to the system response S13.
  • The target determiner 204 refers to dialogue information stored in the dialogue information storage 203, and determines, as a target dialogue state, the dialogue information with the dialogue state identifier “13” associated with the display status flag “1”.
  • The request processor 205 generates a request message which includes the user's utterance U14 and the dialogue information of the target dialogue state. In the dialogue processing server 102, an intention of the user's utterance U14 is analyzed and it is determined that this utterance is an utterance to change the broadcast date, not an utterance in response to the system response S13. Thus, the dialogue processing server 102 interrupts the ongoing dialogue of recording scheduling for the broadcast date of July 10th, and generates a state of a dialogue of a new recording scheduling for the broadcast date of July 11th.
  • At this time, if user's utterances which were input before the change of the broadcast date can be utilized in the dialogue after the change, the request processor 205 acquires from the dialogue processing server 102 a dialogue state that utilizes the dialogue before the change. In the present embodiment, the first system utterance after the system utterance related to the broadcast date is the system response S11, "From what time does it start?".
  • In response to the system response S11, the user has already answered with the user's utterance U12, "From nine p.m."; accordingly, the user's utterance U12 is utilized. In response to the next system response S12, "What is the broadcast channel?", the user has also already answered with the user's utterance U13, "Channel XX"; thus, the user's utterance U13 is utilized.
  • In response to the subsequent system response S13, "What is the program title?", the user has not yet answered; thus, the utterances that can be utilized from the dialogue history are determined to be the user's utterances U12 and U13.
  • The request processor 205 receives from the dialogue processing server 102 a response message which includes two dialogue states: one in which only the broadcast date is changed to July 11th, and one in which the broadcast date is changed to July 11th and the subsequent user's utterances U12 "From nine p.m." and U13 "Channel XX" are also reflected.
  • The dialogue information updater 206 acquires a response message which includes the two dialogue states, and updates the dialogue information.
  • The dialogue information table of the dialogue information storage 203 at the time of adding the dialogue information associated with the user's utterance U14 is shown in FIG. 17.
  • Two pieces of dialogue information, 1701 and 1702, are recorded as a processing result of the user's utterance U14, "Change the broadcast date to July 11th". In the dialogue information 1701, only the broadcast date is changed and the other user's utterances are not utilized; thus, the dialogue information 1701 is stored with the dialogue state name "Record_TV (date: July 11)". The dialogue information 1702, on the other hand, is a dialogue state in which the broadcast date is changed and the usable utterances in the dialogue history are utilized; thus, the dialogue information 1702 is stored with the dialogue state name "Record_TV (date: July 11/stime: 21:00/channel: Channel XX)". The display status flag "3" indicates that a dialogue state is replaceable with the dialogue state currently displayed in the display region 701.
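  • The reuse rule of the second embodiment can be sketched as follows: when one condition changes, the server returns both a minimal state carrying only the change and a state in which every unaffected answer from the history is replayed. The function name and the dict-based slot representation are assumptions.

      def states_after_change(filled, changed_slot, new_value):
          minimal = {changed_slot: new_value}   # cf. dialogue information 1701
          reused = dict(filled)                 # cf. dialogue information 1702
          reused[changed_slot] = new_value
          return minimal, reused

      minimal, reused = states_after_change(
          {"date": "July 10", "stime": "21:00", "channel": "Channel XX"},
          "date", "July 11")
      # minimal -> Record_TV (date: July 11)
      # reused  -> Record_TV (date: July 11/stime: 21:00/channel: Channel XX)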
  • Next, a display example of the user interface when the dialogue information 1701 is displayed is shown in FIG. 18.
  • In the display region 701 on the user interface, the processing result related to the scheduled recording in which the date is changed to July 11th in accordance with the user's utterance is displayed in a normal display, and the system response S11 “From what time does it start?” is output. A dialogue state switching button 1801 is newly displayed at the top right of the screen. The dialogue state switching button 1801 is a button for making a dialogue state corresponding to the dialogue information 1702 sent from the dialogue processing server 102 effective.
  • Next, a display example of the user interface when the dialogue state switching button 1801 is pressed down is shown in FIG. 19.
  • As shown in FIG. 19, the change of the broadcast date has been completed, and the state display 1901, in which the dialogue has advanced to the broadcast date "July 11th", the start time "21:00", and the broadcast channel "Channel XX", is shown. The display updater 207 may delete the dialogue state switching button 1801 when the user makes a new utterance while the screen is in the state shown in either FIG. 18 or FIG. 19.
  • According to the foregoing second embodiment, when the content of an utterance is changed during a dialogue, the user can select on the screen, based on the dialogue history, whether or not the content of the other utterances is utilized. This improves the convenience of an audio dialogue and supports a smoother dialogue.
  • The instructions included in the steps described in the foregoing embodiments can be implemented based on a software program. A general-purpose computer system may store the program beforehand and read the program in order to attain the same advantages as those of the above-described dialogue support apparatus. The instructions described in the above embodiments are stored in a magnetic disc (flexible disc, hard disc, etc.), an optical disc (CD-ROM, CD-R, CD-RW, DVD-ROM, DVD±R, DVD±RW, Blu-ray (registered trademark) disc, etc.), a semiconductor memory, or a similar storage medium, as a program executable by a computer. As long as the storage medium is readable by a computer or by a built-in system, any storage format can be used. An operation similar to the operation of the dialogue support apparatus of the above-described embodiments can be realized if a computer reads the program from the storage medium and executes the instructions written in the program on a CPU. The computer may, of course, acquire or read the program by way of a network.
  • Furthermore, an operating system (OS) working on a computer, database management software, network middleware (MW), etc. may execute a part of the processes for realizing the present embodiments based on the instructions of a program installed from a storage medium onto a computer or a built-in system.
  • Furthermore, the storage medium according to the present embodiments is not limited to a medium independent from a system or a built-in system; a storage medium storing or temporarily storing a program downloaded through a LAN or the Internet, etc. is also included as the storage medium according to the present embodiments.
  • Furthermore, the storage medium is not limited to a single medium; when the processing according to the present embodiments is carried out using multiple storage media, these storage media are collectively included as the storage medium according to the present embodiments, and the medium can take any configuration.
  • The computer or built-in system in the present embodiments is used to execute each process disclosed in the present embodiments based on a program stored in a storage medium, and may be a single apparatus such as a PC or a microcomputer, or a system in which a plurality of apparatuses are connected through a network.
  • The computer adopted in the present embodiments is not limited to a PC; it may be a calculation processing apparatus, processing circuitry, or a microcomputer included in an information processor, or any device or apparatus that can realize the functions disclosed in the present embodiments by a program.
  • While certain embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the inventions. Indeed, the novel embodiments described herein may be embodied in a variety of other forms; furthermore, various omissions, substitutions, and changes in the form of the embodiments described herein may be made without departing from the spirit of the inventions. The accompanying claims and their equivalents are intended to cover such forms or modifications as would fall within the scope and spirit of the inventions.

Claims (20)

What is claimed is:
1. A dialogue support apparatus comprising:
a processing circuitry configured to:
receive input information indicating a user's intention;
determine at least one target dialogue state among dialogue histories related to dialogue states indicating dialogue results between a user and a dialogue system which estimates the user's intention and performs processing in accordance with the user's intention, the target dialogue state being a target of instructions from the user;
acquire a processing result from the dialogue system by inputting the input information and the target dialogue state into the dialogue system; and
generate first update information to update a display of the processing result.
2. The apparatus according to claim 1, wherein the processing circuitry is further configured to:
receive an operation of the user on a screen; and
determine a dialogue state designated by the operation as the target dialogue state.
3. The apparatus according to claim 1, further comprising a storage which stores dialogue information which includes an identifier of a dialogue state and a display flag indicating whether the dialogue state is in a normal display or in a thumbnail display,
wherein the processing circuitry determines a dialogue state for which the display flag indicates the normal display as the target dialogue state.
4. The apparatus according to claim 3, wherein the processing circuitry is further configured to update the dialogue information when the user's operation on the screen is an operation to change a display flag of the dialogue state.
5. The apparatus according to claim 3, wherein
the dialogue information further includes a dialogue sequence identifier which indicates a group of dialogue states related to a predetermined dialogue in a dialogue scenario, and
the processing circuitry generates second update information to display a displayed dialogue state in a thumbnail display when a dialogue sequence identifier of the displayed dialogue state is different from a dialogue sequence identifier of a current dialogue state.
6. The apparatus according to claim 1, wherein when a current dialogue state is to change a condition of a dialogue state included in the dialogue history, the processing circuitry acquires one or more dialogue states after the dialogue state which is a target of change included in the dialogue history, and
generates third update information to display the current dialogue state and the one or more dialogue states.
7. The apparatus according to claim 6, wherein the processing circuitry generates fourth update information to display a switch display for a user to select whether or not the one or more dialogue states are displayed in addition to the current dialogue state.
8. The apparatus according to claim 1, wherein the processing circuitry generates fifth update information to display processing results side by side when the processing results are generated respectively to a plurality of dialogue states.
9. The apparatus according to claim 1, wherein the input information is a speech recognition result of utterance of the user.
10. A dialogue support method comprising:
receiving input information indicating a user's intention;
determining at least one target dialogue state among dialogue histories related to dialogue states indicating dialogue results between a user and a dialogue system which estimates the user's intention and performs processing in accordance with the user's intention, the target dialogue state being a target of instructions from the user;
acquiring a processing result from the dialogue system by inputting the input information and the target dialogue state into the dialogue system; and
generating first update information to update a display of the processing result.
11. The method according to claim 10, further comprising:
receiving an operation of the user on a screen; and
determining a dialogue state designated by the operation as the target dialogue state.
12. The method according to claim 10, further comprising storing, in the storage, dialogue information which includes an identifier of a dialogue state and a display flag indicating whether the dialogue state is in a normal display or in a thumbnail display,
wherein the determining determines a dialogue state for which the display flag indicates the normal display as the target dialogue state.
13. The method according to claim 12, further comprising updating the dialogue information when the user's operation on the screen is an operation to change a display flag of the dialogue state.
14. The method according to claim 12, wherein
the dialogue information further includes a dialogue sequence identifier which indicates a group of dialogue states related to a predetermined dialogue in a dialogue scenario, and
the generating generates second update information to display a displayed dialogue state in a thumbnail display when a dialogue sequence identifier of the displayed dialogue state is different from a dialogue sequence identifier of a current dialogue state.
15. The method according to claim 10, wherein when a current dialogue state is to change a condition of a dialogue state included in the dialogue history, the acquiring acquires one or more dialogue states after the dialogue state which is a target of change included in the dialogue history, and
the generating generates third update information to display the current dialogue state and the one or more dialogue states.
16. The method according to claim 15, wherein the generating generates fourth update information to display a switch display for a user to select whether or not the one or more dialogue states are displayed in addition to the current dialogue state.
17. The method according to claim 10, wherein the generating generates fifth update information to display processing results side by side when the processing results are generated respectively to a plurality of dialogue states.
18. The method according to claim 10, wherein the input information is a speech recognition result of utterance of the user.
19. A non-transitory computer readable medium including computer executable instructions, wherein the instructions, when executed by a processor, cause the processor to perform a method that uses utterance intention of a user, the utterance intention being inferred from a first text representing results of speech recognition of user's utterances, the method comprising:
receiving input information indicating a user's intention;
determining at least one target dialogue state among dialogue histories related to dialogue states indicating dialogue results between a user and a dialogue system which estimates the user's intention and performs processing in accordance with the user's intention, the target dialogue state being a target of instructions from the user;
acquiring a processing result from the dialogue system by inputting the input information and the target dialogue state into the dialogue system; and
generating first update information to update a display of the processing result.
20. A terminal comprising:
a speech acquiring unit which acquires a user's utterance to obtain a speech recognition result;
a communicating unit which sends the speech recognition result to the dialogue support apparatus according to claim 1 to receive a processing result; and
a display unit which displays the processing result.