WO2018025668A1 - Conversation processing device and program - Google Patents

Conversation processing device and program

Info

Publication number
WO2018025668A1
Authority
WO
WIPO (PCT)
Prior art keywords
conversation
user
application
conversation application
specific
Application number
PCT/JP2017/026490
Other languages
French (fr)
Japanese (ja)
Inventor
高史 小山
佐知夫 前田
真人 土居
Original Assignee
ユニロボット株式会社 (Unirobot Corporation)
Application filed by ユニロボット株式会社 (Unirobot Corporation)
Publication of WO2018025668A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • the present invention relates to a conversation processing device and a program.
  • Patent Document 1: Japanese Patent Laid-Open No. 2015-011621
  • a system that can more naturally communicate with users is desired.
  • the conversation processing device executes a plurality of conversation applications that respond to a user's speech according to a predetermined algorithm.
  • the conversation processing device may include a detection unit that detects a user's speech.
  • The conversation processing device may include a selection unit that selects the running conversation application when it can respond to the user's speech, and selects another conversation application that can respond to the user's speech when the running conversation application cannot.
  • The conversation processing device may include a response unit that responds to the user's speech by continuing the running conversation application when the selected conversation application is the running one, and by interrupting the running conversation application and executing the other conversation application when the selected conversation application is another conversation application.
  • the plurality of conversation applications may include a plurality of specific conversation applications that continue the conversation with the user according to an algorithm until a predetermined condition is satisfied.
  • the selection unit may select the specific conversation application being executed when the specific conversation application being executed can respond to the user's speech. When the specific conversation application being executed cannot respond to the user's speech, the selection unit may select another specific conversation application that can respond to the user's speech.
  • the plurality of conversation applications may further include a daily conversation application that executes one response to one utterance of the user.
  • the selection unit may select the specific conversation application being executed when the specific conversation application being executed can respond to the user's speech. When the specific conversation application being executed cannot respond to the user's speech, the selection unit may select another specific conversation application that can respond to the user's speech. When the selection unit cannot select another specific conversation application that can respond to the user's utterance, the selection unit may select the daily conversation application.
  • The daily conversation application may execute one response to one utterance of the user according to a deep learning algorithm.
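  • As a rough illustration only, the selection order described above (the running specific application first, then another specific application, then the daily conversation application) could be structured as in the following Python sketch; the class and function names are hypothetical, not taken from the publication.

```python
from typing import Optional, Sequence

class ConversationApp:
    """Hypothetical base class wrapping one conversation application."""

    def can_respond(self, utterance: str) -> bool:
        """Return True if this application can respond to the utterance."""
        raise NotImplementedError

def select_app(running: Optional[ConversationApp],
               specific_apps: Sequence[ConversationApp],
               daily_app: ConversationApp,
               utterance: str) -> ConversationApp:
    # 1. Prefer the specific conversation application already running.
    if running is not None and running.can_respond(utterance):
        return running
    # 2. Otherwise look for another specific conversation application
    #    that can respond to the utterance.
    for app in specific_apps:
        if app is not running and app.can_respond(utterance):
            return app
    # 3. Fall back to the daily conversation application, which returns
    #    one response per utterance.
    return daily_app
```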
  • the conversation processing apparatus may further include a word information acquisition unit that acquires word information including at least one word extracted from the user's utterance detected by the detection unit.
  • the conversation processing apparatus may further include a word list storage unit that stores a word list in which at least one word corresponding to a user's utterance that can be answered by the plurality of specific conversation applications is associated with the plurality of specific conversation applications.
  • The selection unit may refer to the word list and select the specific conversation application being executed when at least one word included in the word information is registered in the word list in association with that application.
  • When at least one word included in the word information is not registered in the word list in association with the specific conversation application being executed but is registered in association with another specific conversation application, the selection unit may select that other specific conversation application.
  • the word list storage unit may store a word list for each of a plurality of specific conversation applications.
  • the selection unit may select a specific conversation application with reference to a word list associated with the specific conversation application being executed.
  • When the response unit interrupts the specific conversation application being executed and responds to the user's speech by executing another specific conversation application, it may determine the start position of the algorithm of the other specific conversation application based on information obtained from the user during execution of the interrupted application, and execute the other specific conversation application from the determined start position.
  • the conversation processing device may further include an interruption state storage unit that stores the interruption state of the algorithm of the specific conversation application being executed when the specific conversation application being executed is interrupted.
  • The response unit may refer to the interruption state storage unit to identify the interruption state of the algorithm of the interrupted specific conversation application and, in response to the termination or interruption of the other specific conversation application, resume the previously interrupted specific conversation application based on its interruption state.
  • the conversation processing device may further include an ending unit that forcibly terminates the conversation application being executed when the user's utterance detected by the detection unit is a predetermined specific utterance.
  • the conversation processing device may further include an imaging unit that images the periphery of the conversation processing device.
  • the conversation processing device may further include an infrared sensor that detects the presence of an object existing around the conversation processing device.
  • the conversation processing device may further include an adjustment unit that adjusts the imaging range of the imaging unit so that the user's face is included in the imaging range of the imaging unit according to the detection result of the infrared sensor.
  • the program according to an aspect of the present invention is a program for causing a computer to execute a plurality of conversation applications that respond to a user's speech according to a predetermined algorithm.
  • the program may cause the computer to execute a procedure for detecting a user's speech.
  • The program may cause the computer to execute a procedure that selects the running conversation application when it can respond to the user's speech, and selects another conversation application that can respond when the running conversation application cannot.
  • The program may cause the computer to execute a procedure that responds to the user's speech by continuing the running conversation application when the selected conversation application is the running one, and by interrupting the running conversation application and executing the other conversation application when the selected conversation application is another conversation application.
  • FIG. 1 shows an example of the system configuration of a conversation processing system; FIG. 2 shows an example of the functional blocks of the conversation processing apparatus; FIG. 3 shows an example of a word list; FIG. 4 shows an example of a user profile; FIG. 5 is a flowchart showing an example of the conversation processing procedure of the conversation processing apparatus; FIG. 6 shows an example of a computer.
  • In the flowcharts and block diagrams, a block may represent (1) a stage of a process in which an operation is performed or (2) a section of an apparatus responsible for performing the operation. Certain stages and sections may be implemented by dedicated circuitry, by programmable circuitry supplied with computer-readable instructions stored on a computer-readable medium, and/or by a processor supplied with computer-readable instructions stored on a computer-readable medium.
  • Dedicated circuitry may include digital and/or analog hardware circuitry, and may include integrated circuits (ICs) and/or discrete circuits.
  • Programmable circuitry may include reconfigurable hardware circuitry comprising logical AND, logical OR, logical XOR, logical NAND, logical NOR, and other logical operations, as well as memory elements such as flip-flops, registers, field-programmable gate arrays (FPGA), and programmable logic arrays (PLA).
  • A computer-readable medium may include any tangible device capable of storing instructions for execution by a suitable device, such that the computer-readable medium having the instructions stored thereon comprises a product including instructions that can be executed to create means for performing the operations specified in the flowcharts or block diagrams. Examples of computer-readable media may include electronic storage media, magnetic storage media, optical storage media, electromagnetic storage media, semiconductor storage media, and the like.
  • More specific examples of computer-readable media may include a floppy disk, a diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), electrically erasable programmable read-only memory (EEPROM), static random access memory (SRAM), compact disc read-only memory (CD-ROM), digital versatile disc (DVD), Blu-ray (RTM) disc, a memory stick, an integrated circuit card, and the like.
  • Computer-readable instructions may include assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, JAVA, C++, and the like, and conventional procedural programming languages such as the "C" programming language or similar programming languages.
  • Computer-readable instructions may be provided to a processor of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, or to programmable circuitry, locally or via a local area network (LAN) or a wide area network (WAN) such as the Internet, and may be executed to create means for performing the operations specified in the flowcharts or block diagrams.
  • processors include computer processors, processing units, microprocessors, digital signal processors, controllers, microcontrollers, and the like.
  • FIG. 1 shows an example of the system configuration of a conversation processing system according to this embodiment.
  • the conversation processing system includes a conversation processing apparatus 100, a text conversion apparatus 200, and a morpheme analysis apparatus 300.
  • the conversation processing device 100, the text conversion device 200, and the morpheme analysis device 300 are connected via a network 50.
  • the conversation processing apparatus 100 responds to the user's utterance with voice, image, movement or the like.
  • the conversation processing apparatus 100 includes a microphone 101, a camera 102, a speaker 104, a display unit 105, a touch sensor 106, and the like.
  • the conversation processing apparatus 100 detects the user's voice via the microphone 101.
  • the conversation processing apparatus 100 detects a user's facial expression or the like via the camera 102.
  • the conversation processing apparatus 100 transmits information by voice to the user via the speaker 104.
  • the conversation processing apparatus 100 transmits information to the user as an image via the display unit 105.
  • the conversation processing apparatus 100 may communicate with the user by means other than voice via the touch sensor 106 or the like.
  • the text conversion device 200 extracts words from the audio data provided from the conversation processing device 100 and generates text data.
  • the text conversion device 200 returns the generated text data to the conversation processing device 100.
  • the morpheme analyzer 300 performs morpheme analysis on the text data provided from the conversation processing device 100 to generate morpheme analysis data.
  • the morpheme analyzer 300 returns the generated morpheme analysis data to the conversation processing device 100.
  • the conversation processing apparatus 100 refers to the morphological analysis data and responds to the user's statement.
  • FIG. 2 shows an example of functional blocks of the conversation processing apparatus 100.
  • the conversation processing apparatus 100 executes a plurality of conversation applications that respond to a user's speech according to a predetermined algorithm.
  • the plurality of conversation applications may include a plurality of specific conversation applications that continue the conversation with the user according to an algorithm until a predetermined condition is satisfied.
  • the specific conversation application may be a conversation application for achieving a specific purpose through conversation with the user.
  • the plurality of specific conversation applications include a schedule conversation application that continues the conversation with the user until the user's desired schedule is registered.
  • the plurality of specific conversation applications include a weather conversation application that continues the conversation with the user until providing weather information at a specific date and time at a specific location.
  • the plurality of specific conversation applications include a recipe conversation application that continues the conversation with the user until a specific cooking recipe is provided.
  • the plurality of specific conversation applications include a game conversation application that executes a game according to a predetermined rule through a conversation with a user.
  • the plurality of conversation applications further include a daily conversation application that executes one response to one utterance of the user.
  • the daily conversation application may operate according to a different algorithm from the specific conversation application.
  • the daily conversation application executes, for example, a response according to the user's characteristics according to a deep learning algorithm.
  • the algorithm of the daily conversation application may be designed so that the conversation processing apparatus 100 asks the user the meaning of the word.
  • a word unknown to the conversation processing apparatus 100 is defined as “_UNK”.
  • an algorithm for a daily conversation application may be designed to respond to a user's statement “I met _UNK today at XX” as “What kind of person is _UNK?”. With such a design, an algorithm may be designed so that an appropriate answer can be made even when a user speaks a word unknown to the conversation processing apparatus 100.
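  • As a hedged sketch of this design, words outside the known vocabulary could be masked with the "_UNK" token and echoed back in a question. The vocabulary and function names below are illustrative assumptions, not details from the publication.

```python
KNOWN_VOCAB = {"i", "met", "today", "at"}  # illustrative vocabulary only

def mask_unknown_words(words: list[str]) -> tuple[list[str], list[str]]:
    """Replace out-of-vocabulary words with the _UNK token, keeping the
    original surface forms so they can be echoed back to the user."""
    masked, unknowns = [], []
    for word in words:
        if word.lower() in KNOWN_VOCAB:
            masked.append(word)
        else:
            masked.append("_UNK")
            unknowns.append(word)
    return masked, unknowns

def respond(words: list[str]) -> str:
    masked, unknowns = mask_unknown_words(words)
    if unknowns:
        # Ask about the unknown word, as in the example response
        # "What kind of person is _UNK?".
        return f"What kind of person is {unknowns[0]}?"
    return "I see."

print(respond(["I", "met", "Alice", "today"]))  # -> What kind of person is Alice?
```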
  • the plurality of conversation applications further include a system conversation application that performs various settings of the conversation processing apparatus 100 through conversation with the user.
  • the conversation processing apparatus 100 determines that the response to the user's statement cannot be executed by the currently executing conversation application, the conversation processing apparatus 100 interrupts the currently executing conversation application.
  • the conversation processing apparatus 100 selects an appropriate conversation application that can respond to the user's statement and responds to the user's statement.
  • The conversation processing apparatus 100 includes a microphone 101, a camera 102, a speaker 104, a display unit 105, a touch sensor 106, an infrared sensor 107, an actuator 108, a detection unit 110, an image processing unit 112, an audio control unit 114, a display control unit 116, a sensor control unit 118, and an actuator control unit 119.
  • the conversation processing apparatus 100 further includes an application execution unit 120, a transmission / reception unit 130, a word information acquisition unit 132, a selection unit 134, an application storage unit 140, a word list storage unit 142, and a user profile storage unit 144.
  • the microphone 101 detects the voice uttered by the user.
  • the microphone 101 may be a directional microphone.
  • the camera 102 images the environment around the conversation processing apparatus 100. For example, the camera 102 captures an image of the face of a user who has a conversation with the conversation processing apparatus 100.
  • the speaker 104 outputs sound.
  • the display unit 105 displays various information presented to the user.
  • the display unit 105 may be a liquid crystal display unit with a touch panel.
  • the touch sensor 106 detects that a user's finger, palm, or the like has touched.
  • the infrared sensor 107 detects an object such as a user existing around the conversation processing apparatus 100.
  • the infrared sensor 107 may be a pyroelectric infrared sensor.
  • the actuator 108 provides power for operating the movable member provided in the conversation processing apparatus 100. When the conversation processing apparatus 100 has a head and an arm, the actuator 108 may rotate at least one of the head and the arm, for example.
  • the conversation processing device 100 may specify the direction in which the user exists based on the image captured by the camera 102.
  • the conversation processing apparatus 100 may rotate the movable member such as the head provided with the microphone 101 by controlling the actuator 108 so that the microphone 101 faces in the specified direction.
  • the infrared sensor 107 may be arranged to detect an object that exists outside the imaging range of the camera 102. For example, when the user is outside the imaging range of the camera 102, the conversation processing apparatus 100 estimates the user's position using the infrared sensor 107. The conversation processing apparatus 100 may rotate a movable member such as a head provided with the camera 102 so that the user is included in the imaging range of the camera 102. Even when the user does not exist within the angle of view of the camera 102, the conversation processing apparatus 100 can easily detect the user and point the camera 102 and the microphone 101 in the optimum direction for the conversation with the user.
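  • One purely illustrative reading of this adjustment logic, with hypothetical angle conventions and actuator limits, is the following sketch.

```python
from typing import Optional

def adjust_pan_angle(ir_bearing_deg: Optional[float],
                     face_in_frame: bool,
                     current_pan_deg: float,
                     pan_limit_deg: float = 90.0) -> float:
    """Return a new pan angle for the head on which the camera is mounted.

    ir_bearing_deg is the bearing of the object detected by the infrared
    sensor relative to the body, or None if nothing was detected.
    """
    if face_in_frame or ir_bearing_deg is None:
        return current_pan_deg  # nothing to adjust
    # Turn the head (and with it the camera and microphone) toward the
    # estimated user position, clamped to the actuator's range.
    return max(-pan_limit_deg, min(pan_limit_deg, ir_bearing_deg))
```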
  • the detection unit 110 detects a user's speech.
  • the detection unit 110 includes a voice recognition unit 111.
  • the voice recognition unit 111 converts a user's speech into voice data.
  • the image processing unit 112 processes image data captured by the camera 102. For example, the image processing unit 112 extracts user face image data from the image data.
  • the image processing unit 112 extracts a facial feature amount from the extracted face image data.
  • The feature amount may be any information that can identify an object such as a person; it may be information on the pixel values of the face image data, or information indicating numerical values of appearance features such as the spacing and size of the eyes, nose, and mouth included in the face image data, skin color, and hairstyle.
  • the image processing unit 112 provides the application execution unit 120 with data indicating the facial feature amount.
  • the sound control unit 114 outputs sound based on the sound information provided from the application execution unit 120 to the speaker 104.
  • the display control unit 116 causes the display unit 105 to display the image information provided from the application execution unit 120.
  • the sensor control unit 118 receives detection signals from the touch sensor 106 and the infrared sensor 107 and provides them to the actuator control unit 119 and the application execution unit 120.
  • the actuator control unit 119 controls the actuator 108.
  • the actuator control unit 119 has an adjustment unit 109.
  • the adjustment unit 109 adjusts the imaging range of the camera 102 according to the detection result by the infrared sensor 107. When the user exists outside the imaging range of the camera 102, the adjustment unit 109 estimates the position of the user using the infrared sensor 107.
  • the adjustment unit 109 may rotate the movable member such as the head provided with the camera 102 by controlling the actuator 108 so that the user's face is included in the imaging range of the camera 102.
  • the adjustment unit 109 may adjust the imaging range of the camera 102 by adjusting the angle of view of the camera 102.
  • the conversation processing apparatus 100 may realize a video call using the camera 102.
  • The conversation processing apparatus 100 may function as a remote camera that remotely monitors the environment surrounding the conversation processing apparatus 100 by controlling the camera 102, a movable member such as the head on which the camera 102 is provided, and the angle of view of the camera 102.
  • the conversation processing apparatus 100 may store communication with the user such as conversation with the user as history information.
  • the conversation processing apparatus 100 may determine whether or not the user has an abnormality based on the history information.
  • the conversation processing apparatus 100 may determine that the user has an abnormality when the user's life pattern is predicted from the history information and the user takes an action different from the life pattern.
  • the conversation processing apparatus 100 may notify the specific destination of the abnormality via the transmission / reception unit 130.
  • the transmission / reception unit 130 transmits / receives data to / from the text conversion device 200 and the morphological analysis device 300 via the network 50.
  • the word information acquisition unit 132 acquires word information including at least one word extracted from the user's utterance detected by the detection unit 110.
  • the word information acquisition unit 132 may acquire morphological analysis data provided from the morphological analysis device 300 via the transmission / reception unit 130 as word information.
  • the morphological analysis data may be data including each word included in the user's utterance, the utterance order of each word, the part of speech of each word, and the like.
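  • The publication does not fix a concrete format, but morphological analysis data of this kind might be represented as follows; the field names and values are assumptions for illustration.

```python
# Hypothetical shape of morphological analysis data: one record per word,
# with its position in the utterance and its part of speech.
morpheme_analysis_data = [
    {"word": "tomorrow", "order": 1, "pos": "noun"},
    {"word": "weather", "order": 2, "pos": "noun"},
    {"word": "tell", "order": 3, "pos": "verb"},
]

# Word information as the selection unit might consume it: the words
# in utterance order.
word_info = [entry["word"]
             for entry in sorted(morpheme_analysis_data,
                                 key=lambda entry: entry["order"])]
```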
  • the application storage unit 140 stores a plurality of conversation applications.
  • the application storage unit 140 may store a plurality of specific conversation applications and daily conversation applications.
  • the selection unit 134 selects the conversation application being executed.
  • the selection unit 134 selects another conversation application that can respond to the user's speech.
  • the selection unit 134 selects the specific conversation application being executed when the specific conversation application being executed can respond to the user's speech. When the specific conversation application being executed cannot respond to the user's speech, the selection unit 134 selects another specific conversation application that can respond to the user's speech. When the other specific conversation application that can respond to the user's speech cannot be selected, the selection unit 134 selects the daily conversation application. The conversation application selected by the selection unit 134 is executed by the application execution unit 120.
  • the word list storage unit 142 stores a word list in which at least one word corresponding to a user's utterance to which a plurality of specific conversation applications can respond is registered in association with a plurality of specific conversation applications.
  • the word list storage unit 142 stores, for example, a word list as shown in FIG.
  • The word list includes combinations of words, arranged in the order in which they appear in the user's statements, together with the specific conversation application that is estimated to be able to respond appropriately to each combination.
  • The word list further includes a priority used by the selection unit 134 when selecting an application. For example, the highest priority "1" is set for a conversation application that is to be selected when the user makes a specific utterance, regardless of the type of the specific conversation application being executed.
  • the next priority “2” is set for the specific conversation application being executed.
  • the priority “3” is set for the other specific conversation application.
  • the word list shown in FIG. 3 indicates that the lower the numerical value, the higher the priority. However, the word list may indicate that the higher the numerical value, the higher the priority.
  • the index indicating the degree of priority may not be a numerical value.
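  • A minimal sketch of how such a word list and its priorities might be encoded and matched, assuming lower numbers mean higher priority as in FIG. 3; the application names and word combinations below are invented for illustration.

```python
from typing import Optional

# Hypothetical encoding of the word list of FIG. 3: an ordered word
# combination, the conversation application it maps to, and a priority
# (lower number = higher priority, as in the figure).
WORD_LIST = [
    (("volume", "up"), "system_app", 1),            # specific utterance, any time
    (("tomorrow", "weather"), "weather_app", 2),    # running application's topic
    (("tomorrow", "schedule"), "schedule_app", 3),  # another application's topic
]

def match_app(words: list[str]) -> Optional[str]:
    """Return the application whose word combination appears in the
    utterance, preferring entries with the smallest priority value."""
    candidates = [(priority, app)
                  for combo, app, priority in WORD_LIST
                  if all(word in words for word in combo)]
    return min(candidates)[1] if candidates else None
```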
  • the word list storage unit 142 may store a word list for each specific conversation application.
  • The selection unit 134 selects the specific conversation application being executed when at least one word included in the word information is registered in the word list in association with that application. When at least one word included in the word information is not registered in the word list in association with the specific conversation application being executed but is registered in association with another specific conversation application, the selection unit 134 selects that other specific conversation application.
  • For example, suppose the selection unit 134 selects the weather conversation application as a specific conversation application that can respond to the combination of the words (1) "Tomorrow" and (2) "Weather" included in the user's statement. Then, suppose the user says, "Okay, what will it be the day after tomorrow?". In this case, the selection unit 134 continues to select the weather conversation application as the specific conversation application being executed. On the other hand, suppose the user then says, "Now, let's enter tomorrow's schedule." In this case, the selection unit 134 interrupts the weather conversation application and selects the schedule conversation application as another specific conversation application.
  • the user profile storage unit 144 stores at least one user profile including at least one item related to the user. In each item of the user profile, words extracted through conversation with the user are registered.
  • the user profile storage unit 144 stores a user profile including a plurality of items indicating the individuality of the user, such as the user's name, date of birth, address, favorite food, favorite sports, as shown in FIG.
  • the response unit 121 may determine information to be included in the response to the user based on the word of each item of the user profile.
  • the application execution unit 120 includes a response unit 121, a registration unit 122, an interruption state storage unit 123, and an end unit 124. If the selected conversation application is a running conversation application, the response unit 121 continues the running conversation application. When the selected conversation application is another conversation application, the response unit 121 suspends the conversation application being executed and executes another conversation application, thereby responding to the user's statement.
  • The registration unit 122 registers, in the user profile, words belonging to at least one of its items from among the at least one word included in the word information. For example, suppose the user mentions a favorite food. The registration unit 122 then extracts the word for the user's favorite food from the user's statement and registers it in the user profile.
  • the response unit 121 may optimize the content of the response to the user with reference to the user profile. The response unit 121 may optimize a response by changing a part of the content of the response to the user according to the content of the user profile.
  • a user's favorite food is defined as _FAV_FOOD.
  • the response unit 121 refers to the user profile, identifies the user's favorite food, inserts the identified word into the response phrase, and optimizes the content of the response.
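  • A small illustrative sketch of this kind of response optimization, assuming placeholders such as _FAV_FOOD are embedded in response phrases (the profile keys and values below are invented):

```python
user_profile = {"_FAV_FOOD": "curry", "_NAME": "Taro"}  # invented values

def optimize_response(template: str, profile: dict) -> str:
    """Replace profile placeholders such as _FAV_FOOD in a response
    phrase with the words registered in the user profile."""
    for key, word in profile.items():
        if key.startswith("_"):
            template = template.replace(key, word)
    return template

print(optimize_response("Shall we have _FAV_FOOD tonight, _NAME?", user_profile))
# -> Shall we have curry tonight, Taro?
```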
  • the interruption state storage unit 123 stores the interruption state of the algorithm of the specific conversation application being executed when the specific conversation application being executed is interrupted.
  • the interruption state includes the position where the algorithm of the specific conversation application is interrupted and information obtained from the user's speech until the specific conversation application is interrupted.
  • the information obtained from the user's remarks may include information registered in the user profile.
  • The response unit 121 may determine the start position of the algorithm of another specific conversation application based on information obtained from the user during execution of the specific conversation application being executed, and may execute the other specific conversation application from the determined start position.
  • For example, the response unit 121 determines the start position of the algorithm of the schedule conversation application in consideration of the fact that the schedule to be entered falls on a weekend; it may start the algorithm of the schedule conversation application from the step after the date on which the schedule is to be entered has been determined.
  • When the response unit 121 interrupts the specific conversation application being executed and responds to the user's statement by executing another specific conversation application, it refers to the interruption state storage unit 123 and identifies the interruption state of the algorithm of the interrupted specific conversation application. In response to the termination or interruption of the other specific conversation application, the response unit 121 resumes the previously interrupted specific conversation application based on its interruption state. For example, the response unit 121 interrupts the other specific conversation application when the user makes a statement to which it cannot respond while it is executing; in other words, it interrupts the other specific conversation application when the user starts a topic different from the topic handled by that application.
  • The response unit 121 may resume, based on its interruption state, the specific conversation application that can respond to the user's statement from among the previously interrupted specific conversation applications. That is, in response to the interruption of another specific conversation application, the response unit 121 may select, from among the previously interrupted specific conversation applications, the one corresponding to the topic newly started by the user, and resume it based on its interruption state.
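  • One way such interruption states might be stored and resumed is sketched below; the data fields and the topic-matching callback are assumptions, not details from the publication.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class InterruptionState:
    """Hypothetical record of where a specific conversation application's
    algorithm was interrupted."""
    app_name: str
    step: str                                   # position in the algorithm
    slots: dict = field(default_factory=dict)   # information gathered so far

class InterruptionStore:
    def __init__(self) -> None:
        self._states: list[InterruptionState] = []

    def suspend(self, state: InterruptionState) -> None:
        self._states.append(state)

    def resume_for(self, words: list[str],
                   can_respond: Callable[[InterruptionState, list[str]], bool]
                   ) -> Optional[InterruptionState]:
        """On termination or interruption of the current application, pop
        and return the suspended application matching the new topic,
        most recently suspended first."""
        for i in range(len(self._states) - 1, -1, -1):
            if can_respond(self._states[i], words):
                return self._states.pop(i)
        return None
```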
  • the termination unit 124 forcibly terminates the conversation application being executed when the user's speech detected by the detection unit 110 is a predetermined specific speech. For example, when the user makes a specific statement such as “home” or “forced termination”, the termination unit 124 forcibly terminates the conversation application being executed.
  • the conversation processing apparatus 100 further includes an infrared light receiving unit 126, an infrared light emitting unit 128, and a peripheral device control unit 129.
  • the peripheral device control unit 129 causes the conversation processing apparatus 100 to function as a remote control terminal (for example, a remote controller) for peripheral devices.
  • Peripheral devices are devices that operate in response to control commands transmitted from a remote control terminal in an infrared or wireless manner, such as AV devices such as televisions and recorders, and home appliances such as air conditioners and electric fans.
  • the infrared light receiving unit 126 receives a control command by infrared rays from the remote control terminal.
  • the infrared light emitting unit 128 transmits a control command for controlling the peripheral device by infrared rays.
  • The peripheral device control unit 129 stores a control command list that associates control commands with the control contents of the peripheral devices to be controlled. For example, when the user wishes to register a new peripheral device to be controlled, the peripheral device control unit 129 executes a control command registration process through a conversation with the user. For example, the peripheral device control unit 129 requests the user to operate the remote control terminal so that various control commands are transmitted from it, and generates a control command list that associates each received control command with its control content. The peripheral device control unit 129 may have the user sequentially press the buttons of the remote control terminal corresponding to each control content, and receive each control command emitted from the remote control terminal via the infrared light receiving unit 126.
  • the peripheral device control unit 129 may generate a control command list in which each received control command is associated with each control content. Alternatively, the peripheral device control unit 129 may cause the user to input a number that uniquely specifies the type of the peripheral device to be controlled via the remote control terminal. The peripheral device control unit 129 may acquire a control command list corresponding to the input number via the network 50 such as the Internet. When the peripheral device control unit 129 is requested by the user to control the peripheral device, the peripheral device control unit 129 refers to the control command list and identifies a control command associated with the requested control content. The peripheral device control unit 129 transmits the specified control command to the peripheral device to be controlled by infrared rays via the infrared light emitting unit 128.
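  • A minimal sketch of such a control command list, assuming raw infrared commands are captured as opaque byte strings during the registration dialogue; the function names are hypothetical.

```python
from typing import Callable

# Hypothetical control command list: control content -> raw IR command.
control_commands: dict[str, bytes] = {}

def register_command(content: str, ir_command: bytes) -> None:
    """Called once per button during the registration dialogue, after the
    user presses the corresponding remote-control button and the command
    is received via the infrared light receiving unit."""
    control_commands[content] = ir_command

def execute_command(content: str, emit_ir: Callable[[bytes], None]) -> bool:
    """Look up the command for the requested control content and send it
    via the infrared light emitting unit (the emit_ir callback)."""
    command = control_commands.get(content)
    if command is None:
        return False
    emit_ir(command)
    return True
```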
  • The conversation processing apparatus 100 may communicate with peripheral devices wirelessly, for example via WiFi or Bluetooth (registered trademark).
  • the peripheral device control unit 129 may acquire a device driver for controlling the peripheral device to be controlled via the network 50.
  • the peripheral device control unit 129 may control the peripheral device through a conversation with the user using a device driver.
  • FIG. 5 is a flowchart showing an example of the conversation processing procedure of the conversation processing apparatus 100.
  • the conversation processing apparatus 100 may execute the procedure of the flowchart shown in FIG. 5 when detecting a user's utterance.
  • the detection unit 110 detects the user's voice via the microphone 101 (S100).
  • the voice recognition unit 111 generates voice data from the detected voice (S102).
  • the transmission / reception unit 130 transmits the audio data to the text conversion device 200 (S104).
  • the text conversion device 200 extracts words from the voice data and generates text data in which the words are arranged in the order of speech, for example.
  • the transmission / reception unit 130 receives text data from the text conversion device 200 (S106).
  • the transmission / reception unit 130 transmits the received text data to the morphological analyzer 300 (S108).
  • the selection unit 134 performs pattern matching based on the text data (S110).
  • the selection unit 134 determines whether or not the user's utterance is a topic related to the system such as various settings of the conversation processing device 100 (S112). When the text data matching the received text data is associated with the system conversation application, the selection unit 134 selects the system conversation application (S120).
  • When receiving the text data, the morpheme analyzer 300 generates morpheme analysis data by performing morphological analysis on the received text data.
  • the morpheme analyzer 300 transmits the generated morpheme analysis data to the conversation processing device 100.
  • the transmission / reception unit 130 receives morpheme analysis data from the morpheme analyzer 300 (S114).
  • the word information acquisition unit 132 acquires morpheme analysis data as word information and provides it to the selection unit 134.
  • the selection unit 134 performs pattern matching based on the word information (S116).
  • the selection unit 134 refers to a word list associated with the specific conversation application being executed, and selects a conversation application that responds to the user's speech.
  • the selection unit 134 refers to the word list to determine whether or not the user's remark is a topic related to the system such as various settings of the conversation processing apparatus 100 (S118). If the user's speech is a topic about the system such as various settings of the conversation processing device 100, the selection unit 134 selects a system conversation application (S120).
  • the selection unit 134 refers to the word list to determine whether the user's utterance is a continuation of the current topic (S122).
  • The selection unit 134 refers to the word list and judges that the user's statement is a continuation of the current topic when at least one word included in the word information is registered in the word list in association with the specific conversation application being executed. If the user's statement is a continuation of the current topic, the selection unit 134 selects the specific conversation application being executed (S124).
  • The selection unit 134 refers to the word list to determine whether or not the user's statement is a new topic (S126). When at least one word included in the word information is not registered in the word list in association with the specific conversation application being executed but is registered in association with another specific conversation application, the selection unit 134 determines that the statement is a new topic. If the user's statement is a new topic, the selection unit 134 selects the other specific conversation application registered in the word list in association with at least one word included in the word information (S128).
  • Otherwise, the selection unit 134 selects the daily conversation application (S130).
  • the response unit 121 executes the selected conversation application and responds to the user's utterance (S134).
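  • Read as pseudocode, the flow of FIG. 5 (steps S100-S134) might look like the sketch below; the ctx helper object and its methods are hypothetical stand-ins for the units described above.

```python
def process_utterance(audio: bytes, ctx) -> str:
    """Sketch of the S100-S134 flow; ctx bundles hypothetical helpers
    standing in for the units described above."""
    text = ctx.text_conversion(audio)              # S104-S106
    if ctx.matches_system_text(text):              # S110-S112
        app = ctx.system_app                       # S120
    else:
        words = ctx.morph_analysis(text)           # S108, S114
        if ctx.is_system_topic(words):             # S116-S118
            app = ctx.system_app                   # S120
        elif ctx.is_current_topic(words):          # S122
            app = ctx.running_app                  # S124
        else:
            app = ctx.other_specific_app(words)    # S126-S128
            if app is None:
                app = ctx.daily_app                # S130
    return app.respond(text)                       # S134
```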
  • When a response by the specific conversation application currently being executed is not appropriate for the user's utterance, the conversation processing apparatus 100 runs another specific conversation application that can respond appropriately. Further, when there is no other specific conversation application that can respond appropriately to the user's utterance, the conversation processing apparatus 100 executes the daily conversation application. Therefore, even when the user tries to start a conversation on another topic in the middle of a conversation on one topic, the conversation processing apparatus 100 can respond more naturally to the user's utterances.
  • When resuming an interrupted specific conversation application, the conversation processing apparatus 100 determines the start position of that application's algorithm based on the information obtained in the conversation with the user via the application before the interruption. Therefore, even when the conversation processing apparatus 100 returns to a previous topic after moving to a new topic, it can respond more naturally to the user's utterances.
  • FIG. 6 illustrates an example of a computer 1200 in which aspects of the present invention may be embodied in whole or in part.
  • A program installed in the computer 1200 can cause the computer 1200 to function as one or more sections of the apparatus according to an embodiment of the present invention, to perform operations associated with the apparatus or those sections, and/or to execute a process according to an embodiment of the present invention or stages of that process.
  • Such a program may be executed by CPU 1212 to cause computer 1200 to perform certain operations associated with some or all of the blocks in the flowcharts and block diagrams described herein.
  • a computer 1200 includes a CPU 1212, a RAM 1214, a ROM 1230, a graphic controller 1216, and a display device 1218, which are connected to each other by a host controller 1210.
  • Computer 1200 also includes a communication interface 1222 and an input / output controller 1220.
  • Computer 1200 may include optional input / output units, which may be connected to host controller 1210 via input / output controller 1220.
  • the CPU 1212 operates in accordance with programs stored in the ROM 1230 and the RAM 1214, thereby controlling each unit.
  • the graphic controller 1216 acquires the image data generated by the CPU 1212 in a frame buffer or the like provided in the RAM 1214 or the like, and causes the image data to be displayed on the display device 1218.
  • the communication interface 1222 communicates with other electronic devices such as the text conversion device 200 and the morphological analysis device 300 via a network.
  • the ROM 1230 stores therein a boot program executed by the computer 1200 at the time of activation and / or a program depending on the hardware of the computer 1200.
  • the program is read from a computer-readable medium, installed in the RAM 1214 or the ROM 1230, which is also an example of a computer-readable medium, and executed by the CPU 1212.
  • The information processing described in these programs is read by the computer 1200 and brings about cooperation between the programs and the various types of hardware resources described above.
  • An apparatus or method may be constituted by realizing operations or processing of information in accordance with the use of the computer 1200.
  • For example, the CPU 1212 may execute a communication program loaded in the RAM 1214 and instruct the communication interface 1222 to perform communication processing based on the processing described in the communication program.
  • Under the control of the CPU 1212, the communication interface 1222 reads transmission data stored in a transmission buffer processing area provided in a recording medium such as the RAM 1214 and transmits the read transmission data to the network, or writes reception data received from the network into a reception buffer processing area provided on the recording medium.
  • The CPU 1212 may perform, on data read from the RAM 1214, various types of processing described throughout the present disclosure and specified by the instruction sequences of programs, including various types of operations, information processing, condition judgment, conditional branching, unconditional branching, and information retrieval/replacement, and write the result back to the RAM 1214.
  • the programs or software modules described above may be stored on a computer-readable medium on the computer 1200 or in the vicinity of the computer 1200.
  • For example, a recording medium such as a hard disk or RAM provided in a server system connected to a dedicated communication network or the Internet can be used as a computer-readable medium, thereby providing a program to the computer 1200 via the network.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present invention provides a system that facilitates more natural conversation with a user. This conversation processing device comprises: a detection unit for detecting a statement made by a user; a selection unit that selects the running conversation application if the running conversation application can respond to the statement made by the user, and if the running conversation application cannot respond to the statement made by the user, selects another conversation application that can respond to the statement made by the user; and a response unit that responds to the statement made by the user by allowing the running conversation application to continue if the selected conversation application is the running conversation application, and by suspending the running conversation application and initiating another conversation application if the selected conversation application is another conversation application.

Description

Conversation processing apparatus and program
 The present invention relates to a conversation processing device and a program.
 Various systems for realizing conversations with users have been proposed.
 Patent Document 1: Japanese Patent Laid-Open No. 2015-011621
Challenges to be solved
 A system that can communicate more naturally with users is desired.
General disclosure
 The conversation processing device according to an aspect of the present invention executes a plurality of conversation applications that respond to a user's speech according to a predetermined algorithm. The conversation processing device may include a detection unit that detects a user's speech. The conversation processing device may include a selection unit that selects the running conversation application when it can respond to the user's speech, and selects another conversation application that can respond when the running conversation application cannot. The conversation processing device may include a response unit that responds to the user's speech by continuing the running conversation application when the selected conversation application is the running one, and by interrupting the running conversation application and executing the other conversation application when the selected application is another conversation application.
 The plurality of conversation applications may include a plurality of specific conversation applications that continue the conversation with the user according to an algorithm until a predetermined condition is satisfied. The selection unit may select the specific conversation application being executed when it can respond to the user's speech, and may select another specific conversation application that can respond when the one being executed cannot.
 The plurality of conversation applications may further include a daily conversation application that executes one response to one utterance of the user. The selection unit may select the specific conversation application being executed when it can respond to the user's speech, may select another specific conversation application that can respond when the one being executed cannot, and may select the daily conversation application when no other specific conversation application that can respond to the user's speech can be selected.
 The daily conversation application may execute one response to one utterance of the user according to a deep learning algorithm.
 The conversation processing device may further include a word information acquisition unit that acquires word information including at least one word extracted from the user's speech detected by the detection unit. The conversation processing device may further include a word list storage unit that stores a word list in which at least one word corresponding to user speech to which the plurality of specific conversation applications can respond is registered in association with those applications. The selection unit may refer to the word list and select the specific conversation application being executed when at least one word included in the word information is registered in the word list in association with that application. When at least one word included in the word information is not registered in the word list in association with the specific conversation application being executed but is registered in association with another specific conversation application, the selection unit may select that other specific conversation application.
 The word list storage unit may store a word list for each of the plurality of specific conversation applications. The selection unit may select a specific conversation application by referring to the word list associated with the specific conversation application being executed.
 When the response unit interrupts the specific conversation application being executed and responds to the user's speech by executing another specific conversation application, it may determine the start position of the algorithm of the other specific conversation application based on information obtained from the user during execution of the interrupted application, and execute the other specific conversation application from the determined start position.
 The conversation processing device may further include an interruption state storage unit that stores the interruption state of the algorithm of the specific conversation application being executed when that application is interrupted. The response unit may refer to the interruption state storage unit to identify the interruption state of the algorithm of the interrupted specific conversation application and, in response to the termination or interruption of the other specific conversation application, resume the previously interrupted specific conversation application based on its interruption state.
 The conversation processing device may further include an ending unit that forcibly terminates the conversation application being executed when the user's speech detected by the detection unit is a predetermined specific utterance.
 The conversation processing device may further include an imaging unit that images the surroundings of the conversation processing device, an infrared sensor that detects the presence of objects existing around the conversation processing device, and an adjustment unit that adjusts the imaging range of the imaging unit according to the detection result of the infrared sensor so that the user's face is included in the imaging range.
 The program according to an aspect of the present invention is a program for causing a computer to execute a plurality of conversation applications that respond to a user's speech according to a predetermined algorithm. The program may cause the computer to execute a procedure for detecting a user's speech. The program may cause the computer to execute a procedure that selects the running conversation application when it can respond to the user's speech, and selects another conversation application that can respond when the running conversation application cannot. The program may cause the computer to execute a procedure that responds to the user's speech by continuing the running conversation application when the selected conversation application is the running one, and by interrupting the running conversation application and executing the other conversation application when the selected application is another conversation application.
 The above summary of the invention does not enumerate all the features of the present invention. Sub-combinations of these groups of features can also constitute inventions.
FIG. 1 is a diagram showing an example of the system configuration of a conversation processing system.
FIG. 2 is a diagram showing an example of the functional blocks of a conversation processing device.
FIG. 3 is a diagram showing an example of a word list.
FIG. 4 is a diagram showing an example of a user profile.
FIG. 5 is a flowchart showing an example of a conversation processing procedure of a conversation processing device.
FIG. 6 is a diagram showing an example of a computer.
 Hereinafter, the present invention will be described through embodiments of the invention; however, the following embodiments do not limit the invention according to the claims. In addition, not all combinations of the features described in the embodiments are essential to the solving means of the invention.
 Various embodiments of the present invention may be described with reference to flowcharts and block diagrams, in which a block may represent (1) a stage of a process in which an operation is performed or (2) a section of an apparatus responsible for performing the operation. Particular stages and sections may be implemented by dedicated circuitry, by programmable circuitry supplied with computer-readable instructions stored on a computer-readable medium, and/or by a processor supplied with computer-readable instructions stored on a computer-readable medium. Dedicated circuitry may include digital and/or analog hardware circuits, and may include integrated circuits (ICs) and/or discrete circuits. Programmable circuitry may include reconfigurable hardware circuits including logical AND, logical OR, logical XOR, logical NAND, logical NOR, and other logical operations, as well as memory elements such as flip-flops, registers, field-programmable gate arrays (FPGAs), and programmable logic arrays (PLAs).
 A computer-readable medium may include any tangible device capable of storing instructions to be executed by a suitable device, with the result that the computer-readable medium having the instructions stored thereon constitutes an article of manufacture including instructions that can be executed to create means for performing the operations specified in the flowcharts or block diagrams. Examples of computer-readable media may include electronic storage media, magnetic storage media, optical storage media, electromagnetic storage media, and semiconductor storage media. More specific examples of computer-readable media may include floppy (registered trademark) disks, diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), electrically erasable programmable read-only memory (EEPROM), static random access memory (SRAM), compact disc read-only memory (CD-ROM), digital versatile discs (DVD), Blu-ray (RTM) discs, memory sticks, and integrated circuit cards.
 Computer-readable instructions may include assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, JAVA (registered trademark), and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages.
 Computer-readable instructions may be provided to a processor or programmable circuitry of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, either locally or via a local area network (LAN) or a wide area network (WAN) such as the Internet, and the computer-readable instructions may be executed to create means for performing the operations specified in the flowcharts or block diagrams. Examples of processors include computer processors, processing units, microprocessors, digital signal processors, controllers, and microcontrollers.
 FIG. 1 shows an example of the system configuration of a conversation processing system according to the present embodiment. The conversation processing system includes a conversation processing device 100, a text conversion device 200, and a morphological analysis device 300, which are connected via a network 50. The conversation processing device 100 responds to a user's utterances with voice, images, movement, and the like.
 The conversation processing device 100 includes a microphone 101, a camera 102, a speaker 104, a display unit 105, a touch sensor 106, and the like. It detects the user's voice via the microphone 101 and detects the user's facial expressions and the like via the camera 102. It conveys information to the user by voice via the speaker 104 and by images via the display unit 105, and may communicate with the user by means other than voice via the touch sensor 106 or the like.
 The text conversion device 200 extracts words from the voice data provided by the conversation processing device 100, generates text data, and returns the generated text data to the conversation processing device 100. The morphological analysis device 300 performs morphological analysis on the text data provided by the conversation processing device 100, generates morphological analysis data, and returns the generated data to the conversation processing device 100. The conversation processing device 100 refers to the morphological analysis data to respond to the user's utterance.
 FIG. 2 shows an example of the functional blocks of the conversation processing device 100. The conversation processing device 100 executes a plurality of conversation applications that respond to a user's utterances according to predetermined algorithms. The plurality of conversation applications may include a plurality of specific conversation applications that continue a conversation with the user according to their algorithms until a predetermined condition is satisfied. A specific conversation application may be a conversation application for achieving a specific purpose through conversation with the user. The specific conversation applications include a schedule conversation application that continues the conversation with the user until the user's desired schedule is registered, a weather conversation application that continues the conversation until it provides weather information for a specific place at a specific date and time, a recipe conversation application that continues the conversation until it provides a recipe for a specific dish, and a game conversation application that plays a game according to predetermined rules through conversation with the user.
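 Although the publication gives no code, a specific conversation application can be pictured as a small state machine that keeps the dialogue going until its goal condition holds. The following is a minimal sketch under that assumption; the class and method names (can_respond, respond, is_done) are illustrative and not taken from the embodiment:

```python
from abc import ABC, abstractmethod

class SpecificConversationApp(ABC):
    """Illustrative model of a specific conversation application: it keeps
    the conversation going until a predetermined condition is satisfied."""

    @abstractmethod
    def can_respond(self, words: list[str]) -> bool:
        """True if this application can respond to the user's utterance."""

    @abstractmethod
    def respond(self, words: list[str]) -> str:
        """Advance the application's algorithm one step and return a reply."""

    @abstractmethod
    def is_done(self) -> bool:
        """True once the goal holds, e.g. the desired schedule is registered."""
```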
 The plurality of conversation applications further include a daily conversation application that executes one response to each single utterance of the user. The daily conversation application may operate according to an algorithm different from those of the specific conversation applications; for example, it may execute responses tailored to the user's characteristics according to a deep learning algorithm.
 Suppose that, while the conversation processing device 100 is executing the daily conversation application, the user utters a word unknown to the conversation processing device 100. The algorithm of the daily conversation application may be designed so that the conversation processing device 100 asks the user the meaning of that word. For example, a word unknown to the conversation processing device 100 is defined as "_UNK". The algorithm may be designed so that, in response to a user utterance such as "I met _UNK at XX today.", the device replies "What kind of person is _UNK?". With such a design, the algorithm can give an appropriate reply even when the user utters a word unknown to the conversation processing device 100.
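 A minimal sketch of this unknown-word substitution follows; the vocabulary, templates, and helper names are assumptions for illustration only:

```python
KNOWN_VOCAB = {"i", "met", "at", "the", "park", "today"}  # assumed vocabulary
UNK = "_UNK"  # token standing in for a word unknown to the device

def mask_unknown(words: list[str]) -> list[str]:
    """Replace words outside the known vocabulary with the _UNK token."""
    return [w if w.lower() in KNOWN_VOCAB else UNK for w in words]

def daily_response(words: list[str]) -> str:
    """Ask about an unknown word instead of failing on it."""
    masked = mask_unknown(words)
    if UNK in masked:
        unknown = words[masked.index(UNK)]
        return f"What kind of person is {unknown}?"
    return "I see!"  # placeholder small-talk fallback

# daily_response(["I", "met", "Hanako", "at", "the", "park", "today"])
# -> "What kind of person is Hanako?"
```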
 The plurality of conversation applications further include a system conversation application for configuring various settings of the conversation processing device 100 through conversation with the user. When the user wants to set the volume of the conversation processing device 100, configure its communication settings, and so on, the user converses with the conversation processing device 100 through the system conversation application to perform the various settings.
 When the conversation processing device 100 determines that the currently executing conversation application cannot respond to the user's utterance, it interrupts that conversation application, selects an appropriate conversation application capable of responding to the utterance, and responds to the user's utterance.
 The conversation processing device 100 includes a microphone 101, a camera 102, a speaker 104, a display unit 105, a touch sensor 106, an infrared sensor 107, an actuator 108, a detection unit 110, an image processing unit 112, a voice control unit 114, a display control unit 116, a sensor control unit 118, and an actuator control unit 119. The conversation processing device 100 further includes an application execution unit 120, a transmission/reception unit 130, a word information acquisition unit 132, a selection unit 134, an application storage unit 140, a word list storage unit 142, and a user profile storage unit 144.
 The microphone 101 detects the voice uttered by the user and may be a directional microphone. The camera 102 images the environment around the conversation processing device 100; for example, it images the face of the user conversing with the conversation processing device 100. The speaker 104 outputs voice. The display unit 105 displays various information to be presented to the user and may be a liquid crystal display unit with a touch panel. The touch sensor 106 detects contact by the user's finger, palm, or the like. The infrared sensor 107 detects objects, such as the user, around the conversation processing device 100, and may be a pyroelectric infrared sensor. The actuator 108 provides the power for operating the movable members of the conversation processing device 100; when the conversation processing device 100 has a head and arms, the actuator 108 may, for example, rotate at least one of the head and the arms.
 The conversation processing device 100 may identify the direction in which the user is located based on the image captured by the camera 102, and may control the actuator 108 to rotate a movable member, such as the head on which the microphone 101 is mounted, so that the microphone 101 faces the identified direction.
 The infrared sensor 107 may be arranged to detect objects outside the imaging range of the camera 102. For example, when the user is outside the imaging range of the camera 102, the conversation processing device 100 estimates the user's position using the infrared sensor 107 and may rotate a movable member, such as the head on which the camera 102 is mounted, so that the user falls within the imaging range of the camera 102. Even when the user is not within the angle of view of the camera 102, the conversation processing device 100 can thus easily detect the user and point the camera 102 and the microphone 101 in the direction best suited to conversation with the user.
 The detection unit 110 detects the user's utterances and includes a voice recognition unit 111, which converts the user's utterance into voice data. The image processing unit 112 processes the image data captured by the camera 102; for example, it extracts the user's face image data from the image data and extracts facial feature values from the extracted face image data. A feature value may be any information that can identify an object such as a person, such as pixel-value information of the face image data, or numerical information on appearance features such as the spacing or size of the eyes, nose, and mouth in the face image data, skin color, and hairstyle. The image processing unit 112 provides the data indicating the facial feature values to the application execution unit 120.
 The voice control unit 114 outputs, through the speaker 104, voice based on the voice information provided by the application execution unit 120. The display control unit 116 causes the display unit 105 to display the image information provided by the application execution unit 120. The sensor control unit 118 receives the detection signals from the touch sensor 106 and the infrared sensor 107 and provides them to the actuator control unit 119 and the application execution unit 120.
 The actuator control unit 119 controls the actuator 108 and has an adjustment unit 109. The adjustment unit 109 adjusts the imaging range of the camera 102 according to the detection result of the infrared sensor 107. When the user is outside the imaging range of the camera 102, the adjustment unit 109 estimates the user's position using the infrared sensor 107 and may control the actuator 108 to rotate a movable member, such as the head on which the camera 102 is mounted, so that the user's face falls within the imaging range of the camera 102. The adjustment unit 109 may also adjust the imaging range of the camera 102 by adjusting the angle of view of the camera 102.
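 As a rough sketch of this adjustment, assuming the infrared sensor yields a bearing to the detected object and that the head pan angle can be commanded directly (both simplifications not stated in the embodiment):

```python
CAMERA_HALF_FOV_DEG = 30.0  # assumed half angle of view of camera 102

def adjust_pan(ir_bearing_deg: float, head_pan_deg: float) -> float:
    """Return a new head pan angle so the object detected by infrared
    sensor 107 falls inside the imaging range of camera 102.

    ir_bearing_deg: estimated direction of the user relative to the body.
    head_pan_deg:   current pan angle of the head-mounted camera.
    """
    offset = ir_bearing_deg - head_pan_deg
    if abs(offset) <= CAMERA_HALF_FOV_DEG:
        return head_pan_deg  # user already inside the imaging range
    return head_pan_deg + offset  # rotate the head toward the user

# adjust_pan(70.0, 0.0) -> 70.0: the head turns so that face detection on
# the camera image can then fine-tune the framing.
```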
 The conversation processing device 100 may realize video calls using the camera 102. By controlling the camera 102, a movable member such as the head on which the camera 102 is mounted, and the angle of view of the camera 102, the conversation processing device 100 may also make the camera 102 function as a remote camera for remotely monitoring the environment around the device. The conversation processing device 100 may also store communication with the user, such as conversations, as history information, and may determine based on the history information whether anything is abnormal with the user. For example, it may predict the user's life pattern from the history information and determine that something is abnormal when the user behaves differently from that pattern. When the conversation processing device 100 determines that something is abnormal with the user, it may notify a specific destination of the abnormality via the transmission/reception unit 130.
 The transmission/reception unit 130 transmits and receives data to and from the text conversion device 200 and the morphological analysis device 300 via the network 50. The word information acquisition unit 132 acquires word information including at least one word extracted from the user's utterance detected by the detection unit 110, and may acquire as the word information the morphological analysis data provided by the morphological analysis device 300 via the transmission/reception unit 130. The morphological analysis data may include each word contained in the user's utterance, the order in which the words were uttered, the part of speech of each word, and the like.
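 For illustration, the word information obtained from such morphological analysis data might be held in a structure like the following; the field names are assumptions rather than the actual interface of the morphological analysis device 300:

```python
from dataclasses import dataclass

@dataclass
class WordInfo:
    surface: str  # the word as uttered
    pos: str      # part of speech, e.g. "noun"
    order: int    # position of the word within the utterance

# "What is tomorrow's weather?" could arrive from the analyzer as:
utterance = [
    WordInfo("tomorrow", "noun", 0),
    WordInfo("weather", "noun", 1),
]
```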
 The application storage unit 140 stores the plurality of conversation applications, and may store the plurality of specific conversation applications and the daily conversation application.
 The selection unit 134 selects the conversation application being executed when it can respond to the user's utterance, and selects another conversation application that can respond to the user's utterance when the application being executed cannot.
 Specifically, the selection unit 134 selects the specific conversation application being executed when it can respond to the user's utterance; when it cannot, the selection unit 134 selects another specific conversation application that can respond to the user's utterance; and when no such other specific conversation application can be selected, the selection unit 134 selects the daily conversation application. The conversation application selected by the selection unit 134 is executed by the application execution unit 120.
 The word list storage unit 142 stores a word list in which at least one word corresponding to user utterances to which each of the specific conversation applications can respond is registered in association with that application. The word list storage unit 142 stores, for example, a word list as shown in FIG. 3. The word list contains combinations of words, in the order in which they appear in the user's utterance, together with the specific conversation application estimated to be able to respond appropriately to each combination. The word list further contains the priority used by the selection unit 134. For example, the highest priority "1" is set for a conversation application that is selected whenever the user makes a certain specific utterance, regardless of which specific conversation application is being executed; the next priority "2" is set for the specific conversation application being executed; and priority "3" is set for the other specific conversation applications. In the word list shown in FIG. 3, a smaller number indicates a higher priority; however, the word list may instead use larger numbers for higher priorities, and the indicator of priority need not be a number at all.
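 A minimal sketch of such a word list and a priority-based lookup, modeled loosely on the description of FIG. 3 (the concrete entries and priorities are illustrative assumptions):

```python
# (word combination, in utterance order) -> (application, priority);
# here a smaller number means a higher priority, as in FIG. 3.
WORD_LIST: dict[tuple[str, ...], tuple[str, int]] = {
    ("home",):                ("system", 1),    # wins regardless of context
    ("tomorrow", "weather"):  ("weather", 2),   # e.g. the running application
    ("tomorrow", "schedule"): ("schedule", 3),  # other specific applications
}

def match_app(words: list[str]) -> str | None:
    """Return the application of the highest-priority matching entry."""
    best: tuple[int, str] | None = None
    for pattern, (app, priority) in WORD_LIST.items():
        if all(w in words for w in pattern):
            if best is None or priority < best[0]:
                best = (priority, app)
    return best[1] if best else None

# match_app(["tomorrow", "weather"]) -> "weather"
```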
 Depending on the type of the specific conversation application being executed, the types of other specific conversation applications that may interrupt it so as to realize a more natural conversation can differ. Therefore, the word list storage unit 142 may store a word list for each specific conversation application.
 The selection unit 134 selects the specific conversation application being executed when at least one word included in the word information is registered in the word list in association with that application. The selection unit 134 selects another specific conversation application registered in the word list when at least one word included in the word information is not registered in the word list in association with the application being executed but is registered in association with that other specific conversation application.
 For example, suppose the user says, "What is the weather tomorrow?". The selection unit 134 selects the weather conversation application as the specific conversation application that can respond to the word combination (1) "tomorrow" - (2) "weather" contained in the user's utterance. Suppose the user then says, "Then what about the day after tomorrow?". In this case, the selection unit 134 continues to select the weather conversation application as the application being executed. Suppose instead that the user then says, "Well then, let me enter tomorrow's schedule.". In this case, the selection unit 134 interrupts the weather conversation application and selects the schedule conversation application as another specific conversation application.
 The user profile storage unit 144 stores at least one user profile including at least one item related to the user. Words extracted through conversation with the user are registered in each item of the user profile. The user profile storage unit 144 stores, for example, a user profile including a plurality of items indicating the user's individuality, such as the user's name, date of birth, address, favorite food, and favorite sport, as shown in FIG. 4. The response unit 121 may determine the information to include in a response to the user based on the words in each item of the user profile.
 The application execution unit 120 includes a response unit 121, a registration unit 122, an interruption state storage unit 123, and a termination unit 124. When the selected conversation application is the one being executed, the response unit 121 continues it; when the selected conversation application is another conversation application, the response unit 121 interrupts the application being executed and executes the other conversation application, thereby responding to the user's utterance.
 The registration unit 122 registers, in the user profile, those words among the at least one word included in the word information that belong to at least one item of the profile. For example, suppose the user mentions a favorite food. The registration unit 122 then extracts the word for the user's favorite food from the utterance and registers it in the user profile. The response unit 121 may refer to the user profile to optimize the content of a response to the user, changing part of the response according to the content of the user profile.
 For example, the user's favorite food is defined as _FAV_FOOD, and a response phrase such as "That's great. Let's celebrate with _FAV_FOOD for dinner tonight." is defined for a user utterance such as "Something really good happened today.". In this case, the response unit 121 refers to the user profile, identifies the user's favorite food, inserts the identified word into the response phrase, and thereby optimizes the content of the response.
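 A minimal sketch of this placeholder substitution, assuming a flat profile dictionary and a single hand-written template (neither is specified in the publication):

```python
PROFILE = {"_FAV_FOOD": "sushi", "_NAME": "Taro"}  # assumed profile contents

TEMPLATE = "That's great. Let's celebrate with _FAV_FOOD for dinner tonight."

def personalize(template: str, profile: dict[str, str]) -> str:
    """Fill profile placeholders such as _FAV_FOOD into a response phrase."""
    for placeholder, value in profile.items():
        template = template.replace(placeholder, value)
    return template

# personalize(TEMPLATE, PROFILE)
# -> "That's great. Let's celebrate with sushi for dinner tonight."
```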
 The interruption state storage unit 123 stores the interruption state of the algorithm of the specific conversation application being executed when that application is interrupted. The interruption state includes the position at which the algorithm of the specific conversation application was interrupted and the information obtained from the user's utterances up to the interruption, which may include information registered in the user profile. When the response unit 121 responds to the user's utterance by interrupting the specific conversation application being executed and executing another specific conversation application, it may determine the start position of the algorithm of the other specific conversation application based on the information obtained from the user during execution of the interrupted application, and may execute the other specific conversation application from the determined start position.
 For example, suppose that while the user is talking with the conversation processing device 100 about the weekend weather using the weather conversation application, the user learns that the weekend is likely to be sunny and wants to make plans to go out. The user says something like, "Then I want to enter a schedule for that day." to the conversation processing device 100. The response unit 121 determines the start position of the schedule conversation application's algorithm, taking into account that the schedule to be entered is for the weekend. For example, the response unit 121 starts the algorithm of the schedule conversation application from the point after the date for which the schedule is to be entered has been determined.
 When the response unit 121 responds to the user's utterance by interrupting the specific conversation application being executed and executing another specific conversation application, it refers to the interruption state storage unit 123 to identify the interruption state of the algorithm of the interrupted application. In response to the other specific conversation application ending or being interrupted, the response unit 121 resumes the previously interrupted specific conversation application based on its interruption state. For example, when the user makes an utterance that the other specific conversation application cannot respond to while it is being executed, in other words, when the user starts a topic different from the topic handled by that application, the response unit 121 interrupts the other specific conversation application. In response to that interruption, the response unit 121 may resume, based on its interruption state, the specific conversation application among the previously interrupted ones that can respond to the user's utterance, that is, the one corresponding to the topic the user has newly taken up again.
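 One way to picture the interruption state storage unit 123 is as a stack of suspended applications, each remembering its algorithm position and the information gathered so far. The sketch below is an assumed model, not the published implementation:

```python
from dataclasses import dataclass, field

@dataclass
class InterruptionState:
    app_name: str                  # which specific conversation application
    step: int                      # position reached in its algorithm
    slots: dict[str, str] = field(default_factory=dict)  # info from the user

suspended: list[InterruptionState] = []  # stand-in for storage unit 123

def interrupt(app_name: str, step: int, slots: dict[str, str]) -> None:
    """Record the interruption state before switching applications."""
    suspended.append(InterruptionState(app_name, step, dict(slots)))

def start_step(slots: dict[str, str]) -> int:
    """Pick the new application's start position from already-known slots,
    e.g. skip asking for the date if the weather talk established it."""
    return 1 if "date" in slots else 0

def resume(topic_app: str) -> InterruptionState | None:
    """Resume the suspended application matching the user's resumed topic."""
    for i in range(len(suspended) - 1, -1, -1):
        if suspended[i].app_name == topic_app:
            return suspended.pop(i)
    return None
```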
 The termination unit 124 forcibly terminates the conversation application being executed when the user's utterance detected by the detection unit 110 is a predetermined specific utterance. For example, when the user makes a specific utterance such as "home" or "force quit", the termination unit 124 forcibly terminates the conversation application being executed.
 The conversation processing device 100 further includes an infrared light receiving unit 126, an infrared light emitting unit 128, and a peripheral device control unit 129. The peripheral device control unit 129 makes the conversation processing device 100 function as a remote control terminal (for example, a remote controller) for peripheral devices. Peripheral devices are devices that operate in response to control commands transmitted from a remote control terminal by infrared or wireless means, such as AV equipment like televisions and recorders and home appliances like air conditioners and electric fans. The infrared light receiving unit 126 receives control commands from a remote control terminal by infrared light, and the infrared light emitting unit 128 transmits control commands for controlling peripheral devices by infrared light.
 The peripheral device control unit 129 stores a control command list that associates the control commands of the peripheral devices to be controlled with their control contents. For example, when the user wishes to register a new peripheral device as a control target, the peripheral device control unit 129 executes a control command registration process through conversation with the user. For example, the peripheral device control unit 129 asks the user to operate the remote control terminal so that it transmits various control commands, and generates a control command list associating the received control commands with their control contents. The peripheral device control unit 129 may have the user press the buttons of the remote control terminal corresponding to each control content in sequence, receive each control command emitted from the remote control terminal via the infrared light receiving unit 126, and generate a control command list associating each received control command with its control content. Alternatively, the peripheral device control unit 129 may have the user input, via the remote control terminal, a number that uniquely identifies the type of peripheral device to be controlled, and may acquire the control command list corresponding to the input number via the network 50, such as the Internet. When the user requests control of a peripheral device, the peripheral device control unit 129 refers to the control command list, identifies the control command associated with the requested control content, and transmits the identified control command by infrared light to the target peripheral device via the infrared light emitting unit 128.
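 A rough sketch of this learning-remote flow; capture_ir and emit_ir are assumed stand-ins for the infrared light receiving unit 126 and the infrared light emitting unit 128, and no real IR library API is implied:

```python
from typing import Callable

control_command_list: dict[str, bytes] = {}  # control content -> raw command

def learn_command(content: str, capture_ir: Callable[[], bytes]) -> None:
    """Ask the user to press the button for `content`, then store its code.
    capture_ir blocks until infrared light receiving unit 126 gets a frame."""
    print(f'Please press the "{content}" button on your remote control.')
    control_command_list[content] = capture_ir()

def execute_request(content: str, emit_ir: Callable[[bytes], None]) -> None:
    """Replay the stored command via infrared light emitting unit 128."""
    command = control_command_list.get(content)
    if command is None:
        print(f'I have not learned "{content}" yet.')
        return
    emit_ir(command)

# Registration: learn_command("power on", capture_ir). Later, "turn on the
# TV" spoken by the user maps to execute_request("power on", emit_ir).
```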
 The conversation processing device 100 may also communicate with peripheral devices wirelessly, such as via WiFi or Bluetooth (registered trademark). The peripheral device control unit 129 may acquire a device driver for controlling a target peripheral device via the network 50, and may use the device driver to control the peripheral device through conversation with the user.
 FIG. 5 is a flowchart showing an example of the conversation processing procedure of the conversation processing device 100. The conversation processing device 100 may execute the procedure of the flowchart shown in FIG. 5 when it detects a user utterance.
 The detection unit 110 detects the user's voice via the microphone 101 (S100). The voice recognition unit 111 generates voice data from the detected voice (S102). The transmission/reception unit 130 transmits the voice data to the text conversion device 200 (S104). The text conversion device 200 extracts words from the voice data and generates, for example, text data in which the words are arranged in the order they were uttered. The transmission/reception unit 130 receives the text data from the text conversion device 200 (S106) and transmits the received text data to the morphological analysis device 300 (S108). The selection unit 134 executes pattern matching based on the text data (S110) and determines, based on the received text data, whether the user's utterance concerns the system, such as the various settings of the conversation processing device 100 (S112). When text data matching the received text data is associated with the system conversation application, the selection unit 134 selects the system conversation application (S120).
 Upon receiving the text data, the morphological analysis device 300 performs morphological analysis on it to generate morphological analysis data, and transmits the generated data to the conversation processing device 100. The transmission/reception unit 130 receives the morphological analysis data from the morphological analysis device 300 (S114). The word information acquisition unit 132 acquires the morphological analysis data as word information and provides it to the selection unit 134. The selection unit 134 executes pattern matching based on the word information (S116), referring to the word list associated with the specific conversation application being executed to select the conversation application that will respond to the user's utterance.
 The selection unit 134 refers to the word list to determine whether the user's utterance concerns the system, such as the various settings of the conversation processing device 100 (S118). If it does, the selection unit 134 selects the system conversation application (S120).
 If the user's utterance does not concern the system, the selection unit 134 refers to the word list to determine whether the user's utterance is a continuation of the current topic (S122). The selection unit 134 determines that it is a continuation of the current topic when at least one word included in the word information is registered in the word list in association with the specific conversation application being executed. If the user's utterance is a continuation of the current topic, the selection unit 134 selects the specific conversation application being executed (S124).
 If the user's utterance is not a continuation of the current topic, the selection unit 134 refers to the word list to determine whether the user's utterance is a new topic (S126). The selection unit 134 determines that it is a new topic when at least one word included in the word information is not registered in the word list in association with the specific conversation application being executed but is registered in association with another specific conversation application. If the user's utterance is a new topic, the selection unit 134 selects the other specific conversation application registered in the word list in association with at least one word included in the word information (S128).
 If the user's utterance is not a new topic, that is, if there is no specific conversation application that can respond appropriately to the user's utterance, the selection unit 134 selects the daily conversation application (S130).
 After the selection unit 134 has selected a conversation application suitable for the user's utterance, the response unit 121 executes the selected conversation application and responds to the user's utterance (S134).
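 Putting the branches of FIG. 5 together, the selection cascade of S112 through S130 reduces to logic like the following sketch; the word sets and application names are illustrative assumptions:

```python
SYSTEM_WORDS = {"volume", "settings"}           # assumed system-topic words
APP_WORDS = {                                   # assumed per-application lists
    "weather": {"weather", "sunny"},
    "schedule": {"schedule", "plan"},
}

def select_application(words: list[str], running_app: str | None) -> str:
    """Cascade corresponding to S112/S118 -> S122 -> S126 -> S130 in FIG. 5."""
    tokens = set(words)
    if tokens & SYSTEM_WORDS:                   # system topic (S112 / S118)
        return "system"                         # S120
    if running_app and tokens & APP_WORDS.get(running_app, set()):
        return running_app                      # continuation (S122 -> S124)
    for app, vocab in APP_WORDS.items():        # new topic? (S126)
        if app != running_app and tokens & vocab:
            return app                          # S128
    return "daily"                              # fall back to chat (S130)

# select_application(["enter", "schedule"], "weather") -> "schedule"
```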
 As described above, according to the conversation processing device 100 of the present embodiment, when a response by the specific conversation application currently being executed is not appropriate for the user's utterance, the device executes another specific conversation application that can respond appropriately to the utterance. Furthermore, when there is no other specific conversation application that can respond appropriately, the conversation processing device 100 executes the daily conversation application. Therefore, even when the user starts a conversation on another topic in the middle of a conversation on one topic, the conversation processing device 100 can realize a more natural response to the user's utterances.
 When resuming an interrupted specific conversation application, the conversation processing device 100 determines the start position of that application's algorithm based on the information obtained in the conversation with the user through the application before the interruption. Therefore, even when the user returns to the previous topic after moving on to a new one, the conversation processing device 100 can realize a more natural response to the user's utterances.
 FIG. 6 shows an example of a computer 1200 in which a plurality of aspects of the present invention may be embodied in whole or in part. A program installed on the computer 1200 can cause the computer 1200 to function as operations associated with an apparatus according to an embodiment of the present invention or as one or more sections of the apparatus, or to execute those operations or those sections, and/or can cause the computer 1200 to execute a process according to an embodiment of the present invention or stages of that process. Such a program may be executed by the CPU 1212 to cause the computer 1200 to execute particular operations associated with some or all of the blocks of the flowcharts and block diagrams described herein.
 The computer 1200 according to the present embodiment includes a CPU 1212, a RAM 1214, a ROM 1230, a graphics controller 1216, and a display device 1218, which are interconnected by a host controller 1210. The computer 1200 also includes a communication interface 1222 and an input/output controller 1220. The computer 1200 may include optional input/output units, which may be connected to the host controller 1210 via the input/output controller 1220.
 The CPU 1212 operates according to the programs stored in the ROM 1230 and the RAM 1214, thereby controlling each unit. The graphics controller 1216 acquires the image data generated by the CPU 1212 in a frame buffer or the like provided in the RAM 1214, or in the graphics controller itself, and causes the image data to be displayed on the display device 1218. The communication interface 1222 communicates with other electronic devices, such as the text conversion device 200 and the morphological analysis device 300, via a network.
 The ROM 1230 stores therein a boot program or the like executed by the computer 1200 at the time of activation, and/or programs that depend on the hardware of the computer 1200. A program is read from a computer-readable medium, installed in the RAM 1214 or the ROM 1230, which are also examples of computer-readable media, and executed by the CPU 1212. The information processing described in these programs is read by the computer 1200 and brings about cooperation between the programs and the various types of hardware resources described above. An apparatus or method may be constituted by realizing the operation or processing of information according to the use of the computer 1200.
 For example, when communication is executed between the computer 1200 and an external device, the CPU 1212 may execute a communication program loaded into the RAM 1214 and instruct the communication interface 1222 to perform communication processing based on the processing described in the communication program. Under the control of the CPU 1212, the communication interface 1222 reads transmission data stored in a transmission buffer processing region provided in a recording medium such as the RAM 1214 and transmits the read transmission data to the network, or writes reception data received from the network into a reception buffer processing region or the like provided on the recording medium.
 Various types of information, such as various types of programs, data, tables, and databases, may be stored in the recording medium and subjected to information processing. On the data read from the RAM 1214, the CPU 1212 may execute various types of processing described throughout the present disclosure and specified by the instruction sequences of the programs, including various types of operations, information processing, condition judgment, conditional branching, unconditional branching, and information search/replacement, and writes the results back to the RAM 1214.
 The programs or software modules described above may be stored on the computer 1200 or in a computer-readable medium near the computer 1200. A recording medium such as a hard disk or RAM provided in a server system connected to a dedicated communication network or the Internet can also be used as a computer-readable medium, thereby providing the programs to the computer 1200 via the network.
 Although the present invention has been described above using embodiments, the technical scope of the present invention is not limited to the scope described in the above embodiments. It is apparent to those skilled in the art that various changes or improvements can be added to the above embodiments. It is also apparent from the description of the claims that embodiments to which such changes or improvements are added can be included in the technical scope of the present invention.
 It should be noted that the execution order of each process, such as the operations, procedures, steps, and stages in the apparatus, system, program, and method shown in the claims, the specification, and the drawings, can be realized in any order unless explicitly indicated by expressions such as "before" or "prior to", and unless the output of a previous process is used in a subsequent process. Even if the operation flows in the claims, the specification, and the drawings are described using "first", "next", and the like for convenience, this does not mean that the operations must be carried out in that order.
DESCRIPTION OF SYMBOLS
100 Conversation processing device
101 Microphone
102 Camera
104 Speaker
105 Display unit
106 Touch sensor
107 Infrared sensor
108 Actuator
109 Adjustment unit
110 Detection unit
111 Voice recognition unit
112 Image processing unit
114 Voice control unit
116 Display control unit
118 Sensor control unit
119 Actuator control unit
120 Application execution unit
121 Response unit
122 Registration unit
123 Interruption state storage unit
124 Termination unit
126 Infrared light receiving unit
128 Infrared light emitting unit
129 Peripheral device control unit
130 Transmission/reception unit
132 Word information acquisition unit
134 Selection unit
140 Application storage unit
142 Word list storage unit
144 User profile storage unit
200 Text conversion device
300 Morphological analysis device
1200 Computer
1210 Host controller
1212 CPU
1214 RAM
1216 Graphics controller
1218 Display device
1220 Input/output controller
1222 Communication interface
1230 ROM

Claims (11)

  1.  A conversation processing device that executes a plurality of conversation applications each responding to a user's speech according to a predetermined algorithm, the conversation processing device comprising:
     a detection unit that detects a user's speech;
     a selection unit that selects a running conversation application when the running conversation application can respond to the user's speech, and selects another conversation application that can respond to the user's speech when the running conversation application cannot respond to the user's speech; and
     a response unit that responds to the user's speech by continuing the running conversation application when the selected conversation application is the running conversation application, and by suspending the running conversation application and executing the other conversation application when the selected conversation application is the other conversation application.
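Purely as an illustrative sketch, and not as part of the claim language, the selection and response logic of claim 1 can be pictured as follows in Python. The names ConversationApp, ConversationProcessor, can_respond, and respond are invented for this example; a real device would substitute its own predetermined algorithms for the topic-word check.

```python
from typing import List, Optional


class ConversationApp:
    """Answers the utterances its predetermined algorithm recognizes."""

    def __init__(self, name: str, topics: List[str]):
        self.name = name
        self.topics = topics

    def can_respond(self, speech: str) -> bool:
        # Stand-in for the application's own algorithm: a topic-word check.
        return any(topic in speech for topic in self.topics)

    def respond(self, speech: str) -> str:
        return f"[{self.name}] responding to: {speech}"


class ConversationProcessor:
    def __init__(self, apps: List[ConversationApp]):
        self.apps = apps
        self.running: Optional[ConversationApp] = None

    def select(self, speech: str) -> Optional[ConversationApp]:
        # Prefer the running application if it can respond; otherwise pick
        # another application that can (the selection unit of claim 1).
        if self.running is not None and self.running.can_respond(speech):
            return self.running
        return next((a for a in self.apps if a.can_respond(speech)), None)

    def handle(self, speech: str) -> Optional[str]:
        selected = self.select(speech)
        if selected is None:
            return None
        if selected is not self.running:
            # Suspend the running application and switch to the other one
            # (the response unit of claim 1).
            self.running = selected
        return selected.respond(speech)


processor = ConversationProcessor([
    ConversationApp("weather", ["rain", "sunny"]),
    ConversationApp("recipes", ["dinner", "cook"]),
])
print(processor.handle("will it rain today?"))  # starts the weather app
print(processor.handle("what should I cook?"))  # switches to the recipes app
```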
  2.  The conversation processing device according to claim 1, wherein
     the plurality of conversation applications include a plurality of specific conversation applications each of which continues a conversation with the user according to the algorithm until a predetermined condition is satisfied, and
     the selection unit selects a running specific conversation application when the running specific conversation application can respond to the user's speech, and selects another specific conversation application that can respond to the user's speech when the running specific conversation application cannot respond to the user's speech.
  3.  The conversation processing device according to claim 2, wherein
     the plurality of conversation applications further include a daily conversation application that executes one response to one utterance of the user, and
     the selection unit selects the running specific conversation application when the running specific conversation application can respond to the user's speech, selects another specific conversation application that can respond to the user's speech when the running specific conversation application cannot respond, and selects the daily conversation application when no other specific conversation application that can respond to the user's speech can be selected.
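A hypothetical sketch of the three-tier selection in claims 2 and 3: the running specific application first, then any other specific application, then the daily conversation application as the one-shot fallback. SpecificApp and the sample topic checks are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SpecificApp:
    name: str
    can_respond: Callable[[str], bool]


def select_app(running: Optional[SpecificApp],
               specific_apps: List[SpecificApp],
               speech: str) -> str:
    # Tier 1: keep the running specific application if it can respond.
    if running is not None and running.can_respond(speech):
        return running.name
    # Tier 2: any other specific application that can respond.
    for app in specific_apps:
        if app is not running and app.can_respond(speech):
            return app.name
    # Tier 3: the daily conversation application, which gives exactly one
    # response to the one utterance (claim 3).
    return "daily"


quiz = SpecificApp("quiz", lambda s: "quiz" in s)
diet = SpecificApp("diet", lambda s: "meal" in s)
print(select_app(quiz, [quiz, diet], "next quiz question"))  # quiz
print(select_app(quiz, [quiz, diet], "log my meal"))         # diet
print(select_app(quiz, [quiz, diet], "nice weather today"))  # daily
```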
  4.  The conversation processing device according to claim 3, wherein the daily conversation application executes one response to one utterance of the user according to a deep learning algorithm.
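Claim 4 does not name a particular network, so the following is only a hypothetical shape for a daily conversation application that delegates its single response to some trained deep-learning model; the generate_response callable stands in for real model inference.

```python
from typing import Callable


class DailyConversationApp:
    """One utterance in, one response out (claim 3), where the response
    comes from a learned model rather than a scripted algorithm (claim 4)."""

    def __init__(self, generate_response: Callable[[str], str]):
        # generate_response stands in for a trained model's inference call,
        # e.g. an encoder-decoder network; no specific model is implied.
        self.generate_response = generate_response

    def respond(self, utterance: str) -> str:
        return self.generate_response(utterance)


# A stub generator used here in place of a real trained model.
app = DailyConversationApp(lambda u: f"I see. Tell me more about '{u}'.")
print(app.respond("I went hiking today"))
```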
  5.  The conversation processing device according to any one of claims 2 to 4, further comprising:
     a word information acquisition unit that acquires word information including at least one word extracted from the user's speech detected by the detection unit; and
     a word list storage unit that stores a word list in which at least one word corresponding to user speech to which the plurality of specific conversation applications can respond is registered in association with the plurality of specific conversation applications,
     wherein the selection unit refers to the word list, selects the running specific conversation application when the at least one word included in the word information is registered in the word list in association with the running specific conversation application, and selects the other specific conversation application registered in the word list when the at least one word included in the word information is not registered in the word list in association with the running specific conversation application but is registered in the word list in association with another specific conversation application.
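A minimal sketch of the word-list lookup in claim 5, assuming a hypothetical in-memory word list keyed by application name; the registered words and application names are invented.

```python
from typing import Dict, List, Optional

# Hypothetical word list mirroring the word list storage unit of claim 5:
# words registered per specific conversation application.
WORD_LIST: Dict[str, List[str]] = {
    "health": ["sleep", "steps", "weight"],
    "travel": ["flight", "hotel", "train"],
}


def select_by_word_list(running_app: Optional[str],
                        extracted_words: List[str]) -> Optional[str]:
    """Prefer the running application whose registered words match; else
    pick another application whose words match (claim 5)."""
    if running_app is not None and any(
            w in WORD_LIST.get(running_app, []) for w in extracted_words):
        return running_app
    for app, words in WORD_LIST.items():
        if app != running_app and any(w in words for w in extracted_words):
            return app
    return None  # claim 3's daily conversation application would take over


# "weight" is registered for the health application, so it stays selected.
print(select_by_word_list("health", ["weight"]))  # health
# "hotel" is registered only for the travel application, so selection switches.
print(select_by_word_list("health", ["hotel"]))   # travel
```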
  6.  The conversation processing device according to claim 5, wherein
     the word list storage unit stores the word list for each of the plurality of specific conversation applications, and
     the selection unit selects a specific conversation application with reference to the word list associated with the running specific conversation application.
  7.  The conversation processing device according to any one of claims 2 to 6, wherein, when responding to the user's speech by suspending the running specific conversation application and executing the other specific conversation application, the response unit determines a start position in the algorithm of the other specific conversation application based on information obtained from the user during execution of the running specific conversation application, and executes the other specific conversation application based on the determined start position.
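As a sketch of the start-position determination in claim 7: the step names and the collected-info fields below are invented; the claim only requires that information already obtained from the user decides where the next application's algorithm starts.

```python
SIGHTSEEING_STEPS = ["ask_destination", "ask_interests", "suggest_spots"]


def decide_start_position(collected_info: dict) -> int:
    """Skip leading steps whose information is already known (claim 7)."""
    position = 0
    if "destination" in collected_info:
        position = SIGHTSEEING_STEPS.index("ask_interests")
    if "interests" in collected_info:
        position = SIGHTSEEING_STEPS.index("suggest_spots")
    return position


# The suspended application already learned the destination, so the
# sightseeing application starts from its second step instead of the first.
start = decide_start_position({"destination": "Kyoto"})
print(SIGHTSEEING_STEPS[start])  # ask_interests
```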
  8.  The conversation processing device according to any one of claims 2 to 7, further comprising an interruption state storage unit that stores an interruption state of the algorithm of the running specific conversation application when the running specific conversation application is suspended,
     wherein the response unit refers to the interruption state storage unit to identify the interruption state of the algorithm of the suspended specific conversation application, and, in response to the other specific conversation application ending or being suspended, resumes the previously suspended specific conversation application based on the interruption state.
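One hypothetical realization of the interruption state storage unit in claim 8 is a stack of saved positions, so that when the interrupting application ends or is itself suspended, the previously suspended application resumes from its recorded step.

```python
from typing import List, Tuple


class InterruptionStateStore:
    """Hypothetical interruption state storage unit (claim 8): a stack of
    (application name, step index) pairs recorded at suspension time."""

    def __init__(self) -> None:
        self._stack: List[Tuple[str, int]] = []

    def save(self, app_name: str, step: int) -> None:
        self._stack.append((app_name, step))

    def resume_previous(self) -> Tuple[str, int]:
        # Called when the interrupting application ends or is itself
        # suspended: the previous application restarts at its saved step.
        return self._stack.pop()


store = InterruptionStateStore()
store.save("quiz", step=3)  # quiz application suspended at question 3
# ... another specific conversation application runs and finishes ...
app, step = store.resume_previous()
print(f"resuming {app} at step {step}")  # resuming quiz at step 3
```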
  9.  The conversation processing device according to any one of claims 1 to 8, further comprising a termination unit that forcibly terminates the running conversation application when the user's speech detected by the detection unit is a predetermined specific utterance.
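A sketch of the termination check in claim 9, under the assumption that the predetermined specific utterances form a small fixed set (the claim leaves their content open).

```python
TERMINATION_UTTERANCES = {"stop", "quit", "end conversation"}


def should_force_terminate(speech: str) -> bool:
    """Return True when the detected speech is a predetermined specific
    utterance that forcibly ends the running conversation application."""
    return speech.strip().lower() in TERMINATION_UTTERANCES


print(should_force_terminate("Stop"))          # True
print(should_force_terminate("keep talking"))  # False
```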
  10.  The conversation processing device according to any one of claims 1 to 9, further comprising:
     an imaging unit that images surroundings of the conversation processing device;
     an infrared sensor that detects the presence of an object existing around the conversation processing device; and
     an adjustment unit that adjusts an imaging range of the imaging unit according to a detection result of the infrared sensor such that the user's face is included in the imaging range of the imaging unit.
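By way of illustration of claim 10's adjustment unit: the angles, the 60-degree field of view, and the pan policy below are invented for this sketch; the claim only requires adjusting the imaging range from the infrared detection result so that the user's face enters it.

```python
def adjust_camera(camera_angle: float, detected_angle: float,
                  field_of_view: float = 60.0) -> float:
    """Return a camera angle that brings the detected person into view."""
    half_fov = field_of_view / 2.0
    if abs(detected_angle - camera_angle) <= half_fov:
        return camera_angle  # already inside the imaging range
    return detected_angle    # pan the imaging unit toward the detection


print(adjust_camera(camera_angle=0.0, detected_angle=20.0))  # 0.0 (in view)
print(adjust_camera(camera_angle=0.0, detected_angle=75.0))  # 75.0 (pan)
```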
  11.  A program for causing a computer to execute a plurality of conversation applications each responding to a user's speech according to a predetermined algorithm, the program causing the computer to execute:
     a procedure of detecting a user's speech;
     a procedure of selecting a running conversation application when the running conversation application can respond to the user's speech, and selecting another conversation application that can respond to the user's speech when the running conversation application cannot respond to the user's speech; and
     a procedure of responding to the user's speech by continuing the running conversation application when the selected conversation application is the running conversation application, and by suspending the running conversation application and executing the other conversation application when the selected conversation application is the other conversation application.
PCT/JP2017/026490 2016-08-02 2017-07-21 Conversation processing device and program WO2018025668A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2016151925A JP2018021987A (en) 2016-08-02 2016-08-02 Conversation processing device and program
JP2016-151925 2016-08-02

Publications (1)

Publication Number Publication Date
WO2018025668A1 true WO2018025668A1 (en) 2018-02-08

Family

ID=61072831

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2017/026490 WO2018025668A1 (en) 2016-08-02 2017-07-21 Conversation processing device and program

Country Status (2)

Country Link
JP (1) JP2018021987A (en)
WO (1) WO2018025668A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2020142555A (en) * 2019-03-04 2020-09-10 本田技研工業株式会社 Vehicle control system, vehicle control method and program
CN112204655A (en) * 2018-05-22 2021-01-08 三星电子株式会社 Electronic device for outputting response to voice input by using application and operating method thereof

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3859568A4 (en) * 2018-09-28 2021-09-29 Fujitsu Limited Dialogue device, dialogue method and dialogue program
JP7198122B2 (en) * 2019-03-07 2022-12-28 本田技研工業株式会社 AGENT DEVICE, CONTROL METHOD OF AGENT DEVICE, AND PROGRAM

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002350555A (en) * 2001-05-28 2002-12-04 Yamaha Motor Co Ltd Human presence detector
JP3998443B2 (en) * 2001-08-10 2007-10-24 富士通テン株式会社 Dialog system
JP6280342B2 (en) * 2013-10-22 2018-02-14 株式会社Nttドコモ Function execution instruction system and function execution instruction method

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH099128A (en) * 1995-06-22 1997-01-10 Canon Inc Image pickup device
JP2000069459A (en) * 1998-08-25 2000-03-03 Matsushita Electric Ind Co Ltd Monitor system
JP2001056694A (en) * 1999-08-19 2001-02-27 Denso Corp Interactive user interface device
JP2002032370A (en) * 2000-07-18 2002-01-31 Fujitsu Ltd Information processor
JP2014219594A (en) * 2013-05-09 2014-11-20 ソフトバンクモバイル株式会社 Conversation processing system and program
JP2014222402A (en) * 2013-05-13 2014-11-27 日本電信電話株式会社 Utterance candidate generation device, utterance candidate generation method, and utterance candidate generation program
JP2016062550A (en) * 2014-09-22 2016-04-25 ソフトバンク株式会社 Conversation processing system, and program

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112204655A (en) * 2018-05-22 2021-01-08 三星电子株式会社 Electronic device for outputting response to voice input by using application and operating method thereof
US11508364B2 (en) * 2018-05-22 2022-11-22 Samsung Electronics Co., Ltd. Electronic device for outputting response to speech input by using application and operation method thereof
JP2020142555A (en) * 2019-03-04 2020-09-10 本田技研工業株式会社 Vehicle control system, vehicle control method and program
JP7145105B2 (en) 2019-03-04 2022-09-30 本田技研工業株式会社 Vehicle control system, vehicle control method, and program
US11541906B2 (en) 2019-03-04 2023-01-03 Honda Motor Co., Ltd. Vehicle control device, vehicle control method, and storage medium

Also Published As

Publication number Publication date
JP2018021987A (en) 2018-02-08

Similar Documents

Publication Publication Date Title
US11955124B2 (en) Electronic device for processing user speech and operating method therefor
EP3396665B1 (en) Voice data processing method and electronic device supporting the same
WO2018025668A1 (en) Conversation processing device and program
KR102414122B1 (en) Electronic device for processing user utterance and method for operation thereof
US20200159491A1 (en) Remote Execution of Secondary-Device Drivers
JP5777731B2 (en) Environment-dependent dynamic range control for gesture recognition
US20190258456A1 (en) System for processing user utterance and controlling method thereof
US11404060B2 (en) Electronic device and control method thereof
KR102424260B1 (en) Generate IOT-based notifications and provide command(s) to automatically render IOT-based notifications by the automated assistant client(s) of the client device(s)
KR20210002598A (en) Transfer of automation assistant routines between client devices during routine execution
US20200043476A1 (en) Electronic device, control method therefor, and non-transitory computer readable recording medium
US20190130898A1 (en) Wake-up-word detection
US20240143154A1 (en) Proximity-Based Controls on a Second Device
KR20210005200A (en) Providing audio information using digital assistant
KR102345883B1 (en) Electronic device for ouputting graphical indication
US11620996B2 (en) Electronic apparatus, and method of controlling to execute function according to voice command thereof
JP6950708B2 (en) Information processing equipment, information processing methods, and information processing systems
US11756545B2 (en) Method and device for controlling operation mode of terminal device, and medium
US10901520B1 (en) Content capture experiences driven by multi-modal user inputs
KR20230121150A (en) Automated assistant performance of non-assistant application action(s) in response to user input, which may be limited to parameter(s)
KR20180116725A (en) Method for displaying operation screen of speech recognition service and electronic device supporting the same

Legal Events

Code  Description
121   Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 17836774; Country of ref document: EP; Kind code of ref document: A1)
NENP  Non-entry into the national phase (Ref country code: DE)
122   Ep: pct application non-entry in european phase (Ref document number: 17836774; Country of ref document: EP; Kind code of ref document: A1)