WO2018025668A1 - Conversation processing device and program - Google Patents

Conversation processing device and program

Info

Publication number
WO2018025668A1
Authority
WO
WIPO (PCT)
Prior art keywords
conversation
user
application
conversation application
specific
Application number
PCT/JP2017/026490
Other languages
French (fr)
Japanese (ja)
Inventor
高史 小山
佐知夫 前田
真人 土居
Original Assignee
ユニロボット株式会社 (Unirobot Corporation)
Application filed by ユニロボット株式会社 (Unirobot Corporation)
Publication of WO2018025668A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • the present invention relates to a conversation processing device and a program.
  • Patent Document 1: Japanese Patent Laid-Open No. 2015-011621
  • a system that can more naturally communicate with users is desired.
  • the conversation processing device executes a plurality of conversation applications that respond to a user's speech according to a predetermined algorithm.
  • the conversation processing device may include a detection unit that detects a user's speech.
  • The conversation processing device may include a selection unit that selects the running conversation application when it can respond to the user's speech, and selects another conversation application that can respond to the user's speech when the running conversation application cannot.
  • The conversation processing device may include a response unit that responds to the user's speech by continuing the running conversation application when the selected conversation application is the running one, and by interrupting the running conversation application and executing the other conversation application when the selected conversation application is another conversation application.
  • the plurality of conversation applications may include a plurality of specific conversation applications that continue the conversation with the user according to an algorithm until a predetermined condition is satisfied.
  • the selection unit may select the specific conversation application being executed when the specific conversation application being executed can respond to the user's speech. When the specific conversation application being executed cannot respond to the user's speech, the selection unit may select another specific conversation application that can respond to the user's speech.
  • the plurality of conversation applications may further include a daily conversation application that executes one response to one utterance of the user.
  • the selection unit may select the specific conversation application being executed when the specific conversation application being executed can respond to the user's speech. When the specific conversation application being executed cannot respond to the user's speech, the selection unit may select another specific conversation application that can respond to the user's speech. When the selection unit cannot select another specific conversation application that can respond to the user's utterance, the selection unit may select the daily conversation application.
  • The daily conversation application may execute one response to one utterance of the user according to a deep learning algorithm.
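  • As a rough illustration only, the selection order described above (the running specific application first, then another specific application, then the daily conversation application) could be structured as in the following Python sketch; the class and function names are hypothetical, not taken from the publication.

```python
from typing import Optional, Sequence

class ConversationApp:
    """Hypothetical base class wrapping one conversation application."""

    def can_respond(self, utterance: str) -> bool:
        """Return True if this application can respond to the utterance."""
        raise NotImplementedError

def select_app(running: Optional[ConversationApp],
               specific_apps: Sequence[ConversationApp],
               daily_app: ConversationApp,
               utterance: str) -> ConversationApp:
    # 1. Prefer the specific conversation application already running.
    if running is not None and running.can_respond(utterance):
        return running
    # 2. Otherwise look for another specific conversation application
    #    that can respond to the utterance.
    for app in specific_apps:
        if app is not running and app.can_respond(utterance):
            return app
    # 3. Fall back to the daily conversation application, which returns
    #    one response per utterance.
    return daily_app
```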
  • the conversation processing apparatus may further include a word information acquisition unit that acquires word information including at least one word extracted from the user's utterance detected by the detection unit.
  • the conversation processing apparatus may further include a word list storage unit that stores a word list in which at least one word corresponding to a user's utterance that can be answered by the plurality of specific conversation applications is associated with the plurality of specific conversation applications.
  • The selection unit may refer to the word list and select the specific conversation application being executed when at least one word included in the word information is registered in the word list in association with that application.
  • When at least one word included in the word information is not registered in the word list in association with the specific conversation application being executed but is registered in association with another specific conversation application, the selection unit may select that other specific conversation application.
  • the word list storage unit may store a word list for each of a plurality of specific conversation applications.
  • the selection unit may select a specific conversation application with reference to a word list associated with the specific conversation application being executed.
  • When the response unit interrupts the specific conversation application being executed and responds to the user's speech by executing another specific conversation application, it may determine the start position of the algorithm of the other specific conversation application based on information obtained from the user during execution of the interrupted application, and execute the other specific conversation application from the determined start position.
  • the conversation processing device may further include an interruption state storage unit that stores the interruption state of the algorithm of the specific conversation application being executed when the specific conversation application being executed is interrupted.
  • The response unit may refer to the interruption state storage unit to identify the interruption state of the algorithm of the interrupted specific conversation application and, in response to the termination or interruption of the other specific conversation application, resume the previously interrupted specific conversation application based on its interruption state.
  • the conversation processing device may further include an ending unit that forcibly terminates the conversation application being executed when the user's utterance detected by the detection unit is a predetermined specific utterance.
  • the conversation processing device may further include an imaging unit that images the periphery of the conversation processing device.
  • the conversation processing device may further include an infrared sensor that detects the presence of an object existing around the conversation processing device.
  • the conversation processing device may further include an adjustment unit that adjusts the imaging range of the imaging unit so that the user's face is included in the imaging range of the imaging unit according to the detection result of the infrared sensor.
  • the program according to an aspect of the present invention is a program for causing a computer to execute a plurality of conversation applications that respond to a user's speech according to a predetermined algorithm.
  • the program may cause the computer to execute a procedure for detecting a user's speech.
  • The program may cause the computer to execute a procedure that selects the running conversation application when it can respond to the user's speech, and selects another conversation application that can respond when the running conversation application cannot.
  • The program may cause the computer to execute a procedure that responds to the user's speech by continuing the running conversation application when the selected conversation application is the running one, and by interrupting the running conversation application and executing the other conversation application when the selected conversation application is another conversation application.
  • FIG. 1 shows an example of the system configuration of a conversation processing system; FIG. 2 shows an example of the functional blocks of the conversation processing apparatus; FIG. 3 shows an example of a word list; FIG. 4 shows an example of a user profile; FIG. 5 is a flowchart showing an example of the conversation processing procedure of the conversation processing apparatus; FIG. 6 shows an example of a computer.
  • In the flowcharts and block diagrams, a block may represent (1) a stage of a process in which an operation is performed or (2) a section of an apparatus responsible for performing the operation. Certain stages and sections may be implemented by dedicated circuitry, by programmable circuitry supplied with computer-readable instructions stored on a computer-readable medium, and/or by a processor supplied with computer-readable instructions stored on a computer-readable medium.
  • Dedicated circuitry may include digital and/or analog hardware circuitry, and may include integrated circuits (ICs) and/or discrete circuits.
  • Programmable circuitry may include reconfigurable hardware circuitry comprising logical AND, logical OR, logical XOR, logical NAND, logical NOR, and other logical operations, as well as memory elements such as flip-flops, registers, field-programmable gate arrays (FPGA), and programmable logic arrays (PLA).
  • A computer-readable medium may include any tangible device capable of storing instructions for execution by a suitable device, such that the computer-readable medium having the instructions stored thereon comprises a product including instructions that can be executed to create means for performing the operations specified in the flowcharts or block diagrams. Examples of computer-readable media may include electronic storage media, magnetic storage media, optical storage media, electromagnetic storage media, semiconductor storage media, and the like.
  • More specific examples of computer-readable media may include a floppy disk, a diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), electrically erasable programmable read-only memory (EEPROM), static random access memory (SRAM), compact disc read-only memory (CD-ROM), digital versatile disc (DVD), Blu-ray (RTM) disc, a memory stick, an integrated circuit card, and the like.
  • Computer-readable instructions may include assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, JAVA, C++, and the like, and conventional procedural programming languages such as the "C" programming language or similar programming languages.
  • Computer-readable instructions may be provided to a processor of a general-purpose computer, special-purpose computer, or other programmable data processing apparatus, or to programmable circuitry, locally or via a local area network (LAN) or a wide area network (WAN) such as the Internet, and may be executed to create means for performing the operations specified in the flowcharts or block diagrams.
  • processors include computer processors, processing units, microprocessors, digital signal processors, controllers, microcontrollers, and the like.
  • FIG. 1 shows an example of the system configuration of a conversation processing system according to this embodiment.
  • the conversation processing system includes a conversation processing apparatus 100, a text conversion apparatus 200, and a morpheme analysis apparatus 300.
  • the conversation processing device 100, the text conversion device 200, and the morpheme analysis device 300 are connected via a network 50.
  • the conversation processing apparatus 100 responds to the user's utterance with voice, image, movement or the like.
  • the conversation processing apparatus 100 includes a microphone 101, a camera 102, a speaker 104, a display unit 105, a touch sensor 106, and the like.
  • the conversation processing apparatus 100 detects the user's voice via the microphone 101.
  • the conversation processing apparatus 100 detects a user's facial expression or the like via the camera 102.
  • the conversation processing apparatus 100 transmits information by voice to the user via the speaker 104.
  • the conversation processing apparatus 100 transmits information to the user as an image via the display unit 105.
  • the conversation processing apparatus 100 may communicate with the user by means other than voice via the touch sensor 106 or the like.
  • the text conversion device 200 extracts words from the audio data provided from the conversation processing device 100 and generates text data.
  • the text conversion device 200 returns the generated text data to the conversation processing device 100.
  • the morpheme analyzer 300 performs morpheme analysis on the text data provided from the conversation processing device 100 to generate morpheme analysis data.
  • the morpheme analyzer 300 returns the generated morpheme analysis data to the conversation processing device 100.
  • the conversation processing apparatus 100 refers to the morphological analysis data and responds to the user's statement.
  • FIG. 2 shows an example of functional blocks of the conversation processing apparatus 100.
  • the conversation processing apparatus 100 executes a plurality of conversation applications that respond to a user's speech according to a predetermined algorithm.
  • the plurality of conversation applications may include a plurality of specific conversation applications that continue the conversation with the user according to an algorithm until a predetermined condition is satisfied.
  • the specific conversation application may be a conversation application for achieving a specific purpose through conversation with the user.
  • the plurality of specific conversation applications include a schedule conversation application that continues the conversation with the user until the user's desired schedule is registered.
  • the plurality of specific conversation applications include a weather conversation application that continues the conversation with the user until providing weather information at a specific date and time at a specific location.
  • the plurality of specific conversation applications include a recipe conversation application that continues the conversation with the user until a specific cooking recipe is provided.
  • the plurality of specific conversation applications include a game conversation application that executes a game according to a predetermined rule through a conversation with a user.
  • the plurality of conversation applications further include a daily conversation application that executes one response to one utterance of the user.
  • the daily conversation application may operate according to a different algorithm from the specific conversation application.
  • the daily conversation application executes, for example, a response according to the user's characteristics according to a deep learning algorithm.
  • the algorithm of the daily conversation application may be designed so that the conversation processing apparatus 100 asks the user the meaning of the word.
  • a word unknown to the conversation processing apparatus 100 is defined as “_UNK”.
  • an algorithm for a daily conversation application may be designed to respond to a user's statement “I met _UNK today at XX” as “What kind of person is _UNK?”. With such a design, an algorithm may be designed so that an appropriate answer can be made even when a user speaks a word unknown to the conversation processing apparatus 100.
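  • As a hedged sketch of this design, words outside the known vocabulary could be masked with the "_UNK" token and echoed back in a question. The vocabulary and function names below are illustrative assumptions, not details from the publication.

```python
KNOWN_VOCAB = {"i", "met", "today", "at"}  # illustrative vocabulary only

def mask_unknown_words(words: list[str]) -> tuple[list[str], list[str]]:
    """Replace out-of-vocabulary words with the _UNK token, keeping the
    original surface forms so they can be echoed back to the user."""
    masked, unknowns = [], []
    for word in words:
        if word.lower() in KNOWN_VOCAB:
            masked.append(word)
        else:
            masked.append("_UNK")
            unknowns.append(word)
    return masked, unknowns

def respond(words: list[str]) -> str:
    masked, unknowns = mask_unknown_words(words)
    if unknowns:
        # Ask about the unknown word, as in the example response
        # "What kind of person is _UNK?".
        return f"What kind of person is {unknowns[0]}?"
    return "I see."

print(respond(["I", "met", "Alice", "today"]))  # -> What kind of person is Alice?
```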
  • the plurality of conversation applications further include a system conversation application that performs various settings of the conversation processing apparatus 100 through conversation with the user.
  • the conversation processing apparatus 100 determines that the response to the user's statement cannot be executed by the currently executing conversation application, the conversation processing apparatus 100 interrupts the currently executing conversation application.
  • the conversation processing apparatus 100 selects an appropriate conversation application that can respond to the user's statement and responds to the user's statement.
  • The conversation processing apparatus 100 includes a microphone 101, a camera 102, a speaker 104, a display unit 105, a touch sensor 106, an infrared sensor 107, an actuator 108, a detection unit 110, an image processing unit 112, an audio control unit 114, a display control unit 116, a sensor control unit 118, and an actuator control unit 119.
  • the conversation processing apparatus 100 further includes an application execution unit 120, a transmission / reception unit 130, a word information acquisition unit 132, a selection unit 134, an application storage unit 140, a word list storage unit 142, and a user profile storage unit 144.
  • the microphone 101 detects the voice uttered by the user.
  • the microphone 101 may be a directional microphone.
  • the camera 102 images the environment around the conversation processing apparatus 100. For example, the camera 102 captures an image of the face of a user who has a conversation with the conversation processing apparatus 100.
  • the speaker 104 outputs sound.
  • the display unit 105 displays various information presented to the user.
  • the display unit 105 may be a liquid crystal display unit with a touch panel.
  • the touch sensor 106 detects that a user's finger, palm, or the like has touched.
  • the infrared sensor 107 detects an object such as a user existing around the conversation processing apparatus 100.
  • the infrared sensor 107 may be a pyroelectric infrared sensor.
  • the actuator 108 provides power for operating the movable member provided in the conversation processing apparatus 100. When the conversation processing apparatus 100 has a head and an arm, the actuator 108 may rotate at least one of the head and the arm, for example.
  • the conversation processing device 100 may specify the direction in which the user exists based on the image captured by the camera 102.
  • the conversation processing apparatus 100 may rotate the movable member such as the head provided with the microphone 101 by controlling the actuator 108 so that the microphone 101 faces in the specified direction.
  • the infrared sensor 107 may be arranged to detect an object that exists outside the imaging range of the camera 102. For example, when the user is outside the imaging range of the camera 102, the conversation processing apparatus 100 estimates the user's position using the infrared sensor 107. The conversation processing apparatus 100 may rotate a movable member such as a head provided with the camera 102 so that the user is included in the imaging range of the camera 102. Even when the user does not exist within the angle of view of the camera 102, the conversation processing apparatus 100 can easily detect the user and point the camera 102 and the microphone 101 in the optimum direction for the conversation with the user.
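  • One purely illustrative reading of this adjustment logic, with hypothetical angle conventions and actuator limits, is the following sketch.

```python
from typing import Optional

def adjust_pan_angle(ir_bearing_deg: Optional[float],
                     face_in_frame: bool,
                     current_pan_deg: float,
                     pan_limit_deg: float = 90.0) -> float:
    """Return a new pan angle for the head on which the camera is mounted.

    ir_bearing_deg is the bearing of the object detected by the infrared
    sensor relative to the body, or None if nothing was detected.
    """
    if face_in_frame or ir_bearing_deg is None:
        return current_pan_deg  # nothing to adjust
    # Turn the head (and with it the camera and microphone) toward the
    # estimated user position, clamped to the actuator's range.
    return max(-pan_limit_deg, min(pan_limit_deg, ir_bearing_deg))
```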
  • the detection unit 110 detects a user's speech.
  • the detection unit 110 includes a voice recognition unit 111.
  • the voice recognition unit 111 converts a user's speech into voice data.
  • the image processing unit 112 processes image data captured by the camera 102. For example, the image processing unit 112 extracts user face image data from the image data.
  • the image processing unit 112 extracts a facial feature amount from the extracted face image data.
  • The feature amount may be any information that can identify an object such as a person; it may be information on the pixel values of the face image data, or information indicating numerical values of appearance features such as the spacing and size of the eyes, nose, and mouth included in the face image data, skin color, and hairstyle.
  • the image processing unit 112 provides the application execution unit 120 with data indicating the facial feature amount.
  • the sound control unit 114 outputs sound based on the sound information provided from the application execution unit 120 to the speaker 104.
  • the display control unit 116 causes the display unit 105 to display the image information provided from the application execution unit 120.
  • the sensor control unit 118 receives detection signals from the touch sensor 106 and the infrared sensor 107 and provides them to the actuator control unit 119 and the application execution unit 120.
  • the actuator control unit 119 controls the actuator 108.
  • the actuator control unit 119 has an adjustment unit 109.
  • the adjustment unit 109 adjusts the imaging range of the camera 102 according to the detection result by the infrared sensor 107. When the user exists outside the imaging range of the camera 102, the adjustment unit 109 estimates the position of the user using the infrared sensor 107.
  • the adjustment unit 109 may rotate the movable member such as the head provided with the camera 102 by controlling the actuator 108 so that the user's face is included in the imaging range of the camera 102.
  • the adjustment unit 109 may adjust the imaging range of the camera 102 by adjusting the angle of view of the camera 102.
  • the conversation processing apparatus 100 may realize a video call using the camera 102.
  • The conversation processing apparatus 100 may function as a remote camera that remotely monitors the environment surrounding the conversation processing apparatus 100 by controlling the camera 102, a movable member such as the head on which the camera 102 is provided, and the angle of view of the camera 102.
  • the conversation processing apparatus 100 may store communication with the user such as conversation with the user as history information.
  • the conversation processing apparatus 100 may determine whether or not the user has an abnormality based on the history information.
  • the conversation processing apparatus 100 may determine that the user has an abnormality when the user's life pattern is predicted from the history information and the user takes an action different from the life pattern.
  • the conversation processing apparatus 100 may notify the specific destination of the abnormality via the transmission / reception unit 130.
  • the transmission / reception unit 130 transmits / receives data to / from the text conversion device 200 and the morphological analysis device 300 via the network 50.
  • the word information acquisition unit 132 acquires word information including at least one word extracted from the user's utterance detected by the detection unit 110.
  • the word information acquisition unit 132 may acquire morphological analysis data provided from the morphological analysis device 300 via the transmission / reception unit 130 as word information.
  • the morphological analysis data may be data including each word included in the user's utterance, the utterance order of each word, the part of speech of each word, and the like.
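  • The publication does not fix a concrete format, but morphological analysis data of this kind might be represented as follows; the field names and values are assumptions for illustration.

```python
# Hypothetical shape of morphological analysis data: one record per word,
# with its position in the utterance and its part of speech.
morpheme_analysis_data = [
    {"word": "tomorrow", "order": 1, "pos": "noun"},
    {"word": "weather", "order": 2, "pos": "noun"},
    {"word": "tell", "order": 3, "pos": "verb"},
]

# Word information as the selection unit might consume it: the words
# in utterance order.
word_info = [entry["word"]
             for entry in sorted(morpheme_analysis_data,
                                 key=lambda entry: entry["order"])]
```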
  • the application storage unit 140 stores a plurality of conversation applications.
  • the application storage unit 140 may store a plurality of specific conversation applications and daily conversation applications.
  • the selection unit 134 selects the conversation application being executed.
  • the selection unit 134 selects another conversation application that can respond to the user's speech.
  • the selection unit 134 selects the specific conversation application being executed when the specific conversation application being executed can respond to the user's speech. When the specific conversation application being executed cannot respond to the user's speech, the selection unit 134 selects another specific conversation application that can respond to the user's speech. When the other specific conversation application that can respond to the user's speech cannot be selected, the selection unit 134 selects the daily conversation application. The conversation application selected by the selection unit 134 is executed by the application execution unit 120.
  • the word list storage unit 142 stores a word list in which at least one word corresponding to a user's utterance to which a plurality of specific conversation applications can respond is registered in association with a plurality of specific conversation applications.
  • the word list storage unit 142 stores, for example, a word list as shown in FIG.
  • The word list includes combinations of words, arranged in the order in which they appear in the user's statements, together with the specific conversation application that is estimated to be able to respond appropriately to each combination.
  • The word list further includes a priority used by the selection unit 134 when selecting an application. For example, the highest priority "1" is set for a conversation application that is to be selected when the user makes a specific utterance, regardless of the type of the specific conversation application being executed.
  • the next priority “2” is set for the specific conversation application being executed.
  • the priority “3” is set for the other specific conversation application.
  • the word list shown in FIG. 3 indicates that the lower the numerical value, the higher the priority. However, the word list may indicate that the higher the numerical value, the higher the priority.
  • the index indicating the degree of priority may not be a numerical value.
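  • A minimal sketch of how such a word list and its priorities might be encoded and matched, assuming lower numbers mean higher priority as in FIG. 3; the application names and word combinations below are invented for illustration.

```python
from typing import Optional

# Hypothetical encoding of the word list of FIG. 3: an ordered word
# combination, the conversation application it maps to, and a priority
# (lower number = higher priority, as in the figure).
WORD_LIST = [
    (("volume", "up"), "system_app", 1),            # specific utterance, any time
    (("tomorrow", "weather"), "weather_app", 2),    # running application's topic
    (("tomorrow", "schedule"), "schedule_app", 3),  # another application's topic
]

def match_app(words: list[str]) -> Optional[str]:
    """Return the application whose word combination appears in the
    utterance, preferring entries with the smallest priority value."""
    candidates = [(priority, app)
                  for combo, app, priority in WORD_LIST
                  if all(word in words for word in combo)]
    return min(candidates)[1] if candidates else None
```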
  • the word list storage unit 142 may store a word list for each specific conversation application.
  • The selection unit 134 selects the specific conversation application being executed when at least one word included in the word information is registered in the word list in association with that application. When at least one word included in the word information is not registered in the word list in association with the specific conversation application being executed but is registered in association with another specific conversation application, the selection unit 134 selects that other specific conversation application.
  • For example, suppose the selection unit 134 selects the weather conversation application as a specific conversation application that can respond to the combination of the words (1) "Tomorrow" and (2) "Weather" included in the user's statement. Then, suppose the user says, "Okay, what will it be the day after tomorrow?". In this case, the selection unit 134 continues to select the weather conversation application as the specific conversation application being executed. On the other hand, suppose the user then says, "Now, let's enter tomorrow's schedule." In this case, the selection unit 134 interrupts the weather conversation application and selects the schedule conversation application as another specific conversation application.
  • the user profile storage unit 144 stores at least one user profile including at least one item related to the user. In each item of the user profile, words extracted through conversation with the user are registered.
  • the user profile storage unit 144 stores a user profile including a plurality of items indicating the individuality of the user, such as the user's name, date of birth, address, favorite food, favorite sports, as shown in FIG.
  • the response unit 121 may determine information to be included in the response to the user based on the word of each item of the user profile.
  • the application execution unit 120 includes a response unit 121, a registration unit 122, an interruption state storage unit 123, and an end unit 124. If the selected conversation application is a running conversation application, the response unit 121 continues the running conversation application. When the selected conversation application is another conversation application, the response unit 121 suspends the conversation application being executed and executes another conversation application, thereby responding to the user's statement.
  • The registration unit 122 registers, in the user profile, words belonging to at least one of its items from among the at least one word included in the word information. For example, suppose the user mentions a favorite food. The registration unit 122 then extracts the word for the user's favorite food from the user's statement and registers it in the user profile.
  • the response unit 121 may optimize the content of the response to the user with reference to the user profile. The response unit 121 may optimize a response by changing a part of the content of the response to the user according to the content of the user profile.
  • a user's favorite food is defined as _FAV_FOOD.
  • the response unit 121 refers to the user profile, identifies the user's favorite food, inserts the identified word into the response phrase, and optimizes the content of the response.
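  • A small illustrative sketch of this kind of response optimization, assuming placeholders such as _FAV_FOOD are embedded in response phrases (the profile keys and values below are invented):

```python
user_profile = {"_FAV_FOOD": "curry", "_NAME": "Taro"}  # invented values

def optimize_response(template: str, profile: dict) -> str:
    """Replace profile placeholders such as _FAV_FOOD in a response
    phrase with the words registered in the user profile."""
    for key, word in profile.items():
        if key.startswith("_"):
            template = template.replace(key, word)
    return template

print(optimize_response("Shall we have _FAV_FOOD tonight, _NAME?", user_profile))
# -> Shall we have curry tonight, Taro?
```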
  • the interruption state storage unit 123 stores the interruption state of the algorithm of the specific conversation application being executed when the specific conversation application being executed is interrupted.
  • the interruption state includes the position where the algorithm of the specific conversation application is interrupted and information obtained from the user's speech until the specific conversation application is interrupted.
  • the information obtained from the user's remarks may include information registered in the user profile.
  • The response unit 121 may determine the start position of the algorithm of another specific conversation application based on information obtained from the user during execution of the specific conversation application being executed, and may execute the other specific conversation application from the determined start position.
  • For example, the response unit 121 determines the start position of the algorithm of the schedule conversation application in consideration of the fact that the schedule to be entered falls on a weekend; it may start the algorithm of the schedule conversation application from the step after the date on which the schedule is to be entered has been determined.
  • When the response unit 121 interrupts the specific conversation application being executed and responds to the user's statement by executing another specific conversation application, it refers to the interruption state storage unit 123 and identifies the interruption state of the algorithm of the interrupted specific conversation application. In response to the termination or interruption of the other specific conversation application, the response unit 121 resumes the previously interrupted specific conversation application based on its interruption state. For example, the response unit 121 interrupts the other specific conversation application when the user makes a statement to which it cannot respond while it is executing; in other words, it interrupts the other specific conversation application when the user starts a topic different from the topic handled by that application.
  • The response unit 121 may resume, based on its interruption state, the specific conversation application that can respond to the user's statement from among the previously interrupted specific conversation applications. That is, in response to the interruption of another specific conversation application, the response unit 121 may select, from among the previously interrupted specific conversation applications, the one corresponding to the topic newly started by the user, and resume it based on its interruption state.
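  • One way such interruption states might be stored and resumed is sketched below; the data fields and the topic-matching callback are assumptions, not details from the publication.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional

@dataclass
class InterruptionState:
    """Hypothetical record of where a specific conversation application's
    algorithm was interrupted."""
    app_name: str
    step: str                                   # position in the algorithm
    slots: dict = field(default_factory=dict)   # information gathered so far

class InterruptionStore:
    def __init__(self) -> None:
        self._states: list[InterruptionState] = []

    def suspend(self, state: InterruptionState) -> None:
        self._states.append(state)

    def resume_for(self, words: list[str],
                   can_respond: Callable[[InterruptionState, list[str]], bool]
                   ) -> Optional[InterruptionState]:
        """On termination or interruption of the current application, pop
        and return the suspended application matching the new topic,
        most recently suspended first."""
        for i in range(len(self._states) - 1, -1, -1):
            if can_respond(self._states[i], words):
                return self._states.pop(i)
        return None
```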
  • the termination unit 124 forcibly terminates the conversation application being executed when the user's speech detected by the detection unit 110 is a predetermined specific speech. For example, when the user makes a specific statement such as “home” or “forced termination”, the termination unit 124 forcibly terminates the conversation application being executed.
  • the conversation processing apparatus 100 further includes an infrared light receiving unit 126, an infrared light emitting unit 128, and a peripheral device control unit 129.
  • the peripheral device control unit 129 causes the conversation processing apparatus 100 to function as a remote control terminal (for example, a remote controller) for peripheral devices.
  • Peripheral devices are devices that operate in response to control commands transmitted from a remote control terminal in an infrared or wireless manner, such as AV devices such as televisions and recorders, and home appliances such as air conditioners and electric fans.
  • the infrared light receiving unit 126 receives a control command by infrared rays from the remote control terminal.
  • the infrared light emitting unit 128 transmits a control command for controlling the peripheral device by infrared rays.
  • The peripheral device control unit 129 stores a control command list that associates control commands with the control contents of the peripheral devices to be controlled. For example, when the user wishes to register a new peripheral device to be controlled, the peripheral device control unit 129 executes a control command registration process through a conversation with the user. For example, the peripheral device control unit 129 requests the user to operate the remote control terminal so that various control commands are transmitted from it, and generates a control command list that associates each received control command with its control content. The peripheral device control unit 129 may have the user sequentially press the buttons of the remote control terminal corresponding to each control content, and receive each control command emitted from the remote control terminal via the infrared light receiving unit 126.
  • the peripheral device control unit 129 may generate a control command list in which each received control command is associated with each control content. Alternatively, the peripheral device control unit 129 may cause the user to input a number that uniquely specifies the type of the peripheral device to be controlled via the remote control terminal. The peripheral device control unit 129 may acquire a control command list corresponding to the input number via the network 50 such as the Internet. When the peripheral device control unit 129 is requested by the user to control the peripheral device, the peripheral device control unit 129 refers to the control command list and identifies a control command associated with the requested control content. The peripheral device control unit 129 transmits the specified control command to the peripheral device to be controlled by infrared rays via the infrared light emitting unit 128.
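  • A minimal sketch of such a control command list, assuming raw infrared commands are captured as opaque byte strings during the registration dialogue; the function names are hypothetical.

```python
from typing import Callable

# Hypothetical control command list: control content -> raw IR command.
control_commands: dict[str, bytes] = {}

def register_command(content: str, ir_command: bytes) -> None:
    """Called once per button during the registration dialogue, after the
    user presses the corresponding remote-control button and the command
    is received via the infrared light receiving unit."""
    control_commands[content] = ir_command

def execute_command(content: str, emit_ir: Callable[[bytes], None]) -> bool:
    """Look up the command for the requested control content and send it
    via the infrared light emitting unit (the emit_ir callback)."""
    command = control_commands.get(content)
    if command is None:
        return False
    emit_ir(command)
    return True
```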
  • The conversation processing apparatus 100 may communicate with peripheral devices wirelessly, for example via WiFi or Bluetooth (registered trademark).
  • the peripheral device control unit 129 may acquire a device driver for controlling the peripheral device to be controlled via the network 50.
  • the peripheral device control unit 129 may control the peripheral device through a conversation with the user using a device driver.
  • FIG. 5 is a flowchart showing an example of the conversation processing procedure of the conversation processing apparatus 100.
  • the conversation processing apparatus 100 may execute the procedure of the flowchart shown in FIG. 5 when detecting a user's utterance.
  • the detection unit 110 detects the user's voice via the microphone 101 (S100).
  • the voice recognition unit 111 generates voice data from the detected voice (S102).
  • the transmission / reception unit 130 transmits the audio data to the text conversion device 200 (S104).
  • the text conversion device 200 extracts words from the voice data and generates text data in which the words are arranged in the order of speech, for example.
  • the transmission / reception unit 130 receives text data from the text conversion device 200 (S106).
  • the transmission / reception unit 130 transmits the received text data to the morphological analyzer 300 (S108).
  • the selection unit 134 performs pattern matching based on the text data (S110).
  • the selection unit 134 determines whether or not the user's utterance is a topic related to the system such as various settings of the conversation processing device 100 (S112). When the text data matching the received text data is associated with the system conversation application, the selection unit 134 selects the system conversation application (S120).
  • When receiving the text data, the morpheme analyzer 300 generates morpheme analysis data by performing morphological analysis on the received text data.
  • the morpheme analyzer 300 transmits the generated morpheme analysis data to the conversation processing device 100.
  • the transmission / reception unit 130 receives morpheme analysis data from the morpheme analyzer 300 (S114).
  • the word information acquisition unit 132 acquires morpheme analysis data as word information and provides it to the selection unit 134.
  • the selection unit 134 performs pattern matching based on the word information (S116).
  • the selection unit 134 refers to a word list associated with the specific conversation application being executed, and selects a conversation application that responds to the user's speech.
  • the selection unit 134 refers to the word list to determine whether or not the user's remark is a topic related to the system such as various settings of the conversation processing apparatus 100 (S118). If the user's speech is a topic about the system such as various settings of the conversation processing device 100, the selection unit 134 selects a system conversation application (S120).
  • the selection unit 134 refers to the word list to determine whether the user's utterance is a continuation of the current topic (S122).
  • The selection unit 134 refers to the word list and judges that the user's statement is a continuation of the current topic when at least one word included in the word information is registered in the word list in association with the specific conversation application being executed. If the user's statement is a continuation of the current topic, the selection unit 134 selects the specific conversation application being executed (S124).
  • The selection unit 134 refers to the word list to determine whether or not the user's statement is a new topic (S126). When at least one word included in the word information is not registered in the word list in association with the specific conversation application being executed but is registered in association with another specific conversation application, the selection unit 134 determines that the statement is a new topic. If the user's statement is a new topic, the selection unit 134 selects the other specific conversation application registered in the word list in association with at least one word included in the word information (S128).
  • Otherwise, the selection unit 134 selects the daily conversation application (S130).
  • the response unit 121 executes the selected conversation application and responds to the user's utterance (S134).
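  • Read as pseudocode, the flow of FIG. 5 (steps S100-S134) might look like the sketch below; the ctx helper object and its methods are hypothetical stand-ins for the units described above.

```python
def process_utterance(audio: bytes, ctx) -> str:
    """Sketch of the S100-S134 flow; ctx bundles hypothetical helpers
    standing in for the units described above."""
    text = ctx.text_conversion(audio)              # S104-S106
    if ctx.matches_system_text(text):              # S110-S112
        app = ctx.system_app                       # S120
    else:
        words = ctx.morph_analysis(text)           # S108, S114
        if ctx.is_system_topic(words):             # S116-S118
            app = ctx.system_app                   # S120
        elif ctx.is_current_topic(words):          # S122
            app = ctx.running_app                  # S124
        else:
            app = ctx.other_specific_app(words)    # S126-S128
            if app is None:
                app = ctx.daily_app                # S130
    return app.respond(text)                       # S134
```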
  • When a response by the specific conversation application currently being executed is not appropriate for the user's utterance, the conversation processing apparatus 100 runs another specific conversation application that can respond appropriately. Further, when there is no other specific conversation application that can respond appropriately to the user's utterance, the conversation processing apparatus 100 executes the daily conversation application. Therefore, even when the user tries to start a conversation on another topic in the middle of a conversation on one topic, the conversation processing apparatus 100 can respond more naturally to the user's utterances.
  • When resuming an interrupted specific conversation application, the conversation processing apparatus 100 determines the start position of that application's algorithm based on the information obtained in the conversation with the user via the application before the interruption. Therefore, even when the conversation processing apparatus 100 returns to a previous topic after moving to a new topic, it can respond more naturally to the user's utterances.
  • FIG. 6 illustrates an example of a computer 1200 in which aspects of the present invention may be embodied in whole or in part.
  • A program installed in the computer 1200 can cause the computer 1200 to function as one or more sections of the apparatus according to an embodiment of the present invention, to perform operations associated with the apparatus or those sections, and/or to execute a process according to an embodiment of the present invention or stages of that process.
  • Such a program may be executed by CPU 1212 to cause computer 1200 to perform certain operations associated with some or all of the blocks in the flowcharts and block diagrams described herein.
  • a computer 1200 includes a CPU 1212, a RAM 1214, a ROM 1230, a graphic controller 1216, and a display device 1218, which are connected to each other by a host controller 1210.
  • Computer 1200 also includes a communication interface 1222 and an input / output controller 1220.
  • Computer 1200 may include optional input / output units, which may be connected to host controller 1210 via input / output controller 1220.
  • the CPU 1212 operates in accordance with programs stored in the ROM 1230 and the RAM 1214, thereby controlling each unit.
  • the graphic controller 1216 acquires the image data generated by the CPU 1212 in a frame buffer or the like provided in the RAM 1214 or the like, and causes the image data to be displayed on the display device 1218.
  • the communication interface 1222 communicates with other electronic devices such as the text conversion device 200 and the morphological analysis device 300 via a network.
  • the ROM 1230 stores therein a boot program executed by the computer 1200 at the time of activation and / or a program depending on the hardware of the computer 1200.
  • the program is read from a computer-readable medium, installed in the RAM 1214 or the ROM 1230, which is also an example of a computer-readable medium, and executed by the CPU 1212.
  • The information processing described in these programs is read by the computer 1200 and brings about cooperation between the programs and the various types of hardware resources described above.
  • An apparatus or method may be constituted by realizing operations or processing of information in accordance with the use of the computer 1200.
  • For example, the CPU 1212 may execute a communication program loaded in the RAM 1214 and instruct the communication interface 1222 to perform communication processing based on the processing described in the communication program.
  • Under the control of the CPU 1212, the communication interface 1222 reads transmission data stored in a transmission buffer processing area provided in a recording medium such as the RAM 1214 and transmits the read transmission data to the network, or writes reception data received from the network into a reception buffer processing area provided on the recording medium.
  • The CPU 1212 may perform, on data read from the RAM 1214, various types of processing described throughout the present disclosure and specified by the instruction sequences of programs, including various types of operations, information processing, condition judgment, conditional branching, unconditional branching, and information retrieval/replacement, and write the result back to the RAM 1214.
  • the programs or software modules described above may be stored on a computer-readable medium on the computer 1200 or in the vicinity of the computer 1200.
  • For example, a recording medium such as a hard disk or RAM provided in a server system connected to a dedicated communication network or the Internet can be used as a computer-readable medium, thereby providing a program to the computer 1200 via the network.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present invention provides a system that facilitates more natural conversation with a user. This conversation processing device comprises: a detection unit for detecting a statement made by a user; a selection unit that selects the running conversation application if the running conversation application can respond to the statement made by the user, and if the running conversation application cannot respond to the statement made by the user, selects another conversation application that can respond to the statement made by the user; and a response unit that responds to the statement made by the user by allowing the running conversation application to continue if the selected conversation application is the running conversation application, and by suspending the running conversation application and initiating another conversation application if the selected conversation application is another conversation application.

Description

Conversation processing apparatus and program
 The present invention relates to a conversation processing device and a program.
 Various systems for realizing conversations with users have been proposed.
 Patent Document 1: Japanese Patent Laid-Open No. 2015-011621
Challenges to be solved
 A system that can communicate more naturally with users is desired.
General disclosure
 The conversation processing device according to an aspect of the present invention executes a plurality of conversation applications that respond to a user's speech according to a predetermined algorithm. The conversation processing device may include a detection unit that detects a user's speech. The conversation processing device may include a selection unit that selects the running conversation application when it can respond to the user's speech, and selects another conversation application that can respond when the running conversation application cannot. The conversation processing device may include a response unit that responds to the user's speech by continuing the running conversation application when the selected conversation application is the running one, and by interrupting the running conversation application and executing the other conversation application when the selected application is another conversation application.
 The plurality of conversation applications may include a plurality of specific conversation applications that continue the conversation with the user according to an algorithm until a predetermined condition is satisfied. The selection unit may select the specific conversation application being executed when it can respond to the user's speech, and may select another specific conversation application that can respond when the one being executed cannot.
 The plurality of conversation applications may further include a daily conversation application that executes one response to one utterance of the user. The selection unit may select the specific conversation application being executed when it can respond to the user's speech, may select another specific conversation application that can respond when the one being executed cannot, and may select the daily conversation application when no other specific conversation application that can respond to the user's speech can be selected.
 The daily conversation application may execute one response to one utterance of the user according to a deep learning algorithm.
 The conversation processing device may further include a word information acquisition unit that acquires word information including at least one word extracted from the user's speech detected by the detection unit. The conversation processing device may further include a word list storage unit that stores a word list in which at least one word corresponding to user speech to which the plurality of specific conversation applications can respond is registered in association with those applications. The selection unit may refer to the word list and select the specific conversation application being executed when at least one word included in the word information is registered in the word list in association with that application. When at least one word included in the word information is not registered in the word list in association with the specific conversation application being executed but is registered in association with another specific conversation application, the selection unit may select that other specific conversation application.
 The word list storage unit may store a word list for each of the plurality of specific conversation applications. The selection unit may select a specific conversation application by referring to the word list associated with the specific conversation application being executed.
 When the response unit interrupts the specific conversation application being executed and responds to the user's speech by executing another specific conversation application, it may determine the start position of the algorithm of the other specific conversation application based on information obtained from the user during execution of the interrupted application, and execute the other specific conversation application from the determined start position.
 The conversation processing device may further include an interruption state storage unit that stores the interruption state of the algorithm of the specific conversation application being executed when that application is interrupted. The response unit may refer to the interruption state storage unit to identify the interruption state of the algorithm of the interrupted specific conversation application and, in response to the termination or interruption of the other specific conversation application, resume the previously interrupted specific conversation application based on its interruption state.
 The conversation processing device may further include an ending unit that forcibly terminates the conversation application being executed when the user's speech detected by the detection unit is a predetermined specific utterance.
 The conversation processing device may further include an imaging unit that images the surroundings of the conversation processing device, an infrared sensor that detects the presence of objects existing around the conversation processing device, and an adjustment unit that adjusts the imaging range of the imaging unit according to the detection result of the infrared sensor so that the user's face is included in the imaging range.
 The program according to an aspect of the present invention is a program for causing a computer to execute a plurality of conversation applications that respond to a user's speech according to a predetermined algorithm. The program may cause the computer to execute a procedure for detecting a user's speech. The program may cause the computer to execute a procedure that selects the running conversation application when it can respond to the user's speech, and selects another conversation application that can respond when the running conversation application cannot. The program may cause the computer to execute a procedure that responds to the user's speech by continuing the running conversation application when the selected conversation application is the running one, and by interrupting the running conversation application and executing the other conversation application when the selected application is another conversation application.
 The above summary of the invention does not enumerate all the features of the present invention. Sub-combinations of these groups of features can also constitute inventions.
FIG. 1 is a diagram showing an example of the system configuration of a conversation processing system.
FIG. 2 is a diagram showing an example of the functional blocks of a conversation processing device.
FIG. 3 is a diagram showing an example of a word list.
FIG. 4 is a diagram showing an example of a user profile.
FIG. 5 is a flowchart showing an example of a conversation processing procedure of a conversation processing device.
FIG. 6 is a diagram showing an example of a computer.
 Hereinafter, the present invention will be described through embodiments of the invention; however, the following embodiments do not limit the invention according to the claims. In addition, not all combinations of the features described in the embodiments are essential to the solving means of the invention.
 Various embodiments of the present invention may be described with reference to flowcharts and block diagrams, in which a block may represent (1) a stage of a process in which an operation is performed or (2) a section of an apparatus responsible for performing the operation. Particular stages and sections may be implemented by dedicated circuitry, by programmable circuitry supplied with computer-readable instructions stored on a computer-readable medium, and/or by a processor supplied with computer-readable instructions stored on a computer-readable medium. Dedicated circuitry may include digital and/or analog hardware circuits, and may include integrated circuits (ICs) and/or discrete circuits. Programmable circuitry may include reconfigurable hardware circuits including logical AND, logical OR, logical XOR, logical NAND, logical NOR, and other logical operations, as well as memory elements such as flip-flops, registers, field-programmable gate arrays (FPGAs), and programmable logic arrays (PLAs).
 A computer-readable medium may include any tangible device capable of storing instructions to be executed by a suitable device, with the result that the computer-readable medium having the instructions stored thereon constitutes an article of manufacture including instructions that can be executed to create means for performing the operations specified in the flowcharts or block diagrams. Examples of computer-readable media may include electronic storage media, magnetic storage media, optical storage media, electromagnetic storage media, and semiconductor storage media. More specific examples of computer-readable media may include floppy (registered trademark) disks, diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), electrically erasable programmable read-only memory (EEPROM), static random access memory (SRAM), compact disc read-only memory (CD-ROM), digital versatile discs (DVD), Blu-ray (RTM) discs, memory sticks, and integrated circuit cards.
 Computer-readable instructions may include assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or either source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as Smalltalk, JAVA (registered trademark), and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages.
 Computer-readable instructions may be provided to a processor or programmable circuitry of a general-purpose computer, a special-purpose computer, or another programmable data processing apparatus, either locally or via a local area network (LAN) or a wide area network (WAN) such as the Internet, and the computer-readable instructions may be executed to create means for performing the operations specified in the flowcharts or block diagrams. Examples of processors include computer processors, processing units, microprocessors, digital signal processors, controllers, and microcontrollers.
 FIG. 1 shows an example of the system configuration of a conversation processing system according to the present embodiment. The conversation processing system includes a conversation processing device 100, a text conversion device 200, and a morphological analysis device 300, which are connected via a network 50. The conversation processing device 100 responds to a user's utterances with voice, images, movement, and the like.
 The conversation processing device 100 includes a microphone 101, a camera 102, a speaker 104, a display unit 105, a touch sensor 106, and the like. It detects the user's voice via the microphone 101 and detects the user's facial expressions and the like via the camera 102. It conveys information to the user by voice via the speaker 104 and by images via the display unit 105, and may communicate with the user by means other than voice via the touch sensor 106 or the like.
 The text conversion device 200 extracts words from the voice data provided by the conversation processing device 100, generates text data, and returns the generated text data to the conversation processing device 100. The morphological analysis device 300 performs morphological analysis on the text data provided by the conversation processing device 100, generates morphological analysis data, and returns the generated data to the conversation processing device 100. The conversation processing device 100 refers to the morphological analysis data to respond to the user's utterance.
 FIG. 2 shows an example of the functional blocks of the conversation processing device 100. The conversation processing device 100 executes a plurality of conversation applications that respond to a user's utterances according to predetermined algorithms. The plurality of conversation applications may include a plurality of specific conversation applications that continue a conversation with the user according to their algorithms until a predetermined condition is satisfied. A specific conversation application may be a conversation application for achieving a specific purpose through conversation with the user. The specific conversation applications include a schedule conversation application that continues the conversation with the user until the user's desired schedule is registered, a weather conversation application that continues the conversation until it provides weather information for a specific place at a specific date and time, a recipe conversation application that continues the conversation until it provides a recipe for a specific dish, and a game conversation application that plays a game according to predetermined rules through conversation with the user.
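 Although the publication gives no code, a specific conversation application can be pictured as a small state machine that keeps the dialogue going until its goal condition holds. The following is a minimal sketch under that assumption; the class and method names (can_respond, respond, is_done) are illustrative and not taken from the embodiment:

```python
from abc import ABC, abstractmethod

class SpecificConversationApp(ABC):
    """Illustrative model of a specific conversation application: it keeps
    the conversation going until a predetermined condition is satisfied."""

    @abstractmethod
    def can_respond(self, words: list[str]) -> bool:
        """True if this application can respond to the user's utterance."""

    @abstractmethod
    def respond(self, words: list[str]) -> str:
        """Advance the application's algorithm one step and return a reply."""

    @abstractmethod
    def is_done(self) -> bool:
        """True once the goal holds, e.g. the desired schedule is registered."""
```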
 The plurality of conversation applications further include a daily conversation application that executes one response to each single utterance of the user. The daily conversation application may operate according to an algorithm different from those of the specific conversation applications; for example, it may execute responses tailored to the user's characteristics according to a deep learning algorithm.
 Suppose that, while the conversation processing device 100 is executing the daily conversation application, the user utters a word unknown to the conversation processing device 100. The algorithm of the daily conversation application may be designed so that the conversation processing device 100 asks the user the meaning of that word. For example, a word unknown to the conversation processing device 100 is defined as "_UNK". The algorithm may be designed so that, in response to a user utterance such as "I met _UNK at XX today.", the device replies "What kind of person is _UNK?". With such a design, the algorithm can give an appropriate reply even when the user utters a word unknown to the conversation processing device 100.
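 A minimal sketch of this unknown-word substitution follows; the vocabulary, templates, and helper names are assumptions for illustration only:

```python
KNOWN_VOCAB = {"i", "met", "at", "the", "park", "today"}  # assumed vocabulary
UNK = "_UNK"  # token standing in for a word unknown to the device

def mask_unknown(words: list[str]) -> list[str]:
    """Replace words outside the known vocabulary with the _UNK token."""
    return [w if w.lower() in KNOWN_VOCAB else UNK for w in words]

def daily_response(words: list[str]) -> str:
    """Ask about an unknown word instead of failing on it."""
    masked = mask_unknown(words)
    if UNK in masked:
        unknown = words[masked.index(UNK)]
        return f"What kind of person is {unknown}?"
    return "I see!"  # placeholder small-talk fallback

# daily_response(["I", "met", "Hanako", "at", "the", "park", "today"])
# -> "What kind of person is Hanako?"
```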
 The plurality of conversation applications further include a system conversation application for configuring various settings of the conversation processing device 100 through conversation with the user. When the user wants to set the volume of the conversation processing device 100, configure its communication settings, and so on, the user converses with the conversation processing device 100 through the system conversation application to perform the various settings.
 When the conversation processing device 100 determines that the currently executing conversation application cannot respond to the user's utterance, it interrupts that conversation application, selects an appropriate conversation application capable of responding to the utterance, and responds to the user's utterance.
 The conversation processing device 100 includes a microphone 101, a camera 102, a speaker 104, a display unit 105, a touch sensor 106, an infrared sensor 107, an actuator 108, a detection unit 110, an image processing unit 112, a voice control unit 114, a display control unit 116, a sensor control unit 118, and an actuator control unit 119. The conversation processing device 100 further includes an application execution unit 120, a transmission/reception unit 130, a word information acquisition unit 132, a selection unit 134, an application storage unit 140, a word list storage unit 142, and a user profile storage unit 144.
 The microphone 101 detects the voice uttered by the user and may be a directional microphone. The camera 102 images the environment around the conversation processing device 100; for example, it images the face of the user conversing with the conversation processing device 100. The speaker 104 outputs voice. The display unit 105 displays various information to be presented to the user and may be a liquid crystal display unit with a touch panel. The touch sensor 106 detects contact by the user's finger, palm, or the like. The infrared sensor 107 detects objects, such as the user, around the conversation processing device 100, and may be a pyroelectric infrared sensor. The actuator 108 provides the power for operating the movable members of the conversation processing device 100; when the conversation processing device 100 has a head and arms, the actuator 108 may, for example, rotate at least one of the head and the arms.
 The conversation processing device 100 may identify the direction in which the user is located based on the image captured by the camera 102, and may control the actuator 108 to rotate a movable member, such as the head on which the microphone 101 is mounted, so that the microphone 101 faces the identified direction.
 The infrared sensor 107 may be arranged to detect objects outside the imaging range of the camera 102. For example, when the user is outside the imaging range of the camera 102, the conversation processing device 100 estimates the user's position using the infrared sensor 107 and may rotate a movable member, such as the head on which the camera 102 is mounted, so that the user falls within the imaging range of the camera 102. Even when the user is not within the angle of view of the camera 102, the conversation processing device 100 can thus easily detect the user and point the camera 102 and the microphone 101 in the direction best suited to conversation with the user.
 The detection unit 110 detects the user's utterances and includes a voice recognition unit 111, which converts the user's utterance into voice data. The image processing unit 112 processes the image data captured by the camera 102; for example, it extracts the user's face image data from the image data and extracts facial feature values from the extracted face image data. A feature value may be any information that can identify an object such as a person, such as pixel-value information of the face image data, or numerical information on appearance features such as the spacing or size of the eyes, nose, and mouth in the face image data, skin color, and hairstyle. The image processing unit 112 provides the data indicating the facial feature values to the application execution unit 120.
 The voice control unit 114 outputs, through the speaker 104, voice based on the voice information provided by the application execution unit 120. The display control unit 116 causes the display unit 105 to display the image information provided by the application execution unit 120. The sensor control unit 118 receives the detection signals from the touch sensor 106 and the infrared sensor 107 and provides them to the actuator control unit 119 and the application execution unit 120.
 The actuator control unit 119 controls the actuator 108 and has an adjustment unit 109. The adjustment unit 109 adjusts the imaging range of the camera 102 according to the detection result of the infrared sensor 107. When the user is outside the imaging range of the camera 102, the adjustment unit 109 estimates the user's position using the infrared sensor 107 and may control the actuator 108 to rotate a movable member, such as the head on which the camera 102 is mounted, so that the user's face falls within the imaging range of the camera 102. The adjustment unit 109 may also adjust the imaging range of the camera 102 by adjusting the angle of view of the camera 102.
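 As a rough sketch of this adjustment, assuming the infrared sensor yields a bearing to the detected object and that the head pan angle can be commanded directly (both simplifications not stated in the embodiment):

```python
CAMERA_HALF_FOV_DEG = 30.0  # assumed half angle of view of camera 102

def adjust_pan(ir_bearing_deg: float, head_pan_deg: float) -> float:
    """Return a new head pan angle so the object detected by infrared
    sensor 107 falls inside the imaging range of camera 102.

    ir_bearing_deg: estimated direction of the user relative to the body.
    head_pan_deg:   current pan angle of the head-mounted camera.
    """
    offset = ir_bearing_deg - head_pan_deg
    if abs(offset) <= CAMERA_HALF_FOV_DEG:
        return head_pan_deg  # user already inside the imaging range
    return head_pan_deg + offset  # rotate the head toward the user

# adjust_pan(70.0, 0.0) -> 70.0: the head turns so that face detection on
# the camera image can then fine-tune the framing.
```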
 The conversation processing device 100 may realize video calls using the camera 102. By controlling the camera 102, a movable member such as the head on which the camera 102 is mounted, and the angle of view of the camera 102, the conversation processing device 100 may also make the camera 102 function as a remote camera for remotely monitoring the environment around the device. The conversation processing device 100 may also store communication with the user, such as conversations, as history information, and may determine based on the history information whether anything is abnormal with the user. For example, it may predict the user's life pattern from the history information and determine that something is abnormal when the user behaves differently from that pattern. When the conversation processing device 100 determines that something is abnormal with the user, it may notify a specific destination of the abnormality via the transmission/reception unit 130.
 The transmission/reception unit 130 transmits and receives data to and from the text conversion device 200 and the morphological analysis device 300 via the network 50. The word information acquisition unit 132 acquires word information including at least one word extracted from the user's utterance detected by the detection unit 110, and may acquire as the word information the morphological analysis data provided by the morphological analysis device 300 via the transmission/reception unit 130. The morphological analysis data may include each word contained in the user's utterance, the order in which the words were uttered, the part of speech of each word, and the like.
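 For illustration, the word information obtained from such morphological analysis data might be held in a structure like the following; the field names are assumptions rather than the actual interface of the morphological analysis device 300:

```python
from dataclasses import dataclass

@dataclass
class WordInfo:
    surface: str  # the word as uttered
    pos: str      # part of speech, e.g. "noun"
    order: int    # position of the word within the utterance

# "What is tomorrow's weather?" could arrive from the analyzer as:
utterance = [
    WordInfo("tomorrow", "noun", 0),
    WordInfo("weather", "noun", 1),
]
```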
 The application storage unit 140 stores the plurality of conversation applications, and may store the plurality of specific conversation applications and the daily conversation application.
 The selection unit 134 selects the conversation application being executed when it can respond to the user's utterance, and selects another conversation application that can respond to the user's utterance when the application being executed cannot.
 Specifically, the selection unit 134 selects the specific conversation application being executed when it can respond to the user's utterance; when it cannot, the selection unit 134 selects another specific conversation application that can respond to the user's utterance; and when no such other specific conversation application can be selected, the selection unit 134 selects the daily conversation application. The conversation application selected by the selection unit 134 is executed by the application execution unit 120.
 The word list storage unit 142 stores a word list in which at least one word corresponding to user utterances to which each of the specific conversation applications can respond is registered in association with that application. The word list storage unit 142 stores, for example, a word list as shown in FIG. 3. The word list contains combinations of words, in the order in which they appear in the user's utterance, together with the specific conversation application estimated to be able to respond appropriately to each combination. The word list further contains the priority used by the selection unit 134. For example, the highest priority "1" is set for a conversation application that is selected whenever the user makes a certain specific utterance, regardless of which specific conversation application is being executed; the next priority "2" is set for the specific conversation application being executed; and priority "3" is set for the other specific conversation applications. In the word list shown in FIG. 3, a smaller number indicates a higher priority; however, the word list may instead use larger numbers for higher priorities, and the indicator of priority need not be a number at all.
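 A minimal sketch of such a word list and a priority-based lookup, modeled loosely on the description of FIG. 3 (the concrete entries and priorities are illustrative assumptions):

```python
# (word combination, in utterance order) -> (application, priority);
# here a smaller number means a higher priority, as in FIG. 3.
WORD_LIST: dict[tuple[str, ...], tuple[str, int]] = {
    ("home",):                ("system", 1),    # wins regardless of context
    ("tomorrow", "weather"):  ("weather", 2),   # e.g. the running application
    ("tomorrow", "schedule"): ("schedule", 3),  # other specific applications
}

def match_app(words: list[str]) -> str | None:
    """Return the application of the highest-priority matching entry."""
    best: tuple[int, str] | None = None
    for pattern, (app, priority) in WORD_LIST.items():
        if all(w in words for w in pattern):
            if best is None or priority < best[0]:
                best = (priority, app)
    return best[1] if best else None

# match_app(["tomorrow", "weather"]) -> "weather"
```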
 Depending on the type of the specific conversation application being executed, the types of other specific conversation applications that may interrupt it so as to realize a more natural conversation can differ. Therefore, the word list storage unit 142 may store a word list for each specific conversation application.
 The selection unit 134 selects the specific conversation application being executed when at least one word included in the word information is registered in the word list in association with that application. The selection unit 134 selects another specific conversation application registered in the word list when at least one word included in the word information is not registered in the word list in association with the application being executed but is registered in association with that other specific conversation application.
 For example, suppose the user says, "What is the weather tomorrow?". The selection unit 134 selects the weather conversation application as the specific conversation application that can respond to the word combination (1) "tomorrow" - (2) "weather" contained in the user's utterance. Suppose the user then says, "Then what about the day after tomorrow?". In this case, the selection unit 134 continues to select the weather conversation application as the application being executed. Suppose instead that the user then says, "Well then, let me enter tomorrow's schedule.". In this case, the selection unit 134 interrupts the weather conversation application and selects the schedule conversation application as another specific conversation application.
 The user profile storage unit 144 stores at least one user profile including at least one item related to the user. Words extracted through conversation with the user are registered in each item of the user profile. The user profile storage unit 144 stores, for example, a user profile including a plurality of items indicating the user's individuality, such as the user's name, date of birth, address, favorite food, and favorite sport, as shown in FIG. 4. The response unit 121 may determine the information to include in a response to the user based on the words in each item of the user profile.
 The application execution unit 120 includes a response unit 121, a registration unit 122, an interruption state storage unit 123, and a termination unit 124. When the selected conversation application is the one being executed, the response unit 121 continues it; when the selected conversation application is another conversation application, the response unit 121 interrupts the application being executed and executes the other conversation application, thereby responding to the user's utterance.
 The registration unit 122 registers, in the user profile, those words among the at least one word included in the word information that belong to at least one item of the profile. For example, suppose the user mentions a favorite food. The registration unit 122 then extracts the word for the user's favorite food from the utterance and registers it in the user profile. The response unit 121 may refer to the user profile to optimize the content of a response to the user, changing part of the response according to the content of the user profile.
 For example, the user's favorite food is defined as _FAV_FOOD, and a response phrase such as "That's great. Let's celebrate with _FAV_FOOD for dinner tonight." is defined for a user utterance such as "Something really good happened today.". In this case, the response unit 121 refers to the user profile, identifies the user's favorite food, inserts the identified word into the response phrase, and thereby optimizes the content of the response.
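 A minimal sketch of this placeholder substitution, assuming a flat profile dictionary and a single hand-written template (neither is specified in the publication):

```python
PROFILE = {"_FAV_FOOD": "sushi", "_NAME": "Taro"}  # assumed profile contents

TEMPLATE = "That's great. Let's celebrate with _FAV_FOOD for dinner tonight."

def personalize(template: str, profile: dict[str, str]) -> str:
    """Fill profile placeholders such as _FAV_FOOD into a response phrase."""
    for placeholder, value in profile.items():
        template = template.replace(placeholder, value)
    return template

# personalize(TEMPLATE, PROFILE)
# -> "That's great. Let's celebrate with sushi for dinner tonight."
```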
 The interruption state storage unit 123 stores the interruption state of the algorithm of the specific conversation application being executed when that application is interrupted. The interruption state includes the position at which the algorithm of the specific conversation application was interrupted and the information obtained from the user's utterances up to the interruption, which may include information registered in the user profile. When the response unit 121 responds to the user's utterance by interrupting the specific conversation application being executed and executing another specific conversation application, it may determine the start position of the algorithm of the other specific conversation application based on the information obtained from the user during execution of the interrupted application, and may execute the other specific conversation application from the determined start position.
 For example, suppose that while the user is talking with the conversation processing device 100 about the weekend weather using the weather conversation application, the user learns that the weekend is likely to be sunny and wants to make plans to go out. The user says something like, "Then I want to enter a schedule for that day." to the conversation processing device 100. The response unit 121 determines the start position of the schedule conversation application's algorithm, taking into account that the schedule to be entered is for the weekend. For example, the response unit 121 starts the algorithm of the schedule conversation application from the point after the date for which the schedule is to be entered has been determined.
 When the response unit 121 responds to the user's utterance by interrupting the specific conversation application being executed and executing another specific conversation application, it refers to the interruption state storage unit 123 to identify the interruption state of the algorithm of the interrupted application. In response to the other specific conversation application ending or being interrupted, the response unit 121 resumes the previously interrupted specific conversation application based on its interruption state. For example, when the user makes an utterance that the other specific conversation application cannot respond to while it is being executed, in other words, when the user starts a topic different from the topic handled by that application, the response unit 121 interrupts the other specific conversation application. In response to that interruption, the response unit 121 may resume, based on its interruption state, the specific conversation application among the previously interrupted ones that can respond to the user's utterance, that is, the one corresponding to the topic the user has newly taken up again.
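 One way to picture the interruption state storage unit 123 is as a stack of suspended applications, each remembering its algorithm position and the information gathered so far. The sketch below is an assumed model, not the published implementation:

```python
from dataclasses import dataclass, field

@dataclass
class InterruptionState:
    app_name: str                  # which specific conversation application
    step: int                      # position reached in its algorithm
    slots: dict[str, str] = field(default_factory=dict)  # info from the user

suspended: list[InterruptionState] = []  # stand-in for storage unit 123

def interrupt(app_name: str, step: int, slots: dict[str, str]) -> None:
    """Record the interruption state before switching applications."""
    suspended.append(InterruptionState(app_name, step, dict(slots)))

def start_step(slots: dict[str, str]) -> int:
    """Pick the new application's start position from already-known slots,
    e.g. skip asking for the date if the weather talk established it."""
    return 1 if "date" in slots else 0

def resume(topic_app: str) -> InterruptionState | None:
    """Resume the suspended application matching the user's resumed topic."""
    for i in range(len(suspended) - 1, -1, -1):
        if suspended[i].app_name == topic_app:
            return suspended.pop(i)
    return None
```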
 The termination unit 124 forcibly terminates the conversation application being executed when the user's utterance detected by the detection unit 110 is a predetermined specific utterance. For example, when the user makes a specific utterance such as "home" or "force quit", the termination unit 124 forcibly terminates the conversation application being executed.
 The conversation processing device 100 further includes an infrared light receiving unit 126, an infrared light emitting unit 128, and a peripheral device control unit 129. The peripheral device control unit 129 makes the conversation processing device 100 function as a remote control terminal (for example, a remote controller) for peripheral devices. Peripheral devices are devices that operate in response to control commands transmitted from a remote control terminal by infrared or wireless means, such as AV equipment like televisions and recorders and home appliances like air conditioners and electric fans. The infrared light receiving unit 126 receives control commands from a remote control terminal by infrared light, and the infrared light emitting unit 128 transmits control commands for controlling peripheral devices by infrared light.
 The peripheral device control unit 129 stores a control command list that associates the control commands of the peripheral devices to be controlled with their control contents. For example, when the user wishes to register a new peripheral device as a control target, the peripheral device control unit 129 executes a control command registration process through conversation with the user. For example, the peripheral device control unit 129 asks the user to operate the remote control terminal so that it transmits various control commands, and generates a control command list associating the received control commands with their control contents. The peripheral device control unit 129 may have the user press the buttons of the remote control terminal corresponding to each control content in sequence, receive each control command emitted from the remote control terminal via the infrared light receiving unit 126, and generate a control command list associating each received control command with its control content. Alternatively, the peripheral device control unit 129 may have the user input, via the remote control terminal, a number that uniquely identifies the type of peripheral device to be controlled, and may acquire the control command list corresponding to the input number via the network 50, such as the Internet. When the user requests control of a peripheral device, the peripheral device control unit 129 refers to the control command list, identifies the control command associated with the requested control content, and transmits the identified control command by infrared light to the target peripheral device via the infrared light emitting unit 128.
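 A rough sketch of this learning-remote flow; capture_ir and emit_ir are assumed stand-ins for the infrared light receiving unit 126 and the infrared light emitting unit 128, and no real IR library API is implied:

```python
from typing import Callable

control_command_list: dict[str, bytes] = {}  # control content -> raw command

def learn_command(content: str, capture_ir: Callable[[], bytes]) -> None:
    """Ask the user to press the button for `content`, then store its code.
    capture_ir blocks until infrared light receiving unit 126 gets a frame."""
    print(f'Please press the "{content}" button on your remote control.')
    control_command_list[content] = capture_ir()

def execute_request(content: str, emit_ir: Callable[[bytes], None]) -> None:
    """Replay the stored command via infrared light emitting unit 128."""
    command = control_command_list.get(content)
    if command is None:
        print(f'I have not learned "{content}" yet.')
        return
    emit_ir(command)

# Registration: learn_command("power on", capture_ir). Later, "turn on the
# TV" spoken by the user maps to execute_request("power on", emit_ir).
```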
 The conversation processing device 100 may also communicate with peripheral devices wirelessly, such as via WiFi or Bluetooth (registered trademark). The peripheral device control unit 129 may acquire a device driver for controlling a target peripheral device via the network 50, and may use the device driver to control the peripheral device through conversation with the user.
 FIG. 5 is a flowchart showing an example of the conversation processing procedure of the conversation processing device 100. The conversation processing device 100 may execute the procedure of the flowchart shown in FIG. 5 when it detects a user utterance.
 The detection unit 110 detects the user's voice via the microphone 101 (S100). The voice recognition unit 111 generates voice data from the detected voice (S102). The transmission/reception unit 130 transmits the voice data to the text conversion device 200 (S104). The text conversion device 200 extracts words from the voice data and generates, for example, text data in which the words are arranged in the order they were uttered. The transmission/reception unit 130 receives the text data from the text conversion device 200 (S106) and transmits the received text data to the morphological analysis device 300 (S108). The selection unit 134 executes pattern matching based on the text data (S110) and determines, based on the received text data, whether the user's utterance concerns the system, such as the various settings of the conversation processing device 100 (S112). When text data matching the received text data is associated with the system conversation application, the selection unit 134 selects the system conversation application (S120).
 Upon receiving the text data, the morphological analysis device 300 performs morphological analysis on it to generate morphological analysis data, and transmits the generated data to the conversation processing device 100. The transmission/reception unit 130 receives the morphological analysis data from the morphological analysis device 300 (S114). The word information acquisition unit 132 acquires the morphological analysis data as word information and provides it to the selection unit 134. The selection unit 134 executes pattern matching based on the word information (S116), referring to the word list associated with the specific conversation application being executed to select the conversation application that will respond to the user's utterance.
 The selection unit 134 refers to the word list to determine whether the user's utterance concerns the system, such as the various settings of the conversation processing device 100 (S118). If it does, the selection unit 134 selects the system conversation application (S120).
 If the user's utterance does not concern the system, the selection unit 134 refers to the word list to determine whether the user's utterance is a continuation of the current topic (S122). The selection unit 134 determines that it is a continuation of the current topic when at least one word included in the word information is registered in the word list in association with the specific conversation application being executed. If the user's utterance is a continuation of the current topic, the selection unit 134 selects the specific conversation application being executed (S124).
 If the user's utterance is not a continuation of the current topic, the selection unit 134 refers to the word list to determine whether the user's utterance is a new topic (S126). The selection unit 134 determines that it is a new topic when at least one word included in the word information is not registered in the word list in association with the specific conversation application being executed but is registered in association with another specific conversation application. If the user's utterance is a new topic, the selection unit 134 selects the other specific conversation application registered in the word list in association with at least one word included in the word information (S128).
 If the user's utterance is not a new topic, that is, if there is no specific conversation application that can respond appropriately to the user's utterance, the selection unit 134 selects the daily conversation application (S130).
 After the selection unit 134 has selected a conversation application suitable for the user's utterance, the response unit 121 executes the selected conversation application and responds to the user's utterance (S134).
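 Putting the branches of FIG. 5 together, the selection cascade of S112 through S130 reduces to logic like the following sketch; the word sets and application names are illustrative assumptions:

```python
SYSTEM_WORDS = {"volume", "settings"}           # assumed system-topic words
APP_WORDS = {                                   # assumed per-application lists
    "weather": {"weather", "sunny"},
    "schedule": {"schedule", "plan"},
}

def select_application(words: list[str], running_app: str | None) -> str:
    """Cascade corresponding to S112/S118 -> S122 -> S126 -> S130 in FIG. 5."""
    tokens = set(words)
    if tokens & SYSTEM_WORDS:                   # system topic (S112 / S118)
        return "system"                         # S120
    if running_app and tokens & APP_WORDS.get(running_app, set()):
        return running_app                      # continuation (S122 -> S124)
    for app, vocab in APP_WORDS.items():        # new topic? (S126)
        if app != running_app and tokens & vocab:
            return app                          # S128
    return "daily"                              # fall back to chat (S130)

# select_application(["enter", "schedule"], "weather") -> "schedule"
```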
 As described above, according to the conversation processing device 100 of the present embodiment, when a response by the specific conversation application currently being executed is not appropriate for the user's utterance, the device executes another specific conversation application that can respond appropriately to the utterance. Furthermore, when there is no other specific conversation application that can respond appropriately, the conversation processing device 100 executes the daily conversation application. Therefore, even when the user starts a conversation on another topic in the middle of a conversation on one topic, the conversation processing device 100 can realize a more natural response to the user's utterances.
 When resuming an interrupted specific conversation application, the conversation processing device 100 determines the start position of that application's algorithm based on the information obtained in the conversation with the user through the application before the interruption. Therefore, even when the user returns to the previous topic after moving on to a new one, the conversation processing device 100 can realize a more natural response to the user's utterances.
 FIG. 6 shows an example of a computer 1200 in which a plurality of aspects of the present invention may be embodied in whole or in part. A program installed on the computer 1200 can cause the computer 1200 to function as operations associated with an apparatus according to an embodiment of the present invention or as one or more sections of the apparatus, or to execute those operations or those sections, and/or can cause the computer 1200 to execute a process according to an embodiment of the present invention or stages of that process. Such a program may be executed by the CPU 1212 to cause the computer 1200 to execute particular operations associated with some or all of the blocks of the flowcharts and block diagrams described herein.
 The computer 1200 according to the present embodiment includes a CPU 1212, a RAM 1214, a ROM 1230, a graphics controller 1216, and a display device 1218, which are interconnected by a host controller 1210. The computer 1200 also includes a communication interface 1222 and an input/output controller 1220. The computer 1200 may include optional input/output units, which may be connected to the host controller 1210 via the input/output controller 1220.
 The CPU 1212 operates according to the programs stored in the ROM 1230 and the RAM 1214, thereby controlling each unit. The graphics controller 1216 acquires the image data generated by the CPU 1212 in a frame buffer or the like provided in the RAM 1214, or in the graphics controller itself, and causes the image data to be displayed on the display device 1218. The communication interface 1222 communicates with other electronic devices, such as the text conversion device 200 and the morphological analysis device 300, via a network.
 The ROM 1230 stores therein a boot program or the like executed by the computer 1200 at the time of activation, and/or programs that depend on the hardware of the computer 1200. A program is read from a computer-readable medium, installed in the RAM 1214 or the ROM 1230, which are also examples of computer-readable media, and executed by the CPU 1212. The information processing described in these programs is read by the computer 1200 and brings about cooperation between the programs and the various types of hardware resources described above. An apparatus or method may be constituted by realizing the operation or processing of information according to the use of the computer 1200.
 For example, when communication is executed between the computer 1200 and an external device, the CPU 1212 may execute a communication program loaded into the RAM 1214 and instruct the communication interface 1222 to perform communication processing based on the processing described in the communication program. Under the control of the CPU 1212, the communication interface 1222 reads transmission data stored in a transmission buffer processing region provided in a recording medium such as the RAM 1214 and transmits the read transmission data to the network, or writes reception data received from the network into a reception buffer processing region or the like provided on the recording medium.
 Various types of information, such as various types of programs, data, tables, and databases, may be stored in the recording medium and subjected to information processing. On the data read from the RAM 1214, the CPU 1212 may execute various types of processing described throughout the present disclosure and specified by the instruction sequences of the programs, including various types of operations, information processing, condition judgment, conditional branching, unconditional branching, and information search/replacement, and writes the results back to the RAM 1214.
 The programs or software modules described above may be stored on the computer 1200 or in a computer-readable medium near the computer 1200. A recording medium such as a hard disk or RAM provided in a server system connected to a dedicated communication network or the Internet can also be used as a computer-readable medium, thereby providing the programs to the computer 1200 via the network.
 Although the present invention has been described above using embodiments, the technical scope of the present invention is not limited to the scope described in the above embodiments. It is apparent to those skilled in the art that various changes or improvements can be added to the above embodiments. It is also apparent from the description of the claims that embodiments to which such changes or improvements are added can be included in the technical scope of the present invention.
 It should be noted that the execution order of each process, such as the operations, procedures, steps, and stages in the apparatus, system, program, and method shown in the claims, the specification, and the drawings, can be realized in any order unless explicitly indicated by expressions such as "before" or "prior to", and unless the output of a previous process is used in a subsequent process. Even if the operation flows in the claims, the specification, and the drawings are described using "first", "next", and the like for convenience, this does not mean that the operations must be carried out in that order.
DESCRIPTION OF SYMBOLS
100 Conversation processing device
101 Microphone
102 Camera
104 Speaker
105 Display unit
106 Touch sensor
107 Infrared sensor
108 Actuator
109 Adjustment unit
110 Detection unit
111 Voice recognition unit
112 Image processing unit
114 Voice control unit
116 Display control unit
118 Sensor control unit
119 Actuator control unit
120 Application execution unit
121 Response unit
122 Registration unit
123 Interruption state storage unit
124 Termination unit
126 Infrared light receiving unit
128 Infrared light emitting unit
129 Peripheral device control unit
130 Transmission/reception unit
132 Word information acquisition unit
134 Selection unit
140 Application storage unit
142 Word list storage unit
144 User profile storage unit
200 Text conversion device
300 Morphological analysis device
1200 Computer
1210 Host controller
1212 CPU
1214 RAM
1216 Graphics controller
1218 Display device
1220 Input/output controller
1222 Communication interface
1230 ROM

Claims (11)

  1.  A conversation processing device that executes a plurality of conversation applications each responding to a user's speech according to a predetermined algorithm, the conversation processing device comprising:
     a detection unit that detects a user's speech;
     a selection unit that selects a running conversation application when the running conversation application can respond to the user's speech, and selects another conversation application that can respond to the user's speech when the running conversation application cannot respond to the user's speech; and
     a response unit that responds to the user's speech by continuing the running conversation application when the selected conversation application is the running conversation application, and by suspending the running conversation application and executing the other conversation application when the selected conversation application is the other conversation application.
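Purely as an illustrative sketch, and not as part of the claim language, the selection and response logic of claim 1 can be pictured as follows in Python. The names ConversationApp, ConversationProcessor, can_respond, and respond are invented for this example; a real device would substitute its own predetermined algorithms for the topic-word check.

```python
from typing import List, Optional


class ConversationApp:
    """Answers the utterances its predetermined algorithm recognizes."""

    def __init__(self, name: str, topics: List[str]):
        self.name = name
        self.topics = topics

    def can_respond(self, speech: str) -> bool:
        # Stand-in for the application's own algorithm: a topic-word check.
        return any(topic in speech for topic in self.topics)

    def respond(self, speech: str) -> str:
        return f"[{self.name}] responding to: {speech}"


class ConversationProcessor:
    def __init__(self, apps: List[ConversationApp]):
        self.apps = apps
        self.running: Optional[ConversationApp] = None

    def select(self, speech: str) -> Optional[ConversationApp]:
        # Prefer the running application if it can respond; otherwise pick
        # another application that can (the selection unit of claim 1).
        if self.running is not None and self.running.can_respond(speech):
            return self.running
        return next((a for a in self.apps if a.can_respond(speech)), None)

    def handle(self, speech: str) -> Optional[str]:
        selected = self.select(speech)
        if selected is None:
            return None
        if selected is not self.running:
            # Suspend the running application and switch to the other one
            # (the response unit of claim 1).
            self.running = selected
        return selected.respond(speech)


processor = ConversationProcessor([
    ConversationApp("weather", ["rain", "sunny"]),
    ConversationApp("recipes", ["dinner", "cook"]),
])
print(processor.handle("will it rain today?"))  # starts the weather app
print(processor.handle("what should I cook?"))  # switches to the recipes app
```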
  2.  The conversation processing device according to claim 1, wherein
     the plurality of conversation applications include a plurality of specific conversation applications each of which continues a conversation with the user according to the algorithm until a predetermined condition is satisfied, and
     the selection unit selects a running specific conversation application when the running specific conversation application can respond to the user's speech, and selects another specific conversation application that can respond to the user's speech when the running specific conversation application cannot respond to the user's speech.
  3.  The conversation processing device according to claim 2, wherein
     the plurality of conversation applications further include a daily conversation application that executes one response to one utterance of the user, and
     the selection unit selects the running specific conversation application when the running specific conversation application can respond to the user's speech, selects another specific conversation application that can respond to the user's speech when the running specific conversation application cannot respond, and selects the daily conversation application when no other specific conversation application that can respond to the user's speech can be selected.
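A hypothetical sketch of the three-tier selection in claims 2 and 3: the running specific application first, then any other specific application, then the daily conversation application as the one-shot fallback. SpecificApp and the sample topic checks are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class SpecificApp:
    name: str
    can_respond: Callable[[str], bool]


def select_app(running: Optional[SpecificApp],
               specific_apps: List[SpecificApp],
               speech: str) -> str:
    # Tier 1: keep the running specific application if it can respond.
    if running is not None and running.can_respond(speech):
        return running.name
    # Tier 2: any other specific application that can respond.
    for app in specific_apps:
        if app is not running and app.can_respond(speech):
            return app.name
    # Tier 3: the daily conversation application, which gives exactly one
    # response to the one utterance (claim 3).
    return "daily"


quiz = SpecificApp("quiz", lambda s: "quiz" in s)
diet = SpecificApp("diet", lambda s: "meal" in s)
print(select_app(quiz, [quiz, diet], "next quiz question"))  # quiz
print(select_app(quiz, [quiz, diet], "log my meal"))         # diet
print(select_app(quiz, [quiz, diet], "nice weather today"))  # daily
```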
  4.  The conversation processing device according to claim 3, wherein the daily conversation application executes one response to one utterance of the user according to a deep learning algorithm.
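Claim 4 does not name a particular network, so the following is only a hypothetical shape for a daily conversation application that delegates its single response to some trained deep-learning model; the generate_response callable stands in for real model inference.

```python
from typing import Callable


class DailyConversationApp:
    """One utterance in, one response out (claim 3), where the response
    comes from a learned model rather than a scripted algorithm (claim 4)."""

    def __init__(self, generate_response: Callable[[str], str]):
        # generate_response stands in for a trained model's inference call,
        # e.g. an encoder-decoder network; no specific model is implied.
        self.generate_response = generate_response

    def respond(self, utterance: str) -> str:
        return self.generate_response(utterance)


# A stub generator used here in place of a real trained model.
app = DailyConversationApp(lambda u: f"I see. Tell me more about '{u}'.")
print(app.respond("I went hiking today"))
```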
  5.  The conversation processing device according to any one of claims 2 to 4, further comprising:
     a word information acquisition unit that acquires word information including at least one word extracted from the user's speech detected by the detection unit; and
     a word list storage unit that stores a word list in which at least one word corresponding to user speech to which the plurality of specific conversation applications can respond is registered in association with the plurality of specific conversation applications,
     wherein the selection unit refers to the word list, selects the running specific conversation application when the at least one word included in the word information is registered in the word list in association with the running specific conversation application, and selects the other specific conversation application registered in the word list when the at least one word included in the word information is not registered in the word list in association with the running specific conversation application but is registered in the word list in association with another specific conversation application.
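A minimal sketch of the word-list lookup in claim 5, assuming a hypothetical in-memory word list keyed by application name; the registered words and application names are invented.

```python
from typing import Dict, List, Optional

# Hypothetical word list mirroring the word list storage unit of claim 5:
# words registered per specific conversation application.
WORD_LIST: Dict[str, List[str]] = {
    "health": ["sleep", "steps", "weight"],
    "travel": ["flight", "hotel", "train"],
}


def select_by_word_list(running_app: Optional[str],
                        extracted_words: List[str]) -> Optional[str]:
    """Prefer the running application whose registered words match; else
    pick another application whose words match (claim 5)."""
    if running_app is not None and any(
            w in WORD_LIST.get(running_app, []) for w in extracted_words):
        return running_app
    for app, words in WORD_LIST.items():
        if app != running_app and any(w in words for w in extracted_words):
            return app
    return None  # claim 3's daily conversation application would take over


# "weight" is registered for the health application, so it stays selected.
print(select_by_word_list("health", ["weight"]))  # health
# "hotel" is registered only for the travel application, so selection switches.
print(select_by_word_list("health", ["hotel"]))   # travel
```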
  6.  The conversation processing device according to claim 5, wherein
     the word list storage unit stores the word list for each of the plurality of specific conversation applications, and
     the selection unit selects a specific conversation application with reference to the word list associated with the running specific conversation application.
  7.  The conversation processing device according to any one of claims 2 to 6, wherein, when responding to the user's speech by suspending the running specific conversation application and executing the other specific conversation application, the response unit determines a start position in the algorithm of the other specific conversation application based on information obtained from the user during execution of the running specific conversation application, and executes the other specific conversation application based on the determined start position.
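As a sketch of the start-position determination in claim 7: the step names and the collected-info fields below are invented; the claim only requires that information already obtained from the user decides where the next application's algorithm starts.

```python
SIGHTSEEING_STEPS = ["ask_destination", "ask_interests", "suggest_spots"]


def decide_start_position(collected_info: dict) -> int:
    """Skip leading steps whose information is already known (claim 7)."""
    position = 0
    if "destination" in collected_info:
        position = SIGHTSEEING_STEPS.index("ask_interests")
    if "interests" in collected_info:
        position = SIGHTSEEING_STEPS.index("suggest_spots")
    return position


# The suspended application already learned the destination, so the
# sightseeing application starts from its second step instead of the first.
start = decide_start_position({"destination": "Kyoto"})
print(SIGHTSEEING_STEPS[start])  # ask_interests
```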
  8.  The conversation processing device according to any one of claims 2 to 7, further comprising an interruption state storage unit that stores an interruption state of the algorithm of the running specific conversation application when the running specific conversation application is suspended,
     wherein the response unit refers to the interruption state storage unit to identify the interruption state of the algorithm of the suspended specific conversation application, and, in response to the other specific conversation application ending or being suspended, resumes the previously suspended specific conversation application based on the interruption state.
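One hypothetical realization of the interruption state storage unit in claim 8 is a stack of saved positions, so that when the interrupting application ends or is itself suspended, the previously suspended application resumes from its recorded step.

```python
from typing import List, Tuple


class InterruptionStateStore:
    """Hypothetical interruption state storage unit (claim 8): a stack of
    (application name, step index) pairs recorded at suspension time."""

    def __init__(self) -> None:
        self._stack: List[Tuple[str, int]] = []

    def save(self, app_name: str, step: int) -> None:
        self._stack.append((app_name, step))

    def resume_previous(self) -> Tuple[str, int]:
        # Called when the interrupting application ends or is itself
        # suspended: the previous application restarts at its saved step.
        return self._stack.pop()


store = InterruptionStateStore()
store.save("quiz", step=3)  # quiz application suspended at question 3
# ... another specific conversation application runs and finishes ...
app, step = store.resume_previous()
print(f"resuming {app} at step {step}")  # resuming quiz at step 3
```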
  9.  The conversation processing device according to any one of claims 1 to 8, further comprising a termination unit that forcibly terminates the running conversation application when the user's speech detected by the detection unit is a predetermined specific utterance.
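A sketch of the termination check in claim 9, under the assumption that the predetermined specific utterances form a small fixed set (the claim leaves their content open).

```python
TERMINATION_UTTERANCES = {"stop", "quit", "end conversation"}


def should_force_terminate(speech: str) -> bool:
    """Return True when the detected speech is a predetermined specific
    utterance that forcibly ends the running conversation application."""
    return speech.strip().lower() in TERMINATION_UTTERANCES


print(should_force_terminate("Stop"))          # True
print(should_force_terminate("keep talking"))  # False
```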
  10.  The conversation processing device according to any one of claims 1 to 9, further comprising:
     an imaging unit that images surroundings of the conversation processing device;
     an infrared sensor that detects the presence of an object existing around the conversation processing device; and
     an adjustment unit that adjusts an imaging range of the imaging unit according to a detection result of the infrared sensor such that the user's face is included in the imaging range of the imaging unit.
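By way of illustration of claim 10's adjustment unit: the angles, the 60-degree field of view, and the pan policy below are invented for this sketch; the claim only requires adjusting the imaging range from the infrared detection result so that the user's face enters it.

```python
def adjust_camera(camera_angle: float, detected_angle: float,
                  field_of_view: float = 60.0) -> float:
    """Return a camera angle that brings the detected person into view."""
    half_fov = field_of_view / 2.0
    if abs(detected_angle - camera_angle) <= half_fov:
        return camera_angle  # already inside the imaging range
    return detected_angle    # pan the imaging unit toward the detection


print(adjust_camera(camera_angle=0.0, detected_angle=20.0))  # 0.0 (in view)
print(adjust_camera(camera_angle=0.0, detected_angle=75.0))  # 75.0 (pan)
```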
  11.  A program for causing a computer to execute a plurality of conversation applications each responding to a user's speech according to a predetermined algorithm, the program causing the computer to execute:
     a procedure of detecting a user's speech;
     a procedure of selecting a running conversation application when the running conversation application can respond to the user's speech, and selecting another conversation application that can respond to the user's speech when the running conversation application cannot respond to the user's speech; and
     a procedure of responding to the user's speech by continuing the running conversation application when the selected conversation application is the running conversation application, and by suspending the running conversation application and executing the other conversation application when the selected conversation application is the other conversation application.
PCT/JP2017/026490 2016-08-02 2017-07-21 Conversation processing device and program WO2018025668A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2016151925A JP2018021987A (en) 2016-08-02 2016-08-02 Conversation processing device and program
JP2016-151925 2016-08-02

Publications (1)

Publication Number Publication Date
WO2018025668A1 true WO2018025668A1 (en) 2018-02-08

Family

ID=61072831

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2017/026490 WO2018025668A1 (en) 2016-08-02 2017-07-21 Conversation processing device and program

Country Status (2)

Country Link
JP (1) JP2018021987A (en)
WO (1) WO2018025668A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2020142555A (en) * 2019-03-04 2020-09-10 本田技研工業株式会社 Vehicle control system, vehicle control method and program
CN112204655A (en) * 2018-05-22 2021-01-08 三星电子株式会社 Electronic device for outputting response to voice input by using application and operating method thereof

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3859568A4 (en) * 2018-09-28 2021-09-29 Fujitsu Limited Dialogue device, dialogue method and dialogue program
JP7198122B2 (en) * 2019-03-07 2022-12-28 本田技研工業株式会社 AGENT DEVICE, CONTROL METHOD OF AGENT DEVICE, AND PROGRAM

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002350555A (en) * 2001-05-28 2002-12-04 Yamaha Motor Co Ltd Human presence detector
JP3998443B2 (en) * 2001-08-10 2007-10-24 富士通テン株式会社 Dialog system
JP6280342B2 (en) * 2013-10-22 2018-02-14 株式会社Nttドコモ Function execution instruction system and function execution instruction method

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH099128A (en) * 1995-06-22 1997-01-10 Canon Inc Image pickup device
JP2000069459A (en) * 1998-08-25 2000-03-03 Matsushita Electric Ind Co Ltd Monitor system
JP2001056694A (en) * 1999-08-19 2001-02-27 Denso Corp Interactive user interface device
JP2002032370A (en) * 2000-07-18 2002-01-31 Fujitsu Ltd Information processor
JP2014219594A (en) * 2013-05-09 2014-11-20 ソフトバンクモバイル株式会社 Conversation processing system and program
JP2014222402A (en) * 2013-05-13 2014-11-27 日本電信電話株式会社 Utterance candidate generation device, utterance candidate generation method, and utterance candidate generation program
JP2016062550A (en) * 2014-09-22 2016-04-25 ソフトバンク株式会社 Conversation processing system, and program

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112204655A (en) * 2018-05-22 2021-01-08 三星电子株式会社 Electronic device for outputting response to voice input by using application and operating method thereof
US11508364B2 (en) * 2018-05-22 2022-11-22 Samsung Electronics Co., Ltd. Electronic device for outputting response to speech input by using application and operation method thereof
JP2020142555A (en) * 2019-03-04 2020-09-10 本田技研工業株式会社 Vehicle control system, vehicle control method and program
JP7145105B2 (en) 2019-03-04 2022-09-30 本田技研工業株式会社 Vehicle control system, vehicle control method, and program
US11541906B2 (en) 2019-03-04 2023-01-03 Honda Motor Co., Ltd. Vehicle control device, vehicle control method, and storage medium

Also Published As

Publication number Publication date
JP2018021987A (en) 2018-02-08

Similar Documents

Publication Publication Date Title
US11955124B2 (en) Electronic device for processing user speech and operating method therefor
EP3396665B1 (en) Voice data processing method and electronic device supporting the same
WO2018025668A1 (en) Conversation processing device and program
KR102414122B1 (en) Electronic device for processing user utterance and method for operation thereof
US20200159491A1 (en) Remote Execution of Secondary-Device Drivers
JP5777731B2 (en) Environment-dependent dynamic range control for gesture recognition
US20190258456A1 (en) System for processing user utterance and controlling method thereof
US11404060B2 (en) Electronic device and control method thereof
KR102424260B1 (en) Generate IOT-based notifications and provide command(s) to automatically render IOT-based notifications by the automated assistant client(s) of the client device(s)
KR20210002598A (en) Transfer of automation assistant routines between client devices during routine execution
US20200043476A1 (en) Electronic device, control method therefor, and non-transitory computer readable recording medium
US20190130898A1 (en) Wake-up-word detection
US20240143154A1 (en) Proximity-Based Controls on a Second Device
KR20210005200A (en) Providing audio information using digital assistant
KR102345883B1 (en) Electronic device for ouputting graphical indication
US11620996B2 (en) Electronic apparatus, and method of controlling to execute function according to voice command thereof
JP6950708B2 (en) Information processing equipment, information processing methods, and information processing systems
US11756545B2 (en) Method and device for controlling operation mode of terminal device, and medium
US10901520B1 (en) Content capture experiences driven by multi-modal user inputs
KR20230121150A (en) Automated assistant performance of non-assistant application action(s) in response to user input, which may be limited to parameter(s)
KR20180116725A (en) Method for displaying operation screen of speech recognition service and electronic device supporting the same

Legal Events

Code  Description
121   Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 17836774; Country of ref document: EP; Kind code of ref document: A1)
NENP  Non-entry into the national phase (Ref country code: DE)
122   Ep: pct application non-entry in european phase (Ref document number: 17836774; Country of ref document: EP; Kind code of ref document: A1)