WO2016052164A1 - Conversation device - Google Patents

Conversation device

Info

Publication number
WO2016052164A1
Authority
WO
WIPO (PCT)
Prior art keywords
response
voice
character
unit
response information
Prior art date
Application number
PCT/JP2015/076081
Other languages
French (fr)
Japanese (ja)
Inventor
梅原 尚子
圭司 寺島
Original Assignee
シャープ株式会社 (Sharp Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by シャープ株式会社 (Sharp Corporation)
Publication of WO2016052164A1 publication Critical patent/WO2016052164A1/en

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00: Speech synthesis; Text to speech systems
    • G10L 13/02: Methods for producing synthetic speech; Speech synthesisers
    • G10L 13/04: Details of speech synthesis systems, e.g. synthesiser structure or memory management
    • G10L 13/047: Architecture of speech synthesisers
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/28: Constructional details of speech recognition systems
    • G10L 15/30: Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

Definitions

  • the present invention relates to an interactive apparatus and an interactive system that are connected to a communication network and recognize and respond to a user's voice.
  • in such a system, information related to a response based on the recognition result is received from the server device and output as the response.
  • since the server device can be used by a plurality of robots, it is more advantageous in terms of cost than increasing the processing capability of each interactive robot.
  • when the interactive robot responds by acquiring the response content from the server device, the response timing is delayed compared with the case where the interactive robot alone recognizes the speech and responds. For this reason, the user may feel stressed and find it difficult to talk.
  • the present invention has been made in view of the above problems, and its object is to provide an interactive apparatus and an interactive system that can smoothly output a plurality of pieces of information by voice and provide a comfortable interactive environment without stressing the user.
  • an interactive apparatus according to one aspect includes: a speech recognition unit that recognizes input speech; a response information storage unit that stores first response information indicating response content corresponding to a result of speech recognition by the speech recognition unit; a communication unit that transmits the input speech to a server device and receives second response information indicating response content corresponding to a result of speech recognition of the input speech by the server device; and an output control unit that performs a first response process of outputting, by voice, the response content indicated by the first response information obtained by referring to the response information storage unit for the input speech, and then continuously performs a second response process of outputting, by voice, the response content indicated by the second response information.
  • with this configuration, the input speech can be handled by the server device's speech recognition and a response based on the second response information, in addition to speech recognition by the device itself and a response based on the first response information. It is therefore possible to respond with a plurality of pieces of information without improving the processing capability of the voice recognition unit of the interactive apparatus or expanding the capacity of the response information storage unit.
  • a plurality of pieces of information can thus be smoothly output by voice, and a comfortable interactive environment can be provided without causing stress to the user.
  • FIG. 1 is a diagram showing a configuration of a dialogue system 100 according to the present embodiment.
  • the dialogue system 100 includes a dialogue device (self device) 10 and a cloud server (server device) 30, which are connected via a communication network.
  • for example, the Internet can be used as this communication network. Alternatively, a telephone line network, a mobile communication network, a CATV (cable television) communication network, a satellite communication network, or the like can be used.
  • the dialogue device 10 and the cloud server 30 each have a voice recognition function, and the user can interact with the dialogue device 10 by voice using natural language.
  • the dialogue apparatus 10 may be, for example, a dialogue robot, or may be a smartphone, a tablet terminal, a personal computer, a home appliance (home electronic device) or the like having a voice recognition function.
  • in FIG. 1, only one interactive device 10 connected to the cloud server 30 is shown for simplicity of explanation, but in the interactive system 100 the number of interactive devices 10 connected to the cloud server 30 is not limited.
  • the type of interactive device 10 connected to the cloud server 30 is not limited either; that is, different types of interactive devices 10, such as an interactive robot and a smartphone, may be connected to the cloud server 30.
  • the dialogue device 10 is a device that performs voice recognition when a voice (voice signal) is input and conducts a dialogue according to the recognition result.
  • the dialogue apparatus 10 includes a voice input unit 11, a voice output unit 12, a control unit 13, a data storage unit 14, and a communication unit 15.
  • the voice input unit 11 is a voice input device such as a microphone
  • the voice output unit 12 is a voice output device such as a speaker.
  • the control unit 13 is a block that controls the operation of each unit of the dialogue apparatus 10.
  • the control unit 13 includes a computer device including an arithmetic processing unit such as a CPU (Central Processing Unit) or a dedicated processor.
  • the control unit 13 reads out and executes a program for executing various controls in the interactive device 10 stored in the data storage unit 14, thereby controlling the operation of each unit of the interactive device 10 in an integrated manner.
  • control unit 13 has functions as a speech recognition unit 16, a response information acquisition unit 17, an output control unit 18, and a speech synthesis unit 19.
  • the voice recognition unit 16 is a block that recognizes an input voice from the user. Specifically, the voice recognition unit 16 converts voice data input from the voice input unit 11 into text data, analyzes the text data, and extracts words and phrases. A known technique can be used for voice recognition processing.
  • the response information acquisition unit 17 is a block that detects response information indicating response content according to the recognition result of the voice recognition unit 16 from a first response information storage unit (response information storage unit) 141 described below.
  • the response information that the response information acquisition unit 17 acquires from the first response information storage unit 141 is referred to as first response information.
  • the response information acquisition unit 17 refers to the first response information storage unit 141 and acquires the first response information corresponding to the words and phrases extracted by the voice recognition unit 16. If information corresponding to the extracted word or phrase is not registered in the first response information storage unit 141, or if the voice recognition unit 16 has failed in voice recognition, the response information acquisition unit 17 acquires default first response information. Specific examples of the default first response information include phrases such as "Please wait a moment" or "Let me see", though the present invention is not limited to these. A sketch of this lookup follows.
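  • as a rough illustration (not taken from the patent), the lookup with a default fallback described above can be sketched in Python as follows; the table contents and the names FIRST_RESPONSE_TABLE and DEFAULT_RESPONSES are assumptions for illustration.

```python
import random

# Hypothetical first response information keyed by extracted words/phrases.
FIRST_RESPONSE_TABLE = {
    ("good morning",): ["Good morning!"],
    ("hot", "today"): ["It is hot today, isn't it?"],
}

# Default first response information, used when nothing matches or
# when speech recognition fails (cf. "Please wait a moment" above).
DEFAULT_RESPONSES = ["Please wait a moment.", "Let me see."]

def acquire_first_response(phrases):
    """Return first response content for the extracted phrases."""
    key = tuple(sorted(p.lower() for p in phrases))
    candidates = FIRST_RESPONSE_TABLE.get(key)
    if not candidates:  # unregistered phrase, or recognition failed
        candidates = DEFAULT_RESPONSES
    # Several entries may be registered for one phrase; pick one to speak.
    return random.choice(candidates)
```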
  • the output control unit 18 is a block that performs audio output by causing the audio output unit 12 to output audio data.
  • the output control unit 18 performs the second response process continuously after performing the first response process as a response to the input voice from the voice input unit 11.
  • the first response process is a process of outputting the response content indicated by the first response information acquired by the response information acquisition unit 17 by voice.
  • the second response process is a process of outputting, by voice, the response content indicated by the second response information received from the cloud server 30. The second response information will be described later.
  • the speech synthesizer 19 is a block that generates speech data (speech synthesis).
  • the voice synthesizer 19 generates voice data having response contents indicated by the first response information.
  • the generated audio data is output via the audio output unit 12. If the first response information is already provided as voice data (recorded voice), the voice synthesizer 19 does not need to generate it.
  • the data storage unit 14 includes a RAM (Random Access Memory), a ROM (Read Only Memory), an HDD (Hard Disk Drive), and the like, and is a block that stores various information (data) used in the interactive device 10. Further, the data storage unit 14 includes a first response information storage unit 141.
  • the first response information storage unit 141 is a database in which first response information is registered in association with words and phrases.
  • the first response information includes not only information corresponding to one word but also information corresponding to a combination of a plurality of words.
  • a plurality of pieces of first response information may be registered for a certain word or phrase; in this case, the information actually output by voice may be selected from among them.
  • the words, phrases, and first response information may all be stored as text data.
  • Known techniques can be used to construct such a database and to obtain response information from the database.
  • the dialogue apparatus 10 can return a response to the user's utterance by referring to the first response information storage unit 141; that is, dialogue with the user becomes possible.
  • the communication unit 15 is a block that performs communication with the outside.
  • the communication unit 15 transmits the voice data to the cloud server 30.
  • the communication unit 15 also receives from the cloud server 30 the second response information, which indicates the response content according to the result of the speech recognition of the input voice by the cloud server 30 and is described in detail later.
  • in the present embodiment, the communication unit 15 transmits the voice data input from the voice input unit 11 to the cloud server 30 as it is. Alternatively, the communication unit 15 may transmit the text data generated by the voice recognition unit 16, or the words and phrases extracted from that text data, to the cloud server 30.
  • the output control unit 18 is configured to perform the first response processing while the communication unit 15 receives the second response information from the cloud server 30.
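  • a minimal sketch, under assumed function names, of this overlap: the server round-trip runs on a worker thread while the first response is spoken immediately, so the user never waits in silence.

```python
import threading
import queue

def respond(input_voice, local_lookup, ask_server, speak):
    """Speak the first (local) response while the second is still in flight."""
    result_q = queue.Queue()

    def fetch():
        # Communication unit: send input voice, wait for second response info.
        result_q.put(ask_server(input_voice))

    threading.Thread(target=fetch, daemon=True).start()
    speak(local_lookup(input_voice))  # first response process, no waiting
    speak(result_q.get())             # second response process, spoken next
```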
  • the dialogue apparatus 10 may further include an imaging unit (camera).
  • the dialogue apparatus 10 may also be configured to analyze the user's facial expression and position from an image input from the imaging unit and to conduct the dialogue based on the analysis.
  • for example, when the dialogue apparatus 10 is a robot and recognizes that the user's position is to the right as viewed from the front of the robot, it may actually turn its head to the right, or display its face turning to the right, to show that it is facing the user and is ready to respond.
  • the cloud server 30 is a server that generates a response to voice data (input voice) received from the dialogue apparatus 10 and transmits the response to the dialogue apparatus 10.
  • the cloud server 30 is a server that manages the interactive device 10. When a plurality of interactive devices 10 are connected, the cloud server 30 manages each individually.
  • the cloud server 30 may also manage information related to the user of the interactive device 10. In this case, information related to the user may be registered in the cloud server 30 from an external device such as a smartphone or a tablet.
  • in the present embodiment, the server device connected to the interactive device 10 is described as the cloud server 30, which provides a cloud service, but it is not limited to a cloud server.
  • the cloud server 30 may be one unit or a plurality of units connected via a communication network.
  • the cloud server 30 includes a control unit 31, a data storage unit 32, and a communication unit 33, as shown in FIG.
  • the control unit 31 includes a computer device configured by an arithmetic processing unit such as a CPU or a dedicated processor, and is a block that controls the operation of each unit of the cloud server 30.
  • the control unit 31 has functions as a speech recognition unit 34, a response information generation unit 35, and a speech synthesis unit 36.
  • the voice recognition unit 34 is a block having the same function as the voice recognition unit 16 of the dialogue apparatus 10, but its speech recognition capability (performance) is higher than that of the voice recognition unit 16. As a result, even voice that the dialogue apparatus 10 cannot recognize can be recognized by the cloud server 30.
  • the response information generation unit 35 is a block that generates response information following the first response information.
  • the response information generated by the response information generation unit 35 is referred to as second response information.
  • the response information generation unit 35 retrieves response information indicating the response content according to the recognition result of the voice recognition unit 34 from a second response information storage unit 321 described below, and generates the second response information from it.
  • the voice synthesizer 36 is a block that generates voice data.
  • the voice synthesizer 36 is a block that generates voice data of response contents indicated by the second response information generated by the response information generator 35.
  • the cloud server 30 is configured to receive information (external provision information) from an external information provision server via a communication network. Therefore, the response information generation unit 35 may generate the second response information based on the externally provided information, the user information registered in the cloud server 30 described above, or a combination thereof. Specific examples of the externally provided information include weather information, traffic information, disaster information, and the like, but are not limited thereto. Further, the number of information providing servers that provide information to the cloud server 30 is not limited.
  • for example, with respect to the input voice "Good morning", after the voice of the response content indicated by the first response information (the first response) is output, the voice of the response content indicated by the second response information is "Today's weather is cloudy, then rain, so you had better take an umbrella with you." In this example, the externally provided information is weather information, and the second response information is generated based on the weather information.
  • since the second response information is output continuously after the voice output of the first response information, it is preferable that the second response information be a continuation or an elaboration of the first response information, because this gives the responses a sense of unity. A sketch of such server-side generation follows.
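  • as a hedged sketch of the server-side generation described above, the response information generation unit might combine the recognized greeting with externally provided weather information; fetch_weather and the message wording are assumptions, not the patent's implementation.

```python
def generate_second_response(recognized_text, fetch_weather):
    """Build second response content that continues the first response."""
    if "good morning" in recognized_text.lower():
        weather = fetch_weather()  # externally provided information
        if weather.get("rain_expected"):
            return ("Today's weather is cloudy, then rain, "
                    "so you had better take an umbrella with you.")
        return "Today's weather is {}.".format(weather.get("summary", "fine"))
    return "I see."  # generic continuation when no richer content applies
```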
  • for this purpose, the cloud server 30 may know in advance which first response information the interactive device 10 outputs for which input voice, or the interactive device 10 may notify the cloud server 30 when it acquires the first response information.
  • the voice data of the response content indicated by the second response information generated by the cloud server 30 is transmitted to the dialogue apparatus 10 by the control unit 31 controlling the communication unit 33.
  • the cloud server 30 is configured to generate the second response information as voice data and then transmit it, so the load on the interactive device 10 can be reduced.
  • although the dialogue apparatus 10 and the cloud server 30 each have a voice synthesis unit, so the voice quality of their synthesized voices may differ, the resulting sense of incongruity felt by the user can be eliminated by changing the character that appears in the dialogue apparatus 10 between the first response process and the second response process, as described in the second embodiment.
  • the cloud server 30 may not have the voice synthesizer 36 and may be configured to transmit the second response information to the dialogue apparatus 10 as text data. In this case, the second response information is generated as voice data by the voice synthesizer 19 of the dialogue apparatus 10.
  • the cloud server 30 may be able to register recorded voices from an external device such as a smartphone or a tablet, for example.
  • the response information generation unit 35 may acquire the registered recorded voice as the second response information, or may include it when generating the second response information. Since the recorded voice is already voice data, if it is transmitted to the dialogue apparatus 10 as it is, no voice synthesis processing is needed in the dialogue apparatus 10. For example, when the voice "There is a cake in the refrigerator" is registered in the cloud server 30 from the smartphone of the user's mother, the dialogue apparatus 10 can respond to the user's input voice "I'm home" by using the first response information to output "Welcome home" by voice, and then using the second response information to output "A message from your mother: 'There is a cake in the refrigerator.'" Such advanced responses become possible.
  • the data storage unit 32 is a block that stores various information (data) used in the cloud server 30. Further, the data storage unit 32 includes a second response information storage unit 321.
  • the second response information storage unit 321 is a database in which second response information is registered in association with words and phrases. The second response information storage unit 321 stores a larger amount of information than the first response information storage unit 141. Further, the second response information may be updated periodically.
  • since the cloud server 30 includes the response information generation unit 35 and the second response information storage unit 321 as described above, it can correctly recognize the input voice and return a plurality of pieces of information.
  • the communication unit 33 is a block that performs communication with the outside.
  • the communication unit 33 is connected, through a communication network, to an external information providing server (not shown) and to external devices such as smartphones and tablets, in addition to the interactive device 10.
  • the number of devices connected to the cloud server 30 is not limited.
  • when the dialogue apparatus 10 receives the voice data (input voice) of an utterance from the user 2 (step A1), it transmits the received voice data to the cloud server 30 (step A2) and acquires the first response information (step A3).
  • in step A3, the input speech is recognized, and first response information indicating the response content according to the result of the speech recognition is acquired. Either step A2 or step A3 may be started first. The dialogue apparatus 10 then outputs the response content indicated by the first response information by voice (step A4).
  • when the cloud server 30 receives the voice data from the interactive device 10 (step B1), it generates second response information (step B2) and transmits the generated second response information to the interactive device 10 (step B3).
  • while receiving the second response information from the cloud server 30, the dialogue apparatus 10 outputs the response content indicated by the first response information by voice.
  • when the interactive device 10 receives the second response information (step A5), it continues by outputting the response content indicated by the second response information by voice (step A6). The dialogue process in the dialogue system 100 thus ends.
  • as described above, the dialogue device 10 outputs by voice the response content indicated by the first response information associated with the result of the voice recognition performed by its own voice recognition unit 16, and then continuously outputs by voice the response content indicated by the second response information associated with the result of the voice recognition by the cloud server 30.
  • since the first response information corresponds to the result of voice recognition in the dialogue device 10 itself, it can be output from the dialogue device 10 earlier than the second response information, which is received through communication with the cloud server 30.
  • since the cloud server 30 can perform more advanced processing than the individual interactive devices 10, it can perform more advanced speech recognition. Therefore, the interactive device 10 can respond quickly to the input voice through the first response process, and can provide varied or advanced information through the second response process that follows.
  • in addition, the input voice from the user can be supplemented by the voice recognition in the cloud server 30 and the response based on the second response information, so it is possible to respond with a plurality of pieces of information without improving the processing capability of the voice recognition unit 16 of the interactive apparatus 10 or expanding the capacity of the data storage unit 14. The interactive system 100 can therefore smoothly output a plurality of pieces of information by voice and provide a comfortable interactive environment without stressing the user.
  • furthermore, since the output control unit 18 performs the first response process while the communication unit 15 is receiving the second response information from the cloud server 30, the user can interact without being stressed by the waiting time for receiving the second response information.
  • the interactive system 100a includes an interactive device 10a and a cloud server (server device) 30a.
  • the dialogue apparatus 10a will be described as a dialogue robot having a head and a torso that can express facial expressions as shown in FIG.
  • the interactive device 10 a includes a display unit 21 and an operation unit 22 in addition to the configuration of the interactive device 10 of the first embodiment.
  • the display unit 21 is a block that displays an image of the expression of the interactive robot. In the present embodiment, the display unit 21 performs display using the rear projection method, but is not limited thereto.
  • the operation unit 22 is a block that executes the motion of the interactive apparatus 10a. As described below, the operation unit 22 rotates the dialogue apparatus 10a at the time of character switching between the first response process and the second response process; as shown in FIG. 4A, the head of the dialogue robot serving as the dialogue device 10a rotates in the horizontal direction. The operation unit 22 may also be configured to move the head, the body, or the arms attached to the body of the dialogue robot in various directions. Furthermore, at the time of character switching, the operation unit 22 may perform a motion other than rotation, for example movement, a change of direction, or vibration. Here, motion means that at least a part of the interactive device performs a physical movement.
  • the dialogue apparatus 10a includes a character storage unit (character feature amount storage unit) 142 that stores image feature amounts and audio feature amounts of a plurality of characters in the data storage unit 14a.
  • the control unit 13a has a function as the character switching unit 20.
  • the character switching unit 20 is a block that controls at least one of image display, sound output, and motion in the interactive device 10a and performs an effect process indicating that a different character has appeared in the interactive device 10a. The effect process is described below using specific examples.
  • the output control unit 18a selects one of a plurality of characters for each of the first response process and the second response process. It displays on the display unit 21 the character image determined by the image feature amounts obtained by referring to the character storage unit 142 for the selected character, and outputs from the voice output unit 12 the character voice determined by the voice feature amounts obtained in the same way. In this way, the interactive device 10a changes the feature amounts of the displayed image and the sound quality of the output voice between the first response process and the second response process, so that different characters can appear in the dialogue apparatus 10a. The motion may also be changed. A sketch of this per-phase selection follows.
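  • a minimal sketch of this per-phase character selection, assuming a simple dictionary for the character storage unit 142; the feature fields and character names are illustrative only.

```python
# Hypothetical character storage unit: image and voice feature amounts.
CHARACTER_STORE = {
    "default":    {"image": "child.png",  "voice": {"pitch": 1.2}},
    "newscaster": {"image": "caster.png", "voice": {"pitch": 0.9}},
}

def render_response(text, character, display, speak):
    features = CHARACTER_STORE[character]
    display(features["image"])        # character image on display unit 21
    speak(text, **features["voice"])  # character voice on voice output unit 12

def respond_with_characters(first_text, second_text, designated, display, speak):
    render_response(first_text, "default", display, speak)    # first response
    render_response(second_text, designated, display, speak)  # second response
```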
  • the character is, for example, a child, a father, a mother, a teacher, a news caster, or the like.
  • the cloud server 30a generates the second response information including the character designation information for designating the character when the dialogue apparatus 10a outputs the second response information as a voice.
  • the character designation information designates a character that can appear on the dialogue apparatus 10a.
  • the character designation information may be, for example, information corresponding to the content to be output by voice as the second response information. For example, if the content to be output is related to study, the information designates the father character; if it is related to daily life, the mother character; and if it is related to the weather, the weathercaster character. These are illustrations, and the invention is not limited to them.
  • the output control unit 18a selects a default character during the first response process, and selects a character designated by the character designation information during the second response process.
  • at the time of character switching, the character switching unit 20 performs an effect process indicating that a different character has appeared in the dialogue apparatus 10a, for example as follows. The head of the dialogue apparatus 10a starts rotating while the character image of the first character (the character selected for the first response process) is displayed. When the rotation of the head ends, the display unit 21 and the operation unit 22 are controlled so that the character image of the second character (the character selected for the second response process) is displayed instead of the character image of the first character.
  • at this time, a voice calling the second character, for example "Mr. XX, who is in charge of news!", may be output.
  • alternatively, the footsteps of the first character moving away and the footsteps of the second character approaching may be output as audio.
  • the character switching unit 20 may instead perform the following effect process. As shown in (b) or (c) of FIG. 4, the display unit 21 is controlled so that the display is gradually switched from the character image of the first character to the character image of the second character, the character selected for the second response process. In this case, the interactive device 10a itself does not rotate, but the display can make it appear to rotate. An effect by voice output may also be performed in the same manner as described above.
  • when the output control unit 18a does not switch the character between the first response process and the second response process, a default voice output such as "I'm busy, ask me later" or "The newscaster is off today" may be made.
  • the flow of dialogue processing in the dialogue system 100 is basically the same as the flow of processing in the dialogue system 100 of the first embodiment shown in FIG.
  • in the present embodiment, the second response information is generated to include character designation information that designates the character to be used when the second response information is output by voice.
  • during the first response process, the first character appears in the dialogue device 10a. During the second response process, the character switching unit 20 performs the above-described effect processing so that the second character designated by the character designation information appears in the dialogue device 10a. Since the character switching unit 20 performs the effect processing after a voice output such as "Please wait a moment", it can easily convey to the user that the character is changing.
  • in the above description, the second response information includes information designating one of a plurality of characters and is received by the interactive device 10a; however, the interactive device 10a may instead be configured as follows.
  • the output control unit 18 may be configured to determine which of the plurality of characters is designated from the recognition result of the input speech by the speech recognition unit 16, and to select the designated character at the time of the second response process.
  • the output control unit 18 may determine the designated character directly when the user's input voice includes information specifying the character itself, or may infer the character from metadata in the dialogue with the user. As a specific example of the former, if the voice input by the user includes a command such as "teacher", "call the teacher", or "speak, teacher", the "teacher" character is selected. As an example of the latter, if the dialogue with the user is related to study, the "teacher" character is selected. A sketch of both strategies follows.
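  • both determination strategies can be sketched as keyword matching over the recognition result; the command and topic keyword lists below are assumptions for illustration.

```python
# Explicit designation: commands that name a character directly.
COMMANDS = {"call the teacher": "teacher", "speak, teacher": "teacher"}
# Inference: topic words in the dialogue suggest a character.
TOPIC_HINTS = {"homework": "teacher", "study": "teacher", "recipe": "mother"}

def determine_character(recognized_text, default="default"):
    text = recognized_text.lower()
    for command, character in COMMANDS.items():
        if command in text:
            return character
    for hint, character in TOPIC_HINTS.items():
        if hint in text:
            return character
    return default
```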
  • the above determination need not be performed by the output control unit 18; a separate block that performs the determination based on the result of speech recognition by the speech recognition unit 16 may be provided.
  • by selecting the character from the content of the dialogue with the user in this way, the second response information can be provided using a character that reflects the user's intention, and the conversation can be made more interesting.
  • the output control unit 18 may also select one of the plurality of characters when the interactive device 10a is activated or returns from the sleep state, display an image of the selected character on the display unit 21, and output the voice of the selected character from the voice output unit 12.
  • when the interactive device 10a is starting up or returning from the sleep state, a user who wants to interact is kept waiting, which may cause stress. However, by performing the display and voice output described above, a character can appear in the interactive device and the user's stress during the wait can be reduced.
  • if the character image or voice presented when the interactive device 10a is activated or returns from the sleep state indicates, for example, that the interactive robot has just awakened from sleep, it can convey in an easy-to-understand manner that the interactive device 10a is still in the process of starting up.
  • the dialogue system 100b of the present embodiment includes a dialogue device 10b and a cloud server (server device) 30b.
  • the dialogue apparatus 10b will be described below assuming that it is a dialogue robot.
  • in addition to the configuration of the interactive device 10 of the first embodiment, the interactive device 10b includes a display unit 21 and an operation unit 22 in the same manner as the interactive device 10a of the second embodiment. Moreover, the interactive device 10b has, for each home appliance, a home appliance operation mode in which that home appliance can be operated; as illustrated in FIG. 6, the home appliances in the user home 40 can be operated from the communication unit 15 by infrared communication or wireless LAN communication.
  • home appliances are, for example, air conditioners, washing machines, refrigerators, cooking utensils, lighting devices, hot water supply equipment, photographing equipment, various AV (audio-visual) equipment, and various household robots (for example, cleaning robots, housework support robots, animal-type robots, etc.).
  • the interactive device 10b can operate the air conditioner 50-1 and the washing machine 50-2.
  • the control unit 13b of the interactive device 10b has a function as the mode setting unit 23 that sets the interactive device 10b to the home appliance operation mode.
  • the mode setting unit 23 determines the home appliance to be operated from the input voice input from the voice input unit 11, and sets the interactive device 10b to the home appliance operation mode of the determined home appliance. Therefore, when it is inferred from the dialogue with the user that the user wishes to operate the air conditioner 50-1, the dialogue device 10b can set itself to the home appliance operation mode for operating the air conditioner 50-1 and perform the operation.
  • this determination may be made directly if the input voice includes information designating the home appliance to be operated, or by inferring the home appliance to be operated from metadata in the dialogue. This is described below using specific examples.
  • in the former case, if the user's input voice includes a command such as "turn on air conditioner" or "air conditioner ON", it is determined that the home appliance to be operated is the air conditioner 50-1.
  • in the latter case, if the input voice includes metadata such as "hot", it is determined that the home appliance to be operated is the air conditioner 50-1.
  • before the determined operation is performed, a voice confirming execution of the operation, for example "Would you like to turn on the air conditioner?", may be output from the interactive device 10b, and the operation may be executed when the user gives a permitting voice input such as "Turn on" or "OK". A sketch of this flow follows.
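  • a sketch of the mode setting flow just described, covering both the explicit command and the metadata inference, plus the confirmation step; the keyword tables and the speak/listen/operate callables are assumed names.

```python
APPLIANCE_COMMANDS = {"turn on air conditioner": "air conditioner",
                      "air conditioner on": "air conditioner"}
APPLIANCE_HINTS = {"hot": "air conditioner", "laundry": "washing machine"}

def set_appliance_mode(recognized_text, speak, listen, operate):
    text = recognized_text.lower()
    # Prefer an explicit command; otherwise infer from metadata such as "hot".
    target = next((a for c, a in APPLIANCE_COMMANDS.items() if c in text), None)
    if target is None:
        target = next((a for h, a in APPLIANCE_HINTS.items() if h in text), None)
    if target is None:
        return
    speak("Would you like to turn on the {}?".format(target))
    if listen().lower() in ("turn on", "ok"):  # user permits execution
        operate(target, "power_on")
```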
  • the data storage unit 14b of the interactive device 10b includes a mode information storage unit 143.
  • the mode information storage unit 143 stores, for each home appliance, information for setting the interactive device 10b so that the home appliance can be operated. The mode information storage unit 143 also stores, for each home appliance, the feature amounts of the character image and the voice associated with that home appliance.
  • the character associated with the home appliance appears when the interactive device 10b is set to the home appliance operation mode of the home appliance.
  • the appearance of the character can be executed by changing the feature quantity of the image displayed on the display unit 21 and the sound quality of the audio output from the audio output unit 12 as in the second embodiment. Further, the operation may be changed.
  • the interactive device 10b when the interactive device 10b is set to the operation mode of the air conditioner 50-1, a character associated with the air conditioner appears in the interactive device 10b.
  • the mark of the air conditioner is displayed on the forehead part or the stomach part (part of the display unit) of the interactive robot.
  • the interactive device 10b when the interactive device 10b is set to the operation mode of the washing machine 50-2, a character associated with the washing machine appears in the interactive device 10b.
  • the user may be able to register the association between a home appliance and a character in the dialogue apparatus 10b, such as a rabbit character for the air conditioner and a raccoon character for the washing machine.
  • when a character associated with a home appliance appears in the interaction device 10b in this way, the user can be notified of which home appliance operation mode the interaction device 10b is in, that is, which home appliance is the operation target.
  • for example, when a character associated with the air conditioner 50-1 appears in the dialogue apparatus 10b, the dialogue apparatus 10b may respond to the voice input "Good night" by outputting "Good night" by voice and, at the same time, performing an operation to turn off the power of the air conditioner 50-1.
  • similarly, when a character associated with the washing machine 50-2 appears, the dialogue apparatus 10b may respond to the voice input "Good morning" by outputting "Good morning" by voice and, at the same time, turning on the power of the washing machine 50-2. In this way, each character associated with a home appliance may have a different function.
  • when the interactive device 10b also functions as a television device, it may be configured to display a broadcast program on the display unit 21 when the home appliance operation mode for operating the television is set.
  • the positions of the air conditioner 50-1 and the washing machine 50-2 may be detected using infrared rays, or the interactive apparatus 10b may be connected to a camera and detect their positions from information obtained from the camera.
  • since the configuration of the cloud server 30b is the same as that of the cloud server 30, its description is omitted.
  • when the cloud server 30b is communicatively connected to the air conditioner 50-1 and the washing machine 50-2 and collects state information indicating their states, the cloud server 30b may generate the second response information based on the state information.
  • for example, the second response information may be generated from the state information of the washing machine 50-2 so as to output a voice announcing that the washing machine has finished its work, as in the sketch below.
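  • as an illustrative sketch (the state fields are assumptions), the second response could be derived from the collected state information like this:

```python
def second_response_from_state(states):
    """Return a voice message derived from appliance states, or None."""
    if states.get("washing machine", {}).get("finished"):
        return "The washing machine has finished its work."
    if states.get("air conditioner", {}).get("filter_dirty"):
        return "The air conditioner filter needs cleaning."
    return None  # nothing noteworthy to report
```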
  • the interactive devices 10, 10a, 10b and the cloud servers 30, 30a, 30b described in the first to third embodiments may each be realized by a logic circuit (hardware) formed in an integrated circuit (IC chip) or the like, or may be realized by software using a CPU (Central Processing Unit).
  • in the latter case, each of the interactive devices 10, 10a, 10b and the cloud servers 30, 30a, 30b includes a CPU that executes the instructions of a program, which is software realizing each function; a ROM (Read Only Memory) or storage device (referred to here as a "recording medium") in which the program and various data are recorded so as to be readable by the computer (or CPU); and a RAM (Random Access Memory) into which the program is loaded. The object of the present invention is achieved when the computer (or CPU) reads the program from the recording medium and executes it.
  • as the recording medium, a "non-transitory tangible medium" such as a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit can be used.
  • the program may be supplied to the computer via an arbitrary transmission medium (such as a communication network or a broadcast wave) capable of transmitting the program.
  • the present invention can also be realized in the form of a data signal, embedded in a carrier wave, in which the program is embodied by electronic transmission.
  • the present invention is not limited to the above-described embodiments, and various modifications are possible; embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention. Furthermore, new technical features can be formed by combining the technical means disclosed in the embodiments.
  • a dialogue device (10) according to aspect 1 of the present invention includes: a speech recognition unit (16) that recognizes input voice; a response information storage unit (first response information storage unit 141) that stores first response information indicating response content corresponding to a result of speech recognition by the speech recognition unit (16); a communication unit (15) that transmits the input voice to a server device (cloud server 30) and receives second response information indicating response content corresponding to a result of speech recognition of the input voice by the server device; and an output control unit (18) that performs a first response process of outputting, by voice, the response content indicated by the first response information obtained by referring to the response information storage unit for the input voice, and then continuously performs a second response process of outputting, by voice, the response content indicated by the second response information.
  • according to the above configuration, the response content indicated by the first response information associated with the result of speech recognition by the own device is output by voice, and then the response content indicated by the second response information associated with the result of speech recognition by the server device is continuously output by voice. Since the first response information corresponds to the result of speech recognition in the own device, it can generally be output from the dialogue device earlier than the second response information, which is received through communication with the server device.
  • in addition, the server device can perform higher-level processing than an individual interactive device, and thus can perform high-level voice recognition. Therefore, according to the above configuration, a quick response to the input voice can be made by the first response process, and varied or advanced information can be provided by the second response process that follows the first response process.
  • furthermore, since the input speech can be supplemented by the server device's speech recognition and the response based on the second response information, in addition to the own device's speech recognition and the response based on the first response information, it is possible to respond with a plurality of pieces of information without increasing the processing capability of the speech recognition unit of the dialogue device or expanding the capacity of the response information storage unit.
  • as a result, it is possible to smoothly output a plurality of pieces of information by voice and to provide a comfortable interactive environment without giving stress to the user.
  • in addition, since the output control unit performs the first response process while the communication unit is receiving the second response information, the user can interact without being stressed by the waiting time for receiving the second response information.
  • a dialogue device according to aspect 2 of the present invention, in the above aspect 1, further includes a character feature amount storage unit (character storage unit 142) that stores image feature amounts and voice feature amounts of a plurality of characters, wherein the output control unit (18) selects one of the plurality of characters for each of the first response process and the second response process, displays the character image determined by the image feature amounts obtained by referring to the character feature amount storage unit for the selected character, outputs by voice the character voice determined by the voice feature amounts obtained in the same way, and selects, during the second response process, a character different from that of the first response process; the device further includes a character switching unit (20) that, between the first response process and the second response process, controls at least one of image display, voice output, and motion of the own device and performs an effect representing the appearance of a different character.
  • according to the above configuration, a character is selected, the character image of the selected character is displayed, and the character voice of the selected character is output by voice, whereby a character can appear in the dialogue device.
  • furthermore, since a character different from that of the first response process is selected during the second response process, an effect indicating that a different character has appeared is performed. This effect makes it possible to build up the atmosphere of a different character's appearance without destroying the image of the character shown before the switch.
  • since the user's interest can be attracted by the above-described effect, the user's stress due to the waiting time can be reduced even when, for example, time elapses between the first response process and the second response process.
  • the character switching unit may perform the effect representing the appearance of a different character by (a) controlling the display so that the character image of the first character, the character selected during the first response process, is gradually switched to the character image of the second character, the character selected during the second response process, or (b) starting rotation of the own device with the character image of the first character displayed, and controlling the display and the rotation, movement, direction change, vibration, or the like of the own device so that, when the rotation ends, the character image of the second character is displayed instead of the character image of the first character.
  • in a dialogue device according to aspect 3 of the present invention, in the above aspect 2, the second response information includes designation information designating one of the plurality of characters, and the output control unit (18) selects, during the second response process, the character designated by the designation information included in the second response information. According to the above configuration, the character designation information is included in the second response information; therefore, if a character suited to the content of the second response information is designated in advance, the designated character can appear at the time of the second response process, so that the second response information can be provided to the user in a persuasive or interesting manner.
  • in a dialogue device according to aspect 4 of the present invention, in the above aspect 2, the output control unit (18) determines which of the plurality of characters is designated from the result of speech recognition of the input voice by the speech recognition unit (16), and selects the designated character during the second response process. According to the above configuration, a character is designated from the user's input voice, and the designated character can appear in the dialogue device during the second response process. The designated character may be determined directly from information in the input voice that specifies the character itself, or may be inferred from metadata in the dialogue with the user.
  • thereby, the second response information can be provided using a character that reflects the user's intention, and the conversation can be made more interesting.
  • in a dialogue device according to aspect 5 of the present invention, the output control unit (18) selects one of the plurality of characters when the own device is activated or when it returns from the sleep state, displays the character image determined by the image feature amounts obtained by referring to the character feature amount storage unit for the selected character, and outputs by voice the character voice determined by the voice feature amounts obtained in the same way.
  • according to the above configuration, a character is selected at the time of starting up the own device or returning from the sleep state, the character image of the selected character is displayed, and the character voice of the selected character is output by voice, so that a character can appear in the dialogue device. The appearance of such a character can attract the user's interest and reduce the stress of the waiting time at activation or at return from the sleep state.
  • a dialogue device according to aspect 6 of the present invention, in any one of the above aspects 1 to 5, has, for each home appliance, a home appliance operation mode in which that home appliance can be operated, and further includes a mode setting unit (23) that determines the home appliance to be operated from the result of speech recognition of the input voice by the speech recognition unit (16) and sets the own device to the home appliance operation mode of the determined home appliance.
  • according to the above configuration, the dialogue device can be set to a home appliance operation mode in which a home appliance can be operated, and the home appliance to be operated can be determined from the input voice. Therefore, when it is inferred from the dialogue with the user that the user wants to operate a home appliance, the dialogue device can set itself to the home appliance operation mode for operating that home appliance and perform the operation.
  • if the dialogue device includes a display unit and is configured to display the operation-target home appliance, or a character representing it, when the home appliance operation mode is set, the user can be clearly notified of the operation-target home appliance.
  • a dialogue system according to aspect 7 of the present invention is configured by connecting the dialogue device according to any one of the above aspects 1 to 6 and a server device having a voice recognition function via a communication network.
  • a server device according to aspect 8 of the present invention is a server device provided in the dialogue system according to the above aspect 7; with such a server device, the dialogue system according to aspect 7 can be constructed.
  • the dialogue device, the server device, and the dialogue system according to each aspect of the present invention may be realized by a computer. In this case, a program that realizes the dialogue device, the server device, or the dialogue system on the computer by operating the computer as each unit they include (the speech recognition unit, the output control unit, the character switching unit, and the mode setting unit), and a computer-readable recording medium on which the program is recorded, also fall within the scope of the present invention.
  • the present invention can be used for an interactive device connected to a communication network and recognizing and responding to a user's voice.

Abstract

The purpose of the present invention is to provide a conversation device which is capable of responding smoothly and of providing a comfortable conversation environment without putting stress on the user. Provided is a conversation device (10) comprising an output control unit (18) which, in response to a voice input, first carries out a first response process that voice-outputs the response content indicated by first response information obtained by querying a first response information storage unit (141), and then carries out, in series, a second response process that voice-outputs the response content indicated by second response information received from a cloud server (30).

Description

Dialogue device
The present invention relates to an interactive apparatus and an interactive system that are connected to a communication network and recognize and respond to a user's voice.
In recent years, robots such as nursing-care and healing care robots and housekeeping robots have gradually permeated users' lives. For example, as disclosed in Patent Documents 1 to 4, dialogue robots (dialogue devices) equipped with a dialogue engine that recognize and respond to a user's voice have also been developed. Because of the performance and cost of the on-board dialogue engine, it is difficult for such a dialogue robot to perform complex speech recognition, and its responses tend to be patterned or simple, so that the user finds them uninteresting and easily gets bored.
Therefore, systems have also been developed in which a dialogue robot is connected to a server device via a communication network, speech recognition is performed by the server device, and the dialogue robot receives information about a response based on the recognition result from the server device and outputs it (responds). With such a system, the robot can respond even to speech whose content cannot be processed by the dialogue robot alone, and the user can obtain more information. Furthermore, since the server device can be used by a plurality of robots, this is more advantageous in terms of cost than increasing the processing capability of each dialogue robot.
Patent Document 1: International Publication No. WO 05/076258 A1 (published August 18, 2005)
Patent Document 2: Japanese Unexamined Patent Application Publication No. 2006-043780 (published February 16, 2006)
Patent Document 3: Japanese Unexamined Patent Application Publication No. 2010-128281 (published June 10, 2010)
Patent Document 4: Japanese Unexamined Patent Application Publication No. 2003-022092 (published January 24, 2003)
However, when the dialogue robot responds by acquiring the response content from the server device, the response timing is delayed compared with the case where the dialogue robot alone recognizes the speech and responds. As a result, the user may feel stressed and find it difficult to converse.
Accordingly, the present invention has been made in view of the above problems, and an object thereof is to provide a dialogue device and a dialogue system that can smoothly output a plurality of pieces of information by voice and can provide a comfortable dialogue environment without stressing the user.
In order to solve the above problem, a dialogue device according to one aspect of the present invention includes: a voice recognition unit that performs speech recognition on an input voice; a response information storage unit that stores first response information indicating response content corresponding to the result of speech recognition by the voice recognition unit; a communication unit that transmits the input voice to a server device and receives second response information indicating response content corresponding to the result of speech recognition of the input voice by the server device; and an output control unit that, for the input voice, performs a first response process of outputting by voice the response content indicated by the first response information obtained by referring to the response information storage unit, and then successively performs a second response process of outputting by voice the response content indicated by the second response information.
With the dialogue device according to one aspect of the present invention, the response to the input voice by the device's own speech recognition and first response information is supplemented by the server device's speech recognition and the response based on the second response information. The device can therefore respond with a plurality of pieces of information without improving the processing capability of its voice recognition unit or expanding the capacity of its response information storage unit. Thus, with the above configuration, a plurality of pieces of information can be output smoothly by voice, and a comfortable dialogue environment can be provided without stressing the user.
FIG. 1 is a diagram showing a schematic configuration of a dialogue system according to Embodiment 1 of the present invention.
FIG. 2 is a diagram showing the flow of dialogue processing in the dialogue system according to Embodiment 1 of the present invention.
FIG. 3 is a diagram showing a schematic configuration of a dialogue system according to Embodiment 2 of the present invention.
FIG. 4 is a diagram explaining the operation of a dialogue device according to Embodiment 2 of the present invention.
FIG. 5 is a diagram showing a schematic configuration of a dialogue system according to Embodiment 3 of the present invention.
FIG. 6 is a conceptual diagram of the dialogue system according to Embodiment 3 of the present invention.
[Embodiment 1]
Hereinafter, one embodiment of the present invention will be described with reference to FIGS. 1 to 3.
(Configuration of the dialogue system)
FIG. 1 is a diagram showing the configuration of a dialogue system 100 according to the present embodiment. As shown in FIG. 1, the dialogue system 100 includes a dialogue device (own device) 10 and a cloud server (server device) 30, which are connected via a communication network. As this communication network, for example, the Internet can be used. A telephone line network, a mobile communication network, a CATV (CAble TeleVision) communication network, a satellite communication network, or the like can also be used.
In the dialogue system 100, the dialogue device 10 and the cloud server 30 each have a speech recognition function, and the user can converse with the dialogue device 10 by voice using natural language. The dialogue device 10 may be, for example, a dialogue robot, or it may be a smartphone, tablet terminal, personal computer, home appliance (household electronic device), or the like that has a speech recognition function.
Note that FIG. 1 shows only one dialogue device 10 connected to the cloud server 30 for simplicity of explanation; in the dialogue system 100, however, the number of dialogue devices 10 connected to the cloud server 30 is not limited. Nor is the type of dialogue device 10 connected to the cloud server 30 limited; that is, different types of dialogue devices 10, such as a dialogue robot and a smartphone, may be connected to the cloud server 30.
(Dialogue device)
Next, the configuration of the dialogue device 10 will be described. The dialogue device 10 is a device that, when a voice (voice signal) is input, performs speech recognition and converses in accordance with the recognition result. As shown in FIG. 1, the dialogue device 10 includes a voice input unit 11, a voice output unit 12, a control unit 13, a data storage unit 14, and a communication unit 15.
The voice input unit 11 is a voice input device such as a microphone, and the voice output unit 12 is a voice output device such as a speaker.
The control unit 13 is a block that controls the operation of each unit of the dialogue device 10. The control unit 13 consists of a computer device including an arithmetic processing unit such as a CPU (Central Processing Unit) or a dedicated processor. The control unit 13 reads out and executes programs, stored in the data storage unit 14, for performing various controls in the dialogue device 10, and thereby comprehensively controls the operation of each unit of the dialogue device 10.
The control unit 13 also functions as a voice recognition unit 16, a response information acquisition unit 17, an output control unit 18, and a voice synthesis unit 19.
The voice recognition unit 16 is a block that recognizes the input voice from the user. Specifically, the voice recognition unit 16 converts the voice data input from the voice input unit 11 into text data, analyzes the text data, and extracts words and phrases. A known technique can be used for the speech recognition processing.
The response information acquisition unit 17 is a block that retrieves response information indicating response content corresponding to the recognition result of the voice recognition unit 16 from a first response information storage unit (response information storage unit) 141 described below. In the present embodiment, the response information that the response information acquisition unit 17 acquires from the first response information storage unit 141 is referred to as first response information. The response information acquisition unit 17 refers to the first response information storage unit 141 and acquires the first response information corresponding to the words and phrases extracted by the voice recognition unit 16. If no information corresponding to the extracted words or phrases is registered in the first response information storage unit 141, or if the voice recognition unit 16 fails in speech recognition, the response information acquisition unit 17 acquires default first response information. Specific examples of the default first response information are information whose output voice is "Hold on a moment" or "Let me ask about that", although the default is not limited to these.
The output control unit 18 is a block that performs voice output by causing the voice output unit 12 to output voice data. As the response to the input voice from the voice input unit 11, the output control unit 18 performs a first response process and then successively performs a second response process. The first response process is a process of outputting by voice the response content indicated by the first response information acquired by the response information acquisition unit 17, and the second response process is a process of outputting by voice the response content indicated by the second response information received from the cloud server 30. The second response information will be described later.
The voice synthesis unit 19 is a block that generates voice data (speech synthesis). The voice synthesis unit 19 generates voice data of the response content indicated by the first response information, and the generated voice data is output via the voice output unit 12. When the first response information has already been produced as voice data (recorded voice), no generation by the voice synthesis unit 19 is performed.
The data storage unit 14 includes a RAM (Random Access Memory), a ROM (Read Only Memory), an HDD (Hard Disk Drive), and the like, and is a block that stores various information (data) used in the dialogue device 10. The data storage unit 14 includes a first response information storage unit 141. The first response information storage unit 141 is a database in which first response information is registered in association with words and phrases. First response information is registered not only for single words but also for combinations of a plurality of words. A plurality of pieces of first response information may also be registered for a given word or phrase; in that case, the one actually output by voice is selected from among them. The words, phrases, and first response information may all be stored as text data. Known techniques can be used to construct such a database and to retrieve response information from it.
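To make the lookup concrete, the following is a minimal sketch in Python of how the response information acquisition unit 17 might query such a database, covering the multi-word keys, the multiple registered candidates, and the default first response information described above. All identifiers and entries are illustrative assumptions; the patent does not prescribe a data layout.

```python
import random

# Hypothetical layout of the first response information storage unit 141:
# a key is a single word or a combination of words, and several candidate
# responses may be registered for one key.
FIRST_RESPONSE_DB = {
    ("good", "morning"): ["Good morning."],
    ("hot",): ["It is hot today, isn't it?", "Shall I turn something on?"],
}

# Default first response information, used when recognition fails or
# when no registered entry matches the extracted words.
DEFAULT_RESPONSES = ["Hold on a moment.", "Let me ask about that."]

def acquire_first_response(extracted_words):
    """Mimics the response information acquisition unit 17."""
    if not extracted_words:                       # speech recognition failed
        return random.choice(DEFAULT_RESPONSES)
    for key_words, candidates in FIRST_RESPONSE_DB.items():
        if set(key_words) <= set(extracted_words):
            return random.choice(candidates)      # select one candidate
    return random.choice(DEFAULT_RESPONSES)       # nothing registered

print(acquire_first_response(["good", "morning"]))  # -> "Good morning."
```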
By referring to the first response information storage unit 141 in this way, the dialogue device 10 can return responses to the user's utterances; in other words, dialogue with the user becomes possible.
The communication unit 15 is a block that communicates with the outside. Under the control of the control unit 13, when voice data is input from the voice input unit 11, the communication unit 15 transmits the voice data to the cloud server 30. It then receives, from the cloud server 30 described in detail later, second response information indicating response content corresponding to the result of the cloud server 30's speech recognition of the input voice. In the present embodiment, the communication unit 15 transmits the voice data input from the voice input unit to the cloud server 30 as-is, but the communication unit 15 may instead transmit the text data generated by the voice recognition unit 16, or the words and phrases extracted from that text data, to the cloud server 30.
Here, in the present embodiment, the output control unit 18 is configured to perform the first response process while the communication unit 15 is receiving the second response information from the cloud server 30.
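The timing relationship just described, in which the first response is spoken while the second response information is still in transit, can be sketched as follows. The use of a thread and all function names are assumptions made for illustration; the patent does not specify an implementation.

```python
import queue
import threading

def respond_to(input_voice, first_response_for, cloud_second_response, speak):
    """Speak the first response while the second is fetched in parallel."""
    received = queue.Queue(maxsize=1)

    # Communication unit 15: send the input voice to the cloud server and
    # wait for the second response information in the background.
    threading.Thread(
        target=lambda: received.put(cloud_second_response(input_voice)),
        daemon=True,
    ).start()

    # First response process: local recognition and first response
    # information, spoken while the second response is still in flight.
    speak(first_response_for(input_voice))

    # Second response process: spoken in succession once received.
    speak(received.get())
```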
The dialogue device 10 may further include an imaging unit (camera) and may be configured, for example, to analyze the user's facial expression and position from an image input from the imaging unit and to converse based on the analysis. For example, when the dialogue device 10 is a robot and recognizes that the user is to the right as seen from the front of the robot, it may actually turn its head to the right, or display on the head a face turning to the right, thereby showing that it is facing the user, that is, that it is ready to respond.
(Configuration of the cloud server)
Next, the cloud server 30 will be described. The cloud server 30 is a server that generates a response to the voice data (input voice) received from the dialogue device 10 and transmits it to the dialogue device 10. The cloud server 30 is also a server that manages the dialogue devices 10; when a plurality of dialogue devices 10 are connected, it manages each of them individually. The cloud server 30 may additionally manage information about the users of the dialogue devices 10; in that case, the user information may be registrable in the cloud server 30 from an external device such as a smartphone or tablet.
In the present embodiment, the server device connected to the dialogue device 10 is described as the cloud server 30, which provides a cloud service, but the server device is not limited to a cloud server. The cloud server 30 may also be a single unit or a plurality of units connected via a communication network.
As shown in FIG. 1, the cloud server 30 includes a control unit 31, a data storage unit 32, and a communication unit 33.
The control unit 31 consists of a computer device including an arithmetic processing unit such as a CPU or a dedicated processor, and is a block that controls the operation of each unit of the cloud server 30. The control unit 31 also functions as a voice recognition unit 34, a response information generation unit 35, and a voice synthesis unit 36.
The voice recognition unit 34 is a block having the same function as the voice recognition unit 16 of the dialogue device 10, except that its speech recognition capability (performance) is higher than that of the voice recognition unit 16. As a result, even speech that the dialogue device 10 could not recognize can be recognized by the cloud server 30.
The response information generation unit 35 is a block that generates the response information that follows the first response information. In the present embodiment, the response information generated by the response information generation unit 35 is referred to as second response information. When generating the second response information, the response information generation unit 35 retrieves response information indicating response content corresponding to the recognition result of the voice recognition unit 34 from a second response information storage unit 321 described below.
The voice synthesis unit 36 is a block that generates voice data; specifically, it generates voice data of the response content indicated by the second response information generated by the response information generation unit 35.
Furthermore, the cloud server 30 is configured to receive information (externally provided information) from external information providing servers via the communication network. The response information generation unit 35 may therefore generate the second response information based on the externally provided information, the user information registered in the cloud server 30 as described above, or a combination of these. Specific examples of the externally provided information include, but are not limited to, weather information, traffic information, and disaster information. The number of information providing servers that provide information to the cloud server 30 is also not limited.
By using externally provided information when generating the second response information, an advanced response becomes possible: for example, for the input voice "Good morning", even if the voice of the response content indicated by the first response information (the voice output from the first response information) is simply "Good morning", the voice of the response content indicated by the second response information (the voice output from the second response information) can be "Today's weather is cloudy then rain, so you had better take an umbrella if you go out". In this case, the externally provided information is weather information, and the second response information is generated based on it.
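On the server side, the weather example above might be generated along the following lines; the function name and the shape of the externally provided information are assumptions for illustration only.

```python
def generate_second_response(recognized_phrase, external_info):
    """A sketch of the response information generation unit 35's logic."""
    if recognized_phrase == "good morning" and "weather" in external_info:
        forecast = external_info["weather"]   # from an information provider
        if "rain" in forecast:
            return ("Today's weather is " + forecast + ", so you had better "
                    "take an umbrella if you go out.")
        return "Today's weather is " + forecast + "."
    return None  # fall back to the second response information database

# Externally provided information collected over the communication network.
print(generate_second_response("good morning",
                               {"weather": "cloudy then rain"}))
```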
As explained above, the second response information is output by voice in succession after the voice output of the first response information, so it is preferable, for a unified feel in the response, that its content continue from, or elaborate on, the content of the first response information. To achieve this, the cloud server 30 may know in advance which first response information the dialogue device 10 outputs for which input voice, or the dialogue device 10 may notify the cloud server 30 when it acquires the first response information.
The voice data of the response content indicated by the second response information generated in the cloud server 30 is transmitted to the dialogue device 10 by the control unit 31 controlling the communication unit 33.
In the present embodiment, the cloud server 30 is configured to generate the second response information as voice data before transmitting it, so the load on the dialogue device 10 can be reduced. Because the dialogue device 10 and the cloud server 30 each have their own voice synthesis unit, the voice quality of the synthesized voices may differ; however, as described in Embodiment 2, any resulting sense of incongruity for the user can be eliminated by changing the character that appears on the dialogue device 10 between the first response process and the second response process. Alternatively, the cloud server 30 may be configured without the voice synthesis unit 36 and may transmit the second response information to the dialogue device 10 as text data; in that case, the second response information is generated as voice data by the voice synthesis unit 19 of the dialogue device 10.
The cloud server 30 may also allow recorded voices to be registered from an external device such as a smartphone or tablet. In that case, the response information generation unit 35 acquiring such a registered recorded voice as the second response information may also be included in the generation of the second response information. Since a recorded voice is already formed as voice data, when it is transmitted to the dialogue device 10 as-is, no speech synthesis processing is performed in the dialogue device 10. For example, when the voice "There is a cake in the refrigerator" is registered in the cloud server 30 from the user's mother's smartphone, the dialogue device 10 can give an advanced response to the user's input voice "I'm home": it outputs "Welcome back" using the first response information, and then, using the second response information, outputs "It's a message from your mother: 'There is a cake in the refrigerator.'"
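A sketch of this recorded-message scenario follows, under the assumption that registered recordings are keyed by the utterance that should trigger them; all names and the byte-string placeholder are hypothetical.

```python
# Recorded voices registered from an external device such as a smartphone,
# stored as ready-made audio data together with a preamble to synthesize.
RECORDED_MESSAGES = {
    "I'm home": ("It's a message from your mother.",
                 b"<recorded audio bytes>"),
}

def recorded_second_response(recognized_phrase):
    """Return (preamble to synthesize, recorded audio) or None."""
    entry = RECORDED_MESSAGES.get(recognized_phrase)
    if entry is None:
        return None
    preamble, audio = entry
    # The recorded audio is played back as-is, so the dialogue device
    # performs no speech synthesis on it.
    return preamble, audio
```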
The data storage unit 32 is a block that stores various information (data) used in the cloud server 30. The data storage unit 32 includes a second response information storage unit 321. The second response information storage unit 321 is a database in which second response information is registered in association with words and phrases, and it stores a larger amount of information than the first response information storage unit 141. The second response information may also be updated periodically.
In the dialogue system 100, even when the dialogue device 10 cannot perform speech recognition, or when the first response information has only simple response content, the configuration of the response information generation unit 35 and the second response information storage unit 321 of the cloud server 30 described above makes it possible to correctly recognize the input voice and to return a plurality of pieces of information.
The communication unit 33 is a block that communicates with the outside. In addition to the dialogue device 10, the communication unit 33 is connected via the communication network to external information providing servers (not shown) and to external devices such as smartphones and tablets. The number of devices connected to the cloud server 30 is not limited.
(Processing flow in the dialogue system)
Next, the flow of dialogue processing in the dialogue system 100 will be described with reference to FIG. 2.
When the dialogue device 10 receives the voice data (input voice) of an utterance from the user 2 (step A1), it transmits the received voice data to the cloud server 30 (step A2) and acquires the first response information (step A3). In step A3, speech recognition is performed on the input voice, and first response information indicating response content corresponding to the result of the speech recognition is acquired. Either step A2 or step A3 may be started first. The dialogue device 10 then outputs the response content indicated by the first response information by voice (step A4).
Meanwhile, when the cloud server 30 receives the voice data from the dialogue device 10 (step B1), it generates the second response information (step B2) and transmits the generated second response information to the dialogue device 10 (step B3).
Here, the dialogue device 10 outputs the response content indicated by the first response information by voice while it is receiving the second response information from the cloud server 30.
When the dialogue device 10 receives the second response information (step A5), it outputs the subsequent response content indicated by the second response information by voice (step A6). With this, the dialogue processing in the dialogue system 100 ends.
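The exchange can be summarized as a single device-side function whose comments carry the step labels used above; the device object and its methods are hypothetical stand-ins, and only the ordering of the steps follows the text.

```python
def dialogue_exchange(input_voice, device):
    """One exchange in the dialogue system 100, device side (A1-A6).

    Steps B1-B3 run on the cloud server while A3/A4 execute; the
    input_voice argument corresponds to the utterance received in A1.
    """
    device.send_to_cloud(input_voice)                   # A2 -> B1
    first = device.acquire_first_response(input_voice)  # A3 (A2/A3 order free)
    device.speak(first)                                 # A4, during B2/B3
    second = device.receive_from_cloud()                # A5, after B3
    device.speak(second)                                # A6, in succession
```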
As described above, in the dialogue system 100, the dialogue device 10 outputs by voice the response content indicated by the first response information associated with the result of speech recognition by its own voice recognition unit 16, and then, in succession, outputs by voice the response content indicated by the second response information associated with the result of speech recognition by the cloud server 30.
Because the first response information corresponds to the result of speech recognition in the dialogue device 10 itself, it can be output from the dialogue device 10 earlier than the second response information, which is received through communication with the cloud server 30. Moreover, since the cloud server 30 is capable of more advanced processing than an individual dialogue device 10, it can perform more advanced speech recognition. The dialogue device 10 can therefore respond quickly to the input voice with the first response process, and can provide diverse or advanced information with the second response process that follows in succession.
Thus, with the dialogue system 100, the response to the user's input voice by speech recognition in the dialogue device 10 and the first response information is supplemented by speech recognition in the cloud server 30 and the second response information, so the system can respond with a plurality of pieces of information without improving the processing capability of the voice recognition unit 16 of the dialogue device 10 or expanding the capacity of the data storage unit 14. Accordingly, the dialogue system 100 can smoothly output a plurality of pieces of information by voice and can provide a comfortable dialogue environment without stressing the user.
Furthermore, in the dialogue system 100, the dialogue device 10 is configured so that the output control unit 18 performs the first response process while the communication unit 15 is receiving the second response information from the cloud server 30, so the dialogue can proceed without subjecting the user to the stress of waiting for the second response information to arrive.
[Embodiment 2]
A dialogue system according to another embodiment of the present invention will be described with reference to FIGS. 3 and 4. For convenience of explanation, members having the same functions as those described in Embodiment 1 are given the same reference signs, and their description is omitted.
As shown in FIG. 3, the dialogue system 100a of the present embodiment includes a dialogue device 10a and a cloud server (server device) 30a. In the present embodiment, the dialogue device 10a is described as a dialogue robot that has a head and a body and can express facial expressions as shown in FIG. 4.
In addition to the configuration of the dialogue device 10 of Embodiment 1, the dialogue device 10a includes a display unit 21 and a motion unit 22. The display unit 21 is a block that displays images of the dialogue robot's facial expressions; in the present embodiment it performs display by rear projection, but this is not a limitation. The motion unit 22 is a block that executes the movements of the dialogue device 10a. As described below, the motion unit 22 executes a movement that rotates the dialogue device 10a at the time of character switching, which occurs between the first response process and the second response process; as shown in (a) of FIG. 4, the head of the dialogue robot serving as the dialogue device 10a rotates in the horizontal direction. The motion unit 22 may also be configured to move the head, the body, or arms attached to the body of the dialogue robot in various directions. At the time of character switching, the motion unit 22 may perform movements other than rotation, for example translation, a change of direction, or vibration. Here, "movement" means that at least a part of the dialogue device performs a physical motion.
The dialogue device 10a also includes, in its data storage unit 14a, a character storage unit (character feature storage unit) 142 that stores image feature values and voice feature values for a plurality of characters. In the dialogue device 10a, the control unit 13a further functions as a character switching unit 20. The character switching unit 20 is a block that, at the time of character switching, controls at least one of image display, voice output, and movement in the dialogue device 10a to perform an effect process expressing that a different character has appeared on the dialogue device 10a. The effect process is described below with concrete examples.
In the dialogue device 10a, at the time of the first response process and at the time of the second response process, the output control unit 18a selects one of the plurality of characters, causes the display unit 21 to display a character image determined by the selected character's image feature values obtained by referring to the character storage unit 142, and causes the voice output unit 12 to output a character voice determined by the selected character's voice feature values obtained by referring to the character storage unit 142. In this way, by changing the feature values of the displayed image and of the voice quality of the output voice between the first response process and the second response process, the dialogue device 10a can make different characters appear. The movement may also be changed. The characters are, for example, a child, a father, a mother, a teacher, a newscaster, and so on.
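One way to organize the character storage unit 142 and to apply a selected character during a response process is sketched below; the particular feature values (an image file name and a voice pitch factor) are simplified stand-ins chosen for illustration.

```python
# Hypothetical contents of the character storage unit 142: image feature
# values and voice feature values for each character.
CHARACTERS = {
    "child":      {"image": "child.png",      "voice_pitch": 1.3},
    "father":     {"image": "father.png",     "voice_pitch": 0.8},
    "newscaster": {"image": "newscaster.png", "voice_pitch": 1.0},
}

def apply_character(name, display, synthesize, text):
    """Display the character's image and speak with its voice features."""
    features = CHARACTERS[name]
    display(features["image"])                        # display unit 21
    synthesize(text, pitch=features["voice_pitch"])   # voice output unit 12

# First response process with the default character, second with another:
# apply_character("child", display, synthesize, "Hold on a moment.")
# apply_character("newscaster", display, synthesize, "Here is today's news.")
```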
Furthermore, in the dialogue system 100a, the cloud server 30a generates the second response information so that it also includes character designation information designating the character to be used when the dialogue device 10a outputs the second response information by voice. The character designation information designates one of the characters that can appear on the dialogue device 10a, and may, for example, correspond to the content to be output by voice as the second response information. As a concrete example, it may be information designating the father character if the content concerns study, the mother character if it concerns daily life, or the weathercaster character if it concerns the weather. These are merely examples and are not limitations.
Next, the effect process at the time of character switching will be described with concrete examples. The output control unit 18a selects a default character during the first response process, and selects the character designated by the character designation information during the second response process. When the output control unit 18a selects a character for the second response process that differs from the one used in the first response process, the character switching unit 20 performs, at the time of character switching, the following processing as the effect process expressing that a different character has appeared on the dialogue device 10a.
As shown in (a) of FIG. 4, with the character image of the first character (the character selected during the first response process) displayed, the head of the dialogue device 10a starts rotating. The character switching unit 20 then controls the display unit 21 and the motion unit 22 so that, when the rotation of the head ends, the character image of the second character (the character selected during the second response process) is displayed in place of the character image of the first character. At this time, while the first character is still shown, a voice calling the second character (for example, "Mr. XX in charge of the news!") may be output. During the rotation, the footsteps of the first character walking away and of the second character approaching may also be output.
Alternatively, the character switching unit 20 may perform the following processing as the effect process. As shown in (b) or (c) of FIG. 4, the display unit 21 is controlled so that the display gradually switches from the character image of the first character to the character image of the second character selected during the second response process. In this effect process, the dialogue device 10a itself does not rotate; the display merely makes it appear to rotate. In this effect process as well, effects by voice output may be performed in the same way as above.
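The two effect variants might be sequenced as follows; the display, motion, and audio primitives are assumed, and the timing is simplified.

```python
def switch_with_rotation(robot, first_char, second_char):
    """Variant (a): physically rotate the head, then swap the image."""
    robot.show_image(first_char)                 # first character still shown
    robot.play_voice("Calling the newscaster!")  # optional calling voice
    robot.rotate_head(degrees=180)               # motion unit 22
    robot.play_sound("footsteps.wav")            # receding / approaching steps
    robot.show_image(second_char)                # second character appears

def switch_with_crossfade(robot, first_char, second_char, steps=10):
    """Variants (b)/(c): the display alone makes it appear to rotate."""
    for i in range(steps + 1):
        robot.show_blended(first_char, second_char, ratio=i / steps)
```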
When the dialogue device 10a fails to receive the second response information correctly, the output control unit 18a does not switch the character between the first response process and the second response process. In this case, a default voice output such as "I'm busy right now, ask me again later" or "The newscaster is off today" may be made.
(Processing flow in the dialogue system)
The flow of dialogue processing in the dialogue system 100a is basically the same as the flow of processing in the dialogue system 100 of Embodiment 1 shown in FIG. 2. In the dialogue system 100a, however, in step B2 the second response information is generated so as to also include the character designation information designating the character to be used when the second response information is output by voice. Then, in step A4, the first character is made to appear on the dialogue device 10a, and in step A6 the character switching unit 20 performs the above effect process and makes the second character designated by the character designation information appear on the dialogue device 10a. Here, if the character switching unit 20 performs the above effect process after, for example, the voice output "Hold on a moment", it can convey to the user in an easily understandable way that the character is about to change.
In the above description, the second response information includes information designating one of the plurality of characters, and the dialogue device 10a receives it; however, the dialogue device 10a may instead be configured as follows. The output control unit 18 may be configured to determine, from the result of recognition of the input voice by the voice recognition unit 16, which one of the plurality of characters has been designated, and to select that designated character at the time of the second response process.
In this case, the output control unit 18 may determine the designated character from information in the user's input voice that designates the character itself, if such information is included, or may infer the character by analogy from metadata in the dialogue with the user. To explain with concrete examples: in the former case, if the user's input voice contains a command such as "Teacher", "Call the teacher", or "Bring out the teacher", the "teacher" character is selected; in the latter case, if the dialogue with the user concerns study, the "teacher" character is selected.
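The two ways of identifying the designated character, an explicit command versus inference from the topic of the dialogue, might be realized as below; both keyword lists are invented for illustration.

```python
EXPLICIT_COMMANDS = {
    "call the teacher": "teacher",
    "bring out the teacher": "teacher",
    "teacher": "teacher",
}
TOPIC_HINTS = {"study": "teacher", "homework": "teacher", "news": "newscaster"}

def designated_character(recognized_text):
    """Return the designated character's name, or None."""
    for command, character in EXPLICIT_COMMANDS.items():
        if command in recognized_text:     # the character itself is named
            return character
    for topic, character in TOPIC_HINTS.items():
        if topic in recognized_text:       # inferred from dialogue metadata
            return character
    return None

print(designated_character("can you help with my homework"))  # -> "teacher"
```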
The above determination need not be performed by the output control unit 18; a separate block that performs the determination based on the result of speech recognition by the voice recognition unit 16 may be provided.
By selecting the character from the content of the dialogue with the user in this way, the second response information can be provided using a character that reflects the user's intention, and the dialogue can be made more entertaining.
Furthermore, the output control unit 18 may select one of the plurality of characters when the dialogue device 10a starts up or returns from a sleep state, cause the display unit 21 to display the image of the selected character, and cause the voice output unit 12 to output the voice of the selected character.
At startup or on return from the sleep state, the user who wants to converse is kept waiting, which can cause stress. By performing the display and voice output described above, however, a character can be made to appear on the dialogue device, filling the gap and reducing the user's stress. If the character image and voice shown when the dialogue device 10a starts up or returns from the sleep state indicate, for example, that the dialogue robot has awakened from sleep, it can be conveyed to the user in an easily understandable way that the dialogue device 10a is in the middle of starting up.
[Embodiment 3]
Hereinafter, a dialogue system according to still another embodiment of the present invention will be described with reference to FIGS. 5 and 6. For convenience of explanation, members having the same functions as those described in Embodiment 1 or 2 are given the same reference signs, and their description is omitted.
As shown in FIG. 6, the dialogue system 100b of the present embodiment includes a dialogue device 10b and a cloud server (server device) 30b. The dialogue device 10b is described below as being a dialogue robot.
As shown in FIG. 5, in addition to the configuration of the dialogue device 10 of Embodiment 1, the dialogue device 10b includes a display unit 21 and a motion unit 22, like the dialogue device 10a of Embodiment 2. The dialogue device 10b also has, for each home appliance, a home appliance operation mode in which that appliance can be operated; as shown in FIG. 6, it is provided so that it can operate the home appliances in the user's home 40 by infrared communication, wireless LAN communication, or the like from the communication unit 15. The home appliances are, for example, air conditioners, washing machines, refrigerators, cooking appliances, lighting devices, water heaters, photographic equipment, various AV (Audio-Visual) devices, and various household robots (for example, cleaning robots, housework support robots, animal-type robots, and the like). In the present embodiment, an air conditioner 50-1 and a washing machine 50-2 are used as examples of the home appliances.
The dialogue device 10b can operate the air conditioner 50-1 and the washing machine 50-2. The control unit 13b of the dialogue device 10b functions as a mode setting unit 23 that sets the dialogue device 10b to a home appliance operation mode. The mode setting unit 23 determines the home appliance to be operated from the input voice input from the voice input unit 11, and sets the dialogue device 10b to the home appliance operation mode of the determined appliance. Thus, when the dialogue device 10b infers from the dialogue with the user that the user wants to operate the air conditioner 50-1, it can set itself to the home appliance operation mode for operating the air conditioner 50-1 and perform the operation.
When determining the home appliance to be operated from the input voice, the determination may be made from information in the input voice that designates the target appliance, if such information is included, or the target appliance may be inferred by analogy from metadata in the dialogue with the user. To explain with concrete examples: in the former case, if the user's input voice contains a command such as "Turn on the air conditioner" or "Air conditioner ON", the target appliance is determined to be the air conditioner 50-1; in the latter case, if the input voice contains the metadata "hot", the target appliance is determined to be the air conditioner 50-1.
Here, before the air conditioner 50-1 is operated from the dialogue device 10b, a voice confirming execution of the operation, for example "Shall I turn on the air conditioner?", is output from the dialogue device 10b, and the operation is executed when a voice permitting execution, for example "Turn it on" or "OK", is input by the user. Obtaining the user's confirmation in this way before executing an operation on a home appliance is preferable for ensuring safety.
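A sketch of the mode setting decision together with the safety confirmation described above follows; the appliance names, keywords, and confirmation phrases are all illustrative assumptions.

```python
APPLIANCE_COMMANDS = {
    "turn on the air conditioner": "air_conditioner",
    "air conditioner on": "air_conditioner",
}
APPLIANCE_HINTS = {"hot": "air_conditioner", "laundry": "washing_machine"}

def operate_appliance(utterance, ask_user, send_command):
    """Determine the target appliance, confirm with the user, then operate."""
    target = APPLIANCE_COMMANDS.get(utterance)   # explicit command
    if target is None:
        for hint, appliance in APPLIANCE_HINTS.items():
            if hint in utterance:                # inferred, e.g. "it's hot"
                target = appliance
                break
    if target is None:
        return  # no operation target; stay in the current mode
    # Confirm before operating, as recommended above for safety.
    question = "Shall I turn on the " + target.replace("_", " ") + "?"
    if ask_user(question) in ("OK", "Turn it on"):
        send_command(target, "power_on")         # infrared / wireless LAN
```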
The data storage unit 14b of the dialogue device 10b includes a mode information storage unit 143, which stores, for each home appliance, information for setting the dialogue device 10b so that the appliance can be operated. The mode information storage unit 143 also stores, for each home appliance, the image feature values and voice feature values of a character associated with that appliance.
The character associated with a home appliance appears when the dialogue device 10b is set to that appliance's home appliance operation mode. As in Embodiment 2, the appearance of the character can be brought about by changing the feature values of the image displayed on the display unit 21 and of the voice quality of the voice output from the voice output unit 12. The movement may also be changed.
As a concrete example, when the dialogue device 10b is set to the operation mode of the air conditioner 50-1, the character associated with the air conditioner appears on the dialogue device 10b; for example, an air conditioner mark is displayed on the forehead or belly (a part of the display unit) of the dialogue robot. Similarly, when the dialogue device 10b is set to the operation mode of the washing machine 50-2, the character associated with the washing machine appears on the dialogue device 10b.
Alternatively, the user may be able to register associations between home appliances and characters in the dialogue device 10b, such as a rabbit character for the air conditioner and a raccoon character for the washing machine. These are merely examples.
When a character associated with a home appliance appears on the dialogue device 10b in this way, the user can be notified of which appliance's home appliance operation mode the dialogue device 10b is in, that is, which appliance is the operation target.
Further, for example, when the character associated with the air conditioner 50-1 has appeared on the dialogue device 10b, the dialogue device 10b may be configured so that, in response to the voice input "Good night", it outputs the voice "Good night" and performs an operation that turns off the power of the air conditioner 50-1. When the character associated with the washing machine 50-2 has appeared, the dialogue device 10b may be configured so that, in response to the voice input "Good morning", it outputs the voice "Good morning" and performs an operation that turns on the power of the washing machine 50-2. In this way, each character associated with a home appliance may have different functions.
When the dialogue device 10b also has a function as a television device, it may be configured to show broadcast programs on the display unit 21 when set to the home appliance operation mode for operating the television.
When the air conditioner 50-1 and the washing machine 50-2 are operated from the dialogue device 10b, their positions may be detected using infrared rays, or the dialogue device 10b may be provided with a camera and detect the positions of the air conditioner 50-1 and the washing machine 50-2 from information obtained from the camera.
The configuration of the cloud server 30b is the same as that of the cloud server 30, so its description is omitted. When the cloud server is communicatively connected to the air conditioner 50-1 and the washing machine 50-2 and collects state information indicating their states, it may generate the second response information based on that state information. For example, it may acquire the state information of the washing machine 50-2 and generate second response information that outputs the voice "The washing machine says its work is done".
[Embodiment 4]
The dialogue devices 10, 10a, and 10b and the cloud servers 30, 30a, and 30b described in Embodiments 1 to 3 may each be realized by a logic circuit (hardware) formed in an integrated circuit (IC chip) or the like, or may be realized by software using a CPU (Central Processing Unit).
 In the latter case, the dialogue devices 10, 10a, 10b and the cloud servers 30, 30a, 30b each include a CPU that executes the instructions of a program, which is software realizing each function, a ROM (Read Only Memory) or storage device (referred to as a "recording medium") on which the program and various data are recorded so as to be readable by a computer (or CPU), and a RAM (Random Access Memory) into which the program is loaded. The object of the present invention is achieved when the computer (or CPU) reads the program from the recording medium and executes it. As the recording medium, a "non-transitory tangible medium" such as a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit can be used. The program may also be supplied to the computer via any transmission medium capable of transmitting it, such as a communication network or a broadcast wave. The present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the program is embodied by electronic transmission.
 The present invention is not limited to the embodiments described above, and various modifications are possible. Embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention. Furthermore, new technical features can be formed by combining the technical means disclosed in each embodiment.
 [Summary]
 A dialogue device (10) according to Aspect 1 of the present invention includes: a voice recognition unit (16) that performs voice recognition on input voice; a response information storage unit (first response information storage unit 141) that stores first response information indicating response content corresponding to the result of voice recognition by the voice recognition unit (16); a communication unit (15) that transmits the input voice to a server device (cloud server 30) and receives second response information indicating response content corresponding to the result of voice recognition of the input voice by the server device; and an output control unit (18) that, for the input voice, performs a first response process of outputting by voice the response content indicated by the first response information obtained by referring to the response information storage unit, and then continuously performs a second response process of outputting by voice the response content indicated by the second response information.
 According to the above configuration, after the response content indicated by the first response information, which is associated with the result of voice recognition by the device's own voice recognition unit, is output by voice, the response content indicated by the second response information, which is associated with the result of voice recognition by the server device, is output by voice in succession.
 Since the first response information corresponds to the result of voice recognition performed on the device itself, it can generally be output from the dialogue device earlier than the second response information, which is received via communication with the server device. Also, since a server device is generally capable of more advanced processing than an individual dialogue device, it can perform more sophisticated voice recognition. Therefore, with the above configuration, the first response process enables a quick response to the input voice, and the second response process that follows it can provide diverse or advanced information.
 According to the above configuration, the input voice is handled not only by the device's own voice recognition and the response based on the first response information, but is also supplemented by the server device's voice recognition and the response based on the second response information. It is therefore possible to respond with multiple pieces of information without improving the processing capability of the dialogue device's voice recognition unit or expanding the capacity of the response information storage unit. In this way, multiple pieces of information can be output smoothly by voice, and a comfortable dialogue environment can be provided without stressing the user.
 Furthermore, if the device is configured to perform the first response process while receiving the second response information, the dialogue can proceed without subjecting the user to the stress of waiting for the second response information to arrive.
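 As a minimal sketch of this two-stage flow (assuming, hypothetically, a local phrase table as the first response information and a slower server round trip for the second; all names are illustrative, not taken from the publication), the following Python fragment starts the server request first so that recognition proceeds while the quick local reply is spoken.

import concurrent.futures
import time

# First response information: a small local phrase table.
LOCAL_RESPONSES = {"good morning": "Good morning!"}

def server_recognize(audio_text: str) -> str:
    """Stand-in for the cloud server: slower, but a richer response."""
    time.sleep(1.0)  # models network and recognition latency
    return f"By the way, since you said '{audio_text}', here is today's weather."

def respond(audio_text: str, speak) -> None:
    with concurrent.futures.ThreadPoolExecutor() as pool:
        # Send the input to the server first, so recognition runs
        # while the first response is being spoken.
        future = pool.submit(server_recognize, audio_text)
        first = LOCAL_RESPONSES.get(audio_text)
        if first:
            speak(first)        # first response process: quick local reply
        speak(future.result())  # second response process follows continuously

respond("good morning", speak=print)

 Run as written, this prints the local reply immediately and the server's reply about a second later, which is the latency-hiding behavior described above.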
 A dialogue device according to Aspect 2 of the present invention, in Aspect 1 above, further includes a character feature storage unit (character storage unit 142) that stores image features and voice features of a plurality of characters. During each of the first response process and the second response process, the output control unit (18) selects one of the plurality of characters, displays a character image determined by the image features of the selected character obtained by referring to the character feature storage unit, and outputs by voice a character voice determined by the voice features of the selected character obtained by referring to the character feature storage unit. The device further includes a character switching unit (20) that, when the output control unit (18) selects a different character for the second response process than for the first response process, controls at least one of image display, voice output, and movement of the device between the first response process and the second response process to perform an effect representing the appearance of a different character.
 According to the above configuration, a character is selected during each of the first response process and the second response process, the character image of the selected character is displayed, and the character voice of the selected character is output, so that the character appears on the dialogue device. When a character different from that of the first response process is selected for the second response process, an effect representing the appearance of the different character is performed. This effect can liven up the occasion of a new character's appearance without destroying the image of the character that was present before the switch.
 In addition, since the effect produced by the above control can attract the user's interest, the user's stress caused by waiting can be reduced even when, for example, a long time elapses between the first response process and the second response process.
 Here, the character switching unit may perform the effect representing the appearance of a different character by (a) controlling the display so that it gradually switches from the character image of the first character, which is the character selected during the first response process, to the character image of the second character, which is the character selected during the second response process, or (b) controlling image display together with rotation, movement, turning, vibration, or the like of the device, so that the device starts rotating while the character image of the first character is displayed and, when the rotation ends, the character image of the second character is displayed in place of that of the first character.
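 The two effects (a) and (b) might be sketched as follows; this is a sketch only, and the display and motor objects are hypothetical placeholders standing in for whatever drawing and drive hardware the device actually has (the publication does not define such an interface):

def crossfade_characters(display, first_img, second_img, steps: int = 10) -> None:
    """(a) Gradually switch the display from the first to the second character."""
    for i in range(steps + 1):
        alpha = i / steps  # 0.0 shows only the first character, 1.0 only the second
        display.blend(first_img, second_img, alpha)

def rotate_and_swap(display, motor, first_img, second_img) -> None:
    """(b) Spin the device, then reveal the second character when rotation ends."""
    display.show(first_img)
    motor.rotate(degrees=360)  # the effect itself: the device turns once
    display.show(second_img)   # a different character has now "appeared"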
 In a dialogue device according to Aspect 3 of the present invention, in Aspect 2 above, the second response information includes designation information that designates one of the plurality of characters, and during the second response process, the output control unit (18) selects the character designated by the designation information included in the second response information.
 According to the above configuration, the second response information includes character designation information. Therefore, if a character suited to the content of the second response information is designated in advance, the designated character can be made to appear during the second response process, and the second response information can be provided to the user persuasively or entertainingly.
 In a dialogue device according to Aspect 4 of the present invention, in Aspect 2 above, the output control unit (18) determines which one of the plurality of characters has been designated from the result of voice recognition of the input voice by the voice recognition unit (16), and selects the designated character during the second response process.
 According to the above configuration, a character is designated from the user's input voice, and the designated character can be made to appear on the dialogue device during the second response process. Here, the designated character may be determined from information that directly designates the character, if such information is contained in the user's input voice, or it may be inferred by analogy from metadata in the dialogue with the user. With this configuration, the second response information can be provided using a character that reflects the user's intention, and the dialogue can be made more entertaining.
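 Aspects 3 and 4 can be read as two selection paths for the same decision. A minimal sketch, assuming a hypothetical dictionary form for the second response information and simple keyword matching for the inferred case (neither is specified in the publication):

from typing import Dict

KNOWN_CHARACTERS = {"rabbit", "raccoon"}

def select_character(second_response: Dict[str, object], recognized_text: str) -> str:
    # Aspect 3: explicit designation information inside the second response information.
    designated = second_response.get("character")
    if isinstance(designated, str) and designated in KNOWN_CHARACTERS:
        return designated
    # Aspect 4: otherwise infer the designation from the user's input voice.
    for name in KNOWN_CHARACTERS:
        if name in recognized_text:
            return name
    return "default"

print(select_character({"character": "raccoon"}, "hello"))  # -> raccoon
print(select_character({}, "call the rabbit please"))       # -> rabbit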
 In a dialogue device according to Aspect 5 of the present invention, in Aspect 2 above, the output control unit (18) selects one of the plurality of characters when the device is started or when it returns from a sleep state, displays a character image determined by the image features of the selected character obtained by referring to the character feature storage unit, and outputs by voice a character voice determined by the voice features of the selected character obtained by referring to the character feature storage unit.
 According to the above configuration, a character is selected when the device is started or when it returns from a sleep state, the character image of the selected character is displayed, and the character voice of the selected character is output, so that the character appears on the dialogue device. The appearance of such a character can attract the user's interest and reduce stress during the waiting time at startup or on return from the sleep state.
 A dialogue device according to Aspect 6 of the present invention, in any one of Aspects 1 to 5 above, has, for each home appliance, an appliance operation mode in which that appliance can be operated, and further includes a mode setting unit (23) that determines the appliance to be operated from the result of voice recognition of the input voice by the voice recognition unit (16) and sets the device to the appliance operation mode of the determined appliance.
 According to the above configuration, the dialogue device can be set to an appliance operation mode in which a home appliance can be operated, and the appliance to be operated can be determined from the input voice. Therefore, when the dialogue device infers from the dialogue with the user that the user wants to operate an appliance, it can set itself to the appliance operation mode for that appliance and perform the operation.
 Here, if, for example, the dialogue device includes a display unit and is configured to display the appliance to be operated, or a character representing it, when the appliance operation mode is set, the user can be notified of the target appliance in an easily understandable way.
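 A minimal sketch of the mode setting unit, assuming keyword spotting over the recognized text and a hypothetical device object with a mode attribute and a character display hook (none of which are defined in the publication):

from typing import Optional

APPLIANCE_KEYWORDS = {
    "air conditioner": "aircon_operation_mode",
    "washing machine": "washer_operation_mode",
    "television": "tv_operation_mode",
}

def set_operation_mode(recognized_text: str, device) -> Optional[str]:
    """Determine the target appliance from the speech and switch the device's mode."""
    for keyword, mode in APPLIANCE_KEYWORDS.items():
        if keyword in recognized_text:
            device.mode = mode               # enter that appliance's operation mode
            device.show_character_for(mode)  # e.g. display the appliance's character
            return mode
    return None  # no appliance mentioned; keep the current mode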
 A dialogue system according to Aspect 7 of the present invention is configured by connecting the dialogue device according to any one of Aspects 1 to 6 above and a server device having a voice recognition function via a communication network.
 According to the above dialogue system, a smooth response to the user's input voice is possible, and a comfortable dialogue environment can be provided without stressing the user.
 A server device according to Aspect 8 of the present invention is the server device provided in the dialogue system of Aspect 7 above.
 By using the above server device, the dialogue system of Aspect 7 can be constructed.
 The dialogue device, server device, or dialogue system according to each aspect of the present invention may be realized by a computer. In this case, a program that realizes the dialogue device, server device, or dialogue system on the computer by causing the computer to operate as each unit included in them (the voice recognition unit, output control unit, character switching unit, and mode setting unit), and a computer-readable recording medium on which the program is recorded, also fall within the scope of the present invention.
 The present invention is applicable to dialogue devices and the like that are connected to a communication network and recognize and respond to a user's voice.
 [Reference Signs List]
 2 User
 10, 10a, 10b Dialogue device
 11 Voice input unit
 12 Voice output unit
 13, 13a, 13b Control unit
 14, 14a, 14b Data storage unit
 15 Communication unit
 16 Voice recognition unit
 17 Response information acquisition unit
 18, 18a Output control unit
 20 Character switching unit
 21 Display unit
 22 Operation unit
 23 Mode setting unit
 30, 30a, 30b Cloud server (server device)
 35 Response information generation unit
 40 User's house
 50-1 Air conditioner (home appliance)
 50-2 Washing machine (home appliance)
 100, 100a, 100b Dialogue system
 141 First response information storage unit (response information storage unit)
 142 Character storage unit
 143 Mode information storage unit

Claims (5)

  1. A dialogue device comprising:
     a voice recognition unit that performs voice recognition on input voice;
     a response information storage unit that stores first response information indicating response content corresponding to a result of voice recognition by the voice recognition unit;
     a communication unit that transmits the input voice to a server device and receives second response information indicating response content corresponding to a result of voice recognition of the input voice by the server device; and
     an output control unit that, for the input voice, performs a first response process of outputting by voice the response content indicated by the first response information obtained by referring to the response information storage unit, and then continuously performs a second response process of outputting by voice the response content indicated by the second response information.
  2. The dialogue device according to claim 1, further comprising:
     a character feature storage unit that stores image features and voice features of a plurality of characters,
     wherein the output control unit, during each of the first response process and the second response process, selects one of the plurality of characters, displays a character image determined by the image features of the selected character obtained by referring to the character feature storage unit, and outputs by voice a character voice determined by the voice features of the selected character obtained by referring to the character feature storage unit; and
     a character switching unit that, when the output control unit selects a different character for the second response process than for the first response process, controls at least one of image display, voice output, and movement of the device between the first response process and the second response process to perform an effect representing the appearance of a different character.
  3. The dialogue device according to claim 2, wherein the second response information includes designation information designating one of the plurality of characters, and
     the output control unit selects, during the second response process, the character designated by the designation information included in the second response information.
  4. The dialogue device according to claim 2, wherein the output control unit determines which one of the plurality of characters has been designated from the result of voice recognition of the input voice by the voice recognition unit, and selects the designated character during the second response process.
  5. The dialogue device according to any one of claims 1 to 4, which has, for each home appliance, an appliance operation mode in which that appliance can be operated, the dialogue device further comprising:
     a mode setting unit that determines the appliance to be operated from the result of voice recognition of the input voice by the voice recognition unit, and sets the device to the appliance operation mode of the determined appliance.
PCT/JP2015/076081 2014-09-30 2015-09-15 Conversation device WO2016052164A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2014-202218 2014-09-30
JP2014202218A JP6448971B2 (en) 2014-09-30 2014-09-30 Interactive device

Publications (1)

Publication Number Publication Date
WO2016052164A1 true WO2016052164A1 (en) 2016-04-07

Family

ID=55630206

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2015/076081 WO2016052164A1 (en) 2014-09-30 2015-09-15 Conversation device

Country Status (2)

Country Link
JP (1) JP6448971B2 (en)
WO (1) WO2016052164A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6680125B2 (en) * 2016-07-25 2020-04-15 トヨタ自動車株式会社 Robot and voice interaction method
JP6614080B2 (en) * 2016-09-16 2019-12-04 トヨタ自動車株式会社 Spoken dialogue system and spoken dialogue method
CN117130574A (en) * 2017-05-16 2023-11-28 苹果公司 Far field extension of digital assistant services
JP2019016061A (en) * 2017-07-04 2019-01-31 株式会社Nttドコモ Information processing unit and program
JP7023823B2 (en) * 2018-11-16 2022-02-22 アルパイン株式会社 In-vehicle device and voice recognition method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2002108380A (en) * 2000-10-02 2002-04-10 Canon Inc Information presenting device and its control method, and computer-readable memory
JP2003131695A (en) * 2001-10-25 2003-05-09 Hitachi Ltd Voice recognition equipment, and unit and method for voice recognition equipment control
WO2013190963A1 (en) * 2012-06-18 2013-12-27 エイディシーテクノロジー株式会社 Voice response device
JP2014062944A (en) * 2012-09-20 2014-04-10 Sharp Corp Information processing devices
JP2014182307A (en) * 2013-03-19 2014-09-29 Sharp Corp Voice recognition system and speech system
JP2014191030A (en) * 2013-03-26 2014-10-06 Fuji Soft Inc Voice recognition terminal and voice recognition method using computer terminal

Cited By (96)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11671920B2 (en) 2007-04-03 2023-06-06 Apple Inc. Method and system for operating a multifunction portable electronic device using voice-activation
US11900936B2 (en) 2008-10-02 2024-02-13 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US11348582B2 (en) 2008-10-02 2022-05-31 Apple Inc. Electronic devices with voice command and contextual data processing capabilities
US11423886B2 (en) 2010-01-18 2022-08-23 Apple Inc. Task flow identification based on user intent
US11120372B2 (en) 2011-06-03 2021-09-14 Apple Inc. Performing actions associated with task items that represent tasks to perform
US11321116B2 (en) 2012-05-15 2022-05-03 Apple Inc. Systems and methods for integrating third party services with a digital assistant
US11269678B2 (en) 2012-05-15 2022-03-08 Apple Inc. Systems and methods for integrating third party services with a digital assistant
US11862186B2 (en) 2013-02-07 2024-01-02 Apple Inc. Voice trigger for a digital assistant
US10978090B2 (en) 2013-02-07 2021-04-13 Apple Inc. Voice trigger for a digital assistant
US11557310B2 (en) 2013-02-07 2023-01-17 Apple Inc. Voice trigger for a digital assistant
US11636869B2 (en) 2013-02-07 2023-04-25 Apple Inc. Voice trigger for a digital assistant
US11388291B2 (en) 2013-03-14 2022-07-12 Apple Inc. System and method for processing voicemail
US11798547B2 (en) 2013-03-15 2023-10-24 Apple Inc. Voice activated device for use with a voice-based digital assistant
US11727219B2 (en) 2013-06-09 2023-08-15 Apple Inc. System and method for inferring user intent from speech inputs
US11257504B2 (en) 2014-05-30 2022-02-22 Apple Inc. Intelligent assistant for home automation
US11699448B2 (en) 2014-05-30 2023-07-11 Apple Inc. Intelligent assistant for home automation
US10878809B2 (en) 2014-05-30 2020-12-29 Apple Inc. Multi-command single utterance input method
US11670289B2 (en) 2014-05-30 2023-06-06 Apple Inc. Multi-command single utterance input method
US11133008B2 (en) 2014-05-30 2021-09-28 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US11810562B2 (en) 2014-05-30 2023-11-07 Apple Inc. Reducing the need for manual start/end-pointing and trigger phrases
US11516537B2 (en) 2014-06-30 2022-11-29 Apple Inc. Intelligent automated assistant for TV user interactions
US11838579B2 (en) 2014-06-30 2023-12-05 Apple Inc. Intelligent automated assistant for TV user interactions
US11842734B2 (en) 2015-03-08 2023-12-12 Apple Inc. Virtual assistant activation
US11087759B2 (en) 2015-03-08 2021-08-10 Apple Inc. Virtual assistant activation
US11468282B2 (en) 2015-05-15 2022-10-11 Apple Inc. Virtual assistant in a communication session
US11070949B2 (en) 2015-05-27 2021-07-20 Apple Inc. Systems and methods for proactively identifying and surfacing relevant content on an electronic device with a touch-sensitive display
US11947873B2 (en) 2015-06-29 2024-04-02 Apple Inc. Virtual assistant for media playback
US11550542B2 (en) 2015-09-08 2023-01-10 Apple Inc. Zero latency digital assistant
US11853536B2 (en) 2015-09-08 2023-12-26 Apple Inc. Intelligent automated assistant in a media environment
US11126400B2 (en) 2015-09-08 2021-09-21 Apple Inc. Zero latency digital assistant
US11954405B2 (en) 2015-09-08 2024-04-09 Apple Inc. Zero latency digital assistant
US11809483B2 (en) 2015-09-08 2023-11-07 Apple Inc. Intelligent automated assistant for media search and playback
US11500672B2 (en) 2015-09-08 2022-11-15 Apple Inc. Distributed personal assistant
US11809886B2 (en) 2015-11-06 2023-11-07 Apple Inc. Intelligent automated assistant in a messaging environment
US11526368B2 (en) 2015-11-06 2022-12-13 Apple Inc. Intelligent automated assistant in a messaging environment
US11886805B2 (en) 2015-11-09 2024-01-30 Apple Inc. Unconventional virtual assistant interactions
US11853647B2 (en) 2015-12-23 2023-12-26 Apple Inc. Proactive assistance based on dialog communication between devices
US11657820B2 (en) 2016-06-10 2023-05-23 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US11037565B2 (en) 2016-06-10 2021-06-15 Apple Inc. Intelligent digital assistant in a multi-tasking environment
US11809783B2 (en) 2016-06-11 2023-11-07 Apple Inc. Intelligent device arbitration and control
US11749275B2 (en) 2016-06-11 2023-09-05 Apple Inc. Application integration with a digital assistant
US11152002B2 (en) 2016-06-11 2021-10-19 Apple Inc. Application integration with a digital assistant
US11656884B2 (en) 2017-01-09 2023-05-23 Apple Inc. Application integration with a digital assistant
US11467802B2 (en) 2017-05-11 2022-10-11 Apple Inc. Maintaining privacy of personal information
US11599331B2 (en) 2017-05-11 2023-03-07 Apple Inc. Maintaining privacy of personal information
US11580990B2 (en) 2017-05-12 2023-02-14 Apple Inc. User-specific acoustic models
US11405466B2 (en) 2017-05-12 2022-08-02 Apple Inc. Synchronization and task delegation of a digital assistant
US11538469B2 (en) 2017-05-12 2022-12-27 Apple Inc. Low-latency intelligent automated assistant
US11380310B2 (en) 2017-05-12 2022-07-05 Apple Inc. Low-latency intelligent automated assistant
US11837237B2 (en) 2017-05-12 2023-12-05 Apple Inc. User-specific acoustic models
US11862151B2 (en) 2017-05-12 2024-01-02 Apple Inc. Low-latency intelligent automated assistant
US10748546B2 (en) 2017-05-16 2020-08-18 Apple Inc. Digital assistant services based on device capabilities
US11217255B2 (en) 2017-05-16 2022-01-04 Apple Inc. Far-field extension for digital assistant services
US11532306B2 (en) 2017-05-16 2022-12-20 Apple Inc. Detecting a trigger of a digital assistant
US11675829B2 (en) 2017-05-16 2023-06-13 Apple Inc. Intelligent automated assistant for media exploration
US11710482B2 (en) 2018-03-26 2023-07-25 Apple Inc. Natural assistant interaction
US11487364B2 (en) 2018-05-07 2022-11-01 Apple Inc. Raise to speak
US11907436B2 (en) 2018-05-07 2024-02-20 Apple Inc. Raise to speak
US11900923B2 (en) 2018-05-07 2024-02-13 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US11854539B2 (en) 2018-05-07 2023-12-26 Apple Inc. Intelligent automated assistant for delivering content from user experiences
US11169616B2 (en) 2018-05-07 2021-11-09 Apple Inc. Raise to speak
US10984798B2 (en) 2018-06-01 2021-04-20 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
US11630525B2 (en) 2018-06-01 2023-04-18 Apple Inc. Attention aware virtual assistant dismissal
US10720160B2 (en) 2018-06-01 2020-07-21 Apple Inc. Voice interaction at a primary device to access call functionality of a companion device
US11431642B2 (en) 2018-06-01 2022-08-30 Apple Inc. Variable latency device coordination
US11360577B2 (en) 2018-06-01 2022-06-14 Apple Inc. Attention aware virtual assistant dismissal
US11009970B2 (en) 2018-06-01 2021-05-18 Apple Inc. Attention aware virtual assistant dismissal
JP2020003081A (en) * 2018-06-25 2020-01-09 株式会社パロマ Control device for gas cooking stove, gas cooking stove, and instruction data generation program in control device for gas cooking stove
JP7162865B2 (en) 2018-06-25 2022-10-31 株式会社パロマ Control device for gas stove and gas stove system
US11893992B2 (en) 2018-09-28 2024-02-06 Apple Inc. Multi-modal inputs for voice commands
US11475898B2 (en) 2018-10-26 2022-10-18 Apple Inc. Low-latency multi-speaker speech recognition
CN109346083A (en) * 2018-11-28 2019-02-15 北京猎户星空科技有限公司 A kind of intelligent sound exchange method and device, relevant device and storage medium
US11783815B2 (en) 2019-03-18 2023-10-10 Apple Inc. Multimodality in digital assistant systems
US11475884B2 (en) 2019-05-06 2022-10-18 Apple Inc. Reducing digital assistant latency when a language is incorrectly determined
US11705130B2 (en) 2019-05-06 2023-07-18 Apple Inc. Spoken notifications
US11423908B2 (en) 2019-05-06 2022-08-23 Apple Inc. Interpreting spoken requests
US11675491B2 (en) 2019-05-06 2023-06-13 Apple Inc. User configurable task triggers
US11217251B2 (en) 2019-05-06 2022-01-04 Apple Inc. Spoken notifications
US11307752B2 (en) 2019-05-06 2022-04-19 Apple Inc. User configurable task triggers
US11140099B2 (en) 2019-05-21 2021-10-05 Apple Inc. Providing message response suggestions
US11888791B2 (en) 2019-05-21 2024-01-30 Apple Inc. Providing message response suggestions
US11289073B2 (en) 2019-05-31 2022-03-29 Apple Inc. Device text to speech
US11237797B2 (en) 2019-05-31 2022-02-01 Apple Inc. User activity shortcut suggestions
US11360739B2 (en) 2019-05-31 2022-06-14 Apple Inc. User activity shortcut suggestions
US11496600B2 (en) 2019-05-31 2022-11-08 Apple Inc. Remote execution of machine-learned models
US11657813B2 (en) 2019-05-31 2023-05-23 Apple Inc. Voice identification in digital assistant systems
US11360641B2 (en) 2019-06-01 2022-06-14 Apple Inc. Increasing the relevance of new available information
US11790914B2 (en) 2019-06-01 2023-10-17 Apple Inc. Methods and user interfaces for voice-based control of electronic devices
US11488406B2 (en) 2019-09-25 2022-11-01 Apple Inc. Text detection using global geometry estimators
US11914848B2 (en) 2020-05-11 2024-02-27 Apple Inc. Providing relevant data items based on context
US11765209B2 (en) 2020-05-11 2023-09-19 Apple Inc. Digital assistant hardware abstraction
US11924254B2 (en) 2020-05-11 2024-03-05 Apple Inc. Digital assistant hardware abstraction
US11755276B2 (en) 2020-05-12 2023-09-12 Apple Inc. Reducing description length based on confidence
US11838734B2 (en) 2020-07-20 2023-12-05 Apple Inc. Multi-device audio adjustment coordination
US11750962B2 (en) 2020-07-21 2023-09-05 Apple Inc. User identification using headphones
US11696060B2 (en) 2020-07-21 2023-07-04 Apple Inc. User identification using headphones

Also Published As

Publication number Publication date
JP6448971B2 (en) 2019-01-09
JP2016071247A (en) 2016-05-09

Similar Documents

Publication Publication Date Title
JP6448971B2 (en) Interactive device
CN106257355B (en) Equipment control method and controller
KR102306624B1 (en) Persistent companion device configuration and deployment platform
WO2016052018A1 (en) Home appliance management system, home appliance, remote control device, and robot
US20170206064A1 (en) Persistent companion device configuration and deployment platform
AU2017228574A1 (en) Apparatus and methods for providing a persistent companion device
US20210142796A1 (en) Information processing apparatus, information processing method, and program
JP2023015054A (en) Dynamic and/or context-specific hot word for calling automation assistant
KR102400398B1 (en) Animated Character Head Systems and Methods
WO2017141530A1 (en) Information processing device, information processing method and program
TW201408052A (en) Television device and method for displaying virtual on-screen interactive moderator of the television device
JP2022169645A (en) Device and program, or the like
JP7267411B2 (en) INTERACTIVE OBJECT DRIVING METHOD, DEVICE, ELECTRONIC DEVICE AND STORAGE MEDIUM
WO2018163646A1 (en) Dialogue method, dialogue system, dialogue device, and program
WO2016117514A1 (en) Robot control device and robot
WO2016052520A1 (en) Conversation device
KR102519599B1 (en) Multimodal based interaction robot, and control method for the same
JP2016206249A (en) Interactive device, interactive system, and control method for interactive device
CN116185191A (en) Server, display equipment and virtual digital human interaction method
JP7286303B2 (en) Conference support system and conference robot
JP7208361B2 (en) Communication robot and its control method, information processing server and information processing method
JP2020067877A (en) Interactive device and control program of interactive device
WO2018183812A1 (en) Persistent companion device configuration and deployment platform
CN110225380B (en) Display method and device of television desktop
WO2020153146A1 (en) Information processing device and information processing method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15846101

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15846101

Country of ref document: EP

Kind code of ref document: A1