WO2016052164A1 - Conversation Device - Google Patents
Conversation Device
- Publication number
- WO2016052164A1 (PCT/JP2015/076081)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- response
- voice
- character
- unit
- response information
- Prior art date
Classifications
- G—PHYSICS; G10—MUSICAL INSTRUMENTS; ACOUSTICS; G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/04—Details of speech synthesis systems, e.g. synthesiser structure or memory management
- G10L13/047—Architecture of speech synthesisers
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L15/28—Constructional details of speech recognition systems
- G10L15/30—Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
Definitions
- the present invention relates to an interactive apparatus and an interactive system that are connected to a communication network and recognize and respond to a user's voice.
- response information, that is, information related to a response based on the recognition result, is received from the server device and output.
- the server device can be used by a plurality of robots, it is more advantageous in terms of cost than increasing the processing capability of each interactive robot.
- the interactive robot responds by acquiring the response content from the server device
- the response timing is delayed compared with the case where the interactive robot alone recognizes and responds. For this reason, the user may feel stressed and find it difficult to talk.
- the present invention has been made in view of the above problems, and its object is to provide an interactive apparatus and an interactive system that can smoothly output multiple pieces of information as speech and provide a comfortable interactive environment without stressing the user.
- to solve the above problem, an interactive apparatus includes: a speech recognition unit that recognizes input speech; a response information storage unit that stores first response information indicating response content corresponding to a result of speech recognition by the speech recognition unit; a communication unit that transmits the input speech to a server device and receives second response information indicating response content according to a result of speech recognition of the input speech by the server device; and an output control unit that, after performing a first response process of outputting as speech the response content indicated by the first response information obtained by referring to the response information storage unit for the input speech, continuously performs a second response process of outputting as speech the response content indicated by the second response information.
- with this configuration, the response to the input speech by the device's own speech recognition and first response information can be supplemented by the speech recognition of the server device and the response based on the second response information. Therefore, it is possible to respond with multiple pieces of information without improving the processing capability of the voice recognition unit of the interactive apparatus or expanding the capacity of the response information storage unit.
- as a result, multiple pieces of information can be smoothly output as speech, and a comfortable interactive environment can be provided without causing the user stress.
- FIG. 1 is a diagram showing a configuration of a dialogue system 100 according to the present embodiment.
- the dialogue system 100 includes a dialogue device (self device) 10 and a cloud server (server device) 30, which are connected via a communication network.
- for example, the Internet can be used as this communication network. Alternatively, a telephone network, a mobile communication network, a CATV (Cable Television) communication network, a satellite communication network, or the like can be used.
- the dialogue device 10 and the cloud server 30 each have a voice recognition function, and the user can interact with the dialogue device 10 by voice using natural language.
- the dialogue apparatus 10 may be, for example, a dialogue robot, or may be a smartphone, a tablet terminal, a personal computer, a home appliance (home electronic device) or the like having a voice recognition function.
- in FIG. 1, only one interactive device 10 connected to the cloud server 30 is shown for simplicity of explanation, but in the interactive system 100 the number of interactive devices 10 connected to the cloud server 30 is not limited.
- the type of interactive device 10 connected to the cloud server 30 is also not limited; that is, different types of interactive devices 10, such as an interactive robot and a smartphone, may be connected to the cloud server 30.
- the dialogue device 10 is a device that performs voice recognition when a voice (voice signal) is inputted and performs a dialogue according to the recognition result.
- the dialogue apparatus 10 includes a voice input unit 11, a voice output unit 12, a control unit 13, a data storage unit 14, and a communication unit 15.
- the voice input unit 11 is a voice input device such as a microphone
- the voice output unit 12 is a voice output device such as a speaker.
- the control unit 13 is a block that controls the operation of each unit of the dialogue apparatus 10.
- the control unit 13 includes a computer device including an arithmetic processing unit such as a CPU (Central Processing Unit) or a dedicated processor.
- the control unit 13 reads out and executes a program for executing various controls in the interactive device 10 stored in the data storage unit 14, thereby controlling the operation of each unit of the interactive device 10 in an integrated manner.
- the control unit 13 has functions as a speech recognition unit 16, a response information acquisition unit 17, an output control unit 18, and a speech synthesis unit 19.
- the voice recognition unit 16 is a block that recognizes an input voice from the user. Specifically, the voice recognition unit 16 converts voice data input from the voice input unit 11 into text data, analyzes the text data, and extracts words and phrases. A known technique can be used for voice recognition processing.
- the response information acquisition unit 17 is a block that detects response information indicating response content according to the recognition result of the voice recognition unit 16 from a first response information storage unit (response information storage unit) 141 described below.
- the response information that the response information acquisition unit 17 acquires from the first response information storage unit 141 is referred to as first response information.
- the response information acquisition unit 17 refers to the first response information storage unit 141 and acquires the first response information corresponding to the words and phrases extracted by the voice recognition unit 16. If no information corresponding to the extracted word or phrase is registered in the first response information storage unit 141, or if the voice recognition unit 16 has failed in speech recognition, the response information acquisition unit 17 acquires default first response information. Specific examples of the default first response information include information yielding responses such as "Please wait a moment" or "Let me hear more". Note that the present invention is not limited to these.
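The lookup-with-fallback behavior described above can be sketched as follows. The phrase table, default response text, and function name are illustrative assumptions, not the actual contents of the first response information storage unit 141.

```python
from typing import Optional

# Illustrative stand-in for the first response information storage unit 141.
FIRST_RESPONSE_STORE = {
    "good morning": "Good morning!",
    "hello": "Hello!",
}

# Default first response used when recognition fails or no entry matches.
DEFAULT_FIRST_RESPONSE = "Please wait a moment."

def acquire_first_response(recognized_phrase: Optional[str]) -> str:
    """Return the first response for a recognized phrase, or the default.

    A value of None models a speech-recognition failure.
    """
    if recognized_phrase is None:
        return DEFAULT_FIRST_RESPONSE
    return FIRST_RESPONSE_STORE.get(recognized_phrase.lower(),
                                    DEFAULT_FIRST_RESPONSE)

print(acquire_first_response("Good morning"))  # registered phrase
print(acquire_first_response(None))            # recognition failure
```

In the apparatus, the default response serves to buy time while the cloud server prepares the second response.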
- the output control unit 18 is a block that performs audio output by causing the audio output unit 12 to output audio data.
- the output control unit 18 performs the second response process continuously after performing the first response process as a response to the input voice from the voice input unit 11.
- the first response process is a process of outputting the response content indicated by the first response information acquired by the response information acquisition unit 17 by voice.
- the second response process is a process of outputting as speech the response content indicated by the second response information received from the cloud server 30. The second response information will be described later.
- the speech synthesizer 19 is a block that generates speech data (speech synthesis).
- the voice synthesizer 19 generates voice data having response contents indicated by the first response information.
- the generated audio data is output via the audio output unit 12. If the first response information is already provided as voice data (a recorded voice), the voice synthesizer 19 does not perform synthesis.
- the data storage unit 14 includes a RAM (Random Access Memory), a ROM (Read Only Memory), an HDD (Hard Disk Drive), and the like, and is a block that stores various information (data) used in the interactive device 10. Further, the data storage unit 14 includes a first response information storage unit 141.
- the first response information storage unit 141 is a database in which first response information is registered in association with words and phrases.
- the first response information includes not only information corresponding to one word but also information corresponding to a combination of a plurality of words.
- a plurality of pieces of first response information may be registered for a certain word or phrase; in this case, which one is actually output as speech may be selected.
- the words, phrases, and first response information may all be stored as text data.
- Known techniques can be used to construct such a database and to obtain response information from the database.
- by referring to the first response information storage unit 141, the dialogue apparatus 10 can return a response to the user's utterance; that is, dialogue with the user becomes possible.
- the communication unit 15 is a block that performs communication with the outside.
- the communication unit 15 transmits the voice data to the cloud server 30.
- the communication unit 15 also receives from the cloud server 30 the second response information, which indicates response content according to the result of the speech recognition of the input voice by the cloud server 30, as described in detail later.
- the communication unit 15 may transmit the voice data input from the voice input unit 11 to the cloud server 30 as it is. Alternatively, the communication unit 15 may transmit to the cloud server 30 the text data generated by the voice recognition unit 16, or the words and phrases extracted from that text data.
- the output control unit 18 is configured to perform the first response processing while the communication unit 15 receives the second response information from the cloud server 30.
- the dialogue apparatus 10 may further include an imaging unit (camera).
- the dialogue apparatus 10 may be configured to analyze the user's facial expression and position from an image input from the imaging unit and to conduct the dialogue based on the analysis.
- for example, if the dialogue apparatus 10 is a robot and recognizes that the user's position is to the right when viewed from the front of the robot, it may actually turn its head to the right, or display its face turning toward the right, to indicate that it is facing the user, that is, that it is ready to respond.
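As a rough sketch of such a head-turning decision, assuming the face position is available as a horizontal pixel coordinate (the frame width, dead zone, and command strings are hypothetical, not from the patent):

```python
def head_turn_direction(face_x: float, frame_width: float,
                        dead_zone: float = 0.1) -> str:
    """Map a detected face's x coordinate to a head-turn command.

    The offset is normalized to [-0.5, 0.5] around the image center;
    positions beyond the dead zone yield 'left' or 'right'.
    """
    offset = face_x / frame_width - 0.5
    if offset > dead_zone:
        return "right"
    if offset < -dead_zone:
        return "left"
    return "center"

print(head_turn_direction(face_x=600, frame_width=640))  # face far right
```

A real robot would also have to account for the mirroring between the camera image and the robot's own left/right, which this sketch ignores.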
- the cloud server 30 is a server that generates a response to voice data (input voice) received from the dialogue apparatus 10 and transmits the response to the dialogue apparatus 10.
- the cloud server 30 is a server that manages the interactive device 10. When a plurality of interactive devices 10 are connected, the cloud server 30 manages each individually.
- the cloud server 30 may also manage information related to the user of the interactive device 10. In this case, information related to the user may be registered in the cloud server 30 from an external device such as a smartphone or a tablet.
- in the present embodiment, the server device connected to the interactive device 10 is described as the cloud server 30, which provides a cloud service, but it is not limited to a cloud server.
- the cloud server 30 may be one unit or a plurality of units connected via a communication network.
- the cloud server 30 includes a control unit 31, a data storage unit 32, and a communication unit 33, as shown in FIG.
- the control unit 31 includes a computer device configured by an arithmetic processing unit such as a CPU or a dedicated processor, and is a block that controls the operation of each unit of the cloud server 30.
- the control unit 31 has functions as a speech recognition unit 34, a response information generation unit 35, and a speech synthesis unit 36.
- the voice recognition unit 34 is a block having the same function as the voice recognition unit 16 of the dialogue apparatus 10. However, its speech recognition capability (performance) is higher than that of the voice recognition unit 16. Thus, even when the dialogue apparatus 10 cannot recognize a voice, the cloud server 30 may be able to recognize it.
- the response information generation unit 35 is a block that generates response information following the first response information.
- the response information generated by the response information generation unit 35 is referred to as second response information.
- the response information generation unit 35 acquires response information indicating response content according to the recognition result of the voice recognition unit 34 from the second response information storage unit 321 described below, and generates the second response information from it.
- the voice synthesizer 36 is a block that generates voice data.
- the voice synthesizer 36 is a block that generates voice data of response contents indicated by the second response information generated by the response information generator 35.
- the cloud server 30 is configured to receive information (external provision information) from an external information provision server via a communication network. Therefore, the response information generation unit 35 may generate the second response information based on the externally provided information, the user information registered in the cloud server 30 described above, or a combination thereof. Specific examples of the externally provided information include weather information, traffic information, disaster information, and the like, but are not limited thereto. Further, the number of information providing servers that provide information to the cloud server 30 is not limited.
- for example, with respect to the input voice "Good morning", the dialogue apparatus outputs as speech the response content indicated by the first response information (the first response), and the response content indicated by the second response information is, for example, "Since today's weather is cloudy and later rain, you had better take an umbrella with you." In this case, the externally provided information is weather information, and the second response information is generated based on that weather information.
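A minimal sketch of how the response information generation unit 35 might fold externally provided weather information into second response information. The data shape, keys, and wording here are assumptions for illustration, not the patent's actual format.

```python
def generate_second_response(weather: dict) -> str:
    """Build follow-up response content from external weather information."""
    response = f"Today's weather is {weather['forecast']}."
    if weather.get("rain_expected"):
        response += " You had better take an umbrella with you."
    return response

# Externally provided information, e.g. from a weather information server.
external_info = {"forecast": "cloudy, later rain", "rain_expected": True}
print(generate_second_response(external_info))
```

In the same spirit, traffic or disaster information could be substituted for the weather dictionary without changing the overall flow.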
- the second response information is continuously output after voice output of the first response information
- it is preferable that the second response information be a continuation of, or an elaboration on, the first response information, because this gives the response as a whole a sense of unity.
- to achieve this, the cloud server 30 may know in advance which first response information the interactive device 10 outputs for which input voice, or the interactive device 10 may notify the cloud server 30 when it acquires the first response information.
- the voice data of the response content indicated by the second response information generated by the cloud server 30 is transmitted to the dialogue apparatus 10 by the control unit 31 controlling the communication unit 33.
- the cloud server 30 is configured to generate the second response information as voice data and then transmit it, so the load on the interactive device 10 can be reduced.
- each of the dialogue apparatus 10 and the cloud server 30 has its own voice synthesis unit, so the voice quality of the synthesized voices may differ; however, by changing the character that appears in the dialogue apparatus 10 between the first response process and the second response process, as described in the second embodiment, any discomfort the user would otherwise feel can be eliminated.
- the cloud server 30 may not have the voice synthesizer 36 and may be configured to transmit the second response information to the dialogue apparatus 10 as text data. In this case, the second response information is generated as voice data by the voice synthesizer 19 of the dialogue apparatus 10.
- the cloud server 30 may be able to register recorded voices from an external device such as a smartphone or a tablet, for example.
- the response information generation unit 35 may acquire a registered recorded voice as the second response information, or may incorporate it in generating the second response information. Since a recorded voice is already formed as voice data, if it is transmitted to the dialogue apparatus 10 as it is, no voice synthesis processing is performed in the dialogue apparatus 10. For example, when the voice "There is a cake in the refrigerator" is registered in the cloud server 30 from the smartphone of the user's mother, the dialogue apparatus 10 can, in response to the user's input voice "I'm home", use the first response information to output "Welcome home" as speech, and then use the second response information to perform an advanced response such as outputting "A message from your mother: 'There is a cake in the refrigerator.'"
- the data storage unit 32 is a block that stores various information (data) used in the cloud server 30. Further, the data storage unit 32 includes a second response information storage unit 321.
- the second response information storage unit 321 is a database in which second response information is registered in association with words and phrases. The second response information storage unit 321 stores a larger amount of information than the first response information storage unit 141. Further, the second response information may be updated periodically.
- since the cloud server 30 includes the response information generation unit 35 and the second response information storage unit 321 as described above, it can correctly recognize the input voice and return multiple pieces of information.
- the communication unit 33 is a block that performs communication with the outside.
- the communication unit 33 is connected to an external information providing server (not shown) or an external device such as a smartphone or a tablet through a communication network in addition to the interactive device 10.
- the number of devices connected to the cloud server 30 is not limited.
- when the dialogue apparatus 10 receives the voice data (input voice) of an utterance from the user 2 (step A1), it transmits the received voice data to the cloud server 30 (step A2) and acquires the first response information (step A3).
- in step A3, speech recognition is performed on the input voice, and first response information indicating response content according to the recognition result is acquired. Note that either step A2 or step A3 may be started first. Then, the dialogue apparatus 10 outputs the response content indicated by the first response information as speech (step A4).
- when the cloud server 30 receives the voice data from the interactive device 10 (step B1), it generates second response information (step B2) and transmits the generated second response information to the interactive device 10 (step B3).
- while receiving the second response information from the cloud server 30, the dialogue apparatus 10 outputs the response content indicated by the first response information as speech.
- when the interactive device 10 receives the second response information (step A5), it continues by outputting the response content indicated by the second response information as speech (step A6). The dialogue process in the dialogue system 100 then ends.
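The two-stage flow above (steps A1-A6 on the device side, B1-B3 on the server side) can be sketched with a background thread standing in for the server round trip. The server reply, the default first response, and the function names are illustrative assumptions, not the patent's implementation.

```python
import queue
import threading

def cloud_server(voice_data: str) -> str:
    """Stand-in for the cloud server (steps B1-B3): recognize and reply."""
    return f"Detailed answer about '{voice_data}'."

def dialogue(voice_data: str) -> list:
    """Steps A1-A6: speak a quick first response, then the server's answer."""
    outputs = []
    second_response = queue.Queue()

    # Step A2: send the input voice to the server in the background.
    worker = threading.Thread(
        target=lambda: second_response.put(cloud_server(voice_data)))
    worker.start()

    # Steps A3-A4: acquire and output the first response immediately.
    outputs.append("Please wait a moment.")

    # Steps A5-A6: output the second response once it arrives.
    outputs.append(second_response.get())
    worker.join()
    return outputs

print(dialogue("good morning"))
```

The point of the structure is that the first response is spoken without waiting on the network, which is exactly the stress-reduction effect the patent claims.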
- as described above, after outputting as speech the response content indicated by the first response information associated with the result of the voice recognition performed by the voice recognition unit 16, the dialogue device 10 continuously outputs as speech the response content indicated by the second response information associated with the result of the speech recognition by the cloud server 30.
- since the first response information is based on the result of voice recognition in the dialog device 10 itself, it can be output from the dialog device 10 earlier than the second response information, which is received through communication with the cloud server 30.
- since the cloud server 30 can perform more advanced processing than the individual interactive devices 10, it can perform more advanced speech recognition. Therefore, the interactive device 10 can respond quickly to the input voice with the first response process, and can provide varied or advanced information with the second response process that follows it.
- in this way, the response to the user's input voice can be supplemented by the voice recognition in the cloud server 30 and the response based on the second response information, so it is possible to respond with multiple pieces of information without improving the processing capability of the voice recognition unit 16 of the interactive apparatus 10 or expanding the capacity of the data storage unit 14. Therefore, the interactive system 100 can smoothly output multiple pieces of information as speech, and can provide a comfortable interactive environment without stressing the user.
- since the output control unit 18 is configured to perform the first response process while the communication unit 15 is receiving the second response information from the cloud server 30, the user can interact without being stressed by the waiting time for receiving the second response information.
- the interactive system 100a includes an interactive device 10a and a cloud server (server device) 30a.
- the dialogue apparatus 10a will be described as a dialogue robot having a head and a torso that can express facial expressions as shown in FIG.
- the interactive device 10a includes a display unit 21 and an operation unit 22 in addition to the configuration of the interactive device 10 of the first embodiment.
- the display unit 21 is a block that displays an image of the expression of the interactive robot. In the present embodiment, the display unit 21 performs display using the rear projection method, but is not limited thereto.
- the operation unit 22 is a block that executes physical motion of the interactive apparatus 10a. As described below, the operation unit 22 performs an operation of rotating the dialogue apparatus 10a at the time of character switching between the first response process and the second response process. As shown in (a) of FIG. 4, the head of the dialogue robot serving as the dialogue apparatus 10a rotates in the horizontal direction.
- the operation unit 22 may also be configured to move the head, the torso, or the arms attached to the torso of the dialogue robot serving as the dialogue apparatus 10a in various directions. Furthermore, at the time of character switching, the operation unit 22 may perform a motion other than rotation, for example movement, a change of direction, or vibration. Here, motion means that at least a part of the interactive device performs a physical operation.
- the dialogue apparatus 10a includes a character storage unit (character feature amount storage unit) 142 that stores image feature amounts and audio feature amounts of a plurality of characters in the data storage unit 14a.
- the control unit 13a has a function as the character switching unit 20.
- the character switching unit 20 is a block that controls at least one of the image display, the sound output, and the motion of the interactive device 10a to perform an effect process indicating that a different character has appeared in the interactive device 10a. The effect process will be described below using specific examples.
- the output control unit 18a selects one of the plurality of characters for each of the first response process and the second response process. It displays on the display unit 21 the character image determined by the image feature amounts obtained by referring to the character storage unit 142 for the selected character, and outputs from the voice output unit 12 the character voice determined by the voice feature amounts obtained in the same way. In this manner, the interactive device 10a changes the image displayed on the display unit 21 and the sound quality of the voice output from the voice output unit 12 between the first response process and the second response process, so that different characters can appear in the dialogue apparatus 10a. The motion may also be changed.
- the characters are, for example, a child, a father, a mother, a teacher, a newscaster, and the like.
- the cloud server 30a generates the second response information including the character designation information for designating the character when the dialogue apparatus 10a outputs the second response information as a voice.
- the character designation information designates a character that can appear on the dialogue apparatus 10a.
- the character designation information may be, for example, information corresponding to the content to be output as speech from the second response information. For example, if the content is related to study, the information may designate the father character; if it is related to daily life, the mother character; and if it is related to weather, the weathercaster character. These are merely illustrations, and the invention is not limited to them.
- the output control unit 18a selects a default character during the first response process, and selects a character designated by the character designation information during the second response process.
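The per-phase character selection just described might be sketched as follows. The character names, the topic mapping, and the default are illustrative assumptions rather than values from the patent.

```python
from typing import Optional

DEFAULT_CHARACTER = "child"  # character assumed for the first response

# Example mapping from response topic to designated character.
TOPIC_TO_CHARACTER = {
    "study": "father",
    "life": "mother",
    "weather": "weathercaster",
}

def select_character(phase: str, designation: Optional[str] = None) -> str:
    """Return the character presented during a response phase.

    The first response always uses the default character; the second
    response uses the server's character designation when present.
    """
    if phase == "first":
        return DEFAULT_CHARACTER
    return designation or DEFAULT_CHARACTER

print(select_character("first"))                                  # default
print(select_character("second", TOPIC_TO_CHARACTER["weather"]))  # designated
```

Falling back to the default character when no designation arrives mirrors the patent's case where the designated character cannot appear.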
- at the time of character switching, the character switching unit 20 performs an effect process indicating that a different character has appeared in the dialogue apparatus 10a, for example the following.
- the character switching unit 20 starts rotating the head of the dialogue apparatus 10a while the character image of the first character, the character selected for the first response process, is displayed. When the rotation of the head finishes, it controls the display unit 21 and the operation unit 22 so that the character image of the second character, the character selected for the second response process, is displayed instead of the character image of the first character.
- at this time, a voice calling the second character, for example "Mr. XX, who is in charge of the news!", may be output.
- alternatively, the sound of the first character's footsteps moving away and the second character's footsteps approaching may be output.
- alternatively, the character switching unit 20 may perform the following effect process. As shown in (b) or (c) of FIG. 4, the display unit 21 is controlled so that the display gradually switches from the character image of the first character to the character image of the second character, the character selected during the second response process.
- in this case, the interactive device 10a itself does not rotate, but the display makes the characters appear to rotate.
- an effect by voice output may be performed in the same manner as described above.
- the output control unit 18a may also refrain from switching the character between the first response process and the second response process.
- in that case, a default voice output such as “I'm busy, ask me later” or “The newscaster is off today” may be made.
- the flow of dialogue processing in the dialogue system 100 is basically the same as the flow of processing in the dialogue system 100 of the first embodiment shown in FIG.
- the second response information is generated including the character designation information for designating the character when the second response information is output as a voice.
- the first character appears in the dialogue device 10a.
- the character switching unit 20 performs the above-described effect processing, causing the second character designated by the character designation information to appear in the dialogue device 10a.
- since the character switching unit 20 performs the above-described effect processing after the voice output “Please wait a moment”, it is easy to tell the user that the character is changing.
- the above description assumes that the second response information includes information designating one of a plurality of characters and is received by the interactive device 10a. However, the interactive device 10a may also be configured as follows.
- the output control unit 18 may be configured to determine which one of the plurality of characters is designated from the result of recognition of the input voice by the speech recognition unit 16, and to select the designated character at the time of the second response process.
- the output control unit 18 may determine the designated character from information in the user's input voice that designates the character itself, or may infer the character from metadata in the dialogue with the user. As a specific example of the former, if the voice input by the user includes a command such as “teacher”, “call the teacher”, or “speak, teacher”, the “teacher” character is selected. As an example of the latter, if the dialogue with the user relates to study, the “teacher” character is selected.
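The two determination paths above, explicit designation by command and inference from dialogue topic, can be sketched as follows. The command phrases and topic keywords are illustrative assumptions, not the patent's actual vocabulary tables:

```python
from typing import Optional

# Hypothetical command table: utterances that designate a character directly.
COMMAND_TO_CHARACTER = {
    "teacher": "teacher",
    "call the teacher": "teacher",
    "speak teacher": "teacher",
}

# Hypothetical topic table: dialogue keywords from which a character is inferred.
TOPIC_KEYWORDS = {
    "teacher": {"study", "homework", "math"},
}

def determine_character(recognized_text: str) -> Optional[str]:
    """Return the designated character, or None if none is designated."""
    text = recognized_text.lower()
    # Explicit designation: the utterance contains a character command.
    for command, character in COMMAND_TO_CHARACTER.items():
        if command in text:
            return character
    # Inference: the dialogue topic suggests a character.
    words = set(text.split())
    for character, keywords in TOPIC_KEYWORDS.items():
        if words & keywords:
            return character
    return None

print(determine_character("please call the teacher"))   # teacher
print(determine_character("help me with my homework"))  # teacher
```

Returning `None` corresponds to the case where no character is designated and the default character remains selected.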
- the above determination need not be performed by the output control unit 18; a separate block that performs the determination based on the result of speech recognition by the speech recognition unit 16 may be provided.
- by selecting the character from the content of the dialogue with the user, the second response information can be provided using a character that reflects the user's intention.
- the conversation can also be made interesting.
- the output control unit 18 may select any one of the plurality of characters when the interactive device 10a is activated or returns from the sleep state, display an image of the selected character on the display unit 21, and output the voice of the selected character from the voice output unit 12.
- when the interactive device 10a is starting up or returning from the sleep state, a user who wants to interact is kept waiting, which may cause stress. By performing the display and voice output described above, a character appears in the interactive device, which can reduce the user's stress.
- the interactive device 10a When the interactive device 10a is activated or the character image or sound shown when the interactive device 10a returns from the sleep state, for example, indicates that the interactive robot has awakened from sleep, the interactive device 10a is in the process of being activated. I can convey this in an easy-to-understand manner.
- the dialogue system 100b of the present embodiment includes a dialogue device 10b and a cloud server (server device) 30b.
- the dialogue apparatus 10b will be described below assuming that it is a dialogue robot.
- the interactive device 10b includes a display unit 21 and an operation unit 22 in the same manner as the interactive device 10a of the second embodiment, in addition to the configuration of the interactive device 10 of the first embodiment. Moreover, the interactive device 10b has, for each home appliance, a home appliance operation mode in which that home appliance can be operated. As illustrated in FIG. 6, the home appliances in the user home 40 are provided so that they can be operated via infrared communication or wireless LAN communication from the communication unit 15.
- home appliances are, for example, air conditioners, washing machines, refrigerators, cooking utensils, lighting devices, hot water supply equipment, photographing equipment, various AV (Audio-Visual) equipment, and various household robots (for example, cleaning robots, housework support robots, animal-type robots, etc.).
- the interactive device 10b can operate the air conditioner 50-1 and the washing machine 50-2.
- the control unit 13b of the interactive device 10b has a function as the mode setting unit 23 that sets the interactive device 10b to the home appliance operation mode.
- the mode setting unit 23 determines the home appliance to be operated from the input voice input from the voice input unit 11, and sets the interactive device 10b to the home appliance operation mode of the determined home appliance. Therefore, when it is inferred from the dialogue with the user that the user wishes to operate the air conditioner 50-1, the interactive device 10b can set itself to the home appliance operation mode for operating the air conditioner 50-1 and perform the operation.
- the output control unit 18 may make the determination when the input voice includes information designating the home appliance to be operated, or may infer the home appliance to be operated from metadata in the input voice. This will be described using specific examples.
- in the former case, if the voice input by the user includes the command “turn on air conditioner” or “air conditioner ON”, it is determined that the home appliance to be operated is the air conditioner 50-1.
- in the latter case, if the input voice includes “hot” as metadata, it is determined that the home appliance to be operated is the air conditioner 50-1.
- before the operation is executed, a voice confirming the execution of the operation, for example “Would you like to turn on the air conditioner?”, is output from the interactive device 10b, and the operation is executed when the user gives a voice input granting permission, for example “Turn on” or “OK”.
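The determination-then-confirmation flow described above can be sketched as follows. This is a hypothetical illustration; the command phrases, the “hot” metadata hint, the permission words, and the appliance identifiers are assumptions based on the examples in the text:

```python
from typing import Optional

# Explicit operation commands (the "former" case).
APPLIANCE_COMMANDS = {
    "turn on air conditioner": "air_conditioner",
    "air conditioner on": "air_conditioner",
}
# Metadata hints inferred from the utterance (the "latter" case).
METADATA_HINTS = {
    "hot": "air_conditioner",
}
# Replies that grant permission to execute the operation.
PERMISSION_WORDS = {"turn on", "ok"}

def determine_appliance(recognized_text: str) -> Optional[str]:
    """Determine the home appliance to operate from the input voice."""
    text = recognized_text.lower()
    for command, appliance in APPLIANCE_COMMANDS.items():
        if command in text:
            return appliance
    for hint, appliance in METADATA_HINTS.items():
        if hint in text:
            return appliance
    return None

def confirm_and_execute(appliance: str, reply_text: str) -> str:
    """Execute the operation only if the user's reply grants permission."""
    if any(word in reply_text.lower() for word in PERMISSION_WORDS):
        return f"{appliance}: power on"
    return f"{appliance}: operation cancelled"

target = determine_appliance("it is hot in here")
print(target)                             # air_conditioner
print(confirm_and_execute(target, "OK"))  # air_conditioner: power on
```

In the device, the confirmation question (“Would you like to turn on the air conditioner?”) would be output as voice between the two steps.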
- the data storage unit 14b of the interactive device 10b includes a mode information storage unit 143.
- the mode information storage unit 143 stores, for each home appliance, information for setting the interactive device 10b so that the home appliance can be operated. The mode information storage unit 143 also stores, for each home appliance, the feature amount of the character image and the feature amount of the voice associated with that home appliance.
- the character associated with the home appliance appears when the interactive device 10b is set to the home appliance operation mode of the home appliance.
- the appearance of the character can be realized by changing the feature quantity of the image displayed on the display unit 21 and the sound quality of the audio output from the audio output unit 12, as in the second embodiment. The motion of the device may also be changed.
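The per-appliance character table held by the mode information storage unit might look like the following sketch. The field values (taking the rabbit/raccoon pairing from the registration example below) and the `CharacterFeatures` type are illustrative assumptions:

```python
from dataclasses import dataclass

@dataclass
class CharacterFeatures:
    image: str  # feature amount describing the character image
    voice: str  # feature amount describing the character voice

# Hypothetical mode information: each appliance maps to its character features.
MODE_INFO = {
    "air_conditioner": CharacterFeatures(image="rabbit", voice="cool_tone"),
    "washing_machine": CharacterFeatures(image="raccoon", voice="bubbly_tone"),
}

def enter_appliance_mode(appliance: str) -> CharacterFeatures:
    """Return the character features to present when entering this appliance's mode."""
    return MODE_INFO[appliance]

print(enter_appliance_mode("air_conditioner").image)  # rabbit
```

User registration of associations, as described below, would amount to inserting or updating entries in this table.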
- for example, when the interactive device 10b is set to the operation mode of the air conditioner 50-1, a character associated with the air conditioner appears in the interactive device 10b.
- for example, a mark representing the air conditioner is displayed on the forehead portion or the stomach portion (part of the display unit) of the interactive robot.
- similarly, when the interactive device 10b is set to the operation mode of the washing machine 50-2, a character associated with the washing machine appears in the interactive device 10b.
- the user may be able to register the association between the home appliance and the character in the dialogue apparatus 10b, such as a rabbit character for an air conditioner and a raccoon character for a washing machine.
- when a character associated with a home appliance appears in the interaction device 10b in this way, the user can be notified of which home appliance operation mode the interaction device 10b is in, that is, which home appliance is the operation target.
- for example, when a character associated with the air conditioner 50-1 appears in the dialogue apparatus 10b, the dialogue apparatus 10b may output the voice “break” in response to the voice input “break” and, at the same time, perform an operation to turn off the power of the air conditioner 50-1.
- when a character associated with the washing machine 50-2 appears, the dialogue apparatus 10b may be configured to output the voice “Good morning” in response to the voice input “Good morning” and, at the same time, turn on the power of the washing machine 50-2. Thus, each character associated with a home appliance may have a different function.
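Giving each appliance-associated character its own function can be sketched as a per-character mapping from utterance to appliance operation. The character names, phrases, and action strings are illustrative assumptions drawn from the examples above:

```python
from typing import Optional

# Hypothetical per-character function tables: the same greeting can trigger a
# different appliance operation depending on which character currently appears.
CHARACTER_FUNCTIONS = {
    "air_conditioner_character": {"break": "air_conditioner: power off"},
    "washing_machine_character": {"good morning": "washing_machine: power on"},
}

def handle_utterance(active_character: str, utterance: str) -> Optional[str]:
    """Return the appliance operation for this character's phrase, if any."""
    return CHARACTER_FUNCTIONS.get(active_character, {}).get(utterance.lower())

print(handle_utterance("washing_machine_character", "Good morning"))
print(handle_utterance("air_conditioner_character", "Good morning"))  # None
```

In the device, the voice response (“Good morning”) would be output alongside the returned appliance operation.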
- when the interactive device 10b also has a function as a television device, it may be configured to display a broadcast program on the display unit 21 when set to the home appliance operation mode for operating the television.
- the positions of the air conditioner 50-1 and the washing machine 50-2 may be detected using infrared rays, or the interactive apparatus 10b may be connected to a camera and their positions detected from information obtained from the camera.
- since the configuration of the cloud server 30b is the same as that of the cloud server 30, description thereof is omitted.
- when the cloud server 30b is communicatively connected to the air conditioner 50-1 and the washing machine 50-2 and collects state information indicating their states, the cloud server 30b may generate the second response information based on the state information.
- for example, the second response information may be generated so as to obtain the state information of the washing machine 50-2 and output a voice announcing that the washing machine has finished its work.
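Generating second response information from collected appliance state information can be sketched as below. The state dictionary fields and the message text are assumptions; a real server would work from whatever state schema the appliances report:

```python
from typing import Optional

def generate_second_response(state_info: dict) -> Optional[str]:
    """Generate second response information text from appliance state, if relevant."""
    if (state_info.get("appliance") == "washing_machine"
            and state_info.get("state") == "finished"):
        return "The washing machine has finished its work."
    return None  # nothing to announce for this state

print(generate_second_response(
    {"appliance": "washing_machine", "state": "finished"}))
```

The returned text would then be sent to the dialogue device and output as voice during the second response process.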
- the control blocks of the interactive devices 10, 10a, and 10b and of the cloud servers 30, 30a, and 30b described in the first to third embodiments may each be realized by a logic circuit (hardware) formed in an integrated circuit (IC chip) or the like, or may be realized by software using a CPU (Central Processing Unit).
- in the latter case, each of the interactive devices 10, 10a, 10b and the cloud servers 30, 30a, 30b includes a CPU that executes the instructions of a program, which is software realizing each function, a ROM (Read Only Memory) or storage device (these are referred to as “recording media”) on which the program and various data are recorded so as to be readable by a computer (or CPU), and a RAM (Random Access Memory) into which the program is loaded.
- the object of the present invention is achieved when the computer (or CPU) reads the program from the recording medium and executes it.
- as the recording medium, a “non-transitory tangible medium” such as a tape, a disk, a card, a semiconductor memory, or a programmable logic circuit can be used.
- the program may be supplied to the computer via an arbitrary transmission medium (such as a communication network or a broadcast wave) capable of transmitting the program.
- the present invention can also be realized in the form of a data signal, embedded in a carrier wave, in which the program is embodied by electronic transmission.
- the present invention is not limited to the above-described embodiments, and various modifications are possible. Embodiments obtained by appropriately combining technical means disclosed in different embodiments are also included in the technical scope of the present invention. Furthermore, new technical features can be formed by combining the technical means disclosed in the respective embodiments.
- a dialogue device (10) according to aspect 1 of the present invention includes: a speech recognition unit (16) that recognizes an input voice; a response information storage unit (first response information storage unit 141) that stores first response information indicating response content corresponding to a result of speech recognition by the speech recognition unit (16); a communication unit (15) that transmits the input voice to a server device (cloud server 30) and receives second response information indicating response content corresponding to a result of speech recognition of the input voice by the server device; and an output control unit (18) that performs a first response process of outputting, as a voice, the response content indicated by the first response information obtained by referring to the response information storage unit in response to the input voice, and then continuously performs a second response process of outputting the response content indicated by the second response information.
- according to the above configuration, following the first response process, the response content indicated by the second response information associated with the result of speech recognition by the server device is continuously output as a voice.
- since the first response information corresponds to the result of speech recognition in the own device, it can generally be output from the dialogue device earlier than the second response information, which is received through communication with the server device.
- in addition, the server device can perform higher-level processing than an individual interactive device and can therefore perform advanced speech recognition. According to the above configuration, a quick response to the input voice can thus be made by the first response process, and varied or advanced information can be provided by the second response process that follows it.
- furthermore, since the speech recognition of the own device and the response by the first response information are supplemented by the speech recognition of the server device and the response by the second response information, the dialogue device can respond with a plurality of pieces of information without increasing the processing capability of its speech recognition unit or expanding the capacity of its response information storage unit.
- since a plurality of pieces of information can be output smoothly, a comfortable interactive environment can be provided without stressing the user.
- in particular, the user can continue the dialogue without being stressed by the waiting time for receiving the second response information.
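The two-stage flow of aspect 1, a quick local first response followed by the server's second response, can be sketched as follows. The local response table, the server stub, and all phrasing are illustrative assumptions, not the patent's implementation:

```python
# Hypothetical first response information storage unit: keyword -> quick reply.
LOCAL_RESPONSES = {
    "weather": "Let me check the weather for you.",
}

def server_recognize_and_respond(input_voice: str) -> str:
    # Stand-in for the cloud server's higher-level speech recognition and
    # second response information; a real system would call the server here.
    return "Tomorrow will be sunny with a high of 25 degrees."

def respond(input_voice: str) -> list:
    outputs = []
    # First response process: local recognition and immediate voice output.
    for keyword, reply in LOCAL_RESPONSES.items():
        if keyword in input_voice:
            outputs.append(reply)
            break
    # Second response process: continuously output the server's response.
    outputs.append(server_recognize_and_respond(input_voice))
    return outputs

for line in respond("what is the weather tomorrow"):
    print(line)
```

The quick first reply covers the latency of the server round trip, which is the comfortable-environment effect the aspect describes.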
- the dialogue device according to aspect 2 of the present invention, in the above aspect 1, may further include a character feature amount storage unit (character storage unit 142) that stores image feature amounts and voice feature amounts of a plurality of characters, wherein the output control unit (18) selects one of the plurality of characters at the time of the first response process and at the time of the second response process, displays the character image determined by the image feature amount of the selected character obtained by referring to the character feature amount storage unit, and outputs, as a voice, the character voice determined by the voice feature amount of the selected character. The device may further include a character switching unit (20) that, when the output control unit (18) selects a character different from that of the first response process during the second response process, controls at least one of image display, sound output, and motion of the own device between the first response process and the second response process to perform an effect representing the appearance of a different character.
- according to the above configuration, a character is selected, the character image of the selected character is displayed, and the character voice of the selected character is output as a voice, whereby a character can be made to appear in the dialogue device.
- further, when a character different from that of the first response process is selected during the second response process, an effect indicating that a different character has appeared is performed. This effect makes it possible to build up the atmosphere of the new character's appearance without the appearance of the different character destroying the image of the character shown before it.
- since the above effect can attract the user's interest, the user's stress due to the waiting time can be reduced even when time elapses between the first response process and the second response process.
- the character switching unit may perform an effect representing the appearance of a different character by (a) controlling the display so that the character image of the first character, the character selected during the first response process, is gradually switched to the character image of the second character, the character selected during the second response process, or (b) starting rotation of the own device with the character image of the first character displayed and, when the rotation ends, controlling the display and the rotation, movement, direction change, vibration, or the like of the own device so that the character image of the second character is displayed instead of the character image of the first character.
- the second response information may include designation information for designating any one of the plurality of characters, and the output control unit (18) may select, during the second response process, the character designated by the designation information included in the second response information.
- according to the above configuration, the character designation information is included in the second response information. Therefore, if a character suited to the content of the second response information is designated in advance, the designated character can appear at the time of the second response process, so that the second response information can be provided to the user in a persuasive or interesting manner.
- the output control unit (18) may determine which one of the plurality of characters is designated from the result of speech recognition of the input voice by the speech recognition unit (16), and select the designated character during the second response process.
- according to the above configuration, a character is designated by the user's voice input, and the designated character can appear in the dialogue device during the second response process.
- the designated character may be determined from information in the input voice that designates the character itself, or may be inferred from metadata in the dialogue with the user.
- in either case, the second response information can be provided using a character reflecting the user's intention, and the conversation can be made interesting.
- the output control unit (18) may select any one of the plurality of characters when the own device is activated or when it returns from the sleep state, display the character image determined by the image feature amount of the selected character obtained by referring to the character feature amount storage unit, and output, as a voice, the character voice determined by the voice feature amount of the selected character obtained by referring to the character feature amount storage unit.
- according to the above configuration, a character is selected at the time of starting up the own device or returning from the sleep state, the character image of the selected character is displayed, and the character voice of the selected character is output as a voice, so that a character appears in the dialogue device.
- the appearance of such a character can attract the user's interest and reduce the stress of waiting during start-up or return from the sleep state.
- the dialogue device according to any one of the above aspects 1 to 5 may have, for each home appliance, a home appliance operation mode in which the home appliance can be operated, and may further include a mode setting unit (23) that determines the home appliance to be operated from the result of speech recognition of the input voice by the speech recognition unit (16) and sets the own device to the home appliance operation mode of the determined home appliance.
- according to the above configuration, the dialogue device can be set to a home appliance operation mode in which a home appliance can be operated, and the home appliance to be operated can be determined from the input voice. Therefore, when it is inferred from the dialogue with the user that the user wants to operate a home appliance, the dialogue device can set itself to the home appliance operation mode for operating that home appliance and perform the operation.
- when the dialogue device includes a display unit and is configured to display the operation target home appliance, or a character representing it, while the home appliance operation mode is set, the user can be clearly notified of the operation target home appliance.
- a dialogue system according to aspect 7 of the present invention is configured by connecting the dialogue device according to any one of the above aspects 1 to 6 and a server device having a speech recognition function via a communication network.
- the server device according to aspect 8 of the present invention is a server device provided in the dialogue system according to the above aspect 7. With this server device, the dialogue system according to aspect 7 can be constructed.
- the dialogue device, the server device, or the dialogue system according to each aspect of the present invention may be realized by a computer. In this case, a program that realizes the dialogue device, the server device, or the dialogue system on the computer by causing the computer to operate as each unit (speech recognition unit, output control unit, character switching unit, mode setting unit) included therein, and a computer-readable recording medium on which the program is recorded, also fall within the scope of the present invention.
- the present invention can be used for an interactive device connected to a communication network and recognizing and responding to a user's voice.
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
The object of the present invention is to provide a conversation device that can respond smoothly and provide a comfortable conversation environment without putting pressure on the user. The invention relates to a conversation device (10) comprising an output control unit (18) which, in response to a voice input, first performs a first response process that outputs by voice the response content indicated by first response information obtained by referring to a first response information storage unit (141), and then continuously performs a second response process that outputs by voice the response content indicated by second response information received from a cloud server (30).
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2014202218A JP6448971B2 (ja) | 2014-09-30 | 2014-09-30 | 対話装置 |
JP2014-202218 | 2014-09-30 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016052164A1 true WO2016052164A1 (fr) | 2016-04-07 |
Family
ID=55630206
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2015/076081 WO2016052164A1 (fr) | 2014-09-30 | 2015-09-15 | Dispositif de conversation |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP6448971B2 (fr) |
WO (1) | WO2016052164A1 (fr) |
Cited By (73)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109346083A (zh) * | 2018-11-28 | 2019-02-15 | 北京猎户星空科技有限公司 | 一种智能语音交互方法及装置、相关设备及存储介质 |
JP2020003081A (ja) * | 2018-06-25 | 2020-01-09 | 株式会社パロマ | ガスコンロ用の制御装置、ガスコンロシステム、及びガスコンロ用の制御装置における指示データ生成プログラム |
US10720160B2 (en) | 2018-06-01 | 2020-07-21 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US10748546B2 (en) | 2017-05-16 | 2020-08-18 | Apple Inc. | Digital assistant services based on device capabilities |
US10878809B2 (en) | 2014-05-30 | 2020-12-29 | Apple Inc. | Multi-command single utterance input method |
US10978090B2 (en) | 2013-02-07 | 2021-04-13 | Apple Inc. | Voice trigger for a digital assistant |
US11009970B2 (en) | 2018-06-01 | 2021-05-18 | Apple Inc. | Attention aware virtual assistant dismissal |
US11037565B2 (en) | 2016-06-10 | 2021-06-15 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US11070949B2 (en) | 2015-05-27 | 2021-07-20 | Apple Inc. | Systems and methods for proactively identifying and surfacing relevant content on an electronic device with a touch-sensitive display |
US11087759B2 (en) | 2015-03-08 | 2021-08-10 | Apple Inc. | Virtual assistant activation |
US11120372B2 (en) | 2011-06-03 | 2021-09-14 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US11126400B2 (en) | 2015-09-08 | 2021-09-21 | Apple Inc. | Zero latency digital assistant |
US11133008B2 (en) | 2014-05-30 | 2021-09-28 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US11140099B2 (en) | 2019-05-21 | 2021-10-05 | Apple Inc. | Providing message response suggestions |
US11152002B2 (en) | 2016-06-11 | 2021-10-19 | Apple Inc. | Application integration with a digital assistant |
US11169616B2 (en) | 2018-05-07 | 2021-11-09 | Apple Inc. | Raise to speak |
US11217251B2 (en) | 2019-05-06 | 2022-01-04 | Apple Inc. | Spoken notifications |
US11237797B2 (en) | 2019-05-31 | 2022-02-01 | Apple Inc. | User activity shortcut suggestions |
US11257504B2 (en) | 2014-05-30 | 2022-02-22 | Apple Inc. | Intelligent assistant for home automation |
US11269678B2 (en) | 2012-05-15 | 2022-03-08 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US11289073B2 (en) | 2019-05-31 | 2022-03-29 | Apple Inc. | Device text to speech |
US11307752B2 (en) | 2019-05-06 | 2022-04-19 | Apple Inc. | User configurable task triggers |
US11348582B2 (en) | 2008-10-02 | 2022-05-31 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US11360641B2 (en) | 2019-06-01 | 2022-06-14 | Apple Inc. | Increasing the relevance of new available information |
US11380310B2 (en) | 2017-05-12 | 2022-07-05 | Apple Inc. | Low-latency intelligent automated assistant |
US11388291B2 (en) | 2013-03-14 | 2022-07-12 | Apple Inc. | System and method for processing voicemail |
US11405466B2 (en) | 2017-05-12 | 2022-08-02 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US11423886B2 (en) | 2010-01-18 | 2022-08-23 | Apple Inc. | Task flow identification based on user intent |
US11423908B2 (en) | 2019-05-06 | 2022-08-23 | Apple Inc. | Interpreting spoken requests |
US11431642B2 (en) | 2018-06-01 | 2022-08-30 | Apple Inc. | Variable latency device coordination |
US11467802B2 (en) | 2017-05-11 | 2022-10-11 | Apple Inc. | Maintaining privacy of personal information |
US11468282B2 (en) | 2015-05-15 | 2022-10-11 | Apple Inc. | Virtual assistant in a communication session |
US11475884B2 (en) | 2019-05-06 | 2022-10-18 | Apple Inc. | Reducing digital assistant latency when a language is incorrectly determined |
US11475898B2 (en) | 2018-10-26 | 2022-10-18 | Apple Inc. | Low-latency multi-speaker speech recognition |
US11488406B2 (en) | 2019-09-25 | 2022-11-01 | Apple Inc. | Text detection using global geometry estimators |
US11496600B2 (en) | 2019-05-31 | 2022-11-08 | Apple Inc. | Remote execution of machine-learned models |
US11500672B2 (en) | 2015-09-08 | 2022-11-15 | Apple Inc. | Distributed personal assistant |
US11516537B2 (en) | 2014-06-30 | 2022-11-29 | Apple Inc. | Intelligent automated assistant for TV user interactions |
CN115472158A (zh) * | 2021-06-11 | 2022-12-13 | 佛山市顺德区美的电热电器制造有限公司 | 家电设备及其语音控制方法、控制终端 |
US11526368B2 (en) | 2015-11-06 | 2022-12-13 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US11532306B2 (en) | 2017-05-16 | 2022-12-20 | Apple Inc. | Detecting a trigger of a digital assistant |
US11580990B2 (en) | 2017-05-12 | 2023-02-14 | Apple Inc. | User-specific acoustic models |
US11599331B2 (en) | 2017-05-11 | 2023-03-07 | Apple Inc. | Maintaining privacy of personal information |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6680125B2 (ja) * | 2016-07-25 | 2020-04-15 | Toyota Motor Corp. | Robot and voice dialogue method |
JP6614080B2 (ja) * | 2016-09-16 | 2019-12-04 | Toyota Motor Corp. | Voice dialogue system and voice dialogue method |
KR102100742B1 (ko) * | 2017-05-16 | 2020-04-14 | Apple Inc. | Far-field extension of digital assistant services |
JP2019016061A (ja) * | 2017-07-04 | 2019-01-31 | NTT Docomo Inc. | Information processing device and program |
JP7023823B2 (ja) * | 2018-11-16 | 2022-02-22 | Alpine Electronics Inc. | In-vehicle device and voice recognition method |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2002108380A (ja) * | 2000-10-02 | 2002-04-10 | Canon Inc. | Information presentation apparatus, control method therefor, and computer-readable memory |
JP2003131695A (ja) * | 2001-10-25 | 2003-05-09 | Hitachi Ltd. | Speech recognition device, speech recognition device control apparatus, and speech recognition device control method |
WO2013190963A1 (fr) * | 2012-06-18 | 2013-12-27 | ADC Technology Inc. | Voice response device |
JP2014062944A (ja) * | 2012-09-20 | 2014-04-10 | Sharp Corp. | Information processing apparatus |
JP2014182307A (ja) * | 2013-03-19 | 2014-09-29 | Sharp Corp. | Speech recognition system and utterance system |
JP2014191030A (ja) * | 2013-03-26 | 2014-10-06 | Fuji Soft Inc. | Speech recognition terminal and speech recognition method using a computer terminal |
- 2014-09-30: JP application JP2014202218A filed; granted as patent JP6448971B2 (active)
- 2015-09-15: WO application PCT/JP2015/076081 filed as WO2016052164A1 (Application Filing)
Cited By (113)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11671920B2 (en) | 2007-04-03 | 2023-06-06 | Apple Inc. | Method and system for operating a multifunction portable electronic device using voice-activation |
US11979836B2 (en) | 2007-04-03 | 2024-05-07 | Apple Inc. | Method and system for operating a multi-function portable electronic device using voice-activation |
US11348582B2 (en) | 2008-10-02 | 2022-05-31 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US11900936B2 (en) | 2008-10-02 | 2024-02-13 | Apple Inc. | Electronic devices with voice command and contextual data processing capabilities |
US11423886B2 (en) | 2010-01-18 | 2022-08-23 | Apple Inc. | Task flow identification based on user intent |
US12087308B2 (en) | 2010-01-18 | 2024-09-10 | Apple Inc. | Intelligent automated assistant |
US11120372B2 (en) | 2011-06-03 | 2021-09-14 | Apple Inc. | Performing actions associated with task items that represent tasks to perform |
US11321116B2 (en) | 2012-05-15 | 2022-05-03 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US11269678B2 (en) | 2012-05-15 | 2022-03-08 | Apple Inc. | Systems and methods for integrating third party services with a digital assistant |
US11862186B2 (en) | 2013-02-07 | 2024-01-02 | Apple Inc. | Voice trigger for a digital assistant |
US11636869B2 (en) | 2013-02-07 | 2023-04-25 | Apple Inc. | Voice trigger for a digital assistant |
US12009007B2 (en) | 2013-02-07 | 2024-06-11 | Apple Inc. | Voice trigger for a digital assistant |
US10978090B2 (en) | 2013-02-07 | 2021-04-13 | Apple Inc. | Voice trigger for a digital assistant |
US11557310B2 (en) | 2013-02-07 | 2023-01-17 | Apple Inc. | Voice trigger for a digital assistant |
US11388291B2 (en) | 2013-03-14 | 2022-07-12 | Apple Inc. | System and method for processing voicemail |
US11798547B2 (en) | 2013-03-15 | 2023-10-24 | Apple Inc. | Voice activated device for use with a voice-based digital assistant |
US11727219B2 (en) | 2013-06-09 | 2023-08-15 | Apple Inc. | System and method for inferring user intent from speech inputs |
US12073147B2 (en) | 2013-06-09 | 2024-08-27 | Apple Inc. | Device, method, and graphical user interface for enabling conversation persistence across two or more instances of a digital assistant |
US12010262B2 (en) | 2013-08-06 | 2024-06-11 | Apple Inc. | Auto-activating smart responses based on activities from remote devices |
US11133008B2 (en) | 2014-05-30 | 2021-09-28 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US12118999B2 (en) | 2014-05-30 | 2024-10-15 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US10878809B2 (en) | 2014-05-30 | 2020-12-29 | Apple Inc. | Multi-command single utterance input method |
US11670289B2 (en) | 2014-05-30 | 2023-06-06 | Apple Inc. | Multi-command single utterance input method |
US11699448B2 (en) | 2014-05-30 | 2023-07-11 | Apple Inc. | Intelligent assistant for home automation |
US12067990B2 (en) | 2014-05-30 | 2024-08-20 | Apple Inc. | Intelligent assistant for home automation |
US11810562B2 (en) | 2014-05-30 | 2023-11-07 | Apple Inc. | Reducing the need for manual start/end-pointing and trigger phrases |
US11257504B2 (en) | 2014-05-30 | 2022-02-22 | Apple Inc. | Intelligent assistant for home automation |
US11516537B2 (en) | 2014-06-30 | 2022-11-29 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US11838579B2 (en) | 2014-06-30 | 2023-12-05 | Apple Inc. | Intelligent automated assistant for TV user interactions |
US11842734B2 (en) | 2015-03-08 | 2023-12-12 | Apple Inc. | Virtual assistant activation |
US11087759B2 (en) | 2015-03-08 | 2021-08-10 | Apple Inc. | Virtual assistant activation |
US11468282B2 (en) | 2015-05-15 | 2022-10-11 | Apple Inc. | Virtual assistant in a communication session |
US12001933B2 (en) | 2015-05-15 | 2024-06-04 | Apple Inc. | Virtual assistant in a communication session |
US11070949B2 (en) | 2015-05-27 | 2021-07-20 | Apple Inc. | Systems and methods for proactively identifying and surfacing relevant content on an electronic device with a touch-sensitive display |
US11947873B2 (en) | 2015-06-29 | 2024-04-02 | Apple Inc. | Virtual assistant for media playback |
US11126400B2 (en) | 2015-09-08 | 2021-09-21 | Apple Inc. | Zero latency digital assistant |
US11809483B2 (en) | 2015-09-08 | 2023-11-07 | Apple Inc. | Intelligent automated assistant for media search and playback |
US11954405B2 (en) | 2015-09-08 | 2024-04-09 | Apple Inc. | Zero latency digital assistant |
US11550542B2 (en) | 2015-09-08 | 2023-01-10 | Apple Inc. | Zero latency digital assistant |
US11853536B2 (en) | 2015-09-08 | 2023-12-26 | Apple Inc. | Intelligent automated assistant in a media environment |
US11500672B2 (en) | 2015-09-08 | 2022-11-15 | Apple Inc. | Distributed personal assistant |
US12051413B2 (en) | 2015-09-30 | 2024-07-30 | Apple Inc. | Intelligent device identification |
US11809886B2 (en) | 2015-11-06 | 2023-11-07 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US11526368B2 (en) | 2015-11-06 | 2022-12-13 | Apple Inc. | Intelligent automated assistant in a messaging environment |
US11886805B2 (en) | 2015-11-09 | 2024-01-30 | Apple Inc. | Unconventional virtual assistant interactions |
US11853647B2 (en) | 2015-12-23 | 2023-12-26 | Apple Inc. | Proactive assistance based on dialog communication between devices |
US11037565B2 (en) | 2016-06-10 | 2021-06-15 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US11657820B2 (en) | 2016-06-10 | 2023-05-23 | Apple Inc. | Intelligent digital assistant in a multi-tasking environment |
US11152002B2 (en) | 2016-06-11 | 2021-10-19 | Apple Inc. | Application integration with a digital assistant |
US11749275B2 (en) | 2016-06-11 | 2023-09-05 | Apple Inc. | Application integration with a digital assistant |
US11809783B2 (en) | 2016-06-11 | 2023-11-07 | Apple Inc. | Intelligent device arbitration and control |
US11656884B2 (en) | 2017-01-09 | 2023-05-23 | Apple Inc. | Application integration with a digital assistant |
US11467802B2 (en) | 2017-05-11 | 2022-10-11 | Apple Inc. | Maintaining privacy of personal information |
US11599331B2 (en) | 2017-05-11 | 2023-03-07 | Apple Inc. | Maintaining privacy of personal information |
US11862151B2 (en) | 2017-05-12 | 2024-01-02 | Apple Inc. | Low-latency intelligent automated assistant |
US11380310B2 (en) | 2017-05-12 | 2022-07-05 | Apple Inc. | Low-latency intelligent automated assistant |
US11580990B2 (en) | 2017-05-12 | 2023-02-14 | Apple Inc. | User-specific acoustic models |
US11538469B2 (en) | 2017-05-12 | 2022-12-27 | Apple Inc. | Low-latency intelligent automated assistant |
US11837237B2 (en) | 2017-05-12 | 2023-12-05 | Apple Inc. | User-specific acoustic models |
US11405466B2 (en) | 2017-05-12 | 2022-08-02 | Apple Inc. | Synchronization and task delegation of a digital assistant |
US12014118B2 (en) | 2017-05-15 | 2024-06-18 | Apple Inc. | Multi-modal interfaces having selection disambiguation and text modification capability |
US10748546B2 (en) | 2017-05-16 | 2020-08-18 | Apple Inc. | Digital assistant services based on device capabilities |
US12026197B2 (en) | 2017-05-16 | 2024-07-02 | Apple Inc. | Intelligent automated assistant for media exploration |
US11675829B2 (en) | 2017-05-16 | 2023-06-13 | Apple Inc. | Intelligent automated assistant for media exploration |
US11217255B2 (en) | 2017-05-16 | 2022-01-04 | Apple Inc. | Far-field extension for digital assistant services |
US11532306B2 (en) | 2017-05-16 | 2022-12-20 | Apple Inc. | Detecting a trigger of a digital assistant |
US11710482B2 (en) | 2018-03-26 | 2023-07-25 | Apple Inc. | Natural assistant interaction |
US11907436B2 (en) | 2018-05-07 | 2024-02-20 | Apple Inc. | Raise to speak |
US11854539B2 (en) | 2018-05-07 | 2023-12-26 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US11487364B2 (en) | 2018-05-07 | 2022-11-01 | Apple Inc. | Raise to speak |
US11169616B2 (en) | 2018-05-07 | 2021-11-09 | Apple Inc. | Raise to speak |
US11900923B2 (en) | 2018-05-07 | 2024-02-13 | Apple Inc. | Intelligent automated assistant for delivering content from user experiences |
US11630525B2 (en) | 2018-06-01 | 2023-04-18 | Apple Inc. | Attention aware virtual assistant dismissal |
US12067985B2 (en) | 2018-06-01 | 2024-08-20 | Apple Inc. | Virtual assistant operations in multi-device environments |
US11360577B2 (en) | 2018-06-01 | 2022-06-14 | Apple Inc. | Attention aware virtual assistant dismissal |
US10720160B2 (en) | 2018-06-01 | 2020-07-21 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US11009970B2 (en) | 2018-06-01 | 2021-05-18 | Apple Inc. | Attention aware virtual assistant dismissal |
US11431642B2 (en) | 2018-06-01 | 2022-08-30 | Apple Inc. | Variable latency device coordination |
US12080287B2 (en) | 2018-06-01 | 2024-09-03 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US10984798B2 (en) | 2018-06-01 | 2021-04-20 | Apple Inc. | Voice interaction at a primary device to access call functionality of a companion device |
US12061752B2 (en) | 2018-06-01 | 2024-08-13 | Apple Inc. | Attention aware virtual assistant dismissal |
JP7162865B2 (ja) | 2018-06-25 | 2022-10-31 | Paloma Co., Ltd. | Control device for gas stove, and gas stove system |
JP2020003081A (ja) * | 2018-06-25 | 2020-01-09 | Paloma Co., Ltd. | Control device for gas stove, gas stove system, and instruction data generation program for a gas stove control device |
US11893992B2 (en) | 2018-09-28 | 2024-02-06 | Apple Inc. | Multi-modal inputs for voice commands |
US11475898B2 (en) | 2018-10-26 | 2022-10-18 | Apple Inc. | Low-latency multi-speaker speech recognition |
CN109346083A (zh) * | 2018-11-28 | 2019-02-15 | Beijing Orion Star Technology Co., Ltd. | Intelligent voice interaction method and apparatus, related device, and storage medium |
US11783815B2 (en) | 2019-03-18 | 2023-10-10 | Apple Inc. | Multimodality in digital assistant systems |
US12057118B2 (en) | 2019-03-29 | 2024-08-06 | Sony Group Corporation | Information processing apparatus and information processing method |
US11217251B2 (en) | 2019-05-06 | 2022-01-04 | Apple Inc. | Spoken notifications |
US11423908B2 (en) | 2019-05-06 | 2022-08-23 | Apple Inc. | Interpreting spoken requests |
US11675491B2 (en) | 2019-05-06 | 2023-06-13 | Apple Inc. | User configurable task triggers |
US11307752B2 (en) | 2019-05-06 | 2022-04-19 | Apple Inc. | User configurable task triggers |
US11705130B2 (en) | 2019-05-06 | 2023-07-18 | Apple Inc. | Spoken notifications |
US11475884B2 (en) | 2019-05-06 | 2022-10-18 | Apple Inc. | Reducing digital assistant latency when a language is incorrectly determined |
US11888791B2 (en) | 2019-05-21 | 2024-01-30 | Apple Inc. | Providing message response suggestions |
US11140099B2 (en) | 2019-05-21 | 2021-10-05 | Apple Inc. | Providing message response suggestions |
US11496600B2 (en) | 2019-05-31 | 2022-11-08 | Apple Inc. | Remote execution of machine-learned models |
US11657813B2 (en) | 2019-05-31 | 2023-05-23 | Apple Inc. | Voice identification in digital assistant systems |
US11289073B2 (en) | 2019-05-31 | 2022-03-29 | Apple Inc. | Device text to speech |
US11237797B2 (en) | 2019-05-31 | 2022-02-01 | Apple Inc. | User activity shortcut suggestions |
US11360739B2 (en) | 2019-05-31 | 2022-06-14 | Apple Inc. | User activity shortcut suggestions |
US11360641B2 (en) | 2019-06-01 | 2022-06-14 | Apple Inc. | Increasing the relevance of new available information |
US11790914B2 (en) | 2019-06-01 | 2023-10-17 | Apple Inc. | Methods and user interfaces for voice-based control of electronic devices |
US11488406B2 (en) | 2019-09-25 | 2022-11-01 | Apple Inc. | Text detection using global geometry estimators |
US11914848B2 (en) | 2020-05-11 | 2024-02-27 | Apple Inc. | Providing relevant data items based on context |
US11924254B2 (en) | 2020-05-11 | 2024-03-05 | Apple Inc. | Digital assistant hardware abstraction |
US11765209B2 (en) | 2020-05-11 | 2023-09-19 | Apple Inc. | Digital assistant hardware abstraction |
US11755276B2 (en) | 2020-05-12 | 2023-09-12 | Apple Inc. | Reducing description length based on confidence |
US11838734B2 (en) | 2020-07-20 | 2023-12-05 | Apple Inc. | Multi-device audio adjustment coordination |
US11750962B2 (en) | 2020-07-21 | 2023-09-05 | Apple Inc. | User identification using headphones |
US11696060B2 (en) | 2020-07-21 | 2023-07-04 | Apple Inc. | User identification using headphones |
CN115472158A (zh) * | 2021-06-11 | 2022-12-13 | Foshan Shunde Midea Electrical Heating Appliances Manufacturing Co., Ltd. | Household appliance, voice control method therefor, and control terminal |
US12021806B1 (en) | 2021-09-21 | 2024-06-25 | Apple Inc. | Intelligent message delivery |
Also Published As
Publication number | Publication date |
---|---|
JP6448971B2 (ja) | 2019-01-09 |
JP2016071247A (ja) | 2016-05-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
JP6448971B2 (ja) | Conversation device | |
JP6752870B2 (ja) | Method and system for controlling an artificial intelligence device using multiple wake words | |
CN106257355B (zh) | Device control method and controller | |
KR102306624B1 (ko) | Persistent companion device configuration and deployment platform | |
WO2016052018A1 (fr) | Home appliance management system, home appliance, remote control device, and robot | |
US20170206064A1 (en) | Persistent companion device configuration and deployment platform | |
CN103188541B (zh) | Electronic device and method for controlling electronic device | |
KR102400398B1 (ko) | Animated character head systems and methods | |
KR20160034243A (ko) | Apparatus and methods for providing a persistent companion device | |
WO2017141530A1 (fr) | Information processing device, information processing method, and program | |
WO2018006370A1 (fr) | Interaction method and system for a virtual 3D robot, and robot | |
TW201408052A (zh) | Television device and virtual host display method thereof | |
JP2022169645A (ja) | Apparatus, program, and the like | |
JP7267411B2 (ja) | Interactive object driving method, apparatus, electronic device, and storage medium | |
WO2016206646A1 (fr) | Method and system for prompting a machine device to generate an action | |
US20200143235A1 (en) | System and method for providing smart objects virtual communication | |
WO2016052520A1 (fr) | Conversation device | |
JP2016206249A (ja) | Conversation device, conversation system, and method for controlling a conversation device | |
WO2016117514A1 (fr) | Robot control device and robot | |
CN104138665B (zh) | Doll control method and doll | |
KR102519599B1 (ko) | Multimodal-based interaction robot and control method therefor | |
CN117971154A (zh) | Multimodal responses | |
JP7286303B2 (ja) | Meeting support system and meeting robot | |
JP7208361B2 (ja) | Communication robot and control method therefor, information processing server, and information processing method | |
JP2020067877A (ja) | Conversation device and control program for a conversation device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 15846101; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | EP: PCT application non-entry in European phase | Ref document number: 15846101; Country of ref document: EP; Kind code of ref document: A1 |