WO2016157662A1 - Information processing device, control method, and program - Google Patents
Information processing device, control method, and program
- Publication number
- WO2016157662A1 (PCT/JP2015/086544)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- response
- user
- information processing
- processing apparatus
- target user
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/16—Sound input; Sound output
- G06F3/167—Audio in a user interface, e.g. using voice commands for navigating, audio feedback
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0481—Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
- G06F3/04817—Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance using icons
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0487—Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/10—Speech classification or search using distance or distortion measures between unknown speech and reference templates
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L2015/223—Execution procedure of a spoken command
Definitions
- the present disclosure relates to an information processing device, a control method, and a program.
- Patent Document 1 discloses a speech dialogue control method in which, when the user interrupts speech in the middle of a response on the system side (that is, during voice output) in a voice dialogue with a single user, the system continues or pauses the response in consideration of the importance of the response content.
- Patent Document 2 discloses a voice dialogue apparatus that makes it easy for each user to recognize whose voice is currently being output when a plurality of users are conducting a voice dialogue.
- In a conventional voice UI that responds by voice output, one-to-one use of the system by a single user is assumed, and interaction in an environment where a plurality of users use the system is not considered. Therefore, for example, when use in a home or a public space is assumed, a situation in which one user occupies the system is likely to occur.
- Patent Document 1 describes a response method for voice dialogue with a single user, and simultaneous response to a plurality of users has been difficult.
- Although Patent Document 2 relates to system use by a plurality of users, it does not assume use by a plurality of users of a voice UI that automatically responds by voice to a user's speech.
- Therefore, the present disclosure proposes an information processing apparatus, a control method, and a program capable of improving the convenience of a speech recognition system by outputting an appropriate response to each user when a plurality of users speak.
- According to the present disclosure, there is proposed an information processing apparatus including: a response generation unit that generates responses to utterances from a plurality of users; a determination unit that determines a response output method for each user based on priorities according to the utterance order of the plurality of users; and an output control unit that controls output of the generated responses according to the determined response output methods.
- According to the present disclosure, there is proposed a control method including: generating responses to utterances from a plurality of users; determining a response output method for each user based on priorities according to the utterance order of the plurality of users; and controlling, by an output control unit, output of the generated responses according to the determined response output methods.
- According to the present disclosure, there is proposed a program that causes a computer to function as: a response generation unit that generates responses to utterances from a plurality of users; a determination unit that determines a response output method for each user based on priorities according to the utterance order of the plurality of users; and an output control unit that controls output of the generated responses according to the determined response output methods.
- FIG. 1 is a diagram for describing an overview of a speech recognition system according to an embodiment of the present disclosure. FIG. 2 is a diagram showing an example of the configuration of the information processing apparatus according to the embodiment. FIG. 3 is a flowchart showing operation processing of the speech recognition system according to the embodiment. FIG. 4 is a diagram explaining an example of response output by voice and display according to the embodiment.
- a speech recognition system has a basic function of performing speech recognition and semantic analysis on a user's speech and responding by speech.
- an outline of a speech recognition system according to an embodiment of the present disclosure will be described with reference to FIG.
- FIG. 1 is a diagram for describing an overview of a speech recognition system according to an embodiment of the present disclosure.
- the information processing apparatus 1 illustrated in FIG. 1 has a voice UI agent function that can perform voice recognition and semantic analysis on a user's speech and output a response to the user by voice.
- the external appearance of the information processing apparatus 1 is not particularly limited, but may be, for example, a cylindrical shape as shown in FIG.
- a light emitting unit 18 formed of a light emitting element such as a light emitting diode (LED) is provided in a band shape so as to surround the horizontal central region of the side surface.
- the information processing apparatus 1 can notify the user of the state of the information processing apparatus 1 by lighting the entire light emitting unit 18 or lighting a part thereof.
- For example, the light emitting unit 18 can partially illuminate in the direction of the user, that is, the direction of the speaker, so that the information processing apparatus 1 appears to direct its gaze toward the speaker, as shown in FIG. 1.
- The information processing apparatus 1 can also notify the user that processing is in progress by controlling the light emitting unit 18 so that the light appears to circle around the side surface during response generation or data search.
- Here, in a conventional voice UI that responds by voice output, one-to-one use of the system by a single user is assumed, and interaction in an environment where a plurality of users use the system is not considered. Therefore, for example, when use in a home or a public space is assumed, a situation in which one user occupies the system is likely to occur.
- In contrast, the speech recognition system according to the present embodiment makes it possible to improve the convenience of the speech recognition system by outputting an appropriate response to each user when a plurality of users speak.
- Specifically, the information processing apparatus 1 has a function of projecting and displaying an image on the wall 20 as shown in FIG. 1, for example, and is capable of response output by display in addition to response output by voice. Thereby, when there is an utterance from another user while the information processing apparatus 1 is outputting a response by voice, the information processing apparatus 1 can, for example, display and output an image prompting that user to wait for a moment, and can thus respond flexibly without ignoring the user's speech or interrupting the ongoing response.
- For example, in FIG. 1, the information processing apparatus 1 outputs a voice answer 31, "It looks like it will be fine tomorrow," in response to the utterance 30, "Will it be fine tomorrow?", from the user AA.
- In addition, a response image 21b of a clear-weather mark is displayed on the wall 20.
- When the user BB speaks during this voice output, the information processing apparatus 1 displays and outputs a response image 21a prompting the user BB to wait for his or her turn. At this time, the information processing apparatus 1 may also project onto the wall 20 an utterance content image 21c in which the recognized utterance content of the user BB (for example, "When is the concert?") has been converted into text.
- Thereby, the user BB can grasp that his or her speech has been correctly recognized by the information processing apparatus 1.
- Then, after the voice response to the user AA is finished, the information processing apparatus 1 performs voice response output to the user BB who has been kept waiting.
- simultaneous use of the system by a plurality of users can be realized by, for example, transitioning the occupancy of the voice response output according to the order of speech.
- the outline of the speech recognition system according to the present disclosure has been described above.
- the shape of the information processing apparatus 1 is not limited to the cylindrical shape shown in FIG. 1, and may be, for example, a cube, a sphere, or a polyhedron. Subsequently, a basic configuration and an operation process of the information processing device 1 for realizing the speech recognition system according to an embodiment of the present disclosure will be sequentially described.
- FIG. 2 is a diagram showing an example of the configuration of the information processing apparatus 1 according to the present embodiment.
- the information processing apparatus 1 includes a control unit 10, a communication unit 11, a microphone 12, a speaker 13, a camera 14, a distance measurement sensor 15, a projection unit 16, a storage unit 17, and a light emitting unit 18.
- the control unit 10 controls each component of the information processing apparatus 1.
- the control unit 10 is realized by a microcomputer including a central processing unit (CPU), a read only memory (ROM), a random access memory (RAM), and a non-volatile memory.
- As shown in FIG. 2, the control unit 10 according to the present embodiment also functions as a voice recognition unit 10a, a semantic analysis unit 10b, a response generation unit 10c, a target determination unit 10d, a response output method determination unit 10e, and an output control unit 10f.
- the voice recognition unit 10a recognizes the voice of the user collected by the microphone 12 of the information processing device 1, performs conversion to a character string, and acquires a speech text.
- The voice recognition unit 10a can also identify the person who is speaking based on features of the voice, and can estimate the voice source, that is, the direction of the speaker.
- The semantic analysis unit 10b performs semantic analysis on the utterance text acquired by the voice recognition unit 10a using natural language processing or the like. The result of the semantic analysis is output to the response generation unit 10c.
- The response generation unit 10c generates a response to the user's utterance based on the semantic analysis result. For example, when the user's utterance asks for "tomorrow's weather", the response generation unit 10c acquires information on tomorrow's weather from a weather forecast server on the network and generates a response.
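- As a concrete illustration, the following is a minimal Python sketch of the recognition → analysis → generation flow described above; the intent names, the keyword rules, and the `fetch_weather` helper are hypothetical stand-ins, not the implementation of the disclosed apparatus.

```python
# Minimal sketch of the recognition -> analysis -> generation flow
# (hypothetical intents and helpers; not the disclosed implementation).

def analyze(utterance_text: str) -> dict:
    """Crude stand-in for the semantic analysis unit 10b."""
    text = utterance_text.lower()
    if "weather" in text or "fine tomorrow" in text:
        return {"intent": "ask_weather", "slot": "tomorrow"}
    if "calendar" in text:
        return {"intent": "show_calendar"}
    return {"intent": "unknown"}

def fetch_weather(day: str) -> str:
    # A real system would query a weather forecast server on the network.
    return "It looks like it will be fine tomorrow."

def generate_response(analysis: dict) -> str:
    """Stand-in for the response generation unit 10c."""
    if analysis["intent"] == "ask_weather":
        return fetch_weather(analysis["slot"])
    if analysis["intent"] == "show_calendar":
        return "Here is the calendar."
    return "Sorry, I did not understand that."

print(generate_response(analyze("Will it be fine tomorrow?")))
```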
- When utterances from a plurality of users are recognized, the target determination unit 10d determines the priority of each user based on a predetermined condition, determines the user with the highest priority as the target user, and determines the other one or more users as non-target users. Utterances from a plurality of users are recognized, for example, when the utterance of a second user is recognized during the utterance of a first user, or when the utterance of a second user is recognized while a voice response to the utterance of the first user is being output.
- the priority of each user based on a predetermined condition may be, for example, a priority based on the order of speech.
- That is, the target determination unit 10d sets the priority of the first user, who started the dialogue first, higher than the priority of the second user, who started the dialogue later.
- However, when there is an explicit interrupt process by the second user, the target determination unit 10d may redetermine the priorities and change the interrupting non-target user to the target user.
- As the explicit interrupt process, for example, a predetermined command utterance by voice, a predetermined operation by gesture, or a predetermined user situation based on sensing data is assumed. Details of the interrupt processing will be described later.
- The response output method determination unit 10e determines a response output method for each user based on the priorities of the plurality of users. For example, the response output method determination unit 10e selects voice response output or display response output according to whether the target determination unit 10d has determined the user to be the target user. Specifically, the response output method determination unit 10e assigns different response output methods to the target user and the non-target users, for example letting the target user occupy the voice response output while assigning the display response output to the non-target users. Further, even when the target user is also assigned response output by display, the response output method determination unit 10e can allocate a part of the display area to a non-target user.
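- The division of labor between the target determination unit 10d and the response output method determination unit 10e can be pictured with the following Python sketch; the `Speaker` class, the priority rule, and the method names are illustrative assumptions only.

```python
# Sketch of priority-based target determination (10d) and output-method
# assignment (10e); the class, field names, and rule are illustrative only.

from dataclasses import dataclass

@dataclass
class Speaker:
    name: str
    utterance_order: int   # 0 = started the dialogue first
    priority: float = 0.0

def assign_output_methods(speakers: list[Speaker]) -> dict[str, str]:
    # Priority according to utterance order: earlier speaker -> higher priority.
    for s in speakers:
        s.priority = -float(s.utterance_order)
    ranked = sorted(speakers, key=lambda s: s.priority, reverse=True)
    methods = {}
    for i, s in enumerate(ranked):
        # The highest-priority user (the target) occupies voice output;
        # the other users (non-targets) are assigned display output.
        methods[s.name] = "voice" if i == 0 else "display"
    return methods

print(assign_output_methods([Speaker("AA", 0), Speaker("BB", 1)]))
# {'AA': 'voice', 'BB': 'display'}
```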
- The output control unit 10f controls output of the response generated by the response generation unit 10c according to the response output method determined by the response output method determination unit 10e.
- a specific response output example according to the present embodiment will be described later.
- the communication unit 11 transmits and receives data to and from an external device.
- For example, the communication unit 11 connects to a predetermined server on the network and receives information necessary for response generation by the response generation unit 10c. Further, the communication unit 11 cooperates with peripheral devices and transmits response data to a target device under the control of the output control unit 10f.
- the microphone 12 has a function of picking up surrounding sound and outputting the sound to the control unit 10 as a sound signal. Also, the microphone 12 may be realized by an array microphone.
- The speaker 13 has a function of converting an audio signal into sound and outputting it under the control of the output control unit 10f.
- the camera 14 has a function of capturing an image of the periphery with an imaging lens provided in the information processing device 1 and outputting a captured image to the control unit 10. Also, the camera 14 may be realized by a 360 degree camera or a wide angle camera.
- the distance measuring sensor 15 has a function of measuring the distance between the information processing apparatus 1 and the user or a person who is around the user.
- the distance measurement sensor 15 is realized by, for example, an optical sensor (a sensor that measures the distance to the object based on phase difference information of light emission and light reception timing).
- the projection unit 16 is an example of a display device, and has a function of displaying an image by projecting (enlarging) an image on a wall or a screen.
- the storage unit 17 stores a program for causing each component of the information processing apparatus 1 to function.
- The storage unit 17 also stores various parameters used when the target determination unit 10d calculates the priorities of a plurality of users, and various algorithms used when the response output method determination unit 10e determines the output method according to the priorities (or the target/non-target determination based on the priorities).
- the storage unit 17 also stores registration information of the user.
- The user registration information includes personal identification information (voice features, face images, features of human images (including body images), names, identification numbers, etc.), age, gender, hobbies and preferences, attributes (housewife, office worker, student, etc.), and information on communication terminals owned by the user.
- The light emitting unit 18 is realized by light emitting elements such as LEDs, and allows full lighting, partial lighting, blinking, control of the lighting position, and the like. For example, by partially lighting in the direction of the speaker recognized by the voice recognition unit 10a under the control of the control unit 10, the light emitting unit 18 can appear as if directing its line of sight toward the speaker.
- the configuration of the information processing apparatus 1 according to the present embodiment has been specifically described above.
- the structure shown in FIG. 2 is an example, and this embodiment is not limited to this.
- For example, the information processing apparatus 1 may further include an infrared camera, a depth camera, a stereo camera, a human presence sensor, or the like in order to acquire information on the surrounding environment.
- the installation positions of the microphone 12, the speaker 13, the camera 14, the light emitting unit 18 and the like provided in the information processing apparatus 1 are not particularly limited.
- each function of the control unit 10 according to the present embodiment may be on a cloud connected via the communication unit 11.
- FIG. 3 is a flowchart showing operation processing of the speech recognition system according to the present embodiment.
- First, the control unit 10 of the information processing apparatus 1 determines whether there is an utterance from a user. Specifically, the control unit 10 performs voice recognition by the voice recognition unit 10a and semantic analysis by the semantic analysis unit 10b on the audio signal collected by the microphone 12, and determines whether it is an utterance directed by the user to the system.
- Next, in step S106, the control unit 10 determines whether there are utterances from a plurality of users. Specifically, the control unit 10 can determine whether there are utterances from two or more users based on the user (speaker) identification by the voice recognition unit 10a.
- When there is an utterance from only a single user, the response output method determination unit 10e of the control unit 10 determines the voice response output method (S112), and the output control unit 10f outputs the response generated by the response generation unit 10c by voice (S115).
- On the other hand, when there are utterances from a plurality of users, in step S109 the target determination unit 10d of the control unit 10 determines a target user and non-target users based on the priority of each user. For example, the target determination unit 10d raises the priority of the first user, who spoke first, and determines that user as the target user, and lowers the priority of the second user, who spoke later, below the priority of the first user and determines that user as a non-target user.
- Next, in step S112, the response output method determination unit 10e determines a response output method according to the target/non-target determination by the target determination unit 10d. For example, it determines the response output method by voice for the target user (that is, lets the target user occupy the voice response output method) and the response output method by display for the non-target users.
- Then, in step S115, the output control unit 10f controls output of the responses generated by the response generation unit 10c for each user's utterance, based on the semantic analysis results from the semantic analysis unit 10b, each with the output method determined by the response output method determination unit 10e.
- Accordingly, for example, when the utterance of the second user is recognized during the voice response output to the first user, the first user, having started the dialogue first, is determined as the target user and can occupy the voice output method, and the output control unit 10f can continue the response without interruption. In parallel with the voice response output to the first user, the output control unit 10f can perform display response output to the second user.
- For example, the output control unit 10f outputs a display response indicating that the second user is asked to wait for his or her turn, and after the voice response to the first user is completed, performs voice response output to the second user. That is, after the dialogue with the first user ends, the priority of the second user is raised, and the second user is changed to the target user and can occupy the voice response output.
- When only a single user uses the system, the response output method determination unit 10e performs control such that the single user occupies the voice response output.
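- The flow of steps S103 to S115 described above might be summarized by a sketch like the following; the placeholder callbacks (`generate`, `output_voice`, `output_display`) are assumptions, and the occupancy transition at the end follows the narrative above.

```python
# Sketch of the S103-S115 flow as one dispatch function; the callbacks
# are placeholders for the response generation and output control units.

def handle_utterances(speakers, generate, output_voice, output_display):
    if not speakers:                    # S103: no utterance -> nothing to do
        return
    if len(speakers) == 1:              # S106: single user
        output_voice(speakers[0], generate(speakers[0]))   # S112 + S115
        return
    # S109: the user who spoke first becomes the target user.
    target, *non_targets = speakers
    # S112 + S115: target occupies voice output; others get display output.
    output_voice(target, generate(target))
    for user in non_targets:
        output_display(user, "Please wait for your turn.")
    # After the target's dialogue ends, occupancy of the voice output
    # transitions to the next waiting user.
    for user in non_targets:
        output_voice(user, generate(user))

handle_utterances(
    ["AA", "BB"],
    generate=lambda u: f"(response for {u})",
    output_voice=lambda u, r: print(f"[voice to {u}] {r}"),
    output_display=lambda u, r: print(f"[display to {u}] {r}"),
)
```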
- the voice UI system according to the present embodiment can flexibly cope with the utterances of a plurality of users, and the convenience of the voice UI system can be improved.
- a specific response output example for a plurality of users according to this embodiment will be described later.
- On the other hand, in step S118, when there is an explicit interrupt process during a response (S118/Yes), the control unit 10 changes the target/non-target assignment of the plurality of users by the target determination unit 10d (S109). Specifically, the target determination unit 10d sets the priority of the interrupting user higher than that of the current target user, determines the interrupting user as the new target user, and changes the current target user to a non-target user. Then, the control unit 10 performs control to switch to the response output methods determined again according to this change and to respond accordingly (S112, S115).
- the explicit interrupt process may be, for example, a process by voice or gesture as described below.
- As an interrupt process by voice, the priority of the interrupting user is raised when the user utters the system name, such as "SS (system name), tell me the weather", utters a predetermined interrupt command, such as "Interrupt: tell me the weather", or utters a term indicating urgency or an important task, such as "Tell me the weather, hurry up!". In addition, when the user speaks faster or louder than the user's normal voice (or than a typical voice level), this may also be judged to be an explicit interrupt process and the interrupting user's priority may be raised.
- As an interrupt process by gesture, for example, when an utterance is performed together with a predetermined action such as raising a hand, the priority of the interrupting user may be raised.
- Furthermore, an interrupt function may be assigned to a remote controller for operating the information processing apparatus 1, a physical button provided on the information processing apparatus 1, or the like.
- Alternatively, the explicit interrupt process may be determined based on, for example, content detected by the camera 14 or the distance measuring sensor 15.
- Further, the users' schedule information may be acquired from a predetermined server or the like, and if the interrupting user has an appointment immediately afterwards, this may be judged to be an explicit interrupt and the priority may be raised.
- The explicit interrupt processing has been described above. In the present embodiment, in addition to the interrupt processing described above, it is also possible to perform interrupt processing according to user attributes. That is, when the information processing apparatus 1 can identify the speaker, a static or dynamic priority can be assigned to each user. Specifically, for example, when the user AA is registered as a "son" and the user BB as a "mother", and it is set that the priority of the "mother" is higher than that of the "son", then when the user BB utters and interrupts during a dialogue with the user AA, control is performed such that the priority of the user BB becomes higher than that of the user AA. As a result, the response to the user AA switches from voice output to display output.
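- A minimal sketch of this interrupt handling, assuming made-up keywords, a made-up loudness threshold, and an illustrative attribute table, could look as follows; none of these values come from the disclosure itself.

```python
# Sketch of explicit-interrupt detection and the resulting priority change;
# the keywords, the +6 dB threshold, and the attribute table are invented.

SYSTEM_NAME = "ss"                       # e.g. "SS (system name), ..."
INTERRUPT_TERMS = ("interrupt", "hurry")
ATTRIBUTE_PRIORITY = {"mother": 2.0, "son": 1.0}   # static per-attribute rank

def is_explicit_interrupt(text: str, loudness_db: float, usual_db: float) -> bool:
    t = text.lower()
    return (t.startswith(SYSTEM_NAME)
            or any(term in t for term in INTERRUPT_TERMS)
            or loudness_db > usual_db + 6.0)       # clearly louder than usual

def updated_priorities(target: str, interrupter: str,
                       priorities: dict, explicit: bool) -> dict:
    p = dict(priorities)
    attr_wins = (ATTRIBUTE_PRIORITY.get(interrupter, 0)
                 > ATTRIBUTE_PRIORITY.get(target, 0))
    if explicit or attr_wins:
        # The interrupting user jumps above the current target user, whose
        # response then switches from voice output to display output.
        p[interrupter] = p[target] + 1.0
    return p

explicit = is_explicit_interrupt("Tell me the weather, hurry up!", 72.0, 60.0)
print(updated_priorities("son", "mother", {"son": 1.0, "mother": 0.0}, explicit))
# {'son': 1.0, 'mother': 2.0}
```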
- FIG. 4 is a diagram for explaining an example of response output by speech and display for simultaneous speech of a plurality of persons according to the present embodiment.
- As shown on the left of FIG. 4, when the information processing apparatus 1 recognizes the utterance 32 from the user BB while outputting the response 31 by voice to the utterance 30 from the user AA, the user AA, who started the dialogue earlier, is determined as the target user, and the voice output of the response 31 is continued.
- On the other hand, the information processing apparatus 1 determines the user BB, who spoke later, as a non-target user, and displays and outputs the response image 21a prompting the user BB to wait.
- Then, after the voice response to the user AA is finished, the information processing apparatus 1 outputs a voice response 33 to the user BB who has been kept waiting, for example, "Sorry to have kept you waiting. It is Friday of next week."
- the information processing apparatus 1 can also project the response image 21d on the wall 20 to perform display output, if necessary.
- Further, in order to clearly indicate that the occupancy of the voice response output has shifted to the user BB, the information processing apparatus 1 may control the light emitting unit 18 to light the part facing the direction of the user BB, so that the apparatus appears to look at the user BB.
- simultaneous use of the system by a plurality of users can be realized by transitioning the occupation of the speech response output in accordance with the user's speech order.
- Note that the standby instruction to the non-target user is not limited to the projection of the response image 21a as shown in FIG. 4.
- some modified examples will be described.
- (Modification 1) For example, when the display response output is also occupied by the target user, the information processing apparatus 1 can output a standby instruction to the non-target user using a sub display or the light emitting unit 18 provided on the information processing apparatus 1.
- the information processing apparatus 1 may output a standby instruction using color information of an icon or light.
- notification of the standby user using the sub display will be described with reference to FIGS. 5A and 5B.
- For example, the output control unit 10f may visualize the non-target users currently waiting for a response by means of a display such as a queue on the sub display. The users' IDs or names may be clearly indicated and displayed in the colors registered for the respective users, so that, as in the example shown in FIG. 5B, it can be intuitively grasped who is currently waiting for a response.
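- A toy sketch of such a queue display, with a hypothetical per-user color registry, might look like this:

```python
# Toy sketch of visualizing waiting (non-target) users as a queue on a
# sub display; the per-user color registry is a hypothetical example.

USER_COLORS = {"BB": "blue", "CC": "green"}   # colors registered per user

def render_wait_queue(waiting: list[str]) -> str:
    """One line per waiting user, front of the queue first."""
    if not waiting:
        return "(no one waiting)"
    return "\n".join(
        f"{pos}. {name} [{USER_COLORS.get(name, 'gray')}]"
        for pos, name in enumerate(waiting, start=1)
    )

print(render_wait_queue(["BB", "CC"]))
# 1. BB [blue]
# 2. CC [green]
```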
- FIG. 6 is a diagram for explaining an example in which responses to non-target users are indicated by icons to save a display area.
- As shown in FIG. 6, the information processing apparatus 1, having recognized the utterance 34, "Show me the calendar," from the user AA, outputs the response 35, "Here is the calendar," and projects the corresponding calendar image 22a onto the wall 20.
- In this case, the display area 200 is largely used up. Therefore, when the user BB's utterance 36, "Has any mail come?", is recognized in the meantime, a space for displaying the response image 21a and the utterance content image 21c as shown on the left of FIG. 4 cannot be secured, so the information processing apparatus 1 displays a mail icon image 22b as shown in FIG. 6. As a result, the user BB can intuitively recognize that his or her speech has been correctly recognized and that he or she is in a response waiting state.
- FIG. 7 is a diagram for explaining simultaneous voice response using directivity.
- When the speaker 13 has directivity such that a sound field can be generated only at a specific position (as in wavefront synthesis), the information processing apparatus 1 can recognize the position of each speaker using content sensed by the camera 14 or the microphone 12 and, as shown in FIG. 7, output the response 37 for the user AA and the response 38 for the user BB by voice toward the respective positions, making simultaneous responses.
- the information processing apparatus 1 may divide the display area, assign display areas to a plurality of users, and display the response image 23a for the user AA and the response image 23b for the user BB. Further, the information processing apparatus 1 may make the display area corresponding to the target user larger than the display area corresponding to the non-target user.
- As described above, the speech recognition system according to the present embodiment can respond by voice to a plurality of users simultaneously by using directivity, realizing simultaneous use of the system by a plurality of users.
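- Schematically, routing each user's response into a beam aimed at that user's estimated direction could be sketched as follows; `steer_beam` is a stand-in for a real beamforming or wavefront-synthesis audio backend, which is well beyond the scope of this sketch.

```python
# Schematic sketch of simultaneous directional voice responses; steer_beam
# stands in for a real beamforming / wavefront-synthesis audio backend.

from dataclasses import dataclass

@dataclass
class Beam:
    azimuth_deg: float   # estimated direction of the listener
    text: str            # response to be synthesized into that beam

def steer_beam(beam: Beam) -> None:
    # Placeholder: a real system would synthesize speech and focus the
    # sound field at the listener's position.
    print(f"beam @ {beam.azimuth_deg:+.0f} deg -> {beam.text!r}")

def respond_simultaneously(positions: dict, responses: dict) -> None:
    for user, azimuth in positions.items():
        steer_beam(Beam(azimuth, responses[user]))

respond_simultaneously(
    positions={"AA": -30.0, "BB": 40.0},   # e.g. estimated via camera / mic array
    responses={"AA": "It will be fine tomorrow.",
               "BB": "The concert is Friday of next week."},
)
```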
- The information processing apparatus 1 can also cooperate with external devices and control them so that a response to a non-target user is made from an external device.
- For example, when the voice and display response outputs are occupied by the target user, the information processing apparatus 1 performs control so that the response to the non-target user is output from a portable communication terminal or wearable terminal owned by the non-target user, a TV nearby or in the non-target user's room, or another voice UI system in another location.
- At this time, the information processing apparatus 1 may display on its sub display that the response output will be performed from an external device, or may cause the portable communication terminal or wearable terminal to output a voice such as "I will respond from here" to notify the non-target user of the responding terminal.
- simultaneous response to a plurality of users can be enabled in cooperation with an external device, and simultaneous use of the system by a plurality of users can be realized.
- The information processing apparatus 1 can also determine the response output method according to the state of the speaker. For example, when the user is not near the information processing apparatus 1 and speaks loudly from some distance away, the voice output or display output from the information processing apparatus 1 may not reach the user. In such a case, the information processing apparatus 1 may decide on a response output method that cooperates with an external device, such as a portable communication terminal or wearable device owned by the user. Alternatively, the response content may be temporarily stored in the information processing apparatus 1 and output when the user moves within the effective range of the voice output or display output of the information processing apparatus 1.
- In this way, the voice output and display output of the information processing apparatus 1 are prevented from being occupied by a distant user, and can instead be assigned to nearby non-target users.
- The information processing apparatus 1 can also determine the response output method according to the content of the response. For example, when a response has a large amount of information, such as a calendar display, the information processing apparatus 1 may preferentially assign the display output method to that response and make the voice output method available to other users. Conversely, in the case of a simple confirmation (for example, the user's utterance "The Yamanote line is not delayed, right?" with the information processing apparatus 1 responding only "Yes"), the response is complete with voice output alone and no image display is needed, so the display output method may be made available to other users. In addition, when the user's utterance is only an instruction for display, such as "Show me the calendar", the information processing apparatus 1 may make the voice output method available to other users.
- In this way, by determining the output method according to the response content, it is possible to avoid the target user occupying both the display output and the voice output, and to realize simultaneous use of the system by a plurality of users.
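- A sketch of such content-dependent modality selection, with purely illustrative heuristics, might look like this:

```python
# Sketch of choosing output modalities from the response content; the
# three flags and their precedence are illustrative heuristics only.

def choose_modalities(needs_screen: bool, display_only_request: bool,
                      simple_confirmation: bool) -> set:
    """Return the set of output methods this response should occupy."""
    if simple_confirmation:      # e.g. just "Yes" -> the display stays free
        return {"voice"}
    if display_only_request:     # e.g. "Show me the calendar" -> voice stays free
        return {"display"}
    # Information-rich responses (e.g. a calendar) get the display first.
    return {"display"} if needs_screen else {"voice"}

print(choose_modalities(needs_screen=False, display_only_request=False,
                        simple_confirmation=True))    # {'voice'}
print(choose_modalities(needs_screen=True, display_only_request=True,
                        simple_confirmation=False))   # {'display'}
```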
- the information processing apparatus 1 may display an error when the number of simultaneous speakers allowed is exceeded.
- an example will be described with reference to FIG.
- FIG. 8 is a view showing an example of an error display according to the present embodiment.
- As shown in FIG. 8, the information processing apparatus 1, having recognized the utterance 40 from the user AA, outputs the response 41 by voice and projects the response image 24b.
- Here, when utterances continue one after another, for example an utterance 42 from the user BB, "When is the concert?", an utterance 43 from the user CC, "Show me the TV program schedule!", and an utterance 44 from the user DD, "What's the news today?", and the number of simultaneous speakers permitted by the information processing apparatus 1 (for example, two) is exceeded, the error image 24a is projected as shown in FIG. 8.
- The error image 24a may include, for example, content prompting the users to take measures to avoid the error, such as "Please make requests one at a time!"
- Alternatively, the information processing apparatus 1 may transfer the response content to a device or the like associated with each non-target user.
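- The error branch can be pictured with the following sketch, where the limit of two simultaneous speakers follows the example given above:

```python
# Sketch of the error branch when the permitted number of simultaneous
# speakers is exceeded; the limit of 2 follows the example in the text.

MAX_SIMULTANEOUS_SPEAKERS = 2

def accept_speakers(speakers):
    """Accept up to the limit; return (accepted users, optional error text)."""
    if len(speakers) <= MAX_SIMULTANEOUS_SPEAKERS:
        return speakers, None
    return (speakers[:MAX_SIMULTANEOUS_SPEAKERS],
            "Please make requests one at a time!")

accepted, error = accept_speakers(["AA", "BB", "CC", "DD"])
print(accepted)   # ['AA', 'BB']
print(error)      # content for an error image like 24a in FIG. 8
```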
- Further, it is also possible to create a computer program for causing hardware such as the CPU, ROM, and RAM built into the above-described information processing apparatus 1 to exhibit the functions of the information processing apparatus 1.
- a computer readable storage medium storing the computer program is also provided.
- (1) An information processing apparatus including: a response generation unit that generates responses to utterances from a plurality of users; a determination unit that determines a response output method for each user based on priorities according to the utterance order of the plurality of users; and an output control unit that controls output of the generated responses according to the determined response output method.
- (2) The information processing apparatus according to (1), wherein, when an utterance from a user different from the user currently in dialogue is recognized, the determination unit sets the priority of the user who started the dialogue earlier higher than the priority of the user who started the dialogue next.
- (3) The information processing apparatus according to (2), wherein the determination unit determines the one user with the highest priority as a target user and one or more other users as non-target users.
- (4) The information processing apparatus according to (3), wherein the determination unit causes the target user to occupy the response output method by voice and assigns a response output method by display to the non-target user.
- (5) The information processing apparatus according to (4), wherein the response generation unit generates a response prompting the non-target user to wait, and the output control unit performs control to display an image of the response prompting the non-target user to wait.
- (6) The information processing apparatus according to (5), wherein the response generation unit generates, for the non-target user, a response indicating a speech recognition result of the non-target user's utterance, and the output control unit performs control to display an image of that response.
- (7) The information processing apparatus according to any one of (4) to (6), wherein the output control unit performs control to clearly indicate non-target users waiting for a response.
- (8) The information processing apparatus according to any one of (4) to (7), wherein the determination unit causes the voice response output method occupied by the target user to transition to a non-target user after the interaction with the target user has ended.
- (9) The information processing apparatus according to any one of (4) to (8), wherein the response output by display is display by projection.
- (10) The information processing apparatus according to (3), wherein, when the target user occupies the output method by display and the output method by voice, the determination unit assigns to the non-target user an output method of responding in cooperation with an external device.
- (11) The information processing apparatus according to (3), wherein the determination unit determines, for the non-target user, a response output method different from the response output method determined according to the content of the response to the target user.
- (12) The information processing apparatus according to (11), wherein the determination unit assigns an output method by voice to a non-target user when the response output method for the target user occupies the display.
- (13) The information processing apparatus according to (3), wherein the determination unit determines the response output method according to the state of the target user.
- (14) The information processing apparatus according to (13), wherein the determination unit assigns an output method of responding in cooperation with an external device when the target user is at a location separated from the information processing apparatus 1 by a predetermined value or more.
- (15) The information processing apparatus according to any one of (2) to (14), wherein the determination unit changes the priorities in response to an explicit interrupt process.
- (16) The information processing apparatus according to (1), wherein the determination unit assigns response output methods from an audio output unit having directivity to a plurality of users.
- (17) The information processing apparatus according to any one of (1) to (16), wherein the output control unit performs control to issue an error notification when the permitted number of speakers based on voice recognition results is exceeded.
Landscapes
- Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Multimedia (AREA)
- Acoustics & Sound (AREA)
- Computational Linguistics (AREA)
- General Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- User Interface Of Digital Computer (AREA)
- Telephonic Communication Services (AREA)
Abstract
Description
1. Overview of a speech recognition system according to an embodiment of the present disclosure
2. Configuration
3. Operation processing
4. Response output examples
4-1. Response by voice and display
4-2. Simultaneous response using directivity
4-3. Response in cooperation with an external device
4-4. Response according to the speaker's state
4-5. Response according to the utterance content
4-6. Error response
5. Conclusion
A speech recognition system according to an embodiment of the present disclosure has a basic function of performing speech recognition and semantic analysis on a user's utterance and responding by voice. An overview of the speech recognition system according to an embodiment of the present disclosure will be described below with reference to FIG. 1.
FIG. 2 is a diagram showing an example of the configuration of the information processing apparatus 1 according to the present embodiment. As shown in FIG. 2, the information processing apparatus 1 includes a control unit 10, a communication unit 11, a microphone 12, a speaker 13, a camera 14, a distance measuring sensor 15, a projection unit 16, a storage unit 17, and a light emitting unit 18.
The control unit 10 controls each component of the information processing apparatus 1. The control unit 10 is realized by a microcomputer including a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), and a non-volatile memory. As shown in FIG. 2, the control unit 10 according to the present embodiment also functions as a voice recognition unit 10a, a semantic analysis unit 10b, a response generation unit 10c, a target determination unit 10d, a response output method determination unit 10e, and an output control unit 10f.
The communication unit 11 transmits and receives data to and from external devices. For example, the communication unit 11 connects to a predetermined server on the network and receives information necessary for response generation by the response generation unit 10c. The communication unit 11 also cooperates with peripheral devices and transmits response data to a target device under the control of the output control unit 10f.
The microphone 12 has a function of picking up surrounding sound and outputting it to the control unit 10 as an audio signal. The microphone 12 may be realized by an array microphone.
The speaker 13 has a function of converting an audio signal into sound and outputting it under the control of the output control unit 10f.
The camera 14 has a function of imaging the surroundings with an imaging lens provided on the information processing apparatus 1 and outputting the captured image to the control unit 10. The camera 14 may be realized by a 360-degree camera, a wide-angle camera, or the like.
The distance measuring sensor 15 has a function of measuring the distance between the information processing apparatus 1 and the user or people around the user. The distance measuring sensor 15 is realized by, for example, an optical sensor (a sensor that measures the distance to an object based on phase difference information between light emission and light reception timing).
The projection unit 16 is an example of a display device and has a function of displaying an image by projecting it (in enlarged form) onto a wall or screen.
The storage unit 17 stores programs for causing each component of the information processing apparatus 1 to function. The storage unit 17 also stores various parameters used when the target determination unit 10d calculates the priorities of a plurality of users, and various algorithms used when the response output method determination unit 10e determines the output method according to the priorities (or the target/non-target determination based on the priorities). The storage unit 17 further stores user registration information. The user registration information includes personal identification information (voice features, face images, features of human images (including body images), names, identification numbers, etc.), age, gender, hobbies and preferences, attributes (housewife, office worker, student, etc.), information on communication terminals owned by the user, and the like.
The light emitting unit 18 is realized by light emitting elements such as LEDs, and allows full lighting, partial lighting, blinking, control of the lighting position, and the like. For example, by partially lighting in the direction of the speaker recognized by the voice recognition unit 10a under the control of the control unit 10, the light emitting unit 18 can appear as if directing its line of sight toward the speaker.
Next, the operation processing of the speech recognition system according to the present embodiment will be described in detail with reference to FIG. 3.
Subsequently, examples of response output to a plurality of users according to the present embodiment will be described in detail with reference to FIGS. 4 to 8.
FIG. 4 is a diagram explaining an example of response output by voice and display for simultaneous utterances by a plurality of users according to the present embodiment. As shown on the left of FIG. 4, when the information processing apparatus 1 recognizes the utterance 32 from the user BB while outputting the response 31 by voice to the utterance 30 from the user AA, it determines the user AA, who started the dialogue earlier, as the target user and continues the voice output of the response 31. On the other hand, the information processing apparatus 1 determines the user BB, who spoke later, as a non-target user, and displays and outputs the response image 21a prompting the user BB to wait.
For example, when the display response output is also occupied by the target user, the information processing apparatus 1 can output a standby instruction to the non-target user using the sub display or the light emitting unit 18 provided on the information processing apparatus 1.
When each of the responses to a plurality of users requires a certain display area, the display area becomes insufficient; the information processing apparatus 1 therefore displays the response for the lower-priority user (that is, the response to the non-target user) as an icon or text to save display area. FIG. 6 is a diagram explaining an example in which the response to a non-target user is shown as an icon to save display area. As shown in FIG. 6, the information processing apparatus 1, having recognized the utterance 34, "Show me the calendar," from the user AA, outputs the response 35, "Here is the calendar," and projects the corresponding calendar image 22a onto the wall 20.
Next, when the speaker 13 has directivity such that a sound field can be generated only at a specific position, as in wavefront synthesis, the information processing apparatus 1 can also output voice responses to a plurality of users simultaneously. FIG. 7 is a diagram explaining simultaneous voice responses using directivity.
The information processing apparatus 1 can also cooperate with external devices and control them so that a response to a non-target user is made from an external device. For example, when the voice and display response outputs are occupied by the target user, the information processing apparatus 1 performs control so that the response to the non-target user is output from a portable communication terminal or wearable terminal owned by the non-target user, a TV nearby or in the non-target user's room, or another voice UI system in another location. At this time, the information processing apparatus 1 may display on its sub display that the response output will be performed from an external device, or may cause the portable communication terminal or wearable terminal to output a voice such as "I will respond from here" to notify the non-target user of the responding terminal.
The information processing apparatus 1 according to the present embodiment can also determine the response output method according to the state of the speaker. For example, when the user is not near the information processing apparatus 1 and speaks loudly from some distance away, the voice output or display output from the information processing apparatus 1 may not reach the user. In such a case, the information processing apparatus 1 may decide on a response output method that cooperates with an external device, such as a portable communication terminal or wearable device owned by the user. Alternatively, the response content may be temporarily stored in the information processing apparatus 1 and output when the user moves within the effective range of the voice output or display output of the information processing apparatus 1.
The information processing apparatus 1 according to the present embodiment can also determine the response output method according to the response content. For example, when a response has a large amount of information, such as a calendar display, the information processing apparatus 1 may preferentially assign the display output method to that response and make the voice output method available to other users. In the case of a simple confirmation (for example, the user's utterance "The Yamanote line is not delayed, right?" with the information processing apparatus 1 responding only "Yes"), the response is complete with voice output alone and no image display is needed, so the information processing apparatus 1 may make the display output method available to other users. When the user's utterance is only an instruction for display, such as "Show me the calendar", the information processing apparatus 1 may make the voice output method available to other users.
The information processing apparatus 1 according to the present embodiment may also display an error when the permitted number of simultaneous speakers is exceeded. An example will be described below with reference to FIG. 8.
As described above, in the speech recognition system according to the embodiment of the present disclosure, simultaneous use of the system by a plurality of users can be realized by, for example, transitioning the occupancy of the voice response output according to the order of speech, and the convenience of the speech recognition system can be improved.
(1)
An information processing apparatus including:
a response generation unit that generates responses to utterances from a plurality of users;
a determination unit that determines a response output method for each user based on priorities according to the utterance order of the plurality of users; and
an output control unit that controls output of the generated responses according to the determined response output method.
(2)
The information processing apparatus according to (1), wherein, when an utterance from a user different from the user currently in dialogue is recognized, the determination unit sets the priority of the user who started the dialogue earlier higher than the priority of the user who started the dialogue next.
(3)
The information processing apparatus according to (2), wherein the determination unit determines the one user with the highest priority as a target user and one or more other users as non-target users.
(4)
The information processing apparatus according to (3), wherein the determination unit causes the target user to occupy the response output method by voice and assigns a response output method by display to the non-target user.
(5)
The information processing apparatus according to (4), wherein the response generation unit generates a response prompting the non-target user to wait, and the output control unit performs control to display an image of the response prompting the non-target user to wait.
(6)
The information processing apparatus according to (5), wherein the response generation unit generates, for the non-target user, a response indicating a speech recognition result of the non-target user's utterance, and the output control unit performs control to display an image of the response indicating the speech recognition result of the non-target user's utterance.
(7)
The information processing apparatus according to any one of (4) to (6), wherein the output control unit performs control to clearly indicate non-target users waiting for a response.
(8)
The information processing apparatus according to any one of (4) to (7), wherein the determination unit causes the voice response output method occupied by the target user to transition to a non-target user after the dialogue with the target user has ended.
(9)
The information processing apparatus according to any one of (4) to (8), wherein the response output by display is display by projection.
(10)
The information processing apparatus according to (3), wherein, when the target user occupies the output method by display and the output method by voice, the determination unit assigns to the non-target user an output method of responding in cooperation with an external device.
(11)
The information processing apparatus according to (3), wherein the determination unit determines, for the non-target user, a response output method different from the response output method determined according to the content of the response to the target user.
(12)
The information processing apparatus according to (11), wherein the determination unit assigns an output method by voice to a non-target user when the response output method for the target user occupies the display.
(13)
The information processing apparatus according to (3), wherein the determination unit determines the response output method according to the state of the target user.
(14)
The information processing apparatus according to (13), wherein the determination unit assigns an output method of responding in cooperation with an external device when the target user is at a location separated from the information processing apparatus 1 by a predetermined value or more.
(15)
The information processing apparatus according to any one of (2) to (14), wherein the determination unit changes the priorities in response to an explicit interrupt process.
(16)
The information processing apparatus according to (1), wherein the determination unit assigns response output methods from an audio output unit having directivity to a plurality of users.
(17)
The information processing apparatus according to any one of (1) to (16), wherein the output control unit performs control to issue an error notification when the permitted number of speakers based on voice recognition results is exceeded.
(18)
A control method including:
generating responses to utterances from a plurality of users;
determining a response output method for each user based on priorities according to the utterance order of the plurality of users; and
controlling, by an output control unit, output of the generated responses according to the determined response output method.
(19)
A program for causing a computer to function as:
a response generation unit that generates responses to utterances from a plurality of users;
a determination unit that determines a response output method for each user based on priorities according to the utterance order of the plurality of users; and
an output control unit that controls output of the generated responses according to the determined response output method.
10 Control unit
10a Voice recognition unit
10b Semantic analysis unit
10c Response generation unit
10d Target determination unit
10e Response output method determination unit
10f Output control unit
11 Communication unit
12 Microphone
13 Speaker
14 Camera
15 Distance measuring sensor
16 Projection unit
17 Storage unit
18 Light emitting unit
19 Sub display
20 Wall
Claims (19)
- An information processing apparatus comprising: a response generation unit that generates responses to utterances from a plurality of users; a determination unit that determines a response output method for each user based on priorities according to the utterance order of the plurality of users; and an output control unit that controls output of the generated responses according to the determined response output method.
- The information processing apparatus according to claim 1, wherein, when an utterance from a user different from the user currently in dialogue is recognized, the determination unit sets the priority of the user who started the dialogue earlier higher than the priority of the user who started the dialogue next.
- The information processing apparatus according to claim 2, wherein the determination unit determines the one user with the highest priority as a target user and one or more other users as non-target users.
- The information processing apparatus according to claim 3, wherein the determination unit causes the target user to occupy the response output method by voice and assigns a response output method by display to the non-target user.
- The information processing apparatus according to claim 4, wherein the response generation unit generates a response prompting the non-target user to wait, and the output control unit performs control to display an image of the response prompting the non-target user to wait.
- The information processing apparatus according to claim 5, wherein the response generation unit generates, for the non-target user, a response indicating a speech recognition result of the non-target user's utterance, and the output control unit performs control to display an image of the response indicating the speech recognition result of the non-target user's utterance.
- The information processing apparatus according to claim 4, wherein the output control unit performs control to clearly indicate non-target users waiting for a response.
- The information processing apparatus according to claim 4, wherein the determination unit causes the voice response output method occupied by the target user to transition to a non-target user after the dialogue with the target user has ended.
- The information processing apparatus according to claim 4, wherein the response output by display is display by projection.
- The information processing apparatus according to claim 3, wherein, when the target user occupies the output method by display and the output method by voice, the determination unit assigns to the non-target user an output method of responding in cooperation with an external device.
- The information processing apparatus according to claim 3, wherein the determination unit determines, for the non-target user, a response output method different from the response output method determined according to the content of the response to the target user.
- The information processing apparatus according to claim 11, wherein the determination unit assigns an output method by voice to a non-target user when the response output method for the target user occupies the display.
- The information processing apparatus according to claim 3, wherein the determination unit determines the response output method according to the state of the target user.
- The information processing apparatus according to claim 13, wherein the determination unit assigns an output method of responding in cooperation with an external device when the target user is at a location separated from the information processing apparatus 1 by a predetermined value or more.
- The information processing apparatus according to claim 2, wherein the determination unit changes the priorities in response to an explicit interrupt process.
- The information processing apparatus according to claim 1, wherein the determination unit assigns response output methods from an audio output unit having directivity to a plurality of users.
- The information processing apparatus according to claim 1, wherein the output control unit performs control to issue an error notification when the permitted number of speakers based on voice recognition results is exceeded.
- A control method comprising: generating responses to utterances from a plurality of users; determining a response output method for each user based on priorities according to the utterance order of the plurality of users; and controlling, by an output control unit, output of the generated responses according to the determined response output method.
- A program for causing a computer to function as: a response generation unit that generates responses to utterances from a plurality of users; a determination unit that determines a response output method for each user based on priorities according to the utterance order of the plurality of users; and an output control unit that controls output of the generated responses according to the determined response output method.
Priority Applications (4)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2017509182A JP6669162B2 (ja) | 2015-03-31 | 2015-12-28 | 情報処理装置、制御方法、およびプログラム |
EP15887804.1A EP3279790B1 (en) | 2015-03-31 | 2015-12-28 | Information processing device, control method, and program |
CN201580078175.7A CN107408027B (zh) | 2015-03-31 | 2015-12-28 | 信息处理设备、控制方法及程序 |
US15/559,940 US20180074785A1 (en) | 2015-03-31 | 2015-12-28 | Information processing device, control method, and program |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2015073896 | 2015-03-31 | ||
JP2015-073896 | 2015-03-31 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2016157662A1 true WO2016157662A1 (ja) | 2016-10-06 |
Family
ID=57005865
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2015/086544 WO2016157662A1 (ja) | 2015-03-31 | 2015-12-28 | 情報処理装置、制御方法、およびプログラム |
Country Status (5)
Country | Link |
---|---|
US (1) | US20180074785A1 (ja) |
EP (1) | EP3279790B1 (ja) |
JP (1) | JP6669162B2 (ja) |
CN (1) | CN107408027B (ja) |
WO (1) | WO2016157662A1 (ja) |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2018148254A (ja) * | 2017-03-01 | 2018-09-20 | 大和ハウス工業株式会社 | インターフェースユニット |
WO2019100738A1 (zh) * | 2017-11-24 | 2019-05-31 | 科大讯飞股份有限公司 | 多人参与的人机交互方法及装置 |
JP2019101264A (ja) * | 2017-12-04 | 2019-06-24 | シャープ株式会社 | 外部制御装置、音声対話型制御システム、制御方法、およびプログラム |
WO2019130399A1 (ja) * | 2017-12-25 | 2019-07-04 | 三菱電機株式会社 | 音声認識装置、音声認識システム及び音声認識方法 |
WO2019142420A1 (ja) * | 2018-01-22 | 2019-07-25 | ソニー株式会社 | 情報処理装置および情報処理方法 |
CN110313153A (zh) * | 2017-02-14 | 2019-10-08 | 微软技术许可有限责任公司 | 智能数字助理系统 |
US20190369936A1 (en) * | 2017-07-20 | 2019-12-05 | Apple Inc. | Electronic Device With Sensors and Display Devices |
JP2020003926A (ja) * | 2018-06-26 | 2020-01-09 | 株式会社日立製作所 | 対話システムの制御方法、対話システム及びプログラム |
WO2020017165A1 (ja) * | 2018-07-20 | 2020-01-23 | ソニー株式会社 | 情報処理装置、情報処理システム、および情報処理方法、並びにプログラム |
JP2020505643A (ja) * | 2017-02-15 | 2020-02-20 | ▲騰▼▲訊▼科技(深▲セン▼)有限公司 | 音声認識方法、電子機器、及びコンピュータ記憶媒体 |
EP3567470A4 (en) * | 2017-11-07 | 2020-03-25 | Sony Corporation | INFORMATION PROCESSING DEVICE AND ELECTRONIC APPARATUS |
KR20210002648A (ko) * | 2018-05-04 | 2021-01-08 | 구글 엘엘씨 | 사용자와 자동화된 어시스턴트 인터페이스 간의 거리에 따른 자동화된 어시스턴트 콘텐츠의 생성 및/또는 적용 |
JP2021018664A (ja) * | 2019-07-22 | 2021-02-15 | Tis株式会社 | 情報処理システム、情報処理方法、及びプログラム |
EP3726355A4 (en) * | 2017-12-15 | 2021-02-17 | Sony Corporation | INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING PROCESS AND RECORDING MEDIA |
US11010601B2 (en) | 2017-02-14 | 2021-05-18 | Microsoft Technology Licensing, Llc | Intelligent assistant device communicating non-verbal cues |
US11100384B2 (en) | 2017-02-14 | 2021-08-24 | Microsoft Technology Licensing, Llc | Intelligent device user interactions |
JP2021123215A (ja) * | 2020-02-04 | 2021-08-30 | 株式会社デンソーテン | 表示装置および表示装置の制御方法 |
JP2021135363A (ja) * | 2020-02-26 | 2021-09-13 | 株式会社サイバーエージェント | 制御システム、制御装置、制御方法及びコンピュータプログラム |
JP2021533510A (ja) * | 2018-01-30 | 2021-12-02 | ティントーク ホールディング(ケイマン)リミティド | 相互作用の方法及び装置 |
WO2021251107A1 (ja) * | 2020-06-11 | 2021-12-16 | ソニーグループ株式会社 | 情報処理装置、情報処理システム、および情報処理方法、並びにプログラム |
WO2023090057A1 (ja) * | 2021-11-17 | 2023-05-25 | ソニーグループ株式会社 | 情報処理装置、情報処理方法および情報処理プログラム |
Families Citing this family (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10438584B2 (en) * | 2017-04-07 | 2019-10-08 | Google Llc | Multi-user virtual assistant for verbal device control |
KR101949497B1 (ko) * | 2017-05-02 | 2019-02-18 | 네이버 주식회사 | 사용자 발화의 표현법을 파악하여 기기의 동작이나 컨텐츠 제공 범위를 조정하여 제공하는 사용자 명령 처리 방법 및 시스템 |
US10628570B2 (en) * | 2017-05-15 | 2020-04-21 | Fmr Llc | Protection of data in a zero user interface environment |
US11222060B2 (en) * | 2017-06-16 | 2022-01-11 | Hewlett-Packard Development Company, L.P. | Voice assistants with graphical image responses |
US11178280B2 (en) * | 2017-06-20 | 2021-11-16 | Lenovo (Singapore) Pte. Ltd. | Input during conversational session |
CN107564517A (zh) * | 2017-07-05 | 2018-01-09 | 百度在线网络技术(北京)有限公司 | 语音唤醒方法、设备及系统、云端服务器与可读介质 |
US10475454B2 (en) * | 2017-09-18 | 2019-11-12 | Motorola Mobility Llc | Directional display and audio broadcast |
CN108600911B (zh) | 2018-03-30 | 2021-05-18 | 联想(北京)有限公司 | 一种输出方法及电子设备 |
CN108665900B (zh) | 2018-04-23 | 2020-03-03 | 百度在线网络技术(北京)有限公司 | 云端唤醒方法及系统、终端以及计算机可读存储介质 |
KR20190133100A (ko) * | 2018-05-22 | 2019-12-02 | 삼성전자주식회사 | 어플리케이션을 이용하여 음성 입력에 대한 응답을 출력하는 전자 장치 및 그 동작 방법 |
CN109117737A (zh) * | 2018-07-19 | 2019-01-01 | 北京小米移动软件有限公司 | 洗手机的控制方法、装置和存储介质 |
CN110874201B (zh) * | 2018-08-29 | 2023-06-23 | 斑马智行网络(香港)有限公司 | 交互方法、设备、存储介质和操作系统 |
US10971160B2 (en) * | 2018-11-13 | 2021-04-06 | Comcast Cable Communications, Llc | Methods and systems for determining a wake word |
CN113260953A (zh) * | 2019-01-07 | 2021-08-13 | 索尼集团公司 | 信息处理设备与信息处理方法 |
CN109841207A (zh) * | 2019-03-01 | 2019-06-04 | 深圳前海达闼云端智能科技有限公司 | 一种交互方法及机器人、服务器和存储介质 |
EP3723354B1 (en) * | 2019-04-09 | 2021-12-22 | Sonova AG | Prioritization and muting of speakers in a hearing device system |
KR20210042520A (ko) * | 2019-10-10 | 2021-04-20 | 삼성전자주식회사 | 전자 장치 및 이의 제어 방법 |
CN110992971A (zh) * | 2019-12-24 | 2020-04-10 | 达闼科技成都有限公司 | 一种语音增强方向的确定方法、电子设备及存储介质 |
US11128636B1 (en) * | 2020-05-13 | 2021-09-21 | Science House LLC | Systems, methods, and apparatus for enhanced headsets |
KR20220000182A (ko) * | 2020-06-25 | 2022-01-03 | 현대자동차주식회사 | 차량용 다중 대화 모드 지원 방법 및 시스템 |
CN112863511B (zh) * | 2021-01-15 | 2024-06-04 | 北京小米松果电子有限公司 | 信号处理方法、装置以及存储介质 |
CN113763968B (zh) * | 2021-09-08 | 2024-05-07 | 北京百度网讯科技有限公司 | 用于识别语音的方法、装置、设备、介质和产品 |
CN115017280A (zh) * | 2022-05-17 | 2022-09-06 | 美的集团(上海)有限公司 | 对话管理方法及装置 |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH01216398A (ja) * | 1988-02-25 | 1989-08-30 | Toshiba Corp | 音声認識方式 |
JP2006243555A (ja) * | 2005-03-04 | 2006-09-14 | Nec Corp | 対応決定システム、ロボット、イベント出力サーバ、および対応決定方法 |
Family Cites Families (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6882974B2 (en) * | 2002-02-15 | 2005-04-19 | Sap Aktiengesellschaft | Voice-control for a user interface |
CN101282380B (zh) * | 2007-04-02 | 2012-04-18 | 中国电信股份有限公司 | 一名通业务呼叫接续方法、服务器和通信系统 |
CN101291469B (zh) * | 2008-06-02 | 2011-06-29 | 中国联合网络通信集团有限公司 | 语音被叫业务和主叫业务实现方法 |
KR20140004515A (ko) * | 2012-07-03 | 2014-01-13 | 삼성전자주식회사 | 디스플레이 장치, 대화형 시스템 및 응답 정보 제공 방법 |
US9576574B2 (en) * | 2012-09-10 | 2017-02-21 | Apple Inc. | Context-sensitive handling of interruptions by intelligent digital assistant |
US9098467B1 (en) * | 2012-12-19 | 2015-08-04 | Rawles Llc | Accepting voice commands based on user identity |
US9747896B2 (en) * | 2014-10-15 | 2017-08-29 | Voicebox Technologies Corporation | System and method for providing follow-up responses to prior natural language inputs of a user |
-
2015
- 2015-12-28 US US15/559,940 patent/US20180074785A1/en not_active Abandoned
- 2015-12-28 WO PCT/JP2015/086544 patent/WO2016157662A1/ja active Application Filing
- 2015-12-28 CN CN201580078175.7A patent/CN107408027B/zh not_active Expired - Fee Related
- 2015-12-28 EP EP15887804.1A patent/EP3279790B1/en active Active
- 2015-12-28 JP JP2017509182A patent/JP6669162B2/ja active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH01216398A (ja) * | 1988-02-25 | 1989-08-30 | Toshiba Corp | Speech recognition system |
JP2006243555A (ja) * | 2005-03-04 | 2006-09-14 | Nec Corp | Response determination system, robot, event output server, and response determination method |
Cited By (44)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10984782B2 (en) | 2017-02-14 | 2021-04-20 | Microsoft Technology Licensing, Llc | Intelligent digital assistant system |
US11100384B2 (en) | 2017-02-14 | 2021-08-24 | Microsoft Technology Licensing, Llc | Intelligent device user interactions |
US11017765B2 (en) | 2017-02-14 | 2021-05-25 | Microsoft Technology Licensing, Llc | Intelligent assistant with intent-based information resolution |
US10957311B2 (en) | 2017-02-14 | 2021-03-23 | Microsoft Technology Licensing, Llc | Parsers for deriving user intents |
US11010601B2 (en) | 2017-02-14 | 2021-05-18 | Microsoft Technology Licensing, Llc | Intelligent assistant device communicating non-verbal cues |
CN110313153A (zh) * | 2017-02-14 | 2019-10-08 | Microsoft Technology Licensing, Llc | Intelligent digital assistant system |
US11126825B2 (en) | 2017-02-14 | 2021-09-21 | Microsoft Technology Licensing, Llc | Natural language interaction for smart assistant |
US11004446B2 (en) | 2017-02-14 | 2021-05-11 | Microsoft Technology Licensing, Llc | Alias resolving intelligent assistant computing device |
US11194998B2 (en) | 2017-02-14 | 2021-12-07 | Microsoft Technology Licensing, Llc | Multi-user intelligent assistance |
CN110313153B (zh) * | 2017-02-14 | 2021-09-21 | Microsoft Technology Licensing, Llc | Intelligent digital assistant system |
JP2020505643A (ja) * | 2017-02-15 | 2020-02-20 | Tencent Technology (Shenzhen) Company Limited | Speech recognition method, electronic device, and computer storage medium |
JP2018148254A (ja) * | 2017-03-01 | 2018-09-20 | Daiwa House Industry Co., Ltd. | Interface unit |
US11609603B2 (en) | 2017-07-20 | 2023-03-21 | Apple Inc. | Electronic device with sensors and display devices |
US11150692B2 (en) * | 2017-07-20 | 2021-10-19 | Apple Inc. | Electronic device with sensors and display devices |
US20190369936A1 (en) * | 2017-07-20 | 2019-12-05 | Apple Inc. | Electronic Device With Sensors and Display Devices |
EP3567470A4 (en) * | 2017-11-07 | 2020-03-25 | Sony Corporation | INFORMATION PROCESSING DEVICE AND ELECTRONIC APPARATUS |
JPWO2019093123A1 (ja) * | 2017-11-07 | 2020-09-24 | Sony Corporation | Information processing device and electronic apparatus |
JP7215417B2 (ja) | 2017-11-07 | 2023-01-31 | Sony Group Corporation | Information processing device, information processing method, and program |
WO2019100738A1 (zh) * | 2017-11-24 | 2019-05-31 | iFLYTEK Co., Ltd. | Human-machine interaction method and apparatus for multiple participants |
JP2019101264A (ja) * | 2017-12-04 | 2019-06-24 | Sharp Corporation | External control device, voice interactive control system, control method, and program |
EP3726355A4 (en) * | 2017-12-15 | 2021-02-17 | Sony Corporation | INFORMATION PROCESSING DEVICE, INFORMATION PROCESSING PROCESS AND RECORDING MEDIA |
US11221684B2 (en) | 2017-12-15 | 2022-01-11 | Sony Corporation | Information processing device, information processing method, and recording medium |
JPWO2019130399A1 (ja) * | 2017-12-25 | 2020-04-23 | Mitsubishi Electric Corporation | Speech recognition device, speech recognition system, and speech recognition method |
WO2019130399A1 (ja) * | 2017-12-25 | 2019-07-04 | Mitsubishi Electric Corporation | Speech recognition device, speech recognition system, and speech recognition method |
US11935449B2 (en) | 2018-01-22 | 2024-03-19 | Sony Corporation | Information processing apparatus and information processing method |
WO2019142420A1 (ja) * | 2018-01-22 | 2019-07-25 | Sony Corporation | Information processing device and information processing method |
JP2021533510A (ja) * | 2018-01-30 | 2021-12-02 | DingTalk Holding (Cayman) Limited | Interaction method and apparatus |
JP7081045B2 (ja) | 2018-05-04 | 2022-06-06 | Google Llc | Generating and/or adapting automated assistant content according to a distance between user(s) and an automated assistant interface |
JP2021522636A (ja) * | 2018-05-04 | 2021-08-30 | Google Llc | Generating and/or adapting automated assistant content according to a distance between user(s) and an automated assistant interface |
US11789522B2 (en) | 2018-05-04 | 2023-10-17 | Google Llc | Generating and/or adapting automated assistant content according to a distance between user(s) and an automated assistant interface |
KR102574277B1 (ko) * | 2018-05-04 | 2023-09-04 | Google Llc | Generating and/or applying automated assistant content according to a distance between a user and an automated assistant interface |
KR20210002648A (ko) * | 2018-05-04 | 2021-01-08 | Google Llc | Generating and/or applying automated assistant content according to a distance between a user and an automated assistant interface |
JP2020003926A (ja) * | 2018-06-26 | 2020-01-09 | Hitachi, Ltd. | Method of controlling dialogue system, dialogue system, and program |
US11189270B2 (en) | 2018-06-26 | 2021-11-30 | Hitachi, Ltd. | Method of controlling dialogue system, dialogue system, and data storage medium |
WO2020017165A1 (ja) * | 2018-07-20 | 2020-01-23 | Sony Corporation | Information processing device, information processing system, information processing method, and program |
US20210319790A1 (en) * | 2018-07-20 | 2021-10-14 | Sony Corporation | Information processing device, information processing system, information processing method, and program |
US12118991B2 (en) * | 2018-07-20 | 2024-10-15 | Sony Corporation | Information processing device, information processing system, and information processing method |
JP7258686B2 (ja) | 2019-07-22 | 2023-04-17 | TIS Inc. | Information processing system, information processing method, and program |
JP2021018664A (ja) * | 2019-07-22 | 2021-02-15 | TIS Inc. | Information processing system, information processing method, and program |
JP2021123215A (ja) * | 2020-02-04 | 2021-08-30 | Denso Ten Limited | Display device and control method for display device |
JP7474058B2 (ja) | 2020-02-04 | 2024-04-24 | Denso Ten Limited | Display device and control method for display device |
JP2021135363A (ja) * | 2020-02-26 | 2021-09-13 | CyberAgent, Inc. | Control system, control device, control method, and computer program |
WO2021251107A1 (ja) * | 2020-06-11 | 2021-12-16 | Sony Group Corporation | Information processing device, information processing system, information processing method, and program |
WO2023090057A1 (ja) * | 2021-11-17 | 2023-05-25 | Sony Group Corporation | Information processing device, information processing method, and information processing program |
Also Published As
Publication number | Publication date |
---|---|
EP3279790B1 (en) | 2020-11-11 |
US20180074785A1 (en) | 2018-03-15 |
CN107408027B (zh) | 2020-07-28 |
CN107408027A (zh) | 2017-11-28 |
EP3279790A4 (en) | 2018-12-19 |
EP3279790A1 (en) | 2018-02-07 |
JPWO2016157662A1 (ja) | 2018-01-25 |
JP6669162B2 (ja) | 2020-03-18 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2016157662A1 (ja) | Information processing device, control method, and program | |
JP6669073B2 (ja) | Information processing device, control method, and program | |
US10572073B2 (en) | Information processing device, information processing method, and program | |
US10776070B2 (en) | Information processing device, control method, and program | |
JP6516585B2 (ja) | Control device, method thereof, and program | |
JP6739907B2 (ja) | Device identification method, device identification apparatus, and program | |
KR102551715B1 (ko) | Generating IoT-based notification(s) and provisioning of command(s) to cause automatic rendering of the IoT-based notification(s) by automated assistant client(s) of client device(s) | |
EP3419020B1 (en) | Information processing device, information processing method and program | |
WO2019107145A1 (ja) | Information processing device and information processing method | |
US11373650B2 (en) | Information processing device and information processing method | |
CN115605948B (zh) | Arbitration among multiple potentially responsive electronic devices | |
CN112106016A (zh) | Information processing device, information processing method, and recording medium | |
AU2024200648A1 (en) | Assistant device arbitration using wearable device data | |
JP6973380B2 (ja) | Information processing device and information processing method | |
WO2018139036A1 (ja) | Information processing device, information processing method, and program | |
JP6950708B2 (ja) | Information processing device, information processing method, and information processing system | |
WO2018139050A1 (ja) | Information processing device, information processing method, and program | |
JP2020061050A (ja) | Communication system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 15887804; Country of ref document: EP; Kind code of ref document: A1 |
DPE2 | Request for preliminary examination filed before expiration of 19th month from priority date (PCT application filed from 20040101) | |
ENP | Entry into the national phase | Ref document number: 2017509182; Country of ref document: JP; Kind code of ref document: A |
WWE | WIPO information: entry into national phase | Ref document number: 15559940; Country of ref document: US |
REEP | Request for entry into the European phase | Ref document number: 2015887804; Country of ref document: EP |
NENP | Non-entry into the national phase | Ref country code: DE |