WO2020226748A1 - Noise reduction in robot human communication - Google Patents
Noise reduction in robot human communication
- Publication number
- WO2020226748A1 (PCT/US2020/023042)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- robot
- gesture
- noise
- incoming audio
- profile
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Ceased
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J11/00—Manipulators not otherwise provided for
- B25J11/0005—Manipulators having means for high-level communication with users, e.g. speech generator, face recognition means
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J11/00—Manipulators not otherwise provided for
- B25J11/003—Manipulators for entertainment
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J19/00—Accessories fitted to manipulators, e.g. for monitoring, for viewing; Safety devices combined with or specially adapted for use in connection with manipulators
- B25J19/02—Sensing devices
- B25J19/026—Acoustical sensing devices
-
- B—PERFORMING OPERATIONS; TRANSPORTING
- B25—HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
- B25J—MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
- B25J9/00—Program-controlled manipulators
- B25J9/16—Program controls
- B25J9/1602—Program controls characterised by the control system, structure, architecture
Definitions
- the present disclosure relates generally to robot-human communication, and more particularly, to noise reduction in robot human communication.
- a robot is generally an electro-mechanical machine guided by a computer or electronic programming. Robots may be used in a wide variety of applications and are often thought of in the context of their use in industrial applications. Recently, the use of robots in the field of human-robot interaction has increased, and the quality of the human-robot interaction may be influenced by a number of factors, such as the ability of the robot to recognize utterances spoken by the user and the ability of the robot to interpret the utterance and respond in an appropriate manner.
- a method for noise reduction in a robot system includes: obtaining a gesture to be performed by a robot; receiving incoming audio that includes audio from a user and robot noise caused by the robot’s performance of a gesture; retrieving a noise profile associated with the gesture from a gesture library; and applying the noise profile to remove the robot noise from the incoming audio.
- the gesture library comprises a plurality of predetermined gestures that the robot may be expected to perform.
- Each of the predetermined gestures is paired to a noise profile for removing robot noise in the event that incoming audio that includes user audio is received while the robot is performing the gesture.
- an apparatus for noise reduction in a robot system comprises a processor and a memory coupled to the processor and storing instructions for execution by the processor.
- the instructions, when executed by the processor, cause the apparatus to: obtain a gesture to be performed by the robot; receive incoming audio that includes audio from a user and robot noise caused by the robot’s performance of the gesture; retrieve a noise profile associated with the gesture from a gesture library; and apply the noise profile to remove the robot noise from the incoming audio.
- a computer readable medium comprises computer-executable instructions which, when executed by a computer, cause the computer to perform a method for noise reduction in a robot system in which a robot performs a gesture.
- the method comprises receiving an indication that the robot is performing the gesture; receiving incoming audio, the incoming audio including a user utterance mixed with mechanical robot noise caused by the robot’s performance of the gesture; retrieving, from a gesture library comprising a plurality of predetermined gestures paired with noise profiles, a noise profile associated with the gesture; and applying the noise profile to the incoming audio to remove the mechanical robot noise caused by the robot’s performance of the gesture.
- FIG. 1 is a schematic diagram illustrating a robot system in which example implementations of the subject matter described herein can be implemented.
- FIG. 2 illustrates a flow chart of a method for noise reduction in a robot system according to embodiments of the present disclosure.
- FIG. 3A illustrates a schematic diagram of a type of a symbolic representation of a gesture according to embodiments of the present disclosure.
- FIG. 3B illustrates example symbols of body parts of a robot.
- FIG. 4 is a schematic diagram illustrating the creation of gesture-noise profile pairs for a gesture library according to embodiments of the present disclosure.
- FIG. 5 is a high level illustration of exemplary components of a computing apparatus suitable for implementing noise reduction in a robot system according to embodiments of the present disclosure.
- FIG. 1 illustrates a schematic diagram of a robot system 400 according to embodiments of the present disclosure.
- the robot system 400 generally includes a robot 100, an apparatus 10, and a server 300.
- the apparatus 10 may control the robot to perform various gestures by, for example, sending commands to the robot 100 to control motors/actuators 110 that orient the robot’s 100 body parts in a particular manner.
- the robot 100 may, for example, be a chatting robot whose gestures accompany utterances spoken by the robot 100 to provide a more natural, comprehensive and effective communication environment between the user 50 and the robot 100.
- the user 50 may interact with the robot 100 by delivering a message through speech/utterance or other expression.
- Incoming audio that includes the user’s 50 utterance is received by the robot through a microphone 30, which may or may not be embedded within the robot 100.
- the server 300 may include a voice recognition module 310 for processing the user’s utterance.
- the server 300 may be in the form of a cloud-based computer, for example, a chatting intelligence with voice recognition capabilities for the case of a chatting robot interacting with a user’s 50 speech/utterance.
- the apparatus 10 is capable of controlling the robot 100 to perform a predetermined number of different gestures.
- the apparatus 10 receives processed information from the server 300 and interprets the processed information to control the robot 100 to perform a particular gesture.
- the apparatus 10 includes a movement control module 14 that receives processed information from the server 300 and generates commands that control the robot 100 to move one or more robot body parts in a particular orientation to perform the gesture.
- the commands may, for example, be a series of joint angles that instruct the robot 100 how to orient its moving body parts.
- the robot 100 receives the commands from the apparatus 10 and executes the commands to perform the gesture by operating a plurality of motors/actuators 110.
- the motors/actuators 110 orient the robot body parts in the manner instructed by the apparatus 10.
- the robot 100 may have movement control capabilities beyond those involved with performing a gesture from among the predetermined number of different gestures.
- the robot 100 may have balancing capabilities in the event of unexpected movement that occurs during the performance of a gesture. This additional movement control capability may be accomplished through the movement control module 14 of the apparatus 10 or may be movement control performed independent of the apparatus 10 by the robot’s 100 own internal movement system.
- the motors/actuators 110 (e.g., servo and/or stepper motors), transformers, chassis flex and contact, hydraulics, chamber echo inside the robot, gears, etc. involved in providing gestural, manipulation or locomotion functions produce mechanical noise 20.
- a user 50 may wish to communicate with the robot 100 while the robot 100 is performing a gesture.
- the user 50 may first make expressions or ask questions to the robot 100, and then expect to receive a response.
- the robot’s 100 response may include a gesture that is performed by the robot 100.
- the user 50 may wish to speak to the robot 100 (e.g., to ask a follow up question).
- the robot system 400 should be able to respond to an utterance from a user 50 that is issued while the robot 100 is performing the gesture.
- the speech-recognition performance of a robot system 400 is improved by reducing the relative level of internal mechanical noise against the utterance/speech signals sensed by the
- Various embodiments of the present disclosure provide a gesture library in which the gestures that the robot 100 is expected to perform are paired with noise profiles. With knowledge of the gestures that the robot 100 can be commanded to perform, the corresponding noise profile can be retrieved and used to cancel out the mechanical noise components mixed with the user’s 50 utterance.
- FIG. 2 illustrates a flow chart of a method for noise reduction in a robot system 400 according to embodiments of the present disclosure.
- the method can be executed, for example, on the apparatus 10 as illustrated in FIG. 1.
- the apparatus 10 as shown can be a client device or a cloud-based apparatus, or it can be part of the server 300 or robot 100 illustrated in FIG. 1.
- the method may also include additional actions not shown and/or omit the illustrated steps.
- the scope of the subject matter described herein is not limited in this aspect.
- a gesture to be performed by the robot 100 is obtained.
- the gesture may be one of a plurality of predetermined gestures that the robot 100 is capable of performing.
- the gesture may, for example, be represented by a symbolic representation of the gesture.
- the symbolic representation may be a digital signal format in which the orientations of the body parts of the robot 100 are represented by symbols that can be interpreted by the apparatus 10 to generate instructions for the robot 100 to orient its body parts in a particular way.
- each of the gestures performed by the robot 100 may be represented using a gesture language in which symbols are used to represent orientations of robot body parts.
- the gesture language is preferably machine-independent (or hardware-independent), in that the language can be interpreted and compiled regardless of the type of robot 100 performing the gesture.
- the particular gesture to be performed by the robot 100 may, for example, be determined by the server 300 through the gesture language module 320. The server 300 may then provide the symbolic representation of the gesture to be performed by the robot 100 to the apparatus 10.
- the server 300 may, for example, utilize a library that pairs a plurality of predetermined gestures that can be formed by a robot 100 with the symbolic
- the gesture language module 320 may thus determine an appropriate gesture to be performed by the robot 100 and send the symbolic representation of the gesture to the apparatus 10.
- the present disclosure is not limited in this manner.
- the apparatus 10 itself may alternatively perform this function.
- FIGS. 3A and 3B illustrate a typical labanotation for performing a gesture.
- Labanotation is a notation system used to record human movement in which symbols define orientations of various body parts.
- Labanotation herein particularly defines orientations of at least one body part of the robot 100 with respect to a plurality of time slots 301.
- Labanotation is machine-independent and thereby can be implemented by multiple different types of hardware (or robots).
- it is easy to transmit labanotation between a robot and the cloud computer (e.g., the server 300) through limited communication channels.
- Labanotation also generally requires smaller memory than other types of representations.
- orientations of the at least one body part of the robot 100 in the plurality of time slots 301 can be determined, and then symbols corresponding to the orientations can be obtained. After that, the symbols in association with the corresponding time slots 301 as a part of the labanotation can be saved.
- the at least one body part includes a plurality of body parts, and the labanotation includes a first dimension corresponding to the plurality of time slots 301 and a second dimension corresponding to the plurality of body parts.
- FIG. 3A illustrates such a labanotation representative of a particular gesture.
- each of the columns corresponds to one specific body part, such as left hand, left arm, support, right arm, right hand, head.
- Each row corresponds to the time slot with a given duration.
- a symbol represents the direction in which the body part is oriented at that time.
- the sample labanotation in FIG. 3A is merely shown for purpose of illustration without suggesting any limitation as to the scope of the subject matter described herein. In other words, a more complicated labanotation with more body parts involved is also possible.
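- As an illustration only, the two-dimensional layout described above (time slots along one dimension, body parts along the other) might be held in memory as sketched below. The class name, the direction symbols such as "side_high", and the "hold" default for a body part with no symbol in a slot are assumptions made for this sketch and are not taken from the disclosure.

```python
from dataclasses import dataclass

# Body-part columns in the order listed for FIG. 3A (identifier spelling is assumed).
BODY_PARTS = ("left_hand", "left_arm", "support", "right_arm", "right_hand", "head")


@dataclass
class TimeSlot:
    duration_s: float  # how long this row of the score lasts
    symbols: dict      # body part -> direction symbol for this slot


def symbols_for(score, body_part):
    """Return one column of the score: the symbols for a single body part over time."""
    # "hold" is an assumed placeholder for a body part with no symbol in a slot.
    return [slot.symbols.get(body_part, "hold") for slot in score]


# Two time slots of an invented "wave" gesture; the symbol names are hypothetical.
wave = [
    TimeSlot(0.5, {"right_arm": "side_high", "right_hand": "forward_high"}),
    TimeSlot(0.5, {"right_arm": "side_high", "right_hand": "side_high"}),
]

right_hand_column = symbols_for(wave, "right_hand")
head_column = symbols_for(wave, "head")
```

Storing only the symbols per slot keeps the representation compact, which is in line with the stated advantage that labanotation needs little memory and is easy to transmit.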
- the apparatus 10 may cause the robot to perform the gesture.
- the apparatus 10 instructs the robot 100 to orient its body parts to perform the particular gesture.
- the apparatus 10 may receive a symbolic representation of the gesture from the server 300, and based on the symbolic representation, determine joint angles and instruct the robot 100 to control its motors 110 to a particular joint angle.
- the various motors 110 of the robot 100 move particular parts of the robot 100 so that the robot 100 performs the gesture.
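- The step from symbols to joint angles could, under simplifying assumptions, be sketched as a table lookup. The symbol names, the two-joint arm, and the angle values below are all hypothetical; a real robot would need a hardware-specific mapping, and the disclosure does not specify one.

```python
# Hypothetical mapping from direction symbols to (shoulder, elbow) angles in degrees.
SYMBOL_TO_ANGLES = {
    "place_low": (0.0, 0.0),
    "forward_middle": (90.0, 0.0),
    "side_high": (135.0, 30.0),
}


def joint_angle_commands(symbols, body_part):
    """Turn a per-slot symbol sequence for one body part into joint-angle commands."""
    commands = []
    for symbol in symbols:
        shoulder, elbow = SYMBOL_TO_ANGLES[symbol]
        commands.append(
            {"part": body_part, "shoulder_deg": shoulder, "elbow_deg": elbow}
        )
    return commands


commands = joint_angle_commands(["place_low", "side_high"], "right_arm")
```

Keeping the table on the apparatus side is one way the symbolic gesture language can stay machine-independent while the angles remain specific to the robot.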
- the motors 110 and the mechanical parts of the robot 100 involved in providing the gesture produce mechanical noise 20 that can be picked up by the microphone 30.
- This noise 20 becomes problematic when, in S203, the microphone 30 receives incoming audio that includes user audio (for example, a user utterance) that the robot system 400 should interact with.
- the incoming audio from the user may be audio on which speech recognition is performed in order for the robot system 400 to determine how it should respond to the user’s utterance.
- the mechanical noise 20 mixes with the incoming audio.
- the presence of the mechanical noise 20 in the incoming audio may decrease the performance of the speech-recognition services provided by the voice recognition module 310 of the server 300 that are used by the robot system 400 to understand and respond to the meaning of the user’s utterance.
- a noise profile INMN for removing the mechanical robot noise 20 from the incoming audio is retrieved.
- the noise profile is ultimately used to cancel out the mechanical noise 20 associated with the robot’s 100 performance of the gesture when, in S205, the noise profile is applied.
- the noise profile is retrieved from a gesture library 12, in which gestures (LA1, LA2, ..., LAN) are paired to noise profiles (INM1, INM2, ..., INMN).
- the gesture library 12 comprises a finite number of gestures that the robot 100 is expected to perform (namely, the plurality of predetermined gestures (LA1, LA2, ..., LAN)) for interacting with the user 50.
- the gesture library 12 includes a noise profile INMN for canceling out the mechanical noise 20 caused by the robot’s 100 performance of the gesture LAN.
- the noise signals associated with the performance of the gesture may, for example, be mixed out-of-phase with the incoming audio to obtain a cleaner audio signal that better represents the utterance that was spoken by the user 50 while the robot 100 performed the gesture.
- the gesture library 12 may index each of the noise profiles (INM1, INM2, ..., INMN) to the labanotation representative of the gesture that causes the noise 20 for which the noise profile INMN is created.
- the apparatus can pull the appropriate noise profile INMN from the gesture library 12 based on the labanotation LAN received from the server 300.
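- A minimal sketch of such a library, assuming each gesture is identified by a short string key and each noise profile is a list of audio samples, is given below. The class and method names and the keys "LA1" and "LA2" are illustrative assumptions rather than part of the disclosure.

```python
class GestureLibrary:
    """Pairs each predetermined gesture with the noise profile created for it."""

    def __init__(self):
        self._profiles = {}

    def add(self, gesture_id, noise_profile):
        # Copy the samples so later changes by the caller do not alter the library.
        self._profiles[gesture_id] = list(noise_profile)

    def noise_profile_for(self, gesture_id):
        """Return the paired profile, or None if the gesture is not in the library."""
        return self._profiles.get(gesture_id)


library = GestureLibrary()
library.add("LA1", [-0.5, 0.25, 0.125])  # sample values are made up
library.add("LA2", [0.25, -0.75])

profile = library.noise_profile_for("LA2")
missing = library.noise_profile_for("LA9")
```

Because the set of gestures is finite and known in advance, a plain keyed lookup is sufficient here, and no search or matching against the audio itself is needed to find the profile.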
- each of the noise profiles (INM1, INM2, ..., INMN) may be an inverse noise model that can be mixed with the audio signals picked up by the microphone 30 in order to perform noise cancellation.
- An inverse noise model INMN is the inverse of the noise signals caused by the robot 100 when the robot performs the gesture associated with the inverse noise model INMN.
- the inverse noise model may be mixed with the audio signals picked up by the microphone 30 during the robot’s 100 performance of the gesture by adding the inverse noise model to the audio signals.
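- The arithmetic behind adding an inverse noise model can be shown with a small idealized example. It assumes sample-aligned signals and noise that repeats exactly from one performance of the gesture to the next, which real hardware will only approximate; the function names and sample values are invented for the sketch.

```python
def invert(noise):
    """Build an inverse noise model: the recorded noise with its sign flipped."""
    return [-sample for sample in noise]


def apply_inverse_noise_model(incoming, inverse_model):
    """Add the inverse model to the incoming audio, sample by sample."""
    cleaned = list(incoming)
    # Only the overlapping portion is processed if the lengths differ.
    for i, inverse_sample in enumerate(inverse_model[: len(incoming)]):
        cleaned[i] += inverse_sample
    return cleaned


utterance = [0.25, -0.5, 0.75]     # what the user actually said (made-up samples)
robot_noise = [0.5, 0.25, -0.25]   # mechanical noise during the gesture
incoming = [u + n for u, n in zip(utterance, robot_noise)]  # what the microphone hears

cleaned = apply_inverse_noise_model(incoming, invert(robot_noise))
```

In this idealized case the cleaned signal equals the utterance, because each incoming sample is utterance plus noise and adding the negated noise leaves only the utterance.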
- the acts described herein may be computer-executable instructions that can be implemented by one or more processors and/or stored on a computer-readable medium or media.
- the computer-executable instructions may include a routine, a sub-routine, programs, a thread of execution, and/or the like.
- results of acts of the methodologies may be stored in a computer-readable medium, displayed on a display device, and/or the like.
- the computer-readable medium may be any suitable computer-readable storage device, such as memory, hard drive, CD, DVD, flash drive, or the like.
- the term “computer-readable medium” is not intended to encompass a propagated signal.
- FIG. 4 is a schematic diagram for illustrating the creation of gesture-noise profile pairs to be included in a gesture library 12 according to embodiments of the present disclosure.
- labanotation is used as a machine-independent gesture language for symbolically representing gestures performed by the robot 100.
- the gesture language used to create gesture-noise pairs is not limited to labanotation.
- FIG. 4 illustrates an embodiment in which the mechanical noise 20 generated by a robot’s 100 performance of a gesture is recorded in order to create a noise profile.
- the gesture library 12 comprises noise profiles based on pre-recorded noise signals.
- the robot controller module 220 of the apparatus 10 controls the robot 100 to perform the gesture by, for example, sending instructions to the robot 100 to orient one or more robot body parts in a particular way.
- the robot 100 generates mechanical noise 20 caused by, for example, the robot’s motors 110 (e.g., servo and/or stepper motors), transformers, chassis flex and contact, hydraulics, chamber echo inside the robot, gears, etc.
- the mechanical noise 20 is recorded, and a noise profile is created based on the pre-recorded noise 20.
- the created noise profile is then paired with the gesture (in this example, the labanotation representation of the gesture).
- the noise profile stored in the library 12 may, for example, be a digital recording of the pre-recorded noise signals, an inverse of the pre-recorded noise, or another noise profile created based on the pre-recorded noise.
- the noise profile will be used to cancel out mechanical robot noise that is picked up by microphone 30 and mixed with incoming user audio.
- the noise profile may include pre-recorded noise signals that are mixed out-of-phase with the incoming user audio or an inverse of the pre-recorded noise signals that is added to the incoming user audio. The process of creating a gesture-noise pair is repeated for each of the gestures/labanotations contained in the gesture library 12.
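- The library-creation loop described above might be sketched as follows. The recording step is replaced by a stand-in function that returns canned samples, since the actual robot and microphone interfaces are not specified; all names and values are assumptions for illustration, and the sketch stores the inverse of each recording as the noise profile.

```python
def build_gesture_library(gesture_ids, perform_and_record):
    """Create one gesture-noise pair per gesture by recording the robot's noise."""
    library = {}
    for gesture_id in gesture_ids:
        recorded = perform_and_record(gesture_id)  # robot performs, microphone records
        # Store the inverse of the recording so it can later be added to incoming audio.
        library[gesture_id] = [-sample for sample in recorded]
    return library


def fake_perform_and_record(gesture_id):
    """Stand-in for real hardware: returns canned noise samples per gesture."""
    canned = {"LA1": [0.5, -0.25], "LA2": [0.125, 0.25, -0.5]}
    return canned[gesture_id]


library = build_gesture_library(["LA1", "LA2"], fake_perform_and_record)
```

Passing the recording step in as a function keeps the loop independent of the particular robot, which matches the idea that the gesture language is hardware-independent while the noise profiles are hardware-specific.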
- the robot system 400 can provide a robot 100 that can perform a predetermined number of gestures using a gesture language that is independent of the robot 100 while also providing the ability to perform noise cancellation of noises that are specific to the particular hardware of the robot.
- the system 400 can ultimately provide gesture services to a plurality of different types of robots independent of the hardware and software implemented by the robot 100, while also having the ability to perform noise cancellation of noise that is specific to the motors, mechanical components, etc. of each of the different types of robots.
- the same microphone 30 that is used to capture incoming user audio is used to create the gesture library 12.
- Using the same microphone 30 can be beneficial in that the hardware components used to pre-record noise signals are the same as those that pick up the noise signals during operation of the robot system, thus further ensuring that the noise signals of the gesture library 12 are an accurate representation of the noise that will be picked up by the microphone 30 when the robot 100 performs the associated gesture.
- the pre-recorded robot noise audio signals are synchronized with the corresponding gesture so that noise cancellation occurs at the appropriate time.
- time passes from the bottom to the top in 301, and a specific combination of various symbols indicating the various orientations of multiple body parts at a given time slot 301 will be executed, so that the robot 100 can continuously perform the
- the robot 100 itself is used to generate the pre-recorded robot noise, and thus, the pre-recorded noise signals of the noise profile can be assumed to be synchronized with the particular movements of the robot at the time slots at which they occur.
- the apparatus 10 may time stamp the point at which motor control begins and then synchronize this time stamp with the noise model associated with the gesture the robot is performing, so that the start point at which the microphone 30 receives incoming audio and the start point at which the noise profile is applied are synchronized.
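- One possible way to use such a time stamp, assuming both the motor-control start and the audio-capture start are stamped on the same clock, is to convert their difference into a sample offset and place the noise profile at that offset. The function name, parameters, and values below are hypothetical.

```python
def align_profile(profile, motor_start_s, capture_start_s, sample_rate_hz, n_samples):
    """Place the noise profile in a capture-length buffer at the motor start time."""
    # Number of samples between the start of capture and the start of motor control.
    offset = round((motor_start_s - capture_start_s) * sample_rate_hz)
    aligned = [0.0] * n_samples
    for i, value in enumerate(profile):
        j = i + offset
        # Profile samples falling outside the captured window are dropped.
        if 0 <= j < n_samples:
            aligned[j] = value
    return aligned


# Motor control begins 2 ms after capture starts; audio is sampled at 1 kHz.
aligned = align_profile(
    profile=[-0.5, 0.25],
    motor_start_s=0.002,
    capture_start_s=0.0,
    sample_rate_hz=1000,
    n_samples=5,
)
```

The aligned buffer can then be added to the captured audio sample by sample, so that cancellation starts at the moment the gesture noise starts.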
- the creation of this gesture library 12 is not limited as such.
- the noise profile associated with a particular gesture may be obtained from an alternative source without requiring that the robot 100 itself record the mechanical noise 20.
- the noise profiles of the gestures in the gesture library 12 may be created using a physics model representative of the noise created by the robot when the robot performs a gesture.
- the physics model may predict the sound propagation occurring when the robot performs a gesture. Unlike an approach that uses data collected from sound sensors or the like (as in the case of creating noise profiles using noise signals obtained from the microphone 30), the physics model encompasses predictions of motor waveforms, chassis sound emulation, sound reflection patterns, etc.
- Embodiments of the present disclosure may also include an overlay model that can integrate unexpected sounds with the existing gesture library 12.
- the overlay model may, for example, be computed according to the physics model, or using extended noise records that may be generated in real-time.
- the unexpected sound from a received motor movement may, for example, be the result of the robot 100 righting itself or countering an external unexpected force that occurs while the robot 100 performs a gesture.
- the overlay model for the unexpected sounds may be applied along with the pre-recorded noise model for a particular gesture to facilitate additional noise cancelation in the event that additional unexpected movement occurs during a robot’s performance of a gesture.
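- Applying an overlay model along with the pre-recorded model can be pictured as summing the two inverse models before they are added to the incoming audio. The sketch below assumes both models are sample lists aligned to the same start point; the names and values are invented.

```python
def combine_models(base_inverse, overlay_inverse):
    """Sum the gesture's inverse model and the overlay's inverse model per sample."""
    length = max(len(base_inverse), len(overlay_inverse))
    combined = [0.0] * length
    for model in (base_inverse, overlay_inverse):
        for i, value in enumerate(model):
            combined[i] += value
    return combined


gesture_inverse = [-0.5, 0.25, 0.0]  # from the gesture library (made-up samples)
overlay_inverse = [0.0, -0.25]       # covers an unexpected balancing movement

combined = combine_models(gesture_inverse, overlay_inverse)
```

The combined model is applied in the same way as a single inverse model, so the overlay adds cancellation for the extra movement without changing the stored gesture-noise pairs.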
- an environment noise physics model may also be created to represent environmental noise that may be picked up by the microphone 30 while the user 50 is interacting with the robot 100.
- the physics model for the environmental noise predicts the noise created by the environment in which the robot interacts.
- the physics model for environmental noise may be added to the gesture library 12 and may also be mixed out-of-phase with the incoming audio to reduce environmental noise picked up by microphone 30.
- the gesture library 12 may include a plurality of environmental models each modeling a different environment in which the robot may be present.
- the noise-cancelled audio signals may be transmitted to the voice recognition module 310.
- the voice recognition module 310 translates the noise-cancelled audio signals into verbal interaction elements used by and provided to the apparatus 10. For example, the voice recognition module 310 may perform analyses based on the content of the noise-canceled audio signals, and may prepare an utterance that is to be spoken by the robot 100 as a response to or an answer to the user utterance included in the noise-cancelled audio signals. Further, the gesture language module 320 may determine a gesture to be performed by a robot 100 based on the output of the voice recognition module 310.
- the gesture may accompany the utterance to be spoken by the robot 100, or, alternatively, the voice recognition module 310 may determine that an utterance will not be performed by the robot, and the gesture language module 320 may determine a gesture that will be performed by the robot 100 without an accompanying robot utterance.
- the server 300 may, for example, extract a concept from the utterance to be spoken by the robot and pull a gesture corresponding to the extracted concept from a library.
- the concept may be one representative extracted from a cluster of words, and such concepts may include, for example, “Hello,” “Good,” “Thanks,” “Hungry,” etc.
- the present disclosure is not limited to any particular method of selecting a gesture that is to be performed by the robot 100.
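- Purely as one example of such a selection method, a concept could be extracted by matching the words of the robot's utterance against small word clusters and then looking up the gesture paired with that concept. The clusters, the gesture keys, and the function names below are assumptions made for this sketch.

```python
# Hypothetical word clusters, each represented by one concept.
CONCEPT_WORDS = {
    "Hello": {"hello", "hi", "hey"},
    "Thanks": {"thanks", "thank"},
    "Hungry": {"hungry", "starving"},
}

# Hypothetical pairing of concepts with gesture keys in a gesture library.
CONCEPT_TO_GESTURE = {"Hello": "LA1", "Thanks": "LA2", "Hungry": "LA3"}


def extract_concept(utterance):
    """Return the first concept whose word cluster overlaps the utterance, if any."""
    words = {word.strip(".,!?").lower() for word in utterance.split()}
    for concept, cluster in CONCEPT_WORDS.items():
        if words & cluster:
            return concept
    return None


def gesture_for(utterance):
    """Return the gesture key for the utterance's concept, or None if no match."""
    concept = extract_concept(utterance)
    return CONCEPT_TO_GESTURE.get(concept)


greeting_gesture = gesture_for("Hi there!")
no_gesture = gesture_for("The weather report follows.")
```

When no concept matches, the sketch returns None, which corresponds to the robot speaking without an accompanying gesture.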
- the robot system 400 may once again perform the method illustrated in FIG. 2 to remove robot noise from any incoming audio received by the microphone 30 while the gesture is performed.
- FIG. 5 is a block diagram of apparatus 10 suitable for implementing one or more implementations of the subject matter described herein.
- the apparatus 10 may function as discussed above with reference to FIG. 1.
- the apparatus 10 is not intended to suggest any limitation as to scope of use or functionality of the subject matter described herein, as various implementations may be implemented in diverse general-purpose or special-purpose computing environments.
- the apparatus 10 includes at least one processor 120 and a memory 130.
- the processor 120 executes computer-executable instructions and may be a real or a virtual processor. In a multi-processing system, multiple processors execute computer-executable instructions to increase processing power.
- the memory 130 may be volatile memory (e.g., registers, cache, RAM), non-volatile memory (e.g., ROM, EEPROM, flash memory), or some combination thereof.
- the memory 130 and its associated computer-readable media provide storage of data, data structure, computer-executable instructions, etc. for the apparatus 10.
- the memory 130 is coupled to the processor 120 and stores instructions for execution by the processor 120. Those instructions, when executed by the processor 120 cause the apparatus to: obtain a gesture to be performed by a robot; receive incoming audio, the incoming audio including audio from a user with robot noise caused by the robot’s performance of the gesture; retrieve, from a gesture library, a noise profile associated with the gesture for removing the robot noise caused by the robot’s performance of the gesture from the incoming audio; and apply the noise profile to remove the robot noise from the incoming audio.
- the apparatus 10 further includes one or more communication connections 140.
- An interconnection mechanism such as a bus, controller or network interconnects the components of the apparatus 10.
- operating system software provides an operating environment for other software executing in the apparatus 10, and coordinates activities of the components of the apparatus 10.
- the communication connections 140 enable communication over a
- the apparatus 10 may operate in a networked environment (for example, the robot system environment 400) using logical connections to one or more other servers, network PCs, or another common network node.
- communication media include wired or wireless networking techniques.
- Implementations of the subject matter described herein include a computer-readable medium comprising computer-executable instructions. Those instructions, when executed by a computer, cause the computer to perform a method for noise reduction in a robot system in which a robot performs a gesture, the method comprising: receiving an indication that the robot is performing the gesture; receiving incoming audio, the incoming audio including a user utterance mixed with mechanical robot noise caused by the robot’s performance of the gesture; retrieving, from a gesture library comprising a plurality of predetermined gestures paired with noise profiles, a noise profile associated with the gesture; and applying the noise profile to the incoming audio to remove the mechanical robot noise caused by the robot’s performance of the gesture.
- Computer storage medium includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other solid state memory technology, CD-ROM, DVD, or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium which can be used to store the desired information and which can be accessed by the computer.
- a method for noise reduction in a robot system comprises: obtaining a gesture to be performed by a robot; receiving incoming audio, the incoming audio including audio from a user and robot noise caused by the robot’s performance of the gesture; retrieving, from a gesture library, a noise profile associated with the gesture for removing the robot noise caused by the robot’s
- the noise profile may be an inverse noise model, and applying the noise profile to remove the robot noise from the incoming audio may comprise applying the inverse noise model to the incoming audio.
- the noise profile may comprise pre-recorded noise signals of the robot performing the gesture, and applying the noise profile to remove the robot noise may comprise mixing the pre-recorded noise signals out-of-phase with the incoming audio.
- the gesture library may comprise a plurality of predetermined gestures performed by the robot, and each of the predetermined gestures is paired to a noise profile for removing robot noise.
- the method may further comprise creating the gesture library, wherein creating the gesture library may comprise: causing the robot to perform the predetermined gestures; and recording, for each of the predetermined gestures, robot noise caused by the robot’s performance of the gesture to create a noise profile.
- the incoming audio may be received by a robot microphone, and recording, for each of the predetermined gestures, robot noise caused by the robot’s performance of the gesture to create a noise profile may comprise recording, for each of the predetermined gestures, robot noise caused by the robot’s performance of the gesture using the robot microphone.
- the gesture library may comprise a plurality of symbolic representations of gestures to be performed by the robot and each of the symbolic representations is paired to a noise profile for removing robot noise.
- obtaining a symbolic representation of a gesture to be performed by a robot may comprise obtaining a labanotation defining orientations of at least one body part of the robot with respect to a plurality of time slots.
- the at least one body part may include a plurality of body parts, and causing the robot to perform the gesture may comprise executing the labanotation to trigger the plurality of body parts to perform the gesture according to the respective orientations in the plurality of time slots.
- an apparatus for noise reduction in a robot system comprises a processor and a memory coupled to the processor and storing instructions for execution by the processor, the instructions, when executed by the processor, causing the apparatus to: obtain a gesture to be performed by a robot; receive incoming audio, the incoming audio including audio from a user and robot noise caused by the robot’s performance of the gesture; retrieve, from a gesture library, a noise profile associated with the gesture for removing the robot noise caused by the robot’s performance of the gesture from the incoming audio; and apply the noise profile to remove the robot noise from the incoming audio.
- the noise profile may be an inverse noise model, and applying the noise profile to remove the robot noise from the incoming audio may comprise applying the inverse noise model to the incoming audio.
- the noise profile may comprise pre-recorded noise signals of the robot performing the gesture, and applying the noise profile to remove the robot noise may comprise mixing the pre-recorded noise signals out-of-phase with the incoming audio.
- the gesture library may comprise a plurality of predetermined gestures performed by the robot, and each of the predetermined gestures is paired to a noise profile for removing robot noise.
- the instructions, when executed by the processor, may further cause the apparatus to create the gesture library, wherein creating the gesture library comprises: causing the robot to perform the predetermined gestures; and recording, for each of the predetermined gestures, robot noise caused by the robot’s performance of the gesture to create a noise profile.
- the incoming audio may be received by a robot microphone; and recording, for each of the predetermined gestures, robot noise caused by the robot’s performance of the gesture to create a noise profile may comprise recording, for each of the predetermined gestures, robot noise caused by the robot’s performance of the gesture using the robot microphone.
- obtaining a gesture to be performed by a robot may comprise obtaining a symbolic representation of the gesture to be performed by the robot, and the instructions, when executed by the processor, may further cause the apparatus to cause the robot to perform the gesture, wherein causing the robot to perform the gesture comprises controlling an orientation of at least one body part of the robot according to the symbolic representation.
- the gesture library may comprise a plurality of symbolic representations of gestures to be performed by the robot and each of the symbolic representations is paired to a noise profile for removing robot noise.
- obtaining a symbolic representation of a gesture to be performed by a robot may comprise obtaining a labanotation defining orientations of at least one body part of the robot with respect to a plurality of time slots.
- the at least one body part may include a plurality of body parts, and causing the robot to perform the gesture may comprise executing the labanotation to trigger the plurality of body parts to perform the gesture according to the respective orientations in the plurality of time slots.
- a computer-readable storage medium comprises computer-executable instructions which, when executed by a computer, cause the computer to perform a method for noise reduction in a robot system in which a robot performs a gesture, the method comprising: receiving an indication that the robot is performing the gesture; receiving incoming audio, the incoming audio including a user utterance mixed with mechanical robot noise caused by the robot’s performance of the gesture; retrieving, from a gesture library comprising a plurality of predetermined gestures paired with noise profiles, a noise profile associated with the gesture; and applying the noise profile to the incoming audio to remove the mechanical robot noise caused by the robot’s performance of the gesture.
- the plurality of noise profiles of the gesture library may comprise pre-recorded noise signals of the robot performing the predetermined gestures, and applying the noise profile to remove the robot noise may comprise mixing pre-recorded noise signals of the noise profile associated with the gesture out-of-phase with the incoming audio.
- the plurality of predetermined gestures may be represented in the gesture library by a plurality of symbolic representations of the predetermined gestures, wherein a symbolic representation defines orientations of at least one body part of the robot in performing a gesture.
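The embodiments above describe pairing each predetermined gesture with a pre-recorded noise profile and mixing that profile out-of-phase with the incoming audio. A minimal Python sketch of that idea follows. It is an illustration under stated assumptions, not the patented implementation: the names `GestureLibrary`, `record_profile`, `retrieve`, `remove_robot_noise`, and the gesture identifier `"wave_right_arm"` are hypothetical, audio is modeled as plain lists of samples, and the recorded noise is assumed to be perfectly time-aligned with, and identical to, the noise present in the incoming audio.

```python
from dataclasses import dataclass, field
from typing import Dict, List

Signal = List[float]  # audio modeled as a list of samples (illustrative assumption)


@dataclass
class GestureLibrary:
    """Hypothetical gesture library: gesture identifier -> pre-recorded robot noise."""
    profiles: Dict[str, Signal] = field(default_factory=dict)

    def record_profile(self, gesture_id: str, recorded_noise: Signal) -> None:
        # Library creation: store the noise captured while the robot
        # performed this gesture (e.g. via the robot microphone).
        self.profiles[gesture_id] = list(recorded_noise)

    def retrieve(self, gesture_id: str) -> Signal:
        # Look up the noise profile paired with the gesture being performed.
        return self.profiles[gesture_id]


def remove_robot_noise(incoming: Signal, noise_profile: Signal) -> Signal:
    """Mix the pre-recorded noise out of phase (sign-inverted) with the incoming audio."""
    cleaned = []
    for i, sample in enumerate(incoming):
        # Treat the profile as silent once it runs out of samples.
        noise = noise_profile[i] if i < len(noise_profile) else 0.0
        cleaned.append(sample + (-noise))  # add the inverted noise sample
    return cleaned


# Build a one-entry library, then simulate a user speaking during the gesture.
library = GestureLibrary()
library.record_profile("wave_right_arm", [0.5, -0.25, 0.125, 0.0])

user_speech = [0.25, 0.5, -0.5, 0.75]
robot_noise = library.retrieve("wave_right_arm")
incoming_audio = [s + n for s, n in zip(user_speech, robot_noise)]

cleaned_audio = remove_robot_noise(incoming_audio, robot_noise)
```

In this idealized case `cleaned_audio` equals `user_speech`. A real system would also need to align the profile in time with the start of the gesture and to cope with variation between the recorded and the live noise, which this sketch does not attempt.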
Landscapes
- Engineering & Computer Science (AREA)
- Mechanical Engineering (AREA)
- Robotics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Signal Processing (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Computational Linguistics (AREA)
- Automation & Control Theory (AREA)
- General Health & Medical Sciences (AREA)
- Manipulator (AREA)
- User Interface Of Digital Computer (AREA)
Priority Applications (4)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202080034072.1A CN113826160B (zh) | 2019-05-08 | 2020-03-17 | Noise reduction in robot human communication |
| KR1020217036241A KR102874755B1 (ko) | 2019-05-08 | 2020-03-17 | Noise reduction technique in communication between robot and human |
| EP20719255.0A EP3966817B1 (en) | 2019-05-08 | 2020-03-17 | Noise reduction in robot human communication |
| JP2021559363A JP2022531654A (ja) | 2019-05-08 | 2020-03-17 | Noise reduction in communication between robot and human |
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US16/406,788 | 2019-05-08 | ||
| US16/406,788 US11270717B2 (en) | 2019-05-08 | 2019-05-08 | Noise reduction in robot human communication |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| WO2020226748A1 (en) | 2020-11-12 |
Family
ID=70289450
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| PCT/US2020/023042 Ceased WO2020226748A1 (en) | 2019-05-08 | 2020-03-17 | Noise reduction in robot human communication |
Country Status (6)
| Country | Link |
|---|---|
| US (2) | US11270717B2 |
| EP (1) | EP3966817B1 |
| JP (1) | JP2022531654A |
| KR (1) | KR102874755B1 |
| CN (1) | CN113826160B |
| WO (1) | WO2020226748A1 |
Families Citing this family (4)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CA3035607A1 (en) * | 2016-09-08 | 2018-03-15 | Fives Line Machines Inc. | Machining station, workpiece holding system, and method of machining a workpiece |
| EP3551393A4 (en) * | 2016-12-12 | 2020-08-12 | Microsoft Technology Licensing, LLC | GENERATION OF ROBOT GESTURES |
| US11270717B2 (en) * | 2019-05-08 | 2022-03-08 | Microsoft Technology Licensing, Llc | Noise reduction in robot human communication |
| JP7420144B2 (ja) * | 2019-10-15 | 2024-01-23 | 日本電気株式会社 | Model generation method, model generation device, and program |
Citations (3)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20080071540A1 (en) * | 2006-09-13 | 2008-03-20 | Honda Motor Co., Ltd. | Speech recognition method for robot under motor noise thereof |
| US20100299145A1 (en) * | 2009-05-22 | 2010-11-25 | Honda Motor Co., Ltd. | Acoustic data processor and acoustic data processing method |
| US8995671B2 (en) * | 2011-07-06 | 2015-03-31 | Honda Motor Co., Ltd. | Sound processing device, sound processing method, and sound processing program |
Family Cites Families (28)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
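| JPH035255A (ja) * | 1989-06-02 | 1991-01-11 | Nissan Motor Co Ltd | Vehicle booming noise reduction device |
| JP3550153B2 (ja) * | 1991-07-05 | 2004-08-04 | 本田技研工業株式会社 | Reference signal generation device for an active vibration control device of an internal combustion engine |
| JP2743639B2 (ja) * | 1991-08-16 | 1998-04-22 | 日産自動車株式会社 | Active noise control device |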
| US6099217A (en) * | 1995-12-20 | 2000-08-08 | Wiegand; Alexander Konrad | Device for spatially moving a body with three to six degrees of freedom in a controlled manner |
| KR100337560B1 (ko) * | 1996-12-20 | 2003-02-25 | 기아자동차주식회사 | Vehicle noise removal device |
| JP2001215990A (ja) * | 2000-01-31 | 2001-08-10 | Japan Science & Technology Corp | Robot auditory device |
| AUPR604201A0 (en) * | 2001-06-29 | 2001-07-26 | Hearworks Pty Ltd | Telephony interface apparatus |
| JP5041934B2 (ja) * | 2006-09-13 | 2012-10-03 | 本田技研工業株式会社 | Robot |
| US8046219B2 (en) * | 2007-10-18 | 2011-10-25 | Motorola Mobility, Inc. | Robust two microphone noise suppression system |
| US20100039434A1 (en) * | 2008-08-14 | 2010-02-18 | Babak Makkinejad | Data Visualization Using Computer-Animated Figure Movement |
| JP5535746B2 (ja) * | 2009-05-22 | 2014-07-02 | 本田技研工業株式会社 | Sound data processing device and sound data processing method |
| JP5391008B2 (ja) * | 2009-09-16 | 2014-01-15 | キヤノン株式会社 | Imaging apparatus and control method thereof |
| US9233470B1 (en) * | 2013-03-15 | 2016-01-12 | Industrial Perception, Inc. | Determining a virtual representation of an environment by projecting texture patterns |
| JP5754454B2 (ja) * | 2013-03-18 | 2015-07-29 | 株式会社安川電機 | Robot picking system and method of manufacturing a workpiece |
| US9177541B2 (en) * | 2013-08-22 | 2015-11-03 | Bose Corporation | Instability detection and correction in sinusoidal active noise reduction system |
| US9769564B2 (en) * | 2015-02-11 | 2017-09-19 | Google Inc. | Methods, systems, and media for ambient background noise modification based on mood and/or behavior information |
| KR102372188B1 (ko) * | 2015-05-28 | 2022-03-08 | 삼성전자주식회사 | Method for removing noise from an audio signal and electronic device therefor |
| CN106308812A (zh) * | 2015-07-08 | 2017-01-11 | 宣威科技股份有限公司 | Portable hearing test device |
| US9990685B2 (en) * | 2016-03-21 | 2018-06-05 | Recognition Robotics, Inc. | Automated guidance system and method for a coordinated movement machine |
| DE102016213663A1 (de) * | 2016-07-26 | 2018-02-01 | Siemens Aktiengesellschaft | Method for controlling an end element of a machine tool, and a machine tool |
| JP2018036332A (ja) * | 2016-08-29 | 2018-03-08 | 国立大学法人 筑波大学 | Acoustic processing device, acoustic processing system, and acoustic processing method |
| CA3035607A1 (en) * | 2016-09-08 | 2018-03-15 | Fives Line Machines Inc. | Machining station, workpiece holding system, and method of machining a workpiece |
| EP3551393A4 (en) * | 2016-12-12 | 2020-08-12 | Microsoft Technology Licensing, LLC | GENERATION OF ROBOT GESTURES |
| US20180190257A1 (en) * | 2016-12-29 | 2018-07-05 | Shadecraft, Inc. | Intelligent Umbrellas and/or Robotic Shading Systems Including Noise Cancellation or Reduction |
| CN107610698A (zh) * | 2017-08-28 | 2018-01-19 | 深圳市金立通信设备有限公司 | Method for implementing voice control, robot, and computer-readable storage medium |
| US11069365B2 (en) * | 2018-03-30 | 2021-07-20 | Intel Corporation | Detection and reduction of wind noise in computing environments |
| CN108648756A (zh) * | 2018-05-21 | 2018-10-12 | 百度在线网络技术(北京)有限公司 | Voice interaction method, apparatus, and system |
| US11270717B2 (en) * | 2019-05-08 | 2022-03-08 | Microsoft Technology Licensing, Llc | Noise reduction in robot human communication |
2019
- 2019-05-08 US US16/406,788 patent/US11270717B2/en active Active
2020
- 2020-03-17 CN CN202080034072.1A patent/CN113826160B/zh active Active
- 2020-03-17 WO PCT/US2020/023042 patent/WO2020226748A1/en not_active Ceased
- 2020-03-17 JP JP2021559363A patent/JP2022531654A/ja active Pending
- 2020-03-17 KR KR1020217036241A patent/KR102874755B1/ko active Active
- 2020-03-17 EP EP20719255.0A patent/EP3966817B1/en active Active
2022
- 2022-01-28 US US17/587,568 patent/US11842744B2/en active Active
Also Published As
| Publication number | Publication date |
|---|---|
| CN113826160A (zh) | 2021-12-21 |
| US20220230650A1 (en) | 2022-07-21 |
| KR102874755B1 (ko) | 2025-10-21 |
| CN113826160B (zh) | 2025-09-26 |
| EP3966817B1 (en) | 2023-12-13 |
| KR20220007051A (ko) | 2022-01-18 |
| US20200357423A1 (en) | 2020-11-12 |
| US11270717B2 (en) | 2022-03-08 |
| EP3966817A1 (en) | 2022-03-16 |
| JP2022531654A (ja) | 2022-07-08 |
| US11842744B2 (en) | 2023-12-12 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11842744B2 (en) | | Noise reduction in robot human communication |
| KR20230097106A (ko) | | Deep learning based speech enhancement |
| CN119610090B (zh) | | Natural language control method for a humanoid robot |
| CN111798860A (zh) | | Audio signal processing method, apparatus, device, and storage medium |
| US20230298609A1 (en) | | Generalized Automatic Speech Recognition for Joint Acoustic Echo Cancellation, Speech Enhancement, and Voice Separation |
| CN120422249A (zh) | | Robot action generation method and related apparatus |
| CN107309873B (zh) | | Robotic arm motion control method and system |
| EP4483360B1 (en) | | Coded speech enhancement based on deep generative model |
| US20240112018A1 (en) | | System and method for deep learning-based sound prediction using accelerometer data |
| CN114556472A (zh) | | Deep source separation architecture |
| WO2025240379A1 (en) | | Real-time multi-modal artificial intelligence agent |
| JP2024501885A (ja) | | Simulation-driven robot control of real robots |
| CN115609343B (zh) | | Motion rate adjustment method, apparatus, computer device, and storage medium |
| US20240100693A1 (en) | | Using embeddings, generated using robot action models, in controlling robot to perform robotic task |
| US11978441B2 (en) | | Speech recognition apparatus, method and non-transitory computer-readable storage medium |
| JP7679144B2 (ja) | | Method, system, and program for synchronizing a virtual assistant's voice response with user activity |
| Nguyen et al. | | Control of autonomous mobile robot using voice command |
| CN120755870A (zh) | | Posture generation method, apparatus, robot, medium, and product for a humanoid robot |
| CN113113035B (zh) | | Audio signal processing method, apparatus, system, and electronic device |
| CN112669848B (zh) | | Offline speech recognition method, apparatus, electronic device, and storage medium |
| US20240112019A1 (en) | | System and method for deep learning-based sound prediction using accelerometer data |
| CN117316153A (zh) | | Microphone array control method, robot, apparatus, device, and storage medium |
| Klin et al. | | Multi-talker Verbal Interaction for Humanoid Robots |
| Vu et al. | | Autonomous Car Control Via Natural Language Commands Using ChatGPT and Raspberry Pi |
| CN117921647A (zh) | | Trajectory planning and robot control methods and apparatus, and parallel robot |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | 121 | EP: the EPO has been informed by WIPO that EP was designated in this application | Ref document number: 20719255; Country of ref document: EP; Kind code of ref document: A1 |
| | ENP | Entry into the national phase | Ref document number: 2021559363; Country of ref document: JP; Kind code of ref document: A |
| | NENP | Non-entry into the national phase | Ref country code: DE |
| | ENP | Entry into the national phase | Ref document number: 2020719255; Country of ref document: EP; Effective date: 20211208 |