WO2018006369A1 - Method and system for synchronizing voice and virtual actions, and robot - Google Patents

Method and system for synchronizing voice and virtual actions, and robot

Info

Publication number
WO2018006369A1
WO2018006369A1 (PCT/CN2016/089213)
Authority
WO
WIPO (PCT)
Prior art keywords
information
length
time
voice
robot
Prior art date
Application number
PCT/CN2016/089213
Other languages
English (en)
Chinese (zh)
Inventor
邱楠
杨新宇
王昊奋
Original Assignee
深圳狗尾草智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳狗尾草智能科技有限公司 filed Critical 深圳狗尾草智能科技有限公司
Priority to CN201680001720.7A priority Critical patent/CN106471572B/zh
Priority to PCT/CN2016/089213 priority patent/WO2018006369A1/fr
Priority to JP2017133167A priority patent/JP6567609B2/ja
Publication of WO2018006369A1 publication Critical patent/WO2018006369A1/fr

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J13/00 Controls for manipulators
    • B25J13/003 Controls for manipulators by means of an audio-responsive input
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/04 Time compression or expansion
    • G10L21/055 Time compression or expansion for synchronising with other signals, e.g. video signals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/06 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/226 Procedures used during a speech recognition process, e.g. man-machine dialogue using non-speech characteristics

Definitions

  • the present invention relates to the field of robot interaction technologies, and in particular to a method, a system, and a robot for synchronizing voice and virtual actions.
  • robots are used more and more widely; for example, elderly people and children can interact with robots for dialogue, entertainment, and the like.
  • the inventor has developed a virtual robot display device and imaging system that can form a 3D animated image; the virtual robot's host accepts human commands, such as voice, to interact with humans, and the virtual 3D animated image then responds with sounds and actions according to the host's instructions. This makes the robot more anthropomorphic: it can interact with humans not only through sounds and expressions but also through actions, which improves the interaction experience.
  • a method of synchronizing voice and virtual actions, including:
  • acquiring multimodal information of a user;
  • generating interactive content according to the user's multimodal information and variable parameters, the interactive content including at least voice information and action information;
  • adjusting the length of the voice information and the length of the action information to be the same, wherein the specific steps include:
  • if the difference between the time length of the voice information and the time length of the action information is not greater than a threshold: when the time length of the voice information is less than the time length of the action information, the playback speed of the action information is accelerated so that the time length of the action information equals the time length of the voice information;
  • when the time length of the voice information is greater than the time length of the action information, the playback speed of the voice information is accelerated or/and the playback speed of the action information is slowed down, so that the time length of the action information equals the time length of the voice information.
  • the specific steps of adjusting the length of the voice information and the length of the action information to be the same further include:
  • if the difference between the time length of the voice information and the time length of the action information is greater than the threshold: when the time length of the voice information is greater than the time length of the action information, at least two sets of action information are sorted and combined so that the time length of the combined action information equals the time length of the voice information;
  • when the time length of the voice information is less than the time length of the action information, part of the action information is selected so that the time length of the selected actions equals the time length of the voice information.
  • the method for generating the variable parameters of the robot comprises: fitting the parameters of the robot's self-cognition to the parameters of the scenes in the variable parameters, to generate the variable parameters of the robot.
  • the variable parameters include at least the user's original behavior, the behavior after the change, and parameter values representing the user's original behavior and the behavior after the change.
  • the step of generating the interactive content according to the multimodal information and the variable parameters specifically includes: generating the interactive content according to the multimodal information, the variable parameters, and a fitting curve of the parameter change probability.
  • the method for generating the fitting curve of the parameter change probability comprises: using a probability algorithm to make a network-based probability estimate of the robot's parameters, and calculating, after the scene parameters of the robot on the life time axis change, the probability of each parameter change; these probabilities form the fitting curve of the parameter change probability.
  • a system for synchronizing voice and virtual actions including:
  • an obtaining module, configured to acquire multimodal information of the user;
  • an artificial intelligence module, configured to generate interactive content according to the user's multimodal information and variable parameters, where the interactive content includes at least voice information and action information; and
  • a control module, configured to adjust the time length of the voice information and the time length of the action information to be the same.
  • the control module is specifically configured to:
  • if the difference between the time length of the voice information and the time length of the action information is not greater than a threshold: when the time length of the voice information is less than the time length of the action information, accelerate the playback speed of the action information so that the time length of the action information equals the time length of the voice information;
  • when the time length of the voice information is greater than the time length of the action information, accelerate the playback speed of the voice information or/and slow down the playback speed of the action information, so that the time length of the action information equals the time length of the voice information.
  • control module is specifically configured to:
  • if the difference between the time length of the voice information and the time length of the action information is greater than the threshold: when the time length of the voice information is greater than the time length of the action information, combine at least two sets of action information so that the time length of the combined action information equals the time length of the voice information;
  • when the time length of the voice information is less than the time length of the action information, select part of the action information so that the time length of the selected actions equals the time length of the voice information.
  • the system further comprises a processing module for fitting the parameters of the robot's self-cognition to the parameters of the scenes in the variable parameters, to generate the variable parameters.
  • the variable parameters include at least the user's original behavior, the behavior after the change, and parameter values representing the user's original behavior and the behavior after the change.
  • the artificial intelligence module is specifically configured to: generate the interactive content according to the multimodal information, the variable parameters, and the fitting curve of the parameter change probability.
  • the system further includes a fitting curve generating module for using a probability algorithm to make a network-based probability estimate of the robot's parameters, and calculating, after the scene parameters of the robot on the life time axis change, the probability of each parameter change; these probabilities form the fitting curve of the parameter change probability.
  • the invention discloses a robot comprising a system for synchronizing speech and virtual actions as described above.
  • the method for synchronizing voice and virtual actions of the present invention includes: acquiring multimodal information of a user; generating interactive content according to the user's multimodal information and the variable parameters, the interactive content including at least voice information and action information; and adjusting the time length of the voice information and the time length of the action information to be the same.
  • the interactive content can thus be generated from one or more kinds of the user's multimodal information, such as the user's voice, expressions, and actions.
  • the interactive content includes at least voice information and action information; in order to synchronize them, the time length of the voice information and the time length of the action information are adjusted to be the same, so that the sound and the actions are synchronized and matched when the robot plays them, and the robot expresses itself during interaction not only through voice but also through various other forms such as actions.
  • the robot's forms of expression are therefore more diverse, which makes the robot more anthropomorphic and improves the user's experience of interacting with it.
  • FIG. 1 is a flowchart of a method for synchronizing voice and virtual actions according to Embodiment 1 of the present invention
  • FIG. 2 is a schematic diagram of a system for synchronizing voice and virtual actions according to Embodiment 2 of the present invention.
  • Computer devices include user devices and network devices.
  • the user equipment or the client includes but is not limited to a computer, a smart phone, a PDA, etc.;
  • the network device includes but is not limited to a single network server, a server group composed of multiple network servers, or a cloud, based on cloud computing, composed of a large number of computers or network servers.
  • the computer device can operate alone to carry out the invention, and can also access the network and implement the invention through interoperation with other computer devices in the network.
  • the network in which the computer device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN network, and the like.
  • it will be understood that the terms "first," "second," and the like may be used herein to describe various elements, but the elements should not be limited by these terms; the terms are used only to distinguish one element from another.
  • the term "and/or" used herein includes any and all combinations of one or more of the associated listed items. When a unit is referred to as being "connected" or "coupled" to another unit, it can be directly connected or coupled to the other unit, or an intermediate unit can be present.
  • a method for synchronizing voice and virtual actions includes:
  • S101: Acquire multimodal information of a user.
  • S102: Generate interactive content according to the user's multimodal information and the variable parameters 300, where the interactive content includes at least voice information and action information.
  • S103: Adjust the time length of the voice information and the time length of the action information to be the same.
  • the method for synchronizing voice and virtual actions of the present invention thus includes: acquiring multimodal information of a user; generating interactive content according to the user's multimodal information and the variable parameters, the interactive content including at least voice information and action information; and adjusting the time length of the voice information and the time length of the action information to be the same.
  • the interactive content can be generated from one or more kinds of the user's multimodal information, such as the user's voice, expressions, and actions.
  • the interactive content includes at least voice information and action information; in order to synchronize the two, their time lengths are adjusted to be the same, so that the sound and the actions are synchronized and matched when the robot plays them, and the robot expresses itself during interaction not only through voice but also through actions.
  • the robot's forms of expression are therefore more diverse, making the robot more anthropomorphic and improving the user's experience of interacting with it.
  • the multimodal information in this embodiment may be one or more of user expression, voice information, gesture information, scene information, image information, video information, face information, pupil and iris information, light sensing information, and fingerprint information.
  • the variable parameters specifically capture sudden changes affecting the person and the machine. For example, a day on the time axis consists of eating, sleeping, interacting, running, eating, and sleeping; if the robot's scene suddenly changes, such as being taken to the beach at the time scheduled for running, such human-initiated parameters act as variable parameters and cause the robot's self-cognition to change.
  • the life time axis and the variable parameters can be used to change the attributes of the self-cognition, such as the mood value and the fatigue value, and can also automatically add new self-cognition information, such as an anger value that did not exist before: based on the life time axis and the scenes of the variable factors, new attributes are automatically added to the robot's self-cognition, which previously simulated human self-cognition.
  • the robot writes such a change as one of the variable parameters.
  • for example, the robot then generates interactive content based on going out shopping at 12 noon, instead of the previous routine of eating at 12 noon.
  • when the interactive content is specifically generated, the robot combines the acquired multimodal information of the user, such as voice information, video information, and picture information, with the variable parameters to generate the interactive content. In this way, unexpected events in human life can be added to the robot's life time axis, making the robot's interaction more anthropomorphic.
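As a minimal illustration of the life time axis and variable parameters described above (the patent specifies no data structures or code, so every name below is a hypothetical stand-in), a sudden change can be recorded as a variable parameter and folded back into the robot's self-cognition:

```python
# Hypothetical sketch only: recording the 12-noon change from "eating" to
# "shopping" as a variable parameter. All names are invented for illustration.
from dataclasses import dataclass

@dataclass
class VariableParameter:
    time_slot: str          # position on the life time axis
    original_behavior: str  # behavior originally planned for the slot
    changed_behavior: str   # behavior after the sudden change

# planned life time axis: eating at 12 noon
life_time_axis = {"12:00": "eating"}

# the sudden change is written down as a variable parameter ...
change = VariableParameter("12:00", life_time_axis["12:00"], "shopping")

# ... and the slot in the robot's self-cognition is updated, so interactive
# content generated for 12:00 is now based on shopping rather than eating
life_time_axis[change.time_slot] = change.changed_behavior
print(life_time_axis)  # {'12:00': 'shopping'}
```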
  • in this embodiment, the specific steps of adjusting the length of the voice information and the length of the action information to be the same include:
  • if the difference between the time length of the voice information and the time length of the action information is not greater than a threshold: when the time length of the voice information is less than the time length of the action information, the playback speed of the action information is accelerated so that the time length of the action information equals the time length of the voice information;
  • when the time length of the voice information is greater than the time length of the action information, the playback speed of the voice information is accelerated or/and the playback speed of the action information is slowed down, so that the time length of the action information equals the time length of the voice information.
  • here, adjusting specifically means compressing or stretching the time length of the voice information or/and the action information, or speeding up or slowing down the playback speed; for example, the playback speed of the voice information can be multiplied by 2, or the playback time of the action information can be multiplied by 0.8, and so on.
  • for example, the threshold for the difference between the time length of the voice information and the time length of the action information is one minute.
  • suppose the time length of the voice information is 1 minute and the time length of the action information is 2 minutes.
  • the playback speed of the action information can then be accelerated to twice the original speed, so that the adjusted playback time of the action information is 1 minute, synchronized with the voice information.
  • alternatively, the playback speed of the voice information can be slowed down to 0.5 times the original speed, so that the adjusted voice information stretches to 2 minutes, synchronized with the action information.
  • both can also be adjusted at once, for example slowing down the voice information and accelerating the action information so that both reach 1 minute 30 seconds, again synchronizing voice and action.
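A minimal sketch of this small-difference case follows; it is illustrative only (the patent prescribes no code) and all names are invented. A playback-speed factor greater than 1.0 plays faster and shortens the played duration by that factor; choosing the voice length as the target reproduces the double-speed example above, and the midpoint option reproduces the 1-minute-30-seconds example.

```python
# Illustrative only: rescale playback speeds so that voice and action
# durations match when their difference is within the threshold.
def align_by_speed(voice_len: float, action_len: float,
                   threshold: float, meet_in_middle: bool = False):
    """Return (voice_speed, action_speed) playback factors; a factor > 1.0
    plays faster and divides the duration, < 1.0 plays slower and stretches it."""
    if abs(voice_len - action_len) > threshold:
        raise ValueError("difference exceeds threshold; combine or select actions")
    # either align the action to the voice, or let both meet at the midpoint
    target = (voice_len + action_len) / 2 if meet_in_middle else voice_len
    return voice_len / target, action_len / target

# voice 60 s, action 120 s -> (1.0, 2.0): the action is played at double speed
print(align_by_speed(60, 120, threshold=60))
# -> (~0.667, ~1.333): voice slowed, action accelerated, both reach 90 s
print(align_by_speed(60, 120, threshold=60, meet_in_middle=True))
```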
  • the specific steps of adjusting the length of the voice information and the length of the action information to be the same further include:
  • if the difference between the time length of the voice information and the time length of the action information is greater than the threshold: when the time length of the voice information is greater than the time length of the action information, at least two sets of action information are sorted and combined so that the time length of the combined action information equals the time length of the voice information;
  • when the time length of the voice information is less than the time length of the action information, part of the action information is selected so that the time length of the selected actions equals the time length of the voice information.
  • here, adjusting means adding or deleting part of the action information so that the time length of the action information becomes the same as the time length of the voice information.
  • for example, the threshold for the difference between the time length of the voice information and the time length of the action information is 30 seconds.
  • suppose the time length of the voice information is 3 minutes and the time length of the action information is 1 minute.
  • other action information then needs to be added to the original action information; for example, a further action lasting 2 minutes is found, and the two sets of action information are sorted and combined so that the combined time length matches the 3-minute time length of the voice information.
  • when selecting, the action information closest to the time length of the voice information may be chosen according to the length of the voice information, or the closest voice information may be chosen according to the time length of the action information.
  • in this way, the control module can conveniently adjust the time lengths of the voice information and the action information, making it easier to make them the same, and the adjusted playback is more natural and smooth.
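As an illustration of this large-difference case (again hypothetical: the patent only requires that action sets be sorted, combined, or partially selected, and the greedy strategy below is an invented stand-in chosen for brevity), extra clips can be chained or trimmed until the total matches the voice length:

```python
# Illustrative greedy strategy: pick an ordered subset of action clips whose
# total duration is as close to the voice length as possible without
# exceeding it. Invented names; not the patented algorithm itself.
def match_actions_to_voice(voice_len: float, clip_lens: list[float]) -> list[float]:
    chosen, total = [], 0.0
    for clip in sorted(clip_lens, reverse=True):  # try longest clips first
        if total + clip <= voice_len:
            chosen.append(clip)
            total += clip
    return chosen

# 3-minute voice; the original 1-minute action plus a 2-minute action found
# in the library are combined, while a 45 s clip is left out
print(match_actions_to_voice(180.0, [60.0, 120.0, 45.0]))  # [120.0, 60.0]
```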
  • the method further includes: outputting the adjusted voice information and action information to the virtual image for presentation.
  • after the adjustment makes the time lengths consistent, the voice and the actions can be output on the virtual image, making the virtual robot more anthropomorphic and improving the user experience.
  • the method for generating the robot's variable parameters includes: fitting the parameters of the robot's self-cognition to the parameters of the scenes in the variable parameters, to generate the variable parameters.
  • that is, the parameters of the self-cognition are matched with the parameters of the scenes used in the variable parameter axis, producing an anthropomorphic influence.
  • the variable parameters include at least the user's original behavior, the behavior after the change, and parameter values representing the user's original behavior and the behavior after the change.
  • a variable parameter captures the case where the user was originally in the state planned on the time axis and a sudden change places the user in another state.
  • the variable parameter then represents this change of behavior or state, together with the user's state or behavior after the change. For example, the plan was to run at 5 p.m., but something else suddenly comes up, such as going out to play; the change from running to playing is then a variable parameter, and the probability of such a change is also studied.
  • the step of generating the interactive content according to the multimodal information and the variable parameters specifically includes: generating the interactive content according to the multimodal information, the variable parameters, and the fitting curve of the parameter change probability.
  • a fitting curve can be generated by probability training on the variable parameters, and the robot interactive content is then generated from it.
  • the method for generating the fitting curve of the parameter change probability includes: using a probability algorithm to make a network-based probability estimate of the robot's parameters, and calculating, after the scene parameters of the robot on the life time axis change, the probability of each parameter change; these probabilities form the fitting curve of the parameter change probability.
  • the probability algorithm can adopt the Bayesian probability algorithm.
  • the parameters of the self-cognition are matched with the parameters of the scenes used in the variable parameter axis, producing an anthropomorphic influence.
  • for example, the robot knows its geographical location and changes the way interactive content is generated according to the geographical environment in which it is located.
  • the Bayesian probability algorithm is used to estimate the robot's parameters over a Bayesian network, and the probability of each parameter change is calculated after the scene parameters on the robot's own life time axis change, forming a fitting curve.
  • the curve dynamically affects the robot's own self-cognition.
  • this innovative module gives the robot itself a human lifestyle; in terms of expression, the robot can change its behavior according to the scene of its location.
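A rough sketch of such a fitting curve follows, under assumptions the patent does not spell out: the change probability per time slot is estimated from logged variable parameters with a simple Laplace-smoothed (Bayesian) estimator, and a low-order polynomial stands in for the fitted curve. Both the counting estimator and the curve family are illustrative choices, not the patented method.

```python
# Illustrative only: estimate P(behavior change | hour) from logged variable
# parameters, then fit a smooth "parameter change probability" curve over the
# life time axis. The estimator and curve family are invented stand-ins.
from numpy.polynomial import Polynomial

# observed (hour_of_day, changed?) pairs from logged variable parameters
observations = [(7, 0), (7, 1), (12, 1), (12, 1), (12, 0), (17, 1), (17, 0)]

hours = sorted({h for h, _ in observations})
# Laplace-smoothed estimate of the change probability at each hour
probs = [
    (sum(c for h2, c in observations if h2 == h) + 1)
    / (sum(1 for h2, _ in observations if h2 == h) + 2)
    for h in hours
]

# a low-order polynomial serves as the fitting curve of the change probability
curve = Polynomial.fit(hours, probs, deg=2)
print(float(curve(12)))  # estimated probability of a change around noon
```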
  • a system for synchronizing voice and virtual actions including:
  • the obtaining module 201 is configured to acquire multi-modal information of the user
  • the artificial intelligence module 202 is configured to generate interactive content according to the user's multimodal information and the variable parameters, where the interactive content includes at least voice information and action information, and where the variable parameters are generated by the variable parameter module 301;
  • the control module 203 is configured to adjust the time length of the voice information and the time length of the motion information to be the same.
  • the interactive content can be generated from one or more kinds of the user's multimodal information, such as the user's voice, expressions, and actions.
  • the interactive content includes at least voice information and action information; in order to synchronize the two, their time lengths are adjusted to be the same, so that the sound and the actions are synchronized and matched when the robot plays them, and the robot expresses itself during interaction not only through voice but also through actions.
  • the robot's forms of expression are therefore more diverse, making the robot more anthropomorphic and improving the user's experience of interacting with it.
  • the multimodal information in this embodiment may be one or more of user expression, voice information, gesture information, scene information, image information, video information, face information, pupil and iris information, light sensing information, and fingerprint information.
  • the variable parameters specifically capture sudden changes affecting the person and the machine. For example, a day on the time axis consists of eating, sleeping, interacting, running, eating, and sleeping; if the robot's scene suddenly changes, such as being taken to the beach at the time scheduled for running, such human-initiated parameters act as variable parameters and cause the robot's self-cognition to change.
  • the life time axis and the variable parameters can be used to change the attributes of the self-cognition, such as the mood value and the fatigue value, and can also automatically add new self-cognition information, such as an anger value that did not exist before: based on the life time axis and the scenes of the variable factors, new attributes are automatically added to the robot's self-cognition, which previously simulated human self-cognition.
  • the robot uses such a change as a variable parameter.
  • the robot then generates interactive content based on going out shopping at 12 noon, instead of the previous routine of eating at 12 noon.
  • in the specific interaction, the robot combines the acquired multimodal information of the user, such as voice information, video information, and picture information, with the variable parameters to generate the interactive content. In this way, unexpected events in human life can be added to the robot's life time axis, making the robot's interaction more anthropomorphic.
  • the control module is specifically configured to:
  • if the difference between the time length of the voice information and the time length of the action information is not greater than a threshold: when the time length of the voice information is less than the time length of the action information, accelerate the playback speed of the action information so that the time length of the action information equals the time length of the voice information;
  • when the time length of the voice information is greater than the time length of the action information, accelerate the playback speed of the voice information or/and slow down the playback speed of the action information, so that the time length of the action information equals the time length of the voice information.
  • here, adjusting specifically means compressing or stretching the time length of the voice information or/and the action information, or speeding up or slowing down the playback speed; for example, the playback speed of the voice information can be multiplied by 2, or the playback time of the action information can be multiplied by 0.8, and so on.
  • for example, the threshold for the difference between the time length of the voice information and the time length of the action information is one minute.
  • suppose the time length of the voice information is 1 minute and the time length of the action information is 2 minutes.
  • the playback speed of the action information can then be accelerated to twice the original speed, so that the adjusted playback time of the action information is 1 minute, synchronized with the voice information.
  • alternatively, the playback speed of the voice information can be slowed down to 0.5 times the original speed, so that the adjusted voice information stretches to 2 minutes, synchronized with the action information.
  • both can also be adjusted at once, for example slowing down the voice information and accelerating the action information so that both reach 1 minute 30 seconds, again synchronizing voice and action.
  • the control module is further specifically configured to:
  • if the difference between the time length of the voice information and the time length of the action information is greater than the threshold: when the time length of the voice information is greater than the time length of the action information, combine at least two sets of action information so that the time length of the combined action information equals the time length of the voice information;
  • when the time length of the voice information is less than the time length of the action information, select part of the action information so that the time length of the selected actions equals the time length of the voice information.
  • here, adjusting means adding or deleting part of the action information so that the time length of the action information becomes the same as the time length of the voice information.
  • for example, the threshold for the difference between the time length of the voice information and the time length of the action information is 30 seconds.
  • suppose the time length of the voice information is 3 minutes and the time length of the action information is 1 minute.
  • other action information then needs to be added to the original action information; for example, a further action lasting 2 minutes is found, and the two sets of action information are sorted and combined so that the combined time length matches the 3-minute time length of the voice information.
  • the artificial intelligence module may be specifically configured to: select the action information closest to the time length of the voice information according to the length of the voice information, or select the closest voice information according to the time length of the action information.
  • in this way, the control module can conveniently adjust the time lengths of the voice information and the action information, making it easier to make them the same, and the adjusted playback is more natural and smooth.
  • the system further includes an output module 204 for outputting the adjusted voice information and action information to the virtual image for presentation.
  • after the adjustment makes the time lengths consistent, the voice and the actions can be output on the virtual image, making the virtual robot more anthropomorphic and improving the user experience.
  • the system further includes a processing module for fitting the parameters of the robot's self-cognition to the parameters of the scenes in the variable parameters, to generate the variable parameters.
  • the variable parameters include at least the user's original behavior, the behavior after the change, and parameter values representing the user's original behavior and the behavior after the change.
  • a variable parameter captures the case where the user was originally in the state planned on the time axis and a sudden change places the user in another state.
  • the variable parameter then represents this change of behavior or state, together with the user's state or behavior after the change. For example, the plan was to run at 5 p.m., but something else suddenly comes up, such as going out to play; the change from running to playing is then a variable parameter, and the probability of such a change is also studied.
  • the artificial intelligence module is specifically configured to: generate the interactive content according to the multimodal information, the variable parameters, and the fitting curve of the parameter change probability.
  • a fitting curve can be generated by probability training on the variable parameters, and the robot interactive content is then generated from it.
  • the system further includes a fitting curve generating module for using a probability algorithm to make a network-based probability estimate of the robot's parameters, and calculating, after the scene parameters of the robot on the life time axis change, the probability of each parameter change; these probabilities form the fitting curve of the parameter change probability.
  • the probability algorithm can adopt the Bayesian probability algorithm.
  • the parameters of the self-cognition are matched with the parameters of the scenes used in the variable parameter axis, producing an anthropomorphic influence.
  • for example, the robot knows its geographical location and changes the way interactive content is generated according to the geographical environment in which it is located.
  • the Bayesian probability algorithm is used to estimate the robot's parameters over a Bayesian network, and the probability of each parameter change is calculated after the scene parameters on the robot's own life time axis change, forming a fitting curve.
  • the curve dynamically affects the robot's own self-cognition.
  • this innovative module gives the robot itself a human lifestyle; in terms of expression, the robot can change its behavior according to the scene of its location.
  • the invention discloses a robot comprising a system for synchronizing speech and virtual actions as described above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Data Mining & Analysis (AREA)
  • Manipulator (AREA)

Abstract

A method for synchronizing voice and virtual actions, comprising: acquiring multimodal information of a user (S101); generating interactive content according to the user's multimodal information and a variable parameter (300), the interactive content comprising at least voice information and action information (S102); and adjusting the time length of the voice information and the time length of the action information to be the same (S103). The interactive content is generated according to one or more kinds of the user's multimodal information, such as the user's speech, the user's expression, and the user's actions. Moreover, in order to synchronize the voice information and the action information, the time length of the voice information and the time length of the action information are set to be the same, so that the sound and the actions of a robot can be synchronized and matched during playback, and the robot can use not only speech but also multiple other forms of expression, such as actions, to interact. The robot's forms of expression are thus further diversified, the robot is more humanized, and the user's experience of interacting with the robot is also improved.
PCT/CN2016/089213 2016-07-07 2016-07-07 Method and system for synchronizing voice and virtual actions, and robot WO2018006369A1 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201680001720.7A CN106471572B (zh) 2016-07-07 2016-07-07 Method, system, and robot for synchronizing voice and virtual actions
PCT/CN2016/089213 WO2018006369A1 (fr) 2016-07-07 2016-07-07 Method and system for synchronizing voice and virtual actions, and robot
JP2017133167A JP6567609B2 (ja) 2016-07-07 2017-07-06 Method, system, and robot body for synchronizing voice and virtual motion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/089213 WO2018006369A1 (fr) 2016-07-07 2016-07-07 Method and system for synchronizing voice and virtual actions, and robot

Publications (1)

Publication Number Publication Date
WO2018006369A1 (fr) 2018-01-11

Family

ID=58230946

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/089213 WO2018006369A1 (fr) 2016-07-07 2016-07-07 Method and system for synchronizing voice and virtual actions, and robot

Country Status (3)

Country Link
JP (1) JP6567609B2 (fr)
CN (1) CN106471572B (fr)
WO (1) WO2018006369A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110610703A (zh) * 2019-07-26 2019-12-24 深圳壹账通智能科技有限公司 Voice output method and apparatus based on robot recognition, robot, and medium

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107457787B (zh) * 2017-06-29 2020-12-08 杭州仁盈科技股份有限公司 Service robot interaction decision-making method and device
CN107577661B (zh) * 2017-08-07 2020-12-11 北京光年无限科技有限公司 Interactive output method and system for a virtual robot
CN107784355A (zh) * 2017-10-26 2018-03-09 北京光年无限科技有限公司 Virtual human multimodal interaction data processing method and system
CN109822587B (zh) * 2019-03-05 2022-05-31 哈尔滨理工大学 Control method for the head and neck device of a voice guidance robot for factory and mine hospitals
US20220366909A1 (en) * 2019-10-30 2022-11-17 Sony Group Corporation Information processing apparatus and command processing method
JP7510042B2 (ja) * 2020-01-27 2024-07-03 株式会社Mixi Information processing system, terminal device, terminal device control method, and program
CN115497499A (zh) * 2022-08-30 2022-12-20 阿里巴巴(中国)有限公司 Method for time-synchronizing voice and motion

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5111409A * 1989-07-21 1992-05-05 Elon Gasper Authoring and use systems for sound synchronized animation
CN101364309A (zh) * 2008-10-09 2009-02-11 中国科学院计算技术研究所 Method for generating lip-shape animation of a source virtual character
US20090044112A1 * 2007-08-09 2009-02-12 H-Care Srl Animated Digital Assistant
CN101968894A (zh) * 2009-07-28 2011-02-09 上海冰动信息技术有限公司 Method for automatically achieving lip synchronization from Chinese characters
CN103596051A (zh) * 2012-08-14 2014-02-19 金运科技股份有限公司 Television device and virtual host display method thereof
CN104574478A (zh) * 2014-12-30 2015-04-29 北京像素软件科技股份有限公司 Method and device for editing the mouth shapes of animated characters
CN104866101A (zh) * 2015-05-27 2015-08-26 世优(北京)科技有限公司 Real-time interactive control method and device for a virtual object
CN104883557A (zh) * 2015-05-27 2015-09-02 世优(北京)科技有限公司 Real-time holographic projection method, device, and system

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10143351A (ja) * 1996-11-13 1998-05-29 Sharp Corp Interface device
EP0944269B1 (fr) * 1996-12-04 2002-11-13 Matsushita Electric Industrial Co., Ltd. Optical disc for optical recording of high-resolution three-dimensional images, optical disc reproduction device, and optical disc recording device
JP3792882B2 (ja) * 1998-03-17 2006-07-05 株式会社東芝 Emotion generation device and emotion generation method
JP4032273B2 (ja) * 1999-12-28 2008-01-16 ソニー株式会社 Synchronization control device and method, and recording medium
JP4670136B2 (ja) * 2000-10-11 2011-04-13 ソニー株式会社 Authoring system, authoring method, and storage medium
JP3930389B2 (ja) * 2002-07-08 2007-06-13 三菱重工業株式会社 Motion program generation device for robot speech, and robot
JP2005003926A (ja) * 2003-06-11 2005-01-06 Sony Corp Information processing device and method, and program
EP1845724A1 (fr) * 2005-02-03 2007-10-17 Matsushita Electric Industrial Co., Ltd. Recording/reproduction device, recording/reproduction method, recording system provided with a recording/reproduction program, and integrated circuit used by the recording/reproduction system
JP2008040726A (ja) * 2006-08-04 2008-02-21 Univ Of Electro-Communications User support system and user support method
JP5045519B2 (ja) * 2008-03-26 2012-10-10 トヨタ自動車株式会社 Motion generation device, robot, and motion generation method
JP2012504810A (ja) * 2008-10-03 2012-02-23 ビーエイイー システムズ パブリック リミテッド カンパニー Assisting the update of models for diagnosing faults in a system
CN101604204B (zh) * 2009-07-09 2011-01-05 北京科技大学 Distributed cognition system for an intelligent emotional robot
JP2011054088A (ja) * 2009-09-04 2011-03-17 National Institute Of Information & Communication Technology Information processing device, information processing method, program, and dialogue system
JP2012215645A (ja) * 2011-03-31 2012-11-08 Speakglobal Ltd Computer-based foreign language conversation practice system
CN105598972B (zh) * 2016-02-04 2017-08-08 北京光年无限科技有限公司 Robot system and interaction method

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5111409A * 1989-07-21 1992-05-05 Elon Gasper Authoring and use systems for sound synchronized animation
US20090044112A1 * 2007-08-09 2009-02-12 H-Care Srl Animated Digital Assistant
CN101364309A (zh) * 2008-10-09 2009-02-11 中国科学院计算技术研究所 Method for generating lip-shape animation of a source virtual character
CN101968894A (zh) * 2009-07-28 2011-02-09 上海冰动信息技术有限公司 Method for automatically achieving lip synchronization from Chinese characters
CN103596051A (zh) * 2012-08-14 2014-02-19 金运科技股份有限公司 Television device and virtual host display method thereof
CN104574478A (zh) * 2014-12-30 2015-04-29 北京像素软件科技股份有限公司 Method and device for editing the mouth shapes of animated characters
CN104866101A (zh) * 2015-05-27 2015-08-26 世优(北京)科技有限公司 Real-time interactive control method and device for a virtual object
CN104883557A (zh) * 2015-05-27 2015-09-02 世优(北京)科技有限公司 Real-time holographic projection method, device, and system

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110610703A (zh) * 2019-07-26 2019-12-24 深圳壹账通智能科技有限公司 Voice output method and apparatus based on robot recognition, robot, and medium

Also Published As

Publication number Publication date
JP6567609B2 (ja) 2019-08-28
CN106471572A (zh) 2017-03-01
JP2018001403A (ja) 2018-01-11
CN106471572B (zh) 2019-09-03

Similar Documents

Publication Publication Date Title
WO2018006369A1 (fr) Method and system for synchronizing voice and virtual actions, and robot
WO2018006370A1 (fr) Interaction method and system for a virtual 3D robot, and robot
WO2018006371A1 (fr) Method and system for synchronizing speech and virtual actions, and robot
TWI778477B (zh) Interaction method and apparatus, electronic device, and storage medium
JP7109408B2 (ja) Massive simultaneous remote digital presence world
KR101306221B1 (ko) Apparatus and method for producing video using a three-dimensional user avatar
WO2018000267A1 (fr) Robot interaction content generation method, system, and robot
WO2018000259A1 (fr) Method and system for generating robot interaction content, and robot
US20220044490A1 (en) Virtual reality presentation of layers of clothing on avatars
WO2018000268A1 (fr) Method and system for generating robot interaction content, and robot
WO2018006374A1 (fr) Function recommendation method, system, and robot based on automatic wake-up
JP2016071247A (ja) Dialogue device
WO2018006373A1 (fr) Method and system for controlling a household appliance on the basis of intention recognition, and robot
CN111445561B (zh) Virtual object processing method, apparatus, device, and storage medium
WO2018006372A1 (fr) Method and system for controlling a household appliance on the basis of intention recognition, and robot
US11681372B2 (en) Touch enabling process, haptic accessory, and core haptic engine to enable creation and delivery of tactile-enabled experiences with virtual objects
WO2018000266A1 (fr) Method and system for generating robot interaction content, and robot
CN104616336B (zh) Animation construction method and device
US11094136B2 (en) Virtual reality presentation of clothing fitted on avatars
WO2018000258A1 (fr) Method and system for generating robot interaction content, and robot
EP4275147A1 (fr) Producing a digital image representation of a body
EP4053792A1 (fr) Information processing device, information processing method, and artificial intelligence model manufacturing method
US11107129B1 (en) Dynamic media content for in-store screen experiences
WO2022256162A1 (fr) Image re-compositing with lighting disentanglement
JPWO2018168247A1 (ja) Information processing device, information processing method, and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16907874

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16907874

Country of ref document: EP

Kind code of ref document: A1