WO2018006371A1 - Method and system for synchronizing voice and virtual actions, and robot - Google Patents

Method and system for synchronizing voice and virtual actions, and robot

Info

Publication number
WO2018006371A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
length
robot
voice
time
Prior art date
Application number
PCT/CN2016/089215
Other languages
English (en)
Chinese (zh)
Inventor
邱楠
杨新宇
王昊奋
Original Assignee
深圳狗尾草智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳狗尾草智能科技有限公司 filed Critical 深圳狗尾草智能科技有限公司
Priority to CN201680001731.5A priority Critical patent/CN106463118B/zh
Priority to PCT/CN2016/089215 priority patent/WO2018006371A1/fr
Priority to JP2017133168A priority patent/JP6567610B2/ja
Publication of WO2018006371A1 publication Critical patent/WO2018006371A1/fr

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J 13/00 Controls for manipulators
    • B25J 13/003 Controls for manipulators by means of an audio-responsive input
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011 Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/04 Time compression or expansion
    • G10L 21/055 Time compression or expansion for synchronising with other signals, e.g. video signals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/06 Transformation of speech into a non-audible representation, e.g. speech visualisation or speech processing for tactile aids
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/226 Procedures used during a speech recognition process, e.g. man-machine dialogue, using non-speech characteristics

Definitions

  • the present invention relates to the field of robot interaction technologies, and in particular, to a method, system and robot for synchronizing voice and virtual motion.
  • robots are used more and more widely; for example, elderly people and children can interact with robots through dialogue and entertainment.
  • the inventor developed a virtual robot display device and imaging system that can form a 3D animated image; the host of the virtual robot accepts human instructions, such as voice, to interact with humans, and the virtual 3D animated image then responds with sounds and actions according to the host's instructions. In this way the robot is more anthropomorphic: it can interact with humans not only through sounds and expressions but also through actions and the like, improving the interaction experience.
  • a method of synchronizing speech and virtual actions including:
  • the interactive content includes at least voice information and action information
  • the time length of the voice information and the time length of the action information are adjusted to be the same; the specific steps of this adjustment include:
  • when the difference between the time length of the voice information and the time length of the action information is not greater than a threshold: if the time length of the voice information is less than the time length of the action information, the playback speed of the action information is accelerated so that the time length of the action information becomes equal to the time length of the voice information;
  • if the time length of the voice information is greater than the time length of the action information, the playback speed of the voice information is accelerated or/and the playback speed of the action information is slowed down, so that the time length of the action information becomes equal to the time length of the voice information.
  • the specific steps of adjusting the time length of the voice information and the time length of the action information to be the same also include:
  • when the difference between the time length of the voice information and the time length of the action information is greater than a threshold: when the time length of the voice information is greater than the time length of the action information, at least two sets of action information are sorted and combined so that the time length of the combined action information equals the time length of the voice information;
  • when the time length of the voice information is less than the time length of the action information, part of the action information is selected so that the time length of the selected actions equals the time length of the voice information.
  • the method for generating parameters of the life time axis of the robot includes:
  • the parameters of the robot's self-cognition are fitted to the parameters in the life time axis to generate the life time axis of the robot.
  • the step of expanding the self-cognition of the robot specifically comprises: combining the life scene with the self-knowledge of the robot to form a self-cognitive curve based on the life time axis.
  • the step of fitting the self-cognition parameters of the robot to the parameters in the life time axis comprises: using a probability algorithm to calculate, after a scene parameter on the time axis changes, the probability of change of each parameter of the robot on the life time axis, forming a fitted curve.
  • the life time axis refers to a time axis including 24 hours a day
  • the parameters in the life time axis include at least a daily life behavior performed by the user on the life time axis and parameter values representing the behavior.
  • a system for synchronizing voice and virtual actions including:
  • An obtaining module configured to acquire multi-modal information of the user
  • An artificial intelligence module for generating interaction content based on the user's multimodal information and the life time axis
  • the interactive content includes at least voice information and action information
  • the control module is configured to adjust the length of the voice information and the length of the motion information to be the same.
  • the control module is specifically configured to:
  • when the difference between the time length of the voice information and the time length of the action information is not greater than a threshold: if the time length of the voice information is less than the time length of the action information, accelerate the playback speed of the action information so that the time length of the action information becomes equal to the time length of the voice information;
  • if the time length of the voice information is greater than the time length of the action information, accelerate the playback speed of the voice information or/and slow down the playback speed of the action information, so that the time length of the action information becomes equal to the time length of the voice information.
  • the control module is specifically configured to:
  • when the difference between the time length of the voice information and the time length of the action information is greater than the threshold: when the time length of the voice information is greater than the time length of the action information, combine at least two sets of action information so that the time length of the combined action information equals the time length of the voice information;
  • when the time length of the voice information is less than the time length of the action information, select part of the action information so that the time length of the selected actions equals the time length of the voice information.
  • the system comprises a processing module for:
  • the parameters of the robot's self-cognition are fitted to the parameters in the life time axis to generate the life time axis of the robot.
  • the processing module is specifically configured to combine a life scene with a self-awareness of the robot to form a self-cognitive curve based on a life time axis.
  • the processing module is specifically configured to: use a probability algorithm to calculate a probability of each parameter change of the robot on the life time axis after the time axis scene parameter is changed, to form a fitting curve.
  • the life time axis refers to a time axis including 24 hours a day
  • the parameters in the life time axis include at least a daily life behavior performed by the user on the life time axis and parameter values representing the behavior.
  • the invention discloses a robot comprising a system for synchronizing speech and virtual actions as described above.
  • the present invention has the following advantages: the method of synchronizing voice and virtual actions of the present invention includes: acquiring multimodal information of a user; generating interaction content according to the multimodal information of the user and a life time axis, where the interaction content includes at least voice information and action information; and adjusting the time length of the voice information and the time length of the action information to be the same.
  • in this way, the interaction content can be generated from one or more kinds of the user's multimodal information, such as user voice, user expression, and user action; the interaction content includes at least voice information and action information.
  • in order to synchronize the voice information and the action information, the time length of the voice information and the time length of the action information are adjusted to be the same, so that sound and action are matched synchronously when the robot plays them; the robot thus expresses itself not only through voice but also through action when interacting.
  • the robot's forms of expression are more diverse, the robot is more anthropomorphic, and the user's experience of interacting with the robot is improved.
  • FIG. 1 is a flowchart of a method for synchronizing voice and virtual actions according to Embodiment 1 of the present invention
  • FIG. 2 is a schematic diagram of a system for synchronizing voice and virtual actions according to Embodiment 2 of the present invention.
  • Computer devices include user devices and network devices.
  • the user equipment or the client includes but is not limited to a computer, a smart phone, a PDA, etc.;
  • the network device includes but is not limited to a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of computers or network servers based on cloud computing.
  • the computer device can operate alone to carry out the invention, and can also access the network and implement the invention through interoperation with other computer devices in the network.
  • the network in which the computer device is located includes, but is not limited to, the Internet, a wide area network, a metropolitan area network, a local area network, a VPN network, and the like.
  • the terms “first,” “second,” and the like may be used herein to describe various elements, but the elements should not be limited by these terms; the terms are used only to distinguish one element from another.
  • the term “and/or” used herein includes any and all combinations of one or more of the associated listed items. When a unit is referred to as being “connected” or “coupled” to another unit, it can be directly connected or coupled to the other unit, or an intermediate unit can be present.
  • a method for synchronizing voice and virtual actions including:
  • S101: acquiring multimodal information of a user;
  • S102: generating interaction content according to the multimodal information of the user and a life time axis 300, where the interaction content includes at least voice information and action information;
  • S103: adjusting the time length of the voice information and the time length of the action information to be the same.
  • the method for synchronizing voice and virtual actions of the present invention thus comprises: acquiring multimodal information of a user; generating interaction content according to the multimodal information of the user and a life time axis, where the interaction content includes at least voice information and action information; and adjusting the time length of the voice information and the time length of the action information to be the same.
  • in this way, the interaction content can be generated from one or more kinds of the user's multimodal information, such as user voice, user expression, and user action; the interaction content includes at least voice information and action information.
  • in order to synchronize the voice information and the action information, the time length of the voice information and the time length of the action information are adjusted to be the same, so that sound and action are matched synchronously when the robot plays them; the robot thus expresses itself not only through voice but also through action when interacting.
  • the robot's forms of expression are more diverse, the robot is more anthropomorphic, and the user's experience of interacting with the robot is improved.
  • the present invention adds the life time axis in which the robot is located to the robot's generation of interaction content, making the robot more humanized when interacting with humans, so that the robot follows a human lifestyle along the life time axis; this method can make the robot's interaction content more anthropomorphic.
  • the interaction content may be one of, or a combination of several of, expression, text, voice, and action.
  • the life time axis 300 of the robot is established and set in advance; specifically, the life time axis 300 of the robot is a collection of parameters, and these parameters are transmitted to the system to generate interaction content.
  • the multimodal information in this embodiment may be one of user expression, voice information, gesture information, scene information, image information, video information, face information, pupil iris information, light sense information, and fingerprint information.
  • the life time axis is, specifically: according to the time axis of human daily life, the values of the robot's own self-cognition on the daily time axis are fitted in a human-like way, and the robot's behavior follows this fit; that is, the robot's own behavior over a day is obtained, so that the robot carries out its own behavior based on the life time axis, for example generating interaction content and communicating with humans. If the robot is awake all the time, it acts according to the behavior on this time axis, and the robot's self-cognition also changes according to this time axis.
  • the life time axis and the variable parameters can change the attributes of self-cognition, such as the mood value and the fatigue value, and can also automatically add new self-cognition information, such as an anger value that did not exist before; based on the life time axis and the scenes of the variable factors, new items are automatically added to the robot's self-cognition, which previously simulated human self-cognition.
  • the life time axis includes not only voice information, but also information such as actions.
  • for example, the user says to the robot: “I'm so sleepy.” The robot understands that the user is sleepy and, combining this with the robot's life time axis, for example the current time being 9:00 in the morning, knows that the owner has just gotten up; it should therefore greet the owner, for example answering with the voice “Good morning” as a reply, and it can also sing a song accompanied by dance movements.
  • if instead the robot understands that the user is sleepy when, on the robot's life time axis, the current time is 9:00 in the evening, the robot knows that the owner needs to sleep, and will reply with the voice “Good night, master, sleep well” or similar words, together with appropriate good-night or sleeping movements (a minimal sketch of such time-conditioned replies follows below). This is closer to human life than a simple voice-and-expression reply, and the accompanying action is more anthropomorphic.
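  • to make the time-axis conditioning concrete, here is a minimal Python sketch of how a reply could be selected from the hour of the day; the `LIFE_TIMELINE` table, the state names, and the rules are invented for illustration and are not part of the patent.

```python
from datetime import datetime

# Hypothetical life-time-axis table mapping hour ranges to the owner's
# presumed state; the entries are invented for illustration.
LIFE_TIMELINE = [
    (range(6, 11), "just_got_up"),
    (range(11, 21), "active"),
    (range(21, 24), "going_to_bed"),
]

def owner_state(hour: int) -> str:
    """Look up the owner's presumed state for a given hour of the day."""
    for hours, state in LIFE_TIMELINE:
        if hour in hours:
            return state
    return "sleeping"

def generate_reply(user_utterance: str, now: datetime) -> tuple[str, str]:
    """Return (voice, action) interaction content for a sleepy user,
    conditioned on the life time axis, as in the patent's example."""
    if "sleepy" in user_utterance.lower():
        state = owner_state(now.hour)
        if state == "just_got_up":
            return "Good morning!", "sing_and_dance"
        if state == "going_to_bed":
            return "Good night, master, sleep well.", "good_night_gesture"
    return "I see.", "idle"

print(generate_reply("It's so sleepy", datetime(2016, 7, 7, 9)))   # morning reply
print(generate_reply("It's so sleepy", datetime(2016, 7, 7, 21)))  # bedtime reply
```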
  • the specific steps of adjusting the time length of the voice information and the time length of the action information to be the same include:
  • when the difference between the time length of the voice information and the time length of the action information is not greater than a threshold: if the time length of the voice information is less than the time length of the action information, the playback speed of the action information is accelerated so that the time length of the action information becomes equal to the time length of the voice information;
  • if the time length of the voice information is greater than the time length of the action information, the playback speed of the voice information is accelerated or/and the playback speed of the action information is slowed down, so that the time length of the action information becomes equal to the time length of the voice information.
  • the specific meaning of the adjustment may be compressing or stretching the time length of the voice information or/and the time length of the action information, or speeding up or slowing down the playback; for example, the playback speed of the voice information may be multiplied by 2, or the playback time of the action information may be multiplied by 0.8, and so on.
  • for example, suppose the threshold for the difference between the time length of the voice information and the time length of the action information is one minute.
  • suppose the time length of the voice information is 1 minute and the time length of the action information is 2 minutes.
  • the playback speed of the action information can be accelerated to twice the original playback speed, so that the playback time of the action information after adjustment is 1 minute, synchronizing it with the voice information.
  • alternatively, the playback speed of the voice information can be slowed down to 0.5 times the original playback speed, so that the voice information after adjustment is stretched to 2 minutes, synchronizing it with the action information.
  • of course, both the voice information and the action information can be adjusted, for example slowing the voice information down and speeding the action information up so that both are adjusted to 1 minute 30 seconds; the voice and the action can then be played synchronously, as in the sketch below.
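  • as a rough illustration (not from the patent), the following Python sketch computes playback-rate multipliers for the speed-based alignment described above; the function name `align_by_speed` and the rate-pair return convention are assumptions.

```python
def align_by_speed(voice_len: float, action_len: float) -> tuple[float, float]:
    """Return (voice_rate, action_rate) playback-speed multipliers that make
    the two tracks end together.

    A rate > 1 speeds a track up, a rate < 1 slows it down. Here only the
    action track is adjusted, as in the patent's first example; adjusting
    the voice instead, or meeting in the middle, works the same way.
    """
    if voice_len <= 0 or action_len <= 0:
        raise ValueError("lengths must be positive")
    return 1.0, action_len / voice_len

# Voice 1 minute, action 2 minutes: play the action at 2x so it lasts 1 minute.
print(align_by_speed(60.0, 120.0))          # (1.0, 2.0)
# Meeting in the middle at 90 s would instead use rates 60/90 and 120/90.
```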
  • the specific steps of adjusting the time length of the voice information and the time length of the action information to be the same also include:
  • when the difference between the time length of the voice information and the time length of the action information is greater than the threshold: when the time length of the voice information is greater than the time length of the action information, at least two sets of action information are sorted and combined so that the time length of the combined action information equals the time length of the voice information;
  • when the time length of the voice information is less than the time length of the action information, part of the action information is selected so that the time length of the selected actions equals the time length of the voice information.
  • here the adjustment means adding or deleting part of the action information so that the time length of the action information becomes the same as the time length of the voice information.
  • for example, suppose the threshold for the difference between the time length of the voice information and the time length of the action information is 30 seconds.
  • suppose the time length of the voice information is 3 minutes and the time length of the action information is 1 minute; the difference is greater than the threshold.
  • other action information then needs to be added to the original action information, for example a set of action information with a length of 2 minutes; the two sets of action information are sorted and combined so that their total length matches the 3-minute time length of the voice information (a sketch of this combination step follows below).
  • conversely, when the time length of the voice information is less than the time length of the action information, part of the action information is deleted, for example so that the remaining action information is 2 minutes long and matches the time length of the voice information.
  • in addition, the action information closest to the time length of the voice information may be selected according to the time length of the voice information, or the voice information closest to the time length of the action information may be selected according to the time length of the action information.
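  • the “sort and combine” step above could look like the following Python sketch; the greedy longest-first strategy and the helper name `compose_actions` are illustrative assumptions, since the patent does not specify how action sets are chosen.

```python
def compose_actions(voice_len: float, clips: list[float]) -> list[float]:
    """Greedily pick action-clip durations whose total approaches the voice
    duration -- a simplistic stand-in for the patent's 'sort and combine'
    step. Returns the durations of the chosen clips, longest first."""
    chosen: list[float] = []
    remaining = voice_len
    for clip in sorted(clips, reverse=True):   # longest-first greedy choice
        if clip <= remaining:
            chosen.append(clip)
            remaining -= clip
    return chosen

# Voice lasts 180 s; the action library holds clips of 60, 120 and 45 seconds.
print(compose_actions(180.0, [60.0, 120.0, 45.0]))  # [120.0, 60.0] -> 180 s total
```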
  • in this way, the control module can conveniently adjust the time lengths of the voice information and the action information, making it easier to adjust them to be the same, and the adjusted playback is more natural and smooth.
  • the method further includes: outputting the adjusted voice information and the motion information to the virtual image for display.
  • after the adjustment has made the lengths consistent, the result can be output on the virtual image, making the virtual robot more anthropomorphic and improving the user experience.
  • the method for generating parameters of the life time axis of the robot includes:
  • the parameters of the robot's self-cognition are fitted to the parameters in the life time axis to generate the life time axis of the robot.
  • in this way, the life time axis is added to the self-cognition of the robot itself, so that the robot has an anthropomorphic life; for example, the cognition of having lunch is added to the robot.
  • the step of expanding the self-cognition of the robot specifically includes: combining the life scene with the self-awareness of the robot to form a self-cognitive curve based on the life time axis.
  • the life time axis can be specifically added to the parameters of the robot itself.
  • specifically, the step of fitting the parameters of the robot's self-cognition to the parameters in the life time axis comprises: using a probability algorithm to calculate, after a scene parameter on the time axis changes, the probability of change of each parameter of the robot on the life time axis, forming a fitted curve.
  • the probability algorithm may be a Bayesian probability algorithm.
  • over the course of a day, the robot will sleep, exercise, eat, dance, read books, put on makeup, and so on; each action affects the robot's self-cognition, and the parameters on the life time axis are combined with the robot's own self-cognition.
  • the robot's self-cognition includes mood value, fatigue value, intimacy, favorability, number of interactions, the robot's three-dimensional cognition, age, height, weight, game scene value, game object value, location scene value, location object value, and so on.
  • the location scene value allows the robot to identify the scene where it is located, such as a café or a bedroom.
  • on the time axis of a day the robot performs different actions, such as sleeping at night, eating at noon, and exercising during the day; every scene on the life time axis has an influence on self-cognition. These numerical changes are modeled by dynamically fitting a probability model, fitting the probability that each of these actions occurs at each point on the time axis (see the sketch below).
  • scene recognition of this kind changes the value of the geographic (location) scene in self-cognition.
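  • purely as an illustrative sketch of such probability fitting: the Python snippet below estimates, for each hour of the life time axis, the probability of each logged behavior, using add-one smoothing as a simple stand-in for the Bayesian probability algorithm the patent mentions without detailing; the data layout is an assumption.

```python
from collections import Counter, defaultdict

def fit_timeline(observations: list[tuple[int, str]]) -> dict[int, dict[str, float]]:
    """For each hour 0-23, estimate the probability of each observed behavior.

    `observations` holds (hour, behavior) pairs, e.g. logged days of routine.
    Add-one (Laplace) smoothing keeps unseen behaviors at a small nonzero
    probability -- a simple stand-in for Bayesian fitting.
    """
    behaviors = {b for _, b in observations}
    by_hour: defaultdict[int, Counter] = defaultdict(Counter)
    for hour, behavior in observations:
        by_hour[hour][behavior] += 1
    curve: dict[int, dict[str, float]] = {}
    for hour in range(24):
        counts = by_hour[hour]
        total = sum(counts.values()) + len(behaviors)
        curve[hour] = {b: (counts[b] + 1) / total for b in behaviors}
    return curve

log = [(23, "sleep"), (23, "sleep"), (12, "eat"), (12, "eat"),
       (12, "read"), (9, "exercise")]
curve = fit_timeline(log)
print(curve[12])   # at noon, 'eat' has the highest fitted probability
```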
  • a system for synchronizing voice and virtual actions including:
  • the obtaining module 201 is configured to acquire multi-modal information of the user
  • the artificial intelligence module 202 is configured to generate interaction content according to the multimodal information of the user and the life time axis, where the interaction content includes at least voice information and action information, wherein the life time axis is generated by the life time axis module 301;
  • the control module 203 is configured to adjust the time length of the voice information and the time length of the motion information to be the same.
  • in this way, the interaction content can be generated from one or more kinds of the user's multimodal information, such as user voice, user expression, and user action; the interaction content includes at least voice information and action information.
  • in order to synchronize the voice information and the action information, the time length of the voice information and the time length of the action information are adjusted to be the same, so that sound and action are matched synchronously when the robot plays them; the robot thus expresses itself not only through voice but also through action when interacting.
  • the robot's forms of expression are more diverse, the robot is more anthropomorphic, and the user's experience of interacting with the robot is improved.
  • the present invention adds the life time axis in which the robot is located to the robot's generation of interaction content, making the robot more humanized when interacting with humans, so that the robot follows a human lifestyle along the life time axis; this method can make the robot's interaction content more anthropomorphic.
  • the interaction content may be one of, or a combination of several of, expression, text, voice, and action.
  • the life time axis 300 of the robot is established and set in advance; specifically, the life time axis 300 of the robot is a collection of parameters, and these parameters are transmitted to the system to generate interaction content.
  • the multimodal information in this embodiment may be one of user expression, voice information, gesture information, scene information, image information, video information, face information, pupil iris information, light sense information, and fingerprint information.
  • the life time axis is, specifically: according to the time axis of human daily life, the values of the robot's own self-cognition on the daily time axis are fitted in a human-like way, and the robot's behavior follows this fit; that is, the robot's own behavior over a day is obtained, so that the robot carries out its own behavior based on the life time axis, for example generating interaction content and communicating with humans. If the robot is awake all the time, it acts according to the behavior on this time axis, and the robot's self-cognition also changes according to this time axis.
  • the life time axis and the variable parameters can change the attributes of self-cognition, such as the mood value and the fatigue value, and can also automatically add new self-cognition information, such as an anger value that did not exist before; based on the life time axis and the scenes of the variable factors, new items are automatically added to the robot's self-cognition, which previously simulated human self-cognition.
  • the life time axis includes not only voice information, but also information such as actions.
  • for example, the user says to the robot: “I'm so sleepy.” The robot understands that the user is sleepy and, combining this with the robot's life time axis, for example the current time being 9:00 in the morning, knows that the owner has just gotten up; it should therefore greet the owner, for example answering with the voice “Good morning” as a reply, and it can also sing a song accompanied by dance movements.
  • if instead the robot understands that the user is sleepy when, on the robot's life time axis, the current time is 9:00 in the evening, the robot knows that the owner needs to sleep, and will reply with the voice “Good night, master, sleep well” or similar words, together with appropriate good-night or sleeping movements. This is closer to human life than a simple voice-and-expression reply, and the accompanying action is more anthropomorphic.
  • the control module is specifically configured to:
  • when the difference between the time length of the voice information and the time length of the action information is not greater than a threshold: if the time length of the voice information is less than the time length of the action information, accelerate the playback speed of the action information so that the time length of the action information becomes equal to the time length of the voice information;
  • if the time length of the voice information is greater than the time length of the action information, accelerate the playback speed of the voice information or/and slow down the playback speed of the action information, so that the time length of the action information becomes equal to the time length of the voice information.
  • the specific meaning of the adjustment may be compressing or stretching the time length of the voice information or/and the time length of the action information, or speeding up or slowing down the playback; for example, the playback speed of the voice information may be multiplied by 2, or the playback time of the action information may be multiplied by 0.8, and so on.
  • for example, suppose the threshold for the difference between the time length of the voice information and the time length of the action information is one minute.
  • suppose the time length of the voice information is 1 minute and the time length of the action information is 2 minutes.
  • the playback speed of the action information can be accelerated to twice the original playback speed, so that the playback time of the action information after adjustment is 1 minute, synchronizing it with the voice information.
  • alternatively, the playback speed of the voice information can be slowed down to 0.5 times the original playback speed, so that the voice information after adjustment is stretched to 2 minutes, synchronizing it with the action information.
  • of course, both the voice information and the action information can be adjusted, for example slowing the voice information down and speeding the action information up so that both are adjusted to 1 minute 30 seconds; the voice and the action can then be played synchronously.
  • the control module is specifically configured to:
  • when the difference between the time length of the voice information and the time length of the action information is greater than the threshold: when the time length of the voice information is greater than the time length of the action information, combine at least two sets of action information so that the time length of the combined action information equals the time length of the voice information;
  • when the time length of the voice information is less than the time length of the action information, select part of the action information so that the time length of the selected actions equals the time length of the voice information.
  • here the adjustment means adding or deleting part of the action information so that the time length of the action information becomes the same as the time length of the voice information.
  • for example, suppose the threshold for the difference between the time length of the voice information and the time length of the action information is 30 seconds.
  • suppose the time length of the voice information is 3 minutes and the time length of the action information is 1 minute; the difference is greater than the threshold.
  • other action information then needs to be added to the original action information, for example a set of action information with a length of 2 minutes; the two sets of action information are sorted and combined so that their total length matches the 3-minute time length of the voice information.
  • conversely, when the time length of the voice information is less than the time length of the action information, part of the action information is deleted, for example so that the remaining action information is 2 minutes long and matches the time length of the voice information.
  • in addition, the artificial intelligence module may be specifically configured to select, according to the time length of the voice information, the action information closest to that time length, or to select, according to the time length of the action information, the closest voice information.
  • in this way, the control module can conveniently adjust the time lengths of the voice information and the action information, making it easier to adjust them to be the same, and the adjusted playback is more natural and smooth.
  • the system further includes an output module 204 for outputting the adjusted voice information and motion information to the virtual image for presentation.
  • after the adjustment has made the lengths consistent, the result can be output on the virtual image, making the virtual robot more anthropomorphic and improving the user experience.
  • the system includes a time-axis-based artificial intelligence cloud processing module, configured to:
  • fit the self-cognition parameters of the robot to the parameters in the life time axis to generate the robot's life time axis.
  • in this way, the life time axis is added to the self-cognition of the robot itself, so that the robot has an anthropomorphic life; for example, the cognition of having lunch is added to the robot.
  • the time-axis-based artificial intelligence cloud processing module is specifically configured to combine life scenes with the robot's self-cognition to form a self-cognitive curve based on the life time axis.
  • the life time axis can be specifically added to the parameters of the robot itself.
  • the time-axis-based artificial intelligence cloud processing module is specifically configured to: use a probability algorithm to calculate, after a scene parameter on the time axis changes, the probability of change of each parameter of the robot on the life time axis, forming a fitted curve.
  • the probability algorithm may be a Bayesian probability algorithm.
  • over the course of a day, the robot will sleep, exercise, eat, dance, read books, put on makeup, and so on; each action affects the robot's self-cognition, and the parameters on the life time axis are combined with the robot's own self-cognition.
  • the robot's self-cognition includes mood value, fatigue value, intimacy, favorability, number of interactions, the robot's three-dimensional cognition, age, height, weight, game scene value, game object value, location scene value, location object value, and so on; the location scene value allows the robot to identify the scene where it is located, such as a café or a bedroom.
  • on the time axis of a day the robot performs different actions, such as sleeping at night, eating at noon, and exercising during the day; every scene on the life time axis has an influence on self-cognition. These numerical changes are modeled by dynamically fitting a probability model, fitting the probability that each of these actions occurs at each point on the time axis.
  • scene recognition of this kind changes the value of the geographic (location) scene in self-cognition.
  • the invention discloses a robot comprising a system for synchronizing speech and virtual actions as described above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Manipulator (AREA)
  • Toys (AREA)

Abstract

The invention provides a method for synchronizing voice and virtual actions, comprising: acquiring multimodal information of a user (S101); generating interaction content according to the multimodal information of the user and a life time axis (300), the interaction content including at least voice information and action information (S102); and adjusting the time length of the voice information and the time length of the action information to be the same (S103). The invention also provides a system for synchronizing voice and virtual actions, comprising an acquisition module (201), an artificial intelligence module (202), a control module (203), and an output module (204). In this way, the interaction content can be generated according to one or more kinds of the user's multimodal information, such as the user's voice, the user's expression, and a user action, and the interaction content comprises at least the voice information and the action information. Furthermore, in order to synchronize the voice information and the action information, the time length of the voice information and the time length of the action information are adjusted to be the same, so that a robot's sound and actions can be synchronized and matched during playback. The robot is therefore more humanized, and the user experience of interacting with the robot is also improved.
PCT/CN2016/089215 2016-07-07 2016-07-07 Method and system for synchronizing voice and virtual actions, and robot WO2018006371A1 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201680001731.5A CN106463118B (zh) 2016-07-07 2016-07-07 一种同步语音及虚拟动作的方法、系统及机器人
PCT/CN2016/089215 WO2018006371A1 (fr) 2016-07-07 2016-07-07 Method and system for synchronizing voice and virtual actions, and robot
JP2017133168A JP6567610B2 (ja) 2016-07-07 2017-07-06 音声と仮想動作を同期させる方法、システムとロボット本体

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/089215 WO2018006371A1 (fr) 2016-07-07 2016-07-07 Method and system for synchronizing voice and virtual actions, and robot

Publications (1)

Publication Number Publication Date
WO2018006371A1 true WO2018006371A1 (fr) 2018-01-11

Family

ID=58215741

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/089215 WO2018006371A1 (fr) 2016-07-07 2016-07-07 Method and system for synchronizing voice and virtual actions, and robot

Country Status (3)

Country Link
JP (1) JP6567610B2 (fr)
CN (1) CN106463118B (fr)
WO (1) WO2018006371A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108650217A (zh) * 2018-03-21 2018-10-12 腾讯科技(深圳)有限公司 动作状态的同步方法、装置、存储介质及电子装置
CN112528000A (zh) * 2020-12-22 2021-03-19 北京百度网讯科技有限公司 虚拟机器人的生成方法、装置和电子设备

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107992935A (zh) * 2017-12-14 2018-05-04 深圳狗尾草智能科技有限公司 为机器人设置生活周期的方法、设备及介质
CN109202925A (zh) * 2018-09-03 2019-01-15 深圳狗尾草智能科技有限公司 实现机器人动作和语音同步的方法、系统及设备
CN109521878A (zh) * 2018-11-08 2019-03-26 歌尔科技有限公司 交互方法、装置和计算机可读存储介质
CN115497499A (zh) * 2022-08-30 2022-12-20 阿里巴巴(中国)有限公司 语音和动作时间同步的方法
CN117058286B (zh) * 2023-10-13 2024-01-23 北京蔚领时代科技有限公司 一种文字驱动数字人生成视频的方法和装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101290718A (zh) * 2008-05-30 2008-10-22 梅敏 一种网络互动语音玩具组件及其实现方法
CN101604204A (zh) * 2009-07-09 2009-12-16 北京科技大学 智能情感机器人分布式认知技术
CN103037945A (zh) * 2010-04-30 2013-04-10 方瑞麟 具有基于声音的动作同步化的交互式装置
US9147388B2 (en) * 2012-06-26 2015-09-29 Yamaha Corporation Automatic performance technique using audio waveform data
CN105598972A (zh) * 2016-02-04 2016-05-25 北京光年无限科技有限公司 一种机器人系统及交互方法

Family Cites Families (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH10143351A (ja) * 1996-11-13 1998-05-29 Sharp Corp インタフェース装置
WO1998025413A1 (fr) * 1996-12-04 1998-06-11 Matsushita Electric Industrial Co., Ltd. Disque optique pour enregistrement optique d'images tridimensionnelles a haute resolution, dispositif de reproduction a disque optique, et dispositif d'enregistrement a disque optique
JP3792882B2 (ja) * 1998-03-17 2006-07-05 株式会社東芝 感情生成装置及び感情生成方法
JP2001154681A (ja) * 1999-11-30 2001-06-08 Sony Corp 音声処理装置および音声処理方法、並びに記録媒体
JP2001215940A (ja) * 2000-01-31 2001-08-10 Toshiba Corp 表情を有する知的ロボット
JP3930389B2 (ja) * 2002-07-08 2007-06-13 三菱重工業株式会社 ロボット発話中の動作プログラム生成装置及びロボット
JP2005003926A (ja) * 2003-06-11 2005-01-06 Sony Corp 情報処理装置および方法、並びにプログラム
JP2005092675A (ja) * 2003-09-19 2005-04-07 Science Univ Of Tokyo ロボット
WO2006082787A1 (fr) * 2005-02-03 2006-08-10 Matsushita Electric Industrial Co., Ltd. Dispositif d’enregistrement/de reproduction, methode d’enregistrement/de reproduction, systeme d’enregistrement dote d’un programme d’enregistrement/de reproduction et circuit integre utilise par le systeme d’enregistrement/de reproduction
JP2008040726A (ja) * 2006-08-04 2008-02-21 Univ Of Electro-Communications ユーザ支援システム及びユーザ支援方法
JP2009141555A (ja) * 2007-12-05 2009-06-25 Fujifilm Corp 音声入力機能付き撮像装置及びその音声記録方法
JP5045519B2 (ja) * 2008-03-26 2012-10-10 トヨタ自動車株式会社 動作生成装置、ロボット及び動作生成方法
JP2012504810A (ja) * 2008-10-03 2012-02-23 ビ−エイイ− システムズ パブリック リミテッド カンパニ− システムにおける故障を診断するモデルの更新の支援
JP2010094799A (ja) * 2008-10-17 2010-04-30 Littleisland Inc 人型ロボット
JP2011054088A (ja) * 2009-09-04 2011-03-17 National Institute Of Information & Communication Technology 情報処理装置、情報処理方法、プログラム及び対話システム
JP2012215645A (ja) * 2011-03-31 2012-11-08 Speakglobal Ltd コンピュータを利用した外国語会話練習システム
CN103596051A (zh) * 2012-08-14 2014-02-19 金运科技股份有限公司 电视装置及其虚拟主持人显示方法
JP6126028B2 (ja) * 2014-02-28 2017-05-10 三井不動産株式会社 ロボット制御システム、ロボット制御サーバ及びロボット制御プログラム
JP6053847B2 (ja) * 2014-06-05 2016-12-27 Cocoro Sb株式会社 行動制御システム、システム及びプログラム
WO2016006088A1 (fr) * 2014-07-10 2016-01-14 株式会社 東芝 Dispositif électronique, procédé et programme
CN104574478A (zh) * 2014-12-30 2015-04-29 北京像素软件科技股份有限公司 一种编辑动画人物口型的方法及装置
CN105807933B (zh) * 2016-03-18 2019-02-12 北京光年无限科技有限公司 一种用于智能机器人的人机交互方法及装置

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101290718A (zh) * 2008-05-30 2008-10-22 梅敏 一种网络互动语音玩具组件及其实现方法
CN101604204A (zh) * 2009-07-09 2009-12-16 北京科技大学 智能情感机器人分布式认知技术
CN103037945A (zh) * 2010-04-30 2013-04-10 方瑞麟 具有基于声音的动作同步化的交互式装置
US9147388B2 (en) * 2012-06-26 2015-09-29 Yamaha Corporation Automatic performance technique using audio waveform data
CN105598972A (zh) * 2016-02-04 2016-05-25 北京光年无限科技有限公司 一种机器人系统及交互方法

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108650217A (zh) * 2018-03-21 2018-10-12 腾讯科技(深圳)有限公司 动作状态的同步方法、装置、存储介质及电子装置
CN108650217B (zh) * 2018-03-21 2019-07-23 腾讯科技(深圳)有限公司 动作状态的同步方法、装置、存储介质及电子装置
CN112528000A (zh) * 2020-12-22 2021-03-19 北京百度网讯科技有限公司 虚拟机器人的生成方法、装置和电子设备

Also Published As

Publication number Publication date
CN106463118A (zh) 2017-02-22
JP2018001404A (ja) 2018-01-11
JP6567610B2 (ja) 2019-08-28
CN106463118B (zh) 2019-09-03

Similar Documents

Publication Publication Date Title
WO2018006371A1 (fr) Method and system for synchronizing voice and virtual actions, and robot
WO2018006369A1 (fr) Method and system for synchronizing speech and virtual actions, and robot
WO2018000259A1 (fr) Method and system for generating robot interaction content, and robot
TWI778477B (zh) 互動方法、裝置、電子設備以及儲存媒體
US20220284896A1 (en) Electronic personal interactive device
WO2018000268A1 (fr) Method and system for generating robot interaction content, and robot
WO2018006370A1 (fr) Interaction method and system for a virtual 3D robot, and robot
Leman Musical gestures and embodied cognition
Dionisio et al. 3D virtual worlds and the metaverse: Current status and future possibilities
KR101306221B1 (ko) 3차원 사용자 아바타를 이용한 동영상 제작장치 및 방법
Peng et al. Robotic dance in social robotics—a taxonomy
CN108665492A (zh) 一种基于虚拟人的舞蹈教学数据处理方法及系统
US20090128567A1 (en) Multi-instance, multi-user animation with coordinated chat
WO2018000267A1 (fr) Method for generating robot interaction content, system and robot
JP2019521449A (ja) 永続的コンパニオンデバイス構成及び配備プラットフォーム
WO2018006372A1 (fr) Method and system for controlling a household appliance based on intent recognition, and robot
WO2018006374A1 (fr) Function recommendation method, system and robot based on automatic wake-up
KR20200097637A (ko) 시뮬레이션 모래상자 시스템
CN109409255A (zh) 一种手语场景生成方法及装置
WO2023246163A1 (fr) Virtual digital human control method, apparatus, device and medium
Hartholt et al. Ubiquitous virtual humans: A multi-platform framework for embodied ai agents in xr
WO2018000258A1 (fr) Method and system for generating robot interaction content, and robot
WO2018000266A1 (fr) Method and system for generating robot interaction content, and robot
WO2018000260A1 (fr) Method for generating robot interaction content, system and robot
Yu Robot behavior generation and human behavior understanding in natural human-robot interaction

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16907876

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16907876

Country of ref document: EP

Kind code of ref document: A1