WO2021196647A1 - Procédé et appareil permettant de commander un objet interactif, dispositif, et support de stockage - Google Patents

Procédé et appareil permettant de commander un objet interactif, dispositif, et support de stockage Download PDF

Info

Publication number
WO2021196647A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
control parameter
target data
interactive object
target
Prior art date
Application number
PCT/CN2020/129830
Other languages
English (en)
Chinese (zh)
Inventor
孙林
Original Assignee
北京市商汤科技开发有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京市商汤科技开发有限公司
Priority to JP2021549865A priority Critical patent/JP2022531056A/ja
Priority to KR1020217027681A priority patent/KR20210124306A/ko
Priority to SG11202109201XA priority patent/SG11202109201XA/en
Publication of WO2021196647A1 publication Critical patent/WO2021196647A1/fr

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16 - Sound input; Sound output
    • G06F 3/167 - Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011 - Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 - Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01 - Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/048 - Interaction techniques based on graphical user interfaces [GUI]
    • G06F 3/0484 - Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F 3/04847 - Interaction techniques to control parameter settings, e.g. interaction with sliders or dials

Definitions

  • The present disclosure relates to the field of computer technology, and in particular to a method, an apparatus, a device, and a storage medium for driving an interactive object.
  • At present, human-computer interaction mostly works as follows: the user provides input through keys, touch, or voice, and the device responds by presenting images, text, or virtual characters on a display screen.
  • Virtual characters are mostly improved versions of voice assistants, and the interaction between users and virtual characters remains superficial.
  • the embodiments of the present disclosure provide a driving solution for interactive objects.
  • A method for driving an interactive object includes: acquiring sound-driven data of the interactive object displayed by a display device; acquiring, based on target data contained in the sound-driven data, a control parameter sequence of a setting action of the interactive object that matches the target data; and controlling the interactive object to execute the setting action according to the obtained control parameter sequence.
  • The method further includes: controlling the display device to output voice according to the voice information corresponding to the sound-driven data, and/or displaying text according to the text information corresponding to the sound-driven data.
  • Controlling the interactive object to perform the setting action according to the obtained control parameter sequence includes: determining the voice information corresponding to the target data; obtaining time information for outputting the voice information; determining, according to the time information, the execution time of the setting action corresponding to the target data; and, according to the execution time, controlling the interactive object to execute the setting action with the control parameter sequence corresponding to the target data.
  • The control parameter sequence includes one or more sets of control parameters. Controlling, according to the execution time, the interactive object to execute the setting action with the control parameter sequence corresponding to the target data includes: invoking each set of control parameters in the control parameter sequence at a set rate, so that the interactive object displays the posture corresponding to each set of control parameters.
  • The control parameter sequence includes one or more sets of control parameters. Controlling, according to the execution time, the interactive object to execute the setting action with the control parameter sequence corresponding to the target data includes: determining a calling rate of the control parameter sequence according to the execution time; and calling each set of control parameters in the control parameter sequence at the calling rate, so that the interactive object outputs the posture corresponding to each set of control parameters.
  • Controlling, according to the execution time, the interactive object to execute the setting action with the control parameter sequence corresponding to the target data includes: starting to call the control parameter sequence corresponding to the target data a set time before the voice information corresponding to the target data is output, so that the interactive object starts to perform the setting action.
  • The sound-driven data includes multiple target data, and controlling the interactive object to perform the setting action according to the obtained control parameter sequence includes: in response to detecting that adjacent target data among the multiple target data overlap, controlling the interactive object to perform the setting action according to the control parameter sequence corresponding to the target data that comes first in word order.
  • The sound-driven data includes a plurality of target data, and controlling the interactive object to perform the setting action according to the control parameter sequence corresponding to the target data includes: in response to detecting that the control parameter sequences corresponding to adjacent target data among the plurality of target data overlap in execution time, fusing the overlapping parts of the control parameter sequences corresponding to the adjacent target data.
  • Acquiring, based on the target data contained in the sound-driven data, the control parameter sequence of the setting action of the interactive object that matches the target data includes: in response to the sound-driven data including audio data, performing voice recognition on the audio data and determining the target data contained in the audio data according to the recognized voice content; and, in response to the sound-driven data including text data, determining the target data contained in the text data according to the text content contained in the text data.
  • The sound-driven data includes syllable data. Acquiring, based on the target data contained in the sound-driven data, the control parameter sequence of the setting action of the interactive object that matches the target data includes: determining whether the syllable data contained in the sound-driven data matches target syllable data, wherein the target syllable data belongs to a pre-divided syllable type, one syllable type corresponds to one set mouth shape, and one set mouth shape is provided with a corresponding control parameter sequence; and, in response to the syllable data matching the target syllable data, obtaining, based on the syllable type to which the matched target syllable data belongs, the control parameter sequence of the set mouth shape corresponding to the matched target syllable data.
  • The method further includes: acquiring first data other than the target data in the sound-driven data; acquiring acoustic features of the first data; acquiring posture control parameters matching the acoustic features; and controlling the posture of the interactive object according to the posture control parameters.
  • An apparatus for driving an interactive object includes: a first acquisition unit, configured to acquire sound-driven data of an interactive object displayed by a display device; a second acquisition unit, configured to acquire, based on target data contained in the sound-driven data, a control parameter sequence of a setting action of the interactive object that matches the target data; and a driving unit, configured to control the interactive object to execute the setting action according to the obtained control parameter sequence.
  • The apparatus further includes an output unit, configured to control the display device to output voice according to the voice information corresponding to the sound-driven data, and/or to display text according to the text information corresponding to the sound-driven data.
  • The driving unit is specifically configured to: determine the voice information corresponding to the target data; obtain time information for outputting the voice information; determine, according to the time information, the execution time of the setting action corresponding to the target data; and, according to the execution time, control the interactive object to execute the setting action with the control parameter sequence corresponding to the target data.
  • The control parameter sequence includes one or more sets of control parameters. When controlling, according to the execution time, the interactive object to execute the setting action with the control parameter sequence corresponding to the target data, the driving unit is specifically configured to: call each set of control parameters in the control parameter sequence at a set rate, so that the interactive object displays the posture corresponding to each set of control parameters.
  • The control parameter sequence includes one or more sets of control parameters. When controlling, according to the execution time, the interactive object to execute the setting action with the control parameter sequence corresponding to the target data, the driving unit is specifically configured to: determine the call rate of the control parameter sequence according to the execution time; and call each set of control parameters in the control parameter sequence at the call rate, so that the interactive object outputs the posture corresponding to each set of control parameters.
  • The control parameter sequence includes one or more sets of control parameters. When controlling, according to the execution time, the interactive object to execute the setting action with the control parameter sequence corresponding to the target data, the driving unit is specifically configured to: start calling the control parameter sequence corresponding to the target data a set time before the voice information corresponding to the target data is output, so that the interactive object starts to perform the setting action.
  • The sound-driven data includes a plurality of target data, and the driving unit is specifically configured to: in response to detecting that adjacent target data among the plurality of target data overlap, control the interactive object to perform the setting action according to the control parameter sequence corresponding to the target data that comes first in word order.
  • The sound-driven data includes a plurality of target data, and the driving unit is specifically configured to: in response to detecting that the control parameter sequences corresponding to adjacent target data among the plurality of target data overlap in execution time, fuse the overlapping parts of the control parameter sequences corresponding to the adjacent target data.
  • The second acquisition unit is specifically configured to: in response to the sound-driven data including audio data, perform voice recognition on the audio data and determine, according to the recognized voice content, the target data contained in the audio data; and, in response to the sound-driven data including text data, determine the target data contained in the text data according to the text content contained in the text data.
  • The sound-driven data includes syllable data, and the second acquisition unit is specifically configured to: determine whether the syllable data contained in the sound-driven data matches target syllable data, wherein the target syllable data belongs to a pre-divided syllable type, one syllable type corresponds to one set mouth shape, and one set mouth shape is provided with a corresponding control parameter sequence; and, in response to the syllable data matching the target syllable data, obtain, based on the syllable type to which the matched target syllable data belongs, the control parameter sequence of the set mouth shape corresponding to the matched target syllable data.
  • The apparatus further includes a posture control unit, configured to: obtain first data other than the target data in the sound-driven data; obtain acoustic features of the first data; obtain posture control parameters matching the acoustic features of the first data; and control the posture of the interactive object according to the posture control parameters.
  • An electronic device includes a memory and a processor, the memory being configured to store computer instructions executable on the processor, and the processor being configured to, when executing the computer instructions, implement the method for driving interactive objects described in any of the embodiments provided in the present disclosure.
  • A computer-readable storage medium has a computer program stored thereon, and when the program is executed by a processor, the method for driving an interactive object according to any one of the embodiments provided in the present disclosure is implemented.
  • According to the method, apparatus, device, and computer-readable storage medium for driving an interactive object of one or more embodiments of the present disclosure, the control parameters of the setting action of the interactive object matching at least one target data contained in the sound-driven data of the interactive object displayed by the display device are obtained and used to control the action of the displayed interactive object, so that the interactive object makes the action corresponding to the target data contained in the sound-driven data. The speaking state of the interactive object is thus natural and vivid, which enhances the interactive experience of the target object.
  • FIG. 1 is a schematic diagram of a display device in a method for driving an interactive object according to an embodiment of the present disclosure;
  • FIG. 2 is a flowchart of a method for driving an interactive object according to an embodiment of the present disclosure;
  • FIG. 3 is a flowchart of a method for driving an interactive object according to an embodiment of the present disclosure;
  • FIG. 4 is a flowchart of a method for driving an interactive object according to an embodiment of the present disclosure;
  • FIG. 5 is a schematic structural diagram of an apparatus for driving an interactive object according to an embodiment of the present disclosure;
  • FIG. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
  • At least one embodiment of the present disclosure provides a method for driving interactive objects.
  • the driving method may be executed by electronic devices such as a terminal device or a server.
  • The terminal device may be a fixed terminal or a mobile terminal, such as a mobile phone, a tablet computer, or a game device.
  • the server includes a local server or a cloud server, etc., and the method can also be implemented by a processor calling computer-readable instructions stored in a memory.
  • The interactive object may be any virtual object capable of interacting with the target object, such as a virtual character, a virtual animal, a virtual item, a cartoon image, or any other virtual image that can realize interactive functions. The display form of the virtual image may be either 2D or 3D, which is not limited in the present disclosure.
  • the target object may be a user, a robot, or other smart devices.
  • the interaction manner between the interaction object and the target object may be an active interaction manner or a passive interaction manner.
  • the target object can make a demand by making gestures or body movements, and trigger the interactive object to interact with it by means of active interaction.
  • the interactive object may actively greet the target object, prompt the target object to make an action, etc., so that the target object interacts with the interactive object in a passive manner.
  • The interactive object may be displayed through an electronic device. The electronic device may be a TV, an all-in-one machine with a display function, a projector, a virtual reality (VR) device, an augmented reality (AR) device, etc.; the present disclosure does not limit the specific form of the electronic device.
  • Fig. 1 shows a display device proposed according to an embodiment of the present disclosure.
  • the display device has a display screen, which can display a stereoscopic picture on the display screen to present a virtual scene and interactive objects.
  • the interactive objects displayed on the display screen in Figure 1 are virtual cartoon characters.
  • The electronic device described in the present disclosure may include a built-in display or be integrated with the above-mentioned display device; through the display or the display device, a stereoscopic picture may be displayed to present a virtual scene and interactive objects. In other embodiments, the electronic device described in the present disclosure may not include a built-in display, and the content to be displayed may be sent through a wired or wireless connection to an external display, which then presents the virtual scene and interactive objects.
  • In response to the electronic device receiving sound-driven data for driving the interactive object to output voice, the interactive object may emit a specified voice to the target object.
  • Sound-driven data can be generated according to the actions, expressions, identities, preferences, etc. of the target object around the electronic device to drive the interactive object to respond by issuing a specified voice, thereby providing anthropomorphic services for the target object.
  • In some cases, while the interactive object is driven to emit a specified voice according to the sound-driven data, it cannot be driven to make facial movements synchronized with that voice, so the interactive object appears stiff and unnatural when uttering the voice, which affects the interactive experience of the target object.
  • an embodiment of the present disclosure proposes a driving method for an interactive object, so as to improve the experience of the target object interacting with the interactive object.
  • FIG. 2 shows a flowchart of a method for driving an interactive object according to an embodiment of the present disclosure. As shown in FIG. 2, the method includes steps 201 to 203.
  • In step 201, the sound-driven data of the interactive object displayed by the display device is obtained.
  • the sound driving data may include audio data (voice data), text data, and so on.
  • The sound-driven data may be driving data generated by the electronic device according to the actions, expressions, identity, preferences, etc. of the target object interacting with the interactive object, or driving data directly obtained by the electronic device, such as sound-driven data called from an internal memory.
  • the present disclosure does not limit the acquisition method of the sound-driven data.
  • In step 202, based on the target data contained in the sound-driven data, a control parameter sequence of a setting action of the interactive object matching the target data is obtained; the control parameter sequence includes one or more sets of control parameters.
  • The target data is data that is pre-matched with a setting action, and the setting action is controlled by a corresponding control parameter sequence; the target data therefore matches the control parameter sequence of the setting action.
  • The target data may be a set keyword, word, sentence, etc. Taking the keyword "wave" as an example: when the sound-driven data contains text data, the target data corresponding to "wave" is the text "wave"; and/or, when the sound-driven data contains audio or syllable data, the target data corresponding to "wave" is the voice data of "wave".
  • If the sound-driven data matches the above-mentioned target data, it can be determined that the sound-driven data contains the target data.
  • The setting action can be realized by using a general unit animation. The unit animation may include a sequence of image frames, each image frame corresponding to one posture of the interactive object; through the changes of posture between the image frames, the interactive object realizes the setting action.
  • The posture of the interactive object in one image frame can be realized by a set of control parameters, for example, a set of control parameters formed by the displacements of multiple bone points. Therefore, a control parameter sequence formed by multiple sets of control parameters can be used to control the posture changes of the interactive object and thereby control the interactive object to realize the setting action.
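  • As a purely illustrative sketch (not part of the disclosure), the relationship between a control parameter sequence and the per-frame postures it drives might be modeled as follows in Python; the bone-point names, the parameter values, and the apply_pose callback are assumptions made only for the example.

      from dataclasses import dataclass
      from typing import Callable, Dict, List

      # One set of control parameters: the displacement of each bone point for one image frame.
      BonePointDisplacements = Dict[str, float]

      @dataclass
      class ControlParameterSequence:
          """A setting action stored as ordered sets of control parameters."""
          frames: List[BonePointDisplacements]

      def execute_setting_action(seq: ControlParameterSequence,
                                 apply_pose: Callable[[BonePointDisplacements], None]) -> None:
          # Invoking each set of control parameters in order makes the interactive
          # object display the posture corresponding to that set, frame by frame.
          for params in seq.frames:
              apply_pose(params)

      # Hypothetical two-frame "wave" action moving the right wrist.
      wave = ControlParameterSequence(frames=[
          {"right_wrist_x": 0.10, "right_wrist_y": 0.25},
          {"right_wrist_x": 0.05, "right_wrist_y": 0.30},
      ])
      execute_setting_action(wave, apply_pose=print)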
  • The target data may include target syllable data. The target syllable data corresponds to the control parameters of a set mouth shape: the target syllable data belongs to a pre-divided syllable type, one syllable type corresponds to one set mouth shape, and one set mouth shape is provided with a corresponding control parameter sequence.
  • the syllable data is a phonetic unit formed by a combination of at least one phoneme, and the syllable data includes syllable data of a pinyin language and syllable data of a non-pinyin language (for example, Chinese).
  • a syllable type refers to syllable data whose pronunciation actions are consistent or basically consistent.
  • One syllable type may correspond to one action of the interactive object. Specifically, one syllable type may correspond to one set mouth shape of the interactive object when it speaks, that is, to one kind of pronunciation action, so that syllable data of the same type can match the control parameter sequence of the same set mouth shape. For example, syllable data of the pinyin types "ma", "man", and "mang" have basically the same pronunciation action, so they can be regarded as the same type and can correspond to the control parameter sequence of the "open mouth" shape of the interactive object when it speaks. In this way, when such target syllable data are detected in the sound-driven data, the interactive object can be controlled to make the corresponding mouth shape according to the control parameter sequence of the mouth shape matched by the target syllable data.
  • By matching multiple different types of syllable data to control parameter sequences of different mouth shapes, these control parameter sequences can be used to control the mouth shape changes of the interactive object and thus control the interactive object to achieve an anthropomorphic speaking state.
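  • The mapping from syllable types to set mouth shapes described above can be pictured with a minimal, hypothetical sketch; the syllable types, mouth-shape names, and parameter values below are illustrative assumptions rather than values taken from the disclosure.

      # Pre-divided syllable types: each type groups syllables whose pronunciation
      # actions are basically the same and maps to one set mouth shape.
      SYLLABLE_TYPE_OF = {
          "ma": "open_mouth", "man": "open_mouth", "mang": "open_mouth",
          "bu": "rounded_lips", "pu": "rounded_lips",
      }

      # Each set mouth shape is provided with a corresponding control parameter sequence
      # (here a few frames of hypothetical "jaw_open" / "lip_round" values).
      MOUTH_SHAPE_SEQUENCES = {
          "open_mouth":   [{"jaw_open": 0.2}, {"jaw_open": 0.6}, {"jaw_open": 0.3}],
          "rounded_lips": [{"lip_round": 0.5}, {"lip_round": 0.8}, {"lip_round": 0.4}],
      }

      def mouth_shape_sequence_for(syllable):
          """Return the control parameter sequence of the set mouth shape matched by
          the syllable, or None when the syllable matches no target syllable type."""
          syllable_type = SYLLABLE_TYPE_OF.get(syllable)
          return None if syllable_type is None else MOUTH_SHAPE_SEQUENCES[syllable_type]

      print(mouth_shape_sequence_for("man"))   # the "open_mouth" sequence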
  • In step 203, the interactive object is controlled to execute the setting action according to the obtained control parameter sequence.
  • For each of the one or more target data contained in the sound-driven data, a corresponding control parameter sequence of the setting action can be obtained. The action of the interactive object is then controlled according to the obtained control parameter sequences, so that the setting action corresponding to each target data in the sound-driven data is realized.
  • That is, the control parameter sequence of the setting action of the interactive object that matches the target data is obtained and used to control the action of the interactive object displayed by the display device, so that the interactive object makes the action corresponding to the target data contained in the sound-driven data. The speaking state of the interactive object is thus natural and vivid, and the interactive experience of the target object is improved.
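  • For orientation only, steps 201 to 203 can be strung together as in the following hypothetical sketch; the callbacks stand in for the detection, matching, and rendering machinery described elsewhere in this document and are not part of the disclosure.

      def drive_interactive_object(sound_driven_data, detect_target_data,
                                   sequence_for, execute_sequence):
          # Step 201: the sound-driven data of the displayed interactive object is
          # obtained (passed in here).
          # Step 202: for each target data it contains, obtain the matching control
          # parameter sequence of the setting action.
          targets = detect_target_data(sound_driven_data)
          # Step 203: control the interactive object to execute each setting action.
          for target in targets:
              execute_sequence(sequence_for(target))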
  • Fig. 3 shows a flowchart of a method for driving interactive objects according to an embodiment of the present disclosure. As shown in Fig. 3, the method further includes:
  • Step 204: Control the display device to output voice according to the voice information corresponding to the sound-driven data, and/or display text according to the text information corresponding to the sound-driven data.
  • While controlling the display device to output the voice corresponding to the sound-driven data, the interactive object is sequentially controlled, according to the control parameter sequence matched by each target data in the sound-driven data, to perform the corresponding actions. The interactive object thus makes actions according to the content of the sound while outputting the voice, so that its speaking state is natural and vivid and the interactive experience of the target object is improved.
  • Likewise, the interactive object performs the corresponding actions while text is displayed, so that the interactive object acts according to the content contained in the sound and text while outputting voice and displaying text; its state of expression is thus natural and vivid, which improves the interactive experience of the target object.
  • In this way, an image frame sequence corresponding to variable content can be formed from the general unit animations, which improves the driving efficiency of the interactive object.
  • the target data can be added or modified as needed to cope with the changed content and facilitate the maintenance and update of the drive system.
  • the method is applied to a server, including a local server or a cloud server, etc.
  • The server processes the sound-driven data of the interactive object, generates posture parameter values of the interactive object, and performs rendering according to the posture parameter values using a three-dimensional or two-dimensional rendering engine to obtain a response animation of the interactive object.
  • The server may send the response animation to the terminal device for display in order to respond to the target object, or may send the response animation to the cloud so that the terminal device obtains the response animation from the cloud and uses it to respond to the target object.
  • the posture parameter value may also be sent to the terminal, so that the terminal completes the process of rendering, generating a response animation, and performing display.
  • In some embodiments, the method is applied to a terminal device, which processes the sound-driven data of the interactive object, generates posture parameter values of the interactive object, and performs rendering according to the posture parameter values using a three-dimensional or two-dimensional rendering engine to obtain a response animation of the interactive object; the terminal may then display the response animation to respond to the target object.
  • When the sound-driven data includes audio data, the voice content contained in the audio data may be obtained by performing voice recognition on the audio data, and the target data contained in the audio data may be determined by matching the recognized voice content with the target data.
  • When the sound-driven data includes text data, the target data included in the text data is determined based on the text content included in the text data.
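  • A minimal sketch of determining the target data, assuming a keyword-style target set and an external speech-recognition callback (both are assumptions for illustration, not details given in the disclosure):

      TARGET_KEYWORDS = {"wave", "nod"}   # hypothetical set keywords matched to setting actions

      def find_target_data(sound_driven_data, recognize_speech):
          """Return the target data contained in the sound-driven data.
          recognize_speech is an assumed ASR callback returning recognized text."""
          if "audio" in sound_driven_data:              # audio data: recognize the voice content first
              content = recognize_speech(sound_driven_data["audio"])
          else:                                         # text data: use the text content directly
              content = sound_driven_data["text"]
          return [word for word in content.split() if word in TARGET_KEYWORDS]

      print(find_target_data({"text": "please wave to the camera"}, recognize_speech=None))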
  • When the sound-driven data includes syllable data, the sound-driven data is split to obtain at least one piece of syllable data.
  • Priorities may be set for different splitting methods, and the syllable data combination obtained by the splitting method with the higher priority is used as the splitting result.
  • the split syllable data is matched with the target syllable data.
  • When a piece of syllable data matches target syllable data of any syllable type, it can be determined that the syllable data matches the target syllable data, and it can be determined that the sound-driven data includes the target data.
  • For example, the target syllable data may include syllable data of the "ma", "man", and "mang" types; in response to the sound-driven data containing syllable data that matches any of "ma", "man", and "mang", it is determined that the sound-driven data includes the target syllable data.
  • In that case, the control parameter sequence of the set mouth shape corresponding to the target syllable data is obtained, and the interactive object is controlled to make the corresponding mouth shape.
  • In this way, the mouth shape changes of the interactive object can be controlled according to the control parameter sequences of the mouth shapes corresponding to the sound-driven data, so that the interactive object realizes an anthropomorphic speaking state.
  • The syllable data obtained by splitting may be multiple pieces of syllable data. For each piece of syllable data, it can be checked whether it matches certain target syllable data, and when it does, the control parameter sequence of the set mouth shape corresponding to that target syllable data is obtained.
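  • A hypothetical sketch of this splitting-and-matching flow (the splitter callbacks, their priority order, and the mouth-shape lookup are assumptions; mouth_shape_sequence_for refers to the earlier illustrative lookup):

      def split_syllables(syllable_stream, splitters):
          """Split the sound-driven syllable data with the highest-priority splitting
          method that succeeds. splitters is ordered from highest to lowest priority;
          each is an assumed callback returning a list of syllables or None."""
          for split in splitters:
              result = split(syllable_stream)
              if result:
                  return result
          return []

      def matched_mouth_shapes(syllables, mouth_shape_sequence_for):
          # Keep only the syllables that match target syllable data and collect the
          # control parameter sequences of their set mouth shapes.
          return [seq for s in syllables
                  if (seq := mouth_shape_sequence_for(s)) is not None]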
  • Step 203 further includes:
  • Step 2031: Determine the voice information corresponding to the target data.
  • Step 2032: Obtain time information for outputting the voice information.
  • Step 2033: Determine the execution time of the setting action corresponding to the target data according to the time information.
  • Step 2034: According to the execution time, control the interactive object to execute the setting action with the control parameter sequence corresponding to the target data.
  • The time information of the voice information corresponding to the target data can be determined, for example, the time at which output of the voice information starts, the time at which the output ends, and the duration of the output.
  • The execution time of the setting action corresponding to the target data may be determined according to the time information, and, within the execution time or within a certain range of the execution time, the interactive object is controlled with the control parameter sequence corresponding to the target data to execute the setting action.
  • In this way, the duration of outputting the voice according to the sound-driven data is consistent with, or similar to, the duration of controlling the interactive object to perform consecutive setting actions according to the plurality of control parameter sequences; and, for each target data, the duration of outputting the corresponding voice is likewise consistent with, or similar to, the duration of controlling the interactive object to perform the setting action according to the corresponding control parameter sequence. The time at which the interactive object speaks thus matches the time at which it performs the action, synchronizing and coordinating the voice and actions of the interactive object.
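  • A small sketch of deriving the execution time from the time information of the voice output; the lead_time value is an assumed set time by which the action may start ahead of the voice (discussed further below), not a figure from the disclosure.

      from dataclasses import dataclass

      @dataclass
      class VoiceTimeInfo:
          start: float      # time at which output of the voice information starts (seconds)
          duration: float   # how long the voice information is output

      def execution_window(info: VoiceTimeInfo, lead_time: float = 0.1):
          """Execution time of the setting action derived from the voice time info."""
          start = max(0.0, info.start - lead_time)
          end = info.start + info.duration
          return start, end

      print(execution_window(VoiceTimeInfo(start=1.5, duration=0.8)))   # (1.4, 2.3)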
  • each group of control parameters in the control parameter sequence may be called at a set rate, so that the interactive object displays a posture corresponding to each group of control parameters. That is, the control parameter sequence corresponding to each target data is always executed at a constant speed.
  • Alternatively, the call rate of the control parameter sequence corresponding to the target data is determined according to the execution time of the setting action corresponding to the target data, and each set of control parameters in the control parameter sequence corresponding to the target data is called at that call rate, so that the interactive object displays the posture corresponding to each set of control parameters.
  • the call rate of the control parameter sequence determines the rate at which the interactive object performs actions. For example, when the control parameter sequence is called at a higher speed, the posture of the interactive object changes relatively fast, so the set action can be completed in a shorter time.
  • In this way, the time for performing the setting action can be adjusted, for example compressed or expanded, according to the time at which the voice of the target data is output, so that the time at which the interactive object performs the setting action matches the time at which the voice of the target data is output, thereby synchronizing and coordinating the voice and actions of the interactive object.
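  • The compression or expansion of the action can be expressed as choosing the call rate from the execution time, as in this illustrative sketch (frame counts, timings, and the apply_pose callback are assumptions):

      def call_rate(num_parameter_sets: int, execution_time: float) -> float:
          """Call rate (sets per second) that makes the whole control parameter
          sequence fit the execution time, compressing or expanding the action."""
          return num_parameter_sets / execution_time

      def play_sequence(frames, execution_time, apply_pose, start_time=0.0):
          # Spread the sets of control parameters over the execution time so the
          # action fits the output of the corresponding voice.
          rate = call_rate(len(frames), execution_time)
          for i, params in enumerate(frames):
              apply_pose(start_time + i / rate, params)   # (timestamp, parameter set)

      # A 30-frame action squeezed into 0.6 s is called at 50 parameter sets per second.
      print(call_rate(30, 0.6))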
  • In some embodiments, the control parameter sequence corresponding to the target data may start to be called a set time before the voice corresponding to the target data is output according to its phonemes, so that the interactive object starts to perform the setting action corresponding to the control parameter sequence.
  • That is, shortly before the interactive object starts to output the voice corresponding to the target data, it starts to call the control parameter sequence corresponding to the target data, so that the interactive object begins to perform the setting action, which is more in line with the way a real person speaks.
  • the speech of the interactive object is more natural and vivid, which improves the interactive experience of the target object.
  • In this case, the interactive object may be controlled to perform the corresponding setting action according to the control parameter sequence of the target data that comes first in word order (that is, in the natural arrangement order of the received sound-driven data), ignoring the target data that overlaps with it and comes later.
  • Each target data contained in the sound-driven data may be stored in the form of an array, with each target data as one of its elements. It should be noted that, since morphemes can be combined in different ways to obtain different target data, two adjacent target data among multiple target data may overlap. For example, when the text corresponding to the sound-driven data is "天气真好" ("The weather is really good"), the corresponding target data are: 1. "天" (day); 2. "天气" (weather); 3. "真好" (really good). The adjacent target data 1 and 2 contain the common morpheme "天", and both may match the same specified action, for example, pointing upward with a finger.
  • In some embodiments, the priority of target data that appears earlier can be set higher than that of target data that appears later. In the "天气真好" example above, "天" has a higher priority than "天气"; therefore, the interactive object is controlled to perform the setting action according to the control parameter sequence of the setting action corresponding to "天", the remaining morpheme "气" is ignored (that is, the target data "天气" overlapping with the target data "天" is ignored), and matching then continues directly with "真好".
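  • The word-order rule in the "天气真好" example can be sketched as a simple overlap resolver; the span representation below is an assumption made only for illustration.

      def resolve_overlaps(matches):
          """matches: (start, end, target) spans found in the sound-driven data,
          ordered by word order (earlier-arranged target data first). When spans
          overlap, the earlier one keeps its setting action and the later one is
          ignored, as with "天" and "天气" above."""
          kept, covered = [], set()
          for start, end, target in matches:
              if covered.isdisjoint(range(start, end)):
                  kept.append((start, end, target))
                  covered.update(range(start, end))
          return kept

      # Target data for "天气真好": (0, 1, "天"), (0, 2, "天气"), (2, 4, "真好")
      print(resolve_overlaps([(0, 1, "天"), (0, 2, "天气"), (2, 4, "真好")]))
      # -> [(0, 1, '天'), (2, 4, '真好')]   ("天气" overlaps "天" and is ignored)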
  • In other embodiments, in response to the control parameter sequences corresponding to adjacent target data overlapping in execution time, the overlapping parts of the control parameter sequences corresponding to the adjacent target data may be fused.
  • the overlapping parts of the control parameter sequences may be averaged or weighted averaged to achieve the fusion of the overlapping control parameter sequences.
  • Alternatively, an interpolation method can be used: starting from a certain frame of the previous action (for example, the N-th set of control parameters n in the first control parameter sequence corresponding to that action), a transition toward the next action is interpolated over a transition time, until it coincides with the first frame of the next action (for example, the first set of control parameters 1 in the second control parameter sequence corresponding to the next action is found to coincide with control parameter set n, or the next action is inserted at that certain frame, so that the total execution time of the two actions after the interpolated transition is the same as the playback or display time of the corresponding voice data/text data). All frames after that certain frame in the previous action are then ignored and the next action is executed directly, thereby realizing the fusion of the overlapping control parameter sequences.
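  • The averaging variant of the fusion can be sketched as a weighted blend of the overlapping parameter sets; the sequences, the overlap length, and the ramp weighting below are assumptions for illustration only.

      def fuse_overlap(seq_a, seq_b, overlap):
          """Fuse two control parameter sequences whose execution times overlap by
          `overlap` parameter sets: the overlapping sets are blended with a weighted
          average that ramps from the previous action toward the next one."""
          head = seq_a[:len(seq_a) - overlap]
          tail = seq_b[overlap:]
          blended = []
          for i in range(overlap):
              w = (i + 1) / (overlap + 1)          # weight ramps toward the next action
              a, b = seq_a[len(seq_a) - overlap + i], seq_b[i]
              blended.append({k: (1 - w) * a.get(k, 0.0) + w * b.get(k, 0.0)
                              for k in set(a) | set(b)})
          return head + blended + tail

      wave = [{"wrist": 0.2}, {"wrist": 0.6}, {"wrist": 0.4}]
      nod = [{"head": 0.3}, {"head": 0.5}]
      print(fuse_overlap(wave, nod, overlap=1))   # last wave set blended with first nod set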
  • the actions of the interactive objects can be smoothly transitioned, so that the actions of the interactive objects are smooth and natural, and the interactive experience of the target object is improved.
  • For data in the sound-driven data other than each target data (referred to as first data, for example), posture control parameters matching the acoustic features of the first data can be obtained, and the posture of the interactive object is controlled based on those posture control parameters.
  • For example, a sequence of voice frames contained in the first data may be acquired, acoustic features corresponding to at least one voice frame may be acquired, and posture control parameters of the interactive object corresponding to the acoustic features, such as a posture control vector, may be acquired to control the posture of the interactive object.
  • For text data, the acoustic features corresponding to the phonemes can be obtained according to the phonemes corresponding to the morphemes in the text data, and the posture control parameters of the interactive object corresponding to the acoustic features, such as a posture control vector, are obtained to control the posture of the interactive object.
  • The acoustic feature may be a feature related to speech emotion, such as a fundamental frequency feature, a formant feature, Mel-frequency cepstral coefficients (MFCC), and so on.
  • Since outputting the voice and/or displaying the text according to the first data is performed together with controlling the posture of the interactive object according to the posture parameter values, the gesture made by the interactive object is synchronized with the output voice and/or text, giving the target object the feeling that the interactive object is speaking.
  • Since the posture control vector is related to the acoustic features of the output sound, driving the interactive object according to the posture control vector gives its expressions and body movements emotional factors, making its speaking process more natural and vivid and thereby improving the interactive experience of the target object.
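  • A rough sketch of mapping per-frame acoustic features of the first data to posture control vectors; librosa is an assumed dependency used only to obtain MFCC features, and the linear projection stands in for whatever mapping the system actually uses, which the disclosure does not specify.

      import numpy as np
      import librosa   # assumed dependency; any MFCC/feature extractor would do

      def posture_control_vectors(waveform: np.ndarray, sr: int, projection: np.ndarray) -> np.ndarray:
          """Map the acoustic features of each voice frame to a posture control vector."""
          mfcc = librosa.feature.mfcc(y=waveform, sr=sr, n_mfcc=13)   # shape (13, n_frames)
          features = mfcc.T                                           # one feature row per voice frame
          return features @ projection                                # shape (n_frames, n_posture_dims)

      # Hypothetical usage: 13 MFCCs projected onto 4 posture dimensions.
      sr = 16000
      y = np.random.randn(sr).astype(np.float32)           # 1 s of placeholder audio
      vectors = posture_control_vectors(y, sr, np.random.randn(13, 4) * 0.01)
      print(vectors.shape)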
  • In summary, the sound-driven data includes at least one target data and first data other than the target data. For the first data, posture control parameters are determined according to the acoustic features of the first data to control the posture of the interactive object; for the target data, the interactive object is controlled to make the setting action according to the control parameter sequence of the setting action matching the target data.
  • FIG. 5 shows a schematic structural diagram of an interactive object driving apparatus according to at least one embodiment of the present disclosure.
  • As shown in FIG. 5, the apparatus may include: a first acquisition unit 301, configured to obtain sound-driven data of an interactive object displayed by a display device; a second acquisition unit 302, configured to obtain, based on target data contained in the sound-driven data, a control parameter sequence of a setting action of the interactive object that matches the target data; and a driving unit 303, configured to control the interactive object to perform the setting action according to the obtained control parameter sequence.
  • The apparatus further includes an output unit, configured to control the display device to output voice according to the voice information corresponding to the sound-driven data, and/or to display text according to the text information corresponding to the sound-driven data.
  • The driving unit is specifically configured to: determine the voice information corresponding to the target data; obtain time information for outputting the voice information; determine, according to the time information, the execution time of the setting action corresponding to the target data; and, according to the execution time, control the interactive object to execute the setting action with the control parameter sequence corresponding to the target data.
  • The control parameter sequence includes one or more sets of control parameters. When controlling, according to the execution time, the interactive object to execute the setting action with the control parameter sequence corresponding to the target data, the driving unit is specifically configured to: call each set of control parameters in the control parameter sequence at a set rate, so that the interactive object displays the posture corresponding to each set of control parameters.
  • The control parameter sequence includes one or more sets of control parameters. When controlling, according to the execution time, the interactive object to execute the setting action with the control parameter sequence corresponding to the target data, the driving unit is specifically configured to: determine the call rate of the control parameter sequence according to the execution time; and call each set of control parameters in the control parameter sequence at the call rate, so that the interactive object outputs the posture corresponding to each set of control parameters.
  • The control parameter sequence includes one or more sets of control parameters. When controlling, according to the execution time, the interactive object to execute the setting action with the control parameter sequence corresponding to the target data, the driving unit is specifically configured to: start calling the control parameter sequence corresponding to the target data a set time before the voice information corresponding to the target data is output, so that the interactive object starts to perform the setting action.
  • The sound-driven data includes a plurality of target data, and the driving unit is specifically configured to: in response to detecting that adjacent target data among the plurality of target data overlap, control the interactive object to perform the setting action according to the control parameter sequence corresponding to the target data that comes first in word order.
  • The sound-driven data includes a plurality of target data, and the driving unit is specifically configured to: in response to detecting that the control parameter sequences corresponding to adjacent target data among the plurality of target data overlap in execution time, fuse the overlapping parts of the control parameter sequences corresponding to the adjacent target data.
  • The second acquisition unit is specifically configured to: in response to the sound-driven data including audio data, perform voice recognition on the audio data and determine, according to the recognized voice content, the target data contained in the audio data; and, in response to the sound-driven data including text data, determine the target data contained in the text data according to the text content contained in the text data.
  • The target data includes target syllable data, and the second acquisition unit is specifically configured to: determine whether the syllable data contained in the sound-driven data matches the target syllable data, wherein the target syllable data belongs to a pre-divided syllable type, one syllable type corresponds to one set mouth shape, and one set mouth shape is provided with a corresponding control parameter sequence; and, in response to the syllable data matching the target syllable data, obtain, based on the syllable type to which the matched target syllable data belongs, the control parameter sequence of the set mouth shape corresponding to the matched target syllable data.
  • The apparatus further includes a posture control unit, configured to: obtain first data other than the target data in the sound-driven data; obtain acoustic features of the first data; obtain posture control parameters matching the acoustic features; and control the posture of the interactive object according to the posture control parameters.
  • At least one embodiment of this specification also provides an electronic device. As shown in FIG. 6, the device includes a memory and a processor; the memory is configured to store computer instructions executable on the processor, and the processor is configured to implement, when executing the computer instructions, the method for driving interactive objects described in any embodiment of the present disclosure. At least one embodiment of this specification also provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the method for driving an interactive object according to any embodiment of the present disclosure is implemented.
  • One or more embodiments of this specification can be provided as a method, a system, or a computer program product. Therefore, one or more embodiments of this specification may take the form of a complete hardware embodiment, a complete software embodiment, or an embodiment combining software and hardware. Moreover, one or more embodiments of this specification may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) containing computer-usable program code.
  • The embodiments of the subject matter and functional operations described in this specification can be implemented in: digital electronic circuits, tangible computer software or firmware, computer hardware including the structures disclosed in this specification and their structural equivalents, or a combination of one or more of them.
  • The embodiments of the subject matter described in this specification can be implemented as one or more computer programs, that is, as one or more modules of computer program instructions encoded on a tangible non-transitory program carrier to be executed by a data processing apparatus or to control the operation of the data processing apparatus.
  • Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical, or electromagnetic signal, which is generated to encode information and transmit it to a suitable receiver apparatus for execution by a data processing apparatus.
  • the computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
  • the processing and logic flow described in this specification can be executed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating according to input data and generating output.
  • the processing and logic flow can also be executed by a dedicated logic circuit, such as FPGA (Field Programmable Gate Array) or ASIC (Application Specific Integrated Circuit), and the device can also be implemented as a dedicated logic circuit.
  • Computers suitable for executing computer programs include, for example, general-purpose and/or special-purpose microprocessors, or any other type of central processing unit.
  • the central processing unit will receive instructions and data from a read-only memory and/or a random access memory.
  • the basic components of a computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data.
  • Generally, the computer will also include one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks, or the computer will be operatively coupled to such a mass storage device to receive data from it, transmit data to it, or both.
  • the computer does not have to have such equipment.
  • Moreover, the computer can be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a global positioning system (GPS) receiver, or a portable storage device such as a universal serial bus (USB) flash drive, to name just a few.
  • Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (such as EPROM, EEPROM, and flash memory devices), magnetic disks (such as internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks.
  • the processor and the memory can be supplemented by or incorporated into a dedicated logic circuit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The invention relates to a method and apparatus for driving an interactive object, a device, and a storage medium. The method comprises: obtaining sound-driven data of an interactive object displayed by a display device (201); obtaining, on the basis of target data contained in the sound-driven data, a control parameter sequence of a setting action of the interactive object matching the target data (202); and controlling the interactive object to execute the setting action according to the obtained control parameter sequence (203).
PCT/CN2020/129830 2020-03-31 2020-11-18 Procédé et appareil permettant de commander un objet interactif, dispositif, et support de stockage WO2021196647A1 (fr)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2021549865A JP2022531056A (ja) 2020-03-31 2020-11-18 インタラクティブ対象の駆動方法、装置、デバイス、及び記録媒体
KR1020217027681A KR20210124306A (ko) 2020-03-31 2020-11-18 인터랙티브 대상의 구동 방법, 장치, 디바이스 및 기록 매체
SG11202109201XA SG11202109201XA (en) 2020-03-31 2020-11-18 Methods, apparatuses, electronic devices and storage media for driving an interactive object

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010245772.7 2020-03-31
CN202010245772.7A CN111459451A (zh) 2020-03-31 2020-03-31 交互对象的驱动方法、装置、设备以及存储介质

Publications (1)

Publication Number Publication Date
WO2021196647A1 true WO2021196647A1 (fr) 2021-10-07

Family

ID=71683496

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/129830 WO2021196647A1 (fr) 2020-03-31 2020-11-18 Procédé et appareil permettant de commander un objet interactif, dispositif, et support de stockage

Country Status (6)

Country Link
JP (1) JP2022531056A (fr)
KR (1) KR20210124306A (fr)
CN (1) CN111459451A (fr)
SG (1) SG11202109201XA (fr)
TW (1) TWI759039B (fr)
WO (1) WO2021196647A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111459451A (zh) * 2020-03-31 2020-07-28 北京市商汤科技开发有限公司 交互对象的驱动方法、装置、设备以及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06348297A (ja) * 1993-06-10 1994-12-22 Osaka Gas Co Ltd 発音練習装置
CN109599113A (zh) * 2019-01-22 2019-04-09 北京百度网讯科技有限公司 用于处理信息的方法和装置
CN110413841A (zh) * 2019-06-13 2019-11-05 深圳追一科技有限公司 多态交互方法、装置、系统、电子设备及存储介质
CN110853614A (zh) * 2018-08-03 2020-02-28 Tcl集团股份有限公司 虚拟对象口型驱动方法、装置及终端设备
CN111459451A (zh) * 2020-03-31 2020-07-28 北京市商汤科技开发有限公司 交互对象的驱动方法、装置、设备以及存储介质

Family Cites Families (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7827034B1 (en) * 2002-11-27 2010-11-02 Totalsynch, Llc Text-derived speech animation tool
US10630751B2 (en) * 2016-12-30 2020-04-21 Google Llc Sequence dependent data message consolidation in a voice activated computer network environment
KR20140052155A (ko) * 2012-10-19 2014-05-07 삼성전자주식회사 디스플레이 장치, 디스플레이 장치 제어 방법 및 디스플레이 장치의 제어를 위한 정보처리장치
JP5936588B2 (ja) * 2013-09-30 2016-06-22 Necパーソナルコンピュータ株式会社 情報処理装置、制御方法、及びプログラム
EP3100259A4 (fr) * 2014-01-31 2017-08-30 Hewlett-Packard Development Company, L.P. Commande par entrée vocale
JP2015166890A (ja) * 2014-03-03 2015-09-24 ソニー株式会社 情報処理装置、情報処理システム、情報処理方法及びプログラム
EP3371778A4 (fr) * 2015-11-06 2019-06-26 Mursion, Inc. Système de commande de personnages virtuels
CN106056989B (zh) * 2016-06-23 2018-10-16 广东小天才科技有限公司 一种语言学习方法及装置、终端设备
CN108140383A (zh) * 2016-07-19 2018-06-08 门箱股份有限公司 影像显示设备、话题选择方法、话题选择程序、影像显示方法及影像显示程序
CN106873773B (zh) * 2017-01-09 2021-02-05 北京奇虎科技有限公司 机器人交互控制方法、服务器和机器人
CN107340859B (zh) * 2017-06-14 2021-04-06 北京光年无限科技有限公司 多模态虚拟机器人的多模态交互方法和系统
CN107861626A (zh) * 2017-12-06 2018-03-30 北京光年无限科技有限公司 一种虚拟形象被唤醒的方法及系统
TWI658377B (zh) * 2018-02-08 2019-05-01 佳綸生技股份有限公司 機器人輔助互動系統及其方法
CN108942919B (zh) * 2018-05-28 2021-03-30 北京光年无限科技有限公司 一种基于虚拟人的交互方法及系统
CN110176284A (zh) * 2019-05-21 2019-08-27 杭州师范大学 一种基于虚拟现实的言语失用症康复训练方法
JP2019212325A (ja) * 2019-08-22 2019-12-12 株式会社Novera 情報処理装置、ミラーデバイス、プログラム
CN110815258B (zh) * 2019-10-30 2023-03-31 华南理工大学 基于电磁力反馈和增强现实的机器人遥操作系统和方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH06348297A (ja) * 1993-06-10 1994-12-22 Osaka Gas Co Ltd 発音練習装置
CN110853614A (zh) * 2018-08-03 2020-02-28 Tcl集团股份有限公司 虚拟对象口型驱动方法、装置及终端设备
CN109599113A (zh) * 2019-01-22 2019-04-09 北京百度网讯科技有限公司 用于处理信息的方法和装置
CN110413841A (zh) * 2019-06-13 2019-11-05 深圳追一科技有限公司 多态交互方法、装置、系统、电子设备及存储介质
CN111459451A (zh) * 2020-03-31 2020-07-28 北京市商汤科技开发有限公司 交互对象的驱动方法、装置、设备以及存储介质

Also Published As

Publication number Publication date
CN111459451A (zh) 2020-07-28
KR20210124306A (ko) 2021-10-14
TWI759039B (zh) 2022-03-21
JP2022531056A (ja) 2022-07-06
TW202138987A (zh) 2021-10-16
SG11202109201XA (en) 2021-11-29

Similar Documents

Publication Publication Date Title
WO2021169431A1 (fr) Procédé et appareil d'interaction, et dispositif électronique et support de stockage
TWI766499B (zh) 互動物件的驅動方法、裝置、設備以及儲存媒體
EP3612878B1 (fr) Exécution de tâche multimodale et édition de texte pour un système portable
WO2021196646A1 (fr) Procédé et appareil de commande d'objet interactif, dispositif et support de stockage
TWI760015B (zh) 互動物件的驅動方法、裝置、設備以及儲存媒體
WO2021196644A1 (fr) Procédé, appareil et dispositif permettant d'entraîner un objet interactif, et support d'enregistrement
JP7193015B2 (ja) コミュニケーション支援プログラム、コミュニケーション支援方法、コミュニケーション支援システム、端末装置及び非言語表現プログラム
US10388325B1 (en) Non-disruptive NUI command
CN110162598B (zh) 一种数据处理方法和装置、一种用于数据处理的装置
EP3142359A1 (fr) Dispositif d'affichage, et procédé d'exécution d'appel vidéo correspondant
WO2021232876A1 (fr) Procédé et appareil d'entraînement d'humain virtuel en temps réel, dispositif électronique et support d'enregistrement
JP2024513640A (ja) 仮想対象のアクション処理方法およびその装置、コンピュータプログラム
TW202121161A (zh) 互動對象的驅動方法、裝置、顯示設備、電子設備以及電腦可讀儲存介質
WO2022222572A1 (fr) Procédé et appareil pour piloter un objet d'interaction, dispositif, et support de stockage
WO2021196647A1 (fr) Procédé et appareil permettant de commander un objet interactif, dispositif, et support de stockage
TW202248994A (zh) 互動對象驅動和音素處理方法、設備以及儲存媒體
CN110166844B (zh) 一种数据处理方法和装置、一种用于数据处理的装置

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2021549865

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20217027681

Country of ref document: KR

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20929643

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20929643

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 521430719

Country of ref document: SA