CN111541908A - Interaction method, device, equipment and storage medium - Google Patents

Interaction method, device, equipment and storage medium

Info

Publication number
CN111541908A
Authority
CN
China
Prior art keywords
response
content
client
interactive object
message
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010362562.6A
Other languages
Chinese (zh)
Inventor
张子隆
孙林
路露
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Sensetime Technology Development Co Ltd
Original Assignee
Beijing Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Sensetime Technology Development Co Ltd
Publication of CN111541908A
Priority to PCT/CN2020/130184 (WO2021169431A1)
Priority to JP2021549324 (JP2022524944A)
Priority to SG11202109192Q (SG11202109192QA)
Priority to KR1020217023002 (KR20210110620A)
Priority to TW109145727 (TWI778477B)
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/2187 Live feed
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G06F3/167 Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00 Animation
    • G06T13/20 3D [Three Dimensional] animation
    • G06T13/40 3D [Three Dimensional] animation of characters, e.g. humans, animals or virtual beings
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47 End-user applications
    • H04N21/478 Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788 Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N21/81 Monomedia components thereof
    • H04N21/816 Monomedia components thereof involving special video data, e.g. 3D video


Abstract

The disclosure relates to an interaction method, apparatus, device and storage medium. The method comprises the following steps: receiving a first message from a client; acquiring driving data matched with the indication content based on the indication content included in the first message; and controlling a video playing interface of the client to play a response animation of the interactive object by using the driving data, wherein the interactive object is obtained by rendering a two-dimensional or three-dimensional virtual model.

Description

Interaction method, device, equipment and storage medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to an interaction method, apparatus, device, and storage medium.
Background
With the rapid development of the internet, live broadcasting has become an important way of spreading information. Because different audiences watch webcasts during different time periods, a real-person anchor cannot broadcast 24 hours a day to meet the needs of all audiences. Using digital humans for live broadcasting can solve this problem; however, the interaction technology between a digital-human anchor and the audience still needs to be researched and developed.
Disclosure of Invention
The disclosed embodiments provide an interaction scheme.
According to an aspect of the present disclosure, there is provided an interaction method, the method including: receiving a first message from a client; acquiring driving data matched with the indication content based on the indication content included in the first message; and controlling a video playing interface of the client to play a response animation of the interactive object by using the driving data, wherein the interactive object is obtained by rendering a two-dimensional or three-dimensional virtual model.
In combination with any embodiment provided by the present disclosure, the obtaining, based on the indication content included in the first message, the driving data matching the indication content includes: acquiring response content for the indication content, wherein the response content comprises a response text; and acquiring, based on at least one piece of target data contained in the response text, control parameters of a set action of the interactive object matching the target data.
In combination with any embodiment provided by the present disclosure, the obtaining, based on the indication content included in the first message, the driving data matching the indication content includes: acquiring response content for the indication content, wherein the response content comprises a phoneme sequence; and acquiring control parameters of the interactive object matching the phoneme sequence.
In combination with any one of the embodiments provided by the present disclosure, the acquiring the control parameters of the interactive object matching the phoneme sequence includes: performing feature coding on the phoneme sequence to obtain a first coding sequence corresponding to the phoneme sequence; acquiring a feature code corresponding to at least one phoneme according to the first coding sequence; and acquiring the attitude control vector of at least one local area of the interactive object corresponding to the feature code.
In connection with any embodiment provided by the disclosure, the indication content comprises text content; the obtaining of the response content for the indication content includes: identifying the language intention expressed by the text content based on a natural language processing algorithm, and acquiring response content matching the language intention.
In combination with any embodiment provided by the present disclosure, the method further comprises: and sending indication information comprising response content aiming at the indication content to the client so as to enable the client to show the response content based on the indication information.
In combination with any one of the embodiments provided by the present disclosure, the controlling, by using the driving data, the client to play the response animation of the interactive object in the video playing interface includes: sending the driving data of the interactive object to the client so that the client generates a response animation according to the driving data; controlling the client to play the response animation in a video playing interface; or adjusting two-dimensional or three-dimensional virtual model parameters of the interactive object based on the driving data; and generating a response animation of the interactive object by using a rendering engine based on the adjusted two-dimensional or three-dimensional virtual model parameters, and sending the response animation to the client.
According to an aspect of the present disclosure, there is provided an interaction method, the method including: in response to a user input operation from a client, sending a first message including indication content to a server; and playing the response animation of the interactive object in a video playing interface of the client based on a second message responded by the server to the first message, wherein the interactive object is obtained by rendering through a two-dimensional or three-dimensional virtual model.
In connection with any embodiment provided by the disclosure, the indication content comprises text content; the method further comprises: displaying the text content in the client, and/or playing an audio file corresponding to the text content.
In combination with any one of the embodiments provided by the present disclosure, the presenting the text content in the client includes: generating bullet screen information of the text content; and displaying the bullet screen information in a video playing interface of the client.
In connection with any embodiment provided by the disclosure, the second message includes a response text for the indication content; the method further comprises the following steps: and displaying the response text in a video playing interface of the client, and/or playing an audio file corresponding to the response text.
In connection with any embodiment provided by the present disclosure, the second message includes driving data of the interactive object; the playing the response animation of the interactive object in the video playing interface of the client based on the second message responded by the server to the first message comprises the following steps: adjusting two-dimensional or three-dimensional virtual model parameters of the interactive object based on the driving data; based on the adjusted two-dimensional or three-dimensional virtual model parameters, generating a response animation of the interactive object by using a rendering engine, and displaying the response animation in a video playing interface of the client; wherein the driving data includes control parameters of the interactive object matching with a phoneme sequence corresponding to a response text for the indication content, and/or control parameters of a setting action of the interactive object matching with at least one target data included in the response text.
In combination with any one of the embodiments provided in the disclosure, the second message includes a response animation made by the interactive object to the indication content.
According to an aspect of the present disclosure, an interaction apparatus is provided, the apparatus including: a receiving unit, configured to receive a first message from a client; an acquisition unit that acquires drive data matching the instruction content based on the instruction content included in the first message; and the driving unit is used for controlling a video playing interface of the client to play the response animation of the interactive object by using the driving data, wherein the interactive object is obtained by rendering a two-dimensional or three-dimensional virtual model.
In combination with any one of the embodiments provided by the present disclosure, the obtaining unit is specifically configured to: acquire response content for the indication content, wherein the response content comprises a response text; and acquire, based on at least one piece of target data contained in the response text, control parameters of a set action of the interactive object matching the target data.
In combination with any one of the embodiments provided by the present disclosure, the obtaining unit is specifically configured to: acquire response content for the indication content, wherein the response content comprises a phoneme sequence; and acquire control parameters of the interactive object matching the phoneme sequence.
In connection with any embodiment provided by the present disclosure, the control parameters of the interactive object include an attitude control vector of at least one local region; the obtaining unit, when configured to obtain the control parameter of the interactive object matched with the phoneme sequence, is specifically configured to: performing feature coding on the phoneme sequence to obtain a first coding sequence corresponding to the phoneme sequence; acquiring a feature code corresponding to at least one phoneme according to the first coding sequence; and acquiring the attitude control vector of at least one local area of the interactive object corresponding to the feature code.
In connection with any embodiment provided by the disclosure, the indication content comprises text content; when configured to obtain the response content for the indication content, the obtaining unit is specifically configured to: identify the language intention expressed by the text content based on a natural language processing algorithm, and acquire response content matching the language intention.
In combination with any embodiment provided by the present disclosure, the apparatus further includes a presentation unit, configured to send, to the client, indication information including response content for the indication content, so as to cause the client to present the response content based on the indication information.
In combination with any one of the embodiments provided by the present disclosure, the driving unit is specifically configured to: send the driving data of the interactive object to the client so that the client generates a response animation according to the driving data, and control the client to play the response animation in a video playing interface; or adjust two-dimensional or three-dimensional virtual model parameters of the interactive object based on the driving data, generate a response animation of the interactive object by using a rendering engine based on the adjusted two-dimensional or three-dimensional virtual model parameters, and send the response animation to the client.
According to an aspect of the present disclosure, an interaction apparatus is provided, the apparatus including: a sending unit configured to send a first message including an instruction content to a server in response to a user input operation from a client; and the playing unit is used for playing the response animation of the interactive object in the video playing interface of the client based on the second message responded by the server to the first message, wherein the interactive object is obtained by rendering through a two-dimensional or three-dimensional virtual model.
In connection with any embodiment provided by the disclosure, the indication content comprises text content; the apparatus further comprises a first presentation unit, configured to display the text content in the client and/or play an audio file corresponding to the text content.
In combination with any embodiment provided by the present disclosure, when the first presentation unit is configured to present the text content in the client, the first presentation unit is specifically configured to: generating bullet screen information of the text content; and displaying the bullet screen information in a video playing interface of the client.
In connection with any embodiment provided by the disclosure, the second message includes a response text for the indication content; the device further comprises a second display unit, which is used for displaying the response text in a video playing interface of the client and/or playing an audio file corresponding to the response text.
In connection with any embodiment provided by the present disclosure, the second message includes driving data of the interactive object; the play unit is specifically configured to: adjusting two-dimensional or three-dimensional virtual model parameters of the interactive object based on the driving data; based on the adjusted two-dimensional or three-dimensional virtual model parameters, generating a response animation of the interactive object by using a rendering engine, and displaying the response animation in a video playing interface of the client; wherein the driving data includes control parameters of the interactive object matching with a phoneme sequence corresponding to a response text for the indication content, and/or control parameters of a setting action of the interactive object matching with at least one target data included in the response text.
In combination with any one of the embodiments provided in the disclosure, the second message includes a response animation made by the interactive object to the indication content.
According to an aspect of the present disclosure, an electronic device is provided, which includes a memory for storing computer instructions executable on a processor, and the processor is configured to implement the interaction method proposed in any embodiment of the present disclosure when executing the computer instructions.
According to an aspect of the present disclosure, a computer-readable storage medium is proposed, on which a computer program is stored, which when executed by a processor implements the interaction method proposed by any of the embodiments of the present disclosure.
According to the interaction method, apparatus, device and storage medium provided by the embodiments of the present disclosure, a first message is received from a client, matching driving data is obtained according to the indication content contained in the first message, and the driving data is used to control the video playing interface of the client to play a response animation of the interactive object, thereby displaying the response of the interactive object. In this way, the interactive object can give timely feedback on the user's indication content, realizing timely interaction with the user.
Drawings
In order to more clearly illustrate one or more embodiments of the present specification or technical solutions in the prior art, the drawings used in describing the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some of the embodiments described in one or more embodiments of the present specification, and other drawings can be obtained by those skilled in the art without inventive effort.
Fig. 1 illustrates a flow diagram of an interaction method in accordance with at least one embodiment of the present disclosure;
fig. 2 is a schematic diagram illustrating that an interactive method proposed by at least one embodiment of the present disclosure is applied to a live broadcast process;
FIG. 3 illustrates a flow chart of a method for obtaining an attitude control vector in accordance with at least one embodiment of the present disclosure;
FIG. 4 illustrates a flow diagram of another interaction method in accordance with at least one embodiment of the present disclosure;
FIG. 5 illustrates a schematic structural diagram of an interaction device in accordance with at least one embodiment of the present disclosure;
FIG. 6 illustrates a schematic structural diagram of another interactive device in accordance with at least one embodiment of the present disclosure;
fig. 7 shows a schematic structural diagram of an electronic device according to at least one embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
With a digital human serving as the anchor, live broadcasting can be carried out at any time of day, 24-hour uninterrupted live broadcasting can be realized, and the different viewing-time requirements of different audiences can be met. When a digital human serves as the user's interactive object during live broadcasting, how to give timely feedback on the questions raised by the user and how to interact with the user in a vivid and natural way are problems that urgently need to be solved.
In view of the above, the present disclosure provides an interaction scheme, which is applicable to any scene involving interaction with a virtual interaction object, such as live webcasting.
The interaction method provided by the embodiments of the disclosure can be applied to a terminal device or a server. The terminal device can be, for example, an electronic device with a client installed, such as a mobile phone or a tablet computer; the disclosure does not limit the form of the terminal device. The client is, for example, a video client, such as a live-video client, a somatosensory interaction client, and the like. The server may be any server capable of providing the processing capability required by the interactive object, and the form of the server is not limited in the present disclosure.
The interactive object can be any interactive object capable of interacting with the user, and can be a virtual character, a virtual animal, a virtual article, a cartoon image and other virtual images capable of realizing interactive functions, the interactive object can be constructed based on a two-dimensional virtual model, can also be constructed based on a three-dimensional virtual model, and is obtained by rendering the two-dimensional or three-dimensional virtual model. The user can be a real person user, a robot or other intelligent equipment. The interaction mode between the interaction object and the user can be an active interaction mode or a passive interaction mode.
For example, in a live video scene, an animation of an interactive object may be displayed in a live video interface of a client, and a user may perform an input operation, such as text input, voice input, action trigger, key trigger, and the like, in the client of a terminal device to implement interaction with the interactive object.
Fig. 1 shows a flowchart of an interaction method according to at least one embodiment of the present disclosure, which may be applied to a server side. As shown in fig. 1, the method includes steps 101 to 103.
In step 101, a first message is received from a client.
For example, the indication content carried in the first message may be information input by the user through an input operation performed by a client of the terminal device, where the input operation of the user includes a text input operation, a voice input operation, an action trigger operation, a key trigger operation, and the like. The form of the indication content carried in the first message includes, but is not limited to, text, voice, image (e.g., expression, action image), video, and the like. For example, in a live video scene, the client may be a client supporting a live video watching function, the first message may be sent out after acquiring that a user inputs text content in a live video interface, the indication content carried by the first message is, for example, the input text content, and the indication content may be displayed in the live video interface in a bullet screen manner; in the somatosensory interaction scene, the first message can be sent out after the user behavior image is collected, and the indication content carried by the first message is the collected user behavior image for example. Of course, the present disclosure does not limit the sending mechanism of the first message and the form of the indication content carried in the first message in the specific implementation.
In step 102, based on the indication content included in the first message, the driving data matching with the indication content is obtained.
Illustratively, the driving data includes one or more of sound driving data, expression driving data, and motion driving data. In one embodiment, the driving data may be pre-stored in a server or other associated service server, and after receiving the first message from the client, the driving data may be retrieved in the server or other associated service server according to the indication content to obtain the driving data matching with the indication content. In another embodiment, the driving data may be generated according to the indication content, for example, by inputting the indication content into a deep learning model trained in advance to predict the driving data corresponding to the indication content.
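As a rough illustration of this step, the following Python sketch shows one way the matching could be organized, first looking up pre-stored driving data and then falling back to a prediction model; the store layout, field names and the model's predict interface are hypothetical assumptions, not details from this disclosure.

```python
# Hypothetical sketch of step 102: obtain driving data matching the indication content.
# The store, model and field names are illustrative assumptions, not part of the patent.
from typing import Optional

DRIVING_DATA_STORE = {
    # indication content -> pre-stored driving data (sound / expression / motion)
    "how to wash hands": {"sound": "wash_hands.pcm", "motion": "demo_wash_hands"},
}

def acquire_driving_data(indication: str, model=None) -> Optional[dict]:
    """Return driving data for the indication content.

    First try a pre-stored entry; otherwise, if a pre-trained deep-learning
    model is available, predict the driving data from the indication content.
    """
    data = DRIVING_DATA_STORE.get(indication.strip().lower())
    if data is not None:
        return data
    if model is not None:
        return model.predict(indication)   # e.g. a sequence-to-parameter network
    return None
```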
In step 103, the driving data is utilized to control the video playing interface of the client to play the response animation of the interactive object.
In the embodiment of the present disclosure, the interactive object is obtained by rendering a two-dimensional or three-dimensional virtual model. The two-dimensional or three-dimensional virtual model can be generated by self-definition, and can also be obtained by converting an image or a video of a character. The embodiment of the present disclosure does not limit the generation manner of the virtual model.
The response animation can be generated according to the driving data, the response animation of the interactive object is played by controlling a video playing interface of the client, such as a video live interface, and the response of the interactive object to the first message from the client can be displayed.
In the embodiment of the disclosure, the first message from the client is received, the matched driving data is obtained according to the indication content contained in the first message, the driving data is utilized to control the video playing interface of the client to play the response animation of the interactive object, and the response of the interactive object is displayed, so that the interactive object can feed back the indication content of the user in time, and the user can interact with the interactive object in time.
Fig. 2 is an exemplary illustration of an interactive method applied to a live broadcast process according to at least one embodiment of the present disclosure. As shown in fig. 2, the interactive object is a three-dimensional virtual character with the image of a doctor. The live broadcast conducted by the three-dimensional virtual character acting as the anchor can be displayed in the live-video interface of the client. An operator of the client, i.e. a user, can input indication content in the live-video interface to send a first message carrying the indication content. Correspondingly, after receiving the first message from the client, the server can identify the indication content, for example "how to wash hands", obtain matching driving data according to the indication content, and control the client to display the response of the three-dimensional virtual character to the indication content "how to wash hands".
In some embodiments, the instructional content comprises textual content. The response content for the indication content may be acquired as follows: and identifying the Language intention expressed by the text content based on a Natural Language Processing (NLP) algorithm, and acquiring response content matched with the Language intention.
In some embodiments, the text content may be processed using a pre-trained neural network model for natural language processing, such as a Convolutional Neural Network (CNN), a Recurrent Neural Network (RNN), a Long Short-Term Memory network (LSTM), and so on. The text content included in the first message is input into the neural network model, and the language intention it represents is classified, thereby determining the category of language intention expressed by the text content.
Because the text content included in the first message may carry multiple layers of meaning, the intention the user actually wants to express can be identified by using a natural language processing algorithm, so that the content the user really wants to obtain can be fed back directly, improving the user's interaction experience.
In some embodiments, the response content matching the language intent and conforming to the language intent may be searched from a preset database according to the language intent, and further, the driving data for causing the interactive object to express the response content may be generated based on the response content. The database may be deployed in a server or a cloud, which is not limited in this disclosure.
In the case of recognizing the language intent, the entity may be determined by extracting a parameter related to the language intent in the text content, that is, by system word segmentation, information extraction, and the like. In the data corresponding to the language intention classification, the response text conforming to the language intention can be further determined by the entity. It will be appreciated by those skilled in the art that the above approach is merely exemplary, and that other approaches may be utilized to obtain answer text matching the linguistic intent, and that the present disclosure is not so limited.
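The following minimal Python sketch illustrates how intent classification, entity extraction and the database lookup could fit together; the placeholder classifiers and the database layout are hypothetical and merely stand in for the neural network models and preset database described above.

```python
# Illustrative sketch of obtaining a response text from the text content.
# classify_intent(), extract_entity() and RESPONSE_DATABASE are assumptions.
RESPONSE_DATABASE = {
    ("ask_howto", "wash hands"): "Wet your hands, apply soap, rub for 20 seconds, rinse and dry.",
}

def classify_intent(text: str) -> str:
    # Placeholder for a neural intent classifier (CNN/RNN/LSTM as mentioned above).
    return "ask_howto" if "how" in text.lower() else "chitchat"

def extract_entity(text: str) -> str:
    # Placeholder for word segmentation / information extraction.
    return "wash hands" if "wash hands" in text.lower() else ""

def get_response_text(text: str) -> str:
    intent = classify_intent(text)
    entity = extract_entity(text)
    return RESPONSE_DATABASE.get((intent, entity), "Sorry, I did not catch that.")

print(get_response_text("how to wash hands"))
```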
In some embodiments, voice-driven data may be generated according to the response content, and the voice-driven data may include, for example, a phoneme sequence corresponding to a response text included in the response content. By generating the voice corresponding to the phoneme sequence and controlling the client to output the voice, the interactive object can be enabled to output the voice expressing the content represented by the response text.
In some embodiments, action-driving data may be generated from the response content to cause the interaction object to make an action that expresses the response content.
In one example, where the response content includes response text, the action driving data may be generated from the response content in the following manner: and acquiring control parameters of the set action of the interactive object matched with the target data based on at least one target data contained in the response text.
The target data may be set keywords, words, sentences, and the like. Taking the keyword "washing hands" as an example, if the response text includes "washing hands", it can be determined that the response text contains the target data. Each piece of target data can be preset with a matching set action, and each set action can be realized through a control-parameter sequence; for example, one set of control parameters may be formed by the displacements of a plurality of bone points, and the model parameters of the interactive object can be adjusted by using the control-parameter sequence formed by multiple sets of control parameters, so that the interactive object performs the set action.
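A possible sketch of matching target data in the response text to the control-parameter sequences of set actions is shown below; the keyword table and the per-frame bone-point displacement format are illustrative assumptions only.

```python
# Illustrative sketch: map target data (set keywords) found in the response text
# to the control-parameter sequence of a set action. Names and values are assumptions.
SET_ACTIONS = {
    # target keyword -> sequence of control parameters (e.g. per-frame bone-point displacements)
    "wash hands": [
        {"left_wrist": (0.0, 0.1, 0.0), "right_wrist": (0.0, 0.1, 0.0)},
        {"left_wrist": (0.0, 0.2, 0.1), "right_wrist": (0.0, 0.2, -0.1)},
    ],
}

def match_set_actions(response_text: str):
    """Return the control-parameter sequences of all set actions whose
    target data appears in the response text."""
    return [params for keyword, params in SET_ACTIONS.items()
            if keyword in response_text.lower()]
```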
In the embodiment of the disclosure, the interactive object responds to the first message in the form of action, so that the user can obtain intuitive and vivid response to the first message, and the interactive experience of the user is improved.
In some embodiments, voice information corresponding to the target data may be determined; acquiring time information for outputting the voice information; determining the execution time of a set action corresponding to the target data according to the time information; and controlling the interactive object to execute the set action according to the control parameter corresponding to the target data according to the execution time.
In a case where the client is controlled to output the speech according to the phoneme sequence corresponding to the response text, time information of outputting the speech corresponding to the target data, such as a time when the speech corresponding to the target data starts to be output, a time when the output is ended, and a duration, may be determined. The execution time of the setting action corresponding to the target data can be determined according to the time information, and the interactive object is controlled to execute the setting action by the control parameter corresponding to the target data during the execution time or within a certain range of the execution time.
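As an assumed illustration of aligning the set action with the speech output, the sketch below spreads the action's control-parameter frames over the time span in which the voice for the target data is output; the timing interface is hypothetical.

```python
# Illustrative sketch: schedule a set action so that it runs while the speech
# for its target data is being output. Timing fields are assumptions.
def schedule_action(target_start: float, target_end: float, num_frames: int):
    """Given the start/end times (seconds) at which the speech for the target
    data is output, spread the action's control-parameter frames over that span."""
    duration = max(target_end - target_start, 1e-6)
    frame_times = [target_start + duration * i / max(num_frames - 1, 1)
                   for i in range(num_frames)]
    return frame_times  # one timestamp per control-parameter frame
```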
In the embodiment of the disclosure, for each target data, the duration of outputting the corresponding voice is consistent or similar to the duration of controlling the action according to the corresponding control parameter, so that the voice corresponding to the target data output by the interactive object is matched with the action performing time, thereby synchronizing and coordinating the voice and the action of the interactive object, enabling a user to generate a feeling of responding to the interactive object in a live broadcast process, and improving the experience of the user in interacting with a main broadcast in the live broadcast process.
In some embodiments, gesture-driven data may be generated from the response text to cause the client to exhibit a gesture of the interactive object that matches the speech corresponding to the response text, e.g., to make a corresponding expression and action.
In one example, the response content may further include a phoneme sequence, or, in a case that the response content includes a response text, a phoneme sequence corresponding to the response text may be extracted, and after the response content including the phoneme sequence is acquired, the control parameter of the interactive object matching the phoneme sequence may be acquired. Wherein the control parameters of the interactive object include an attitude control vector of at least one local region, and the obtaining of the control parameters of the interactive object matched with the phoneme sequence includes: performing feature coding on the phoneme sequence to obtain a first coding sequence corresponding to the phoneme sequence; acquiring a feature code corresponding to at least one phoneme according to the first coding sequence; and acquiring the attitude control vector of at least one local area of the interactive object corresponding to the feature code.
In some embodiments, the client is controlled to play the voice corresponding to the response text and display the response animation of the gesture of the interactive object matched with the voice, so that the response of the interactive object is more anthropomorphic, more vivid and natural, and the interaction experience of the user is improved.
In case the control parameters of the interaction object comprise an attitude control vector of at least one local area, the attitude control vector may be obtained in the following way.
Firstly, carrying out feature coding on a phoneme sequence corresponding to the response text to obtain a coding sequence corresponding to the phoneme sequence. Here, in order to distinguish from the coding sequence mentioned later, the coding sequence corresponding to the phoneme sequence of the text data is referred to as a first coding sequence.
And generating a sub-coding sequence corresponding to each phoneme aiming at the phonemes contained in the phoneme sequence.
In one example, whether a first phoneme corresponds to each time point is detected, wherein the first phoneme is any one of the phonemes; and setting the coding value at the time point with the first phoneme as a first numerical value, setting the coding value at the time without the first phoneme as a second numerical value, and obtaining the coding sequence corresponding to the first phoneme after assigning the coding values at the time points. For example, the coding value at the time with the first phoneme may be set to 1, and the coding value at the time without the first phoneme may be set to 0. It will be understood by those skilled in the art that the above-mentioned setting of the encoding value is only an example, and other values may be set as well, and the present disclosure does not limit this.
And then, obtaining a first coding sequence corresponding to the phoneme sequence according to the sub-coding sequences corresponding to the multiple phonemes respectively.
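A minimal numerical sketch of this feature coding is given below, assuming a toy timeline that records which phoneme is uttered at each sampled time point (the sampling and data are illustrative, not from the disclosure).

```python
# Minimal sketch of the per-phoneme feature coding described above.
# `timeline` lists which phoneme is being uttered at each sampled time point;
# the data and sampling rate are illustrative assumptions.
import numpy as np

timeline = ["j", "j", "i1", "i1", "i1", None, "ie4", "ie4"]   # phoneme per time point
phonemes = ["j", "i1", "ie4"]

# one sub-coding sequence per phoneme: 1 where that phoneme is present, else 0
first_coding_sequence = np.array(
    [[1.0 if p == q else 0.0 for q in timeline] for p in phonemes]
)
print(first_coding_sequence)
# [[1. 1. 0. 0. 0. 0. 0. 0.]
#  [0. 0. 1. 1. 1. 0. 0. 0.]
#  [0. 0. 0. 0. 0. 0. 1. 1.]]
```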
In one example, for the sub-coding sequence corresponding to the first phoneme, a gaussian filter may be used to perform a gaussian convolution operation on the time continuous values of the first phoneme to filter the matrix corresponding to the feature code, so as to smooth the transition action of the mouth region at each phoneme transition.
Fig. 3 illustrates a flowchart of a method for obtaining an attitude control vector according to at least one embodiment of the present disclosure. As shown in fig. 3, the phoneme sequence 310 includes phonemes j, i1, j, and ie4 (for simplicity, only a part of the phonemes are shown), and sub-coding sequences 321, 322, and 323 corresponding to the phonemes j, i1, and ie4 are obtained for each phoneme. In each sub-coding sequence, the coding value corresponding to the time with the phoneme is a first numerical value (for example, 1), and the coding value corresponding to the time without the phoneme is a second numerical value (for example, 0). Taking the sub-coding sequence 321 as an example, at the time of phoneme j in the phoneme sequence 310, the value of the sub-coding sequence 321 is the first numerical value, and at the time of no phoneme j, the value of the sub-coding sequence 321 is the second numerical value. All the sub-coding sequences constitute a first coding sequence 320.
And then, acquiring a feature code corresponding to at least one phoneme according to the first coding sequence.
The feature information of the sub-coding sequences 321, 322, 323 can be obtained according to the coding values of the sub-coding sequences 321, 322, 323 corresponding to the phonemes j, i1, ie4, respectively, and the durations of the phonemes corresponding to the three sub-coding sequences, i.e., the duration of j in the sub-coding sequence 321, the duration of i1 in the sub-coding sequence 322, and the duration of ie4 in the sub-coding sequence 323.
In one example, a Gaussian filter may be used to perform a Gaussian convolution operation on the temporally continuous values of phonemes j, i1 and ie4 in the respective sub-coding sequences 321, 322, 323 to smooth the feature coding, resulting in the smoothed first coding sequence 330. That is, the Gaussian filter performs a Gaussian convolution on the temporally continuous 0/1 values of each phoneme, so that the transition of the coded values from the second value to the first value, or from the first value to the second value, becomes smooth in each coding sequence. For example, besides 0 and 1, the coding sequence then also contains intermediate values such as 0.2 and 0.3, and the posture control vectors obtained from these intermediate values make the transition movements and expression changes of the interactive character more gradual and natural, thereby improving the interactive experience of the target object.
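Continuing that sketch, the smoothing can be illustrated with a 1-D Gaussian filter applied along the time axis of each sub-coding sequence; the use of SciPy and the value sigma=1.0 are assumptions for illustration.

```python
# Illustrative sketch: smooth each sub-coding sequence in time with a Gaussian
# filter so that the 0 -> 1 and 1 -> 0 transitions become gradual.
import numpy as np
from scipy.ndimage import gaussian_filter1d

first_coding_sequence = np.array([[1., 1., 0., 0., 0., 0., 0., 0.],
                                  [0., 0., 1., 1., 1., 0., 0., 0.],
                                  [0., 0., 0., 0., 0., 0., 1., 1.]])
smoothed = gaussian_filter1d(first_coding_sequence, sigma=1.0, axis=1)  # along the time axis
# values now include intermediate states (e.g. ~0.2, ~0.3) around each transition
```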
In some embodiments, the feature codes corresponding to at least one phoneme may be obtained by performing a sliding window on the first coding sequence. Wherein the first coding sequence may be a coding sequence after a gaussian convolution operation.
And sliding the coding sequence by a time window with a set length and a set step length, taking the feature codes in the time window as the feature codes of the corresponding at least one phoneme, and obtaining a second coding sequence according to the obtained plurality of feature codes after the sliding is finished. As shown in fig. 3, by sliding a time window with a set length on the first coded sequence 320 or the smoothed first coded sequence 330, feature code 1, feature code 2, feature code 3 are obtained, and so on, after traversing the first coded sequence, feature codes 1, 2, 3, …, M are obtained, and thus the second coded sequence 340 is obtained. Wherein, M is a positive integer, and the value is determined according to the length of the first coding sequence, the length of the time window and the step length of sliding the time window.
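The sliding time window can be sketched as below, where the window length, the step size and the stand-in array for the smoothed first coding sequence are all illustrative assumptions.

```python
# Illustrative sketch of sliding a time window over the (smoothed) first coding
# sequence to obtain the feature codes that form the second coding sequence.
import numpy as np

smoothed = np.random.rand(3, 8)          # stand-in for the smoothed first coding sequence
window_len, step = 3, 1                  # set length and step of the time window (assumed)
feature_codes = [smoothed[:, s:s + window_len]
                 for s in range(0, smoothed.shape[1] - window_len + 1, step)]
second_coding_sequence = np.stack(feature_codes)
print(second_coding_sequence.shape)      # (M, num_phonemes, window_len)
```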
From the feature encodings 1, 2, 3, …, M, the corresponding attitude control vectors 1, 2, 3, …, M, respectively, may be obtained, thereby obtaining a sequence of attitude control vectors 350.
The sequence of pose control vectors 350 is temporally aligned with the second encoded sequence 340, and each feature vector in the sequence of pose control vectors 350 is also obtained from at least one phoneme in the sequence of phonemes since each encoding feature in the second encoded sequence is obtained from at least one phoneme in the sequence of phonemes. And when the phoneme sequence corresponding to the text data is played, the interactive object is driven to make an action according to the sequence of the attitude control vector, namely the interactive object is driven to make a sound corresponding to the text content, and simultaneously the action synchronous with the sound is made, so that the target object can have the feeling that the interactive object is speaking, and the interactive experience of the target object is improved.
Assuming that the encoding feature starts to be output at the setting time of the first time window, the gesture control vector before the setting time may be set as a default value, that is, the interactive object is made to perform a default action when the phoneme sequence starts to be played, and the interactive object is driven to perform an action by using the sequence of the gesture control vector obtained according to the first encoding sequence after the setting time. Taking fig. 3 as an example, the output of the encoding feature 1 is started at time t0, and the default posture control vector is assigned before time t 0.
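One way to realize the mapping from feature codes to posture control vectors, with a default vector before the set time t0, might look like the sketch below; `pose_model` stands in for whatever trained mapping produces a control vector for at least one local region and is purely hypothetical.

```python
# Hypothetical sketch: convert the second coding sequence into a sequence of
# posture control vectors, using a default vector before the set time t0.
def build_pose_control_sequence(feature_codes, pose_model, default_vector, frames_before_t0=1):
    """feature_codes: iterable of feature codes (e.g. the second coding sequence).
    pose_model: callable mapping a feature code to a posture control vector."""
    vectors = [default_vector] * frames_before_t0      # default action before time t0
    vectors += [pose_model(code) for code in feature_codes]
    return vectors
```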
In some embodiments, in the case that the time interval between phonemes in the phoneme sequence is greater than a set threshold, the interactive object is driven to make an action according to a set posture control vector of the local region. That is, when the interactive character pauses for a long time, the interactive object is driven to make a set action. For example, during a long pause in the output speech, the interactive character can be made to show a slight smile or sway its body gently, so that it does not stand stiffly and expressionless throughout the pause; this keeps the speaking process of the interactive object natural and smooth and improves the target object's sense of interaction.
In some embodiments, for target data contained in the response text, obtaining control parameters of a set action of an interactive object matched with the target data to drive the interactive object to execute the set action; for response contents other than the target data, the control parameters of the interactive object can be acquired according to the phonemes corresponding to the response contents, so that the interactive object is driven to make a gesture, such as an expression and an action, which is matched with the pronunciation of the response contents.
Taking the live broadcast process shown in fig. 2 as an example, in the case that the received first message contains the text content "how to wash hands", it can be recognized through a natural language processing algorithm that the language intention of the user is "consult how to wash hands". By searching a preset database, content that answers how to wash hands can be obtained and used as the response text. By generating action driving data, voice driving data and posture driving data according to the response text, the interactive object can answer the question "how to wash hands" through voice, make expressions and actions matching the pronunciation, and demonstrate how to wash hands through body movements.
In some embodiments, indication information including the response text may also be sent to the client, so that the client presents the response text based on the indication information.
For example, for the response text responding to the question of "how to wash hands", the indication message containing the response text may be sent to the client to display the indication message in the form of text on the client, so that the user can more accurately receive the information conveyed by the interactive object.
In some embodiments, a two-dimensional or three-dimensional virtual model corresponding to the interactive object may be stored at the client. In this case, the driving data of the interactive object may be sent to the client, so that the client generates a response animation according to the driving data; and controlling the client to play the response animation. For example, the client may be controlled to adjust parameters of a two-dimensional or three-dimensional virtual model of the interacted object according to control parameters contained in the driving data; and based on the adjusted two-dimensional or three-dimensional virtual model parameters, generating a response animation of the interactive object by using a rendering engine, and playing the response animation to respond to the first message.
In the case that the data volume of the two-dimensional or three-dimensional virtual model of the interactive object is small and rendering does not burden the client's performance, the driving data can be sent to the client so that the client generates the response animation from the driving data, allowing the responding picture of the interactive object to be displayed conveniently and flexibly.
In some embodiments, the two-dimensional or three-dimensional virtual model corresponding to the interactive object is stored at a server side or a cloud side. In this case, two-dimensional or three-dimensional virtual model parameters of the interactive object may be adjusted based on the driving data; and generating a response animation of the interactive object by using a rendering engine based on the adjusted two-dimensional or three-dimensional virtual model parameters, and sending the response animation to the client, wherein the action or expression of the interactive object is displayed in the response animation. The response of the interactive object is realized by sending the response animation to the client, so that the stagnation caused by rendering of the client can be avoided, the high-quality response animation can be displayed at the client, and the interaction experience of a user is improved.
Fig. 4 illustrates a flow diagram of another interaction method in accordance with at least one embodiment of the present disclosure. The interaction method can be applied to the client. The method comprises steps 401-402.
In step 401, in response to a user input operation from a client, a first message including an indication content is transmitted to a server.
Illustratively, the user input operation includes a text input operation, a voice input operation, an action trigger operation, a key trigger operation, and the like, and in response to the user input operation, a first message is sent to the server, and the indication content carried in the first message includes but is not limited to one or more of text, voice, image (e.g., expression, action image), video, and the like. In a live video scene, the client may be a client supporting a live video watching function, the first message may be sent out after acquiring text content input by a user in a live video interface, the indication content carried by the first message is, for example, the input text content, and the indication content may be displayed in the live video interface in a bullet screen manner; in the somatosensory interaction scene, the first message can be sent out after the user behavior image is collected, and the indication content carried by the first message is the collected user behavior image for example. Of course, the present disclosure does not limit the sending mechanism of the first message and the form of the indication content carried in the first message in the specific implementation.
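As an illustration only, a first message carrying text indication content in a live-video scene might be serialized as in the sketch below; the field names and the transport (e.g. a WebSocket connection) are hypothetical and not specified by this disclosure.

```python
# Hypothetical example of a first message sent by the client after a user
# inputs text in the live-video interface; field names are illustrative only.
import json

first_message = {
    "client_id": "viewer-001",
    "content_type": "text",              # could also be voice / image / video
    "indication_content": "how to wash hands",
}
payload = json.dumps(first_message)      # sent to the server, e.g. over a WebSocket
print(payload)
```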
In step 402, a response animation of the interactive object is played in the video playing interface of the client based on a second message with which the server responds to the first message.
The second message is generated by the server for the indication content contained in the first message, and is used for enabling the client to display the response of the interaction object to the indication content.
In the embodiment of the present disclosure, the interactive object is obtained by rendering a two-dimensional or three-dimensional virtual model. The two-dimensional or three-dimensional virtual model can be generated by self-definition, and can also be obtained by converting an image or a video of a character. The embodiment of the present disclosure does not limit the generation manner of the virtual model.
In the embodiment of the disclosure, a first message including indication content is sent to a server according to user input operation, and the response of an interactive object to the indication content is displayed in a client based on a second message responded by the server to the first message, so that the interactive object can timely feed back the indication content of a user, and timely interaction with the user is realized.
In some embodiments, the indication content comprises text content; the method further comprises: displaying the text content in the client, and/or playing an audio file corresponding to the text content. That is, the text content input by the user can be displayed at the client, and/or an audio file corresponding to the text content can be played at the client to output the voice corresponding to the text content.
In some embodiments, said presenting said textual content in said client comprises: generating bullet screen information of the text content; and displaying the bullet screen information in a video live broadcast interface of the client.
Under a video live broadcast scene, corresponding barrage information can be generated for text content input by a user, and the barrage information is displayed on a video live broadcast interface of a client. Taking fig. 2 as an example, in a case that the user inputs "how to wash hands" in the live broadcast interactive interface of the client, the live broadcast interface of the video may show the bullet screen information "how to wash hands" corresponding to the text content.
In some embodiments, the second message includes response text for the indication content; the method further comprises the following steps: and displaying the response text in a video playing interface of the client, and/or playing an audio file corresponding to the response text.
The response text of the indication content can be obtained by identifying the language intention expressed by the text content and searching the response text which is matched with the language intention and accords with the language intention from a preset database. The specific method is described in the above embodiments, and is not described herein again.
Taking a live video scene as an example, a response text replied to the bullet screen information of the user can be displayed in a live video interface in the form of bullet screen information; and the audio file corresponding to the response text can be played on a video live interface, namely, the voice corresponding to the response text is output, so that the bullet screen information of the user can be accurately and visually replied, and the interaction experience of the user is improved.
In some embodiments, the second message includes control parameters of the interactive object matching with a phoneme sequence corresponding to the response text, and/or control parameters of a setting action of the interactive object matching with at least one target data included in the response text; the playing the response animation of the interactive object in the video playing interface of the client based on the second message responded by the server to the first message comprises the following steps: adjusting two-dimensional or three-dimensional virtual model parameters of the interactive object based on the driving data; and generating a response animation of the interactive object by using a rendering engine based on the adjusted two-dimensional or three-dimensional virtual model parameters, and displaying the response animation in a video playing interface of the client. For a specific method for generating the control parameter of the interactive object matched with the phoneme sequence corresponding to the response text and generating the control parameter of the setting action of the interactive object matched with the at least one target datum contained in the response text, reference is made to the above-mentioned embodiment, and details are not repeated here.
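A client-side sketch of this case is given below: the control parameters carried in the second message are applied to the local two-dimensional or three-dimensional virtual model and each adjusted frame is rendered into the video playing interface. The model, engine and message fields are hypothetical placeholders, not APIs defined by this disclosure.

```python
# Hypothetical client-side sketch: drive the local virtual model with the
# control parameters from the second message and render the response animation.
def play_response_animation(second_message, model, engine, video_view):
    """model/engine/video_view stand in for the client's virtual model,
    rendering engine and video playing interface respectively."""
    for frame_params in second_message["driving_data"]:
        model.apply_control_parameters(frame_params)   # adjust 2D/3D model parameters
        frame = engine.render(model)                   # render one frame of the animation
        video_view.show(frame)                         # display in the video playing interface
```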
When the data volume of the two-dimensional or three-dimensional virtual model of the interactive object is small and does not strain the client's performance, having the client acquire the driving data and generate the response animation from it allows the responding image of the interactive object to be displayed conveniently and flexibly.
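A hedged client-side sketch of this path follows: the driving data (phoneme-matched control parameters plus set-action control parameters) is applied to the virtual model's parameters step by step, and a rendering call produces the response animation frame by frame. The field names, the `VirtualModel` class, and the `render_frame` stub are illustrative assumptions, not the rendering engine itself.

```python
from typing import Dict, List

# Hypothetical driving data: per-time-step control parameters matched to the
# phoneme sequence of the response text, plus control parameters for a set action.
driving_data = {
    "phoneme_controls": [
        {"mouth_open": 0.2, "mouth_wide": 0.1},
        {"mouth_open": 0.6, "mouth_wide": 0.3},
    ],
    "action_controls": {"gesture": "wave", "intensity": 0.8},
}

class VirtualModel:
    """Stand-in for the 2D/3D virtual model of the interactive object."""
    def __init__(self) -> None:
        self.parameters: Dict[str, float] = {}

    def apply(self, controls: Dict[str, float]) -> None:
        self.parameters.update(controls)

def render_frame(model: VirtualModel) -> dict:
    """Stand-in for one rendering-engine call; returns a 'frame' description."""
    return dict(model.parameters)

def generate_response_animation(data: dict) -> List[dict]:
    model = VirtualModel()
    frames = []
    action = data["action_controls"]
    for step_controls in data["phoneme_controls"]:
        model.apply(step_controls)                                   # adjust model parameters
        model.apply({"gesture_" + action["gesture"]: action["intensity"]})
        frames.append(render_frame(model))                           # frame for playback
    return frames

print(generate_response_animation(driving_data))
```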
In some embodiments, the second message further comprises the response animation made by the interactive object to the indication content; in this case, playing the response animation of the interactive object in the video playing interface of the client based on the second message with which the server responds to the first message comprises: displaying the received response animation in the video playing interface of the client.
In some embodiments, the two-dimensional or three-dimensional virtual model corresponding to the interactive object is stored on the server side or the cloud side. In this case, the response animation may be generated on the server side or the cloud side. For the specific way of generating the response animation, refer to the above embodiments; it is not repeated here.
By sending the finished response animation to the client, the interactive object's response is realized without client-side rendering, which avoids stutter caused by rendering on the client, allows a high-quality response animation to be displayed at the client, and improves the user's interaction experience.
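For contrast with the client-side path above, the following minimal sketch shows the server-/cloud-side path, in which the response animation is rendered remotely and the second message carries the finished animation so the client only plays it back. The helper functions are stubs standing in for the actual driving-data computation and rendering engine; all names are assumptions.

```python
from typing import Dict, List

def compute_driving_data(response_text: str) -> List[Dict[str, float]]:
    """Stub: derive per-frame control parameters from the response text."""
    return [{"mouth_open": 0.3 * (i % 3)} for i, _ in enumerate(response_text.split())]

def render_on_server(driving: List[Dict[str, float]]) -> List[bytes]:
    """Stub: the rendering engine on the server/cloud turns control parameters
    into encoded frames; here each 'frame' is just a placeholder byte string."""
    return [f"frame:{params}".encode() for params in driving]

def build_second_message(response_text: str) -> dict:
    """Server-side path: render the response animation remotely and return a
    second message that the client only needs to play back."""
    driving = compute_driving_data(response_text)
    return {
        "response_text": response_text,
        "response_animation": render_on_server(driving),
    }

print(build_second_message("Wet your hands and apply soap."))
```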
By way of example, the following are some embodiments of the present disclosure as applied to a live video platform:
In some embodiments, the first message received from the client is the user's bullet screen text transmitted by the live platform.
In some embodiments, the intention of the bullet screen is analyzed with a natural language processing algorithm to obtain a corresponding answer, and the content of the answer is then broadcast through the interactive object. An action corresponding to the content of the answer can also be displayed through the interactive object.
In some embodiments, a natural language processing capability of the client is integrated directly: natural language processing is performed on the indication content included in the first message to obtain a response text matching the language intention of the indication content, and the output text corresponding to that response text is provided directly to the interactive object for playing.
In some embodiments, the interactive object may imitate what the user says. For example, for speech input by the user through the client, the speech is converted into text, the user's voice characteristics are extracted from the same speech, and the speech corresponding to the text is then output based on those characteristics, so that the interactive object imitates the user's spoken content.
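A rough sketch of this imitation flow, under the assumption that speech recognition, voice-feature estimation, and speech synthesis are available as black boxes, might look as follows. The placeholder functions below are not a real speech API; they only illustrate the order of operations.

```python
from dataclasses import dataclass

@dataclass
class VoiceProfile:
    pitch_hz: float
    speaking_rate: float
    timbre: str

def transcribe(audio: bytes) -> str:
    """Placeholder automatic speech recognition."""
    return "how to wash hands"

def estimate_voice_profile(audio: bytes) -> VoiceProfile:
    """Placeholder voice-characteristic extraction from the same audio."""
    return VoiceProfile(pitch_hz=180.0, speaking_rate=1.1, timbre="bright")

def synthesize(text: str, profile: VoiceProfile) -> bytes:
    """Placeholder TTS that would reuse the user's voice characteristics."""
    return f"<audio text={text!r} pitch={profile.pitch_hz}>".encode()

user_audio = b"..."                           # raw audio captured by the client
text = transcribe(user_audio)                 # speech -> text
profile = estimate_voice_profile(user_audio)  # voice characteristics of the user
reply_audio = synthesize(text, profile)       # interactive object "imitates" the user
print(reply_audio)
```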
In some embodiments, the interactive object may also drive a page display according to the content returned by the natural language processing, showing UI content according to pre-designed display content and interaction modes, so that the presentation of the response content is more prominent and better attracts the user's attention.
In these embodiments, real-time interaction during live broadcasting is realized: during a live broadcast the user can interact with the interactive object in real time and obtain feedback. The method can also support continuous live broadcasting and automatic production of video content, constituting a new mode of live broadcasting.
Illustratively, the interactive object may be represented as a digital person in 3D form. The digital person combines AI-driven animation generation with natural language understanding and can converse with the user in a voice as natural as a real person's. According to the answer content, the digital person can generate corresponding mouth shapes, facial expressions, gaze, and whole-body actions, and finally output high-quality, audio-visually synchronized speech and two-dimensional or three-dimensional animation content, naturally presenting the complete digital person image to the user.
In some embodiments, content service libraries in different knowledge fields can be connected quickly, so the method can be applied efficiently across more industries; digital person images in various styles, such as hyper-realistic and cartoon styles, can be provided according to different scene requirements, and intelligent interaction with users through AI technologies such as face recognition and gesture recognition is supported. For example, a hyper-realistic digital person can serve as an intelligent front desk for banks, business halls, and service halls, making genuine and effective contact with customers and thereby improving service quality and customer satisfaction.
In some embodiments, a cartoon-style digital person can be applied to scenarios led by playful interaction, for example as an intelligent guide in offline shopping malls and supermarkets, or as an intelligent coach or virtual teacher, in order to guide customers, arouse interest, and strengthen teaching effects.
At least one embodiment of the present disclosure also provides an interactive apparatus, as shown in fig. 5, the apparatus including: a receiving unit 501, configured to receive a first message from a client; an obtaining unit 502, configured to obtain, based on the indication content included in the first message, driving data matching the indication content; and a driving unit 503, configured to control, by using the driving data, the video playing interface of the client to play the response animation of the interactive object, where the interactive object is obtained by rendering a two-dimensional or three-dimensional virtual model.
In some embodiments, obtaining the driving data matching the indication content based on the indication content included in the first message includes: acquiring response content for the indication content, wherein the response content comprises a response text; and acquiring, based on at least one piece of target data contained in the response text, control parameters of a set action of the interactive object matching the target data.
In some embodiments, obtaining the driving data matching the indication content based on the indication content included in the first message includes: acquiring response content for the indication content, wherein the response content comprises a phoneme sequence; and acquiring control parameters of the interactive object matching the phoneme sequence.
In some embodiments, the control parameters of the interactive object include an attitude control vector of at least one local region, and obtaining the control parameters of the interactive object matching the phoneme sequence includes: performing feature coding on the phoneme sequence to obtain a first coding sequence corresponding to the phoneme sequence; acquiring, from the first coding sequence, a feature code corresponding to at least one phoneme; and acquiring the attitude control vector of at least one local region of the interactive object corresponding to the feature code.
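By way of illustration, the following numeric sketch walks through this phoneme-sequence pipeline: feature-coding the phoneme sequence into a first coding sequence, taking a feature code around one phoneme, and mapping it to an attitude control vector for one local region (here, three mouth-shape parameters). The one-hot coding, window size, and random projection are assumptions standing in for whatever trained encoder and mapping network an actual implementation would use.

```python
import numpy as np

PHONEMES = ["sil", "a", "o", "u", "m", "sh"]
PHONEME_INDEX = {p: i for i, p in enumerate(PHONEMES)}

def encode_phoneme_sequence(seq):
    """Feature-code the phoneme sequence into a first coding sequence
    (here: a one-hot matrix, one row per phoneme in time order)."""
    codes = np.zeros((len(seq), len(PHONEMES)), dtype=np.float32)
    for t, p in enumerate(seq):
        codes[t, PHONEME_INDEX[p]] = 1.0
    return codes

def feature_code_for(coding_sequence, t, window=1):
    """Take the feature code around phoneme t from the first coding sequence."""
    lo, hi = max(0, t - window), min(len(coding_sequence), t + window + 1)
    return coding_sequence[lo:hi].mean(axis=0)

rng = np.random.default_rng(0)
MOUTH_PROJECTION = rng.normal(size=(len(PHONEMES), 3))  # stand-in for a trained mapping

def attitude_control_vector(feature_code):
    """Map the feature code to a pose control vector for one local region."""
    return feature_code @ MOUTH_PROJECTION

seq = ["sil", "a", "m", "o", "sil"]
coding = encode_phoneme_sequence(seq)
print(attitude_control_vector(feature_code_for(coding, t=2)))
```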
In some embodiments, the indication content comprises text content; acquiring the response content for the indication content includes: identifying, based on a natural language processing algorithm, the language intention expressed by the text content, and acquiring response content matching that language intention.
In some embodiments, the method further comprises: sending, to the client, indication information comprising the response content for the indication content, so that the client displays the response content based on the indication information.
In some embodiments, controlling, by using the driving data, the client to play the response animation of the interactive object in the video playing interface includes: sending the driving data of the interactive object to the client so that the client generates the response animation according to the driving data, and controlling the client to play the response animation in the video playing interface; or adjusting the two-dimensional or three-dimensional virtual model parameters of the interactive object based on the driving data, generating the response animation of the interactive object with a rendering engine based on the adjusted parameters, and sending the response animation to the client.
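A compact sketch of these two delivery paths of the driving unit (forwarding the driving data for client-side rendering versus rendering on the server and shipping the finished animation) is given below; the flag and field names are illustrative assumptions, not the patent's wording.

```python
from typing import Dict, List

def render_locally(driving_data: Dict) -> List[dict]:
    """Stub for the server-side rendering engine."""
    return [{"frame": i, "params": driving_data} for i in range(3)]

def drive_interactive_object(driving_data: Dict, client_can_render: bool) -> Dict:
    """Either forward the driving data so the client renders the response
    animation itself, or render on the server and ship the finished animation."""
    if client_can_render:
        return {"type": "driving_data", "payload": driving_data}
    return {"type": "response_animation", "payload": render_locally(driving_data)}

print(drive_interactive_object({"mouth_open": 0.5}, client_can_render=True))
```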
At least one embodiment of the present disclosure also provides another interaction apparatus, as shown in fig. 6, the apparatus including: a sending unit 601, configured to send a first message including indication content to a server in response to a user input operation from a client; and a playing unit 602, configured to play, in a video playing interface of the client, a response animation of the interactive object based on a second message with which the server responds to the first message, where the interactive object is obtained by rendering through a two-dimensional or three-dimensional virtual model.
In some embodiments, the indication content comprises text content; the apparatus further comprises a first display unit, configured to display the text content in the client and/or play an audio file corresponding to the text content.
In some embodiments, when displaying the text content in the client, the first display unit is specifically configured to: generate bullet screen information of the text content; and display the bullet screen information in a video playing interface of the client.
In some embodiments, the second message includes a response text for the indication content; the apparatus further comprises a second display unit, configured to display the response text in a video playing interface of the client and/or play an audio file corresponding to the response text.
In some embodiments, the second message includes driving data of the interactive object, and the playing unit is specifically configured to: adjust the two-dimensional or three-dimensional virtual model parameters of the interactive object based on the driving data; and generate the response animation of the interactive object with a rendering engine based on the adjusted parameters and display the response animation in the video playing interface of the client; wherein the driving data includes control parameters of the interactive object matching the phoneme sequence corresponding to the response text for the indication content, and/or control parameters of a set action of the interactive object matching at least one piece of target data contained in the response text.
In some embodiments, the second message includes the response animation made by the interactive object to the indication content.
At least one embodiment of the present disclosure further provides an electronic device, as shown in fig. 7, including a memory for storing computer instructions executable on a processor, and the processor for implementing the interaction method according to any one of the embodiments of the present disclosure when executing the computer instructions.
At least one embodiment of the present disclosure also provides a computer-readable storage medium on which a computer program is stored, which, when executed by a processor, implements the interaction method according to any one of the embodiments of the present disclosure.
As will be appreciated by one skilled in the art, one or more embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, one or more embodiments of the present description may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present description may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The embodiments in the present specification are described in a progressive manner, and the same and similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the data processing apparatus embodiment, since it is substantially similar to the method embodiment, the description is relatively simple, and for the relevant points, reference may be made to part of the description of the method embodiment.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the acts or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Embodiments of the subject matter and the functional operations described in this specification can be implemented in: digital electronic circuitry, tangibly embodied computer software or firmware, computer hardware including the structures disclosed in this specification and their structural equivalents, or a combination of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions, encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, data processing apparatus. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode and transmit information to suitable receiver apparatus for execution by the data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Computers suitable for executing computer programs include, for example, general and/or special purpose microprocessors, or any other type of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory and/or a random access memory. The basic components of a computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer does not necessarily have such a device. Moreover, a computer may be embedded in another device, e.g., a mobile telephone, a Personal Digital Assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name a few.
Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., an internal hard disk or a removable disk), magneto-optical disks, and CD ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. In other instances, features described in connection with one embodiment may be implemented as discrete components or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. Further, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing may be advantageous.
The above description is only for the purpose of illustrating the preferred embodiments of the one or more embodiments of the present disclosure, and is not intended to limit the scope of the one or more embodiments of the present disclosure, and any modifications, equivalent substitutions, improvements, etc. made within the spirit and principle of the one or more embodiments of the present disclosure should be included in the scope of the one or more embodiments of the present disclosure.

Claims (19)

1. An interactive method, characterized in that the method comprises:
receiving a first message from a client;
acquiring driving data matched with the indication content based on the indication content included in the first message;
and controlling a video playing interface of the client to play a response animation of the interactive object by using the driving data, wherein the interactive object is obtained by rendering a two-dimensional or three-dimensional virtual model.
2. The method according to claim 1, wherein the obtaining of the driving data matching with the indication content based on the indication content included in the first message comprises:
acquiring response content aiming at the indication content, wherein the response content comprises response text;
and acquiring control parameters of the set action of the interactive object matched with the target data based on at least one target data contained in the response text.
3. The method according to claim 1, wherein the obtaining of the driving data matching with the indication content based on the indication content included in the first message comprises:
acquiring response content aiming at the indication content, wherein the response content comprises a phoneme sequence;
and acquiring the control parameters of the interactive object matched with the phoneme sequence.
4. The method of claim 3, wherein the control parameters of the interactive object comprise an attitude control vector of at least one local region, and wherein the obtaining the control parameters of the interactive object matching the phoneme sequence comprises:
performing feature coding on the phoneme sequence to obtain a first coding sequence corresponding to the phoneme sequence;
acquiring a feature code corresponding to at least one phoneme according to the first coding sequence;
and acquiring the attitude control vector of at least one local area of the interactive object corresponding to the feature code.
5. The method according to any one of claims 2 to 4, wherein the indication content comprises text content;
the obtaining of the response content for the indication content includes:
and identifying the language intention expressed by the text content based on a natural language processing algorithm, and acquiring response content matched with the language intention.
6. The method of any of claims 1 to 5, further comprising:
and sending indication information comprising response content aiming at the indication content to the client so as to enable the client to show the response content based on the indication information.
7. The method according to any one of claims 1 to 6, wherein the controlling the client to play the response animation of the interactive object in the video playing interface by using the driving data comprises:
sending the driving data of the interactive object to the client so that the client generates a response animation according to the driving data; controlling the client to play the response animation in a video playing interface;
or adjusting two-dimensional or three-dimensional virtual model parameters of the interactive object based on the driving data; and generating a response animation of the interactive object by using a rendering engine based on the adjusted two-dimensional or three-dimensional virtual model parameters, and sending the response animation to the client.
8. An interactive method, characterized in that the method comprises:
in response to a user input operation from a client, sending a first message including indication content to a server;
and playing the response animation of the interactive object in a video playing interface of the client based on a second message responded by the server to the first message, wherein the interactive object is obtained by rendering through a two-dimensional or three-dimensional virtual model.
9. The method of claim 8, wherein the indication comprises textual content;
the method further comprises the following steps: and displaying the text content in the client, and/or playing an audio file corresponding to the text content.
10. The method of claim 9, wherein said presenting the text content in the client comprises: generating bullet screen information of the text content; and displaying the bullet screen information in a video playing interface of the client.
11. The method according to any one of claims 8 to 10, wherein the second message includes a response text for the indication content;
the method further comprises the following steps: and displaying the response text in a video playing interface of the client, and/or playing an audio file corresponding to the response text.
12. The method according to any one of claims 8 to 10, wherein the second message includes driving data of the interactive object;
the playing the response animation of the interactive object in the video playing interface of the client based on the second message responded by the server to the first message comprises the following steps:
adjusting two-dimensional or three-dimensional virtual model parameters of the interactive object based on the driving data;
based on the adjusted two-dimensional or three-dimensional virtual model parameters, generating a response animation of the interactive object by using a rendering engine, and displaying the response animation in a video playing interface of the client;
wherein the driving data includes control parameters of the interactive object matching with a phoneme sequence corresponding to a response text for the indication content, and/or control parameters of a setting action of the interactive object matching with at least one target data included in the response text.
13. The method of any of claims 8 to 10, wherein the second message comprises a response animation of the interactive object to the indication.
14. An interactive apparatus, characterized in that the apparatus comprises:
a receiving unit, configured to receive a first message from a client;
an acquisition unit, configured to acquire driving data matching the indication content based on the indication content included in the first message;
and the driving unit is used for controlling a video playing interface of the client to play the response animation of the interactive object by using the driving data, wherein the interactive object is obtained by rendering a two-dimensional or three-dimensional virtual model.
15. An interactive apparatus, characterized in that the apparatus comprises:
a sending unit, configured to send a first message including indication content to a server in response to a user input operation from a client;
and the playing unit is used for playing the response animation of the interactive object in the video playing interface of the client based on the second message responded by the server to the first message, wherein the interactive object is obtained by rendering through a two-dimensional or three-dimensional virtual model.
16. An electronic device, comprising a memory for storing computer instructions executable on a processor, the processor being configured to implement the method of any one of claims 1 to 7 when executing the computer instructions.
17. An electronic device, characterized in that the device comprises a memory for storing computer instructions executable on a processor, the processor being adapted to carry out the method of any one of claims 8 to 13 when executing the computer instructions.
18. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1 to 7.
19. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 8 to 13.
CN202010362562.6A 2020-02-27 2020-04-30 Interaction method, device, equipment and storage medium Pending CN111541908A (en)

Priority Applications (5)

Application Number Priority Date Filing Date Title
PCT/CN2020/130184 WO2021169431A1 (en) 2020-02-27 2020-11-19 Interaction method and apparatus, and electronic device and storage medium
JP2021549324A JP2022524944A (en) 2020-02-27 2020-11-19 Interaction methods, devices, electronic devices and storage media
SG11202109192Q SG11202109192QA (en) 2020-02-27 2020-11-19 Interaction method and apparatus, electronic device and storage medium
KR1020217023002A KR20210110620A (en) 2020-02-27 2020-11-19 Interaction methods, devices, electronic devices and storage media
TW109145727A TWI778477B (en) 2020-02-27 2020-12-23 Interaction methods, apparatuses thereof, electronic devices and computer readable storage media

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010125701 2020-02-27
CN2020101257013 2020-02-27

Publications (1)

Publication Number Publication Date
CN111541908A true CN111541908A (en) 2020-08-14

Family

ID=71980272

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010362562.6A Pending CN111541908A (en) 2020-02-27 2020-04-30 Interaction method, device, equipment and storage medium

Country Status (6)

Country Link
JP (1) JP2022524944A (en)
KR (1) KR20210110620A (en)
CN (1) CN111541908A (en)
SG (1) SG11202109192QA (en)
TW (1) TWI778477B (en)
WO (1) WO2021169431A1 (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111970539A (en) * 2020-08-19 2020-11-20 赵蒙 Data coding method based on deep learning and cloud computing service and big data platform
CN112633110A (en) * 2020-12-16 2021-04-09 中国联合网络通信集团有限公司 Data processing method and device
WO2021169431A1 (en) * 2020-02-27 2021-09-02 北京市商汤科技开发有限公司 Interaction method and apparatus, and electronic device and storage medium
WO2021196643A1 (en) * 2020-03-31 2021-10-07 北京市商汤科技开发有限公司 Method and apparatus for driving interactive object, device, and storage medium
CN113766253A (en) * 2021-01-04 2021-12-07 北京沃东天骏信息技术有限公司 Live broadcast method, device, equipment and storage medium based on virtual anchor
CN113810729A (en) * 2021-09-16 2021-12-17 中国平安人寿保险股份有限公司 Live broadcast atmosphere special effect matching method, device, equipment and medium
CN113849117A (en) * 2021-10-18 2021-12-28 深圳追一科技有限公司 Interaction method, interaction device, computer equipment and computer-readable storage medium
CN113867538A (en) * 2021-10-18 2021-12-31 深圳追一科技有限公司 Interaction method, interaction device, computer equipment and computer-readable storage medium
CN115086693A (en) * 2022-05-07 2022-09-20 北京达佳互联信息技术有限公司 Virtual object interaction method and device, electronic equipment and storage medium
CN116168134A (en) * 2022-12-28 2023-05-26 北京百度网讯科技有限公司 Digital person control method, digital person control device, electronic equipment and storage medium
WO2024114162A1 (en) * 2022-11-29 2024-06-06 腾讯科技(深圳)有限公司 Animation processing method and apparatus, computer device, storage medium, and program product

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230127495A1 (en) * 2021-10-22 2023-04-27 Lemon Inc. System and method for animated emoji recording and playback
CN114241132B (en) * 2021-12-16 2023-07-21 北京字跳网络技术有限公司 Scene content display control method and device, computer equipment and storage medium
CN114363685A (en) * 2021-12-20 2022-04-15 咪咕文化科技有限公司 Video interaction method and device, computing equipment and computer storage medium
CN114302241A (en) * 2021-12-30 2022-04-08 阿里巴巴(中国)有限公司 Virtual live broadcast service pushing method and device
CN114401438B (en) * 2021-12-31 2022-12-09 魔珐(上海)信息科技有限公司 Video generation method and device for virtual digital person, storage medium and terminal
CN117813579A (en) * 2022-07-29 2024-04-02 京东方科技集团股份有限公司 Model control method, device, equipment, system and computer storage medium
CN118118719A (en) * 2022-11-30 2024-05-31 北京字跳网络技术有限公司 Dynamic playing method and device, electronic equipment and storage medium
CN116668796B (en) * 2023-07-03 2024-01-23 佛山市炫新智能科技有限公司 Interactive artificial live broadcast information management system
CN116527956B (en) * 2023-07-03 2023-08-22 世优(北京)科技有限公司 Virtual object live broadcast method, device and system based on target event triggering
CN116824010B (en) * 2023-07-04 2024-03-26 安徽建筑大学 Feedback type multiterminal animation design online interaction method and system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104637482A (en) * 2015-01-19 2015-05-20 孔繁泽 Voice recognition method, device, system and language switching system
CN104866101A (en) * 2015-05-27 2015-08-26 世优(北京)科技有限公司 Real-time interactive control method and real-time interactive control device of virtual object
CN106056989A (en) * 2016-06-23 2016-10-26 广东小天才科技有限公司 Language learning method and device and terminal equipment
CN106878820A (en) * 2016-12-09 2017-06-20 北京小米移动软件有限公司 Living broadcast interactive method and device
CN109120985A (en) * 2018-10-11 2019-01-01 广州虎牙信息科技有限公司 Image display method, apparatus and storage medium in live streaming
CN109491564A (en) * 2018-10-18 2019-03-19 深圳前海达闼云端智能科技有限公司 Interaction method and device of virtual robot, storage medium and electronic equipment
CN110634483A (en) * 2019-09-03 2019-12-31 北京达佳互联信息技术有限公司 Man-machine interaction method and device, electronic equipment and storage medium

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2006330958A (en) * 2005-05-25 2006-12-07 Oki Electric Ind Co Ltd Image composition device, communication terminal using the same, and image communication system and chat server in the system
JP2016038601A (en) * 2014-08-05 2016-03-22 日本放送協会 Cg character interaction device and cg character interaction program
CN105094315B (en) * 2015-06-25 2018-03-06 百度在线网络技术(北京)有限公司 The method and apparatus of human-machine intelligence's chat based on artificial intelligence
WO2017189559A1 (en) * 2016-04-26 2017-11-02 Taechyon Robotics Corporation Multiple interactive personalities robot
US10546229B2 (en) * 2016-06-02 2020-01-28 Kodak Alaris Inc. System and method for predictive curation, production infrastructure, and personal content assistant
CN107329990A (en) * 2017-06-06 2017-11-07 北京光年无限科技有限公司 A kind of mood output intent and dialogue interactive system for virtual robot
CN109388297B (en) * 2017-08-10 2021-10-22 腾讯科技(深圳)有限公司 Expression display method and device, computer readable storage medium and terminal
WO2019060889A1 (en) * 2017-09-25 2019-03-28 Ventana 3D, Llc Artificial intelligence (AI) character system capable of natural verbal and visual interactions with a human
CN107784355A (en) * 2017-10-26 2018-03-09 北京光年无限科技有限公司 The multi-modal interaction data processing method of visual human and system
US10635665B2 (en) * 2017-12-21 2020-04-28 Disney Enterprises, Inc. Systems and methods to facilitate bi-directional artificial intelligence communications
CN108810561A (en) * 2018-06-21 2018-11-13 珠海金山网络游戏科技有限公司 A kind of three-dimensional idol live broadcasting method and device based on artificial intelligence
CN110298906B (en) * 2019-06-28 2023-08-11 北京百度网讯科技有限公司 Method and device for generating information
CN111541908A (en) * 2020-02-27 2020-08-14 北京市商汤科技开发有限公司 Interaction method, device, equipment and storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104637482A (en) * 2015-01-19 2015-05-20 孔繁泽 Voice recognition method, device, system and language switching system
CN104866101A (en) * 2015-05-27 2015-08-26 世优(北京)科技有限公司 Real-time interactive control method and real-time interactive control device of virtual object
CN106056989A (en) * 2016-06-23 2016-10-26 广东小天才科技有限公司 Language learning method and device and terminal equipment
CN106878820A (en) * 2016-12-09 2017-06-20 北京小米移动软件有限公司 Living broadcast interactive method and device
CN109120985A (en) * 2018-10-11 2019-01-01 广州虎牙信息科技有限公司 Image display method, apparatus and storage medium in live streaming
CN109491564A (en) * 2018-10-18 2019-03-19 深圳前海达闼云端智能科技有限公司 Interaction method and device of virtual robot, storage medium and electronic equipment
CN110634483A (en) * 2019-09-03 2019-12-31 北京达佳互联信息技术有限公司 Man-machine interaction method and device, electronic equipment and storage medium

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021169431A1 (en) * 2020-02-27 2021-09-02 北京市商汤科技开发有限公司 Interaction method and apparatus, and electronic device and storage medium
WO2021196643A1 (en) * 2020-03-31 2021-10-07 北京市商汤科技开发有限公司 Method and apparatus for driving interactive object, device, and storage medium
CN111970539B (en) * 2020-08-19 2021-04-16 深圳天使无忧科技开发有限公司 Data coding method based on deep learning and cloud computing service and big data platform
CN111970539A (en) * 2020-08-19 2020-11-20 赵蒙 Data coding method based on deep learning and cloud computing service and big data platform
CN112633110B (en) * 2020-12-16 2024-02-13 中国联合网络通信集团有限公司 Data processing method and device
CN112633110A (en) * 2020-12-16 2021-04-09 中国联合网络通信集团有限公司 Data processing method and device
CN113766253A (en) * 2021-01-04 2021-12-07 北京沃东天骏信息技术有限公司 Live broadcast method, device, equipment and storage medium based on virtual anchor
CN113810729A (en) * 2021-09-16 2021-12-17 中国平安人寿保险股份有限公司 Live broadcast atmosphere special effect matching method, device, equipment and medium
CN113810729B (en) * 2021-09-16 2024-02-02 中国平安人寿保险股份有限公司 Live atmosphere special effect matching method, device, equipment and medium
CN113867538A (en) * 2021-10-18 2021-12-31 深圳追一科技有限公司 Interaction method, interaction device, computer equipment and computer-readable storage medium
CN113849117A (en) * 2021-10-18 2021-12-28 深圳追一科技有限公司 Interaction method, interaction device, computer equipment and computer-readable storage medium
CN115086693A (en) * 2022-05-07 2022-09-20 北京达佳互联信息技术有限公司 Virtual object interaction method and device, electronic equipment and storage medium
WO2024114162A1 (en) * 2022-11-29 2024-06-06 腾讯科技(深圳)有限公司 Animation processing method and apparatus, computer device, storage medium, and program product
CN116168134A (en) * 2022-12-28 2023-05-26 北京百度网讯科技有限公司 Digital person control method, digital person control device, electronic equipment and storage medium
CN116168134B (en) * 2022-12-28 2024-01-02 北京百度网讯科技有限公司 Digital person control method, digital person control device, electronic equipment and storage medium

Also Published As

Publication number Publication date
WO2021169431A1 (en) 2021-09-02
TW202132967A (en) 2021-09-01
JP2022524944A (en) 2022-05-11
TWI778477B (en) 2022-09-21
KR20210110620A (en) 2021-09-08
SG11202109192QA (en) 2021-10-28

Similar Documents

Publication Publication Date Title
TWI778477B (en) Interaction methods, apparatuses thereof, electronic devices and computer readable storage media
TWI766499B (en) Method and apparatus for driving interactive object, device and storage medium
US11017551B2 (en) System and method for identifying a point of interest based on intersecting visual trajectories
CN111459454B (en) Interactive object driving method, device, equipment and storage medium
CN111459452B (en) Driving method, device and equipment of interaction object and storage medium
CN111460785B (en) Method, device and equipment for driving interactive object and storage medium
US10785489B2 (en) System and method for visual rendering based on sparse samples with predicted motion
US11308312B2 (en) System and method for reconstructing unoccupied 3D space
CN113067953A (en) Customer service method, system, device, server and storage medium
US20190251350A1 (en) System and method for inferring scenes based on visual context-free grammar model
CN113689879B (en) Method, device, electronic equipment and medium for driving virtual person in real time
CN116958342A (en) Method for generating actions of virtual image, method and device for constructing action library
CN113314104B (en) Interactive object driving and phoneme processing method, device, equipment and storage medium
CN115145434A (en) Interactive service method and device based on virtual image
CN112632262A (en) Conversation method, conversation device, computer equipment and storage medium
CN116843805B (en) Method, device, equipment and medium for generating virtual image containing behaviors
CN117373455B (en) Audio and video generation method, device, equipment and storage medium
CN118250523A (en) Digital human video generation method and device, storage medium and electronic equipment
CN116805458A (en) Auxiliary teaching method, device, equipment and storage medium
CN116074550A (en) Live broadcast interaction method and device
CN115578679A (en) Interaction method, interaction device, terminal, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40026472

Country of ref document: HK

RJ01 Rejection of invention patent application after publication

Application publication date: 20200814

RJ01 Rejection of invention patent application after publication