CN114866802B - Video stream sending method and device, storage medium and electronic device - Google Patents
Video stream sending method and device, storage medium and electronic device
- Publication number: CN114866802B (application CN202210390334.9A)
- Authority: CN (China)
- Prior art keywords: frame, video stream, virtual object, frames, video
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- H04N21/234: Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
- H04N21/26208: Content or additional data distribution scheduling, the scheduling operation being performed under constraints
- H04N21/816: Monomedia components involving special video data, e.g. 3D video
- H04N21/8547: Content authoring involving timestamps for synchronizing content
Abstract
The application discloses a video stream sending method and device, a storage medium and an electronic device, relating to the technical field of smart homes. The method comprises the following steps: a cloud server determines, according to an acquired request event, audio information for responding to the request event and action information of a virtual object corresponding to the audio information, the request event being an event in which a target object makes a request to a virtual object on a terminal device; the cloud server configures a presentation animation of the virtual object according to the action information of the virtual object corresponding to the audio information; the cloud server determines all key frames and/or all forward frames corresponding to the presentation animation, renders a preset model according to those frames, and generates a first video stream of the virtual object responding to the request event, wherein the first video stream contains no audio information; and the cloud server generates a second video stream according to the first video stream and the audio information.
Description
Technical Field
The present invention relates to the field of communications, and in particular, to a method and apparatus for transmitting a video stream, a storage medium, and an electronic device.
Background
With the development of technology and the continuous improvement of living standards, more and more households use intelligent devices, and these intelligent devices can support displaying virtual objects.
Common virtual object schemes implement the display of virtual objects by integrating a 3D engine on the user's intelligent device. However, this scheme is limited by the hardware performance of the device: on intelligent devices with weaker performance, a smooth display cannot be achieved, and the picture may even stutter or tear.
In addition, computational efficiency can be improved through parallel optimization, thereby raising the frame rate. However, this solution only improves the rendering efficiency of the engine and does not remove the limitation imposed by hardware performance: on a low-performance intelligent device, even with the computing power 100% utilized, the ideal display effect cannot be achieved.
For the problem in the related art that generating the video stream corresponding to a virtual object is slow on low-performance intelligent devices, no effective solution has yet been proposed.
Disclosure of Invention
The embodiments of the present invention provide a video stream sending method and device, a storage medium and an electronic device, to at least solve the problem in the related art that generating a video stream corresponding to a virtual object is slow on low-performance intelligent devices.
According to an embodiment of the present invention, there is provided a method for sending a video stream, including: a cloud server determines, according to an acquired request event, audio information for responding to the request event and action information of a virtual object corresponding to the audio information, the request event being an event in which a target object makes a request to a virtual object on a terminal device; the cloud server configures a presentation animation of the virtual object according to the action information of the virtual object corresponding to the audio information; the cloud server determines all key frames and/or all forward frames corresponding to the presentation animation, renders a preset model according to those frames, and generates a first video stream of the virtual object responding to the request event, wherein the first video stream contains no audio information; and the cloud server generates a second video stream according to the first video stream and the audio information.
In an exemplary embodiment, the cloud server configuring a presentation animation of the virtual object according to the action information of the virtual object corresponding to the audio information includes: determining text information with which the virtual object responds to the request event; determining the audio information corresponding to the text information according to preset timbre and tone information of the virtual object, and determining the lip movement track and limb actions of the virtual object while the terminal device plays the audio information, wherein the action information includes the lip movement track and the limb actions; and configuring the presentation animation of the virtual object according to the lip movement track and the limb actions.
In an exemplary embodiment, determining all key frames and/or all forward frames corresponding to the presentation animation includes: parsing all frame data corresponding to the action information and the presentation animation, and determining, for each pair of adjacent first frame data and second frame data, the unchanged first unit data and the changed second unit data; and determining all key frames and/or all forward frames of the presentation animation from the first unit data and the second unit data.
In an exemplary embodiment, rendering the preset model according to all the key frames to generate a first video stream includes: a first rendering step of determining the first key frame among all key frames according to their playing time sequence and rendering the preset model according to it to generate a first video frame; executing the first rendering step in a loop until the preset model has been rendered according to the last key frame and the last video frame has been generated; and ordering all video frames according to their timestamps to generate the first video stream.
In an exemplary embodiment, rendering the preset model according to all the forward frames to generate a first video stream includes: determining the first forward frame among all forward frames according to their playing time sequence, and rendering the preset model according to it to generate a first video frame; a second rendering step of determining the next forward frame according to the playing time sequence and rendering the preset model according to that forward frame and the preceding video frame to generate the next video frame; executing the second rendering step in a loop until the preset model has been rendered according to the last forward frame and its preceding video frame, generating the last video frame; and ordering all video frames according to their timestamps to generate the first video stream.
In an exemplary embodiment, generating a second video stream from the first video stream and the audio information includes: acquiring a corresponding relation between each video frame in the first video stream and each audio frame in the audio information; and carrying out audio and video coding on the first video stream and the audio information according to the corresponding relation so as to generate a second video stream of the virtual object responding to the request event.
In an exemplary embodiment, determining the unchanged first unit data and the changed second unit data between adjacent first frame data and second frame data includes: comparing the adjacent first frame data and second frame data, determining the second unit data of the second frame data that has changed relative to the first frame data, and determining the first unit data of the second frame data that has not changed relative to the first frame data.
According to another embodiment of the present invention, there is further provided a device for sending a video stream, applied to a cloud server, including: the determining module is used for determining audio information for responding to the request event and action information of a virtual object corresponding to the audio information according to the acquired request event; the request event is an event that a target object requests to a virtual object on the terminal equipment; the configuration module is used for configuring the display animation of the virtual object according to the action information of the virtual object corresponding to the audio information; the first generation module is used for determining all key frames and/or all forward frames corresponding to the display animation, rendering a preset model according to all key frames and/or all forward frames, and generating a first video stream of the virtual object responding to the request event; wherein the first video stream does not contain audio information; and the second generation module is used for generating a second video stream according to the first video stream and the audio information.
According to still another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium having a computer program stored therein, wherein the computer program is configured to perform the above-described video stream transmission method when executed.
According to still another aspect of the embodiments of the present invention, there is further provided an electronic device including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, performs the above video stream sending method.
In the embodiment of the invention, a cloud server determines, according to an acquired request event, audio information for responding to the request event and action information of a virtual object corresponding to the audio information, the request event being an event in which a target object makes a request to a virtual object on a terminal device; the cloud server configures a presentation animation of the virtual object according to the action information; the cloud server determines all key frames and/or all forward frames corresponding to the presentation animation, renders a preset model according to those frames, and generates a first video stream, containing no audio information, of the virtual object responding to the request event; and the cloud server generates a second video stream according to the first video stream and the audio information. This technical scheme solves, among others, the problem that generating the video stream corresponding to a virtual object is slow on low-performance intelligent devices.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the application or the technical solutions of the prior art, the drawings which are used in the description of the embodiments or the prior art will be briefly described, and it will be obvious to a person skilled in the art that other drawings can be obtained from these drawings without inventive effort.
Fig. 1 is a schematic diagram of a hardware environment of a video stream transmission method according to an embodiment of the present application;
Fig. 2 is a flowchart of a method of transmitting a video stream according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a prior art method of transmitting a video stream;
Fig. 4 is a schematic diagram of a method of transmitting a video stream according to an embodiment of the present invention;
Fig. 5 is a block diagram of a video stream transmission apparatus according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art may better understand the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an aspect of an embodiment of the present application, a method for sending a video stream is provided. The method is widely applicable to whole-house intelligent digital control scenarios such as the smart home, the smart-home device ecosystem, and the intelligent house ecosystem. Optionally, in the present embodiment, the above method may be applied in a hardware environment formed by the terminal device 102 and the server 104 shown in fig. 1. As shown in fig. 1, the server 104 is connected to the terminal device 102 through a network and may provide services (such as application services) for the terminal or for a client installed on it. A database may be set up on the server, or independently of it, to provide data storage services for the server 104; cloud computing and/or edge computing services may likewise be configured on the server, or independently of it, to provide data computing services for the server 104.
The network may include, but is not limited to, at least one of a wired network and a wireless network. The wired network may include, but is not limited to, at least one of a wide area network, a metropolitan area network, and a local area network; the wireless network may include, but is not limited to, at least one of WiFi (Wireless Fidelity) and Bluetooth. The terminal device 102 may include, but is not limited to, a PC, a mobile phone, a tablet computer, an intelligent air conditioner, an intelligent range hood, an intelligent refrigerator, an intelligent oven, an intelligent cooking range, an intelligent washing machine, an intelligent water heater, an intelligent washing device, an intelligent dishwasher, an intelligent projection device, an intelligent television, an intelligent clothes hanger, an intelligent curtain, an intelligent video device, an intelligent socket, an intelligent sound box, an intelligent fresh-air device, intelligent kitchen and toilet devices, an intelligent bathroom device, an intelligent sweeping robot, an intelligent window-cleaning robot, an intelligent mopping robot, an intelligent air-purifying device, an intelligent steam box, an intelligent microwave oven, an intelligent kitchen appliance, an intelligent purifier, an intelligent water dispenser, an intelligent door lock, and the like.
In this embodiment, a method for sending a video stream is provided and applied to a cloud server, and fig. 2 is a flowchart of a method for sending a video stream according to an embodiment of the present invention, where the flowchart includes the following steps:
Step S202, the cloud server determines, according to the acquired request event, audio information for responding to the request event and action information of a virtual object corresponding to the audio information; the request event is an event in which a target object makes a request to a virtual object on the terminal device;
For example, the request event may include, but is not limited to: "ask the virtual object on the terminal device for the current time", "ask the virtual object on the terminal device for today's weather", "request the virtual object on the terminal device to set an alarm clock", etc.
Step S204, the cloud server configures a display animation of the virtual object according to the action information of the virtual object corresponding to the audio information;
The motion information may be used to indicate the limb motion, lip motion, etc. of the virtual object while the audio information is played. It should be noted that, in the embodiment of the present invention, a preset animation may also be obtained, where the preset animation can be understood as an animation associated with the audio information; for example, if the audio information is "it is raining today", the preset animation may be a rain-related animation, such as a raining animation. The presentation animation of the virtual object is then configured according to the audio information, the action information of the virtual object corresponding to the audio information, and the preset animation.
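The keyword-to-preset-animation association described above can be sketched as follows; the keyword table, animation file names, and function name are assumptions for illustration, not APIs from the patent:

```python
# Hypothetical keyword -> preset-animation table, as in the "raining today"
# example: a reply mentioning rain selects a rain-related animation.
PRESET_ANIMATIONS = {
    "rain": "rain_overlay.anim",
    "alarm": "clock_ring.anim",
}

def pick_preset_animation(text: str):
    # Return the first preset animation whose keyword occurs in the reply
    # text, or None when no preset is associated with the text.
    for keyword, anim in PRESET_ANIMATIONS.items():
        if keyword in text:
            return anim
    return None
```

When no keyword matches, the presentation animation would be configured from the action information alone.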
Step S206, the cloud server determines all key frames and/or all forward frames corresponding to the display animation, renders a preset model according to all key frames and/or all forward frames, and generates a first video stream of the virtual object responding to the request event; wherein the first video stream does not contain audio information;
In step S208, the cloud server generates a second video stream according to the first video stream and the audio information.
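The four steps S202 to S208 can be sketched end to end as follows; every name here (determine_response, configure_animation, render_first_stream, mux_second_stream, Response) is a hypothetical stand-in, and rendering and encoding are mocked as plain strings rather than real pixel or codec operations:

```python
from dataclasses import dataclass

@dataclass
class Response:
    audio: list    # one mock audio frame per word of the reply (S202)
    actions: list  # motion info aligned with each audio frame (S202)

def determine_response(request_event: str) -> Response:
    # S202: derive reply audio and matching action information.
    words = request_event.split()
    return Response([f"audio:{w}" for w in words],
                    [f"motion:{w}" for w in words])

def configure_animation(resp: Response) -> list:
    # S204: configure the presentation animation from the action information.
    return [{"frame": i, "motion": m} for i, m in enumerate(resp.actions)]

def render_first_stream(animation: list) -> list:
    # S206: render the preset model per animation frame -> silent video stream.
    return [f"video_frame_{f['frame']}" for f in animation]

def mux_second_stream(video: list, resp: Response) -> list:
    # S208: combine the silent first stream with the audio information.
    return list(zip(video, resp.audio))

resp = determine_response("what time is it")
second_stream = mux_second_stream(
    render_first_stream(configure_animation(resp)), resp)
```

The point of the sketch is the division of labor: everything here runs on the cloud server, and the terminal device only receives the finished second stream.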
Through the above steps, the cloud server determines, according to the acquired request event, audio information for responding to the request event and action information of a virtual object corresponding to the audio information, the request event being an event in which a target object makes a request to a virtual object on the terminal device; configures a presentation animation of the virtual object according to the action information; determines all key frames and/or all forward frames corresponding to the presentation animation, renders a preset model according to those frames, and generates a first video stream, containing no audio information, of the virtual object responding to the request event; and generates the second video stream according to the first video stream and the audio information. This solves the problem in the related art that generating the video stream corresponding to a virtual object is slow on low-performance intelligent devices.
In an exemplary embodiment, the cloud server configuring a presentation animation of the virtual object according to the action information of the virtual object corresponding to the audio information includes: determining text information with which the virtual object responds to the request event; determining the audio information corresponding to the text information according to preset timbre and tone information of the virtual object, and determining the lip movement track and limb actions of the virtual object while the terminal device plays the audio information; extracting keywords from the text information and determining a preset animation associated with those keywords; wherein the action information includes the lip movement track and the limb actions; and configuring the presentation animation of the virtual object according to the lip movement track, the limb actions, and the preset animation.
That is, when the cloud server receives the request event, it determines the response information in text form (i.e., the text information) corresponding to the request event, and once the text information is determined, the following steps are executed:
Step 1, obtaining the timbre and tone information of the virtual object, preset by the target object or by a developer, and determining the corresponding audio information according to that timbre and tone information;
step 2, determining the lip movement track and limb movement of the virtual object when the terminal equipment plays the audio information;
Step 3, determining preset animation with association relation with keywords in the text information;
And 4, configuring the display animation of the virtual object according to the lip movement track, the limb movement and the preset animation.
It should be noted that step 2 is performed after step 1, while the order of step 3 relative to steps 1 and 2 is not limited in the embodiment of the present invention. That is, once the text information is determined, the steps may be performed in the order: step 1, step 2, step 3; or step 1, step 3, step 2; or step 3, step 1, step 2.
Further, determining the lip movement track and limb actions of the virtual object while the terminal device plays the audio information includes: determining the audio feature corresponding to each frame of audio in the audio information; determining the lip motion information corresponding to each frame of audio according to a first correspondence between audio features and lip motion; determining the limb motion information corresponding to each frame of audio according to a second correspondence between audio features and limb motion; and ordering the lip motion information of each audio frame according to the playing order of the audio information to obtain the lip movement track, and ordering the limb motion information of each audio frame in the same way to obtain the limb actions.
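The two correspondences above can be sketched as lookup tables; the feature symbols, motion names, and function name are invented for illustration, not taken from the patent:

```python
# First correspondence: per-frame audio feature -> lip motion info.
LIP_MAP = {"a": "open_wide", "m": "closed", "s": "narrow"}
# Second correspondence: per-frame audio feature -> limb motion info.
LIMB_MAP = {"a": "raise_hand", "m": "rest", "s": "nod"}

def build_motion_tracks(audio_features):
    # audio_features: one feature per audio frame, already in playing order,
    # so the mapped lists are the lip movement track and the limb motion.
    lip_track = [LIP_MAP[f] for f in audio_features]
    limb_motion = [LIMB_MAP[f] for f in audio_features]
    return lip_track, limb_motion
```

Keeping the per-frame results in playback order is what turns isolated lookups into the two motion tracks.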
In an exemplary embodiment, determining all key frames and/or all forward frames corresponding to the presentation animation includes: parsing all frame data corresponding to the presentation animation, and determining, for each pair of adjacent first frame data and second frame data, the unchanged first unit data and the changed second unit data; and determining all key frames and/or all forward frames of the presentation animation from the first unit data and the second unit data.
That is, the frame data corresponding to the presentation animation are parsed into unit data, adjacent unit data are compared to obtain all key frames and/or all forward frames of the presentation animation, and a first video stream containing no audio information is generated according to those key frames and/or forward frames.
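One way to read this split, sketched under the assumption that frame data is a dict of unit-index to unit data: a frame whose units all changed is kept whole as a key frame, otherwise only the changed units are kept as a forward frame. The representation and threshold are illustrative, not the patent's exact rule:

```python
def classify_frames(frames):
    # The first frame has no predecessor, so it is stored completely.
    result = [("key", frames[0])]
    for prev, cur in zip(frames, frames[1:]):
        # Second unit data: units of `cur` that changed relative to `prev`.
        changed = {k: v for k, v in cur.items() if prev.get(k) != v}
        if len(changed) == len(cur):
            result.append(("key", cur))          # every unit changed
        else:
            result.append(("forward", changed))  # store only the delta
    return result
```

The unchanged units (the "first unit data") are exactly what a forward frame omits.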
In an exemplary embodiment, rendering the preset model according to the all key frames to generate a first video stream includes: a first rendering step: determining a first key frame in all the key frames according to the playing time sequence corresponding to all the key frames; rendering the preset model according to the first key frame to generate a first video frame; circularly executing the first rendering step until the preset model is rendered according to the last key frame, and generating a last video frame corresponding to the last key frame; and ordering all video frames according to the time stamps of all video frames to generate the first video stream.
The embodiment of the present invention provides a method for generating the first video stream from key frames. Because a key frame stores a complete picture, decoding the key frame alone yields a complete video frame. Therefore, when the preset model is rendered according to all key frames to generate the first video stream, it suffices to render the preset model frame by frame in the key frames' playing time sequence to obtain the video frame corresponding to each key frame, and then order all video frames by their timestamps to generate the first video stream.
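The key-frame rendering loop can be sketched as follows; render_model is a stand-in for a real renderer, and the frame fields (ts, pose, pixels) are assumptions for illustration:

```python
def render_model(model, key_frame):
    # Mock renderer: a real implementation would rasterize the preset model
    # posed by the key frame; here the result is just a labeled dict.
    return {"ts": key_frame["ts"], "pixels": f"{model}|{key_frame['pose']}"}

def render_key_frames(model, key_frames):
    # Each key frame is self-contained, so the model is rendered once per
    # key frame in playing time sequence; the timestamp order of the output
    # is the order of the first (silent) video stream.
    ordered = sorted(key_frames, key=lambda kf: kf["ts"])
    return [render_model(model, kf) for kf in ordered]
```

Because no frame depends on another, this loop could also be parallelized per key frame.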
In an exemplary embodiment, rendering the preset model according to the all forward frames to generate a first video stream includes: determining a first forward frame in all forward frames according to the corresponding playing time sequence of all forward frames; rendering the preset model according to the first forward frame to generate a first video frame; a second rendering step: determining a second forward frame from all the forward frames according to the playing time sequence; rendering the preset model according to the second forward frame and the first video frame to generate a second video frame; the second rendering step is circularly executed until the preset model is rendered according to the last forward frame and the previous video frame, and a last video frame corresponding to the last forward frame is generated; and ordering all video frames according to the time stamps of all video frames to generate the first video stream.
The embodiment of the present invention provides a method for generating the first video stream from forward frames. Because a forward frame indicates only the difference between a frame and its preceding frame, decoding a forward frame requires combining it with the preceding video frame to obtain complete video information. Therefore, the preset model must be rendered according to each forward frame together with the video frame preceding it, to obtain the video frames corresponding to all forward frames, and all video frames are then ordered by their timestamps to generate the first video stream.
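The forward-frame path can be sketched with the same dict-of-units model as before; video frame N is produced by combining video frame N-1 with the delta carried by forward frame N. All names are illustrative:

```python
def render_forward_frames(first_frame, forward_frames):
    # The first frame is complete and is rendered directly.
    state = dict(first_frame)
    video = [dict(state)]
    # Each forward frame, taken in playing time sequence, only carries the
    # changed units; merging it over the previous video frame reconstructs
    # the next complete video frame.
    for delta in forward_frames:
        state = {**state, **delta}
        video.append(dict(state))
    return video
```

Unlike the key-frame loop, this chain is inherently sequential: frame N cannot be produced before frame N-1.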
In one exemplary embodiment, determining the second video stream from the first video stream and the audio information includes: acquiring a correspondence between each video frame in the first video stream and each audio frame in the audio information; and performing audio-video encoding on the first video stream and the audio information according to the correspondence, so as to generate the second video stream with which the virtual object responds to the request event.
That is, when the generated first video stream contains no audio information, audio-video encoding must be performed on the first video stream and the audio information to obtain a second video stream that does contain audio information.
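A simplified sketch of this step, assuming each video frame carries a timestamp and duration (the interleaving here stands in for real audio-video encoding, e.g. muxing into a container):

```python
def mux(video_frames: list, audio_frames: list) -> list:
    """Build the video-frame/audio-frame correspondence by timestamp, then
    interleave both into a single second stream."""
    second_stream = []
    for vf in video_frames:
        start, end = vf["timestamp"], vf["timestamp"] + vf["duration"]
        # Correspondence: the audio frames played while this video frame shows.
        matched = [af for af in audio_frames if start <= af["timestamp"] < end]
        second_stream.append(("V", vf["timestamp"]))
        second_stream.extend(("A", af["timestamp"]) for af in matched)
    return second_stream
```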
In one exemplary embodiment, determining, one by one, the unchanged first unit data and the changed second unit data between adjacent first frame data and second frame data includes: comparing the adjacent first frame data with the second frame data, determining the second unit data of the second frame data that has changed relative to the first frame data, and determining the first unit data of the second frame data that is unchanged relative to the first frame data.
By comparing the second frame data with the first frame data, the unit data that changes in the second frame data is determined, and the repeated unit data between adjacent frames is thereby found. The repeated unit data can be compressed directly, while for the changed unit data the object data and the unit data index must be determined. In this way, repeated units between adjacent frames are effectively found and compression-encoded, which improves the transmission speed of the transport layer and reduces bandwidth pressure.
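A minimal sketch of this per-unit comparison, assuming frame data is modeled as a dictionary from unit index to unit object data (a hypothetical representation, not the patented data layout):

```python
def diff_units(first_frame: dict, second_frame: dict):
    """Compare adjacent frame data unit by unit: unchanged (repeated) units
    can be compressed away, while changed units keep their object data and index."""
    unchanged_indices = []
    changed_units = {}
    for index, unit in second_frame.items():
        if first_frame.get(index) == unit:
            unchanged_indices.append(index)   # repeated between adjacent frames
        else:
            changed_units[index] = unit       # object data + unit index to encode
    return unchanged_indices, changed_units
```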
To aid understanding of the video stream sending method, the following describes its implementation flow with reference to an optional embodiment; this is not intended to limit the technical solution of the embodiments of the present invention.
FIG. 3 is a schematic diagram of a prior-art method of transmitting a video stream. In the existing scheme, the model rendering and animation generation of the virtual object are performed on the terminal device: after action processing, the cloud server transmits parameter data to the terminal for processing. This reduces the amount of data transmitted over the network, but greatly raises the performance requirements on the terminal device; mid- and low-end terminal devices can hardly meet the requirement of smooth display.
In this embodiment, a method for sending a video stream is provided. FIG. 4 is a schematic diagram of a method for sending a video stream according to an embodiment of the present invention. As shown in FIG. 4, the most resource-consuming links, Unity model rendering and animation generation, are moved to the cloud server. The cloud performs Unity data acquisition, frame data analysis, unit data comparison, I-frame/P-frame encoding, I-frame/P-frame decoding, Unity model rendering, and animation generation, so that after rendering and synthesis the cloud generates not merely parameter data but a complete video stream. The video stream is then sent to the terminal, and the terminal only needs to call a player to decode and play it, without handling any other link; as a result, the virtual object can be displayed smoothly even on mid- and low-end terminal devices.
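The cloud-side links listed above can be sketched end to end (a simplified illustration; the real links operate on Unity scene data rather than plain dictionaries, and the "rendered frame" here is only a placeholder):

```python
def cloud_pipeline(unity_frames: list, preset_model: str):
    """End-to-end sketch of the cloud-side links: unit comparison,
    I-frame/P-frame encoding, decoding, and rendering into video frames."""
    # Encoding: the first frame is stored whole (I frame); each later frame
    # stores only the units that changed relative to its predecessor (P frame).
    encoded = [("I", dict(unity_frames[0]))]
    for prev, cur in zip(unity_frames, unity_frames[1:]):
        delta = {k: v for k, v in cur.items() if prev.get(k) != v}
        encoded.append(("P", delta))
    # Decoding + rendering: an I frame decodes alone; a P frame is applied
    # on top of the previously reconstructed picture.
    picture, video_stream = {}, []
    for kind, payload in encoded:
        picture = dict(payload) if kind == "I" else {**picture, **payload}
        video_stream.append({"model": preset_model, **picture})  # rendered frame
    return encoded, video_stream
```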
Because a video stream is transmitted, the terminal device displays a local standby animation while it has not yet received the video; once the cloud server delivers the video stream, the animation can be switched seamlessly, so the user is spared the intuitive experience of stuttering.
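The terminal-side fallback can be illustrated with a small selection function (a hypothetical sketch; the actual switching logic lives inside the player on the terminal):

```python
def choose_frame(stream_frame, standby_frames: list, tick: int):
    """Terminal-side fallback: loop the local standby animation until the
    cloud video stream has arrived, then switch to it seamlessly."""
    if stream_frame is not None:
        return stream_frame                                # cloud stream available
    return standby_frames[tick % len(standby_frames)]      # local standby loop
```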
Through this embodiment, the problems in the related art that a virtual object cannot be displayed smoothly on low-performance smart devices, and that the picture may even stutter or tear, are solved. By moving the computation from the terminal device to the cloud server, the performance requirements that the virtual object places on the terminal device are greatly reduced, so the virtual object can be displayed smoothly on the terminal device.
From the description of the above embodiments, it will be clear to a person skilled in the art that the method according to the above embodiments may be implemented by software plus the necessary general-purpose hardware platform, or by hardware, though in many cases the former is preferred. Based on such understanding, the technical solution of the present invention, or the part of it contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, or optical disk) comprising several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the method of the various embodiments of the present invention.
This embodiment also provides an apparatus for sending a video stream, applied to the cloud server and used to implement the above embodiments and preferred implementations; details already described are not repeated. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the apparatus described in the following embodiments is preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.
Fig. 5 is a block diagram of a video stream sending apparatus according to an embodiment of the present invention. As shown in Fig. 5, the apparatus includes:
a determining module 52, configured to determine, according to an acquired request event, audio information for responding to the request event and action information of a virtual object corresponding to the audio information, where the request event is a request initiated by a target object to the virtual object on the terminal device;
a configuration module 54, configured to configure a display animation of the virtual object according to the action information of the virtual object corresponding to the audio information;
a first generating module 56, configured to determine all key frames and/or all forward frames corresponding to the display animation, and to render a preset model according to all the key frames and/or all the forward frames, so as to generate a first video stream, containing no audio information, with which the virtual object responds to the request event; and
a second generating module 58, configured to generate a second video stream according to the first video stream and the audio information.
Through the above apparatus, the cloud server determines, according to the acquired request event, audio information for responding to the request event and action information of the virtual object corresponding to the audio information, where the request event is a request initiated by a target object to the virtual object on the terminal device; configures a display animation of the virtual object according to the action information; determines all key frames and/or all forward frames corresponding to the display animation and renders a preset model according to them, generating a first video stream, containing no audio information, with which the virtual object responds to the request event; and generates the second video stream according to the first video stream and the audio information. This solves the problem in the related art that generating the video stream corresponding to a virtual object on a low-performance smart device is slow.
In an exemplary embodiment, the configuration module is configured to determine text information of the virtual object in response to the request event; to determine audio information corresponding to the text information according to timbre information and tone information of a preset virtual object, and to determine a lip movement track and limb actions of the virtual object when the terminal device plays the audio information, where the action information includes the lip movement track and the limb actions; and to configure the display animation of the virtual object according to the lip movement track and the limb actions.
In an exemplary embodiment, the first generating module is configured to parse the motion information and all frame data corresponding to the display animation, and to determine, one by one, the unchanged first unit data and the changed second unit data between adjacent first frame data and second frame data; all key frames and/or all forward frames of the display animation are determined from the first unit data and the second unit data.
In one exemplary embodiment, a first generation module is configured to perform a first rendering step: determining a first key frame in all the key frames according to the playing time sequence corresponding to all the key frames; rendering the preset model according to the first key frame to generate a first video frame; circularly executing the first rendering step until the preset model is rendered according to the last key frame, and generating a last video frame corresponding to the last key frame; and ordering all video frames according to the time stamps of all video frames to generate the first video stream.
In an exemplary embodiment, a first generating module is configured to determine a first forward frame from all the forward frames according to the play time sequences corresponding to all the forward frames; rendering the preset model according to the first forward frame to generate a first video frame; a second rendering step: determining a second forward frame from all the forward frames according to the playing time sequence; rendering the preset model according to the second forward frame and the first video frame to generate a second video frame; the second rendering step is circularly executed until the preset model is rendered according to the last forward frame and the previous video frame, and a last video frame corresponding to the last forward frame is generated; and ordering all video frames according to the time stamps of all video frames to generate the first video stream.
In an exemplary embodiment, a second generating module is configured to obtain a correspondence between each video frame in the first video stream and each audio frame in the audio information; and carrying out audio and video coding on the first video stream and the audio information according to the corresponding relation so as to generate a second video stream of the virtual object responding to the request event.
In an exemplary embodiment, a first generating module is configured to compare adjacent first frame data with second frame data, determine second unit data of the second frame data that changes relative to the first frame data, and determine first unit data of the second frame data that does not change relative to the first frame data.
An embodiment of the present invention also provides a storage medium including a stored program, wherein the program executes the method of any one of the above.
Alternatively, in the present embodiment, the above-described storage medium may be configured to store program code for performing the steps of:
S1, determining audio information for responding to a request event and action information of a virtual object corresponding to the audio information by a cloud server according to the acquired request event; the request event is an event that a target object requests to a virtual object on the terminal equipment;
s2, the cloud server configures a display animation of the virtual object according to the action information of the virtual object corresponding to the audio information;
S3, the cloud server determines all key frames and/or all forward frames corresponding to the display animation, renders a preset model according to all key frames and/or all forward frames, and generates a first video stream of the virtual object responding to the request event; wherein the first video stream does not contain audio information;
s4, the cloud server generates a second video stream according to the first video stream and the audio information.
An embodiment of the invention also provides an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, where the transmission device is connected to the processor, and the input/output device is connected to the processor.
Alternatively, in the present embodiment, the above-described processor may be configured to execute the following steps by a computer program:
S1, determining audio information for responding to a request event and action information of a virtual object corresponding to the audio information by a cloud server according to the acquired request event; the request event is an event that a target object requests to a virtual object on the terminal equipment;
s2, the cloud server configures a display animation of the virtual object according to the action information of the virtual object corresponding to the audio information;
S3, the cloud server determines all key frames and/or all forward frames corresponding to the display animation, renders a preset model according to all key frames and/or all forward frames, and generates a first video stream of the virtual object responding to the request event; wherein the first video stream does not contain audio information;
s4, the cloud server generates a second video stream according to the first video stream and the audio information.
Alternatively, in the present embodiment, the above-described storage medium may include, but is not limited to: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or any other medium that can store program code.
Alternatively, specific examples in this embodiment may refer to examples described in the foregoing embodiments and optional implementations, and this embodiment is not described herein.
It will be appreciated by those skilled in the art that the modules or steps of the invention described above may be implemented on a general-purpose computing device; they may be concentrated on a single computing device or distributed across a network of computing devices; optionally, they may be implemented as program code executable by computing devices and stored in a memory device for execution by those devices; in some cases, the steps shown or described may be performed in an order different from that described; alternatively, they may be fabricated as individual integrated circuit modules, or multiple modules or steps among them may be fabricated as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The foregoing is merely a preferred embodiment of the present application. It should be noted that those skilled in the art may make modifications and improvements without departing from the principles of the present application, and such modifications and improvements shall also fall within the protection scope of the present application.
Claims (8)
1. A method for transmitting a video stream, comprising:
the cloud server determines audio information for responding to the request event and action information of a virtual object corresponding to the audio information according to the acquired request event; the request event is an event that a target object requests to a virtual object on the terminal equipment;
the cloud server configures a display animation of the virtual object according to the action information of the virtual object corresponding to the audio information;
The cloud server determines all key frames and/or all forward frames corresponding to the display animation,
Rendering a preset model according to all key frames and/or all forward frames, and generating a first video stream of the virtual object responding to the request event; wherein the first video stream does not contain audio information;
the cloud server generates a second video stream according to the first video stream and the audio information;
The cloud server configures a display animation of the virtual object according to the action information of the virtual object corresponding to the audio information, and this comprises: determining text information of the virtual object responding to the request event; determining audio information corresponding to the text information according to timbre information and tone information of a preset virtual object, and determining a lip movement track and limb actions of the virtual object when the terminal equipment plays the audio information; extracting keywords in the text information, and determining a preset animation having an association relation with the keywords; wherein the action information includes: the lip movement track and the limb actions; and configuring the display animation of the virtual object according to the lip movement track, the limb actions and the preset animation;
Rendering the preset model according to the key frames to generate a first video stream, wherein the rendering comprises the following steps: a first rendering step: determining a first key frame in all the key frames according to the playing time sequence corresponding to all the key frames; rendering the preset model according to the first key frame to generate a first video frame; circularly executing the first rendering step until the preset model is rendered according to the last key frame, and generating a last video frame corresponding to the last key frame; and ordering all video frames according to the time stamps of all video frames to generate the first video stream.
2. The method of claim 1, wherein determining all key frames and/or all forward frames corresponding to the presentation animation comprises:
Analyzing all frame data corresponding to the display animation, and determining unchanged first unit data and changed second unit data between adjacent first frame data and second frame data one by one;
all key frames and/or all forward frames of the presentation animation are determined from the first unit data and the second unit data.
3. The method of claim 1, wherein rendering the pre-set model from the all forward frames to generate the first video stream comprises:
Determining a first forward frame in all forward frames according to the corresponding playing time sequence of all forward frames; rendering the preset model according to the first forward frame to generate a first video frame;
A second rendering step: determining a second forward frame from all the forward frames according to the playing time sequence;
Rendering the preset model according to the second forward frame and the first video frame to generate a second video frame;
The second rendering step is circularly executed until the preset model is rendered according to the last forward frame and the previous video frame, and a last video frame corresponding to the last forward frame is generated;
and ordering all video frames according to the time stamps of all video frames to generate the first video stream.
4. The method of claim 1, wherein generating a second video stream from the first video stream and the audio information comprises:
Acquiring a corresponding relation between each video frame in the first video stream and each audio frame in the audio information;
and carrying out audio and video coding on the first video stream and the audio information according to the corresponding relation so as to generate a second video stream of the virtual object responding to the request event.
5. The method of claim 2, wherein determining unchanged first unit data and changed second unit data between adjacent first frame data and second frame data one by one comprises:
comparing the adjacent first frame data with the second frame data, determining second unit data of the second frame data which is changed relative to the first frame data, and determining first unit data of the second frame data which is unchanged relative to the first frame data.
6. A video stream sending apparatus, applied to a cloud server, comprising:
The determining module is used for determining audio information for responding to the request event and action information of a virtual object corresponding to the audio information according to the acquired request event; the request event is an event that a target object requests to a virtual object on the terminal equipment;
The configuration module is used for configuring the display animation of the virtual object according to the action information of the virtual object corresponding to the audio information;
The first generation module is used for determining all key frames and/or all forward frames corresponding to the display animation, rendering a preset model according to all key frames and/or all forward frames, and generating a first video stream of the virtual object responding to the request event; wherein the first video stream does not contain audio information;
The second generation module is used for generating a second video stream according to the first video stream and the audio information; the configuration module is further used for determining text information of the virtual object responding to the request event; determining audio information corresponding to the text information according to timbre information and tone information of a preset virtual object, and determining a lip movement track and limb actions of the virtual object when the terminal equipment plays the audio information; extracting keywords in the text information, and determining a preset animation having an association relation with the keywords; wherein the action information includes: the lip movement track and the limb actions; and configuring the display animation of the virtual object according to the lip movement track, the limb actions and the preset animation;
The first generating module is configured to perform a first rendering step: determining a first key frame in all the key frames according to the playing time sequence corresponding to all the key frames; rendering the preset model according to the first key frame to generate a first video frame; circularly executing the first rendering step until the preset model is rendered according to the last key frame, and generating a last video frame corresponding to the last key frame; and ordering all video frames according to the time stamps of all video frames to generate the first video stream.
7. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored program, wherein the program when run performs the method of any of the preceding claims 1 to 5.
8. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method according to any of the claims 1 to 5 by means of the computer program.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210390334.9A CN114866802B (en) | 2022-04-14 | 2022-04-14 | Video stream sending method and device, storage medium and electronic device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210390334.9A CN114866802B (en) | 2022-04-14 | 2022-04-14 | Video stream sending method and device, storage medium and electronic device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114866802A CN114866802A (en) | 2022-08-05 |
CN114866802B true CN114866802B (en) | 2024-04-19 |
Family
ID=82632151
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210390334.9A Active CN114866802B (en) | 2022-04-14 | 2022-04-14 | Video stream sending method and device, storage medium and electronic device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114866802B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8638846B1 (en) * | 2003-06-23 | 2014-01-28 | At&T Intellectual Property Ii, L.P. | Systems and methods for encoding and decoding video streams |
CN108320322A (en) * | 2018-02-11 | 2018-07-24 | 腾讯科技(成都)有限公司 | Animation data processing method, device, computer equipment and storage medium |
US10467792B1 (en) * | 2017-08-24 | 2019-11-05 | Amazon Technologies, Inc. | Simulating communication expressions using virtual objects |
CN110557625A (en) * | 2019-09-17 | 2019-12-10 | 北京达佳互联信息技术有限公司 | live virtual image broadcasting method, terminal, computer equipment and storage medium |
CN111614780A (en) * | 2020-05-28 | 2020-09-01 | 深圳航天智慧城市系统技术研究院有限公司 | Cloud rendering system and method |
CN112543342A (en) * | 2020-11-26 | 2021-03-23 | 腾讯科技(深圳)有限公司 | Virtual video live broadcast processing method and device, storage medium and electronic equipment |
CN113067953A (en) * | 2021-03-22 | 2021-07-02 | 平安科技(深圳)有限公司 | Customer service method, system, device, server and storage medium |
CN113542757A (en) * | 2021-07-20 | 2021-10-22 | Oppo广东移动通信有限公司 | Image transmission method and device for cloud application, server and storage medium |
CN113633971A (en) * | 2021-08-31 | 2021-11-12 | 腾讯科技(深圳)有限公司 | Video frame rendering method, device, equipment and storage medium |
CN113946402A (en) * | 2021-11-09 | 2022-01-18 | 中国电信股份有限公司 | Cloud mobile phone acceleration method, system, equipment and storage medium based on rendering separation |
CN116528016A (en) * | 2023-04-13 | 2023-08-01 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio/video synthesis method, server and readable storage medium |
-
2022
- 2022-04-14 CN CN202210390334.9A patent/CN114866802B/en active Active
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8638846B1 (en) * | 2003-06-23 | 2014-01-28 | At&T Intellectual Property Ii, L.P. | Systems and methods for encoding and decoding video streams |
US10467792B1 (en) * | 2017-08-24 | 2019-11-05 | Amazon Technologies, Inc. | Simulating communication expressions using virtual objects |
CN108320322A (en) * | 2018-02-11 | 2018-07-24 | 腾讯科技(成都)有限公司 | Animation data processing method, device, computer equipment and storage medium |
CN110557625A (en) * | 2019-09-17 | 2019-12-10 | 北京达佳互联信息技术有限公司 | live virtual image broadcasting method, terminal, computer equipment and storage medium |
CN111614780A (en) * | 2020-05-28 | 2020-09-01 | 深圳航天智慧城市系统技术研究院有限公司 | Cloud rendering system and method |
CN112543342A (en) * | 2020-11-26 | 2021-03-23 | 腾讯科技(深圳)有限公司 | Virtual video live broadcast processing method and device, storage medium and electronic equipment |
CN113067953A (en) * | 2021-03-22 | 2021-07-02 | 平安科技(深圳)有限公司 | Customer service method, system, device, server and storage medium |
CN113542757A (en) * | 2021-07-20 | 2021-10-22 | Oppo广东移动通信有限公司 | Image transmission method and device for cloud application, server and storage medium |
CN113633971A (en) * | 2021-08-31 | 2021-11-12 | 腾讯科技(深圳)有限公司 | Video frame rendering method, device, equipment and storage medium |
CN113946402A (en) * | 2021-11-09 | 2022-01-18 | 中国电信股份有限公司 | Cloud mobile phone acceleration method, system, equipment and storage medium based on rendering separation |
CN116528016A (en) * | 2023-04-13 | 2023-08-01 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio/video synthesis method, server and readable storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN114866802A (en) | 2022-08-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN114697150B (en) | Command issuing method and device, storage medium and electronic device | |
CN114866802B (en) | Video stream sending method and device, storage medium and electronic device | |
CN109002001A (en) | The management system and method for interactive voice | |
CN116360584A (en) | Virtual target product generation method and device, storage medium and electronic device | |
CN114915514B (en) | Method and device for processing intention, storage medium and electronic device | |
CN116107975A (en) | Control method and device of equipment, storage medium and electronic device | |
CN105992055A (en) | Video decoding method and device | |
CN116027937A (en) | Rendering method and device of component to be edited, storage medium and electronic device | |
CN115345225A (en) | Method and device for determining recommended scene, storage medium and electronic device | |
CN114023304A (en) | Control method of intelligent equipment, intelligent household equipment, nonvolatile storage medium and processor | |
CN115174290B (en) | Session record display method and device, storage medium and electronic device | |
CN115314331B (en) | Control method and device of intelligent terminal, storage medium and electronic device | |
CN116301511A (en) | Equipment interaction method and device, storage medium and electronic device | |
CN116846936A (en) | Method and device for sending control instruction, storage medium and electronic device | |
CN115296958B (en) | Distribution method and device of equipment control tasks, storage medium and electronic device | |
CN116248275A (en) | Data storage method and device, storage medium and electronic device | |
CN116684374A (en) | Gateway message processing method and device, storage medium and electronic device | |
CN101816176B (en) | Method for broadcasting video data sequences by a server to a client terminal | |
CN116389436A (en) | Broadcasting method and device of processing result, storage medium and electronic device | |
CN116249096A (en) | Scheduling method for Bluetooth event, terminal equipment and storage medium | |
CN117914931A (en) | Function recommendation method and device, storage medium and electronic device | |
CN116545791A (en) | Parameter adjustment method and device, storage medium and electronic device | |
CN116132209A (en) | Scene construction method and device, storage medium and electronic device | |
CN116682422A (en) | Method and device for determining semantic understanding template, storage medium and electronic device | |
CN116339850A (en) | Method and device for starting target application, storage medium and electronic device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||