CN114866802A - Video stream transmission method and device, storage medium and electronic device - Google Patents

Video stream transmission method and device, storage medium and electronic device

Info

Publication number
CN114866802A
CN114866802A (application CN202210390334.9A)
Authority
CN
China
Prior art keywords
video stream
frame
virtual object
audio information
frames
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210390334.9A
Other languages
Chinese (zh)
Other versions
CN114866802B (en)
Inventor
于航滨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qingdao Haier Technology Co Ltd
Haier Smart Home Co Ltd
Original Assignee
Qingdao Haier Technology Co Ltd
Haier Smart Home Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qingdao Haier Technology Co Ltd, Haier Smart Home Co Ltd filed Critical Qingdao Haier Technology Co Ltd
Priority to CN202210390334.9A
Publication of CN114866802A
Application granted
Publication of CN114866802B
Legal status: Active (granted)

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N 21/234 Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N 21/25 Management operations performed by the server for facilitating the content distribution or administrating data related to end-users or client devices, e.g. end-user or client device authentication, learning user preferences for recommending movies
    • H04N 21/262 Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission, generating play-lists
    • H04N 21/26208 Content or additional data distribution scheduling, e.g. sending additional data at off-peak times, updating software modules, calculating the carousel transmission frequency, delaying a video stream transmission, generating play-lists, the scheduling operation being performed under constraints
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/81 Monomedia components thereof
    • H04N 21/816 Monomedia components thereof involving special video data, e.g. 3D video
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/80 Generation or processing of content or additional data by content creator independently of the distribution process; Content per se
    • H04N 21/85 Assembly of content; Generation of multimedia applications
    • H04N 21/854 Content authoring
    • H04N 21/8547 Content authoring involving timestamps for synchronizing content

Abstract

The application discloses a video stream sending method and apparatus, a storage medium, and an electronic apparatus, relating to the technical field of smart homes. The method includes: a cloud server determines, according to an acquired request event, audio information for responding to the request event and action information of a virtual object corresponding to the audio information, where the request event is an event in which a target object requests the virtual object on a terminal device; the cloud server configures a display animation of the virtual object according to the action information of the virtual object corresponding to the audio information; the cloud server determines all key frames and/or all forward frames corresponding to the display animation, renders a preset model according to all the key frames and/or all the forward frames, and generates a first video stream of the virtual object responding to the request event, where the first video stream does not contain audio information; and the cloud server generates a second video stream according to the first video stream and the audio information.

Description

Video stream transmission method and device, storage medium and electronic device
Technical Field
The present invention relates to the field of communications, and in particular, to a method and an apparatus for transmitting a video stream, a storage medium, and an electronic apparatus.
Background
With the development of technology and the continuous improvement of living standards, more and more households use smart devices, and many smart devices can now display virtual objects.
In a common virtual object scheme, a 3D engine is integrated into the user's smart device to display the virtual object. However, this scheme is limited by the hardware performance of the smart device; on a low-performance device, a smooth display effect cannot be achieved, and the picture may even stutter or tear.
Another approach is to improve computational efficiency through parallel optimization, thereby raising the frame rate. However, this only improves the rendering efficiency of the engine and does not remove the limitation imposed by hardware performance; on a low-performance smart device, even if 100% of the computing power is used, an ideal display effect cannot be achieved.
For the problem in the related art that the video stream corresponding to a virtual object is generated slowly on low-performance smart devices, no effective solution has yet been provided.
Disclosure of Invention
Embodiments of the invention provide a video stream sending method and apparatus, a storage medium, and an electronic apparatus, so as to at least solve the problem in the related art that the video stream corresponding to a virtual object is generated slowly on low-performance smart devices.
According to an embodiment of the present invention, there is provided a method for transmitting a video stream, including: the cloud server determines audio information used for responding to the request event and action information of a virtual object corresponding to the audio information according to the acquired request event; the request event is an event that a target object requests a virtual object on the terminal equipment; the cloud server configures display animation of the virtual object according to the action information of the virtual object corresponding to the audio information; the cloud server determines all key frames and/or all forward frames corresponding to the display animation, renders a preset model according to all the key frames and/or all the forward frames, and generates a first video stream of the virtual object responding to the request event; wherein the first video stream does not contain audio information; and the cloud server generates a second video stream according to the first video stream and the audio information.
In an exemplary embodiment, the configuring, by the cloud server, the display animation of the virtual object according to the action information of the virtual object corresponding to the audio information includes: determining text information of the virtual object responding to the request event; determining audio information corresponding to the text information according to preset timbre information and tone information of the virtual object, and determining a lip movement track and limb movement of the virtual object when the terminal device plays the audio information, where the action information includes the lip movement track and the limb movement; and configuring a display animation of the virtual object according to the lip movement track and the limb movement.
In an exemplary embodiment, determining all key frames and/or all forward frames corresponding to the presentation animation comprises: analyzing the action information and all frame data corresponding to the display animation, and determining unchanged first unit data and changed second unit data between adjacent first frame data and second frame data one by one; determining all key frames and/or all forward frames of the presentation animation from the first cell data and the second cell data.
In an exemplary embodiment, rendering the preset model according to all the key frames to generate a first video stream includes: a first rendering step: determining a first key frame in all the key frames according to the playing time sequence corresponding to all the key frames; rendering the preset model according to the first key frame to generate a first video frame; circularly executing the first rendering step until the preset model is rendered according to the last key frame, and generating the last video frame corresponding to the last key frame; sorting all video frames according to their timestamps to generate the first video stream.
In an exemplary embodiment, rendering the preset model from all the forward frames to generate a first video stream comprises: determining a first forward frame in all the forward frames according to the playing time sequence corresponding to all the forward frames; rendering the preset model according to the first forward frame to generate a first video frame; a second rendering step: determining a second forward frame in all the forward frames according to the playing time sequence; rendering the preset model according to the second forward frame and the first video frame to generate a second video frame; circularly executing the second rendering step until the preset model is rendered according to the last forward frame and the previous video frame to generate the last video frame corresponding to the last forward frame; sorting all video frames according to their timestamps to generate the first video stream.
In one exemplary embodiment, generating a second video stream from the first video stream and the audio information comprises: acquiring a corresponding relation between each video frame in the first video stream and each audio frame in the audio information; and carrying out audio and video coding on the first video stream and the audio information according to the corresponding relation so as to generate a second video stream of the virtual object responding to the request event.
In an exemplary embodiment, determining unchanged first cell data and changed second cell data between adjacent first frame data and second frame data one by one includes: comparing adjacent first frame data and second frame data, determining second unit data of the second frame data changed relative to the first frame data, and determining first unit data of the second frame data unchanged relative to the first frame data.
According to another embodiment of the present invention, there is provided a video stream sending apparatus, applied to a cloud server, including: the determining module is used for determining audio information used for responding to the request event and action information of a virtual object corresponding to the audio information according to the acquired request event; the request event is an event that a target object requests a virtual object on the terminal equipment; the configuration module is used for configuring the display animation of the virtual object according to the action information of the virtual object corresponding to the audio information; the first generation module is used for determining all key frames and/or all forward frames corresponding to the display animation, rendering a preset model according to all the key frames and/or all the forward frames, and generating a first video stream of the virtual object responding to the request event; wherein the first video stream does not contain audio information; and the second generation module is used for generating a second video stream according to the first video stream and the audio information.
According to still another aspect of the embodiments of the present invention, there is also provided a computer-readable storage medium, in which a computer program is stored, wherein the computer program is configured to execute the above-mentioned video stream transmission method when running.
According to another aspect of the embodiments of the present invention, there is also provided an electronic apparatus, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the above-mentioned video stream transmission method through the computer program.
In the embodiment of the invention, the cloud server determines, according to the acquired request event, audio information for responding to the request event and action information of a virtual object corresponding to the audio information, where the request event is an event in which a target object requests the virtual object on a terminal device; the cloud server configures a display animation of the virtual object according to the action information of the virtual object corresponding to the audio information; the cloud server determines all key frames and/or all forward frames corresponding to the display animation, renders a preset model according to all the key frames and/or all the forward frames, and generates a first video stream of the virtual object responding to the request event, where the first video stream does not contain audio information; and the cloud server generates a second video stream according to the first video stream and the audio information. With this technical solution, the problem that the video stream corresponding to a virtual object is generated slowly on low-performance smart devices is solved: the computation is transferred from the terminal device to the cloud server, the performance the virtual object requires of the terminal device is greatly reduced, and the video stream corresponding to the virtual object is generated on the cloud server, so that it can be displayed smoothly on the terminal device.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, and it is obvious for those skilled in the art to obtain other drawings without inventive exercise.
Fig. 1 is a schematic diagram of a hardware environment of a method for transmitting a video stream according to an embodiment of the present application;
fig. 2 is a flowchart of a method of transmitting a video stream according to an embodiment of the present invention;
fig. 3 is a schematic diagram of a related art video stream transmission method;
fig. 4 is a schematic diagram of a transmission method of a video stream according to an embodiment of the present invention;
fig. 5 is a block diagram of a transmitting apparatus of a video stream according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
According to an aspect of an embodiment of the present application, a method for sending a video stream is provided. The video stream sending method is widely applied to whole-house intelligent digital control scenarios such as the smart home (Smart Home), smart home device ecosystems, and intelligent house (Intelligent House) ecosystems. Optionally, in the present embodiment, the above video stream sending method may be applied to a hardware environment formed by the terminal device 102 and the server 104 as shown in fig. 1. As shown in fig. 1, the server 104 is connected to the terminal device 102 through a network and may be configured to provide services (e.g., application services) for the terminal or for a client installed on the terminal; a database may be set up on the server or independently of the server to provide a data storage service for the server 104; and a cloud computing and/or edge computing service may be configured on the server or independently of the server to provide a data computing service for the server 104.
The network may include, but is not limited to, at least one of a wired network and a wireless network. The wired network may include, but is not limited to, at least one of a wide area network, a metropolitan area network, and a local area network; the wireless network may include, but is not limited to, at least one of Wi-Fi (Wireless Fidelity) and Bluetooth. The terminal device 102 may be, but is not limited to, a PC, a mobile phone, a tablet computer, a smart air conditioner, a smart range hood, a smart refrigerator, a smart oven, a smart stove, a smart washing machine, a smart water heater, smart washing equipment, a smart dishwasher, a smart projection device, a smart TV, a smart drying rack, smart curtains, smart audio-visual equipment, a smart socket, a smart sound system, a smart speaker, a smart fresh-air system, smart kitchen and bathroom equipment, a floor-sweeping robot, a window-cleaning robot, a floor-mopping robot, a smart air purifier, a smart steamer, a smart microwave oven, a kitchen water heater, a smart water purifier, a smart water dispenser, a smart lock, and the like.
In this embodiment, a video stream sending method is provided, and is applied to a cloud server, and fig. 2 is a flowchart of a video stream sending method according to an embodiment of the present invention, where the flowchart includes the following steps:
step S202, the cloud server determines audio information used for responding to the request event and action information of a virtual object corresponding to the audio information according to the acquired request event; the request event is an event that a target object requests a virtual object on the terminal equipment;
For example, the request event may include, but is not limited to: "ask the virtual object on the terminal device for the current time", "ask the virtual object on the terminal device about today's weather", "request the virtual object on the terminal device to set an alarm clock", and the like.
Step S204, the cloud server configures display animation of the virtual object according to the action information of the virtual object corresponding to the audio information;
the motion information may be used to indicate a body motion, a lip motion, and the like of the virtual object when the audio information is played. It should be noted that, in the embodiment of the present invention, a preset animation may also be obtained, where the preset animation may be understood as an animation associated with the audio information, for example, the audio information is "raining today", and the preset animation may be an animation related to rain, for example, an animation related to rain; and configuring the display animation of the virtual object according to the audio information, the action information of the virtual object corresponding to the audio information and the preset animation.
Step S206, the cloud server determines all key frames and/or all forward frames corresponding to the display animation, renders a preset model according to all key frames and/or all forward frames, and generates a first video stream of the virtual object responding to the request event; wherein the first video stream does not contain audio information;
step S208, the cloud server generates a second video stream according to the first video stream and the audio information.
Through the steps, the cloud server determines audio information used for responding to the request event and action information of the virtual object corresponding to the audio information according to the acquired request event; the request event is an event that a target object requests a virtual object on the terminal equipment; the cloud server configures display animation of the virtual object according to the action information of the virtual object corresponding to the audio information; the cloud server determines all key frames and/or all forward frames corresponding to the display animation, renders a preset model according to all the key frames and/or all the forward frames, and generates a first video stream of the virtual object responding to the request event; wherein the first video stream does not contain audio information; the cloud server generates the second video stream according to the first video stream and the audio information, and the problem that in the related technology, the speed of generating the video stream corresponding to the virtual object on the low-performance intelligent device is low is solved.
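For illustration only, the flow of steps S202 to S208 can be sketched in code as below. Every function name and data shape in this sketch (plan_audio_and_actions, configure_display_animation, and so on) is an assumption made for exposition; the patent does not define an API.

```python
# Illustrative sketch of the cloud-server flow in steps S202-S208.
# All names and data shapes below are assumptions, not an API from the patent.

def plan_audio_and_actions(request_event):
    # S202: text response -> audio frames plus per-frame lip/limb action info.
    audio_frames = [b"audio-frame-0", b"audio-frame-1"]          # placeholder audio
    action_info = [{"lip": "closed", "limb": "idle"},
                   {"lip": "open", "limb": "wave"}]              # placeholder actions
    return audio_frames, action_info

def configure_display_animation(action_info):
    # S204: one animation frame of parameters per audio frame.
    return [{"frame_index": i, **info} for i, info in enumerate(action_info)]

def extract_key_and_forward_frames(animation):
    # S206a: treat the first frame as a key frame and later frames as forward frames.
    return [{"type": "key" if i == 0 else "forward", "data": frame}
            for i, frame in enumerate(animation)]

def render_preset_model(frames):
    # S206b: render the preset model frame by frame -> silent first video stream.
    return [("video", frame["data"]["frame_index"]) for frame in frames]

def mux_audio_video(first_video_stream, audio_frames):
    # S208: pair each video frame with its audio frame -> second video stream.
    return list(zip(first_video_stream, audio_frames))

if __name__ == "__main__":
    audio, actions = plan_audio_and_actions("what time is it")
    animation = configure_display_animation(actions)
    frames = extract_key_and_forward_frames(animation)
    silent_stream = render_preset_model(frames)
    print(mux_audio_video(silent_stream, audio))
```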
In an exemplary embodiment, the configuring, by the cloud server, the display animation of the virtual object according to the action information of the virtual object corresponding to the audio information includes: determining text information of the virtual object responding to the request event; determining audio information corresponding to the text information according to preset tone information and tone information of a virtual object, and determining a lip movement track and limb movement of the virtual object when the terminal equipment plays the audio information; extracting keywords in the text information, and determining a preset animation which has an association relation with the keywords; wherein the action information includes: the lip movement track and limb movement; and configuring the display animation of the virtual object according to the lip movement track, the limb movement and the preset animation.
That is, upon receiving the request event, the cloud server determines the response information in text form (i.e., the text information) corresponding to the request event, and once the text information is determined, performs the following steps:
Step 1: obtain the timbre information and tone information of the target object or of the virtual object preset by a developer, and determine the corresponding audio information according to the timbre information and tone information of the virtual object;
Step 2: determine the lip movement track and the limb movement of the virtual object when the terminal device plays the audio information;
Step 3: determine a preset animation that is associated with the keywords in the text information;
Step 4: configure the display animation of the virtual object according to the lip movement track, the limb movement, and the preset animation.
It should be noted that step 2 is executed after step 1, while the order of step 3 relative to step 1 is not limited in the embodiment of the present invention. That is, once the text information is determined, the steps may be executed in the order step 1, step 2, step 3; or step 1, step 3, step 2; or step 3, step 1, step 2.
Further, determining the lip movement track and the limb movement of the virtual object when the terminal device plays the audio information includes: determining audio features corresponding to each frame of audio in the audio information; determining lip movement information corresponding to each frame of audio according to the first corresponding relation between the audio features and the lip movement; determining limb action information corresponding to each frame of audio according to the second corresponding relation between the audio features and the limb actions; and sequencing the lip movement information corresponding to each frame of audio according to the playing sequence of the audio information to obtain the lip movement track, and sequencing the limb action information corresponding to each frame of audio according to the playing sequence of the audio information to obtain the limb action.
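A minimal sketch of this per-audio-frame mapping is given below, assuming placeholder lookup tables for the first correspondence (audio features to lip movement) and the second correspondence (audio features to limb action); the feature extractor is likewise a stand-in, since the patent does not specify how audio features are computed.

```python
# Placeholder first and second correspondences between audio features and actions.
LIP_TABLE = {"silence": "closed", "vowel": "wide", "consonant": "narrow"}
LIMB_TABLE = {"silence": "idle", "vowel": "gesture", "consonant": "nod"}

def audio_feature(frame: bytes) -> str:
    # Placeholder feature extractor: classify the audio frame by its byte energy.
    energy = sum(frame)
    if energy == 0:
        return "silence"
    return "vowel" if energy % 2 == 0 else "consonant"

def lip_and_limb_tracks(audio_frames):
    lips, limbs = [], []
    for frame in audio_frames:            # audio frames already in playing order
        feature = audio_feature(frame)
        lips.append(LIP_TABLE[feature])   # lip movement info for this audio frame
        limbs.append(LIMB_TABLE[feature]) # limb action info for this audio frame
    return lips, limbs                    # ordered by play sequence: track + action list

print(lip_and_limb_tracks([b"", b"\x02\x02", b"\x03"]))
```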
In an exemplary embodiment, determining all keyframes and/or all forward frames corresponding to the presentation animation includes: analyzing all frame data corresponding to the display animation, and determining unchanged first unit data and changed second unit data between adjacent first frame data and second frame data one by one; determining all key frames and/or all forward frames of the presentation animation from the first cell data and the second cell data.
That is, the frame data corresponding to the display animation are parsed into unit data, adjacent frames are compared unit by unit to obtain all the key frames and/or all the forward frames of the display animation, and a first video stream containing no audio information is generated according to all the key frames and/or all the forward frames.
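The following sketch illustrates one way such a unit-by-unit comparison could separate key frames from forward frames, assuming each animation frame has already been parsed into a list of unit values; the threshold used to decide when a frame is stored whole is an assumption, as the patent does not state a concrete rule.

```python
# Minimal sketch of the unit-by-unit comparison of adjacent frames.

def classify_frames(frames, key_ratio=0.5):
    """Return a list of ('key', frame) or ('forward', delta) entries."""
    result = [("key", frames[0])]                       # first frame is always a key frame
    for prev, cur in zip(frames, frames[1:]):
        changed = [(i, v) for i, (u, v) in enumerate(zip(prev, cur)) if u != v]
        if len(changed) >= key_ratio * len(cur):
            result.append(("key", cur))                 # too many changed units: keep whole frame
        else:
            result.append(("forward", changed))         # keep only (unit subscript, new value)
    return result

frames = [[0, 0, 0, 0], [0, 1, 0, 0], [2, 2, 2, 0]]
print(classify_frames(frames))
```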
In an exemplary embodiment, rendering the preset model according to all the key frames to generate a first video stream includes: a first rendering step: determining a first key frame in all the key frames according to the playing time sequence corresponding to all the key frames; rendering the preset model according to the first key frame to generate a first video frame; circularly executing the first rendering step until the preset model is rendered according to the last key frame to generate the last video frame corresponding to the last key frame; sequencing all video frames according to their timestamps to generate the first video stream.
The embodiment of the invention provides a method for generating the first video stream from the key frames. Specifically, since a key frame stores a complete picture, a complete video frame can be obtained by decoding the key frame alone. Therefore, when the preset model is rendered according to all the key frames to generate the first video stream, the preset model is rendered frame by frame in the playing time order of the key frames to obtain the video frames corresponding to all the key frames, and all the video frames are sorted by their timestamps to generate the first video stream.
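A sketch of this key-frame rendering loop is shown below; render_model is a placeholder for the preset-model rendering (for example a Unity render pass) and simply returns a labelled video frame.

```python
# Sketch of the key-frame rendering loop: each key frame is rendered on its own.

def render_model(key_frame):
    # A key frame stores a complete picture, so it can be rendered independently.
    return {"timestamp": key_frame["timestamp"], "pixels": f"render({key_frame['pose']})"}

def first_video_stream_from_key_frames(key_frames):
    ordered = sorted(key_frames, key=lambda kf: kf["timestamp"])   # playing time order
    video_frames = [render_model(kf) for kf in ordered]            # first, second, ... last
    return sorted(video_frames, key=lambda vf: vf["timestamp"])    # sort by timestamp

key_frames = [{"timestamp": 0.2, "pose": "B"}, {"timestamp": 0.0, "pose": "A"}]
print(first_video_stream_from_key_frames(key_frames))
```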
In an exemplary embodiment, rendering the preset model from all the forward frames to generate a first video stream comprises: determining a first forward frame in all the forward frames according to the playing time sequence corresponding to all the forward frames; rendering the preset model according to the first forward frame to generate a first video frame; a second rendering step: determining a second forward frame in all the forward frames according to the playing time sequence; rendering the preset model according to the second forward frame and the first video frame to generate a second video frame; circularly executing the second rendering step until the preset model is rendered according to the last forward frame and the previous video frame to generate the last video frame corresponding to the last forward frame; sorting all video frames according to their timestamps to generate the first video stream.
The embodiment of the invention also provides a method for generating the first video stream from the forward frames. Specifically, because a forward frame records only the difference between the current frame and the previous frame, the forward frame must be decoded and combined with the previous frame to obtain a complete frame of video information. Therefore, the preset model is rendered according to each forward frame together with the video frame preceding it, so as to obtain the video frames corresponding to all the forward frames, and all the video frames are sorted by their timestamps to generate the first video stream.
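The sketch below illustrates the forward-frame case, with the preset-model rendering reduced to reconstructing the frame's unit values from the previous video frame plus the current delta; the delta format (unit subscript, new value) is an assumption consistent with the unit-data comparison described earlier.

```python
# Sketch of the forward-frame rendering loop: each video frame depends on the previous one.

def apply_delta(previous_units, delta):
    units = list(previous_units)
    for index, value in delta:           # overwrite only the changed units
        units[index] = value
    return units

def first_video_stream_from_forward_frames(first_frame_units, forward_frames):
    video_frames = [{"timestamp": 0, "units": list(first_frame_units)}]
    for timestamp, delta in forward_frames:          # already in playing time order
        previous = video_frames[-1]["units"]
        video_frames.append({"timestamp": timestamp,
                             "units": apply_delta(previous, delta)})
    return sorted(video_frames, key=lambda vf: vf["timestamp"])

stream = first_video_stream_from_forward_frames(
    [0, 0, 0], [(1, [(1, 5)]), (2, [(0, 7), (2, 9)])])
print(stream)   # frame 1 = [0, 5, 0], frame 2 = [7, 5, 9]
```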
In one exemplary embodiment, determining the second video stream from the first video stream and the audio information comprises: acquiring a corresponding relation between each video frame in the first video stream and each audio frame in the audio information; and carrying out audio and video coding on the first video stream and the audio information according to the corresponding relation so as to generate a second video stream of the virtual object responding to the request event.
That is, in the case of generating a first video stream that does not contain audio information, the first video stream and the audio information need to be subjected to audio-video coding to obtain a second video stream that contains audio information.
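A minimal sketch of this combination step follows; a real implementation would hand the paired frames to an audio-video encoder and muxer, whereas here the "encoding" is just a paired structure so that the correspondence is visible.

```python
# Sketch of combining the silent first video stream with the audio information.

def mux_audio_video(video_frames, audio_frames, correspondence):
    # correspondence: mapping from video-frame index to audio-frame index.
    second_stream = []
    for video_index, audio_index in correspondence.items():
        second_stream.append({
            "video": video_frames[video_index],
            "audio": audio_frames[audio_index],
        })
    return second_stream

video = ["v0", "v1", "v2"]
audio = ["a0", "a1", "a2"]
print(mux_audio_video(video, audio, {0: 0, 1: 1, 2: 2}))
```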
In an exemplary embodiment, determining unchanged first cell data and changed second cell data between adjacent first frame data and second frame data one by one includes: comparing adjacent first frame data and second frame data, determining second unit data of the second frame data changed relative to the first frame data, and determining first unit data of the second frame data unchanged relative to the first frame data.
That is, the second frame data are compared with the first frame data to determine the changed unit data in the second frame data and thereby find the repeated unit data between adjacent frames. Repeated unit data can be compressed directly, while for each changed unit data the object data to which it belongs and its unit data subscript need to be determined.
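The sketch below shows one possible shape for such delta records, assuming each animation frame is a mapping from object name to a list of unit values; repeated units are omitted and each changed unit is recorded with its object and unit-data subscript.

```python
# Sketch of delta records: keep (object, unit subscript, new value) for changed units only.

def unit_deltas(first_frame, second_frame):
    """first_frame/second_frame: {object_name: [unit values]} for one animation frame."""
    deltas = []
    for obj, new_units in second_frame.items():
        old_units = first_frame.get(obj, [])
        for subscript, value in enumerate(new_units):
            if subscript >= len(old_units) or old_units[subscript] != value:
                deltas.append((obj, subscript, value))   # changed unit: keep object + subscript
    return deltas                                        # repeated units are simply omitted

previous_frame = {"head": [0.0, 0.1], "arm": [1.0]}
current_frame = {"head": [0.0, 0.3], "arm": [1.0]}
print(unit_deltas(previous_frame, current_frame))   # [('head', 1, 0.3)]
```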
In order to better understand the process of the method for sending the video stream, the following describes a flow of the method for sending the video stream with reference to an alternative embodiment, but the flow is not limited to the technical solution of the embodiment of the present invention.
Fig. 3 is a schematic diagram of a related-art video stream transmission method. In the existing scheme, the model rendering and animation generation of the virtual object are performed on the terminal device: after performing action processing, the cloud server issues parameter data to the terminal for processing. This reduces the amount of data to be transmitted over the network, but greatly raises the performance requirements on the terminal device, and mid- and low-end terminal devices can hardly meet the requirement of smooth display.
In this embodiment, a video stream sending method is provided. Fig. 4 is a schematic diagram of a video stream sending method according to an embodiment of the present invention. As shown in fig. 4, in the embodiment of the invention, the Unity model rendering and animation generation stages, which consume the most resources, are moved to the cloud server for processing. The cloud server therefore produces not only the parameter data but also the rendered and synthesized video stream: it performs Unity data acquisition, frame data parsing, unit data comparison, I-frame/P-frame encoding, I-frame/P-frame decoding, Unity model rendering, and animation generation, thereby generating the rendered and synthesized video stream. The video stream is then sent to the terminal, and the terminal calls a player to decode and play it. Since the terminal device only needs to call the player to decode and play the video stream and does not need to handle any other stage, the virtual object can be displayed smoothly even on mid- and low-end terminal devices.
Because a video stream is transmitted, the terminal device can display a local standby animation while the display video has not yet been received, and once the video stream from the cloud server is delivered, the animation can be switched seamlessly, so the user does not perceive any stutter.
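The terminal-side behaviour can be sketched as follows; the player interface (VirtualObjectPlayer, on_cloud_stream, next_frame) is hypothetical and only illustrates the standby-then-switch logic.

```python
# Sketch of the terminal-side fallback: play a local standby animation until the
# cloud video stream arrives, then switch to it seamlessly.

class VirtualObjectPlayer:
    def __init__(self, standby_animation):
        self.standby_animation = standby_animation
        self.cloud_stream = None

    def on_cloud_stream(self, video_stream):
        # Called when the second video stream from the cloud server is delivered.
        self.cloud_stream = list(video_stream)

    def next_frame(self):
        if self.cloud_stream:                 # cloud stream available: play it
            return self.cloud_stream.pop(0)
        return self.standby_animation[0]      # otherwise loop the local standby animation

player = VirtualObjectPlayer(standby_animation=["idle-frame"])
print(player.next_frame())                        # "idle-frame" before the stream arrives
player.on_cloud_stream(["v0", "v1"])
print(player.next_frame(), player.next_frame())   # "v0" "v1" after delivery
```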
This embodiment solves the related-art problems that a virtual object cannot be displayed smoothly on low-performance smart devices and that the picture may even stutter or tear: the computation is transferred from the terminal device to the cloud server, the performance the virtual object requires of the terminal device is greatly reduced, and the virtual object can be displayed smoothly on the terminal device.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
In this embodiment, a sending apparatus of a video stream is further provided, where the sending apparatus of the video stream is applied to a cloud server, and is used to implement the foregoing embodiments and preferred embodiments, which have already been described and are not described again. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 5 is a block diagram of a transmitting apparatus of a video stream according to an embodiment of the present invention; as shown in fig. 5, includes:
a determining module 52, configured to determine, according to the obtained request event, audio information used for responding to the request event, and action information of the virtual object corresponding to the audio information; the request event is an event that a target object requests a virtual object on the terminal equipment;
the configuration module 54 is configured to configure the display animation of the virtual object according to the action information of the virtual object corresponding to the audio information;
a first generating module 56, configured to determine all key frames and/or all forward frames corresponding to the display animation, render a preset model according to all key frames and/or all forward frames, and generate a first video stream of the virtual object responding to the request event; wherein the first video stream does not contain audio information;
a second generating module 58, configured to generate a second video stream according to the first video stream and the audio information.
By the device, the cloud server determines audio information used for responding to the request event and action information of the virtual object corresponding to the audio information according to the acquired request event; the request event is an event that a target object requests a virtual object on the terminal equipment; the cloud server configures display animation of the virtual object according to the action information of the virtual object corresponding to the audio information; the cloud server determines all key frames and/or all forward frames corresponding to the display animation, renders a preset model according to all the key frames and/or all the forward frames, and generates a first video stream of the virtual object responding to the request event; wherein the first video stream does not contain audio information; the cloud server generates the second video stream according to the first video stream and the audio information, and the problem that in the related technology, the speed of generating the video stream corresponding to the virtual object on the low-performance intelligent device is low is solved.
In an exemplary embodiment, the configuration module is configured to determine text information of the virtual object responding to the request event; determine audio information corresponding to the text information according to preset timbre information and tone information of the virtual object, and determine a lip movement track and limb movement of the virtual object when the terminal device plays the audio information, where the action information includes the lip movement track and the limb movement; and configure a display animation of the virtual object according to the lip movement track and the limb movement.
In an exemplary embodiment, the first generating module is configured to parse all frame data corresponding to the motion information and the display animation, and determine, one by one, a first unit data that does not change and a second unit data that changes between adjacent first frame data and second frame data; determining all key frames and/or all forward frames of the presentation animation from the first cell data and the second cell data.
In an exemplary embodiment, the first generating module is configured to perform a first rendering step: determining a first key frame in all the key frames according to the playing time sequence corresponding to all the key frames; rendering the preset model according to the first key frame to generate a first video frame; circularly executing the first rendering step until the preset model is rendered according to the last key frame to generate the last video frame corresponding to the last key frame; sorting all video frames according to their timestamps to generate the first video stream.
In an exemplary embodiment, the first generating module is configured to determine a first forward frame among all the forward frames according to the playing time sequence corresponding to all the forward frames; rendering the preset model according to the first forward frame to generate a first video frame; a second rendering step: determining a second forward frame in all the forward frames according to the playing time sequence; rendering the preset model according to the second forward frame and the first video frame to generate a second video frame; circularly executing the second rendering step until the preset model is rendered according to the last forward frame and the previous video frame to generate the last video frame corresponding to the last forward frame; sorting all video frames according to their timestamps to generate the first video stream.
In an exemplary embodiment, the second generating module is configured to obtain a correspondence between each video frame in the first video stream and each audio frame in the audio information; and carrying out audio and video coding on the first video stream and the audio information according to the corresponding relation so as to generate a second video stream of the virtual object responding to the request event.
In an exemplary embodiment, the first generating module is configured to compare adjacent first frame data and second frame data, determine second unit data of the second frame data that changes with respect to the first frame data, and determine first unit data of the second frame data that does not change with respect to the first frame data.
An embodiment of the present invention further provides a storage medium including a stored program, wherein the program executes any one of the methods described above.
Alternatively, in the present embodiment, the storage medium may be configured to store program codes for performing the following steps:
s1, the cloud server determines audio information used for responding to the request event and action information of a virtual object corresponding to the audio information according to the acquired request event; the request event is an event that a target object requests a virtual object on the terminal equipment;
s2, the cloud server configures the display animation of the virtual object according to the action information of the virtual object corresponding to the audio information;
s3, the cloud server determines all key frames and/or all forward frames corresponding to the display animation, renders a preset model according to all key frames and/or all forward frames, and generates a first video stream of the virtual object responding to the request event; wherein the first video stream does not contain audio information;
and S4, the cloud server generates a second video stream according to the first video stream and the audio information.
Embodiments of the present invention also provide an electronic device comprising a memory having a computer program stored therein and a processor arranged to run the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
s1, the cloud server determines audio information used for responding to the request event and action information of a virtual object corresponding to the audio information according to the acquired request event; the request event is an event that a target object requests a virtual object on the terminal equipment;
s2, the cloud server configures the display animation of the virtual object according to the action information of the virtual object corresponding to the audio information;
s3, the cloud server determines all key frames and/or all forward frames corresponding to the display animation, renders a preset model according to all key frames and/or all forward frames, and generates a first video stream of the virtual object responding to the request event; wherein the first video stream does not contain audio information;
and S4, the cloud server generates a second video stream according to the first video stream and the audio information.
Optionally, in this embodiment, the storage medium may include, but is not limited to: various media capable of storing program codes, such as a usb disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
Optionally, the specific examples in this embodiment may refer to the examples described in the above embodiments and optional implementation manners, and this embodiment is not described herein again.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented by a general purpose computing device, they may be centralized on a single computing device or distributed across a network of multiple computing devices, and alternatively, they may be implemented by program code executable by a computing device, such that they may be stored in a storage device and executed by a computing device, and in some cases, the steps shown or described may be performed in an order different than that described herein, or they may be separately fabricated into individual integrated circuit modules, or multiple ones of them may be fabricated into a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
The foregoing is only a preferred embodiment of the present application and it should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present application, and these improvements and modifications should also be considered as the protection scope of the present application.

Claims (10)

1. A method for transmitting a video stream, comprising:
the cloud server determines audio information used for responding to the request event and action information of a virtual object corresponding to the audio information according to the acquired request event; the request event is an event that a target object requests a virtual object on the terminal equipment;
the cloud server configures display animation of the virtual object according to the action information of the virtual object corresponding to the audio information;
the cloud server determines all key frames and/or all forward frames corresponding to the display animation, renders a preset model according to all the key frames and/or all the forward frames, and generates a first video stream of the virtual object responding to the request event; wherein the first video stream does not contain audio information;
and the cloud server generates a second video stream according to the first video stream and the audio information.
2. The method of claim 1, wherein the configuring, by the cloud server, the display animation of the virtual object according to the action information of the virtual object corresponding to the audio information comprises:
determining text information of the virtual object responding to the request event;
determining audio information corresponding to the text information according to preset tone information and tone information of a virtual object, and determining a lip movement track and limb movement of the virtual object when the terminal equipment plays the audio information; wherein the action information includes: the lip movement track and limb movement;
and configuring a display animation of the virtual object according to the lip movement track and the limb movement.
3. The method of claim 1, wherein determining all key frames and/or all forward frames for the presentation animation comprises:
analyzing all frame data corresponding to the display animation, and determining unchanged first unit data and changed second unit data between adjacent first frame data and second frame data one by one;
determining all key frames and/or all forward frames of the presentation animation from the first cell data and the second cell data.
4. The method of claim 1, wherein rendering the preset model according to all the key frames to generate a first video stream comprises:
a first rendering step: determining a first key frame in all the key frames according to the playing time sequence corresponding to all the key frames; rendering the preset model according to the first key frame to generate a first video frame;
circularly executing the first rendering step until the preset model is rendered according to the last key frame to generate the last video frame corresponding to the last key frame;
sorting all video frames according to their timestamps to generate the first video stream.
5. The method of claim 1, wherein rendering the pre-set model from all of the forward frames to generate a first video stream comprises:
determining a first forward frame in all the forward frames according to the playing time sequence corresponding to all the forward frames; rendering the preset model according to the first forward frame to generate a first video frame;
a second rendering step: determining a second forward frame in all the forward frames according to the playing time sequence;
rendering the preset model according to the second forward frame and the first video frame to generate a second video frame;
circularly executing the second rendering step until the preset model is rendered according to the last forward frame and the previous video frame to generate the last video frame corresponding to the last forward frame;
sorting all video frames according to their timestamps to generate the first video stream.
6. The method of claim 1, wherein generating a second video stream from the first video stream and the audio information comprises:
acquiring a corresponding relation between each video frame in the first video stream and each audio frame in the audio information;
and carrying out audio and video coding on the first video stream and the audio information according to the corresponding relation so as to generate a second video stream of the virtual object responding to the request event.
7. The method of claim 3, wherein determining unchanged first cell data and changed second cell data between adjacent first frame data and second frame data comprises:
comparing adjacent first frame data and second frame data, determining second unit data of the second frame data changed relative to the first frame data, and determining first unit data of the second frame data unchanged relative to the first frame data.
8. A video stream sending apparatus, characterized in that it is applied to a cloud server and comprises:
the determining module is used for determining audio information used for responding to the request event and action information of the virtual object corresponding to the audio information according to the acquired request event; the request event is an event that a target object requests a virtual object on the terminal equipment;
the configuration module is used for configuring the display animation of the virtual object according to the action information of the virtual object corresponding to the audio information;
the first generation module is used for determining all key frames and/or all forward frames corresponding to the display animation, rendering a preset model according to all the key frames and/or all the forward frames, and generating a first video stream of the virtual object responding to the request event; wherein the first video stream does not contain audio information;
and the second generation module is used for generating a second video stream according to the first video stream and the audio information.
9. A computer-readable storage medium, comprising a stored program, wherein the program is operable to perform the method of any one of claims 1 to 7.
10. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, the processor being arranged to execute the method of any of claims 1 to 7 by means of the computer program.
CN202210390334.9A 2022-04-14 2022-04-14 Video stream sending method and device, storage medium and electronic device Active CN114866802B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210390334.9A CN114866802B (en) 2022-04-14 2022-04-14 Video stream sending method and device, storage medium and electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210390334.9A CN114866802B (en) 2022-04-14 2022-04-14 Video stream sending method and device, storage medium and electronic device

Publications (2)

Publication Number Publication Date
CN114866802A (en) 2022-08-05
CN114866802B (en) 2024-04-19

Family

ID=82632151

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210390334.9A Active CN114866802B (en) 2022-04-14 2022-04-14 Video stream sending method and device, storage medium and electronic device

Country Status (1)

Country Link
CN (1) CN114866802B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8638846B1 (en) * 2003-06-23 2014-01-28 At&T Intellectual Property Ii, L.P. Systems and methods for encoding and decoding video streams
CN108320322A (en) * 2018-02-11 2018-07-24 腾讯科技(成都)有限公司 Animation data processing method, device, computer equipment and storage medium
US10467792B1 (en) * 2017-08-24 2019-11-05 Amazon Technologies, Inc. Simulating communication expressions using virtual objects
CN110557625A (en) * 2019-09-17 2019-12-10 北京达佳互联信息技术有限公司 live virtual image broadcasting method, terminal, computer equipment and storage medium
CN111614780A (en) * 2020-05-28 2020-09-01 深圳航天智慧城市系统技术研究院有限公司 Cloud rendering system and method
CN112543342A (en) * 2020-11-26 2021-03-23 腾讯科技(深圳)有限公司 Virtual video live broadcast processing method and device, storage medium and electronic equipment
CN113067953A (en) * 2021-03-22 2021-07-02 平安科技(深圳)有限公司 Customer service method, system, device, server and storage medium
CN113542757A (en) * 2021-07-20 2021-10-22 Oppo广东移动通信有限公司 Image transmission method and device for cloud application, server and storage medium
CN113633971A (en) * 2021-08-31 2021-11-12 腾讯科技(深圳)有限公司 Video frame rendering method, device, equipment and storage medium
CN113946402A (en) * 2021-11-09 2022-01-18 中国电信股份有限公司 Cloud mobile phone acceleration method, system, equipment and storage medium based on rendering separation
CN116528016A (en) * 2023-04-13 2023-08-01 腾讯音乐娱乐科技(深圳)有限公司 Audio/video synthesis method, server and readable storage medium

Also Published As

Publication number Publication date
CN114866802B (en) 2024-04-19

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant