CN115278366B

CN115278366B - Data processing method and device for video stream of virtual machine and electronic equipment

Info

Publication number: CN115278366B
Application number: CN202211186677.XA
Authority: CN
Inventors: 张作宸; 卢伟; 刘希超
Original assignee: Tianjin Zhuolang Kunlun Cloud Software Technology Co ltd
Current assignee: Tianjin Zhuolang Kunlun Cloud Software Technology Co ltd
Priority date: 2022-09-28
Filing date: 2022-09-28
Publication date: 2023-03-24
Anticipated expiration: 2042-09-28
Also published as: CN115278366A

Abstract

The invention provides a data processing method and device for a video stream of a virtual machine, and electronic equipment, relating to the technical field of video stream data transmission, and addresses the technical problem in the prior art of poor video stream data transmission quality between a virtual machine and a client. The method comprises the following steps: the virtual machine performs first encapsulation processing on its internal first multimedia data to obtain encapsulated second multimedia data, and sends the second multimedia data to the server; the server performs second encapsulation processing on the second multimedia data to obtain encapsulated third multimedia data, and sends the third multimedia data to the client; the client decapsulates the third multimedia data to obtain the first multimedia data; and the client executes the first multimedia data.

Description

Data processing method and device for video stream of virtual machine and electronic equipment

Technical Field

The present application relates to the field of video stream data transmission technologies, and in particular, to a data processing method and apparatus for a video stream of a virtual machine, and an electronic device.

Background

At present, virtual desktops are often required to play high-definition video. If the video is delivered by image transmission, the rapidly changing picture places high demands on network quality in order to achieve high definition and low latency. In practice, network environments vary widely, and it is difficult to guarantee that a given application scenario can transmit video stream data stably, so the transmission quality of the video stream data is poor.

Therefore, the technical problem of poor transmission quality of video stream data between the virtual machine and the client exists in the prior art.

Disclosure of Invention

The application aims to provide a data processing method and device for a video stream of a virtual machine and electronic equipment, so as to solve the technical problem that in the prior art, the transmission quality of the video stream data between the virtual machine and a client is poor.

In a first aspect, an embodiment of the present application provides a data processing method for a video stream of a virtual machine, which is applied to a data processing system for the video stream of the virtual machine, where the data processing system includes a virtual machine, a server and a client; the method comprises the following steps:

the virtual machine carries out first encapsulation processing on internal first multimedia data to obtain encapsulated second multimedia data, and sends the second multimedia data to the server;

the server side carries out second encapsulation processing on the second multimedia data to obtain encapsulated third multimedia data, and sends the third multimedia data to the client side;

the client decapsulates the third multimedia data to obtain the first multimedia data;

and the client executes the first multimedia data.

In one possible implementation, the virtual machine includes a data capture component and a data interaction component; the virtual machine carries out first encapsulation processing on internal first multimedia data to obtain encapsulated second multimedia data, and sends the second multimedia data to the server, and the method comprises the following steps:

the virtual machine acquires first multimedia data through the data capturing component;

the virtual machine carries out first encapsulation processing on the first multimedia data through the data interaction component to obtain encapsulated second multimedia data;

and the virtual machine sends the second multimedia data to the server.

In one possible implementation, the server includes a transport protocol component and a virtualization component; the server performs second encapsulation processing on the second multimedia data to obtain encapsulated third multimedia data, and sends the third multimedia data to the client, including:

the server side acquires the second multimedia data through the virtualization component;

the server side carries out second encapsulation processing on the second multimedia data through the transport protocol component to obtain encapsulated third multimedia data; wherein the second encapsulation process is a protocol encapsulation process;

and the server side sends the third multimedia data to the client side.

In one possible implementation, the client includes a data parsing component; the client decapsulates the third multimedia data to obtain the first multimedia data, and includes:

the client carries out protocol decapsulation processing on the third multimedia data to obtain the second multimedia data;

and the client decapsulates the second multimedia data through the data analysis component to obtain the first multimedia data.

In one possible implementation, the first multimedia data includes any one or more of:

control information, basic information, audio coding information, and video coding information.

In one possible implementation, the client further comprises a control component, an audio decoder component, a video decoder component, and a playback component; the client executes the first multimedia data, and the execution includes:

the client decodes the audio coding information through the audio decoder component to obtain decoded audio information, and decodes the video coding information through the video decoder component to obtain decoded video information;

and the client controls the playing component to play the audio information and the video information through the control component based on the control information.

In one possible implementation, the decoding, by the client, the audio coding information through the audio decoder component to obtain decoded audio information, and decoding, by the video decoder component, the video coding information to obtain decoded video information includes:

the client sends the audio coding information to a data queue of the audio decoder component for storage through a first thread of the audio decoder component, and obtains the audio coding information from the data queue of the audio decoder component through a second thread of the audio decoder component for decoding to obtain decoded audio information;

and the client sends the video coding information to a data queue of the video decoder component for storage through a first thread of the video decoder component, and acquires the video coding information from the data queue of the video decoder component through a second thread of the video decoder component for decoding to obtain decoded video information.

In a second aspect, an embodiment of the present application provides a data processing apparatus for a video stream of a virtual machine, which is applied to a data processing system for a video stream of the virtual machine, where the data processing system includes a virtual machine, a server, and a client; the device comprises:

the first encapsulation module is used for the virtual machine to perform first encapsulation processing on the first multimedia data inside to obtain encapsulated second multimedia data and send the second multimedia data to the server;

the second encapsulation module is used for the server side to perform second encapsulation processing on the second multimedia data to obtain encapsulated third multimedia data and send the third multimedia data to the client side;

the decapsulation module is used for decapsulating the third multimedia data by the client to obtain the first multimedia data;

and the execution module is used for executing the first multimedia data by the client.

In a third aspect, an embodiment of the present application provides an electronic device, which includes a memory and a processor, where the memory stores a computer program that is executable on the processor, and the processor implements the steps of the method according to the first aspect when executing the computer program.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing computer-executable instructions that, when invoked and executed by a processor, cause the processor to perform the steps of the method of the first aspect.

The embodiment of the application brings the following beneficial effects:

The embodiments of the present application provide a data processing method and device for a video stream of a virtual machine, and electronic equipment. In this scheme, the data of the player in the virtual machine is passed to the layer underlying the virtual machine by writing it to a device, and is then transmitted to the client through the server's own transmission protocol. Direct communication between the virtual machine and the client is thereby avoided: only network connectivity between the server and the client is required, which is already a necessary condition for deploying a cloud desktop, so the scheme is suitable for application scenarios that require network isolation. Compared with the traditional video streaming mode, the scheme offers a stable data stream, high video image quality, rich colors, low bandwidth usage, and almost no consumption of server resources. It can greatly reduce the encoding and decoding load on the server, make full use of the CPU/GPU processing capacity of the client, and achieve faster transmission and lower latency, which makes it better suited to real-time playback and alleviates the technical problem in the prior art of poor video stream data transmission quality between the virtual machine and the client.

Drawings

In order to more clearly illustrate the detailed description of the present application or the technical solutions in the prior art, the drawings needed to be used in the detailed description of the present application or the prior art description will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings can be obtained by those skilled in the art without creative efforts.

Fig. 1 is a schematic structural diagram of a conventional virtual machine video streaming architecture according to an embodiment of the present disclosure;

Fig. 2 is a schematic flowchart of a data processing method for a video stream of a virtual machine according to an embodiment of the present disclosure;

Fig. 3 is a schematic structural diagram of a video streaming architecture of a virtual machine according to the present invention;

Fig. 4 is a schematic diagram illustrating a processing flow of a client according to an embodiment of the present application;

Fig. 5 is a schematic structural diagram of a data processing apparatus for virtual machine video streaming according to an embodiment of the present application;

Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.

Detailed Description

To make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the present application will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The terms "comprising" and "having," and any variations thereof, as referred to in the embodiments of the present application, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements but may alternatively include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

Existing video redirection technology captures the encoded video stream that a player on the server is about to play and sends it directly to the client for decoding, playback and display. Applied to a virtual desktop playing high-definition video, it can effectively reduce the load on the virtual desktop and the network bandwidth consumed. The technology relies on video stream interception, network transmission and video decoding: the video stream is intercepted in the virtual desktop, transmitted to the client over the network, and decoded and played by the client. The transmission of video stream data depends on the network, and the player in the virtual desktop requires customized development. As shown in fig. 1, a conventional video redirection solution generally transmits the data of the player in the virtual machine directly to the client over the network for decoding and display. The disadvantages of such a solution are that the network of the virtual machine and the network of the client must be mutually reachable, that it places high demands on the network environment, and that it cannot be applied in scenarios requiring network isolation.

Based on this, embodiments of the present application provide a data processing method and apparatus for a video stream of a virtual machine, and an electronic device, by which a technical problem in the prior art that a video stream data transmission quality between the virtual machine and a client is poor can be alleviated.

The embodiments of the present application will be further described with reference to the accompanying drawings.

Fig. 2 is a schematic flowchart of a data processing method for a virtual machine video stream according to an embodiment of the present disclosure, where the method is applied to a data processing system for a virtual machine video stream, and the data processing system includes a virtual machine, a server, and a client. As shown in fig. 2, the method includes:

Step S210, the virtual machine performs a first encapsulation process on the internal first multimedia data to obtain encapsulated second multimedia data, and sends the second multimedia data to the server.

For example, as shown in fig. 3, at the system architecture level the data processing system for the video stream of the virtual machine (a video redirection system) can be divided into three major parts: a native player (e.g., the MPC player), a browser (e.g., the Google browser), a data capture component and a data interaction component, located in the virtual machine; a virtualization component (e.g., QEMU) and a transport protocol component (e.g., the Simple Protocol for Independent Computing Environments (Spice)), located at the server; and a data parsing component, an audio decoder, a control component and a video decoder, located at the client. When a player or a browser in the virtual machine opens a video file, corresponding first multimedia data is generated, which may specifically include the audio/video data before decoding, control information (e.g., play, pause), player window position information, and the like. The virtual machine can perform first encapsulation processing, such as data format encapsulation and serialization, on its internal first multimedia data to obtain encapsulated second multimedia data, and send the second multimedia data to the server by means of virtio.

Step S220, the server performs a second encapsulation process on the second multimedia data to obtain encapsulated third multimedia data, and sends the third multimedia data to the client.

For example, as shown in fig. 3, after receiving the second multimedia data through virtio, the server may encapsulate the second multimedia data with the Spice protocol, which constitutes the second encapsulation processing, to obtain encapsulated third multimedia data, and send the third multimedia data to the client, for example by network transmission.

Step S230, the client decapsulates the third multimedia data to obtain the first multimedia data.

For example, as shown in fig. 3, after receiving the third multimedia data, the client performs decapsulation of a Spice protocol to obtain the second multimedia data, and then performs decapsulation and deserialization of a data format to obtain the first multimedia data.

Step S240, the client executes the first multimedia data.

Illustratively, as shown in fig. 3, the client may perform the operation corresponding to each type of information in the first multimedia data. For example, audio data and video data are decoded, and the decoded audio and video data are sent through the sound card driver to the client hardware for playback; control information such as pause and play is executed as a control operation.
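The type-based handling described for step S240 can be pictured as a small dispatch table. The following Python sketch is illustrative only: the class name ClientExecutor, the string type tags and the recorded action names are hypothetical and do not appear in the original disclosure, and the handlers merely record what a real client would decode or control.

```python
from typing import Callable, Dict, List, Tuple


class ClientExecutor:
    """Routes each piece of first multimedia data to a handler by its type."""

    def __init__(self) -> None:
        # Recorded actions stand in for real decoding and player control.
        self.actions: List[Tuple[str, object]] = []
        # Hypothetical type tags; the patent does not define concrete values.
        self._handlers: Dict[str, Callable[[object], None]] = {
            "audio": lambda p: self.actions.append(("decode_audio", p)),
            "video": lambda p: self.actions.append(("decode_video", p)),
            "control": lambda p: self.actions.append(("apply_control", p)),
        }

    def execute(self, kind: str, payload: object) -> None:
        handler = self._handlers.get(kind)
        if handler is None:
            raise ValueError(f"unknown information type: {kind!r}")
        handler(payload)


executor = ClientExecutor()
executor.execute("control", "PAUSE")
executor.execute("video", b"\x00")
```

Keeping the routing in one table means a further information type could be supported by registering one more handler, without touching the receive path.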

In the embodiments of the present application, the data of the player in the virtual machine is passed to the layer underlying the virtual machine by writing it to a device, and is then transmitted to the client through the server's own transmission protocol. Direct communication between the virtual machine and the client is thereby avoided: only network connectivity between the server and the client is required, which is already a necessary condition for deploying a cloud desktop, so the scheme is suitable for application scenarios that require network isolation. Compared with the traditional video streaming mode, the scheme offers a stable data stream, high video image quality, rich colors, low bandwidth usage, and almost no consumption of server resources. It can greatly reduce the encoding and decoding load on the server, make full use of the CPU/GPU processing capacity of the client, and achieve faster transmission and lower latency, which makes it better suited to real-time playback and alleviates the technical problem in the prior art of poor video stream data transmission quality between the virtual machine and the client.

The above steps are described in detail below.

In some embodiments, a virtual machine includes a data capture component and a data interaction component; the step S210 may specifically include the following steps:

Step a), the virtual machine acquires the first multimedia data through the data capture component.

Step b), the virtual machine performs first encapsulation processing on the first multimedia data through the data interaction component to obtain encapsulated second multimedia data.

Step c), the virtual machine sends the second multimedia data to the server.

Illustratively, as shown in fig. 3, the virtual machine includes a data capture component and a data interaction component. When a player or a browser in the virtual machine opens a video file, corresponding first multimedia data is generated, which may specifically include the audio/video data before decoding, control information (RUN, STOP, PAUSE, SEEK), player window position information, and the like. The data capture component acquires this data and sends it to the data interaction component through remote procedure calls (gRPC). The data interaction component encapsulates and serializes the data and then sends it to the server by means of virtio, for example by writing it into the corresponding channel of QEMU (the server's virtualization component). High-quality data transmission is thereby better achieved.
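As a rough illustration of the data format encapsulation and serialization performed by the data interaction component, the sketch below frames each message with a one-byte kind and a four-byte length. This layout, the KIND_* constants and the function names are assumptions made for the example; the disclosure does not specify the actual format.

```python
import struct
from typing import Tuple

# Hypothetical message kinds; the patent does not define numeric values.
KIND_CONTROL = 1
KIND_BASIC = 2
KIND_AUDIO = 3
KIND_VIDEO = 4

# Assumed header: kind (uint8) + payload length (uint32), little-endian.
_HEADER = struct.Struct("<BI")


def first_encapsulate(kind: int, payload: bytes) -> bytes:
    """Wrap one piece of first multimedia data into a self-describing frame."""
    return _HEADER.pack(kind, len(payload)) + payload


def first_decapsulate(frame: bytes) -> Tuple[int, bytes]:
    """Inverse of first_encapsulate; rejects frames whose body is truncated."""
    kind, length = _HEADER.unpack_from(frame, 0)
    body = frame[_HEADER.size:_HEADER.size + length]
    if len(body) != length:
        raise ValueError("truncated frame")
    return kind, body


frame = first_encapsulate(KIND_VIDEO, b"\x00\x01")
```

In a Linux guest, a virtio-serial channel is typically exposed as a character device under /dev/virtio-ports/, so a frame like this could be written to such a device with ordinary file I/O; the channel name would be deployment-specific and is not given in the disclosure.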

In some embodiments, the server includes a transport protocol component and a virtualization component; the step S220 may specifically include the following steps:

Step d), the server acquires the second multimedia data through the virtualization component.

Step e), the server performs second encapsulation processing on the second multimedia data through the transport protocol component to obtain encapsulated third multimedia data; wherein the second encapsulation processing is protocol encapsulation processing.

Step f), the server sends the third multimedia data to the client.

For example, as shown in fig. 3, after the data interaction component of the virtual machine has encapsulated and serialized the data and written it by means of virtio into the corresponding channel of QEMU (the server's virtualization component), the server may encapsulate the second multimedia data with the Spice protocol through the transport protocol component, and send the resulting data to the client by network transmission.

In some embodiments, the client includes a data parsing component; the step S230 may specifically include the following steps:

Step g), the client performs protocol decapsulation processing on the third multimedia data to obtain the second multimedia data.

Step h), the client decapsulates the second multimedia data through the data parsing component to obtain the first multimedia data.

For example, after receiving the third multimedia data, the client may first perform decapsulation of a Spice protocol to obtain second multimedia data, and then perform decapsulation and deserialization of a data format by using the data parsing component to obtain the first multimedia data.
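The two nested layers can be sketched end to end as follows. This is a toy model under stated assumptions: JSON stands in for the unspecified inner data format, and the outer header (a channel id plus a length) is a hypothetical placeholder, not the real Spice wire format. All function names are invented for the example.

```python
import json
import struct

# Hypothetical outer header: channel id (uint16) + length (uint32).
# This is NOT the real Spice wire format, only a stand-in for it.
_OUTER = struct.Struct("<HI")


def first_encapsulate(message: dict) -> bytes:
    """Virtual machine side: format encapsulation and serialization."""
    return json.dumps(message, sort_keys=True).encode("utf-8")


def second_encapsulate(second: bytes, channel: int = 1) -> bytes:
    """Server side: protocol encapsulation around an opaque payload."""
    return _OUTER.pack(channel, len(second)) + second


def protocol_decapsulate(third: bytes) -> bytes:
    """Client side, first stage: strip the protocol layer."""
    _channel, length = _OUTER.unpack_from(third, 0)
    body = third[_OUTER.size:]
    if len(body) != length:
        raise ValueError("length mismatch in protocol frame")
    return body


def data_decapsulate(second: bytes) -> dict:
    """Client side, second stage: deserialize the inner data format."""
    return json.loads(second.decode("utf-8"))


first = {"kind": "control", "state": "PAUSE"}
third = second_encapsulate(first_encapsulate(first))
recovered = data_decapsulate(protocol_decapsulate(third))
```

Note that in this sketch the server never parses the inner payload; it only adds and forwards a protocol wrapper, which is consistent with the stated aim of placing almost no encoding or decoding load on the server.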

In some embodiments, the first multimedia data comprises any one or more of: control information, basic information, audio coding information, and video coding information.

Illustratively, the first multimedia data may include control information, basic information, audio coding information, video coding information, and the like. The control information includes the running state (RUN), paused state (PAUSE), stopped state (STOP) and seek state (SEEK) of the player. The basic information includes the initialization information of the audio and video decoders: the coding type, sampling rate, channel count and bit rate for the audio decoder; the coding type for the video decoder; the player position information; and so on. The audio data and video data corresponding to the audio coding information and the video coding information may be compressed data.
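One possible in-memory representation of the control information and basic information is sketched below. The type names, field names and sample values are hypothetical; in particular, the layout of the player position as (x, y, width, height) is an assumption, since the disclosure only says that position information is carried.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class ControlState(Enum):
    """Player states carried as control information."""
    RUN = "RUN"
    PAUSE = "PAUSE"
    STOP = "STOP"
    SEEK = "SEEK"


@dataclass(frozen=True)
class AudioParams:
    """Audio decoder initialization parameters listed in the description."""
    codec: str
    sample_rate_hz: int
    channels: int
    bit_rate_bps: int


@dataclass(frozen=True)
class BasicInfo:
    audio: AudioParams
    video_codec: str
    # Assumed layout (x, y, width, height); not specified in the source.
    window_position: Tuple[int, int, int, int]


# Illustrative values only.
info = BasicInfo(
    audio=AudioParams(codec="aac", sample_rate_hz=48000, channels=2, bit_rate_bps=128000),
    video_codec="h264",
    window_position=(0, 0, 1280, 720),
)
```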

By enabling the data types of the first multimedia data to comprise a plurality of types, the functions are richer, and video transmission with higher quality can be realized between the virtual machine and the client.

In some embodiments, the client further comprises a control component, an audio decoder component, a video decoder component, and a playback component; the step S240 may specifically include the following steps:

Step i), the client decodes the audio coding information through the audio decoder component to obtain decoded audio information, and decodes the video coding information through the video decoder component to obtain decoded video information.

Step j), the client controls, through the control component and based on the control information, the playback component to play the audio information and the video information.

For example, after obtaining the audio coding information and the video coding information, which have not yet been decoded, the client may decode the audio coding information through the audio decoder component to obtain decoded audio information, and decode the video coding information through the video decoder component to obtain decoded video information; the control component then controls the playback component to play the audio information and the video information based on the control information.

In practical application, because the timestamps of the virtual machine and the client are inconsistent, smooth video rendering cannot be guaranteed by relying on the timestamps directly. Therefore, a sleep time can be calculated so that every frame is rendered at the same time interval, which keeps the playback of the video stream uniform. The calculation formulas are as follows:

sleep_time = last_render_time + delta - now_time + remain_time;

remain_time = last_render_time + remain_time + delta - now_time;

where sleep_time represents the program sleep time, last_render_time represents the time at which rendering of the previous frame ended, now_time represents the current client time, remain_time represents the display error, and delta represents the frame interval, delta = 1/fps × 1000000.
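The two formulas can be exercised with a small worked example. The Python sketch below transcribes them as written; the function names are hypothetical. Because both formulas share the same right-hand side, the sketch assumes one possible reading: sleep_time is computed before sleeping, and remain_time is recomputed after waking with a refreshed now_time, so that any oversleep is carried into the next frame. The disclosure does not state this ordering explicitly. Times are treated as microseconds, which follows from delta = 1/fps × 1000000 when fps is frames per second.

```python
def frame_interval(fps: float) -> float:
    """delta: frame interval in microseconds (equivalent to 1/fps x 1000000)."""
    return 1_000_000 / fps


def sleep_time(last_render_time: float, delta: float,
               now_time: float, remain_time: float) -> float:
    """sleep_time = last_render_time + delta - now_time + remain_time."""
    return last_render_time + delta - now_time + remain_time


def updated_remain_time(last_render_time: float, remain_time: float,
                        delta: float, now_time: float) -> float:
    """remain_time = last_render_time + remain_time + delta - now_time.

    Assumption: called after waking, with now_time read again.
    """
    return last_render_time + remain_time + delta - now_time


delta = frame_interval(25)  # 40000.0 microseconds per frame at 25 fps

# Frame 1: previous render ended at t=1_000_000; it is now t=1_010_000.
s1 = sleep_time(1_000_000, delta, 1_010_000, 0.0)            # 30000.0
# The thread wakes 2000 microseconds late, at t=1_042_000.
r1 = updated_remain_time(1_000_000, 0.0, delta, 1_042_000)   # -2000.0
# Frame 2: render ends at t=1_045_000; the error shortens the next sleep.
s2 = sleep_time(1_045_000, delta, 1_045_000, r1)             # 38000.0
```

Under this reading a negative remain_time records that the previous frame was shown late, and the next sleep is shortened by the same amount, so the average frame spacing stays close to delta without relying on the virtual machine's timestamps.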

Based on the step i) and the step j), the step i) may specifically include the following steps:

Step k), the client sends the audio coding information to the data queue of the audio decoder component for storage through the first thread of the audio decoder component, and obtains the audio coding information from the data queue of the audio decoder component through the second thread of the audio decoder component for decoding to obtain the decoded audio information.

Step l), the client sends the video coding information to the data queue of the video decoder component for storage through the first thread of the video decoder component, and obtains the video coding information from the data queue of the video decoder component through the second thread of the video decoder component for decoding to obtain the decoded video information.

For example, as shown in fig. 4, when video playback starts, the client may receive the following messages in sequence: kSpiceVideoCtrl_AContext, kSpiceVideoCtrl_VContext, kSpiceVideoCtrl_stop, kSpiceVideoCtrl_seek, kSpiceVideoCtrl_running, kSpiceVideoCtrl_VData, kSpiceVideoCtrl_AData, and so on. On receiving a kSpiceVideoCtrl_VData message, the client first checks whether the decoder (e.g., MediaCodec) has been initialized and, if not, initializes it. The client then places the data in a video data queue for storage, and starts another thread that continuously takes data from the video data queue and feeds it to MediaCodec for decoding and rendering. MediaCodec is bound directly to a SurfaceView, which avoids extra memory copies of the decoded data in display memory and improves display efficiency.
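The receive-then-decode split across two threads can be modeled with a standard producer/consumer queue. The Python sketch below is a language-neutral illustration, not Android code: the class QueuedDecoder and its methods are hypothetical, and the decode callable stands in for a real decoder such as MediaCodec.

```python
import queue
import threading
from typing import Callable, List, Optional


class QueuedDecoder:
    """First thread enqueues encoded packets; a second thread decodes them."""

    _STOP = object()  # sentinel telling the worker thread to exit

    def __init__(self, decode: Callable[[bytes], bytes]) -> None:
        self._decode = decode            # stand-in for a real decoder
        self._queue: "queue.Queue[object]" = queue.Queue()
        self._worker: Optional[threading.Thread] = None
        self.rendered: List[bytes] = []  # stand-in for rendering output

    def on_data(self, packet: bytes) -> None:
        """Called on the receiving thread for each data message."""
        if self._worker is None:
            # Lazy initialization on the first packet, then start the
            # second thread that drains the queue.
            self._worker = threading.Thread(target=self._drain, daemon=True)
            self._worker.start()
        self._queue.put(packet)

    def _drain(self) -> None:
        while True:
            item = self._queue.get()
            if item is self._STOP:
                break
            self.rendered.append(self._decode(item))

    def close(self) -> None:
        """Flush remaining packets and wait for the worker to finish."""
        if self._worker is not None:
            self._queue.put(self._STOP)
            self._worker.join()


decoder = QueuedDecoder(decode=lambda packet: packet.upper())
decoder.on_data(b"frame-1")
decoder.on_data(b"frame-2")
decoder.close()
```

Decoupling reception from decoding in this way means a slow decode step does not block the thread that reads from the network, while the FIFO queue preserves packet order.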

Fig. 5 is a schematic structural diagram of a data processing apparatus for virtual machine video streaming according to an embodiment of the present disclosure. As shown in fig. 5, the data processing apparatus 500 for virtual machine video streaming includes:

a first encapsulation module 501, configured to perform a first encapsulation process on internal first multimedia data by a virtual machine to obtain encapsulated second multimedia data, and send the second multimedia data to a server;

a second encapsulation module 502, configured to perform a second encapsulation process on the second multimedia data by the server, obtain encapsulated third multimedia data, and send the third multimedia data to the client;

a decapsulation module 503, configured to decapsulate the third multimedia data by the client, to obtain first multimedia data;

an execution module 504, configured to execute, by the client, the first multimedia data.

In some embodiments, a virtual machine includes a data capture component and a data interaction component; the first encapsulation module 501 is specifically configured to:

the virtual machine acquires first multimedia data through a data capture component;

the virtual machine carries out first encapsulation processing on the first multimedia data through the data interaction component to obtain encapsulated second multimedia data;

and the virtual machine sends the second multimedia data to the server.

In some embodiments, the server includes a transport protocol component and a virtualization component; the second encapsulation module 502 is specifically configured to:

the server side acquires second multimedia data through the virtualization component;

the server side carries out second encapsulation processing on the second multimedia data through the transport protocol component to obtain encapsulated third multimedia data; wherein the second encapsulation processing is protocol encapsulation processing;

and the server side sends the third multimedia data to the client side.

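In some embodiments, the client includes a data parsing component; the decapsulation module 503 is specifically configured to:

the client carries out protocol decapsulation processing on the third multimedia data to obtain second multimedia data;

the client decapsulates the second multimedia data through the data parsing component to obtain the first multimedia data.

In some embodiments, the first multimedia data comprises any one or more of:

control information, basic information, audio coding information, and video coding information.

In some embodiments, the client further comprises a control component, an audio decoder component, a video decoder component, and a playback component; the execution module 504 is specifically configured to:

the client decodes the audio coding information through the audio decoder component to obtain decoded audio information, and decodes the video coding information through the video decoder component to obtain decoded video information;

the client controls, through the control component and based on the control information, the playback component to play the audio information and the video information.

In some embodiments, the execution module 504 is specifically configured to:

the client sends the audio coding information to the data queue of the audio decoder component for storage through the first thread of the audio decoder component, and obtains the audio coding information from the data queue of the audio decoder component through the second thread of the audio decoder component for decoding to obtain the decoded audio information;

the client sends the video coding information to the data queue of the video decoder component for storage through the first thread of the video decoder component, and obtains the video coding information from the data queue of the video decoder component through the second thread of the video decoder component for decoding to obtain the decoded video information.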

The device provided by the embodiments of the present invention has the same implementation principle and technical effects as the foregoing method embodiments. For brevity, where the device embodiments do not mention a point, reference may be made to the corresponding content in the method embodiments.

The embodiment of the invention provides electronic equipment, which particularly comprises a processor and a storage device; the storage means has stored thereon a computer program which, when executed by the processor, performs the method of any of the above embodiments.

Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, where the electronic device includes: a processor 601, a memory 602, a bus 603 and a communication interface 604, wherein the processor 601, the communication interface 604 and the memory 602 are connected through the bus 603; the processor 601 is used to execute executable modules, such as computer programs, stored in the memory 602.

The memory 602 may include a high-speed random access memory (RAM) and may further include a non-volatile memory, such as at least one disk memory. The communication connection between the network element of the system and at least one other network element is implemented through at least one communication interface 604 (which may be wired or wireless), and may use the Internet, a wide area network, a local area network, a metropolitan area network, and the like.

The bus 603 may be an ISA bus, a PCI bus, or an EISA bus, etc. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one double-headed arrow is shown in FIG. 6, but that does not indicate only one bus or one type of bus.

The memory 602 is used for storing a program, and the processor 601 executes the program after receiving an execution instruction. The method performed by the apparatus, as defined by the flows disclosed in any of the foregoing embodiments of the present invention, may be applied to the processor 601 or implemented by the processor 601.

The processor 601 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or instructions in the form of software in the processor 601. The Processor 601 may be a general-purpose Processor, and includes a Central Processing Unit (CPU), a Network Processor (NP), and the like; the device can also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components. The various methods, steps, and logic blocks disclosed in the embodiments of the present invention may be implemented or performed. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in ram, flash memory, rom, prom, or eprom, registers, etc. storage media as is well known in the art. The storage medium is located in the memory 602, and the processor 601 reads the information in the memory 602 and completes the steps of the method in combination with the hardware thereof.

The computer program product of the readable storage medium provided in the embodiment of the present invention includes a computer readable storage medium storing a program code, and instructions included in the program code may be used to execute the method in the foregoing method embodiment, and specific implementation may refer to the foregoing method embodiment, which is not described herein again.

The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.

Finally, it should be noted that although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art will understand that, within the technical scope of the present disclosure, anyone skilled in the art may still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features; such modifications, changes, or substitutions do not depart from the spirit and scope of the embodiments of the present invention, and they should be construed as being included therein. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A data processing method of a virtual machine video stream is characterized in that the data processing method is applied to a data processing system of the virtual machine video stream, and the data processing system comprises a virtual machine, a server and a client; the method comprises the following steps:

the virtual machine carries out first encapsulation processing on first multimedia data inside to obtain encapsulated second multimedia data, and the second multimedia data are sent to the server side;

the server side carries out second encapsulation processing on the second multimedia data to obtain encapsulated third multimedia data, and the third multimedia data are sent to the client side;

the client executes the first multimedia data;

the first multimedia data comprises any one or more of:

control information, basic information, audio coding information, and video coding information;

the client further comprises a control component, an audio decoder component, a video decoder component and a playing component; the client executes the first multimedia data, and the execution includes:

the client controls the playing component to play the audio information and the video information through the control component based on the control information;

controlling the playing component to play the audio information and the video information through the control component, including:

calculating the sleep time of the video information to ensure that each frame of the video information is rendered at the same time interval;

the calculation formula is as follows:

sleep_time = last_render_time + delta - now_time + remain_time;

remain_time = last_render_time + remain_time + delta - now_time;

wherein sleep_time represents the program sleep time, last_render_time represents the rendering end time of the previous frame, now_time represents the current client time, remain_time represents the display error, delta represents the frame interval, and delta = 1/fps × 1000000.
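For illustration only, the two formulas of claim 1 can be evaluated as in the following minimal Python sketch. It is not part of the claimed subject matter. The function names frame_interval_us, pacing_step and sleep_for are hypothetical, all times are assumed to be in microseconds (consistent with delta = 1/fps × 1000000), and the handling of a negative result in sleep_for is an assumption that the claim does not state.

```python
import time


def frame_interval_us(fps: float) -> float:
    """Frame interval delta in microseconds: delta = 1/fps * 1000000."""
    return 1 / fps * 1_000_000


def pacing_step(last_render_time: float, now_time: float,
                remain_time: float, fps: float) -> tuple:
    """Evaluate both formulas once and return (sleep_time, new_remain_time)."""
    delta = frame_interval_us(fps)
    sleep_time = last_render_time + delta - now_time + remain_time
    # As written in the claim, the update uses the previous remain_time,
    # so it evaluates to the same expression as sleep_time.
    new_remain_time = last_render_time + remain_time + delta - now_time
    return sleep_time, new_remain_time


def sleep_for(sleep_time_us: float) -> None:
    """Assumption (not in the claim): a non-positive result means the frame
    is already late, so the client does not sleep at all."""
    if sleep_time_us > 0:
        time.sleep(sleep_time_us / 1_000_000)
```

For example, at 25 fps delta is 40000 microseconds; with last_render_time = 1000000, now_time = 1010000 and remain_time = 0, the first formula gives a sleep_time of 30000 microseconds.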

2. The method of claim 1, wherein the virtual machine comprises a data capture component and a data interaction component; the virtual machine performs first encapsulation processing on first multimedia data inside to obtain encapsulated second multimedia data, and sends the second multimedia data to the server, including:

and the virtual machine sends the second multimedia data to the server.

3. The method of claim 1, wherein the server comprises a transport protocol component and a virtualization component; the server performs second encapsulation processing on the second multimedia data to obtain encapsulated third multimedia data, and sends the third multimedia data to the client, including:

and the server side sends the third multimedia data to the client side.

4. The method of claim 1, wherein the client comprises a data parsing component; the client decapsulates the third multimedia data to obtain the first multimedia data, and includes:

5. The method of claim 1, wherein the client decodes the audio encoded information by the audio decoder component to obtain decoded audio information, and decodes the video encoded information by the video decoder component to obtain decoded video information, comprising:

6. The data processing device of the video stream of the virtual machine is characterized in that the data processing device is applied to a data processing system of the video stream of the virtual machine, and the data processing system comprises the virtual machine, a server and a client; the device comprises:

the execution module is used for executing the first multimedia data by the client;

the calculation formula is as follows:

sleep_time = last_render_time + delta - now_time + remain_time;

remain_time = last_render_time + remain_time + delta - now_time;

7. An electronic device comprising a memory and a processor, wherein the memory stores a computer program operable on the processor, and wherein the processor implements the steps of the method of any of claims 1 to 5 when executing the computer program.

8. A computer readable storage medium having stored thereon computer executable instructions which, when invoked and executed by a processor, cause the processor to execute the method of any of claims 1 to 5.