CN113727142A - Cloud rendering method and device and computer-storable medium - Google Patents

Cloud rendering method and device and computer-storable medium

Info

Publication number
CN113727142A
Authority
CN
China
Prior art keywords
image
rendering
frame
cloud
rendered
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111026903.3A
Other languages
Chinese (zh)
Inventor
张宇博
何进萍
魏伟
刘江涛
郭景昊
车广富
郑红阳
袁家军
张磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN202111026903.3A priority Critical patent/CN113727142A/en
Publication of CN113727142A publication Critical patent/CN113727142A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/23412 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs for generating or manipulating the scene composition of objects, e.g. MPEG-4 objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T9/00 Image coding
    • G06T9/002 Image coding using neural networks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/23 Processing of content or additional data; Elementary server operations; Server middleware
    • H04N21/234 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
    • H04N21/2343 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
    • H04N21/234309 Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements by transcoding between formats or standards, e.g. from MPEG-2 to MPEG-4 or from Quicktime to Realvideo

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The disclosure relates to a cloud rendering method and device and a computer-storable medium, and relates to the field of computer technology. The cloud rendering method, executed by a cloud rendering device deployed in the cloud, comprises the following steps: receiving a video stream and service information from a client, wherein the video stream comprises multiple frames of images; determining an object recognition model and a rendering engine corresponding to the service information from a plurality of object recognition models and rendering engines deployed in the cloud; performing object recognition on each frame of image by using the object recognition model corresponding to the service information to obtain feature point cloud data of each frame of image; rendering a model to be rendered corresponding to the service information into each frame of image by using the rendering engine corresponding to the service information according to the feature point cloud data of each frame of image, to obtain a rendered image of each frame of image; generating a rendered video stream from the multiple frames of rendered images; and sending the rendered video stream to the client.

Description

Cloud rendering method and device and computer-storable medium
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a cloud rendering method and apparatus, and a computer-readable storage medium.
Background
AR (Augmented Reality) interactive scenes can be linked with retail and entertainment. Compared with VR (Virtual Reality) virtual scenes, they are closer to people's real life and easier to use, and shared AR interaction adds a social dimension, drawing more users in while improving the user experience.
In the related art, object recognition and rendering are performed locally at a mobile terminal to obtain a rendered video stream.
Disclosure of Invention
In the related art, object recognition and rendering are performed locally at the mobile terminal and are therefore limited by the performance of the mobile terminal, so stuttering and even crashes easily occur and rendering efficiency is low.
In view of the above technical problems, the present disclosure provides a solution that can improve rendering efficiency and adapt to the service requirements of different types of clients.
According to a first aspect of the present disclosure, there is provided a cloud rendering method performed by a cloud rendering apparatus deployed in the cloud, including: receiving a video stream and service information from a client, wherein the video stream includes multiple frames of images; determining an object recognition model and a rendering engine corresponding to the service information from a plurality of object recognition models and rendering engines deployed in the cloud; performing object recognition on each frame of image by using the object recognition model corresponding to the service information to obtain feature point cloud data of each frame of image; rendering a model to be rendered corresponding to the service information into each frame of image by using the rendering engine corresponding to the service information according to the feature point cloud data of each frame of image to obtain a rendered image of each frame of image; generating a rendered video stream from the multiple frames of rendered images; and sending the rendered video stream to the client.
In some embodiments, the cloud rendering method further comprises: after the rendered image of each frame of image is obtained, storing the obtained rendered image in a buffer area; and, for each frame of image, in a case where the rendered image of the frame is not obtained within a preset time, acquiring the rendered image of the previous frame from the buffer area as the rendered image of the frame.
In some embodiments, rendering the model to be rendered corresponding to the service information into each frame of image to obtain the rendered image of each frame of image includes: determining a network state of the client; determining a rendering resolution corresponding to the network state of the client; and rendering the model to be rendered corresponding to the service information into each frame of image based on the rendering resolution to obtain the rendered image of each frame of image.
In some embodiments, the service information includes a service category, and generating the rendered video stream from the multiple frames of rendered images includes: determining other clients interacting with the client in a case where the service category belongs to a multi-person interaction category; merging the multiple frames of rendered images corresponding to the client and the multiple frames of rendered images corresponding to the other clients to obtain multiple frames of merged images; and generating the rendered video stream from the multiple frames of merged images.
In some embodiments, the service information includes a service scene, and determining the object recognition model and the rendering engine corresponding to the service information includes: determining an object recognition model and a rendering engine corresponding to the service scene.
In some embodiments, the client is a web page front end, and the cloud rendering method further includes: before receiving the video stream and the service information from the client, receiving a WebRTC (Web Real-Time Communication) connection establishment request from the client; and, in response to receiving the WebRTC connection establishment request, negotiating with the client to establish the WebRTC connection.
In some embodiments, the cloud rendering method further comprises: before object recognition is performed on each frame of image, performing a decoding operation on the video stream to obtain multiple frames of images in YUV format; and converting the YUV-format images into RGB-format images, wherein the RGB-format images are used for object recognition and rendering to obtain multiple frames of rendered images in RGB format. Generating the rendered video stream from the multiple frames of rendered images then includes: converting the RGB-format rendered images into YUV-format rendered images; and encoding the YUV-format rendered images to obtain the rendered video stream.
In some embodiments, the model to be rendered includes at least one of a shoe model, a facial makeup model, and a hairstyle model.
In some embodiments, the feature point cloud data of each frame of image is used to determine the location at which the model to be rendered is rendered into that frame of image.
According to a second aspect of the present disclosure, there is provided a cloud rendering apparatus deployed in the cloud, including: a media service module configured to receive a video stream and service information from a client, wherein the video stream includes multiple frames of images; and a task scheduling module configured to: determine an object recognition model and a rendering engine corresponding to the service information from a plurality of object recognition models and rendering engines deployed in the cloud; perform object recognition on each frame of image by using the object recognition model corresponding to the service information to obtain feature point cloud data of each frame of image; and render a model to be rendered corresponding to the service information into each frame of image by using the rendering engine corresponding to the service information according to the feature point cloud data of each frame of image to obtain a rendered image of each frame of image. The media service module is further configured to: generate a rendered video stream from the multiple frames of rendered images; and send the rendered video stream to the client.
In some embodiments, the media service module is further configured to, before performing object recognition on each frame of image, perform a decoding operation on the video stream to obtain a YUV-format multi-frame image, and send the YUV-format multi-frame image to the task scheduling module; the task scheduling module is also configured to receive the YUV format multi-frame image from the media service module and convert the YUV format multi-frame image into an RGB format multi-frame image, wherein the RGB format multi-frame image is used for object identification and rendering, and an RGB format multi-frame rendering image is obtained; the task scheduling module is further configured to convert the RGB-format multi-frame rendering image into a YUV-format multi-frame rendering image and send the YUV-format multi-frame rendering image to the media service module; the media service module is also configured to receive the multi-frame rendering image in the YUV format from the task scheduling module, and encode the multi-frame rendering image in the YUV format to obtain the rendering video stream.
According to a third aspect of the present disclosure, there is provided a cloud rendering apparatus including: a memory; and a processor coupled to the memory, the processor configured to perform the cloud rendering method of any of the above embodiments based on instructions stored in the memory.
According to a fourth aspect of the present disclosure, there is provided a computer-storable medium having stored thereon computer program instructions which, when executed by a processor, implement the cloud rendering method of any of the above embodiments.
In the above embodiments, rendering efficiency can be improved, and the service requirements of different types of clients can be accommodated.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The present disclosure may be more clearly understood from the following detailed description, taken with reference to the accompanying drawings, in which:
FIG. 1 is a flow diagram illustrating a cloud rendering method according to some embodiments of the present disclosure;
FIG. 2 is a block diagram illustrating a cloud rendering apparatus according to some embodiments of the present disclosure;
FIG. 3 is a block diagram illustrating a cloud rendering apparatus according to further embodiments of the present disclosure;
FIG. 4 is a block diagram illustrating a computer system for implementing some embodiments of the present disclosure.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions, and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless specifically stated otherwise.
Meanwhile, it should be understood that, for convenience of description, the sizes of the respective portions shown in the drawings are not drawn to scale.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to those of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
Fig. 1 is a flow diagram illustrating a cloud rendering method according to some embodiments of the present disclosure.
As shown in fig. 1, the cloud rendering method includes steps S110 to S160. The cloud rendering method is executed by a cloud rendering device deployed at a cloud end.
In step S110, a video stream and service information are received from a client. The video stream includes multiple frames of images. For example, the service information includes a service category; service categories include a multi-person interaction category and a single-person category. As another example, the service information may also include a service scene; service scenes include shoe try-on, clothing try-on, makeup try-on, and hairstyle try-on. In some embodiments, the service information is received through a signaling service thread.
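The disclosure does not prescribe a concrete format for the service information. Purely as an illustrative sketch, it could be modeled as a small structure carrying the service category and the service scene; all class and field names below (including client_id) are assumptions, not part of the original disclosure.

```python
from dataclasses import dataclass
from enum import Enum

class ServiceCategory(Enum):
    SINGLE_PERSON = "single"            # single-person category
    MULTI_PERSON_INTERACTION = "multi"  # multi-person interaction category

class ServiceScene(Enum):
    SHOE_TRY_ON = "shoe"
    CLOTHING_TRY_ON = "clothing"
    MAKEUP_TRY_ON = "makeup"
    HAIRSTYLE_TRY_ON = "hairstyle"

@dataclass
class ServiceInfo:
    category: ServiceCategory
    scene: ServiceScene
    client_id: str  # hypothetical field identifying the requesting client

# Example: a single-person shoe try-on request
info = ServiceInfo(ServiceCategory.SINGLE_PERSON, ServiceScene.SHOE_TRY_ON, "client-001")
```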
In some embodiments, the client is a web page front end. Before the video stream and the service information are received from the client, a WebRTC (Web Real-Time Communication) connection establishment request is received from the client; and, in response to receiving the WebRTC connection establishment request, the cloud negotiates with the client to establish the WebRTC connection.
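The negotiation itself is not detailed here. As one possible, non-authoritative sketch, the offer/answer exchange with a web page front end could be handled with the aiortc library roughly as follows; the signaling transport (a plain SDP string passed in and returned) is an assumption.

```python
from aiortc import RTCPeerConnection, RTCSessionDescription

async def accept_webrtc_connection(offer_sdp: str) -> str:
    """Negotiate a WebRTC connection for an SDP offer received from the web front end."""
    pc = RTCPeerConnection()

    @pc.on("track")
    def on_track(track):
        # The client's camera video stream arrives here and would be handed
        # to the decoding / object-recognition pipeline described below.
        print("receiving track:", track.kind)

    await pc.setRemoteDescription(RTCSessionDescription(sdp=offer_sdp, type="offer"))
    answer = await pc.createAnswer()
    await pc.setLocalDescription(answer)
    # The answer SDP is returned to the client over the signaling channel.
    return pc.localDescription.sdp
```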
In step S120, an object recognition model and a rendering engine corresponding to the service information are determined from the plurality of object recognition models and rendering engines deployed in the cloud. Taking the case where the service information includes a service scene as an example, an object recognition model and a rendering engine corresponding to the service scene are determined.
For example, for a shoe try-on service scene, the object recognition model and the rendering engine are a shoe try-on algorithm model and a shoe try-on rendering engine. For another example, for a makeup try-on service scene, the object recognition model and the rendering engine are a makeup try-on algorithm model and a makeup try-on rendering engine. For another example, for a hairstyle try-on service scene, the object recognition model and the rendering engine are a hairstyle try-on algorithm model and a hairstyle try-on rendering engine. Different object recognition models can adopt the same or different neural network models; for different service scenes, the scene sources of the training data of the object recognition models differ, and the training parameters of the object recognition models can also be the same or different.
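As a minimal sketch of this per-scene selection, the cloud could keep one recognition model and one rendering engine registered per service scene and look them up from the received service information. The model and engine class names below are placeholders, not components named by the disclosure.

```python
# Placeholder per-scene components (illustrative only).
class ShoeTryOnModel: pass
class MakeupTryOnModel: pass
class HairstyleTryOnModel: pass
class ShoeTryOnEngine: pass
class MakeupTryOnEngine: pass
class HairstyleTryOnEngine: pass

# Registries populated once at service start-up.
RECOGNITION_MODELS = {
    "shoe": ShoeTryOnModel(),
    "makeup": MakeupTryOnModel(),
    "hairstyle": HairstyleTryOnModel(),
}
RENDERING_ENGINES = {
    "shoe": ShoeTryOnEngine(),
    "makeup": MakeupTryOnEngine(),
    "hairstyle": HairstyleTryOnEngine(),
}

def select_components(service_scene: str):
    """Return the (object recognition model, rendering engine) pair for a service scene."""
    return RECOGNITION_MODELS[service_scene], RENDERING_ENGINES[service_scene]

model, engine = select_components("makeup")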
In some embodiments, the parameters of each of the plurality of object recognition models and rendering engines deployed in the cloud are initialized before the cloud rendering method is executed. For example, an optimal resolution may be set to reduce real-time rendering latency. For example, when the rendering engine is initialized, an OpenGL (Open Graphics Library) environment, a Shader, and a rendering program are initialized, three-dimensional vertex data and texture coordinate data are added, and a texture layer is set.
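One possible sketch of such initialization, assuming an off-screen GLFW context and PyOpenGL, is shown below; the shader sources, the full-screen quad geometry, and the default resolution are illustrative assumptions rather than values taken from the disclosure.

```python
import glfw
import numpy as np
from OpenGL.GL import (
    GL_ARRAY_BUFFER, GL_STATIC_DRAW, GL_TEXTURE_2D, GL_LINEAR,
    GL_TEXTURE_MIN_FILTER, GL_TEXTURE_MAG_FILTER, GL_VERTEX_SHADER,
    GL_FRAGMENT_SHADER, glGenBuffers, glBindBuffer, glBufferData,
    glGenTextures, glBindTexture, glTexParameteri,
)
from OpenGL.GL.shaders import compileProgram, compileShader

VERT_SRC = """#version 330 core
layout(location = 0) in vec3 position;
layout(location = 1) in vec2 tex_coord;
out vec2 v_tex;
void main() { v_tex = tex_coord; gl_Position = vec4(position, 1.0); }"""

FRAG_SRC = """#version 330 core
in vec2 v_tex;
out vec4 color;
uniform sampler2D frame_texture;
void main() { color = texture(frame_texture, v_tex); }"""

def init_rendering_engine(width: int = 1280, height: int = 720):
    """Initialize an off-screen OpenGL environment, shaders, vertex data, and a texture layer."""
    glfw.init()
    glfw.window_hint(glfw.VISIBLE, glfw.FALSE)  # off-screen (hidden) context
    window = glfw.create_window(width, height, "cloud-render", None, None)
    glfw.make_context_current(window)

    # Compile the shader and rendering program.
    program = compileProgram(compileShader(VERT_SRC, GL_VERTEX_SHADER),
                             compileShader(FRAG_SRC, GL_FRAGMENT_SHADER))

    # Full-screen quad: x, y, z followed by texture coordinates u, v.
    quad = np.array([-1, -1, 0, 0, 0,
                      1, -1, 0, 1, 0,
                      1,  1, 0, 1, 1,
                     -1,  1, 0, 0, 1], dtype=np.float32)
    vbo = glGenBuffers(1)
    glBindBuffer(GL_ARRAY_BUFFER, vbo)
    glBufferData(GL_ARRAY_BUFFER, quad.nbytes, quad, GL_STATIC_DRAW)

    # Texture layer that will receive each decoded camera frame.
    texture = glGenTextures(1)
    glBindTexture(GL_TEXTURE_2D, texture)
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR)
    glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR)
    return window, program, vbo, texture
```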
In step S130, object recognition is performed on each frame of image by using the object recognition model corresponding to the service information to obtain feature point cloud data of each frame of image. For example, the object recognition model is a neural network model. In some embodiments, the feature point cloud data of each frame of image is used to determine the location at which the model to be rendered is rendered into that frame of image.
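The disclosure does not name a concrete recognition network. Purely as an illustration for a makeup try-on scene, the feature point cloud data of a frame could be the facial landmarks produced by MediaPipe Face Mesh, used here only as a stand-in for the deployed object recognition model.

```python
import mediapipe as mp
import numpy as np

# Stand-in object recognition model for a makeup try-on scene.
face_mesh = mp.solutions.face_mesh.FaceMesh(static_image_mode=False)

def extract_feature_points(rgb_frame: np.ndarray) -> np.ndarray:
    """Return an (N, 3) array of normalized facial landmark coordinates for one RGB frame."""
    results = face_mesh.process(rgb_frame)
    if not results.multi_face_landmarks:
        return np.empty((0, 3), dtype=np.float32)
    landmarks = results.multi_face_landmarks[0].landmark
    return np.array([[p.x, p.y, p.z] for p in landmarks], dtype=np.float32)
```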
In some embodiments, before object recognition is performed on each frame of image, a decoding operation is performed on the video stream to obtain multiple frames of images in YUV format, and the YUV-format images are converted into RGB-format images. The RGB-format images are used for object recognition and rendering, which produces multiple frames of rendered images in RGB format. For example, the decoding operation may be performed by a decoding thread. Decoding into YUV-format images removes redundant data, which facilitates data transmission.
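A minimal sketch of the color-space handling around recognition and rendering, assuming OpenCV and a planar I420 frame layout (the decoding and encoding threads themselves are not shown):

```python
import cv2
import numpy as np

def yuv_i420_to_rgb(yuv: np.ndarray) -> np.ndarray:
    """Convert one decoded I420 frame, shaped (height * 3 // 2, width), to RGB for recognition."""
    return cv2.cvtColor(yuv, cv2.COLOR_YUV2RGB_I420)

def rgb_to_yuv_i420(rgb: np.ndarray) -> np.ndarray:
    """Convert one rendered RGB frame back to I420 before encoding the rendered video stream."""
    return cv2.cvtColor(rgb, cv2.COLOR_RGB2YUV_I420)
```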
In step S140, according to the feature point cloud data of each frame of image, a rendering engine corresponding to the service information is used to render the model to be rendered corresponding to the service information into each frame of image, so as to obtain a rendered image of each frame of image. For example, the model to be rendered includes at least one of a shoe model, a face makeup model, and a hairstyle model.
In some embodiments, step S140 may be implemented as follows.
First, the network state of the client is determined. In some embodiments, a state level of the network state may be determined based on state information such as the transmission rate of the network. For example, the transmission rate of the network is divided into a plurality of intervals, and each interval corresponds to a state level of the network state. For example, the higher the transmission rate, the higher the state level and the better the network state of the client.
Then, a rendering resolution corresponding to the network state of the client is determined. For example, the corresponding relationship between the state level and the rendering resolution is stored in advance, and the corresponding rendering resolution is determined from the corresponding relationship according to the determined state level of the network state. In some embodiments, the network state is communicated through a signaling service thread.
Finally, the model to be rendered corresponding to the service information is rendered into each frame of image based on the rendering resolution to obtain the rendered image of each frame of image. That is, the resolution of the rendered image of each frame is the rendering resolution.
In this embodiment, the corresponding rendering resolution is dynamically determined according to the network state of the client, so that rendering is performed at a resolution matching the network state. This automatically adapts to the client's network state, improves the fluency with which the client displays the rendered video stream, and improves the user experience.
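As an illustrative sketch of the dynamic-resolution selection described above, a rate-to-level mapping and a level-to-resolution table could look as follows; the thresholds and resolutions are assumptions, not values from the disclosure.

```python
# Assumed mapping from state level to rendering resolution (width, height).
RESOLUTION_BY_LEVEL = {
    0: (640, 360),    # poor network
    1: (1280, 720),   # medium network
    2: (1920, 1080),  # good network
}

def state_level(transmission_rate_mbps: float) -> int:
    """Map the client's measured transmission rate onto a network state level."""
    if transmission_rate_mbps < 2:
        return 0
    if transmission_rate_mbps < 8:
        return 1
    return 2

def rendering_resolution(transmission_rate_mbps: float) -> tuple:
    """Pick the rendering resolution corresponding to the client's network state."""
    return RESOLUTION_BY_LEVEL[state_level(transmission_rate_mbps)]

# Example: a client reporting 5 Mbit/s would be rendered at 1280x720.
print(rendering_resolution(5.0))
```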
In some embodiments, after the rendered image of each frame of image is obtained, the obtained rendered image is stored in a buffer area; and, for each frame of image, in a case where the rendered image of the frame is not obtained within a preset time, the rendered image of the previous frame is acquired from the buffer area as the rendered image of the frame.
In the above embodiment, by presetting the buffer area, the user can still obtain a rendered image within a short time even when the cloud-side network condition is poor, which further improves rendering efficiency and user experience.
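A sketch of such a per-frame fallback buffer is given below; the timeout value and the queue-based hand-off from the rendering engine are assumptions.

```python
import queue

class RenderedFrameBuffer:
    """Keeps the most recent rendered frame so a late frame can be substituted."""

    def __init__(self, timeout_s: float = 0.05):
        self._pending = queue.Queue()   # rendered frames arriving from the rendering engine
        self._last_frame = None         # buffer area holding the previous rendered image
        self._timeout_s = timeout_s

    def put(self, rendered_frame) -> None:
        self._pending.put(rendered_frame)

    def next_frame(self):
        """Return the newest rendered frame, or fall back to the previous one on timeout."""
        try:
            self._last_frame = self._pending.get(timeout=self._timeout_s)
        except queue.Empty:
            pass  # rendering did not finish in time; reuse the buffered previous frame
        return self._last_frame
```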
In step S150, a rendered video stream is generated from the plurality of frames of rendered images.
Taking the case where the multiple frames of rendered images are in RGB format as an example, the RGB-format rendered images are converted into YUV-format rendered images, and the YUV-format rendered images are encoded to obtain the rendered video stream. For example, encoding is performed by an encoding thread.
Taking the case where the service information includes a service category as an example, in a case where the service category belongs to the multi-person interaction category, other clients interacting with the client are determined; the multiple frames of rendered images corresponding to the client and the multiple frames of rendered images corresponding to the other clients are merged to obtain multiple frames of merged images; and the rendered video stream is generated from the multiple frames of merged images. Through this cloud-side merging, fast rendering in multi-person interactive scenes is achieved.
In some embodiments, merging configuration parameters and a merging scheme may be preset, and merging is performed according to the preset merging configuration parameters and merging scheme. For example, the merging configuration parameters include picture level, merging resolution, merging rate, and the like. The merging scheme determines the layout of the merged picture for different multi-person AR scenes. For example, in an interactive game scene, a common rendering model is loaded into the counterpart's screen. As another example, in a self-portrait special-effects scene, an image is drawn to scale at a specified location on the canvas.
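As a minimal illustration of one possible merging scheme, a side-by-side layout can be composed as follows; the canvas size and the column layout are assumptions, since the disclosure leaves the layout to the preset merging scheme.

```python
import cv2
import numpy as np

def merge_frames(frames: list, canvas_size: tuple = (720, 1280)) -> np.ndarray:
    """Scale each client's rendered frame into its own column of a shared canvas."""
    h, w = canvas_size
    canvas = np.zeros((h, w, 3), dtype=np.uint8)
    slot_w = w // len(frames)
    for i, frame in enumerate(frames):
        resized = cv2.resize(frame, (slot_w, h))  # dsize is (width, height)
        canvas[:, i * slot_w:(i + 1) * slot_w] = resized
    return canvas
```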
In step S160, the rendered video stream is sent to the client.
In the above embodiments, a plurality of object recognition models and rendering engines are deployed in the cloud, so the corresponding object recognition model and rendering engine can be selected for rendering according to the client's service. This improves rendering efficiency, automatically adapts to the service requirements of different types of clients, and has strong practicability.
Fig. 2 is a block diagram illustrating a cloud rendering apparatus according to some embodiments of the present disclosure.
As shown in fig. 2, the cloud rendering apparatus 2 includes a media service module 21 and a task scheduling module 22.
The media service module 21 is configured to receive the video stream and the service information from the client, for example, execute step S110 shown in fig. 1. The video stream includes a plurality of frames of images.
The task scheduling module 22 is configured to determine an object recognition model and a rendering engine corresponding to the business information from a plurality of object recognition models and rendering engines deployed in the cloud, for example, perform step S120 shown in fig. 1. For example, the cloud rendering apparatus 2 further includes a plurality of object recognition models 23 and a rendering engine 24.
The task scheduling module 22 is further configured to perform object recognition on each frame of image by using an object recognition model corresponding to the service information, so as to obtain feature point cloud data of each frame of image, for example, perform step S130 shown in fig. 1. For example, the task scheduling module 22 sends each frame of image to the object recognition model 23, and the object recognition model 23 performs object recognition on each frame of image to obtain feature point cloud data of each frame of image. In some embodiments, the stream push thread of the media service module 21 pushes each frame of image to the task scheduling module 22 in real time.
The task scheduling module 22 is further configured to render the model to be rendered corresponding to the service information into each frame of image by using the rendering engine corresponding to the service information according to the feature point cloud data of each frame of image, so as to obtain a rendered image of each frame of image, for example, perform step S140 shown in fig. 1. For example, the object recognition model 23 sends each frame of image and the corresponding feature point cloud data to the corresponding rendering engine 24, and the rendering engine 24 renders the model to be rendered corresponding to the service information into each frame of image according to the feature point cloud data of each frame of image, so as to obtain a rendered image of each frame of image.
The media service module 21 is further configured to generate a rendered video stream from the multiple frames of rendered images, for example, to perform step S150 shown in fig. 1. For example, the rendering engine 24 sends the rendered multi-frame rendering image to the task scheduling module 22, and the task scheduling module 22 sends the multi-frame rendering image to the media service module 21. In some embodiments, the pull thread of the media service module 21 pulls the multi-frame rendering image in the task scheduling module 22 in real time at a preset frame rate.
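The push/pull hand-off between the media service module and the task scheduling module could be sketched with two queues and a pull loop running at the preset frame rate; the queue sizes, the 30 fps default, and the function names are assumptions.

```python
import queue
import threading

to_scheduler = queue.Queue(maxsize=8)    # frames pushed by the stream-push thread
from_scheduler = queue.Queue(maxsize=8)  # rendered frames pulled by the pull thread
stop = threading.Event()

def stream_push(decoded_frames):
    """Media service module: push each decoded frame to the task scheduling module in real time."""
    for frame in decoded_frames:
        to_scheduler.put(frame)

def stream_pull(handle_rendered, frame_rate: float = 30.0):
    """Media service module: pull rendered frames at a preset frame rate for encoding."""
    interval = 1.0 / frame_rate
    while not stop.is_set():
        try:
            handle_rendered(from_scheduler.get(timeout=interval))
        except queue.Empty:
            continue  # the task scheduling module has not produced a new frame yet

puller = threading.Thread(target=stream_pull, args=(print,), daemon=True)
puller.start()
```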
The media service module 21 is further configured to send the rendered video stream to the client, for example, to perform step S160 as shown in fig. 1.
In some embodiments, the media service module 21 is further configured to, after receiving the video stream and before performing object recognition on each frame of image, perform a decoding operation on the video stream to obtain a YUV format multi-frame image, and send the YUV format multi-frame image to the task scheduling module 22.
The task scheduling module 22 is further configured to receive the YUV format multi-frame image from the media service module 21 and convert the YUV format multi-frame image into an RGB format multi-frame image. The RGB-format multi-frame image is used for the object recognition model 23 and the rendering engine 24 to perform object recognition and rendering, respectively, to obtain an RGB-format multi-frame rendered image.
The task scheduling module 22 is further configured to convert the RGB-format multi-frame rendered image into a YUV-format multi-frame rendered image, and transmit the YUV-format multi-frame rendered image to the media service module 21.
The media service module 21 is further configured to receive the YUV-format multi-frame rendered image from the task scheduling module, and encode the YUV-format multi-frame rendered image to obtain a rendered video stream.
In some embodiments, the cloud rendering device 2 further includes a cloud merging module 25. Taking the case where the service information includes a service category and the service category belongs to the multi-person interaction category as an example, the task scheduling module 22 is further configured to determine other clients interacting with the client, and to notify the rendering engines corresponding to the client and the other clients to send their rendered images to the cloud merging module 25.
The cloud merging module 25 is configured to receive rendered images from the rendering engines 24 corresponding to the client and other clients, and merge a multi-frame rendered image corresponding to the client and a multi-frame rendered image corresponding to other clients to obtain a multi-frame merged image. The cloud merge module 25 is further configured to send the multiple frames of the merged image to the task scheduling module 22. The task scheduling module 22 performs format conversion on the multiple frames of the merged image and forwards the converted multiple frames of the merged image to the media service module 21. The media service module 21 generates a rendered video stream from the plurality of frames of the merged image.
Fig. 3 is a block diagram illustrating a cloud rendering apparatus according to further embodiments of the present disclosure.
As shown in fig. 3, the cloud rendering apparatus 3 includes a memory 31; and a processor 32 coupled to the memory 31. The memory 31 is used for storing instructions for executing the corresponding embodiments of the cloud rendering method. The processor 32 is configured to perform the cloud rendering method in any of the embodiments of the present disclosure based on instructions stored in the memory 31.
FIG. 4 is a block diagram illustrating a computer system for implementing some embodiments of the present disclosure.
As shown in FIG. 4, computer system 40 may take the form of a general purpose computing device. Computer system 40 includes a memory 410, a processor 420, and a bus 400 that connects the various system components.
The memory 410 may include, for example, system memory, non-volatile storage media, and the like. The system memory stores, for example, an operating system, an application program, a Boot Loader (Boot Loader), and other programs. The system memory may include volatile storage media such as Random Access Memory (RAM) and/or cache memory. The non-volatile storage medium, for example, stores instructions to perform corresponding embodiments of at least one of the cloud rendering methods. Non-volatile storage media include, but are not limited to, magnetic disk storage, optical storage, flash memory, and the like.
Processor 420 may be implemented as discrete hardware components, such as general purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other programmable logic devices, discrete gates or transistors, and the like. Accordingly, each of the modules, such as the judging module and the determining module, may be implemented by a Central Processing Unit (CPU) executing instructions in a memory for performing the corresponding step, or may be implemented by a dedicated circuit for performing the corresponding step.
Bus 400 may use any of a variety of bus architectures. For example, bus structures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, and Peripheral Component Interconnect (PCI) bus.
Computer system 40 may also include input output interface 430, network interface 440, storage interface 450, and the like. These interfaces 430, 440, 450 and the memory 410 and the processor 420 may be connected by a bus 400. The input/output interface 430 may provide a connection interface for input/output devices such as a display, a mouse, a keyboard, and the like. The network interface 440 provides a connection interface for various networking devices. The storage interface 450 provides a connection interface for external storage devices such as a floppy disk, a usb disk, and an SD card.
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable apparatus to produce a machine, such that the execution of the instructions by the processor results in an apparatus that implements the functions specified in the flowchart and/or block diagram block or blocks.
These computer-readable program instructions may also be stored in a computer-readable memory that can direct a computer to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instructions which implement the function specified in the flowchart and/or block diagram block or blocks.
The present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects.
By the cloud rendering method and device and the computer storage medium in the embodiment, the rendering efficiency can be improved, and the service requirements of different types of clients can be met.
Thus far, a cloud rendering method and apparatus, computer-storable medium, according to the present disclosure have been described in detail. Some details that are well known in the art have not been described in order to avoid obscuring the concepts of the present disclosure. It will be fully apparent to those skilled in the art from the foregoing description how to practice the presently disclosed embodiments.

Claims (13)

1. A cloud rendering method executed by a cloud rendering device deployed in a cloud end includes:
receiving a video stream and service information from a client, wherein the video stream comprises a plurality of frames of images;
determining an object recognition model and a rendering engine corresponding to the service information from a plurality of object recognition models and rendering engines deployed in the cloud;
carrying out object recognition on each frame of image by using the object recognition model corresponding to the service information to obtain feature point cloud data of each frame of image;
rendering a model to be rendered corresponding to the service information into each frame of image by utilizing the rendering engine corresponding to the service information according to the feature point cloud data of each frame of image to obtain a rendered image of each frame of image;
generating a rendering video stream according to the multi-frame rendering image;
sending a rendered video stream to the client.
2. The cloud rendering method of claim 1, further comprising:
after the rendering image of each frame of image is obtained, storing the obtained rendering image of each frame of image in a buffer area;
and acquiring a rendering image of a previous frame image of each frame image from the buffer area as the rendering image of each frame image under the condition that the rendering image of each frame image is not obtained within a preset time.
3. The cloud rendering method of claim 1, wherein rendering a model to be rendered corresponding to the service information into each frame of image, and obtaining the rendered image of each frame of image comprises:
determining a network state of the client;
determining a rendering resolution corresponding to a network state of the client;
rendering the model to be rendered corresponding to the service information into each frame of image based on the rendering resolution to obtain a rendered image of each frame of image.
4. The cloud rendering method of claim 1, wherein the service information includes a service category, and generating a rendering video stream from the multi-frame rendering image comprises:
determining other clients interacting with the client under the condition that the service category belongs to a multi-person interaction category;
merging the multi-frame rendering image corresponding to the client and the multi-frame rendering images corresponding to the other clients to obtain a multi-frame merged image;
and generating the rendering video stream according to the multi-frame merged image.
5. The cloud rendering method of claim 1, wherein the service information includes a service scene, and determining an object recognition model and a rendering engine corresponding to the service information includes:
determining an object recognition model and a rendering engine corresponding to the service scene.
6. The cloud rendering method of claim 1, wherein the client is a web front end, the cloud rendering method further comprising:
before receiving the video stream and the service information from the client, receiving a Web Real-Time Communication (WebRTC) connection establishment request from the client;
and responding to the received WebRTC connection establishment request, and negotiating with the client to establish the WebRTC connection.
7. The cloud rendering method of claim 1, further comprising:
before object recognition is carried out on each frame of image, decoding operation is carried out on the video stream to obtain a multi-frame image in a YUV format;
converting the multi-frame image in the YUV format into a multi-frame image in an RGB format, wherein the multi-frame image in the RGB format is used for object identification and rendering to obtain a multi-frame rendered image in the RGB format;
wherein,
generating a rendered video stream from the plurality of frames of rendered images comprises:
converting a multi-frame rendering image in an RGB format into a multi-frame rendering image in a YUV format;
and encoding the multi-frame rendering image in the YUV format to obtain the rendering video stream.
8. The cloud rendering method of claim 1, wherein the model to be rendered comprises at least one of a shoe model, a facial makeup model, and a hairstyle model.
9. The cloud rendering method of claim 1, wherein the feature point cloud data for each frame of image is used to determine a location in which the model to be rendered is rendered into the each frame of image.
10. A cloud rendering device deployed at a cloud, comprising:
the media service module is configured to receive a video stream and service information from a client, wherein the video stream comprises a plurality of frames of images;
a task scheduling module configured to:
determining an object recognition model and a rendering engine corresponding to the service information from a plurality of object recognition models and rendering engines deployed in the cloud;
carrying out object recognition on each frame of image by using the object recognition model corresponding to the service information to obtain feature point cloud data of each frame of image;
rendering a model to be rendered corresponding to the service information into each frame of image by utilizing the rendering engine corresponding to the service information according to the feature point cloud data of each frame of image to obtain a rendered image of each frame of image;
the media service module is further configured to:
generating a rendering video stream according to the multi-frame rendering image;
sending a rendered video stream to the client.
11. The cloud rendering apparatus of claim 10,
the media service module is further configured to, before performing object identification on each frame of image, perform decoding operation on the video stream to obtain a YUV-format multi-frame image, and send the YUV-format multi-frame image to the task scheduling module;
the task scheduling module is also configured to receive the YUV format multi-frame image from the media service module and convert the YUV format multi-frame image into an RGB format multi-frame image, wherein the RGB format multi-frame image is used for object identification and rendering, and an RGB format multi-frame rendering image is obtained;
wherein,
the task scheduling module is also configured to convert the RGB format multi-frame rendering image into a YUV format multi-frame rendering image and send the YUV format multi-frame rendering image to the media service module;
the media service module is also configured to receive the multi-frame rendering image in the YUV format from the task scheduling module, and encode the multi-frame rendering image in the YUV format to obtain the rendering video stream.
12. A cloud rendering apparatus, comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform the cloud rendering method of any of claims 1 to 9 based on instructions stored in the memory.
13. A computer-storable medium having stored thereon computer program instructions which, when executed by a processor, implement the cloud rendering method of any of claims 1 to 9.
CN202111026903.3A 2021-09-02 2021-09-02 Cloud rendering method and device and computer-storable medium Pending CN113727142A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111026903.3A CN113727142A (en) 2021-09-02 2021-09-02 Cloud rendering method and device and computer-storable medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111026903.3A CN113727142A (en) 2021-09-02 2021-09-02 Cloud rendering method and device and computer-storable medium

Publications (1)

Publication Number Publication Date
CN113727142A 2021-11-30

Family

ID=78681032

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111026903.3A Pending CN113727142A (en) 2021-09-02 2021-09-02 Cloud rendering method and device and computer-storable medium

Country Status (1)

Country Link
CN (1) CN113727142A (en)

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102509323A (en) * 2011-11-14 2012-06-20 厦门吉比特网络技术股份有限公司 Video memory control and process method based on hardware acceleration rendering technology
CN109887065A (en) * 2019-02-11 2019-06-14 京东方科技集团股份有限公司 Image rendering method and its device
US20200388068A1 (en) * 2019-06-10 2020-12-10 Fai Yeung System and apparatus for user controlled virtual camera for volumetric video
US20210055801A1 (en) * 2019-08-23 2021-02-25 Lg Electronics Inc. Multimedia device and method for controlling the same
US20210097752A1 (en) * 2019-09-30 2021-04-01 Verizon Patent And Licensing Inc. Systems and Methods for Processing Volumetric Data Using a Modular Network Architecture
CN112379769A (en) * 2020-04-10 2021-02-19 上海湃睿信息科技有限公司 Processing method and system of virtual scene service information and cloud management platform
CN111901635A (en) * 2020-06-17 2020-11-06 北京视博云信息技术有限公司 Video processing method, device, storage medium and equipment
CN112132942A (en) * 2020-09-30 2020-12-25 深圳星寻科技有限公司 Three-dimensional scene roaming real-time rendering method
CN112422873A (en) * 2020-11-30 2021-02-26 Oppo(重庆)智能科技有限公司 Frame insertion method and device, electronic equipment and storage medium
CN113034655A (en) * 2021-03-11 2021-06-25 北京字跳网络技术有限公司 Shoe fitting method and device based on augmented reality and electronic equipment

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114666658A (en) * 2022-03-28 2022-06-24 北京京东乾石科技有限公司 Cloud rendering method, device and system and user terminal
WO2023185856A1 (en) * 2022-04-01 2023-10-05 中国移动通信有限公司研究院 Data transmission method and apparatus, electronic device and readable storage medium
CN115170713A (en) * 2022-06-29 2022-10-11 光线云(杭州)科技有限公司 Three-dimensional scene cloud rendering method and system based on hyper network
CN115278301A (en) * 2022-07-27 2022-11-01 超聚变数字技术有限公司 Video processing method, system and equipment
CN115278301B (en) * 2022-07-27 2023-12-22 河南昆仑技术有限公司 Video processing method, system and equipment
CN117971500A (en) * 2024-03-29 2024-05-03 成都众享视景科技有限公司 Heterogeneous cluster cloud rendering method

Similar Documents

Publication Publication Date Title
CN113727142A (en) Cloud rendering method and device and computer-storable medium
WO2022116759A1 (en) Image rendering method and apparatus, and computer device and storage medium
CN106611435B (en) Animation processing method and device
CN111681167B (en) Image quality adjusting method and device, storage medium and electronic equipment
CN112258269B (en) Virtual fitting method and device based on 2D image
US11711563B2 (en) Methods and systems for graphics rendering assistance by a multi-access server
CN112042182B (en) Manipulating remote avatars by facial expressions
WO2022048097A1 (en) Single-frame picture real-time rendering method based on multiple graphics cards
CN107729095B (en) Image processing method, virtualization platform and computer-readable storage medium
CN110969572B (en) Face changing model training method, face exchange device and electronic equipment
CN111464828A (en) Virtual special effect display method, device, terminal and storage medium
CN113457160A (en) Data processing method and device, electronic equipment and computer readable storage medium
CN111476851B (en) Image processing method, device, electronic equipment and storage medium
CN113344794B (en) Image processing method and device, computer equipment and storage medium
WO2023071586A1 (en) Picture generation method and apparatus, device, and medium
WO2023241459A1 (en) Data communication method and system, and electronic device and storage medium
CN111182350A (en) Image processing method, image processing device, terminal equipment and storage medium
CN112689168A (en) Dynamic effect processing method, dynamic effect display method and dynamic effect processing device
US20140161173A1 (en) System and method for controlling video encoding using content information
US20170221174A1 (en) Gpu data sniffing and 3d streaming system and method
CN113411660B (en) Video data processing method and device and electronic equipment
CN112954452B (en) Video generation method, device, terminal and storage medium
CN114390307A (en) Image quality enhancement method, device, terminal and readable storage medium
CN111179386A (en) Animation generation method, device, equipment and storage medium
KR20210052884A (en) Personalized Video Production System and Method Using Chroma Key

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination