CN118154820A - Real-time virtual-real fusion and real-time virtual-real interactive performance method

Real-time virtual-real fusion and real-time virtual-real interactive performance method

Info

Publication number
CN118154820A
CN118154820A (application CN202410586200.3A)
Authority
CN
China
Prior art keywords
real
data
data packet
virtual
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410586200.3A
Other languages
Chinese (zh)
Inventor
周坚 (Zhou Jian)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Zhuying Digital Technology Co ltd
Original Assignee
Nanjing Zhuying Digital Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Zhuying Digital Technology Co ltd
Priority to CN202410586200.3A
Publication of CN118154820A


Landscapes

  • Processing Or Creating Images (AREA)

Abstract

The invention discloses an interactive performance method based on real-time virtual-real fusion and real-time virtual-real interaction, and relates to the technical field of intelligent education. It addresses the problem that, in practical applications and especially against complex or dynamic backgrounds, the accuracy of gesture recognition may degrade, impairing the user's interaction with the virtual studio. The method comprises the following steps: step one, system initialization; step two, capturing a three-dimensional character or object; step three, green screen matting; step four, virtual-real fusion; step five, interactive performance. Three-dimensional characters or objects are extracted from images of the real scene by green screen matting and embedded into the virtual scene, their shadows are simulated and calculated, and realistic fusion with the virtual scene is achieved, so that students can learn anywhere and at any time without the constraints of time and space, expanding the reach and accessibility of education.

Description

Real-time virtual-real fusion and real-time virtual-real interactive performance method
Technical Field
The invention relates to the technical field of intelligent education, in particular to an interactive performance method based on real-time virtual-real fusion and real-time virtual-real interaction.
Background
With the rapid development of information technology and virtual reality technology, distance education has become an important form of education. However, conventional distance education generally conveys information only through video and audio, and lacks realism and interactivity. Regarding interactive performance methods, Chinese patent CN110045821A discloses an augmented reality interaction method for a virtual concert hall, comprising the following steps: (1) performing character segmentation and background color filtering on the color image captured by a Kinect; (2) applying erosion-then-dilation shadow elimination to the image processed in step (1); (3) assigning the resulting image as a dynamic texture to a plane in the constructed virtual studio; (4) recognizing the character skeleton in the captured scene with the Kinect and, combined with the depth image captured by the Kinect, locating the depth position of the character in space. Based on recognition of the hand joints, various gestures are designed, and combined with the three-dimensional coordinates of the hands, various interactive judgments in space are realized. That patent requires only simple gesture operations from the user to deliver a convenient, content-rich augmented reality interaction experience.
The above patent, although enabling an interactive experience through gestures, still has the following problems:
In the prior art, although an interactive experience is realized through gestures, in practical applications, and especially against complex or dynamic backgrounds, the accuracy of gesture recognition may degrade, which impairs the user's interaction with the virtual studio.
Disclosure of Invention
The invention aims to provide an interactive performance method based on real-time virtual-real fusion and real-time virtual-real interaction. Three-dimensional characters or objects are extracted from images of the real scene by green screen matting and embedded into the virtual scene, their shadows are simulated and calculated, and realistic fusion with the virtual scene is achieved; through an online platform and cloud technology, students can learn anywhere and at any time without the constraints of time and space, expanding the reach and accessibility of education and solving the problems identified in the background art.
In order to achieve the above purpose, the present invention provides the following technical solutions:
An interactive performance method based on real-time virtual-real fusion and real-time virtual-real interaction comprises the following steps:
Step one: initializing a system: setting required scene equipment in an actual scene, and establishing data connection between the scene equipment and an online platform and cloud technology;
Step two: capturing a three-dimensional character or article: starting a camera, acquiring a real scene acquired by the camera, capturing three-dimensional characters or objects in the real scene, and transmitting data acquired by the camera to a processor based on a wireless network;
Step three: green curtain matting processing: the processor separates a green pixel area from a non-green pixel area in the data acquired by the camera based on an HSV color space method, and extracts the captured three-dimensional character or object from the real scene to obtain an image only containing a target object;
Step four: and (3) virtual-real fusion: the extracted image only containing the target object is embedded into the constructed virtual scene, and the shadow of the three-dimensional character or article is simulated and calculated through a shader algorithm, so that the real fusion with the virtual scene is realized;
step five: interactive performance: in the virtual studio, interacting with three-dimensional characters or objects in the virtual scene based on a gesture or voice recognition mode, and remotely playing the set animation of the objects in the virtual scene through a remote controller;
step six: on-line transmission: and transmitting the images and the sounds of the virtual studio to the intelligent equipment terminal where the student is located in real time based on an online platform and a cloud technology.
Further, capturing the three-dimensional character or object in step two specifically comprises the following steps:
starting a camera, ensuring the hardware connection and initialization of the camera, and acquiring video frames of a real scene in real time after the camera is started;
Preprocessing the acquired image, and estimating the depth information of each pixel point in the image based on a depth estimation algorithm;
Collecting brightness data of each pixel in the captured video data, processing the brightness data of each pixel with a preset nonlinear activation function to obtain a processing result, and compensating the video data with a brightness compensation function to obtain compensated video data;
Splitting the compensated video data into frames to obtain a processing result, and extracting feature points of the three-dimensional character or object from each frame image in the processing result to obtain a feature point extraction result;
comparing the feature point extraction results of each frame of image, and selecting the feature points of the target person appearing in each frame of image;
Extracting a target unit data segment in the compensated video data according to the target character feature points, and generating a position attribute of a three-dimensional character or object based on the sequence of the target unit data segment in a time sequence carried by the video data;
identifying a three-dimensional person or object in the image, determining the position of the three-dimensional person or object in the video frame, and determining the tracking result of the target person or object;
and integrating the depth information and the tracking result of the target person or object to generate a capturing result of the three-dimensional person or object.
Further, the green screen matting in step three specifically comprises the following steps:
reading an image of a green curtain background and an image containing foreground objects, and converting the image from an RGB color space to an HSV color space;
dividing an input image into a green area and a non-green area according to the HSV value range of the green pixels in an HSV color space, generating a corresponding mask, marking the green area as white, and marking the non-green area as black;
performing a bitwise AND of the image and the mask to remove the green screen background and obtain an image containing only the foreground object, then combining the keyed image with a new background image to obtain an image containing only the target object;
The bitwise AND of the image and the mask is implemented with a bitwise AND function, keying out the green screen background.
Further, for the simulated calculation of the shadow of the three-dimensional character or object through a shader algorithm in step four, the shader algorithm is specifically:
Extracting the rendering type and culling attribute in the shader, determining the rendering mode and surface material of the three-dimensional character or object from the rendering type, and determining the range of rendered objects from the culling attribute setting, obtaining shadows of the three-dimensional character or object that are unaffected by illumination;
creating tags based on the determined rendering type and the set culling attribute, determining from the tags how the three-dimensional character or object is rendered under different types of light sources, and recording reference information in a shadow map;
wherein, when the tags are used, the shadow casting mode of the mesh renderer component corresponding to the three-dimensional character or object is set to turn on shadow casting or to cast shadows only.
Further, for the interaction with the three-dimensional character or the object in the virtual scene in the fifth step, the method specifically comprises the following steps:
Capturing gestures in the real world through a camera to generate gesture data, extracting key features from the gesture data, training the extracted features, and generating a gesture recognition model;
gesture data of three-dimensional characters or objects captured in real time through a camera are input into a gesture recognition model for recognition, and whether interaction data occur is judged;
The interaction data are matched with preset interaction instructions one by one to obtain a matching result;
And transmitting the matched interaction instruction to a simulation scene through a communication link, and controlling the three-dimensional character or article in the simulation scene to respond.
Further, for the interactive performance in the fifth step, the method further includes voice acquisition of the three-dimensional character or object, specifically:
collecting voice data of a three-dimensional character through a microphone, preprocessing the voice data, and extracting key features from the preprocessed voice data;
Training the extracted key features based on a voice recognition algorithm, generating a voice recognition model, and playing the animation of the object in the virtual scene based on the voice data recognition result of the voice recognition model.
Further, for online transmission in the step six, specifically:
compression encoding is carried out on video data in the virtual studio based on a video encoder, compression encoding is carried out on voice data based on an audio encoder, and encoded video streams and audio streams are transmitted to a cloud server based on a real-time transmission protocol;
The cloud server receives the video stream and the audio stream, transcodes, stores and distributes the video stream and the audio stream, and establishes a data interaction channel with an online platform;
the intelligent device accesses an online platform of the virtual studio, receives video streams and audio streams from the cloud server based on the data interaction channel, decodes the video streams and the audio streams, and restores original images and sound;
The decoded images and sounds are displayed on the intelligent device of the student and interact with the virtual studio through the intelligent device.
Further, compression encoding is performed on video data in the virtual studio based on a video encoder, compression encoding is performed on voice data based on an audio encoder, and encoded video and audio streams are transmitted to a cloud server based on a real-time transmission protocol, including:
Performing compression coding processing on video data in the virtual studio based on a video coder to obtain a video stream data packet corresponding to a video stream;
performing compression coding processing on voice data in the virtual studio based on an audio encoder to obtain an audio stream data packet corresponding to an audio stream;
monitoring communication parameters of the communication connection with the cloud server in real time, and determining from the communication parameters whether the current communication state allows simultaneous data transmission of the video stream data packet and the audio stream data packet;
When the communication state does not meet the data transmission requirement of the video stream data packet and the audio stream data packet, splitting the video stream data packet and the audio stream data packet to form a plurality of video stream sub-data packets and audio stream sub-data packets; wherein, the start time and the end time of the splitting of the video stream data packet and the audio stream data packet are the same;
packaging the video stream sub-data packet and the audio stream sub-data packet with the same starting time and ending time into an audio-video stream data packet;
sequentially sending the audio and video stream data packets to a cloud server;
During transmission of the audio-video stream data packets, monitoring the operating parameters of the communication channel to the cloud server in real time, and deriving an operational quality evaluation parameter of the communication channel from those operating parameters; the operational quality evaluation parameter of the communication channel is obtained through the following formula:
Wherein F represents the operational quality evaluation parameter of the communication channel; n represents the number of unit times contained in the operating duration of the communication channel, a unit time being 1 s; P_i represents the bit error rate of data transmission in the i-th unit time; S_i represents the signal-to-noise ratio of the communication channel in the i-th unit time; f represents an adjustment coefficient; S_0 represents the theoretical signal-to-noise ratio of the communication channel; f_i represents the actual communication frequency of the communication channel in the i-th unit time; f_0 represents the nominal communication frequency of the communication channel;
and when the operational quality evaluation parameter of the communication channel falls below a preset parameter threshold, raising a communication abnormality alarm.
Further, monitoring communication parameters of the communication connection with the cloud server in real time and determining from the communication parameters whether the current communication state allows simultaneous data transmission of the video stream data packet and the audio stream data packet comprises:
monitoring the unit time change rate of the residual capacity of a communication channel connected with the cloud server in real time;
Monitoring the change rate of the actual bandwidth of a communication channel in communication connection with the cloud server in real time;
Monitoring the unit time change rate of the communication frequency of a communication channel in communication connection with the cloud server in real time;
Generating a stability evaluation parameter corresponding to the communication channel by using the unit time change rate of the residual capacity of the communication channel, the unit time change rate of the actual bandwidth and the unit time change rate of the communication frequency, wherein the stability evaluation parameter is obtained by the following formula:
Wherein ζ represents the stability evaluation parameter; L_c, L_b and L_f respectively represent the per-unit-time change rate of the remaining capacity of the communication channel, the per-unit-time change rate of the actual bandwidth, and the per-unit-time change rate of the communication frequency;
When the stability evaluation parameter reaches or exceeds a preset stability evaluation threshold, judging that the current communication state of the communication channel meets the requirement of simultaneously transmitting data of the video stream data packet and the audio stream data packet;
And when the stability evaluation parameter is lower than a preset stability evaluation threshold, judging that the current communication state of the communication channel does not meet the requirement of simultaneously transmitting the video stream data packet and the audio stream data packet.
Further, when the communication state does not meet the data transmission requirement of the video stream data packet and the audio stream data packet, splitting the video stream data packet and the audio stream data packet to form a plurality of video stream sub-data packets and audio stream sub-data packets, including:
When the communication state does not meet the data transmission requirement of the video stream data packet and the audio stream data packet, the stability evaluation parameter corresponding to the current communication channel is called;
acquiring a data segmentation coefficient by utilizing the difference value between the stability evaluation parameter and a preset stability evaluation threshold; the data segmentation coefficient is obtained through the following formula:
Wherein δ represents the data division coefficient; L_x represents a constant; ζ_0 represents the preset stability evaluation threshold; C_i represents the remaining capacity of the communication channel in the i-th unit time; C_b represents the remaining channel capacity at channel saturation; C_e represents the overall capacity of the communication channel;
Acquiring the upper limit value of the data quantity corresponding to the data packets of the video stream sub-data packet and the audio stream sub-data packet by utilizing the data segmentation coefficient; the upper limit value of the data packet corresponding to the data packet of the video stream sub-data packet and the data packet of the audio stream sub-data packet is obtained through the following formula:
Wherein C_s and C_y represent the upper limit values of the data amount of the video stream sub-data packets and the audio stream sub-data packets respectively; C_0 represents the preset reference data amount of a sub-data packet; C_x represents the remaining channel capacity of the current communication channel;
Splitting the video stream data packet and the audio stream data packet according to the corresponding data amount upper limits, obtaining a plurality of video stream sub-data packets and audio stream sub-data packets whose data amounts do not exceed the corresponding upper limit, wherein the splitting start time and end time of the video stream data packet and the audio stream data packet are the same.
Compared with the prior art, the invention has the beneficial effects that:
the three-dimensional character or object is captured by the camera, extracted from images of the real scene by green screen matting, and embedded into the virtual scene; its shadow is simulated and calculated, realizing realistic fusion with the virtual scene. In the virtual studio, the character in the scene can remotely trigger preset animations of objects in the virtual scene through a remote controller, and through the online platform and cloud technology students can learn anywhere and at any time without the constraints of time and space, expanding the reach and accessibility of education.
Drawings
FIG. 1 is a flow chart of an interactive presentation method based on real-time virtual-real fusion and real-time virtual-real interaction in the invention;
Fig. 2 is a flowchart of an interactive performance method based on real-time virtual-real fusion and real-time virtual-real interaction.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In order to solve the technical problem that, in practical applications and especially against complex or dynamic backgrounds, the accuracy of gesture recognition may degrade and impair the interaction between the user and the virtual studio, referring to figures 1-2, this embodiment provides the following technical solution:
An interactive performance method based on real-time virtual-real fusion and real-time virtual-real interaction comprises the following steps:
Step one: initializing the system: setting up the required scene devices in the actual scene, the scene devices comprising a camera, green screen matting equipment, virtual scene generation equipment, a remote controller, a tablet, an HDMI (high-definition multimedia interface) cable, and recording equipment, and establishing a data connection between the scene devices and the online platform and cloud technology;
Step two: capturing a three-dimensional character or article: starting a camera, acquiring a real scene acquired by the camera, capturing three-dimensional characters or objects in the real scene, and transmitting data acquired by the camera to a processor based on a wireless network;
Step three: green curtain matting processing: the processor separates a green pixel area from a non-green pixel area in the data acquired by the camera based on an HSV color space method, and extracts the captured three-dimensional character or object from the real scene to obtain an image only containing a target object;
Step four: and (3) virtual-real fusion: the extracted image only containing the target object is embedded into the constructed virtual scene, and the shadow of the three-dimensional character or article is simulated and calculated through a shader algorithm, so that the real fusion with the virtual scene is realized;
Step five: interactive performance: in the virtual studio, interaction is carried out with three-dimensional characters or objects in the virtual scene based on a gesture or voice recognition mode, and meanwhile, animation playing is carried out on objects in the virtual scene remotely through a remote controller, wherein the animation playing specifically comprises the following steps:
Capturing gestures in the real world through the camera to generate gesture data, extracting key features from the gesture data, such as gesture shape, speed, and direction, and training on the extracted features to generate a gesture recognition model;
gesture data of three-dimensional characters or objects captured in real time through a camera are input into a gesture recognition model for recognition, and whether interaction data occur is judged;
the interaction data are matched with a preset interaction instruction one by one to obtain a matching result;
transmitting the matched interaction instruction to the simulated scene through a communication link, and controlling the three-dimensional character or object in the simulated scene to respond with actions such as position adjustment, rotation, and zooming in or out;
Collecting voice data of the three-dimensional character through a microphone, preprocessing the voice data, and extracting key features such as pitch, duration, and timbre from the preprocessed voice data;
training on the extracted key features with a voice recognition algorithm to generate a voice recognition model, and playing the animation of an object in the virtual scene based on the recognition result of the voice recognition model (a sketch of the gesture-to-instruction matching stage follows these steps);
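The following Python sketch illustrates the matching stage only: a recognized gesture feature vector is compared against per-gesture templates and mapped to a preset interaction instruction. The gesture vocabulary, feature layout, instruction names, and the nearest-template "model" are all illustrative assumptions; the trained gesture and voice recognition models described above are not reproduced here.

```python
import numpy as np

# Hypothetical mapping from recognized gestures to preset interaction instructions;
# neither the gesture vocabulary nor the instruction names come from the patent.
INSTRUCTIONS = {
    "swipe_right": "rotate_object",
    "pinch": "zoom_object",
    "open_palm": "play_animation",
}

# Toy "model": one template feature vector (e.g. shape/speed/direction statistics)
# per gesture class; a real system would train a classifier on captured gesture data.
TEMPLATES = {
    "swipe_right": np.array([1.0, 0.8, 0.0]),
    "pinch":       np.array([0.2, 0.3, 1.0]),
    "open_palm":   np.array([0.9, 0.1, 0.1]),
}

def recognize_gesture(features, reject_threshold=0.8):
    """Nearest-template matching; returns None when no gesture is close enough."""
    best_label, best_dist = None, float("inf")
    for label, template in TEMPLATES.items():
        dist = float(np.linalg.norm(features - template))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= reject_threshold else None

def match_instruction(features):
    gesture = recognize_gesture(features)
    return INSTRUCTIONS.get(gesture) if gesture else None

print(match_instruction(np.array([0.95, 0.75, 0.05])))  # -> "rotate_object"
print(match_instruction(np.array([5.0, 5.0, 5.0])))     # -> None (no interaction)
```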
In this embodiment, the animations to be played are customized and developed according to the course content; for example, if washing hands comprises five steps, the virtual character in the virtual scene can be remotely controlled to perform the third step of hand washing, and in the virtual studio the three-dimensional character can then explain that step individually and in depth;
step six: on-line transmission: based on an online platform and a cloud technology, transmitting images and sounds of a virtual studio to an intelligent equipment terminal where a student is located in real time, so that the student can learn at any place and any time, specifically:
compression encoding is carried out on video data in the virtual studio based on a video encoder, compression encoding is carried out on voice data based on an audio encoder, and encoded video streams and audio streams are transmitted to a cloud server based on a real-time transmission protocol;
The cloud server receives the video stream and the audio stream, transcodes, stores and distributes the video stream and the audio stream, and establishes a data interaction channel with the online platform;
The intelligent device accesses the online platform of the virtual studio, such as a web page or application, receives the video stream and audio stream from the cloud server over the data interaction channel, decodes them, and restores the original images and sound;
The decoded images and sounds are displayed on the students' intelligent devices for learning and viewing, so students can learn anywhere and at any time without being limited by time or place, and can interact with the virtual studio through their devices, for example by submitting homework or joining discussions. Students can also provide feedback and suggestions to improve the learning experience of the virtual studio, and when learning is finished they can choose to exit the online learning platform and save or share their learning results. Teaching in this form can be applied to various modes of education such as MOOCs and online education, expanding the reach and accessibility of education.
In this embodiment, the three-dimensional character or object is captured by the camera, extracted from images of the real scene by green screen matting, and embedded into the virtual scene, and its shadow is simulated and calculated through the shader algorithm, realizing realistic fusion with the virtual scene. In the virtual studio, the character in the scene can remotely trigger preset animations of objects in the virtual scene through a remote controller; while speaking, the character can write on a handheld tablet whose output is fed to the software over an HDMI cable to realize a picture-in-picture function, and the picture, together with the data collected by the camera and the microphone worn by the character, is combined into an MP4 video to record the lecture.
In this embodiment, for capturing a three-dimensional character or object in the second step, specifically:
Start the camera and verify its hardware connection and initialization, including setting appropriate parameters such as resolution, frame rate, and exposure; after startup, video frames of the real scene are collected in real time, with incoming light captured and converted into digital signals, and the camera continuously captures frames, typically tens to hundreds per second;
Preprocess the acquired image, including denoising, color correction, and brightness adjustment, to improve subsequent processing, and estimate the depth information of each pixel in the image with a depth estimation algorithm, such as a structured-light method, a stereo-vision method, or a deep-learning-based monocular depth estimation method;
Collect brightness data of each pixel in the captured video data, process the brightness data of each pixel with a preset nonlinear activation function to obtain a processing result, and compensate the video data with a brightness compensation function to obtain compensated video data (a sketch of this compensation step follows these steps);
Split the compensated video data into frames to obtain a processing result, and extract feature points of the three-dimensional character or object from each frame image in the processing result to obtain a feature point extraction result;
comparing the feature point extraction results of each frame of image, and selecting the feature points of the target person appearing in each frame of image;
And extracting the target unit data segment in the compensated video data according to the target character feature points, and generating the position attribute of the three-dimensional character or object based on the sequence of the target unit data segment in the time sequence carried by the video data.
In this embodiment, processing the brightness data with a preset nonlinear activation function and compensating the video data with a brightness compensation function can significantly reduce image distortion caused by uneven brightness and improve the overall visibility of the video. This provides a basis for subsequent target recognition, tracking, and positioning, and the generated position attribute of the three-dimensional character or object provides a data basis for determining its position in the video frame, helping to improve video quality and visual effect and supporting subsequent video analysis;
Identifying a three-dimensional person or object in the image, determining the position of the three-dimensional person or object in the video frame, and determining the tracking result of the target person or object;
Integrate the depth information with the tracking result of the target person or object to generate the capture result of the three-dimensional character or object, realizing continuous tracking of the target.
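To make the brightness compensation step more concrete, the following Python sketch applies a per-pixel compensation driven by a nonlinear activation function. The patent names a nonlinear activation function and a brightness compensation function without specifying them, so the sigmoid response and the gain-toward-midpoint rule below are illustrative assumptions only.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def compensate_brightness(frame_gray, gain=0.6, midpoint=0.5):
    """Per-pixel brightness compensation (illustrative only).

    The specific activation and compensation functions are assumptions:
    a sigmoid response plus a gain that pulls pixels toward mid-brightness.
    """
    norm = frame_gray.astype(np.float32) / 255.0              # brightness in [0, 1]
    response = sigmoid((norm - midpoint) * 10.0)               # assumed nonlinear activation
    weight = np.abs(response - 0.5) * 2.0                      # large far from the midpoint
    compensated = norm + gain * (midpoint - norm) * weight     # pull extremes toward midpoint
    return np.clip(compensated * 255.0, 0.0, 255.0).astype(np.uint8)

# Toy usage on a synthetic, unevenly lit frame (a horizontal brightness gradient)
frame = np.tile(np.linspace(20, 240, 320, dtype=np.uint8), (240, 1))
out = compensate_brightness(frame)
print(frame.min(), frame.max(), "->", out.min(), out.max())   # dynamic range is narrowed
```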
In this embodiment, for the green screen matting processing in the third step, the specific steps are:
reading an image of a green curtain background and an image containing foreground objects, and converting the image from an RGB color space to an HSV color space;
In this embodiment, converting the image from the RGB color space to the HSV color space is done with an open-source computer vision library, whose color space conversion function converts an image from one color space to another. In computer vision, color space conversion is often required before subsequent processing and analysis, and the conversion function supports conversions between color spaces such as RGB, HSV, HSL, YUV, and grayscale; it takes two parameters, the image to be converted and the specified target color space;
dividing an input image into a green area and a non-green area according to the HSV value range of the green pixels in an HSV color space, generating a corresponding mask, marking the green area as white, and marking the non-green area as black;
performing a bitwise AND of the image and the mask to remove the green screen background and obtain an image containing only the foreground object, then combining the keyed image with a new background image to obtain an image containing only the target object;
In this embodiment, for background replacement the mask marks the background to be replaced and the foreground object separately: in the HSV color space, pixels matching the green screen background are marked as white with value 255 and non-matching pixels are marked as black with value 0, and combining the mask with the original image yields an image containing only the foreground object; the bitwise AND of the image and the mask is implemented with a bitwise AND function, keying out the green screen background (a minimal matting sketch follows).
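A minimal sketch of this matting step using the OpenCV library (cv2) is shown below. The HSV bounds chosen for "green" are assumptions that would be tuned to the actual studio lighting, and the green mask is inverted before the bitwise AND so that the foreground, rather than the background, is retained.

```python
import cv2
import numpy as np

# Assumed HSV range for green screen pixels; tune for the actual lighting.
GREEN_LOWER = np.array([35, 60, 60], dtype=np.uint8)
GREEN_UPPER = np.array([85, 255, 255], dtype=np.uint8)

def replace_green_screen(frame_bgr, new_background_bgr):
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)            # RGB/BGR -> HSV
    green_mask = cv2.inRange(hsv, GREEN_LOWER, GREEN_UPPER)     # green pixels -> 255 (white)
    foreground_mask = cv2.bitwise_not(green_mask)               # non-green pixels -> 255
    foreground = cv2.bitwise_and(frame_bgr, frame_bgr, mask=foreground_mask)
    background = cv2.bitwise_and(new_background_bgr, new_background_bgr, mask=green_mask)
    return cv2.add(foreground, background)                      # composite over the new background

# Usage (frame and background must have the same size):
# frame = cv2.imread("studio_frame.png")
# bg = cv2.imread("virtual_scene.png")
# cv2.imwrite("composite.png", replace_green_screen(frame, bg))
```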
In this embodiment, the shader algorithm is specifically:
Extract the rendering type and culling attribute in the shader, and determine the rendering mode and surface material of the three-dimensional character or object from the rendering type, for example: if the rendering type is opaque, the object defined by the shader is opaque; if the rendering type is transparent, the object defined by the shader is translucent. Determine the range of rendered objects from the culling setting, for example: setting culling to off renders both the front and back faces of the object, setting back culling renders only the front faces, and setting front culling renders only the back faces; in this example, culling is set to off so that both the front and back faces of the three-dimensional character or object are rendered, obtaining shadows of the three-dimensional character or object that are unaffected by illumination;
creating tags based on the determined rendering type and the set culling attribute, determining from the tags how the three-dimensional character or object is rendered under different types of light sources, and recording reference information in the shadow map; such tags are commonly used on objects that need to receive and cast real-time shadows, such as characters and scenes;
Wherein, when the tags are used, the shadow casting mode of the mesh renderer component corresponding to the three-dimensional character or object is set to turn on shadow casting or to cast shadows only.
In this embodiment, the shader algorithm can optimize rendering performance by flexibly setting the rendering type and culling attribute, organizing rendering behavior with tags, and configuring the shadow casting mode to control how shadows are cast. Especially when handling large numbers of objects and complex lighting conditions, this enables efficient, realistic, and flexible rendering in the virtual environment, in particular the generation and processing of shadows for three-dimensional characters or objects, greatly enhancing the realism and immersion of the virtual scene.
Specifically, compression encoding is performed on video data in a virtual studio based on a video encoder, compression encoding is performed on voice data based on an audio encoder, and encoded video streams and audio streams are transmitted to a cloud server based on a real-time transmission protocol, including:
Performing compression coding processing on video data in the virtual studio based on a video coder to obtain a video stream data packet corresponding to a video stream;
performing compression coding processing on voice data in the virtual studio based on an audio encoder to obtain an audio stream data packet corresponding to an audio stream;
monitoring communication parameters of the communication connection with the cloud server in real time, and determining from the communication parameters whether the current communication state allows simultaneous data transmission of the video stream data packet and the audio stream data packet;
When the communication state does not meet the data transmission requirement of the video stream data packet and the audio stream data packet, splitting the video stream data packet and the audio stream data packet to form a plurality of video stream sub-data packets and audio stream sub-data packets; wherein, the start time and the end time of the splitting of the video stream data packet and the audio stream data packet are the same;
packaging the video stream sub-data packet and the audio stream sub-data packet with the same starting time and ending time into an audio-video stream data packet;
sequentially sending the audio and video stream data packets to a cloud server;
During transmission of the audio-video stream data packets, monitoring the operating parameters of the communication channel to the cloud server in real time, and deriving an operational quality evaluation parameter of the communication channel from those operating parameters; the operational quality evaluation parameter of the communication channel is obtained through the following formula:
Wherein F represents the operational quality evaluation parameter of the communication channel; n represents the number of unit times contained in the operating duration of the communication channel, a unit time being 1 s; P_i represents the bit error rate of data transmission in the i-th unit time; S_i represents the signal-to-noise ratio of the communication channel in the i-th unit time; f represents an adjustment coefficient; S_0 represents the theoretical signal-to-noise ratio of the communication channel; f_i represents the actual communication frequency of the communication channel in the i-th unit time; f_0 represents the nominal communication frequency of the communication channel;
and when the operational quality evaluation parameter of the communication channel falls below a preset parameter threshold, raising a communication abnormality alarm (a sketch of the packing and alarm steps follows).
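As a sketch of the packing, sequential sending, and abnormality-alarm steps, the Python snippet below pairs video and audio sub-packets that share the same start and end times into audio-video packets and raises an alarm when a channel quality value drops below a threshold. Because the formula for the operational quality evaluation parameter F is not reproduced in this text, the quality value is passed in as an externally computed function; all names and thresholds are illustrative.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class SubPacket:
    kind: str          # "video" or "audio"
    start_ms: int      # splitting start time
    end_ms: int        # splitting end time
    payload: bytes

QUALITY_THRESHOLD = 0.7  # placeholder for the preset parameter threshold

def pack_av_packets(video: List[SubPacket], audio: List[SubPacket]):
    """Pair video/audio sub-packets that share the same start and end time."""
    audio_by_interval = {(a.start_ms, a.end_ms): a for a in audio}
    av_packets = []
    for v in sorted(video, key=lambda p: p.start_ms):
        a = audio_by_interval.get((v.start_ms, v.end_ms))
        if a is not None:
            av_packets.append((v, a))
    return av_packets

def send_sequentially(av_packets, send, channel_quality):
    """Send AV packets in order; alarm if channel quality drops below the threshold.

    channel_quality(i) stands in for the operational quality evaluation parameter F.
    """
    for i, packet in enumerate(av_packets):
        if channel_quality(i) < QUALITY_THRESHOLD:
            print(f"communication abnormality alarm at packet {i}")
        send(packet)

# Toy usage
video = [SubPacket("video", 0, 40, b"v0"), SubPacket("video", 40, 80, b"v1")]
audio = [SubPacket("audio", 0, 40, b"a0"), SubPacket("audio", 40, 80, b"a1")]
send_sequentially(pack_av_packets(video, audio), send=lambda p: None,
                  channel_quality=lambda i: 0.9 - 0.3 * i)
```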
The technical effects of the technical scheme are as follows: the video encoder and the audio encoder are used for carrying out compression encoding on video and voice data in the virtual studio, so that the transmission quantity of the data can be obviously reduced, and the transmission efficiency of the data can be improved. The compressed and encoded video stream data packet and audio stream data packet are more suitable for transmission in a network environment with limited bandwidth, and the transmission cost is reduced. The communication connection state between the cloud server and the cloud server is monitored in real time, and the real-time performance and stability of data transmission can be ensured. When the communication state does not meet the requirement of simultaneous transmission of the video stream data packet and the audio stream data packet, the transmission strategy can be flexibly adjusted by splitting the data packet so as to adapt to different network environments and communication conditions and ensure the continuity and the integrity of data transmission. The video stream sub-data packet and the audio stream sub-data packet with the same starting time and ending time are packed into one audio/video stream data packet, so that synchronous transmission of audio/video data can be ensured, disorder or loss of the audio/video data in the transmission process is avoided, and viewing experience of a user is improved. By monitoring the operation parameters of the communication channel in real time, the operation quality evaluation parameters of the communication channel can be obtained, so that the operation condition of the communication channel can be judged. When the operation quality evaluation parameter is lower than a preset parameter threshold, communication abnormality alarm is carried out, so that the communication problem can be found and solved in time, and the stability and reliability of data transmission are ensured.
In summary, according to the technical scheme, by means of data compression coding, communication state real-time monitoring and data packet splitting, audio and video stream data packet synchronous transmission, communication channel operation quality evaluation, abnormal alarm and the like, efficient, stable and reliable audio and video data transmission is realized, and communication quality and user experience between a virtual studio and a cloud server are improved.
Specifically, monitoring communication parameters of the communication connection with the cloud server in real time and determining from the communication parameters whether the current communication state allows simultaneous data transmission of the video stream data packet and the audio stream data packet includes:
monitoring the unit time change rate of the residual capacity of a communication channel connected with the cloud server in real time;
Monitoring the change rate of the actual bandwidth of a communication channel in communication connection with the cloud server in real time;
Monitoring the unit time change rate of the communication frequency of a communication channel in communication connection with the cloud server in real time;
Generating a stability evaluation parameter corresponding to the communication channel by using the unit time change rate of the residual capacity of the communication channel, the unit time change rate of the actual bandwidth and the unit time change rate of the communication frequency, wherein the stability evaluation parameter is obtained by the following formula:
Wherein ζ represents the stability evaluation parameter; L_c, L_b and L_f respectively represent the per-unit-time change rate of the remaining capacity of the communication channel, the per-unit-time change rate of the actual bandwidth, and the per-unit-time change rate of the communication frequency;
When the stability evaluation parameter reaches or exceeds a preset stability evaluation threshold, judging that the current communication state of the communication channel meets the requirement of simultaneously transmitting data of the video stream data packet and the audio stream data packet;
And when the stability evaluation parameter is lower than the preset stability evaluation threshold, judging that the current communication state of the communication channel does not meet the requirement for simultaneously transmitting the video stream data packet and the audio stream data packet (a toy sketch of this check follows).
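The following toy Python sketch shows how such a stability check could be wired up from the three per-unit-time change rates. The actual formula for the stability evaluation parameter ζ is not reproduced in this text, so an inverse weighted sum of the fluctuation magnitudes is assumed purely for illustration, as is the threshold value.

```python
def stability_parameter(rate_capacity, rate_bandwidth, rate_frequency,
                        weights=(1.0, 1.0, 1.0)):
    """Toy stability evaluation from the three per-unit-time change rates.

    Assumption: larger fluctuations (change rates L_c, L_b, L_f) mean lower
    stability; an inverse weighted sum stands in for the unreproduced formula.
    """
    wc, wb, wf = weights
    fluctuation = wc * abs(rate_capacity) + wb * abs(rate_bandwidth) + wf * abs(rate_frequency)
    return 1.0 / (1.0 + fluctuation)

STABILITY_THRESHOLD = 0.5  # placeholder for the preset stability evaluation threshold

def can_transmit_simultaneously(rates):
    zeta = stability_parameter(*rates)
    return zeta >= STABILITY_THRESHOLD

print(can_transmit_simultaneously((0.05, 0.10, 0.02)))  # stable channel -> True
print(can_transmit_simultaneously((0.80, 0.90, 0.50)))  # fluctuating channel -> False
```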
The technical effects of the technical scheme are as follows: by monitoring in real time a plurality of key parameters of a communication channel (including a rate of change per unit time of a remaining capacity of the communication channel, a rate of change per unit time of an actual bandwidth, and a rate of change per unit time of a communication frequency) in communication connection with a cloud server, a current state of the communication channel can be evaluated accurately in real time. This helps to discover problems or potential risks in the communication channel in time, and thus to take appropriate measures to adjust and optimize. According to the stability evaluation parameters obtained by real-time monitoring, whether the current communication channel meets the data transmission requirement of simultaneously carrying out video stream data packets and audio stream data packets can be judged. When the stability evaluation parameter reaches or exceeds a preset stability evaluation threshold, the state of the communication channel is good, the simultaneous transmission of video and audio data can be supported, and the efficiency and stability of data transmission are ensured. When the stability evaluation parameter is lower than the preset stability evaluation threshold, it indicates that the state of the communication channel may be poor, and corresponding adjustment needs to be performed, such as splitting the data packet, reducing the data transmission rate, etc., so as to ensure the reliability and continuity of the data transmission.
Through the real-time monitoring and evaluation of the communication channel state, the data transmission strategy can be flexibly adjusted according to the actual situation, and the problem that a large amount of data is blindly transmitted when the communication channel state is poor, so that the waste of communication resources and the reduction of communication quality are caused is avoided. Meanwhile, the data transmission rate and the priority can be timely adjusted according to the state change of the communication channel, so that the efficient utilization of the communication resources is realized. And the communication channel state is monitored and evaluated in real time, and the data transmission strategy is adjusted according to the evaluation result, so that the transmission quality and user experience of the audio and video data can be ensured to the greatest extent. When the communication channel state is good, high-efficiency and smooth data transmission can be realized; when the communication channel state is poor, the continuity and stability of data transmission can be ensured by splitting data packets, reducing the data transmission rate and the like, and the disorder or loss of audio and video data is avoided.
In summary, according to the technical scheme, the communication channel parameters of the communication connection with the cloud server are monitored in real time, and the stability of the communication channel is evaluated according to the parameters, so that the data transmission strategy is optimized, the efficient utilization of communication resources is realized, and the data transmission quality and the user experience are improved.
Specifically, when the communication state does not meet the data transmission requirement of the video stream data packet and the audio stream data packet, splitting the video stream data packet and the audio stream data packet to form a plurality of video stream sub-data packets and audio stream sub-data packets, including:
When the communication state does not meet the data transmission requirement of the video stream data packet and the audio stream data packet, the stability evaluation parameter corresponding to the current communication channel is called;
acquiring a data segmentation coefficient by utilizing the difference value between the stability evaluation parameter and a preset stability evaluation threshold; the data segmentation coefficient is obtained through the following formula:
Wherein δ represents the data division coefficient; L_x represents a constant; ζ_0 represents the preset stability evaluation threshold; C_i represents the remaining capacity of the communication channel in the i-th unit time; C_b represents the remaining channel capacity at channel saturation; C_e represents the overall capacity of the communication channel;
Acquiring the upper limit value of the data quantity corresponding to the data packets of the video stream sub-data packet and the audio stream sub-data packet by utilizing the data segmentation coefficient; the upper limit value of the data packet corresponding to the data packet of the video stream sub-data packet and the data packet of the audio stream sub-data packet is obtained through the following formula:
Wherein C_s and C_y represent the upper limit values of the data amount of the video stream sub-data packets and the audio stream sub-data packets respectively; C_0 represents the preset reference data amount of a sub-data packet; C_x represents the remaining channel capacity of the current communication channel;
Splitting the video stream data packet and the audio stream data packet according to the corresponding data amount upper limits, obtaining a plurality of video stream sub-data packets and audio stream sub-data packets whose data amounts do not exceed the corresponding upper limit, wherein the splitting start time and end time of the video stream data packet and the audio stream data packet are the same (a simplified splitting sketch follows).
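A simplified Python sketch of the splitting step is given below. The data division coefficient and the derived upper limits C_s and C_y are not recomputed here (their formulas are not reproduced in this text); instead the limits are passed in as parameters, and the number of sub-packets is chosen so that both streams are split over identical start and end times, as the step requires.

```python
def split_av_pair(video: bytes, audio: bytes, duration_ms: int,
                  max_video_bytes: int, max_audio_bytes: int):
    """Split a video packet and an audio packet covering the same time span.

    max_video_bytes / max_audio_bytes stand in for the upper limits C_s and C_y;
    the number of sub-packets is chosen so that neither stream exceeds its limit
    and both streams share the same start/end times for later pairing.
    """
    n_parts = max(1,
                  -(-len(video) // max_video_bytes),   # ceil division
                  -(-len(audio) // max_audio_bytes))
    step_ms = duration_ms / n_parts

    def cut(data: bytes):
        size = -(-len(data) // n_parts)
        return [
            {"start_ms": round(k * step_ms),
             "end_ms": round((k + 1) * step_ms),
             "payload": data[k * size:(k + 1) * size]}
            for k in range(n_parts)
        ]

    return cut(video), cut(audio)

video_parts, audio_parts = split_av_pair(b"V" * 2500, b"A" * 900,
                                         duration_ms=1000,
                                         max_video_bytes=1000, max_audio_bytes=400)
print([(p["start_ms"], p["end_ms"]) for p in video_parts])   # same intervals...
print([(p["start_ms"], p["end_ms"]) for p in audio_parts])   # ...for both streams
```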
The technical effects of the technical scheme are as follows: when the communication state does not meet the simultaneous transmission requirement of the video stream data packet and the audio stream data packet, the technical scheme can dynamically adjust the data transmission strategy according to the stability evaluation parameter of the current communication channel. By splitting the video stream data packet and the audio stream data packet into a plurality of sub-data packets, the method can adapt to different communication channel states and ensure the continuity and stability of data transmission. The method takes the difference between the real-time state and the ideal state of a communication channel into consideration, and accurately determines the degree of data segmentation based on the relation between the residual capacity and the whole capacity of the channel. This helps to control the size of the sub-packets more accurately to accommodate the actual transmission capabilities of the communication channel.
And calculating the upper limit value of the data quantity of the video stream sub-data packet and the audio stream sub-data packet by comprehensively considering the preset reference data quantity of the sub-data packet, the channel residual capacity of the current communication channel and other factors. This approach ensures that the size of the sub-packets does not exceed the transmission capacity of the communication channel nor is it too small to affect the efficiency of the data transmission. When the video stream data packet and the audio stream data packet are split, the same starting time and ending time of the splitting are ensured, so that the synchronism of the audio and video data is maintained, and the problem of asynchronous audio and video caused by data splitting is avoided. Meanwhile, through monitoring the communication state in real time and adjusting the data transmission strategy in time when the requirements are not met, the technical scheme can reduce the possibility of interruption or delay of data transmission, thereby improving the watching experience of users. Meanwhile, the synchronism of the audio and video data is maintained, so that the satisfaction degree of the user is improved.
In summary, according to the technical scheme, through technical means of dynamically adjusting the data transmission strategy, accurately calculating the data division coefficient, setting a reasonable upper limit value of the data volume of the sub-data packet, ensuring synchronous transmission of audio and video data and the like, more efficient and stable data transmission is realized, and user experience is improved.
The foregoing is only a preferred embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any equivalent substitution or modification of the technical solution and its inventive concept made by a person skilled in the art within the scope disclosed by the present invention shall fall within the protection scope of the present invention.

Claims (10)

1. An interactive performance method based on real-time virtual-real fusion and real-time virtual-real interaction, characterized by comprising the following steps:
Step one: initializing a system: setting required scene equipment in an actual scene, and establishing data connection between the scene equipment and an online platform and cloud technology;
Step two: capturing a three-dimensional character or article: starting a camera, acquiring a real scene acquired by the camera, capturing three-dimensional characters or objects in the real scene, and transmitting data acquired by the camera to a processor based on a wireless network;
Step three: green curtain matting processing: the processor separates a green pixel area from a non-green pixel area in the data acquired by the camera based on an HSV color space method, and extracts the captured three-dimensional character or object from the real scene to obtain an image only containing a target object;
Step four: and (3) virtual-real fusion: the extracted image only containing the target object is embedded into the constructed virtual scene, and the shadow of the three-dimensional character or article is simulated and calculated through a shader algorithm, so that the real fusion with the virtual scene is realized;
step five: interactive performance: in the virtual studio, interacting with three-dimensional characters or objects in the virtual scene based on a gesture or voice recognition mode, and remotely playing the set animation of the objects in the virtual scene through a remote controller;
step six: on-line transmission: and transmitting the images and the sounds of the virtual studio to the intelligent equipment terminal where the student is located in real time based on an online platform and a cloud technology.
2. The interactive performance method based on real-time virtual-real fusion and real-time virtual-real interaction as claimed in claim 1, wherein capturing the three-dimensional character or object in step two specifically comprises the following steps:
starting a camera, ensuring the hardware connection and initialization of the camera, and acquiring video frames of a real scene in real time after the camera is started;
Preprocessing the acquired image, and estimating the depth information of each pixel point in the image based on a depth estimation algorithm;
Collecting brightness data of each pixel in the captured video data, processing the brightness data of each pixel with a preset nonlinear activation function to obtain a processing result, and compensating the video data with a brightness compensation function to obtain compensated video data;
Splitting the compensated video data into frames to obtain a processing result, and extracting feature points of the three-dimensional character or object from each frame image in the processing result to obtain a feature point extraction result;
comparing the feature point extraction results of each frame of image, and selecting the feature points of the target person appearing in each frame of image;
Extracting a target unit data segment in the compensated video data according to the target character feature points, and generating a position attribute of a three-dimensional character or object based on the sequence of the target unit data segment in a time sequence carried by the video data;
identifying a three-dimensional person or object in the image, determining the position of the three-dimensional person or object in the video frame, and determining the tracking result of the target person or object;
and integrating the depth information and the tracking result of the target person or object to generate a capturing result of the three-dimensional person or object.
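A minimal, non-authoritative sketch of the brightness compensation and feature-point extraction described above follows. The claim does not specify the nonlinear activation or brightness compensation functions, so a gamma curve on the HSV brightness channel stands in for them, and ORB is used as one possible feature-point detector.

```python
import cv2
import numpy as np

def compensate_brightness(frame_bgr, gamma=0.8):
    """Brightness compensation sketch: a gamma curve on the HSV brightness (V)
    channel stands in for the unspecified nonlinear activation/compensation functions."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV).astype(np.float32)
    hsv[..., 2] = 255.0 * (hsv[..., 2] / 255.0) ** gamma
    return cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)

def extract_feature_points(frame_bgr, max_points=500):
    """Per-frame feature-point extraction; ORB is one possible detector."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    orb = cv2.ORB_create(nfeatures=max_points)
    keypoints, descriptors = orb.detectAndCompute(gray, None)
    return keypoints, descriptors
```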
3. The interactive presentation method based on real-time virtual-real fusion and real-time virtual-real interaction as claimed in claim 2, wherein the green screen matting processing in step three specifically comprises the following steps:
reading an image of the green screen background and an image containing the foreground object, and converting the image from the RGB color space to the HSV color space;
dividing the input image into a green area and a non-green area according to the HSV value range of green pixels in the HSV color space, generating a corresponding mask, marking the green area white and the non-green area black;
performing a bitwise AND operation on the image and the mask, keying out the green screen background to obtain an image containing only the foreground object, and combining the keyed image with a new background image to obtain an image containing only the target object;
wherein the bitwise AND operation on the image and the mask is performed through a bitwise-AND function, keying out the green screen.
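As a non-authoritative sketch of this matting step, assuming OpenCV: the HSV thresholds below are placeholders that would be tuned to the actual green screen, and the new background image is assumed to have the same resolution as the camera frame.

```python
import cv2
import numpy as np

def green_screen_matte(frame_bgr, background_bgr,
                       lower=(35, 43, 46), upper=(77, 255, 255)):
    """Green screen matting: convert to HSV, threshold the green range into a
    mask (green -> white, non-green -> black), bitwise-AND to keep the
    foreground, then composite onto the new background."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    green_mask = cv2.inRange(hsv, np.array(lower), np.array(upper))
    fg_mask = cv2.bitwise_not(green_mask)                  # keep the non-green target
    foreground = cv2.bitwise_and(frame_bgr, frame_bgr, mask=fg_mask)
    background = cv2.bitwise_and(background_bgr, background_bgr, mask=green_mask)
    return cv2.add(foreground, background)                 # target object on the new scene
```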
4. The interactive presentation method based on real-time virtual-real fusion and real-time virtual-real interaction as claimed in claim 3, wherein simulating and calculating the shadow of the three-dimensional character or object through the shader algorithm in step four specifically comprises the following steps:
extracting the rendering type and the culling attribute in the shader, determining the rendering mode and appearance material of the three-dimensional character or object based on the rendering type, and determining the range of objects to be rendered based on the setting of the culling attribute, so as to obtain the shadow of the three-dimensional character or object unaffected by illumination;
creating tags based on the determined rendering type and the set culling attribute, determining, based on the tags, the rendering mode of the three-dimensional character or object when rendering different types of light sources, and recording the reference information in a shadow map;
wherein, when the tags are used, the shadow casting mode of the mesh renderer component corresponding to the three-dimensional character or object is set to enable shadow casting or to cast shadows only.
5. The interactive presentation method based on real-time virtual-real fusion and real-time virtual-real interaction as claimed in claim 4, wherein interacting with the three-dimensional character or object in the virtual scene in step five specifically comprises:
capturing real-world gestures with the camera to generate gesture data, extracting key features from the gesture data, and training on the extracted features to generate a gesture recognition model;
inputting the gesture data of the three-dimensional character or object captured in real time by the camera into the gesture recognition model for recognition, and judging whether interaction data have occurred;
matching the interaction data one by one against preset interaction instructions to obtain a matching result;
and transmitting the matched interaction instruction to the simulated scene through a communication link, and controlling the three-dimensional character or object in the simulated scene to respond.
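The gesture recognition model is not tied to any particular algorithm in the claim; as one hedged illustration, the sketch below trains a simple k-nearest-neighbours classifier on pre-extracted gesture feature vectors and matches the predicted label against a hypothetical instruction table.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Hypothetical mapping from recognized gesture labels to preset interaction instructions.
INSTRUCTIONS = {"wave": "play_intro_animation", "point": "highlight_object"}

def train_gesture_model(feature_vectors, labels):
    """Train a simple classifier on pre-extracted gesture key features."""
    model = KNeighborsClassifier(n_neighbors=3)
    model.fit(np.asarray(feature_vectors), labels)
    return model

def recognize_and_match(model, live_features):
    """Recognize the live gesture and match it against the instruction table;
    None means no interaction instruction matched."""
    label = model.predict(np.asarray(live_features).reshape(1, -1))[0]
    return INSTRUCTIONS.get(label)
```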
6. The interactive presentation method based on real-time virtual-real fusion and real-time virtual-real interaction as claimed in claim 5, wherein the interactive performance in step five further comprises collecting the voice of the three-dimensional character or object, specifically comprising the following steps:
collecting voice data of a three-dimensional character through a microphone, preprocessing the voice data, and extracting key features from the preprocessed voice data;
training on the extracted key features based on a speech recognition algorithm to generate a speech recognition model, and playing the animation of objects in the virtual scene based on the speech recognition model's recognition result for the voice data.
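The speech recognition algorithm is likewise unspecified; as a minimal sketch, assuming the librosa and scikit-learn packages are available, averaged MFCCs could serve as the key features and a support-vector classifier as the recognizer (a production system would use a full ASR model instead).

```python
import numpy as np
import librosa                       # assumed available for MFCC extraction
from sklearn.svm import SVC

def mfcc_features(wav_path, n_mfcc=13):
    """Preprocess a recorded utterance and extract averaged MFCCs as key features."""
    y, sr = librosa.load(wav_path, sr=16000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)
    return mfcc.mean(axis=1)         # fixed-length feature vector per clip

def train_voice_model(wav_paths, labels):
    """Train a simple command recognizer on the extracted key features."""
    X = np.stack([mfcc_features(p) for p in wav_paths])
    model = SVC()
    model.fit(X, labels)
    return model
```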
7. The interactive presentation method based on real-time virtual-real fusion and real-time virtual-real interaction as claimed in claim 6, wherein the online transmission in step six specifically comprises the following steps:
compression-encoding the video data in the virtual studio with a video encoder, compression-encoding the voice data with an audio encoder, and transmitting the encoded video stream and audio stream to the cloud server based on a real-time transport protocol;
the cloud server receives the video stream and audio stream, transcodes, stores and distributes them, and establishes a data interaction channel with the online platform;
the intelligent device accesses the online platform of the virtual studio, receives the video stream and audio stream from the cloud server through the data interaction channel, and decodes them to restore the original images and sound;
and the decoded images and sound are displayed on the student's intelligent device, and the student interacts with the virtual studio through the intelligent device.
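One common way to realize the encoding and real-time push described here is to drive FFmpeg from the studio host; the sketch below is only an example, and the source file and ingest URL are placeholders.

```python
import subprocess

def push_stream(source="studio_output.mp4",
                endpoint="rtmp://cloud.example.com/live/classroom"):
    """Compress the studio video (H.264) and audio (AAC) and push them to a
    cloud ingest point; the source file and endpoint URL are placeholders."""
    cmd = [
        "ffmpeg", "-re", "-i", source,              # read the studio output in real time
        "-c:v", "libx264", "-preset", "veryfast",   # video compression encoding
        "-c:a", "aac", "-b:a", "128k",              # audio compression encoding
        "-f", "flv", endpoint,                      # push to the cloud server
    ]
    return subprocess.run(cmd, check=True)
```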
8. The interactive presentation method based on real-time virtual-real fusion and real-time virtual-real interaction as claimed in claim 7, wherein compression-encoding the video data in the virtual studio with the video encoder, compression-encoding the voice data with the audio encoder, and transmitting the encoded video stream and audio stream to the cloud server based on the real-time transport protocol comprises the following steps:
performing compression encoding on the video data in the virtual studio with the video encoder to obtain video stream data packets corresponding to the video stream;
performing compression encoding on the voice data in the virtual studio with the audio encoder to obtain audio stream data packets corresponding to the audio stream;
monitoring, in real time, the communication parameters of the communication connection with the cloud server, and judging, according to the communication parameters, whether the current communication state supports simultaneous transmission of the video stream data packets and the audio stream data packets;
when the communication state does not meet the requirement for simultaneous transmission of the video stream data packets and the audio stream data packets, splitting the video stream data packets and the audio stream data packets to form a plurality of video stream sub-data packets and audio stream sub-data packets, wherein the video stream data packets and the audio stream data packets are split at the same start times and end times;
packaging video stream sub-data packets and audio stream sub-data packets having the same start time and end time into combined audio-video stream data packets;
sending the audio-video stream data packets to the cloud server in sequence;
monitoring, in real time, the operating parameters of the communication channel to the cloud server during transmission of the audio-video stream data packets, and obtaining the operation quality evaluation parameter of the communication channel from the operating parameters; the operation quality evaluation parameter of the communication channel is obtained through a formula in which:
F denotes the operation quality evaluation parameter of the communication channel; N denotes the number of unit times contained in the operating duration of the communication channel, one unit time being 1 s; P_i denotes the bit error rate of data transmission in the i-th unit time; S_i denotes the signal-to-noise ratio of the communication channel in the i-th unit time; f denotes an adjustment coefficient; S_0 denotes the theoretical signal-to-noise ratio of the communication channel; f_i denotes the actual communication frequency of the communication channel in the i-th unit time; f_0 denotes the reference communication frequency of the communication channel;
and when the operation quality evaluation parameter of the communication channel falls below a preset parameter threshold, raising a communication abnormality alarm.
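The exact formula for F is not reproduced in this text, so the sketch below only illustrates the monitoring-and-alarm flow: it consumes per-second samples of error rate, signal-to-noise ratio and communication frequency, combines them with a placeholder score built from the same quantities (not the patented formula), and raises an alarm when the score drops below a threshold.

```python
def channel_quality_alarm(samples, snr_ref, freq_ref, f_adj=1.0, threshold=0.6):
    """Illustrative monitor only. `samples` is a list of per-second dicts with
    keys 'error_rate', 'snr' and 'freq'. The score is a placeholder built from
    the same quantities as the patented parameter F, not the formula itself."""
    if not samples:
        return False
    score = sum(
        f_adj * (1.0 - s["error_rate"]) * 0.5 * (s["snr"] / snr_ref + s["freq"] / freq_ref)
        for s in samples
    ) / len(samples)
    return score < threshold          # True -> raise a communication abnormality alarm
```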
9. The interactive presentation method based on real-time virtual-real fusion and real-time virtual-real interaction as claimed in claim 8, wherein monitoring, in real time, the communication parameters of the communication connection with the cloud server and judging, according to the communication parameters, whether the current communication state supports simultaneous transmission of the video stream data packets and the audio stream data packets comprises the following steps:
monitoring in real time the per-unit-time change rate of the remaining capacity of the communication channel connected to the cloud server;
monitoring in real time the per-unit-time change rate of the actual bandwidth of the communication channel connected to the cloud server;
monitoring in real time the per-unit-time change rate of the communication frequency of the communication channel connected to the cloud server;
generating the stability evaluation parameter of the communication channel from the per-unit-time change rate of the remaining channel capacity, the per-unit-time change rate of the actual bandwidth and the per-unit-time change rate of the communication frequency; the stability evaluation parameter is obtained through a formula in which:
ζ denotes the stability evaluation parameter; L_c, L_b and L_f denote, respectively, the per-unit-time change rate of the remaining channel capacity, of the actual bandwidth and of the communication frequency;
when the stability evaluation parameter reaches or exceeds a preset stability evaluation threshold, judging that the current communication state of the communication channel meets the requirement for simultaneous transmission of the video stream data packets and the audio stream data packets;
and when the stability evaluation parameter is below the preset stability evaluation threshold, judging that the current communication state of the communication channel does not meet the requirement for simultaneous transmission of the video stream data packets and the audio stream data packets.
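Since the patented combination of L_c, L_b and L_f into ζ is not reproduced here, the following sketch only illustrates the decision flow: it computes the three per-second change rates from sampled channel statistics and substitutes a simple placeholder score for ζ.

```python
def unit_change_rates(history):
    """Per-second change rates of (remaining capacity, actual bandwidth,
    communication frequency); `history` holds one (capacity, bandwidth, frequency)
    tuple per second."""
    rates = []
    for (c0, b0, f0), (c1, b1, f1) in zip(history, history[1:]):
        rates.append((abs(c1 - c0) / max(c0, 1e-9),
                      abs(b1 - b0) / max(b0, 1e-9),
                      abs(f1 - f0) / max(f0, 1e-9)))
    return rates

def can_transmit_simultaneously(history, threshold=0.8):
    """Decision sketch: a placeholder stability score (1 minus the mean change
    rate) stands in for the patented parameter ζ."""
    rates = unit_change_rates(history)
    if not rates:
        return False
    mean_rate = sum(sum(r) / 3.0 for r in rates) / len(rates)
    zeta = 1.0 - mean_rate
    return zeta >= threshold          # meets the simultaneous-transmission requirement
```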
10. The interactive presentation method based on real-time virtual-real fusion and real-time virtual-real interaction as claimed in claim 9, wherein, when the communication state does not meet the requirement for simultaneous transmission of the video stream data packets and the audio stream data packets, splitting the video stream data packets and the audio stream data packets to form a plurality of video stream sub-data packets and audio stream sub-data packets comprises the following steps:
when the communication state does not meet the requirement for simultaneous transmission of the video stream data packets and the audio stream data packets, retrieving the stability evaluation parameter of the current communication channel;
obtaining the data division coefficient from the difference between the stability evaluation parameter and the preset stability evaluation threshold; the data division coefficient is obtained through a formula in which:
δ denotes the data division coefficient; L_x denotes a constant; ζ_0 denotes the preset stability evaluation threshold; C_i denotes the remaining channel capacity of the communication channel in the i-th unit time; C_b denotes the remaining channel capacity when the channel is saturated; C_e denotes the overall capacity of the communication channel;
obtaining, from the data division coefficient, the upper limit values of the data volume of the video stream sub-data packets and the audio stream sub-data packets; the upper limit values are obtained through a formula in which:
C_s and C_y denote the upper limit values of the data volume of the video stream sub-data packets and the audio stream sub-data packets, respectively; C_0 denotes the preset reference data volume of a sub-data packet; C_x denotes the remaining channel capacity of the current communication channel;
and splitting the video stream data packets and the audio stream data packets according to the corresponding upper limit values of data volume to obtain a plurality of video stream sub-data packets and audio stream sub-data packets whose data volumes do not exceed the corresponding upper limit values, wherein the video stream data packets and the audio stream data packets are split at the same start times and end times.
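To make the splitting-and-pairing step concrete, the sketch below splits the encoded video and audio payloads into sub-packets no larger than the computed upper limits and bundles sub-packets by sequence index; in a real system the split boundaries would be chosen at time-aligned positions so that paired sub-packets share the same start and end times, and the caps would be derived from the data division coefficient rather than passed in directly.

```python
def split_and_pair(video_bytes, audio_bytes, video_cap, audio_cap):
    """Split the encoded video and audio payloads into sub-packets no larger
    than the given upper limits (in bytes) and bundle them by sequence index.
    Deriving the caps from the data division coefficient is outside this sketch."""
    def split(data, cap):
        return [data[i:i + cap] for i in range(0, len(data), cap)]

    video_parts = split(video_bytes, video_cap)
    audio_parts = split(audio_bytes, audio_cap)
    bundles = []
    for i in range(max(len(video_parts), len(audio_parts))):
        bundles.append({
            "seq": i,                                             # shared split position
            "video": video_parts[i] if i < len(video_parts) else b"",
            "audio": audio_parts[i] if i < len(audio_parts) else b"",
        })
    return bundles                                                # sent to the cloud server in order
```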
CN202410586200.3A 2024-05-13 2024-05-13 Real-time virtual-real fusion and real-time virtual-real interactive performance method Pending CN118154820A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410586200.3A CN118154820A (en) 2024-05-13 2024-05-13 Real-time virtual-real fusion and real-time virtual-real interactive performance method

Publications (1)

Publication Number Publication Date
CN118154820A true CN118154820A (en) 2024-06-07

Family

ID=91299335

Country Status (1)

Country Link
CN (1) CN118154820A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105306862A (en) * 2015-11-17 2016-02-03 广州市英途信息技术有限公司 Scenario video recording system and method based on 3D virtual synthesis technology and scenario training learning method
CN107784663A (en) * 2017-11-14 2018-03-09 哈尔滨工业大学深圳研究生院 Correlation filtering tracking and device based on depth information
CN109993059A (en) * 2019-03-01 2019-07-09 清华大学 Binocular vision and object recognition technique on intelligent electronic device based on single camera
CN110111364A (en) * 2019-04-30 2019-08-09 腾讯科技(深圳)有限公司 Method for testing motion, device, electronic equipment and storage medium
CN113741694A (en) * 2021-09-02 2021-12-03 浙江财经大学 Intelligent interaction gesture control method and device
CN115103138A (en) * 2022-07-11 2022-09-23 北京梦想绽放科技有限公司 Method and system for generating virtual-real fusion image based on space-time consistency
CN115393238A (en) * 2022-08-23 2022-11-25 广州呗呗科技有限公司 Image synthesis system and method based on virtual reality technology
CN117691645A (en) * 2023-12-08 2024-03-12 江苏海德森能源有限公司 Energy storage system for intelligent micro-grid
CN117939265A (en) * 2024-01-23 2024-04-26 中南大学 High-realism augmented reality studio implementation method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination