CN116033189B - Live broadcast interactive video partition intelligent control method and system based on cloud edge cooperation


Info

Publication number
CN116033189B
CN116033189B CN202310330567.4A CN202310330567A CN116033189B CN 116033189 B CN116033189 B CN 116033189B CN 202310330567 A CN202310330567 A CN 202310330567A CN 116033189 B CN116033189 B CN 116033189B
Authority
CN
China
Prior art keywords
video
interaction
live broadcast
edge
cloud
Prior art date
Legal status
Active
Application number
CN202310330567.4A
Other languages
Chinese (zh)
Other versions
CN116033189A (en)
Inventor
郑伟平
李海平
曾宪国
Current Assignee
Aspire Technologies Shenzhen Ltd
Original Assignee
Aspire Technologies Shenzhen Ltd
Priority date
Filing date
Publication date
Application filed by Aspire Technologies Shenzhen Ltd
Priority to CN202310330567.4A
Publication of CN116033189A
Application granted
Publication of CN116033189B


Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a live broadcast interactive video partition intelligent control method and system based on cloud-edge coordination. Through cloud-edge coordination, ROI rendering calculation of the video picture is performed at the edge side near the user and at the cloud end according to different scene requirements: the corresponding interactive video picture is generated first, and the live interactive video is then intelligently partitioned at the edge before being transmitted back to the user terminal, achieving a novel mode of interactive experience. Through refined sub-region compression, the visual experience to the human eye is almost unchanged, yet the bitrate can be reduced by more than 50%, lowering the bandwidth requirement, improving transmission efficiency, and reducing delay, so that the user's experience of watching the live broadcast is better. The high delay and stuttering caused by transmitting live pictures at high-definition and ultra-high-definition image quality are reduced, and a complete near-local experience is realized.

Description

Live broadcast interactive video partition intelligent control method and system based on cloud edge cooperation
Technical Field
The invention belongs to the technical field of network information communication for the live interaction industry, and particularly relates to a live broadcast interactive video partition intelligent control method and system based on cloud-edge collaboration.
Background
Live interaction has been a booming industry in recent years. Special events in recent years have accelerated the development of the corresponding video capabilities to a certain extent, and large numbers of people who were strangers to live interactive video, or originally only low-frequency users, have overcome fear, inertia, and habit gaps to become adopters and beneficiaries of live interactive video, greatly increasing both the number of users and the frequency of use.
As the application range and frequency of live interactive video grow, so do the requirements on the fluency and definition of its audio and video: a higher video frame rate yields smoother video; a higher video PSNR (or SSIM) score yields higher video definition; and a higher audio MOS score yields better audio fluency and definition. These improvements, however, place higher demands on network bandwidth and cost. As technology develops, almost all data needs to be connected to the cloud; if the data were transmitted back to a cloud server in the traditional way for video processing and live streaming, enormous network bandwidth pressure and cost problems would arise.
With the appearance of high-definition and ultra-high-definition live video sources, the excessive bandwidth occupied by live streaming causes large video delay and stuttering, limiting growth in the number of users watching live video.
Content resource data of a live video service of the China Mobile government and enterprise department was sampled, and the measured figures are as follows:
[Table omitted in the source: measured data for the sampled live video content.]
From practitioners' experience, smoothly playing standard-definition video requires a network speed of 2 Mbps (250 KB/s), and smoothly playing high-definition video requires 4 Mbps (500 KB/s). Theoretically, supporting 8000 users simultaneously streaming standard-definition video occupies 16 Gbps of bandwidth; allowing for bandwidth redundancy, the egress bandwidth would be designed at about 20 G, and the bandwidth required to later provide high-definition video services is higher still. However, the current firewall equipment can support at most 10 G of bandwidth, and the long replacement cycle of a firewall would affect service operation. Beyond these engineering difficulties, from a cost perspective bandwidth is an expensive resource; video playing has its own busy and idle patterns, and directly expanding the egress bandwidth on demand would waste resource cost.
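The bandwidth figures above can be checked with a short back-of-the-envelope calculation; the 20% redundancy margin used here is an assumption chosen to reproduce the roughly 20 G design figure, not a value stated in the text.

```python
# Back-of-the-envelope check of the bandwidth figures quoted above.
# The redundancy margin is an illustrative assumption.

def egress_bandwidth_gbps(viewers: int, per_stream_mbps: float,
                          redundancy: float = 0.2) -> float:
    """Total egress bandwidth in Gbps for `viewers` concurrent streams,
    padded by a `redundancy` safety margin."""
    raw_gbps = viewers * per_stream_mbps / 1000.0
    return raw_gbps * (1.0 + redundancy)

# 8000 concurrent standard-definition viewers at 2 Mbps each:
raw = 8000 * 2 / 1000                      # 16 Gbps, matching the text
padded = egress_bandwidth_gbps(8000, 2.0)  # ~19.2 Gbps -> ~20 G design figure
print(raw, padded)
```

The same function shows why high definition roughly doubles the requirement: `egress_bandwidth_gbps(8000, 4.0)` is about 38.4 Gbps.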
Disclosure of Invention
In order to solve the problems in the prior art, the invention aims to provide a live broadcast interactive video partition intelligent control method and system based on cloud-edge coordination, which perform ROI rendering calculation of the video picture at the edge side near the user and at the cloud end according to different scene requirements, reducing the high delay and stuttering caused by transmitting live pictures at high-definition and ultra-high-definition image quality and realizing a complete near-local experience.
The technical scheme adopted by the invention is as follows:
a live broadcast interactive video partition intelligent control system based on cloud edge coordination comprises a cloud server, an edge server and a plurality of live broadcast interactive clients;
each live interaction client is used for interactive live broadcast and can send interaction instructions to the cloud server;
the cloud server is used for collecting an interaction instruction input by the live broadcast interaction user terminal, executing interaction logic according to the interaction instruction, performing cloud rendering on the interaction live broadcast video picture, responding in real time to generate a corresponding interaction video picture, forming a video stream through first encoding, and transmitting the video stream to the edge server;
the edge server is used for providing cloud service and IT environment service on the network edge side, performing ROI rendering calculation processing on video streams, dividing the size of a user region of interest, performing secondary intelligent video coding, compressing to form audio and video stream data, and selecting and transmitting the audio and video stream data to different live broadcast interactive user ends according to the sizes of different user regions of interest;
the edge end server comprises a plurality of edge computing nodes, and each edge computing node is used for performing ROI rendering computing processing on the audio and video stream;
and the live broadcast interactive user terminals are used for receiving the audio and video stream data and decoding and displaying the audio and video stream data.
Further, one of the live interaction clients is a live interaction video anchor client.
Further, the ROI rendering calculation process of the edge server includes the following:
by identifying the video type and the picture content, the live interactive video is intelligently partitioned, the pixel data which does not affect the picture quality is removed, the optimal parameters are intelligently selected according to the human eye vision model, and the video code rate is dynamically reduced.
Further, the edge server comprises a graphics rendering server, an instruction processing server, and an audio and video processing server;
the image rendering server is used for processing the image;
the instruction processing server is used for analyzing the interaction instruction;
the audio and video processing server is used for encoding and decoding the audio and video.
Further, the ROI picture partition intelligent control algorithm adopts region-of-interest-based video coding technology: the quantization parameter value is reduced for the region of interest in the image, allocating more bitrate to improve its picture quality, while the quantization parameter value is increased for the regions of no interest, reducing their allocated bitrate, so that the video bitrate is reduced without losing the overall quality of the image;
the logic of ROI video coding bitrate allocation is as follows: before video coding, visual perception analysis is performed on the input video scene to determine the region of interest; during encoding, the coding parameters are adjusted so that the bitrate allocated to the region of interest is increased, giving it better visual quality, while the bitrate allocated to the other regions is correspondingly reduced.
Further, the intelligent control algorithm for the ROI picture partition comprises the following contents: original image quality is reserved for the live interaction human eye attention area, image quality is enhanced, and common compression transcoding is performed for the non-human eye attention area.
Further, the edge server analyzes the video content through an artificial intelligence technology, distinguishes and identifies a human eye region of interest and a non-human eye region of interest, and uses the ROI to delineate the target of the human eye region of interest.
The invention also relates to a live broadcast interactive video partition intelligent control method based on cloud edge coordination, which uses the live broadcast interactive video partition intelligent control system based on cloud edge coordination and comprises the following operation steps:
S01, setting a plurality of edge computing nodes, and connecting a plurality of live interaction clients through the plurality of edge computing nodes;
S02, a plurality of live interaction clients interact online with the cloud server through the network, and the cloud server renders the interactive live video picture from the interaction logic operation results of the online interaction;
S03, one or more of the live interaction clients send interaction instructions to the cloud server;
S04, the cloud server receives the interaction instructions and responds in real time to generate the corresponding interactive live video picture;
S05, the cloud server encodes the interactive live video picture for the first time to form a video stream;
S06, the cloud server transmits the video stream to the edge server;
S07, the edge server receives the video stream and performs ROI rendering calculation through the ROI picture partition intelligent control algorithm;
S08, the edge server performs machine learning on the current video stream to obtain a decision result;
S09, the edge server divides the size of the user's region of interest according to the decision result;
S10, the edge server performs the second intelligent video encoding and compresses the result to form the audio and video stream data;
S11, the edge server returns the audio and video stream data to each live interactive video client;
and S12, each live interactive video client decodes and displays the audio and video stream data.
Still further, step S07 includes the following operations:
S071, defining the ROI region of interest for the video stream formed by the moving live video pictures;
S0711, the edge server analyzes the video content through artificial intelligence technology, distinguishes and identifies the human-eye attention region and the non-attention region, and uses the ROI to delineate the targets of the human-eye attention region;
S0712, the original image quality is retained and enhanced for the live interaction human-eye attention region, while ordinary compression transcoding is performed for the non-attention region;
S072, cloud-edge cooperation: cloud-side and edge-side management redefine the boundaries and modes of system construction;
S073, machine learning training is performed, and the ROI of the live picture is recorded.
Still further, in step S072, intelligent video encoding (IVE) is performed through the plurality of edge computing nodes; that is, the video is given the second intelligent video encoding according to the ROI requirements.
The beneficial effects of the invention are as follows:
according to the cloud edge collaboration based live broadcast interactive video partition intelligent control method and system, ROI rendering calculation of video pictures is respectively carried out on the edge side and the cloud end of a near user according to different scene requirements, corresponding interactive video pictures are generated firstly, then the live broadcast interactive video is continuously intelligently partitioned at the edge end and then transmitted back to a user terminal, and therefore a novel mode of interactive experience is achieved; through the compression of the refined sub-region, the human eye visual experience is almost unchanged, but the code rate can be reduced by 50% +, so that the requirement on the bandwidth is reduced, the transmission efficiency is improved, the delay is reduced, and the user's experience of watching live broadcast is better. High delay and blocking caused by transmitting live pictures with high definition and ultra-high definition image quality are reduced, and complete local-like experience is realized.
Drawings
FIG. 1 is a schematic diagram of the architecture of the live interactive video partition intelligent control system based on cloud-edge collaboration;
FIG. 2 is a schematic diagram of the live interactive video partition intelligent control method based on cloud-edge coordination;
FIG. 3 is a schematic flow chart of the live interactive video partition intelligent control method based on cloud-edge coordination;
FIG. 4 is a flowchart of the sequential, step-by-step logical processing of the edge server in the ROI picture partition intelligent control algorithm of the live interactive video partition intelligent control method based on cloud-edge cooperation;
FIG. 5 is a schematic diagram of a practical application of the ROI picture partition intelligent control algorithm, described by taking a cloud game as an example of a live interactive video capability application in the live interactive video partition intelligent control method based on cloud-edge collaboration.
Detailed Description
The following description of the technical solutions in the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
As shown in fig. 1 to 5, in order to solve the common problems in the prior art, the invention provides a live broadcast interactive video partition intelligent control method and system based on cloud edge collaboration, and the whole planning scheme is as follows:
firstly, providing a live broadcast interactive video partition intelligent control system based on cloud edge coordination, which is formed by combining a cloud server, an edge server and a plurality of live broadcast interactive clients; the cloud server adopts a GPU server cluster, and the GPU is a graphic processor and has real-time high-speed parallel computing and floating point computing capabilities. The cloud server breaks through the resource limit of a single machine, and more machines can jointly complete three tasks of graphic rendering service, instruction processing service and audio/video processing service.
Each live broadcast interaction user side is used for interaction live broadcast and can send interaction instructions to the cloud server;
an application server can be respectively arranged at each live broadcast interaction user terminal, and live broadcast application software is operated through the application server; the application server also adopts a GPU server cluster, namely a GPU (graphics processing unit) with real-time high-speed parallel computing and floating point computing capabilities. The application server breaks through the resource limit of a single machine, so that more machines can jointly complete three tasks of graphic rendering service, instruction processing service and audio/video processing service.
The cloud server is used for collecting the interaction instructions input by the live interaction clients, and users can feed back operations through various live interaction clients: for example, a user accessing through a computer generates mouse and keyboard events, while a user accessing through a mobile phone generates touch-screen events. The instruction feedback for all events is carried out through the real-time communication technology of the WebRTC standard protocol. After processing, the cloud server genuinely responds to the operation, and the picture changes correspondingly. For example, live video client software on the user side initiates an instruction; the instruction is sent to the cloud server and returned after background content review (e.g., screening for prohibited content), and the content can then be seen on the live video picture in the client software.
After collecting the interaction instructions input by the live interaction clients, the cloud server executes the interaction logic according to the instructions, performs cloud rendering of the interactive live video picture, and handles the various audio and video tasks, including the graphics rendering service, the instruction processing service, and the audio and video processing service. With the support of cloud operation capabilities for games and software, large games and software can be operated in a browser or an APP client. Through cloud rendering, the cloud server runs software and games in the cloud, supports access through the web and mobile phones, and lets the user achieve an operating experience close to local delay and image quality across any platform. Cloud rendering is a service based on underlying content, resources, and scheduling: the middle-layer PaaS capabilities of cloud gaming, cloud mobile gaming, and cloud applications are constructed, and finished products are finally launched in the form of SaaS offerings or solutions. Based on the real-time communication technology of the WebRTC (Web Real-Time Communications) standard protocol, point-to-point (peer-to-peer) connections between browsers are established to transmit video streams and/or audio streams or other arbitrary data.
The cloud server responds in real time to generate a corresponding interactive video picture, and the corresponding interactive video picture is transmitted to the edge server after being formed into a video stream through first coding.
The goal of cloud rendering is to provide an experience close to that of the native local platform, involving two key indicators: latency and image quality. The video picture is encoded before transmission (the first encoding) in order to reduce delay; the first encoding uses the common H.264 and H.265 formats.
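As a hedged illustration of this first encoding step, the following sketch builds an ffmpeg command line for H.264/H.265 encoding; the file names, bitrate, preset, and container are illustrative assumptions, not details taken from the patent.

```python
# Sketch of the "first encoding" step: wrapping rendered frames into an
# H.264 or H.265 stream with ffmpeg. All parameter values are illustrative.

def first_encode_cmd(src: str, dst: str, codec: str = "h264",
                     bitrate: str = "4M") -> list[str]:
    codecs = {"h264": "libx264", "h265": "libx265"}
    return [
        "ffmpeg", "-i", src,
        "-c:v", codecs[codec],    # H.264/H.265, as named in the text
        "-b:v", bitrate,          # target bitrate for the live stream
        "-preset", "veryfast",    # speed/size tradeoff leaning toward low latency
        "-f", "mpegts", dst,      # transport stream toward the edge server
    ]

cmd = first_encode_cmd("rendered.yuv", "out.ts", codec="h265")
```

In a deployment, this command would be executed (e.g., via `subprocess.run(cmd)`) as part of the cloud server's encoding stage; the sketch only constructs the argument list.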
The edge server is used for providing cloud services and IT environment services at the network edge side, performing ROI rendering calculation on the video stream, dividing the size of the user's region of interest, performing the second intelligent video encoding, compressing the result into audio and video stream data, and selectively transmitting it to different live interaction clients according to the sizes of their different regions of interest; the cloud services and IT environment services may directly employ conventional prior-art techniques.
The edge server deploys a plurality of edge computing nodes nationwide, each of which is used for performing ROI rendering calculation on the audio and video stream. Each edge computing node acts as a node control switch: when turned on it processes, and when turned off it does not.
And the live broadcast interactive user terminals are used for receiving the audio and video stream data and performing decoding display.
Further, one of the live interaction clients is the live interaction video anchor end; the anchor end is the source that initiates the live broadcast, i.e., the master controller, while the live interaction clients are the other clients that access the broadcast and respond interactively to the anchor end.
Further, the ROI rendering calculation process of the edge server includes the following:
by identifying the video type and the picture content, the live interactive video is intelligently partitioned, the pixel data which does not affect the picture quality is removed, the optimal parameters are intelligently selected according to the human eye vision model, and the video code rate is dynamically reduced.
Further, the edge server comprises a graphics rendering server, an instruction processing server, and an audio and video processing server;
the graphics rendering server is used for picture processing, for example 3D rendering;
the instruction processing server is used for parsing the interaction instructions, for example operation feedback, mainly handling the message interface;
the audio and video processing server is used for encoding and decoding audio and video, for example H.264/H.265;
the three servers can run on the same physical machine.
The cloud server captures the rendered data through data acquisition; rendering is performed in the GPU server cluster, and the GPU is simply divided internally into a rendering unit and a decoding unit. The software, including the game, uses the rendering unit, and all data is generated in video memory after rendering; the actual audio and video data is encoded by the encoder and passed to the WebRTC layer, which transmits the audio and video data stream to the user side. The client on the user side obtains the data, decodes and renders it, and then displays the picture. Instruction processing means that operations captured by the application layer are driven to the device, the operation receives a genuine response, and the picture changes correspondingly.
Further, the ROI picture partition intelligent control algorithm adopts region-of-interest-based video coding technology: the quantization parameter value is reduced for the region of interest in the image, allocating more bitrate to improve its picture quality, while the quantization parameter value is increased for the regions of no interest, reducing their allocated bitrate, so that the video bitrate is reduced without losing the overall quality of the image.
The logic of ROI video coding bitrate allocation is as follows: before video coding, visual perception analysis is performed on the input video scene to determine the region of interest; during encoding, the coding parameters are adjusted so that the bitrate allocated to the region of interest is increased, giving it better visual quality, while the bitrate allocated to the other regions is correspondingly reduced.
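The quantization-parameter adjustment described above can be sketched as a per-block QP map. This is a minimal sketch, assuming illustrative QP offsets; the patent does not specify concrete values.

```python
import numpy as np

# Minimal sketch of ROI-driven rate allocation: lower the quantization
# parameter (QP) inside the region of interest (better quality, more bits),
# raise it outside (fewer bits). The QP offsets are illustrative assumptions.

def qp_map(height: int, width: int, roi: tuple[int, int, int, int],
           base_qp: int = 30, roi_delta: int = -6, bg_delta: int = 4) -> np.ndarray:
    """Per-block QP map; roi = (top, left, bottom, right) in block units."""
    qp = np.full((height, width), base_qp + bg_delta, dtype=np.int32)
    t, l, b, r = roi
    qp[t:b, l:r] = base_qp + roi_delta   # spend more bits inside the ROI
    return np.clip(qp, 0, 51)            # H.264/H.265 QP range is 0..51

m = qp_map(9, 16, roi=(2, 4, 7, 12))
```

An encoder that supports per-macroblock QP, as H.264/H.265 encoders commonly do, would consume such a map during rate control.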
Further, the edge server analyzes the video content through an artificial intelligence technology, distinguishes and identifies a human eye region of interest and a non-human eye region of interest, and uses the ROI to delineate the target of the human eye region of interest.
Specifically, the edge server analyzes video content through an artificial intelligence technology, marks visual focuses of human faces, human bodies, animals, scenes and the like as human eye attention areas based on a human eye visual model, marks backgrounds, props, plants and the like as non-human eye attention areas, and uses the ROI to delineate targets of the human eye attention areas.
Further, the intelligent control algorithm for the ROI picture partition comprises the following contents: original image quality is reserved for the live interaction human eye attention area, image quality is enhanced, and common compression transcoding is performed for the non-human eye attention area.
ROI region determination: 1) faces, human bodies, animals, and objects: locate them and adjust image quality and coding parameters in a targeted manner; 2) center region: enhance the picture in the middle or in a fixed area of the frame; 3) human-eye focus region: acquire focus data through an eye tracker to determine where to enhance the picture; 4) frame-difference region: weaken the picture in the parts where consecutive frames change little, and strengthen the picture where the change between frames surges or drops sharply. The flow of the logical processing of the edge server is shown in fig. 4.
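Rule 4) above, the frame-difference region, can be sketched with a simple per-tile luma difference; the block size and threshold here are illustrative assumptions, not values from the patent.

```python
import numpy as np

# Sketch of rule 4): flag tiles whose content changes strongly between
# consecutive frames, so the encoder can spend extra bits there.

def changed_blocks(prev: np.ndarray, curr: np.ndarray,
                   block: int = 16, threshold: float = 12.0) -> np.ndarray:
    """Boolean mask over a grid of `block`x`block` tiles: True where the
    mean absolute luma difference exceeds `threshold`."""
    diff = np.abs(curr.astype(np.float32) - prev.astype(np.float32))
    h, w = diff.shape
    tiles = diff[:h - h % block, :w - w % block]
    tiles = tiles.reshape(h // block, block, w // block, block)
    return tiles.mean(axis=(1, 3)) > threshold

prev = np.zeros((64, 64), dtype=np.uint8)
curr = prev.copy()
curr[0:16, 0:16] = 200           # a burst of change in the top-left tile
mask = changed_blocks(prev, curr)
```

The resulting mask could feed the same kind of per-block QP adjustment used for the other ROI rules.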
The invention also relates to a live broadcast interactive video partition intelligent control method based on cloud edge coordination, which uses the live broadcast interactive video partition intelligent control system based on cloud edge coordination and comprises the following operation steps:
S01, setting a plurality of edge computing nodes, and connecting a plurality of live interaction clients through the plurality of edge computing nodes;
S02, a plurality of live interaction clients interact online with the cloud server through the network, and the cloud server renders the interactive live video picture from the interaction logic operation results of the online interaction;
S03, one or more of the live interaction clients send interaction instructions to the cloud server;
S04, the cloud server receives the interaction instructions and responds in real time to generate the corresponding interactive live video picture;
S05, the cloud server encodes the interactive live video picture for the first time to form a video stream;
S06, the cloud server transmits the video stream to the edge server;
S07, the edge server receives the video stream and performs ROI rendering calculation through the ROI picture partition intelligent control algorithm;
S08, the edge server performs machine learning on the current video stream to obtain a decision result;
According to the foregoing ROI rules, the region types that require machine learning training are as follows: 1) faces, human bodies, animals, and objects: machine learning training is required, improving face, human body, animal, and object recognition; 2) center region: not required, fixed-region segmentation suffices; 3) human-eye focus region: not required, since no eye tracker (external apparatus) is used; 4) frame-difference region: not required.
S09, dividing the size of the region of interest of the user by the edge server according to the decision result;
S10, the edge server performs the second intelligent video encoding and compresses the result to form the audio and video stream data;
S11, the edge server returns the audio and video stream data to each live interactive video client;
and S12, each live interactive video client decodes and displays the audio and video stream data.
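The step sequence above can be condensed into a minimal data-flow sketch; every function here is a hypothetical stand-in for the component named in the corresponding step, not the actual implementation.

```python
# Condensed sketch of steps S01-S12 as a data flow. All function bodies are
# hypothetical placeholders for the components named in the text.

def cloud_server(instruction: str) -> bytes:
    frame = f"rendered picture for <{instruction}>"    # S04: render response
    return ("H264:" + frame).encode()                  # S05: first encoding

def edge_server(video_stream: bytes) -> bytes:
    roi = "speaker-face"                               # S07-S09: ROI decision
    return video_stream + f"|IVE(roi={roi})".encode()  # S10: second encoding

def client(av_stream: bytes) -> str:
    return av_stream.decode()                          # S12: decode and display

stream = cloud_server("viewer taps gift button")       # S03-S06
av = edge_server(stream)                               # S07-S11
shown = client(av)
```

The point of the sketch is the division of labor: the cloud renders and performs the first encoding, the edge performs the ROI decision and second encoding, and the client only decodes.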
Still further, the following operations are included in step S07:
S071, defining the ROI region of interest for the video stream formed by the moving live video pictures;
S0711, the edge server analyzes the video content through artificial intelligence technology, distinguishes and identifies the human-eye attention region and the non-attention region, and uses the ROI to delineate the targets of the human-eye attention region; specifically, a Fast R-CNN artificial intelligence technique is adopted, marking visual focuses such as faces, human bodies, animals, and scene subjects as human-eye attention regions based on the human-eye vision model, and marking backgrounds, props, plants, and the like as non-attention regions;
S0712, the original image quality is retained and enhanced for the live interaction human-eye attention region, while ordinary compression transcoding is performed for the non-attention region;
S072, cloud-edge cooperation: cloud-side and edge-side management redefine the boundaries and modes of system construction;
S073, machine learning training is performed, and the ROI of the live picture is recorded.
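The region labeling of step S0711 can be sketched as follows; the detections are hard-coded stand-ins for the output of a Fast R-CNN style detector, and the class names are assumptions based on the examples given in the text.

```python
# Sketch of step S0711: split object detections into human-eye attention
# targets and non-attention background. The detector output is mocked; the
# class vocabularies are illustrative assumptions from the text's examples.

ATTENTION_CLASSES = {"face", "person", "animal", "scene-focus"}
NON_ATTENTION_CLASSES = {"background", "prop", "plant"}

def split_rois(detections: list[dict]) -> tuple[list[dict], list[dict]]:
    """detections: [{'label': str, 'box': (top, left, bottom, right)}, ...]"""
    attention = [d for d in detections if d["label"] in ATTENTION_CLASSES]
    other = [d for d in detections if d["label"] not in ATTENTION_CLASSES]
    return attention, other

dets = [{"label": "face", "box": (10, 10, 90, 90)},
        {"label": "plant", "box": (0, 100, 40, 140)}]
rois, background = split_rois(dets)   # the ROI delineates only the face box
```

The boxes in `rois` would then drive the enhanced encoding path of step S0712, and those in `background` the ordinary compression transcoding path.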
Still further, in step S072, intelligent video encoding (IVE) is performed through the plurality of edge computing nodes, giving the video the second intelligent video encoding according to the ROI requirements;
IVE refers to intelligent video encoding, for which many mature processing schemes and patents exist in the prior art; the invention directly adopts an existing IVE intelligent video coding scheme from the prior art.
According to the cloud-edge-collaboration-based live-broadcast interactive video partition intelligent control method and system, ROI rendering calculation of video pictures is performed at the user-proximate edge side and at the cloud according to different scene requirements. The corresponding interactive video picture is generated first, then the live interactive video is intelligently partitioned at the edge before being transmitted back to the user terminal, achieving a novel mode of interactive experience. Through this refined sub-region compression, the human visual experience is almost unchanged while the bitrate can be reduced by more than 50%, which lowers the bandwidth requirement, improves transmission efficiency, reduces delay, and gives users a better live-viewing experience. The high delay and stuttering caused by transmitting high-definition and ultra-high-definition live pictures are reduced, achieving a near-local experience.
Embodiment one:
scheme is briefly described:
According to the invention, through cloud-edge collaboration, ROI rendering calculation of the video picture is performed at the user-proximate edge side and at the cloud according to different scene requirements, so that the high delay and stuttering caused by transmitting high-definition and ultra-high-definition live pictures are reduced, achieving a near-local experience.
Edge computing provides cloud services and IT environment services for application developers and service providers at the network edge. "Edge" here refers to infrastructure within the administrative domain located as close as possible to the data source or user, with the goal of providing computing, storage and network bandwidth in close proximity to data input or users. With the arrival of the 5G era, cloud-edge collaboration plus 5G promotes the development and deployment of the cloud-online mode and brings a brand-new experience upgrade to the video industry. Based on the cloud-edge collaboration mode, AI computing power is injected into the edge, and events such as people, objects, actions and expressions in the live video stream are dynamically perceived. Data need not be transmitted to the remote cloud: part or all of the video preprocessing and analysis can be completed at the edge, which reduces the computing, storage and network bandwidth requirements of the cloud center, improves response speed, and thereby improves perceived customer experience.
1. The live interaction application runs on a cloud GPU machine, and the operation result is rendered in the cloud.
2. The edge service encodes and compresses the audio and video to generate audio/video stream data. During processing, the edge intelligently partitions the live interactive video by identifying the video type and picture content, removes pixel data that does not affect picture quality, intelligently selects optimal parameters according to a human visual model, and dynamically reduces the video bitrate.
3. The data is transmitted to the user terminal, which completes decoding and display.
4. During interaction, the end user sends an interaction instruction to the cloud. After receiving the user-side feedback instruction, the cloud responds in real time to generate the corresponding interactive video picture, which is again intelligently partitioned at the edge before being transmitted to the user terminal, achieving a novel mode of interactive experience.
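The four-step loop above (cloud renders, edge re-encodes, terminal displays, interaction feeds back to the cloud) can be traced with toy stand-ins. Every component here is an illustrative assumption; real rendering, encoding and decoding are of course far more involved.

```python
# Step 1/4: cloud renders; an optional interaction instruction updates state.
def cloud_render(state, instruction=None):
    if instruction:
        state = state + [instruction]
    return {"frame": f"rendered:{len(state)}", "state": state}

# Step 2: edge re-encodes with ROI partitioning (represented by a tag only).
def edge_encode(frame):
    return {"stream": frame["frame"] + "|roi-encoded"}

# Step 3: terminal decodes and displays.
def terminal_decode(stream):
    return stream["stream"].replace("|roi-encoded", "|displayed")

state = []
out = terminal_decode(edge_encode(cloud_render(state)))
state = cloud_render(state, "jump")["state"]   # step 4: user instruction loops back
```

The point of the sketch is the data flow: only the edge touches the ROI step, and only the cloud touches interaction logic, mirroring the division of labor described above.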
The core of the scheme is an intelligent ROI picture-partition control algorithm for cloud-edge collaborative live interactive video capability.
1. Defining ROI (region of interest) regions of interest for live interaction pictures
In the field of image processing, a region of interest (ROI) is a region selected from an image that is the focus of the image analysis. In live interaction, the video content is analyzed with artificial intelligence: based on a human visual model, visual focuses such as faces, human bodies, animals and scenes are marked as attention areas, while backgrounds, props, plants and the like are marked as non-attention areas, and the attention-area targets are delineated with the ROI. This reduces processing time, increases accuracy, and reduces bandwidth consumption.
ROI picture-partition intelligent control algorithm: the original image quality is preserved and enhanced for the attention area of the live interaction picture, while ordinary compression transcoding is applied to the non-attention area. Through refined sub-region compression, the visual experience is almost unchanged while the bitrate can be reduced by more than 50%, which lowers the bandwidth requirement, improves transmission efficiency, reduces delay, and gives users a better live-viewing experience.
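A back-of-envelope model shows how sub-region compression yields the claimed saving of more than 50%. The area fraction and per-region rate factors below are illustrative assumptions, not figures from the patent.

```python
def roi_bitrate(total_kbps, roi_area_frac, roi_keep=1.0, bg_keep=0.25):
    """Approximate output bitrate when only the ROI keeps its full rate.

    roi_keep / bg_keep are the fractions of the original per-area rate
    retained in the attention and non-attention regions respectively.
    """
    return total_kbps * (roi_area_frac * roi_keep + (1 - roi_area_frac) * bg_keep)

full = 8000                                   # e.g. an 8 Mbps 1080p live stream
after = roi_bitrate(full, roi_area_frac=0.2)  # attention area covers 20% of the frame
saving = 1 - after / full                     # fraction of bitrate saved
```

With a 20% attention area kept at full rate and the background at a quarter rate, the stream drops to 3.2 Mbps, a 60% saving, consistent with the ">50%" figure in the text.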
The following describes a practical application of the ROI picture-partition intelligent control algorithm, taking a cloud game as an example of a live interactive video capability application.
The video picture areas with dashed borders are objects such as building walls (the target character cannot emerge from the wall, so the video user pays no attention to that part) and the fixed control areas on the left and right sides (viewers of a live game care only about the gameplay and ignore the control-instruction area). The video picture areas with solid borders are those the human eye focuses on, such as an approaching target character, key prompt information (e.g. a low-ammunition warning), and the picture showing a target being hit; these are the contents a live interaction user cares about most.
If the ROI is not enabled, the entire picture is video-coded (compressed) and transmitted, which puts pressure on network bandwidth and video storage. After the ROI function is enabled, important pictures such as the moment a target is hit, or moving areas such as a moving target character, can be encoded losslessly at high-definition quality, while the bitrate and image quality of non-attention areas are reduced to standard-definition compression, or the video of those areas is not transmitted at all, saving network bandwidth and video storage space.
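The three treatments just described (near-lossless ROI, standard-definition background, untransmitted areas) map naturally onto per-block quantization. The sketch below follows the usual H.264/H.265 convention that a lower QP means higher quality, but the specific QP values and the "skip" label are assumptions for illustration.

```python
# Per-region quantization choices (illustrative values).
def block_qp(region):
    return {"roi": 18, "background": 38}[region]   # low QP = near-lossless

def encode_blocks(block_regions):
    """Return (transmitted_blocks, dropped_count) for a list of region labels.

    Blocks labelled "skip" model the "do not transmit this area at all" case.
    """
    sent = [(i, block_qp(r)) for i, r in enumerate(block_regions) if r != "skip"]
    dropped = sum(1 for r in block_regions if r == "skip")
    return sent, dropped

sent, dropped = encode_blocks(["roi", "background", "skip", "roi"])
```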
2. Cloud-edge collaboration: the cloud manages while the edge "fights the battle", redefining the boundary and mode of this division of labor
The video capability of live interaction moves from the "terminal" to the "cloud" and the "edge". Nationwide edge computing nodes perform intelligent video encoding (IVE, Intelligent Video Encoding) on the video pictures, i.e. intelligent coding of the video according to the ROI requirements, achieving efficient and reasonable utilization and allocation of computing power. On the premise of no loss of image quality, video coding performance is optimized, network bandwidth occupancy is reduced, storage space is reduced, and large bandwidth with low delay is guaranteed.
3. Recording the ROI of live pictures during machine learning exploration
Although the human eye can distinguish the different ROI pictures of a video in interactive live broadcast, the program cannot know by itself what constitutes "one picture" and when "a new picture is entered"; AI cannot independently infer the contextual association between successive video frames (pictures). Even though Unity programming has a Scene concept, the boundary between scenes still needs to be defined explicitly at video-capability processing time.
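Since the program cannot tell on its own when a "new picture" begins, one common workaround, sketched here as an assumption rather than as the patent's method, is a frame-difference threshold that marks scene boundaries so the recorded ROI can be reset. The flat 64-pixel "frames" and the threshold value are purely illustrative.

```python
def frame_diff(a, b):
    """Mean absolute pixel difference between two equal-length frames."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def detect_scene_cuts(frames, threshold=40.0):
    """Indices where a new scene is assumed to start (ROI history reset point)."""
    cuts = []
    for i in range(1, len(frames)):
        if frame_diff(frames[i - 1], frames[i]) > threshold:
            cuts.append(i)
    return cuts

frames = [[10] * 64, [12] * 64, [200] * 64]   # third frame: abrupt change
print(detect_scene_cuts(frames))
```

This also corresponds to ROI rule type 4 above (the inter-frame variability region), which requires no trained model.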
For the specific flow chart, refer to fig. 3.
Live pictures place high requirements on video-capability processing, and the user's live-viewing experience matters most. High-definition video technology improves the resolution and fineness of video pictures, but the size of the encoded video also keeps growing. Larger video files not only create network bandwidth cost pressure for video providers but also raise the usage threshold on the live-interaction user side. The ROI-based picture intelligent control technology proposed here, an application of cloud-edge collaboration in the field of live interactive video capability, can reduce bandwidth occupation while affecting picture quality as little as possible, bringing a win-win to video providers and video users.
Compared with traditional coding, AI techniques such as ROI coding deliver an obvious improvement in subjective overall visual effect, which is especially noticeable in lower-bandwidth environments. ROI coding not only obtains the expected high-quality picture but also keeps the bitrate low, well resolving the contradiction between bitrate and picture quality.
In the field of video capability, this design selects the size of each user's region of interest and, through cloud-edge collaboration, offloads work formerly processed in the cloud to the edge, further improving video computing efficiency and the user's experience of watching video content. This is a technical innovation targeted at current video capability.
The invention is not limited to the above-described alternative embodiments. Any other product in whatever form derived by any person in light of the present invention, whatever changes are made to its shape or structure, falls within the scope of protection of the present invention provided it falls within the technical solutions defined by the claims.

Claims (2)

1. A live-broadcast interactive video partition intelligent control system based on cloud-edge collaboration, characterized in that: the system comprises a cloud server, an edge server and a plurality of live-broadcast interaction clients;
each live-broadcast interaction client is used for interactive live broadcast and can send interaction instructions to the cloud server; one of the live-broadcast interaction clients is the live-broadcast interactive video anchor terminal;
the cloud server is used for collecting the interaction instructions input by the live-broadcast interaction clients, executing interaction logic according to the interaction instructions, performing cloud rendering of the interactive live video picture, responding in real time to generate the corresponding interactive video picture, forming a video stream through first encoding, and transmitting the video stream to the edge server;
the edge server is used for providing cloud services and IT environment services at the network edge, performing ROI rendering calculation on the video stream, dividing the size of each user's region of interest, performing secondary intelligent video coding, compressing the result into audio/video stream data, and selectively transmitting the audio/video stream data to different live-broadcast interaction clients according to the sizes of their regions of interest;
the edge server comprises a plurality of edge computing nodes, each of which performs ROI rendering calculation on the audio/video stream; the ROI rendering calculation comprises: intelligently partitioning the live interactive video by identifying the video type and picture content, removing pixel data that does not affect picture quality, intelligently selecting optimal parameters according to a human visual model, and dynamically reducing the video bitrate;
the ROI picture-partition intelligent control algorithm preserves and enhances the original image quality of the human-eye attention area of the live interaction picture, and applies ordinary compression transcoding to the non-attention area;
the ROI picture-partition intelligent control algorithm adopts region-of-interest-based video coding, reducing the quantization parameter of the region of interest in the image so that more bitrate is allocated to improve its picture quality, and raising the quantization parameter of the non-interest region to reduce its allocated bitrate, thereby lowering the video bitrate without losing overall image quality; the logic of ROI video-coding rate allocation is as follows: before encoding, visual perception analysis is performed on the input video scene to determine the region of interest; during encoding, the coding parameters are adjusted so that the bitrate allocated to the region of interest is increased, giving it better visual quality, while the bitrate allocated to other regions is correspondingly reduced;
the edge server comprises a graphics rendering server for image processing, an instruction processing server for parsing interaction instructions, and an audio/video processing server for encoding and decoding audio and video;
the edge server analyzes the video content using artificial intelligence, distinguishes and marks the human-eye attention area and the non-attention area, and delineates the attention-area targets with the ROI;
and the live-broadcast interaction clients are used for receiving the audio/video stream data and decoding and displaying it.
2. A live-broadcast interactive video partition intelligent control method based on cloud-edge collaboration, characterized in that: it adopts the aforementioned cloud-edge-collaboration-based live interactive video partition intelligent control system and comprises the following steps:
S01, setting a plurality of edge computing nodes, and connecting a plurality of live-broadcast interaction clients through the plurality of edge computing nodes;
S02, the plurality of live-broadcast interaction clients interact online with the cloud server through the network, and the interaction-logic operation result of the online interaction is rendered as the interactive live video picture on the cloud server;
S03, one or more of the live-broadcast interaction clients send interaction instructions to the cloud server;
S04, the cloud server receives the interaction instructions and responds in real time to generate the corresponding interactive live video picture;
S05, the cloud server encodes the interactive live video picture for the first time to form a video stream;
S06, the cloud server transmits the video stream to the edge server;
S07, the edge server receives the video stream and performs ROI rendering calculation through the ROI picture-partition intelligent control algorithm;
S071, defining an ROI (region of interest) for the video stream formed from the moving live video pictures;
S0711, the edge server analyzes the video content using artificial intelligence, distinguishes the human-eye attention area from the non-attention area, and delineates the attention-area targets with the ROI;
S0712, the original image quality is preserved and enhanced for the attention area of the live interaction picture, while ordinary compression transcoding is applied to the non-attention area;
S072, cloud-edge collaboration: the cloud manages while the edge "fights the battle", redefining the boundary and mode of this division of labor;
performing intelligent video encoding (IVE) of the video picture through the plurality of edge computing nodes, with secondary intelligent coding of the video according to the ROI requirements;
S073, machine learning training, recording the ROI of the live picture;
S08, the edge server performs machine learning on the current video stream to obtain a decision result;
S09, the edge server divides the size of each user's region of interest according to the decision result;
S10, the edge server performs secondary intelligent video coding and compresses the result into audio/video stream data;
S11, the edge server returns the audio/video stream data to each live-broadcast interactive video user terminal;
S12, each live-broadcast interactive video user terminal decodes and displays the audio/video stream data.
CN202310330567.4A 2023-03-31 2023-03-31 Live broadcast interactive video partition intelligent control method and system based on cloud edge cooperation Active CN116033189B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310330567.4A CN116033189B (en) 2023-03-31 2023-03-31 Live broadcast interactive video partition intelligent control method and system based on cloud edge cooperation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310330567.4A CN116033189B (en) 2023-03-31 2023-03-31 Live broadcast interactive video partition intelligent control method and system based on cloud edge cooperation

Publications (2)

Publication Number Publication Date
CN116033189A CN116033189A (en) 2023-04-28
CN116033189B true CN116033189B (en) 2023-06-30

Family

ID=86074500

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310330567.4A Active CN116033189B (en) 2023-03-31 2023-03-31 Live broadcast interactive video partition intelligent control method and system based on cloud edge cooperation

Country Status (1)

Country Link
CN (1) CN116033189B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116527840A (en) * 2023-07-05 2023-08-01 卓望数码技术(深圳)有限公司 Live conference intelligent subtitle display method and system based on cloud edge collaboration
CN117528147A (en) * 2024-01-03 2024-02-06 国家广播电视总局广播电视科学研究院 Video enhancement transmission method, system and storage medium based on cloud edge cooperative architecture

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104735464A (en) * 2015-03-31 2015-06-24 华为技术有限公司 Panorama video interactive transmission method, server and client end
CN106453328A (en) * 2016-10-18 2017-02-22 乐视控股(北京)有限公司 Publishing method for live broadcast video file, publishing client and edge streaming media server
CN112533012A (en) * 2020-11-25 2021-03-19 北京达佳互联信息技术有限公司 Live broadcast room interactive information method and device
CN113162895A (en) * 2020-12-22 2021-07-23 咪咕文化科技有限公司 Dynamic coding method, streaming media quality determination method and electronic equipment
CN113613038A (en) * 2021-08-02 2021-11-05 成都航空职业技术学院 Intelligent streaming media service system and video stream scheduling method thereof
CN114727122A (en) * 2021-12-22 2022-07-08 南京渡涛智能科技有限公司 Transcoding and transmitting method for live VR service

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7958536B2 (en) * 2008-03-31 2011-06-07 Broadcom Corporation Video transmission system with timing based on a global clock and methods for use therewith
US9466130B2 (en) * 2014-05-06 2016-10-11 Goodrich Corporation Systems and methods for enhancing displayed images
CN106162221A (en) * 2015-03-23 2016-11-23 阿里巴巴集团控股有限公司 The synthetic method of live video, Apparatus and system
CN106550240A (en) * 2016-12-09 2017-03-29 武汉斗鱼网络科技有限公司 A kind of bandwidth conservation method and system
CN107105333A (en) * 2017-04-26 2017-08-29 电子科技大学 A kind of VR net casts exchange method and device based on Eye Tracking Technique
CN113301336A (en) * 2020-02-21 2021-08-24 华为技术有限公司 Video coding method, device, equipment and medium
CN115442615A (en) * 2021-06-04 2022-12-06 京东方科技集团股份有限公司 Video coding method and device, electronic equipment and storage medium
CN113489993A (en) * 2021-07-22 2021-10-08 Oppo广东移动通信有限公司 Encoding method, apparatus, encoder, device, and computer-readable storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on MEC-based live video distribution mechanism in mobile networks; Jia Qingmin; Li Zishu; Li Chengcheng; Xie Renchao; Huang Tao; Information and Communications Technologies (Issue 05); full text *

Also Published As

Publication number Publication date
CN116033189A (en) 2023-04-28


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant