CN116366871A - Link-mic video display method and device, equipment and medium thereof - Google Patents

Link-mic video display method and device, equipment and medium thereof

Info

Publication number
CN116366871A
CN116366871A (application CN202310332814.4A)
Authority
CN
China
Prior art keywords
video
image
anchor
mixed
video stream
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310332814.4A
Other languages
Chinese (zh)
Inventor
鲍琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Cubesili Information Technology Co Ltd
Original Assignee
Guangzhou Cubesili Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Cubesili Information Technology Co Ltd filed Critical Guangzhou Cubesili Information Technology Co Ltd
Priority to CN202310332814.4A priority Critical patent/CN116366871A/en
Publication of CN116366871A publication Critical patent/CN116366871A/en
Pending legal-status Critical Current

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21Server components or server architectures
    • H04N21/218Source of audio or video content, e.g. local disk arrays
    • H04N21/2187Live feed
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/4302Content synchronisation processes, e.g. decoder synchronisation
    • H04N21/4307Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen
    • H04N21/43072Synchronising the rendering of multiple content streams or additional data on devices, e.g. synchronisation of audio on a mobile phone with the video output on the TV screen of multiple content streams on the same device
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/431Generation of visual interfaces for content selection or interaction; Content or additional data rendering
    • H04N21/4312Generation of visual interfaces for content selection or interaction; Content or additional data rendering involving specific graphical features, e.g. screen layout, special fonts or colors, blinking icons, highlights or animations
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44016Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving splicing one content stream with another content stream, e.g. for substituting a video clip
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/47End-user applications
    • H04N21/478Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788Supplemental services, e.g. displaying phone caller identification, shopping application communicating with other users, e.g. chatting

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The present application discloses a link-mic video display method, together with a corresponding device, equipment and medium. The method comprises the following steps: acquiring a mixed-picture video stream forwarded by a content distribution network during a webcast, wherein the mixed-picture video stream contains the video images of two or more anchor users in a link-mic session; in response to a video layout switching instruction from a viewer user, determining the target play window of each anchor user defined by the instruction, wherein some of the target play windows have a stacked relationship; identifying the video region occupied by each anchor user's video image within the full picture of the mixed-picture video stream, and extracting each anchor user's video image according to its region; and drawing and displaying each anchor user's video image in that anchor user's corresponding target play window. The present application can flexibly adjust the layout of the individual video images of a webcast's mixed-picture video stream.

Description

Link-mic video display method and device, equipment and medium thereof
Technical Field
The present disclosure relates to the field of webcast technologies, and in particular to a link-mic video display method, a corresponding device, an electronic device, and a computer-readable storage medium.
Background
In a webcast scenario, an anchor user pushes a video stream to a live room for purposes such as talent shows, information sharing, and knowledge education; through these activities the anchor user participates in social labor, earns income, and contributes to overall social benefit.
For cost control, during a webcast the terminal devices of the anchor users participating in a link-mic session may push their respective live video streams to the terminal device of one of the anchor users, or to a media server, for mixed-picture processing: for example, the video images of two anchor users are spliced together adjacently. The resulting mixed-picture video stream can then be forwarded through a content distribution network to the terminal devices of viewer users for display. A viewer user thus sees the video images of several link-mic anchor users while pulling only a single mixed-picture video stream.
Although the mixed-picture video stream saves network traffic, and offsets the slightly higher latency of a content distribution network compared with a self-built forwarding server, it also makes it difficult for the terminal device to flexibly adjust the layout of each anchor user's video image within the same graphical user interface; doing so requires additional technical means. How to decouple the individual anchor users' video images from the mixed-picture video stream on the terminal device, so that their layout can be flexibly adjusted, has therefore become a problem the industry hopes to solve.
Disclosure of Invention
A primary object of the present application is to solve at least one of the above problems by providing a link-mic video display method, a corresponding apparatus, an electronic device, and a computer-readable storage medium.
To meet the objects of the present application, the following technical solutions are adopted:
One object of the present application is served by a link-mic video display method, comprising the following steps:
acquiring a mixed-picture video stream forwarded by a content distribution network during a webcast, wherein the mixed-picture video stream contains the video images of two or more anchor users in a link-mic session;
in response to a video layout switching instruction from a viewer user, determining the target play window of each anchor user defined by the instruction, wherein some of the target play windows have a stacked relationship;
identifying the video region occupied by each anchor user's video image within the full picture of the mixed-picture video stream, and extracting each anchor user's video image according to its region;
and drawing and displaying each anchor user's video image in that anchor user's corresponding target play window.
In an alternative embodiment, identifying the video region occupied by each anchor user's video image within the full picture of the mixed-picture video stream comprises:
extracting at least two adjacent image frames from the mixed-picture video stream;
calculating frame-difference information between the at least two adjacent image frames, and determining boundary information between the anchor users' video images according to the frame-difference information;
and determining the video region occupied by each anchor user's video image according to the boundary information and the full picture of the mixed-picture video stream.
In an alternative embodiment, calculating frame-difference information between the at least two adjacent image frames, and determining boundary information between the anchor users' video images according to the frame-difference information, comprises:
based on the frame-difference information between each pair of adjacent image frames, searching for the two pixel lines (columns, for a side-by-side splice) whose summed pixel differences over all pixels in the line are the smallest;
judging whether the two pixel lines are adjacent to each other, and when they are, determining the boundary information from their positions;
and when the boundary information cannot be determined, iterating with the frame-difference information between the next pair of adjacent image frames.
In an alternative embodiment, when the two pixel lines are adjacent, determining the boundary information from their positions comprises:
when the positions of the two pixel lines remain unchanged across iterations, determining the boundary information from those positions; otherwise, continuing to iterate with the frame-difference information between the next pair of adjacent image frames.
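The frame-difference search can be sketched as follows. This is an illustrative Python/NumPy sketch, not the application's implementation: it assumes a horizontal side-by-side splice (so the seam is a vertical pixel column) and grayscale frames, and it returns only the single column whose inter-frame difference is smallest.

```python
import numpy as np

def find_seam_column(frame_a, frame_b):
    # Sum the absolute inter-frame differences down each pixel column.
    # A static splicing seam between two anchors' video images changes
    # least between adjacent frames, so its column sum is minimal.
    diff = np.abs(frame_a.astype(np.int32) - frame_b.astype(np.int32))
    column_sums = diff.sum(axis=0)
    return int(np.argmin(column_sums))
```

A robust implementation would additionally require, as described above, that the two smallest-sum columns be adjacent and that their positions stay unchanged over several frame pairs before accepting them as the boundary.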
In an alternative embodiment, identifying the video region occupied by each anchor user's video image within the full picture of the mixed-picture video stream comprises:
extracting picture-proportion information from the supplemental enhancement information (SEI) of the mixed-picture video stream, wherein the picture-proportion information comprises the proportion each anchor user's video image occupies in the mixed-picture video stream;
and, based on the full picture of the mixed-picture video stream, calculating the video region occupied by each anchor user's video image from the picture-proportion information.
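The proportion-based computation can be sketched as below. The helper is hypothetical and assumes the images are spliced horizontally in order, with per-anchor width fractions that sum to 1.

```python
def regions_from_proportions(full_width, full_height, width_fractions):
    """Turn the per-anchor width fractions carried in the stream's SEI
    into pixel regions (left, top, width, height) of the full picture."""
    regions = []
    left = 0
    for fraction in width_fractions:
        width = round(full_width * fraction)
        regions.append((left, 0, width, full_height))
        left += width
    return regions
```

For a two-anchor 1280x720 stream split evenly, this yields the two 640x720 half-picture regions.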
In an optional embodiment, drawing and displaying each anchor user's video image in that anchor user's corresponding target play window comprises:
calculating the display resolution of each anchor user's video image in its corresponding target play window;
judging whether the display resolution of each video image is lower than a first resolution threshold and higher than a second resolution threshold, and when it is, enhancing the image quality of the corresponding video image by pixel interpolation;
judging whether the display resolution of each video image is lower than the second resolution threshold, and when it is, enhancing the image quality of the corresponding video image with a preset super-resolution enhancement model;
and drawing and displaying each quality-enhanced video image in its corresponding target play window.
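The two-threshold decision can be sketched as follows. The function name and return labels are illustrative; in the terms above, the first resolution threshold is the higher one and the second the lower one.

```python
def choose_enhancement(display_resolution, first_threshold, second_threshold):
    # Assumes first_threshold > second_threshold.
    if display_resolution < second_threshold:
        return "super_resolution"  # very low quality: use the SR model
    if display_resolution < first_threshold:
        return "interpolation"     # moderately low: cheap pixel interpolation
    return "none"                  # high enough: display as-is
```

The design choice this encodes: pixel interpolation is cheap and adequate for mild upscaling, while the heavyweight super-resolution model is reserved for images so small that interpolation alone would look blurry.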
In an optional embodiment, after each anchor user's video image is drawn and displayed in that anchor user's corresponding target play window, the method comprises:
acquiring an image frame from the mixed-picture video stream;
segmenting a special-effect image out of the image frame with a preset image segmentation model;
determining the special-effect type of the special-effect image with a preset special-effect classification model;
and executing a local special-effect playing instruction corresponding to the special-effect type, playing the corresponding animated special effect in the current graphical user interface in synchronization with the image frame, the images of the animated special effect covering the special-effect image that appears in the target play window with the largest display area.
A link-mic video display device adapted to one of the objects of the present application comprises:
a video streaming module, configured to acquire a mixed-picture video stream forwarded by a content distribution network during a webcast, wherein the mixed-picture video stream contains the video images of two or more anchor users in a link-mic session;
a switching control module, configured to respond to a video layout switching instruction from a viewer user by determining the target play window of each anchor user defined by the instruction, wherein at least two of the target play windows have a stacked relationship;
an image extraction module, configured to identify the video region occupied by each anchor user's video image within the full picture of the mixed-picture video stream, and to extract each anchor user's video image according to its region;
and an image display module, configured to draw and display each anchor user's video image in that anchor user's corresponding target play window.
An electronic device provided in accordance with one of the objects of the present application comprises a central processor and a memory, the central processor being configured to invoke and run a computer program stored in the memory to perform the steps of the link-mic video display method described herein.
A computer-readable storage medium adapted to another object of the present application stores, in the form of computer-readable instructions, a computer program implemented according to the link-mic video display method; when invoked and run by a computer, the program performs the steps comprised by the method.
A computer program product adapted to another object of the present application comprises a computer program/instructions which, when executed by a processor, implement the steps of the method described in any of the embodiments of the present application.
Compared with the prior art, for a mixed-picture video stream distributed through a content distribution network, the present application identifies the video region occupied by each anchor user's adjacently spliced video image and accurately extracts each anchor user's video image from the stream according to those regions. When a viewer user triggers a video layout switching instruction, each anchor user's video image is drawn and displayed in that anchor user's corresponding target play window, some of the target play windows having a stacked relationship, so that video images that were originally adjacent are converted into a stacked layout: one video image can be displayed in a full-screen window while another is displayed in a small window. The viewer side need not pull a separate live video stream for each anchor user; the terminal device keeps receiving the single mixed-picture video stream, minimizing network traffic while still flexibly adjusting the display layout of the anchor users' video images.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is an exemplary network architecture employed by a webcast service in a webcast scenario of the present application;
FIG. 2 is a flow chart of an embodiment of the link-mic video display method of the present application;
FIG. 3 is an exemplary diagram of the original video layout of a mixed-picture video stream in an embodiment of the present application;
FIG. 4 is an exemplary diagram of the video layout of a mixed-picture video stream after switching, in an embodiment of the present application;
FIG. 5 is a schematic flow chart of determining the video regions of the anchor users' video images from the frame-difference information of two adjacent image frames, in an embodiment of the present application;
FIG. 6 is a flow chart of determining the boundary information of the video regions from the frame-difference information, in an embodiment of the present application;
FIG. 7 is a schematic flow chart of image-quality enhancement of video images extracted from the mixed-picture video stream, in an embodiment of the present application;
FIG. 8 is a flow chart of identifying special effects in the mixed-picture video stream so as to locally play the corresponding animated special effects, in an embodiment of the present application;
FIG. 9 is a schematic block diagram of the link-mic video display device of the present application;
FIG. 10 is a schematic structural diagram of an electronic device used in the present application.
Detailed Description
Referring to FIG. 1, an exemplary network architecture used in the application scenario of the present application includes terminal devices 801 and 802, a media server 81, an application server 82, an internal content distribution network 83, and an external content distribution network 84. The terminal devices 801 and 802 run a live-room client program through which an anchor user or a viewer user uses the live-room functions; for example, an anchor user uploads a live video stream to the media server 81 through terminal device 801, and the media server 81 pushes the live video stream of a target anchor user, via the internal content distribution network 83 or the external content distribution network 84 respectively, to the terminal device 802 of an anchor user or a viewer user for playback. The media server 81 may be responsible for the mixed-picture processing of the live video streams of the different anchor users in a link-mic session: after the video images of several anchor users' live video streams are adjacently spliced into one mixed-picture video stream, that stream is pushed through the internal content distribution network 83 or the external content distribution network 84 to the terminal devices 801, 802 of the viewer users in each anchor user's live room. The application server 82 may host the webcast service that maintains the live-room interactions between anchor users and viewer users.
In one embodiment, the internal content distribution network 83 may serve the anchor users exclusively and the external content distribution network 84 the viewer users exclusively. For example, the internal network forwards the live video stream of the second anchor user in a link-mic session to the first anchor user, enabling the first anchor user's device to mix it with its own live video stream so that the video images are spliced into a mixed-picture video stream, which is then pushed to the media server 81 for distribution to viewer users via the internal content distribution network 83 or the external content distribution network 84. In such an embodiment, the internal content distribution network 83 generally has lower latency than the external one, so it serves the generation and forwarding of video streams on the anchor side to guarantee low latency; the external content distribution network 84, although its latency is relatively higher, offers low-cost concurrency for a large number of viewer users and is therefore advantageous on the viewer side.
In another embodiment, either the internal content distribution network 83 or the external content distribution network 84 may be used alone, serving both anchor users and viewer users for the forwarding of live video streams and mixed-picture video streams.
It should be noted that whether the mixed-picture video stream is produced by the terminal device 801 of an anchor user or by the media server 81 does not affect the inventive spirit of the present application.
A computer program product programmed according to the link-mic video display method of the present application can run on the terminal device of a viewer user. By running it, the steps of the method are executed and the technical solution of the present application is realized: the video images of the anchor users in the mixed-picture video stream obtained from the content distribution network, whether an internal or an external one, are displayed after their layout is readjusted.
Referring to FIG. 2, with reference to the above exemplary scenario, in one embodiment the link-mic video display method of the present application includes the following steps:
Step S1100, acquiring a mixed-picture video stream forwarded by a content distribution network during a webcast, wherein the mixed-picture video stream contains the video images of two or more anchor users in a link-mic session.
in the network live broadcast process, the anchor user can invite other anchor users to participate in the linking activity together, taking two anchor users to participate in the same linking activity as an example, wherein a first anchor user is taken as an initiator, a second anchor user as a receiver is invited to participate, and after the second anchor user responds to the invitation of the first anchor user, both sides enter a linking mode. In the wheat linking mode, the live broadcasting room of the two anchor users is switched to a state of simultaneously displaying live video streams of the two anchor users, and the audience users can simultaneously see video images of the two anchor users.
In order to achieve the purpose that audience users in a live broadcast room of two anchor users can see that the two anchor users are simultaneously present in a graphical user interface, respective live video streams of the two anchor users are transmitted to a media server for network live broadcast. The live video streams of the two anchor users can be mixed and drawn in any one anchor user terminal device or in the media server to be synthesized into the same path of live video stream, namely, mixed and drawn video stream. In the mixed picture video stream, the live video streams of the two anchor users are cut and spliced, typically, adjacent splicing, so that the mixed picture video stream can present the video image layout effect as shown in fig. 3 when being played, that is, the effect that video images of all anchor users are displayed side by side in the same mixed picture video stream.
When a large number of, for example, more than three, main broadcasting users appear in the wheat connecting activity, the mixed picture processing can be performed according to the above mode, finally, corresponding to one wheat connecting activity, a single-path mixed picture video stream is synthesized, and the single-path mixed picture video stream is pushed to the terminal equipment of each audience user in the live broadcasting room where the main broadcasting users participating in the wheat connecting activity are located, so that all audience users in each live broadcasting room can see the same video image content.
After the media server obtains or generates the mixed-picture video stream, it can forward the stream to viewer users via a content distribution network. In one embodiment, the backend architecture serving the webcast comprises an internal content distribution network and an external content distribution network: the internal one distributes the anchor users' live video streams between the media server and each anchor user, while the external one forwards the mixed-picture video stream to viewer users. This separation exploits the characteristics of each network, keeping the stream-generation link on the anchor side fast and the stream-pushing link on the viewer side cost-effective. Of course, in other embodiments the anchor side and the viewer side may share the same content distribution network for forwarding live and mixed-picture video streams, deployed flexibly as required.
When the terminal device of the anchor user responsible for generating the mixed-picture video stream, or the media server, obtains the live video streams of the different anchor users participating in the same link-mic session, it can crop the video images in each live video stream to fit a given full-picture size and then splice all the video images adjacently, in a given order, into the mixed-picture video stream.
Of course, in other embodiments, adaptive image enhancement may be applied to each anchor user's video image before splicing, to ensure that the mixed-picture video stream has higher image quality.
Viewer users who enter a live room hosting a link-mic session continuously pull the mixed-picture video stream from the content distribution network and play it in the graphical user interface of their terminal devices.
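The crop-and-splice step can be sketched with NumPy as below. The sketch is illustrative only: it assumes each anchor's frame is cropped to a common full-picture height and the images are concatenated left to right; real mixing would also align timestamps and re-encode the result.

```python
import numpy as np

def splice_adjacent(frames, target_height):
    # Crop every anchor's frame to the shared full-picture height,
    # then splice the images side by side into one mixed frame.
    cropped = [frame[:target_height, :] for frame in frames]
    return np.concatenate(cropped, axis=1)
```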
Step S1200, in response to a video layout switching instruction from a viewer user, determining the target play window of each anchor user defined by the instruction, wherein some of the target play windows have a stacked relationship.
When a viewer user wants to change the layout relationship between the video images of the different anchor users in the mixed-picture video stream, the user can issue a video layout switching instruction, triggered for example by a specific touch gesture or by a control button provided in the graphical user interface. When the instruction is triggered, a background process of the live client responds to it, determines the layout information describing the video layout the instruction corresponds to, and obtains from that layout information the target play window of each anchor user participating in the link-mic session.
In the present application, the relationship between the anchor users' target play windows may include a stacked relationship. For example, on a terminal device such as a smartphone, the target play window of the first anchor user in the link-mic session may be set to full-screen size, while the target play windows of the other anchor user or users are set as small windows stacked above it. FIG. 4 shows an example of the stacked relationship between the target play windows of two anchor users. Through the layout information, the background process of the live client obtains the position of each anchor user's play window in the graphical user interface, which can be represented as coordinate information.
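A stacked layout of this kind can be sketched as window rectangles plus a z-order. All proportions below (quarter-size small windows down the right edge) are illustrative assumptions, not values from the application:

```python
def stacked_layout(screen_w, screen_h, anchor_count):
    # Window 0: the first anchor, full screen, at the bottom of the stack.
    windows = [{"left": 0, "top": 0,
                "width": screen_w, "height": screen_h, "z": 0}]
    # Remaining anchors: small windows stacked above, along the right edge
    # (quarter of the screen in each dimension -- an assumed proportion).
    small_w, small_h = screen_w // 4, screen_h // 4
    for i in range(1, anchor_count):
        windows.append({"left": screen_w - small_w,
                        "top": (i - 1) * small_h,
                        "width": small_w, "height": small_h, "z": 1})
    return windows
```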
In one embodiment, the correspondence between each anchor user and its target play window may be handled as follows: for the anchor user of which the current audience user is concerned or as the anchor user of all people in the live broadcasting room where the current audience user is located, default the anchor user as a first anchor user, and the video image of the anchor user can be placed in a target playing window of a full screen for display; for other anchor users, one or more other target play windows may be randomly allocated. Generally, when generating the mixed video stream, the video image of the first anchor user may be placed at a default position, for example, at the left side, so that the user side of the audience can conveniently identify the user identity to which the video image belongs.
In addition, in other embodiments, the process of the live program may, in response to a video image exchange instruction applied by the audience user between any two target play windows, exchange the mapping relationship between the anchor users to which the two target play windows belong. In this manner, those skilled in the art can flexibly set options as desired, facilitating fine-grained operations by which the audience user changes the display layout of the video images of the co-streaming anchor users.
Step 1300, identifying the video pictures occupied by the video images of the anchor users in the full picture of the mixed-picture video stream, and extracting the video image of each anchor user according to each video picture;
in order for each target playing window to display the video image of its corresponding anchor user, the video picture occupied by each anchor user's video image in the mixed-picture video stream must be identified; on the basis of identifying the video picture each anchor user occupies in the full picture of the mixed-picture video stream, the video image of each anchor user corresponding to each video picture is cut out and extracted from the mixed-picture video stream so that the video images can be processed separately.
In one mode of identifying the video pictures occupied by the anchor users' video images in the full picture of the mixed-picture video stream, the terminal device or media server responsible for the mixed-picture compositing of the anchor users' video images provides corresponding picture proportion information, which is embedded in the mixed-picture video stream and transmitted along with it, so that the audience user side can calculate the video picture occupied by each anchor user's video image from the picture proportion information.
In another mode of identifying the video pictures of the anchor users within the full picture of the mixed-picture video stream, without transmitting any splicing information about the video images through the mixed-picture video stream, the live program process in the terminal device receiving the mixed-picture video stream searches for boundary information between the video images of the anchor users based on two consecutive image frames of the mixed-picture video stream, and then divides out the video picture corresponding to each anchor user's video image according to the boundary information.
It is easy to understand that, by identifying each anchor user's video picture in the mixed-picture video stream, the video images of different anchor users can be distinguished and divided into separate independent video images, so that the layout of the video images can then be adjusted.
And step 1400, respectively drawing and displaying the video images of the anchor users to the corresponding target playing windows of the anchor users.
After the video images of the anchor users are extracted from the mixed-picture video stream, each video image is drawn and displayed in its corresponding target playing window according to the correspondence between each anchor user's video image and that anchor user's target playing window, so that the audience user sees, in the graphical user interface, the video images of the co-streaming anchor users played synchronously in the new video layout.
In one embodiment, when each video image is drawn and displayed to its corresponding target play window, referring to fig. 4, the process may be as follows: first, the size of the overall target playing window is specified through OpenGL instructions; this window is the display region of the first anchor user and is defined as window 1. Then, the region position of the target play window designated by the service for the second anchor user, which is to be obtained by scaling, is acquired; this window is defined as window 2. Further, the texture coordinates TexLeft of the left-side first anchor user and the texture coordinates TexRight of the right-side second anchor user are obtained through the process of step S1300. Then, texture sampling is performed on the texture of the left-side first anchor user according to the texture coordinates TexLeft to obtain the video image of the first anchor user. Then, a scaling algorithm is applied to the texture coordinates TexRight of the right-side second anchor user, scaling the image to the texture region corresponding to the second anchor user shown in fig. 4, namely window 2. Then, the right-side second anchor user is sampled according to the scaled texture coordinates to obtain the scaled video image of the second anchor user. Further, pixels are output according to the region of each target playing window: window 2 outputs the image pixels corresponding to the second anchor user, while window 1 outputs the image pixels corresponding to the first anchor user in all remaining regions other than the position of window 2. Finally, on-screen display is performed with the result output by the previous step.
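As a rough CPU-side illustration of this compositing (not the patent's actual OpenGL implementation), the sketch below overlays a scaled second image, standing in for window 2, on top of a full-size first image, standing in for window 1; nearest-neighbor scaling stands in for texture sampling with scaled texture coordinates:

```python
def scale_nearest(img, out_w, out_h):
    """Nearest-neighbor scaling of a nested-list grayscale image."""
    in_h, in_w = len(img), len(img[0])
    return [[img[y * in_h // out_h][x * in_w // out_w] for x in range(out_w)]
            for y in range(out_h)]

def composite(frame1, frame2, win2):
    """Overlay frame2, scaled to win2 = (x, y, w, h), on top of frame1.

    Pixels inside win2 come from the second anchor (window 2); all other
    pixels come from the first anchor (window 1), as described above.
    """
    x0, y0, w, h = win2
    out = [row[:] for row in frame1]          # window 1 fills the full area
    scaled = scale_nearest(frame2, w, h)      # window 2 content, scaled down
    for dy in range(h):
        for dx in range(w):
            out[y0 + dy][x0 + dx] = scaled[dy][dx]
    return out

big = [[1] * 8 for _ in range(8)]     # first anchor's "full-screen" image
small = [[2] * 4 for _ in range(4)]   # second anchor's image
result = composite(big, small, (6, 0, 2, 2))  # small window in top-right
```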
In some embodiments, for a video image enlarged from a smaller display scale to a larger one, that is, where an anchor user's target playing window is enlarged relative to its display scale before the video layout switching instruction, whether to apply pixel interpolation or super-resolution enhancement processing to the enlarged video image may be decided according to whether its enlarged resolution meets a preset condition, so as to improve the image quality of the enlarged video image.
As can be seen from the above embodiments, for the mixed-picture video stream distributed through the content distribution network, the video pictures occupied by the adjacent video images of the anchor users are identified, and the video image of each anchor user is accurately extracted from the mixed-picture video stream according to its video picture; when a video layout switching instruction triggered by the user is responded to, the video image of each anchor user is drawn and displayed in that anchor user's corresponding target playing window, some of the target playing windows having a stacking relationship, so that multiple video images that were originally adjacent are converted into a layout with a stacking relationship, for example one video image displayed in a full-screen large window and another in a small window. The audience user side does not need to pull the live video stream of each anchor user independently, so flexible adjustment of the display layout of the anchor users' video images can be achieved on any terminal device able to receive the mixed-picture video stream while keeping network traffic consumption to a minimum. In addition, the background architecture of the live broadcast network can conveniently use the internal low-latency network and the external content distribution network separately, balancing the latency and cost of the two and achieving comprehensive advantages.
On the basis of any embodiment of the present application, referring to fig. 5, identifying the video picture occupied by the video image of each anchor user in the full picture of the mixed-picture video stream includes:
step S2100, extracting at least two adjacent image frames in the blended video stream;
in order to identify the video pictures of the anchor users' video images from the mixed-picture video stream, every pair of temporally adjacent image frames in the mixed-picture video stream can be taken as material and analyzed by the subsequent process. In general, while an anchor user is streaming, the camera position and the ambient light are relatively fixed, so the boundary information in the imaged video stream is relatively fixed; in every pair of adjacent image frames, the boundary information of the video images remains relatively fixed even if the video images are cropped during mixed-picture compositing, so the boundaries of the anchor users' video images can be identified according to this principle.
Step S2200, calculating frame difference information between at least two adjacent image frames, and determining boundary information between video images of each anchor user according to the frame difference information;
for the two image frames contained in each pair of adjacent frames, the frames are subtracted pixel by pixel in one-to-one correspondence to obtain the frame difference information, giving the difference value of every pixel point across the full picture of the mixed-picture video stream. It can be understood that, taking the video layout of the two anchor users shown in fig. 3 as an example, in the left and right video images the difference values in the frame difference information along the line of pixels at the rightmost boundary of the left video image, and along the line of pixels at the leftmost boundary of the right video image, are relatively stable and relatively the lowest. Therefore, according to this principle, the line of pixels at the right edge of the left video image and the line of pixels at the left edge of the right video image can be found, which in practice locates the boundary between the video images of the different anchor users.
Step S2300, determining the video picture occupied by the video image of each anchor user according to the boundary information and the full picture of the mixed-picture video stream.
After the boundary information is determined, each image frame of the mixed-picture video stream can be partitioned by the boundary information, and the video picture occupied by each anchor user's video image can be determined. The video picture may be represented as coordinate position information; for example, the video picture occupied by each video image may be expressed by the coordinates of its top-left and bottom-right corners.
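Assuming a simple left/right splice and a boundary expressed as the pair of adjacent pixel columns found by the frame-difference search, deriving the video pictures as top-left/bottom-right coordinates can be sketched as follows (a hypothetical helper, not the patent's actual representation):

```python
def frames_from_boundary(full_w, full_h, boundary):
    """Partition the full picture by a vertical boundary.

    boundary = (last column of the left image, first column of the right
    image), i.e. the two adjacent pixel columns located in step S2200.
    Each video picture is returned as (top_left, bottom_right) coordinates.
    """
    last_left, first_right = boundary
    left = ((0, 0), (last_left, full_h - 1))
    right = ((first_right, 0), (full_w - 1, full_h - 1))
    return {"left": left, "right": right}

frames = frames_from_boundary(1280, 720, (639, 640))
```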
For the case where three or more anchor users' video images are adjacently spliced in the mixed-picture video stream in a more complex positional arrangement, the boundary information can still be searched for in the above manner, mainly by finding multiple groups of two adjacent rows of pixels, and then using the layout relationship of these groups within the plane provided by the full picture of the mixed-picture video stream to determine the video picture in which each anchor user's video image is located.
From the above embodiment it can be understood that the boundary information between the video images of different anchor users is found in the mixed-picture video stream based on the frame difference information, and the video picture occupied by each anchor user's video image is then determined according to the boundary information. The computation cost is low, the demarcation is accurate and fast, and no external information is needed; even if individual frames are lost during transmission of the mixed-picture video stream, the result is not affected.
On the basis of any embodiment of the present application, referring to fig. 6, calculating frame difference information between the at least two adjacent image frames, and determining boundary information between video images of each anchor user according to the frame difference information includes:
step S2210, searching two rows of pixels with the minimum sum of pixel difference values of all pixel points in the same row in the mixed picture video stream based on frame difference information between each pair of adjacent image frames;
taking the video layout of the mixed video stream shown in fig. 4 as an example, when the user side of the audience starts to receive the mixed video stream, first two image frames are extracted, the difference value of each corresponding pixel point between the two image frames is calculated, and frame difference information between the two image frames is formed.
Then, in the frame difference information, the sum of the difference values of the pixel points in each row and each column is calculated line by line, and the two rows of pixels with the smallest sums of difference values are determined. In general, the two rows of pixels should lie in the same direction, i.e., both along the row direction or both along the column direction.
Step S2220, judging whether the two rows of pixels are adjacent in position, and when they are adjacent, determining the boundary information using the positions of the two rows of pixels;
Further, according to the position coordinates of the two rows of pixels, it is judged whether they are two immediately adjacent rows; if so, the positions of the two rows of pixels can be used to determine the boundary information. The boundary information may be expressed as the coordinate value along the direction in which the two rows of pixels lie, in whatever form facilitates subsequent calculation.
In some embodiments, a decision threshold may further be set, and the difference-value sums of the two preliminarily determined adjacent rows of pixels are compared with the decision threshold; the boundary information is determined only when one or both of the two sums are smaller than the decision threshold, which ensures a more accurate boundary recognition result.
Step S2230, when the boundary information is not determined, iteratively determining the boundary information using frame difference information between the next pair of adjacent image frames.
If the two rows of pixels are not adjacent, they are insufficient to determine the boundary information, and iteration may continue from step S2210: the next pair of adjacent image frames is taken and their frame difference information is calculated, so as to determine the boundary information as soon as possible. Once the boundary information is determined, the video picture of each anchor user can be determined accordingly.
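A minimal sketch of the search in steps S2210-S2220, under the simplifying assumptions that the splice is left/right (so the candidate lines are pixel columns) and that frames are grayscale nested lists: the two columns with the smallest frame-difference sums are located, and the boundary is accepted only if they are adjacent; otherwise None is returned, signalling the caller to iterate with the next pair of frames as in step S2230:

```python
def column_diff_sums(f1, f2):
    """Per-column sums of absolute pixel differences between two frames."""
    h, w = len(f1), len(f1[0])
    return [sum(abs(f1[y][x] - f2[y][x]) for y in range(h)) for x in range(w)]

def find_boundary(f1, f2):
    """Return (c, c + 1) when the two minimum-sum columns are adjacent
    (step S2220); otherwise None (caller moves to the next frame pair)."""
    sums = column_diff_sums(f1, f2)
    a, b = sorted(range(len(sums)), key=lambda c: sums[c])[:2]
    lo, hi = min(a, b), max(a, b)
    return (lo, hi) if hi == lo + 1 else None

# Synthetic 8x4 frames: everything changes between frames except the static
# seam at columns 3 and 4 (the adjacent edges of the two video images).
W, H = 8, 4
f1 = [[0 if x in (3, 4) else x + y for x in range(W)] for y in range(H)]
f2 = [[0 if x in (3, 4) else x + y + 1 for x in range(W)] for y in range(H)]
boundary = find_boundary(f1, f2)

# Counter-example: the two static columns are NOT adjacent, so no boundary
# can be concluded from this frame pair alone.
g1 = [[0 if x in (1, 5) else x + y for x in range(W)] for y in range(H)]
g2 = [[0 if x in (1, 5) else x + y + 1 for x in range(W)] for y in range(H)]
```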
Of course, after the boundary information and then the video pictures have been determined, the above process may continue to iterate, comparing the newly found boundary information with the previously determined boundary information; when the boundary information changes, the video picture of each anchor user is readjusted with the new boundary information and each anchor user's video image is repositioned, so that the partitioning of the anchor users' video images can be tracked and adjusted in real time.
To this end, in one embodiment, after the judgment of step S2220, when the two rows of pixels are adjacent, the positions of the two rows of pixels determined in this round are compared with those determined in the previous round: when the positions are unchanged, the boundary information is determined using the positions of the two rows of pixels; otherwise, step S2230 is executed to continue iteratively determining the boundary information using the frame difference information between the next pair of adjacent image frames.
From the above embodiment it can be understood that the boundary information between the video images of different anchor users is found in the mixed-picture video stream based on the frame difference information, and the video picture occupied by each anchor user's video image is then determined according to the boundary information. The computation cost is low, the demarcation is accurate and fast, and no external information is required; even if individual frames are lost during transmission of the mixed-picture video stream, the result is not affected. Moreover, dynamic changes of the video pictures of different anchor users within the mixed-picture video stream can be determined adaptively and accurately, so the video images of different anchor users can be accurately cropped.
On the basis of any embodiment of the present application, identifying the video picture occupied by the video image of each anchor user in the full picture of the mixed-picture video stream includes:
step S3100, extracting picture proportion information from the supplemental enhancement information of the mixed-picture video stream, the picture proportion information comprising the proportion occupied by each anchor user's video image in the mixed-picture video stream;
when the device responsible for the mixed-picture compositing, such as the terminal device of the first anchor user or the media server, composites the video images of the anchor users participating in the same co-streaming session, the video images are spliced according to certain picture proportions, so this picture proportion information can be transmitted to the audience user side along with the mixed-picture video stream, enabling the audience user side to calculate the video picture occupied by each anchor user's video image from the picture proportion information.
In one embodiment, to avoid adding a transmission channel, the picture proportion information may be added to the supplemental enhancement information (SEI) of the mixed-picture video stream and transmitted together with it, taking advantage of the facilities provided by related video protocols such as H.264 and H.265.
Step S3200, calculating and determining the video picture occupied by each anchor user's video image according to the picture proportion information, based on the full picture of the mixed-picture video stream.
After receiving the mixed-picture video stream, the terminal device on the audience user side reads the picture proportion information from the supplemental enhancement information and obtains the full-picture information of the stream, namely its size information; the video picture of each anchor user's video image can then be calculated from the proportion that the picture proportion information specifies for that video image within the full picture.
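Assuming, purely for illustration, that the picture proportion information is encoded as each anchor's horizontal share of a left-to-right splice, the audience side could recover the video pictures as follows:

```python
def frames_from_ratio(full_w, full_h, ratios):
    """Derive each anchor's video picture from the full-picture size and
    the picture proportion information.

    ratios: each anchor's horizontal share of the mixed picture (summing
    to 1.0) -- a hypothetical encoding of the proportion information.
    """
    frames, x = [], 0
    for r in ratios:
        w = round(full_w * r)
        frames.append({"x": x, "y": 0, "w": w, "h": full_h})
        x += w  # next video image starts where this one ends
    return frames

frames = frames_from_ratio(1280, 720, [0.5, 0.5])
```

In practice the proportion information would be parsed out of the stream's SEI payload rather than passed in directly.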
According to this embodiment, which can adapt to certain protocols, the device responsible for the mixed-picture compositing embeds the picture proportion information into the mixed-picture video stream and pushes it to the audience user side, and the terminal device on the audience user side can directly use the picture proportion information to calculate and determine the video picture occupied by each anchor user's video image, which is fast and convenient.
On the basis of any embodiment of the present application, referring to fig. 7, the drawing and displaying the video image of each anchor user to a corresponding target playing window of each anchor user includes:
step S4100, calculating the display resolution of the video image of each anchor user corresponding to the corresponding target playing window;
When the video images of the anchor users are to be drawn and displayed in their corresponding target playing windows, scaling usually occurs, and the quality of the scaled display can be characterized by the resulting display resolution. Therefore, the display resolution after each anchor user's video image is drawn into its target playing window can be calculated from the size of the video image cut out of the mixed-picture video stream and the size information of its target playing window. It can be appreciated that enlarging a video image typically reduces its effective display resolution, while shrinking a video image may introduce jaggies after display; of the two, the impact of enlargement is the more significant and therefore merits attention.
Step S4200, judging whether the display resolution of each video image is lower than a first resolution threshold and higher than a second resolution threshold, and when the judgment is positive, performing pixel interpolation on the corresponding video image in an interpolation manner to achieve image quality enhancement;
further, it is judged whether the display resolution of each video image is lower than the first resolution threshold and higher than the second resolution threshold, the second resolution threshold being lower than the first; when the judgment is positive, it indicates that the corresponding scaling operation causes image distortion, but to a relatively limited degree. The pixel interpolation may be implemented in a linear or nonlinear manner; for example, a bicubic interpolation algorithm or any other feasible algorithm may be selected flexibly by those skilled in the art.
Step S4300, judging whether the display resolution of each video image is lower than a second resolution threshold, and when the display resolution of each video image is lower than the second resolution threshold, adopting a preset super-resolution enhancement model to enhance the image quality of the corresponding video image;
it is then further judged whether the display resolution is lower than the second resolution threshold; when it is, the distortion of the scaled video image is severe, and ordinary interpolation can hardly achieve a good image quality enhancement effect, so a super-resolution enhancement model trained in advance to convergence can be used to enhance the image quality of the corresponding video image and thus greatly improve the quality of the scaled video image.
And step S4400, drawing and displaying the video image with the enhanced image quality to a corresponding target playing window.
After each anchor user's video image has been enhanced according to the image quality required for its display resolution, it can be drawn and displayed in its corresponding target playing window in the manner described above.
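The three-way decision of steps S4100-S4300 can be sketched as follows; the effective display resolution is approximated here by the smaller of the source height and the window height (enlarging a cutout cannot add detail), and the two thresholds are illustrative assumptions rather than values from the patent:

```python
def choose_enhancement(src_h, win_h, first_threshold=720, second_threshold=360):
    """Select an image-quality path from the effective display resolution.

    src_h: height of the video image cut out of the mixed-picture stream.
    win_h: height of the target playing window it is drawn into.
    Thresholds are illustrative; second_threshold < first_threshold.
    """
    effective = min(src_h, win_h)  # upscaling cannot add real detail
    if effective < second_threshold:
        return "super_resolution"  # step S4300: severe distortion, use model
    if effective < first_threshold:
        return "interpolation"     # step S4200: limited distortion, e.g. bicubic
    return "none"                  # no enhancement needed

path = choose_enhancement(src_h=320, win_h=1920)  # small cutout, full screen
```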
From the above embodiments it can be seen that, by distinguishing the video images of different anchor users, calculating their scaled display resolutions respectively, and applying different image quality enhancement processing to the corresponding video images based on the different levels of display resolution, it is ensured that after the video layout is adjusted, the video image of every anchor user participating in the co-streaming activity achieves a good image quality, guaranteeing a good viewing experience for the user.
On the basis of any embodiment of the present application, referring to fig. 8, after the video images of the anchor users are respectively drawn and displayed in their corresponding target playing windows, the method further includes:
step S1500, obtaining an image frame in the mixed picture video stream;
in a live broadcasting room, various animation special effects are frequently triggered, and such effects are often composited into the live video stream. An animation effect may be composited into the live video stream of a single anchor user, or into the full picture of the mixed-picture video stream; the former may be cropped away at the mixed-picture compositing stage, and the latter may be cropped when the video images of the different anchor users are cut out on the audience user side. In either case, when the full picture of the mixed-picture video stream is cropped, an animation effect that originally displayed normally may easily end up displayed incompletely after the video layout is adjusted. To address this, the present application obtains the image frames of the mixed-picture video stream frame by frame and performs subsequent processing on them.
Step 1600, segmenting the special effect image from the image frame by using a preset image segmentation model;
In order to identify the special effect images in the mixed-picture video stream, an image segmentation model is trained in advance with corresponding training samples. The image segmentation model can be constructed and trained on the basis of a deep learning model of the U-Net series, and is used to determine the image masks of the various special effect images in a given image frame, thereby obtaining the position information of each special effect image relative to the image frame.
Accordingly, the image frame is input into the image segmentation model, which segments out the image mask corresponding to each special effect image in the image frame, in practice obtaining the position of the special effect image within the image frame. This display position is typically determined by the position of the special effect image within the target play window having the largest display area after the video layout adjustment.
In addition, by segmenting the image frame according to the image mask, the special effect image itself can be obtained.
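Given an image mask such as the segmentation model would output, cutting out the special effect image and recovering its position can be sketched as follows (a hypothetical helper operating on binary nested-list masks and grayscale frames):

```python
def crop_by_mask(frame, mask):
    """Cut the special effect image out of a frame using a binary mask.

    Returns the cropped region (pixels outside the mask zeroed) and the
    top-left position of its bounding box within the frame.
    """
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for x in range(len(mask[0])) if any(row[x] for row in mask)]
    y0, y1, x0, x1 = min(ys), max(ys), min(xs), max(xs)
    crop = [[frame[y][x] if mask[y][x] else 0
             for x in range(x0, x1 + 1)] for y in range(y0, y1 + 1)]
    return crop, (x0, y0)

frame = [[9] * 6 for _ in range(6)]   # toy 6x6 image frame
mask = [[0] * 6 for _ in range(6)]    # effect occupies rows 2-3, cols 1-3
for y in range(2, 4):
    for x in range(1, 4):
        mask[y][x] = 1
effect, pos = crop_by_mask(frame, mask)
```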
Step 1700, determining the special effect type of the special effect image by adopting a preset special effect classification model;
in order to identify which animation effect a special effect image belongs to, a special effect classification model can be trained in advance: training sample and supervision label pairs are constructed from the mapping relationship data between the special effect images of the animation effects and their special effect types, and used to train the special effect classification model so that it acquires the ability to determine the corresponding special effect type from an input special effect image. The special effect classification model may be a neural-network-based deep learning model.
Accordingly, for the special effect image segmented in the previous step, the special effect image is input into the special effect classification model, and the type of the special effect to which the special effect image belongs can be determined.
Step 1800, executing a local special effect playing instruction corresponding to the special effect type, playing the corresponding animation special effect in the current graphical user interface in synchronization with the image frames, and covering, with the images of the locally played animation special effect, the special effect image appearing in the target playing window with the largest display area.
The live program locally predefines a local special effect playing instruction corresponding to each special effect type. According to the special effect type determined in the previous step, the corresponding local special effect playing instruction can be invoked and executed, with the display position of the special effect image in the current video layout passed in as a parameter; executing the local special effect playing instruction thus plays the corresponding animation special effect with its playing position aligned to that display position and synchronized with the image frames, with the result that the special effect images of the locally played animation cover the identical special effect image in the target playing window with the largest display area. Because the animation special effect is replayed in the local graphical user interface and covers the original special effect image in the largest display area, a better special effect playing result is obtained, and the special effect playing process is smoother and more natural.
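The correspondence between special effect types and local playing instructions can be sketched as a dispatch table; the registration decorator, the effect type name, and the player's return value are all illustrative assumptions, not the patent's actual interface:

```python
# Dispatch table: effect type -> local playing instruction (a callable that
# receives the display position recovered from the image mask, so the local
# replay covers the original special effect image).
local_players = {}

def register(effect_type):
    """Decorator that registers a local player for one effect type."""
    def wrap(fn):
        local_players[effect_type] = fn
        return fn
    return wrap

@register("gift_rocket")  # hypothetical effect type name
def play_rocket(position):
    # A real player would start the animation aligned to `position` and
    # synchronized with the incoming image frames.
    return f"rocket at {position}"

def play_local_effect(effect_type, position):
    """Invoke the local playing instruction for the classified effect type."""
    player = local_players.get(effect_type)
    return player(position) if player else None

result = play_local_effect("gift_rocket", (120, 300))
```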
According to the above embodiment, after the video layout of the co-streaming anchor users' video images in the mixed-picture video stream is adjusted, the special effect images in the mixed-picture video stream are further accurately identified, the corresponding special effect type is determined from each special effect image, and the corresponding local special effect playing instruction is then invoked and executed according to the special effect type, so that a special effect image pre-composited into the mixed-picture video stream can be reproduced with high picture quality, ensuring a good user experience.
Referring to fig. 9, a co-streaming video display device provided for one of the purposes of the present application includes a video streaming module 1100, a switching control module 1200, an image extraction module 1300, and an image display module 1400. The video streaming module 1100 is configured to obtain, during network live broadcasting, a mixed-picture video stream forwarded by a content distribution network, the mixed-picture video stream containing the video images of two or more co-streaming anchor users; the switching control module 1200 is configured to determine, in response to a video layout switching instruction of an audience user, the target playing window to which each anchor user defined by the video layout switching instruction belongs, at least two of the target playing windows having a stacking relationship; the image extraction module 1300 is configured to identify the video pictures occupied by the video images of the anchor users in the full picture of the mixed-picture video stream, and extract the video image of each anchor user according to its video picture; the image display module 1400 is configured to respectively draw and display the video images of the anchor users in their corresponding target playing windows.
On the basis of any embodiment of the present application, the image extraction module 1300 includes: a frame pair extracting unit configured to extract at least two adjacent image frames in the mixed picture video stream; the boundary analysis unit is used for calculating frame difference information between at least two adjacent image frames and determining boundary information between video images of each anchor user according to the frame difference information; and the picture splitting unit is used for determining the video picture occupied by the video image of each anchor user according to the boundary information and the full picture of the mixed picture video stream.
On the basis of any embodiment of the present application, the boundary analysis unit includes: a boundary searching subunit, configured to search, based on the frame difference information between each pair of adjacent image frames, for the two rows of pixels whose sums of the pixel difference values of all pixel points in the same row are the smallest in the mixed-picture video stream; a boundary judging subunit, configured to judge whether the two rows of pixels are adjacent in position and, when they are, determine the boundary information using the positions of the two rows of pixels; and an iteration processing subunit, configured to iteratively determine the boundary information using the frame difference information between the next pair of adjacent image frames when the boundary information is not determined.
On the basis of any embodiment of the present application, the boundary discrimination subunit is further configured to: when the positions of the two rows of pixels remain unchanged across frame pairs, determine the boundary information from the positions of the two rows of pixels; otherwise, continue to iteratively determine the boundary information using the frame difference information between the next pair of adjacent image frames.
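The frame-difference boundary search above can be sketched as follows, assuming two anchor images stacked vertically in a NumPy frame; the function name and single-frame-pair signature are illustrative assumptions. The intuition is that pixels on the static seam between the two images change least between adjacent frames.

```python
import numpy as np

def find_boundary_row(frame_a: np.ndarray, frame_b: np.ndarray):
    """Locate the seam between two vertically stacked anchor images.

    The two rows with the smallest summed absolute frame difference are
    the boundary candidates; they only qualify when positionally
    consecutive, otherwise the caller retries with the next frame pair.
    """
    # Per-pixel absolute difference between the adjacent frames
    # (cast to int32 so unsigned pixel values cannot wrap around).
    diff = np.abs(frame_a.astype(np.int32) - frame_b.astype(np.int32))
    # One score per row: sum the differences of all pixels in that row.
    row_scores = diff.reshape(diff.shape[0], -1).sum(axis=1)
    # The two rows with the smallest scores.
    r1, r2 = sorted(int(i) for i in np.argsort(row_scores)[:2])
    # Only positionally consecutive rows form a valid boundary.
    return r1 if r2 - r1 == 1 else None
```

Returning `None` models the iterative branch: the caller feeds in the next pair of adjacent frames until a consecutive pair of minimum-difference rows is found.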
On the basis of any embodiment of the present application, the image extraction module 1300 includes: a proportion obtaining unit, configured to extract picture proportion information from the supplemental enhancement information of the mixed-picture video stream, where the picture proportion information includes the proportion of each anchor user's video image within the mixed-picture video stream; and a picture determining unit, configured to calculate the video picture occupied by each anchor user's video image according to the picture proportion information, based on the full picture of the mixed-picture video stream.
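The proportion-based split can be sketched as follows, assuming a simple left-to-right side-by-side layout in which the supplemental enhancement information carries one horizontal share per anchor summing to 1.0; this field layout is an assumption for illustration, not the patent's actual SEI syntax.

```python
def split_pictures(full_width: int, full_height: int, ratios: list) -> list:
    """Turn per-anchor width ratios from the picture proportion
    information into pixel rectangles within the full picture."""
    rects, x = [], 0
    for r in ratios:
        w = round(full_width * r)  # this anchor's horizontal share in pixels
        rects.append({"x": x, "y": 0, "w": w, "h": full_height})
        x += w                     # next anchor starts where this one ends
    return rects
```

For a 1920x1080 mixed picture with shares `[0.5, 0.25, 0.25]`, this yields three rectangles of widths 960, 480, and 480, which can then be fed to the cropping step.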
On the basis of any embodiment of the present application, the image display module 1400 includes: an image quality analysis unit, configured to calculate the display resolution of each anchor user's video image in its corresponding target playing window; an interpolation enhancing unit, configured to judge whether the display resolution of each video image is lower than a first resolution threshold but higher than a second resolution threshold and, when it is, perform pixel interpolation on that video image to enhance its image quality; a depth enhancement unit, configured to judge whether the display resolution of each video image is lower than the second resolution threshold and, when it is, enhance that video image's quality with a preset super-resolution enhancement model; and a display processing unit, configured to draw and display the quality-enhanced video images in their corresponding target playing windows.
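The two-threshold decision can be sketched as follows; the concrete threshold values (expressed here as pixel counts) and the return labels are illustrative assumptions, not values taken from the patent.

```python
def choose_enhancement(width: int, height: int,
                       first_threshold: int = 1280 * 720,
                       second_threshold: int = 640 * 360) -> str:
    """Pick the image-quality path for one anchor's target playing window.

    Below the second threshold the heavier super-resolution model runs;
    between the two thresholds plain pixel interpolation is cheap and
    sufficient; at or above the first threshold no enhancement is needed.
    """
    pixels = width * height
    if pixels < second_threshold:
        return "super_resolution"  # preset super-resolution enhancement model
    if pixels < first_threshold:
        return "interpolate"       # e.g. bilinear/bicubic pixel interpolation
    return "none"
```

The ordering matters: the cheaper interpolation branch is only taken once the super-resolution case has been excluded, mirroring the two judgments made by the interpolation enhancing unit and the depth enhancement unit.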
On the basis of any embodiment of the application, the co-streaming video display device of the application further comprises: an image frame calling module, configured to acquire an image frame from the mixed-picture video stream; a special-effect image segmentation module, configured to segment the special-effect image out of the image frame with a preset image segmentation model; a special-effect type determining module, configured to determine the special-effect type of the special-effect image with a preset special-effect classification model; and a special-effect local playing module, configured to execute the local special-effect playing instruction corresponding to the special-effect type, playing the corresponding animation special effect in the current graphical user interface in synchronization with the image frame, so that the image in the animation special effect covers the special-effect image appearing in the target playing window with the largest display area.
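The segment-classify-dispatch flow can be sketched as follows. The two callables stand in for the preset image segmentation and special-effect classification models, and the lookup table mapping an effect type to a local animation asset is a hypothetical structure introduced only for illustration.

```python
def play_local_effect(frame, segment_effect, classify_effect, effect_table):
    """Map a special effect baked into the mixed frame to a local animation.

    `segment_effect(frame)` returns a mask for the special-effect image,
    or None when the frame carries no effect; `classify_effect` assigns
    the effect type; `effect_table` maps that type to a local asset.
    """
    mask = segment_effect(frame)
    if mask is None:
        return None                      # no effect present in this frame
    effect_type = classify_effect(frame, mask)
    return effect_table.get(effect_type)  # None for unknown effect types
```

In use, the returned asset would be played in the graphical user interface in sync with the frame, overlaying the effect region found in the largest target playing window.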
In order to solve the above technical problems, an embodiment of the present application further provides an electronic device, whose internal structure is shown schematically in fig. 10. The electronic device includes a processor, a computer-readable storage medium, a memory, and a network interface connected by a system bus. The computer-readable storage medium of the electronic device stores an operating system, a database, and computer-readable instructions; the database can store a sequence of control information, and the computer-readable instructions, when executed by the processor, cause the processor to implement a co-streaming video display method. The processor of the electronic device provides the computing and control capabilities that support the operation of the entire device. The memory of the electronic device may store computer-readable instructions that, when executed by the processor, cause the processor to perform the method of the present application. The network interface of the electronic device is used for communicating with a terminal. Those skilled in the art will appreciate that the structure shown in fig. 10 is merely a block diagram of the portion of the structure relevant to the present application and does not limit the electronic devices to which the present application may be applied; a particular electronic device may include more or fewer components than shown, combine certain components, or arrange components differently.
The processor in this embodiment performs the specific functions of the modules and their units shown in fig. 9, and the memory stores the program code and the various kinds of data required to execute those modules and units. The network interface is used for data transmission with a user terminal or a server. The memory stores the program code and data required to execute all the modules and units of the co-streaming video display device of the present application, and the server can call them to perform the functions of each unit.
The present application also provides a storage medium storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the co-streaming video display method of any embodiment of the present application.
The present application also provides a computer program product comprising computer programs/instructions that, when executed by one or more processors, implement the steps of the method of any embodiment of the present application.
Those skilled in the art will appreciate that all or part of the methods of the above embodiments may be implemented by a computer program stored on a computer-readable storage medium; when executed, the program may carry out the steps of the method embodiments described above. The storage medium may be a computer-readable storage medium such as a magnetic disk, an optical disc, a read-only memory (ROM), or a random access memory (RAM).
In summary, the present application lets the client flexibly adjust, during network live broadcasting, the layout of each anchor user's video image within the mixed-picture video stream it receives. This saves network traffic, helps realize the cost advantage of the live-broadcast back-end deployment, and improves the user experience of watching the live video stream generated by co-streaming.

Claims (10)

1. A co-streaming video display method, characterized by comprising the following steps:
acquiring a mixed-picture video stream forwarded by a content distribution network during network live broadcasting, the mixed-picture video stream comprising video images of two or more co-streaming anchor users;
in response to a video layout switching instruction from a viewer user, determining the target playing window to which each anchor user defined by the video layout switching instruction belongs, wherein some of the target playing windows have a stacking relationship;
identifying the video picture occupied by each anchor user's video image in the full picture of the mixed-picture video stream, and extracting each anchor user's video image according to its video picture;
drawing and displaying each anchor user's video image in that anchor user's corresponding target playing window.
2. The co-streaming video display method according to claim 1, wherein identifying the video picture occupied by each anchor user's video image in the full picture of the mixed-picture video stream comprises:
extracting at least two adjacent image frames from the mixed-picture video stream;
calculating frame difference information between the at least two adjacent image frames, and determining the boundary information between the anchor users' video images according to the frame difference information;
determining the video picture occupied by each anchor user's video image according to the boundary information and the full picture of the mixed-picture video stream.
3. The co-streaming video display method according to claim 2, wherein calculating frame difference information between the at least two adjacent image frames and determining the boundary information between the anchor users' video images according to the frame difference information comprises:
based on the frame difference information between each pair of adjacent image frames, finding the two rows of pixels in the mixed-picture video stream whose sums of pixel difference values over all pixels in the row are the smallest;
judging whether the two rows of pixels are positionally consecutive, and determining the boundary information from the positions of the two rows of pixels when they are;
when the boundary information is not determined, iteratively determining it using the frame difference information between the next pair of adjacent image frames.
4. The co-streaming video display method according to claim 3, wherein determining the boundary information from the positions of the two rows of pixels when the two rows of pixels are positionally consecutive comprises:
when the positions of the two rows of pixels remain unchanged across frame pairs, determining the boundary information from the positions of the two rows of pixels; otherwise, continuing to iteratively determine the boundary information using the frame difference information between the next pair of adjacent image frames.
5. The co-streaming video display method according to claim 1, wherein identifying the video picture occupied by each anchor user's video image in the full picture of the mixed-picture video stream comprises:
extracting picture proportion information from the supplemental enhancement information of the mixed-picture video stream, the picture proportion information comprising the proportion of each anchor user's video image within the mixed-picture video stream;
calculating the video picture occupied by each anchor user's video image according to the picture proportion information, based on the full picture of the mixed-picture video stream.
6. The co-streaming video display method according to any one of claims 1 to 5, wherein drawing and displaying each anchor user's video image in that anchor user's corresponding target playing window comprises:
calculating the display resolution of each anchor user's video image in its corresponding target playing window;
judging whether the display resolution of each video image is lower than a first resolution threshold but higher than a second resolution threshold and, when it is, performing pixel interpolation on that video image to enhance its image quality;
judging whether the display resolution of each video image is lower than the second resolution threshold and, when it is, enhancing that video image's quality with a preset super-resolution enhancement model;
drawing and displaying the quality-enhanced video image in its corresponding target playing window.
7. The co-streaming video display method according to any one of claims 1 to 5, further comprising, after each anchor user's video image is drawn and displayed in that anchor user's corresponding target playing window:
acquiring an image frame from the mixed-picture video stream;
segmenting the special-effect image out of the image frame with a preset image segmentation model;
determining the special-effect type of the special-effect image with a preset special-effect classification model;
executing the local special-effect playing instruction corresponding to the special-effect type, playing the corresponding animation special effect in the current graphical user interface in synchronization with the image frame, so that the image in the animation special effect covers the special-effect image appearing in the target playing window with the largest display area.
8. A co-streaming video display device, characterized by comprising:
a video streaming module, configured to acquire a mixed-picture video stream forwarded by a content distribution network during network live broadcasting, the mixed-picture video stream comprising video images of two or more co-streaming anchor users;
a switching control module, configured to determine, in response to a video layout switching instruction from a viewer user, the target playing window to which each anchor user defined by the video layout switching instruction belongs, wherein at least two of the target playing windows have a stacking relationship;
an image extraction module, configured to identify the video picture occupied by each anchor user's video image in the full picture of the mixed-picture video stream and extract each anchor user's video image according to its video picture;
an image display module, configured to draw and display each anchor user's video image in that anchor user's corresponding target playing window.
9. An electronic device comprising a central processor and a memory, characterized in that the central processor is configured to invoke a computer program stored in the memory to perform the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that it stores, in the form of computer-readable instructions, a computer program implementing the method according to any one of claims 1 to 7, which, when invoked by a computer, performs the steps of the corresponding method.
CN202310332814.4A 2023-03-30 2023-03-30 Continuous-wheat video display method and device, equipment and medium thereof Pending CN116366871A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310332814.4A CN116366871A (en) 2023-03-30 2023-03-30 Continuous-wheat video display method and device, equipment and medium thereof


Publications (1)

Publication Number Publication Date
CN116366871A true CN116366871A (en) 2023-06-30



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination