CN114339393A - Display processing method, server, device, system and medium for live broadcast picture - Google Patents

Display processing method, server, device, system and medium for live broadcast picture

Info

Publication number
CN114339393A
Authority
CN
China
Prior art keywords
target object
data
background image
live broadcast
image data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111362016.3A
Other languages
Chinese (zh)
Inventor
曾家乐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Cubesili Information Technology Co Ltd
Original Assignee
Guangzhou Cubesili Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Cubesili Information Technology Co Ltd filed Critical Guangzhou Cubesili Information Technology Co Ltd
Priority to CN202111362016.3A priority Critical patent/CN114339393A/en
Publication of CN114339393A publication Critical patent/CN114339393A/en
Pending legal-status Critical Current

Abstract

The application discloses a display processing method for a live broadcast picture, a server, an electronic device, a live broadcast system and a storage medium. The method comprises the following steps: acquiring target object data of a target object and background image data of a background image; and sending the target object data and the background image data to a user terminal, so that the user terminal combines the target object data and the background image data to display the layer of the target object above the layer of the background image in the live broadcast picture, and, in response to a view angle adjustment instruction, performs on the target object displayed in the live broadcast picture a view angle adjustment matched with the instruction. In this way, the interactive function of the live broadcast picture is enhanced.

Description

Display processing method, server, device, system and medium for live broadcast picture
Technical Field
The present application relates to the field of live broadcast technologies, and in particular, to a live broadcast screen display processing method, a server, an electronic device, a live broadcast system, and a storage medium.
Background
With the popularization of intelligent devices and the development of communication technologies, society has entered the era of intelligent interconnection. Network communication speeds keep increasing, and people can conveniently browse the network with intelligent devices. Live broadcast technology enriches the usage scenarios of intelligent devices: people can watch or start a live broadcast anytime and anywhere, which enriches people's lives.
Live broadcast pictures in current live broadcast rooms offer few interaction functions for audience users, so the live broadcast picture easily appears dull and audience user stickiness is poor.
Disclosure of Invention
The technical problem mainly solved by the present application is to provide a display processing method for a live broadcast picture, a server, an electronic device, a live broadcast system and a storage medium that can enhance the interactive function of the live broadcast picture.
In order to solve the above technical problem, the first technical solution adopted by the present application is to provide a display processing method for a live broadcast picture, the method comprising: acquiring target object data of a target object and background image data of a background image; and sending the target object data and the background image data to a user terminal, so that the user terminal combines the target object data and the background image data to display the layer of the target object above the layer of the background image in the live broadcast picture, and, in response to a view angle adjustment instruction, performs on the target object displayed in the live broadcast picture a view angle adjustment matched with the instruction.
In order to solve the above technical problem, the second technical solution adopted by the present application is to provide a display processing method for a live broadcast picture, the method comprising: acquiring target object data of a target object and background image data of a background image; combining the target object data and the background image data to display the layer of the target object above the layer of the background image in the live broadcast picture; and, in response to a view angle adjustment instruction, performing at least on the target object displayed in the live broadcast picture a view angle adjustment matched with the instruction.
In order to solve the above technical problem, the third technical solution adopted by the present application is: there is provided a server comprising a processor, a transceiver and a memory, the memory and the transceiver being respectively coupled to the processor, the memory storing a computer program, the processor being capable of executing the computer program to implement the display processing method as described above.
In order to solve the above technical problem, a fourth technical solution adopted by the present application is: there is provided an electronic device comprising a display, a processor, a transceiver and a memory, the display, the memory and the transceiver being respectively coupled to the processor, the memory storing a computer program, the processor being capable of executing the computer program to implement the display processing method as described above.
In order to solve the above technical problem, a fifth technical solution adopted by the present application is: a live system comprising a server as described above and an electronic device as described above, the server and the electronic device being communicatively connected.
In order to solve the above technical problem, a sixth technical solution adopted by the present application is: there is provided a computer-readable storage medium storing a computer program executable by a processor to implement the display processing method as described above.
The beneficial effect of this application is: different from the prior art, the target object data and the background image data are sent to the user terminal, so that after the user terminal combines them, the layer of the target object presented in the live broadcast picture sits above the layer of the background image. The two layers are distinct and relatively independent rather than a single picture, so the target object is not constrained by the background image. The user can therefore interact with the target object or the background image independently, for example by adjusting the view angle, and the user terminal can perform a corresponding view angle adjustment on the target object in response to a view angle adjustment instruction. This effectively enhances the interactive function of the live broadcast picture, makes it convenient for the user to adjust the view angle of the target object during the live broadcast, improves the playability and viewing experience of the live broadcast, reduces its dullness, and thereby improves user stickiness.
Drawings
Fig. 1 is a schematic system composition diagram of an embodiment of a live broadcast system of the present application;
fig. 2 is a flowchart illustrating a first embodiment of a display processing method for a live view according to the present application;
FIG. 3 is a schematic timing diagram illustrating a display processing method for a live view according to a first embodiment of the present application;
fig. 4 is an exemplary scene diagram of a first embodiment of a display processing method for a live view according to the present application;
fig. 5 is another exemplary scene diagram of the first embodiment of the display processing method of the live view of the present application;
FIG. 6 is a schematic diagram of a global color histogram of a first embodiment of a display processing method for a live view according to the present application;
fig. 7 is a flowchart illustrating a display processing method for a live view according to a second embodiment of the present application;
FIG. 8 is a schematic block diagram of a circuit configuration of an embodiment of a server of the present application;
FIG. 9 is a schematic block diagram of a circuit configuration of an embodiment of an electronic device of the present application;
FIG. 10 is a schematic block diagram of the circuit structure of the computer readable storage medium of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Through long-term research, the inventor of the present application found that the live picture of an anchor during a live broadcast is relatively static. For example, changes in the picture often depend only on the anchor's own expressions and actions, which the audience cannot interact with or operate on, and other objects in the live broadcast picture are often still, so the user cannot change the state of the live broadcast picture. The main exchange between the audience and the live broadcast is that, when a viewer gives a virtual gift, a corresponding special effect is rendered on the live broadcast picture, which does not actually change the original appearance of the picture. As the live broadcast continues, this relatively static picture gives a dull impression and lacks interactive functions, resulting in poor viewing stickiness of the audience. In order to improve on the above technical problem, the present application proposes the following embodiments.
As shown in fig. 1, the live system 1 described in this embodiment may include a server 10, an anchor terminal 20 and a viewer terminal 30. The anchor terminal 20 and the viewer terminal 30 may be electronic devices; specifically, they are electronic devices installed with corresponding clients, and may be mobile terminals, computers, servers or other terminals. The mobile terminals may be mobile phones, notebook computers, tablet computers, intelligent wearable devices and the like, and the computers may be desktop computers and the like. Both the anchor terminal 20 and the viewer terminal 30 are user terminals. The server 10 may pull the live data stream from the anchor terminal 20 and push the obtained live data stream to the viewer terminal 30. After the viewer terminal 30 acquires the live data stream, the live broadcast of the anchor or a guest can be watched. The mixing of live data streams may occur at at least one of the server 10, the anchor terminal 20 and the viewer terminal 30. Video or voice connections may be made between anchor terminals 20, and between the anchor terminal 20 and the viewer terminals 30. In a video co-streaming session, each connected party may push a live data stream including a video stream to the server 10, which then pushes the corresponding live data to the other connected parties and the viewer terminals 30. The anchor terminal 20 and the viewer terminal 30 can each display the live pictures of the live room. A target object, such as the anchor's face or another object, may be displayed in the live picture. Of course, the anchor terminal 20 and the viewer terminal 30 are relative: the terminal that is broadcasting is the anchor terminal 20, and the terminal that is watching the live broadcast is the viewer terminal 30.
As shown in fig. 2, a first embodiment of a method for displaying a live view according to the present application, for example, with a server 10 as an execution subject, includes:
s100: target object data of a target object and background image data of a background image are acquired.
The target object is, for example, an object displayed in a live view of a live broadcast room, and may be a human face object, a torso object, an animal object, or another still object. The target object data is, for example, data including at least image data of the target object. The background image is, for example, a background formed based on a scene in which an object is present in a live view, a scene for setting off an object, or the like. The background image data is, for example, data including at least image data of a background image.
There are various ways in which the server 10 acquires the target object data and the background image data. First, the server 10 may acquire target object data and background image data from a user terminal, such as the anchor terminal 20. Secondly, the server 10 may also generate target object data and background image data after performing corresponding processing on the acquired video stream. Of course, the server 10 may also obtain the target object data and the background image data from a cloud, a usb disk, or other devices.
As shown in fig. 3, when the server 10 obtains the target object data and the background image data by generating them itself, the following steps may specifically be involved:
S110: Acquiring a preset video frame in the video stream corresponding to the live broadcast picture.
The video stream is generated by the anchor terminal 20 during the live broadcast. The server 10 may obtain a live data stream including a video stream during a live process. The viewer terminal 30 may present a live view upon receiving its corresponding live data stream. The server 10 may capture a preset video frame in the video stream. The preset video frame may refer to one or more frames of video determined in the video stream in which the target object is displayed. During the live broadcast, the server 10 may capture preset video frames at preset intervals or triggered by preset conditions.
S120: and carrying out segmentation processing on the preset video frame to generate target object data and background image data.
After the server 10 collects the preset video frame, it performs segmentation processing on the preset video frame, for example, segmenting the target object from the preset video frame, so that the target object and the background are separated from each other and independent from each other. The server 10 may segment the preset video frame to obtain target object data and background image data, so that the server 10 may obtain the target object data and the background image data.
As for a specific segmentation process, the following steps included in step S120 can be referred to:
s121: and identifying the target object in the preset video frame.
After the preset video frame is acquired, the server 10 identifies the target object by using a corresponding identification algorithm, for example, a deep neural network (such as a target detection algorithm, an image segmentation network), and the like. Further, the target object may be identified using the YOLO algorithm. For example, the target object may be a human face object, and the server 10 recognizes the human face object in an image of a preset video frame by using a human face recognition algorithm.
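For illustration only, the sketch below shows how such identification might be wired up; it uses a classical OpenCV face detector as a stand-in for the deep-network detectors (e.g. YOLO) named above, and the function name is hypothetical:

```python
# Illustrative sketch only: a classical OpenCV detector stands in for the
# deep-network recognition algorithm (e.g. YOLO) described in the text.
import cv2

def detect_target_object(frame_bgr):
    """Return the bounding box (x, y, w, h) of the first detected face, or None."""
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    return tuple(faces[0]) if len(faces) else None
```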
S122: after the target object is identified, the target object is segmented from the preset video frame to obtain target object data.
Specifically, the server 10 may segment the target object from the preset video frame using a neural network algorithm, such as an image segmentation network, and obtain target object data. The target object data may include, for example, position data of the target object, segmentation position data, image data, and the like.
The segmentation position of the target object is, for example, a first segmentation position, and the server 10 may perform segmentation on the preset video frame by using the first segmentation position. Specifically, the following steps included in step S122 may be referred to:
s1221: a first segmentation location for segmenting a target object is determined in a preset video frame.
Alternatively, the server 10 may determine the first division position in the preset video frame using an image division network. For example, after identifying the target object, the image segmentation network may determine a first segmentation location along an outer edge of the target object. The first division position may be represented in coordinate data, such as a coordinate position in a preset video frame, or a coordinate position in a display screen.
S1222: and segmenting the target object from the preset video frame according to the first segmentation position to obtain the image data of the target object and the first segmentation position data corresponding to the first segmentation position.
After determining the first segmentation position, the server 10 may segment the preset video frame along/according to the first segmentation position, and further segment the target object from the preset video frame, so that the target object and the remaining background are separated from each other. In this way, the image data of the segmented target object and the first segmentation position data, that is, the above-mentioned target object data, can be acquired. The first segmentation position data is convenient for a subsequent user terminal to paste the target object back to the background image again to realize image combination.
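As a minimal sketch of this step, assuming the image segmentation network outputs a binary mask for the target object, the image data and the first segmentation position data could be extracted as follows (names are hypothetical):

```python
import numpy as np

def segment_target(frame_bgr, target_mask):
    """target_mask: uint8 array, non-zero where the target object lies
    (assumed output of a segmentation network).
    Returns the RGBA image data of the target object and the first
    segmentation position (top-left corner in frame coordinates)."""
    ys, xs = np.where(target_mask > 0)
    x0, y0, x1, y1 = xs.min(), ys.min(), xs.max() + 1, ys.max() + 1
    crop = frame_bgr[y0:y1, x0:x1]
    alpha = target_mask[y0:y1, x0:x1]
    target_rgba = np.dstack([crop, alpha])      # image data of the target object
    first_position = (int(x0), int(y0))         # first segmentation position data
    return target_rgba, first_position
```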
S123: and filling the residual background of the preset video frame after the target object is segmented out to obtain background image data.
After the target object is segmented, the remaining background of the preset video frame may have a vacant area originally occupied by the target object. The server 10 may further perform filling processing on the remaining background of the preset video frame after the target object is segmented, and may specifically fill the vacant region, so that the remaining background may become a complete background image, and further obtain background image data.
There are many ways to perform the filling process on the remaining background after the target object is segmented from the preset video frame; some of them are listed below by way of example.
The first mode is as follows: the vacant area can be filled and completed by a generative adversarial network (GAN). See in particular the following steps:
S1231: Using a trained generative adversarial network to fill and complete the vacant area in the remaining background after the target object is segmented from the preset video frame.
A generative adversarial network is a deep learning model and an artificial intelligence algorithm for unsupervised machine learning. It generally includes two parts, a generator and a discriminator, which realize their functions by fitting the corresponding generating and discriminating functions. Specifically, both the generator and the discriminator may be neural network models. For example, the generator fills the remaining background to create a new background image, and the discriminator judges whether the new background image looks "real"; if not, the result is fed back to the generator, which continues to optimize and generates another new background image, forming a dynamic game. The aim is that the generator can finally produce a background image realistic enough to pass as genuine, so that the remaining background is completely filled.
Therefore, using the generative adversarial network to fill and complete the vacant area makes the background image more natural, harmonious and realistic, further improves the visual effect and attractiveness, and helps improve user stickiness.
The second mode is as follows: collecting adjacent colors with a neural network to complete the vacant area. See in particular the following steps:
S1232: Using a neural network to collect color information adjacent to the vacant region of the preset video frame after the target object is segmented, and filling the vacant region with color.
Completing the vacant area with the color information around it allows the filled area to blend with its neighboring regions, making the background image more harmonious and natural as a whole. Color information may refer to information or data related to color, and may include, for example, one or more of color values, saturation, contrast, brightness, and the like. Using a neural network to collect adjacent color information and fill the vacant area makes the filled area more natural and vivid, reduces filling traces, and finally makes the background image more complete and harmonious.
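As a rough sketch of the idea, the snippet below uses classical OpenCV inpainting as a stand-in for the neural-network fill described here; it likewise propagates colors from the pixels adjacent to the vacant region:

```python
import cv2

def fill_vacant_region(remaining_background_bgr, vacant_mask):
    """vacant_mask: uint8 array, non-zero over the area left empty by the
    segmented target object. Classical inpainting propagates neighboring
    colors into the vacant region (a stand-in, not the network itself)."""
    return cv2.inpaint(remaining_background_bgr, vacant_mask, 5, cv2.INPAINT_TELEA)
```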
Of course, in addition to the above-mentioned manner in which the server 10 performs segmentation processing on the acquired preset video frame and fills the remaining background, the server 10 may also directly acquire the target object data and the background image data transmitted by the anchor terminal 20. In other words, the steps described above regarding how the target object data and the background image data are specifically obtained may also be performed on the anchor terminal 20. After the anchor terminal 20 acquires the video stream through a camera or the like, one or more frames of the video stream may be collected as preset video frames, the preset video frames are segmented, the described target object data and background image data are generated and sent to the server 10, and then the server 10 may acquire the target object data and the background image data.
S200: and sending the target object data and the background image data to a user terminal, so that the user terminal can be used for combining the target object data and the background image data to display the layer of the target object on the layer of the background image in the live broadcast picture, and the user terminal can be further used for responding to a view angle adjusting instruction to perform view angle adjustment matched with the view angle adjusting instruction on the target object displayed in the live broadcast picture.
The server 10 transmits the acquired target object data and background image data to a user terminal, such as the viewer terminal 30, along with the live data stream. The viewer terminal 30 may process the video stream accordingly so that the layer of the target object in the live view presented by the video frame is displayed on the layer of the background image. In other words, the target object and the background image are in the mutually independent state, so that the target object can be correspondingly operated on the premise of not being influenced by the background image, and the operation on the target object does not influence or is not limited by the background image.
Specifically, the user terminal may be configured to combine image data of the target object and background image data according to the first segmentation position data, so as to display the layer of the target object on the layer of the background image at the first segmentation position in the live broadcast picture. In other words, the user terminal may reattach the target object to the original position in the background image, that is, the first segmentation position, but the layer of the target object is displayed on the layer of the background image, so that the target object can hide the corresponding area of the background image, and visually restore the effect before segmentation, but at the same time, the target object can be visually adjusted and is no longer limited by the background image.
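A minimal compositing sketch of what the user terminal might do with the received data, assuming the RGBA target layer and the first segmentation position produced earlier (names are hypothetical):

```python
import numpy as np

def composite_layers(background_bgr, target_rgba, first_position):
    """Alpha-blend the target-object layer above the background layer
    at the first segmentation position."""
    x0, y0 = first_position
    h, w = target_rgba.shape[:2]
    roi = background_bgr[y0:y0 + h, x0:x0 + w].astype(np.float32)
    rgb = target_rgba[:, :, :3].astype(np.float32)
    alpha = target_rgba[:, :, 3:4].astype(np.float32) / 255.0
    out = background_bgr.copy()
    out[y0:y0 + h, x0:x0 + w] = (alpha * rgb + (1.0 - alpha) * roi).astype(np.uint8)
    return out
```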
Specifically, the user terminal may perform, in response to the view angle adjustment instruction, view angle adjustment matched with the view angle adjustment instruction on the target object displayed in the live view. In practical applications, the user may perform corresponding operations while watching the live broadcast through the viewer terminal 30. The viewer terminal 30 may generate a viewing angle adjustment instruction in response to a corresponding operation by the user, and perform a viewing angle adjustment on the target object in response to the viewing angle adjustment instruction.
The view angle adjustment is, for example, at least one of rotation, deformation, zooming, panning and perspective of the target object displayed in the live broadcast picture. For example, the target object may follow the user's viewing perspective so that it stays oriented towards the user's viewing angle. For example, the target object may be deformed to correspond to the user's perspective: when the user watches the live picture from a side view, the target object may be deformed accordingly to suit the side view in a more natural manner. For example, the target object may be adapted to the user's perspective view. Of course, there are many ways to adjust the view angle, and the above are only examples. That the target object can be adjusted in view angle means that it can achieve a 3D effect or a 3D-like effect: when a user watches the live picture from different view angles, the target object is adjusted accordingly and thus presents a 3D or 3D-like effect.
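For example, a simple in-plane rotation of only the target-object layer could look like the sketch below; real perspective or 3D adjustments would use a full projective transform, which is not specified by the text:

```python
import cv2

def adjust_view_angle(target_rgba, angle_deg, scale=1.0):
    """Rotate/scale only the target-object layer; the background layer
    underneath is left untouched."""
    h, w = target_rgba.shape[:2]
    m = cv2.getRotationMatrix2D((w / 2.0, h / 2.0), angle_deg, scale)
    return cv2.warpAffine(target_rgba, m, (w, h), flags=cv2.INTER_LINEAR,
                          borderMode=cv2.BORDER_CONSTANT, borderValue=(0, 0, 0, 0))
```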
By sending the relatively independent target object data and background image data to the user terminal, the layer of the target object presented in the live broadcast picture sits above the layer of the background image after the user terminal combines them. The target object and the background image remain relatively independent instead of forming a single picture, so the target object is not limited by the background image.
The above scheme is described below with an exemplary scenario:
as shown in fig. 4, after the server 10 acquires the preset video frame 500, the target object 510 is identified in the preset video frame by using a corresponding identification algorithm. After identifying the target object 510, the server 10 determines a first segmentation position by using the image segmentation network, and segments the target object 510 from the preset video frame 500 according to the first segmentation position to obtain image data of the target object, the first segmentation position data, and the like. The server 10 performs filling processing on the remaining background in the preset video frame to obtain a background image 520, and further obtains background image data and the like.
As shown in fig. 5, the user terminal, for example, the viewer terminal 30, may combine the target object data including the image data of the target object and the first division position data with the background image data, and may display the layer of the target object 510 on the layer of the background image 520 on the live view 530 of the user terminal. Since the target object 510 and the background image 520 are two layers independent of each other, the target object 510 can be adjusted. An exemplary coordinate axis oxyz is shown in fig. 4. For example, the target object 510 may be rotated about any axis, or moved in a certain direction, and more complex adjustments may be made. The target object 510 may present a 3D effect or a 3D-like effect in the live screen 530. The user terminal may perform a matching perspective adjustment on the target object 510 in response to the perspective adjustment instruction, for example, rotating the target object 510 around the z-axis, so that the orientation of the target object may be changed. Taking the face object of the anchor shown in fig. 5 as an example, the face object in fig. 5 rotates around the z-axis relative to the face object in fig. 4, so that the face object can present a 3D or 3D-like effect, and the interaction effect with the audience is improved.
Further, when the server 10 performs the segmentation process on the preset video frame, the segmentation process may also be performed on the local feature object on the target object. Specifically, the following steps included in step S122 may be referred to:
s1223: local feature objects of the target object are further identified in the target object.
The local feature object is, for example, a local feature on the target object. When identifying the target object, the server 10 may further identify a local feature object of the target object in the target object. Taking the target object as a face object as an example, the local feature objects include an eye object, a mouth object, a nose object, and the like, for example. For example, a human face object is recognized through a facial neural network, and a hair object, an eyebrow object, an eye object, an eyeball object, an eyelash object, a lip object, an ear object, a facial expression, and the like are on the human face object.
S1224: and segmenting the local characteristic object from the target object to obtain local characteristic data of the local characteristic object.
Specifically, the target object is segmented from the preset video frame by using an image segmentation network, and the local feature object is further segmented from the target object, so that at least three types of objects can be generated: and presetting the residual background of the video frame, the residual background of the target object and the local characteristic object. The local feature object is segmented from the target object, and local feature data of the local feature object can be obtained. The local feature data includes, for example, image data, position data, and segmentation position data of the local feature object. And processing the residual background of the target object to obtain object background image data.
S1225: and filling the residual background of the local characteristic object segmented from the target object to obtain object background image data of the object background image.
And after the local characteristic object is segmented from the target object, filling the vacant areas in the residual background of the target object to enable the residual background to be a complete background image. For example, taking a human face object as an example, after the eye object is segmented from the human face object, the original eye position is a vacant region. And completely filling the vacant area to ensure that the residual background is complete, generating an object background image, and further obtaining the object background image data of the object background image. As such, the target object data may include at least first segmentation location data, local feature data, object background image data, and the like.
After receiving the target object data and the background image data, the user terminal may be configured to combine the local feature data of the local feature object and the object background image data, so as to display the layer of the local feature object on the layer of the object background image in the live broadcast picture. Thus, the layer of the background image of the user terminal object is displayed on the layer of the background image in the live broadcast picture, and the layer of the local feature object is displayed on the layer of the object background image. By means of layer superposition after segmentation, the local feature object can not be limited by an object background image, and the object background image and the local feature object can not be limited by the background image, so that more objects can be adjusted, the 3D effect presented by the whole target object is more obvious, and the effect is better.
The user terminal may be configured to respond to the view angle adjustment instruction to further perform view angle adjustment on the local feature object displayed in the live view, which matches the view angle adjustment instruction. In other words, the user terminal responds to the view angle adjusting instruction, and accordingly can perform view angle adjustment on the local feature object and the object background image, which is matched with the view angle adjusting instruction. Of course, the local feature object and the object background image are not necessarily adjusted together, and may be adjusted as the case may be, or the trigger condition required for the local feature object and the object background image to be adjusted may be different.
For example, the target object includes a human face object, the local feature object includes an eyeball object, and the user terminal, in response to the view angle adjustment instruction, may respectively rotate an object background image of the human face object and the eyeball feature displayed in the live-action picture, in accordance with the view angle adjustment instruction, so as to adjust the orientation of the object background image and the eyeball object.
When the viewing angle of the viewer changes, corresponding operations may be performed on the viewer terminal 30, so that the viewer terminal 30 generates a corresponding viewing angle adjustment instruction, and correspondingly rotates the object background image and the eyeball object in response to the viewing angle adjustment instruction, so as to implement that the object background image and the eyeball object rotate along with the viewing angle of the user, and maintain the viewing angle toward the user. In general, by orienting the background image and the eyeball object, the viewer is always stared at, and a 3D effect or a 3D-like effect is realized. Therefore, the interactive function of the live broadcast picture is effectively enhanced, the interaction between the live broadcast picture and audiences is realized, the dull feeling of the live broadcast picture is reduced, and the viscosity of a user is improved. Although the local feature object is not labeled in fig. 3 and 4, and the process of segmenting the local feature object is not illustrated, it can be understood from fig. 3 and 4 that the perspective of the local feature object can be adjusted accordingly, so that the 3D effect or the 3D-like effect presented by the target object is better.
As for how the local feature object is specifically segmented, refer to the following steps included in step S1224:
S1224a: A second segmentation position for segmenting the local feature object is determined in the target object using the image segmentation network.
Alternatively, the server 10 may use the image segmentation network to determine the first segmentation position in the preset video frame and the second segmentation position in the target object. For example, after identifying the target object, the image segmentation network may determine the first segmentation position along the edge of the target object and the second segmentation position along the edge of the local feature object. The first and second segmentation positions may be represented as coordinate data, such as coordinate positions in the preset video frame or coordinate positions on the display screen; the second segmentation position may also be represented as a coordinate position within the target object.
S1224b: Segmenting the local feature object from the target object according to the second segmentation position to obtain local feature data and second segmentation position data corresponding to the second segmentation position.
After determining the first segmentation position and the second segmentation position, the server 10 may segment the target object along/according to the first segmentation position, segment the local feature object according to the second segmentation position, further segment the local feature object from the target object, and separate the local feature object and the object background image from each other. In this way, the segmented object background image data, the image data of the local feature object, and the second segmentation position data are acquired. The server 10 may transmit background image data, object background image data, image data of local feature objects, first division position data, second division position data, and the like to the user terminal.
The user terminal may be configured to combine the image data of the local feature object and the object background image according to the second segmentation position data, so as to display the layer of the local feature object on the layer of the object background image at the second segmentation position in the live broadcast. The local feature object is attached back to the second segmentation position of the object background image, but the layer of the local feature object is displayed on the layer of the object background image, so that the effect before segmentation can be restored as much as possible visually, and the local feature object can be visually adjusted without being limited by the object background image. The object background image is also not limited to the background image, and both the object background image and the local feature object can be visually adjusted.
This embodiment can also use a generated test object to refine the image segmentation network. Specifically, refer to the following steps performed after step S1224b in this embodiment:
S1224c: Combining the image data of the local feature object and the object background image data according to the second segmentation position data, so as to display the layer of the local feature object above the layer of the object background image at the second segmentation position, thereby generating a test object.
In other words, the server 10 pastes the local feature objects back on top of the object background image at the second segmentation location, recombining into a "new" target object, i.e., the test object.
S1224 d: and comparing the test object with the target object to obtain difference data between the test object and the target object, and adjusting the image segmentation network by using the difference data.
The server 10 compares the test object with the target object, determines whether the test object and the target object are the same target object, that is, calculates the similarity between the test object and the target object, and obtains difference data representing the difference between the test object and the target object. The difference data is fed back to the image segmentation network to adjust and perfect the image segmentation network, so that the accuracy of the image segmentation network is higher and higher, and the segmented local characteristic object and the segmented object background image are more and more natural and real. The method for calculating the similarity between the test object and the target object may be a method for calculating the similarity between images in the prior art, and is not described herein again.
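A toy version of this feedback loop, using mean squared error as the difference data (the patent leaves the similarity metric open, so this is only an assumption):

```python
import numpy as np

def segmentation_feedback(test_object_bgr, target_object_bgr):
    """Difference data between the recombined test object and the original
    target object; it could be used as a training signal for the image
    segmentation network."""
    diff = test_object_bgr.astype(np.float32) - target_object_bgr.astype(np.float32)
    mse = float(np.mean(diff ** 2))
    similarity = 1.0 / (1.0 + mse)      # crude similarity score in (0, 1]
    return {"mse": mse, "similarity": similarity}
```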
As for how the remaining background of the target object is specifically filled, two ways are exemplified below.
The first way can be seen in the following steps included in step S1225:
s1225 a: a global color histogram of the target object is obtained.
As shown in fig. 6, the global color histogram reflects the composition distribution of colors in the image, i.e. which colors are present and the respective parameters of the various colors. The parameter of the color is the probability of the color appearing. Colors can be represented by color values, and a global color histogram can know which color values appear in an image and the probability of various color values appearing. In other words, the global color histogram may be referred to as a color value histogram.
After a global color histogram is obtained by image analysis of the target object, at least one color and the probability of occurrence may be selected in the histogram. Specifically, one or more kinds of the compounds may be selected. For example, three colors may be selected, such as three colors with the highest probability of occurrence.
S1225 b: and determining at least one color with the probability of occurrence in the front row in the global color histogram, and filling the color in the vacant region of the target object after the local feature object is segmented by using the at least one color.
The key color, i.e. the color with the highest probability of occurrence, or at least two colors with the probability of occurrence in the front row, is determined in the global color histogram. The probability of occurrence in the front row means that the probability of occurrence is in the front, for example, three colors with the first three probabilities of occurrence, or five colors with the first five probabilities of occurrence. After the at least one color is determined, the at least one color may be processed, such as color blending, and then the void area may be filled.
The main tone of the target object can be effectively determined by obtaining the color according to the occurrence probability, so that the tone of the filled vacant area is generally consistent with the overall tone of the target object, the vacant area looks uniform, harmonious and natural, and the vacant area can have a good visual effect even if not shielded by a local feature object.
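A minimal sketch of this histogram-based fill, assuming the vacant region is given as a mask (blending the top colors is one possible choice, not mandated by the text):

```python
import numpy as np

def fill_with_dominant_colors(object_bgr, vacant_mask, top_k=3):
    """Fill the vacant region with a weighted blend of the top-k most
    frequent colors of the target object (its global color histogram)."""
    valid_pixels = object_bgr[vacant_mask == 0]       # pixels outside the vacant region
    colors, counts = np.unique(valid_pixels.reshape(-1, 3), axis=0, return_counts=True)
    order = np.argsort(counts)[::-1][:top_k]
    top_colors = colors[order].astype(np.float32)
    weights = counts[order].astype(np.float32)
    fill_color = (top_colors * weights[:, None]).sum(axis=0) / weights.sum()
    out = object_bgr.copy()
    out[vacant_mask > 0] = fill_color.astype(np.uint8)
    return out
```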
The second mode is as follows: and performing filling processing through a neural network algorithm. See step S1225 including the following steps:
s1225 c: and acquiring color information adjacent to the vacant region of the target object after the local characteristic object is segmented by utilizing a neural network, and filling the vacant region with color.
The color information of the periphery of the vacant region of the target object is acquired by utilizing a neural network algorithm to complement the vacant region, so that the vacant region can be integrated with the periphery after being complemented, and the background image of the object is integrally more harmonious and natural. Color information may refer to information or data related to color, and may include, for example, one or more of color values, saturation, contrast, brightness, and the like. The neural network is used for collecting adjacent color information to fill the vacant areas, so that the vacant areas can be more natural and vivid after being filled, filling traces are reduced, and finally background images are more complete and harmonious.
Of course, in the preset video frame, not only the target object may be identified, but also objects other than the target object may be identified and segmented. The method specifically comprises the following steps:
s130: target object data of a target object, background image data of a background image, and peripheral object data of a peripheral object are acquired.
A peripheral object refers to an object that is different from the target object. For example, in a preset video frame, one or more of the objects other than the target object may be a peripheral object. For example, the target object is a face object, the peripheral objects may include, for example, a chair, a keyboard, a mobile phone, a pet, a cabinet, and the like in a main live scene, and objects other than the face object may be used as the peripheral objects.
The server 10 may perform a segmentation process on a preset video frame to generate target object data, background image data, and peripheral object data.
Specifically, the server 10 may identify a target object, a peripheral object, and the like in the live scene using an algorithm such as a deep neural network, and segment the target object and the peripheral object from a preset video frame through an image segmentation network to obtain target object data and peripheral object data. Of course, for a specific segmentation process of the peripheral object, reference may be made to the aforementioned segmentation process of the target object, such as determining a segmentation position and recording segmentation position data.
The server 10 also performs a filling process on the remaining background after the target object is segmented from the preset video frame and after the peripheral object is segmented. The specific filling process may refer to the description of the foregoing steps of S123 and the like.
S230: and transmitting the target object data, the peripheral object data and the background image data to the user terminal.
The server 10 transmits the obtained target object data, background image data, peripheral object data, and the like to the user terminal along with live streaming data. The user terminal is used for combining the target object data, the peripheral object data and the background image data so as to display the layer of the target object and the layer of the peripheral object on the layer of the background image in a live broadcast picture. Specifically, the user terminal may process the video frame by using the target object data, the peripheral object data, and the background image data, so that both the layer of the target object and the layer of the peripheral object, which are presented in the live view, may be displayed on the layer of the background image.
Therefore, the background image is an independent and complete image, and the target object and the peripheral object are both no longer limited by it: the view angle of the target object can be adjusted and the peripheral object can be moved without affecting the integrity of the background image.
The user terminal may perform, in response to the view angle adjustment instruction, view angle adjustment matched with the view angle adjustment instruction on the target object displayed in the live view.
The user terminal may move the peripheral object displayed in the live broadcast picture to a position matching a position adjustment instruction in response to that instruction. For example, the viewer may press and drag a peripheral object, such as the keyboard, in the live picture on the display screen of the viewer terminal 30 to move it to another position, so the user can move the peripheral object according to his or her needs.
Of course, the peripheral object may be at least one of rotated, translated, perspective, and zoomed.
By sending the peripheral object data, the target object data and the background image data to the user terminal, the user terminal can combine them to display the layers of the peripheral object and the target object above the layer of the background image in the live broadcast picture. This makes it convenient for the user to adjust view angles and positions, further enhances the interaction function, improves interaction efficiency, breaks through the limitation of traditional live broadcast technology in which the audience cannot interact with the live broadcast picture, reduces the dullness of the live broadcast picture, and improves user stickiness.
Of course, in order to further enhance the viewer's stickiness and the intelligence of the live system 1, the present embodiment may further include the following steps:
s310: acquiring attribute data of each anchor user concerned by a current user of a user terminal to form virtual attribute data.
Specifically, attribute data of an anchor user concerned by a current user of the user terminal is acquired. The attribute data may include, for example, appearance data, body data, facial data, clothing data, topic data, character data, and the like, and the virtual attribute data is calculated by integrating them.
The server 10 can combine virtual attribute data preferred by the audience through big data calculation by the attribute data of each anchor concerned, and can further produce a virtual model.
S320: and searching anchor users similar to the attribute data and the virtual attribute data, and recommending the live broadcast room information corresponding to the searched anchor users to the user terminal.
The server 10 may also utilize the virtual attribute data to search for similar anchor users on the platform. For example, by calculating the similarity between the virtual attribute data and the attribute data of the searched anchor user, the distance between the vectors may be calculated in a multi-dimensional vector manner, and the distance may be compared with a preset threshold value to calculate the similarity between the two. The server 10 recommends the searched live room information (e.g., live room ID) of the anchor user to the user terminal. In this way, the server 10 can also implement more refined recommendations, with higher recommendation effectiveness.
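Sketched below with Euclidean distance and a preset threshold; the concrete distance measure, threshold and data layout are not specified in the text, so they are assumptions:

```python
import numpy as np

def recommend_live_rooms(virtual_attribute_vec, candidate_anchors, max_distance=1.0):
    """candidate_anchors: iterable of (room_id, attribute_vector).
    Recommend rooms whose anchor attribute vector lies within a preset
    distance of the viewer's virtual attribute vector."""
    v = np.asarray(virtual_attribute_vec, dtype=np.float32)
    hits = []
    for room_id, vec in candidate_anchors:
        distance = float(np.linalg.norm(v - np.asarray(vec, dtype=np.float32)))
        if distance <= max_distance:        # preset threshold
            hits.append((distance, room_id))
    return [room_id for _, room_id in sorted(hits)]
```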
Of course, the viewer may score the viewed anchor or make a corresponding rating. The server 10 can collect the evaluation of the audience, and further optimize the live broadcast picture to be more suitable for the preference of the audience. See in particular the following steps:
s410: and obtaining label data generated by identifying at least one item of attribute of the target object in different live broadcast rooms by the current user of the user terminal.
The viewer may tag the target object in the live view while watching a different live view through the viewer terminal 30. The target object is, for example, a human face object whose attributes include, for example, eyes, eyebrows, mouth, nose, hair, and face shape, and the like. The identification may include, for example: eye size, whether the eyes are single or double, eyebrow style, whether the skin is white or not, whether the melon seed face exists or not, and the like. The viewer may label a anchor, such as "big eye, white skin, lipstick number XX", etc. Viewers watch different anchor rooms and all can tag the anchors. Or, the live broadcast interface may pop up a popup window for the viewer to label, and then obtain the label data. Thus, the server 10 may obtain tag data generated by the user terminal identifying at least one attribute of a target object of a different live room.
S420: and respectively counting the label data corresponding to the at least one item of attribute to determine the label data with the highest quantity in each item of attribute.
The server 10 counts the tag data generated by the user of the user terminal, for example, the number of "big eye" tags is 50, "small eye" tags is 2, and the number of "big eye" tags is the highest in the attribute of eyes. The number of labels with "lipstick number XX 1" was 30, the number of labels with "lipstick number XX 2" was 10, and the number of labels with "lipstick number XX 1" was the highest. And counting the number of the labels corresponding to each attribute in such a way to determine the label data with the highest number in each attribute.
S430: and generating a second adjusting parameter matched with the label data with the highest quantity corresponding to the at least one attribute respectively.
And generating a second adjustment parameter by using the tag data with the highest quantity in each attribute obtained through statistics, wherein the second adjustment parameter adjusts the video stream issued to the client, for example, a target object in each video frame of the video stream is processed. The second adjustment parameter is a parameter for performing image processing on a video frame in which each attribute of the target object is displayed in the video stream.
S440: and adjusting at least one attribute of the target object in the video stream by using the second adjustment parameter, and sending the adjusted video stream to the user terminal.
The server 10 adjusts the corresponding attribute of the target object in the video stream using the second adjustment parameter. For example, the label data of "big eye" may adjust the eyes of a human face object in a video stream to a size corresponding to the label data. The tag data of "lipstick number XX 1" may perform color adjustment on the lips of a human face object in a video stream to adjust to the lipstick number. And after the second adjustment parameter is adjusted, sending the adjusted video stream to the user terminal. Therefore, by collecting, calculating and counting the tag data of the user for tagging the main broadcast, the preference of the user is analyzed, the corresponding adjusting parameters are generated, the video stream issued to the user terminal is adjusted, and personalized adjustment can be realized, so that each audience can see the live broadcast picture related to the habit and the preference of the audience, the method is more intelligent and personalized, and the user viscosity is improved.
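A compact sketch of steps S410-S430, assuming tags arrive as (attribute, label) pairs; how the winning labels map to concrete image-processing parameters is left open by the text:

```python
from collections import Counter

def second_adjustment_parameters(tag_records):
    """tag_records: iterable of (attribute, label) pairs collected from viewers,
    e.g. [("eyes", "big eyes"), ("lips", "lipstick XX1"), ...].
    Returns the highest-count label per attribute."""
    per_attribute = {}
    for attribute, label in tag_records:
        per_attribute.setdefault(attribute, Counter())[label] += 1
    return {attr: counter.most_common(1)[0][0]
            for attr, counter in per_attribute.items()}
```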
In addition to the interaction between the audience and the live broadcast picture, this embodiment can also enhance the interaction between the anchor and the live broadcast picture. In this embodiment, an anchor adjustment mechanism may be established; specifically, refer to the following steps performed before step S110:
S111: Receiving the video stream in which the anchor terminal has adjusted the target object using a first adjustment parameter, and storing the first adjustment parameter of the anchor terminal for the current live broadcast.
The anchor may adjust the target object in the captured video stream at the anchor terminal 20. Taking a human face object as the target object as an example, a setting panel may be provided to adjust the position, size and color of each facial feature of the anchor, with detailed operations such as enlarging the left eye, thickening the eyebrows, thickening the hair, adjusting the nose, adjusting the eyelashes, plumping the lips, equalizing the sizes of the left and right eyes, aligning the eyebrows, changing the eyeball color, and so on. The anchor operates on the setting panel, and the anchor terminal 20 generates a first adjustment parameter to perform image processing on the video frames of the captured video stream. The anchor terminal 20 sends the adjusted video stream and the corresponding first adjustment parameter to the server 10, and the server 10 stores the first adjustment parameter.
The server 10 may also complete the first adjustment parameter according to a corresponding preset template. For example, for an attribute not covered by the first adjustment parameter, such as skin color, the server 10 may set it to a default value, so that the first adjustment parameter can fully adjust the target object. For example, if the first adjustment parameter covers only 10 items while the template defines 12 items, the server 10 may fill the remaining two items with default values, so that the first adjustment parameter is saved in the complete template form.
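Treating the first adjustment parameter as a simple key-value mapping, the template completion described above could be as small as this sketch (the field names are purely illustrative):

```python
def complete_with_template(first_adjust_params, template_defaults):
    """Fill in attributes missing from the anchor's first adjustment parameter
    with the template's default values, so every template item is covered."""
    completed = dict(template_defaults)      # e.g. {"skin_color": 0.0, "left_eye": 0.0, ...}
    completed.update(first_adjust_params)    # the anchor's own settings take priority
    return completed
```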
S112: and when a new live broadcast request of the anchor terminal is received after the secondary live broadcast, sending the stored first adjustment parameter to the anchor terminal, so that the anchor terminal is used for adjusting the target object by using the stored first adjustment parameter in the live broadcast corresponding to the new live broadcast request.
When the anchor finishes the current live broadcast and later starts the next one, the server 10 may deliver the stored first adjustment parameter to the anchor terminal. That is, after receiving the new live broadcast request from the anchor terminal, the server 10 sends the stored first adjustment parameter to the anchor terminal, so that the anchor terminal adjusts the target object with the received first adjustment parameter in the new live broadcast. This effectively saves the anchor's adjustment time and improves the efficiency of going live. In addition, the server 10 may recommend similar adjustment schemes to the anchor terminal based on the received first adjustment parameter, so that the anchor terminal's adjustment schemes are diversified and enriched and the anchor adjustment mechanism becomes more convenient.
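The storage-and-replay mechanism of S111/S112 could be sketched on the server side as follows; the in-memory dictionary and the function names are assumptions made for illustration, not the actual server implementation.

```python
# In-memory store keyed by anchor id; a real system would persist this.
_first_adjustments = {}

def on_adjusted_stream_received(anchor_id, first_adjustment):
    """Called when the anchor terminal uploads a stream adjusted with its
    first adjustment parameter; the parameter is stored for later reuse."""
    _first_adjustments[anchor_id] = first_adjustment

def on_new_live_request(anchor_id):
    """Called when the same anchor starts the next live broadcast; the stored
    parameter (if any) is returned so the anchor terminal can reapply it."""
    return _first_adjustments.get(anchor_id)
```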
Therefore, the present application can establish an anchor adjustment mechanism that restores the first adjustment parameter of the previous live broadcast for the anchor's next live broadcast, which improves the anchor's efficiency of going live and saves time.
In this embodiment, during the live broadcast, the server 10 may treat every video frame in which the target object is recognized as a preset video frame, perform the corresponding recognition and segmentation processing on each preset video frame in turn to obtain the corresponding target object data, background image data and the like, and finally display the target object and the background image in the corresponding video frame of the live broadcast picture. Video frames in which the target object is not recognized may be left unprocessed.
Of course, the server 10 may also perform the above recognition and segmentation processing only under corresponding conditions. As the live broadcast proceeds, the target object data, peripheral object data, background image data and the like mentioned above may be updated when those conditions are met. For example, if the target object in the preset video frame corresponding to the currently obtained target object data and background image data differs too much from the target object in a subsequent video frame, the server 10 is triggered to acquire a new preset video frame. For instance, if the face object in the current preset video frame is a frontal face, the face objects in subsequent video frames are still frontal, and their shapes are close to that in the preset video frame or the similarity satisfies a condition, the target object data and background image data need not be updated. If the face object in the current preset video frame is a frontal face but the face object in a subsequent video frame is a profile, and their shapes are not close or the similarity does not satisfy the condition, the server 10 may take that video frame as a new preset video frame and then execute the display processing method of this embodiment again. As for the similarity calculation, for example, a plurality of feature points on the target object may be determined, a multi-dimensional feature vector may be established for each feature point, the distance between the multi-dimensional feature vectors of the two video frames may then be calculated, and the distance may be compared with a preset threshold to determine the similarity between the two.
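The similarity test described above might look roughly like the following sketch, where the feature extraction is left abstract and the threshold value is an assumption.

```python
import numpy as np

def frames_are_similar(features_a, features_b, threshold=0.35):
    """features_*: (num_points, dims) arrays, one multi-dimensional feature
    vector per feature point of the target object; the threshold is assumed."""
    distance = np.linalg.norm(features_a - features_b, axis=1).mean()
    return distance <= threshold

def maybe_refresh_preset_frame(current_features, new_features, refresh):
    # refresh() would re-run identification/segmentation on the new frame,
    # e.g. when a frontal face has turned into a profile.
    if not frames_are_similar(current_features, new_features):
        refresh()
```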
As shown in fig. 3 and fig. 7, a second embodiment of the display processing method for a live broadcast picture of the present application, taking a user terminal, specifically the viewer terminal 30, as the execution subject, includes:
M100: Acquire target object data of a target object and background image data of a background image.
The viewer terminal 30 acquires the target object data of the target object and the background image data of the background image from the server 10. The target object data and background image data sent by the server 10 may have been generated by the server's own processing or received from the anchor terminal 20. For the description of the target object data, the background image data and the like, reference may be made to the related description of the first embodiment of the display processing method for a live broadcast picture of the present application, which is not repeated here.
M200: the target object data and the background image data are combined to display the layer of the target object on top of the layer of the background image in the live view.
After the viewer terminal 30 receives the target object data and the background image data, it combines them, and the combined image may become the corresponding video frame in the video stream; in that video frame, the layer of the target object is displayed on the layer of the background image. In other words, the target object and the background image are independent of each other, so that the target object can be operated on correspondingly without affecting the background image or being limited by it.
For a more detailed description of step M200, reference may be made to the above description related to the first embodiment of the display processing method for a live view in the present application, and details are not repeated here.
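As a rough illustration of step M200 (not the actual client code), the viewer terminal could composite the two layers as follows, assuming the target object layer carries an alpha mask and a paste position such as the first segmentation position.

```python
import numpy as np

def compose_frame(background, target_rgba, position):
    """background: (H, W, 3) background image layer.
    target_rgba: (h, w, 4) target object layer with an alpha mask.
    position: (y, x) paste point, e.g. the first segmentation position.
    Assumes the target layer fits entirely inside the frame."""
    y, x = position
    h, w = target_rgba.shape[:2]
    frame = background.copy()
    alpha = target_rgba[:, :, 3:4].astype(np.float32) / 255.0
    region = frame[y:y + h, x:x + w].astype(np.float32)
    blended = alpha * target_rgba[:, :, :3].astype(np.float32) + (1.0 - alpha) * region
    frame[y:y + h, x:x + w] = blended.astype(background.dtype)
    return frame
```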
M300: and responding to the visual angle adjusting instruction, and at least carrying out visual angle adjustment matched with the visual angle adjusting instruction on the target object displayed in the live broadcast picture.
Since the target object and the background image are independent of each other, the target object can be adjusted individually. The user may perform a corresponding view angle adjustment operation, and the viewer terminal 30 generates a corresponding view angle adjustment instruction and, in response to it, performs the matched view angle adjustment at least on the target object in the live broadcast picture. For a detailed description of this step, reference may be made to the related description in the first embodiment of the display processing method for a live broadcast picture of the present application, which is not repeated here.
For acquiring the target object data of the target object and the background image data of the background image, reference may be made specifically to the following step included in step M100:
M110: Acquire the target object data and the background image data that the server generates by segmenting a preset video frame collected from the video stream corresponding to the live broadcast picture.
For the details of how the server 10 collects the preset video frame from the video stream corresponding to the live broadcast picture and performs the segmentation processing, reference may be made to the related descriptions in the first embodiment of the display processing method for a live broadcast picture of the present application, for example S110 and S120, which are not repeated here.
Specifically, for the segmentation process, reference may be made to the following step included in M110:
M111: Acquire the target object data and the background image data that the server generates by segmenting the preset video frame after it identifies the target object in the preset video frame.
For a detailed description of step M111, reference may be made to the related description in the first embodiment of the display processing method for a live view in the present application, for example, steps S121 and S122, which are not described herein again.
Optionally, the target object data received by the viewer terminal may be generated by further performing segmentation processing on the target object. Step M100 may include the following step:
M120: Further acquire local feature data of a local feature object segmented from the target object and object background image data of an object background image, where the object background image is obtained by filling the background remaining after the local feature object is segmented from the target object.
Regarding how the local feature object and the object background image are formed, reference may be made to the related description in the first embodiment of the display processing method for the live view in the present application, such as steps S1223 to S1225, which is not described herein again.
The viewer terminal 30 may receive the background image data, the local feature data, the object background image data and the like from the server 10. That is, the target object data may include the local feature data, the object background image data and the like, and may also include first segmentation position data and the like; for details, reference may be made to the related description in the first embodiment of the display processing method for a live broadcast picture of the present application.
Step M200 may include:
M210: Combine the local feature data and the object background image data to display the layer of the local feature object on the layer of the object background image in the live broadcast picture.
Specifically, in a live broadcast picture, a layer of a local feature object is displayed on a layer of an object background image, and the layer of the object background image is displayed on the layer of the background image. For details, reference may be made to related descriptions in the first embodiment of the display processing method for a live view in the present application, and details are not described herein again.
After the target object and the background image are combined, the view angle of the target object can be adjusted in many ways, for example through the following step:
M310: Perform, on the target object displayed in the live broadcast picture, at least one of rotation, deformation, zooming, translation and perspective matched with the pose data, so as to correspondingly adjust the view angle of the target object.
For a detailed description of M310, reference may be made to the related description in the first embodiment of the display processing method for a live broadcast picture of the present application, for example the related description of S200, which is not repeated here.
The view angle adjustment instruction can also be generated in various ways, as follows:
First mode: the pose of the viewer terminal 30 is detected by a pose sensor, and a corresponding view angle adjustment instruction is generated. Specifically, step M300 may include the following steps:
M320: Acquire the view angle adjustment instruction generated based on the pose data sent by the pose sensor.
When the viewer watches the live broadcast through the viewer terminal 30, the viewing angle may be adjusted by adjusting the pose of the viewer terminal 30. As the pose of the viewer terminal 30 changes, the pose sensor in it acquires the corresponding pose data, which corresponds to the viewer's viewing angle. The pose sensor includes, for example, a gyroscope and a gravity sensor, and the pose data includes, for example, angle data and gravity data. Taking a gyroscope and a gravity sensor as an example, the viewer terminal 30 may acquire a view angle adjustment instruction generated at least based on the angle data collected by the gyroscope and the gravity data collected by the gravity sensor.
M320: and responding to the visual angle adjusting instruction, and at least performing visual angle adjustment matched with the attitude data on the target object displayed in the live-broadcasting picture.
After the viewer terminal 30 acquires the pose data, it uses the pose data to perform the corresponding view angle adjustment on the target object, so that the target object adapts to the user's viewing angle. Such adjustment makes the target object capable of presenting a 3D effect or a 3D-like effect.
After receiving the pose data including the angle data and the gravity data, the viewer terminal 30 may rotate the target object displayed in the live broadcast picture according to the gravity data and the angle data. Specifically, the gravity data is used to adjust the turning direction of the target object, and the angle data is used to adjust its rotation angle. For example, if the viewer holds the mobile phone sideways and tilts it to the left, the user's viewing angle is toward the screen from the right side, so the target object turns to the right side under the action of the gravity data and rotates by an angle matched with the angle data. In this way the target object rotates with the user's viewing angle, for example staying oriented toward the user, so that watching the live broadcast picture feels more immersive.
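One possible way to map the pose data onto such a turning effect is sketched below with a simple perspective warp; the mapping from sensor readings to warp strength is an assumption, and a real renderer might instead rotate a 3D proxy of the target object.

```python
import cv2
import numpy as np

def adjust_view_angle(layer, gravity_x, angle_deg):
    """Warp the target-object layer so it appears turned toward the viewer.
    gravity_x (from the gravity sensor) decides the turning direction,
    angle_deg (from the gyroscope) decides how strong the turn is."""
    h, w = layer.shape[:2]
    strength = float(np.clip(abs(angle_deg) / 90.0, 0.0, 0.4)) * h / 2
    src = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    if gravity_x >= 0:   # e.g. terminal tilted so the viewer looks from the right
        dst = np.float32([[0, 0], [w, strength], [w, h - strength], [0, h]])
    else:                # viewer looks from the left
        dst = np.float32([[0, strength], [w, 0], [w, h], [0, h - strength]])
    matrix = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(layer, matrix, (w, h))
```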
For example, when the target object includes a face object and the local feature object includes an eyeball object, step M300 may include the following step:
M330: Rotate, in a manner matched with the view angle adjustment instruction, the object background image of the face object and the eyeball object displayed in the live broadcast picture respectively, so as to adjust the orientation of the object background image and the orientation of the eyeball object.
Alternatively, the above pose data may be used to perform view angle adjustment, for example rotation, on the object background image of the face object (for example, the face) and on the eyeball object, so that the orientations of the object background image and the eyeball object change in accordance with the view angle adjustment instruction. In this way the object background image and the eyeball object rotate with the user's viewing angle and stay oriented toward the user; intuitively, the face and the eyeballs keep gazing at the viewer, producing a 3D effect or 3D-like effect. This effectively enhances the interactive function of the live broadcast picture, realizes interaction between the live broadcast picture and the audience, reduces the dullness of the live broadcast picture, and improves user stickiness. If the anchor is not facing the camera or the eye positions of the face cannot be acquired, the pose data from the gyroscope, the gravity sensor and the like need not be applied to local feature data such as the eyeball object.
Second mode: the viewer's viewing angle is captured by the camera of the viewer terminal 30, and a corresponding view angle adjustment instruction is generated. Specifically, the following steps may be included:
M340: Acquire a scene image through a camera.
Specifically, a scene image of the scene in which the viewer is currently located is captured by the camera of the viewer terminal 30, by another camera installed in that scene, or the like.
M350: and recognizing the face of the audience in the scene image, and determining the position information of the face of the audience after recognizing the face of the audience.
The viewer's face in the scene image is recognized using face recognition technology, and the position information of the viewer's face in the scene image is calculated. Specifically, the position information of the eyeballs of the viewer's face may be calculated.
M360: and generating a visual angle adjusting instruction based on the position information of the face of the audience.
After the position information of the viewer's face is obtained, the view angle adjustment instruction is generated based on that position information, and the view angle of the target object is then adjusted accordingly.
M370: and responding to the visual angle adjusting instruction, and adjusting the visual angle of the target object displayed in the live broadcast picture by utilizing the position information of the human face of the audience so as to adjust the visual angle of the target object displayed in the live broadcast picture to face the human face of the audience.
The view angle of the target object is adjusted using the position information of the viewer's face, for example by rotating the target object so that it faces the position of the viewer's face. Taking the anchor's face object as an example, the face object in the live broadcast picture is rotated using the position of the viewer's face so as to face it; the face object then appears to be acting and speaking toward the viewer, which enhances the sense of interaction with the audience and improves the viewing experience and stickiness of the audience.
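A rough sketch of this second mode follows, using an off-the-shelf face detector as a stand-in for whatever detector is actually used; the normalised yaw/pitch output is an assumed representation of the view angle adjustment instruction.

```python
import cv2

_detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def view_angle_instruction(scene_bgr):
    """Locate the viewer's face in the camera image and return normalised
    yaw/pitch offsets in [-1, 1]; None means no face found, keep current view."""
    gray = cv2.cvtColor(scene_bgr, cv2.COLOR_BGR2GRAY)
    faces = _detector.detectMultiScale(gray, 1.1, 5)
    if len(faces) == 0:
        return None
    x, y, w, h = faces[0]
    cx, cy = x + w / 2, y + h / 2
    img_h, img_w = gray.shape
    return {"yaw": (cx - img_w / 2) / (img_w / 2),
            "pitch": (cy - img_h / 2) / (img_h / 2)}
```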
To further enhance the audience's sense of presence and interaction, the following processing may also be performed:
M410: Identify a reflective region in the preset video frame.
Optionally, the viewer terminal 30 may identify a reflective region, such as a mirror or a human eye, in the preset video frame through a threshold segmentation algorithm.
M420: and acquiring a scene image through a camera, and adjusting the size of the scene image to be matched with the size of the light reflecting area.
The viewer terminal 30 captures a scene image of the scene where the viewer is currently located through its own camera or another camera, and adjusts the size of the scene image to match the size of the reflective region, that is, so that the reflective region can accommodate the adjusted scene image.
M430: and displaying the adjusted scene image in a reflective area in the live broadcast picture.
The adjusted scene image is displayed in the reflective region of the preset video frame, which in effect simulates the live broadcast scene facing the audience: the viewer's current surroundings appear reflected in a reflective object within the live broadcast scene, enhancing the viewer's sense of presence.
For example, for a mirror the scene image may be processed with a mirror-style filter, while for an eye the scene image may be processed with an eyeball-reflection filter, so that the displayed scene image appears as the eye's reflection.
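Steps M410 to M430 could be approximated as follows; the brightness threshold, the choice of the largest bright region, and the blending weights are assumptions made for illustration.

```python
import cv2

def paste_scene_into_reflection(frame, scene, brightness_threshold=220):
    """Find the largest bright (reflective) region in the frame, resize the
    viewer-side scene image to it, and blend it in as a simulated reflection."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    _, mask = cv2.threshold(gray, brightness_threshold, 255, cv2.THRESH_BINARY)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return frame
    x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
    resized = cv2.resize(scene, (w, h))
    out = frame.copy()
    # Blend rather than overwrite so the result still reads as a reflection.
    out[y:y + h, x:x + w] = cv2.addWeighted(out[y:y + h, x:x + w], 0.4, resized, 0.6, 0)
    return out
```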
In this way, a live broadcast experience resembling a face-to-face chat between the anchor and the audience can be created technically, and the audience watches the live broadcast with a sense of presence, which enhances the interactive function of the live broadcast system 1, improves the audience's viewing experience, and improves user stickiness.
Of course, in addition to the target object, corresponding operations may also be performed on objects other than the target object. See specifically the following steps:
M130: Acquire target object data of the target object, background image data of the background image, and peripheral object data of a peripheral object;
M230: Combine the target object data, the peripheral object data and the background image data to display the layer of the target object and the layer of the peripheral object on the layer of the background image in the live broadcast picture.
For detailed description of M130 and M230, reference may be specifically made to the related description in the first embodiment of the display processing method for live frames in the present application, and details are not repeated here.
M380: and responding to the position adjusting instruction, and moving the peripheral object displayed in the live broadcasting picture to a position matched with the position adjusting instruction.
For detailed description of M380, reference may be specifically made to related descriptions in the first embodiment of the display processing method for a live view in the present application, and details are not described herein again.
As shown in fig. 8, the server 10 described in the server embodiments herein may include a processor 110, a memory 120, and a transceiver 130. The memory 120 and the transceiver 130 are respectively coupled to the processor 110.
The processor 110 is used to control the operation of the server 10, and may also be referred to as a Central Processing Unit (CPU). The processor 110 may be an integrated circuit chip having signal processing capabilities. The processor 110 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general purpose processor may be a microprocessor, or the processor 110 may be any conventional processor or the like.
The memory 120 is used for storing computer programs, and may be a RAM, a ROM, or other types of storage devices. In particular, the memory may include one or more computer-readable storage media, which may be non-transitory. The memory may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in a memory is used to store at least one program code.
The transceiver 130 is a device or circuit through which the server 10 establishes communication connections with external devices, so that the processor 110 can exchange data with external devices via the transceiver 130.
The memory 120 stores a computer program, and the processor 110 can execute the computer program to implement the display processing method described in the first embodiment of the display processing method for live pictures in the present application.
As shown in fig. 9, the electronic device 30, such as the viewer terminal 30, described in the electronic device embodiment of the present application includes a processor 310 and a memory 320. The memory 320 is coupled to the processor 310.
The memory 320 is used for storing computer programs, and may be a RAM, a ROM, or other types of storage devices. In particular, the memory may include one or more computer-readable storage media, which may be non-transitory. The memory may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in a memory is used to store at least one program code.
The processor 310 is used to control the operation of the electronic device 30, and may also be referred to as a Central Processing Unit (CPU). The processor 310 may be an integrated circuit chip having signal processing capabilities. The processor 310 may also be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. A general purpose processor may be a microprocessor, or the processor 310 may be any conventional processor or the like.
The processor 310 is configured to execute the computer program stored in the memory 320 to implement the display processing method described in the first embodiment of the display processing method of the live view of the present application.
In some embodiments, the electronic device 30 may further include: a peripheral interface 330 and at least one peripheral. The processor 310, memory 320, and peripheral interface 330 may be connected by buses or signal lines. Various peripheral devices may be connected to peripheral interface 330 via a bus, signal line, or circuit board. Specifically, the peripheral device includes: at least one of radio frequency circuitry 340, display 350, audio circuitry 360, and power source 370.
The peripheral interface 330 may be used to connect at least one peripheral related to I/O (Input/output) to the processor 310 and the memory 320. In some embodiments, processor 310, memory 320, and peripheral interface 330 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 310, the memory 320 and the peripheral interface 330 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The Radio Frequency circuit 340 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals, and the Radio Frequency circuit 340 may also be called a transceiver. The radio frequency circuit 340 communicates with communication networks and other communication devices via electromagnetic signals. The rf circuit 340 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 340 includes: an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 340 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocols include, but are not limited to: the world wide web, metropolitan area networks, intranets, generations of mobile communication networks (2G, 3G, 4G, and 5G), Wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, rf circuit 340 may further include NFC (Near Field Communication) related circuits, which are not limited in this application.
The display screen 350 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 350 is a touch display screen, the display screen 350 also has the ability to capture touch signals on or over the surface of the display screen 350. The touch signal may be input to the processor 310 as a control signal for processing. At this point, the display screen 350 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, the display screen 350 may be one, disposed on the front panel of the electronic device 30; in other embodiments, the display screens 350 may be at least two, respectively disposed on different surfaces of the electronic device 30 or in a folded design; in other embodiments, the display 350 may be a flexible display disposed on a curved surface or a folded surface of the electronic device 30. Even further, the display screen 350 may be arranged in a non-rectangular irregular pattern, i.e., a shaped screen. The Display 350 can be made of LCD (Liquid Crystal Display), OLED (Organic Light-emitting diode), and the like.
The audio circuit 360 may include a microphone and a speaker. The microphone is used for collecting sound waves of a user and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 310 for processing or inputting the electric signals to the radio frequency circuit 340 for realizing voice communication. For stereo capture or noise reduction purposes, the microphones may be multiple and disposed at different locations of the electronic device 30. The microphone may also be an array microphone or an omni-directional pick-up microphone. The speaker is used to convert electrical signals from the processor 310 or the radio frequency circuit 340 into sound waves. The loudspeaker can be a traditional film loudspeaker or a piezoelectric ceramic loudspeaker. When the speaker is a piezoelectric ceramic speaker, the speaker can be used for purposes such as converting an electric signal into a sound wave audible to a human being, or converting an electric signal into a sound wave inaudible to a human being to measure a distance. In some embodiments, audio circuitry 360 may also include a headphone jack.
The power supply 370 is used to power various components in the electronic device 30. Power source 370 may be alternating current, direct current, disposable or rechargeable batteries. When power source 370 comprises a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.
For detailed explanation of functions and execution processes of each functional module or component in the embodiment of the electronic device of the present application, reference may be made to the explanation in the second embodiment of the display processing method for live broadcast of the present application, and details are not described here again.
In the embodiments provided in the present application, it should be understood that the disclosed electronic device and display processing method may be implemented in other manners. For example, the embodiments of the electronic device described above are merely illustrative; the division of modules or units is merely a division by logical function, and there may be other division manners in actual implementation, for example multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be in electrical, mechanical or other forms.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
Referring to fig. 10, if the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium 200. Based on such understanding, the technical solution of the present application, in essence or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes instructions/computer programs for causing a computer device (which may be a personal computer, a server, a network device or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a portable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, as well as electronic devices such as a computer, a mobile phone, a notebook computer, a tablet computer or a camera equipped with the storage medium.
For the description of the execution process of the program data in the computer-readable storage medium, reference may be made to the first embodiment and the second embodiment of the display processing method for live broadcast frames in the present application, which are not described herein again.
To sum up, in the above embodiments, the target object data and the background image data are sent to the user terminal, and the user terminal combines them to place the layer of the target object on the layer of the background image in the live broadcast picture, so that the target object and the background image are independent of and unconstrained by each other and the target object is left in an adjustable state.
The above description is only an example of the present application and is not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings, or which are directly or indirectly applied to other related technical fields, are intended to be included within the scope of the present application.

Claims (30)

1. A display processing method of a live broadcast picture is characterized by comprising the following steps:
acquiring target object data of a target object and background image data of a background image;
and sending the target object data and the background image data to a user terminal, so that the user terminal is used for combining the target object data and the background image data to display the layer of the target object on the layer of the background image in the live broadcast picture, and is further used for responding to a view angle adjusting instruction to perform view angle adjustment matched with the view angle adjusting instruction on the target object displayed in the live broadcast picture.
2. The display processing method according to claim 1, characterized in that:
the acquiring target object data of a target object and background image data of a background image includes:
collecting a preset video frame in a video stream corresponding to the live broadcast picture;
and carrying out segmentation processing on the preset video frame to generate the target object data and the background image data.
3. The display processing method according to claim 2, characterized in that:
the segmenting the preset video frame to generate the target object data and the background image data includes:
identifying the target object in the preset video frame;
after the target object is identified, segmenting the target object from the preset video frame to obtain the target object data;
and filling the residual background of the preset video frame after the target object is segmented out to obtain the background image data.
4. The display processing method according to claim 3, characterized in that:
the segmenting the target object from the preset video frame to obtain the target object data includes:
determining a first segmentation position for segmenting the target object in the preset video frame;
and segmenting the target object from the preset video frame according to the first segmentation position to obtain image data of the target object and first segmentation position data corresponding to the first segmentation position, so that the user terminal is used for combining the image data of the target object and the background image data according to the first segmentation position data, and displaying the layer of the target object on the layer of the background image at the first segmentation position in the live broadcast picture.
5. The display processing method according to claim 3, characterized in that:
after the target object is identified, segmenting the target object from the preset video frame to obtain the target object data, including:
further identifying local feature objects of the target object in the target object;
segmenting the local characteristic object from the target object to obtain local characteristic data of the local characteristic object;
and performing filling processing on the residual background of the local feature object segmented by the target object to obtain object background image data of an object background image, so that the user terminal is used for combining the local feature data of the local feature object and the object background image data to display the layer of the local feature object on the layer of the object background image in the live broadcast picture, and is further used for responding to the view angle adjusting instruction to further perform view angle adjustment matched with the view angle adjusting instruction on the local feature object displayed in the live broadcast picture.
6. The display processing method according to claim 5, characterized in that:
the segmenting the local feature object from the target object to obtain local feature data of the local feature object includes:
determining a second segmentation location in the target object at which to segment the local feature object using an image segmentation network;
and segmenting the local feature object from the target object according to the second segmentation position to obtain the local feature data and second segmentation position data corresponding to the second segmentation position, so that the user terminal is used for combining the image data of the local feature object and the object background image data according to the second segmentation position data, and displaying the layer of the local feature object on the layer of the object background image at the second segmentation position in the live broadcast picture.
7. The display processing method according to claim 6,
after the obtaining the local feature data and the second segmentation position data corresponding to the second segmentation position, the method includes:
combining the image data of the local characteristic object and the object background image data according to the second segmentation position data to display the layer of the local characteristic object on the layer of the object background image at the second segmentation position, and generating a test object;
and comparing the test object with the target object to obtain difference data between the test object and the target object, and adjusting the image segmentation network by using the difference data.
8. The display processing method according to claim 5, characterized in that:
the filling processing of the residual background of the local feature object segmented from the target object includes:
acquiring a global color histogram of the target object;
and determining at least one color whose occurrence probability ranks highest in the global color histogram, and filling, with the at least one color, the vacant region left in the target object after the local feature object is segmented.
9. The display processing method according to claim 5, characterized in that:
the filling processing of the residual background of the local feature object segmented from the target object includes:
and acquiring color information adjacent to the vacant region of the target object after the local characteristic object is segmented by utilizing a neural network to fill the vacant region with colors.
10. The display processing method according to claim 3, characterized in that:
the filling processing of the residual background after the target object is segmented from the preset video frame includes:
and filling and completing, by using a trained generative adversarial network, the vacant region in the residual background left after the target object is segmented from the preset video frame.
11. The display processing method according to claim 2, characterized in that:
before the acquiring of the preset video frame in the video stream corresponding to the live broadcast picture, the method comprises the following steps:
receiving the video stream after the anchor terminal adjusts the target object by using a first adjustment parameter, and storing the first adjustment parameter of the anchor terminal in the current live broadcast;
and when a new live broadcast request from the anchor terminal is received after the current live broadcast, sending the stored first adjustment parameter to the anchor terminal, so that the anchor terminal is used for adjusting the target object by using the stored first adjustment parameter in the live broadcast corresponding to the new live broadcast request.
12. The display processing method according to claim 1, comprising:
acquiring label data generated by identifying at least one attribute of the target object in different live broadcast rooms by a current user of the user terminal;
respectively counting the label data corresponding to the at least one item of attribute to determine the label data with the highest quantity in each item of attribute;
generating a second adjustment parameter matched with the label data with the highest quantity corresponding to the at least one attribute respectively;
and adjusting the at least one attribute of the target object in the video stream by using the second adjustment parameter, and sending the adjusted video stream to a user terminal.
13. The display processing method according to claim 1, comprising:
acquiring attribute data of each anchor user concerned by a current user of the user terminal to form virtual attribute data;
searching anchor users similar to the attribute data and the virtual attribute data, and recommending the searched live broadcast room information corresponding to the anchor users to the user terminal.
14. The display processing method according to claim 1, characterized in that:
acquiring target object data of a target object and background image data of a background image, including:
acquiring target object data of a target object, background image data of a background image and peripheral object data of a peripheral object;
the sending the target object data and the background image data to a user terminal includes:
and sending the target object data, the peripheral object data and the background image data to a user terminal, so that the user terminal is used for combining the target object data, the peripheral object data and the background image data to display the layer of the target object and the layer of the peripheral object on the layer of the background image in the live broadcast picture.
15. A display processing method of a live broadcast picture is characterized by comprising the following steps:
acquiring target object data of a target object and background image data of a background image;
combining the target object data and the background image data to display a layer of the target object on top of a layer of the background image in the live view;
and responding to a visual angle adjusting instruction, and at least carrying out visual angle adjustment matched with the visual angle adjusting instruction on the target object displayed in the live broadcast picture.
16. The display processing method according to claim 15, characterized in that:
the acquiring target object data of a target object and background image data of a background image includes:
and the acquisition server divides preset video frames acquired from the video stream corresponding to the live broadcast picture to generate the target object data and the background image data.
17. The display processing method according to claim 16, characterized in that:
the acquiring server divides preset video frames collected from a video stream corresponding to the live broadcast picture to generate the target object data and the background image data, and the acquiring server comprises:
and acquiring the target object data and the background image data that the server generates by segmenting the preset video frame after the server identifies the target object in the preset video frame.
18. The display processing method according to claim 16, comprising:
identifying a light reflecting area in the preset video frame;
acquiring a scene image through a camera, and adjusting the size of the scene image to be matched with the size of the light reflecting area;
and displaying the adjusted scene image in the reflectable area in the live broadcast picture.
19. The display processing method according to claim 15,
the acquiring target object data of a target object and background image data of a background image further includes:
acquiring local feature data of a local feature object segmented from the target object and object background image data of an object background image, wherein the object background image is obtained by filling the residual background of the local feature segmented from the target object;
the combining the target object data and the background image data to display the layer of the target object on top of the layer of the background image in the live view further comprises:
and combining the local feature data and the object background data to display the layer of the local feature object on the layer of the object background image in the live broadcast picture.
20. The display processing method according to claim 19, characterized in that:
the target object comprises a human face object, and the local feature object comprises an eyeball object; wherein the performing, at least to the target object displayed in the live view, the view adjustment matched with the view adjustment instruction includes:
and respectively rotating the object background image and the eyeball object of the face object displayed in the live broadcast picture in a manner of being matched with a visual angle adjusting instruction so as to adjust the orientation of the object background image and the orientation of the eyeball object.
21. The display processing method according to any one of claims 15 to 20, characterized in that:
before the responding to a view angle adjusting instruction, at least performing view angle adjustment on the target object displayed in the live broadcast picture, which is matched with the view angle adjusting instruction, the method comprises the following steps:
acquiring a scene image through a camera;
recognizing the faces of the audiences in the scene image, and determining the position information of the faces of the audiences after recognizing the faces of the audiences;
generating the visual angle adjusting instruction based on the position information of the face of the audience;
the responding to a visual angle adjusting instruction, at least performing visual angle adjustment matched with the visual angle adjusting instruction on the target object displayed in the live broadcast picture, including:
and responding to a visual angle adjusting instruction, and adjusting the visual angle of the target object displayed in the live broadcast picture by using the position information of the face of the audience so as to adjust the visual angle of the target object displayed in the live broadcast picture to face the face of the audience.
22. The display processing method according to any one of claims 15 to 20, characterized in that:
the responding to a visual angle adjusting instruction, at least performing visual angle adjustment matched with the visual angle adjusting instruction on the target object displayed in the live broadcast picture, including:
acquiring the visual angle adjusting instruction generated based on the attitude data sent by the attitude sensor;
and responding to the visual angle adjusting instruction, and at least carrying out visual angle adjustment matched with the attitude data on the target object displayed in the live broadcast picture.
23. The display processing method according to claim 22, wherein:
the adjusting of the view angle, which is matched with the attitude data, at least on the target object displayed in the live broadcast picture comprises:
and performing at least one of rotation, deformation, zooming, translation and perspective matched with the posture data on the target object displayed in the live broadcast picture so as to correspondingly adjust the visual angle of the target object.
24. The display processing method according to claim 22, wherein:
the obtaining of the view angle adjustment instruction generated based on the attitude data sent by the attitude sensor includes:
acquiring the visual angle adjusting instruction generated at least based on angle data acquired by a gyroscope and gravity data acquired by a gravity sensor;
the responding to the visual angle adjusting instruction, at least carrying out visual angle adjustment matched with the attitude data on the target object displayed in the live broadcast picture, and the visual angle adjustment comprises the following steps:
and adjusting the steering of the target object displayed in the live broadcast picture by using the gravity data, and rotating the target object by an angle matched with the angle data.
25. The display processing method according to claim 15, characterized in that:
the acquiring target object data of a target object and background image data of a background image includes:
acquiring target object data of the target object, background image data of the background image and peripheral object data of a peripheral object;
the combining the target object data and the background image data to generate a target image such that the target object is displayed over a layer of the background image comprises:
combining the target object data, peripheral object data, and the background image data to display a layer of the target object and a layer of the peripheral object on a layer of the background image in the live view.
26. The display processing method according to claim 25, characterized in that:
after said combining said target object data, peripheral object data and said background image data, comprising:
and responding to a position adjusting instruction, and moving the peripheral object displayed in the live broadcast picture to a position matched with the position adjusting instruction.
27. A server, comprising a processor, a transceiver, and a memory, the memory and the transceiver being respectively coupled to the processor, the memory storing a computer program, the processor being capable of executing the computer program to implement the method of any one of claims 1-14.
28. An electronic device comprising a display, a processor, a transceiver, and a memory, the display, the memory, and the transceiver being respectively coupled to the processor, the memory storing a computer program, the processor being capable of executing the computer program to implement the method of any of claims 15-26.
29. A live broadcast system, comprising: the server of claim 27 and the electronic device of claim 28, the server and the electronic device being communicatively coupled.
30. A computer-readable storage medium, in which a computer program is stored which can be executed by a processor to carry out the method according to any one of claims 1 to 26.
CN202111362016.3A 2021-11-17 2021-11-17 Display processing method, server, device, system and medium for live broadcast picture Pending CN114339393A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111362016.3A CN114339393A (en) 2021-11-17 2021-11-17 Display processing method, server, device, system and medium for live broadcast picture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111362016.3A CN114339393A (en) 2021-11-17 2021-11-17 Display processing method, server, device, system and medium for live broadcast picture

Publications (1)

Publication Number Publication Date
CN114339393A true CN114339393A (en) 2022-04-12

Family

ID=81045773

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111362016.3A Pending CN114339393A (en) 2021-11-17 2021-11-17 Display processing method, server, device, system and medium for live broadcast picture

Country Status (1)

Country Link
CN (1) CN114339393A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019000462A1 (en) * 2017-06-30 2019-01-03 广东欧珀移动通信有限公司 Face image processing method and apparatus, storage medium, and electronic device
US20200013220A1 (en) * 2018-07-04 2020-01-09 Canon Kabushiki Kaisha Information processing apparatus, information processing method, and storage medium
CN111464830A (en) * 2020-05-19 2020-07-28 广州酷狗计算机科技有限公司 Method, device, system, equipment and storage medium for image display
CN111724470A (en) * 2020-06-30 2020-09-29 联想(北京)有限公司 Processing method and electronic equipment
CN111880709A (en) * 2020-07-31 2020-11-03 北京市商汤科技开发有限公司 Display method and device, computer equipment and storage medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115348469A (en) * 2022-07-05 2022-11-15 西安诺瓦星云科技股份有限公司 Picture display method and device, video processing equipment and storage medium
CN115348469B (en) * 2022-07-05 2024-03-15 西安诺瓦星云科技股份有限公司 Picture display method, device, video processing equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination