CN113949893A - Live broadcast processing method and device, electronic equipment and readable storage medium


Info

Publication number
CN113949893A
Authority
CN
China
Prior art keywords
video data
target
visual angle
view
live broadcast
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111205287.8A
Other languages
Chinese (zh)
Inventor
李宏平
蔡玥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China United Network Communications Group Co Ltd filed Critical China United Network Communications Group Co Ltd
Priority to CN202111205287.8A
Publication of CN113949893A
Legal status: Pending

Classifications

    • H — ELECTRICITY
      • H04 — ELECTRIC COMMUNICATION TECHNIQUE
        • H04N — PICTORIAL COMMUNICATION, e.g. TELEVISION
          • H04N21/00 — Selective content distribution, e.g. interactive television or video on demand [VOD]
            • H04N21/20 — Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
              • H04N21/21 — Server components or server architectures
                • H04N21/218 — Source of audio or video content, e.g. local disk arrays
                  • H04N21/2187 — Live feed
                  • H04N21/21805 — Enabling multiple viewpoints, e.g. using a plurality of cameras
              • H04N21/23 — Processing of content or additional data; Elementary server operations; Server middleware
                • H04N21/234 — Processing of video elementary streams, e.g. splicing of video streams, manipulating MPEG-4 scene graphs
                  • H04N21/23418 — Involving operations for analysing video streams, e.g. detecting features or characteristics
                • H04N21/239 — Interfacing the upstream path of the transmission network, e.g. prioritizing client content requests
                  • H04N21/2393 — Involving handling client requests
            • H04N21/60 — Network structure or processes for video distribution between server and client or between remote clients; Control signalling between clients, server and network components; Transmission of management data between server and client
              • H04N21/65 — Transmission of management data between client and server
                • H04N21/658 — Transmission by the client directed to the server
                  • H04N21/6587 — Control parameters, e.g. trick play commands, viewpoint selection

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Databases & Information Systems (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The application provides a live broadcast processing method and apparatus, an electronic device, and a readable storage medium. The method includes: determining a live broadcast mode; when the live broadcast mode is a free-view live broadcast mode, acquiring a view switching instruction sent by a user side, where the view switching instruction includes switching control information; acquiring, according to the view switching instruction, a first target video data frame corresponding to the switching control information from a pre-acquired video data stream; determining a target view according to the first target video data frame and a pre-trained view prediction model; sending the first target video data frame to the user side for a live broadcast operation; and acquiring a video data frame corresponding to the target view from the video data stream and sending it to the user side for the live broadcast operation. The scheme shortens the live broadcast response time, makes view switching smoother, and thereby improves the live broadcast effect.

Description

Live broadcast processing method and device, electronic equipment and readable storage medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a live broadcast processing method and apparatus, an electronic device, and a readable storage medium.
Background
With the development of sports in China, public attention to sporting events keeps growing, and more and more people like to watch them via live broadcast.
In the prior art, in a traditional broadcast of a match, owing to limitations such as the broadcast angle, the camera positions, and the director's personal preferences, viewers often cannot freely choose the angle or detail they would like to watch. A multi-view watching mode can therefore satisfy a user's need to watch video from multiple angles and all around. Specifically, a sports event can be live-broadcast in a play mode that combines a traditional director-guided view with a free view.
However, when a sports event is live-broadcast in a free-view mode, because the view switching instruction given by the user arrives in real time, the server must first acquire the video data frame corresponding to the switching instruction and only then broadcast according to that frame, which increases the live broadcast response time, makes view switching choppy, and degrades the live broadcast effect.
Disclosure of Invention
The application provides a live broadcast processing method and device, electronic equipment and a readable storage medium, so as to reduce live broadcast response time and improve live broadcast effect.
In a first aspect, the present application provides a live broadcast processing method, including:
determining a live broadcast mode, and acquiring a view switching instruction sent by a user side when the live broadcast mode is a free view live broadcast mode, wherein the view switching instruction comprises switching control information;
acquiring a first target video data frame corresponding to the switching control information from a pre-acquired video data stream according to the view switching instruction, and determining a target view according to the first target video data frame and a pre-trained view prediction model;
sending the first target video data frame to a user side for live broadcasting operation;
and acquiring a video data frame corresponding to the target view from the video data stream, and sending the video data frame corresponding to the target view to the user side for live broadcasting operation.
Optionally, the switching control information includes a current playing mode and to-be-switched view information,
then, the acquiring a first target video data frame corresponding to the switching control information from a pre-acquired video data stream according to the view switching instruction, and determining a target view according to the first target video data frame and a pre-trained view prediction model includes:
acquiring a first target video data frame corresponding to the current playing mode and the to-be-switched view information from a pre-stored video data stream according to the view switching instruction;
and inputting the first target video data frame into a pre-trained view prediction model for recognition, and determining a target view.
Optionally, the view prediction model includes a CNN convolutional neural network module and an LSTM long short-term memory network module,
inputting the first target video data frame into a pre-trained view prediction model for recognition, and determining a target view, including:
inputting the first target video data frame into the CNN module in the view prediction model for image feature extraction to obtain spatial image feature information;
and inputting the spatial image feature information into the LSTM module in the view prediction model to perform temporal correlation recognition to obtain a target view.
Optionally, the acquiring a video data frame corresponding to the target view from the video data stream includes:
determining the video resolution corresponding to each video acquisition device according to the target view;
and acquiring corresponding video data frames from the video data stream according to the video resolution corresponding to each video acquisition device.
Optionally, the determining the live mode includes:
receiving a mode selection instruction sent by a user side, wherein the mode selection instruction comprises a mode identifier;
and determining a live broadcast mode according to the mode identification.
Optionally, if the live mode is a fixed view live mode, the method further includes:
acquiring a corresponding second target video data frame from the video data stream according to preset optimal view information;
and sending the second target video data frame to the user side so that the user side carries out live broadcast operation according to the second target video data frame.
Optionally, before the determining the live mode, the method further includes:
the method comprises the steps of acquiring a video data stream sent by a secondary source station end in real time and storing the video data stream, wherein the video data stream sent by the secondary source station end is acquired from the source station end in a real-time message transfer protocol (RTMP) mode, and the video data stream of the source station end is obtained by preprocessing original video data frames acquired by various video acquisition devices.
Optionally, before the determining a target view according to the first target video data frame and the pre-trained view prediction model, the method further includes:
acquiring a view prediction training data set, wherein the view prediction training data set includes multiple groups of view prediction training data, and each group of view prediction training data includes historical video data acquired by each image acquisition device and the to-be-switched view selected by a user in the next adjacent time period;
inputting the historical video data acquired by each image acquisition device in each group of view prediction training data and the to-be-switched view selected by the user in the next adjacent time period into a neural network model for training to obtain training feature information of each spatial image;
and inputting the training feature information of each spatial image and the corresponding to-be-switched view into a long short-term memory network model for training to obtain a view prediction model.
Optionally, after the video data frame corresponding to the target view is sent to the user side for live broadcast operation, the method further includes:
receiving a live broadcasting ending instruction sent by a user side;
and stopping acquiring a new video data stream according to the live broadcast ending instruction.
In a second aspect, the present application provides a live broadcast processing apparatus, including:
a determining module, configured to determine a live broadcast mode and acquire a view switching instruction sent by a user side when the live broadcast mode is a free-view live broadcast mode, where the view switching instruction includes switching control information;
a processing module, configured to acquire a first target video data frame corresponding to the switching control information from a pre-acquired video data stream according to the view switching instruction, and determine a target view according to the first target video data frame and a pre-trained view prediction model;
the processing module is further configured to send the first target video data frame to a user side for live broadcast operation;
the processing module is further configured to acquire a video data frame corresponding to the target view from the video data stream, and send the video data frame corresponding to the target view to the user side for live broadcast operation.
In a third aspect, the present application provides an electronic device, comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored by the memory to implement a live processing method as claimed in any one of the first aspects.
In a fourth aspect, the present application provides a computer-readable storage medium, in which computer-executable instructions are stored, and when a processor executes the computer-executable instructions, the live broadcast processing method according to any one of the first aspect is implemented.
After the above scheme is adopted, the live broadcast mode can be determined first; then, when the live broadcast mode is the free-view live broadcast mode, a view switching instruction containing switching control information sent by the user side is acquired, a first target video data frame corresponding to the switching control information is acquired from a pre-acquired video data stream according to the view switching instruction, and a target view is determined according to the first target video data frame and a pre-trained view prediction model. While the first target video data frame is sent to the user side for the live broadcast operation, the video data frame corresponding to the target view can be acquired from the video data stream and likewise sent to the user side for the live broadcast operation. Because the target view is predicted by the pre-trained view prediction model and the video data frame corresponding to the target view is acquired in advance, the waiting time for acquiring video data frames is reduced, the live broadcast response time is shortened, the smoothness of view switching is improved, and the live broadcast effect is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic architecture diagram of an application system of a live broadcast processing method according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a live broadcast processing method according to an embodiment of the present application;
fig. 3 is a schematic flowchart of a live broadcast processing method according to another embodiment of the present application;
fig. 4 is a schematic structural diagram of a live broadcast processing apparatus according to an embodiment of the present application;
fig. 5 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present application and in the above-described drawings (if any) are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of including other sequential examples in addition to those illustrated or described. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In a traditional broadcast of a match, owing to limitations such as the broadcast angle, the camera positions, and the director's personal preferences, viewers often cannot freely choose the angle or detail they would like to watch; a multi-view watching mode can therefore satisfy a user's need to watch video from multiple angles and all around. The multi-view watching mode is a video acquisition technique in which, during recording, a camera array shoots the same scene from different views to obtain a group of video data. When watching, the user can freely switch viewpoints within the multi-viewpoint video to watch content from different views, which enables bullet-time effects and lets the user watch the viewpoint content of interest, giving the user the feeling of being at the scene of the competition. To enrich the user's watching modes and prevent the user from missing highlight moments for lack of an overall grasp of the match, a play mode is proposed that combines the traditional director-guided view with a free view, selectable by the user according to demand: the preset mode guarantees viewing quality when the user cannot conveniently give view switching information in real time, while the free mode gives the user the right to choose, and combining the two improves the user's viewing experience. However, when a sports event is live-broadcast in the free-view mode, because the view switching instruction given by the user arrives in real time, the server must first acquire the video data frame corresponding to the switching instruction and only then broadcast according to that frame, which increases the live broadcast response time, makes view switching choppy, and degrades the live broadcast effect.
To address this technical problem, the present application predicts the target view with a pre-trained view prediction model and acquires the video data frame corresponding to the target view in advance, thereby reducing the waiting time for acquiring video data frames, shortening the live broadcast response time, improving the smoothness of view switching, and improving the live broadcast effect.
Fig. 1 is a schematic architecture diagram of an application system of a live broadcast processing method provided in an embodiment of the present application. As shown in fig. 1, the application system may include: an acquisition end 101, a processing end 102, and a stream-pushing end 103. The acquisition end 101 may include a plurality of image acquisition devices, which may be, for example, cameras. Furthermore, the image acquisition devices can be uniformly distributed to form a camera array that acquires full-angle video information of the target scene. The acquisition end is mainly responsible for reasonably setting the number of cameras, the camera parameters, and the synchronized triggering of the cameras. The larger the number of cameras, the finer the view switching, which improves user experience; however, the increase in acquired video stream information also increases the network burden and in turn degrades user experience, so the number n of cameras needs to be controlled reasonably. Camera parameters such as resolution, focal length, and aperture are kept consistent. To acquire scene images of different views at strictly the same moment, the cameras must be triggered synchronously, and the timestamp information of each data frame in each stream is extracted to facilitate post-processing.
The video data stream acquired by the acquisition end 101 may be sent to the processing end 102, which is composed of edge node servers and may perform encoding, splicing, management, and CDN (Content Delivery Network) source-station stream-pushing operations on the video data stream acquired by the acquisition end 101. To guarantee the synchronism of the data streams of different views, viewpoint-combined splicing encoding can be adopted: the multiple video streams are spliced in a nine-grid (Sudoku-like) layout, which also effectively compresses the data volume. According to the time of each view's data frames and the camera position information, time information and camera position arrangement information are added to the packaging information when packaging after encoding, and the result is then pushed upstream to the CDN source station via the Real-Time Messaging Protocol (RTMP) for content distribution, sharing all configuration files with the underlying CDN.
For example, HEVC, AVS2, VP9, or the like may be used as the encoding scheme.
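For illustration of the viewpoint-combined splicing described above, the following is a minimal Python sketch (not the application's actual implementation) that tiles n equally sized camera frames into a near-square grid and records the timestamp and camera-position layout as packaging metadata; the array shapes and metadata field names are assumptions.

```python
# Sketch only: tile n same-sized camera frames into a near-square grid
# and return the packaging metadata described in the text.
import math
import numpy as np

def splice_views(frames: list, timestamp: float):
    n = len(frames)
    cols = math.ceil(math.sqrt(n))          # e.g. 9 cameras -> 3x3 grid
    rows = math.ceil(n / cols)
    h, w, c = frames[0].shape
    grid = np.zeros((rows * h, cols * w, c), dtype=frames[0].dtype)
    layout = {}
    for i, frame in enumerate(frames):
        r, col = divmod(i, cols)
        grid[r * h:(r + 1) * h, col * w:(col + 1) * w] = frame
        layout[i] = (r, col)                # camera index -> grid cell
    meta = {"timestamp": timestamp, "camera_layout": layout}
    return grid, meta
```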
In addition, the stream-pushing end 103 may be composed of a secondary source station end, an underlying content distribution network end (i.e., an underlying network end), and a user side. The secondary source station end requests the data stream content from the source station end by RTMP stream pulling and buffers it, while sharing all cached data with the underlying content distribution network end. Play control of the multi-view video between the user side and the underlying content distribution network end is carried out via RTMP streaming. The underlying network end may be a server or a server cluster.
The technical solution of the present application will be described in detail below with specific examples. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
Fig. 2 is a schematic flowchart of a live broadcast processing method provided in an embodiment of the present application, where the method of this embodiment may be executed by a bottom-layer content distribution network. As shown in fig. 2, the method of this embodiment may include:
s201: and determining a live broadcast mode, and acquiring a visual angle switching instruction sent by a user side when the live broadcast mode is a free visual angle live broadcast mode, wherein the visual angle switching instruction comprises switching control information.
In this embodiment, before the live broadcast starts, the live broadcast mode may be determined, and then it is determined whether a view switching instruction of the user side needs to be acquired according to the live broadcast mode, that is, it is determined whether the view switching instruction sent by the user side is valid according to the live broadcast mode.
Further, the live broadcast modes may include a free-view live broadcast mode and a fixed-view live broadcast mode: the free-view live broadcast mode needs to acquire corresponding video data frames according to the view switching instruction sent by the user side, while the fixed-view live broadcast mode does not need to acquire the view switching instruction sent by the user side.
In a specific embodiment, if it is determined that the live broadcast mode is the free view live broadcast mode, indicating that live broadcast needs to be performed according to a view selected by a user, a view switching instruction including switching control information and sent by a user side may be obtained, and then corresponding operation is performed according to the view switching instruction.
S202: and acquiring a first target video data frame corresponding to the switching control information from the pre-acquired video data stream according to the view switching instruction, and determining a target view according to the first target video data frame and a pre-trained view prediction model.
In this embodiment, the switching control information included in the view switching instruction may indicate match information of a corresponding view that the user wants to watch, and therefore, after the view switching instruction including the switching control information is acquired, the first target video data frame corresponding to the switching control information may be acquired from the video data stream acquired in advance according to the view switching instruction.
Further, the switching control information may include the current play mode and the to-be-switched view information; then, acquiring a first target video data frame corresponding to the switching control information from a pre-acquired video data stream according to the view switching instruction, and determining a target view according to the first target video data frame and a pre-trained view prediction model, may specifically include:
acquiring, according to the view switching instruction, a first target video data frame corresponding to the current play mode and the to-be-switched view information from a pre-stored video data stream.
And inputting the first target video data frame into the pre-trained view prediction model for recognition, and determining the target view.
In addition, the pre-acquired video data stream may be a video data stream acquired from the secondary source station end. Furthermore, the video data stream may be acquired from the secondary source station end in real time, or once every preset duration. The preset duration can be customized according to the actual application scenario, provided the live broadcast effect is guaranteed, and is not discussed in detail here. Moreover, when the video data stream is acquired from the secondary source station end, only the updated video data stream needs to be acquired; the locally existing video data stream does not need to be acquired repeatedly, which reduces the pressure of data transmission and improves its efficiency.
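As an illustration of the incremental pull just described, the sketch below polls the secondary source station end once per preset duration and requests only segments newer than the local cache; the fetch_segments() API and its segment objects are hypothetical.

```python
# Sketch only: fetch_segments() and its segment objects are hypothetical;
# the point is that only segments newer than the local cache are requested.
import time

def pull_updates(fetch_segments, cache: dict, stop, interval_s: float = 1.0) -> None:
    """Poll the secondary source station end every preset duration and
    request only the updated part of the video data stream."""
    last_seq = max(cache, default=-1)   # newest segment already cached
    while not stop():
        for seg in fetch_segments(after_seq=last_seq):  # the delta only
            cache[seg.seq] = seg
            last_seq = seg.seq
        time.sleep(interval_s)          # the preset duration between pulls
```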
Further, the to-be-switched view information may include the current view information and the start-stop viewpoint information of the switch (such as an offset), and the to-be-switched view can then be determined from the current view information and the start-stop viewpoint information. In addition, the switching control information may further include a switching speed, and the live broadcast view may be switched according to that speed, or according to a default switching speed if none is specified.
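For illustration, the sketch below shows one way the switching control information might be represented and the to-be-switched view resolved from the current view and the offset; all field names here are assumptions, not terms defined by the application.

```python
# Illustrative only: all field names are assumptions, not terms defined
# by the application.
from dataclasses import dataclass
from typing import Optional

@dataclass
class SwitchControl:
    play_mode: str                  # current playing mode
    current_view: int               # current view (camera position index)
    offset: int                     # start-stop viewpoint offset of the switch
    speed: Optional[float] = None   # switching speed; None -> use a default

def resolve_target_view(ctrl: SwitchControl, n_cameras: int) -> int:
    """Wrap the viewpoint offset around the camera ring to obtain the
    to-be-switched view."""
    return (ctrl.current_view + ctrl.offset) % n_cameras
```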
In addition, the view prediction model may include a CNN (Convolutional Neural Network) module and an LSTM (Long Short-Term Memory Network) module, and the inputting the first target video data frame into the pre-trained view prediction model for recognition and determining the target view may include:
and inputting target video data to the CNN module in the visual angle prediction model for image feature extraction to obtain spatial image feature information.
And inputting the spatial image characteristic information into the LSTM module in the view angle prediction model to perform time relevance identification to obtain a target view angle.
In this embodiment, because the video data formed from the view information acquired by all video acquisition devices at a given moment is correlated with the user's view selection at the next moment, features can be extracted from all view information in a CNN + LSTM manner and the user's next view selection can then be predicted. The CNN extracts local features of the data and combines and abstracts them into high-level features, while the LSTM extracts the temporal correlation between the data; this improves the model's spatial and temporal representation capability and its ability to predict the user's next view, which in turn improves the model's prediction accuracy.
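The following is a minimal PyTorch sketch of a CNN + LSTM view predictor of the kind described above, with a CNN that extracts per-frame spatial features and an LSTM that models their temporal correlation; the layer sizes, Dropout rate, and number of views are illustrative assumptions.

```python
# Sketch only: a CNN extracts spatial features per time step, an LSTM
# models their temporal correlation, and a fully connected layer outputs
# logits over the candidate views. All sizes are assumptions.
import torch
import torch.nn as nn

class ViewPredictor(nn.Module):
    def __init__(self, n_views: int, feat_dim: int = 128, hidden: int = 64):
        super().__init__()
        # CNN module: spatial image feature extraction
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                  # max pooling, as in the text
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4),
            nn.Flatten(),
            nn.Linear(32 * 4 * 4, feat_dim),
            nn.Dropout(0.5),                  # drop neurons to curb overfitting
        )
        # LSTM module: temporal correlation across frames
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.fc = nn.Linear(hidden, n_views)  # fully connected output layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels, height, width)
        b, t = x.shape[:2]
        feats = self.cnn(x.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)
        return self.fc(out[:, -1])            # logits for the next view
```

A call such as ViewPredictor(n_views=9)(torch.randn(1, 8, 3, 64, 64)) would yield logits over nine candidate views for one eight-frame sequence.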
S203: and sending the first target video data frame to a user side for live broadcasting operation.
In this embodiment, after the first target video data frame corresponding to the switching control information is obtained, the first target video data frame may be sent to the user side for live broadcast operation, so as to reduce the waiting time of the user and improve the live broadcast experience of the user.
S204: and acquiring a video data frame corresponding to the target visual angle from the video data stream, and sending the video data frame corresponding to the target visual angle to a user side for live broadcasting operation.
In this embodiment, while the first target video data frame is sent to the user side for live broadcast, the video data frame corresponding to the target view may also be acquired from the video data stream and then likewise sent to the user side for live broadcast, which may specifically include:
And determining the video resolution corresponding to each video acquisition device according to the target view.
And acquiring the corresponding video data frames from the video data stream according to the video resolution corresponding to each video acquisition device.
Correspondingly, the camera position number can be obtained through the view prediction model, and streams can then be pulled in advance from the secondary source station end according to the predicted position number. For example, the specific stream-pulling strategy may be: the video data of the two to four camera positions closest to the predicted position are transmitted to the user side at high resolution; the video data of nearby positions (for example, three to six positions away) are transmitted to the user side at low resolution; and the video data of positions farther away (more than six positions) are not transmitted in advance.
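A sketch of this tiered pre-pull strategy is shown below, using the ring distance between camera positions; the exact thresholds mirror the example above, and the resolution labels are assumptions.

```python
# Sketch of the tiered pre-pull: choose a resolution per camera position
# by its ring distance from the predicted position; thresholds mirror the
# example in the text.
def plan_prefetch(target_view: int, n_cameras: int) -> dict:
    plan = {}
    for cam in range(n_cameras):
        d = min(abs(cam - target_view), n_cameras - abs(cam - target_view))
        if d <= 2:
            plan[cam] = "high"   # the two to four closest positions
        elif d <= 6:
            plan[cam] = "low"    # nearby positions (three to six away)
        # positions more than six away are not pre-pulled at all
    return plan
```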
After the above scheme is adopted, the live broadcast mode can be determined first; then, when the live broadcast mode is the free-view live broadcast mode, the view switching instruction containing switching control information sent by the user side is acquired, the first target video data frame corresponding to the switching control information is acquired from the pre-acquired video data stream according to the view switching instruction, and the target view is determined according to the first target video data frame and the pre-trained view prediction model. While the first target video data frame is sent to the user side for the live broadcast operation, the video data frame corresponding to the target view can be acquired from the video data stream and likewise sent to the user side for the live broadcast operation. Predicting the target view with the pre-trained view prediction model and acquiring the corresponding video data frame in advance reduces the waiting time for acquiring video data frames, shortens the live broadcast response time, improves the smoothness of view switching, and further improves the live broadcast effect.
Based on the method of fig. 2, the present specification also provides some specific embodiments of the method, which are described below.
Furthermore, in another embodiment, the determining the live mode may include:
and receiving a mode selection instruction sent by the user side, wherein the mode selection instruction comprises a mode identifier.
And determining a live broadcast mode according to the mode identifier.
In this embodiment, to enrich the user's watching modes and prevent the user from missing highlight moments for lack of an overall grasp of the event, a play mode combining a free-view live broadcast mode and a fixed-view live broadcast mode is provided, and the user can choose according to actual needs. The fixed-view live broadcast mode guarantees viewing quality when the user cannot conveniently give view switching information in real time, while the free-view live broadcast mode gives the user the right to choose; combining the two improves the user's viewing experience.
Further, to improve the accuracy of mode recognition, each live broadcast mode may be assigned a mode identifier, and the live broadcast mode can then be determined from the mode identifier.
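For illustration, a minimal mapping from mode identifier to live broadcast mode might look as follows; the identifier values themselves are assumptions.

```python
# Illustrative only: the mode identifier values are assumptions.
MODE_BY_ID = {
    0: "fixed_view",   # preset, director-guided playing
    1: "free_view",    # user-controlled view switching
}

def determine_live_mode(mode_selection: dict) -> str:
    """Resolve the live broadcast mode from the mode selection instruction."""
    return MODE_BY_ID[mode_selection["mode_id"]]
```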
In addition, in another embodiment, if the live mode is the fixed view live mode, the method may further include:
and acquiring a corresponding second target video data frame from the video data stream according to preset optimal visual angle information.
And sending the second target video data frame to the user side so that the user side carries out live broadcast operation according to the second target video data frame.
In this embodiment, if the live broadcast mode selected by the user is the fixed-view live broadcast mode, the view information selected by the user does not need to be considered, and the live broadcast operation can be performed directly according to the preset optimal view information. Correspondingly, the optimal view information may be customized according to the actual application scenario; for example, the view most frequently selected by users in historical data may be defined as the optimal view information.
In addition, the optimal view information can be the optimal-view control information in the fixed-view live broadcast mode, which the processing end analyzes and adds while reading each frame of data sent by the acquisition end. The optimal view information may include the directivity information of the current view and the next optimal view; it is strictly matched to each frame's timestamp and combined into a control parameter set that is delivered to the stream-pushing end along with the data stream.
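The control parameter set might be represented as below; this is a sketch under assumed field names, with each record strictly keyed by the frame timestamp as the text requires.

```python
# Sketch under assumed field names: one control record per frame,
# strictly matched to the frame timestamp as described above.
from dataclasses import dataclass

@dataclass(frozen=True)
class OptimalViewRecord:
    timestamp: float     # must equal the frame's timestamp exactly
    current_view: int    # directivity info: the optimal view for this frame
    next_view: int       # the next optimal view to switch toward

def lookup(control_set: dict, ts: float) -> OptimalViewRecord:
    """Strict timestamp match against the control parameter set."""
    return control_set[ts]
```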
Further, in another embodiment, prior to determining the live mode, the method may further comprise:
the method comprises the steps of acquiring a video data stream sent by a secondary source station end in real time and storing the video data stream, wherein the video data stream sent by the secondary source station end is acquired from the source station end in a real-time message transfer protocol (RTMP) mode, and the video data stream of the source station end is obtained by preprocessing original video data frames acquired by various video acquisition devices.
In this embodiment, the video data stream acquired by the underlying content distribution network end is acquired in real time from the secondary source station end, the video data stream of the secondary source station end is acquired from the source station end, and the video data stream of the source station end is obtained after the original video data frames acquired by each video acquisition device are preprocessed. Correspondingly, the specific transmission process may be: at the acquisition end, the original video data frames acquired by each video acquisition device are collected and sent to the processing end, where each edge node server can perform preprocessing such as compression encoding, splicing, management, and CDN source-station stream pushing on the original video data frames to obtain the video data stream. The secondary source station CDN close to the CDN source station acquires the video data stream from the CDN source station, and the underlying CDN acquires and caches all n video data streams from the secondary source station CDN, so that the video stream data can be shared among all CDNs.
Further, in another embodiment, prior to determining the target view angle from the first target frame of video data and the pre-trained view angle prediction model, the method may further comprise:
and acquiring a visual angle prediction training data set, wherein the visual angle prediction training data set comprises a plurality of groups of visual angle prediction training data, and each group of visual angle prediction training data comprises historical video data acquired by each image acquisition device and a visual angle to be switched selected by a user in the next adjacent time period.
And inputting historical video data collected by each image collecting device in each group of visual angle prediction training data and the visual angle to be switched selected by the user in the next adjacent time period into a neural network model for training to obtain training characteristic information of each space image.
And inputting the training characteristic information of each spatial image and the corresponding to-be-switched visual angle into a long-time memory network model for training to obtain a visual angle prediction model.
In this embodiment, because the view information of all video acquisition devices at a given moment is correlated with the user's view selection at the next moment, the user's next view selection can be predicted after features are extracted from all view information in a CNN + LSTM manner. The CNN extracts local features of the data and combines and abstracts them into high-level features, while the LSTM extracts the temporal correlation between the data, further improving prediction accuracy; the model thus has good spatial and temporal representation capability and can predict the user's next view. Correspondingly, the specific processing procedure may be as follows. First, a historical sample set of user view switching is acquired: the image matrix information of all cameras at a given moment and the camera view selected by the user at the next moment form one sample, and multiple samples from the user's historical moments are collected and arranged in ascending time order to form the sample set. The pixel matrix information of all video acquisition devices at a given moment, together with the devices' camera position number labels, is then input as training parameters into a CNN network model, and the spatial image feature information at that moment is extracted in depth through the CNN's convolutional and pooling layers, where the pooling layer may use max pooling. To prevent overfitting, a Dropout layer can also be applied, temporarily dropping some neurons from the network with a certain probability. An LSTM neural network time-series prediction model can then be designed, with suitable activation and loss functions selected. The image features extracted by the CNN at each moment and the camera position number labels of the image acquisition devices form a sample set in time order, which is split into a training set and a test set and input into the LSTM model. The prediction model is trained with the training set and tested with the test set; a model loss function is constructed, and the network model parameters are continuously optimized by a back-propagation algorithm. The temporal correlation between image features at successive moments is thereby captured and can be used to predict the user's next view selection; finally, the LSTM output is fed to a fully connected layer to obtain the prediction result. When the accuracy of the prediction result reaches a preset accuracy threshold, training of the view prediction model is complete.
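A compressed training-loop sketch consistent with this procedure is given below, reusing the ViewPredictor sketch from earlier; the 80/20 time-ordered split, the Adam optimizer, and the epoch count are assumptions.

```python
# Sketch only: train/evaluate the ViewPredictor from the earlier sketch.
# The split ratio, optimizer, learning rate and epoch count are assumptions.
import torch
import torch.nn as nn

def train_view_predictor(model: nn.Module, x: torch.Tensor, y: torch.Tensor) -> float:
    """x: (samples, time, 3, H, W) image sequences in ascending time order;
    y: (samples,) long tensor with the view the user selected next."""
    split = int(0.8 * len(x))                 # time-ordered train/test split
    x_tr, y_tr, x_te, y_te = x[:split], y[:split], x[split:], y[split:]
    loss_fn = nn.CrossEntropyLoss()           # the model loss function
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    model.train()
    for _ in range(10):                       # training epochs
        opt.zero_grad()
        loss = loss_fn(model(x_tr), y_tr)
        loss.backward()                       # back-propagation step
        opt.step()
    model.eval()
    with torch.no_grad():                     # accuracy on the test set
        acc = (model(x_te).argmax(dim=1) == y_te).float().mean().item()
    return acc                                # compare against the threshold
```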
In addition, in another embodiment, after the video data frames corresponding to the target view are sent to the user side for live broadcasting, the method may further include:
and receiving a live broadcasting ending instruction sent by the user side.
And stopping acquiring a new video data stream according to the live broadcast ending instruction.
In this embodiment, after the video data frame corresponding to the target view has been played, it must be determined whether the user wants to end the live broadcast, i.e., whether an end-of-live instruction sent by the user side has been received. If an end-of-live instruction sent by the user side is received, the acquisition of new video data streams is stopped according to the instruction and the live broadcast ends. If no end-of-live instruction has been received, indicating that the user wants to continue watching, the live broadcast mode selected by the user can be determined again, and the relevant video data frames are acquired accordingly for the live broadcast operation.
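For illustration, the end-of-live check might be structured as in the sketch below; recv_instruction(), pull_new_stream(), and play() are hypothetical stand-ins for the user-side instruction channel and the streaming operations.

```python
# Sketch only: recv_instruction(), pull_new_stream() and play() are
# hypothetical; they stand in for the user-instruction channel and the
# stream pulling/playing operations described in the text.
def serve(recv_instruction, pull_new_stream, play) -> None:
    while True:
        if recv_instruction() == "end_live":
            break                      # stop acquiring new video data streams
        play(pull_new_stream())        # otherwise continue the live broadcast
```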
Fig. 3 is a schematic flow diagram of a live broadcast processing method according to another embodiment of the present application, where in this embodiment, the method may include:
and S1, the processing end acquires the original video data frames acquired by each video acquisition device.
And S2, the processing end carries out compression coding operation, splicing operation, management operation and source station end stream pushing operation to obtain video data stream.
And S3, the secondary source station side acquires the configuration file.
And S4, the bottom content distribution network end obtains the video data stream from the secondary source station end and caches the video data stream, and the content distribution network ends share the data.
S5, the user side selects the play mode, where there are two play modes: a free-view live mode and a fixed-view live mode (i.e., a preset mode).
And S6, in the fixed-view live broadcast mode, user instructions are ignored; the stream can be pulled from the content distribution network end according to the optimal-view instruction information carried synchronously with each frame of data, the user side plays the preset optimal-view video, and the playing angle is switched automatically.
And S7, in the free-view live broadcast mode, user instructions take effect; the content distribution network end server receives the user's switching instruction control information, and the stream for the user-selected view is pulled from the content distribution network end. The control information may include the current play mode, the current view information, the start-stop viewpoint information of the switch (such as an offset), the switching speed, and the like.
And S8, predicting the next target view angle switched by the user based on the trained view angle prediction model.
S9: and judging whether the user indicates the end of the playing. And after each frame of video data is played, judging whether the video is played completely according to the user instruction information, stopping playing when the video is played completely, and otherwise, continuously reading the data.
Based on the same idea, an embodiment of the present specification further provides a device corresponding to the foregoing method, and fig. 4 is a schematic structural diagram of a live broadcast processing device provided in the embodiment of the present application, and as shown in fig. 4, the live broadcast processing device may include:
the determining module 401 is configured to determine a live broadcast mode, and acquire a view switching instruction sent by a user side when the live broadcast mode is a free view live broadcast mode, where the view switching instruction includes switching control information.
In this embodiment, the determining module 401 is further configured to:
receiving a mode selection instruction sent by a user side, wherein the mode selection instruction comprises a mode identifier.
And determining a live broadcast mode according to the mode identification.
A processing module 402, configured to obtain, according to the view switching instruction, a first target video data frame corresponding to the switching control information from a pre-obtained video data stream, and determine a target view according to the first target video data frame and a pre-trained view prediction model.
In this embodiment, if the switching control information includes the current play mode and the to-be-switched view information, the processing module 402 is further configured to:
And acquiring a first target video data frame corresponding to the current play mode and the to-be-switched view information from a pre-stored video data stream according to the view switching instruction.
And inputting the first target video data frame into the pre-trained view prediction model for recognition, and determining the target view.
The view prediction model includes a CNN convolutional neural network module and an LSTM long short-term memory network module, and the processing module 402 is further configured to:
And inputting the first target video data frame into the CNN module in the view prediction model for image feature extraction to obtain spatial image feature information.
And inputting the spatial image feature information into the LSTM module in the view prediction model to perform temporal correlation recognition to obtain the target view.
The processing module 402 is further configured to send the first target video data frame to a user side for live broadcast operation.
The processing module 402 is further configured to obtain a video data frame corresponding to the target view from the video data stream, and send the video data frame corresponding to the target view to a user side for live broadcast operation.
In this embodiment, the processing module 402 is further configured to:
and determining the video resolution corresponding to each video acquisition device according to the target visual angle.
And acquiring corresponding video data frames from the video data stream according to the video resolution corresponding to each video acquisition device.
In addition, in another embodiment, if the live mode is a fixed view live mode, the processing module 402 is further configured to:
and acquiring a corresponding second target video data frame from the video data stream according to preset optimal visual angle information.
And sending the second target video data frame to the user side so that the user side carries out live broadcast operation according to the second target video data frame.
Moreover, in another embodiment, the processing module 402 is further configured to:
the method comprises the steps of acquiring a video data stream sent by a secondary source station end in real time and storing the video data stream, wherein the video data stream sent by the secondary source station end is acquired from the source station end in a real-time message transfer protocol (RTMP) mode, and the video data stream of the source station end is obtained by preprocessing original video data frames acquired by various video acquisition devices.
Moreover, in another embodiment, the processing module 402 is further configured to:
and acquiring a visual angle prediction training data set, wherein the visual angle prediction training data set comprises a plurality of groups of visual angle prediction training data, and each group of visual angle prediction training data comprises historical video data acquired by each image acquisition device and a visual angle to be switched selected by a user in the next adjacent time period.
And inputting historical video data collected by each image collecting device in each group of visual angle prediction training data and the visual angle to be switched selected by the user in the next adjacent time period into a neural network model for training to obtain training characteristic information of each space image.
And inputting the training characteristic information of each spatial image and the corresponding to-be-switched visual angle into a long-time memory network model for training to obtain a visual angle prediction model.
Moreover, in another embodiment, the processing module 402 is further configured to:
and receiving a live broadcasting ending instruction sent by the user side.
And stopping acquiring a new video data stream according to the live broadcast ending instruction.
The apparatus provided in the embodiment of the present application can implement the method of the embodiment shown in fig. 2, and the implementation principle and the technical effect are similar, which are not described herein again.
Fig. 5 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application, and as shown in fig. 5, a device 500 according to the embodiment includes: a processor 501, and a memory communicatively coupled to the processor. The processor 501 and the memory 502 are connected by a bus 503.
In a specific implementation, the processor 501 executes the computer executable instructions stored in the memory 502, so that the processor 501 executes the method in the above method embodiment.
For the specific implementation process of the processor 501, reference may be made to the above method embodiments; the implementation principles and technical effects are similar and are not repeated here.
In the embodiment shown in fig. 5, it should be understood that the Processor may be a Central Processing Unit (CPU), other general purpose processors, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present invention may be embodied directly in a hardware processor, or in a combination of the hardware and software modules within the processor.
The memory may comprise high speed RAM memory and may also include non-volatile storage NVM, such as at least one disk memory.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended ISA (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, the buses in the figures of the present application are not limited to only one bus or one type of bus.
The embodiment of the present application further provides a computer-readable storage medium, where computer-executable instructions are stored in the computer-readable storage medium, and when a processor executes the computer-executable instructions, the live broadcast processing method of the foregoing method embodiments is implemented.
An embodiment of the present application further provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the live broadcast processing method as described above is implemented.
The computer-readable storage medium may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disk. Readable storage media can be any available media that can be accessed by a general purpose or special purpose computer.
An exemplary readable storage medium is coupled to the processor such that the processor can read information from, and write information to, the readable storage medium. Of course, the readable storage medium may also be an integral part of the processor. The processor and the readable storage medium may reside in an application-specific integrated circuit (ASIC). Of course, the processor and the readable storage medium may also reside as discrete components in the apparatus.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, and such modifications or substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present application.

Claims (12)

1. A live broadcast processing method, characterized by comprising:
determining a live broadcast mode, and, when the live broadcast mode is a free view live broadcast mode, acquiring a view switching instruction sent by a user side, wherein the view switching instruction comprises switching control information;
acquiring, according to the view switching instruction, a first target video data frame corresponding to the switching control information from a pre-acquired video data stream, and determining a target view according to the first target video data frame and a pre-trained view prediction model;
sending the first target video data frame to the user side for live broadcasting operation;
and acquiring a video data frame corresponding to the target view from the video data stream, and sending the video data frame corresponding to the target view to the user side for live broadcasting operation.
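By way of editorial illustration only (code forms no part of the claims), the overall flow of claim 1 can be sketched in Python as follows; the helper objects, method names, and instruction format are all assumptions, since the claim prescribes steps, not a concrete interface.

```python
# Illustrative sketch of the claim-1 flow; every interface here is
# hypothetical (the claim prescribes steps, not APIs).

def handle_free_view_session(instruction, video_stream, view_model, client):
    """Process one view switching instruction in free view live mode."""
    # The switching control information carried by the instruction.
    mode = instruction["current_playing_mode"]
    view_info = instruction["view_to_switch"]

    # Acquire the first target video data frame from the pre-acquired
    # (buffered) video data stream.
    first_frame = video_stream.frame_for(mode, view_info)

    # Send the first target frame so playback switches without stalling.
    client.send(first_frame)

    # Determine the target view with the pre-trained view prediction model,
    # then stream the frames for that view.
    target_view = view_model.predict(first_frame)
    for frame in video_stream.frames_for_view(target_view):
        client.send(frame)
```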
2. The method according to claim 1, wherein the switching control information comprises a current playing mode and information of the view to be switched,
and the acquiring, according to the view switching instruction, a first target video data frame corresponding to the switching control information from a pre-acquired video data stream, and determining a target view according to the first target video data frame and a pre-trained view prediction model comprises:
acquiring, according to the view switching instruction, a first target video data frame corresponding to the current playing mode and the information of the view to be switched from a pre-stored video data stream;
and inputting the first target video data frame into the pre-trained view prediction model for recognition, and determining the target view.
3. The method of claim 2, wherein the view prediction model comprises a convolutional neural network (CNN) module and a long short-term memory (LSTM) network module,
and the inputting the first target video data frame into the pre-trained view prediction model for recognition and determining the target view comprises:
inputting the first target video data frame into the CNN module of the view prediction model for image feature extraction, to obtain spatial image feature information;
and inputting the spatial image feature information into the LSTM module of the view prediction model for temporal correlation recognition, to obtain the target view.
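As an editorial illustration of claim 3, a minimal PyTorch sketch of such a CNN + LSTM view prediction model is given below; the layer sizes, sequence length, and number of candidate views are assumptions, not disclosed by the application. The CNN branch corresponds to the spatial image feature extraction of the claim, and the LSTM branch to the temporal correlation recognition.

```python
# Minimal sketch of a CNN + LSTM view prediction model (claim 3).
# All dimensions and the number of candidate views are assumptions.
import torch
import torch.nn as nn

class ViewPredictionModel(nn.Module):
    def __init__(self, num_views: int = 8, feat_dim: int = 128):
        super().__init__()
        # CNN module: extracts spatial image features from each frame.
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        # LSTM module: models temporal correlation across the frame sequence.
        self.lstm = nn.LSTM(feat_dim, feat_dim, batch_first=True)
        self.head = nn.Linear(feat_dim, num_views)

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, 3, H, W)
        b, t = frames.shape[:2]
        feats = self.cnn(frames.flatten(0, 1)).view(b, t, -1)
        out, _ = self.lstm(feats)
        return self.head(out[:, -1])  # logits over candidate target views

# Example: predict a target view from a sequence of 4 RGB frames.
model = ViewPredictionModel()
logits = model(torch.randn(1, 4, 3, 64, 64))
target_view = logits.argmax(dim=1).item()
```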
4. The method of claim 1, wherein the acquiring a video data frame corresponding to the target view from the video data stream comprises:
determining a video resolution corresponding to each video acquisition device according to the target view;
and acquiring corresponding video data frames from the video data stream according to the video resolution corresponding to each video acquisition device.
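One possible reading of claim 4, sketched for illustration only: the device covering the target view is requested at full resolution while neighbouring devices are kept at lower resolutions. The tiering policy, the ring arrangement of cameras, and the resolution labels below are assumptions.

```python
# Sketch of one possible per-device resolution policy for claim 4.
# Cameras are assumed to be indexed around a ring; the tiers are assumptions.
HIGH, MEDIUM, LOW = "1080p", "720p", "360p"

def resolutions_for(target_view: int, num_devices: int) -> dict[int, str]:
    """Map each capture device index to a requested resolution."""
    plan = {}
    for dev in range(num_devices):
        # Ring distance between this device's view and the target view.
        distance = min(abs(dev - target_view), num_devices - abs(dev - target_view))
        if distance == 0:
            plan[dev] = HIGH      # device facing the target view
        elif distance == 1:
            plan[dev] = MEDIUM    # adjacent views, likely switched to next
        else:
            plan[dev] = LOW       # remaining views, kept warm cheaply
    return plan

print(resolutions_for(target_view=3, num_devices=8))
# {0: '360p', 1: '360p', 2: '720p', 3: '1080p', 4: '720p', 5: '360p', ...}
```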
5. The method of claim 1, wherein the determining a live broadcast mode comprises:
receiving a mode selection instruction sent by the user side, wherein the mode selection instruction comprises a mode identifier;
and determining the live broadcast mode according to the mode identifier.
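A trivial editorial sketch of claim 5; the identifier values and mode names are assumptions, not specified by the application.

```python
# Sketch of claim 5: map the mode identifier carried by the user side's
# mode selection instruction to a live broadcast mode. The 0/1 values
# are assumptions.
FIXED_VIEW_MODE = "fixed_view"
FREE_VIEW_MODE = "free_view"
MODE_BY_ID = {0: FIXED_VIEW_MODE, 1: FREE_VIEW_MODE}

def determine_live_mode(instruction: dict) -> str:
    return MODE_BY_ID[instruction["mode_id"]]

assert determine_live_mode({"mode_id": 1}) == FREE_VIEW_MODE
```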
6. The method of any one of claims 1-5, wherein, if the live broadcast mode is a fixed view live broadcast mode, the method further comprises:
acquiring a corresponding second target video data frame from the video data stream according to preset optimal view information;
and sending the second target video data frame to the user side so that the user side carries out live broadcast operation according to the second target video data frame.
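An illustrative sketch of the fixed view branch of claim 6; the stream and client interfaces are hypothetical, and the preset optimal view information is assumed to be a camera index.

```python
# Sketch of claim 6: in fixed view live mode, serve the second target
# video data frame selected by preset optimal view information.
OPTIMAL_VIEW = 2  # preset optimal view information, assumed to be an index

def serve_fixed_view(video_stream, client):
    second_target_frame = video_stream.frame_for_view(OPTIMAL_VIEW)
    client.send(second_target_frame)  # user side performs live broadcast
```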
7. The method of any one of claims 1-5, wherein, before the determining a live broadcast mode, the method further comprises:
acquiring, in real time, the video data stream sent by a secondary source station end and storing it, wherein the video data stream sent by the secondary source station end is acquired from a source station end via the Real-Time Messaging Protocol (RTMP), and the video data stream at the source station end is obtained by preprocessing original video data frames collected by the video acquisition devices.
8. The method of any one of claims 1-5, wherein, before the determining a target view according to the first target video data frame and a pre-trained view prediction model, the method further comprises:
acquiring a view prediction training data set, wherein the view prediction training data set comprises a plurality of groups of view prediction training data, and each group comprises historical video data collected by each image acquisition device and the view to be switched that the user selected in the next adjacent time period;
inputting, for each group of view prediction training data, the historical video data collected by each image acquisition device and the view to be switched selected by the user in the next adjacent time period into a neural network model for training, to obtain spatial image training feature information;
and inputting the spatial image training feature information and the corresponding view to be switched into a long short-term memory (LSTM) network model for training, to obtain the view prediction model.
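An illustrative training loop for claim 8, reusing the ViewPredictionModel sketched under claim 3; the dataset shapes and hyper-parameters are assumptions. Each sample pairs a short sequence of historical frames with the view the user selected in the next adjacent time period.

```python
# Training sketch for claim 8. `dataset` is assumed to yield
# (frames, next_view) pairs: frames (B, T, 3, H, W), next_view (B,)
# holding the index of the view selected in the next time period.
import torch
import torch.nn as nn

def train_view_prediction_model(model, dataset, epochs=10, lr=1e-3):
    optimiser = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for frames, next_view in dataset:
            logits = model(frames)   # CNN features -> LSTM -> view logits
            loss = loss_fn(logits, next_view)
            optimiser.zero_grad()
            loss.backward()
            optimiser.step()
    return model
```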
9. The method of any one of claims 1-5, wherein, after the sending the video data frame corresponding to the target view to the user side for live broadcasting operation, the method further comprises:
receiving a live broadcast ending instruction sent by the user side;
and stopping acquiring a new video data stream according to the live broadcast ending instruction.
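The teardown of claim 9, sketched with hypothetical session methods:

```python
# Sketch of claim 9: on a live broadcast ending instruction from the
# user side, stop acquiring new video data. Interfaces are hypothetical.
def handle_end_of_live(session):
    session.stop_ingest()  # stop acquiring a new video data stream
    session.close()        # release the session's resources
```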
10. A live broadcast processing apparatus, comprising:
a determining module, configured to determine a live broadcast mode and, when the live broadcast mode is a free view live broadcast mode, acquire a view switching instruction sent by a user side, wherein the view switching instruction comprises switching control information;
a processing module, configured to acquire, according to the view switching instruction, a first target video data frame corresponding to the switching control information from a pre-acquired video data stream, and determine a target view according to the first target video data frame and a pre-trained view prediction model;
the processing module is further configured to send the first target video data frame to the user side for live broadcast operation;
and the processing module is further configured to acquire a video data frame corresponding to the target view from the video data stream, and send the video data frame corresponding to the target view to the user side for live broadcast operation.
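A structural Python sketch of the claim-10 apparatus, showing only the division of responsibilities between the two modules; the method names and signatures are assumptions, and the bodies are intentionally elided.

```python
# Skeleton of the claim-10 apparatus: a determining module and a
# processing module. Signatures are assumptions; bodies are elided.
class DeterminingModule:
    def determine_live_mode(self, mode_selection_instruction): ...
    def get_view_switch_instruction(self): ...

class ProcessingModule:
    def first_target_frame(self, switching_control_info): ...
    def predict_target_view(self, first_target_frame): ...
    def send_for_live_broadcast(self, frames, client): ...

class LiveBroadcastProcessingApparatus:
    def __init__(self):
        self.determining_module = DeterminingModule()
        self.processing_module = ProcessingModule()
```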
11. An electronic device, comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored in the memory to implement the live broadcast processing method of any one of claims 1-9.
12. A computer-readable storage medium having computer-executable instructions stored thereon which, when executed by a processor, implement the live broadcast processing method of any one of claims 1-9.
CN202111205287.8A 2021-10-15 2021-10-15 Live broadcast processing method and device, electronic equipment and readable storage medium Pending CN113949893A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111205287.8A CN113949893A (en) 2021-10-15 2021-10-15 Live broadcast processing method and device, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN113949893A (en) 2022-01-18

Family ID=79330937

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111205287.8A Pending CN113949893A (en) 2021-10-15 2021-10-15 Live broadcast processing method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN113949893A (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106612426A (en) * 2015-10-26 2017-05-03 华为技术有限公司 Method and device for transmitting multi-view video
CN107396077A (en) * 2017-08-23 2017-11-24 深圳看到科技有限公司 Virtual reality panoramic video stream projecting method and equipment
US20200322403A1 (en) * 2017-12-22 2020-10-08 Huawei Technologies Co., Ltd. Scalable fov+ for vr 360 video delivery to remote end users
CN110166850A (en) * 2019-05-30 2019-08-23 上海交通大学 The method and system of multiple CNN neural network forecast panoramic video viewing location
US20210051306A1 (en) * 2019-08-16 2021-02-18 At&T Intellectual Property I, L.P. Method for Streaming Ultra High Definition Panoramic Videos
CN111698520A (en) * 2020-06-24 2020-09-22 北京奇艺世纪科技有限公司 Multi-view video playing method, device, terminal and storage medium
CN112073632A (en) * 2020-08-11 2020-12-11 联想(北京)有限公司 Image processing method, apparatus and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Xinwei Chen et al.: "Deep Learning for Content-based Personalized Viewport Prediction of 360-Degree VR Videos" *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114650432A (en) * 2022-04-25 2022-06-21 咪咕视讯科技有限公司 Live video display method and device, electronic equipment and medium
CN114650432B (en) * 2022-04-25 2023-10-17 咪咕视讯科技有限公司 Live video display method, device, electronic equipment and medium
CN115514913A (en) * 2022-09-16 2022-12-23 深圳市拓普智造科技有限公司 Video data processing method and device, electronic equipment and storage medium
CN115514913B (en) * 2022-09-16 2024-02-13 深圳市拓普智造科技有限公司 Video data processing method and device, electronic equipment and storage medium
CN116634189A (en) * 2023-07-20 2023-08-22 天津星耀九洲科技有限公司 Interactive live broadcast data display method and device and electronic equipment
CN116634189B (en) * 2023-07-20 2023-10-03 天津星耀九洲科技有限公司 Interactive live broadcast data display method and device and electronic equipment

Similar Documents

Publication Publication Date Title
CN113949893A (en) Live broadcast processing method and device, electronic equipment and readable storage medium
WO2020199303A1 (en) Live stream video highlight generation method and apparatus, server, and storage medium
CN108632676B (en) Image display method, image display device, storage medium and electronic device
US20220232222A1 (en) Video data processing method and apparatus, and storage medium
US20070103558A1 (en) Multi-view video delivery
CN109698949B (en) Video processing method, device and system based on virtual reality scene
CN106658032B (en) Multi-camera live broadcasting method and system
CN111193937B (en) Live video data processing method, device, equipment and medium
CN113067994B (en) Video recording method and electronic equipment
CN110809168A (en) Video live broadcast processing method and device, terminal and storage medium
CN111601136B (en) Video data processing method and device, computer equipment and storage medium
KR101883018B1 (en) Method and device for providing supplementary content in 3d communication system
KR102107055B1 (en) Method and device for recommending sports relay video based on machine learning
CN107592549B (en) Panoramic video playing and photographing system based on two-way communication
CN111225228B (en) Video live broadcast method, device, equipment and medium
US20170225077A1 (en) Special video generation system for game play situation
US9930412B2 (en) Network set-top box and its operating method
CN115004713A (en) Video distribution device, video distribution method, and video distribution program
CN111263183A (en) Singing state identification method and singing state identification device
CN114139491A (en) Data processing method, device and storage medium
US11622099B2 (en) Information-processing apparatus, method of processing information, and program
CN107707830B (en) Panoramic video playing and photographing system based on one-way communication
CN111183649B (en) Server and computer-readable storage medium
CN111314712A (en) Live playback scheduling method, device, system and storage medium
CN110198457B (en) Video playing method and device, system, storage medium, terminal and server thereof

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20220118)