CN116132737A - Data processing method, live broadcast method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN116132737A
CN116132737A (Application CN202310161096.9A)
Authority
CN
China
Prior art keywords: video, target, live broadcast, data, super
Legal status
Pending
Application number
CN202310161096.9A
Other languages
Chinese (zh)
Inventor
于和新
巢娅
Current Assignee
Guangzhou Boguan Information Technology Co Ltd
Original Assignee
Guangzhou Boguan Information Technology Co Ltd
Application filed by Guangzhou Boguan Information Technology Co Ltd
Priority to CN202310161096.9A
Publication of CN116132737A

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/4402Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
    • H04N21/440263Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display by altering the spatial resolution, e.g. for displaying on a connected PDA
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4053Super resolution, i.e. output image resolution higher than sensor resolution
    • G06T3/4076Super resolution, i.e. output image resolution higher than sensor resolution by iteratively correcting the provisional high resolution image using the original low-resolution image
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21Server components or server architectures
    • H04N21/218Source of audio or video content, e.g. local disk arrays
    • H04N21/2187Live feed

Abstract

The present disclosure provides a data processing method, a live broadcast method and apparatus, an electronic device, and a storage medium, relating to the field of live broadcast technology. The data processing method includes the following steps: acquiring an initial video; inputting the initial video into a pre-trained target super-resolution model and outputting an intermediate video, where the image resolution of the intermediate video is greater than that of the initial video; and performing detail recovery processing on the intermediate video to obtain a target video. The technical solution of the disclosed embodiments can overcome the technical problem of poor live broadcast quality caused by model limitations under high-frame-rate, high-complexity requirements.

Description

Data processing method, live broadcast method and device, electronic equipment and storage medium
Technical Field
The present disclosure relates to the field of live broadcasting technology, and in particular, to a data processing method, a live broadcasting method, a data processing apparatus, a live broadcasting apparatus, an electronic device, and a computer readable storage medium.
Background
As video production quality improves, viewers' demands on live broadcast picture quality keep rising. Methods that improve resolution with traditional algorithms place little computational demand on the machine model, but their reconstructed output is generally blurred, sometimes even less sharp than the original image. Moreover, in high-frame-rate, high-complexity live scenarios, factors such as bandwidth constraints and coding loss readily cause picture blurring and distortion. Traditional algorithms are therefore difficult to reuse in live broadcast scenes.
For the technical problem that, when ultra-high-definition video technology is applied to live broadcasting, the live broadcast quality suffers due to model limitations imposed by high-frame-rate, high-complexity requirements, no effective solution has yet been proposed.
It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
An object of the embodiments of the present disclosure is to provide a data processing method, a live broadcast method, a data processing apparatus, a live broadcast apparatus, an electronic device, and a computer readable storage medium, so as to overcome, at least to some extent, the technical problem of poor live broadcast quality caused by model limitations imposed by high-frame-rate, high-complexity requirements.
Other features and advantages of the present disclosure will be apparent from the following detailed description, or may be learned in part by the practice of the disclosure.
According to a first aspect of an embodiment of the present disclosure, there is provided a data processing method, including: acquiring an initial video; inputting the initial video into a pre-trained target super-resolution model, and outputting an intermediate video, wherein the image resolution of the intermediate video is greater than that of the initial video; and carrying out detail recovery processing on the intermediate video to obtain a target video.
In some example embodiments of the present disclosure, based on the foregoing approach, the target super-resolution model is derived from a pre-training process comprising: acquiring a training data set; performing model training on a pre-constructed initial super-resolution model with the training data set to obtain an intermediate super-resolution model, wherein the initial super-resolution model comprises a first convolution layer and a heavy parameter convolution layer, and the first convolution layer and/or the heavy parameter convolution layer are connected through long and short residual connections; and reconstructing the network structure of the intermediate super-resolution model to obtain the target super-resolution model.
In some example embodiments of the present disclosure, based on the foregoing solution, performing a reconstruction process on a network structure of an intermediate super-resolution model to obtain a target super-resolution model, including: determining a second convolution layer equivalent to the heavy parameter convolution layer; and replacing the heavy parameter convolution layer of the intermediate super-resolution model with a second convolution layer to obtain the target super-resolution model.
In some example embodiments of the present disclosure, based on the foregoing scheme, determining a second convolution layer equivalent to the heavy parameter convolution layer comprises: performing equivalent fusion processing on each convolution layer in the heavy parameter convolution layer to determine the second convolution layer; wherein the equivalent fusion processing comprises Einstein summation.
In some example embodiments of the present disclosure, based on the foregoing approach, the target super-resolution model includes a first convolution layer and a second convolution layer, inputting the initial video into the pre-trained target super-resolution model, outputting the intermediate video, including: inputting the initial video into a first convolution layer, and extracting first characteristic data of the initial video; inputting the first characteristic data into a second convolution layer, and extracting second characteristic data of the initial video; performing up-sampling processing on the second characteristic data to obtain expanded second characteristic data; and taking the expanded second characteristic data as an intermediate video.
In some example embodiments of the present disclosure, based on the foregoing scheme, performing detail restoration processing on an intermediate video to obtain a target video, including: and sharpening the intermediate video to promote high-frequency information in the intermediate video to obtain a target video.
According to a second aspect of the embodiments of the present disclosure, there is also provided a live broadcast method, including: acquiring a target video, wherein the target video is determined according to the data processing method of the first aspect; acquiring live broadcast data, and generating a live broadcast data stream according to a target video and the live broadcast data; and pushing the live data stream to each live client for display.
In some example embodiments of the present disclosure, based on the foregoing scheme, generating a live data stream from a target video and live data includes: encoding the image data in the live broadcast data and the image data in the target video to obtain a live broadcast video stream; synthesizing the audio data in the live broadcast data and the audio data in the target video to obtain a live broadcast audio stream; and packaging the live video stream and the live audio stream to obtain a live data stream.
In some example embodiments of the disclosure, based on the foregoing scheme, the method further comprises: setting a target definition gear at a live client according to the transmission state of the target video; setting a target definition gear at a live client according to a transmission state of a target video, including: if the target video is monitored to be in a normal transmission state in a first preset time period, adding a target definition gear in the live broadcast client; or if the target video is in the cut-off state in the second preset time period, deleting the target definition gear in the live client.
According to a third aspect of embodiments of the present disclosure, there is provided a data processing apparatus comprising: the first acquisition unit is used for acquiring an initial video; the output unit is used for inputting the initial video into the pre-trained target super-resolution model and outputting an intermediate video, wherein the image resolution of the intermediate video is larger than that of the initial video; and the first processing unit is used for carrying out detail recovery processing on the intermediate video to obtain a target video.
According to a fourth aspect of the embodiments of the present disclosure, there is also provided a live broadcast apparatus, including: a third acquisition unit configured to acquire a target video determined according to the data processing apparatus of the third aspect described above; the generation unit is used for acquiring live broadcast data and generating a live broadcast data stream according to the target video and the live broadcast data; and the display unit is used for pushing the live broadcast data stream to each live broadcast client for display.
According to a fifth aspect of embodiments of the present disclosure, there is provided an electronic device, comprising: a processor; and a memory having stored thereon computer readable instructions which when executed by the processor perform any of the methods described above.
According to a sixth aspect of embodiments of the present disclosure, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a method according to any one of the above.
The technical scheme provided by the embodiment of the disclosure can comprise the following beneficial effects:
In the data processing method of the example embodiments of the present disclosure, an initial video is acquired; the initial video is input into a pre-trained target super-resolution model, and an intermediate video is output, where the image resolution of the intermediate video is greater than that of the initial video; and detail recovery processing is performed on the intermediate video to obtain a target video. On the one hand, the pre-trained target super-resolution model can overcome distortion of the input initial video caused by bandwidth constraints and coding loss in high-frame-rate, high-complexity live scenarios; on the other hand, the detail recovery processing enhances the high-frequency information of the picture, achieving a visual effect better suited to viewers' watching habits.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort. In the drawings:
FIG. 1 illustrates a schematic diagram of an exemplary system architecture to which a data processing method and apparatus of the present disclosure may be applied in an exemplary embodiment;
FIG. 2 schematically illustrates a data processing method according to some embodiments of the present disclosure;
FIG. 3 schematically illustrates a training network architecture according to some embodiments of the present disclosure;
FIG. 4 schematically illustrates an inference network architecture according to some embodiments of the present disclosure;
FIG. 5 schematically illustrates the architecture of a heavy parameter module according to some embodiments of the present disclosure;
FIG. 6 schematically illustrates the equivalent form of the heavy parameter module according to some embodiments of the present disclosure;
FIG. 7 schematically illustrates a heavy parameter convolution according to some embodiments of the present disclosure;
FIG. 8 schematically illustrates a live broadcast method according to some embodiments of the present disclosure;
FIG. 9 schematically illustrates a data processing apparatus according to some embodiments of the present disclosure;
FIG. 10 schematically illustrates a live broadcast apparatus according to some embodiments of the present disclosure;
FIG. 11 schematically illustrates the structure of a computer system of an electronic device according to some embodiments of the present disclosure.
In the drawings, the same or corresponding reference numerals indicate the same or corresponding parts.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the disclosed aspects may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the disclosure.
Moreover, the drawings are only schematic illustrations and are not necessarily drawn to scale. The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
FIG. 1 illustrates a schematic diagram of an exemplary system architecture to which the data processing methods and apparatus of embodiments of the present disclosure may be applied.
As shown in FIG. 1, the system architecture 100 may include one or more of the terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium providing communication links between the terminal devices 101, 102, 103 and the server 105, and may include various connection types, such as wired links, wireless communication links, or fiber-optic cables. The terminal devices 101, 102, 103 may be various electronic devices with display screens, including but not limited to desktop computers, portable computers, smart phones, and tablet computers. It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative; there may be any number of each, as required by the implementation. For example, the server 105 may be a server cluster formed by a plurality of servers.

As an example, a client application may be installed on a terminal device, with the server providing live services to each client application. A user may install a live client application on a terminal and obtain the live service provided by the server through that application, or install a browser client application and log in to a live page provided by the server to obtain the live service. Generally, two types of users take part in a live broadcast: anchor users and audience users; accordingly, user terminals can be divided into an anchor side and an audience side. The client application can provide both a live-streaming function and a viewing function: an anchor user uses the streaming function to broadcast live video, while an audience user uses the viewing function to watch the video content. For example, the video shooting module of an anchor-side terminal with the client application installed can be started to collect audio and video data in real time and send them to the server; the server forwards the received audio and video data to audience-side terminals with the client application installed, and audience users watch the anchor's live content using the viewing function of the client application.
The data processing method provided by the embodiments of the present disclosure may be performed by the terminal devices 101, 102, 103, and correspondingly, the data processing apparatus may also be provided in the terminal devices 101, 102, 103. The data processing method provided by the embodiment of the present disclosure may also be performed by the terminal devices 101, 102, 103 and the server 105, and accordingly, the data processing apparatus may be disposed in the terminal devices 101, 102, 103 and the server 105. In addition, the data processing method provided in the embodiment of the present disclosure may also be executed by the server 105, and accordingly, the data processing apparatus may be provided in the server 105, which is not particularly limited in the present exemplary embodiment.
For example, in the present exemplary embodiment, the server 105 deployed on the live platform may acquire the initial video received from the terminal devices 101, 102, 103; the initial video is then input into a pre-trained target super-resolution model, and an intermediate video is output; after that, the server 105 performs detail recovery processing on the intermediate video to obtain a target video.
It will be readily appreciated by those skilled in the art that the above operations are for example only and the present exemplary embodiment is not limited thereto.
In one embodiment of the present disclosure, the data processing method may be executed on a terminal device or a server; the following description takes execution by a server as an example. FIG. 2 schematically illustrates a data processing method according to some embodiments of the present disclosure.
Referring to fig. 2, the data processing method may include the steps of:
step S210, acquiring an initial video.
Specifically, the target video obtained in the embodiments of the present disclosure may be applied in a super-resolution cloud streaming-media processing link. The video stream of the initial video input to the processing link may be an original-quality video stream of a game live room or of another type of live room, where the resolution of the original video stream may be, for example, 720P or 1080P; the disclosure is not limited in this respect.
Step S220, inputting the initial video into the pre-trained target super-resolution model, and outputting an intermediate video, wherein the image resolution of the intermediate video is greater than that of the initial video.
Specifically, the low-resolution image frames in the initial video can be converted into high-resolution image frames by the target super-resolution model; the intermediate video may be, for example, the resolution-enhanced video obtained after the 1080P original-quality video stream of the live room passes through the target super-resolution model.
The target super-resolution model is obtained through a pre-training process, which specifically includes: acquiring a training data set; performing model training on a pre-constructed initial super-resolution model with the training data set to obtain an intermediate super-resolution model, wherein the initial super-resolution model includes a first convolution layer and a heavy parameter convolution layer, and the first convolution layer and/or the heavy parameter convolution layer are connected through long and short residual connections; and reconstructing the network structure of the intermediate super-resolution model to obtain the target super-resolution model.
Specifically, an original-quality video stream of the live room whose display is to be enhanced is obtained, and this video stream is used as the training data set to train the pre-constructed initial super-resolution model.
For example, the process of training the pre-constructed initial super-resolution model with the training data set to obtain the intermediate super-resolution model may be as shown in FIG. 3. A 1080P game original-quality video stream is input into the 5×5 convolution layer of the initial super-resolution model shown in FIG. 3 (corresponding to the first convolution layer in the present disclosure) to obtain a first set of training image feature data; the first set of training image feature data is input into the heavy parameter convolution layer to obtain a second set of training image feature data; training then continues iteratively on the second set to obtain the intermediate super-resolution model. The convolution layers of the initial super-resolution model are connected through long and short residual connections that perform identity mapping, which further reduces the difficulty of machine learning and strengthens the correlation between adjacent pixels. Because the 1080P video stream has a relatively high resolution, the heavy parameter structure gives the training process a large enough receptive field to learn the corresponding model parameters without increasing the inference time of the resulting target super-resolution model, further improving the efficiency of target video generation.
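As a concrete illustration, the following PyTorch sketch shows a minimal training-time network in the spirit of FIG. 3. All names, channel counts, block counts, and the exact branch layout of the heavy parameter block are illustrative assumptions rather than values disclosed in this patent; only the 5×5 head convolution, the heavy parameter blocks, the residual connections, and the up-sampling stage mirror the description above.

```python
import torch
import torch.nn as nn

class HeavyParamBlock(nn.Module):
    """Training-time heavy parameter block: parallel 3x3 and 1x1 branches
    plus an identity path (a short residual connection), summed before the
    activation. The exact branch topology of FIGS. 5 and 7 is not given
    in the text, so this layout is an assumption."""
    def __init__(self, channels: int):
        super().__init__()
        self.conv3 = nn.Conv2d(channels, channels, 3, padding=1)
        self.conv1 = nn.Conv2d(channels, channels, 1)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.conv3(x) + self.conv1(x) + x)

class TrainingSRNet(nn.Module):
    """Minimal stand-in for the FIG. 3 training network."""
    def __init__(self, channels: int = 32, num_blocks: int = 4, scale: int = 2):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 5, padding=2)   # first convolution layer (5x5)
        self.body = nn.Sequential(*[HeavyParamBlock(channels) for _ in range(num_blocks)])
        self.tail = nn.Conv2d(channels, 3 * scale * scale, 3, padding=1)
        self.upsample = nn.PixelShuffle(scale)             # up-sampling to high resolution

    def forward(self, lr_frame):
        feat = self.head(lr_frame)
        feat = self.body(feat) + feat                      # long residual connection
        return self.upsample(self.tail(feat))

# Usage: a 1080P frame batch would be upscaled 2x, e.g. toward 4K:
# x = torch.randn(1, 3, 1080, 1920); y = TrainingSRNet()(x)  # -> (1, 3, 2160, 3840)
```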
Specifically, the target super-resolution model is obtained by reconstructing a network structure of the intermediate super-resolution model, and the specific implementation process comprises the following steps: determining a second convolution layer equivalent to the heavy parameter convolution layer; and replacing the heavy parameter convolution layer of the intermediate super-resolution model with a second convolution layer to obtain the target super-resolution model.
For example, the heavy parameter module in the intermediate super-resolution model of FIG. 3 is made equivalent to the 3×3 convolution layer in FIG. 4 (corresponding to the second convolution layer in the present disclosure). Of course, the heavy parameter module in FIG. 3 may also be made equivalent to a convolution layer of another level; the disclosure is not limited in this respect. Through the target super-resolution model, the present disclosure can effectively avoid picture blurring, distortion, and similar phenomena caused by factors such as bandwidth constraints and coding loss in high-frame-rate, high-complexity live scenarios.
Specifically, determining the second convolution layer equivalent to the heavy parameter convolution layer includes: performing equivalent fusion processing on each convolution layer within the heavy parameter convolution layer to determine the second convolution layer, where the equivalent fusion processing includes Einstein summation.
For example, FIG. 5 shows the network structure of the heavy parameter module, and FIG. 6 shows the result after each convolution layer in the heavy parameter module has been made equivalent; the internal structure of the heavy parameter convolution in FIG. 5 may be as shown in FIG. 7. Because no nonlinearity is inserted between these convolution layers, they can be fused equivalently using the Einstein summation convention, yielding the target super-resolution model.
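As one hedged illustration of equivalent fusion via Einstein summation, the sketch below merges a 1×1 convolution followed by a 3×3 convolution, with no nonlinearity between them, into a single equivalent 3×3 convolution. The patent does not disclose the exact branch layout of its heavy parameter module, so this particular layer pair is an assumption chosen to demonstrate the technique.

```python
import torch
import torch.nn as nn

@torch.no_grad()
def fuse_conv1x1_conv3x3(conv1x1: nn.Conv2d, conv3x3: nn.Conv2d) -> nn.Conv2d:
    """Fuse a 1x1 conv followed by a 3x3 conv into one 3x3 conv.
    Valid because no nonlinearity sits between the two layers, so
    their composition remains a single linear (convolution) operator."""
    w1 = conv1x1.weight.flatten(1)   # [mid, cin]
    w2 = conv3x3.weight              # [cout, mid, 3, 3]
    # Merged kernel via Einstein summation over the intermediate channels:
    #   W[o, i, h, w] = sum_c w2[o, c, h, w] * w1[c, i]
    w = torch.einsum('ochw,ci->oihw', w2, w1)
    # The first bias passes through the second conv as a constant map:
    #   b[o] = sum_{c,h,w} w2[o, c, h, w] * b1[c] + b2[o]
    b = torch.einsum('ochw,c->o', w2, conv1x1.bias) + conv3x3.bias
    # With zero padding this matches the two-layer form exactly in the
    # interior; exact equality at image borders additionally requires
    # padding the intermediate map with b1 (omitted here for brevity).
    fused = nn.Conv2d(w.shape[1], w.shape[0], 3, padding=conv3x3.padding)
    fused.weight.copy_(w)
    fused.bias.copy_(b)
    return fused
```

The replacement step described above then swaps each heavy parameter module for its fused convolution, leaving the deploy-time graph with plain convolutions only.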
Specifically, the target super-resolution model comprises a first convolution layer and a second convolution layer, an initial video is input into the pre-trained target super-resolution model, and an intermediate video is output, and the method is realized through the following steps: inputting the initial video into a first convolution layer, and extracting first characteristic data of the initial video; inputting the first characteristic data into a second convolution layer, and extracting second characteristic data of the initial video; performing up-sampling processing on the second characteristic data to obtain expanded second characteristic data; and taking the expanded second characteristic data as an intermediate video.
For example, as shown in FIG. 4, the initial 1080P original-quality live video stream is input into the 5×5 convolution layer (corresponding to the first convolution layer in the disclosure) to extract the 1080P video frame features of the live original (i.e. the first feature data in the disclosure). These features are then passed through several 3×3 convolution layers for fine feature extraction, yielding the extracted data (i.e. the second feature data in the disclosure). The extracted data is expanded by up-sampling to obtain the expanded data (i.e. the intermediate video in the disclosure); expanding the data in this way enhances the features of the live original.
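Continuing the illustrative sketch above, a deploy-time counterpart of FIG. 4, with each heavy parameter block already replaced by its single equivalent 3×3 convolution, might look as follows (again, channel and block counts are assumptions, not disclosed values):

```python
import torch.nn as nn

class InferenceSRNet(nn.Module):
    """Deploy-time network: 5x5 head, plain 3x3 body convolutions (the
    fused second convolution layers), and pixel-shuffle up-sampling."""
    def __init__(self, channels: int = 32, num_blocks: int = 4, scale: int = 2):
        super().__init__()
        self.head = nn.Conv2d(3, channels, 5, padding=2)   # extracts the first feature data
        body = []
        for _ in range(num_blocks):                        # extracts the second feature data
            body += [nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU(inplace=True)]
        self.body = nn.Sequential(*body)
        self.tail = nn.Conv2d(channels, 3 * scale * scale, 3, padding=1)
        self.upsample = nn.PixelShuffle(scale)             # up-sampling -> intermediate video

    def forward(self, x):
        feat = self.head(x)
        feat = self.body(feat) + feat                      # long residual connection kept
        return self.upsample(self.tail(feat))
```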
And step S230, performing detail recovery processing on the intermediate video to obtain a target video.
Specifically, detail restoration may involve enhancing image brightness, improving visibility, making edge contours clearer, and so on, to obtain a target video with clearer image quality; the target video may have high-definition 2K quality or ultra-high-definition 4K quality, and the disclosure is not limited in this respect.
In this example embodiment, performing detail recovery processing on the intermediate video to obtain the target video may include the following step: sharpening the intermediate video to boost the high-frequency information in it, thereby obtaining the target video. That is, the present disclosure applies high-contrast filtering to the expanded data to achieve a sharpening effect, and obtains a target video with ultra-high-definition image quality by enhancing the high-frequency information in the target video frames. The Gaussian-blurred image (the result in FIG. 3) is computed before the target super-resolution model runs, which reduces inference time, enlarges the effective range of the Gaussian kernel, and further strengthens the sharpening effect. Through this optimization, the inference network architecture shown in FIG. 4 can achieve the sharpening effect of a convolution kernel of size 17 within 1.5 ms, producing a visual result better matched to viewers' watching habits.
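A minimal sketch of such a sharpening step is shown below, assuming the high-contrast filter behaves like an unsharp mask (add back the residual between the frame and its Gaussian blur); the strength value and the OpenCV-based formulation are assumptions, not the patent's implementation.

```python
import cv2
import numpy as np

def sharpen_frame(frame: np.ndarray, blurred: np.ndarray, amount: float = 0.5) -> np.ndarray:
    """High-contrast sharpening: boost high-frequency detail by adding
    back (frame - blurred), i.e. out = frame + amount * (frame - blurred)."""
    return cv2.addWeighted(frame, 1.0 + amount, blurred, -amount, 0)

# The blur can be precomputed once per frame, e.g. with a 17-tap kernel
# (mirroring the kernel size 17 cited above). In the patent's pipeline the
# blur is computed before the super-resolution model and carried alongside
# the frame, which widens the kernel's effective range; here both inputs
# are assumed to share one resolution for simplicity.
# blurred = cv2.GaussianBlur(frame, (17, 17), 0)
# sharp = sharpen_frame(frame, blurred)
```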
In summary, in an example embodiment of the present disclosure, an initial video is acquired; the initial video is input into a pre-trained target super-resolution model, and an intermediate video is output, where the image resolution of the intermediate video is greater than that of the initial video; and detail recovery processing is performed on the intermediate video to obtain a target video. On the one hand, the pre-trained target super-resolution model can overcome distortion of the input initial video caused by bandwidth constraints and coding loss in high-frame-rate, high-complexity live scenarios; on the other hand, the detail recovery processing enhances the high-frequency information of the picture, achieving a visual effect better suited to viewers' watching habits.
Fig. 8 schematically illustrates a schematic diagram of a live method according to some embodiments of the present disclosure.
Referring to fig. 8, the live broadcast method may include the steps of:
in step S810, a target video is acquired, which is determined according to the data processing method as described above in the embodiment of fig. 2.
Step S820, acquiring live broadcast data, and generating a live broadcast data stream according to the target video and the live broadcast data.
Step S830, the live data stream is pushed to each live client for display.
The live broadcast data may include the audio and video data of the live room, such as the anchor's voice and the pictures captured by the anchor's camera.
Specifically, generating a live data stream according to a target video and live data includes: encoding the image data in the live broadcast data and the image data in the target video to obtain a live broadcast video stream; synthesizing the audio data in the live broadcast data and the audio data in the target video to obtain a live broadcast audio stream; and packaging the live video stream and the live audio stream to obtain a live data stream.
For example, the image-quality-enhanced video (the target video) is encoded together with the live broadcast data to obtain the live-room video stream. The audio of the game original video is then synthesized with the audio in the live broadcast data, either by mixing the two or by muting one of them, to obtain the live audio stream. Finally, the live audio stream and the encoded live video stream are encapsulated to obtain a low-latency, high-frame-rate, high-quality ultra-high-definition game live data stream.
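As a hedged illustration only (file names, codec choices, and the RTMP URL are placeholders, not values from the patent), encoding, audio mixing, and encapsulation along these lines could be driven with ffmpeg:

```python
import subprocess

# Encode the enhanced target video, mix its audio with the anchor's
# commentary track, and encapsulate both into one FLV live data stream.
subprocess.run([
    "ffmpeg",
    "-i", "target_video.mp4",      # super-resolved game video (placeholder name)
    "-i", "anchor_audio.aac",      # live commentary audio (placeholder name)
    "-filter_complex", "[0:a][1:a]amix=inputs=2[aout]",  # live audio stream
    "-map", "0:v", "-map", "[aout]",
    "-c:v", "libx264", "-preset", "veryfast",            # live video stream
    "-c:a", "aac",
    "-f", "flv", "rtmp://live.example.com/app/stream",   # push to the ingest server
], check=True)
```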
For example, a target definition gear (clarity tier) is configured in the client and monitored through a heartbeat mechanism; the target definition gear may be a 2K gear or a 4K gear, and the disclosure is not limited in this respect. The method further includes: setting the target definition gear at the live client according to the transmission state of the target video, which includes: if the target video is monitored to be in a normal transmission state within a first preset time period, adding the target definition gear to the live client; or, if the target video is in a cut-off state within a second preset time period, deleting the target definition gear from the live client.
For example, once encapsulation is complete and the ultra-high-definition live data stream is pushed at the target definition gear, a heartbeat mechanism is attached to the target definition gear of every live room. The heartbeat mechanism works as follows: the transmission state of the target video is monitored; if transmission remains normal for a certain time, the target definition gear is added at the target client so that viewers can watch the live video normally; if transmission is found to be interrupted for a certain time, the target definition gear is automatically deleted at the target client so that viewers cannot select a gear that would fail to play. By attaching a heartbeat mechanism to the client's target definition gear in this way, the situation is avoided in which a user keeps selecting the target definition gear while the ultra-high-definition video stream on the transmission link is actually cut off and therefore cannot continue watching, which would degrade the viewing experience.
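A minimal sketch of such a heartbeat loop follows; every name here (`stream`, `client`, the gear label "4K", the time windows) is a hypothetical stand-in, since the patent does not specify an API.

```python
import time

def heartbeat_monitor(stream, client, normal_window=5.0, cutoff_window=10.0):
    """Expose the ultra-HD definition gear only while the target-video
    stream is actually flowing. `stream` and `client` are hypothetical
    objects; the two windows stand in for the first and second preset
    time periods described above."""
    while True:
        idle = stream.seconds_since_last_packet()
        if idle < normal_window and not client.has_gear("4K"):
            client.add_gear("4K")        # transmission normal: add the gear
        elif idle > cutoff_window and client.has_gear("4K"):
            client.remove_gear("4K")     # stream cut off: delete the gear
        time.sleep(1.0)
```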
It should be noted that although the steps of the methods of the present disclosure are illustrated in the accompanying drawings in a particular order, this does not require or imply that the steps must be performed in that particular order or that all of the illustrated steps be performed in order to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc.
Further, in the present exemplary embodiment, a data processing apparatus is provided. Referring to fig. 9, the data processing apparatus 900 includes: a first acquisition unit 910, an output unit 920, and a first processing unit 930.
Specifically, the first obtaining unit 910 is configured to obtain an initial video;
an output unit 920, configured to input the initial video into a pre-trained target super-resolution model, and output an intermediate video, where an image resolution of the intermediate video is greater than an image resolution of the initial video;
the first processing unit 930 is configured to perform detail restoration processing on the intermediate video to obtain a target video.
In summary, in an example embodiment of the present disclosure, the first acquisition unit 910 acquires an initial video; the output unit 920 inputs the initial video into a pre-trained target super-resolution model and outputs an intermediate video, where the image resolution of the intermediate video is greater than that of the initial video; and the first processing unit 930 performs detail recovery processing on the intermediate video to obtain a target video. On the one hand, the pre-trained target super-resolution model can overcome distortion of the input initial video caused by bandwidth constraints and coding loss in high-frame-rate, high-complexity live scenarios; on the other hand, the detail recovery processing enhances the high-frequency information of the picture, achieving a visual effect better suited to viewers' watching habits.
In some example embodiments of the present disclosure, based on the foregoing scheme, the apparatus further includes: a second acquisition unit configured to acquire a training data set; a training unit configured to perform model training on a pre-constructed initial super-resolution model with the training data set to obtain an intermediate super-resolution model, wherein the initial super-resolution model includes a first convolution layer and a heavy parameter convolution layer, and the first convolution layer and/or the heavy parameter convolution layer are connected through long and short residual connections; and a second processing unit configured to reconstruct the network structure of the intermediate super-resolution model to obtain the target super-resolution model.
In some example embodiments of the present disclosure, based on the foregoing scheme, the second processing unit includes: a determining module, configured to determine a second convolution layer equivalent to the heavy parameter convolution layer; and the replacing module is used for replacing the heavy parameter convolution layer of the intermediate super-resolution model with the second convolution layer to obtain the target super-resolution model.
In some example embodiments of the present disclosure, based on the foregoing scheme, the determining module includes: a processing submodule configured to perform equivalent fusion processing on each convolution layer in the heavy parameter convolution layer to determine the second convolution layer; wherein the equivalent fusion processing includes Einstein summation.
In some example embodiments of the present disclosure, based on the foregoing scheme, the target super-resolution model includes a first convolution layer and a second convolution layer, an output unit including: the first extraction module is used for inputting the initial video into the first convolution layer and extracting first characteristic data of the initial video; the second extraction module is used for inputting the first characteristic data into the second convolution layer and extracting second characteristic data of the initial video; the sampling module is used for carrying out up-sampling processing on the second characteristic data to obtain expanded second characteristic data; and the determining module is used for taking the expanded second characteristic data as an intermediate video.
In some example embodiments of the present disclosure, based on the foregoing scheme, the first processing unit includes: and the sharpening module is used for sharpening the intermediate video so as to promote the high-frequency information in the intermediate video and obtain the target video.
In another exemplary embodiment, there is also provided a live broadcast apparatus, referring to fig. 10, the live broadcast apparatus 1000, including: a third obtaining unit 1010, a generating unit 1020, and a displaying unit 1030.
Specifically, the third obtaining unit 1010 is configured to obtain a target video, where the target video is generated according to the data processing apparatus according to the embodiment shown in fig. 9;
The generating unit 1020 is configured to obtain live broadcast data, and generate a live broadcast data stream according to the target video and the live broadcast data;
and the display unit 1030 is configured to push the live data stream to each live client for display.
In some example embodiments of the present disclosure, based on the foregoing scheme, the generating unit includes: the encoding module is used for encoding the image data in the live broadcast data and the image data in the target video to obtain a live broadcast video stream; the synthesis module is used for synthesizing the audio data in the live broadcast data with the audio data in the target video to obtain a live broadcast audio stream; and the encapsulation module is used for encapsulating the live video stream and the live audio stream to obtain a live data stream.
In some example embodiments of the present disclosure, based on the foregoing scheme, the apparatus further includes: the setting unit is used for setting a target definition gear at the live broadcast client according to the transmission state of the target video; a setting unit including: the adding module is used for adding a target definition gear at the live client if the target video is monitored to be in a normal transmission state in a first preset time period; or the deleting module is used for deleting the target definition gear in the live client if the target video in the second preset time period is in the cut-off state.
The specific details of each module of the data processing apparatus described above have been described in detail in the corresponding data processing method, so that the details are not repeated here.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
Furthermore, although the steps of the methods in the present disclosure are depicted in a particular order in the drawings, this does not require or imply that the steps must be performed in that particular order or that all illustrated steps be performed in order to achieve desirable results. Additionally or alternatively, certain steps may be omitted, multiple steps combined into one step to perform, and/or one step decomposed into multiple steps to perform, etc.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or may be implemented in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (may be a CD-ROM, a U-disk, a mobile hard disk, etc.) or on a network, including several instructions to cause a computing device (may be a personal computer, a server, a mobile terminal, or a network device, etc.) to perform the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, a computer storage medium capable of implementing the above method is also provided. On which a program product is stored which enables the implementation of the method described above in the present specification. In some possible embodiments, various aspects of the disclosure may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps according to the various exemplary embodiments of the disclosure as described in the "exemplary methods" section of this specification, when the program product is run on the terminal device, e.g. the following steps may be carried out: acquiring an initial video; inputting the initial video into a pre-trained target super-resolution model, and outputting an intermediate video, wherein the image resolution of the intermediate video is greater than that of the initial video; and carrying out detail recovery processing on the intermediate video to obtain a target video.
In an alternative embodiment, the target super-resolution model is derived from a pre-training process comprising: acquiring a training data set; performing model training on a pre-constructed initial super-resolution model with the training data set to obtain an intermediate super-resolution model, wherein the initial super-resolution model comprises a first convolution layer and a heavy parameter convolution layer, and the first convolution layer and/or the heavy parameter convolution layer are connected through long and short residual connections; and reconstructing the network structure of the intermediate super-resolution model to obtain the target super-resolution model.
In an alternative embodiment, the reconstructing the network structure of the intermediate super-resolution model to obtain the target super-resolution model includes: determining a second convolution layer equivalent to the heavy parameter convolution layer; and replacing the heavy parameter convolution layer of the intermediate super-resolution model with a second convolution layer to obtain the target super-resolution model.
In an alternative embodiment, determining a second convolution layer equivalent to the heavy parameter convolution layer includes: performing equivalent fusion processing on each convolution layer in the heavy parameter convolution layer to determine the second convolution layer; wherein the equivalent fusion processing comprises Einstein summation.
In an alternative embodiment, the target super-resolution model includes a first convolution layer and a second convolution layer, the inputting the initial video into the pre-trained target super-resolution model, outputting the intermediate video, including: inputting the initial video into a first convolution layer, and extracting first characteristic data of the initial video; inputting the first characteristic data into a second convolution layer, and extracting second characteristic data of the initial video; performing up-sampling processing on the second characteristic data to obtain expanded second characteristic data; and taking the expanded second characteristic data as an intermediate video.
In an alternative embodiment, performing detail restoration processing on the intermediate video to obtain a target video, including: and sharpening the intermediate video to promote high-frequency information in the intermediate video to obtain a target video.
In an alternative embodiment, embodiments of the present disclosure may also include a program product for implementing the above method, which may employ a portable compact disc read-only memory (CD-ROM) and include program code, and may run on a terminal device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium can be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
In addition, in an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.
Those skilled in the art will appreciate that the various aspects of the present disclosure may be implemented as a system, method, or program product. Accordingly, various aspects of the disclosure may be embodied in the following forms, namely: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.) or an embodiment combining hardware and software aspects may be referred to herein as a "circuit," module "or" system.
An electronic device 1100 according to such an embodiment of the present disclosure is described below with reference to fig. 11. The electronic device 1100 shown in fig. 11 is merely an example and should not be construed as limiting the functionality and scope of use of the disclosed embodiments.
As shown in fig. 11, the electronic device 1100 is embodied in the form of a general purpose computing device. Components of electronic device 1100 may include, but are not limited to: the at least one first processing unit 1110, the at least one memory unit 1120, a bus 1130 connecting the different system components (including the memory unit 1120 and the first processing unit 1110), and a display unit 1140.
Wherein the storage unit stores program code executable by the first processing unit 1110 such that the first processing unit 1110 performs steps according to various exemplary embodiments of the present disclosure described in the above-described "exemplary methods" section of the present specification. For example, the first processing unit 1110 may perform the steps as follows: acquiring an initial video; inputting the initial video into a pre-trained target super-resolution model, and outputting an intermediate video, wherein the image resolution of the intermediate video is greater than that of the initial video; and carrying out detail recovery processing on the intermediate video to obtain a target video.
In an alternative embodiment, the target super-resolution model is derived from a pre-training process comprising: acquiring a training data set; performing model training on a pre-constructed initial super-resolution model with the training data set to obtain an intermediate super-resolution model, wherein the initial super-resolution model comprises a first convolution layer and a heavy parameter convolution layer, and the first convolution layer and/or the heavy parameter convolution layer are connected through long and short residual connections; and reconstructing the network structure of the intermediate super-resolution model to obtain the target super-resolution model.
In an alternative embodiment, the reconstructing the network structure of the intermediate super-resolution model to obtain the target super-resolution model includes: determining a second convolution layer equivalent to the heavy parameter convolution layer; and replacing the heavy parameter convolution layer of the intermediate super-resolution model with a second convolution layer to obtain the target super-resolution model.
In an alternative embodiment, determining a second convolution layer equivalent to the heavy parameter convolution layer includes: performing equivalent fusion processing on each convolution layer in the heavy parameter convolution layer to determine the second convolution layer; wherein the equivalent fusion processing comprises Einstein summation.
In an alternative embodiment, the target super-resolution model includes a first convolution layer and a second convolution layer, the inputting the initial video into the pre-trained target super-resolution model, outputting the intermediate video, including: inputting the initial video into a first convolution layer, and extracting first characteristic data of the initial video; inputting the first characteristic data into a second convolution layer, and extracting second characteristic data of the initial video; performing up-sampling processing on the second characteristic data to obtain expanded second characteristic data; and taking the expanded second characteristic data as an intermediate video.
In an alternative embodiment, performing detail restoration processing on the intermediate video to obtain a target video, including: and sharpening the intermediate video to promote high-frequency information in the intermediate video to obtain a target video.
The storage unit 1120 may include a readable medium in the form of a volatile storage unit, such as a Random Access Memory (RAM) 11201 and/or a cache memory 11202, and may further include a Read Only Memory (ROM) 11203.
The storage unit 1120 may also include a program/utility 11204 having a set (at least one) of program modules 11205, such program modules 11205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The bus 1130 may be a local bus representing one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a first processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 1100 may also communicate with one or more external devices 1200 (e.g., a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 1100, and/or with any device (e.g., a router, a modem, etc.) that enables the electronic device 1100 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 1150. Moreover, the electronic device 1100 may communicate with one or more networks, such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet, through a network adapter 1160. As shown, the network adapter 1160 communicates with the other modules of the electronic device 1100 via the bus 1130. It should be appreciated that, although not shown, other hardware and/or software modules may be used in conjunction with the electronic device 1100, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a removable hard disk, etc.) or on a network, and which includes several instructions to cause a computing device (which may be a personal computer, a server, a terminal device, a network device, etc.) to perform the method according to the embodiments of the present disclosure.
Furthermore, the above-described figures are merely schematic illustrations of the processes included in the method according to the exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily appreciated that the processes shown in these figures neither indicate nor limit their temporal order. It is also readily understood that these processes may be performed synchronously or asynchronously, for example, across a plurality of modules.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following its general principles and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.

Claims (13)

1. A method of data processing, comprising:
acquiring an initial video;
inputting the initial video into a pre-trained target super-resolution model, and outputting an intermediate video, wherein the image resolution of the intermediate video is greater than that of the initial video;
and carrying out detail recovery processing on the intermediate video to obtain a target video.
2. The method of claim 1, wherein the target super-resolution model is obtained through a pre-training process comprising:
acquiring a training data set;
performing model training on a pre-constructed initial super-resolution model with the training data set to obtain an intermediate super-resolution model, wherein the initial super-resolution model comprises a first convolution layer and a re-parameterized convolution layer, and the first convolution layer and/or the re-parameterized convolution layer are connected through long and short residual connections;
and reconstructing the network structure of the intermediate super-resolution model to obtain the target super-resolution model.
3. The method according to claim 2, wherein reconstructing the network structure of the intermediate super-resolution model to obtain the target super-resolution model comprises:
determining a second convolution layer equivalent to the re-parameterized convolution layer;
and replacing the re-parameterized convolution layer of the intermediate super-resolution model with the second convolution layer to obtain the target super-resolution model.
4. The method according to claim 3, wherein determining a second convolution layer equivalent to the re-parameterized convolution layer comprises:
performing equivalent fusion processing on each convolution layer within the re-parameterized convolution layer to determine the second convolution layer;
wherein the equivalent fusion processing comprises Einstein summation.
5. The method of claim 4, wherein the target super-resolution model comprises the first convolution layer and the second convolution layer, and wherein inputting the initial video into the pre-trained target super-resolution model and outputting the intermediate video comprises:
inputting the initial video into the first convolution layer, and extracting first feature data of the initial video;
inputting the first feature data into the second convolution layer, and extracting second feature data of the initial video;
performing up-sampling processing on the second feature data to obtain expanded second feature data;
and taking the expanded second feature data as the intermediate video.
6. The method according to claim 1, wherein performing detail recovery processing on the intermediate video to obtain a target video comprises:
sharpening the intermediate video to enhance the high-frequency information in the intermediate video, thereby obtaining the target video.
7. A live broadcast method, comprising:
obtaining a target video, the target video being generated according to the method of any one of claims 1 to 6;
acquiring live broadcast data, and generating a live broadcast data stream according to the target video and the live broadcast data;
and pushing the live broadcast data stream to each live broadcast client for display.
8. The method of claim 7, wherein generating a live data stream from the target video and the live data comprises:
encoding the image data in the live broadcast data and the image data in the target video to obtain a live broadcast video stream;
mixing the audio data in the live broadcast data with the audio data in the target video to obtain a live broadcast audio stream;
and encapsulating the live broadcast video stream and the live broadcast audio stream to obtain the live broadcast data stream.
9. The method of claim 7, wherein the method further comprises:
setting a target definition level at the live broadcast client according to the transmission state of the target video;
wherein setting the target definition level at the live broadcast client according to the transmission state of the target video comprises:
if the target video is in a normal transmission state during a first preset time period, adding the target definition level to the live broadcast client; or
if the target video is in a stream-interrupted state during a second preset time period, deleting the target definition level from the live broadcast client.
10. A data processing apparatus, comprising:
the first acquisition unit is used for acquiring an initial video;
the output unit is used for inputting the initial video into a pre-trained target super-resolution model and outputting an intermediate video, wherein the image resolution of the intermediate video is greater than that of the initial video;
and the first processing unit is used for carrying out detail recovery processing on the intermediate video to obtain a target video.
11. A live broadcast device, comprising:
a third acquisition unit configured to acquire a target video, the target video being generated according to the method as set forth in any one of claims 1 to 6;
the generation unit is used for acquiring live broadcast data and generating a live broadcast data stream according to the target video and the live broadcast data;
and the display unit is used for pushing the live broadcast data stream to each live broadcast client for display.
12. An electronic device, comprising:
a processor; and
a memory having stored thereon computer readable instructions which, when executed by the processor, implement the method of any one of claims 1 to 9.
13. A computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any one of claims 1 to 9.
CN202310161096.9A 2023-02-22 2023-02-22 Data processing method, live broadcast method and device, electronic equipment and storage medium Pending CN116132737A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310161096.9A CN116132737A (en) 2023-02-22 2023-02-22 Data processing method, live broadcast method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310161096.9A CN116132737A (en) 2023-02-22 2023-02-22 Data processing method, live broadcast method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116132737A 2023-05-16

Family

ID=86311730

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310161096.9A Pending CN116132737A (en) 2023-02-22 2023-02-22 Data processing method, live broadcast method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116132737A (en)

Similar Documents

Publication Publication Date Title
CN111580765B (en) Screen projection method, screen projection device, storage medium, screen projection equipment and screen projection equipment
CN111970513A (en) Image processing method and device, electronic equipment and storage medium
JP2020526994A5 (en)
CN110827380B (en) Image rendering method and device, electronic equipment and computer readable medium
CN110290398B (en) Video issuing method and device, storage medium and electronic equipment
KR20200084516A (en) Display apparatus, apparatus for providing image and method of controlling the same
CN115409716B (en) Video processing method, device, storage medium and equipment
CN113538287B (en) Video enhancement network training method, video enhancement method and related devices
CN114979672A (en) Video encoding method, decoding method, electronic device, and storage medium
CN110572673A (en) Video encoding and decoding method and device, storage medium and electronic device
CN113852860A (en) Video processing method, device, system and storage medium
CN113747242B (en) Image processing method, image processing device, electronic equipment and storage medium
US11928855B2 (en) Method, device, and computer program product for video processing
CN110662071A (en) Video decoding method and apparatus, storage medium, and electronic apparatus
CN115861121A (en) Model training method, image processing method, device, electronic device and medium
CN116132737A (en) Data processing method, live broadcast method and device, electronic equipment and storage medium
CN110572677A (en) video encoding and decoding method and device, storage medium and electronic device
CN110677690B (en) Video processing method and device and storage medium
CN116366852A (en) Video coding and decoding method, device, equipment and medium for machine vision task
JP2017073135A (en) Method and apparatus for de-noising image using video epitome
US11647153B1 (en) Computer-implemented method, device, and computer program product
US8244071B2 (en) Non-dyadic spatial scalable wavelet transform
US20240031517A1 (en) Method, electronic device, and computer program product for video reconstruction
CN114205646B (en) Data processing method, device, electronic equipment and storage medium
CN112565819B (en) Video data processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination