CN113628121A - Method and device for processing data and training multimedia data - Google Patents


Publication number
CN113628121A
CN113628121A (application CN202010373497.7A)
Authority
CN
China
Prior art keywords
data
quality
multimedia data
data block
multimedia
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010373497.7A
Other languages
Chinese (zh)
Other versions
CN113628121B (en
Inventor
林宪晖
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN202010373497.7A priority Critical patent/CN113628121B/en
Publication of CN113628121A publication Critical patent/CN113628121A/en
Application granted granted Critical
Publication of CN113628121B publication Critical patent/CN113628121B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T5/00 Image enhancement or restoration (G06T: image data processing or generation, in general)
    • G06N3/045 Combinations of networks (G06N3/04: neural network architecture)
    • G06N3/08 Learning methods (G06N3: computing arrangements based on biological models; neural networks)
    • G06T2207/10016 Video; image sequence (G06T2207/10: image acquisition modality)


Abstract

The invention discloses a method and a device for processing data and training multimedia data. The method comprises the following steps: degrading the image quality of sample data through at least one random degradation mode to obtain first-quality multimedia data; randomly sampling the first-quality multimedia data to obtain a first-quality data block; enhancing the first-quality data block to obtain a second-quality data block; comparing the second-quality data block with a data block in the sample data and with a data block in the multimedia data to which the sample data belongs, respectively, to obtain a loss function; and training the process that enhances the first-quality data block into the second-quality data block according to the loss function, to obtain a machine learning model. The method solves the prior-art technical problem that, after a video image is enhanced with a convolutional neural network, details in the video image are difficult to restore, leaving the image overly smooth.

Description

Method and device for processing data and training multimedia data
Technical Field
The invention relates to the technical field of the internet, and in particular to a method and a device for processing data and training multimedia data.
Background
Video quality enhancement can intelligently increase the resolution of a video, repair low-quality regions in the picture, and remove blur and noise, noticeably improving the quality and viewing experience of the video without additional input or interaction. It is in strong demand and widely applicable in old-film restoration, 4K remastering, and online video platforms.
However, most existing video enhancement methods simply apply convolutional-neural-network-based image enhancement to video scenes, which cannot guarantee high-quality enhancement of low-quality videos across different styles and scenes. Moreover, such networks are typically trained and optimized with similarity losses such as mean squared error, which makes it difficult to restore details in the video image and produces results that are overly smooth and lacking in detail.
For the prior-art problem that, after a video image is enhanced based on a convolutional neural network, details in the video image are difficult to restore, leaving the image overly smooth, no effective solution has yet been proposed.
Disclosure of Invention
The embodiment of the invention provides a method and a device for processing data and training multimedia data, which at least solve the prior-art technical problem that, after a video image is enhanced based on a convolutional neural network, details in the video image are difficult to restore, leaving the image overly smooth.
According to an aspect of an embodiment of the present invention, there is provided a data processing method including: receiving first multimedia data uploaded by a client; receiving a machine learning model selected by the client; converting the first multimedia data into second multimedia data based on the machine learning model; and displaying the second multimedia data; wherein the image quality of the second multimedia data is different from that of the first multimedia data.
According to an aspect of an embodiment of the present invention, there is provided another data processing method including: receiving a service request from a client; parsing the service request to obtain a first multimedia data address and a machine learning model ID; acquiring first multimedia data through the first multimedia data address; acquiring a machine learning model according to the machine learning model ID; and converting the first multimedia data into second multimedia data based on the machine learning model, wherein the image quality of the second multimedia data is different from that of the first multimedia data.
Optionally, the second multimedia data is subjected to data specification conversion according to the type of the device to be launched, so as to obtain converted second multimedia data.
According to another aspect of an embodiment of the present invention, there is provided a method of training multimedia data, including: degrading the image quality of the sample data through at least one random degradation mode to obtain first quality multimedia data; randomly sampling from the first quality multimedia data to obtain a first quality data block; enhancing the first quality data block to obtain a second quality data block, wherein the image quality of the second quality data block is different from that of the first quality data block; comparing the second quality data block with a data block in the sample data and a data block in the multimedia data to which the sample data belongs respectively to obtain a loss function; and training the process of enhancing the first quality data block to the second quality data block according to the loss function to obtain a machine learning model.
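The five training steps above can be sketched end to end. The following pure-Python illustration is a minimal sketch, not the patent's implementation: additive noise stands in for the random degradation modes, a single scalar weight stands in for the deep CNN, and a plain MSE loss stands in for the combined loss described in the claims.

```python
import random

random.seed(0)

SIZE = 4  # side length of a sampled data block (illustrative)

def degrade(image, noise=0.2):
    # Step 1: random degradation; additive noise stands in for the patent's
    # down-sampling / blur / noise / compression-coding choices.
    return [[p + random.uniform(-noise, noise) for p in row] for row in image]

def sample_block(image, size=SIZE):
    # Step 2: random crop -> the "first quality data block".
    y = random.randrange(len(image) - size + 1)
    x = random.randrange(len(image[0]) - size + 1)
    return [row[x:x + size] for row in image[y:y + size]], (y, x)

def enhance(block, w):
    # Step 3: toy one-parameter "model" (the patent uses a deep CNN).
    return [[w * p for p in row] for row in block]

# Steps 4-5: compare against the matching clean block and update the model.
clean = [[1.0] * 8 for _ in range(8)]   # stand-in for a clean sample frame
w = 0.0
for _ in range(200):
    degraded = degrade(clean)
    block, (y, x) = sample_block(degraded)
    target = [row[x:x + SIZE] for row in clean[y:y + SIZE]]
    pred = enhance(block, w)
    # analytic gradient of the MSE loss with respect to the single weight w
    grad = sum(2 * (a - b) * inp
               for pr, tr, br in zip(pred, target, block)
               for a, b, inp in zip(pr, tr, br)) / (SIZE * SIZE)
    w -= 0.1 * grad
# w converges near 1.0: the "model" learns to undo the random degradation
```

The point of the sketch is the data flow, not the model: degraded inputs are compared against the clean original at the same crop position, and the loss gradient drives the enhancement parameters.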
Optionally, before the image quality of the sample data is degraded through at least one random degradation mode to obtain the first quality multimedia data, the method further includes: and collecting sample data from the multimedia data according to the application scene.
Optionally, the at least one random degradation mode includes: down-sampling, blurring, adding noise, and compression coding.
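The four degradation modes can be combined randomly, as the description later explains. The sketch below works on a gray-level grid and is illustrative only: the `quantize` step is an assumed stand-in for compression-coding artefacts (a real pipeline would re-encode the frame, e.g. with JPEG or H.264, at a low bitrate).

```python
import random

random.seed(1)

def downsample(img, factor=2):
    # keep every factor-th pixel, then nearest-neighbour upsample back
    small = [row[::factor] for row in img[::factor]]
    return [[small[y // factor][x // factor] for x in range(len(img[0]))]
            for y in range(len(img))]

def blur(img):
    # 3x3 box blur with edge clamping
    h, w = len(img), len(img[0])
    def clamp(v, hi):
        return min(max(v, 0), hi - 1)
    return [[sum(img[clamp(y + dy, h)][clamp(x + dx, w)]
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1)) / 9
             for x in range(w)] for y in range(h)]

def add_noise(img, sigma=5.0):
    return [[p + random.gauss(0, sigma) for p in row] for row in img]

def quantize(img, step=16):
    # coarse quantisation as a crude stand-in for compression-coding
    # artefacts (a real pipeline would re-encode at a low bitrate)
    return [[round(p / step) * step for p in row] for row in img]

def random_degrade(img):
    # apply a random non-empty subset of the four modes, in random order
    ops = random.sample([downsample, blur, add_noise, quantize],
                        k=random.randint(1, 4))
    for op in ops:
        img = op(img)
    return img

frame = [[float(16 * x) for x in range(8)] for _ in range(8)]
low_quality = random_degrade(frame)
```

Each call to `random_degrade` yields a differently degraded copy of the same clean frame, which is what lets the training set simulate varied real-world quality loss.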
Optionally, randomly sampling from the first quality multimedia data to obtain a first quality data block includes: randomly acquiring a data block from first quality multimedia data; and screening the data blocks according to the preset data block information to obtain a first quality data block.
Optionally, comparing the second quality data block with the data block in the sample data and with the data block in the multimedia data to which the sample data belongs, respectively, to obtain the loss function includes: comparing the second quality data block with a data block in the sample data to obtain a first loss function and a perceptual loss function, wherein the first loss function and the perceptual loss function constrain the generated multimedia data to be close to the original multimedia data in content and in perceptual characteristics; comparing the second quality data block with a data block in the multimedia data to which the sample data belongs to obtain a second loss function and an adversarial loss function, wherein the second loss function is used to improve the discrimination of the generated picture, and the adversarial loss function is used to improve how well the second quality data block restores details in the multimedia data; obtaining a third loss function from the first loss function, the perceptual loss function, and the adversarial loss function; and determining the third loss function as the loss function.
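A minimal sketch of how the loss terms might be combined into the third loss function. Everything concrete here is an assumption, since the patent fixes no formulas: L1 for the pixel ("first") loss, an identity function standing in for a pretrained feature extractor in the perceptual loss, a non-saturating adversarial term, and illustrative weights.

```python
import math

def pixel_loss(pred, target):
    # "first loss function": mean absolute pixel difference (L1 assumed)
    return sum(abs(a - b) for a, b in zip(pred, target)) / len(pred)

def perceptual_loss(pred, target, features):
    # compare feature responses instead of raw pixels; `features` stands in
    # for the activations of a pretrained network
    return pixel_loss(features(pred), features(target))

def adversarial_loss(discriminator_score):
    # non-saturating generator loss: small when the discriminator is fooled
    return -math.log(max(discriminator_score, 1e-12))

def total_loss(pred, sample_block, weights=(1.0, 0.1, 0.01),
               features=lambda x: x, d_score=0.5):
    # "third loss function": weighted sum of the three terms (weights are
    # illustrative, not values from the patent)
    w1, w2, w3 = weights
    return (w1 * pixel_loss(pred, sample_block)
            + w2 * perceptual_loss(pred, sample_block, features)
            + w3 * adversarial_loss(d_score))

# identical blocks and a fully fooled discriminator give zero total loss
zero = total_loss([1.0, 2.0], [1.0, 2.0], d_score=1.0)
```

The weighted-sum design lets the pixel and perceptual terms keep the output faithful to the clean sample while the adversarial term pushes it toward realistic detail, which is exactly the tension the claims describe.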
Optionally, the multimedia data to which the sample data belongs includes: video data, picture data, augmented reality image data, or virtual reality image data.
According to still another aspect of the embodiments of the present invention, there is also provided a data processing apparatus including: the data receiving module is used for receiving first multimedia data uploaded by the client; the selection module is used for receiving the machine learning model selected by the client; the data conversion module is used for converting the first multimedia data into second multimedia data based on the machine learning model; the display module is used for displaying the second multimedia data; wherein the image quality of the second multimedia data is different from the first multimedia data.
According to still another aspect of the embodiments of the present invention, there is provided another data processing apparatus including: the receiving module is used for receiving a service request from a client; the analysis module is used for analyzing the service request to obtain a first multimedia data address and a machine learning model ID; the data acquisition module is used for acquiring first multimedia data through the first multimedia data address; the model acquisition module is used for acquiring the machine learning model according to the machine learning model ID; and the data conversion module is used for converting the first multimedia data into second multimedia data based on the machine learning model, wherein the image quality of the second multimedia data is different from that of the first multimedia data.
According to still another aspect of the embodiments of the present invention, there is also provided an apparatus for training multimedia data, including: the quality degradation module is used for degrading the image quality of the sample data through at least one random quality degradation mode to obtain first quality multimedia data; the sampling module is used for randomly sampling from the first quality multimedia data to obtain a first quality data block; the enhancement module is used for enhancing the first quality data block to obtain a second quality data block, wherein the image quality of the second quality data block is different from that of the first quality data block; the acquisition module is used for comparing the second quality data block with a data block in the sample data and a data block in the multimedia data to which the sample data belongs respectively to acquire a loss function; and the model training module is used for training the process of enhancing the first quality data block to the second quality data block according to the loss function to obtain a machine learning model.
According to an aspect of another embodiment of the present invention, there is further provided a storage medium, where the storage medium includes a stored program, and where the program is executed to control a device on which the storage medium is located to perform the above-mentioned method for training and enhancing the image quality of multimedia data.
According to another aspect of another embodiment of the present invention, there is also provided a processor, wherein the processor is configured to execute a program, and wherein the program executes the method for training and enhancing the image quality of multimedia data.
In the embodiment of the invention, a video enhancement model is trained in iterative cycles based on a deep convolutional neural network, and the trained model then performs inference on an input low-quality video to complete the video enhancement task. The image quality of sample data is degraded through at least one random degradation mode to obtain first-quality multimedia data; a first-quality data block is obtained by randomly sampling the first-quality multimedia data; the first-quality data block is enhanced to obtain a second-quality data block, whose image quality differs from that of the first-quality data block; the second-quality data block is compared with a data block in the sample data and with a data block in the multimedia data to which the sample data belongs, respectively, to obtain a loss function; and the process of enhancing the first-quality data block into the second-quality data block is trained according to the loss function to obtain a machine learning model. This achieves the purpose of generating high-quality video images and restoring video quality, thereby attaining the technical effect of improving the restoration quality of video images, and further solving the prior-art technical problem that details in a video image are difficult to restore after CNN-based enhancement, which leaves the image overly smooth.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a flow chart of data processing according to a first embodiment of the invention;
FIG. 2 is a flow chart of data processing according to a second embodiment of the present invention;
fig. 3 is a flowchart of video enhancement in a data processing method according to a second embodiment of the present invention;
fig. 4 is a block diagram of a hardware structure of a computer terminal of a method of training multimedia data according to an embodiment of the present invention;
FIG. 5 is a flowchart of a method of training multimedia data according to a third embodiment of the present invention;
FIG. 6 is a flowchart of model training in a method of training multimedia data according to a third embodiment of the present invention;
FIG. 7 is a schematic diagram of a data processing apparatus according to a fourth embodiment of the present invention;
FIG. 8 is a schematic diagram of a data processing apparatus according to a fifth embodiment of the present invention;
fig. 9 is a schematic diagram of an apparatus for training multimedia data according to a sixth embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The technical terms related to the present application are:
and (3) generating a countermeasure network: one method of unsupervised learning is to learn by letting two neural networks game each other.
And (3) video quality enhancement: the method aims to repair the given low-quality video, remove the blur and noise in the video picture, improve the video resolution and improve the video picture quality.
And (3) quality degradation: refers to a process of degrading the quality of video and images by adding redundant interference information or erasing effective information.
Example 1
According to an aspect of an embodiment of the present invention, a data processing method is provided, and fig. 1 is a flowchart of data processing according to a first embodiment of the present invention. As shown in fig. 1, a data processing method provided in an embodiment of the present application includes:
step S102, receiving first multimedia data uploaded by a client;
step S104, receiving a machine learning model selected by a client;
step S106, converting the first multimedia data into second multimedia data based on the machine learning model;
step S108, displaying the second multimedia data;
wherein the image quality of the second multimedia data is different from the first multimedia data.
Specifically, with reference to steps S102 to S108, the data processing method provided in the embodiment of the present application may be applied to an image-processing scheme. For example, the method is implemented with a client installed on a mobile terminal device, and the first multimedia data is picture data: a user uploads picture data through the client, and the network end receives it; the user selects a corresponding machine learning model through the client, and the network end invokes that model; based on the machine learning model, the image quality of the uploaded picture data is enhanced to obtain picture data with improved image quality; and the picture data with improved image quality is displayed.
The training process of the machine learning model provided by the embodiment of the application comprises the following steps: degrading the image quality of the sample data through at least one random degradation mode to obtain first quality multimedia data; randomly sampling from the first quality multimedia data to obtain a first quality data block; enhancing the first quality data block to obtain a second quality data block, wherein the image quality of the second quality data block is higher than that of the first quality data block; comparing the second quality data block with a data block in the sample data and a data block in the multimedia data to which the sample data belongs respectively to obtain a loss function; and training the process of enhancing the first quality data block to the second quality data block according to the loss function to obtain a machine learning model.
In addition, the image quality in the embodiment of the present application may include sharpness, clarity, and resolution. For example, if the image quality is clarity, picture data of a first clarity is converted into picture data of a second clarity based on the machine learning model, where the first clarity is lower than the second clarity.
If the image quality covers both clarity and resolution, picture data of a first clarity and a first resolution is converted into picture data of a second clarity and a second resolution based on the machine learning model, where the first clarity is lower than the second clarity, and the first resolution may be either higher or lower than the second resolution.
The resolution may be set or adjusted according to the specification of the platform or device on which the data is to be displayed. When the first resolution is lower than the second resolution, the resolution of the picture data needs to be increased: for example, picture data at 720P may be raised to 1080P, 2K, or 4K according to the requirements of the display platform or device.
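Resolution conversion to match a display specification can be done with plain resampling. The sketch below uses nearest-neighbour scaling on a tiny grid purely to show the index mapping; it is not the patent's learned enhancement, which is what would preserve quality when upscaling, e.g., 1280x720 to 1920x1080.

```python
def resize_nearest(img, new_h, new_w):
    # nearest-neighbour scaling: each output pixel copies the source pixel
    # at the proportional position
    h, w = len(img), len(img[0])
    return [[img[y * h // new_h][x * w // new_w] for x in range(new_w)]
            for y in range(new_h)]

# doubling a tiny 2x2 "frame"; a 1280x720 frame would go to 1920x1080
# (upscale) or 640x360 (downscale) with the same index mapping
frame = [[1, 2],
         [3, 4]]
up = resize_nearest(frame, 4, 4)
```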
If the first resolution is higher than the second resolution, the resolution of the picture data needs to be reduced: for example, picture data at 720P may be lowered to 360P according to the requirements of the display platform or device.
It should be noted that, the above example is only for implementing the data processing method provided in the embodiment of the present application, and is not limited specifically.
Example 2
According to an aspect of an embodiment of the present invention, another data processing method is provided, and fig. 2 is a flowchart of data processing according to a second embodiment of the present invention. As shown in fig. 2, a data processing method provided in the embodiment of the present application includes:
step S200, receiving a service request from a client;
step S202, parsing the service request to acquire a first multimedia data address and a machine learning model ID;
step S204, acquiring first multimedia data through the first multimedia data address;
step S208, acquiring a machine learning model according to the machine learning model ID;
step S210, converting the first multimedia data into second multimedia data based on the machine learning model, wherein the image quality of the second multimedia data is different from that of the first multimedia data.
Specifically, with reference to steps S200 to S210, the data processing method provided in this embodiment of the present application may be applied to an image-processing scheme. For example, the method is implemented with a client installed on a mobile terminal device, and the first multimedia data is picture data. At the network end, a service request sent by the client is received and parsed to obtain the storage address of the picture data to be processed and a machine learning model ID; the network end then acquires the picture data from that address and the corresponding machine learning model by its ID. Based on the machine learning model, the image quality of the uploaded picture data is enhanced to obtain picture data with improved image quality.
For example, assuming the picture data is stored in the cloud, the storage address is obtained from the service request and the picture data is fetched from the cloud by that address; the corresponding machine learning model is obtained through the machine learning model ID; and based on the machine learning model, the image quality of the picture data is enhanced to obtain picture data with improved image quality.
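The request-handling flow above (parse the request, fetch data by address, look up the model by ID, run it) can be sketched as below. The JSON field names, the registry, and the `oss://` address are all assumptions for illustration; the patent fixes no wire format.

```python
import json

MODEL_REGISTRY = {}  # hypothetical registry: model ID -> enhancement callable

def handle_request(raw_request, fetch):
    # Parse the service request for the multimedia-data address and the
    # machine-learning-model ID, fetch the data, then run the selected model.
    request = json.loads(raw_request)
    data = fetch(request["data_address"])
    model = MODEL_REGISTRY[request["model_id"]]
    return model(data)

# usage sketch: a toy "cloud" store and a toy enhancement model
cloud = {"oss://bucket/clip.mp4": [0.1, 0.2, 0.3]}
MODEL_REGISTRY["enhance-v1"] = lambda frames: [min(1.0, 2 * f) for f in frames]
result = handle_request(
    '{"data_address": "oss://bucket/clip.mp4", "model_id": "enhance-v1"}',
    cloud.get)
```

Keeping the model lookup behind an ID is what lets the client choose among several trained enhancement models without uploading the media twice.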
The training process of the machine learning model provided by the embodiment of the application comprises the following steps: degrading the image quality of the sample data through at least one random degradation mode to obtain first quality multimedia data; randomly sampling from the first quality multimedia data to obtain a first quality data block; enhancing the first quality data block to obtain a second quality data block, wherein the image quality of the second quality data block is higher than that of the first quality data block; comparing the second quality data block with a data block in the sample data and a data block in the multimedia data to which the sample data belongs respectively to obtain a loss function; and training the process of enhancing the first quality data block to the second quality data block according to the loss function to obtain a machine learning model.
In addition, the image quality provided by the embodiment of the present application may include: sharpness, clarity, and resolution.
Optionally, the second multimedia data is subjected to data specification conversion according to the type of the device to be launched, so as to obtain converted second multimedia data.
The second multimedia data in the embodiment of the present application undergoes data specification conversion according to the type of the device to be launched, where the data specification includes sharpness, clarity, and resolution. For example, if the second multimedia data has clarity X and resolution 720P, and the data specification adapted to the device type requires clarity X+1 and resolution 1080P, then the clarity and resolution of the second multimedia data are adjusted accordingly, finally yielding multimedia data with clarity X+1 and resolution 1080P.
Specifically, with reference to steps S200 to S210, as shown in fig. 3, fig. 3 is a flowchart of video enhancement in a data processing method according to a second embodiment of the present invention, wherein the image quality of the first quality multimedia data is lower than the image quality of the second quality multimedia data;
taking video data as an example, the machine learning model is a video quality enhancement model. After the model is obtained, low-quality video data is input into it, the image quality is enhanced by the model, and high-quality video data is finally obtained, realizing video quality restoration.
Similarly, for restoring old photos and black-and-white photos, and for applications in the AR and VR fields, a machine learning model trained on a large amount of data can repair old photos, refine textures, and fill in defects. Black-and-white photos or films can also be colorized by a machine learning model: by learning from a large amount of data, the environment, people, and objects in the photo or image are plausibly colored and rendered, adding color while repairing, so that such material is more likely to remain valuable as time passes.
Example 3
There is also provided, in accordance with an embodiment of the present invention, an embodiment of a method of training multimedia data, it being noted that the steps illustrated in the flowchart of the figure may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different than that presented herein.
The method provided by the third embodiment of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking the example of running on a computer terminal, fig. 4 is a block diagram of a hardware structure of the computer terminal of a method for training multimedia data according to an embodiment of the present invention. As shown in fig. 4, the computer terminal 40 may include one or more (only one shown) processors 402 (the processor 402 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), a memory 404 for storing data, and a transmission module 406 for communication functions. It will be understood by those skilled in the art that the structure shown in fig. 4 is only an illustration and is not intended to limit the structure of the electronic device. For example, the computer terminal 40 may also include more or fewer components than shown in FIG. 4, or have a different configuration than shown in FIG. 4.
The memory 404 may be used to store software programs and modules of application software, such as program instructions/modules corresponding to the method for training multimedia data in the embodiment of the present invention, and the processor 402 executes various functional applications and data processing by executing the software programs and modules stored in the memory 404, that is, implementing the method for training multimedia data of the application program. The memory 404 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 404 may further include memory located remotely from the processor 402, which may be connected to the computer terminal 40 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 406 is used for receiving or sending data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 40. In one example, the transmission device 406 includes a Network adapter (NIC) that can be connected to other Network devices through a base station so as to communicate with the internet. In one example, the transmission device 406 may be a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
In the above operating environment, the present application provides a method of training multimedia data as shown in fig. 5. Fig. 5 is a flowchart of a method for training multimedia data according to a third embodiment of the present invention. The method for training multimedia data provided by the embodiment of the application comprises the following steps:
step S502, degrading the image quality of the sample data through at least one random degradation mode to obtain first quality multimedia data;
in step S502, in order to enhance the quality of multimedia data, the sample data is first randomly degraded during training to obtain training data that serves as input to the machine learning model.
The multimedia data to which the sample data belongs includes: video data, picture data, augmented reality image data, or virtual reality image data.
Sample data is acquired from multimedia data and randomly degraded to generate low-quality multimedia data, namely the first quality multimedia data in the embodiment of the present application. Taking video data as an example, the multimedia quality may include parameters that represent image quality, such as resolution. In the embodiment of the present application, the low-quality multimedia data may be low-quality video data, and the low quality may manifest as blurred pictures, noise, color loss, and the like.
Wherein the at least one random degradation pattern may include: down-sampling, blurring, adding noise, compression coding.
Through the combination of the random degradation modes, the image quality of the sample data is degraded to obtain a low-quality video image, so that the degradation of the low-quality video quality is effectively simulated, and training data more suitable for an application scene is generated.
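The random combination of degradation modes can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's actual module: NumPy stands in for a real video pipeline, a 3x3 box filter stands in for arbitrary blur, and coarse quantization stands in for compression coding, since invoking a real codec is out of scope here.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_degrade(img):
    """Apply a random combination of the degradation modes named above
    (down-sampling, blurring, noise, and a crude compression stand-in)."""
    out = img.astype(np.float64)
    if rng.random() < 0.5:
        # 2x down-sample, then nearest-neighbor up-sample back to size
        out = out[::2, ::2]
        out = np.repeat(np.repeat(out, 2, axis=0), 2, axis=1)
    if rng.random() < 0.5:
        # 3x3 box blur via summed shifted views of an edge-padded copy
        pad = np.pad(out, 1, mode="edge")
        out = sum(pad[i:i + out.shape[0], j:j + out.shape[1]]
                  for i in range(3) for j in range(3)) / 9.0
    if rng.random() < 0.5:
        # additive Gaussian noise
        out = out + rng.normal(0.0, 5.0, out.shape)
    if rng.random() < 0.5:
        # coarse quantization as a stand-in for lossy compression artifacts
        out = np.round(out / 16.0) * 16.0
    return np.clip(out, 0.0, 255.0)

hq = rng.integers(0, 256, (64, 64)).astype(np.float64)  # synthetic "clean" frame
lq = random_degrade(hq)  # first quality (degraded) data
```

Each branch fires independently, so every call yields one of the random combinations of degradation modes described above.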
Step S504, randomly sampling from the first quality multimedia data to obtain a first quality data block;
in the above step S504 of the present application, based on the first quality multimedia data obtained in the step S502, a small data block (i.e., the first quality data block in the embodiment of the present application) is obtained by sampling from the whole image through random sampling, and the small data block is further used as input data for training a machine learning model.
Step S506, enhancing the first quality data block to obtain a second quality data block, wherein the image quality of the second quality data block is different from that of the first quality data block;
in the above step S506 of the present application, based on the first quality data block obtained in step S504, the first quality data block is referred to as LQ, and the second quality data block after quality enhancement is referred to as HQ, where the image quality of HQ is higher than LQ.
Step S508, comparing the second quality data block with the data block in the sample data and the data block in the multimedia data to which the sample data belongs respectively to obtain a loss function;
in the above step S508, the second quality data block obtained in step S506 is compared with the data block in the sample data and the data block in the multimedia data to which the sample data belongs, respectively, to obtain a loss function.
The second quality data block HQ is compared with the data block GT in the sample data to obtain a first loss function L1 and a perceptual loss function Lpercep. Under the joint constraint of L1 and Lpercep, the generated image can approach the original high-quality image in both content and sensory characteristics;
data in multimedia data to which second quality data block HQ and sample data belongComparing the blocks GT to obtain a discriminant loss function LDAnd a penalty function LGANTo enable determination of a true high quality image and generation of a high quality image, optimizing LDOptimizing L for accurately identifying which pictures were generatedGANThe method is used for making the image obtained after the quality enhancement difficult to identify, so that more picture details can be restored in the HQ, and the enhancement of higher quality can be realized.
Finally, the loss function LG is obtained from the first loss function L1, the perceptual loss function Lpercep, and the adversarial loss function LGAN as LG = αL1 + βLpercep + γLGAN, where α, β, and γ respectively represent the weights of the corresponding losses.
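The weighted combination of the three generator losses can be written as a one-line function. The weight values below are illustrative placeholders, since the source gives no concrete values for α, β, γ.

```python
def generator_loss(l1, l_percep, l_gan, alpha=1.0, beta=0.05, gamma=0.01):
    """L_G = alpha*L1 + beta*L_percep + gamma*L_GAN.

    The default weights are assumptions for illustration; in practice
    they would be tuned for the application scene."""
    return alpha * l1 + beta * l_percep + gamma * l_gan

total = generator_loss(1.0, 2.0, 3.0)  # ≈ 1.13 with the default weights
```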
Step S510, training the process of enhancing the first quality data block to the second quality data block according to the loss function to obtain a machine learning model.
In the above step S510, the process of enhancing the first quality data block into the second quality data block is trained according to the loss function LG obtained in step S508 and the discriminant loss function LD, finally yielding the machine learning model.
In the embodiment of the invention, a video enhancement model is trained by iterative cycles based on a deep convolutional neural network, and inference is then performed on an input low-quality video with the trained model to complete the video enhancement task. The image quality of sample data is degraded through at least one random degradation mode to obtain first quality multimedia data; a first quality data block is randomly sampled from the first quality multimedia data; the first quality data block is enhanced to obtain a second quality data block, wherein the image quality of the second quality data block is different from that of the first quality data block; the second quality data block is compared with a data block in the sample data and a data block in the multimedia data to which the sample data belongs respectively to obtain a loss function; and the process of enhancing the first quality data block to the second quality data block is trained according to the loss function to obtain a multimedia quality enhancement model. This generates high-quality video images and achieves video quality restoration, thereby improving the restoration quality of video images and solving the technical problem in the prior art that, after a video image is enhanced based on a convolutional neural network, details in the image are difficult to restore and the result is excessively smooth.
Optionally, before the image quality of the sample data is degraded through at least one random degradation mode in step S502 to obtain the first quality multimedia data, the method for training multimedia data provided in the embodiment of the present application further includes:
step S500, sample data is collected from multimedia data according to the application scene.
Specifically, data are collected from real scenes: for the target application scene, corresponding high-quality film and television video data are gathered, and high-quality video images are extracted from them to serve as training data, finally yielding the sample data.
Optionally, the randomly sampling from the first quality multimedia data in step S504 to obtain a first quality data block includes:
step S5041, randomly obtaining a data block from the first quality multimedia data;
step S5042, the data blocks are screened according to the preset data block information to obtain a first quality data block.
Specifically, combining step S5041 and step S5042, a small data block is sampled from the whole image, and in each iteration the small data block can be randomly sampled from anywhere in the image; this randomness ensures the richness of sampling. Meanwhile, a judgment on the richness of data block information is added so that good sampling blocks are intelligently screened out as training input, while data blocks with too little information or too poor quality are filtered out to avoid their negative influence on model training.
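Steps S5041 and S5042 can be sketched as below. Using pixel standard deviation as the "preset data block information" criterion is an assumption for illustration; the embodiment also contemplates saliency detection, edge detection, or a learned richness model for this screening.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_informative_patch(img, size=32, min_std=5.0, max_tries=20):
    """Randomly crop patches (step S5041) and keep the first whose pixel
    standard deviation clears a threshold (step S5042) -- a simple
    stand-in for the information-richness screening described above."""
    h, w = img.shape
    best = None
    for _ in range(max_tries):
        y = rng.integers(0, h - size + 1)
        x = rng.integers(0, w - size + 1)
        patch = img[y:y + size, x:x + size]
        if patch.std() >= min_std:
            return patch
        if best is None or patch.std() > best.std():
            best = patch  # fall back to the richest patch seen so far
    return best

frame = rng.normal(128.0, 30.0, (128, 128))  # synthetic textured frame
patch = sample_informative_patch(frame)
```

Rejected low-variance crops (flat sky, black borders) are exactly the "too little information, too poor quality" blocks the embodiment filters out.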
Optionally, in step S508, comparing the second quality data block with the data block in the sample data and the data block in the multimedia data to which the sample data belongs, and acquiring the loss function includes:
step S5081, comparing the second quality data block with data blocks in the sample data to obtain a first loss function and a perceptual loss function, wherein the multimedia data generated under the constraint of the first loss function and the perceptual loss function is close to the original multimedia data in content and sensory characteristics;
specifically, the first quality data block is denoted as LQ and the second quality data block as HQ. From HQ and the data block GT of the sample data corresponding to LQ, the L1 loss (i.e., the first loss function in the embodiment of the present application) and the perceptual loss Lpercep (i.e., the perceptual loss function in the embodiment of the present application) are obtained; the image generated under these simultaneous constraints can approximate the original high-quality image in content and sensory features.
Step S5082, comparing the second quality data block with a data block in the multimedia data to which the sample data belongs to obtain a second loss function and an adversarial loss function, wherein the second loss function is used for improving the discrimination of generated pictures, and the adversarial loss function is used for improving how well the second quality data block restores details in the multimedia data;
specifically, HQ is compared with the data blocks in the multimedia data to which the sample data belongs to obtain the discrimination loss LD (i.e., the second loss function in the embodiment of the present application) and the adversarial loss LGAN (i.e., the adversarial loss function in the embodiment of the present application).
Step S5083, obtaining a third loss function according to the first loss function, the perceptual loss function, and the adversarial loss function;
specifically, the third loss function can be obtained by the following formula:
third loss function LG = αL1 + βLpercep + γLGAN,
where α, β, and γ respectively represent the weights of the corresponding losses.
In step S5084, a third loss function is determined as the loss function.
In summary, with reference to steps S502 to S510, as shown in fig. 6, fig. 6 is a flowchart of model training in a method for training multimedia data according to a third embodiment of the present invention, where the method for training multimedia data provided in the embodiment of the present invention specifically includes:
firstly, a video image custom random degradation module is constructed, a video enhancement model based on a convolutional neural network is constructed to be used as a generator, and a discriminator based on the convolutional neural network is constructed;
(1) acquiring data from a real scene, mainly aiming at an application scene, acquiring corresponding high-quality film and television video data, and extracting a high-quality video image as training data;
(2) The high-quality video images of the training data in (1) are passed through the custom random degradation module, which applies various random combinations of degradations to generate low-quality video images used as the low-quality training data for the generator. Composed of down-sampling, blurring, noise addition, compression coding, and similar modes, the random degradation module can effectively simulate the degradation of low-quality video and generate training data better suited to the application scene;
(3) For the degraded low-quality video images in (2), a random intelligent sampling module samples a small data block from the whole image each time as the input for training the generator. The module can randomly sample small blocks from anywhere in the image in each iteration; this randomness ensures the richness of sampling. A judgment on the richness of data block information is also added, so that good sampling blocks are intelligently screened out as training input while data blocks with too little information or too poor quality are filtered out, avoiding their negative influence on model training;
(4) The low-quality data block generated in (3) is taken as the generator input and denoted LQ; after passing through the generator, the quality-enhanced output HQ is obtained. The L1 loss and the perceptual loss Lpercep are computed between HQ and the high-quality video data block GT corresponding to LQ; the image generated under these simultaneous constraints can approach the original high-quality image in content and sensory characteristics;
(5) The HQ generated in (4) and the corresponding original high-quality data block GT are used as inputs to the discriminator, and the corresponding discrimination loss LD and adversarial loss LGAN are computed. These mainly measure whether the discriminator can distinguish real high-quality images from generated high-quality images. Optimizing LD aims to train a strong discriminator that accurately identifies which pictures were generated, while optimizing LGAN aims to let the images produced by the generator deceive the discriminator into mistaking them for real images; on the basis of (4), this restores more picture details in HQ and achieves higher-quality enhancement;
(6) Based on (4) and (5), the final discriminator loss function LD and generator loss function LG = αL1 + βLpercep + γLGAN are obtained, where α, β, and γ respectively represent the weights of the corresponding losses. The two losses are backpropagated to the discriminator and the generator model respectively, both models are trained by stochastic gradient descent, and finally a good generator, i.e., the video quality enhancement model, is obtained;
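The alternating optimization in steps (4) through (6) can be illustrated with a dependency-free toy. Everything below is an assumption for illustration, not the patent's actual models: scalar parameters stand in for the convolutional generator and discriminator, finite differences stand in for backpropagation, and the perceptual term is omitted for brevity, so only the loop's structure carries over.

```python
import math

REAL = 1.0            # the "real high-quality" sample value (toy GT)
g = 0.0               # generator parameter: it directly emits g as its output
d_w, d_b = 0.0, 0.0   # discriminator D(x) = sigmoid(d_w*x + d_b)
ALPHA, GAMMA = 1.0, 0.1   # weights on the L1 and adversarial terms
LR, EPS = 0.05, 1e-5

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def d_loss(dw, db, g_val):
    # discriminator loss LD: score REAL as real (1) and the generated value as fake (0)
    return (-math.log(sigmoid(dw * REAL + db) + 1e-12)
            - math.log(1.0 - sigmoid(dw * g_val + db) + 1e-12))

def g_loss(g_val, dw, db):
    # generator loss LG: L1 distance to the real sample plus a
    # non-saturating adversarial term (fool the discriminator)
    l1 = abs(g_val - REAL)
    lgan = -math.log(sigmoid(dw * g_val + db) + 1e-12)
    return ALPHA * l1 + GAMMA * lgan

for _ in range(500):
    # discriminator step (central finite-difference gradients)
    grad_w = (d_loss(d_w + EPS, d_b, g) - d_loss(d_w - EPS, d_b, g)) / (2 * EPS)
    grad_b = (d_loss(d_w, d_b + EPS, g) - d_loss(d_w, d_b - EPS, g)) / (2 * EPS)
    d_w -= LR * grad_w
    d_b -= LR * grad_b
    # generator step against the updated discriminator
    grad_g = (g_loss(g + EPS, d_w, d_b) - g_loss(g - EPS, d_w, d_b)) / (2 * EPS)
    g -= LR * grad_g
# after training, the generator output oscillates close to the real value
```

The two losses are returned to the discriminator and generator in turn, exactly the alternation described in (6), with gradient descent driving both.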
it should be noted that, in the embodiment of the present application, the random degradation module may be based on a conventional image degradation algorithm, or a degradation model obtained through deep neural network learning may serve as the degradation algorithm; the random intelligent sampling module may be based on traditional algorithms such as saliency detection and edge detection, or an information-richness discrimination model obtained through deep neural network learning may serve as the discrimination algorithm. There are many choices for the sampling size of the data block during training; if training time cost and resource consumption are not a concern, larger data blocks or even the whole high-quality image can be used as training input. That is, the above is only an example, and the method for training multimedia data provided in the embodiment of the present application is not limited thereto.
The method for training the multimedia data, provided by the embodiment of the application, uses the real video data, the random degradation module and the random intelligent sampling module to generate the training data, can effectively fit the low-quality scene of the video, and enhances the generalization performance in video application. Meanwhile, a training mechanism for generating an antagonistic network is introduced, so that further enhancement and restoration of details can be realized, and the problem of excessive smoothness of the enhanced image is solved.
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method for training multimedia data according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example 4
According to an embodiment of the present invention, there is further provided an apparatus for implementing the data processing method in embodiment 1 above, and fig. 7 is a schematic diagram of a data processing apparatus according to a fourth embodiment of the present invention. As shown in fig. 7, the data processing apparatus provided in the embodiment of the present application includes: a data receiving module 72, configured to receive first multimedia data uploaded by a client; a selection module 74, configured to receive the machine learning model selected by the client; a data conversion module 76 for converting the first multimedia data into second multimedia data based on the machine learning model; a presentation module 78 for presenting the second multimedia data; wherein the image quality of the second multimedia data is different from the first multimedia data.
Example 5
According to an embodiment of the present invention, there is further provided an apparatus for implementing the data processing method in embodiment 2, and fig. 8 is a schematic diagram of a data processing apparatus according to a fifth embodiment of the present invention. As shown in fig. 8, includes: a receiving module 80, configured to receive a service request from a client; the analysis module 82 is used for analyzing the service request to obtain a first multimedia data address and a machine learning model ID; a data obtaining module 84, configured to obtain first multimedia data through the first multimedia data address; a model obtaining module 86, configured to obtain a machine learning model according to the machine learning model ID; a data conversion module 88 for converting the first multimedia data into second multimedia data based on the machine learning model, wherein the image quality of the second multimedia data is different from the first multimedia data.
Example 6
According to an embodiment of the present invention, there is further provided an apparatus for implementing the method for training multimedia data in embodiment 3, and fig. 9 is a schematic diagram of an apparatus for training multimedia data according to a sixth embodiment of the present invention. As shown in fig. 9, an apparatus for training multimedia data provided in an embodiment of the present application includes: a degradation module 90, configured to degrade image quality of the sample data by at least one random degradation mode to obtain first quality multimedia data; a sampling module 92, configured to randomly sample from the first quality multimedia data to obtain a first quality data block; an enhancement module 94, configured to enhance the first quality data block to obtain a second quality data block, where an image quality of the second quality data block is different from that of the first quality data block; an obtaining module 96, configured to compare the second quality data block with a data block in the sample data and a data block in the multimedia data to which the sample data belongs, respectively, and obtain a loss function; and the model training module 98 is configured to train a process of enhancing the first quality data block to the second quality data block according to the loss function, so as to obtain a machine learning model.
Example 7
According to an aspect of another embodiment of the present invention, there is further provided a storage medium, where the storage medium includes a stored program, and when the program runs, the apparatus on which the storage medium is located is controlled to execute the method for processing data and training multimedia data according to any one of embodiments 1 to 3.
Example 7
According to another aspect of another embodiment of the present invention, there is further provided a processor, wherein the processor is configured to execute a program, and when the program is executed, the method for processing and training multimedia data as described in any one of embodiments 1 to 3 is performed.
Example 8
The embodiment of the invention also provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store program codes executed by the method for training multimedia data provided in the first embodiment.
Optionally, in this embodiment, the storage medium may be located in any one of computer terminals in a computer terminal group in a computer network, or in any one of mobile terminals in a mobile terminal group.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: degrading the image quality of the sample data through at least one random degradation mode to obtain first quality multimedia data; randomly sampling from the first quality multimedia data to obtain a first quality data block; enhancing the first quality data block to obtain a second quality data block, wherein the image quality of the second quality data block is different from that of the first quality data block; comparing the second quality data block with a data block in the sample data and a data block in the multimedia data to which the sample data belongs respectively to obtain a loss function; and training the process of enhancing the first quality data block to the second quality data block according to the loss function to obtain a machine learning model.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: the method comprises the steps of collecting sample data from multimedia data according to an application scene before the image quality of the sample data is degraded through at least one random degradation mode to obtain first-quality multimedia data.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: the at least one random degradation pattern comprises: down-sampling, blurring, adding noise, compression coding.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: randomly sampling from the first quality multimedia data to obtain a first quality data block comprises: randomly acquiring a data block from first quality multimedia data; and screening the data blocks according to the preset data block information to obtain a first quality data block.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: comparing the second quality data block with a data block in the sample data and a data block in the multimedia data to which the sample data belongs, and acquiring a loss function includes: comparing the second quality data block with data blocks in the sample data to obtain a first loss function and a perceptual loss function, wherein the multimedia data generated under the constraint of the first loss function and the perceptual loss function is close to the original multimedia data in content and sensory characteristics; comparing the second quality data block with a data block in the multimedia data to which the sample data belongs to obtain a second loss function and an adversarial loss function, wherein the second loss function is used for improving the discrimination of generated pictures, and the adversarial loss function is used for improving how well the second quality data block restores details in the multimedia data; obtaining a third loss function according to the first loss function, the perceptual loss function, and the adversarial loss function; and determining the third loss function as the loss function.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: the multimedia data to which the sample data belongs includes: video data, picture data, augmented reality image data, or virtual reality image data.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims (14)

1. A method of data processing, comprising:
receiving first multimedia data uploaded by a client;
receiving a machine learning model selected by a client;
converting the first multimedia data into second multimedia data based on the machine learning model;
displaying the second multimedia data; wherein the second multimedia data has a different image quality than the first multimedia data.
2. A method of data processing, comprising:
receiving a service request from a client;
analyzing the service request to obtain a first multimedia data address and a machine learning model ID;
acquiring the first multimedia data through the first multimedia data address;
acquiring the machine learning model according to the machine learning model ID;
based on the machine learning model, the first multimedia data is transformed into second multimedia data, wherein the second multimedia data has a different image quality than the first multimedia data.
3. The method according to claim 2, wherein the second multimedia data is converted according to a data specification of a device to be delivered, so as to obtain the converted second multimedia data.
4. A method of training multimedia data, comprising:
degrading the image quality of the sample data through at least one random degradation mode to obtain first quality multimedia data;
randomly sampling from the first quality multimedia data to obtain a first quality data block;
enhancing the first quality data block to obtain a second quality data block, wherein the image quality of the second quality data block is different from that of the first quality data block;
comparing the second quality data block with a data block in the sample data and a data block in the multimedia data to which the sample data belongs respectively to obtain a loss function;
and training the process of enhancing the first quality data block to the second quality data block according to the loss function to obtain a machine learning model.
5. The method of claim 4, wherein before degrading the image quality of the sample data by at least one random degradation mode to obtain a first quality multimedia data, the method further comprises:
and collecting the sample data from the multimedia data according to the application scene.
6. The method of claim 4 or 5, wherein the at least one random degradation pattern comprises: down-sampling, blurring, adding noise, compression coding.
7. The method of claim 4, wherein said randomly sampling from the first quality multimedia data to obtain a first quality data block comprises:
randomly acquiring a data block from the first quality multimedia data;
and screening the data blocks according to preset data block information to obtain the first quality data block.
8. The method according to claim 4 or 7, wherein said comparing said second quality data chunk with a data chunk in said sample data and a data chunk in multimedia data to which said sample data belongs respectively, obtaining a loss function comprises:
comparing the second quality data block with data blocks in the sample data to obtain a first loss function and a perceptual loss function, wherein the multimedia data generated under the constraint of the first loss function and the perceptual loss function is close to the original multimedia data in content and sensory characteristics;
comparing the second quality data block with a data block in the multimedia data to which the sample data belongs to obtain a second loss function and an adversarial loss function, wherein the second loss function is used for improving the discrimination of a generated picture, and the adversarial loss function is used for improving the efficiency of restoring details in the multimedia data through the second quality data block;
obtaining a third loss function according to the first loss function, the perceptual loss function, and the adversarial loss function;
determining the third loss function as the loss function.
9. The method of claim 4, wherein the multimedia data to which the sample data belongs comprises: video data, picture data, augmented reality image data, or virtual reality image data.
10. A data processing apparatus, comprising:
a data receiving module, configured to receive first multimedia data uploaded by a client;
a selection module, configured to receive a machine learning model selected by the client;
a data conversion module, configured to convert the first multimedia data into second multimedia data based on the machine learning model; and
a display module, configured to display the second multimedia data, wherein the second multimedia data has a different image quality from the first multimedia data.
11. A data processing apparatus, comprising:
a receiving module, configured to receive a service request from a client;
an analysis module, configured to parse the service request to obtain a first multimedia data address and a machine learning model ID;
a data acquisition module, configured to acquire the first multimedia data through the first multimedia data address;
a model acquisition module, configured to acquire the machine learning model according to the machine learning model ID; and
a data conversion module, configured to convert the first multimedia data into second multimedia data based on the machine learning model, where an image quality of the second multimedia data is different from that of the first multimedia data.
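The claim-11 flow — parse the request into a data address and model ID, fetch both, then convert — can be sketched as below. The JSON request layout with `data_url` and `model_id` keys is an assumption for illustration; the patent does not specify the wire format, and the fetchers are injected stand-ins for the acquisition modules.

```python
import json

def parse_request(raw):
    # Analysis module: extract the multimedia-data address and model ID.
    req = json.loads(raw)
    return req["data_url"], req["model_id"]

def handle_request(raw, fetch_data, fetch_model):
    # fetch_data / fetch_model stand in for the data- and model-acquisition
    # modules; `model` performs the first -> second multimedia conversion.
    url, model_id = parse_request(raw)
    data = fetch_data(url)
    model = fetch_model(model_id)
    return model(data)
```

Keeping acquisition behind injected callables mirrors the claim's module split and makes the conversion step testable in isolation.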
12. An apparatus for training multimedia data, comprising:
a quality degradation module, configured to degrade the image quality of sample data by at least one random degradation mode to obtain first quality multimedia data;
a sampling module, configured to randomly sample from the first quality multimedia data to obtain a first quality data block;
an enhancement module, configured to enhance the first quality data block to obtain a second quality data block, wherein an image quality of the second quality data block is different from that of the first quality data block;
an obtaining module, configured to compare the second quality data block with a data block in the sample data and a data block in the multimedia data to which the sample data belongs respectively, to obtain a loss function; and
a model training module, configured to train the process of enhancing the first quality data block into the second quality data block according to the loss function, to obtain a machine learning model.
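End-to-end, the claim-12 pipeline is degrade → sample → enhance → compare → update. The toy sketch below compresses it into a single loop: the "enhancer" is one learned per-pixel gain trained by gradient descent on a mean-squared-error loss, a deliberately tiny stand-in for the real model and the composite loss of claim 8.

```python
import numpy as np

def train(sample, degrade, steps=200, lr=0.5):
    # Toy training pipeline: the enhancer is `pred = gain * low`, with a
    # single trainable scalar `gain`; the real patent trains a full model.
    gain = 0.5
    low = degrade(sample)                          # quality degradation module
    for _ in range(steps):
        pred = gain * low                          # enhancement module
        grad = 2 * ((pred - sample) * low).mean()  # d/d_gain of the MSE loss
        gain -= lr * grad                          # model training module
    return gain
```

For a degradation that simply scales pixels by 0.25, the learned gain converges to its inverse, 4.0, showing the loop recovering the original quality from the degraded input.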
13. A storage medium comprising a stored program, wherein the program, when executed, controls an apparatus in which the storage medium resides to perform the method for processing and training multimedia data according to any one of claims 1 to 9.
14. A processor configured to run a program, wherein the program, when running, performs the method for processing and training multimedia data according to any one of claims 1 to 9.
CN202010373497.7A 2020-05-06 2020-05-06 Method and device for processing and training multimedia data Active CN113628121B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010373497.7A CN113628121B (en) 2020-05-06 2020-05-06 Method and device for processing and training multimedia data


Publications (2)

Publication Number Publication Date
CN113628121A true CN113628121A (en) 2021-11-09
CN113628121B CN113628121B (en) 2023-11-14

Family

ID=78376785

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010373497.7A Active CN113628121B (en) 2020-05-06 2020-05-06 Method and device for processing and training multimedia data

Country Status (1)

Country Link
CN (1) CN113628121B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115356599A (en) * 2022-10-21 2022-11-18 State Grid Tianjin Electric Power Company Chengxi Power Supply Branch Multi-mode urban power grid fault diagnosis method and system

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750686A (en) * 2012-06-12 2012-10-24 East China Normal University Learning-based super-resolution restoration processing method for document images
CN106228512A (en) * 2016-07-19 2016-12-14 Beijing University of Technology Image super-resolution reconstruction method based on a learning-rate-adaptive convolutional neural network
CN106683048A (en) * 2016-11-30 2017-05-17 Zhejiang Uniview Technologies Co., Ltd. Image super-resolution method and image super-resolution equipment
CN107590774A (en) * 2017-09-18 2018-01-16 Beijing University of Posts and Telecommunications License plate clarification method and device based on a generative adversarial network
US20180075581A1 (en) * 2016-09-15 2018-03-15 Twitter, Inc. Super resolution using a generative adversarial network
CN108038832A (en) * 2017-12-25 2018-05-15 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences Underwater image enhancement method and system
CN108235058A (en) * 2018-01-12 2018-06-29 Guangzhou Huaduo Network Technology Co., Ltd. Video quality processing method, storage medium and terminal
CN108416752A (en) * 2018-03-12 2018-08-17 Sun Yat-sen University Method for removing motion blur from images based on a generative adversarial network
CN109255769A (en) * 2018-10-25 2019-01-22 Xiamen Meitu Technology Co., Ltd. Training method and training model for an image enhancement network, and image enhancement method
CN109801215A (en) * 2018-12-12 2019-05-24 Tianjin Jinhang Institute of Technical Physics Infrared super-resolution imaging method based on a generative adversarial network
CN110163235A (en) * 2018-10-11 2019-08-23 Tencent Technology (Shenzhen) Co., Ltd. Training of an image enhancement model, image enhancement method, device and storage medium
CN110349085A (en) * 2019-06-28 2019-10-18 Xi'an Polytechnic University Single-image super-resolution feature enhancement method based on a generative adversarial network
CN110458060A (en) * 2019-07-30 2019-11-15 Jinan University Vehicle image optimization method and system based on adversarial learning
CN110634108A (en) * 2019-08-30 2019-12-31 Beijing University of Technology Enhancement method for compositely degraded live-streaming video based on a meta-cycle-consistency adversarial network
CN110807740A (en) * 2019-09-17 2020-02-18 Peking University Image enhancement method and system for window images in surveillance scenes



Also Published As

Publication number Publication date
CN113628121B (en) 2023-11-14

Similar Documents

Publication Publication Date Title
Hong et al. Domain-aware universal style transfer
Guarnera et al. Preliminary forensics analysis of deepfake images
CN111954053B (en) Method for acquiring mask frame data, computer equipment and readable storage medium
CN104618803A (en) Information push method, information push device, terminal and server
CN110944200B (en) Method for evaluating immersive video transcoding scheme
CN113763296A (en) Image processing method, apparatus and medium
CN110717868A (en) Video high dynamic range inverse tone mapping model construction and mapping method and device
CN103440674A (en) Method for rapidly generating crayon special effect of digital image
CN110620924A (en) Method and device for processing coded data, computer equipment and storage medium
CN111985281A (en) Image generation model generation method and device and image generation method and device
CN112132766A (en) Image restoration method and device, storage medium and electronic device
Saha et al. Perceptual video quality assessment: The journey continues!
CN113628121B (en) Method and device for processing and training multimedia data
KR102130076B1 (en) Method for improving the resolution of streaming files based on the learning importance of feature areas
CN114004750A (en) Image processing method, device and system
CN116980604A (en) Video encoding method, video decoding method and related equipment
CN113627342B (en) Method, system, equipment and storage medium for video depth feature extraction optimization
CN111383289A (en) Image processing method, image processing device, terminal equipment and computer readable storage medium
CN116977190A (en) Image processing method, apparatus, device, storage medium, and program product
CN116264606A (en) Method, apparatus and computer program product for processing video
Feng et al. BVI-Artefact: An artefact detection benchmark dataset for streamed videos
Le Moan et al. Towards exploiting change blindness for image processing
CN111127392A (en) Non-reference image quality evaluation method based on countermeasure generation network
CN113117341B (en) Picture processing method and device, computer readable storage medium and electronic equipment
Liu et al. The First Comprehensive Dataset with Multiple Distortion Types for Visual Just-Noticeable Differences

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant