CN113628121B - Method and device for processing and training multimedia data - Google Patents

Method and device for processing and training multimedia data

Info

Publication number
CN113628121B
CN113628121B (application CN202010373497.7A)
Authority
CN
China
Prior art keywords
data
quality
loss function
data block
multimedia data
Prior art date
Legal status
Active
Application number
CN202010373497.7A
Other languages
Chinese (zh)
Other versions
CN113628121A (en)
Inventor
林宪晖
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN202010373497.7A priority Critical patent/CN113628121B/en
Publication of CN113628121A publication Critical patent/CN113628121A/en
Application granted granted Critical
Publication of CN113628121B publication Critical patent/CN113628121B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00: Image enhancement or restoration
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10016: Video; Image sequence

Abstract

The invention discloses a method and device for processing and training multimedia data. The method comprises the following steps: degrading the image quality of sample data through at least one random degradation mode to obtain first quality multimedia data; randomly sampling from the first quality multimedia data to obtain a first quality data block; enhancing the first quality data block to obtain a second quality data block; comparing the second quality data block respectively with data blocks in the sample data and data blocks in the multimedia data to which the sample data belongs to obtain a loss function; and training the process of enhancing the first quality data block into the second quality data block according to the loss function to obtain a machine learning model. The method addresses the technical problem in the prior art that, after a video image is enhanced by a convolutional neural network, fine details are difficult to restore, leaving the image overly smooth.

Description

Method and device for processing and training multimedia data
Technical Field
The invention relates to the technical field of Internet, in particular to a method and a device for processing and training multimedia data.
Background
Video quality enhancement can automatically improve the resolution of a video without additional input or interaction, repair low-quality regions in video frames, and remove blur and noise, markedly improving the quality and look of the video. It is in strong demand, with wide application scenarios in old-film restoration, 4K remastering and online video platforms.
However, most existing video enhancement methods simply apply convolutional-neural-network-based image enhancement to video scenes. This cannot guarantee high-quality enhancement of low-quality videos across different styles and scenes. Moreover, ordinary neural networks are trained and optimized with similarity losses such as mean squared error, which makes it difficult to restore fine details in video frames and leads to over-smoothed, flattened detail.
No effective solution has yet been proposed for the prior-art problem that fine details in a video image are difficult to restore after the video image is enhanced by a convolutional neural network, leaving the image overly smooth.
Disclosure of Invention
The embodiment of the invention provides a method and device for processing and training multimedia data, which at least solve the technical problem in the prior art that fine details in a video image are difficult to restore after the video image is enhanced by a convolutional neural network, leaving the image overly smooth.
According to an aspect of an embodiment of the present invention, there is provided a data processing method including: receiving first multimedia data uploaded by a client; receiving a machine learning model selected by the client; converting the first multimedia data into second multimedia data based on the machine learning model; and displaying the second multimedia data, wherein the image quality of the second multimedia data is different from that of the first multimedia data.
According to an aspect of an embodiment of the present invention, there is provided another data processing method, including: receiving a service request from a client; analyzing the service request to obtain a first multimedia data address and a machine learning model ID; acquiring first multimedia data through a first multimedia data address; acquiring a machine learning model according to the machine learning model ID; the first multimedia data is converted into second multimedia data based on a machine learning model, wherein the image quality of the second multimedia data is different from the first multimedia data.
Optionally, the second multimedia data is converted to a data specification according to the type of the target device, obtaining converted second multimedia data.
According to another aspect of an embodiment of the present invention, there is provided a method of training multimedia data, including: degrading the image quality of the sample data through at least one random degradation mode to obtain first quality multimedia data; randomly sampling from the first quality multimedia data to obtain a first quality data block; enhancing the first quality data block to obtain a second quality data block, wherein the image quality of the second quality data block is different from that of the first quality data block; comparing the second quality data block with the data block in the sample data and the data block in the multimedia data to which the sample data belongs respectively to obtain a loss function; and training the process of enhancing the first quality data block to the second quality data block according to the loss function to obtain a machine learning model.
Optionally, before degrading the image quality of the sample data by at least one random degradation mode to obtain the first quality multimedia data, the method further comprises: sample data are collected from multimedia data according to application scenes.
Optionally, the at least one random degradation mode comprises: downsampling, blurring, noise addition and compression coding.
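For illustration, a random combination of these degradation modes can be sketched as a small pipeline. The pure-Python sketch below is not from the patent: the blur width, noise level and quantization step are invented parameters, and a real pipeline would use image libraries and codec-level compression coding rather than simple quantization.

```python
import random

def downsample(img, factor=2):
    # Keep every `factor`-th pixel in each dimension (nearest-neighbour decimation).
    return [row[::factor] for row in img[::factor]]

def box_blur(img):
    # Horizontal 3-tap box blur, a crude stand-in for a blur kernel.
    w = len(img[0])
    out = []
    for row in img:
        blurred = []
        for x in range(w):
            win = row[max(0, x - 1):min(w, x + 2)]
            blurred.append(sum(win) / len(win))
        out.append(blurred)
    return out

def add_noise(img, rng, sigma=5.0):
    # Additive Gaussian noise; `sigma` is an invented level.
    return [[p + rng.gauss(0, sigma) for p in row] for row in img]

def quantize(img, step=16):
    # Coarse quantization mimicking compression-coding artifacts.
    return [[round(p / step) * step for p in row] for row in img]

def random_degrade(img, rng):
    # Always reduce resolution, then apply a random subset of the other
    # modes in a random order, so each sample sees a different degradation.
    ops = [box_blur, lambda im: add_noise(im, rng), quantize]
    chosen = [op for op in ops if rng.random() < 0.5]
    rng.shuffle(chosen)
    out = downsample(img)
    for op in chosen:
        out = op(out)
    return out

rng = random.Random(0)
clean = [[float((x + y) % 256) for x in range(8)] for y in range(8)]
low_quality = random_degrade(clean, rng)
print(len(low_quality), len(low_quality[0]))  # 4 4
```

Randomizing both the subset and the order of degradations is one way to simulate the varied quality loss seen in real low-quality video.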
Optionally, randomly sampling from the first quality multimedia data to obtain the first quality data block includes: randomly acquiring data blocks from the first quality multimedia data; and screening the data blocks according to preset data block information to obtain the first quality data block.
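The "preset data block information" used for screening is not specified further in this section; one plausible reading is that low-information patches are filtered out. The sketch below makes that assumption, with an invented variance threshold:

```python
import random

def sample_patches(frame, patch_size, count, rng, min_var=1.0):
    # Randomly crop patches and keep only those passing a screening rule.
    # The rule here (reject near-flat patches whose pixel variance falls
    # below `min_var`) is an assumption made for illustration.
    h, w = len(frame), len(frame[0])
    kept, attempts = [], 0
    while len(kept) < count and attempts < count * 100:  # avoid looping forever
        attempts += 1
        y = rng.randrange(h - patch_size + 1)
        x = rng.randrange(w - patch_size + 1)
        block = [row[x:x + patch_size] for row in frame[y:y + patch_size]]
        flat = [p for r in block for p in r]
        mean = sum(flat) / len(flat)
        var = sum((p - mean) ** 2 for p in flat) / len(flat)
        if var >= min_var:  # screening step: discard texture-less patches
            kept.append(block)
    return kept

rng = random.Random(0)
frame = [[float((3 * x + 5 * y) % 17) for x in range(32)] for y in range(32)]
patches = sample_patches(frame, 8, 4, rng)
print(len(patches), len(patches[0]), len(patches[0][0]))  # 4 8 8
```

Sampling small patches rather than whole frames keeps training inputs varied and memory-light, which is consistent with the richness argument made later in step S504.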
Optionally, comparing the second quality data block respectively with data blocks in the sample data and data blocks in the multimedia data to which the sample data belongs to obtain the loss function includes: comparing the second quality data block with data blocks in the sample data to obtain a first loss function and a perceptual loss function, where the first loss function and the perceptual loss function constrain the generated multimedia data to be close to the original multimedia data in content and in sensory characteristics; comparing the second quality data block with data blocks in the multimedia data to which the sample data belongs to obtain a second loss function and an adversarial loss function, where the second loss function improves the discrimination of the generated picture and the adversarial loss function improves the recovery of details in the multimedia data by the second quality data block; obtaining a third loss function from the first loss function, the perceptual loss function and the adversarial loss function; and determining the third loss function as the loss function.
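A common way to realize such a composite objective is a weighted sum of a pixel loss, a perceptual loss and an adversarial loss. The sketch below uses toy stand-ins: horizontal gradients in place of deep perceptual features, a scalar discriminator score for the adversarial term, and invented weights; it only illustrates the shape of the third loss function, not the patent's exact formulation.

```python
import math

def l1_loss(pred, target):
    # Pixel-wise content term (the "first loss function").
    n = sum(len(row) for row in pred)
    return sum(abs(p - t) for pr, tr in zip(pred, target)
               for p, t in zip(pr, tr)) / n

def grad_features(img):
    # Toy stand-in for deep perceptual features: horizontal gradients.
    return [[row[x + 1] - row[x] for x in range(len(row) - 1)] for row in img]

def perceptual_loss(pred, target):
    # Compare "features" rather than raw pixels.
    return l1_loss(grad_features(pred), grad_features(target))

def adversarial_loss(d_score):
    # Generator-side term: push the discriminator's score for the
    # generated block toward "real" (1.0).
    return -math.log(max(d_score, 1e-12))

def total_loss(pred, target, d_score, w_pix=1.0, w_perc=0.1, w_adv=0.01):
    # Weighted combination (the "third loss function"); weights are invented.
    return (w_pix * l1_loss(pred, target)
            + w_perc * perceptual_loss(pred, target)
            + w_adv * adversarial_loss(d_score))

hq = [[1.0, 2.0], [3.0, 4.0]]
ref = [[1.0, 2.5], [3.0, 4.0]]
print(round(total_loss(hq, ref, d_score=0.8), 4))  # 0.1522
```

The adversarial term is what counteracts the over-smoothing of pure mean-squared-error training described in the background section, since the generator is rewarded for producing blocks the discriminator cannot tell apart from real high-quality data.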
Optionally, the multimedia data to which the sample data belongs includes: video data, picture data, augmented reality image data, or virtual reality image data.
According to still another aspect of the embodiment of the present invention, there is also provided a data processing apparatus including: the data receiving module is used for receiving the first multimedia data uploaded by the client; the selection module is used for receiving the machine learning model selected by the client; the data conversion module is used for converting the first multimedia data into the second multimedia data based on the machine learning model; the display module is used for displaying the second multimedia data; wherein the image quality of the second multimedia data is different from the first multimedia data.
According to still another aspect of the embodiments of the present invention, there is provided another data processing apparatus, including: the receiving module is used for receiving a service request from the client; the analysis module is used for analyzing the service request and acquiring a first multimedia data address and a machine learning model ID; the data acquisition module is used for acquiring the first multimedia data through the first multimedia data address; the model acquisition module is used for acquiring a machine learning model according to the machine learning model ID; and the data conversion module is used for converting the first multimedia data into the second multimedia data based on the machine learning model, wherein the image quality of the second multimedia data is different from that of the first multimedia data.
According to still another aspect of the embodiment of the present invention, there is also provided an apparatus for training multimedia data, including: the degradation module is used for degrading the image quality of the sample data through at least one random degradation mode to obtain first quality multimedia data; the sampling module is used for randomly sampling from the first quality multimedia data to obtain a first quality data block; the enhancement module is used for enhancing the first quality data block to obtain a second quality data block, wherein the image quality of the second quality data block is different from that of the first quality data block; the acquisition module is used for comparing the second quality data block with the data block in the sample data and the data block in the multimedia data to which the sample data belongs respectively to acquire a loss function; and the model training module is used for training the process of enhancing the first quality data block to the second quality data block according to the loss function to obtain a machine learning model.
According to an aspect of another embodiment of the present invention, there is further provided a storage medium, where the storage medium includes a stored program, and where the program, when executed, controls a device in which the storage medium is located to perform the above-mentioned method for training and enhancing image quality of multimedia data.
According to another aspect of another embodiment of the present invention, there is also provided a processor, where the processor is configured to run a program, and the program, when run, performs the above-mentioned method for training multimedia data and enhancing image quality.
In the embodiment of the invention, a video enhancement model based on a deep convolutional neural network is trained iteratively, the trained model is then used to infer on an input low-quality video, and the video enhancement task is thus completed. The image quality of sample data is degraded through at least one random degradation mode to obtain first quality multimedia data; a first quality data block is randomly sampled from the first quality multimedia data; the first quality data block is enhanced to obtain a second quality data block, where the image quality of the second quality data block is different from that of the first quality data block; the second quality data block is compared respectively with data blocks in the sample data and data blocks in the multimedia data to which the sample data belongs to obtain a loss function; and the process of enhancing the first quality data block into the second quality data block is trained according to the loss function to obtain a machine learning model. This achieves the purposes of generating high-quality video images and repairing video quality, attains the technical effect of improving video image restoration quality, and solves the prior-art problem that fine details in a video image are difficult to restore after the video image is enhanced by a convolutional neural network, leaving the image overly smooth.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
FIG. 1 is a flow chart of data processing according to a first embodiment of the present application;
FIG. 2 is a flow chart of data processing according to a second embodiment of the present application;
FIG. 3 is a flow chart of video enhancement in a data processing method according to a second embodiment of the present application;
fig. 4 is a hardware block diagram of a computer terminal for a method of training multimedia data according to an embodiment of the present application;
fig. 5 is a flowchart of a method of training multimedia data according to a third embodiment of the present application;
FIG. 6 is a flow chart of model training in a method of training multimedia data according to a third embodiment of the present application;
FIG. 7 is a schematic diagram of a data processing apparatus according to a fourth embodiment of the present application;
FIG. 8 is a schematic diagram of a data processing apparatus according to a fifth embodiment of the present application;
fig. 9 is a schematic diagram of an apparatus for training multimedia data according to a sixth embodiment of the present application.
Detailed Description
In order that those skilled in the art may better understand the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by those skilled in the art based on the embodiments of the present application without inventive effort shall fall within the scope of the present application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The technical terms involved in the present application are as follows:
Generative adversarial network (GAN): an unsupervised learning method in which two neural networks learn by playing a game against each other.
Video quality enhancement: the task of repairing a given low-quality video by removing blur and noise from its frames, increasing its resolution and improving its picture quality.
Degradation: the process of lowering video or image quality by adding redundant interference information or erasing effective information.
Example 1
According to an aspect of an embodiment of the present application, there is provided a data processing method, and fig. 1 is a flowchart of data processing according to a first embodiment of the present application. As shown in fig. 1, the data processing method provided by the embodiment of the present application includes:
step S102, receiving first multimedia data uploaded by a client;
step S104, receiving a machine learning model selected by a client;
step S106, converting the first multimedia data into second multimedia data based on the machine learning model;
step S108, displaying the second multimedia data;
wherein the image quality of the second multimedia data is different from the first multimedia data.
Specifically, with reference to steps S102 to S108, the data processing method provided in the embodiment of the present application may be applied to an image processing scenario. Taking a client installed on a mobile terminal device, and first multimedia data that is picture data, as an example: the user uploads picture data through the client, and the network side receives the uploaded picture data; the user selects a machine learning model through the client, and the network side invokes that machine learning model and enhances the image quality of the uploaded picture data based on it, obtaining picture data with improved image quality; the picture data with improved image quality is then displayed.
The training process of the machine learning model provided by the embodiment of the application comprises the following steps: degrading the image quality of the sample data through at least one random degradation mode to obtain first quality multimedia data; randomly sampling from the first quality multimedia data to obtain a first quality data block; enhancing the first quality data block to obtain a second quality data block, wherein the image quality of the second quality data block is higher than that of the first quality data block; comparing the second quality data block with the data block in the sample data and the data block in the multimedia data to which the sample data belongs respectively to obtain a loss function; and training the process of enhancing the first quality data block to the second quality data block according to the loss function to obtain a machine learning model.
In addition, the image quality provided by the embodiment of the present application may include definition (clarity) and resolution. Preferably, if the image quality is definition, the picture data at a first definition is converted into picture data at a second definition based on the machine learning model, where the first definition is lower than the second definition.
If the image quality covers both definition and resolution, the first definition and first resolution of the picture data are converted into a second definition and second resolution based on the machine learning model, where the first definition is lower than the second definition, and the first resolution is either greater than or less than the second resolution.
The resolution may be set or adjusted according to the specification of the platform or device on which the data is to be displayed. When the first resolution is less than the second resolution, i.e., the resolution of the picture data needs to be increased: taking 720P as an example, if the resolution of the picture data is 720P, it is raised to 1080P, 2K or 4K according to the requirements of the display platform or device.
When the first resolution is greater than the second resolution, i.e., the resolution of the picture data needs to be reduced: again taking 720P as an example, if the resolution of the picture data is 720P, it is reduced to 360P according to the requirements of the display platform or device.
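The up- and down-conversion described above can be sketched with a nearest-neighbour resize. A real system would use a learned upscaler or a proper resampling filter; nearest-neighbour is used here only to keep the example self-contained:

```python
def resize_nearest(img, new_h, new_w):
    # Map each output pixel back to its nearest source pixel.
    h, w = len(img), len(img[0])
    return [[img[y * h // new_h][x * w // new_w] for x in range(new_w)]
            for y in range(new_h)]

frame = [[x + 10 * y for x in range(4)] for y in range(4)]
up = resize_nearest(frame, 8, 8)    # upscaling direction (e.g. 720P to 1080P)
down = resize_nearest(frame, 2, 2)  # downscaling direction (e.g. 720P to 360P)
print(len(up), len(up[0]), len(down), len(down[0]))  # 8 8 2 2
```

The same function covers both directions, matching the observation that whether resolution goes up or down depends only on the target platform's specification.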
It should be noted that, the above examples are only for implementing the data processing method provided by the embodiments of the present application, and are not limited specifically.
Example 2
According to an aspect of an embodiment of the present application, another data processing method is provided, and fig. 2 is a flowchart of data processing according to a second embodiment of the present application. As shown in fig. 2, the data processing method provided by the embodiment of the present application includes:
step S200, receiving a service request from a client;
step S202, analyzing a service request to obtain a first multimedia data address and a machine learning model ID;
step S204, obtaining first multimedia data through the first multimedia data address;
step S208, according to the machine learning model ID, acquiring a machine learning model;
step S210, converting the first multimedia data into second multimedia data based on the machine learning model, wherein the image quality of the second multimedia data is different from the first multimedia data.
Specifically, with reference to steps S200 to S210, the data processing method provided in the embodiment of the present application may be applied to an image processing scenario. Taking a client installed on a mobile terminal device, and first multimedia data that is picture data, as an example: the network side receives a service request sent by the client and, by parsing the service request, obtains the storage address of the picture data to be processed and a machine learning model ID; the network side then obtains the picture data according to the address and the corresponding machine learning model according to the machine learning model ID, and enhances the image quality of the picture data based on the machine learning model to obtain picture data with improved image quality.
For example, assuming that the picture data is stored in the cloud, the picture data is obtained from the cloud according to the address stored in the picture data in the service request, a corresponding machine learning model is obtained through the machine learning model ID, and the image quality of the uploaded picture data is enhanced based on the machine learning model, so that the picture data with improved image quality is obtained.
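Steps S200 to S210 can be sketched as a small request handler. The field names (`data_url`, `model_id`) and the registry/fetch callables below are assumptions for illustration, since no wire format is fixed here:

```python
import json

def handle_service_request(raw_request, fetch, model_registry):
    # Parse the service request into a data address and a model ID (S202).
    req = json.loads(raw_request)
    data_url = req["data_url"]          # assumed field name
    model_id = req["model_id"]          # assumed field name
    first_media = fetch(data_url)       # S204: fetch data by its address
    model = model_registry[model_id]    # S208: look up the model by its ID
    return model(first_media)           # S210: convert first to second media

# Toy usage with stubbed cloud storage and a stub "model".
storage = {"oss://bucket/clip.bin": [1, 2, 3]}
registry = {"vqe-v1": lambda frames: [2 * f for f in frames]}
raw = json.dumps({"data_url": "oss://bucket/clip.bin", "model_id": "vqe-v1"})
result = handle_service_request(raw, storage.get, registry)
print(result)  # [2, 4, 6]
```

Passing `fetch` and `model_registry` as callables keeps the handler independent of where the data lives (cloud object store, local disk) and of how models are deployed.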
The training process of the machine learning model provided by the embodiment of the application comprises the following steps: degrading the image quality of the sample data through at least one random degradation mode to obtain first quality multimedia data; randomly sampling from the first quality multimedia data to obtain a first quality data block; enhancing the first quality data block to obtain a second quality data block, wherein the image quality of the second quality data block is higher than that of the first quality data block; comparing the second quality data block with the data block in the sample data and the data block in the multimedia data to which the sample data belongs respectively to obtain a loss function; and training the process of enhancing the first quality data block to the second quality data block according to the loss function to obtain a machine learning model.
In addition, the image quality provided by the embodiment of the present application may include definition (clarity) and resolution.
Optionally, the second multimedia data is converted into data specifications according to the type of the equipment to be put in, and the converted second multimedia data is obtained.
The second multimedia data in the embodiment of the present application is converted to a data specification according to the type of the target device, where the data specification includes definition (clarity) and resolution. For example, if the second multimedia data has definition X and resolution 720P, and the data specification adapted to the target device type is definition X+1 and resolution 1080P, then the definition and resolution of the second multimedia data are adjusted to X+1 and 1080P, finally yielding multimedia data with definition X+1 and resolution 1080P.
Specifically, in combination with step S200 to step S210, as shown in fig. 3, fig. 3 is a flowchart of video enhancement in the data processing method according to the second embodiment of the present invention, wherein the image quality of the first quality multimedia data is lower than the image quality of the second quality multimedia data;
taking video data as an example, the machine learning model is changed into a video quality enhancement model, after the video quality enhancement model is obtained, low-quality video data is input into the video quality enhancement model, the video quality is enhanced through the video quality enhancement model, and finally high-quality video data is obtained, so that video quality restoration is realized.
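Frame-by-frame inference with a trained enhancement model can be sketched as follows; the stub model below simply brightens pixels and stands in for a trained video quality enhancement network, and a real deployment would likely batch frames and may use temporal context:

```python
def enhance_video(frames, model):
    # Run each low-quality frame through the trained enhancement model.
    return [model(frame) for frame in frames]

# Stub "model": brightens each pixel, a placeholder for a trained network.
stub_model = lambda frame: [[min(255, p + 10) for p in row] for row in frame]

low_quality = [[[100, 120], [130, 140]], [[50, 60], [70, 80]]]
high_quality = enhance_video(low_quality, stub_model)
print(high_quality[0][0])  # [110, 130]
```

Because the model is just a callable here, the same loop serves any enhancement network that maps one frame to one frame.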
In addition, when repairing old photos and black-and-white photos, and when applied to images in the AR and VR fields, a machine learning model trained on a large amount of data can repair old photos, refine textures and fill in defects; black-and-white photos or black-and-white footage (films) can also be colorized by a machine learning model, that is, through learning on a large amount of data, the environment, people and objects in the photo or footage are plausibly colored and rendered, adding color during repair and improving the temporal consistency of the data over time (for example, by exploiting optical flow).
Example 3
There is also provided, in accordance with an embodiment of the present application, a method embodiment of training multimedia data. It should be noted that the steps illustrated in the flowcharts of the figures may be performed in a computer system, such as one executing a set of computer-executable instructions, and that although a logical order is shown in the flowcharts, in some cases the steps may be performed in an order different from that shown or described herein.
The method embodiment provided by the third embodiment of the present application may be executed in a mobile terminal, a computer terminal or a similar computing device. Taking a computer terminal as an example, fig. 4 is a block diagram of a hardware structure of a computer terminal according to a method for training multimedia data according to an embodiment of the present application. As shown in fig. 4, the computer terminal 40 may include one or more (only one is shown in the figure) processors 402 (the processors 402 may include, but are not limited to, a microprocessor MCU or a processing device such as a programmable logic device FPGA), a memory 404 for storing data, and a transmission module 406 for communication functions. It will be appreciated by those of ordinary skill in the art that the configuration shown in fig. 4 is merely illustrative and is not intended to limit the configuration of the electronic device described above. For example, the computer terminal 40 may also include more or fewer components than shown in FIG. 4, or have a different configuration than shown in FIG. 4.
The memory 404 may be used to store software programs and modules of application software, such as program instructions/modules corresponding to the method of training multimedia data in the embodiment of the present invention, and the processor 402 executes the software programs and modules stored in the memory 404 to perform various functional applications and data processing, i.e., implement the method of training multimedia data of application programs described above. Memory 404 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, memory 404 may further include memory located remotely from processor 402, which may be connected to computer terminal 40 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission means 406 is used to receive or transmit data via a network. The specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 40. In one example, the transmission means 406 comprises a network adapter (Network Interface Controller, NIC) that can be connected to other network devices via a base station to communicate with the internet. In one example, the transmission device 406 may be a Radio Frequency (RF) module for communicating with the internet wirelessly.
In the above-described operating environment, the present application provides a method of training multimedia data as shown in fig. 5. Fig. 5 is a flowchart of a method of training multimedia data according to a third embodiment of the present application. The method for training the multimedia data provided by the embodiment of the application comprises the following steps:
step S502, image quality of sample data is degraded through at least one random degradation mode, and first quality multimedia data is obtained;
in step S502 of the present application, in order to enhance the quality of multimedia data, in the process of training the machine learning model the sample data is first randomly degraded, so as to obtain training data used as the input to the machine learning model.
Wherein, the multimedia data to which the sample data belongs includes: video data, picture data, augmented reality image data, or virtual reality image data.
Sample data are obtained from the multimedia data and, after random degradation, low-quality multimedia data (i.e., the first quality multimedia data in the embodiment of the present application) are generated. Taking video data as an example, the parameters representing image quality may include resolution; the low-quality multimedia data may be low-quality video data, and the low quality may manifest as blurred pictures, noise, missing color, and the like.
Wherein the at least one random degradation mode may include: downsampling, blurring, noise addition, and compression coding.
The image quality of the sample data is degraded through a combination of the plurality of random degradation modes to obtain a low-quality video image, thereby effectively simulating the quality degradation of low-quality video and generating training data better suited to the application scene.
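The combination of degradation modes just described can be sketched as follows. This is a minimal NumPy-only illustration; the function name, the per-mode probabilities, and the use of coarse quantization as a stand-in for compression coding are assumptions of the sketch, not details fixed by the embodiment (which could, for instance, use real codec round-trips).

```python
import numpy as np

def random_degrade(img: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Degrade an H x W x C float image in [0, 1] with a random combination of
    the four modes: downsampling, blurring, noise addition, compression.
    Assumes even H and W so the 2x down/up round trip preserves shape."""
    out = img
    if rng.random() < 0.7:  # downsample then upsample (nearest neighbour)
        out = out[::2, ::2]
        out = np.repeat(np.repeat(out, 2, axis=0), 2, axis=1)
    if rng.random() < 0.7:  # 3x3 box blur as a simple blur kernel
        padded = np.pad(out, ((1, 1), (1, 1), (0, 0)), mode="edge")
        out = sum(padded[i:i + out.shape[0], j:j + out.shape[1]]
                  for i in range(3) for j in range(3)) / 9.0
    if rng.random() < 0.7:  # additive Gaussian noise
        out = out + rng.normal(0.0, 0.03, out.shape)
    if rng.random() < 0.7:  # coarse quantization, a crude proxy for lossy coding
        out = np.round(out * 31.0) / 31.0
    return np.clip(out, 0.0, 1.0)
```

Applying the function repeatedly to the same high-quality frame yields differently degraded variants, which is what makes the generated training pairs diverse.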
Step S504, randomly sampling from the first quality multimedia data to obtain a first quality data block;
in the above step S504, based on the first quality multimedia data obtained in step S502, a small data block (i.e., the first quality data block in the embodiment of the present application) is sampled from the whole image by random sampling, and this small data block serves as the input data for training the machine learning model; the randomness of the sampling ensures the richness of the samples.
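The random block sampling can be sketched with a hypothetical helper that crops one aligned patch pair from the degraded image and its high-quality source (same coordinates in both, assuming degradation preserves resolution; the name and signature are illustrative, not from the embodiment):

```python
import numpy as np

def sample_patch_pair(lq: np.ndarray, gt: np.ndarray, patch: int,
                      rng: np.random.Generator):
    """Randomly crop one aligned `patch` x `patch` data block from the
    degraded image LQ and its high-quality counterpart GT."""
    h, w = lq.shape[:2]
    top = int(rng.integers(0, h - patch + 1))
    left = int(rng.integers(0, w - patch + 1))
    sl = (slice(top, top + patch), slice(left, left + patch))
    return lq[sl], gt[sl]
```

Because the crop location is drawn fresh each iteration, repeated calls cover different regions of the same frame, which is the sampling richness the text refers to.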
Step S506, enhancing the first quality data block to obtain a second quality data block, wherein the image quality of the second quality data block is different from that of the first quality data block;
in the above step S506 of the present application, the first quality data block obtained in step S504 is denoted as LQ, and the second quality data block with enhanced quality is denoted as HQ, wherein the image quality of HQ is higher than that of LQ.
Step S508, the second quality data block is respectively compared with the data block in the sample data and the data block in the multimedia data to which the sample data belongs, and a loss function is obtained;
in the step S508, the second quality data block obtained in the step S506 is compared with the data block in the sample data and the data block in the multimedia data to which the sample data belongs, respectively, to obtain the loss function.
The second quality data block HQ is compared with the data block GT in the sample data to obtain a first loss function L_1 and a perceptual loss function L_percep; through the constraints of the first loss function L_1 and the perceptual loss function L_percep, the generated image can approach the original high-quality image in terms of content and sensory characteristics;
the second quality data block HQ is compared with the corresponding data block in the multimedia data to which the sample data belongs to obtain a discrimination loss function L_D and an adversarial loss function L_GAN, so that real high-quality images can be distinguished from generated high-quality images; L_D is optimized so that the discriminator accurately identifies which pictures are generated, and L_GAN is optimized so that the image obtained after quality enhancement is difficult to identify as generated, so that more picture details are restored in HQ and a higher-quality enhancement is achieved.
Finally, according to the first loss function L_1, the perceptual loss function L_percep, and the adversarial loss function L_GAN, the loss function L_G is obtained, where L_G = α·L_1 + β·L_percep + γ·L_GAN, and α, β, and γ respectively represent the weight of the corresponding loss.
Step S510, training the process of enhancing the first quality data block to the second quality data block according to the loss function, to obtain a machine learning model.
In the above step S510, the process of enhancing the first quality data block to the second quality data block is trained using the loss function L_G and the discrimination loss function L_D obtained in step S508, and the machine learning model is finally obtained.
In the embodiment of the present application, a video enhancement model based on a deep convolutional neural network is trained in a cyclic, iterative manner, and the input low-quality video is then inferred with the trained model to complete the video enhancement task. The image quality of the sample data is degraded through at least one random degradation mode to obtain first quality multimedia data; a first quality data block is randomly sampled from the first quality multimedia data; the first quality data block is enhanced to obtain a second quality data block, wherein the image quality of the second quality data block is different from that of the first quality data block; the second quality data block is compared with the data block in the sample data and the data block in the multimedia data to which the sample data belongs, respectively, to obtain a loss function; and the process of enhancing the first quality data block to the second quality data block is trained according to the loss function. This yields a multimedia quality enhancement scheme that generates high-quality video images and realizes video quality restoration, thereby improving the restoration quality of video images and solving the technical problem in the prior art that, after a video image is enhanced based on a convolutional neural network, details in the video image are difficult to restore, so that the image becomes excessively smooth.
Optionally, in step S502, the image quality of the sample data is degraded by at least one random degradation mode, and before the first quality multimedia data is obtained, the method for training multimedia data provided in the embodiment of the present application further includes:
step S500, sample data are collected from the multimedia data according to the application scene.
Specifically, data is collected from real scenes: mainly for the target application scene, corresponding high-quality film and television video data is collected, high-quality video images are extracted as training data, and the sample data is finally obtained.
Optionally, in step S504, randomly sampling from the first quality multimedia data, to obtain a first quality data block includes:
step S5041, randomly acquiring a data block from the first quality multimedia data;
in step S5042, the data block is filtered according to the preset data block information to obtain a first quality data block.
Specifically, combining step S5041 and step S5042, small data blocks are sampled from the whole image; in each iteration, the small data blocks can be randomly sampled from the whole image, and this randomness ensures the richness of sampling. Meanwhile, the information richness of each data block is used for discrimination: good data blocks are intelligently screened out as training input, and data blocks with too little information and too poor quality are filtered out, so as to avoid a negative influence on model training.
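A minimal sketch of such screening follows. The variance threshold used here is an assumed stand-in for the "information richness" criterion; as noted later in the text, the embodiment may instead use saliency detection, edge detection, or a learned discrimination model.

```python
import numpy as np

def is_informative(block: np.ndarray, min_std: float = 0.02) -> bool:
    """Crude information-richness check: reject nearly flat blocks whose
    pixel standard deviation falls below `min_std` (a hypothetical threshold)."""
    return float(block.std()) >= min_std

def sample_good_patch(img: np.ndarray, patch: int,
                      rng: np.random.Generator, tries: int = 10) -> np.ndarray:
    """Resample random blocks until an informative one is found
    (or give up after `tries` attempts)."""
    for _ in range(tries):
        top = int(rng.integers(0, img.shape[0] - patch + 1))
        left = int(rng.integers(0, img.shape[1] - patch + 1))
        block = img[top:top + patch, left:left + patch]
        if is_informative(block):
            return block
    return block  # fall back to the last candidate
```

Rejection sampling like this keeps the pipeline simple: uninformative blocks are discarded before they reach the generator, at the cost of occasionally re-drawing a crop.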
Optionally, in step S508, comparing the second quality data block with the data block in the sample data and the data block in the multimedia data to which the sample data belongs, respectively, and obtaining the loss function includes:
step S5081, comparing the second quality data block with the data block in the sample data to obtain a first loss function and a perceptual loss function, wherein, under the constraint of the first loss function and the perceptual loss function, the generated multimedia data is close to the original multimedia data in terms of content and sensory characteristics;
specifically, the first quality data block is denoted as LQ, the second quality data block as HQ, and the data block of the sample data corresponding to HQ and LQ as GT; an L_1 loss (i.e., the first loss function in the embodiment of the present application) and a perceptual loss L_percep (i.e., the perceptual loss function in the embodiment of the present application) are computed between HQ and GT to simultaneously constrain the generated image to approximate the original high-quality image in terms of content and sensory characteristics.
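A sketch of how the two losses of step S5081 could be computed. The L_1 loss follows its definition directly; the "feature extractor" below is a deliberately crude stand-in (image gradients), whereas a real perceptual loss would compare feature maps from a pretrained CNN. The function names are assumptions of the sketch.

```python
import numpy as np

def l1_loss(hq: np.ndarray, gt: np.ndarray) -> float:
    """First loss function: mean absolute error between HQ and GT."""
    return float(np.abs(hq - gt).mean())

def features(img: np.ndarray):
    """Stand-in feature extractor; in practice a pretrained CNN would
    supply these feature maps."""
    gx = np.diff(img, axis=1)  # horizontal gradients as crude 'features'
    gy = np.diff(img, axis=0)  # vertical gradients
    return gx, gy

def perceptual_loss(hq: np.ndarray, gt: np.ndarray) -> float:
    """Perceptual loss: squared distance between the feature maps of HQ and GT."""
    fh, fg = features(hq), features(gt)
    return float(sum(((a - b) ** 2).mean() for a, b in zip(fh, fg)))
```

Both losses vanish when HQ equals GT; the L_1 term constrains pixel content while the feature-space term constrains the sensory characteristics mentioned in the text.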
Step S5082, comparing the second quality data block with the data block in the multimedia data to which the sample data belongs to obtain a second loss function and an adversarial loss function, wherein the second loss function is used for improving the discrimination of generated pictures, and the adversarial loss function is used for improving the degree to which details in the multimedia data are restored through the second quality data block;
Specifically, HQ is compared with a data block in the multimedia data to which the sample data belongs to obtain a discrimination loss L_D (i.e., the second loss function in the embodiment of the present application) and an adversarial loss L_GAN (i.e., the adversarial loss function in the embodiment of the present application).
Step S5083, obtaining a third loss function according to the first loss function, the perceptual loss function, and the adversarial loss function;
specifically, the third loss function may be obtained from the first loss function, the perceptual loss function, and the adversarial loss function by the following formula:
third loss function L_G = α·L_1 + β·L_percep + γ·L_GAN,
wherein α, β, and γ respectively represent the weight of the corresponding loss.
Step S5084, the third loss function is determined as a loss function.
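The weighted combination above is a one-liner; the weights shown are illustrative defaults, since the embodiment leaves α, β, γ unspecified:

```python
def generator_loss(l1: float, l_percep: float, l_gan: float,
                   alpha: float = 1.0, beta: float = 0.1,
                   gamma: float = 0.01) -> float:
    """Third loss function L_G = alpha*L_1 + beta*L_percep + gamma*L_GAN.
    The default weights are assumptions for the sketch."""
    return alpha * l1 + beta * l_percep + gamma * l_gan
```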
In summary, referring to step S502 to step S510, as shown in fig. 6, fig. 6 is a flowchart of model training in a method for training multimedia data according to a third embodiment of the present application, and the method for training multimedia data provided by the embodiment of the present application specifically includes the following steps:
firstly, constructing a video image custom random degradation module, constructing a video enhancement model based on a convolutional neural network as a generator, and constructing a discriminator based on the convolutional neural network;
(1) Data are collected from real scenes: mainly for the target application scene, corresponding high-quality film and television video data are collected, and high-quality video images are extracted to serve as training data;
(2) The high-quality video images of the training data in (1) are subjected to custom random degradation, applying various random combinations of degradations, to generate low-quality video images that serve as the low-quality data for training the generator. A random degradation module composed of modes such as downsampling, blurring, noise addition, and compression coding can effectively simulate the quality degradation of low-quality video and generate training data better suited to the application scene;
(3) For the low-quality video images degraded in (2), small data blocks are sampled from the whole image by a random intelligent sampling module as the input of the generator in each training step. The module can randomly sample small blocks from the whole image in each iteration; this randomness ensures the richness of sampling. Meanwhile, the information richness of the data blocks is used for discrimination, good data blocks are intelligently selected as training input, and data blocks with too little information and too poor quality are filtered out, so as to avoid a negative influence on model training;
(4) The low-quality data block generated in (3) is taken as the generator input, denoted LQ; the generator produces a quality-enhanced output HQ, and an L_1 loss and a perceptual loss L_percep are computed between HQ and the corresponding high-quality video data block GT, constraining the generated image to approach the original high-quality image in terms of content and sensory characteristics;
(5) The HQ generated in (4) and the corresponding original high-quality data block GT are taken as inputs to the discriminator, and the corresponding discrimination loss L_D and adversarial loss L_GAN are obtained, which mainly measure whether the discriminator can distinguish real high-quality images from generated high-quality images. The aim of optimizing L_D is to train a strong discriminator that accurately identifies which pictures are generated, while the aim of optimizing L_GAN is to make the image produced by the generator pass as a real image, so that, on the basis of (4), more picture details can be restored in HQ and a higher-quality enhancement can be realized;
(6) Based on (4) and (5), the final loss function L_D of the discriminator and loss function L_G = α·L_1 + β·L_percep + γ·L_GAN of the generator are obtained, where α, β, and γ respectively represent the weight of the corresponding loss. The two losses are back-propagated to the discriminator and the generator module, respectively, and the two models are trained by stochastic gradient descent, finally yielding a good generator, i.e., the video quality enhancement model.
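Step (6), alternating stochastic-gradient updates of the discriminator and the generator, can be illustrated with a deliberately tiny toy. Everything here is an assumption made for the sketch: real generators and discriminators are deep convolutional networks, while below both are scalars and least-squares-style GAN losses replace the unspecified loss forms, so the gradients can be written by hand.

```python
import numpy as np

# Toy alternating GAN training: the "generator output" g stands in for HQ,
# the scalar real stands in for GT, and the discriminator is d(x) = w * x.
real = 2.0          # the real high-quality sample
g = 0.0             # generator's current output
w = 0.1             # discriminator parameter
alpha, gamma, lr = 1.0, 0.1, 0.01

for _ in range(600):
    # Discriminator step: push d(real) toward 1 and d(fake) toward 0,
    # i.e. minimize L_D = (d(real) - 1)^2 + d(g)^2 by gradient descent on w.
    grad_w = 2 * (w * real - 1) * real + 2 * (w * g) * g
    w -= lr * grad_w
    # Generator step: minimize L_G = alpha * |g - real| + gamma * (d(g) - 1)^2,
    # a content (L_1) term plus an adversarial term against the frozen w.
    grad_g = alpha * np.sign(g - real) + gamma * 2 * (w * g - 1) * w
    g -= lr * grad_g

# After training, the generator output sits close to the real sample.
```

Even in this toy, the pattern matches the described procedure: each iteration first updates the discriminator on real-versus-generated samples, then updates the generator against the frozen discriminator together with the content term.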
It should be noted that, in the embodiment of the present application, the random degradation module may be based on a conventional image degradation algorithm, or a degradation model learned by a deep neural network may be used as the degradation algorithm; the random intelligent sampling module may be based on traditional algorithms such as saliency detection and edge detection, or an information-richness discrimination model learned by a deep neural network may be used as the discrimination algorithm; and the sample size of the data blocks during training may be chosen in a variety of ways: if training time cost and resource consumption are not a concern, a larger data block or even the full high-quality image can be used as training input. That is, the above is only an example, and the method for training multimedia data provided by the embodiment of the present application is not specifically limited thereby.
According to the method for training the multimedia data, provided by the embodiment of the application, the training data is generated by using the real video data, the random degradation module and the random intelligent sampling module, so that a low-quality video scene can be effectively fitted, and the generalization performance of the video application is enhanced. Meanwhile, a training mechanism for generating an countermeasure network is introduced, so that further enhancement and restoration of details can be realized, and the problem of excessive smoothness of the enhanced image is solved.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but it should be understood by those skilled in the art that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are all preferred embodiments, and that the acts and modules referred to are not necessarily required for the present application.
From the above description of the embodiments, it will be clear to a person skilled in the art that the method of training multimedia data according to the above embodiments may be implemented by means of software plus a necessary general hardware platform, but of course also by means of hardware, although in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present application.
Example 4
According to an embodiment of the present application, there is further provided an apparatus for implementing the data processing method in the above embodiment 1, and fig. 7 is a schematic diagram of a data processing apparatus according to a fourth embodiment of the present application. As shown in fig. 7, a data processing apparatus provided in an embodiment of the present application includes: a data receiving module 72, configured to receive the first multimedia data uploaded by the client; a selection module 74 for receiving a client-selected machine learning model; a data conversion module 76 for converting the first multimedia data into second multimedia data based on the machine learning model; a display module 78 for displaying the second multimedia data; wherein the image quality of the second multimedia data is different from the first multimedia data.
Example 5
According to an embodiment of the present application, there is further provided an apparatus for implementing the data processing method in the foregoing embodiment 2, and fig. 8 is a schematic diagram of a data processing apparatus according to a fifth embodiment of the present application. As shown in fig. 8, the data processing apparatus includes: a receiving module 80, configured to receive a service request from a client; the parsing module 82 is configured to parse the service request to obtain a first multimedia data address and a machine learning model ID; a data acquisition module 84, configured to acquire the first multimedia data through the first multimedia data address; a model acquisition module 86 for acquiring a machine learning model according to the machine learning model ID; the data conversion module 88 is configured to convert the first multimedia data into second multimedia data based on the machine learning model, wherein the second multimedia data has a different image quality from the first multimedia data.
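The receive-parse-acquire-convert flow of fig. 8 can be sketched as below. The request format, model registry, and fetch helper are illustrative assumptions of this sketch, not an API defined by the embodiment.

```python
from typing import Callable, Dict

# Hypothetical registry mapping a machine learning model ID to a converter.
MODEL_REGISTRY: Dict[str, Callable[[bytes], bytes]] = {
    "enhance-v1": lambda data: data.upper(),  # stand-in for a learned enhancer
}

def fetch(address: str) -> bytes:
    """Stand-in for downloading the first multimedia data from its address."""
    return address.encode()

def handle_service_request(request: Dict[str, str]) -> bytes:
    address = request["data_address"]   # parse: first multimedia data address
    model_id = request["model_id"]      # parse: machine learning model ID
    first = fetch(address)              # acquire the first multimedia data
    model = MODEL_REGISTRY[model_id]    # acquire the model by its ID
    return model(first)                 # convert to the second multimedia data
```

Keeping the model lookup behind an ID, as the apparatus does, lets the same request path serve any registered enhancement model.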
Example 6
There is further provided an apparatus for implementing the method for training multimedia data in the above embodiment 3 according to an embodiment of the present application, and fig. 9 is a schematic diagram of an apparatus for training multimedia data according to a sixth embodiment of the present application. As shown in fig. 9, the apparatus for training multimedia data provided in the embodiment of the present application includes: a degradation module 90, configured to degrade the image quality of the sample data by at least one random degradation mode, so as to obtain first quality multimedia data; a sampling module 92, configured to randomly sample from the first quality multimedia data to obtain a first quality data block; an enhancing module 94, configured to enhance the first quality data block to obtain a second quality data block, where the image quality of the second quality data block is different from that of the first quality data block; an obtaining module 96, configured to compare the second quality data block with the data block in the sample data and the data block in the multimedia data to which the sample data belongs, respectively, to obtain a loss function; the model training module 98 is configured to train the process of enhancing the first quality data block to the second quality data block according to the loss function, so as to obtain a machine learning model.
Example 7
According to an aspect of another embodiment of the present invention, there is further provided a storage medium, where the storage medium includes a stored program, where the program, when executed, controls a device where the storage medium is located to perform the method for processing data and training multimedia data according to any one of the foregoing embodiments 1 to 3.
Example 7
According to another aspect of another embodiment of the present invention, there is provided a processor, where the processor is configured to execute a program, where the program executes the method for processing data and training multimedia data according to any one of the foregoing embodiments 1 to 3.
Example 8
The embodiment of the invention also provides a storage medium. Alternatively, in this embodiment, the storage medium may be used to store program code executed by the method for training multimedia data provided in the first embodiment.
Alternatively, in this embodiment, the storage medium may be located in any one of the computer terminals in the computer terminal group in the computer network, or in any one of the mobile terminals in the mobile terminal group.
Alternatively, in the present embodiment, the storage medium is configured to store program code for performing the steps of: degrading the image quality of the sample data through at least one random degradation mode to obtain first quality multimedia data; randomly sampling from the first quality multimedia data to obtain a first quality data block; enhancing the first quality data block to obtain a second quality data block, wherein the image quality of the second quality data block is different from that of the first quality data block; comparing the second quality data block with the data block in the sample data and the data block in the multimedia data to which the sample data belongs respectively to obtain a loss function; and training the process of enhancing the first quality data block to the second quality data block according to the loss function to obtain a machine learning model.
Alternatively, in the present embodiment, the storage medium is configured to store program code for performing the steps of: before image quality of sample data is degraded through at least one random degradation mode to obtain first quality multimedia data, the sample data is collected from the multimedia data according to an application scene.
Alternatively, in the present embodiment, the storage medium is configured to store program code for performing the steps of: the at least one random degradation mode includes: downsampling, blurring, noise addition, and compression coding.
Alternatively, in the present embodiment, the storage medium is configured to store program code for performing the steps of: randomly sampling from the first quality multimedia data to obtain a first quality data block includes: randomly acquiring a data block from the first quality multimedia data; and screening the data blocks according to the preset data block information to obtain a first quality data block.
Alternatively, in the present embodiment, the storage medium is configured to store program code for performing the steps of: comparing the second quality data block with the data block in the sample data and the data block in the multimedia data to which the sample data belongs, respectively, wherein obtaining the loss function includes: comparing the second quality data block with the data block in the sample data to obtain a first loss function and a perceptual loss function, wherein, under the constraint of the first loss function and the perceptual loss function, the generated multimedia data is close to the original multimedia data in terms of content and sensory characteristics; comparing the second quality data block with the data block in the multimedia data to which the sample data belongs to obtain a second loss function and an adversarial loss function, wherein the second loss function is used for improving the discrimination of generated pictures, and the adversarial loss function is used for improving the degree to which details in the multimedia data are restored through the second quality data block; obtaining a third loss function according to the first loss function, the perceptual loss function, and the adversarial loss function; and determining the third loss function as the loss function.
Alternatively, in the present embodiment, the storage medium is configured to store program code for performing the steps of: the multimedia data to which the sample data belongs includes: video data, picture data, augmented reality image data, or virtual reality image data.
The foregoing embodiment numbers of the present application are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
In the foregoing embodiments of the present application, the descriptions of the embodiments are emphasized, and for a portion of this disclosure that is not described in detail in this embodiment, reference is made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technology may be implemented in other manners. The above-described embodiments of the apparatus are merely exemplary; the division of the units is merely a logical function division, and there may be another division manner in actual implementation: for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed between the parts may be through some interfaces, units, or modules, and may be in electrical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied essentially, or in the part contributing to the prior art, or in whole or in part, in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other various media capable of storing program code.
The foregoing is merely a preferred embodiment of the present invention and it should be noted that modifications and adaptations to those skilled in the art may be made without departing from the principles of the present invention, which are intended to be comprehended within the scope of the present invention.

Claims (12)

1. A data processing method, comprising:
receiving first multimedia data uploaded by a client;
receiving a machine learning model selected by a client;
converting the first multimedia data into second multimedia data based on the machine learning model;
displaying the second multimedia data; wherein the image quality of the second multimedia data is different from the first multimedia data;
the machine learning model is obtained by training a process of enhancing a first quality data block to a second quality data block according to a loss function, the loss function is obtained by respectively comparing the second quality data block with a data block in the sample data and a data block in multimedia data to which the sample data belongs, the second quality data block is obtained by enhancing the first quality data block, the image quality of the second quality data block is different from that of the first quality data block, the first quality data block is obtained by randomly sampling the first quality multimedia data, and the first quality multimedia data is obtained by degrading the image quality of the sample data through a combination of a plurality of random degradation modes;
The loss function is further determined based on a third loss function, and the third loss function is obtained according to the first loss function, the perceptual loss function, and the adversarial loss function;
the adversarial loss function is obtained by comparing the second quality data block with the data block in the multimedia data to which the sample data belongs, wherein the adversarial loss function is used for improving the degree to which details in the multimedia data are restored through the second quality data block, and the multimedia data to which the sample data belongs comprises: video data, picture data, augmented reality image data, or virtual reality image data;
the first loss function and the perceptual loss function are obtained by comparing the second quality data block with the data block in the sample data, wherein, under the constraint of the first loss function and the perceptual loss function, the generated multimedia data is close to the original multimedia data in terms of content and sensory characteristics.
2. A data processing method, comprising:
receiving a service request from a client;
analyzing the service request to obtain a first multimedia data address and a machine learning model ID;
acquiring the first multimedia data through the first multimedia data address;
Acquiring the machine learning model according to the machine learning model ID;
converting the first multimedia data into second multimedia data based on the machine learning model, wherein the second multimedia data has a different image quality than the first multimedia data;
the machine learning model is obtained by training a process of enhancing a first quality data block to a second quality data block according to a loss function, the loss function is obtained by respectively comparing the second quality data block with a data block in the sample data and a data block in multimedia data to which the sample data belongs, the second quality data block is obtained by enhancing the first quality data block, the image quality of the second quality data block is different from that of the first quality data block, the first quality data block is obtained by randomly sampling the first quality multimedia data, and the first quality multimedia data is obtained by degrading the image quality of the sample data through a combination of a plurality of random degradation modes;
the loss function is further determined based on a third loss function, and the third loss function is obtained according to the first loss function, the perceptual loss function, and the adversarial loss function;
the adversarial loss function is obtained by comparing the second quality data block with the data block in the multimedia data to which the sample data belongs, wherein the adversarial loss function is used for improving the degree to which details in the multimedia data are restored through the second quality data block, and the multimedia data to which the sample data belongs comprises: video data, picture data, augmented reality image data, or virtual reality image data;
the first loss function and the perceptual loss function are obtained by comparing the second quality data block with the data block in the sample data, wherein, under the constraint of the first loss function and the perceptual loss function, the generated multimedia data is close to the original multimedia data in terms of content and sensory characteristics.
3. The method according to claim 2, wherein the second multimedia data is converted into a data specification matching the type of the device on which it is to be delivered, to obtain converted second multimedia data.
4. A method of training multimedia data, comprising:
degrading the image quality of the sample data through the combination of a plurality of random degradation modes to obtain first quality multimedia data;
randomly sampling from the first quality multimedia data to obtain a first quality data block;
enhancing the first quality data block to obtain a second quality data block, wherein the image quality of the second quality data block is different from that of the first quality data block;
comparing the second quality data block with the data block in the sample data and the data block in the multimedia data to which the sample data belongs respectively to obtain a loss function;
training the process of enhancing the first quality data block to the second quality data block according to the loss function to obtain a machine learning model;
wherein comparing the second quality data block with the data block in the sample data and the data block in the multimedia data to which the sample data belongs, respectively, to obtain a loss function comprises:
comparing the second quality data block with the data block in the sample data to obtain a first loss function and a perceptual loss function, wherein the constraint imposed by the first loss function and the perceptual loss function makes the generated multimedia data close to the sample multimedia data in content and sensory characteristics;
comparing the second quality data block with a data block in the multimedia data to which the sample data belongs to obtain an adversarial loss function, wherein the adversarial loss function is used for improving the efficiency of restoring details in the multimedia data through the second quality data block, and the multimedia data to which the sample data belongs comprises: video data, picture data, augmented reality image data, or virtual reality image data;
obtaining a third loss function according to the first loss function, the perceptual loss function and the adversarial loss function;
determining the third loss function as the loss function.
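The steps of claim 4 can be sketched as one training iteration; the `degrade`, `sample_patch`, `enhance`, `compute_loss` and `update` callables are placeholders (assumptions) for the concrete degradation pipeline, enhancement model, loss and optimizer:

```python
def train_step(sample, degrade, sample_patch, enhance, compute_loss, update):
    """One iteration of the claim-4 training process:
    degrade -> randomly sample a block -> enhance -> compare -> update."""
    low_quality = degrade(sample)                 # first quality multimedia data
    lq_block, coords = sample_patch(low_quality)  # first quality data block
    hq_block = enhance(lq_block)                  # second quality data block
    # compare the enhanced block against the matching block in the sample
    # (and, in a GAN setup, against blocks of the full multimedia data)
    loss = compute_loss(hq_block, sample, coords)
    update(loss)                                  # one optimisation step
    return loss
```

Iterating this step over many samples and blocks, with `update` applying gradient descent on the combined loss, yields the trained machine learning model.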
5. The method of claim 4, wherein before degrading the image quality of the sample data through at least one random degradation mode to obtain the first quality multimedia data, the method further comprises:
and collecting the sample data from the multimedia data according to the application scene.
6. The method of claim 4 or 5, wherein the at least one random degradation mode comprises: downsampling, blurring, noise addition, and compression coding.
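A minimal sketch of the four degradation modes and their random combination, assuming grayscale images as 2-D NumPy arrays. The kernels, noise level and quantisation step are illustrative, and coarse quantisation merely stands in for real compression coding:

```python
import random
import numpy as np

def downsample(img, factor=2):
    """Nearest-neighbour downsample, then upsample back to the input size."""
    small = img[::factor, ::factor]
    up = np.repeat(np.repeat(small, factor, axis=0), factor, axis=1)
    return up[:img.shape[0], :img.shape[1]]

def blur(img):
    """Crude 5-point box blur on the interior pixels."""
    out = img.copy()
    out[1:-1, 1:-1] = (img[:-2, 1:-1] + img[2:, 1:-1] + img[1:-1, :-2]
                       + img[1:-1, 2:] + img[1:-1, 1:-1]) / 5.0
    return out

def add_noise(img, sigma=5.0):
    """Additive Gaussian noise, clipped to the 8-bit value range."""
    return np.clip(img + np.random.normal(0.0, sigma, img.shape), 0, 255)

def quantize(img, step=16):
    """Coarse quantisation standing in for lossy compression coding."""
    return np.round(img / step) * step

def random_degrade(img, ops=(downsample, blur, add_noise, quantize)):
    """Apply a random subset of degradation modes, in random order."""
    for op in random.sample(ops, k=random.randint(1, len(ops))):
        img = op(img)
    return img
```

Composing several degradations at random per sample widens the space of first-quality data the model sees, which is the point of using a combination of random degradation modes rather than a single fixed one.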
7. The method of claim 4, wherein the randomly sampling from the first quality multimedia data to obtain a first quality data block comprises:
randomly acquiring a data block from the first quality multimedia data;
and screening the data blocks according to preset data block information to obtain the first quality data block.
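The acquire-then-screen sampling of claim 7 can be sketched as follows; the minimum-variance check standing in for the "preset data block information" is an assumed criterion for illustration:

```python
import numpy as np

def sample_block(lq_data, block=32, min_var=10.0, max_tries=20, rng=None):
    """Randomly crop candidate blocks from first quality data and screen
    them; flat, uninformative blocks are rejected via a variance threshold
    (an illustrative stand-in for the preset block information)."""
    rng = rng or np.random.default_rng()
    h, w = lq_data.shape[:2]
    for _ in range(max_tries):
        y = int(rng.integers(0, h - block + 1))
        x = int(rng.integers(0, w - block + 1))
        patch = lq_data[y:y + block, x:x + block]
        if patch.var() >= min_var:   # keep blocks with enough detail
            return patch, (y, x)
    return patch, (y, x)             # fall back to the last candidate
```

Returning the crop coordinates alongside the block lets the training loop extract the matching ground-truth block from the sample data for the loss comparison.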
8. A data processing apparatus comprising:
the data receiving module is used for receiving the first multimedia data uploaded by the client;
the selection module is used for receiving the machine learning model selected by the client;
a data conversion module for converting the first multimedia data into second multimedia data based on the machine learning model;
the display module is used for displaying the second multimedia data; wherein the image quality of the second multimedia data is different from the first multimedia data;
the machine learning model is obtained by training a process of enhancing a first quality data block to a second quality data block according to a loss function, the loss function is obtained by respectively comparing the second quality data block with a data block in the sample data and a data block in multimedia data to which the sample data belongs, the second quality data block is obtained by enhancing the first quality data block, the image quality of the second quality data block is different from that of the first quality data block, the first quality data block is obtained by randomly sampling the first quality multimedia data, and the first quality multimedia data is obtained by degrading the image quality of the sample data through a combination of a plurality of random degradation modes;
the loss function is further determined based on a third loss function, and the third loss function is obtained according to a first loss function, a perceptual loss function and an adversarial loss function;
the adversarial loss function is obtained by comparing the second quality data block with the data block in the multimedia data to which the sample data belongs, wherein the adversarial loss function is used for improving the efficiency of restoring details in the multimedia data through the second quality data block, and the multimedia data to which the sample data belongs comprises: video data, picture data, augmented reality image data, or virtual reality image data;
the first loss function and the perceptual loss function are obtained by comparing the second quality data block with the data block in the sample data, wherein the constraint imposed by the first loss function and the perceptual loss function makes the generated multimedia data close to the sample multimedia data in content and sensory characteristics.
9. A data processing apparatus comprising:
the receiving module is used for receiving a service request from the client;
the analysis module is used for analyzing the service request and acquiring a first multimedia data address and a machine learning model ID;
the data acquisition module is used for acquiring the first multimedia data through the first multimedia data address;
the model acquisition module is used for acquiring the machine learning model according to the machine learning model ID;
a data conversion module for converting the first multimedia data into second multimedia data based on the machine learning model, wherein the second multimedia data has a different image quality than the first multimedia data;
the loss function is further determined based on a third loss function, and the third loss function is obtained according to a first loss function, a perceptual loss function and an adversarial loss function;
the adversarial loss function is obtained by comparing the second quality data block with the data block in the multimedia data to which the sample data belongs, wherein the adversarial loss function is used for improving the efficiency of restoring details in the multimedia data through the second quality data block, and the multimedia data to which the sample data belongs comprises: video data, picture data, augmented reality image data, or virtual reality image data;
the first loss function and the perceptual loss function are obtained by comparing the second quality data block with the data block in the sample data, wherein the constraint imposed by the first loss function and the perceptual loss function makes the generated multimedia data close to the sample multimedia data in content and sensory characteristics.
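The request-handling pipeline of claim 9 (parse the service request for a data address and a model ID, fetch both, then convert) can be sketched as follows; the request format and the helper callables are illustrative assumptions:

```python
def handle_request(request, fetch_data, load_model):
    """Claim-9 pipeline: analysis module parses the request, acquisition
    modules fetch the data and model, conversion module applies the model.
    The dict keys and helpers are assumed for illustration."""
    address = request["data_address"]        # first multimedia data address
    model_id = request["model_id"]           # machine learning model ID
    first_data = fetch_data(address)         # data acquisition module
    model = load_model(model_id)             # model acquisition module
    second_data = model(first_data)          # data conversion module
    return second_data
```

In a real service, `fetch_data` would resolve the address against object storage and `load_model` would look the trained model up by ID in a model registry.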
10. An apparatus for training multimedia data, comprising:
the degradation module is used for degrading the image quality of the sample data through at least one random degradation mode to obtain first quality multimedia data;
the sampling module is used for randomly sampling from the first quality multimedia data to obtain a first quality data block;
the enhancement module is used for enhancing the first quality data block to obtain a second quality data block, wherein the image quality of the second quality data block is different from that of the first quality data block;
the acquisition module is used for comparing the second quality data block with the data block in the sample data and the data block in the multimedia data to which the sample data belongs respectively to acquire a loss function;
the model training module is used for training the process of enhancing the first quality data block to the second quality data block according to the loss function to obtain a machine learning model;
the acquisition module is further used for comparing the second quality data block with the data block in the sample data to obtain a first loss function and a perceptual loss function, wherein the constraint imposed by the first loss function and the perceptual loss function makes the generated multimedia data close to the sample multimedia data in content and sensory characteristics; comparing the second quality data block with a data block in the multimedia data to which the sample data belongs to obtain an adversarial loss function, wherein the adversarial loss function is used for improving the efficiency of restoring details in the multimedia data through the second quality data block, and the multimedia data to which the sample data belongs comprises: video data, picture data, augmented reality image data, or virtual reality image data; obtaining a third loss function according to the first loss function, the perceptual loss function and the adversarial loss function; and determining the third loss function as the loss function.
11. A storage medium, wherein the storage medium comprises a stored program, wherein the program, when run, controls a device in which the storage medium is located to perform the method of processing or training multimedia data according to any one of claims 1 to 7.
12. A processor, wherein the processor is configured to run a program, wherein the program, when run, performs the method of processing or training multimedia data according to any one of claims 1 to 7.
CN202010373497.7A 2020-05-06 2020-05-06 Method and device for processing and training multimedia data Active CN113628121B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010373497.7A CN113628121B (en) 2020-05-06 2020-05-06 Method and device for processing and training multimedia data

Publications (2)

Publication Number Publication Date
CN113628121A CN113628121A (en) 2021-11-09
CN113628121B true CN113628121B (en) 2023-11-14

Family

ID=78376785

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010373497.7A Active CN113628121B (en) 2020-05-06 2020-05-06 Method and device for processing and training multimedia data

Country Status (1)

Country Link
CN (1) CN113628121B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115356599B (en) * 2022-10-21 2023-04-07 国网天津市电力公司城西供电分公司 Multi-mode urban power grid fault diagnosis method and system

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750686A * 2012-06-12 2012-10-24 华东师范大学 Learning-based super-resolution restoration method for document images
CN106228512A * 2016-07-19 2016-12-14 北京工业大学 Image super-resolution reconstruction method based on a learning-rate-adaptive convolutional neural network
CN106683048A * 2016-11-30 2017-05-17 浙江宇视科技有限公司 Image super-resolution method and image super-resolution equipment
CN107590774A * 2017-09-18 2018-01-16 北京邮电大学 License plate clarification method and device based on a generative adversarial network
CN108038832A * 2017-12-25 2018-05-15 中国科学院深圳先进技术研究院 Underwater image enhancement method and system
CN108235058A * 2018-01-12 2018-06-29 广州华多网络科技有限公司 Video quality processing method, storage medium and terminal
CN108416752A * 2018-03-12 2018-08-17 中山大学 Image motion-deblurring method based on a generative adversarial network
CN109255769A * 2018-10-25 2019-01-22 厦门美图之家科技有限公司 Training method and training model of an image enhancement network, and image enhancement method
CN109801215A * 2018-12-12 2019-05-24 天津津航技术物理研究所 Infrared super-resolution imaging method based on a generative adversarial network
CN110163235A * 2018-10-11 2019-08-23 腾讯科技(深圳)有限公司 Training of an image enhancement model, image enhancement method, device and storage medium
CN110349085A * 2019-06-28 2019-10-18 西安工程大学 Single-image super-resolution feature enhancement method based on a generative adversarial network
CN110458060A * 2019-07-30 2019-11-15 暨南大学 Vehicle image optimization method and system based on adversarial learning
CN110634108A * 2019-08-30 2019-12-31 北京工业大学 Enhancement method for compositely degraded live-streaming video based on a meta-cycle-consistency adversarial network
CN110807740A * 2019-09-17 2020-02-18 北京大学 Image enhancement method and system for window images in surveillance scenes

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018053340A1 (en) * 2016-09-15 2018-03-22 Twitter, Inc. Super resolution using a generative adversarial network

Also Published As

Publication number Publication date
CN113628121A (en) 2021-11-09

Similar Documents

Publication Publication Date Title
Yang et al. Predicting the perceptual quality of point cloud: A 3d-to-2d projection-based exploration
CN107483920B (en) A kind of panoramic video appraisal procedure and system based on multi-layer quality factor
CN102741879B (en) Method for generating depth maps from monocular images and systems using the same
Moorthy et al. Visual quality assessment algorithms: what does the future hold?
Ma et al. Reduced-reference video quality assessment of compressed video sequences
CN110944200B (en) Method for evaluating immersive video transcoding scheme
Jakhetiya et al. A prediction backed model for quality assessment of screen content and 3-D synthesized images
CN104618803A (en) Information push method, information push device, terminal and server
CN113518185B (en) Video conversion processing method and device, computer readable medium and electronic equipment
CN111985281B (en) Image generation model generation method and device and image generation method and device
CN110620924B (en) Method and device for processing coded data, computer equipment and storage medium
CN110418139B (en) Video super-resolution restoration method, device, equipment and storage medium
CN111179189A (en) Image processing method and device based on generation countermeasure network GAN, electronic equipment and storage medium
Zhang et al. Sparse representation-based video quality assessment for synthesized 3D videos
CN113628121B (en) Method and device for processing and training multimedia data
Prazeres et al. Quality evaluation of machine learning-based point cloud coding solutions
CN114004750A (en) Image processing method, device and system
CN116980604A (en) Video encoding method, video decoding method and related equipment
CN116980549A (en) Video frame processing method, device, computer equipment and storage medium
Saha et al. Perceptual Video Quality Assessment: The Journey Continues!
CN111127392B (en) No-reference image quality evaluation method based on countermeasure generation network
CN111127587B (en) Reference-free image quality map generation method based on countermeasure generation network
Feng et al. BVI-Artefact: An Artefact Detection Benchmark Dataset for Streamed Videos
CN113117341B (en) Picture processing method and device, computer readable storage medium and electronic equipment
US20230306687A1 (en) Mesh zippering

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant