CN116128775A - Three-dimensional lookup table training method and video enhancement method


Info

Publication number
CN116128775A
Authority
CN
China
Prior art keywords
image
training
lookup table
loss function
dimensional
Legal status
Pending
Application number
CN202111309389.4A
Other languages
Chinese (zh)
Inventor
张荣成
苏坦
姜文杰
Current Assignee
Insta360 Innovation Technology Co Ltd
Original Assignee
Insta360 Innovation Technology Co Ltd
Application filed by Insta360 Innovation Technology Co Ltd
Priority to CN202111309389.4A
Publication of CN116128775A


Classifications

    • G06T 5/00 Image enhancement or restoration
    • G06T 5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
    • G06N 3/02 Neural networks (computing arrangements based on biological models)
    • G06N 3/08 Learning methods
    • G06T 2207/10016 Video; image sequence (image acquisition modality)
    • G06T 2207/20081 Training; learning
    • G06T 2207/20084 Artificial neural networks [ANN]

Abstract

The application relates to a three-dimensional lookup table training method and a video enhancement method. The three-dimensional lookup table training method comprises the following steps: acquiring a training image set, a verification image set and an initial three-dimensional lookup table, wherein the training images in the training image set carry scene category labels; respectively extracting image features of the training images to obtain image feature information corresponding to the training images; carrying out scene classification on the training images according to the image feature information to obtain a predicted scene classification result, and inputting the image feature information into the initial three-dimensional lookup table to obtain a predicted enhanced image; obtaining a loss function according to the scene category labels, the predicted scene classification result, the verification images and the predicted enhanced images; and returning to the step of extracting image features of the training images until the loss function converges, so as to obtain a target three-dimensional lookup table. By adopting this scheme, both the video enhancement effect and the real-time performance of video enhancement can be improved.

Description

Three-dimensional lookup table training method and video enhancement method
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a three-dimensional lookup table training method and a video enhancement method.
Background
With the development of image processing technology, image enhancement techniques have emerged. Image enhancement strengthens the useful information in an image, with the aim of improving the image's visual effect and definition. Image enhancement techniques may be applied to video enhancement.
In the prior art, when enhancing a video, frames are first extracted from the video to obtain the corresponding video frames; the video frames are then sequentially input into a pre-trained convolutional neural network (CNN) for enhancement, and finally the enhanced video frames are assembled to realize video enhancement. When a video frame is enhanced with a pre-trained convolutional neural network, the method mainly downsamples the full-resolution input image through the convolutional neural network, extracts features from the resulting low-resolution image, learns a three-dimensional look-up table (3D LUT), and converts each input pixel of the input image according to the input-to-output pixel mapping in the three-dimensional look-up table to obtain the enhancement result.
However, in the conventional method, because the video frame is downsampled before the three-dimensional lookup table is learned, the resulting three-dimensional lookup table also corresponds to the lower resolution, and many enhancement effects are lost when it is applied to the video frame at the original resolution, so the video enhancement effect is poor. If a convolutional neural network is instead used to process the full-resolution video frame directly, the higher the resolution of the video frame, the slower the processing; when the resolution is too high, the processing speed becomes too low and the real-time performance of video enhancement suffers.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a three-dimensional look-up table training method, a video enhancement method, an apparatus, a computer device, and a storage medium that can improve both the video enhancement effect and the real-time performance of video enhancement.
A method of three-dimensional look-up table training, the method comprising:
acquiring a training image set, a verification image set and an initial three-dimensional lookup table, wherein the training images in the training image set carry scene category labels, the verification images in the verification image set correspond to the training images in the training image set one by one, and the verification images are enhancement images corresponding to the training images;
respectively extracting image features of the training images to obtain image feature information corresponding to the training images;
according to the image characteristic information, carrying out scene classification on the training image to obtain a predicted scene classification result, and inputting the image characteristic information into an initial three-dimensional lookup table to obtain a predicted enhanced image;
obtaining a loss function according to the scene category label, the predicted scene classification result, the verification image and the predicted enhancement image;
and returning to the step of respectively extracting the image characteristics of the training images to obtain the image characteristic information corresponding to the training images until the loss function converges to obtain the target three-dimensional lookup table.
In one embodiment, the initial three-dimensional lookup table includes three-dimensional sub-lookup tables, the number of which corresponds to the dimensions of the image feature information.
In one embodiment, inputting the image characteristic information into an initial three-dimensional look-up table to obtain the predicted enhanced image comprises:
determining the weight of a lookup table of the three-dimensional sub-lookup table according to each dimension characteristic information in the image characteristic information;
weighting the three-dimensional sub-lookup table according to the weight of the lookup table to obtain a weighted three-dimensional lookup table;
and inputting the training image into a weighted three-dimensional lookup table to obtain a prediction enhancement image.
In one embodiment, deriving the loss function based on the scene category label, the predicted scene category result, the verification image, and the predicted enhanced image comprises:
calculating a first loss function according to the scene category labels and the prediction scene classification results, and calculating a second loss function according to the verification image and the prediction enhancement image;
and obtaining a loss function according to the first loss function and the second loss function.
In one embodiment, calculating the second loss function from the verification image and the predicted enhancement image comprises:
determining a pixel point corresponding relation between a first pixel point in the verification image and a second pixel point in the prediction enhancement image;
determining a pixel point difference value between a first pixel point and a corresponding second pixel point according to the pixel point corresponding relation;
and calculating a second loss function according to the pixel point difference value.
A three-dimensional look-up table training apparatus, the apparatus comprising:
the acquisition module is used for acquiring a training image set, a verification image set and an initial three-dimensional lookup table, wherein the training images in the training image set carry scene category labels, the verification images in the verification image set correspond to the training images in the training image set one by one, and the verification images are enhancement images corresponding to the training images;
the feature extraction module is used for extracting image features of the training images respectively to obtain image feature information corresponding to the training images;
the first processing module is used for carrying out scene classification on the training image according to the image characteristic information to obtain a predicted scene classification result, and inputting the image characteristic information into the initial three-dimensional lookup table to obtain a predicted enhanced image;
the loss function calculation module is used for obtaining a loss function according to the scene category label, the predicted scene classification result, the verification image and the predicted enhancement image;
and the second processing module is used for returning to the step of respectively extracting the image characteristics of the training images to obtain the image characteristic information corresponding to the training images until the loss function converges to obtain the target three-dimensional lookup table.
A computer device comprising a memory storing a computer program and a processor which when executing the computer program performs the steps of:
acquiring a training image set, a verification image set and an initial three-dimensional lookup table, wherein the training images in the training image set carry scene category labels, the verification images in the verification image set correspond to the training images in the training image set one by one, and the verification images are enhancement images corresponding to the training images;
respectively extracting image features of the training images to obtain image feature information corresponding to the training images;
according to the image characteristic information, carrying out scene classification on the training image to obtain a predicted scene classification result, and inputting the image characteristic information into an initial three-dimensional lookup table to obtain a predicted enhanced image;
obtaining a loss function according to the scene category label, the predicted scene classification result, the verification image and the predicted enhancement image;
and returning to the step of respectively extracting the image characteristics of the training images to obtain the image characteristic information corresponding to the training images until the loss function converges to obtain the target three-dimensional lookup table.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring a training image set, a verification image set and an initial three-dimensional lookup table, wherein the training images in the training image set carry scene category labels, the verification images in the verification image set correspond to the training images in the training image set one by one, and the verification images are enhancement images corresponding to the training images;
respectively extracting image features of the training images to obtain image feature information corresponding to the training images;
according to the image characteristic information, carrying out scene classification on the training image to obtain a predicted scene classification result, and inputting the image characteristic information into an initial three-dimensional lookup table to obtain a predicted enhanced image;
obtaining a loss function according to the scene category label, the predicted scene classification result, the verification image and the predicted enhancement image;
and returning to the step of respectively extracting the image characteristics of the training images to obtain the image characteristic information corresponding to the training images until the loss function converges to obtain the target three-dimensional lookup table.
A method of video enhancement, the method comprising:
acquiring video data to be processed;
extracting frames of the video data to be processed to obtain a video frame set corresponding to the video data to be processed;
sequentially inputting video frames in a video frame set into a target three-dimensional lookup table to obtain an enhanced video frame corresponding to the video frame, wherein the target three-dimensional lookup table is obtained through the three-dimensional lookup table training method;
and obtaining the enhanced video data according to the enhanced video frame.
In one embodiment, sequentially inputting video frames in a video frame set into a target three-dimensional lookup table to obtain an enhanced video frame corresponding to the video frame includes:
sequentially inputting video frames in the video frame set into a target three-dimensional lookup table to obtain a lookup output result corresponding to the video frames;
when the video frame is not the first video frame, acquiring the search output result of the previous video frame corresponding to the video frame;
carrying out moving average on the searching output result of the previous video frame and the searching output result to obtain a target searching output result;
and carrying out enhancement adjustment on the video frame according to the target searching output result to obtain an enhanced video frame corresponding to the video frame.
A video enhancement device, the device comprising:
the video acquisition module is used for acquiring video data to be processed;
the frame extraction module is used for extracting frames of the video data to be processed to obtain a video frame set corresponding to the video data to be processed;
the enhancement module is used for sequentially inputting video frames in the video frame set into the target three-dimensional lookup table to obtain enhanced video frames corresponding to the video frames, and the target three-dimensional lookup table is obtained through the three-dimensional lookup table training method;
and the frame processing module is used for obtaining the enhanced video data according to the enhanced video frame.
A computer device comprising a memory storing a computer program and a processor which when executing the computer program performs the steps of:
acquiring video data to be processed;
extracting frames of the video data to be processed to obtain a video frame set corresponding to the video data to be processed;
sequentially inputting video frames in a video frame set into a target three-dimensional lookup table to obtain an enhanced video frame corresponding to the video frame, wherein the target three-dimensional lookup table is obtained through the three-dimensional lookup table training method;
and obtaining the enhanced video data according to the enhanced video frame.
A computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of:
acquiring video data to be processed;
extracting frames of the video data to be processed to obtain a video frame set corresponding to the video data to be processed;
sequentially inputting video frames in a video frame set into a target three-dimensional lookup table to obtain an enhanced video frame corresponding to the video frame, wherein the target three-dimensional lookup table is obtained through the three-dimensional lookup table training method;
and obtaining the enhanced video data according to the enhanced video frame.
According to the above three-dimensional lookup table training method, apparatus, computer device and storage medium, after the training image set, the verification image set and the initial three-dimensional lookup table are obtained, image features are extracted from the training images to obtain the corresponding image feature information; scene classification is performed on the training images according to the image feature information to obtain a predicted scene classification result, and the image feature information is input into the initial three-dimensional lookup table to obtain a predicted enhanced image, realizing preliminary training of the initial three-dimensional lookup table; a loss function is obtained from the scene category labels, the predicted scene classification result, the verification images and the predicted enhanced images; and the process returns to the step of extracting image features of the training images until the loss function converges, yielding the target three-dimensional lookup table. The method combines scene classification with training to obtain a target three-dimensional lookup table that can accurately realize enhancement and color grading, which improves the video enhancement effect; meanwhile, by processing the video frames in video data with the target three-dimensional lookup table, the correspondence between the pixel values of a video frame and those of the enhanced video frame can be obtained quickly, which speeds up the processing of video frames and improves the real-time performance of video enhancement.
According to the above video enhancement method, apparatus, computer device and storage medium, the video data to be processed is acquired and frames are extracted from it to obtain the corresponding video frame set; the video frames in the set are sequentially input into the target three-dimensional lookup table to obtain the enhanced video frames, and the enhanced video data is obtained from the enhanced video frames. By processing the video frames with the target three-dimensional lookup table, the video enhancement effect can be improved; the correspondence between the pixel values of a video frame and those of the enhanced video frame can be obtained quickly, which speeds up the processing of video frames and improves the real-time performance of video enhancement.
Drawings
FIG. 1 is a flow chart of a three-dimensional look-up table training method in one embodiment;
FIG. 2 is a schematic diagram of a neural network in one embodiment;
FIG. 3 is a schematic diagram of a scene prediction network in one embodiment;
FIG. 4 is a flow chart of a video enhancement method in one embodiment;
FIG. 5 is a flow chart of a three-dimensional look-up table training method according to another embodiment;
FIG. 6 is a block diagram of a three-dimensional look-up table training apparatus in one embodiment;
FIG. 7 is a block diagram of a video enhancement device in one embodiment;
fig. 8 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
In one embodiment, as shown in fig. 1, a three-dimensional lookup table training method is provided. This embodiment is illustrated with the method applied to a server; it is understood that the method may also be applied to a terminal, or to a system including the terminal and the server and implemented through interaction between the terminal and the server. The terminal may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, portable wearable device, panoramic camera, action camera, etc., and the server may be implemented by an independent server or a server cluster formed by a plurality of servers. In this embodiment, the method includes the following steps:
Step 102, a training image set, a verification image set and an initial three-dimensional lookup table are obtained, wherein the training images in the training image set carry scene category labels, the verification images in the verification image set correspond to the training images in the training image set one by one, and the verification images are enhancement images corresponding to the training images.
The training image set comprises training images, i.e. images captured by a photographing device such as a camera or a mobile phone that have not undergone image processing. For example, a training image may be a picture taken with a photographing device, or a video frame extracted from a video captured with a photographing device. The verification image set comprises verification images; a verification image is the enhanced image corresponding to a training image, obtained by color grading the training image. A three-dimensional Look-Up Table (3D LUT) is a data structure through which corresponding output data can be directly queried from input data. For example, according to the RGB (red, green, blue) values of the training image, the corresponding output RGB values can be found in the three-dimensional lookup table. The scene category label is used to represent the scene category corresponding to the training image; for example, scene categories may be day and night, differentiated by brightness.
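For illustration only (this example is not part of the original disclosure), the following is a minimal Python sketch of how a 3D LUT maps input RGB values to output RGB values; the lattice size of 33 and the nearest-neighbor indexing are assumptions made for brevity, and practical implementations typically interpolate (e.g. trilinearly) between lattice points:

    import numpy as np

    LATTICE = 33  # assumed lattice size; common choices are 17, 33, 65

    # An identity 3D LUT: lut[r, g, b] holds the output RGB triple for the
    # lattice point (r, g, b). Training would adjust these table entries.
    grid = np.linspace(0.0, 1.0, LATTICE, dtype=np.float32)
    lut = np.stack(np.meshgrid(grid, grid, grid, indexing="ij"), axis=-1)

    def apply_lut_nearest(image: np.ndarray, lut: np.ndarray) -> np.ndarray:
        """Map each pixel of an RGB image in [0, 1] through the 3D LUT using
        nearest-neighbor lookup (trilinear interpolation omitted for brevity)."""
        idx = np.clip(np.rint(image * (LATTICE - 1)).astype(int), 0, LATTICE - 1)
        return lut[idx[..., 0], idx[..., 1], idx[..., 2]]

    # Example: a 2x2 image passes through the identity LUT almost unchanged.
    img = np.random.rand(2, 2, 3).astype(np.float32)
    assert np.allclose(apply_lut_nearest(img, lut), img, atol=1.0 / LATTICE)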
Specifically, when training the three-dimensional lookup table, the server may acquire a training image set, a verification image set and an initial three-dimensional lookup table from a preset database.
And 104, respectively extracting image features of the training images to obtain image feature information corresponding to the training images.
Wherein the image characteristic information refers to information for characterizing the image characteristic. For example, the image feature information may specifically refer to an N-dimensional vector for characterizing the image feature.
Specifically, the server acquires a neural network and uses it to extract image features from the training images, obtaining the image feature information corresponding to each training image. The neural network is composed of a downsampling layer, convolution layers and an output layer; the downsampling layer downsamples the training image to shrink it and reduce the influence of the training image's resolution on training speed. The specific network architecture can be set as needed and is not limited in this embodiment. For example, as shown in fig. 2, the network architecture may include a downsampling layer (Downsampling), a convolutional stage composed of a plurality of convolutional blocks (conv+relu), and an output layer (dropout+conv).
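As a non-limiting sketch of such a backbone, the following assumes PyTorch; the fixed 256x256 downsampling size, the channel counts and the output dimension N = 3 are illustrative choices not fixed by the text:

    import torch
    import torch.nn as nn

    class FeatureBackbone(nn.Module):
        """Downsampling layer + stacked conv+relu blocks + dropout+conv output,
        loosely mirroring the architecture of fig. 2; all sizes are assumptions."""
        def __init__(self, n_dims: int = 3):
            super().__init__()
            # downsampling layer: shrink any input to a fixed low resolution
            self.down = nn.Upsample(size=(256, 256), mode="bilinear",
                                    align_corners=False)
            blocks, ch = [], 3
            for out_ch in (16, 32, 64, 128):
                blocks += [nn.Conv2d(ch, out_ch, 3, stride=2, padding=1),
                           nn.ReLU(inplace=True)]
                ch = out_ch
            self.convs = nn.Sequential(*blocks)
            # output layer (dropout+conv): 16x16 feature map -> N values
            self.head = nn.Sequential(nn.Dropout(0.5), nn.Conv2d(ch, n_dims, 16))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            x = self.convs(self.down(x))
            return self.head(x).flatten(1)  # (batch, N) image feature information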
And 106, carrying out scene classification on the training image according to the image characteristic information to obtain a predicted scene classification result, and inputting the image characteristic information into an initial three-dimensional lookup table to obtain a predicted enhanced image.
The scene classification refers to classifying what scene the training image belongs to. For example, scene classification may specifically refer to classifying whether a training image is attributed to the day or night. The initial three-dimensional lookup table refers to a three-dimensional lookup table for which no parameter adjustment is performed.
Specifically, the server can perform scene classification on the training image using the image feature information to obtain a predicted scene classification result. For the scene classification, the server can acquire a scene prediction network and use it to classify the training images. The scene prediction network includes a fully connected layer and a softmax layer. For example, as shown in fig. 3, the scene prediction network may be connected to the neural network: after the neural network outputs the image feature information (f1, f2, f3), this information is input directly into the scene prediction network, where the probabilities (p1, p2) that the training image belongs to each preset scene are obtained through the fully connected layer and the softmax layer; these probabilities constitute the predicted scene classification result.
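Continuing the PyTorch sketch above, the scene prediction network of fig. 3 can be expressed as a fully connected layer followed by softmax; the two preset scenes (day and night) follow the example used later in this description:

    import torch
    import torch.nn as nn

    class ScenePredictionHead(nn.Module):
        """Fully connected layer + softmax that turns the N-dimensional image
        feature information into per-scene probabilities (p1, p2, ...)."""
        def __init__(self, n_dims: int = 3, n_scenes: int = 2):
            super().__init__()
            self.fc = nn.Linear(n_dims, n_scenes)

        def forward(self, features: torch.Tensor) -> torch.Tensor:
            return torch.softmax(self.fc(features), dim=-1)  # (batch, n_scenes)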
Specifically, the initial three-dimensional lookup table comprises a number of three-dimensional sub-lookup tables corresponding to the dimensions of the image feature information. After the image feature information is obtained, the server determines the lookup table weight of each three-dimensional sub-lookup table from the feature information of each dimension, weights the three-dimensional sub-lookup tables according to the lookup table weights to obtain a weighted three-dimensional lookup table, and inputs the training image into the weighted three-dimensional lookup table to obtain the corresponding predicted enhanced image. After the training image is input into the weighted three-dimensional lookup table, a corresponding output parameter value can be found in the table for the parameter value of each pixel point in the training image; adjusting the parameter value of each pixel point according to its output parameter value yields the predicted enhanced image. The parameter values may be RGB values, YUV values, etc., which are not specifically limited herein.
And step 108, obtaining a loss function according to the scene category label, the prediction scene classification result, the verification image and the prediction enhancement image.
In the process of training a neural network, the loss function is used to evaluate how far the network's predicted values deviate from the true values; generally, the smaller the loss function, the better the network's performance. In this embodiment, the loss function evaluates the deviation between the predicted and true values for both scene classification and image enhancement: the smaller the loss function, the higher the accuracy of scene classification and the better the image enhancement effect.
Specifically, the server calculates a first loss function according to the scene category label and the prediction scene classification result, calculates a second loss function according to the verification image and the prediction enhancement image, and obtains the loss function according to the first loss function and the second loss function.
And 110, returning to the step of respectively extracting the image characteristics of the training images to obtain the image characteristic information corresponding to the training images until the loss function converges to obtain a target three-dimensional lookup table.
Specifically, when the loss function has not converged, the server returns to the step of extracting image features from the training images: it adjusts the weight matrices of the neural network and the scene prediction network and the parameters of the three-dimensional sub-lookup tables in the initial three-dimensional lookup table to obtain a new neural network, a new scene prediction network and a new three-dimensional lookup table; extracts image features from the training images with the new neural network to obtain new image feature information; performs scene classification on the training images according to the new image feature information and the new scene prediction network to obtain a predicted scene classification result; inputs the new image feature information into the new three-dimensional lookup table to obtain a predicted enhanced image; calculates the loss function again from the scene category labels, the predicted scene classification result, the verification images and the predicted enhanced images; and judges whether the loss function has converged. While the loss function still has not converged, the server keeps returning to the feature extraction step, until the loss function converges and the target three-dimensional lookup table is obtained.
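For illustration, a heavily simplified training loop under the above assumptions might look as follows; the optimizer, learning rate, convergence test and the helper apply_weighted_lut (which performs the per-pixel table lookup described above) are all hypothetical:

    import torch
    import torch.nn.functional as F

    def train_luts(backbone, scene_head, sub_luts, loader, epochs=100, tol=1e-5):
        """Simplified sketch of the training loop. sub_luts is a trainable
        tensor of shape (N, L, L, L, 3) holding the N three-dimensional
        sub-lookup tables (requires_grad=True); loader yields (training image,
        scene label, verification image) triples. apply_weighted_lut is a
        hypothetical helper performing the weighted table lookup."""
        params = list(backbone.parameters()) + list(scene_head.parameters())
        opt = torch.optim.Adam(params + [sub_luts], lr=1e-4)
        prev = float("inf")
        for _ in range(epochs):
            total = 0.0
            for img, label, target in loader:
                feats = backbone(img)                # image feature information
                probs = scene_head(feats)            # predicted scene result
                # weight the N sub-LUTs by the N-dimensional features
                lut = torch.einsum("bn,nijkc->bijkc", feats, sub_luts)
                pred = apply_weighted_lut(img, lut)  # predicted enhanced image
                loss = (F.nll_loss(torch.log(probs + 1e-8), label)  # first loss
                        + (pred - target).abs().mean())             # second loss
                opt.zero_grad()
                loss.backward()
                opt.step()
                total += loss.item()
            if abs(prev - total) < tol:              # treat loss as converged
                break
            prev = total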
According to the above three-dimensional lookup table training method, after the training image set, the verification image set and the initial three-dimensional lookup table are obtained, image features are extracted from the training images to obtain the corresponding image feature information; scene classification is performed on the training images according to the image feature information to obtain a predicted scene classification result, and the image feature information is input into the initial three-dimensional lookup table to obtain a predicted enhanced image, realizing preliminary training of the initial three-dimensional lookup table; a loss function is obtained from the scene category labels, the predicted scene classification result, the verification images and the predicted enhanced images; and the process returns to the feature extraction step until the loss function converges, yielding the target three-dimensional lookup table. The method combines scene classification with training to obtain a target three-dimensional lookup table that can accurately realize enhancement and color grading, improving the video enhancement effect; meanwhile, processing the video frames in video data with the target three-dimensional lookup table quickly yields the correspondence between the pixel values of a video frame and those of the enhanced video frame, speeding up frame processing and improving the real-time performance of video enhancement.
In one embodiment, the initial three-dimensional lookup table includes three-dimensional sub-lookup tables, the number of which corresponds to the dimensions of the image feature information.
Specifically, the initial three-dimensional lookup table comprises three-dimensional sub-lookup tables, and the number of the three-dimensional sub-lookup tables corresponds to the dimension of the image characteristic information. For example, when the image feature information is an N-dimensional vector, the number of three-dimensional sub-lookup tables is N.
In this embodiment, by matching the number of three-dimensional sub-lookup tables to the dimensions of the image feature information, richer semantic information can be learned through the plurality of three-dimensional sub-lookup tables, so that this semantic information can be used to enhance the training image.
In one embodiment, inputting the image characteristic information into an initial three-dimensional look-up table to obtain the predicted enhanced image comprises:
determining the weight of a lookup table of the three-dimensional sub-lookup table according to each dimension characteristic information in the image characteristic information;
weighting the three-dimensional sub-lookup table according to the weight of the lookup table to obtain a weighted three-dimensional lookup table;
and inputting the training image into a weighted three-dimensional lookup table to obtain a prediction enhancement image.
Specifically, the server multiplies each lookup table weight by the corresponding three-dimensional sub-lookup table and sums the results to obtain the weighted three-dimensional lookup table, inputs the training image into the weighted three-dimensional lookup table, finds in it the output parameter value corresponding to the parameter value of each pixel in the training image, and adjusts the parameter value of each pixel accordingly. The parameter values may be RGB values, YUV values, etc., which are not specifically limited herein.
In this embodiment, the lookup table weights of the three-dimensional sub-lookup tables are determined from each dimension of the image feature information, the sub-lookup tables are weighted according to these weights to obtain the weighted three-dimensional lookup table, and the training image is input into the weighted three-dimensional lookup table to obtain the predicted enhanced image. In this way, richer semantic information can be learned with the plurality of three-dimensional sub-lookup tables, and enhancement of the training image is realized using this semantic information.
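A minimal sketch of this weighting step, assuming the sub-lookup tables are stored as one tensor and that the lookup table weights are taken directly from the per-dimension feature values:

    import torch

    N, L = 3, 33  # assumed: N sub-LUTs matching N-dim features, lattice size L
    sub_luts = torch.rand(N, L, L, L, 3)      # N three-dimensional sub-lookup tables
    features = torch.tensor([0.5, 0.3, 0.2])  # N-dimensional image feature information

    # weighted 3D LUT: the sum of each sub-LUT scaled by its lookup table weight
    weighted_lut = torch.einsum("n,nijkc->ijkc", features, sub_luts)
    assert weighted_lut.shape == (L, L, L, 3)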
In one embodiment, deriving the loss function based on the scene category label, the predicted scene category result, the verification image, and the predicted enhanced image comprises:
calculating a first loss function according to the scene category labels and the prediction scene classification results, and calculating a second loss function according to the verification image and the prediction enhancement image;
and obtaining a loss function according to the first loss function and the second loss function.
Specifically, the predicted scene classification result is the probability that the training image belongs to each preset scene, and the server can obtain a cross entropy loss function, namely the first loss function, from the scene category labels and these probabilities. For example, the training images can be divided into daytime scenes and night scenes according to brightness, with corresponding scene category labels c_i ∈ {0, 1}. If the probability predicted for training image i is p_i, the cross entropy loss function is:

L_1 = -\frac{1}{n} \sum_{i=1}^{n} \left[ c_i \log p_i + (1 - c_i) \log(1 - p_i) \right]

where n is the number of training images in the training image set, and the scene category label c_i of a training image may be 0 for day and 1 for night, or 1 for day and 0 for night.
Specifically, the server calculates the second loss function from the verification image and the predicted enhanced image; the calculation mainly involves the pixel differences between pixel points at the same position in the verification image and the predicted enhanced image. For example, the second loss function may be:

L_2 = \frac{1}{n} \sum_{i=1}^{n} \left\| B_i - A_i \right\|_1

where n is the number of training images in the training image set, B_i is a verification image, A_i is a predicted enhanced image, and \|B_i - A_i\|_1 is the loss function corresponding to a single training image, i.e. the sum of pixel differences between co-located pixels in the verification image and the predicted enhanced image. For example, the loss function corresponding to a single training image can be calculated by the formula:

\left\| B_i - A_i \right\|_1 = \sum_{x=1}^{w} \sum_{y=1}^{h} \left| B_i(x, y) - A_i(x, y) \right|

where h is the height of the verification image and the predicted enhanced image, w is their width, and (x, y) denotes the pixel at the same position in both images.
Specifically, after obtaining the first loss function and the second loss function, the server may obtain the loss function by superimposing the first loss function and the second loss function. For example, the loss function may specifically be:
L = L_1 + L_2
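Under the formulas above, the combined loss can be sketched as follows for the binary day/night case; the tensor shapes are assumptions:

    import torch

    def combined_loss(probs, labels, pred, verify):
        """probs: (n,) predicted probabilities p_i; labels: (n,) labels c_i;
        pred: (n, h, w, 3) predicted enhanced images A_i;
        verify: (n, h, w, 3) verification images B_i."""
        eps = 1e-8  # guards log(0)
        first = -(labels * torch.log(probs + eps)
                  + (1 - labels) * torch.log(1 - probs + eps)).mean()
        # mean over images of the per-image L1 loss ||B_i - A_i||_1
        second = (verify - pred).abs().sum(dim=(1, 2, 3)).mean()
        return first + second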
in this embodiment, the first loss function is calculated according to the scene category label and the prediction scene classification result, the second loss function is calculated according to the verification image and the prediction enhancement image, and the loss function is obtained according to the first loss function and the second loss function, so that the loss function can be calculated.
In one embodiment, calculating the second loss function from the verification image and the predicted enhancement image comprises:
determining a pixel point corresponding relation between a first pixel point in the verification image and a second pixel point in the prediction enhancement image;
determining a pixel point difference value between a first pixel point and a corresponding second pixel point according to the pixel point corresponding relation;
and calculating a second loss function according to the pixel point difference value.
Specifically, the server determines the pixel point correspondence between the first pixel points in the verification image and the second pixel points in the predicted enhanced image according to their positions in the two images. With this correspondence, the server can calculate the pixel point difference between each first pixel point and its corresponding second pixel point, determine the loss function for a single training image from these differences, and calculate the second loss function from the single-image loss functions. The corresponding second pixel point is the pixel point at the same position as the first pixel point. The pixel point difference mainly represents the difference between the first and second pixel points; it may be the difference in pixel value, the difference in brightness, a difference in some other aspect, or a composite of differences in multiple aspects, which is not limited in this embodiment.
For example, the loss function corresponding to a single training image can be obtained by the formula

\left\| B_i - A_i \right\|_1 = \sum_{x=1}^{w} \sum_{y=1}^{h} \left| B_i(x, y) - A_i(x, y) \right|

where h is the height of the verification image and the predicted enhanced image, w is their width, and (x, y) denotes the pixel at the same position in both images.
In this embodiment, by determining the pixel point correspondence, and determining the pixel point difference between the first pixel point and the corresponding second pixel point according to the pixel point correspondence, the calculation of the second loss function can be implemented according to the pixel point difference.
In one embodiment, as shown in fig. 4, a video enhancement method is provided. This embodiment is illustrated with the method applied to a server; it is understood that the method may also be applied to a terminal, or to a system including a terminal and a server and implemented through their interaction. The terminal may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer or portable wearable device, and the server may be implemented by a separate server or a server cluster formed by a plurality of servers. In this embodiment, the method includes the following steps:
step 402, obtaining video data to be processed.
The video data to be processed refers to video data to be subjected to enhancement processing.
Specifically, when enhancement processing is required to be performed on video data, the server acquires the video data to be processed.
Step 404, frame extraction is performed on the video data to be processed, so as to obtain a video frame set corresponding to the video data to be processed.
Specifically, after obtaining the video data to be processed, the server extracts frames from it to obtain the corresponding video frame set. The frame extraction may be uniform, i.e. frames are extracted at a preset frame interval; the frame extraction mode is not limited in this embodiment, and the preset frame interval can be set as needed.
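For illustration, a minimal frame extraction sketch using OpenCV; the interval of 5 frames is an arbitrary example of a preset frame extraction interval:

    import cv2

    def extract_frames(path: str, interval: int = 5):
        """Uniformly sample every `interval`-th frame from the video to be
        processed, returning the video frame set as a list of BGR arrays."""
        cap = cv2.VideoCapture(path)
        frames, index = [], 0
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            if index % interval == 0:
                frames.append(frame)
            index += 1
        cap.release()
        return frames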
Step 406, inputting the video frames in the video frame set into a target three-dimensional lookup table in sequence to obtain the enhanced video frames corresponding to the video frames, wherein the target three-dimensional lookup table is obtained through the three-dimensional lookup table training method.
Specifically, the server sequentially inputs the video frames in the video frame set into the target three-dimensional lookup table; the table yields the output parameter values corresponding to the parameter values of the pixel points in the video frame, and the output parameter values of all pixel points form the search output result for that frame. After obtaining the search output result for a video frame, the server combines it with the search output result of the previous video frame to obtain a target search output result, and uses the target search output result to apply enhancement adjustment to the video frame, obtaining the corresponding enhanced video frame. The parameter values may be RGB values, YUV values, etc., which are not specifically limited herein.
In step 408, enhanced video data is obtained from the enhanced video frames.
Specifically, after obtaining the enhanced video frame, the server may perform video restoration according to the enhanced video frame to obtain enhanced video data.
According to the above video enhancement method, the video data to be processed is acquired and frames are extracted from it to obtain the corresponding video frame set; the video frames are sequentially input into the target three-dimensional lookup table to obtain the enhanced video frames, and the enhanced video data is obtained from them. Processing the video frames with the target three-dimensional lookup table improves the video enhancement effect, and the correspondence between the pixel values of a video frame and those of the enhanced video frame is obtained quickly, speeding up frame processing and improving the real-time performance of video enhancement.
In one embodiment, sequentially inputting video frames in a video frame set into a target three-dimensional lookup table to obtain an enhanced video frame corresponding to the video frame includes:
sequentially inputting video frames in the video frame set into a target three-dimensional lookup table to obtain a lookup output result corresponding to the video frames;
when the video frame is not the first video frame, acquiring the search output result of the previous video frame corresponding to the video frame;
carrying out a moving average on the search output result of the previous video frame and the current search output result to obtain a target search output result;
and carrying out enhancement adjustment on the video frame according to the target searching output result to obtain an enhanced video frame corresponding to the video frame.
The search output result refers to the output parameter values of each pixel point in the video frame produced by the target three-dimensional lookup table, corresponding to the parameter value of each pixel point.
Specifically, because enhancement processes each video frame independently, without considering the relationship between frames, discontinuous jitter can occur and affect the visual appearance of the result. Therefore, when obtaining the enhanced video frames, the server applies a moving average to suppress the jitter and obtain a temporally stable result.
Specifically, the server sequentially inputs the video frames in the video frame set into the target three-dimensional lookup table to obtain the search output result for each frame, and judges whether the frame is the first video frame. When it is not, the server obtains the search output result of the previous video frame, carries out a moving average of the previous frame's search output result and the current one to obtain the target search output result, and applies enhancement adjustment to the parameter value of each pixel point of the video frame according to the corresponding target parameter value in the target search output result, obtaining the enhanced video frame corresponding to the video frame.
For example, the target search output result may be obtained using a preset moving average formula, which may specifically be:
moving_average_lut_new=γ*output_lut+(1-γ)*moving_average_lut,
where output_lut is the current search output result, moving_average_lut is the search output result of the previous video frame, moving_average_lut_new is the target search output result, and γ is a coefficient between 0 and 1 that can be set as needed.
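A sketch of applying this moving average across the frame sequence; gamma = 0.8 is an illustrative value, and look_up is a hypothetical stand-in for querying the target three-dimensional lookup table:

    def enhance_with_moving_average(frames, look_up, gamma=0.8):
        """Smooth the per-frame lookup outputs over time to suppress jitter.
        look_up(frame) is assumed to return the search output result
        (per-pixel output parameter values) for one video frame."""
        moving_average_lut = None
        enhanced = []
        for frame in frames:
            output_lut = look_up(frame)
            if moving_average_lut is None:       # first video frame: no history
                moving_average_lut = output_lut
            else:                                # the moving-average formula above
                moving_average_lut = (gamma * output_lut
                                      + (1 - gamma) * moving_average_lut)
            enhanced.append(moving_average_lut)  # target search output result
        return enhanced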
In this embodiment, after obtaining the search output result corresponding to the video frame, the search output result of the previous video frame corresponding to the video frame is obtained, the search output result of the previous video frame and the search output result are subjected to a moving average to obtain a target search output result, and the video frame is subjected to enhancement adjustment according to the target search output result to obtain an enhanced video frame corresponding to the video frame, so that the problem of jitter between video frames can be solved by using the moving average.
As shown in fig. 5, the present application further provides a flowchart for illustrating the three-dimensional look-up table training method of the present application. Specifically, the three-dimensional lookup table training method comprises the following steps:
the method comprises the steps that a server acquires a training image set, a verification image set and an initial three-dimensional lookup table, wherein training images in the training image set carry scene category labels, verification images in the verification image set correspond to training images in the training image set one by one, the verification images are enhancement images corresponding to the training images, the initial three-dimensional lookup table comprises three-dimensional sub-lookup tables, and the number of the three-dimensional sub-lookup tables corresponds to the dimension of image characteristic information.
The training image (input picture A_i) is input into the neural network for image feature extraction, obtaining the image feature information corresponding to the training image (the N-dimensional weight in fig. 5). Scene classification is performed on the training image according to the image feature information to obtain the predicted scene classification result (in fig. 5, the N-dimensional weight is input into the fully connected layer, and the prediction probability p_i is obtained after the fully connected layer and softmax). The lookup table weight of each three-dimensional sub-lookup table (the parallel 3D LUTs following the N-dimensional weight in fig. 5) is determined from each dimension of the image feature information; the sub-lookup tables are weighted according to the lookup table weights to obtain the weighted three-dimensional lookup table (the single 3D LUT obtained by aggregating the parallel 3D LUTs in fig. 5); and the training image is input into the weighted three-dimensional lookup table to obtain the predicted enhanced image (output picture B_i). A first loss function is calculated from the scene category labels and the predicted scene classification results, a second loss function is calculated from the verification images and the predicted enhanced images, and the loss function is obtained from the first and second loss functions; the process returns to the step of extracting image features from the training images until the loss function converges, yielding the target three-dimensional lookup table.
It should be understood that, although the steps in the flowcharts related to the above embodiments are shown sequentially as indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be executed in other orders. Moreover, at least some of the steps in these flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; their order of execution is not necessarily sequential, and they may be performed in turn or in alternation with at least some of the sub-steps or stages of other steps.
In one embodiment, as shown in fig. 6, there is provided a three-dimensional look-up table training apparatus comprising: an acquisition module 602, a feature extraction module 604, a first processing module 606, a loss function calculation module 608, and a second processing module 610, wherein:
the acquisition module 602 is configured to acquire a training image set, a verification image set, and an initial three-dimensional lookup table, where training images in the training image set carry scene class labels, verification images in the verification image set correspond to training images in the training image set one to one, and the verification images are enhancement images corresponding to the training images;
the feature extraction module 604 is configured to perform image feature extraction on the training images respectively, so as to obtain image feature information corresponding to the training images;
the first processing module 606 is configured to perform scene classification on the training image according to the image feature information to obtain a predicted scene classification result, and input the image feature information into an initial three-dimensional lookup table to obtain a predicted enhanced image;
the loss function calculation module 608 is configured to obtain a loss function according to the scene category label, the predicted scene classification result, the verification image and the predicted enhanced image;
the second processing module 610 is configured to return to the step of extracting image features of the training images respectively to obtain image feature information corresponding to the training images until the loss function converges, thereby obtaining the target three-dimensional lookup table.
According to the above three-dimensional lookup table training apparatus, after the training image set, the verification image set and the initial three-dimensional lookup table are obtained, image features are extracted from the training images to obtain the corresponding image feature information; scene classification is performed on the training images according to the image feature information to obtain a predicted scene classification result, and the image feature information is input into the initial three-dimensional lookup table to obtain a predicted enhanced image, realizing preliminary training of the initial three-dimensional lookup table; a loss function is obtained from the scene category labels, the predicted scene classification result, the verification images and the predicted enhanced images; and the process returns to the feature extraction step until the loss function converges, yielding the target three-dimensional lookup table. Combining scene classification with training yields a target three-dimensional lookup table that can accurately realize enhancement and color grading, improving the video enhancement effect; meanwhile, processing the video frames in video data with the target three-dimensional lookup table quickly yields the correspondence between the pixel values of a video frame and those of the enhanced video frame, speeding up frame processing and improving the real-time performance of video enhancement.
In one embodiment, the initial three-dimensional lookup table includes three-dimensional sub-lookup tables, the number of which corresponds to the dimensions of the image feature information.
In one embodiment, the first processing module is further configured to determine the lookup table weight of each three-dimensional sub-lookup table according to each dimension of feature information in the image feature information, weight the three-dimensional sub-lookup tables according to the lookup table weights to obtain a weighted three-dimensional lookup table, and input the training image into the weighted three-dimensional lookup table to obtain the predicted enhanced image.
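By way of illustration only, the weighting and lookup described in this embodiment could be sketched as follows in PyTorch. The tensor layouts, the lattice axis order, the use of grid_sample for trilinear interpolation, and all function names are assumptions of the sketch, not details fixed by the embodiment.

```python
import torch
import torch.nn.functional as F

def apply_weighted_luts(image, sub_luts, weights):
    # image:    (3, H, W) RGB tensor with values in [0, 1]   (assumed layout)
    # sub_luts: (N, 3, D, D, D) learnable sub-lookup tables; the lattice is
    #           assumed indexed (blue, green, red) to match grid_sample's
    #           (x, y, z) sampling order
    # weights:  (N,) lookup table weights predicted from the image features
    lut = torch.einsum('n,ncijk->cijk', weights, sub_luts)  # weighted 3D LUT

    # Map pixel values to grid_sample's [-1, 1] sampling coordinates.
    grid = image.permute(1, 2, 0)[None, None] * 2.0 - 1.0   # (1, 1, H, W, 3)

    # On a 5-D input, mode='bilinear' performs trilinear interpolation
    # between the eight lattice points surrounding each pixel's RGB value.
    out = F.grid_sample(lut[None], grid, mode='bilinear',
                        padding_mode='border', align_corners=True)
    return out.view(3, image.shape[1], image.shape[2])      # enhanced image
```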
In one embodiment, the loss function calculation module is further configured to calculate a first loss function according to the scene category label and the predicted scene classification result, calculate a second loss function according to the verification image and the predicted enhanced image, and obtain the loss function according to the first loss function and the second loss function.
In one embodiment, the loss function calculation module is further configured to determine a pixel correspondence between a first pixel in the verification image and a second pixel in the predicted enhanced image, determine a pixel difference between the first pixel and the corresponding second pixel according to the pixel correspondence, and calculate the second loss function according to the pixel difference.
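As an illustration only, the two loss terms might be computed as in the following sketch. Cross-entropy for the classification term, mean-squared pixel difference for the enhancement term, and the weighting factor `alpha` are assumptions; the embodiment does not fix the exact functional forms.

```python
import torch.nn.functional as F

def training_loss(class_logits, scene_labels, pred_enhanced, verify_images,
                  alpha=1.0):
    # First loss: predicted scene classification vs. the scene category label.
    first_loss = F.cross_entropy(class_logits, scene_labels)

    # Second loss: per-pixel difference between the verification image and
    # the predicted enhanced image (pixels correspond one-to-one by position).
    second_loss = F.mse_loss(pred_enhanced, verify_images)

    # Combined loss; alpha is an illustrative balance factor.
    return first_loss + alpha * second_loss
```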
In one embodiment, as shown in fig. 7, there is provided a video enhancement apparatus comprising: video acquisition module 702, frame extraction module 704, enhancement module 706, and frame processing module 708, wherein:
a video acquisition module 702, configured to acquire video data to be processed;
the frame extraction module 704 is configured to extract frames from video data to be processed, so as to obtain a video frame set corresponding to the video data to be processed;
the enhancement module 706 is configured to sequentially input video frames in the video frame set into a target three-dimensional lookup table, to obtain an enhanced video frame corresponding to the video frame, where the target three-dimensional lookup table is obtained by the three-dimensional lookup table training method;
the frame processing module 708 is configured to obtain enhanced video data according to the enhanced video frame.
With the above video enhancement apparatus, the video data to be processed is acquired and frames are extracted from it to obtain the corresponding video frame set. The video frames in the set are then sequentially input into the target three-dimensional lookup table to obtain the corresponding enhanced video frames, from which the enhanced video data is obtained. Because the video frames are processed with the trained target three-dimensional lookup table, the video enhancement effect is improved; at the same time, the correspondence between original and enhanced pixel values is obtained quickly, which accelerates frame processing and improves the real-time performance of video enhancement.
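The frame extraction and per-frame lookup pipeline could be sketched as follows with OpenCV. The codec choice and the `lookup` callable (assumed to map an HxWx3 uint8 frame to its enhanced counterpart via the trained target three-dimensional lookup table) are illustrative assumptions.

```python
import cv2

def enhance_video(src_path, dst_path, lookup):
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    writer = cv2.VideoWriter(dst_path,
                             cv2.VideoWriter_fourcc(*'mp4v'), fps, (w, h))
    while True:
        ok, frame = cap.read()        # frame extraction
        if not ok:
            break
        writer.write(lookup(frame))   # per-frame table lookup
    cap.release()
    writer.release()                  # enhanced video data
```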
In one embodiment, the enhancement module is further configured to sequentially input the video frames in the video frame set into the target three-dimensional lookup table to obtain a lookup output result corresponding to each video frame; when a video frame is not the first video frame, acquire the lookup output result of the previous video frame; perform a moving average of the previous frame's lookup output result and the current lookup output result to obtain a target lookup output result; and perform enhancement adjustment on the video frame according to the target lookup output result to obtain the enhanced video frame corresponding to the video frame.
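A minimal sketch of this moving average follows: each non-first frame's lookup output is blended with the lookup output of the preceding frame, which suppresses frame-to-frame flicker. The smoothing weight is an assumed value, not one specified by the embodiment.

```python
def smoothed_lookup_outputs(lookup_outputs, weight=0.5):
    # lookup_outputs: iterable of per-frame lookup output results (arrays)
    # weight: assumed share given to the previous frame's result
    targets = []
    previous = None
    for current in lookup_outputs:
        if previous is None:
            target = current          # first video frame: used as-is
        else:                         # moving average with previous output
            target = weight * previous + (1.0 - weight) * current
        targets.append(target)        # target lookup output result
        previous = current            # keep the raw lookup output
    return targets
```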
For specific embodiments of the three-dimensional lookup table training apparatus and the video enhancement apparatus, reference may be made to the embodiments of the three-dimensional lookup table training method and the video enhancement method described above, and the description is not repeated here. Each module in the above apparatuses may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded, in hardware form, in or independent of a processor in the computer device, or stored, in software form, in a memory of the computer device, so that the processor can invoke and execute the operations corresponding to each module.
In one embodiment, a computer device is provided, which may be a server whose internal structure may be as shown in fig. 8. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used for storing data such as the training image set and the verification image set. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements the three-dimensional lookup table training method and the video enhancement method.
It will be appreciated by those skilled in the art that the structure shown in fig. 8 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the present application is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:
acquiring a training image set, a verification image set and an initial three-dimensional lookup table, wherein the training images in the training image set carry scene category labels, the verification images in the verification image set correspond to the training images in the training image set one by one, and the verification images are enhancement images corresponding to the training images;
respectively extracting image features of the training images to obtain image feature information corresponding to the training images;
according to the image feature information, carrying out scene classification on the training image to obtain a predicted scene classification result, and inputting the image feature information into the initial three-dimensional lookup table to obtain a predicted enhanced image;
obtaining a loss function according to the scene category label, the predicted scene classification result, the verification image and the predicted enhanced image;
and returning to the step of respectively extracting the image features of the training images to obtain the image feature information corresponding to the training images until the loss function converges to obtain the target three-dimensional lookup table.
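Taken together, these steps could be iterated as in the following sketch, which reuses the `training_loss` sketch given earlier. The `model` interface (`extract_features`, `classify`, `lut_enhance`, bundling the feature extractor, scene classifier, and learnable sub-lookup tables), the optimizer choice, and the convergence test are illustrative assumptions.

```python
import torch

def train_lut(model, train_loader, lr=1e-4, tol=1e-4):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    prev_loss = float('inf')
    while True:
        epoch_loss = 0.0
        for train_img, verify_img, scene_label in train_loader:
            feats = model.extract_features(train_img)    # image features
            logits = model.classify(feats)               # scene prediction
            pred = model.lut_enhance(train_img, feats)   # LUT enhancement
            loss = training_loss(logits, scene_label, pred, verify_img)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()
        if abs(prev_loss - epoch_loss) < tol:   # loss has converged
            return model                        # holds the target 3D LUT
        prev_loss = epoch_loss
```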
In one embodiment, the processor when executing the computer program further performs the steps of: determining the lookup table weight of each three-dimensional sub-lookup table according to each dimension of feature information in the image feature information, weighting the three-dimensional sub-lookup tables according to the lookup table weights to obtain a weighted three-dimensional lookup table, and inputting the training image into the weighted three-dimensional lookup table to obtain the predicted enhanced image.
In one embodiment, the processor when executing the computer program further performs the steps of: calculating a first loss function according to the scene category label and the predicted scene classification result, calculating a second loss function according to the verification image and the predicted enhanced image, and obtaining the loss function according to the first loss function and the second loss function.
In one embodiment, the processor when executing the computer program further performs the steps of: determining a pixel correspondence between a first pixel in the verification image and a second pixel in the predicted enhanced image, determining a pixel difference between the first pixel and the corresponding second pixel according to the pixel correspondence, and calculating the second loss function according to the pixel difference.
In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:
acquiring video data to be processed;
extracting frames of the video data to be processed to obtain a video frame set corresponding to the video data to be processed;
sequentially inputting video frames in a video frame set into a target three-dimensional lookup table to obtain an enhanced video frame corresponding to the video frame, wherein the target three-dimensional lookup table is obtained through the three-dimensional lookup table training method;
And obtaining the enhanced video data according to the enhanced video frame.
In one embodiment, the processor when executing the computer program further performs the steps of: sequentially inputting the video frames in the video frame set into the target three-dimensional lookup table to obtain a lookup output result corresponding to each video frame; when a video frame is not the first video frame, acquiring the lookup output result of the previous video frame; performing a moving average of the previous frame's lookup output result and the current lookup output result to obtain a target lookup output result; and performing enhancement adjustment on the video frame according to the target lookup output result to obtain the enhanced video frame corresponding to the video frame.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring a training image set, a verification image set and an initial three-dimensional lookup table, wherein the training images in the training image set carry scene category labels, the verification images in the verification image set correspond to the training images in the training image set one by one, and the verification images are enhancement images corresponding to the training images;
respectively extracting image features of the training images to obtain image feature information corresponding to the training images;
according to the image feature information, carrying out scene classification on the training image to obtain a predicted scene classification result, and inputting the image feature information into the initial three-dimensional lookup table to obtain a predicted enhanced image;
obtaining a loss function according to the scene category label, the predicted scene classification result, the verification image and the predicted enhanced image;
and returning to the step of respectively extracting the image features of the training images to obtain the image feature information corresponding to the training images until the loss function converges to obtain the target three-dimensional lookup table.
In one embodiment, the computer program when executed by the processor further performs the steps of: determining the lookup table weight of each three-dimensional sub-lookup table according to each dimension of feature information in the image feature information, weighting the three-dimensional sub-lookup tables according to the lookup table weights to obtain a weighted three-dimensional lookup table, and inputting the training image into the weighted three-dimensional lookup table to obtain the predicted enhanced image.
In one embodiment, the computer program when executed by the processor further performs the steps of: calculating a first loss function according to the scene category label and the predicted scene classification result, calculating a second loss function according to the verification image and the predicted enhanced image, and obtaining the loss function according to the first loss function and the second loss function.
In one embodiment, the computer program when executed by the processor further performs the steps of: determining a pixel correspondence between a first pixel in the verification image and a second pixel in the predicted enhanced image, determining a pixel difference between the first pixel and the corresponding second pixel according to the pixel correspondence, and calculating the second loss function according to the pixel difference.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring video data to be processed;
extracting frames of the video data to be processed to obtain a video frame set corresponding to the video data to be processed;
sequentially inputting video frames in a video frame set into a target three-dimensional lookup table to obtain an enhanced video frame corresponding to the video frame, wherein the target three-dimensional lookup table is obtained through the three-dimensional lookup table training method;
and obtaining the enhanced video data according to the enhanced video frame.
In one embodiment, the computer program when executed by the processor further performs the steps of: sequentially inputting the video frames in the video frame set into the target three-dimensional lookup table to obtain a lookup output result corresponding to each video frame; when a video frame is not the first video frame, acquiring the lookup output result of the previous video frame; performing a moving average of the previous frame's lookup output result and the current lookup output result to obtain a target lookup output result; and performing enhancement adjustment on the video frame according to the target lookup output result to obtain the enhanced video frame corresponding to the video frame.
Those skilled in the art will appreciate that all or part of the above methods may be implemented by a computer program stored on a non-volatile computer-readable storage medium which, when executed, may include the flows of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, or the like. The volatile memory may include random access memory (RAM) or an external cache. By way of illustration and not limitation, RAM is available in a variety of forms, such as static random access memory (SRAM) and dynamic random access memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of these technical features are described; however, as long as a combination of technical features contains no contradiction, it should be considered within the scope of this specification.
The above embodiments merely express several implementations of the present application, and their descriptions are relatively specific and detailed, but they are not to be construed as limiting the scope of the patent. It should be noted that those skilled in the art may make various modifications and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Accordingly, the scope of protection of the present application shall be determined by the appended claims.

Claims (10)

1. A three-dimensional lookup table training method, the method comprising:
acquiring a training image set, a verification image set and an initial three-dimensional lookup table, wherein training images in the training image set carry scene category labels, verification images in the verification image set correspond to training images in the training image set one by one, and the verification images are enhancement images corresponding to the training images;
respectively extracting image features of the training images to obtain image feature information corresponding to the training images;
according to the image feature information, carrying out scene classification on the training image to obtain a predicted scene classification result, and inputting the image feature information into the initial three-dimensional lookup table to obtain a predicted enhanced image;
obtaining a loss function according to the scene category label, the predicted scene classification result, the verification image and the predicted enhanced image;
and returning to the step of extracting the image features of the training images respectively to obtain image feature information corresponding to the training images until the loss function converges to obtain a target three-dimensional lookup table.
2. The method of claim 1, wherein the initial three-dimensional lookup table comprises three-dimensional sub-lookup tables, and the number of three-dimensional sub-lookup tables corresponds to the number of dimensions of the image feature information.
3. The method of claim 2, wherein said inputting the image feature information into the initial three-dimensional lookup table to obtain a predicted enhanced image comprises:
determining the lookup table weight of each three-dimensional sub-lookup table according to each dimension of feature information in the image feature information;
weighting the three-dimensional sub-lookup tables according to the lookup table weights to obtain a weighted three-dimensional lookup table;
and inputting the training image into the weighted three-dimensional lookup table to obtain the predicted enhanced image.
4. The method of claim 1, wherein said obtaining a loss function according to the scene category label, the predicted scene classification result, the verification image and the predicted enhanced image comprises:
calculating a first loss function according to the scene category label and the predicted scene classification result, and calculating a second loss function according to the verification image and the predicted enhanced image;
and obtaining the loss function according to the first loss function and the second loss function.
5. The method of claim 4, wherein said calculating a second loss function according to the verification image and the predicted enhanced image comprises:
determining a pixel correspondence between a first pixel in the verification image and a second pixel in the predicted enhanced image;
determining a pixel difference between the first pixel and the corresponding second pixel according to the pixel correspondence;
and calculating the second loss function according to the pixel difference.
6. A method of video enhancement, the method comprising:
acquiring video data to be processed;
extracting frames from the video data to be processed to obtain a video frame set corresponding to the video data to be processed;
sequentially inputting video frames in the video frame set into a target three-dimensional lookup table to obtain an enhanced video frame corresponding to the video frame, wherein the target three-dimensional lookup table is obtained by the method of any one of claims 1-5;
And obtaining the enhanced video data according to the enhanced video frame.
7. The method of claim 6, wherein sequentially inputting video frames in the set of video frames into a target three-dimensional lookup table to obtain enhanced video frames corresponding to the video frames comprises:
sequentially inputting the video frames in the video frame set into the target three-dimensional lookup table to obtain a lookup output result corresponding to each video frame;
when the video frame is not the first video frame, acquiring the lookup output result of the previous video frame corresponding to the video frame;
performing a moving average of the lookup output result of the previous video frame and the current lookup output result to obtain a target lookup output result;
and performing enhancement adjustment on the video frame according to the target lookup output result to obtain an enhanced video frame corresponding to the video frame.
8. A three-dimensional lookup table training apparatus, the apparatus comprising:
the acquisition module is used for acquiring a training image set, a verification image set and an initial three-dimensional lookup table, wherein training images in the training image set carry scene category labels, verification images in the verification image set correspond to training images in the training image set one by one, and the verification images are enhancement images corresponding to the training images;
The feature extraction module is used for extracting image features of the training images respectively to obtain image feature information corresponding to the training images;
the first processing module is used for carrying out scene classification on the training image according to the image feature information to obtain a predicted scene classification result, and inputting the image feature information into the initial three-dimensional lookup table to obtain a predicted enhanced image;
the loss function calculation module is used for obtaining a loss function according to the scene category label, the predicted scene classification result, the verification image and the predicted enhanced image;
and the second processing module is used for returning to the step of respectively extracting the image features of the training images to obtain the image feature information corresponding to the training images until the loss function converges to obtain a target three-dimensional lookup table.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
CN202111309389.4A (filed 2021-11-06): Three-dimensional lookup table training method and video enhancement method; status: Pending; publication: CN116128775A

Priority Applications (1)

Application Number: CN202111309389.4A; Priority/Filing Date: 2021-11-06; Title: Three-dimensional lookup table training method and video enhancement method

Publications (1)

Publication Number: CN116128775A; Publication Date: 2023-05-16; Family ID: 86303178; Country: CN; Status: Pending

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination