CN117152067A

CN117152067A - No-reference light field image quality evaluation method and system based on deep meta-learning

Info

Publication number: CN117152067A
Application number: CN202311025101.XA
Authority: CN
Inventors: 马健; 张潇尹; 李志鹏; 王俊博; 刘恋国
Original assignee: Anhui University
Current assignee: Anhui University
Priority date: 2023-08-14
Filing date: 2023-08-14
Publication date: 2023-12-01

Abstract

The invention provides a no-reference light field image quality evaluation method and system based on deep meta-learning. The method constructs a training set containing no-reference light field image quality evaluation tasks and a test set containing a target evaluation task; constructs a deep meta-learning model containing a Swin Transformer network structure; learns prior knowledge of distortion on the training set to obtain a quality prior model; fine-tunes the quality prior model on the test set to obtain a final light field image quality evaluation model; and uses that model to evaluate the quality of the light field image to be evaluated. The invention addresses two problems: results that deviate from the true quality because the spatial and angular information of the light field image is insufficiently combined, and the fitting problems and poor adaptation to varied distortions that can arise when deep learning is applied. It adapts to each distortion type and greatly improves performance.

Description

No-reference light field image quality evaluation method and system based on deep meta-learning

Technical Field

The invention relates to the technical field of image processing, in particular to a no-reference light field image quality evaluation method and system based on deep meta-learning, and also provides a corresponding computer terminal and computer-readable storage medium.

Background

With the development of immersive multimedia technology and virtual reality technology, light field technology has become a research hotspot in the field of computational vision, and has attracted attention of many relevant researchers in recent years. Since a light field is a collection of light in space, a three-dimensional (3D) world can be intuitively rendered by acquiring and displaying light field images. Because of its rich spatial and angular information, light fields are widely used in image applications, including depth estimation, refocusing, three-dimensional reconstruction, and the like. However, in each link of the light field processing, distortion effects are unavoidable, which leads to a degradation of the perceived quality of the light field content. In order to guide and supervise the acquisition, processing and application of Light Field Images (LFIs), it is important to design a Light Field Image Quality Assessment (LFIQA) model conforming to the human visual system.

However, most existing light field image quality assessment models rely on traditional machine learning: features are extracted manually and then mapped to quality scores by a quality regression model such as support vector regression (SVR). Although these models perform well, they do not achieve fully automatic objective evaluation and are inconvenient to apply. Meanwhile, research has found that image quality evaluation with deep convolutional neural networks works well for low-dimensional images. Unfortunately, LFIQA is a small-sample problem, and most light field datasets contain relatively little image content. Therefore, most existing deep-learning-based Image Quality Assessment (IQA) models do not adapt well to high-dimensional LFIQA and tend to underfit or overfit when assessing different types of distortion. Moreover, some IQA models do not take the rich information in light field images into account and do not evaluate spatial and angular information jointly, so they perform poorly on distortions that damage angular information, such as those caused by reconstruction. Finally, because image quality evaluation ultimately depends on human perception, it is important that a model conform to the visual characteristics of the human eye; some models do not incorporate these characteristics well, which limits their performance.

A prior-art search found the following:

Chinese patent application publication No. CN115937064A, "A light field image quality evaluation method based on spatial and angular measurement," includes: extracting spatial features of the light field image from the sub-aperture image array of the 4-dimensional light field image using a multiband local binary pattern algorithm; extracting features of the microlens image array of the 4-dimensional light field image, using a feature extraction algorithm based on entropy-weighted local phase quantization, as the angular features of the light field image; fusing the spatial features and the angular features of the light field image into a one-dimensional feature vector; and obtaining the quality score of the light field image by applying a support vector regression pooling operation to the one-dimensional feature vector. Because light field image quality is limited by both spatial quality and angular consistency, the method extracts features separately from the sub-aperture images and the macro-pixels so as to quantify both effectively. The method still has the following technical problems:

The method still relies on manual feature extraction, after which the quality score of the image must be obtained by a regression step such as support vector regression.

The method does not adapt well to images with different distortion types and may score poorly on certain distortions.

Disclosure of Invention

In view of the above defects in the prior art, the invention provides a no-reference light field image quality evaluation method and system based on deep meta-learning.

According to one aspect of the present invention, there is provided a reference-free light field image quality evaluation method based on deep meta-learning, including:

respectively constructing a training set containing a non-reference light field image quality evaluation task and a test set containing a target evaluation task;

constructing a deep meta-learning model containing a Swin Transformer network structure;

based on the training set, learning prior knowledge of distortion to obtain a quality prior model;

fine-tuning the quality prior model on the test set to obtain a final light field image quality evaluation model;

and performing light field image quality evaluation on the light field image to be evaluated by using the light field image quality evaluation model.

Preferably, constructing the training set containing no-reference light field image quality assessment tasks and the test set containing the target assessment task includes:

dividing light field images into a plurality of no-reference light field image quality assessment tasks according to their distortion types to construct the training set, wherein, within the same dataset, light field images with the same distortion type are placed in one group and each group serves as one light field image quality assessment task;

taking light field images of unknown distortion type as the target evaluation task to construct the test set;

and converting the light field images of the no-reference light field image quality evaluation tasks and of the target evaluation task into macro-pixel image (MacPI) form to obtain the training set and test set required for no-reference light field image quality evaluation.
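The grouping of images into per-distortion tasks can be sketched in a few lines of Python. This is only an illustrative sketch: the record layout `(image_id, dataset, distortion_type, quality_score)`, the function names, the sample identifiers and scores, and the 50/50 support/query split are assumptions made for the example and are not specified in the text.

```python
from collections import defaultdict


def build_tasks(samples):
    """Group labelled light field images into one task per (dataset, distortion type).

    `samples` is an iterable of (image_id, dataset, distortion_type, quality_score)
    tuples; this record layout is a hypothetical choice for the sketch.
    """
    tasks = defaultdict(list)
    for image_id, dataset, distortion_type, score in samples:
        tasks[(dataset, distortion_type)].append((image_id, score))
    return dict(tasks)


def split_support_query(task_samples, support_ratio=0.5):
    """Split one task into a support set and a query set.

    The split ratio is an assumption; the text only says each task has both sets.
    """
    k = max(1, int(len(task_samples) * support_ratio))
    return task_samples[:k], task_samples[k:]


if __name__ == "__main__":
    # Hypothetical records; identifiers and scores are made up for illustration.
    records = [
        ("lf_001", "Win5-LID", "HEVC", 3.2),
        ("lf_002", "Win5-LID", "HEVC", 2.1),
        ("lf_003", "Win5-LID", "JPEG2000", 4.0),
        ("lf_004", "SHU", "JPEG2000", 1.5),
    ]
    for key, items in build_tasks(records).items():
        print(key, split_support_query(items))
```

Keying tasks by both dataset and distortion type keeps same-named distortions from different datasets (for example JPEG2000) in separate tasks, which matches the "within the same dataset" wording.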

Preferably, constructing the deep meta-learning model containing a Swin Transformer network structure includes:

constructing a Swin Transformer network and a fully connected layer network; wherein:

the Swin Transformer network comprises a PatchEmbed module, four BasicLayer modules and a Linear layer connected in sequence; the first three BasicLayer modules each comprise Swin Transformer block sub-modules and a Patch Merging sub-module, and the fourth BasicLayer module comprises Transformer block sub-modules; the fully connected layer network is connected last;

the PatchEmbed module divides an input light field image of dimension H × W × C into N patches of size P × P × C and flattens them, where $N = HW/P^2$; the patches are then input to the Swin Transformer block sub-modules of the first BasicLayer module;

the Patch Merging sub-module reduces the image resolution after the Swin Transformer block sub-modules and adjusts the number of channels, forming a hierarchical representation;

the fully connected layer network maps the image features output by the Swin Transformer network to the output result, and the final output value is the predicted quality score of the light field image.

Preferably, the number of Swin Transformer block sub-modules in the first three basicLayer modules is 2, 2 and 18, respectively.

Preferably, the number of Transformer block sub-modules in the fourth basicLayer module is 2.

Preferably, the fully connected layer network comprises 3 fully connected layers; the features output by the Swin Transformer network are mapped layer by layer through the 3 fully connected layers to the final output result, and the final output value is the predicted quality score of the light field image.

Preferably, learning the prior knowledge of distortion based on the training set includes:

taking a light field image x in the training set as input and passing it sequentially through the Swin Transformer network and the fully connected layer network to generate an output value $\hat{y}$, where $\hat{y}$ is the predicted quality score of the light field image x, defined as:

$$\hat{y} = f_\theta(x)$$

where θ denotes the initialized model parameters and $f_\theta$ is the deep meta-learning model containing the Swin Transformer network structure;

using the mean squared error (MSE) loss function $L$, training proceeds by gradient descent so that the difference between the predicted quality score $\hat{y}$ and the true quality score y of the input image x is minimized; the mean squared error loss function $L$ is:

$$L = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2$$

where n is the number of input images in each task and i indexes the current input image;
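As a small worked example of the loss, the following Python sketch computes the mean squared error between predicted and true quality scores for one task. The function name and the sample scores are hypothetical; only the averaging of squared differences over the n images of a task comes from the text.

```python
def mse_loss(predicted, target):
    """Mean squared error over the n images of one task."""
    if not predicted or len(predicted) != len(target):
        raise ValueError("predicted and target must be non-empty and equally long")
    n = len(predicted)
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / n


if __name__ == "__main__":
    # Two hypothetical images: one prediction is off by 2, the other is exact.
    print(mse_loss([3.0, 5.0], [1.0, 5.0]))  # (2^2 + 0^2) / 2 = 2.0
```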

dividing the no-reference light field image quality assessment tasks of different distortion types in the light field image dataset into a meta-training set that provides a support set and a query set for each of the N no-reference light field image quality assessment tasks;

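using the Adam optimizer on the support set to iteratively update the neural network weights, where the gradient $g_t$ at Adam time step t, computed at the current parameters $\theta_t$, is:

$$g_t = \nabla_{\theta} L(\theta_t)$$

the update formula for the current model parameters $\theta_t$ is:

$$\theta' = \theta_t - \alpha \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$

where $\theta'$ is the value of the model parameters after the update, α is the learning rate, ε is set to $10^{-8}$ to avoid division by zero, $m_t$ is the momentum and $v_t$ is the variance of the gradient variation; $m_t$ and $v_t$ are the first- and second-moment estimates of the gradient, and $\hat{m}_t$ and $\hat{v}_t$ are their bias-corrected values, expressed as:

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2$$

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}$$

where $\beta_1$ and $\beta_2$ are exponential decay rates that control, respectively, the weighting of the previous gradients and of the previous squared gradients; $\beta_1^t$ and $\beta_2^t$ are $\beta_1$ and $\beta_2$ at the t-th time step (raised to the power t), and their values tend to 0 as t increases;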
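A single Adam step for one scalar parameter can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name is hypothetical, the step counter `t` is assumed to start at 1, and the default decay rates 0.9 and 0.999 are the values commonly used with Adam rather than values stated in the text (only ε = 1e-8 is given there).

```python
import math


def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter.

    Returns the updated parameter together with the new first- and
    second-moment estimates. `t` is the 1-based time step.
    """
    m = beta1 * m + (1.0 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1.0 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - beta1 ** t)                # bias correction
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


if __name__ == "__main__":
    theta, m, v = 1.0, 0.0, 0.0
    theta, m, v = adam_step(theta, grad=4.0, m=m, v=v, t=1, lr=0.1)
    print(theta, m, v)
```

On the first step the bias-corrected ratio is approximately the sign of the gradient, so the parameter moves by about one learning rate regardless of the gradient's magnitude.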

using the Adam optimizer on the query set to perform a second update of the model parameters, giving the task-updated parameters $\theta_i$;

aggregating the updates of the K no-reference light field image quality assessment tasks in the light field image dataset to obtain the final model parameters θ, where $\theta_i$ is the parameter value after the i-th task is updated; the final parameters θ of the model are obtained from all of these updates, and the resulting deep meta-learning model $f_\theta$ containing the Swin Transformer network structure is the quality prior model.
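For orientation, the query-set update and the aggregation over tasks could take the following form. This is a sketch under stated assumptions, following a common bi-level meta-learning pattern, and is not taken from the text: the primed moments $\hat{m}'_t$ and $\hat{v}'_t$ (Adam moment estimates computed on the query set starting from $\theta'$) and the outer step size $\beta$ are notation introduced here for illustration only.

```latex
% Second (query-set) update, starting from the support-set result \theta'
\theta_i = \theta' - \alpha \, \frac{\hat{m}'_t}{\sqrt{\hat{v}'_t} + \epsilon}

% One possible aggregation over the K tasks (outer step size \beta is assumed)
\theta \leftarrow \theta - \beta \, \frac{1}{K} \sum_{i=1}^{K} \left( \theta - \theta_i \right)
```

With $\beta = 1$ the second line reduces to averaging the task-updated parameters $\theta_i$; smaller values move θ only part of the way toward that average.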

Preferably, fine-tuning the quality prior model on the test set includes:

selecting M light field images with quality score values from the test set to serve as training samples;

using the mean squared error loss function to minimize the error between the predicted quality score $\hat{y}$ of each light field image and its true quality score y;

updating the network parameters of the quality prior model with the Adam optimizer, fine-tuning the quality prior model to obtain the light field image quality assessment model that gives the best predicted quality score $\hat{y}$.

According to another aspect of the present invention, there is provided a reference-free light field image quality evaluation system based on deep meta-learning, comprising:

the data set construction module is used for respectively constructing a training set containing a non-reference light field image quality assessment task and a test set containing a target assessment task;

the model construction module is used for constructing a deep meta-learning model containing a Swin Transformer network structure;

the model meta-training module is used for learning prior knowledge of distortion based on the training set to obtain a quality prior model;

the model fine-tuning module is used for fine-tuning the quality prior model on the test set to obtain a final light field image quality evaluation model;

and the quality evaluation module is used for evaluating the light field image quality of the light field image to be evaluated by utilizing the light field image quality evaluation model.

According to a third aspect of the present invention there is provided a computer terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor being operable to perform a method or run a system as claimed in any one of the preceding claims when the program is executed.

According to a fourth aspect of the present invention there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor is operable to perform a method or run a system according to any of the preceding claims.

Due to the adoption of the technical scheme, compared with the prior art, the invention has at least one of the following beneficial effects:

Unlike traditional machine learning methods that extract image features manually, the invention adopts deep learning: by learning the internal regularities and representation levels of the sample image data, the model obtains interpretive information about the image data during learning and can deliver the required performance autonomously. With the deep learning scheme, knowledge of light field image distortion is learned automatically and distorted images can be evaluated, which removes the inconvenience of manual feature extraction in traditional machine learning.

The invention adopts a deep meta-learning model, which addresses the overfitting or underfitting caused in deep learning by the small-sample nature of light field images and gives better generalization to images with different distortion types. Because meta-learning is used, quality evaluation can be performed for the various distortions in light field images, addressing both the fitting problems that the small-sample nature of light field datasets can cause in deep learning and the poor adaptation to varied distortions.

The invention converts the input image into macro-pixel image (MacPI) form. A MacPI gathers, for each fixed spatial position, the samples from all angular positions, so it combines the two kinds of information well and makes information extraction more comprehensive. Because the MacPI form is used, the spatial and angular information of the light field image is well combined, which addresses the problem that a model performs poorly on certain angular distortions because it ignores the angular information of the image.

The invention adds a Swin Transformer network structure, whose attention mechanism brings the model into line with the visual characteristics of the human eye and greatly improves the experimental results. Because the Swin Transformer network structure is used, the evaluation conforms to human visual characteristics, which addresses the problem of unrealistic results or low performance caused by models that do not conform to those characteristics.

The invention adopts a deep learning scheme, can realize full-automatic learning of the knowledge of the distortion of the light field image, evaluates the distorted image and solves the inconvenience of manually extracting the characteristics in the traditional machine learning.

The invention adopts meta learning, can realize quality evaluation of various distortions in the light field image, and does not generate inadaptability caused by distortion types.

Drawings

Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments, given with reference to the accompanying drawings in which:

FIG. 1 is a workflow diagram of a method for non-reference light field image quality assessment based on deep meta-learning in a preferred embodiment of the present invention.

FIG. 2 is a schematic diagram of the operation of the no-reference light field image quality evaluation method based on deep meta-learning in a preferred embodiment of the present invention.

FIG. 3 is a schematic diagram of the Swin Transformer network in a preferred embodiment of the present invention.

FIG. 4 is a scatter plot of the experimental results of the present invention; the abscissa is the quality score predicted by the model, and the ordinate is the subjective evaluation value. Here, (a) is the test result on the Win5-LID dataset, (b) is the test result on the SHU dataset, (c) is the test result for different distortion types in the Win5-LID dataset, and (d) is the test result for different distortion types in the SHU dataset.

FIG. 5 is a schematic diagram showing the constituent modules of a non-reference light field image quality assessment system based on deep meta-learning in a preferred embodiment of the present invention.

Detailed Description

The following describes embodiments of the present invention in detail: the embodiment is implemented on the premise of the technical scheme of the invention, and detailed implementation modes and specific operation processes are given. It should be noted that variations and modifications can be made by those skilled in the art without departing from the spirit of the invention, which falls within the scope of the invention.

An embodiment of the present invention provides a no-reference Light Field Image Quality Assessment (LFIQA) method based on deep meta-learning. When assessing the quality of variously distorted images, the method learns distortion knowledge from previous assessment tasks and then adapts easily to subsequent unknown distortions. Extensive experiments show that the method outperforms other advanced LFIQA models by a large margin and generalizes well across distortions.

As shown in fig. 1, the no-reference light field image quality evaluation method based on deep meta-learning provided in this embodiment may include:

S1, respectively constructing a training set containing no-reference light field image quality evaluation tasks and a test set containing a target evaluation task;

S2, constructing a deep meta-learning model containing a Swin Transformer network structure;

S3, learning prior knowledge of various distortions based on the training set to obtain a quality prior model;

S4, fine-tuning the quality prior model on the test set to obtain a final light field image quality evaluation model;

S5, performing light field image quality evaluation on the light field image to be evaluated by using the light field image quality evaluation model.

In a preferred embodiment of S1, constructing a training set containing no-reference light field image quality assessment tasks and a test set containing target assessment tasks, respectively, may include:

dividing a light field image into a plurality of non-reference light field image quality assessment tasks according to distortion types of the light field image, and constructing a training set;

converting the light field images of the no-reference light field image quality evaluation tasks and of the target evaluation task into MacPI form to obtain the training set and test set required for no-reference light field image quality evaluation;

Wherein:

in the training set, dividing light field images with the same distortion type in the same data set into a group, wherein each group is used as a light field image quality assessment task;

in the test set, the distortion type of the light field images in the dataset is unknown.

In this preferred embodiment, the light field image dataset used to construct the training set and the test set may comprise:

In the Win5-LID dataset, the distortion types are JPEG2000, HEVC, LINEAR, NN and two CNN models, for a total of six distortions.

In the SHU dataset, the distortion types are JPEG2000, JPEG, Gaussian blur, white noise and motion blur, for a total of five distortions.

In the NBU-LF1.0 dataset, the distortion types are NNI, BI, EPICNN, Zhang and VDSR, for a total of five distortions.

Within the same dataset, light field images of the same distortion type are placed in one group, and each group serves as one evaluation task. The model is trained on two of the datasets and then fine-tuned and tested on the remaining one; for example, training on the Win5-LID and NBU-LF1.0 datasets and testing on the SHU dataset, or training on the SHU and NBU-LF1.0 datasets and testing on the Win5-LID dataset. The task division for the training set is as described above; for the test set, the distortion type is unknown.
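The cross-dataset protocol (train on two datasets, fine-tune and test on the third) can be written down as a short Python sketch. The function name is hypothetical, and because the text does not name the two CNN-based distortions in Win5-LID, the labels "CNN-1" and "CNN-2" are placeholders introduced only for this example.

```python
# Distortion types per dataset as listed in the text.
# "CNN-1" and "CNN-2" are placeholder labels for the two unnamed CNN models.
DATASETS = {
    "Win5-LID": ["JPEG2000", "HEVC", "LINEAR", "NN", "CNN-1", "CNN-2"],
    "SHU": ["JPEG2000", "JPEG", "Gaussian blur", "white noise", "motion blur"],
    "NBU-LF1.0": ["NNI", "BI", "EPICNN", "Zhang", "VDSR"],
}


def cross_dataset_splits(datasets):
    """For each held-out target dataset, list the meta-training tasks.

    A task is identified by (dataset name, distortion type) and comes only
    from the datasets other than the target.
    """
    splits = {}
    for target in datasets:
        splits[target] = [
            (name, distortion)
            for name, distortions in datasets.items()
            if name != target
            for distortion in distortions
        ]
    return splits


if __name__ == "__main__":
    for target, tasks in cross_dataset_splits(DATASETS).items():
        print(f"test on {target}: {len(tasks)} meta-training tasks")
```

Holding out SHU, for instance, leaves the six Win5-LID tasks and the five NBU-LF1.0 tasks for meta-training.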

In a preferred embodiment of S2, constructing a deep meta-learning model containing a Swin Transformer network structure includes:

a Swin Transformer network comprising a PatchEmbed module, four BasicLayer modules and a Linear layer connected in sequence; the first three BasicLayer modules each comprise Swin Transformer block sub-modules and a Patch Merging sub-module, and the fourth BasicLayer module comprises Transformer block sub-modules; a Linear layer is connected last;

the PatchEmbed module divides an input light field image of dimension H × W × C into N patches of size P × P × C and flattens them, where $N = HW/P^2$; the patches are then input to the Swin Transformer block sub-modules of the first BasicLayer module;

the fully connected layer network maps the image features output by the Swin Transformer network to the output result, and the final output value is the predicted quality score of the light field image.

Further, in a preferred embodiment, the number of Swin Transformer block sub-modules in the first three BasicLayer modules is 2, 2 and 18, respectively.

Further, in a preferred embodiment, the number of Transformer block sub-modules in the fourth BasicLayer module is 2.

Further, in a preferred embodiment, the fully connected layer network includes 3 fully connected layers; the features output by the Swin Transformer network are mapped layer by layer through the 3 fully connected layers to the final output result, for example from 1000 to 1024 to 512 to 1, and the final output value is the predicted quality score of the light field image.
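To make the layer arithmetic concrete, the Python sketch below traces token counts and channel widths through the four stages and counts the parameters of the three fully connected layers. The block counts (2, 2, 18, 2) and the widths 1000, 1024, 512, 1 come from the text; the 224 × 224 input size, patch size 4, initial embedding width 128, and the halving of resolution with doubling of channels at each Patch Merging step are assumptions typical of Swin Transformer configurations, not values stated here.

```python
def swin_stage_shapes(height, width, patch=4, embed_dim=128, depths=(2, 2, 18, 2)):
    """Return (stage, blocks, tokens, channels) for each BasicLayer stage.

    Assumes each Patch Merging step halves height and width and doubles the
    channel count, and that the last stage has no Patch Merging.
    """
    h, w, c = height // patch, width // patch, embed_dim
    shapes = []
    for stage, depth in enumerate(depths, start=1):
        shapes.append((stage, depth, h * w, c))
        if stage < len(depths):
            h, w, c = h // 2, w // 2, c * 2
    return shapes


def fc_head_param_count(widths=(1000, 1024, 512, 1)):
    """Count weights plus biases of the fully connected head."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(widths, widths[1:]))


if __name__ == "__main__":
    for row in swin_stage_shapes(224, 224):
        print(row)
    print(fc_head_param_count())
```

Under these assumptions the number of patches after PatchEmbed is 224 × 224 / 4² = 3136, in line with N = HW/P².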

In a preferred embodiment of S3, a priori knowledge of various distortions is learned based on the training set, including:

training the model on the divided no-reference light field image quality evaluation tasks: after the neural network model is initialized, the model parameters are adjusted on the support set and the query set of the training set by a two-level gradient descent method; specifically:

taking the light field image x in the training set as input and passing it through the Swin Transformer network and the fully connected layer network to generate an output value $\hat{y}$, where $\hat{y}$ is the predicted quality score of the light field image x, defined as:

$$\hat{y} = f_\theta(x)$$

using the mean squared error (MSE) loss function $L$, training proceeds by gradient descent so that the difference between the predicted quality score $\hat{y}$ and the true quality score y of the input image x is minimized; the mean squared error loss function is:

$$L = \frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2$$

where n is the number of input images in each task and i indexes the input images;

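using the Adam optimizer on the support set to iteratively update the neural network weights, where the gradient $g_t$ at Adam time step t, computed at the current parameters $\theta_t$, is:

$$g_t = \nabla_{\theta} L(\theta_t)$$

the update formula for the current model parameters $\theta_t$ is:

$$\theta' = \theta_t - \alpha \frac{\hat{m}_t}{\sqrt{\hat{v}_t} + \epsilon}$$

where $\theta'$ is the value of the updated model parameters, α is the learning rate, ε is set to $10^{-8}$ to avoid division by zero, $m_t$ is the momentum and $v_t$ is the variance of the gradient variation; $m_t$ and $v_t$ are the first- and second-moment estimates of the gradient, with exponential decay rates $\beta_1$ and $\beta_2$ and bias-corrected values $\hat{m}_t$ and $\hat{v}_t$, expressed as:

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1)\, g_t, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2)\, g_t^2$$

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}$$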

the updates of the K no-reference light field image quality assessment tasks in the light field image dataset are aggregated to obtain the final model parameters θ, where $\theta_i$ is the parameter value after the i-th task is updated; the final parameters θ of the model are obtained from all of these updates, and the resulting deep meta-learning model $f_\theta$ containing the Swin Transformer network structure is the quality prior model.

In the preferred embodiment, the process is iterative, and θ updated in each step is used for the next task, then θ is updated again until all tasks are trained, and finally θ updated is the last parameter of the model.
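The iterative, task-by-task procedure can be illustrated with a toy Python sketch. Several simplifications are assumptions made purely for illustration: the model is a one-parameter linear predictor `theta * x` instead of the Swin Transformer network, plain gradient descent stands in for Adam, and the outer rule `theta - outer_lr * (theta - theta_i)` is one possible way to carry the task result forward (with `outer_lr=1.0` the parameters after one task are simply used for the next).

```python
def mse_grad(theta, samples):
    """Gradient of the mean squared error of y_hat = theta * x w.r.t. theta."""
    n = len(samples)
    return 2.0 / n * sum((theta * x - y) * x for x, y in samples)


def inner_update(theta, samples, lr, steps):
    """A few gradient-descent steps on one set (plain GD stands in for Adam)."""
    for _ in range(steps):
        theta -= lr * mse_grad(theta, samples)
    return theta


def meta_train(tasks, theta=0.0, inner_lr=0.05, outer_lr=1.0, steps=5):
    """Two-level update over tasks given as (support_set, query_set) pairs."""
    for support, query in tasks:
        theta_prime = inner_update(theta, support, inner_lr, steps)   # first update
        theta_i = inner_update(theta_prime, query, inner_lr, steps)   # second update
        theta = theta - outer_lr * (theta - theta_i)                  # carry forward
    return theta


if __name__ == "__main__":
    # Toy tasks whose samples all follow y = 2x, so theta should approach 2.
    toy_tasks = [
        ([(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0)]),
        ([(1.0, 2.0)], [(2.0, 4.0), (1.0, 2.0)]),
    ]
    print(meta_train(toy_tasks))
```

The parameter returned after the last task plays the role of the quality prior model's parameters, which the fine-tuning stage then adapts on the target task.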

In a preferred embodiment of S4, fine tuning the quality a priori model over the target non-reference light field image quality assessment task includes:

based on the obtained quality prior model, the parameters are optimized by fine-tuning the model on the target no-reference light field image quality evaluation task; specifically:

selecting M light field images with quality scores from the target no-reference light field image quality evaluation task to serve as training samples, where the quality scores are the values provided with the official dataset;

updating the network parameters of the quality prior model with the Adam optimizer, fine-tuning the quality prior model to obtain the light field image quality assessment model that gives the best predicted quality score $\hat{y}$.

The technical scheme provided by the embodiment of the invention is further described in detail below.

As shown in fig. 2, the above embodiment of the present invention first builds a model that learns prior knowledge of light field image distortion types, and then fine-tunes the model on a new, unknown distortion type to obtain the final light field image quality evaluation model.

The model consists of two networks: a Swin Transformer network and a fully connected layer network.

1) The Swin Transformer network consists of four BasicLayer modules and a Linear layer. The first three BasicLayer modules consist of Swin Transformer block sub-modules and Patch Merging sub-modules, with 2, 2 and 18 Swin Transformer block sub-modules respectively. The fourth BasicLayer module consists of 2 Transformer block sub-modules and has no Patch Merging sub-module; a Linear layer is attached last, as shown in fig. 3.

A distorted light field image is input, cut into patches by the PatchEmbed module and embedded into vectors, which are then fed into the BasicLayer modules. The Patch Merging sub-module reduces the image resolution after the Swin Transformer block sub-modules and adjusts the number of channels, forming a hierarchical design that also saves some computation. After passing through the Swin Transformer network, the dimensions of the image change from H × W to 1 × 1000.

2) The fully connected layer network has three layers in total, which map the output of the preceding network through 1000 -> 1024 -> 512 -> 1; the final output is the predicted quality score of the light field image.

Training of the model consists of two phases: a meta training stage and a fine tuning stage.

1) Meta-training stage: the model is trained on the partitioned NR-IQA training tasks; after the network model is initialized, the model parameters are adjusted on the support set and the query set of the training set by a two-level gradient descent method.

2) Fine-tuning stage: based on the previously learned prior model, the parameters are optimized by fine-tuning the model on the target task.

First, in the data preprocessing stage, the light field image is converted into a sub-aperture image (SAI) array composed of multiple sub-aperture images, and the SAI array is converted into macro-pixel image (MacPI) form, in which the pixels at the same spatial position in every SAI are grouped together and arranged according to their angular positions. The macro-pixel form combines the spatial and angular information of the light field image well and makes full use of the information the image contains. Deep training is then performed in two steps. First, according to the distortion types of the images in the light field datasets, the images are divided into several tasks that form the meta-training set; the training set is divided into a support set and a query set, and a prior model is trained on it by a two-level gradient descent method. Then the previously trained model is fine-tuned on the target NR-IQA task to obtain the final quality model.
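The SAI-to-MacPI rearrangement can be sketched in plain Python as follows. The indexing convention `sai[u][v][h][w]` (angular row, angular column, spatial row, spatial column), the function name and the nested-list representation are assumptions chosen for a dependency-free illustration; a practical implementation would more likely use array reshaping.

```python
def sai_to_macpi(sai):
    """Rearrange a U x V array of H x W sub-aperture images into a MacPI.

    Each spatial position (h, w) becomes a U x V macro-pixel holding the
    samples from all angular positions, so the output has H*U rows and
    W*V columns.
    """
    n_u, n_v = len(sai), len(sai[0])
    n_h, n_w = len(sai[0][0]), len(sai[0][0][0])
    macpi = [[None] * (n_w * n_v) for _ in range(n_h * n_u)]
    for u in range(n_u):
        for v in range(n_v):
            for h in range(n_h):
                for w in range(n_w):
                    macpi[h * n_u + u][w * n_v + v] = sai[u][v][h][w]
    return macpi


if __name__ == "__main__":
    # Tiny 2 x 2 angular, 2 x 2 spatial example; each sample stores its own
    # (u, v, h, w) coordinates so the rearrangement is easy to inspect.
    sai = [[[[(u, v, h, w) for w in range(2)] for h in range(2)]
            for v in range(2)] for u in range(2)]
    for row in sai_to_macpi(sai):
        print(row)
```

In the printed result, every 2 × 2 block shares the same spatial coordinates (h, w) and differs only in the angular coordinates (u, v), which is how the MacPI form keeps spatial and angular information side by side.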

1) Training phase of meta learning

This stage is similar to the pre-training stage of deep-learning image processing: an initialized model is first constructed and then pre-trained on the various LFIQA distortion tasks. The deep regression network of the model consists of a Swin Transformer network and a fully connected layer network. Swin Transformer is a recent Transformer architecture that uses a shifted-window mechanism and a hierarchical structure; it has become a new backbone in machine vision and reaches state-of-the-art (SOTA) level on machine vision tasks such as image classification and object detection. Its attention mechanism is also consistent with the characteristics of human visual perception in image quality assessment, so it is added to the model of the present invention to enhance the model's evaluation performance. A fully connected layer network is then added after the Transformer network, and the output of the regression network is the light field image quality score.

Specifically, a light field image $x$ is used as input and is fed through the two networks (the Swin Transformer network and the fully connected layer) to generate an output value $\hat{y}$. This $\hat{y}$ is the predicted quality score of the image, defined as:

$$\hat{y} = f_\theta(x)$$

where $\theta$ denotes the initialized network parameters and $f_\theta$ is the network evaluation model, which is trained continuously by gradient descent so that its estimate $\hat{y}$ approaches the true score $y$ of the image. A mean squared error (MSE) loss function is used, and training continues so that the difference between the predicted quality score $\hat{y}$ and the true quality score $y$ of the input image $x$ becomes smaller and smaller. This is the most commonly used regression loss in deep learning; it is the sum of squares of the differences between the predicted value $f_\theta(x)$ and the target value $y$. The formula is as follows:

$$L = \lVert f_\theta(x) - y \rVert_2^2$$

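First, the NR-LFIQA tasks with different distortion types in the light field dataset are organized into a meta-training set $D_{meta} = \{D_s^{\tau_n}, D_q^{\tau_n}\}_{n=1}^{N}$, where $D_s^{\tau_n}$ and $D_q^{\tau_n}$ are respectively the support set and the query set of the $n$-th task $\tau_n$, and there are $N$ NR-LFIQA tasks in total.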

Since the quality assessment model is relatively complex compared with a recognition model, a stochastic gradient descent method is used to optimize it. Adam is a first-order optimization algorithm that can replace the traditional stochastic gradient descent procedure; it dynamically adjusts the learning rate of each parameter during training using first-moment and second-moment estimates of the gradient. The present invention therefore uses the Adam optimizer on the support set to update the neural network weights iteratively. The gradient used by Adam for the update from time $t$ to time $t+1$ is:

$$g_t = \nabla_\theta L(\theta_t)$$

The update formula is as follows:

$$\theta_{t+1} = \theta_t - \frac{\alpha}{\sqrt{\hat{v}_t} + \epsilon}\,\hat{m}_t$$

where $\alpha$ is the learning rate and $\epsilon$ is set to $10^{-8}$ to avoid division by zero. $m_t$ is the momentum and $v_t$ is the variance of the gradient; they are the first-moment and second-moment estimates of the gradient, respectively, and $\hat{m}_t$ and $\hat{v}_t$ are their bias-corrected values, expressed as:

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$$

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}$$

The coefficients $\beta_1$ and $\beta_2$ are exponential decay rates, which control the weighting between the momentum and the current gradient and the influence of the previous squared gradients, respectively.
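The Adam update described above can be sketched for a single scalar parameter. This is a minimal illustration rather than the patent's implementation: the function name `adam_step`, the toy loss $(\theta - 3)^2$, and the default values `beta1=0.9` and `beta2=0.999` are assumptions (the source states only that $\epsilon = 10^{-8}$).

```python
import math


def adam_step(theta, grad, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the 1-based step index."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (momentum) estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction of the first moment
    v_hat = v / (1 - beta2 ** t)                # bias correction of the second moment
    theta = theta - alpha * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v


# Toy problem (assumed for illustration): minimize L(theta) = (theta - 3)^2.
theta, m, v = 0.0, 0.0, 0.0
for t in range(1, 501):
    grad = 2.0 * (theta - 3.0)                  # dL/dtheta at the current theta
    theta, m, v = adam_step(theta, grad, m, v, t, alpha=0.1)
```

On the very first step the bias-corrected ratio $\hat{m}_1 / \sqrt{\hat{v}_1}$ has magnitude close to 1, so the parameter moves by roughly $\alpha$ regardless of the gradient scale; this is the per-parameter step-size normalization the paragraph refers to.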

After the model parameters have been optimized on the support set, the present invention performs a second parameter optimization on the query set so that the optimized model also performs well there. The model parameters $\theta'$ obtained on the support set are likewise updated on the query set using the Adam optimizer.

The update gradients of the K tasks in each batch will then be aggregated to update the final prior model, which is defined as:

Through the above steps, the method iteratively samples different distortion tasks from the meta-training set to train the deep meta-regression network $f_\theta$. Finally, a prior model covering LFIs with different distortions is obtained.
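The two-stage procedure above (Adam on the support set, then Adam on the query set, then aggregation over K sampled tasks) can be sketched on a toy problem. Everything specific in this sketch is an assumption made for illustration: the scalar model $f_w(x) = w \cdot x$ stands in for the deep regression network, the synthetic tasks stand in for distortion types, Adam's moment estimates are reset for each stage, and the aggregation step averages the per-task parameter changes (a Reptile-style rule), because the aggregation formula is not reproduced in the text.

```python
import random


def mse_grad(w, batch):
    """Gradient of the mean squared error of f_w(x) = w * x over (x, y) pairs."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def adam_updates(w, batch, steps, alpha=0.02, b1=0.9, b2=0.999, eps=1e-8):
    """Run a few Adam steps on one batch, starting from fresh moment estimates."""
    m = v = 0.0
    for t in range(1, steps + 1):
        g = mse_grad(w, batch)
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        w -= alpha * (m / (1 - b1 ** t)) / ((v / (1 - b2 ** t)) ** 0.5 + eps)
    return w


def meta_train(tasks, w=0.0, iterations=100, k=2, inner_steps=3, meta_lr=1.0, seed=0):
    """Learn a prior parameter w from (support, query) tasks."""
    rng = random.Random(seed)
    for _ in range(iterations):
        adapted = []
        for support, query in rng.sample(tasks, k):       # sample K tasks per batch
            w_i = adam_updates(w, support, inner_steps)   # stage 1: support set
            w_i = adam_updates(w_i, query, inner_steps)   # stage 2: query set
            adapted.append(w_i)
        # ASSUMPTION: aggregate by averaging the per-task parameter changes.
        w -= meta_lr * sum(w - w_i for w_i in adapted) / k
    return w


def make_task(slope):
    """Hypothetical 'distortion task': targets follow y = slope * x."""
    data = [(x, slope * x) for x in (1.0, 2.0, 3.0, 4.0)]
    return data[:2], data[2:]                             # (support set, query set)


tasks = [make_task(s) for s in (1.8, 2.0, 2.2)]
prior_w = meta_train(tasks)
```

The learned `prior_w` ends up near the range of slopes shared by the tasks, which plays the role of the quality prior model: a starting point from which a few further Adam steps on a new task's labelled samples (the fine-tuning stage) are enough to adapt.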

2) Fine tuning stage of deep meta learning

At this stage, the model is fine-tuned on a previously unseen NR-LFIQA task. The prior model has been trained on specific distortion tasks and has learned prior knowledge of various distortions, but it still lacks light training on the target task. Therefore, M light field images with subjective scores are taken from the target NR-LFIQA task and used to fine-tune the model. As before, the MSE loss function is used to minimize the difference between the output value $\hat{y}$ and the true value $y$, and the Adam optimizer is used to update the model parameters. For a distorted LFI $x$ in the query set, the fine-tuned quality assessment model $f_\theta$ yields the best predicted quality score $\hat{y}$. Both stages of the model use the same learning method and no additional parameters need to be learned, so the model runs efficiently and fits well.

To study the effectiveness of the proposed method, evaluation experiments were carried out on two public datasets, Win5-LID and SHU. The Spearman rank-order correlation coefficient (SROCC), the Pearson linear correlation coefficient (PLCC), and the root mean square error (RMSE) were used as evaluation criteria. Table 1 shows the performance of the proposed quality assessment method on the Win5-LID and SHU datasets together with comparison results against other state-of-the-art IQA methods, with the best performance shown in bold; nine representative state-of-the-art comparison metrics were selected. The experimental results show that the proposed method is superior in overall performance to the compared IQA methods. In addition, to verify that the proposed deep meta-learning model can accommodate various distortion types, the SROCC values for the different distortion types tested on the Win5-LID dataset are listed in Table 2. The proposed method maintains high performance on all distortion types. To show the experimental performance more directly, scatter plots of the overall performance of the method and of the different distortions are drawn, as shown in Fig. 4(a)-(d). Based on a large number of experimental results, it can be concluded that the no-reference light field image quality evaluation method based on deep meta-learning has strong practical value for LFI quality evaluation.
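The three evaluation criteria can be computed as in the following minimal sketch, which uses only the Python standard library. The function names and the sample scores are illustrative assumptions, not data from the experiments, and the metrics are computed directly on raw predictions without any score-mapping step.

```python
import math


def plcc(a, b):
    """Pearson linear correlation coefficient between two score lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    std_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    std_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    return cov / (std_a * std_b)


def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def srocc(a, b):
    """Spearman rank-order correlation: Pearson correlation of the ranks."""
    return plcc(ranks(a), ranks(b))


def rmse(a, b):
    """Root mean square error between predictions and ground truth."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))


# Hypothetical predicted scores and subjective scores for five images.
predicted = [2.0, 3.0, 4.0, 4.1, 5.0]
subjective = [2.1, 2.9, 4.2, 3.8, 4.9]
scores = (srocc(predicted, subjective), plcc(predicted, subjective),
          rmse(predicted, subjective))
```

SROCC measures how well the predicted ordering matches the subjective ordering, PLCC measures linear agreement, and RMSE measures absolute prediction error; higher SROCC and PLCC and lower RMSE indicate better performance.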

TABLE 1 comparison of Performance on Win5-LID and SHU datasets

TABLE 2 SROCC Performance for different distortion types on Win5-LID datasets

To further investigate whether the MacPI form and the Swin Transformer network structure improve model performance in the proposed method, two ablation experiments were performed. First, with the input kept in MacPI form, the network structure was replaced by ResNet, the most commonly used network in deep learning, to verify the effectiveness of the Swin Transformer network. Second, the Swin Transformer network was applied directly to the original SAI form to verify whether switching to MacPI form improves the experimental results. The experimental results are shown in Table 3. The results improve to a certain extent once the Swin Transformer network structure is used. The improvement is more obvious on the Win5-LID dataset, whereas on the SHU dataset very high performance can already be achieved with an ordinary deep network. One possible reason is that the spatial distortion in the SHU dataset is stronger, so high performance is easily achieved with an ordinary network architecture. Switching to MacPI form also yields a significant gain on both datasets. Because a MacPI contains both the spatial and the angular information of the light field image, deep learning on MacPI input adapts well to the characteristics of light field images and therefore gives better experimental results. In summary, deep learning with a network built from Swin Transformers on MacPI input can greatly improve the overall performance of the model.

TABLE 3 Validation experiments on the effect of MacPI and Swin Transformer on experimental performance

An embodiment of the present invention provides a non-reference light field image quality evaluation system based on deep element learning, as shown in fig. 5, the system may include:

the model meta-training module is used for learning prior knowledge of distortion based on the training set to obtain a quality prior model;

the model fine tuning module is used for fine tuning the quality priori model on the test set to obtain a final light field image quality evaluation model;

It should be noted that, the steps in the method provided by the present invention may be implemented by using corresponding modules, devices, units, etc. in the system, and those skilled in the art may refer to a technical solution of the method to implement the composition of the system, that is, the embodiment in the method may be understood as a preferred example of constructing the system, which is not described herein.

An embodiment of the present invention provides a computer terminal including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the program, is operative to perform the method or operate the system of any of the foregoing embodiments of the present invention.

Optionally, the memory is used for storing a program; the memory may include volatile memory, such as random-access memory (RAM), for example static random-access memory (SRAM) or double data rate synchronous dynamic random-access memory (DDR SDRAM); the memory may also include non-volatile memory, such as flash memory. The memory is used to store computer programs (e.g., application programs and functional modules that implement the methods described above), computer instructions, and the like, which may be stored in one or more memories in a partitioned manner. The above computer programs, computer instructions, data, and the like may be invoked by the processor.

The processor is used for executing the computer program stored in the memory to implement the steps of the method or the modules of the system according to the above embodiments. For details, reference may be made to the description of the foregoing method and system embodiments.

The processor and the memory may be separate structures or may be integrated structures that are integrated together. When the processor and the memory are separate structures, the memory and the processor may be connected by a bus coupling.

An embodiment of the present invention provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, is operative to perform the method or run the system of any of the above embodiments of the present invention.

The no-reference light field image quality evaluation method and system based on deep meta-learning provided by the embodiments of the present invention adopt deep learning, deep meta-learning, the MacPI form, and the Swin Transformer network, so that the light field image quality evaluation achieves higher performance on datasets such as Win5-LID and SHU, adapts to each distortion type, and greatly improves on recent state-of-the-art IQA models. The method and system overcome the problems in the prior art that experimental results deviate far from real results because the spatial and angular information of the light field image is insufficiently combined, and that deep learning may overfit and fail to adapt to various distortions.

Those skilled in the art will appreciate that the invention provides a system and its individual devices that can be implemented entirely by logic programming of method steps, in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers, etc., in addition to the system and its individual devices being implemented in pure computer readable program code. Therefore, the system and various devices thereof provided by the present invention may be considered as a hardware component, and the devices included therein for implementing various functions may also be considered as structures within the hardware component; means for achieving the various functions may also be considered as being either a software module that implements the method or a structure within a hardware component.

Matters not described in detail in the foregoing embodiments of the present invention are well known in the art.

The foregoing describes specific embodiments of the present invention. It is to be understood that the invention is not limited to the particular embodiments described above, and that various changes and modifications may be made by one skilled in the art within the scope of the claims without affecting the spirit of the invention.

Claims

1. A deep meta-learning-based no-reference light field image quality assessment method, characterized by comprising the following steps:

respectively constructing a training set containing no-reference light field image quality assessment tasks and a test set containing a target assessment task;

2. The deep meta-learning-based no-reference light field image quality assessment method according to claim 1, wherein the respectively constructing a training set containing no-reference light field image quality assessment tasks and a test set containing a target assessment task comprises:

3. The deep meta-learning-based no-reference light field image quality assessment method according to claim 1, wherein the constructing a deep meta-learning model including a Swin Transformer network structure comprises:

4. The deep meta-learning-based no-reference light field image quality assessment method according to claim 3, wherein the numbers of Swin Transformer block sub-modules in the first three basic layer modules are 2, 2 and 18, respectively; and/or

the number of Swin Transformer block sub-modules in the fourth basic layer module is 2.

5. The deep meta-learning-based no-reference light field image quality assessment method according to claim 3, wherein the fully connected layer network comprises 3 fully connected layers, the features of the light field image output by the Swin Transformer network are mapped layer by layer through the 3 fully connected layers into a final output result, and the final output value is the predicted quality score of the light field image.

6. The deep meta-learning-based no-reference light field image quality assessment method according to claim 1, wherein the learning prior knowledge of distortion based on the training set comprises:

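taking a light field image $x$ in the training set as input, passing it sequentially through the Swin Transformer network and the fully connected layer network, and generating an output value $\hat{y}$, wherein the output value $\hat{y}$ is the predicted quality score of the light field image $x$, defined as:

$$\hat{y} = f_\theta(x)$$

using a mean squared error loss function $L$, training continuously by gradient descent so that the difference between the predicted quality score $\hat{y}$ and the true quality score $y$ of the input image $x$ is minimized; wherein the mean squared error loss function is $L = \lVert f_\theta(x) - y \rVert_2^2$;

the update formula for the current model parameters $\theta_t$ is:

$$\theta_{t+1} = \theta_t - \frac{\alpha}{\sqrt{\hat{v}_t} + \epsilon}\,\hat{m}_t$$

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t, \qquad v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$$

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}, \qquad \hat{v}_t = \frac{v_t}{1 - \beta_2^t}$$

wherein $g_t$ is the gradient of the loss function at $\theta_t$, $\alpha$ is the learning rate, and $\epsilon$ is a small constant that avoids division by zero; the coefficients $\beta_1$ and $\beta_2$ are exponential decay rates that respectively control the weight distribution and the influence of the previous squared gradients; $\beta_1^t$ and $\beta_2^t$ are respectively $\beta_1$ and $\beta_2$ raised to the power of the time step $t$, and as $t$ increases their values tend to 0;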

iterating the update gradients of K non-reference light field image quality assessment tasks in the light field image data set, and obtaining final model parameters as follows:

7. The deep meta-learning-based no-reference light field image quality assessment method according to claim 1, wherein the fine-tuning the quality prior model on the test set comprises:

updating the network parameters of the quality prior model with an Adam optimizer and fine-tuning the quality prior model, to obtain a light field image quality assessment model that yields the best predicted quality score $\hat{y}$.

8. A deep meta-learning-based no-reference light field image quality assessment system, comprising:

9. A computer terminal comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor is operable to perform the method of any one of claims 1-7 or to run the system of claim 8 when the program is executed by the processor.

10. A computer readable storage medium having stored thereon a computer program, which when executed by a processor is operative to perform the method of any one of claims 1-7 or to run the system of claim 8.