CN117237358B

CN117237358B - Stereoscopic image quality evaluation method based on metric learning

Info

Publication number: CN117237358B
Application number: CN202311517074.8A
Authority: CN
Inventors: 潘兆庆; 武泽煦; 张昊; 雷建军; 彭勃; 沈丽丽
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2023-11-15
Filing date: 2023-11-15
Publication date: 2024-02-06
Anticipated expiration: 2043-11-15
Also published as: CN117237358A

Abstract

The invention provides a stereoscopic image quality evaluation method based on metric learning. The method comprises: constructing a stereoscopic image quality evaluation model based on metric learning by adopting the weight-sharing strategy of a twin (Siamese) structure; training and testing the stereoscopic image quality evaluation model with a training data set and a test data set obtained by random division, so as to obtain a trained stereoscopic image quality evaluation model; inputting two groups of stereoscopic images to be evaluated into the trained model for quality score prediction, with each branch network of the trained model processing one group of stereoscopic images, to obtain two groups of quality scores; and taking the difference of the two groups of quality scores and using the result to measure the quality difference between the two groups of stereoscopic images to be evaluated. The method can accurately evaluate the quality of stereoscopic images and has a wide range of application scenarios.

Description

Stereoscopic image quality evaluation method based on metric learning

Technical Field

The present invention relates to the field of multimedia technologies, and in particular, to a stereoscopic image quality evaluation method based on metric learning, an electronic device, and a storage medium.

Background

Because visual images lose quality during compression, transmission, and similar processes, image quality evaluation is important to the widespread use of high-quality visual images. Image quality evaluation is generally divided into subjective and objective evaluation. Subjective evaluation recruits human subjects to rate image quality; its labor cost is high, so it is not suitable for large-scale deployment. Objective evaluation uses an algorithm to rate image quality automatically; it is low-cost and efficient and has attracted wide attention from academia and industry. Depending on how a reference image is used, objective image quality evaluation is classified as full-reference, reduced-reference, or no-reference. Because a reference image is generally unavailable in practical scenarios, no-reference image quality evaluation algorithms play an irreplaceable role in fields such as the military, news, and education. With the development of the 3D film and virtual reality industries, no-reference quality evaluation algorithms for stereoscopic images have attracted attention. A stereoscopic image consists of a left-view image and a right-view image and provides stereoscopic vision with a sense of depth. A stereoscopic image quality evaluation method must therefore assess not only the quality degradation of the two 2D images but also their binocular combination, which makes no-reference quality evaluation of stereoscopic images more complicated than that of 2D images.

However, prior-art methods for stereoscopic image quality evaluation generally cut the stereoscopic image into blocks. Cutting damages the content integrity of the stereoscopic image, which makes its quality difficult to evaluate effectively and leads to technical problems such as the high training cost of stereoscopic image evaluation models, the low accuracy of quality evaluation results, and insufficient mining of the semantic information of stereoscopic images.

Disclosure of Invention

In view of the above, the present invention provides a stereoscopic image quality evaluation method based on metric learning, with the aim of solving at least one of the above problems.

According to a first aspect of the present invention, there is provided a stereoscopic image quality evaluation method based on metric learning, comprising:

constructing a stereoscopic image quality evaluation model based on metric learning by adopting a weight sharing strategy of a twin structure, wherein the stereoscopic image quality evaluation model comprises two weight sharing branch networks;

training and testing the stereoscopic image quality evaluation model respectively through a training data set and a testing data set which are obtained through random division, so as to obtain a trained stereoscopic image quality evaluation model;

inputting two groups of stereoscopic images to be evaluated into a trained stereoscopic image quality evaluation model for quality score prediction, and utilizing each branch network of the trained stereoscopic image quality evaluation model to be responsible for processing one group of stereoscopic images to be evaluated to obtain two groups of quality scores;

taking the difference of the two sets of quality scores and using the difference result to measure the difference in quality between the two sets of stereoscopic images to be evaluated.

According to an embodiment of the present invention, the constructing a stereoscopic image quality evaluation model based on metric learning by using the weight sharing policy of the twin structure includes:

introducing a bidirectional gating interaction module based on a gated recurrent unit (GRU) and a weight adjustable module on the basis of a residual network, so as to construct a weakly supervised cyclic graph generation network, and constructing a quality regression network on the basis of the residual network;

connecting the weakly supervised cyclic graph generation network and the quality regression network in series to obtain a branch network, so as to realize end-to-end training from a stereoscopic image to a quality score;

and constructing two branch networks into a two-branch twin structure in which the branch networks share weights, thereby obtaining the stereoscopic image quality evaluation model based on metric learning.

According to the embodiment of the invention, the weakly supervised cyclic graph generation network uses the bidirectional gating interaction module based on a gated recurrent unit (GRU) to analyze the interactivity of the left and right views of the stereoscopic image, and extracts the features of the left and right views through this module;

the weakly supervised cyclic graph generation network uses the weight adjustable module to analyze the weights of the left and right views of the stereoscopic image, and calculates the feature weights of the left and right views through the weight adjustable module.

According to the embodiment of the invention, the weakly supervised cyclic graph generation network performs a calculation on the left and right views of the stereoscopic image and the feature weights of the left and right views to obtain the cyclic graph.

According to the embodiment of the invention, the quality regression network applies global average pooling and global variance pooling to the features extracted from the cyclic graph to obtain the quality features of the stereoscopic image, and maps the quality features to a quality score through several fully connected layers of the quality regression network.

According to an embodiment of the present invention, training and testing the stereoscopic image quality evaluation model with a training data set and a test data set obtained by random division, so as to obtain a trained stereoscopic image quality evaluation model, includes:

randomly dividing the data sample set into a training data set and a test data set according to a preset proportion, and randomly combining the training data set to obtain two groups of stereo image pair sample sets;

processing a group of stereoscopic image pair sample sets by using each branch network of the stereoscopic image quality evaluation model to obtain two groups of quality scores;

processing two groups of quality scores and truth labels of two groups of stereo image sample sets corresponding to the two groups of quality scores by using a preset loss function to obtain loss values;

according to the loss value, updating parameters of the stereoscopic image quality evaluation model in a back propagation mode, and controlling a training process of the stereoscopic image quality evaluation model by using a preset optimizer;

according to a preset evaluation standard, evaluating the stereoscopic image quality evaluation model with updated parameters by using a test data set to obtain an evaluation result;

and iterating the quality score obtaining operation, the loss value calculating operation, the model parameter updating operation and the model evaluating operation until the evaluating result meets the preset training condition, and obtaining the trained stereoscopic image quality evaluating model.

According to an embodiment of the present invention, the preset loss function includes a mean absolute error loss function and an absolute error loss function;

the mean absolute error loss function is used for calculating the loss between the predicted quality score computed by each branch network of the stereoscopic image quality evaluation model and the ground-truth label of the preprocessed stereoscopic image pair sample;

the absolute error loss function is used for calculating the loss between a predicted difference and an actual difference, where the predicted difference is the difference between the predicted quality scores computed by the two branch networks of the stereoscopic image quality evaluation model, and the actual difference is the difference between the ground-truth labels of the preprocessed stereoscopic image pair samples.

According to an embodiment of the present invention, the preset optimizer includes an adaptive moment estimation optimizer;

the preset evaluation criteria comprise a Pearson linear correlation coefficient, a Spearman rank correlation coefficient and a mean square error.

According to a second aspect of the present invention, there is provided an electronic device comprising:

one or more processors;

storage means for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform a stereoscopic image quality evaluation method based on metric learning.

According to a third aspect of the present invention, there is provided a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform a stereoscopic image quality evaluation method based on metric learning.

According to the stereoscopic image quality evaluation method based on metric learning provided by the invention, the cyclic graph generation network generates, from a stereoscopic image at its original size, a cyclic graph that conforms to human stereoscopic vision. A quality regression network is then constructed to map the cyclic graph to an image score. Finally, the whole network adopts a twin structure, so that it can output the quality scores of two stereoscopic image pairs, and a loss function based on metric learning is used to judge the relative magnitude of the two scores. The method also avoids the incomplete stereoscopic image content caused by the image cutting used in the prior art, and it achieves good performance on distorted stereoscopic images by expanding the training samples. In addition, the method can assist 3D equipment manufacturers, researchers, and users in judging the quality of 3D images, and it has a wide range of application scenarios.

Drawings

FIG. 1 is a flow chart of a stereoscopic image quality evaluation method based on metric learning according to an embodiment of the present invention;

FIG. 2 is a flow chart of constructing a metric learning based stereoscopic image quality assessment model in accordance with an embodiment of the present invention;

FIG. 3 is a schematic view of a stereoscopic image quality evaluation model based on metric learning according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a bidirectional gating interaction module based on a gated recurrent unit (GRU) according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a weight-adjustable module according to an embodiment of the invention;

FIG. 6 is a flow chart of training the stereoscopic image quality evaluation model according to an embodiment of the present invention;

FIG. 7 schematically shows a block diagram of an electronic device adapted to implement a metric learning based stereoscopic image quality evaluation method according to an embodiment of the invention.

Detailed Description

The present invention will be further described in detail below with reference to specific embodiments and with reference to the accompanying drawings, in order to make the objects, technical solutions and advantages of the present invention more apparent.

Objective quality evaluation of stereoscopic images (3D images) is more complex than that of planar images (2D images). Nevertheless, as stereoscopic images are widely used in various fields, their quality evaluation is receiving increasing attention. Various stereoscopic image quality evaluation schemes have been developed in the prior art. One example is the no-reference stereoscopic image quality evaluation network StereoQA-Net, which takes stereoscopic image blocks as input, uses several sub-networks to extract a series of visual features, and then fuses these features into a quality score with a dedicated fusion strategy. A second example uses a deep belief network to extract features from the left view, the right view, and the depth map, and aggregates the three kinds of features into a final quality score. A third example is a no-reference stereoscopic image quality evaluation method based on global and local content features: a main network extracts low-level features from the left and right image blocks, two local feature enhancement sub-networks and one global feature fusion sub-network then extract multiple features, and two fully connected layers finally map these features to a quality score. A fourth example is a no-reference stereoscopic image quality evaluation method based on a dual-stream interactive network, which simulates binocular perception by constructing an interactive sub-network and adds asymmetric convolution kernels to enhance the perception of local information. Although these schemes all achieve a certain level of performance, the existing algorithms all train on image blocks, that is, each image is cut by position into a number of blocks in order to enlarge the training set. However, the degree of distortion differs from block to block and the semantic information of each block is incomplete, so the performance of these algorithms still leaves room for improvement.

To address these shortcomings of the prior art, the invention provides a stereoscopic image quality evaluation method based on metric learning. A cyclic graph generation network generates, from a stereoscopic image at its original size, a cyclic graph that conforms to human stereoscopic vision; a quality regression network then maps the cyclic graph to an image score; and the whole network adopts a twin structure, so that the model can output the quality scores of two stereoscopic image pairs and a loss function based on metric learning can be used to judge the relative magnitude of the two scores.

It should be noted that, in the technical solution disclosed by the invention, the stereoscopic image data involved is obtained with the authorization of the relevant parties and is processed, applied, and stored with their permission; the related processes comply with applicable laws and regulations, necessary and reliable security measures are adopted, and the requirements of public order and good customs are met.

Fig. 1 is a flowchart of a stereoscopic image quality evaluation method based on metric learning according to an embodiment of the present invention.

As shown in fig. 1, the method for evaluating stereoscopic image quality based on metric learning includes operations S110 to S140.

In operation S110, a stereoscopic image quality evaluation model based on metric learning is constructed using a weight sharing strategy of a twin structure, wherein the stereoscopic image quality evaluation model includes two weight-shared branch networks.

Fig. 2 is a flowchart of constructing a stereoscopic image quality evaluation model based on metric learning according to an embodiment of the present invention.

As shown in fig. 2, the above-mentioned construction of the stereoscopic image quality evaluation model based on metric learning by adopting the weight sharing strategy of the twin structure includes operations S210 to S230.

In operation S210, a bidirectional gating interaction module based on a gated recurrent unit (GRU) and a weight adjustable module are introduced on the basis of a residual network to construct a weakly supervised cyclic graph generation network, and a quality regression network is constructed on the basis of the residual network.

Optionally, the residual network is ResNet18; within the technical solution provided by the present invention, the residual network can also be replaced by other neural networks.

In operation S220, the weakly supervised cyclic graph generation network and the quality regression network are connected in series to obtain a branch network, so as to realize end-to-end training from the stereoscopic image to the quality score.

In operation S230, two branch networks are constructed as a twin structure of two branches, so that each branch network can realize weight sharing, and a stereoscopic image quality evaluation model based on metric learning is obtained.

Through the operations S210 to S230, a stereoscopic image quality evaluation model with a twin structure and weight sharing can be obtained, and the model is trained through supervised learning and metric learning, so that the stereoscopic image quality evaluation model has good quality evaluation performance.

Fig. 3 is a schematic structural diagram of a stereoscopic image quality evaluation model based on metric learning according to an embodiment of the present invention.

The above stereoscopic image quality evaluation model based on metric learning provided by the present invention is described in further detail below with reference to fig. 3.

As shown in Fig. 3, the stereoscopic image quality evaluation model based on metric learning includes two branch networks with shared weights, each comprising a weakly supervised cyclic graph generation network and a quality regression network connected in series. The model is trained through supervised learning and metric learning to obtain the trained stereoscopic image quality evaluation model based on metric learning.

In operation S120, the stereoscopic image quality evaluation model is trained and tested with a training data set and a test data set obtained by random division, respectively, to obtain a trained stereoscopic image quality evaluation model.

In operation S130, the two sets of stereoscopic images to be evaluated are input into the trained stereoscopic image quality evaluation model for quality score prediction, and each branch network of the trained model processes one set of stereoscopic images to be evaluated, so as to obtain two sets of quality scores.

In operation S140, the two sets of quality scores are subjected to a difference process and used as a difference result to measure the quality differences of the two sets of stereoscopic images to be evaluated.

To achieve metric learning, the model outputs the scores of a pair of stereoscopic images simultaneously and measures the difference between the two scores. To this end, the branch network obtained in operation S220 is built into a two-branch twin structure: both branches have the structure of that branch network, and the two branches share weights.
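The weight sharing and the score difference described above can be illustrated with a minimal sketch. It is not the patented network: `make_branch` is a hypothetical toy scorer standing in for one branch (cyclic graph generation network plus quality regression network), and the weight names `w_left`, `w_right`, and `bias` are assumptions made only for illustration.

```python
from typing import Callable, Dict, Sequence, Tuple

# A stereo pair is (left view, right view); views are flat pixel lists in this toy example.
StereoPair = Tuple[Sequence[float], Sequence[float]]


def make_branch(weights: Dict[str, float]) -> Callable[[StereoPair], float]:
    """Hypothetical stand-in for one branch network (a toy linear scorer)."""
    def branch(pair: StereoPair) -> float:
        left, right = pair
        mean_left = sum(left) / len(left)
        mean_right = sum(right) / len(right)
        return weights["w_left"] * mean_left + weights["w_right"] * mean_right + weights["bias"]
    return branch


class TwinModel:
    """Two branches that read the SAME weight dictionary, i.e. shared weights."""

    def __init__(self, shared_weights: Dict[str, float]) -> None:
        self.shared_weights = shared_weights
        self.branch_a = make_branch(shared_weights)
        self.branch_b = make_branch(shared_weights)

    def forward(self, pair_a: StereoPair, pair_b: StereoPair) -> Tuple[float, float, float]:
        score_a = self.branch_a(pair_a)
        score_b = self.branch_b(pair_b)
        # The difference of the two scores measures the quality gap between the inputs.
        return score_a, score_b, score_a - score_b


model = TwinModel({"w_left": 0.5, "w_right": 0.5, "bias": 0.0})
score_a, score_b, difference = model.forward(
    ([1.0, 3.0], [2.0, 2.0]),
    ([0.0, 0.0], [2.0, 2.0]),
)
```

Because both branches read one weight dictionary, any parameter update is seen by both branches at once, which is the practical meaning of weight sharing in a twin structure.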

According to the embodiment of the invention, the weakly supervised cyclic graph generation network uses the bidirectional gating interaction module based on a gated recurrent unit (GRU) to analyze the interactivity of the left and right views of the stereoscopic image, and extracts the features of the left and right views through this module; the weakly supervised cyclic graph generation network uses the weight adjustable module to analyze the weights of the left and right views of the stereoscopic image, and calculates the feature weights of the left and right views through the weight adjustable module.

Fig. 4 is a schematic structural diagram of a bidirectional gating interaction module based on a gated recurrent unit (GRU) according to an embodiment of the present invention.

Fig. 5 is a schematic structural view of a weight adjustable module according to an embodiment of the present invention.

The GRU-based bidirectional gating interaction module and the weight adjustable module are described in further detail below with reference to Fig. 4 and Fig. 5.

As shown in Fig. 4, the GRU-based bidirectional gating interaction module analyzes the interactivity between the left-view features and the right-view features so as to fully extract perceptual features. The module extracts features from the left view and the right view of the stereoscopic image, obtaining left-view features and right-view features respectively, and its interaction mechanism lets the left-view and right-view features interact during feature extraction.

As shown in Fig. 5, the weight adjustable module consists of two identical branches, each comprising a 3×3 convolution with a stride of 1 and a Sigmoid activation function, which together calculate the weight of a single view.
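One branch of such a module can be sketched as follows, as a single-channel illustration only. Zero padding (so that the weight map keeps the input size), the single channel, and the function names are assumptions; the source specifies only the 3×3 kernel, the stride of 1, and the Sigmoid activation.

```python
import math
from typing import List

Grid = List[List[float]]


def conv3x3_same(feature: Grid, kernel: Grid, bias: float = 0.0) -> Grid:
    """3x3 convolution (cross-correlation, as in deep learning libraries), stride 1.

    Zero padding is an assumption so that the output keeps the input size.
    """
    height, width = len(feature), len(feature[0])
    out = [[0.0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            acc = bias
            for di in (-1, 0, 1):
                for dj in (-1, 0, 1):
                    y, x = i + di, j + dj
                    if 0 <= y < height and 0 <= x < width:
                        acc += kernel[di + 1][dj + 1] * feature[y][x]
            out[i][j] = acc
    return out


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def view_weight(feature: Grid, kernel: Grid, bias: float = 0.0) -> Grid:
    """Weight map of a single view: convolution followed by Sigmoid, so values lie in (0, 1)."""
    return [[sigmoid(v) for v in row] for row in conv3x3_same(feature, kernel, bias)]
```

The Sigmoid keeps every weight strictly between 0 and 1, so the weight acts as a soft gate on the corresponding view.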

The working principles of the bidirectional gating interaction module and the weight adjustable module are further described below with reference to specific embodiments.

A bidirectional gating interaction module and a weight adjustable module are introduced on the basis of ResNet18, so that a cyclic graph generating network is constructed.

When human eyes process a stereoscopic image, the visual signals entering the left and right eyes are first merged into a cyclic graph (known as the cyclopean image in the binocular-vision literature). This process can be expressed by formula (1):

$$C = W_L \cdot I_L + W_R \cdot I_R \tag{1}$$

where $I_L$ denotes the left view, $I_R$ the right view, $W_L$ the left-view weight, $W_R$ the right-view weight, and $C$ the cyclic graph. To simulate this process, the invention first extracts features of the left and right views using ResNet18. A bidirectional gating interaction module then analyzes the mutual influence between the left-view and right-view features, further strengthening the feature extraction capability of the cyclic graph generation network. The bidirectional gating interaction module is represented by formulas (2) and (3):

$$F_R^{i} = \mathrm{GRU}_R\left(F_R^{i-1}, F_L^{i-1}\right) \tag{2}$$

$$F_L^{i} = \mathrm{GRU}_L\left(F_L^{i-1}, F_R^{i-1}\right) \tag{3}$$

where the bidirectional gating interaction module consists of two gated recurrent units, $\mathrm{GRU}_R$ and $\mathrm{GRU}_L$; $F_R^{i-1}$ is the right-view feature of the previous layer, $F_L^{i-1}$ is the left-view feature of the previous layer, $F_R^{i}$ is the right-view feature obtained by the current bidirectional gating interaction module, and $F_L^{i}$ is the left-view feature obtained by the current bidirectional gating interaction module. Each gated recurrent unit is computed as shown in formulas (4)-(7):

[Formulas (4)-(7): equation images not reproduced in this text.]

In formulas (4)-(7), the symbols denote, in order: the right-view feature of the previous layer; the left-view feature of the previous layer; the Sigmoid activation function; the right-view feature obtained by the current bidirectional gating interaction module; the first, second, third, and fourth weight matrices; the hyperbolic tangent function; the vector concatenation function; the vector dot product operation; the memory weight; the update weight; and the intermediate hidden-layer feature. The cyclic graph generation network uses a total of 3 bidirectional interaction modules. The left-view and right-view features extracted by these modules then pass through the weight adjustable module to generate the weights of the left and right views. Those skilled in the art will appreciate that formulas (4)-(7) apply equally to the left view of the stereoscopic image.
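As a rough illustration of the kind of gating involved, the sketch below applies a textbook gated recurrent unit update in both directions. It is not the patent's formulas (4)-(7): it uses three weight matrices rather than the four named above, treats a view's own previous-layer feature as the recurrent state and the other view's feature as the input (an assumption), and all function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]
Matrix = Sequence[Sequence[float]]
GruParams = Tuple[Matrix, Matrix, Matrix]  # (reset, update, candidate) weight matrices


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix: Matrix, vector: Sequence[float]) -> Vector:
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def gru_step(state: Vector, other: Vector, w_reset: Matrix, w_update: Matrix, w_hidden: Matrix) -> Vector:
    """Textbook GRU update: `state` is one view's feature, `other` is the opposite view's feature."""
    joined = state + other                                  # vector concatenation
    reset = [sigmoid(v) for v in matvec(w_reset, joined)]   # memory (reset) weight
    update = [sigmoid(v) for v in matvec(w_update, joined)] # update weight
    gated = [r * s for r, s in zip(reset, state)] + other   # element-wise gating, then concatenation
    candidate = [math.tanh(v) for v in matvec(w_hidden, gated)]  # intermediate hidden feature
    return [(1.0 - z) * s + z * c for z, s, c in zip(update, state, candidate)]


def bidirectional_interaction(
    f_left: Vector, f_right: Vector, params_right: GruParams, params_left: GruParams
) -> Tuple[Vector, Vector]:
    """Update each view's feature using the other view's feature (one GRU per direction)."""
    new_right = gru_step(f_right, f_left, *params_right)
    new_left = gru_step(f_left, f_right, *params_left)
    return new_left, new_right
```

With all-zero weight matrices both gates equal 0.5 and the candidate is 0, so each feature is simply halved; this makes a convenient sanity check before real weights are plugged in.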

The weight adjustable module can be described by formulas (8) - (12):

[Formulas (8)-(12): equation images not reproduced in this text.]

In formulas (8)-(12), the symbols denote, in order: the left-view weight; the right-view weight; the concatenation operation; the fully connected operation; the average pooling function; the variance pooling function; the left-view perception feature; the right-view perception feature; the left-view feature output by the bidirectional gating unit; the right-view feature output by the bidirectional gating unit; and the merged left-and-right-view feature. Finally, the extracted left-view and right-view weights are multiplied by the left view and the right view, respectively, and the two products are summed to obtain the final cyclic graph.
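The final combination step, multiplying each view by its weight and summing, can be sketched as below. Per-pixel weight maps with the same size as the views are an assumption; the source does not state the shape of the weights, and the function name is hypothetical.

```python
from typing import List

Grid = List[List[float]]


def cyclic_graph(left: Grid, right: Grid, w_left: Grid, w_right: Grid) -> Grid:
    """Weighted sum of the two views: each output pixel is w_left * left + w_right * right."""
    return [
        [wl * l + wr * r for l, r, wl, wr in zip(l_row, r_row, wl_row, wr_row)]
        for l_row, r_row, wl_row, wr_row in zip(left, right, w_left, w_right)
    ]


# Toy 1x2 views; the weights here happen to sum to 1 per pixel, which the source does not require.
merged = cyclic_graph(
    left=[[10.0, 20.0]],
    right=[[30.0, 40.0]],
    w_left=[[0.25, 0.5]],
    w_right=[[0.75, 0.5]],
)
```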

The quality regression network uses ResNet18 pre-trained on ImageNet as its backbone, applies global average pooling and global variance pooling to the features extracted by ResNet18 to obtain the quality features, and finally uses two fully connected layers to map the quality features to a quality score.

Fig. 6 is a flowchart of training the stereoscopic image quality evaluation model according to an embodiment of the present invention.

As shown in Fig. 6, training and testing the stereoscopic image quality evaluation model with the training data set and the test data set obtained by random division, so as to obtain the trained stereoscopic image quality evaluation model, includes operations S610 to S660.
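The pooling and regression head can be sketched in plain Python as follows. The backbone is omitted; the population variance (dividing by the number of elements), the ReLU between the two fully connected layers, and all names are assumptions not stated in the source.

```python
from typing import List, Sequence

FeatureMaps = Sequence[Sequence[Sequence[float]]]  # channels x height x width


def global_mean_and_variance(feature_maps: FeatureMaps) -> List[float]:
    """Per-channel global average pooling followed by per-channel global variance pooling."""
    means, variances = [], []
    for channel in feature_maps:
        values = [v for row in channel for v in row]
        mean = sum(values) / len(values)
        means.append(mean)
        # Population variance is an assumption; the source only names variance pooling.
        variances.append(sum((v - mean) ** 2 for v in values) / len(values))
    return means + variances


def linear(vector: Sequence[float], weights: Sequence[Sequence[float]], bias: Sequence[float]) -> List[float]:
    return [sum(w * v for w, v in zip(row, vector)) + b for row, b in zip(weights, bias)]


def quality_score(feature_maps: FeatureMaps, w1, b1, w2, b2) -> float:
    """Two fully connected layers mapping pooled quality features to one score."""
    pooled = global_mean_and_variance(feature_maps)
    hidden = [max(0.0, h) for h in linear(pooled, w1, b1)]  # ReLU is an assumption
    return linear(hidden, w2, b2)[0]
```

Pooling both the mean and the variance gives the regressor a summary of how strong the features are on average and how unevenly they are distributed across the image.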

In operation S610, the data sample set is randomly divided into a training data set and a test data set according to a preset ratio, and the training data sets are randomly combined to obtain two stereo image pair sample sets.

To verify its performance in assessing the quality of distorted stereoscopic images, the invention uses the LIVE-I SIQA dataset. The LIVE-I SIQA dataset includes 365 pairs of distorted images, all 640 × 360 in size. First, the dataset is divided into a preliminary training set and a test set at a ratio of 4:1, and this process is repeated 10 times to eliminate bias. Second, to enable metric learning and to expand the training samples while keeping the image size unchanged, the distorted images in the training set are combined at random to obtain 5000 distorted image pair combinations. The score difference between the two distorted images of each combination is recorded as the label for metric learning.
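The split and the random pairing can be sketched as follows. The sample identifiers and scores are synthetic placeholders, and drawing two distinct training samples per combination (with repeats allowed across combinations) is an assumption; the source states only the 4:1 ratio, the random combination, and the 5000 combinations.

```python
import random
from typing import List, Sequence, Tuple

Sample = Tuple[str, float]       # (stereo image identifier, subjective quality score)
Pair = Tuple[str, str, float]    # (first identifier, second identifier, score difference)


def split_and_pair(
    samples: Sequence[Sample], num_pairs: int, train_parts: int = 4, test_parts: int = 1, seed: int = 0
) -> Tuple[List[Sample], List[Sample], List[Pair]]:
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    # Integer arithmetic keeps the 4:1 cut exact.
    cut = len(shuffled) * train_parts // (train_parts + test_parts)
    train, test = shuffled[:cut], shuffled[cut:]
    pairs = []
    for _ in range(num_pairs):
        first, second = rng.sample(train, 2)  # two distinct training samples (assumption)
        pairs.append((first[0], second[0], first[1] - second[1]))  # label = score difference
    return train, test, pairs


# Synthetic placeholder data with the same size as the dataset described above.
samples = [(f"img_{i:03d}", float(i % 50)) for i in range(365)]
train, test, pairs = split_and_pair(samples, num_pairs=5000)
```

Repeating the call with ten different `seed` values would give the ten random splits mentioned above.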

In operation S620, a set of stereo image pair sample sets is processed with each branch network of the stereo image quality evaluation model to obtain two sets of quality scores.

In operation S630, the two sets of quality scores and the truth labels of the two sets of stereo image sample sets corresponding to the two sets of quality scores are processed by using a preset loss function to obtain a loss value.

According to an embodiment of the present invention, the preset loss function includes a mean absolute error loss function and an absolute error loss function. The mean absolute error loss function calculates the loss between the predicted quality score computed by each branch network of the stereoscopic image quality evaluation model and the ground-truth label of the preprocessed stereoscopic image pair sample. The absolute error loss function calculates the loss between a predicted difference and an actual difference, where the predicted difference is the difference between the predicted quality scores computed by the two branch networks and the actual difference is the difference between the ground-truth labels of the preprocessed stereoscopic image pair samples.

During training, the no-reference stereoscopic image quality evaluation algorithm of the invention uses two absolute error loss functions to calculate, for each of the two branches, the loss between the predicted value and its label. In addition, to measure the quality difference between the two sets of stereoscopic images, a third absolute error loss function calculates the loss between the predicted difference and the actual difference of the two branches.
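The three loss terms can be sketched as follows. Summing them with equal weights and averaging each term over the batch are assumptions; the source does not state how the terms are combined, and the function name is hypothetical.

```python
from typing import Sequence


def metric_learning_loss(
    pred_a: Sequence[float], pred_b: Sequence[float], label_a: Sequence[float], label_b: Sequence[float]
) -> float:
    n = len(pred_a)
    # Supervised terms: absolute error between each branch's prediction and its label.
    loss_a = sum(abs(p - y) for p, y in zip(pred_a, label_a)) / n
    loss_b = sum(abs(p - y) for p, y in zip(pred_b, label_b)) / n
    # Metric-learning term: predicted score difference versus actual score difference.
    loss_diff = sum(
        abs((pa - pb) - (ya - yb)) for pa, pb, ya, yb in zip(pred_a, pred_b, label_a, label_b)
    ) / n
    return loss_a + loss_b + loss_diff  # equal weighting is an assumption


example_loss = metric_learning_loss(pred_a=[3.0], pred_b=[1.0], label_a=[2.0], label_b=[2.0])
```

In this toy case each branch is off by 1 and the predicted gap of 2 misses the true gap of 0, so the difference term penalizes the ordering error on top of the two per-branch errors.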

In operation S640, parameters of the stereoscopic image quality evaluation model are updated in a back-propagation manner according to the loss value, and a training process of the stereoscopic image quality evaluation model is controlled by using a preset optimizer.

In operation S650, the stereoscopic image quality evaluation model with updated parameters is evaluated by using the test data set according to the preset evaluation criteria, so as to obtain an evaluation result.

According to an embodiment of the present invention, the preset optimizer includes an adaptive moment estimation optimizer; the preset evaluation criteria comprise a Pearson linear correlation coefficient, a Spearman rank correlation coefficient and a mean square error.

In operation S660, the quality score obtaining operation, the loss value calculation operation, the model parameter updating operation, and the model evaluation operation are iterated until the evaluation result meets the preset training condition, thereby obtaining the trained stereoscopic image quality evaluation model.

The present invention updates the network parameters by back propagation. The training and test sets are constructed in operation S610, the model is run 10 times to reduce bias, and the PLCC, SROCC, and RMSE values of the 10 runs are collected to evaluate the network training results. The network uses Adam as the optimizer, is trained for 20 epochs, and uses a learning rate of 0.001.
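The repeated-run protocol above can be sketched as follows. The names `random_split`, `repeated_evaluation`, and `run_once` are hypothetical, the 0.8 training ratio is an assumption (the text only refers to a preset proportion), and averaging the per-run metrics is also an assumption, since the text does not state which statistic is taken over the 10 runs.

```python
import random
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple


def random_split(
    samples: Sequence, train_ratio: float, rng: random.Random
) -> Tuple[List, List]:
    """Shuffle a copy of the samples and cut it into train and test parts."""
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def repeated_evaluation(
    samples: Sequence,
    run_once: Callable[[List, List], Dict[str, float]],
    runs: int = 10,
    train_ratio: float = 0.8,  # assumption: the source says "preset proportion"
    seed: int = 0,
) -> Dict[str, float]:
    """Train and test on several random splits and average each metric."""
    rng = random.Random(seed)
    results = []
    for _ in range(runs):
        train, test = random_split(samples, train_ratio, rng)
        # run_once stands in for training the model and returning
        # a dict such as {"PLCC": ..., "SROCC": ..., "RMSE": ...}.
        results.append(run_once(train, test))
    # Assumption: the mean is used to summarize the runs.
    return {key: mean(r[key] for r in results) for key in results[0]}


# Toy usage with a placeholder in place of real training and testing.
def dummy_run(train: List, test: List) -> Dict[str, float]:
    return {"PLCC": len(train) / 10, "RMSE": float(len(test))}


summary = repeated_evaluation(list(range(10)), dummy_run)
```

Drawing a fresh split for every run reduces the influence of any single favorable or unfavorable partition on the reported metrics.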

In order to verify the effectiveness and advantages of the stereoscopic image evaluation model obtained by the above training method, the above method is described in further detail below in connection with specific experiments.

The stereoscopic image quality evaluation model obtained in this example was tested on the LIVE I SIQA dataset. The hardware platform for testing is an NVIDIA Titan RTX GPU with 24 GB of video memory, and the running environment is Windows 10. The image quality indices PLCC, SROCC, and RMSE were recorded; a higher PLCC or SROCC indicates better model performance, and a lower RMSE indicates better model performance. The results of the comparative experiments are shown in Table 1.

As can be seen from the comparison in Table 1, the stereoscopic image quality evaluation method based on metric learning according to the present invention achieves the highest performance, and its RMSE is the lowest among the compared schemes. In summary, the method provided by the present invention can effectively evaluate the quality of distorted stereoscopic images.

As shown in fig. 7, an electronic device 700 according to an embodiment of the present invention includes a processor 701 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. The processor 701 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 701 may also include on-board memory for caching purposes. The processor 701 may comprise a single processing unit or a plurality of processing units for performing different actions of the method flow according to an embodiment of the invention.

In the RAM 703, various programs and data necessary for the operation of the electronic apparatus 700 are stored. The processor 701, the ROM 702, and the RAM 703 are connected to each other through a bus 704. The processor 701 performs various operations of the method flow according to an embodiment of the present invention by executing programs in the ROM 702 and/or the RAM 703. Note that the program may be stored in one or more memories other than the ROM 702 and the RAM 703. The processor 701 may also perform various operations of the method flow according to embodiments of the present invention by executing programs stored in one or more memories.

According to an embodiment of the invention, the electronic device 700 may further comprise an input/output (I/O) interface 705, which is also connected to the bus 704. The electronic device 700 may also include one or more of the following components connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 708 including a hard disk or the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the Internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read therefrom is installed into the storage section 708 as necessary.

The present invention also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present invention.

According to embodiments of the present invention, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to an embodiment of the invention, the computer-readable storage medium may include the ROM 702 and/or the RAM 703 and/or one or more memories other than the ROM 702 and the RAM 703 described above.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The foregoing embodiments have been provided for the purpose of illustrating the general principles of the present invention and are not intended to limit the scope of the invention.

Claims

1. A stereoscopic image quality evaluation method based on metric learning, comprising:

constructing a stereoscopic image quality evaluation model based on metric learning by adopting a weight sharing strategy of a twin structure;

training and testing the stereoscopic image quality evaluation model respectively through a training data set and a test data set which are obtained through random division, so as to obtain a trained stereoscopic image quality evaluation model;

inputting two groups of stereoscopic images to be evaluated into the trained stereoscopic image quality evaluation model for quality score prediction, and utilizing each branch network of the trained stereoscopic image quality evaluation model to be responsible for processing one group of stereoscopic images to be evaluated to obtain two groups of quality scores;

performing difference processing on the two groups of quality scores, and using the difference result to measure the quality difference between the two groups of stereoscopic images to be evaluated;

wherein constructing the stereoscopic image quality evaluation model based on metric learning by adopting the weight sharing strategy of the twin structure comprises:

connecting the network structures of a weakly supervised cyclic graph generation network and a quality regression network in series to obtain a branch network, so as to realize end-to-end training from a stereoscopic image to a quality score;

and constructing two branch networks into a twin structure of two branches, so that the branch networks share weights, thereby obtaining the stereoscopic image quality evaluation model based on metric learning.

2. The method according to claim 1, wherein the weakly supervised cyclic graph generation network analyzes the interaction between the left and right views of the stereoscopic image using a bidirectional gated interaction module based on gated recurrent units, and extracts features of the left and right views of the stereoscopic image through the bidirectional gated interaction module;

the weakly supervised cyclic graph generation network uses a weight-adjustable module to analyze the weights of the left and right views of the stereoscopic image, and calculates the feature weights of the left and right views of the stereoscopic image through the weight-adjustable module.

3. The method of claim 2, wherein the weakly supervised cyclic graph generation network combines the left view and the right view of the stereoscopic image with the features of the left view and the right view, respectively, to obtain a cyclic graph.

4. The method according to claim 3, wherein the quality regression network performs global average pooling and global difference pooling, respectively, on the features extracted from the cyclic graph to obtain quality features of the stereoscopic image, and maps the quality features to the quality score through multiple fully connected layers of the quality regression network.

5. The method according to claim 1, wherein training and testing the stereoscopic image quality evaluation model respectively through the training data set and the test data set obtained through random division, so as to obtain the trained stereoscopic image quality evaluation model, comprises:

randomly dividing a data sample set into the training data set and the test data set according to a preset proportion, and randomly combining the training data set to obtain two groups of stereo image pair sample sets;

processing one group of stereo image pair sample sets with each branch network of the stereoscopic image quality evaluation model to obtain two groups of quality scores;

processing the two groups of quality scores and the truth labels of the two groups of stereo image pair sample sets corresponding to the two groups of quality scores by using a preset loss function to obtain a loss value;

according to a preset evaluation standard, evaluating the stereoscopic image quality evaluation model with updated parameters by using the test data set to obtain an evaluation result;

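6. The method of claim 5, wherein the preset loss function comprises a mean absolute error loss function and an absolute error loss function;

the mean absolute error loss function is used for calculating the loss between the quality score predicted by each branch network of the stereoscopic image quality evaluation model and the truth label of the preprocessed stereoscopic image pair sample;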

the absolute error loss function is used for calculating the loss between a predicted difference and an actual difference, wherein the predicted difference represents the difference between the quality scores predicted by the branch networks of the stereoscopic image quality evaluation model, and the actual difference represents the difference between the truth labels of the preprocessed stereoscopic image pair samples.

7. The method of claim 5, wherein the preset optimizer comprises an adaptive moment estimation optimizer;

the preset evaluation criteria comprise a Pearson linear correlation coefficient, a Spearman rank correlation coefficient and a mean square error.

8. An electronic device, comprising:

one or more processors;

storage means for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-7.

9. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to any of claims 1-7.