CN113658122A - Image quality evaluation method, device, storage medium and electronic equipment - Google Patents

Image quality evaluation method, device, storage medium and electronic equipment

Info

Publication number
CN113658122A
Authority
CN
China
Prior art keywords
network
image
scale
adaptive
ith
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110908704.9A
Other languages
Chinese (zh)
Inventor
杨子木
张璇
王武生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Huantai Technology Co Ltd
Original Assignee
Shenzhen Huantai Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Huantai Technology Co Ltd filed Critical Shenzhen Huantai Technology Co Ltd
Priority to CN202110908704.9A priority Critical patent/CN113658122A/en
Publication of CN113658122A publication Critical patent/CN113658122A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30168 Image quality inspection

Abstract

The disclosure provides an image quality evaluation method and device, a computer-readable storage medium, and an electronic device, and relates to the technical field of image and video processing. The image quality evaluation method includes the following steps: acquiring basic feature images of a target image under multiple scales; performing pooling processing on the basic feature image under each scale, and extracting local quality features under each scale according to the pooled abstract feature image; performing convolution processing on the basic feature image under each scale to extract semantic features under each scale, and determining adaptive parameters under each scale according to the semantic features; fusing the local quality features under the same scale by using the adaptive parameters under each scale, respectively, to obtain fused quality features under each scale; and determining an evaluation value of the target image based on the fused quality features under each scale. The method and device can improve the accuracy of image quality evaluation.

Description

Image quality evaluation method, device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of image and video processing technologies, and in particular, to an image quality evaluation method, an image quality evaluation device, a computer-readable storage medium, and an electronic device.
Background
Image Quality Assessment (IQA) refers to a computer evaluating the quality of an image in a way that mimics the criteria by which humans perceive image quality. Image quality assessment typically includes three types of tasks: full-reference image quality evaluation, reduced-reference image quality evaluation, and no-reference image quality evaluation (also called blind quality evaluation). No-reference image quality evaluation evaluates image quality without using any reference image; it is both a highly critical task and the most difficult one in image quality evaluation.
In the related art, the results of no-reference image quality evaluation differ to some extent from human visual perception, so the accuracy of the evaluation results is not high.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those skilled in the art.
Disclosure of Invention
The present disclosure provides an image quality evaluation method, an image quality evaluation device, a computer-readable storage medium, and an electronic device, thereby solving, at least to some extent, the problem in the related art that images cannot be evaluated accurately.
According to a first aspect of the present disclosure, there is provided an image quality evaluation method including: acquiring basic feature images of a target image under multiple scales; performing pooling processing on the basic feature image under each scale, and extracting local quality features under each scale according to the pooled abstract feature image; performing convolution processing on the basic feature image under each scale to extract semantic features under each scale, and determining adaptive parameters under each scale according to the semantic features; fusing the local quality features under the same scale by using the adaptive parameters under each scale, respectively, to obtain fused quality features under each scale; and determining an evaluation value of the target image based on the fused quality features under each scale.
According to a second aspect of the present disclosure, there is provided an image quality evaluation apparatus including: a basic feature image acquisition module configured to acquire basic feature images of a target image under multiple scales; a local quality feature extraction module configured to perform pooling processing on the basic feature image under each scale and extract local quality features under each scale according to the pooled abstract feature image; an adaptive parameter determination module configured to perform convolution processing on the basic feature image under each scale to extract semantic features under each scale, and to determine adaptive parameters under each scale according to the semantic features; a local quality feature fusion module configured to fuse the local quality features under the same scale by using the adaptive parameters under each scale, respectively, to obtain fused quality features under each scale; and a multi-scale fusion module configured to determine an evaluation value of the target image based on the fused quality features under each scale.
According to a third aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the image quality evaluation method of the first aspect described above and possible implementations thereof.
According to a fourth aspect of the present disclosure, there is provided an electronic device comprising: a processor; and a memory for storing executable instructions of the processor; wherein the processor is configured to execute the image quality evaluation method of the first aspect and possible implementations thereof via execution of the executable instructions.
The technical scheme of the disclosure has the following beneficial effects:
In a first aspect, basic feature images are obtained first (recognizing the image), and then local quality features are extracted and adaptive parameters are determined according to semantic features (perceiving the image); the scheme thus evaluates image quality by recognizing first and perceiving second, which conforms to the process of human visual perception. In a second aspect, by pooling the basic feature image and extracting local quality features from the resulting abstract feature image, local distortion in the target image can be detected more sensitively, which facilitates the handling of unevenly distributed distortion. In a third aspect, by convolving the basic feature image and determining adaptive parameters from the resulting semantic features, the parameters can be adjusted adaptively as the image content changes; that is, the manner or standard of image quality evaluation is adjusted adaptively, simulating the way human perception of image quality varies with image content. In a fourth aspect, local quality features are extracted and adaptive parameters are determined at multiple scales, ensuring that distortions of different sizes in the target image are fully evaluated. Taking these aspects together, the present disclosure can improve the accuracy of image quality evaluation.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty.
FIG. 1 shows a schematic diagram of a system architecture in the present exemplary embodiment;
fig. 2 shows a schematic configuration diagram of an electronic apparatus in the present exemplary embodiment;
fig. 3 shows a flowchart of an image quality evaluation method in the present exemplary embodiment;
FIG. 4 is a diagram illustrating processing of a target image by an underlying feature extraction network in the present exemplary embodiment;
FIG. 5 illustrates a flow chart for extracting local quality features in the present exemplary embodiment;
FIG. 6 is a diagram illustrating processing of an underlying feature image by a local quality feature extraction network in the present exemplary embodiment;
FIG. 7 illustrates a flow chart for determining adaptive weight parameters and adaptive bias parameters in the present exemplary embodiment;
FIG. 8 is a diagram illustrating processing of a first base feature image by a first adaptive weight determination network in the present exemplary embodiment;
FIG. 9 illustrates a flow chart for determining a global adaptive weight parameter and a global adaptive bias parameter in the exemplary embodiment;
FIG. 10 is a diagram illustrating processing of an nth base feature image by a global adaptive weight determination network in the present exemplary embodiment;
fig. 11 shows a schematic diagram of processing a target image by an image quality evaluation model in the present exemplary embodiment;
fig. 12 shows a flowchart of training an image quality evaluation model in the present exemplary embodiment;
fig. 13 is a schematic configuration diagram showing an image quality evaluation apparatus in the present exemplary embodiment;
fig. 14 shows a result verification diagram of image quality evaluation in the present exemplary embodiment.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and the like. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus their repetitive description will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
In the related art, deep learning is mostly adopted for no-reference image quality evaluation, and certain progress has been achieved. However, the following disadvantages remain:
The deep learning models built for no-reference image quality evaluation are mostly based on the structure of an object detection network and emphasize learning the global information of the image. However, distortion in real images mostly occurs in local areas, and the human visual system is very sensitive to local distortion, especially when the image shows good quality over a large area. Because such deep learning models neglect local information, their image quality evaluation results differ considerably from human perception.
Once a deep learning model has been trained, its structure and parameters are fixed, which means that its manner of evaluating image quality is also fixed. For example, people consider a blue-sky image to be of high quality, yet most deep learning models mistake it for a blurred image because the image includes a large open area. Such a frozen manner of image quality evaluation therefore affects the accuracy of the results.
In view of one or more of the above problems, exemplary embodiments of the present disclosure first provide an image quality evaluation method. The system architecture and application scenario of the operating environment of the exemplary embodiment are described below with reference to fig. 1.
Fig. 1 shows a schematic diagram of a system architecture, and the system architecture 100 may include a terminal 110 and a server 120. The terminal 110 may be a terminal device such as a smart phone, a tablet computer, a desktop computer, or a notebook computer, and the server 120 generally refers to a background system providing the image quality evaluation related service in the exemplary embodiment, and may be a server or a cluster formed by multiple servers. The terminal 110 and the server 120 may form a connection through a wired or wireless communication link for data interaction.
In one embodiment, the above-described image quality evaluation method may be performed by the terminal 110. For example, when a user takes an image using the terminal 110 or the user selects an image in an album of the terminal 110, the terminal 110 evaluates the quality of the image and outputs the evaluation value.
In one embodiment, the above-described image quality evaluation method may be performed by the server 120. For example, when a user takes an image using the terminal 110 or the user selects an image in an album of the terminal 110, the terminal 110 uploads the image to the server 120, the server 120 performs quality evaluation on the image, and the evaluation value is returned to the terminal 110.
As can be seen from the above, the execution subject of the image quality evaluation method in the present exemplary embodiment may be the terminal 110 or the server 120, which is not limited by the present disclosure.
Exemplary embodiments of the present disclosure also provide an electronic device for executing the above image quality evaluation method, which may be the above terminal 110 or the server 120. In general, the electronic device may include a processor and a memory for storing executable instructions of the processor, the processor being configured to execute the image quality evaluation method described above via execution of the executable instructions.
The structure of the electronic device will be exemplarily described below, taking the mobile terminal 200 in fig. 2 as an example. It will be appreciated by those skilled in the art that, apart from the components specifically intended for mobile use, the configuration of fig. 2 can also be applied to fixed-type devices.
As shown in fig. 2, the mobile terminal 200 may specifically include: the mobile communication device comprises a processor 201, a memory 202, a bus 203, a mobile communication module 204, an antenna 1, a wireless communication module 205, an antenna 2, a display screen 206, a camera module 207, an audio module 208, a power module 209 and a sensor module 210.
The processor 201 may include one or more processing units; for example, the processor 201 may include an AP (Application Processor), a modem processor, a GPU (Graphics Processing Unit), an ISP (Image Signal Processor), a controller, an encoder, a decoder, a DSP (Digital Signal Processor), a baseband processor, and/or an NPU (Neural-network Processing Unit), etc. The image quality evaluation method in the present exemplary embodiment may be performed by an AP, a GPU, or a DSP; when the method involves neural-network-related processing, it may be performed by an NPU.
An encoder may encode (i.e., compress) an image or video, for example encoding the target image into a particular format to reduce the data size for storage or transmission. A decoder may decode (i.e., decompress) the encoded data of an image or video to restore it; for example, the encoded data of the target image may be read and decoded by the decoder to restore the data of the target image, so that image quality evaluation can be performed on that data. The mobile terminal 200 may support one or more encoders and decoders, and may thus process images or videos in a variety of encoding formats, such as image formats like JPEG (Joint Photographic Experts Group), PNG (Portable Network Graphics), and BMP (Bitmap), and video formats like MPEG-1 and MPEG-2 (Moving Picture Experts Group), H.263, H.264, and HEVC (High Efficiency Video Coding).
The processor 201 may be connected to the memory 202 or other components by a bus 203.
The memory 202 may be used to store computer-executable program code, which includes instructions. The processor 201 executes various functional applications of the mobile terminal 200 and data processing by executing instructions stored in the memory 202. The memory 202 may also store application data, such as files for storing images, videos, and the like.
The communication function of the mobile terminal 200 may be implemented by the mobile communication module 204, the antenna 1, the wireless communication module 205, the antenna 2, a modem processor, a baseband processor, and the like. The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. The mobile communication module 204 may provide a mobile communication solution of 2G, 3G, 4G, 5G, etc. applied to the mobile terminal 200. The wireless communication module 205 may provide wireless communication solutions such as wireless lan, bluetooth, near field communication, etc. applied to the mobile terminal 200.
The display screen 206 is used to implement display functions, such as displaying user interfaces, images, videos, and the like. The camera module 207 is used for performing a photographing function, such as photographing an image, a video, and the like. The audio module 208 is used to implement audio functions, such as playing audio, collecting voice, and the like. The power module 209 is used to implement power management functions, such as charging batteries, powering devices, monitoring battery status, etc. The sensor module 210 may include a depth sensor 2101, a pressure sensor 2102, a gyro sensor 2103, an air pressure sensor 2104, etc., to implement a corresponding sensing detection function.
The image quality evaluation method in the present exemplary embodiment is described below with reference to fig. 3, where fig. 3 shows an exemplary flow of the image quality evaluation method, and may include:
step S310, acquiring basic feature images of a target image under multiple scales;
step S320, performing pooling processing on the basic feature image under each scale, and extracting local quality features under each scale according to the pooled abstract feature image;
step S330, performing convolution processing on the basic feature image under each scale to extract semantic features under each scale, and determining adaptive parameters under each scale according to the semantic features;
step S340, fusing the local quality features under the same scale by using the adaptive parameters under each scale, respectively, to obtain fused quality features under each scale;
step S350, determining the evaluation value of the target image based on the fused quality features under each scale.
Based on the above method: in a first aspect, basic feature images are obtained first (recognizing the image), and then local quality features are extracted and adaptive parameters are determined according to semantic features (perceiving the image); the scheme thus evaluates image quality by recognizing first and perceiving second, which conforms to the process of human visual perception. In a second aspect, by pooling the basic feature image and extracting local quality features from the resulting abstract feature image, local distortion in the target image can be detected more sensitively, which facilitates the handling of unevenly distributed distortion. In a third aspect, by convolving the basic feature image and determining adaptive parameters from the resulting semantic features, the parameters can be adjusted adaptively as the image content changes; that is, the manner or standard of image quality evaluation is adjusted adaptively, simulating the way human perception of image quality varies with image content. In a fourth aspect, local quality features are extracted and adaptive parameters are determined at multiple scales, ensuring that distortions of different sizes in the target image are fully evaluated. Taking these aspects together, the present disclosure can improve the accuracy of image quality evaluation.
Each step in fig. 3 is explained in detail below.
Referring to fig. 3, in step S310, a base feature image of a target image at a plurality of scales is acquired.
The target image is an image for which quality evaluation is required. The source of the target image is not limited in the present disclosure, for example, the target image may be a currently captured image or an arbitrary image selected by a user.
In step S310, preliminary feature extraction may be performed on the target image at different scales. The features extracted at this stage are generally relatively basic, low-level features, so the resulting feature images are referred to as basic feature images.
The present disclosure does not limit the specific manner of extracting the basic feature images; for example, basic feature images may be extracted from the target image using different types of feature descriptors. In one embodiment, the process of step S310 may be implemented through a specific network. Specifically, the above-mentioned plurality of scales are expressed as the first to nth scales, where n is a positive integer not less than 2. The image quality evaluation method may further include the following steps:
and acquiring a basic feature extraction network. The basic feature extraction network comprises n convolution layer combinations, wherein the first convolution layer combination to the nth convolution layer combination respectively correspond to the first scale to the nth scale.
Correspondingly, the above-mentioned obtaining the basic feature image of the target image under multiple scales may include the following steps:
and inputting the target image into the basic feature extraction network, and respectively outputting a first basic feature image to an nth basic feature image through a first convolution layer combination to an nth convolution layer combination.
The first to nth basic feature images are basic feature images at a first scale to a nth scale, respectively. The basic feature extraction network is used for extracting basic feature images, and the main structure is n convolution layer combinations. Convolutional layer combination refers to a multi-layer structure comprising a plurality of convolutional layers, and other types of intermediate layers, such as pooling layers, upsampling layers, and the like, can also be included in the convolutional layer combination. The combination of the n convolution layers can be in a serial connection or parallel connection structure. When the n convolution layer combinations are of a serial structure, each convolution layer combination can be used for processing intermediate data output by the previous convolution layer combination, and when the n convolution layer combinations are of a parallel structure, each convolution layer combination can be used for processing a target image or the same intermediate data. In addition to convolution layer combinations, the underlying feature extraction network may include other intermediate layers, such as a single convolution layer, pooling layer, upsampling layer, and the like.
FIG. 4 shows a schematic structure of a basic feature extraction network comprising one convolutional layer and four convolutional layer combinations, each convolutional layer combination comprising a plurality of convolutional layers. For each convolutional layer, the convolution parameters are shown, including the size of the convolution kernel, the number of channels, and the step size. In FIG. 4, repeated convolutional blocks are drawn folded and marked ×2, ×3, ×5; for example, the first convolutional layer combination contains two instances of the structure (1×1×64 convolution + 3×3×64 convolution + 1×1×512 convolution). In any convolutional layer, computations such as BN (Batch Normalization) and activation by ReLU (Rectified Linear Unit; other activation functions may also be used) may be performed after the convolution.
For example, assuming the size of the target image is 224 × 224, it is input into the basic feature extraction network; the size and dimensionality (i.e., the number of channels) of the feature image obtained from each part are shown in Table 1, where S1 among the convolution parameters indicates a step size of 1 and S2 indicates a step size of 2. The feature image obtained after the target image passes through the first convolutional layer has a size of 112 × 112 × 64. This feature image is then input into the first convolutional layer combination, which outputs a first basic feature image of size 56 × 56 × 256; the first basic feature image is input into the second convolutional layer combination, which outputs a second basic feature image of size 28 × 28 × 512; the second basic feature image is input into the third convolutional layer combination, which outputs a third basic feature image of size 14 × 14 × 1024; and the third basic feature image is input into the fourth convolutional layer combination, which outputs a fourth basic feature image of size 7 × 7 × 2048. From the first to the fourth basic feature image, the basic features of the target image are reflected from the micro scale up to increasingly macro scales.
TABLE 1
[Table 1 is reproduced as an image in the original publication. It lists, for each part of the basic feature extraction network, the convolution parameters and the size and dimensionality of the output feature image; the output sizes are those given in the preceding paragraph.]
In an embodiment, residual block structures may also be added to the basic feature extraction network; for example, residual shortcut connections may be added between different convolutional layers of each convolutional layer combination, so that the basic feature images carry richer information.
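For concreteness, the multi-scale extraction described above can be sketched in code. The following is a minimal PyTorch sketch assuming a ResNet-50-style backbone, whose stage outputs match the sizes in the example above; the use of torchvision and its layer names is an assumption of this sketch, not part of the disclosure.

    import torch
    from torchvision.models import resnet50

    class BaseFeatureExtractor(torch.nn.Module):
        """Illustrative backbone: emits the four multi-scale basic feature
        images (56x56x256, 28x28x512, 14x14x1024, 7x7x2048 for a 224x224
        input), mirroring the four convolutional layer combinations."""
        def __init__(self):
            super().__init__()
            r = resnet50(weights=None)
            self.stem = torch.nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool)
            self.stages = torch.nn.ModuleList([r.layer1, r.layer2, r.layer3, r.layer4])

        def forward(self, x):             # x: (B, 3, 224, 224) target image
            x = self.stem(x)              # first convolutional layer (+ pooling): (B, 64, 56, 56)
            feats = []
            for stage in self.stages:     # the n = 4 convolutional layer combinations in series
                x = stage(x)
                feats.append(x)           # i-th basic feature image
            return feats                  # [(B,256,56,56), (B,512,28,28), (B,1024,14,14), (B,2048,7,7)]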
With continued reference to fig. 3, in step S320, the basic feature image at each scale is pooled, and the local quality feature at each scale is extracted according to the pooled abstract feature image.
The local quality feature is a feature that characterizes quality information in a local range of the image. The pooling process generally traverses the image using a pooling window of a certain size to abstract the image information within the pooling window. In the exemplary embodiment, the basic feature image is pooled to obtain an abstract feature image, and the abstract feature image is further feature-extracted to obtain a local quality feature.
Before pooling the base feature images, certain pre-processing may be performed, including but not limited to dimensionality reduction, cropping, reconstruction, and the like. For the abstracted feature images after the pooling processing, the local quality features can be further extracted through operations of global pooling, stretching (Flatten), full connection and the like.
In one embodiment, the process of step S320 may be implemented through a specific network. Specifically, the image quality evaluation method may further include the steps of:
and acquiring a first local quality feature extraction network to an nth local quality feature extraction network, wherein each local quality feature extraction network comprises a pooling layer and a full connection layer.
Correspondingly, as shown in fig. 5, the above-mentioned performing the pooling process on the basic feature image under each scale and extracting the local quality feature under each scale according to the pooled abstract feature image may include the following steps S510 to S530:
step S510, inputting the ith basic characteristic image into an ith local quality characteristic extraction network;
step S520, performing pooling processing on the ith basic feature image through a pooling layer of the ith local quality feature extraction network to obtain an ith abstract feature image, wherein the ith abstract feature image is an abstract feature image under the ith scale;
step S530, the ith abstract feature image is processed through the full connection layer of the ith local quality feature extraction network to obtain the ith local quality feature, wherein the ith local quality feature is the local quality feature under the ith scale.
Wherein i is any positive integer in [1, n ], the ith basic feature image represents each of the first to nth basic feature images, that is, each basic feature image can be processed by using the flow of fig. 5 to obtain the local quality features at the corresponding scale.
The first to nth local quality feature extraction networks may have the same structure, and network parameters (here, network parameters are hyper-parameters) may be different because the processed basic feature images are different. The first to nth local quality feature extraction networks may be connected to the rear ends of the first to nth convolution layer combinations in the above-described basic feature extraction network, respectively, i.e., a first basic feature image output by the first convolution layer combination is input to the first local quality extraction network, a second basic feature image output by the second convolution layer combination is input to the second local quality extraction network, and so on.
Each local quality feature extraction network comprises a pooling layer and a full-link layer, and is used for pooling processing and full-link processing. The pooling processing can extract abstract features, and the full-connection processing can fuse and further learn the abstract features to obtain local quality features. In addition, other types of intermediate layers can be arranged in the local quality feature extraction network, such as a dimensionality reduction layer, a cutting layer, a reconstruction layer and the like for preprocessing the basic feature image. Illustratively, each local quality feature extraction network may further include a dimensionality reduction layer; the pooling of the basic feature image at each scale and the extraction of the local quality feature at each scale according to the pooled abstract feature image may further include the following steps:
and performing dimensionality reduction processing on the ith basic feature image through a dimensionality reduction layer of the ith local quality feature extraction network.
Correspondingly, the pooling of the ith basic feature image through the pooling layer of the ith local quality feature extraction network to obtain the ith abstract feature image may include the following steps:
and performing pooling processing on the ith basic feature image after the dimensionality reduction processing through a pooling layer of the ith local quality feature extraction network to obtain an ith abstract feature image.
The dimensionality reduction layer may perform convolution with a 1 × 1 × m_i kernel (i = 1, 2, …, n), where m_i, the number of convolution channels of the dimensionality reduction layer of the ith local quality feature extraction network, corresponds to the dimensionality of the input basic feature image. In line with the dimensionalities of the basic feature images in Table 1, and because a deeper network structure favors the extraction of abstract features, the value of m_i increases gradually as the dimensionality of the basic feature image grows, for example: m_1 = 28, m_2 = 56, m_3 = 112, m_4 = 224.
When dimension reduction is carried out, the features of the basic feature image in different dimensions are actually fused, so that abstract features can be extracted, and the calculation amount of subsequent processing is reduced.
The abstract feature images obtained by pooling usually need a dimension transformation before full-connection processing. Generally, the abstract feature image may be stretched into a one-dimensional array by a stretching layer, or the abstract feature image of each channel may be converted into one-dimensional data by a global pooling layer, giving a one-dimensional array composed of the channels. Full-connection processing is then performed on the one-dimensional array.
Fig. 6 is a schematic diagram of the processing of basic feature images by the local quality feature extraction networks, where each local quality feature extraction network includes a dimensionality reduction layer, a pooling layer, a stretching layer, and a fully connected layer; the data dimensionality output by each part is listed in Table 2. For example, the first basic feature image of 56 × 56 × 256 is input into the first local quality feature extraction network, and the dimensionality reduction layer performs dimensionality reduction through a 1 × 1 × 28 convolution kernel, yielding a dimensionality-reduced first basic feature image of 56 × 56 × 28. It then enters the pooling layer, where 7 × 7 average pooling (with a step size of 7 as well) extracts an abstract feature for each local area, giving the first abstract feature image with dimensionality 8 × 8 × 28. This is stretched into a one-dimensional array of dimension 1792 (= 8 × 8 × 28), and the 28-dimensional first local quality feature, typically in vector form, is then output through the fully connected layer. The processing flow for the second, third, and fourth basic feature images is the same, and the related data dimensionality information can be found in Table 2, so it is not repeated here.
TABLE 2
[Table 2 is reproduced as an image in the original publication. It lists the data dimensionality output by each part (dimensionality reduction layer, pooling layer, stretching layer, fully connected layer) of the first to fourth local quality feature extraction networks.]
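As an illustration, a single local quality feature extraction network at the first scale might look as follows in PyTorch, using the sizes from the worked example (input 56 × 56 × 256, m_1 = 28, 7 × 7 average pooling, 28-dimensional output); the module layout is this sketch's assumption.

    import torch
    import torch.nn as nn

    class LocalQualityExtractor(nn.Module):
        """Illustrative LQE branch: dimensionality reduction -> pooling ->
        stretching -> full connection, with first-scale sizes assumed."""
        def __init__(self, in_ch=256, m=28, pool=7, out_dim=28):
            super().__init__()
            self.reduce = nn.Conv2d(in_ch, m, kernel_size=1)         # dimensionality reduction layer (1x1xm kernel)
            self.pool = nn.AvgPool2d(kernel_size=pool, stride=pool)  # pooling layer: 56x56 -> 8x8
            self.fc = nn.Linear(8 * 8 * m, out_dim)                  # fully connected layer: 1792 -> 28

        def forward(self, x):             # x: (B, 256, 56, 56), first basic feature image
            x = self.reduce(x)            # (B, 28, 56, 56)
            x = self.pool(x)              # (B, 28, 8, 8): the abstract feature image
            x = torch.flatten(x, 1)       # (B, 1792): stretching layer, 1792 = 8 * 8 * 28
            return self.fc(x)             # (B, 28): local quality feature L1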
With continued reference to fig. 3, in step S330, the base feature image at each scale is subjected to convolution processing to extract semantic features at each scale, and an adaptive parameter at each scale is determined according to the semantic features.
The adaptive parameters are parameters used for fusing the local quality features. In the present exemplary embodiment, these parameters must be adapted to the basic feature image (which may also be understood as being adapted to the target image, or to the local quality features described above); that is, they differ from one basic feature image to another, and are therefore called adaptive parameters. By performing convolution processing on the basic feature image to extract semantic features, the computer can, on the basis of understanding the image semantics, determine adaptive parameters that reflect how humans apply different evaluation manners or evaluation standards to different image content.
The adaptation parameter may include an adaptation weight parameter (weight). In one embodiment, the process of step S330 may be implemented by a specific network. Specifically, the image quality evaluation method may further include the steps of:
acquiring a first adaptive parameter determination network to an nth adaptive parameter determination network, wherein each adaptive parameter determination network comprises a semantic feature extraction sub-network and a weight parameter determination sub-network, the semantic feature extraction sub-network comprises a convolutional layer, and the weight parameter determination sub-network comprises a reconstruction layer.
Correspondingly, as shown in fig. 7, the performing convolution processing on the basic feature image in each scale to extract the semantic features in each scale and determine the adaptive parameters in each scale according to the semantic features may include the following steps S710 to S730:
step S710, inputting the ith basic characteristic image into the ith adaptive parameter determination network;
step S720, performing convolution processing on the ith basic feature image through the ith semantic feature extraction sub-network to extract the ith semantic feature, wherein the ith semantic feature extraction sub-network is a semantic feature extraction sub-network for the ith adaptive parameter determination network, and the ith semantic feature is a semantic feature under the ith scale;
step S730, performing data reconstruction on the ith semantic feature through the ith weight parameter determining sub-network to obtain an ith adaptive weight parameter, wherein the ith weight parameter determining sub-network is a weight parameter determining sub-network of the ith adaptive parameter determining network, and the ith adaptive weight parameter is an adaptive weight parameter under the ith scale.
Wherein i is any positive integer in [1, n ], the ith basic feature image represents each of the first to nth basic feature images, that is, each basic feature image can be processed by using the process of fig. 7 to obtain the adaptive weight parameter under the corresponding scale.
The first to nth adaptive parameter determination networks may have the same structure, and the parameters of the networks may be different because the basic feature images processed by the first to nth adaptive parameter determination networks are different. The first to nth adaptive parameter determination networks may be connected to the rear ends of the first to nth convolution layer combinations in the above-described basic feature extraction network, respectively, i.e., a first basic feature image output by the first convolution layer combination is input to the first adaptive parameter determination network, a second basic feature image output by the second convolution layer combination is input to the second adaptive parameter determination network, and so on.
Each adaptive parameter determination network comprises a semantic feature extraction sub-network and a weight parameter determination sub-network. The semantic feature extraction sub-network comprises a convolutional layer for performing convolution processing on the basic feature image to extract the semantic features; the weight parameter determination sub-network comprises a reconstruction layer for data reconstruction (Reshape) of the semantic features. Generally, the semantic features take the specific form of a semantic feature image; when data reconstruction is performed on the semantic feature image, the data in it may be rearranged according to the dimensionality information of the local quality features, so as to obtain the adaptive weight parameter.
In addition, other types of intermediate layers may also be provided in the semantic feature extraction sub-network or the weight parameter determination sub-network. In one embodiment, the semantic feature extraction subnetwork may further comprise a pooling layer. Therefore, the convolution processing on the ith basic feature image through the ith semantic feature extraction sub-network to extract the ith semantic feature may include the following steps:
and performing convolution and pooling on the ith basic feature image through the ith semantic feature extraction sub-network to obtain the ith semantic feature.
The pooling layer in the semantic feature extraction sub-network is usually positioned behind the convolution layer, after the basic feature image is input into the semantic feature extraction sub-network, the convolution processing is firstly carried out to extract the semantic features, and then the pooling processing is carried out to further abstract the extracted semantic features so as to improve the generalization of the semantic features.
In one embodiment, the weight parameter determination sub-network may further include a convolutional layer. Therefore, the obtaining the ith adaptive weight parameter by performing data reconstruction on the ith semantic feature through the ith weight parameter determining sub-network may include the following steps:
and determining a sub-network to carry out convolution processing and data reconstruction on the ith semantic feature through the ith weight parameter to obtain the ith self-adaptive weight parameter.
That is, before data reconstruction is performed on the semantic features, convolution processing may be performed again to fuse information of the semantic features across dimensions and to adjust their dimensionality, for example raising or lowering it according to the dimensionality information of the local quality features. Data reconstruction is then carried out, which helps further improve the accuracy of the resulting adaptive weight parameters.
In one embodiment, the adaptive parameters may also include an adaptive bias parameter (bias). In order to obtain the adaptive bias parameters, a bias parameter determining sub-network may also be provided in each adaptive parameter determining network, which may comprise fully connected layers. In general, the weight parameter determination sub-network and the bias parameter determination sub-network are two branches connected to the semantic feature extraction sub-network. Referring to fig. 7, the above-mentioned performing convolution processing on the basic feature image under each scale to extract semantic features under each scale and determine adaptive parameters under each scale according to the semantic features may further include the following step S740:
step S740, the ith semantic feature is subjected to full connection processing through the ith bias parameter determining sub-network to obtain the ith adaptive bias parameter, the ith bias parameter determining sub-network determines the sub-network for the bias parameter of the ith adaptive parameter determining network, and the ith adaptive bias parameter is the adaptive bias parameter under the ith scale.
The fully connected layer of the bias parameter determination sub-network is used for performing full-connection processing on the semantic features to obtain the adaptive bias parameters.
Generally, the specific form of the semantic features may be a semantic feature image, and before performing full join processing, dimension transformation is generally required. For example, the bias parameter determining sub-network may further include a global pooling layer for converting the semantic feature image of each channel into one-dimensional data, so as to obtain a one-dimensional array composed of the channels. Alternatively, the bias parameter determining sub-network may further comprise a stretching layer for stretching the semantic feature images into a one-dimensional array. And after the one-dimensional array is obtained, carrying out full connection processing on the one-dimensional array through a full connection layer.
Fig. 8 shows a schematic diagram of the processing of a first basic feature image by the first adaptive parameter determination network. The first adaptive parameter determination network comprises a first semantic feature extraction sub-network, a first weight parameter determination sub-network, and a first bias parameter determination sub-network, the latter two being two branches connected to the first semantic feature extraction sub-network. The data dimensionality output by each part of the first adaptive parameter determination network is listed in Table 3. The first basic feature image of 56 × 56 × 256 is input into the first adaptive parameter determination network and passes through two convolutional layers in the first semantic feature extraction sub-network, each of which may, for example, perform convolution with a 3 × 3 kernel followed by ReLU activation. Pooling is then performed with an 8 × 8 pooling window and a step size of 8 to extract the first semantic feature with dimensionality 7 × 7 × 14. The first semantic feature is then input into the first weight parameter determination sub-network and the first bias parameter determination sub-network respectively: the first weight parameter determination sub-network performs convolution and data reconstruction and outputs the first adaptive weight parameter with dimensionality 14 × 28 × 1, while the first bias parameter determination sub-network performs global pooling and full-connection processing and outputs the first adaptive bias parameter with dimensionality 14 × 1.
It should be understood that in other adaptive parameter determination networks, the network parameters (here, the network parameters are hyper-parameters) can be set according to actual conditions. For example, the pooling layer in the second adaptive parameter determination network may be set to a pooling window of 4 × 4 with a step size of 4, the pooling layer in the third adaptive parameter determination network may be set to a pooling window of 2 × 2 with a step size of 2, and the pooling layer in the fourth adaptive parameter determination network may be set to a pooling window of 1 × 1 with a step size of 1 (corresponding to no pooling); the number of channels of convolution kernels used by convolution layers in the second, third and fourth adaptive parameter determination networks may also be different from that of the first adaptive parameter determination network. The present disclosure is not limited thereto. The processing flow of the second, third, and fourth basic feature images is the same, and the related data dimension information may refer to table 3, which is not described again.
TABLE 3
[Table 3 is reproduced as an image in the original publication. It lists the data dimensionality output by each part of the first to fourth adaptive parameter determination networks.]
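Similarly, one per-scale adaptive parameter determination network can be sketched as below, following the first-scale example of fig. 8 (semantic feature 7 × 7 × 14, adaptive weight 14 × 28, 14-dimensional adaptive bias). The channel count of the weight-branch convolution is chosen here so that the reshape yields a 14 × 28 matrix; that choice, like the other sizes, is an assumption of this sketch.

    import torch
    import torch.nn as nn

    class AdaptiveWeightModule(nn.Module):
        """Illustrative AWM: a semantic feature extraction sub-network feeding
        a weight-parameter branch (conv + data reconstruction) and a
        bias-parameter branch (global pooling + full connection)."""
        def __init__(self, in_ch=256, sem_ch=14, fc_in=28, fc_out=14, pool=8, spatial=7):
            super().__init__()
            self.semantic = nn.Sequential(                 # two 3x3 conv + ReLU, then 8x8 pooling
                nn.Conv2d(in_ch, sem_ch, 3, padding=1), nn.ReLU(),
                nn.Conv2d(sem_ch, sem_ch, 3, padding=1), nn.ReLU(),
                nn.AvgPool2d(pool, stride=pool))           # semantic feature: (B, 14, 7, 7)
            # weight branch: conv to (fc_in * fc_out / spatial^2) channels, then reshape
            self.w_conv = nn.Conv2d(sem_ch, fc_in * fc_out // (spatial * spatial), 1)
            self.fc_in, self.fc_out = fc_in, fc_out
            # bias branch: global pooling then full connection
            self.b_fc = nn.Linear(sem_ch, fc_out)

        def forward(self, x):                              # x: (B, 256, 56, 56)
            s = self.semantic(x)                           # (B, 14, 7, 7)
            w = self.w_conv(s).reshape(-1, self.fc_out, self.fc_in)  # adaptive weight W1: (B, 14, 28)
            b = self.b_fc(s.mean(dim=(2, 3)))              # adaptive bias B1: (B, 14)
            return w, b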
With continuing reference to fig. 3, in step S340, the local quality features under the same scale are respectively fused by using the adaptive parameters under each scale, so as to obtain the fused quality features under each scale.
For example, a first adaptive weight parameter may be used to perform full join processing on the first local quality feature, which is equivalent to performing fusion on different dimensions of the first local quality feature to obtain a fusion quality feature under the first scale. Or performing full-connection processing on the first local quality feature by using the first adaptive weight parameter and the first adaptive bias parameter to obtain a fusion quality feature under the first scale.
In one embodiment, the image quality evaluation method may further include the steps of:
a converged network is obtained, the converged network including a fully connected layer.
Correspondingly, the above fusing the local quality features under the same scale by using the adaptive parameters under each scale to obtain the fused quality features under each scale respectively may include the following steps:
and inputting the self-adaptive parameters and the ith local quality characteristics under the ith scale into the fusion network, carrying out full-connection processing on the ith local quality characteristics by using the self-adaptive parameters in the fusion network, and outputting the fusion quality characteristics under the ith scale.
Wherein i is any positive integer in [1, n ], that is, for the adaptive parameter and the local quality feature in each scale, the above steps can be adopted for processing to obtain the fusion quality feature in the corresponding scale.
It should be noted that the fusion network can be expressed as follows:

HM(L_x, W_x) = q    (1)

where HM (Hyper Module) denotes the fusion network, x denotes the target image, L_x denotes the local quality features, W_x denotes the parameters of the fusion network, i.e., the adaptive parameters, and q denotes the fusion quality feature. It can be seen that W_x is related to x: the parameters of the fusion network are not fixed but change from image to image, so the output fusion quality feature also adapts to changes in image content, allowing image quality to be evaluated adaptively according to the content.
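Equation (1) describes what is often called a hypernetwork-style fully connected layer: the weight and bias applied to L_x are produced per image rather than learned as fixed constants. A minimal sketch, with shapes taken from the first-scale example:

    import torch

    def hyper_fuse(local_feat, weight, bias):
        """Illustrative fusion network HM of equation (1). Assumed shapes:
        local_feat (B, 28), weight (B, 14, 28), bias (B, 14)."""
        # batched matrix-vector product: q = W_x @ L_x + b_x per image
        return torch.bmm(weight, local_feat.unsqueeze(-1)).squeeze(-1) + bias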
With continued reference to fig. 3, in step S350, an evaluation value of the target image is determined based on the fusion quality feature at each scale.
Generally, fusion quality features under different scales can be further fused, so that information of local distortion with different sizes in the target image can be integrated, and the evaluation value of the target image can be finally output.
In one embodiment, as shown with reference to fig. 9, the image quality evaluation method may further include the following steps S910 to S940:
step S910, obtaining a global adaptive parameter determination network, where the global adaptive parameter determination network includes a global semantic feature extraction sub-network and a global weight parameter determination sub-network, the global semantic feature extraction sub-network includes a convolutional layer and a global pooling layer, and the global weight parameter determination sub-network includes a dimension adjustment layer;
step S920, inputting the nth basic feature image into the global adaptive parameter determination network;
step S930, performing convolution and global pooling on the nth basic feature image through the global semantic feature extraction sub-network to extract global semantic features;
and step S940, determining the dimension of the global semantic features by the sub-network through the global weight parameters to obtain global adaptive weight parameters.
Accordingly, the above determining the evaluation value of the target image based on the fusion quality characteristics at each scale may include the following steps:
and performing full-connection processing on the fusion quality characteristics under each scale by using the global self-adaptive weight parameters to obtain the evaluation value of the target image.
The global adaptive parameter is a parameter used for further fusing the fusion quality features under different scales, and may include a global adaptive weight parameter, which is a weight used for weighting the fusion quality features under different scales.
The global adaptive parameter determination network is used for determining the global adaptive parameters, and its structure may differ from that of the first to nth adaptive parameter determination networks. Specifically, the global semantic feature extraction sub-network needs to extract global semantic features, which are relatively macroscopic semantic features; a global pooling layer may be arranged in it to abstract the global semantic features, which may take the form of a one-dimensional array. The global weight parameter determination sub-network does not need convolution or data reconstruction processing: the global adaptive weight parameter can be obtained by adjusting the dimensionality of the global semantic features so that it corresponds to the dimensionality of the fusion quality features of all scales. The global weight parameter determination sub-network may therefore include a dimension adjustment layer, which may be a dimension-raising or a dimension-reducing layer; this disclosure is not limited in this respect.
By using the global adaptive weight parameters, full-connection processing can be performed on the fusion quality features under each scale, which amounts to a further fusion of the fusion quality features across scales; if the fusion quality features are thereby fused into a single numerical value, that value is the evaluation value of the target image.
In one embodiment, the global adaptive parameters may further include a global adaptive bias parameter, which is a bias used when fusing the fusion quality features at different scales. Accordingly, the global adaptive parameter determination network may further comprise a global bias parameter determination sub-network comprising fully connected layers. Referring to fig. 9, after extracting the global semantic features, the image quality evaluation method may further include the following step S950:
and step S950, determining the sub-network to perform full connection processing on the global semantic features through the global bias parameters to obtain global adaptive bias parameters.
Correspondingly, the above full-connection processing of the fusion quality features under each scale by using the global adaptive weight parameter to obtain the evaluation value of the target image may include the following steps:
and performing full-connection processing on the fusion quality characteristics under each scale by using the global adaptive weight parameters and the global adaptive bias parameters to obtain the evaluation value of the target image.
The global semantic features may be a one-dimensional array; the fully connected layer of the global bias parameter determination sub-network fuses the global semantic features and outputs the global adaptive bias parameter, which may be one-dimensional data.
Fig. 10 shows a schematic diagram of the processing of the nth base feature image by the global adaptive parameter determination network. The network comprises a global semantic feature extraction sub-network, a global weight parameter determination sub-network and a global bias parameter determination sub-network, the latter two being branches connected to the global semantic feature extraction sub-network. The structure is very similar to that of the first adaptive parameter determination network in fig. 8; the differences are that a global pooling layer is added to the global semantic feature extraction sub-network, the global weight parameter determination sub-network is a 1 × 1 convolutional layer (i.e., a dimension adjustment layer), and the global bias parameter determination sub-network contains no global pooling layer. The data dimensions output by each part of the global adaptive parameter determination network are listed in Table 4.
TABLE 4
(Table 4 is rendered as an image in the original publication; it lists the output data dimensions of each part of the global adaptive parameter determination network.)
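The structure just described can be sketched as follows. This is a minimal PyTorch illustration, not the patented implementation: the class name GlobalAWM, the channel count c_in, the hidden width and the number of scales k are assumptions, since the actual dimensions are those of Table 4.

    import torch
    import torch.nn as nn

    class GlobalAWM(nn.Module):
        """Global adaptive parameter determination network (sketch)."""
        def __init__(self, c_in: int = 512, hidden: int = 64, k: int = 4):
            super().__init__()
            # Global semantic feature extraction sub-network: convolution
            # plus a global pooling layer, yielding a one-dimensional array
            # of global semantic features per sample.
            self.semantic = nn.Sequential(
                nn.Conv2d(c_in, hidden, kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.AdaptiveAvgPool2d(1),
                nn.Flatten(),                      # -> (batch, hidden)
            )
            # Global weight parameter determination sub-network: a dimension
            # adjustment layer; after global pooling, a linear layer is
            # equivalent to the 1 x 1 convolution of fig. 10.
            self.weight_branch = nn.Linear(hidden, k)
            # Global bias parameter determination sub-network: a fully
            # connected layer producing a scalar bias.
            self.bias_branch = nn.Linear(hidden, 1)

        def forward(self, x: torch.Tensor):
            s = self.semantic(x)          # global semantic features
            wg = self.weight_branch(s)    # global adaptive weight parameter
            bg = self.bias_branch(s)      # global adaptive bias parameter
            return wg, bg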
In one embodiment, the above networks may be assembled into an image quality evaluation model. Fig. 11 shows a schematic diagram of the processing of a target image by the image quality evaluation model. The Backbone represents the basic feature extraction network and comprises a first convolutional layer (Conv1) and four convolutional layer combinations (Stage1 to Stage4). LQE (Local Quality Extraction) represents a local quality feature extraction network; the image quality evaluation model includes 4 local quality feature extraction networks, LQE1 to LQE4. AWM (Adaptive Weight Module) represents an adaptive parameter determination network; the model includes 5 adaptive parameter determination networks, AWM1 to AWM4 and AWMG (the global adaptive parameter determination network). HM denotes the fusion network, which is an algorithmic module containing no parameters.
The target image is input into the image quality evaluation model shown in fig. 11, processed by the first convolutional layer, and then passed sequentially through Stage1 to Stage4, which output the first to fourth base feature images respectively.
The first base feature image is input into LQE1 and AWM1; LQE1 outputs the first local quality feature L1, and AWM1 outputs the first adaptive weight parameter W1 and the first adaptive bias parameter B1. The second base feature image is input into LQE2 and AWM2; LQE2 outputs the second local quality feature L2, and AWM2 outputs the second adaptive weight parameter W2 and the second adaptive bias parameter B2. The third base feature image is input into LQE3 and AWM3; LQE3 outputs the third local quality feature L3, and AWM3 outputs the third adaptive weight parameter W3 and the third adaptive bias parameter B3. The fourth base feature image is input into LQE4 and AWM4; LQE4 outputs the fourth local quality feature L4, and AWM4 outputs the fourth adaptive weight parameter W4 and the fourth adaptive bias parameter B4. The fourth base feature image is also input into the AWMG, which outputs the global adaptive weight parameter WG and the global adaptive bias parameter BG.
L1, W1 and B1 are input into the HM, which performs full-connection processing on L1 with W1 and B1 and outputs the fusion quality feature H1 at the first scale; L2, W2 and B2 are input into the HM, which performs full-connection processing on L2 with W2 and B2 and outputs the fusion quality feature H2 at the second scale; L3, W3 and B3 are input into the HM, which performs full-connection processing on L3 with W3 and B3 and outputs the fusion quality feature H3 at the third scale; and L4, W4 and B4 are input into the HM, which performs full-connection processing on L4 with W4 and B4 and outputs the fusion quality feature H4 at the fourth scale. When fusing the local quality features at each scale, the HM computes:
Hi=Wi*Li+Bi, i=1,2,3,4 (2)
H1, H2, H3 and H4 are concatenated and input into the HM together with WG and BG; the HM performs full-connection processing on the concatenated feature H′ with WG and BG and outputs the evaluation value (Score) of the target image. When fusing the fusion quality features across all scales, the HM computes:
Score=WG*H′+BG (3)
H′=H1⊕H2⊕H3⊕H4 (4)
wherein ⊕ denotes concatenation (splicing).
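The HM computations of formulas (2) to (4) can be sketched as two parameter-free functions; hm_scale and hm_global are names introduced here, and each local quality feature is assumed to be a vector per sample while each fusion quality feature is a scalar per sample, consistent with the description above.

    import torch

    def hm_scale(L_i: torch.Tensor, W_i: torch.Tensor, B_i: torch.Tensor) -> torch.Tensor:
        # Formula (2): H_i = W_i * L_i + B_i — an adaptive full-connection
        # step: per-sample dot product with the adaptive weight, plus bias.
        return (W_i * L_i).sum(dim=-1, keepdim=True) + B_i

    def hm_global(H_list, WG: torch.Tensor, BG: torch.Tensor) -> torch.Tensor:
        # Formula (4): H' = H1 (+) H2 (+) H3 (+) H4 (concatenation),
        # then formula (3): Score = WG * H' + BG.
        H_prime = torch.cat(H_list, dim=-1)
        return (WG * H_prime).sum(dim=-1, keepdim=True) + BG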
In one embodiment, the image quality evaluation method may further include a training procedure for the image quality evaluation model, and as shown in fig. 12, the training procedure may include the following steps S1210 to S1230:
step S1210, an image quality evaluation model is established, wherein the image quality evaluation model comprises a basic feature extraction network to be trained, a first local quality feature extraction network to an nth local quality feature extraction network, a first adaptive parameter determination network to an nth adaptive parameter determination network and a global adaptive parameter determination network.
The image quality evaluation model established at this point is an initial model; its parameters can be initialized in any manner, for example by random initialization, or by adopting an ImageNet pre-trained model for the basic feature extraction network and Xavier initialization for the other parts.
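A minimal sketch of this initialization scheme follows; the ResNet-50 backbone is an assumption (the publication does not name the backbone network), and the LQE/AWM handles are hypothetical.

    import torch.nn as nn
    from torchvision.models import resnet50

    # ImageNet pre-trained weights for the basic feature extraction network.
    backbone = resnet50(pretrained=True)

    def xavier_init(module: nn.Module) -> None:
        # Xavier initialization for convolutional and fully connected layers.
        if isinstance(module, (nn.Conv2d, nn.Linear)):
            nn.init.xavier_uniform_(module.weight)
            if module.bias is not None:
                nn.init.zeros_(module.bias)

    # The LQE, AWM and AWMG sub-networks (hypothetical handles) would then
    # be initialized with, for example: net.apply(xavier_init)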
In step S1220, a data set is obtained, where the data set includes a sample image and an evaluation value label of the sample image.
The sample images may be authentically distorted images. Illustratively, the KonIQ-10k data set may be used, comprising 10073 authentically distorted sample images with highly diverse image content and distortions. The evaluation value label of a sample image may be an evaluation value obtained by manual scoring: in the KonIQ-10k data set, each sample image is rated by at least 120 raters with scores in [1, 100], and the evaluation value label is the average of these scores.
After the data set is acquired, it may be divided into a training set and a test set, for example randomly at an 8:2 ratio, giving a training set of 8058 sample images and a test set of 2015 sample images.
In one embodiment, any one or more of the following preprocessing steps may be performed on the sample image (a code sketch follows the list):
① Set a probability (for example, 0.5) and, with this probability, apply an affine transformation to the sample image; affine transformations include but are not limited to horizontal flipping, vertical flipping, translation, rotation and scaling, and produce a new sample image.
② Unify the sample images to a predetermined resolution, such as 512 × 384.
③ Randomly crop a certain number (for example, 25) of local images from the sample image; the size of the local images may equal the input image size of the image quality evaluation model, such as 224 × 224, and the cropped local images serve as the actual input for subsequently training the model.
④ Normalize the pixel values of the local images to [0, 1.0].
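A minimal torchvision sketch of the four steps above; the probability 0.5, the 512 × 384 resolution, the 25 crops and the 224 × 224 crop size come from the text, while the affine parameter ranges are illustrative assumptions.

    import random
    import torchvision.transforms as T
    from PIL import Image

    def preprocess(sample: Image.Image, p: float = 0.5, n_crops: int = 25):
        # Step 1: with probability p, apply an affine transformation
        # (flipping, translation, rotation, scaling; ranges assumed).
        if random.random() < p:
            affine = T.Compose([
                T.RandomHorizontalFlip(p=1.0),
                T.RandomAffine(degrees=15, translate=(0.1, 0.1), scale=(0.9, 1.1)),
            ])
            sample = affine(sample)
        # Step 2: unify to the predetermined resolution 512 x 384.
        sample = sample.resize((512, 384))
        # Steps 3 and 4: randomly crop n_crops local images of 224 x 224 and
        # normalize pixel values to [0, 1.0] (ToTensor divides by 255).
        crop = T.Compose([T.RandomCrop(224), T.ToTensor()])
        return [crop(sample) for _ in range(n_crops)]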
Step S1230, a first learning rate is set for the basic feature extraction network, a second learning rate is set for the first to nth local quality feature extraction networks, the first to nth adaptive parameter determination networks, and the global adaptive parameter determination network, and an image quality evaluation model is trained using the data set.
The first and second learning rates may be variable, for example gradually decreasing as the epoch number increases. The first learning rate can be regarded as the basic learning rate of the image quality evaluation model, while the second learning rate is a dedicated learning rate set to accelerate convergence of the local quality feature extraction networks and the adaptive parameter determination networks. The second learning rate is greater than the first learning rate in the first z epochs of training and equal to it afterwards, where z can be any positive integer. For example, with z = 8, the relationship between the two learning rates may be as follows:
Lr1 = 2 × 10⁻⁵, epoch ≤ 5; Lr1 = 2 × 10⁻⁵/10^(epoch−5), epoch > 5 (5)
Lr2 = 10 × Lr1, epoch ≤ 8; Lr2 = Lr1, epoch > 8 (6)
where Lr1 denotes the first learning rate, Lr2 the second learning rate, and epoch the ordinal number of the current training epoch. In the first 5 epochs, Lr1 equals the initial learning rate, set to 2 × 10⁻⁵; starting with the 6th epoch, it decreases by a factor of 10 for every additional epoch. In the first 8 epochs, Lr2 is 10 times Lr1; from the 9th epoch on, Lr2 equals Lr1.
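The two schedules of formulas (5) and (6) translate directly into functions of the epoch number; a sketch with the example values above:

    def lr1(epoch: int, base: float = 2e-5) -> float:
        # Formula (5): constant for the first 5 epochs, then divided by 10
        # for every additional epoch.
        return base if epoch <= 5 else base / (10 ** (epoch - 5))

    def lr2(epoch: int, z: int = 8) -> float:
        # Formula (6): 10x the first learning rate for the first z epochs,
        # equal to it afterwards.
        return 10.0 * lr1(epoch) if epoch <= z else lr1(epoch)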
In model training, any form of loss function may be used; for example, the L1 loss:
Loss = (1/E) × Σ(j=1..E) |Score_j − Q_j| (7)
where Score_j is the evaluation value of sample image j output by the image quality evaluation model, Q_j is the evaluation value label of sample image j, and E denotes the number of sample images used in training.
Illustratively, an Adam optimizer with a weight decay of 5 × 10⁻⁴ may be used to train the image quality evaluation model for 15 epochs with a training batch size of 96, updating the model parameters through back-propagation of the loss value under the first and second learning rate settings above.
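A hedged sketch of this training setup, reusing lr1 and lr2 from the schedule sketch above; backbone, other_nets, forward_model and train_loader are hypothetical handles for the networks and data pipeline described in this section, not names from the original.

    import torch

    # Two parameter groups so the backbone follows Lr1 and the LQE/AWM/AWMG
    # sub-networks (other_nets, hypothetical) follow Lr2.
    optimizer = torch.optim.Adam(
        [
            {"params": backbone.parameters(), "lr": lr1(1)},
            {"params": [p for net in other_nets for p in net.parameters()], "lr": lr2(1)},
        ],
        weight_decay=5e-4,
    )
    l1_loss = torch.nn.L1Loss()        # formula (7)

    for epoch in range(1, 16):         # 15 epochs
        optimizer.param_groups[0]["lr"] = lr1(epoch)
        optimizer.param_groups[1]["lr"] = lr2(epoch)
        for images, labels in train_loader:   # batches of 96 local images
            scores = forward_model(images).squeeze(-1)
            loss = l1_loss(scores, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()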
In one embodiment, in the testing stage, 5 local images of 224 × 224 pixels may be randomly cropped from each sample image in the test set and input into the image quality evaluation model to obtain an evaluation value for each local image; the evaluation values of the 5 local images are then averaged to obtain the evaluation value of the sample image.
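The test-stage procedure might look as follows; model is a hypothetical handle for the trained image quality evaluation model.

    import torch
    import torchvision.transforms as T

    def evaluate_sample(image, model, n_crops: int = 5) -> float:
        # Randomly crop n_crops local images of 224 x 224, score each crop,
        # and average the per-crop evaluation values.
        crop = T.Compose([T.RandomCrop(224), T.ToTensor()])
        batch = torch.stack([crop(image) for _ in range(n_crops)])
        with torch.no_grad():
            scores = model(batch)      # one evaluation value per crop
        return scores.mean().item()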
In one embodiment, before step S310, the following steps may be performed:
acquiring an image to be evaluated;
and intercepting a plurality of local images with preset sizes from the image to be evaluated to serve as target images.
After determining the evaluation value of the target image, the image quality evaluation method may further include the steps of:
and integrating the evaluation value of each target image to obtain the evaluation value of the image to be evaluated.
The image to be evaluated may be an original, unprocessed image, and can be preprocessed in the same manner as the sample images. For example, it may be adjusted to a predetermined resolution such as 512 × 384 and its pixel values normalized to [0, 1.0], after which t local images are randomly cropped according to actual requirements to serve as t target images.
Each target image is processed with the method flow of fig. 3 to obtain an evaluation value, yielding t evaluation values for the t target images. These are then integrated, for example by computing an average or a weighted average, to obtain the evaluation value of the image to be evaluated. Averaging the t evaluation values can be expressed as follows:
S0 = (1/t) × Σ(k=1..t) Sk (8)
where S0 is the evaluation value of the image to be evaluated and Sk is the evaluation value of target image (local image) k.
Processing the image to be evaluated by cropping a plurality of local images adapts it to the input size of the image quality evaluation model, focuses attention on detecting local distortions in the image, and further improves the accuracy of image quality evaluation.
Exemplary embodiments of the present disclosure also provide an image quality evaluation apparatus. Referring to fig. 13, the image quality evaluation apparatus 1300 may include:
a base feature image acquisition module 1310 configured to acquire a base feature image of a target image at a plurality of scales;
a local quality feature extraction module 1320, configured to pool the basic feature image under each scale, and extract a local quality feature under each scale according to the abstract feature image after the pooling;
an adaptive parameter determination module 1330 configured to perform convolution processing on the basic feature image at each scale to extract semantic features at each scale, and determine an adaptive parameter at each scale according to the semantic features;
the local quality feature fusion module 1340 is configured to respectively fuse the local quality features at the same scale by using the adaptive parameters at each scale to obtain a fused quality feature at each scale;
a multi-scale fusion module 1350 configured to determine an evaluation value of the target image based on the fusion quality features at each scale.
In one embodiment, the base feature image acquisition module 1310 is further configured to:
acquiring an image to be evaluated;
intercepting a plurality of local images with preset sizes from an image to be evaluated to serve as target images;
a multi-scale fusion module 1350, further configured to:
and after the evaluation value of the target image is determined, integrating the evaluation value of each target image to obtain the evaluation value of the image to be evaluated.
In one embodiment, the plurality of scales includes a first scale to an nth scale, n is a positive integer not less than 2; the image quality evaluation apparatus 1300 may further include a model acquisition module configured to:
acquiring a basic feature extraction network, wherein the basic feature extraction network comprises n convolution layer combinations, and the first convolution layer combination to the nth convolution layer combination respectively correspond to the first scale to the nth scale;
a base feature image acquisition module 1310 configured to:
inputting the target image into a basic feature extraction network, and outputting a first basic feature image to an nth basic feature image through a first convolution layer combination to an nth convolution layer combination respectively, wherein the first basic feature image to the nth basic feature image are a basic feature image under a first scale to a basic feature image under an nth scale respectively.
In one embodiment, the model acquisition module is further configured to:
acquiring a first local quality feature extraction network to an nth local quality feature extraction network, wherein each local quality feature extraction network comprises a pooling layer and a full connection layer;
a local quality feature extraction module 1320 configured to:
inputting the ith basic characteristic image into an ith local quality characteristic extraction network;
pooling the ith basic feature image through a pooling layer of an ith local quality feature extraction network to obtain an ith abstract feature image, wherein the ith abstract feature image is an abstract feature image under the ith scale;
processing the ith abstract characteristic image through a full connection layer of an ith local quality characteristic extraction network to obtain an ith local quality characteristic, wherein the ith local quality characteristic is a local quality characteristic under an ith scale;
wherein i is any positive integer within [1, n ], and the ith basic feature image represents each of the first to nth basic feature images.
In one embodiment, the adaptive parameters include adaptive weight parameters.
A model acquisition module further configured to:
acquiring a first adaptive parameter determination network to an nth adaptive parameter determination network, wherein each adaptive parameter determination network comprises a semantic feature extraction sub-network and a weight parameter determination sub-network, the semantic feature extraction sub-network comprises a convolutional layer, and the weight parameter determination sub-network comprises a reconstruction layer.
An adaptive parameter determination module 1330 configured to:
inputting the ith basic characteristic image into an ith self-adaptive parameter determination network;
performing convolution processing on the ith basic feature image through an ith semantic feature extraction sub-network to extract an ith semantic feature, wherein the ith semantic feature extraction sub-network is the semantic feature extraction sub-network of the ith adaptive parameter determination network, and the ith semantic feature is the semantic feature under the ith scale;
and performing data reconstruction on the ith semantic feature through an ith weight parameter determining sub-network to obtain an ith adaptive weight parameter, wherein the ith weight parameter determining sub-network is a weight parameter determining sub-network of the ith adaptive parameter determining network, and the ith adaptive weight parameter is an adaptive weight parameter under the ith scale.
In one embodiment, the adaptive parameters further include an adaptive bias parameter. Each adaptive parameter determining network further comprises a bias parameter determining sub-network comprising fully connected layers.
The adaptive parameter determination module 1330 is further configured to:
and performing full connection processing on the ith semantic feature through an ith bias parameter determining sub-network to obtain an ith adaptive bias parameter, wherein the ith bias parameter determining sub-network is the bias parameter determining sub-network of the ith adaptive parameter determination network, and the ith adaptive bias parameter is the adaptive bias parameter under the ith scale.
In one embodiment, the model acquisition module is further configured to:
acquiring a global adaptive parameter determination network, wherein the global adaptive parameter determination network comprises a global semantic feature extraction sub-network and a global weight parameter determination sub-network, the global semantic feature extraction sub-network comprises a convolutional layer and a global pooling layer, and the global weight parameter determination sub-network comprises a dimension adjustment layer;
inputting the nth basic characteristic image into a global self-adaptive parameter determination network;
performing convolution and global pooling on the nth basic feature image through a global semantic feature extraction sub-network to extract global semantic features;
performing dimension adjustment on the global semantic features through the global weight parameter determination sub-network to obtain the global adaptive weight parameter;
a multi-scale fusion module 1350 configured to:
and performing full-connection processing on the fusion quality characteristics under each scale by using the global self-adaptive weight parameters to obtain an evaluation value of the target image.
In one embodiment, the global adaptive parameter determination network further comprises a global bias parameter determination sub-network, the global bias parameter determination sub-network comprising fully connected layers. A model acquisition module further configured to:
after the nth basic feature image is input into a global self-adaptive parameter determination network, performing full-connection processing on the global semantic features through a global bias parameter determination sub-network to obtain global self-adaptive bias parameters;
a multi-scale fusion module 1350 configured to:
and performing full-connection processing on the fusion quality characteristics under each scale by using the global adaptive weight parameters and the global adaptive bias parameters to obtain the evaluation value of the target image.
In one embodiment, the model acquisition module is further configured to:
establishing an image quality evaluation model, wherein the image quality evaluation model comprises a basic feature extraction network to be trained, a first local quality feature extraction network to an nth local quality feature extraction network, a first self-adaptive parameter determination network to an nth self-adaptive parameter determination network and a global self-adaptive parameter determination network;
acquiring a data set, wherein the data set comprises a sample image and an evaluation value label of the sample image;
setting a first learning rate for a basic feature extraction network, setting a second learning rate for a first local quality feature extraction network to an nth local quality feature extraction network, a first adaptive parameter determination network to an nth adaptive parameter determination network and a global adaptive parameter determination network, and training an image quality evaluation model by using a data set;
wherein the second learning rate is greater than the first learning rate in the first z epochs of training, the second learning rate is equal to the first learning rate after the first z epochs of training, and z is a positive integer.
The specific details of each part in the above device have been described in detail in the method part embodiments, and details that are not disclosed may be referred to in the method part embodiments, and thus are not described again.
In order to verify the validity of the above method and apparatus, 4 sample images were taken from the KonIQ-10k data set, 5 local images at 224 × 224 resolution were randomly cropped from each, their evaluation values were output using the image quality evaluation method of this exemplary embodiment, and the average evaluation value of the 5 local images was taken as the evaluation value of the sample image. The verification results are shown in fig. 14, where MOS denotes the evaluation value label and S the evaluation value obtained by this exemplary embodiment: (a) MOS 77.384, S 77.585; (b) MOS 62.558, S 62.101; (c) MOS 53.073, S 53.901; (d) MOS 31.226, S 32.939. The 4 sample images represent images in different evaluation value intervals. For images in each interval, the evaluation value obtained by this exemplary embodiment is close to the evaluation value label, so image quality can be evaluated relatively accurately.
Further, the image quality evaluation method of this exemplary embodiment was compared on the KonIQ-10k data set, for no-reference image quality evaluation, with nine existing algorithms: BRISQUE (Blind/Referenceless Image Spatial Quality Evaluator), IL-NIQE (Integrated Local Natural Image Quality Evaluator), HOSA (High Order Statistics Aggregation), BIECON (Blind Image Evaluator based on a Convolutional neural network), WaDIQaM (Weighted Average Deep Image Quality Measure), SFA (Semantic Feature Aggregation), PQR (Probabilistic Quality Representation), DBCNN (Deep Bilinear Convolutional Neural Network) and HyperNet (hyper network). Image quality evaluation performance is measured with PLCC (Pearson Correlation Coefficient) and SROCC (Spearman Rank Order Correlation Coefficient): PLCC quantifies the consistency of the predicted quality scores with the ground truth, SROCC quantifies the rank correlation between them, and a larger value of either indicates better prediction performance. The comparison between this exemplary embodiment and the other nine algorithms is shown in Table 5; this exemplary embodiment achieves the best image quality evaluation performance.
TABLE 5
Name of algorithm PLCC SROCC
BRISQUE 0.681 0.665
IL-NIQE 0.523 0.507
HOSA 0.694 0.671
BIECON 0.651 0.618
WaDIQaM 0.805 0.797
SFA 0.872 0.856
PQR 0.884 0.880
DBCNN 0.884 0.875
HyperNet 0.892 0.874
The present exemplary embodiment 0.897 0.881
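Both metrics are standard and available in scipy; as a sketch, computing them over the four fig. 14 sample results above:

    import numpy as np
    from scipy.stats import pearsonr, spearmanr

    # Predicted scores S and MOS labels of the four fig. 14 sample images.
    preds = np.array([77.585, 62.101, 53.901, 32.939])
    labels = np.array([77.384, 62.558, 53.073, 31.226])

    plcc, _ = pearsonr(preds, labels)      # consistency with ground truth
    srocc, _ = spearmanr(preds, labels)    # rank-order correlation
    print(f"PLCC={plcc:.3f}, SROCC={srocc:.3f}")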
Exemplary embodiments of the present disclosure also provide a computer-readable storage medium, which may be implemented in the form of a program product, including program code for causing an electronic device to perform the steps according to various exemplary embodiments of the present disclosure described in the above-mentioned "exemplary method" section of this specification, when the program product is run on the electronic device. In an alternative embodiment, the program product may be embodied as a portable compact disc read only memory (CD-ROM) and include program code, and may be run on an electronic device, such as a personal computer. However, the program product of the present disclosure is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java or C++ as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the internet using an internet service provider).
It should be noted that although in the above detailed description several modules or units of the device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit, according to exemplary embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into embodiments by a plurality of modules or units.
As will be appreciated by one skilled in the art, aspects of the present disclosure may be embodied as a system, method or program product. Accordingly, various aspects of the present disclosure may be embodied in the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, microcode, etc.), or an embodiment combining hardware and software aspects, which may all generally be referred to herein as a "circuit," "module," or "system." Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is to be limited only by the following claims.

Claims (12)

1. An image quality evaluation method is characterized by comprising:
acquiring basic characteristic images of a target image under multiple scales;
performing pooling on the basic feature image under each scale, and extracting local quality features under each scale according to the pooled abstract feature image;
performing convolution processing on the basic feature image under each scale to extract semantic features under each scale, and determining self-adaptive parameters under each scale according to the semantic features;
respectively fusing the local quality features under the same scale by using the self-adaptive parameters under each scale to obtain fused quality features under each scale;
and determining the evaluation value of the target image based on the fusion quality characteristics under each scale.
2. The method of claim 1, further comprising:
acquiring an image to be evaluated;
intercepting a plurality of local images with preset sizes from the image to be evaluated to serve as the target image;
after determining the evaluation value of the target image, the method further comprises:
and integrating the evaluation value of each target image to obtain the evaluation value of the image to be evaluated.
3. The method of claim 1, wherein the plurality of scales includes a first scale to an nth scale, n being a positive integer no less than 2; the method further comprises the following steps:
acquiring a basic feature extraction network, wherein the basic feature extraction network comprises n convolutional layer combinations, and the first convolutional layer combination to the nth convolutional layer combination respectively correspond to the first scale to the nth scale;
the acquiring of the basic feature image of the target image under multiple scales includes:
inputting the target image into the basic feature extraction network, and outputting a first basic feature image to an nth basic feature image through the first convolution layer combination to the nth convolution layer combination, wherein the first basic feature image to the nth basic feature image are the basic feature image under the first scale to the nth scale respectively.
4. The method of claim 3, further comprising:
acquiring a first local quality feature extraction network to an nth local quality feature extraction network, wherein each local quality feature extraction network comprises a pooling layer and a full connection layer;
the pooling of the basic feature image under each scale and the extraction of the local quality feature under each scale according to the pooled abstract feature image comprise:
inputting the ith basic characteristic image into an ith local quality characteristic extraction network;
pooling the ith basic feature image through a pooling layer of the ith local quality feature extraction network to obtain an ith abstract feature image, wherein the ith abstract feature image is an abstract feature image under an ith scale;
processing the ith abstract feature image through a full connection layer of the ith local quality feature extraction network to obtain an ith local quality feature, wherein the ith local quality feature is a local quality feature under an ith scale;
wherein i is any positive integer within [1, n ], and the ith basic feature image represents each of the first to nth basic feature images.
5. The method of claim 4, wherein the adaptive parameters comprise adaptive weight parameters; the method further comprises the following steps:
acquiring a first adaptive parameter determination network to an nth adaptive parameter determination network, wherein each adaptive parameter determination network comprises a semantic feature extraction sub-network and a weight parameter determination sub-network, the semantic feature extraction sub-network comprises a convolutional layer, and the weight parameter determination sub-network comprises a reconstruction layer;
the convolution processing is performed on the basic feature image under each scale to extract the semantic features under each scale, and the self-adaptive parameters under each scale are determined according to the semantic features, and the method comprises the following steps:
inputting the ith basic characteristic image into an ith adaptive parameter determination network;
performing convolution processing on the ith basic feature image through an ith semantic feature extraction sub-network to extract an ith semantic feature, wherein the ith semantic feature extraction sub-network is a semantic feature extraction sub-network of the ith adaptive parameter determination network, and the ith semantic feature is a semantic feature under the ith scale;
and performing data reconstruction on the ith semantic feature through an ith weight parameter determining sub-network to obtain an ith adaptive weight parameter, wherein the ith weight parameter determining sub-network is a weight parameter determining sub-network of the ith adaptive parameter determining network, and the ith adaptive weight parameter is an adaptive weight parameter under the ith scale.
6. The method of claim 5, wherein the adaptive parameters further comprise an adaptive bias parameter; each adaptive parameter determination network further comprises a bias parameter determination sub-network, the bias parameter determination sub-network comprising a fully connected layer;
the convolution processing is performed on the basic feature image under each scale to extract the semantic features under each scale, and the self-adaptive parameters under each scale are determined according to the semantic features, and the method further comprises the following steps:
and performing full connection processing on the ith semantic feature through an ith bias parameter determining sub-network to obtain an ith adaptive bias parameter, wherein the ith bias parameter determining sub-network is the bias parameter determining sub-network of the ith adaptive parameter determination network, and the ith adaptive bias parameter is the adaptive bias parameter under the ith scale.
7. The method of claim 5, further comprising:
acquiring a global adaptive parameter determination network, wherein the global adaptive parameter determination network comprises a global semantic feature extraction sub-network and a global weight parameter determination sub-network, the global semantic feature extraction sub-network comprises a convolutional layer and a global pooling layer, and the global weight parameter determination sub-network comprises a dimension adjustment layer;
inputting the nth basic feature image into the global adaptive parameter determination network;
performing convolution and global pooling on the nth basic feature image through the global semantic feature extraction sub-network to extract global semantic features;
performing dimension adjustment on the global semantic features through the global weight parameter determination sub-network to obtain a global adaptive weight parameter;
the determining the evaluation value of the target image based on the fusion quality features at each scale comprises:
and performing full-connection processing on the fusion quality characteristics under each scale by using the global adaptive weight parameters to obtain an evaluation value of the target image.
8. The method of claim 7, wherein the global adaptive parameter determination network further comprises a global bias parameter determination sub-network, wherein the global bias parameter determination sub-network comprises a fully connected layer; after inputting the nth base feature image into the global adaptive parameter determination network, the method further comprises:
performing full-connection processing on the global semantic features through the global bias parameter determination sub-network to obtain a global adaptive bias parameter;
the using the global adaptive weight parameter to perform full-connection processing on the fusion quality features under each scale to obtain an evaluation value of the target image includes:
and performing full-connection processing on the fusion quality characteristics under each scale by using the global adaptive weight parameters and the global adaptive bias parameters to obtain the evaluation value of the target image.
9. The method of claim 7, further comprising:
establishing an image quality evaluation model, wherein the image quality evaluation model comprises the basic feature extraction network to be trained, the first local quality feature extraction network to the nth local quality feature extraction network, the first adaptive parameter determination network to the nth adaptive parameter determination network and the global adaptive parameter determination network;
acquiring a data set, wherein the data set comprises a sample image and an evaluation value label of the sample image;
setting a first learning rate for the basic feature extraction network, setting a second learning rate for the first local quality feature extraction network to the nth local quality feature extraction network, the first adaptive parameter determination network to the nth adaptive parameter determination network, and the global adaptive parameter determination network, and training the image quality evaluation model by using the data set;
wherein the second learning rate is greater than the first learning rate in the first z epochs of training, the second learning rate is equal to the first learning rate after the first z epochs of training, and z is a positive integer.
10. An image quality evaluation apparatus, comprising:
a basic feature image acquisition module configured to acquire basic feature images of a target image at a plurality of scales;
the local quality feature extraction module is configured to perform pooling processing on the basic feature image under each scale and extract local quality features under each scale according to the pooled abstract feature image;
the self-adaptive parameter determining module is configured to perform convolution processing on the basic feature image under each scale so as to extract semantic features under each scale, and determine self-adaptive parameters under each scale according to the semantic features;
the local quality feature fusion module is configured to respectively utilize the adaptive parameters under each scale to fuse the local quality features under the same scale, so as to obtain fused quality features under each scale;
a multi-scale fusion module configured to determine an evaluation value of the target image based on the fusion quality features at each scale.
11. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1 to 9.
12. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of any one of claims 1 to 9 via execution of the executable instructions.
CN202110908704.9A 2021-08-09 2021-08-09 Image quality evaluation method, device, storage medium and electronic equipment Pending CN113658122A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110908704.9A CN113658122A (en) 2021-08-09 2021-08-09 Image quality evaluation method, device, storage medium and electronic equipment


Publications (1)

Publication Number Publication Date
CN113658122A true CN113658122A (en) 2021-11-16

Family

ID=78478616

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110908704.9A Pending CN113658122A (en) 2021-08-09 2021-08-09 Image quality evaluation method, device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113658122A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115423809A (en) * 2022-11-04 2022-12-02 江西电信信息产业有限公司 Image quality evaluation method and device, readable storage medium and electronic equipment
CN115423809B (en) * 2022-11-04 2023-04-07 江西电信信息产业有限公司 Image quality evaluation method and device, readable storage medium and electronic equipment
CN116468892A (en) * 2023-04-24 2023-07-21 北京中科睿途科技有限公司 Semantic segmentation method and device of three-dimensional point cloud, electronic equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination