CN113256556A - Image selection method and device


Info

Publication number
CN113256556A
Authority
CN
China
Prior art keywords
image
candidate
processing model
weight
images
Legal status
Pending
Application number
CN202110332938.3A
Other languages
Chinese (zh)
Inventor
马柯德
曹佩蓓
刘毅
邹学益
许松岑
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN202110332938.3A priority Critical patent/CN113256556A/en
Publication of CN113256556A publication Critical patent/CN113256556A/en
Pending legal-status Critical Current

Classifications

    • G06T 7/00 Image analysis › G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06N 3/02 Neural networks › G06N 3/045 Combinations of networks
    • G06N 3/02 Neural networks › G06N 3/08 Learning methods
    • G06T 5/00 Image enhancement or restoration › G06T 5/50 Using two or more images, e.g. averaging or subtraction
    • G06T 2207/20 Special algorithmic details › G06T 2207/20081 Training; Learning
    • G06T 2207/20 Special algorithmic details › G06T 2207/20084 Artificial neural networks [ANN]


Abstract

The application discloses an image selection method applied to the field of artificial intelligence. The method includes: acquiring a plurality of candidate images; processing each candidate image through a first image processing model and a second image processing model respectively, to obtain a first processed image and a second processed image corresponding to each candidate image; acquiring the image quality difference between the first processed image and the second processed image corresponding to each candidate image; and selecting M target images from the candidate images according to the image quality differences. By using the image quality difference corresponding to each candidate image as the basis for selecting test samples, the method ensures a large quality difference between the output images the two models produce for the same test sample, which makes it easier for image quality evaluators to subjectively judge the relative performance of the first and second image processing models.

Description

Image selection method and device
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to an image selection method and apparatus.
Background
Artificial Intelligence (AI) is a theory, method, technique, and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the capabilities of perception, reasoning, and decision-making.
When comparing the performance of image processing models, an ideal reference image cannot be obtained for real-world test scenes. Objective evaluation with full-reference quality metrics, such as peak signal-to-noise ratio (PSNR) or structural similarity (SSIM), is therefore impossible, and experts are still often required to assess model performance subjectively.
In subjective model performance evaluation, the models under comparison must process the same test samples, and performance is then judged from the processing results. The prior art, however, cannot select test samples that accurately expose the performance difference between the models.
Disclosure of Invention
In a first aspect, the present application provides an image selection method, comprising:
acquiring a plurality of candidate images;
the method includes acquiring a plurality of candidate images, and a first image processing model and a second image processing model which need to be compared in performance, where the first image processing model and the second image processing model may be used to implement one image processing task, the image enhancement task may be, for example, an image enhancement task, and the plurality of candidate images may be a test sample set for the image enhancement task. Illustratively, the image enhancement task is defogging, and the plurality of candidate images may be a large-scale test sample set for a defogging scene.
Each candidate image is processed through the first image processing model and the second image processing model respectively, to obtain a first processed image and a second processed image corresponding to each candidate image. The first image processing model and the second image processing model are used to implement the same image processing task; the first processed image is the output of the first image processing model, and the second processed image is the output of the second image processing model. The term "processing" here refers to model inference.
Acquiring an image quality difference between a first processed image and a second processed image corresponding to each candidate image, and selecting M target images from the candidate images according to the image quality difference, wherein the M target images are used as test samples of the first image processing model and the second image processing model during performance comparison.
Image quality can characterize how faithfully an image restores the real scene, also called the fidelity of the image: the closer the image is to the real scene, the higher its fidelity and the higher its quality. Image quality can also characterize readability, i.e., the ability of the image to provide information to a person or machine; readability depends not only on the requirements of the imaging application but often also on the subjective perception of the human eye, and the higher the readability, the higher the image quality. In one implementation, the indicators used to evaluate image quality may include resolution, color depth, image distortion, and so on.
In order to enable experts or other personnel to more accurately perform performance comparison between the first image processing model and the second image processing model, it is necessary to ensure that the difference between the image quality of the output images obtained when the first image processing model and the second image processing model process the same test sample is large, and further, the performance between the first image processing model and the second image processing model can be accurately determined based on the quality of the output images.
According to the embodiment of the application, the image quality difference corresponding to each candidate image is used as the basis for selecting test samples. This ensures a large quality difference between the output images the two models produce for the same test sample, making it easier for image quality evaluators to subjectively judge the relative performance of the first and second image processing models.
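As a minimal illustrative sketch (not the patent's implementation), the selection flow just described might look as follows in Python; the no-reference quality scorer `assess_quality` is a hypothetical placeholder, since the embodiment does not prescribe a particular quality metric:

```python
import numpy as np

def select_by_quality_difference(candidates, model_a, model_b, assess_quality, m):
    """Pick the M candidates on which the two models disagree most in output quality.

    candidates:     list of input images (e.g., HxWxC numpy arrays)
    model_a/b:      callables implementing the same image processing task
    assess_quality: hypothetical no-reference IQA scorer, image -> float
    m:              number of target images to keep as test samples
    """
    diffs = []
    for img in candidates:
        out_a = model_a(img)          # first processed image
        out_b = model_b(img)          # second processed image
        # image quality difference between the two processed images
        diffs.append(abs(assess_quality(out_a) - assess_quality(out_b)))
    # keep the M target images with the largest quality difference
    order = np.argsort(diffs)[::-1]
    return [candidates[i] for i in order[:m]]
```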
In one possible implementation, the selecting M target images from the plurality of candidate images according to the image quality difference includes:
according to the image quality difference corresponding to each candidate image, selecting from the plurality of candidate images M target images whose corresponding image quality differences are greater than a first threshold.
In one implementation, the M target images with the largest difference in image quality may be selected from the plurality of candidate images.
The embodiment of the application thus ensures a large image quality difference between the output images obtained when the two models process the same test sample, helping image quality evaluators to subjectively judge the relative performance of the first and second image processing models more easily.
In one possible implementation, the method further comprises:
acquiring the image content difference between each candidate image and the other candidate images (excluding the candidate image itself) among the plurality of candidate images;
the selecting M target images from the plurality of candidate images according to the image quality difference comprises:
selecting M target images from the plurality of candidate images according to the image quality difference and the image content difference corresponding to each candidate image.
The image content difference may indicate a difference in the background region of the image and/or a difference between target subjects in the foreground region.

Taking the difference between target subjects in the foreground region as an example: the image content difference between an image containing a person and an image containing a non-human animal (or an inanimate object such as a building) is larger than the difference between two images that both contain persons, and the difference between images containing different persons is larger than the difference between images containing the same person.
In one possible implementation, the image features of each candidate image may be extracted using a conventional method such as the scale-invariant feature transform (SIFT) or a convolutional neural network (CNN), yielding an image feature vector that characterizes the image content; the image features may include color, texture, shape, and spatial-relationship features. A distance metric is then computed between the image feature vector of each candidate image and the feature vectors of the other candidate images; this distance metric expresses the content difference between images, and the larger the distance, the greater the difference between the content of the candidate image and that of the other candidate images.
It should be understood that the distance metric may also be regarded as a similarity measure: by computing the distance between two pieces of multidimensional data, their similarity can be determined. In general, the smaller the distance, the higher the similarity and the smaller the difference; conversely, the larger the distance, the lower the similarity and the greater the difference. Illustratively, the distance metric may be a mean squared error (MSE) distance or an L1 distance. The image content difference may also be determined based on distance measures other than the MSE and L1 distances; this embodiment does not specifically limit the choice of distance metric.
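A minimal sketch of the content-difference computation described above, assuming a hypothetical feature extractor `extract_features` (e.g., pooled SIFT descriptors or CNN embeddings) that returns a fixed-length vector:

```python
import numpy as np

def content_difference(candidates, extract_features):
    """For each candidate, the mean L1 distance between its feature vector
    and those of all other candidates; a larger value means more distinctive content."""
    feats = np.stack([extract_features(img) for img in candidates])  # shape (N, D)
    n = len(feats)
    diffs = np.zeros(n)
    for i in range(n):
        others = np.delete(feats, i, axis=0)       # all candidates except image i
        # L1 distance; an MSE distance would work the same way
        diffs[i] = np.abs(others - feats[i]).sum(axis=1).mean()
    return diffs
```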
According to the embodiment of the application, the image content difference among the candidate images is used as a selection basis for test samples, so test samples with larger content differences can be selected. This ensures that the scenes covered by the sampled test samples are sufficiently comprehensive and that the test results are reliable.
In one possible implementation, the selecting M target images from the plurality of candidate images according to the image quality difference and the image content difference corresponding to each candidate image includes:
determining a selected weight of each candidate image according to the image quality difference and the image content difference corresponding to the candidate image, wherein the selected weight is positively correlated with the image quality difference and the image content difference;
selecting M target images with the selected weights larger than a second threshold value from the plurality of candidate images according to the selected weight of each candidate image.
In one implementation, the M target images with the largest selected weights may be chosen from the plurality of candidate images.
In one possible implementation, the selected weight is a sum of a first weight and a second weight, the first weight being a product of the image quality difference and a corresponding weight, the second weight being a product of the image content difference and a corresponding weight.
The weights corresponding to the image quality difference and the image content difference can be adjusted by the user as needed, depending on whether the user is more concerned about the quality difference or about the diversity difference of the samples.
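Under the stated formulation (selected weight = weighted quality difference plus weighted content difference), a sketch with user-adjustable coefficients `alpha` and `beta` (hypothetical names, not from the application):

```python
import numpy as np

def select_by_weight(candidates, quality_diffs, content_diffs, m, alpha=0.5, beta=0.5):
    """selected_weight = alpha * quality_diff + beta * content_diff;
    keep the M candidates with the largest selected weights."""
    q = np.asarray(quality_diffs, dtype=float)
    c = np.asarray(content_diffs, dtype=float)
    selected_weight = alpha * q + beta * c        # first weight + second weight
    order = np.argsort(selected_weight)[::-1]
    return [candidates[i] for i in order[:m]]
```

Raising `alpha` favors samples that separate the two models' output quality; raising `beta` favors sample diversity.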
In one possible implementation, the first image processing model and the second image processing model are used to implement an image enhancement task.
In the embodiment of the present application, the first image processing network and the second image processing network may be used to implement an image enhancement task, which may be understood as a task for enhancing image or video quality. For example, the image enhancement task may be a video denoising task, a video defogging task, a super-resolution task, or a high dynamic range task, which is not limited herein.
In a second aspect, the present application provides an image selection apparatus, the apparatus comprising:
an acquisition module for acquiring a plurality of candidate images;
the image processing module is used for respectively processing each candidate image through a first image processing model and a second image processing model to obtain a first processed image and a second processed image corresponding to each candidate image; wherein the first image processing model and the second image processing model are used to implement the same image processing task; the first processed image is obtained by processing through the first image processing model, and the second processed image is obtained by processing through the second image processing model;
and the image selection module is used for acquiring the image quality difference between the first processed image and the second processed image corresponding to each candidate image, and selecting M target images from the candidate images according to the image quality difference, wherein the M target images are used as test samples of the first image processing model and the second image processing model during performance comparison.
In order to enable experts or other personnel to more accurately perform performance comparison between the first image processing model and the second image processing model, it is necessary to ensure that the difference between the image quality of the output images obtained when the first image processing model and the second image processing model process the same test sample is large, and further, the performance between the first image processing model and the second image processing model can be accurately determined based on the quality of the output images.
According to the embodiment of the application, the image quality difference corresponding to each candidate image is used as the basis for selecting test samples. This ensures a large quality difference between the output images the two models produce for the same test sample, making it easier for image quality evaluators to subjectively judge the relative performance of the first and second image processing models.
In one possible implementation, the image selection module is configured to select M target images from the plurality of candidate images, where corresponding image quality differences are greater than a first threshold, according to the image quality difference corresponding to each candidate image.
The embodiment of the application thus ensures a large image quality difference between the output images obtained when the two models process the same test sample, helping image quality evaluators to subjectively judge the relative performance of the first and second image processing models more easily.
In one implementation, the M target images with the largest difference in image quality may be selected from the plurality of candidate images.
In a possible implementation, the obtaining module is configured to obtain the image content difference between each candidate image and the other candidate images (excluding the candidate image itself) among the plurality of candidate images;
the image selection module is used for selecting M target images from the candidate images according to the image quality difference and the image content difference corresponding to each candidate image.
The image content difference may indicate a difference in the background region of the image and/or a difference between target subjects in the foreground region.

Taking the difference between target subjects in the foreground region as an example: the image content difference between an image containing a person and an image containing a non-human animal (or an inanimate object such as a building) is larger than the difference between two images that both contain persons, and the difference between images containing different persons is larger than the difference between images containing the same person.

In one possible implementation, the image features of each candidate image may be extracted using a conventional method such as the scale-invariant feature transform (SIFT) or a convolutional neural network (CNN), yielding an image feature vector that characterizes the image content; the image features may include color, texture, shape, and spatial-relationship features. A distance metric is then computed between the image feature vector of each candidate image and the feature vectors of the other candidate images; this distance metric expresses the content difference between images, and the larger the distance, the greater the difference between the content of the candidate image and that of the other candidate images.

It should be understood that the distance metric may also be regarded as a similarity measure: by computing the distance between two pieces of multidimensional data, their similarity can be determined. In general, the smaller the distance, the higher the similarity and the smaller the difference; conversely, the larger the distance, the lower the similarity and the greater the difference. Illustratively, the distance metric may be a mean squared error (MSE) distance or an L1 distance. The image content difference may also be determined based on distance measures other than the MSE and L1 distances; this embodiment does not specifically limit the choice of distance metric.

According to the embodiment of the application, the image content difference among the candidate images is used as a selection basis for test samples, so test samples with larger content differences can be selected. This ensures that the scenes covered by the sampled test samples are sufficiently comprehensive and that the test results are reliable.
In one possible implementation, the image selection module is configured to determine a selected weight of each candidate image according to the image quality difference and the image content difference corresponding to the candidate image, where the selected weight is positively correlated to the image quality difference and the image content difference;
selecting M target images with the selected weights larger than a second threshold value from the plurality of candidate images according to the selected weight of each candidate image.
In one implementation, the M target images with the largest selected weights may be chosen from the plurality of candidate images.
In one possible implementation, the selected weight is a sum of a first weight and a second weight, the first weight being a product of the image quality difference and a corresponding weight, the second weight being a product of the image content difference and a corresponding weight.
The weights corresponding to the image quality difference and the image content difference can be adjusted by the user as needed, depending on whether the user is more concerned about the quality difference or about the diversity difference of the samples.
In one possible implementation, the first image processing model and the second image processing model are used to implement an image enhancement task.
In the embodiment of the present application, the first image processing network and the second image processing network may be used to implement an image enhancement task, which may be understood as a task for enhancing image or video quality. For example, the image enhancement task may be a video denoising task, a video defogging task, a super-resolution task, or a high dynamic range task, which is not limited herein.
In a third aspect, the present application provides an image selection method, including:
acquiring a plurality of candidate images;
acquiring the image content difference between each candidate image and the other candidate images (excluding the candidate image itself) among the plurality of candidate images. The image content difference may indicate a difference in the background region of the image and/or a difference between target subjects in the foreground region. Taking the difference between target subjects in the foreground region as an example: the image content difference between an image containing a person and an image containing a non-human animal (or an inanimate object such as a building) is larger than the difference between two images that both contain persons, and the difference between images containing different persons is larger than the difference between images containing the same person.

Selecting M target images from the plurality of candidate images according to the image content difference corresponding to each candidate image.

According to the embodiment of the application, the image content difference among the candidate images is used as a selection basis for test samples, so test samples with larger content differences can be selected. This ensures that the scenes covered by the sampled test samples are sufficiently comprehensive and that the test results are reliable.
In one possible implementation, the selecting M target images from the plurality of candidate images according to the image content difference corresponding to each candidate image includes:
according to the image content difference corresponding to each candidate image, selecting from the plurality of candidate images M target images whose image content differences are greater than a third threshold.
In one possible implementation, the method further comprises:
respectively processing each candidate image through a first image processing model and a second image processing model to obtain a first processed image and a second processed image corresponding to each candidate image;
acquiring the image quality difference between a first processed image and a second processed image corresponding to each candidate image;
selecting M target images from the plurality of candidate images according to the image content difference corresponding to each candidate image, including:
selecting M target images from the plurality of candidate images according to the image quality difference and the image content difference corresponding to each candidate image.
In one possible implementation, the selecting M target images from the plurality of candidate images according to the image quality difference and the image content difference corresponding to each candidate image includes:
determining a selected weight of each candidate image according to the image quality difference and the image content difference corresponding to the candidate image, wherein the selected weight is positively correlated with the image quality difference and the image content difference;
selecting M target images with the selected weights larger than a second threshold value from the plurality of candidate images according to the selected weight of each candidate image.
In one possible implementation, the selected weight is a sum of a first weight and a second weight, the first weight being a product of the image quality difference and a corresponding weight, the second weight being a product of the image content difference and a corresponding weight.
The weights corresponding to the image quality difference and the image content difference can be adjusted by the user as needed, depending on whether the user is more concerned about the quality difference or about the diversity difference of the samples.
In one possible implementation, the first image processing model and the second image processing model are used to implement an image enhancement task.
In the embodiment of the present application, the first image processing network and the second image processing network may be used to implement an image enhancement task, which may be understood as a task for enhancing image or video quality. For example, the image enhancement task may be a video denoising task, a video defogging task, a super-resolution task, or a high dynamic range task, which is not limited herein.
In a fourth aspect, the present application provides an image selection apparatus, comprising:
an acquisition module for acquiring a plurality of candidate images;
and acquiring the image content difference between each candidate image and the other candidate images (excluding the candidate image itself) among the plurality of candidate images;
and the image selection module is used for selecting M target images from the candidate images according to the image content difference corresponding to each candidate image.
According to the embodiment of the application, the image content difference among the candidate images is used as a selection basis for test samples, so test samples with larger content differences can be selected. This ensures that the scenes covered by the sampled test samples are sufficiently comprehensive and that the test results are reliable.
In one possible implementation, the image selection module is configured to select M target images from the plurality of candidate images, where the difference in image content is greater than a third threshold, according to the difference in image content corresponding to each candidate image.
In one possible implementation, the apparatus further comprises:
the image processing module is used for respectively processing each candidate image through a first image processing model and a second image processing model to obtain a first processed image and a second processed image corresponding to each candidate image;
the acquisition module is used for acquiring the image quality difference between the first processed image and the second processed image corresponding to each candidate image;
the image selection module is used for selecting M target images from the candidate images according to the image quality difference and the image content difference corresponding to each candidate image.
In one possible implementation, the image selection module is configured to determine a selected weight of each candidate image according to the image quality difference and the image content difference corresponding to the candidate image, where the selected weight is positively correlated to the image quality difference and the image content difference;
selecting M target images with the selected weights larger than a second threshold value from the plurality of candidate images according to the selected weight of each candidate image.
In one possible implementation, the selected weight is a sum of a first weight and a second weight, the first weight being a product of the image quality difference and a corresponding weight, the second weight being a product of the image content difference and a corresponding weight.
In one possible implementation, the first image processing model and the second image processing model are used to implement an image enhancement task.
In a fifth aspect, an embodiment of the present application provides a model testing method, where the method includes:
acquiring M target images;
respectively processing each target image through a first image processing model and a second image processing model to obtain a first processed image and a second processed image corresponding to each target image; the first image processing model and the second image processing model are used for implementing the same image processing task, the first processed image is the output of the first image processing model, and the second processed image is the output of the second image processing model; and the image quality difference between the first processed image and the second processed image corresponding to each target image is greater than a first threshold;
and obtaining a model test result, wherein the model test result is used for representing a performance comparison result between the first image processing model and the second image processing model, and the model test result is determined according to the first processing image and the second processing image corresponding to each target image.
According to the embodiment of the application, the image quality difference between the output images obtained when the first and second image processing models process the same target image is large, making it easier for image quality evaluators to subjectively judge the relative performance of the two models.
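The testing flow of this aspect could be sketched as follows; the rating step `collect_rating` is a hypothetical stand-in for the human evaluator's subjective judgment and is not prescribed by the application:

```python
def run_model_test(target_images, model_a, model_b, collect_rating):
    """Process each selected target image with both models and aggregate
    per-image subjective preferences into a comparison result."""
    votes_a = votes_b = 0
    for img in target_images:
        out_a, out_b = model_a(img), model_b(img)
        # collect_rating returns 'a' or 'b' according to a human evaluator's preference
        winner = collect_rating(out_a, out_b)
        if winner == 'a':
            votes_a += 1
        else:
            votes_b += 1
    return {'model_a_preferred': votes_a, 'model_b_preferred': votes_b}
```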
In one possible implementation, the selected weight of each of the M target images is greater than a second threshold, and the selected weight of each target image is positively correlated to the image quality difference and the image content difference corresponding to each target image, where the image content difference corresponding to each target image is used to represent the image content difference between each target image and the other target images except for the target image itself.
According to the embodiment of the application, the image content differences among the target images are large; that is, the scenes covered by the M target images are sufficiently comprehensive, which ensures the reliability of the test result.
In one possible implementation, the selected weight is a sum of a first weight and a second weight, the first weight being a product of the image quality difference and a corresponding weight, the second weight being a product of the image content difference and a corresponding weight.
In one possible implementation, the first image processing model and the second image processing model are used to implement an image enhancement task.
In a sixth aspect, an embodiment of the present application provides a model testing apparatus, including:
the acquisition module is used for acquiring M target images;
the image processing module is used for respectively processing each target image through a first image processing model and a second image processing model to obtain a first processed image and a second processed image corresponding to each target image; the first image processing model and the second image processing model are used for implementing the same image processing task, the first processed image is the output of the first image processing model, and the second processed image is the output of the second image processing model; the image quality difference between the first processed image and the second processed image corresponding to each target image is greater than a first threshold; and the first processed image and the second processed image corresponding to each target image are used for performance comparison between the first image processing model and the second image processing model;
the obtaining module is further configured to obtain a model test result, where the model test result is used to represent a performance comparison result between the first image processing model and the second image processing model, and the model test result is determined according to the first processed image and the second processed image corresponding to each target image.
According to the embodiment of the application, the image quality difference between the output images obtained when the first and second image processing models process the same target image is large, making it easier for image quality evaluators to subjectively judge the relative performance of the two models.
In one possible implementation, the selected weight of each of the M target images is greater than a second threshold, and the selected weight of each target image is positively correlated to the image quality difference and the image content difference corresponding to each target image, where the image content difference corresponding to each target image is used to represent the image content difference between each target image and the other target images except for the target image itself.
According to the embodiment of the application, the image content differences among the target images are large; that is, the scenes covered by the M target images are sufficiently comprehensive, which ensures the reliability of the test result.
In one possible implementation, the selected weight is a sum of a first weight and a second weight, the first weight being a product of the image quality difference and a corresponding weight, the second weight being a product of the image content difference and a corresponding weight.
In one possible implementation, the first image processing model and the second image processing model are used to implement an image enhancement task.
In a seventh aspect, an embodiment of the present application provides an image selecting apparatus, which may include a memory, a processor, and a bus system, where the memory is used to store a program, and the processor is used to execute the program in the memory to perform the methods according to the first aspect, the second aspect, the fifth aspect, and any optional method described above.
In an eighth aspect, embodiments of the present application provide a computer-readable storage medium having a computer program stored thereon, which, when run on a computer, causes the computer to perform the method of the first, second, fifth and any optional method described above.
In a ninth aspect, embodiments of the present application provide a computer program comprising code for implementing the first, second, fifth and any optional method described above, when the code is executed.
In a tenth aspect, the present application provides a chip system, which includes a processor configured to support an execution device or a training device in implementing the functions recited in the above aspects, for example, transmitting or processing the data and/or information recited in the above methods. In one possible design, the chip system further includes a memory for storing the program instructions and data necessary for the execution device or the training device. The chip system may consist of a chip, or may include a chip and other discrete devices.
The embodiment of the application provides an image selection method, comprising: acquiring a plurality of candidate images; processing each candidate image through a first image processing model and a second image processing model respectively, to obtain a first processed image and a second processed image corresponding to each candidate image, where the two models implement the same image processing task, the first processed image is the output of the first image processing model, and the second processed image is the output of the second image processing model; acquiring the image quality difference between the first processed image and the second processed image corresponding to each candidate image; and selecting, according to the image quality differences, M target images from the candidate images to serve as test samples when comparing the performance of the two models. In this way, the image quality difference corresponding to each candidate image is used as the basis for selecting test samples, ensuring a large quality difference between the output images the two models produce for the same test sample and making it easier for image quality evaluators to subjectively judge the relative performance of the first and second image processing models.
In addition, using the image content difference among the candidate images as a selection basis allows test samples with larger content differences to be chosen, ensuring that the scenes covered by the sampled test samples are sufficiently comprehensive and that the test results are reliable.
Drawings
FIG. 1 is a schematic structural diagram of an artificial intelligence body framework;
fig. 2 is a schematic diagram of an application scenario provided in an embodiment of the present application;
fig. 3 is a schematic diagram of an application scenario provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of a convolutional neural network provided in an embodiment of the present application;
FIG. 5 is a schematic diagram of a convolutional neural network provided in an embodiment of the present application;
FIG. 6 is a block diagram of a system according to an embodiment of the present disclosure;
fig. 7 is a structural schematic diagram of a chip provided in an embodiment of the present application;
fig. 8 is a schematic diagram of an image selection method provided in an embodiment of the present application;
FIG. 9 is a schematic of the structure of an image processing model;
FIG. 10a is a schematic diagram of an image selection method provided in an embodiment of the present application;
FIG. 10b is a schematic diagram of an image selection method provided in an embodiment of the present application;
FIG. 10c is a schematic representation of the difference in quality of the target image in the defogging effect;
FIG. 10d is a schematic diagram of a model testing method according to an embodiment of the present application;
fig. 11 is a schematic diagram of an image selection apparatus according to an embodiment of the present application;
fig. 12 is a schematic diagram of an image selection apparatus according to an embodiment of the present application;
FIG. 13 is a schematic diagram of a model testing apparatus according to an embodiment of the present disclosure;
fig. 14 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application.
Detailed Description
The embodiments of the present invention will be described below with reference to the drawings. The terminology used in the description of the embodiments of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention.
Embodiments of the present application are described below with reference to the accompanying drawings. As can be known to those skilled in the art, with the development of technology and the emergence of new scenarios, the technical solution provided in the embodiments of the present application is also applicable to similar technical problems.
The terms "first," "second," and the like in the description and in the claims of the present application and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the terms so used are interchangeable under appropriate circumstances and are merely descriptive of the various embodiments of the application and how objects of the same nature can be distinguished. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of elements is not necessarily limited to those elements, but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The general workflow of an artificial intelligence system is described first. Referring to fig. 1, which shows a schematic structural diagram of the artificial intelligence main framework, the framework is explained below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects the general process from data acquisition onward, for example the stages of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output; in this process, the data undergoes a "data - information - knowledge - wisdom" refinement. The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (the provision and processing of technical realizations) up to the industrial ecology of the system.
(1) Infrastructure
The infrastructure provides computing power support for the artificial intelligence system, communicates with the outside world, and provides support through a base platform. Communication with the outside is performed through sensors. Computing power is provided by intelligent chips, such as central processing units (CPU), neural-network processing units (NPU), graphics processing units (GPU), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), or other hardware acceleration chips. The base platform includes related platform assurance and support such as a distributed computing framework and networks, and may include cloud storage and computing, interconnection networks, and the like. For example, sensors communicate with the outside to acquire data, and the data are provided for computation to the intelligent chips in the distributed computing system supplied by the base platform.
(2) Data
Data at the upper level of the infrastructure is used to represent the data source for the field of artificial intelligence. The data relates to graphs, images, voice and texts, and also relates to the data of the Internet of things of traditional equipment, including service data of the existing system and sensing data such as force, displacement, liquid level, temperature, humidity and the like.
(3) Data processing
Data processing typically includes data training, machine learning, deep learning, searching, reasoning, decision making, and the like.
Machine learning and deep learning can perform symbolic and formalized intelligent-information modeling, extraction, preprocessing, training, and the like on data.
Inference is the process of simulating human intelligent inference in a computer or intelligent system: using formalized information, the machine thinks about and solves problems according to an inference control strategy, with search and matching as typical functions.

Decision-making is the process of making decisions after reasoning over intelligent information, and generally provides functions such as classification, sorting, and prediction.
(4) General capabilities
After the above-mentioned data processing, some general capabilities may further be formed based on its results, such as an algorithm or a general system, e.g., translation, text analysis, computer vision processing, speech recognition, image recognition, and so on.
(5) Intelligent product and industrial application
Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they encapsulate the overall artificial intelligence solution, commercialize intelligent-information decision-making, and realize practical applications. The application fields mainly include intelligent terminals, intelligent transportation, intelligent healthcare, autonomous driving, smart cities, and so on.
The image selection method provided by the embodiment of the application can select test samples that more accurately reveal the performance difference between image processing models (such as the first image processing model and the second image processing model in the embodiments of the application).
The first image processing model and the second image processing model in the embodiments of the application can be applied to intelligent vehicles for assisted and automatic driving, and also to computer vision fields that require image enhancement, such as smart cities and intelligent terminals. For example, the two models can be applied in video streaming scenarios and video surveillance scenarios, which are briefly described below in conjunction with fig. 2 and fig. 3, respectively.
Video streaming scenario:
For example, when a client on a smart terminal (e.g., a cell phone, car, robot, tablet, desktop computer, smart watch, virtual reality (VR) or augmented reality (AR) device) plays a video, the server may transmit a downsampled, lower-resolution, low-quality video stream over the network to the client in order to reduce the bandwidth requirement of the video stream. The client may then enhance the images in the low-quality video stream using the trained second video processing network, for example by applying super-resolution, noise reduction, and other operations, and finally present high-quality images to the user.
Video monitoring scene:
In the security field, constrained by adverse conditions such as the installation positions of monitoring cameras and limited storage space, part of surveillance video has poor image quality, which affects the accuracy with which humans or recognition algorithms identify targets. The trained second video processing network provided by the embodiment of the application can therefore be used to convert low-quality surveillance video into high-quality, high-definition video, effectively recovering a large amount of detail in the monitored images and providing more effective and richer information for subsequent target recognition tasks.
Since the embodiments of the present application relate to the application of a large number of neural networks, for the convenience of understanding, the related terms and related concepts such as neural networks related to the embodiments of the present application will be described below.
(1) Neural network
The neural network may be composed of neural units. A neural unit may be an operation unit that takes x_s (the input data) and an intercept of 1 as inputs; the output of the operation unit may be:

$$h_{W,b}(x) = f(W^{T}x) = f\left(\sum_{s=1}^{n} W_{s} x_{s} + b\right)$$

where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, and b is the bias of the neural unit. f is the activation function of the neural unit, used to introduce a nonlinear characteristic into the neural network and to convert the input signal of the neural unit into an output signal. The output signal of the activation function may serve as the input of the next convolutional layer, and the activation function may be, for example, a sigmoid function. A neural network is a network formed by joining many such single neural units together; that is, the output of one neural unit may be the input of another. The input of each neural unit can be connected to the local receptive field of the previous layer to extract the features of that local receptive field; the local receptive field may be a region composed of several neural units.
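The neural unit's computation can be illustrated in a few lines of Python (a toy example, not code from the application):

```python
import numpy as np

def neural_unit(x, w, b, f=lambda z: 1.0 / (1.0 + np.exp(-z))):
    """Output f(sum_s W_s * x_s + b); the default activation f is the sigmoid."""
    return f(np.dot(w, x) + b)

# three inputs, three weights, one bias -> one scalar output
y = neural_unit(np.array([0.5, -1.2, 3.0]), np.array([0.1, 0.4, -0.2]), b=0.05)
```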
(2) A Convolutional Neural Network (CNN) is a deep neural network with a convolutional structure. The convolutional neural network comprises a feature extractor consisting of convolutional layers and sub-sampling layers, which can be regarded as a filter. The convolutional layer is a neuron layer for performing convolutional processing on an input signal in a convolutional neural network. In convolutional layers of convolutional neural networks, one neuron may be connected to only a portion of the neighbor neurons. In a convolutional layer, there are usually several characteristic planes, and each characteristic plane may be composed of several neural units arranged in a rectangular shape. The neural units of the same feature plane share weights, where the shared weights are convolution kernels. Sharing weights may be understood as the way features are extracted is location independent. The convolution kernel may be formalized as a matrix of random size, and may be learned to obtain reasonable weights during the training of the convolutional neural network. In addition, sharing weights brings the direct benefit of reducing connections between layers of the convolutional neural network, while reducing the risk of overfitting.
CNN is a very common neural network, and the structure of CNN will be described in detail below with reference to fig. 4. As described in the introduction of the basic concept, the convolutional neural network is a deep neural network with a convolutional structure, and is a deep learning (deep learning) architecture, and the deep learning architecture refers to performing multiple levels of learning at different abstraction levels through a machine learning algorithm. As a deep learning architecture, CNN is a feed-forward artificial neural network in which individual neurons can respond to images input thereto.
As shown in fig. 4, the Convolutional Neural Network (CNN)200 may include an input layer 210, a convolutional/pooling layer 220 (where the pooling layer is optional), and a fully connected layer 230.
Convolutional layer/pooling layer 220:
Convolutional layers:
the convolutional layer/pooling layer 220 shown in fig. 4 may include layers such as 221 and 226, for example: in one implementation, 221 is a convolutional layer, 222 is a pooling layer, 223 is a convolutional layer, 224 is a pooling layer, 225 is a convolutional layer, 226 is a pooling layer; in another implementation, 221, 222 are convolutional layers, 223 is a pooling layer, 224, 225 are convolutional layers, and 226 is a pooling layer. I.e., the output of a convolutional layer may be used as input to a subsequent pooling layer, or may be used as input to another convolutional layer to continue the convolution operation.
The inner working principle of a convolutional layer will be described below by taking convolutional layer 221 as an example.
Convolutional layer 221 may include many convolution operators, also called kernels, whose role in image processing is that of a filter extracting specific information from the input image matrix. A convolution operator is essentially a weight matrix, which is usually predefined. During the convolution operation on an image, the weight matrix is typically slid over the input image pixel by pixel (or two pixels by two pixels, depending on the value of the stride) in the horizontal direction, so as to extract a specific feature from the image. The size of the weight matrix should be related to the size of the image. Note that the depth dimension of the weight matrix is the same as the depth dimension of the input image; during the convolution operation, the weight matrix extends through the entire depth of the input image. Thus, convolving with a single weight matrix produces a convolved output with a single depth dimension, but in most cases a plurality of weight matrices of the same size (rows x columns) is applied rather than a single one. The outputs of the weight matrices are stacked to form the depth dimension of the convolved image, where this dimension is determined by the "plurality" just described. Different weight matrices may be used to extract different features from the image: for example, one weight matrix extracts image edge information, another extracts a particular color, and yet another blurs unwanted noise. Because the weight matrices have the same size, the feature maps they extract also have the same size, and the extracted feature maps are combined to form the output of the convolution operation.
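To make the stacking of multiple weight matrices into a depth dimension concrete, here is a small PyTorch sketch (illustrative only; the application does not prescribe a framework):

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 32, 32)            # one RGB input image, depth 3
# 16 convolution kernels, each of depth 3 matching the input depth;
# the stride of 1 slides each kernel pixel by pixel
conv = nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, stride=1, padding=1)
y = conv(x)
print(y.shape)  # torch.Size([1, 16, 32, 32]): 16 stacked feature maps, depth 16
```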
In practical applications, the weight values in these weight matrices need to be obtained through extensive training. Each weight matrix formed by the trained weight values can be used to extract information from the input image, enabling the convolutional neural network 200 to make correct predictions.
When the convolutional neural network 200 has multiple convolutional layers, the initial convolutional layers (e.g., 221) tend to extract more general features, which may also be referred to as low-level features. As the depth of the convolutional neural network 200 increases, the features extracted by the later convolutional layers (e.g., 226) become more complex, such as features with high-level semantics; features with higher semantics are more applicable to the problem to be solved.
Pooling layer:
Since it is often desirable to reduce the number of training parameters, a pooling layer often needs to be periodically introduced after a convolutional layer. In the layers 221-226 illustrated as 220 in fig. 4, one convolutional layer may be followed by one pooling layer, or multiple convolutional layers may be followed by one or more pooling layers. During image processing, the only purpose of the pooling layer is to reduce the spatial size of the image. The pooling layer may include an average pooling operator and/or a maximum pooling operator for sampling the input image to obtain a smaller-sized image. The average pooling operator may compute the average of the pixel values within a certain range of the image as the result of average pooling, and the max pooling operator may take the pixel with the largest value within a particular range as the result of max pooling. In addition, just as the size of the weight matrix used in the convolutional layer should be related to the image size, the operators in the pooling layer should also be related to the image size. The size of the image output after processing by the pooling layer may be smaller than the size of the image input to the pooling layer, and each pixel in the output image represents the average or maximum value of a corresponding sub-region of the input image.
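A minimal sketch of the two pooling operators, under the same assumptions about framework and sizes; each output pixel is the average or maximum of a 2×2 sub-region, halving the spatial size:

```python
import torch
import torch.nn as nn

# Average and max pooling both downsample the spatial size; each output pixel
# is the mean (or maximum) of the corresponding sub-region of the input.
x = torch.randn(1, 16, 224, 224)
avg_pool = nn.AvgPool2d(kernel_size=2, stride=2)
max_pool = nn.MaxPool2d(kernel_size=2, stride=2)
print(avg_pool(x).shape, max_pool(x).shape)  # both torch.Size([1, 16, 112, 112])
```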
Fully connected layer 230:
After processing by the convolutional layer/pooling layer 220, the convolutional neural network 200 is not yet sufficient to output the required output information, because, as previously described, the convolutional layer/pooling layer 220 only extracts features and reduces the parameters brought by the input image. However, to generate the final output information (the required class information or other relevant information), the convolutional neural network 200 needs to use the fully-connected layer 230 to generate an output of one class or a set of the required number of classes. Thus, the fully-connected layer 230 may include multiple hidden layers (231, 232, to 23n as shown in fig. 4), and the parameters included in these hidden layers may be pre-trained based on relevant training data of a specific task type; for example, the task type may include image recognition, image classification, image super-resolution reconstruction, and so on.
After the hidden layers in the fully-connected layer 230, the last layer of the whole convolutional neural network 200 is the output layer 240. The output layer 240 has a loss function similar to categorical cross-entropy and is specifically used to calculate the prediction error. Once the forward propagation of the whole convolutional neural network 200 is completed (i.e., propagation in the direction from 210 to 240 in fig. 4), backward propagation (i.e., propagation in the direction from 240 to 210 in fig. 4) starts to update the weight values and biases of the aforementioned layers, so as to reduce the loss of the convolutional neural network 200 and the error between the result output by the convolutional neural network 200 through the output layer and the ideal result.
It should be noted that the convolutional neural network 200 shown in fig. 4 is only an example of a convolutional neural network. In specific applications, the convolutional neural network may also exist in the form of other network models, for example, a model including only part of the network structure shown in fig. 4; for instance, the convolutional neural network employed in the embodiments of the present application may include only the input layer 210, the convolutional layer/pooling layer 220, and the output layer 240. As another example, as shown in fig. 5, multiple convolutional layers/pooling layers may be arranged in parallel, and the features they extract are all input to the fully-connected layer 230 for processing.
(3) Deep neural network
A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with many hidden layers, where "many" has no particular threshold. Based on the positions of the different layers, the layers inside a DNN can be divided into three categories: input layer, hidden layers, and output layer. Generally, the first layer is the input layer, the last layer is the output layer, and the layers in between are hidden layers. Adjacent layers are fully connected; that is, any neuron in the i-th layer is necessarily connected to every neuron in the (i+1)-th layer. Although a DNN appears complex, the work of each layer is not complicated: it is simply the following linear relational expression:
$\vec{y} = \alpha(W \vec{x} + \vec{b})$

where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is an offset (bias) vector, $W$ is a weight matrix (also called coefficients), and $\alpha(\cdot)$ is an activation function. Each layer simply performs this operation on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because a DNN has many layers, there are many coefficient matrices $W$ and offset vectors $\vec{b}$. These parameters are defined in the DNN as follows, taking the coefficient $W$ as an example: assume that in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as $w^{3}_{24}$. The superscript 3 represents the layer in which the coefficient $W$ is located, while the subscripts correspond to the output index 2 (in the third layer) and the input index 4 (in the second layer). In summary: the coefficient from the $k$-th neuron of the $(L-1)$-th layer to the $j$-th neuron of the $L$-th layer is defined as $w^{L}_{jk}$.
Note that the input layer has no $W$ parameter. In a deep neural network, more hidden layers give the network a greater ability to depict complex situations in the real world. In theory, the more parameters a model has, the higher its complexity and the larger its "capacity", which means it can accomplish more complex learning tasks. Training the deep neural network is the process of learning the weight matrices, and its final goal is to obtain the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors $W$ of many layers).
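For illustration only, the per-layer operation above can be written in a few lines of Python; the layer sizes and the choice of ReLU as the activation α are assumptions:

```python
import numpy as np

def dnn_layer(x, W, b):
    """One DNN layer: y = alpha(W x + b), with ReLU assumed as alpha."""
    return np.maximum(0.0, W @ x + b)

# W[j, k] is the coefficient w^L_{jk} from the k-th neuron of layer L-1
# to the j-th neuron of layer L. Sizes below are illustrative.
x = np.random.randn(4)          # output of layer L-1 (4 neurons)
W = np.random.randn(3, 4)       # weight matrix of layer L (3 neurons)
b = np.random.randn(3)          # offset (bias) vector of layer L
y = dnn_layer(x, W, b)          # output of layer L
```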
(4) Super-resolution
Super-resolution (SR) is an image enhancement technique: given one or a group of low-resolution images, the high-frequency detail information of the images is restored by means such as learned image priors, self-similarity of the image, and complementary information across multiple frames, so as to generate a target image with higher resolution. According to the number of input images, super-resolution applications can be divided into single-frame image super-resolution and video super-resolution. Super-resolution has important application value in fields such as high-definition television, surveillance equipment, satellite imagery, and medical imaging.
(5) Noise reduction
Images are often affected by the imaging device and the external environment during digitization and transmission, resulting in images that contain noise. The process of reducing noise in an image is referred to as image noise reduction, sometimes also called image denoising.
(6) Image features
The image features mainly include color features, texture features, shape features, spatial relationship features and the like of the image.
The color feature is a global feature describing surface properties of a scene corresponding to an image or an image area; the general color features are based on the characteristics of the pixel points, and all pixels belonging to the image or the image area have respective contributions. Since color is not sensitive to changes in the orientation, size, etc. of an image or image region, color features do not capture local features of objects in an image well.
Texture features are also global features that also describe the surface properties of the scene corresponding to the image or image area; however, since texture is only a characteristic of the surface of an object and does not completely reflect the essential attributes of the object, high-level image content cannot be obtained by using texture features alone. Unlike color features, texture features are not based on the characteristics of the pixel points, which requires statistical calculations in regions containing multiple pixel points.
Shape features come in two types: contour features and region features. The contour features of an image mainly concern the outer boundary of an object, while the region features relate to the entire shape region.
The spatial relationship feature refers to the mutual spatial positions or relative directional relationships among multiple targets segmented from an image; these relationships can be divided into connection/adjacency relationships, overlap relationships, inclusion/containment relationships, and the like. In general, spatial position information can be divided into two categories: relative spatial position information and absolute spatial position information. The former emphasizes the relative arrangement of the targets, such as above/below and left/right relationships, while the latter emphasizes the distance and orientation between the targets.
It should be noted that the above listed image features can be taken as some examples of features in the image, and the image can also have other features, such as features of higher levels: semantic features, which are not expanded here.
(7) Image/video enhancement
Image/video enhancement refers to actions on images/video that can improve the imaging quality. For example, enhancement processing includes super-resolution, noise reduction, sharpening, or demosaicing, among others.
The system architecture provided by the embodiment of the present application is described in detail below with reference to fig. 6. Fig. 6 is a schematic diagram of a system architecture according to an embodiment of the present application. As shown in FIG. 6, the system architecture 500 includes an execution device 510, a training device 520, a database 530, a client device 540, a data storage system 550, and a data collection system 560.
The execution device 510 includes a computation module 511, an I/O interface 512, a pre-processing module 513, and a pre-processing module 514. The target model/rule 501 may be included in the calculation module 511, with the pre-processing module 513 and the pre-processing module 514 being optional.
The data acquisition device 560 is used to acquire training data. An image sample can be a low-quality image, and the supervision image is a high-quality image corresponding to the image sample, acquired in advance before model training. For example, the image sample may be a low-resolution image and the supervision image a high-resolution image; alternatively, the image sample may be an image containing fog or noise, and the supervision image the image with the fog or noise removed. After the training data are collected, the data collection device 560 stores them in the database 530, and the training device 520 trains the target model/rule 501 based on the training data maintained in the database 530.
The target model/rule 501 (e.g., the first image processing model and the second image processing model in the embodiment of the present application) can be used to implement an image enhancement task, that is, an image to be processed is input into the target model/rule 501, so that a processed enhanced image (e.g., the first processed image and the second processed image in the embodiment of the present application) can be obtained. It should be noted that, in practical applications, the training data maintained in the database 530 may not necessarily all come from the collection of the data collection device 560, and may also be received from other devices. It should be noted that, the training device 520 does not necessarily perform the training of the target model/rule 501 based on the training data maintained by the database 530, and may also obtain the training data from the cloud or other places to perform the model training, and the above description should not be taken as a limitation to the embodiments of the present application.
The target model/rule 501 obtained by training according to the training device 520 may be applied to different systems or devices, for example, the executing device 510 shown in fig. 6, where the executing device 510 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an Augmented Reality (AR)/Virtual Reality (VR) device, a vehicle-mounted terminal, or a server or a cloud. In fig. 6, the execution device 510 configures an input/output (I/O) interface 512 for data interaction with an external device, and a user may input data to the I/O interface 512 through a client device 540.
The pre-processing module 513 and the pre-processing module 514 are used to pre-process the input data (such as the plurality of candidate images in the embodiments of the present application) received by the I/O interface 512. It should be understood that there may be no pre-processing module 513 and 514, or only one pre-processing module; when they are absent, the input data may be processed directly by the computation module 511.
During the process of preprocessing the input data by the execution device 510 or performing the calculation and other related processes by the calculation module 511 of the execution device 510, the execution device 510 may call the data, codes and the like in the data storage system 550 for corresponding processes, or store the data, instructions and the like obtained by corresponding processes in the data storage system 550.
Finally, the I/O interface 512 presents the processing results, such as the enhanced images (e.g., the first processed image and the second processed image in the embodiment of the present application) obtained after the processing to the client device 540, so as to provide them to the user.
It is worth noting that the training device 520 may generate corresponding target models/rules 501 for different targets or different tasks based on different training data, and the corresponding target models/rules 501 may be used to implement the image enhancement task, so as to provide the user with the required results.
In the case shown in fig. 6, the user may manually give input data (e.g., a plurality of candidate images in the embodiment of the present application), and the "manually given input data" may operate through an interface provided by the I/O interface 512. Alternatively, the client device 540 may automatically send the input data to the I/O interface 512, and if the client device 540 is required to automatically send the input data to obtain authorization from the user, the user may set the corresponding permissions in the client device 540. The user can view the results output by the execution device 510 at the client device 540, and the specific presentation form can be display, sound, action, and the like. The client device 540 may also serve as a data collection terminal, collecting input data of the input I/O interface 512 and output results of the output I/O interface 512 as new sample data, as shown, and storing the new sample data in the database 530. Of course, the input data inputted to the I/O interface 512 and the output result outputted from the I/O interface 512 as shown in the figure may be directly stored in the database 530 as new sample data by the I/O interface 512 without being collected by the client device 540.
In embodiments of the application, images may be selected from the plurality of candidate images, based on the first processed image and the second processed image, to serve as test samples for evaluating model performance.
It should be noted that fig. 6 is only a schematic diagram of a system architecture provided in the embodiment of the present application, and the position relationship between the devices, modules, and the like shown in the diagram does not constitute any limitation, for example, in fig. 6, the data storage system 550 is an external memory with respect to the execution device 510, and in other cases, the data storage system 550 may be disposed in the execution device 510.
A hardware structure of a chip provided in an embodiment of the present application is described below.
Fig. 7 is a hardware structure diagram of a chip provided in an embodiment of the present application, where the chip includes a neural network processor 700. The chip may be disposed in the execution device 510 as shown in fig. 6 to complete the calculation work of the calculation module 511. The chip may also be disposed in a training apparatus 520 as shown in fig. 6 to complete the training work of the training apparatus 520 and output the target model/rule 501. The algorithms for the various layers in the image processing model shown in fig. 6 may all be implemented in a chip as shown in fig. 7.
A neural network processor (NPU) 700 is mounted as a coprocessor on a host central processing unit (host CPU), and tasks are allocated by the host CPU. The core portion of the NPU is the arithmetic circuit 703, and the controller 704 controls the arithmetic circuit 703 to extract data from a memory (the weight memory 702 or the input memory 701) and perform arithmetic.
In some implementations, the arithmetic circuit 703 includes a plurality of processing units (PEs) therein. In some implementations, the operational circuit 703 is a two-dimensional systolic array. The arithmetic circuit 703 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuitry 703 is a general-purpose matrix processor.
For example, assume that there is an input matrix A, a weight matrix B, and an output matrix C. The arithmetic circuit 703 fetches the data corresponding to the matrix B from the weight memory 702 and buffers it in each PE in the arithmetic circuit 703. The arithmetic circuit 703 takes the matrix a data from the input memory 701 and performs matrix arithmetic with the matrix B, and stores a partial result or a final result of the matrix in an accumulator (accumulator) 708.
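Functionally, this accumulation can be sketched as follows (a plain-Python illustration of the arithmetic performed, not of the NPU hardware; the matrix sizes are assumed):

```python
import numpy as np

# Partial products of the rows of A with the cached matrix B are summed
# into the accumulator to produce C = A @ B.
A = np.random.randn(4, 8)   # input matrix taken from the input memory
B = np.random.randn(8, 5)   # weight matrix cached in the PEs
C = np.zeros((4, 5))        # accumulator
for k in range(A.shape[1]):
    C += np.outer(A[:, k], B[k, :])   # accumulate one rank-1 partial result
assert np.allclose(C, A @ B)
```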
The vector calculation unit 707 may further process the output of the operation circuit 703, such as vector multiplication, vector addition, exponential operation, logarithmic operation, magnitude comparison, and the like. For example, the vector calculation unit 707 may be used for network calculations of non-convolution/non-FC layers in a neural network, such as pooling (pooling), batch normalization (batch normalization), local response normalization (local response normalization), and the like.
In some implementations, the vector calculation unit 707 can store the processed output vector to the unified memory 706. For example, the vector calculation unit 707 may apply a non-linear function to the output of the arithmetic circuit 703, such as a vector of accumulated values, to generate the activation value. In some implementations, the vector calculation unit 707 generates normalized values, combined values, or both. In some implementations, the vector of processed outputs can be used as activation inputs to the arithmetic circuitry 703, for example for use in subsequent layers in a neural network.
The unified memory 706 is used to store input data as well as output data.
A direct memory access controller (DMAC) 705 is used to transfer input data from the external memory to the input memory 701 and/or the unified memory 706, to store the weight data from the external memory into the weight memory 702, and to store the data in the unified memory 706 into the external memory.
A Bus Interface Unit (BIU) 710, configured to implement interaction between the main CPU, the DMAC, and the instruction fetch memory 709 through a bus.
An instruction fetch buffer 709 connected to the controller 704 for storing instructions used by the controller 704.
The controller 704 is configured to call the instruction cached in the instruction fetch memory 709, so as to control the working process of the operation accelerator.
Generally, the unified memory 706, the input memory 701, the weight memory 702, and the instruction fetch memory 709 are all on-chip memories, the external memory is a memory outside the NPU, and the external memory may be a double data rate synchronous dynamic random access memory (DDR SDRAM), a High Bandwidth Memory (HBM), or other readable and writable memories.
Referring to fig. 8, fig. 8 is a schematic diagram of an embodiment of an image selection method provided in an embodiment of the present application, and as shown in fig. 8, an image selection method provided in an embodiment of the present application includes:
801. a plurality of candidate images are acquired.
In this embodiment of the present application, a plurality of candidate images may be acquired, together with a first image processing model and a second image processing model whose performance needs to be compared. The first image processing model and the second image processing model may be used to implement one image processing task; the image processing task may be, for example, an image enhancement task, and the plurality of candidate images may be a test sample set specific to the image enhancement task. Illustratively, if the image enhancement task is defogging, the plurality of candidate images may be a large-scale test sample set for defogging scenes.
In the embodiment of the present application, the first image processing network and the second image processing network may be used to implement an image enhancement task, which may be understood as a task of enhancing the quality of an image or video. For example, the image enhancement task may be a video denoising task, a video defogging task, a super-resolution task, or a high dynamic range task, which is not limited herein.
It should be understood that the first image processing network and the second image processing network are models for implementing the same image processing task, and the present application does not limit the specific type of image processing task. Taking the image enhancement task as an example of the super-resolution task, an example of a network structure of the first image processing network and the second image processing network is described next.
Referring to fig. 9, fig. 9 is a structural schematic diagram of an image processing model. As shown in fig. 9, the image to be processed may be a low-resolution (LR) image. The low-resolution image frame may first be processed by a feature extraction module to obtain image features, and the feature map may then be processed by a plurality of basic units. A basic unit may be a network structure obtained by connecting basic modules through basic operations of a neural network; the network structure may include preset basic operations or combinations of basic operations in a convolutional neural network, which may be collectively referred to as basic operations. For example, a basic operation may be a convolution operation, a pooling operation, a residual connection, etc., and basic operations may be used to connect the basic modules so as to obtain the network structure of a basic unit. The non-linear transformation part is used to transform the image features of the input image and map them to a high-dimensional feature space; under normal conditions, a super-resolution image is easier to reconstruct in the mapped high-dimensional space. The reconstruction part is used to perform up-sampling and convolution processing on the image features output by the non-linear transformation part to obtain a super-resolution (high-resolution, HR) image corresponding to the input image.
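As a hedged sketch (not the patent's actual model), the three parts of fig. 9 might be expressed in PyTorch as follows; the channel count, number of basic units, residual connections, and pixel-shuffle upsampling are illustrative assumptions:

```python
import torch
import torch.nn as nn

class TinySR(nn.Module):
    """Illustrative only: feature extraction, non-linear transformation by
    stacked basic units (here residual conv blocks), and reconstruction by
    convolution plus pixel-shuffle upsampling."""
    def __init__(self, channels=64, num_units=4, scale=2):
        super().__init__()
        self.extract = nn.Conv2d(3, channels, 3, padding=1)
        self.units = nn.ModuleList([
            nn.Sequential(nn.Conv2d(channels, channels, 3, padding=1), nn.ReLU())
            for _ in range(num_units)])
        self.reconstruct = nn.Sequential(
            nn.Conv2d(channels, 3 * scale * scale, 3, padding=1),
            nn.PixelShuffle(scale))

    def forward(self, lr):
        feat = self.extract(lr)
        for unit in self.units:
            feat = feat + unit(feat)   # residual connection between basic units
        return self.reconstruct(feat)

hr = TinySR()(torch.randn(1, 3, 64, 64))   # -> [1, 3, 128, 128]
```

Pixel-shuffle is only one common choice for the reconstruction part; transposed convolution, or interpolation followed by convolution, would serve equally well in this sketch.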
802. Respectively processing each candidate image through a first image processing model and a second image processing model to obtain a first processed image and a second processed image corresponding to each candidate image; wherein the first image processing model and the second image processing model are used to implement the same image processing task; the first processed image is obtained by processing through the first image processing model, and the second processed image is obtained by processing through the second image processing model.
In the embodiment of the present application, each candidate image may be processed by a first image processing model and a second image processing model respectively to obtain a first processed image and a second processed image corresponding to each candidate image, where "processing" may be understood as an inference process of the models.
In this embodiment of the application, the first image processing model and the second image processing model are used to implement the same image processing task, and taking the example that the first image processing network and the second image processing network are used to implement the image enhancement task, the first processed image is an enhanced image obtained by image enhancement of each candidate image through the first image processing network, and the second processed image is an enhanced image obtained by image enhancement of each candidate image through the second image processing network.
Taking the image enhancement task as an example of defogging, the first processed image and the second processed image are candidate images after defogging.
In the embodiment of the present application, each candidate image of the multiple candidate images may be sequentially traversed and processed, or the multiple candidate images may be processed in parallel, so as to obtain the first processed image and the second processed image corresponding to each candidate image.
803. Acquiring an image quality difference between a first processed image and a second processed image corresponding to each candidate image, and selecting M target images from the candidate images according to the image quality difference, wherein the M target images are used as test samples of the first image processing model and the second image processing model during performance comparison.
In order to enable experts or other personnel to compare the performance of the first image processing model and the second image processing model more accurately, it is necessary to ensure that the difference in image quality between the output images obtained when the first image processing model and the second image processing model process the same test sample is large; the relative performance of the two models can then be accurately determined based on the quality of the output images.
Image quality can be used to characterize the degree to which an image restores the real scene, also called the fidelity of the image: the closer the image is to the real scene, the higher its fidelity and the higher the image quality. Furthermore, image quality can also be used to characterize the readability of the image, i.e., the ability of the image to provide information to a person or machine; readability is not only related to the requirements of the imaging system's application but often also to the subjective perception of the human eye, and the higher the readability, the higher the image quality. In one implementation, the indicators used to evaluate image quality may include resolution, color depth, image distortion, and so on.
How to obtain the image quality difference between the first processed image and the second processed image corresponding to each candidate image is described next:
in one implementation, the image quality difference may be calculated based on non-convolutional-neural-network metrics, such as peak signal-to-noise ratio (PSNR) or structural similarity (SSIM), or it may be calculated based on a convolutional neural network, such as learned perceptual image patch similarity (LPIPS) or DISTS.
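For illustration, a minimal sketch of one way to score this difference is to compute the metric directly between the two processed images using scikit-image; treating the negated PSNR/SSIM between the two outputs as the difference score is an assumption, justified by the fact that low similarity between the two outputs implies a large quality difference:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def quality_difference(first, second):
    """Score how far apart the two models' outputs are: a low PSNR/SSIM
    between the two processed images means a large quality difference,
    so the negated metric serves as the difference score."""
    psnr = peak_signal_noise_ratio(first, second, data_range=1.0)
    ssim = structural_similarity(first, second, channel_axis=-1, data_range=1.0)
    return -psnr, -ssim   # larger value = larger difference

a = np.random.rand(64, 64, 3).astype(np.float32)   # first processed image
b = np.random.rand(64, 64, 3).astype(np.float32)   # second processed image
print(quality_difference(a, b))
```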
In the embodiment of the application, the image quality difference between the first processed image and the second processed image corresponding to each candidate image is used as a factor when the test sample is selected, and the test sample used for performance comparison of the first image processing model and the second image processing model is selected from a plurality of candidate samples.
Specifically, in one implementation, M target images whose corresponding image quality differences are greater than a first threshold may be selected from the plurality of candidate images according to the image quality difference corresponding to each candidate image. That is, the image quality difference corresponding to each candidate image may be calculated, and M target images selected from the plurality of candidate images by comparing the magnitudes of the image quality differences of the respective candidate images.
In one implementation, the M target images with the largest difference in image quality may be selected from the plurality of candidate images.
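A minimal sketch of this selection rule, assuming the per-candidate image quality differences have already been computed:

```python
def select_top_m(candidates, quality_diffs, m):
    """Keep the M candidate images whose image quality difference is largest."""
    order = sorted(range(len(candidates)),
                   key=lambda i: quality_diffs[i], reverse=True)
    return [candidates[i] for i in order[:m]]
```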
Referring to fig. 10a and 10b, fig. 10a and 10b are flow diagrams illustrating an image selection method in an embodiment of the present application. For a plurality of candidate images, the first image processing model and the second image processing model respectively process each candidate image to obtain a first processed image and a second processed image corresponding to each candidate image, and candidate images are selected as the target images required for the model performance test based on the image quality difference between the first processed image and the second processed image. As shown in fig. 10a, the candidate image shown there can be used as a test sample for the model performance test because the difference in image quality between the corresponding first processed image (the upper-right image in fig. 10a) and second processed image (the lower-right image in fig. 10a) is large: specifically, the first processed image in fig. 10a has richer texture details on the trunk and leaves, sharper edges, and higher brightness and richer texture details in the clumps of the background area. As shown in fig. 10b, the candidate image shown there cannot be used as a test sample for the model performance test, because the difference in image quality between the corresponding first processed image (the upper-right image in fig. 10b) and second processed image (the lower-right image in fig. 10b) is small.
In one implementation, in addition to image quality differences as a selection factor for the test sample, image content differences between candidate images may also be used as a selection factor for the test sample.
In order to ensure the reliability of the subsequent test results, the test samples adopted in the model performance comparison should cover a sufficiently comprehensive range of scenes. Scenes covered by the test samples being sufficiently comprehensive means that the differences in image content between the test samples are large; for example, each test sample includes a different target object: test sample A includes a person, test sample B includes a dog, test sample C includes a cat, and so on.
Specifically, in this embodiment of the present application, image content differences between each candidate image and other candidate images except for the candidate image itself in the plurality of candidate images may be obtained, and M target images may be selected from the plurality of candidate images according to the image quality difference and the image content difference corresponding to each candidate image.
The image content difference may indicate a difference in the background region and/or a difference between the target subjects in the foreground region of the images.

Taking the difference between target subjects in the foreground region as an example: the difference in image content between an image containing a person and an image containing a non-human animal (or an inanimate object such as a building) is larger than the difference between two images that both contain persons, and the difference in image content between images containing different persons is larger than that between images containing the same person.
Next, how to calculate the image content difference between each candidate image and the other candidate images except for itself among the plurality of candidate images is described:
in one possible implementation, the image features of each candidate image may be extracted using a conventional method such as scale-invariant feature transform (SIFT) or a convolutional neural network (CNN) to obtain an image feature vector of each candidate image, which characterizes the image content. A distance metric between the image feature vector of each candidate image and the image feature vectors of the other candidate images is then calculated; the distance metric expresses the content difference between the images, and the greater the distance, the greater the difference between the image content of the candidate image and that of the other candidate images.
It should be understood that the distance metric may also be referred to as a similarity metric; by calculating a distance metric between two multidimensional data, the similarity between them can be determined. Generally, the smaller the distance metric between two multidimensional data, the higher their similarity; conversely, the greater the distance metric, the lower the similarity. Illustratively, the distance metric may include a mean squared error (MSE) distance, an L1 distance, or the like. It should be understood that the image content difference may also be determined based on distance measures other than the MSE distance and the L1 distance; this embodiment does not specifically limit the distance metric.
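A small sketch of this computation under the stated assumptions (the feature vectors are taken as given, e.g. produced by SIFT descriptors or a CNN):

```python
import numpy as np

def content_difference(feat_a, feat_b, metric="l2"):
    """Compare two image feature vectors with a distance metric; a larger
    distance means a larger image content difference."""
    if metric == "l1":
        return float(np.abs(feat_a - feat_b).sum())
    return float(np.sqrt(((feat_a - feat_b) ** 2).sum()))   # L2 / Euclidean
```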
The following introduces a structural schematic of a feature extraction network that extracts image features of respective candidate images based on CNN:
in an optional implementation, the feature extraction network may include a backbone network and a feature pyramid network (FPN). The backbone network is configured to receive an input picture, perform convolution processing on it, and output feature maps with different resolutions corresponding to the picture; that is, it outputs feature maps of different sizes for the picture. The backbone network performs a series of convolution operations on the input picture to obtain feature maps at different scales. The backbone network may take various forms, such as a visual geometry group (VGG) network, a residual neural network (ResNet), the core structure of GoogLeNet (Inception-net), and the like.
The FPN is connected with the backbone network backbone, and the FPN can perform convolution processing on a plurality of feature maps with different resolutions generated by the backbone network backbone to construct a feature pyramid.
And the feature map output by the feature pyramid can be used as an image feature vector of each candidate image.
In the embodiment of the present application, a selected weight of each candidate image may be determined according to the image quality difference and the image content difference corresponding to each candidate image, where the selected weight is positively correlated with the image quality difference and the image content difference, and the selected weight is used as a selection basis of a test sample. Specifically, M target images having a selection weight greater than a second threshold may be selected from the plurality of candidate images according to the selection weight of each candidate image. In one implementation, the M target images selected with the greatest weight may be selected from the plurality of candidate images.
In one implementation, the selected weight may be a result of a weighted summation of the image quality difference and the image content difference, i.e. the selected weight is a result of a summation of a first weight being a product of the image quality difference and a corresponding weight and a second weight being a product of the image content difference and a corresponding weight. The weights corresponding to the image quality difference and the image content difference can be adjusted by the user as needed, depending on whether the user is more concerned about the quality difference or about the diversity difference of the samples.
Specifically, in the process of selecting the target images, an image may first be randomly sampled from the plurality of candidate images and added to a candidate set of test images, and the image quality difference corresponding to that candidate image is calculated. Then another candidate image is sampled from the plurality of candidate images and added to the candidate set, and the image quality difference corresponding to this candidate image, together with the image content difference between it and the candidate image already in the candidate set, is calculated to obtain its selected weight. Next, a further candidate image is sampled and added to the candidate set, and its image quality difference and its image content difference with respect to the two candidate images already in the candidate set are calculated (specifically, the average of the image content differences between this candidate image and each of the two candidate images in the candidate set may be used) to obtain its selected weight; and so on, until M candidate images have been sampled, at which point the candidate set contains M candidate images, each with a calculated selected weight. Then, one image is again randomly sampled from the plurality of candidate images, and the image quality difference corresponding to this candidate image and the image content difference between it and the M candidate images already in the candidate set are calculated (specifically, the average of the image content differences between this candidate image and each of the M candidate images in the candidate set may be used) to obtain its selected weight. The selected weights of the M+1 sampled candidate images are then ranked, and the images with the largest M selected weights are kept as the new candidate set (that is, either one image of the original M candidate images is removed, or the most recently sampled image is not added). The above steps are repeated until all candidate images have been sampled; the M candidate images in the candidate set are then the M selected target images, each of which has a large image quality difference, with large content differences between the images.
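The sampling loop described above might be sketched as follows; the callables quality_diff and content_diff and the weights w_q and w_c are assumed interfaces for illustration, not the patent's API:

```python
import random

def select_targets(candidates, quality_diff, content_diff, m, w_q=0.5, w_c=0.5):
    """quality_diff(i) returns the image quality difference of candidate i;
    content_diff(i, chosen) returns the mean content difference between
    candidate i and the already chosen candidates. Weights are illustrative."""
    def weight(i, others):
        score = w_q * quality_diff(i)
        if others:
            score += w_c * content_diff(i, others)
        return score

    pool = list(range(len(candidates)))
    random.shuffle(pool)
    chosen = pool[:m]                      # first M sampled candidates
    for i in pool[m:]:                     # sample the remaining candidates
        group = chosen + [i]
        # keep the M candidates with the largest selected weights
        group.sort(key=lambda j: weight(j, [k for k in group if k != j]),
                   reverse=True)
        chosen = group[:m]
    return [candidates[j] for j in chosen]
```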
Taking the first image processing model and the second image processing model as the Shao20 algorithm and FFA-Net as examples, as shown in fig. 10c, fig. 10c shows that there is a significant quality difference in defogging effect for the selected target images, and the difference in image content is large (one is architectural and one is portrait), which meets the selection requirement of the test sample.
After acquiring M target images used as test samples of the first image processing model and the second image processing model at the time of performance comparison, the first image processing model and the second image processing model may process the M target images, and an expert may subjectively evaluate the quality of the processing results of the first image processing model and the second image processing model and compare the performance superiority and inferiority between the first image processing model and the second image processing model based on the subjective quality evaluation result.
It should be understood that in some scenarios, if only the completeness of scene coverage across the test samples is considered, the image content difference between the candidate images may be used as the selection basis for the test samples, without using the image quality difference. Specifically, a plurality of candidate images may be acquired, the image content difference between each candidate image and the other candidate images among the plurality of candidate images may be obtained, and M target images may be selected from the plurality of candidate images according to the image content difference corresponding to each candidate image.
The embodiment of the application provides an image selection method, which is characterized by comprising the following steps: acquiring a plurality of candidate images; respectively processing each candidate image through a first image processing model and a second image processing model to obtain a first processed image and a second processed image corresponding to each candidate image; wherein the first image processing model and the second image processing model are used to implement the same image processing task; the first processed image is obtained by processing through the first image processing model, and the second processed image is obtained by processing through the second image processing model; acquiring an image quality difference between a first processed image and a second processed image corresponding to each candidate image, and selecting M target images from the candidate images according to the image quality difference, wherein the M target images are used as test samples of the first image processing model and the second image processing model during performance comparison. By adopting the mode, the image quality difference corresponding to each candidate image is used as the selection basis of the test sample, so that the image quality difference between the output images obtained when the first image processing model and the second image processing model process the same test sample is ensured to be larger, and an image quality evaluating person can judge the performance between the first image processing model and the second image processing model subjectively more easily.
In addition, the image content difference among the candidate images is used as a selection basis of the test sample, the test sample with larger content difference can be selected, the scene covered by the sampling of the test sample can be ensured to be comprehensive enough, and the reliability of the test result is ensured.
The above describes the selection process of the test sample, and then describes the process of performing the model performance test with reference to the selected test sample.
Referring to fig. 10d, fig. 10d is a flowchart illustrating a method for testing a model according to an embodiment of the present application, and as shown in fig. 10d, the method includes:
1001. acquiring M target images;
and the M target images are used as test samples of the first image processing model and the second image processing model in performance comparison.
1002. Respectively processing each target image through a first image processing model and a second image processing model to obtain a first processed image and a second processed image corresponding to each target image; the first image processing model and the second image processing model are used for realizing the same image processing task, the first processed image is obtained by processing the first image processing model, and the second processed image is obtained by processing the second image processing model; and the image quality difference between the first processing image and the second processing image corresponding to each target image is larger than a first threshold value.
1003. And obtaining a model test result, wherein the model test result is used for representing a performance comparison result between the first image processing model and the second image processing model, and the model test result is determined according to the first processing image and the second processing image corresponding to each target image.
After obtaining the first processed image and the second processed image corresponding to each target image, the model tester may perform performance comparison between the first image processing model and the second image processing model based on the first processed image and the second processed image corresponding to each target image, and feed back the model performance comparison result, so that the model performance comparison result may be used as the obtained model test result.
In the embodiments of the application, the image quality difference between the output images obtained when the first image processing model and the second image processing model process the same target image is large, so that image quality evaluators can more easily judge the relative performance of the first image processing model and the second image processing model subjectively.
In one possible implementation, the selected weight of each of the M target images is greater than a second threshold, and the selected weight of each target image is positively correlated to the image quality difference and the image content difference corresponding to each target image, where the image content difference corresponding to each target image is used to represent the image content difference between each target image and the other target images except for the target image itself.
According to the embodiment of the application, the image content difference between the target images is large, namely, scenes covered by the M target images are comprehensive enough, and the reliability of the test result is ensured.
In one possible implementation, the selected weight is a sum of a first weight and a second weight, the first weight being a product of the image quality difference and a corresponding weight, the second weight being a product of the image content difference and a corresponding weight.
In one possible implementation, the first image processing model and the second image processing model are used to implement an image enhancement task.
Referring to fig. 11, fig. 11 is a schematic structural diagram of an image selection apparatus 1100 according to an embodiment of the present application, and as shown in fig. 11, the apparatus includes:
an obtaining module 1101 configured to obtain a plurality of candidate images;
an image processing module 1102, configured to process each candidate image through a first image processing model and a second image processing model respectively to obtain a first processed image and a second processed image corresponding to each candidate image; wherein the first image processing model and the second image processing model are used to implement the same image processing task; the first processed image is obtained by processing through the first image processing model, and the second processed image is obtained by processing through the second image processing model;
an image selecting module 1103, configured to obtain an image quality difference between a first processed image and a second processed image corresponding to each candidate image, and select M target images from the multiple candidate images according to the image quality difference, where the M target images are used as test samples of the first image processing model and the second image processing model when performing performance comparison.
In one possible implementation, the image selecting module 1103 is configured to select M target images from the plurality of candidate images, where corresponding image quality differences are greater than a first threshold, according to the corresponding image quality difference of each candidate image.
In a possible implementation, the obtaining module is configured to obtain an image content difference between each candidate image and other candidate images except for the candidate image itself in the plurality of candidate images;
the image selection module is used for selecting M target images from the candidate images according to the image quality difference and the image content difference corresponding to each candidate image.
In one possible implementation, the image selecting module 1103 is configured to determine, according to the image quality difference and the image content difference corresponding to each candidate image, a selected weight of each candidate image, where the selected weight is positively correlated to the image quality difference and the image content difference;
selecting M target images with the selected weights larger than a second threshold value from the plurality of candidate images according to the selected weight of each candidate image.
In one possible implementation, the selected weight is a sum of a first weight and a second weight, the first weight being a product of the image quality difference and a corresponding weight, the second weight being a product of the image content difference and a corresponding weight.
In one possible implementation, the first image processing model and the second image processing model are used to implement an image enhancement task.
Referring to fig. 12, fig. 12 is a schematic structural diagram of an image selecting apparatus 1200 according to an embodiment of the present application, and as shown in fig. 12, the apparatus 1200 includes:
an obtaining module 1201, configured to obtain a plurality of candidate images;
acquiring image content differences between each candidate image and other candidate images except the candidate image among the candidate images;
an image selection module 1202, configured to select M target images from the multiple candidate images according to the image content difference corresponding to each candidate image.
In one possible implementation, the image selecting module 1202 is configured to select M target images with image content differences larger than a third threshold from the plurality of candidate images according to the image content difference corresponding to each candidate image.
In one possible implementation, the apparatus further comprises:
the image processing module is used for respectively processing each candidate image through a first image processing model and a second image processing model to obtain a first processed image and a second processed image corresponding to each candidate image;
the acquisition module is used for acquiring the image quality difference between the first processed image and the second processed image corresponding to each candidate image;
the image selection module is used for selecting M target images from the candidate images according to the image quality difference and the image content difference corresponding to each candidate image.
In one possible implementation, the image selection module is configured to determine a selected weight of each candidate image according to the image quality difference and the image content difference corresponding to the candidate image, where the selected weight is positively correlated to the image quality difference and the image content difference;
selecting M target images with the selected weights larger than a second threshold value from the plurality of candidate images according to the selected weight of each candidate image.
In one possible implementation, the selected weight is a sum of a first weight and a second weight, the first weight being a product of the image quality difference and a corresponding weight, the second weight being a product of the image content difference and a corresponding weight.
In one possible implementation, the first image processing model and the second image processing model are used to implement an image enhancement task.
Referring to fig. 13, fig. 13 is a schematic structural diagram of a model testing apparatus 1300 according to an embodiment of the present application, and as shown in fig. 13, the apparatus 1300 includes:
an obtaining module 1301, configured to obtain M target images;
an image processing module 1302, configured to process each target image through a first image processing model and a second image processing model respectively to obtain a first processed image and a second processed image corresponding to each target image; the first image processing model and the second image processing model are used for realizing the same image processing task, the first processed image is obtained by processing the first image processing model, and the second processed image is obtained by processing the second image processing model; and the image quality difference between the first processing image and the second processing image corresponding to each target image is larger than a first threshold value; and the first processing image and the second processing image corresponding to each target image are used for performing performance comparison between the first image processing model and the second image processing model.
The obtaining module 1301 is further configured to obtain a model test result, where the model test result is used to represent a performance comparison result between the first image processing model and the second image processing model, and the model test result is determined according to the first processed image and the second processed image corresponding to each target image.
In one possible implementation, the selected weight of each of the M target images is greater than a second threshold, and the selected weight of each target image is positively correlated to the image quality difference and the image content difference corresponding to each target image, where the image content difference corresponding to each target image is used to represent the image content difference between each target image and the other target images except for the target image itself.
In one possible implementation, the selected weight is a sum of a first weight and a second weight, the first weight being a product of the image quality difference and a corresponding weight, the second weight being a product of the image content difference and a corresponding weight.
In one possible implementation, the first image processing model and the second image processing model are used to implement an image enhancement task.
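As a minimal sketch of the test flow implemented by such an apparatus, under the assumption that the two models and a scalar quality metric are available as callables (the metric choice, the function names, and the result format below are illustrative assumptions, not prescribed by this application):

```python
def model_test(target_images, model_a, model_b, quality_metric):
    """Compare two models implementing the same image processing task on
    the M selected target images and return a simple per-model win count
    as the model test result."""
    wins_a = wins_b = 0
    for image in target_images:
        first_processed = model_a(image)    # first processed image
        second_processed = model_b(image)   # second processed image
        # The quality difference on these images exceeds the first
        # threshold by construction, so each comparison is informative.
        if quality_metric(first_processed) > quality_metric(second_processed):
            wins_a += 1
        else:
            wins_b += 1
    return {"model_a_wins": wins_a, "model_b_wins": wins_b}
```

In practice the quality metric could be a no-reference score or a human rating; the win counts then serve as the performance comparison result between the two models.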
Referring to fig. 14, fig. 14 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present disclosure. The image processing apparatus 1400 may be embodied as a mobile phone, a tablet, a notebook computer, an intelligent wearable device, a server, and the like, which is not limited herein. The image processing apparatus 1400 may perform the image selection method described in the embodiments corresponding to fig. 8 and fig. 10a, and the model test method described in the embodiment corresponding to fig. 10 d. Specifically, the image processing apparatus 1400 includes: a receiver 1401, a transmitter 1402, a processor 1403, and a memory 1404 (where the number of processors 1403 in the image processing apparatus 1400 may be one or more; fig. 14 takes one processor as an example), and the processor 1403 may include an application processor 14031 and a communication processor 14032. In some embodiments of the present application, the receiver 1401, the transmitter 1402, the processor 1403, and the memory 1404 may be connected by a bus or other means.
The memory 1404 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1403. A portion of the memory 1404 may also include non-volatile random access memory (NVRAM). The memory 1404 stores processor-executable operating instructions, executable modules or data structures, or a subset or an expanded set thereof, where the operating instructions may include various operating instructions for performing various operations.
The processor 1403 controls the operation of the execution device. In a particular application, the various components of the execution device are coupled together by a bus system, where the bus system may include a power bus, a control bus, a status signal bus, and the like in addition to a data bus. For clarity of illustration, the various buses are referred to in the figures as a bus system.
The method disclosed in the embodiments of the present application may be applied to the processor 1403, or implemented by the processor 1403. The processor 1403 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits in hardware or by instructions in the form of software in the processor 1403. The processor 1403 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor or a microcontroller, a vision processing unit (VPU), a tensor processing unit (TPU), or another processor suitable for AI computation, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. The processor 1403 may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the methods disclosed in connection with the embodiments of the present application may be directly performed by a hardware decoding processor, or performed by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as a RAM, a flash memory, a ROM, a PROM or an EPROM, or a register. The storage medium is located in the memory 1404, and the processor 1403 reads the information in the memory 1404 and completes the steps of the above method in combination with its hardware.
The receiver 1401 may be configured to receive input numeric or character information and to generate signal inputs related to the settings and function control of the execution device. The transmitter 1402 may be configured to output numeric or character information through a first interface; the transmitter 1402 may also be configured to send instructions to a disk pack through the first interface to modify data in the disk pack; and the transmitter 1402 may also include a display device such as a display screen. The execution device may perform the image selection method described in the embodiments corresponding to fig. 8 and fig. 10a, and the model test method described in the embodiment corresponding to fig. 10 d.
Embodiments of the present application also provide a computer program product which, when run on a computer, causes the computer to perform the steps performed by the aforementioned execution device or the steps performed by the aforementioned training device.
Embodiments of the present application also provide a computer-readable storage medium storing a program for signal processing which, when run on a computer, causes the computer to perform the steps performed by the aforementioned execution device or the steps performed by the aforementioned training device.
The execution device, the training device, or the terminal device provided in the embodiments of the present application may specifically be a chip. The chip includes a processing unit and a communication unit; the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit. The processing unit may execute the computer-executable instructions stored in the storage unit, so that the chip in the execution device performs the data processing method described in the above embodiments, or so that the chip in the training device performs the data processing method described in the above embodiments. Optionally, the storage unit is a storage unit in the chip, such as a register or a cache; the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
It should be noted that the above-described apparatus embodiments are merely illustrative. The units described as separate parts may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiments. In addition, in the drawings of the apparatus embodiments provided in the present application, the connection relationship between modules indicates that they have a communication connection, which may be implemented as one or more communication buses or signal lines.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present application can be implemented by software plus necessary general-purpose hardware, and certainly can also be implemented by dedicated hardware including application-specific integrated circuits, dedicated CPUs, dedicated memories, dedicated components, and the like. Generally, functions performed by a computer program can easily be implemented by corresponding hardware, and the specific hardware structures used to implement the same function may take various forms, such as analog circuits, digital circuits, or dedicated circuits. For the present application, however, a software program implementation is generally preferable. Based on such understanding, the technical solutions of the present application may be embodied substantially in the form of a software product. The computer software product is stored in a readable storage medium, such as a floppy disk, a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc of a computer, and includes several instructions for enabling a computer device (which may be a personal computer, a training device, or a network device) to perform the methods described in the embodiments of the present application.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product.
The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, training device, or data center to another website, computer, training device, or data center in a wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) manner. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device, such as a training device or a data center, that integrates one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), a semiconductor medium (e.g., a solid state disk (SSD)), or the like.

Claims (23)

1. An image selection method, characterized in that the method comprises:
acquiring a plurality of candidate images;
respectively processing each candidate image through a first image processing model and a second image processing model to obtain a first processed image and a second processed image corresponding to each candidate image; wherein the first image processing model and the second image processing model are used to implement the same image processing task; the first processed image is obtained by processing through the first image processing model, and the second processed image is obtained by processing through the second image processing model;
acquiring an image quality difference between a first processed image and a second processed image corresponding to each candidate image, and selecting M target images from the plurality of candidate images according to the image quality difference, wherein the M target images are used as test samples for performance comparison between the first image processing model and the second image processing model.
2. The method of claim 1, wherein selecting M target images from the plurality of candidate images according to the image quality difference comprises:
selecting, from the plurality of candidate images and according to the image quality difference corresponding to each candidate image, M target images whose corresponding image quality differences are greater than a first threshold.
3. The method of claim 1, further comprising:
acquiring an image content difference between each candidate image of the plurality of candidate images and the other candidate images except the candidate image itself;
the selecting M target images from the plurality of candidate images according to the image quality difference comprises:
selecting M target images from the plurality of candidate images according to the image quality difference and the image content difference corresponding to each candidate image.
4. The method of claim 3, wherein selecting M target images from the plurality of candidate images according to the image quality difference and the image content difference corresponding to each candidate image comprises:
determining a selected weight of each candidate image according to the image quality difference and the image content difference corresponding to the candidate image, wherein the selected weight is positively correlated with the image quality difference and the image content difference; and
selecting, from the plurality of candidate images and according to the selected weight of each candidate image, M target images whose selected weights are greater than a second threshold.
5. The method according to claim 3 or 4, wherein the selected weight is the sum of a first weight and a second weight, the first weight being the product of the image quality difference and a corresponding weighting coefficient, and the second weight being the product of the image content difference and a corresponding weighting coefficient.
6. The method of any of claims 1 to 5, wherein the first image processing model and the second image processing model are used to implement an image enhancement task.
7. A method of model testing, the method comprising:
acquiring M target images;
respectively processing each target image through a first image processing model and a second image processing model to obtain a first processed image and a second processed image corresponding to each target image; wherein the first image processing model and the second image processing model are used to implement the same image processing task, the first processed image is obtained by processing through the first image processing model, and the second processed image is obtained by processing through the second image processing model; and the image quality difference between the first processed image and the second processed image corresponding to each target image is greater than a first threshold;
obtaining a model test result, wherein the model test result is used to represent a performance comparison result between the first image processing model and the second image processing model, and the model test result is determined according to the first processed image and the second processed image corresponding to each target image.
8. The method according to claim 7, wherein the selected weight of each of the M target images is greater than a second threshold, the selected weight of each target image being positively correlated with the image quality difference and the image content difference corresponding to that target image, and wherein the image content difference corresponding to each target image represents the image content difference between that target image and the other target images except the target image itself.
9. The method of claim 8, wherein the selected weight is the sum of a first weight and a second weight, the first weight being the product of the image quality difference and a corresponding weighting coefficient, and the second weight being the product of the image content difference and a corresponding weighting coefficient.
10. The method according to any of claims 7 to 9, wherein the first image processing model and the second image processing model are used to implement an image enhancement task.
11. An image selection apparatus, characterized in that the apparatus comprises:
an acquisition module for acquiring a plurality of candidate images;
the image processing module is used for respectively processing each candidate image through a first image processing model and a second image processing model to obtain a first processed image and a second processed image corresponding to each candidate image; wherein the first image processing model and the second image processing model are used to implement the same image processing task; the first processed image is obtained by processing through the first image processing model, and the second processed image is obtained by processing through the second image processing model;
and the image selection module is used for acquiring the image quality difference between the first processed image and the second processed image corresponding to each candidate image, and selecting M target images from the candidate images according to the image quality difference, wherein the M target images are used as test samples of the first image processing model and the second image processing model during performance comparison.
12. The apparatus of claim 11, wherein the image selection module is configured to select, from the plurality of candidate images and according to the image quality difference corresponding to each candidate image, M target images whose corresponding image quality differences are greater than a first threshold.
13. The apparatus according to claim 11, wherein the obtaining module is configured to acquire an image content difference between each candidate image of the plurality of candidate images and the other candidate images except the candidate image itself;
the image selection module is used for selecting M target images from the candidate images according to the image quality difference and the image content difference corresponding to each candidate image.
14. The apparatus of claim 13, wherein the image selection module is configured to determine a selected weight of each candidate image according to the image quality difference and the image content difference corresponding to the candidate image, the selected weight being positively correlated with the image quality difference and the image content difference; and
to select, from the plurality of candidate images and according to the selected weight of each candidate image, M target images whose selected weights are greater than a second threshold.
15. The apparatus according to claim 13 or 14, wherein the selected weight is the sum of a first weight and a second weight, the first weight being the product of the image quality difference and a corresponding weighting coefficient, and the second weight being the product of the image content difference and a corresponding weighting coefficient.
16. The apparatus of any of claims 11 to 15, wherein the first image processing model and the second image processing model are used to implement an image enhancement task.
17. A model testing apparatus, characterized in that the apparatus comprises:
the acquisition module is used for acquiring M target images;
the image processing module is configured to respectively process each target image through a first image processing model and a second image processing model to obtain a first processed image and a second processed image corresponding to each target image; the first image processing model and the second image processing model are used to implement the same image processing task, the first processed image is obtained by processing through the first image processing model, and the second processed image is obtained by processing through the second image processing model; and the image quality difference between the first processed image and the second processed image corresponding to each target image is greater than a first threshold;
the obtaining module is further configured to obtain a model test result, where the model test result is used to represent a performance comparison result between the first image processing model and the second image processing model, and the model test result is determined according to the first processed image and the second processed image corresponding to each target image.
18. The apparatus of claim 17, wherein the selected weight of each of the M target images is greater than a second threshold, the selected weight of each target image being positively correlated with the image quality difference and the image content difference corresponding to that target image, and wherein the image content difference corresponding to each target image represents the image content difference between that target image and the other target images except the target image itself.
19. The apparatus of claim 18, wherein the selected weight is the sum of a first weight and a second weight, the first weight being the product of the image quality difference and a corresponding weighting coefficient, and the second weight being the product of the image content difference and a corresponding weighting coefficient.
20. The apparatus of any of claims 17 to 19, wherein the first image processing model and the second image processing model are used to implement an image enhancement task.
21. An image selection apparatus, characterized in that the apparatus comprises a memory and a processor, wherein the memory stores code, and the processor is configured to invoke the code and perform the method of any one of claims 1 to 10.
22. A computer storage medium, characterized in that the computer storage medium stores one or more instructions that, when executed by one or more computers, cause the one or more computers to implement the method of any of claims 1 to 10.
23. A computer program product comprising code which, when executed, implements the method of any one of claims 1 to 10.
CN202110332938.3A 2021-03-29 2021-03-29 Image selection method and device Pending CN113256556A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110332938.3A CN113256556A (en) 2021-03-29 2021-03-29 Image selection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110332938.3A CN113256556A (en) 2021-03-29 2021-03-29 Image selection method and device

Publications (1)

Publication Number Publication Date
CN113256556A true CN113256556A (en) 2021-08-13

Family

ID=77181221

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110332938.3A Pending CN113256556A (en) 2021-03-29 2021-03-29 Image selection method and device

Country Status (1)

Country Link
CN (1) CN113256556A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114663963A (en) * 2022-05-24 2022-06-24 阿里巴巴达摩院(杭州)科技有限公司 Image processing method, image processing device, storage medium and electronic equipment
CN114663963B (en) * 2022-05-24 2022-09-27 阿里巴巴达摩院(杭州)科技有限公司 Image processing method, image processing device, storage medium and electronic equipment

Similar Documents

Publication Publication Date Title
CN110378381B (en) Object detection method, device and computer storage medium
WO2021043168A1 (en) Person re-identification network training method and person re-identification method and apparatus
CN110532871B (en) Image processing method and device
WO2021043112A1 (en) Image classification method and apparatus
CN112308200B (en) Searching method and device for neural network
CN110222717B (en) Image processing method and device
CN111402130B (en) Data processing method and data processing device
CN111291809B (en) Processing device, method and storage medium
CN111914997B (en) Method for training neural network, image processing method and device
CN112236779A (en) Image processing method and image processing device based on convolutional neural network
CN112446476A (en) Neural network model compression method, device, storage medium and chip
WO2022134971A1 (en) Noise reduction model training method and related apparatus
CN113066017B (en) Image enhancement method, model training method and equipment
CN111401517B (en) Method and device for searching perceived network structure
CN110222718B (en) Image processing method and device
CN113705769A (en) Neural network training method and device
CN113191489B (en) Training method of binary neural network model, image processing method and device
CN113011562A (en) Model training method and device
CN113065645B (en) Twin attention network, image processing method and device
US20220157046A1 (en) Image Classification Method And Apparatus
CN111797882A (en) Image classification method and device
CN111695673A (en) Method for training neural network predictor, image processing method and device
WO2022111387A1 (en) Data processing method and related apparatus
CN112464930A (en) Target detection network construction method, target detection method, device and storage medium
CN114359289A (en) Image processing method and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination