WO2023272431A1 - Image processing method and apparatus - Google Patents

Image processing method and apparatus

Info

Publication number
WO2023272431A1
WO2023272431A1 · PCT/CN2021/102739
Authority
WO
WIPO (PCT)
Prior art keywords
image processing
module
task model
image
visual task
Prior art date
Application number
PCT/CN2021/102739
Other languages
French (fr)
Chinese (zh)
Inventor
伍玮翔
伍文龙
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to PCT/CN2021/102739 priority Critical patent/WO2023272431A1/en
Priority to CN202180099442.4A priority patent/CN117529725A/en
Publication of WO2023272431A1 publication Critical patent/WO2023272431A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis

Definitions

  • the present application relates to the field of computer vision, and more specifically, to an image processing method and device.
  • Computer vision is an integral part of various intelligent/autonomous systems in application fields such as manufacturing, inspection, document analysis, medical diagnosis, and military. It studies how to obtain the data and information of the photographed subject that we need. Figuratively speaking, it means installing eyes (cameras/video cameras) and a brain (algorithms) on a computer so that the computer, in place of human eyes, can identify, track, and measure targets and thereby perceive the environment.
  • Computer vision can be seen as the science of how to make artificial systems "perceive" from images or multidimensional data. In general, computer vision uses various imaging systems in place of the visual organs to obtain input information, and then uses the computer in place of the brain to complete the processing and interpretation of this input information.
  • Computer vision tasks include tasks such as image classification, object detection, object tracking, and object segmentation.
  • A raw image acquired by a sensor typically undergoes a series of image signal processing (ISP) operations to produce a visualized image; this visualized image can be used as an input image for computer vision tasks.
  • the purpose of ISP is usually to meet human visual needs.
  • the image obtained after a series of image signal processing can meet human visual needs, but performing visual tasks based on the image may not necessarily obtain ideal processing results.
  • the present application provides an image processing method and device, which can obtain an image processing flow suitable for a visual task and improve the performance of a visual task model.
  • an image processing method comprising: acquiring a first image; processing the first image through at least one image processing module to obtain a second image; inputting the second image into a visual task model for processing; and adjusting the at least one image processing module according to the processing results of the visual task model.
  • the image processing flow is adjusted according to the processing result of the visual task model, which is beneficial to obtain an image suitable for the visual task, so as to ensure the performance of the visual task model.
  • the solutions of the embodiments of the present application can adjust the image processing flow according to the requirements of different application scenarios, so as to adapt to different application scenarios.
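  • For illustration only, a minimal sketch of this adjustment loop is given below; the `modules`, `vision_model`, and `adjust` interfaces are hypothetical placeholders, not the claimed implementation.

```python
# Hedged sketch of the first-aspect flow: ISP processing -> vision task inference -> adjustment.
def adjustment_step(first_image, modules, vision_model, adjust):
    second_image = first_image
    for module in modules:                   # at least one image processing module
        second_image = module(second_image)  # e.g. black level compensation, demosaic, ...
    result = vision_model(second_image)      # processing result, e.g. loss value or accuracy
    adjust(modules, result)                  # adjust weights/parameters/combination/order
    return result
```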
  • the first image may be a raw image acquired by a sensor.
  • the image processing module is used for image signal processing on the input image.
  • the second image may be an RGB image.
  • processing the first image through at least one image processing module to obtain the second image includes: processing the first image through the at least one image processing module and the weight of the at least one image processing module to obtain the second image.
  • the processing result of the at least one image processing module is adjusted according to the weight of the at least one image processing module to obtain the second image.
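  • As a hedged illustration of this weight-based processing (assuming, as described in the detailed embodiments, that the weight scales the pixel-value change introduced by a module):

```python
import numpy as np

def apply_weighted_module(image: np.ndarray, module, weight: float) -> np.ndarray:
    delta = module(image) - image   # pixel-value change produced by the image processing module
    return image + weight * delta   # a weight of 0 means the module does not participate
```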
  • the vision task includes: target detection, image classification, target segmentation, target tracking, or image recognition.
  • the visual task model is used to perform visual tasks. For example, if the vision task is target detection, then the vision task model is the target detection model. For another example, if the visual task is image recognition, then the visual task model is an image recognition model.
  • the vision task model can be a trained model.
  • the processing results of the vision task model may include performance indicators of the vision task model.
  • the performance index of the vision task model includes the accuracy of reasoning or the value of the loss function.
  • the loss function can be set as needed.
  • the loss function is used to indicate the difference between the inference result of the vision task model and the true value corresponding to the first image. It should be noted that the loss function here may be the loss function in the training process of the vision task model, or other forms of loss functions may also be used.
  • the processing result of the vision task model may include detection accuracy.
  • the processing result of the vision task model may include segmentation accuracy.
  • the visual task model may use a neural network model, or may also use a non-neural network model.
  • the at least one image processing module is adjusted according to the processing result of the visual task model, so that the processing result of the visual task model is as close to expectation as possible.
  • the at least one image adjustment module may be adjusted by means of a Bayesian optimization method, an RNN model, or a reinforcement learning algorithm.
  • adjusting at least one image processing module according to the processing result of the visual task model includes: adjusting the at least one image processing module according to the time of image processing and the processing result of the visual task model.
  • the image processing time may be the processing time of the visual task model, or the processing time of the at least one image processing module, or the sum of the processing time of the visual task model and the processing time of the at least one image processing module.
  • the processing speed can be improved and the time delay can be reduced under the premise of ensuring the performance of the visual task model.
  • At least one image processing module includes multiple image processing modules, and adjusting the at least one image processing module according to the processing results of the visual task model includes: changing the at least one image processing module.
  • Changing the at least one image processing module may include: deleting some image processing modules in the at least one image processing module or/and adding other image processing modules.
  • the combination of image processing modules is changed according to the processing results of the visual task model, so that a combination of image processing modules more suitable for the visual task model can be obtained, which is conducive to improving the performance of the visual task model.
  • At least one image processing module includes multiple image processing modules, and adjusting the at least one image processing module according to the processing results of the visual task model includes: deleting some image processing modules among the plurality of image processing modules according to the processing results of the visual task model.
  • some image processing modules are deleted according to the processing results of the visual task model, which can reduce the time required for image processing, increase the processing speed, and reduce the requirement for computing power.
  • deleting some of the image processing modules among the multiple image processing modules according to the processing results of the visual task model includes: adjusting the weights of the multiple image processing modules according to the processing results of the visual task model, where the weights of the multiple image processing modules are used to process the processing results of the multiple image processing modules to obtain the second image; and deleting some of the multiple image processing modules according to the adjusted weights of the multiple image processing modules.
  • the image processing modules to be deleted are determined according to the weight of each image processing module; deleting the image processing modules with relatively small weight values has little impact on the processing result of the visual task model, so the performance of the visual task model is only slightly affected after deletion. That is to say, the solutions of the embodiments of the present application can reduce unnecessary operations, reduce computing overhead, and improve processing speed on the premise of ensuring the performance of the visual task model.
  • the multiple image processing modules are m image processing modules.
  • m is an integer greater than 1.
  • the n image processing modules with the smallest adjusted weights are deleted from the m image processing modules.
  • n is an integer greater than 1 and less than m.
  • an image processing module whose adjusted weight is less than or equal to a weight threshold is deleted from the m image processing modules.
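  • A hedged sketch of both deletion rules described above (the n smallest weights, or weights at or below a threshold); the parallel module/weight lists are an assumed data layout for illustration.

```python
def prune_smallest_n(modules, weights, n):
    keep = sorted(range(len(modules)), key=lambda i: weights[i])[n:]  # drop the n smallest weights
    keep.sort()                                                       # preserve the original order
    return [modules[i] for i in keep], [weights[i] for i in keep]

def prune_below_threshold(modules, weights, weight_threshold):
    kept = [(m, w) for m, w in zip(modules, weights) if w > weight_threshold]
    return [m for m, _ in kept], [w for _, w in kept]
```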
  • adjusting at least one image processing module according to a processing result of the visual task model includes: adjusting parameters in at least one image processing module according to a processing result of the visual task model.
  • adjusting at least one image processing module according to the processing result of the visual task model includes: deleting part of the image processing modules from the multiple image processing modules according to the processing result of the visual task model; processing a fifth image through the undeleted image processing modules among the plurality of image processing modules to obtain a sixth image, and inputting the sixth image into the visual task model for processing; and adjusting the parameters of the undeleted image processing modules according to the processing result of the visual task model.
  • the performance indicators obtained from the visual task model are used to adjust the weights of the multiple image processing modules, so as to keep the image processing modules that have a relatively large impact on the performance indicators of the visual task model, or in other words, the image processing modules that maintain or improve the performance indicators of the vision task model.
  • In this way, the image processing modules suitable for the visual task model, that is, the image processing modules required by the visual task model, can be obtained, which reduces the time required for image processing, saves computing overhead, reduces the demand for computing power, and is more hardware-friendly.
  • using the performance indicators obtained from the visual task model to adjust the parameters in the retained image processing modules, for example, using these performance indicators to search the design space of the image processing modules, is conducive to obtaining the optimal parameter configuration of each image processing module and improving the performance of the vision task model, as sketched below.
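  • A hedged two-stage sketch of this prune-then-tune idea; `evaluate` is a hypothetical helper that runs the retained modules plus the visual task model and returns a performance indicator.

```python
def prune_then_tune(modules, weights, weight_threshold, parameter_candidates, evaluate):
    kept = [m for m, w in zip(modules, weights) if w > weight_threshold]  # delete low-weight modules
    best_params, best_score = None, float("-inf")
    for params in parameter_candidates:       # e.g. a grid over the design space of each module
        score = evaluate(kept, params)        # performance indicator of the vision task model
        if score > best_score:
            best_params, best_score = params, score
    return kept, best_params
```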
  • adjusting the at least one image processing module according to the processing result of the visual task model includes: adjusting the processing sequence of the at least one image processing module according to the processing result of the visual task model.
  • the processing sequence of the image processing module is adjusted according to the processing result of the visual task model, so that an image processing flow more suitable for the visual task can be obtained, which is conducive to improving the accuracy of the visual task.
  • At least one image processing module includes: a black level compensation module, a green balance module, a bad pixel correction module, a demosaic module, a Bayer noise reduction module, an automatic white balance module, color correction module, gamma correction module or noise reduction and sharpening module.
  • Any image processing module in the at least one image processing module may be implemented by a neural network algorithm, or may also be implemented by a non-neural network algorithm.
  • an image processing method comprising: acquiring a third image; determining at least one target image processing module according to a visual task model; processing the third image through the at least one target image processing module to obtain a fourth image; and processing the fourth image through the visual task model to obtain a processing result of the fourth image.
  • different visual task models correspond to different image processing module configurations.
  • the image processing modules can adaptively match the visual task model, making the image processing flow more suitable for the visual task model, which is beneficial for improving the performance of the vision task model.
  • the third image may be a raw image acquired by the sensor.
  • the processing result of the fourth image can also be understood as the processing result of the third image.
  • the processing result of the fourth image is the reasoning result of the visual task model.
  • the at least one target image processing module is one or more image processing modules corresponding to the visual task model.
  • the vision task includes: target detection, image classification, target segmentation, target tracking, or image recognition.
  • the visual task model is used to perform visual tasks. For example, if the vision task is target detection, then the vision task model is the target detection model. For another example, if the visual task is image recognition, then the visual task model is an image recognition model.
  • the vision task model can be a trained model.
  • different visual task models may be used, and accordingly, at least one target image processing module matching the visual task model may be determined according to different visual task models. In this way, different image processing modules can be selected according to different application scenarios.
  • the configuration of the image processing module matching the current visual task model can be determined.
  • the configuration of the image processing modules includes at least one of the following: a combination of image processing modules, a weight of the image processing modules, a processing order of the image processing modules, or parameters in the image processing modules.
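  • For illustration, a per-model configuration lookup might look as follows; the model names, module names, and values are assumptions for demonstration, not defined by the embodiments.

```python
# Hypothetical mapping from vision task model to an image processing module configuration.
ISP_CONFIGS = {
    "object_detection_model": {
        "modules": ["black_level_compensation", "demosaic", "auto_white_balance"],
        "weights": [1.0, 0.8, 0.6],
        "order":   ["black_level_compensation", "demosaic", "auto_white_balance"],
        "params":  {"demosaic": {"method": "bilinear"}},
    },
    "image_classification_model": {
        "modules": ["demosaic", "gamma_correction"],
        "weights": [1.0, 0.5],
        "order":   ["demosaic", "gamma_correction"],
        "params":  {"gamma_correction": {"gamma": 2.2}},
    },
}

def select_target_modules(vision_task_model_name: str) -> dict:
    return ISP_CONFIGS[vision_task_model_name]
```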
  • determining at least one target image processing module according to the visual task model includes: determining at least one target image processing module from multiple candidate image processing modules according to the visual task model.
  • different visual task models correspond to different combinations of image processing modules.
  • the combination of image processing modules can adaptively match the visual task model, so that the current combination of image processing modules is more suitable for the current visual task model, which is beneficial for improving the performance of the visual task model.
  • the combination of image processing modules corresponding to the current visual task model can be determined, or in other words, the image processing module required for the visual task model can be determined according to the corresponding relationship, that is, the at least one target image processing module .
  • determining at least one target image processing module according to the visual task model includes: determining the weight of at least one target image processing module according to the visual task model, and at least one target image processing module The weights of are used to process the processing result of at least one target image processing module to obtain a fourth image.
  • different visual task models correspond to the weights of different image processing modules.
  • the weights of the image processing modules can adaptively match the visual task model, so that the current weights of the image processing modules are more suitable for the current visual task model, which is beneficial for improving the performance of the visual task model.
  • determining at least one target image processing module according to the visual task model includes: determining parameters in the at least one target image processing module according to the visual task model.
  • different visual task models correspond to different parameters in the image processing module.
  • the parameters in the image processing modules can adaptively match the visual task model, so that the current parameters in the modules are more suitable for the current vision task model, which is beneficial for improving the performance of the vision task model.
  • parameters in the image processing module corresponding to the visual task model can be determined, that is, parameters in the at least one target image processing module.
  • determining at least one target image processing module according to the visual task model includes: determining a processing order of the at least one target image processing module according to the visual task model.
  • different visual task models correspond to different processing sequences of image processing modules.
  • the processing sequence of the image processing modules can adaptively match the visual task model, so that the current processing order of the modules is more suitable for the current vision task model, which is beneficial for improving the performance of the vision task model.
  • the processing order of the image processing modules corresponding to the current visual task model can be determined, that is, the processing order of the at least one target image processing module.
  • At least one target image processing module includes: a black level compensation module, a green balance module, a dead pixel correction module, a demosaic module, a Bayer noise reduction module, an automatic white balance module, a color correction module, a gamma correction module, or a noise reduction and sharpening module.
  • an image processing apparatus includes a module or unit for executing the method in any one of the above-mentioned first aspect and the first aspect.
  • an image processing device includes a module or unit for executing the method in any one of the above-mentioned second aspect and the second aspect.
  • an image processing device comprising: a memory for storing a program; and a processor for executing the program stored in the memory, where when the program stored in the memory is executed, the processor is used to execute the method in the first aspect and any one of the implementation manners of the first aspect.
  • the processor in the above fifth aspect can be either a central processing unit (CPU), or a combination of a CPU and a neural network computing processor, where the neural network computing processor can include a graphics processing unit (GPU), a neural-network processing unit (NPU), a tensor processing unit (TPU), etc.
  • TPU is an artificial intelligence accelerator ASIC fully customized by Google for machine learning.
  • an image processing device comprising: a memory for storing programs; and a processor for executing the programs stored in the memory, where when the programs stored in the memory are executed, the processor is configured to execute the method in the second aspect and any one of the implementation manners of the second aspect.
  • the processor in the above sixth aspect can be a central processing unit, or a combination of a CPU and a neural network computing processor, where the neural network computing processor can include a graphics processor, a neural network processor, a tensor processor, etc.
  • TPU is Google's fully customized artificial intelligence accelerator ASIC for machine learning.
  • a computer-readable storage medium, where the computer-readable medium stores program code for execution by a device, and the program code includes instructions for executing the method in any one of the implementation manners of the first aspect or the second aspect.
  • a computer program product containing instructions is provided, and when the computer program product is run on a computer, the computer is made to execute the method in any one of the above-mentioned first aspect or the second aspect.
  • the chip includes a processor and a data interface, and the processor reads the instructions stored in a memory through the data interface and executes the method in any one of the implementation manners of the above-mentioned first aspect or second aspect.
  • the chip may further include a memory, the memory stores instructions, the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to execute the method in any one of the implementation manners of the first aspect or the second aspect.
  • the aforementioned chip may specifically be a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
  • FIG. 1 is a schematic structural diagram of a system architecture provided in an embodiment of the present application.
  • FIG. 2 is a schematic diagram of an image processing flow provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of an image processing method provided in an embodiment of the present application.
  • FIG. 4 is a schematic diagram of another image processing flow provided by the embodiment of the present application.
  • FIG. 5 is a schematic diagram of another image processing flow provided by the embodiment of the present application.
  • FIG. 6 is a schematic flowchart of another image processing method provided by the embodiment of the present application.
  • FIG. 7 is a schematic block diagram of an image processing device provided by an embodiment of the present application.
  • FIG. 8 is a schematic block diagram of another image processing apparatus provided by an embodiment of the present application.
  • the embodiments of the present application can be applied in fields such as automatic driving, image classification, image retrieval, image semantic segmentation, image quality enhancement, image super-resolution, monitoring, object tracking, object detection, etc. that need to perform visual tasks.
  • the method in the embodiment of the present application can be applied in picture classification and monitoring scenarios, and the following two application scenarios are briefly introduced respectively.
  • When a user stores a large number of pictures on a terminal device (for example, a mobile phone) or a cloud disk, identifying the images in the album makes it convenient for the user or the system to classify and manage the album, thereby improving user experience.
  • an image suitable for performing a classification task can be obtained, and the accuracy of classification can be improved.
  • it can reduce the image processing process, reduce hardware overhead, be more friendly to terminal equipment, increase the speed of classifying pictures, and help to label pictures of different categories in real time, which is convenient for users to view and find.
  • the classification tags of these pictures can also be provided to the album management system for classification management, which saves management time for users, improves the efficiency of album management, and improves user experience.
  • Surveillance scenarios include: smart city, field surveillance, indoor surveillance, outdoor surveillance, in-vehicle surveillance, etc.
  • multiple attribute recognition is required, such as pedestrian attribute recognition and riding attribute recognition.
  • Deep neural networks play an important role in multiple attribute recognition by virtue of their powerful capabilities.
  • an image suitable for performing an attribute recognition task can be obtained, and the accuracy of recognition can be improved.
  • the image processing flow can be reduced, the hardware overhead can be reduced, and the processing efficiency can be improved, which is conducive to real-time processing of the input road picture and faster recognition of different attribute information in the road picture.
  • a neural network can be composed of neural units, and a neural unit can refer to an operation unit that takes x_s and an intercept 1 as input; the output of the operation unit can be h_{W,b}(x) = f(W^T x) = f(Σ_{s=1}^{n} W_s·x_s + b), where s = 1, 2, ..., n, n is a natural number greater than 1, W_s is the weight of x_s, b is the bias of the neural unit, and f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to transform the input signal in the neural unit into an output signal.
  • the output signal of this activation function can be used as the input of the next layer.
  • the activation function can be a ReLU, tanh or sigmoid function.
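  • A small worked example of the single-neuron operation above (illustrative only; the default ReLU activation is one of the options listed):

```python
import numpy as np

def neuron_output(x: np.ndarray, W: np.ndarray, b: float, f=lambda z: np.maximum(z, 0.0)):
    return f(np.dot(W, x) + b)   # f(sum_s W_s * x_s + b), default f is ReLU

# e.g. neuron_output(np.array([1.0, 2.0]), np.array([0.5, -0.25]), b=0.1) == 0.1
```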
  • a neural network is a network formed by connecting multiple above-mentioned single neural units, that is, the output of one neural unit can be the input of another neural unit.
  • the input of each neural unit can be connected with the local receptive field of the previous layer to extract the features of the local receptive field.
  • the local receptive field can be an area composed of several neural units.
  • A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with multiple hidden layers.
  • According to the positions of different layers, the layers inside a DNN can be divided into three categories: input layer, hidden layers, and output layer.
  • the first layer is the input layer
  • the last layer is the output layer
  • the layers in the middle are all hidden layers.
  • the layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the i+1-th layer.
  • Although a DNN looks complicated, the work of each layer is actually not complicated.
  • In simple terms, each layer performs the following linear relationship expression: y = α(Wx + b), where x is the input vector, y is the output vector, b is the offset vector, W is the weight matrix (also called coefficients), and α() is the activation function.
  • Each layer simply performs this operation on an input vector x to obtain an output vector y; because a DNN has many layers, the number of coefficient matrices W and offset vectors b is also large.
  • These parameters are defined in the DNN as follows, taking the coefficient W as an example: in a three-layer DNN, the linear coefficient from the fourth neuron of the second layer to the second neuron of the third layer is defined as W^3_{24}, where the superscript 3 represents the layer number of the coefficient W, and the subscripts correspond to the output third-layer index 2 and the input second-layer index 4.
  • In summary, the coefficient from the kth neuron of layer L-1 to the jth neuron of layer L is defined as W^L_{jk}.
  • the input layer has no W parameter.
  • more hidden layers make the network more capable of describing complex situations in the real world. Theoretically speaking, a model with more parameters has a higher complexity and a greater "capacity", which means that it can complete more complex learning tasks.
  • Training the deep neural network is the process of learning the weight matrix, and its ultimate goal is to obtain the weight matrix of all layers of the trained deep neural network (the weight matrix formed by the vector W of many layers).
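  • A hedged sketch of stacking the per-layer operation y = α(Wx + b) described above (tanh is used here only as an example activation):

```python
import numpy as np

def dnn_forward(x: np.ndarray, layers, alpha=np.tanh) -> np.ndarray:
    # layers: list of (W, b) pairs, one per layer after the input layer
    y = x
    for W, b in layers:
        y = alpha(W @ y + b)   # each layer: weight matrix, offset vector, activation
    return y
```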
  • Convolutional neural network is a deep neural network with a convolutional structure.
  • the convolutional neural network contains a feature extractor composed of a convolutional layer and a subsampling layer, which can be regarded as a filter.
  • the convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network.
  • In the convolutional layer of a convolutional neural network, a neuron can be connected to only some of the neurons in adjacent layers.
  • a convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units. Neural units of the same feature plane share weights, and the shared weights here are convolution kernels.
  • Shared weights can be understood as a way to extract image information that is independent of location.
  • the convolution kernel can be formalized as a matrix of random size, and the convolution kernel can obtain reasonable weights through learning during the training process of the convolutional neural network.
  • the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
  • Recurrent neural networks are used to process sequence data.
  • In a traditional neural network model, the layers from the input layer through the hidden layers to the output layer are fully connected, while the nodes within each layer are unconnected.
  • Although this ordinary neural network solves many problems, it is still powerless for many other problems. For example, to predict the next word in a sentence, the previous words are generally needed, because the preceding and following words in a sentence are not independent. An RNN is called a recurrent neural network because the current output of a sequence is also related to the previous outputs.
  • RNN can process sequence data of any length.
  • the training of RNN is the same as that of traditional CNN or DNN.
  • the error backpropagation algorithm is also used, but there is a difference: if the RNN is unfolded, the parameters, such as W, are shared across steps, which is not the case in the traditional neural network described above.
  • the output of each step depends not only on the network of the current step, but also on the states of the previous several steps of the network. This learning algorithm is called back propagation through time (BPTT).
  • The loss function, also referred to as the objective function, is used to measure the difference between the predicted value of the deep neural network and the desired target value.
  • the training of the deep neural network becomes a process of reducing the loss as much as possible.
  • the smaller the loss, the higher the training quality of the deep neural network; the larger the loss, the lower the training quality of the deep neural network.
  • the smaller the loss fluctuation, the more stable the training; the larger the loss fluctuation, the more unstable the training.
  • the embodiment of the present application provides a system architecture 100 .
  • the data collection device 170 is used to collect training data.
  • the training data may include training images and ground truths corresponding to the training images.
  • the vision task is an image classification task
  • the ground truth value corresponding to the training image may be the classification result corresponding to the training image
  • the classification result of the training image may be the result of manual pre-labeling.
  • After collecting the training data, the data collection device 170 stores the training data in the database 130, and the training device 120 obtains the target model/rule 101 based on the training data maintained in the database 130.
  • the target model/rule 101 is the model used by the vision task. For example, if the vision task is an image classification task, the target model/rule 101 may be a network model for image classification.
  • the training device 120 obtains the target model/rule 101 based on the training data.
  • the training device 120 processes the input raw data and compares the output value with the target value until the difference between the value output by the training device 120 and the target value is less than a certain threshold, thereby completing the training of the target model/rule 101.
  • the target model/rule 101 in the embodiment of the present application may specifically be a neural network model.
  • For example, the neural network model may be a convolutional neural network or a residual network.
  • the training data maintained in the database 130 may not all be collected by the data collection device 170, but may also be received from other devices.
  • the training device 120 does not necessarily perform the training of the target model/rule 101 entirely based on the training data maintained by the database 130; it is also possible to obtain training data from the cloud or other places for model training. The above description should not be construed as a limitation on the embodiments of the present application.
  • the target model/rule 101 trained by the training device 120 can be applied to different systems or devices, such as the execution device 110 shown in FIG. 1, which can be a terminal such as a laptop, an augmented reality (AR)/virtual reality (VR) device, or a vehicle terminal, and can also be a server or a cloud, etc.
  • the execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices; the user can input data to the I/O interface 112 through the client device 140, and the input data may include: data to be processed input by the client device.
  • the input data may include a raw image in this embodiment of the application.
  • the preprocessing module 113 is used to perform preprocessing according to the input image received by the I/O interface 112. In the embodiment of the present application, the preprocessing module 113 may be used to perform a series of image signal processing on the input image.
  • the preprocessing module 113 may include one or more image processing modules.
  • When the execution device 110 preprocesses the input data, or when the calculation module 111 of the execution device 110 performs calculation and other related processing, the execution device 110 can call the data, codes, etc. in the data storage system 150 for corresponding processing, and the correspondingly processed data and instructions may also be stored in the data storage system 150.
  • the I/O interface 112 returns the processing result, such as the processing result of the data obtained above, to the client device 140, thereby providing it to the user.
  • the training device 120 can generate corresponding target models/rules 101 based on different training data for different goals or different tasks, and the corresponding target models/rules 101 can be used to achieve the above-mentioned goals or complete the above-mentioned task to provide the user with the desired result.
  • the user can manually specify the input data, and the manual specification can be operated through the interface provided by the I/O interface 112 .
  • the client device 140 can automatically send the input data to the I/O interface 112 . If the client device 140 is required to automatically send the input data to obtain the user's authorization, the user can set the corresponding authority in the client device 140 .
  • the user can view the results output by the execution device 110 on the client device 140, and the specific presentation form may be specific ways such as display, sound, and action.
  • the client device 140 can also be used as a data collection terminal, collecting the input data input to the I/O interface 112 and the output results of the I/O interface 112, as shown in the figure, as new sample data and storing them in the database 130.
  • Certainly, the client device 140 may not be used for collection; instead, the I/O interface 112 may directly store the input data input to the I/O interface 112 and the output results of the I/O interface 112, as shown in the figure, as new sample data in the database 130.
  • FIG. 1 is only a schematic diagram of a system architecture provided by the embodiment of the present application, and the positional relationship among the devices, components, modules, etc. shown in the figure does not constitute any limitation.
  • the data storage system 150 is an external memory relative to the execution device 110; in other cases, the data storage system 150 may also be placed in the execution device 110.
  • the target model/rule 101 is obtained according to the training device 120.
  • the target model/rule 101 in the embodiment of the present application may be the neural network model in the present application; specifically, the neural network model in the embodiment of the present application can be a CNN or a residual network, etc.
  • the image signal processor outputs a visualized image after a series of processing on the raw image acquired by the sensor. These images can be used as input images for vision tasks. Specifically, a neural network algorithm or a non-neural network algorithm may be used to process an input image in a visual task to obtain relevant results of the visual task.
  • FIG. 2 shows a schematic diagram of an overall processing flow of a vision task.
  • the raw image is used as an input image, a series of image signal processing is performed on the input image, and an 8-bit visualized red-green-blue (RGB) image is output.
  • the RGB image is used as the input image of the vision task, and the processing result of the vision task is obtained.
  • the image signal processing modules include a black level compensation module, a green balance module, a bad pixel correction module, a demosaic module, a Bayer denoise module, an auto white balance module, a color correction module, a gamma correction module, a denoise and sharpen module, etc.
  • the image signal processing module can adopt non-neural network algorithm or neural network algorithm.
  • the input images of vision tasks are usually RGB images after image signal processing.
  • the purpose of traditional image signal processing is usually to meet human visual needs, and the results obtained by performing visual tasks based on the images are not necessarily optimal.
  • the embodiment of the present application provides an image processing method, which adjusts the image processing flow before the vision task according to the processing result of the vision task, so as to obtain an image processing flow that meets requirements.
  • FIG. 3 shows an image processing method 300 provided by an embodiment of the present application.
  • the method shown in Figure 3 can be executed by a computing device, which can be a cloud service device or a terminal device, such as a computer, server, mobile phone, camera, vehicle, drone or robot, or a system composed of cloud service devices and terminal devices.
  • the method 300 may be executed by a training device or an inference device, for example, the method 300 may be executed by an accelerator such as a CPU, a GPU, or an NPU.
  • the accelerator chip may be located on an FPGA, a chip emulator (Emulator) or a development board (evaluation board, EVB).
  • the method 300 may be executed by a tuning tool or a calibration tool of an ISP pipeline of a hardware device (e.g., a camera or a video camera).
  • the method 300 includes step S301 to step S304. Step S301 to step S304 will be described in detail below.
  • the first image may be a raw image acquired by a sensor.
  • the training data set includes multiple images, and the first image is any image in the training data set.
  • the method 300 may be executed multiple times based on multiple images in the training data set until the required image processing modules are obtained.
  • the training data set may use an open source data set.
  • the training data set can also be a self-made data set.
  • the training data set may be pre-stored.
  • the training data set may be the training data maintained in the database 130 shown in FIG. 1 .
  • the training dataset can also be user-input data.
  • S302. Process the first image by at least one image processing module to obtain a second image.
  • the image processing module is used for image signal processing on the input image.
  • the at least one image processing module may be located on an image signal processor. That is to say, step S302 is executed by the image processing modules in the image signal processor.
  • Any image processing module in the at least one image processing module may be implemented by a neural network algorithm, or may also be implemented by a non-neural network algorithm.
  • the embodiment of the present application does not limit the specific implementation manner of the image processing module.
  • the at least one image processing module may include: a black level compensation module, a green balance module, a dead pixel correction module, a demosaic module, a Bayer noise reduction module, an automatic white balance module, a color correction module, a gamma correction module, or a noise reduction and sharpening module.
  • the raw image is used as the first image
  • the at least one image processing module includes 9 image processing modules, which are respectively a black level compensation module, a green balance module, a bad pixel correction module, a demosaic module, Bayer noise reduction module, automatic white balance module, color correction module, gamma correction module, and noise reduction and sharpening module.
  • the nine image processing modules sequentially perform black level compensation, green balance processing, dead pixel correction, demosaicing, Bayer noise reduction, automatic white balance processing, color correction, gamma correction, and noise reduction and sharpening.
  • the black level compensation module, the green balance module, and the bad pixel correction module can be used to process the raw data.
  • a demosaic module and a Bayer denoising module may be used to perform demosaic processing.
  • An automatic white balance module, a color correction module, a gamma correction module, and a noise reduction and sharpening module can be used to perform image enhancement processing.
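  • For illustration, the nine-stage order described above can be written as a simple sequential pipeline; each name stands in for an image processing module (which, as noted, may be a neural-network or non-neural-network algorithm).

```python
ISP_PIPELINE = [
    "black_level_compensation", "green_balance", "bad_pixel_correction",   # raw processing
    "demosaic", "bayer_denoise",                                           # demosaic processing
    "auto_white_balance", "color_correction", "gamma_correction",          # image enhancement
    "denoise_and_sharpen",
]

def run_pipeline(raw_image, modules_by_name):
    image = raw_image
    for name in ISP_PIPELINE:
        image = modules_by_name[name](image)
    return image
```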
  • the second image may be an RGB image.
  • the second image may be an 8-bit RGB image. This is only an example, and the type of the second image may also be set according to the input requirements of the visual task model.
  • step S302 includes: processing the first image by using at least one image processing module and the weight of the at least one image processing module to obtain the second image.
  • the processing result of the at least one image processing module is adjusted according to the weight of the at least one image processing module to obtain the second image.
  • the image processing module processes the image input to the module, which may be to adjust the pixel values of all or part of the pixels of the image input to the module, that is, to change the pixel values of all or part of the pixels.
  • the variation of the pixel values of all or some pixels may be adjusted according to the weight of the image processing module.
  • the weight of the image processing module is multiplied by the variation of the pixel value to obtain the adjusted variation of the pixel, and then the output image of the module is obtained. If the weight of the image processing module is 0, it means that the image processing module does not participate in the image processing process.
  • the specific value of the weight can be set as required, for example, the weight can be a real number greater than or equal to 0 and less than or equal to 1.
  • the weight of the at least one image processing module can be normalized, that is, the sum of the weights of the at least one image processing module is 1, or the sum of the weights of the at least one image processing module close to 1.
  • the weights of the nine image processing modules are w1, w2, w3, w4, w5, w6, w7, w8 and w9, respectively.
  • the value range of the weight is a real number greater than or equal to 0 and less than or equal to 1.
  • the largest possible sum of w1, w2, w3, w4, w5, w6, w7, w8 and w9 is nine.
  • the nine weights can also be normalized so that the sum of the nine weights can be 1.
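  • An illustrative normalization of the nine weights so that they sum to 1 (the normalization itself is an example, as noted above):

```python
def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights] if total > 0 else list(weights)

w = [0.9, 0.1, 0.7, 1.0, 0.3, 0.8, 0.6, 0.5, 0.4]  # example values of w1..w9 in [0, 1]
w_normalized = normalize(w)                         # now sums to 1.0
```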
  • the vision task includes: target detection, image classification, target segmentation, target tracking, or image recognition.
  • the visual task model is used to perform visual tasks. For example, if the vision task is target detection, then the vision task model is the target detection model. For another example, if the visual task is image recognition, then the visual task model is an image recognition model.
  • the vision task model can be a trained model.
  • the type of output of the visual task model is related to the type of visual task.
  • the output of the visual task model is the inference result of the visual task model.
  • the output of the vision task model may be a target frame on the second image and the category of the object in the target frame.
  • the output of the visual task model may be the category of the second image.
  • the processing results of the vision task model may include performance indicators of the vision task model.
  • the performance index of the vision task model includes the accuracy of reasoning or the value of the loss function.
  • the loss function can be set as needed.
  • the loss function is used to indicate the difference between the inference result of the vision task model and the true value corresponding to the first image. It should be noted that the loss function here may be the loss function in the training process of the vision task model, or other forms of loss functions may also be used.
  • the processing result of the vision task model may include detection accuracy.
  • the processing result of the vision task model may include segmentation accuracy.
  • the visual task model may use a neural network model, or may also use a non-neural network model.
  • the neural network model may be an existing neural network model, for example, a residual network.
  • the neural network model may also be a neural network model of other structures constructed by itself. This embodiment of the present application does not limit it.
  • the visual task model employed may or may not be the same in overexposed and underexposed situations.
  • For example, if the current scene is recognized as overexposed, the first object detection model may be used, and if the current scene is recognized as underexposed, the second object detection model may be used.
  • the first target detection model and the second target detection model are different target detection models.
  • the processing of the visual task model can be executed by the calculation module 111 in FIG. 1 .
  • the vision task model can be deployed on the execution device of the method 300, or can be deployed on other devices. That is to say, the processing of the visual task model can be executed by the executing device of the method 300 or by other devices, and the processing result can be fed back to the executing device of the method 300 .
  • the at least one image processing module is adjusted according to the processing result of the visual task model, so that the processing result of the visual task model is as close to expectation as possible.
  • the at least one image processing module is adjusted according to the performance index of the visual task model, so as to improve the performance of the visual task model.
  • the at least one image processing module is adjusted to improve the accuracy of inference of the model.
  • the at least one image processing module is adjusted to reduce the value of the loss function of the visual task model.
  • the method 300 may be executed based on multiple images in the training data set until a preset condition is met. That is to say, in practical applications, the image processing module can be adjusted iteratively based on multiple images.
  • the image processing module used in each iteration is the image processing module obtained after the previous iteration.
  • the preset conditions can be set as required, and examples will be given in Mode 1, Mode 2, Mode 3, and Mode 4 below.
  • the at least one image processing module can also be adjusted according to the image processing time and the processing result of the visual task model.
  • the image processing time may be the processing time of the visual task model, or the processing time of the at least one image processing module, or the sum of the processing time of the visual task model and the processing time of the at least one image processing module.
  • the processing speed can be improved and the time delay can be reduced under the premise of ensuring the performance of the visual task model.
  • the image processing flow is adjusted according to the processing result of the visual task model, which is beneficial to obtain an image suitable for the visual task, so as to ensure the performance of the visual task model.
  • the solutions of the embodiments of the present application can adjust the image processing flow according to the requirements of different application scenarios, so as to adapt to different application scenarios.
  • the visual task model employed may or may not be the same in overexposed and underexposed situations.
  • For example, if the current scene is recognized as overexposed, the first object detection model may be used as the visual task model; if the current scene is recognized as underexposed, the second object detection model may be used as the vision task model.
  • the solution of the embodiment of the present application can adjust the image processing flow according to the processing results of the first object detection model and the second object detection model respectively, so as to obtain an image processing flow suitable for the first object detection model and an image processing flow suitable for the second object detection model.
  • Step S304 can be implemented in various ways, and the following four ways (mode 1, mode 2, mode 3 and mode 4) are taken as examples for illustration.
  • the at least one image processing module includes a plurality of image processing modules
  • step S304 includes: adjusting weights of the plurality of image processing modules according to a processing result of the visual task model.
  • the weights of the plurality of image processing modules are adjusted according to the processing results of the visual task model, so as to improve the performance of the visual task model.
  • the method 300 can be executed based on multiple images in the training data set to implement iterative adjustment of the weights of the multiple image processing modules until the preset conditions are met. Stop adjusting the weights of the plurality of image processing modules after the preset condition is met, or stop refreshing the weights of the plurality of image processing modules.
  • the preset condition may be that the weights of the plurality of image processing modules converge.
  • the method 300 is not executed any more, that is, the adjustment of the weights of the plurality of image processing modules is stopped.
  • Weight convergence can also be understood as meaning that the weight gradients obtained after performing the method 300 multiple times in succession change little. For example, when the change in the weight gradient obtained after performing the method 300 multiple times is less than or equal to the first threshold, the adjustment of the weights of the multiple image processing modules is stopped.
  • the preset condition may be that the accuracy of the visual task model is greater than or equal to the second threshold.
  • the method 300 is not executed, that is, the adjustment of the weights of the plurality of image processing modules is stopped.
  • the second threshold may be a preset value.
  • the second threshold may be the inference accuracy of the visual task model obtained without setting the weight of the image processing module.
  • the second threshold may be the inference accuracy of the visual task model when no weight is set for the nine image processing modules.
  • the second threshold may be the inference accuracy of the visual task model when the weight of the nine image processing modules is 1.
  • In other words, the image is input into the original (unadjusted) image processing modules for processing, the processed image is input into the vision task model for processing, the inference accuracy is calculated, and this accuracy is used as the second threshold.
  • When executing the method 300, that is, when the image is input into the currently adjusted image processing modules for processing and the processed image is input into the visual task model for processing, the inference accuracy is calculated and compared with the second threshold; if the currently obtained inference accuracy is greater than or equal to the second threshold, the method 300 is no longer executed. In this way, using the adjusted image processing modules to process the image can ensure, or even improve, the performance of the visual task model.
  • the preset condition may be that the change amount of the loss function value of the visual task model obtained after performing the method 300 for multiple times is less than or equal to the third threshold.
  • the method 300 is not executed any more.
  • the preset condition may be that the number of iterations is greater than or equal to the fourth threshold.
  • the method 300 is not executed any more.
  • the preset condition may be that the accuracy of the visual task model is greater than or equal to the second threshold, and the number of iterations is greater than or equal to the fourth threshold.
  • the preset condition may be that the weights of the plurality of image processing modules converge, and the accuracy of the visual task model is greater than or equal to the second threshold.
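  • A hedged sketch combining the preset conditions listed above; the threshold values are placeholders chosen for illustration.

```python
def should_stop(weight_grad_change, accuracy, loss_change, iterations,
                first_threshold=1e-4, second_threshold=0.75,
                third_threshold=1e-3, fourth_threshold=100):
    return (weight_grad_change <= first_threshold   # weights have converged
            or accuracy >= second_threshold         # accuracy of the visual task model high enough
            or loss_change <= third_threshold       # loss function value no longer changes much
            or iterations >= fourth_threshold)      # iteration budget reached
```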
  • the weights of the plurality of image processing modules may be adjusted by means of Bayesian optimization method, RNN model, or reinforcement learning algorithm.
  • the Bayesian optimization method is taken as an example to illustrate below.
  • the vision task model is a target detection model
  • the performance index of the vision task model may be mean average precision (mAP).
  • the weights of the multiple image processing modules are adjusted by a Bayesian optimization method to improve the mAP of the object detection model. In other words, the weights of the multiple image processing modules are adjusted with the goal of maximizing the mAP of the target detection model.
  • the average accuracy refers to the average of the detection accuracies for all target objects.
  • the detection accuracy of the image is input into the Bayesian optimization model, and the Bayesian optimization model adjusts the weight of each image processing module.
  • the detection accuracy of the image can be preserved in the Bayesian optimization model. That is to say, when other images in the training data set are input into the target detection model, the detection accuracy of other images is obtained.
  • the Bayesian optimization model can adjust the weight of each image processing module according to the detection accuracy of other images and the detection accuracy of previous images.
  • the training data set in the embodiment of the present application is used to train each image processing module, which may be the same as or different from the training data set of the vision task model.
  • the training data set in the embodiment of the present application may use a verification data set or a test data set of the vision task model.
  • The weights of the image processing modules are evaluated according to the processing results of the visual task model, and then the weights of the image processing modules are adjusted so as to increase the weights of the image processing modules that are strongly correlated with the performance of the visual task model and to reduce the weights of the image processing modules that are weakly correlated with the performance of the visual task model. In this way, an image processing flow more suitable for the visual task can be obtained, which is conducive to improving the performance of the visual task model.
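  • A minimal sketch of such a weight-adjustment loop, assuming scikit-optimize's gp_minimize as the Bayesian optimizer and treating the image processing modules, the training data set, and the mAP evaluation as placeholders:

```python
import numpy as np
from skopt import gp_minimize          # one possible Bayesian optimizer
from skopt.space import Real

def apply_weighted_pipeline(raw, modules, weights):
    """Blend each module's output with its input according to the module's weight."""
    img = raw
    for module, w in zip(modules, weights):
        img = w * module(img) + (1.0 - w) * img
    return np.clip(img, 0.0, 1.0)

def tune_module_weights(modules, dataset, detection_map, n_calls=30):
    """dataset: iterable of (raw_image, labels); detection_map returns the mAP of
    the (already trained) target detection model on the processed images."""
    space = [Real(0.0, 1.0, name=f"w{i + 1}") for i in range(len(modules))]

    def objective(weights):
        processed = [(apply_weighted_pipeline(raw, modules, weights), labels)
                     for raw, labels in dataset]
        return -detection_map(processed)   # gp_minimize minimizes, so negate the mAP

    result = gp_minimize(objective, space, n_calls=n_calls, random_state=0)
    weights = np.asarray(result.x, dtype=float)
    return weights / weights.sum()         # normalize so the weights sum to 1
```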
  • step S304 includes: modifying the at least one image processing module according to a processing result of the visual task model.
  • Changing the at least one image processing module may include: deleting some image processing modules from the at least one image processing module and/or adding other image processing modules.
  • step S304 may be to select a combination of image processing modules from multiple candidate image processing modules according to the processing results of the visual task model, and use that combination of image processing modules to replace the at least one image processing module.
  • the at least one image processing module can be changed by means of Bayesian optimization method or reinforcement learning algorithm.
  • the method 300 can be executed based on multiple images in the training data set to realize iterative adjustment of the combination of the multiple image processing modules until the preset conditions are met. Stop adjusting the combination of the multiple image processing modules after the preset condition is met, or stop refreshing the combination of the multiple image processing modules.
  • the preset condition may be that the number of iterations is greater than or equal to the fourth threshold.
  • the execution of the method 300 is stopped, that is, the adjustment of the combination of the image processing modules is stopped.
  • the combination of image processing modules is changed according to the processing results of the visual task model, so that a combination of image processing modules more suitable for the visual task model can be obtained, which is conducive to improving the performance of the visual task model.
  • the at least one image processing module includes a plurality of image processing modules, and step S304 includes: deleting part of the image processing modules from the plurality of image processing modules according to the processing result of the visual task model.
  • the processing result of mode 1 may be used in mode 2.
  • step S304 includes: adjusting the weights of the multiple image processing modules according to the processing results of the visual task model; deleting part of the image processing modules from the multiple image processing modules according to the adjusted weights of the multiple image processing modules .
  • the multiple image processing modules are m image processing modules.
  • m is an integer greater than 1.
  • the n image processing modules with the smallest adjusted weights are deleted from the m image processing modules.
  • n is an integer greater than 1 and less than m.
  • an image processing module whose adjusted weight is less than or equal to a weight threshold is deleted from the m image processing modules.
  • some image processing modules are deleted according to the processing results of the visual task model, which can reduce the time required for image processing, increase the processing speed, and reduce the requirement for computing power.
  • image processing modules with higher weights have a stronger correlation with the vision task model, or in other words, image processing modules with higher weights have a greater impact on the performance of the vision task model.
  • The image processing modules to be deleted are determined according to the weight of each image processing module, and the image processing modules with relatively small weight values are deleted, so that the impact on the processing results of the visual task model is small and the performance of the visual task model is less affected after the deletion. That is to say, the solutions of the embodiments of the present application can reduce unnecessary operations, reduce computing overhead, and improve processing speed on the premise of ensuring the performance of the visual task model.
  • step S304 includes: deleting part of the image processing modules from the plurality of image processing modules according to the processing result of the visual task model and the processing speed of the plurality of image processing modules.
  • some image processing modules are deleted from the plurality of image processing modules according to the adjusted weights of the plurality of image processing modules and the processing speeds of the plurality of image processing modules.
  • an image processing module whose adjusted weight is less than or equal to a weight threshold and whose processing speed is less than or equal to a speed threshold is deleted from the plurality of image processing modules. That is, image processing modules that have slower processing speed and have less impact on the vision task model are deleted. In this way, the speed of image processing can be further increased.
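  • A minimal sketch of this pruning rule, with hypothetical module records holding an adjusted weight and a measured processing speed:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class IspModule:
    name: str
    fn: Callable                 # image -> image
    weight: float                # adjusted weight from the weight-tuning stage
    images_per_second: float     # measured processing speed

def prune_modules(modules: List[IspModule],
                  weight_threshold: float,
                  speed_threshold: float) -> List[IspModule]:
    """Delete modules whose adjusted weight is at or below the weight threshold
    and whose processing speed is at or below the speed threshold."""
    return [m for m in modules
            if not (m.weight <= weight_threshold
                    and m.images_per_second <= speed_threshold)]
```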
  • the at least one image processing module includes a plurality of image processing modules, and step S304 includes: adjusting the processing order of the plurality of image processing modules according to the processing results of the visual task model.
  • the processing order of the plurality of image processing modules is adjusted according to the processing results of the visual task model, so as to improve the performance of the visual task model.
  • the processing order of the plurality of image processing modules may be adjusted by means of Bayesian optimization method, RNN model, or reinforcement learning algorithm.
  • the method 300 can be executed based on multiple images in the training data set until the preset conditions are met. Stop adjusting the processing sequence of the multiple image processing modules after the preset condition is satisfied, or stop refreshing the processing sequence of the multiple image processing modules.
  • the preset condition may be that the variation of the processing sequence of the plurality of image processing modules is less than or equal to the fifth threshold.
  • the amount of change in the processing order of the plurality of image processing modules may be the number of image processing modules whose processing order changes after the method 300 is executed.
  • the preset condition may be that the inference accuracy of the visual task model is greater than or equal to the sixth threshold.
  • the method 300 is not executed again, that is, the adjustment of the processing sequence of the plurality of image processing modules is stopped.
  • the sixth threshold may be a preset value.
  • the sixth threshold may be the inference accuracy of the visual task model without adjusting the processing sequence of the image processing module.
  • the sixth threshold may be the inference accuracy of the visual task model when images are processed according to the processing order of the image processing module shown in FIG. 4 .
  • For example, the image is input into the original image processing modules and processed in the original processing order of the image processing modules, the processed image is input into the visual task model for processing, the inference accuracy is calculated, and that accuracy is used as the sixth threshold.
  • the method 300 is not executed any more. In this way, the images are processed according to the adjusted processing sequence of the image processing module, so that the performance of the visual task model can be guaranteed, or the performance of the visual task model can be improved.
  • the preset condition may be that the change amount of the loss function value of the visual task model obtained after performing the method 300 for multiple times is less than or equal to the third threshold.
  • the method 300 is not executed any more.
  • the preset condition may be that the number of iterations is greater than or equal to the fourth threshold.
  • the method 300 is not executed any more.
  • the above preset conditions may be used in combination.
  • the preset condition may be that the inference accuracy of the visual task model is greater than or equal to the sixth threshold, and the number of iterations is greater than or equal to the fourth threshold.
  • the preset condition may be that the variation of the processing order of the plurality of image processing modules is less than or equal to the fifth threshold, and the accuracy of the visual task model is greater than or equal to the sixth threshold.
  • the processing sequence of the image processing module is adjusted according to the processing result of the visual task model, so that an image processing flow more suitable for the visual task can be obtained, which is conducive to improving the accuracy of the visual task.
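  • As an illustration, the following sketch uses random search over candidate orderings as a stand-in for the Bayesian optimization method, RNN model, or reinforcement learning algorithm mentioned above; the modules, data set, and accuracy evaluation are placeholders:

```python
import random

def apply_in_order(raw, modules, order):
    img = raw
    for idx in order:
        img = modules[idx](img)
    return img

def search_processing_order(modules, dataset, inference_accuracy,
                            n_trials=50, seed=0):
    """Keep the ordering that gives the best downstream inference accuracy."""
    rng = random.Random(seed)
    best_order = list(range(len(modules)))
    best_acc = inference_accuracy(
        [(apply_in_order(raw, modules, best_order), labels) for raw, labels in dataset])
    for _ in range(n_trials):
        order = list(range(len(modules)))
        rng.shuffle(order)
        acc = inference_accuracy(
            [(apply_in_order(raw, modules, order), labels) for raw, labels in dataset])
        if acc > best_acc:
            best_order, best_acc = order, acc
    return best_order, best_acc
```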
  • step S304 includes: adjusting parameters in the at least one image processing module according to a processing result of the visual task model.
  • the parameters in the at least one image processing module are adjusted according to the processing results of the visual task model, so as to improve the performance of the visual task model.
  • for example, if an image processing module is a neural network model, the parameters in the image processing module are the parameters of the neural network model.
  • the parameters in the at least one image processing module may be adjusted by means of a Bayesian optimization method, an RNN model, a reinforcement learning algorithm, and the like.
  • The input image is processed based on the current parameter combination in the image processing modules, and the processed result is input into the vision task model for processing; the vision task may be performed by, for example, a CPU or a GPU.
  • the parameter combination in the image processing module is optimized and updated according to the feedback of the performance of the visual task model, that is, the optimal parameter combination in the image processing module is found in the search space, so as to improve the performance of the visual task model.
  • the method 300 can be executed based on multiple images in the training data set until a preset condition is met. Stop adjusting the parameters in the at least one image processing module after the preset condition is met, or stop refreshing the parameters in the at least one image processing module.
  • the preset condition may be that the inference accuracy of the visual task model is greater than or equal to the seventh threshold.
  • the method 300 is not executed again, that is, the adjustment of the parameters in the at least one image processing module is stopped.
  • the seventh threshold may be a preset value.
  • the seventh threshold may be the processing accuracy of the visual task model obtained without adjusting the parameters in the at least one image processing module.
  • the seventh threshold may be the inference accuracy of the vision task model when the nine image processing modules do not adjust parameters.
  • For example, the image is input into the original image processing modules, that is, the image processing modules whose parameters have not been adjusted, for processing; the processed image is input into the vision task model for processing, the inference accuracy is calculated, and that accuracy is used as the seventh threshold.
  • the method 300 is not executed any more. In this way, using the adjusted image processing module to process the image can ensure the performance of the visual task model, or can improve the performance of the visual task model.
  • the preset condition may be that the change amount of the loss function value of the visual task model obtained after performing the method 300 for multiple times is less than or equal to the third threshold.
  • the method 300 is not executed any more.
  • the preset condition may be that the number of iterations is greater than or equal to the fourth threshold.
  • the method 300 is not executed any more.
  • the above preset conditions may be used in combination.
  • the preset condition may be that the inference accuracy of the visual task model is greater than or equal to the seventh threshold, and the number of iterations is greater than or equal to the fourth threshold.
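  • A minimal sketch of searching the parameter space of the image processing modules, with hypothetical parameter names and ranges, and with random search standing in for the optimization algorithms named above:

```python
import random

# Hypothetical search space: (module name, parameter name) -> (low, high).
SEARCH_SPACE = {
    ("black_level_compensation", "offset"):    (0.0, 64.0),
    ("auto_white_balance",       "red_gain"):  (0.5, 2.5),
    ("auto_white_balance",       "blue_gain"): (0.5, 2.5),
    ("gamma_correction",         "gamma"):     (1.0, 3.0),
}

def tune_parameters(run_pipeline, inference_accuracy, n_trials=100, seed=0):
    """run_pipeline(params) applies the image processing modules with the given
    parameters; inference_accuracy(...) is the accuracy fed back by the vision task model."""
    rng = random.Random(seed)
    best_params, best_acc = None, float("-inf")
    for _ in range(n_trials):
        params = {key: rng.uniform(lo, hi) for key, (lo, hi) in SEARCH_SPACE.items()}
        acc = inference_accuracy(run_pipeline(params))
        if acc > best_acc:
            best_params, best_acc = params, acc
    return best_params, best_acc
```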
  • any two or more of the above modes 1, 2, 3 and 4 may be used in combination.
  • each method can be executed at the same time, or each method can also be executed separately.
  • step S304 includes: deleting part of the image processing modules from the plurality of image processing modules according to the processing results of the visual task model; processing the fifth image through the image processing modules that have not been deleted from the plurality of image processing modules to obtain the sixth image; inputting the sixth image into the visual task model for processing; and adjusting the parameters of the image processing modules that have not been deleted according to the processing results of the visual task model.
  • the fifth image may be an image in the training data set.
  • the fifth image and the first image may be the same image or different images.
  • the sixth image may be an RGB image.
  • For the description of the sixth image, refer to the description of the second image above.
  • The performance indicators obtained from the visual task model are used to adjust the weights of the multiple image processing modules, so as to retain the image processing modules that have a relatively large impact on the performance indicators of the visual task model, or in other words, the image processing modules that maintain or improve the performance indicators of the vision task model.
  • In this way, the image processing modules suitable for the visual task model, or in other words the image processing modules required by the visual task model, can be obtained, which reduces the time required for the image processing process, saves computing overhead, reduces the demand for computing power, and is more hardware-friendly.
  • Using the performance index obtained from the visual task model to adjust the parameters in the retained image processing modules, for example, using the performance index obtained from the visual task model to search the design space of the image processing modules, is conducive to obtaining the optimal parameter configuration of each image processing module, thereby improving the performance of the vision task model.
  • For example, step S304 includes: adjusting the parameters of the multiple image processing modules and the weights of the multiple image processing modules according to the processing results of the visual task model, and deleting some image processing modules from the multiple image processing modules according to the adjusted weights of the multiple image processing modules.
  • Alternatively, step S304 includes: adjusting the parameters of the multiple image processing modules, the weights of the multiple image processing modules, and the processing order of the multiple image processing modules according to the processing results of the visual task model, and deleting some image processing modules from the multiple image processing modules according to the adjusted weights of the multiple image processing modules.
  • the embodiment of the present application provides an image processing method 400.
  • the method 400 can be regarded as a specific implementation of the method 300.
  • some descriptions are appropriately omitted when introducing the method 400 below.
  • the method 400 adopts a combination of mode 1, mode 2 and mode 4.
  • the method 400 includes step S401 to step S410. Steps S401 to S410 will be described below.
  • the method 400 can be regarded as two stages, the first stage includes steps S401 to S406, and the second stage includes steps S407 to S410.
  • the plurality of image processing modules may include nine image processing modules as shown in FIG. 5 .
  • the weights of the respective image processing modules are denoted as w1, w2, w3, w4, w5, w6, w7, w8 and w9.
  • the sum of the 9 weights is 1.
  • the input image is processed based on the weights of the plurality of image processing modules.
  • the processing results of the multiple image processing modules are adjusted based on the weights of the multiple image processing modules.
  • the input image is processed according to the image processing module and its corresponding weight shown in FIG. 5 .
  • the processing result may be an RGB image.
  • the processing result may be an 8-bit RGB image.
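  • For example, assuming the pipeline operates on floating-point data in [0, 1] internally, the processed result could be quantized to an 8-bit RGB image as follows:

```python
import numpy as np

def to_8bit_rgb(img: np.ndarray) -> np.ndarray:
    """Clip a floating-point image in [0, 1] and quantize it to 8-bit RGB."""
    return (np.clip(img, 0.0, 1.0) * 255.0 + 0.5).astype(np.uint8)
```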
  • Step S402 corresponds to step S302, and for a specific description, refer to the description in step S302.
  • the vision task model can be a trained model.
  • the comparison result is fed back to the optimization algorithm, and the optimization algorithm is used to adjust the weights of the plurality of image processing modules.
  • the optimization algorithm includes a Bayesian optimization method, an RNN model, and a reinforcement learning algorithm.
  • Step S405: the adjusted weights of the image processing modules are used as the weights of the image processing modules in step S402, and steps S402 to S404 are repeated until the first preset condition is met.
  • step S405 may also be to perform normalization processing on the adjusted weights of the image processing modules, and use the normalized weights as the weights of the image processing modules in step S402.
  • step S402 to step S404 are terminated.
  • the currently obtained weight of the image processing module may be regarded as the weight of the image processing module obtained after satisfying the first preset condition.
  • step S402 to step S404 are terminated.
  • Steps S403 to S405 can be regarded as a specific implementation of mode 1.
  • Step S406 corresponds to step S304 in mode 2.
  • the image in step S407 and the image in step S402 may be the same image or different images.
  • the parameters in the image processing module that have not been deleted are used as tuning objects.
  • the parameters in the reserved image processing module are used as tuning objects.
  • normalization processing may also be performed on the weights of the image processing modules that have not been deleted.
  • the images in the training data set are input to the black level compensation module, the demosaic module, the automatic white balance module and the gamma correction module for processing. Further, before performing step S407, the weights of the four image processing modules may be normalized.
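  • A minimal sketch of this normalization step, with hypothetical weight values for the four retained modules:

```python
def renormalize(weights):
    """Rescale the retained modules' weights so that they again sum to 1."""
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}

# Hypothetical weights left over after pruning the nine-module pipeline:
retained = {"black_level_compensation": 0.18, "demosaic": 0.34,
            "auto_white_balance": 0.22, "gamma_correction": 0.11}
retained = renormalize(retained)   # weights now sum to 1
```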
  • the comparison result is fed back to the optimization algorithm, and the parameters in the image processing module are adjusted using the optimization algorithm.
  • the optimization algorithm includes Bayesian optimization method, RNN model or reinforcement learning algorithm.
  • The optimization algorithm used in step S409 may be the same as or different from the optimization algorithm used in step S404.
  • step S407 to step S410 are terminated.
  • the currently obtained parameters in the image processing module may be regarded as parameters in the image processing module obtained after satisfying the second preset condition.
  • step S407 to step S410 are terminated.
  • Steps S407 to S410 can be regarded as a specific implementation of mode 4; for a specific description, refer to the description in mode 4, which is not repeated here.
  • For the setting of the second preset condition, reference may be made to the preset conditions in mode 4.
  • The performance indicators obtained from the visual task model are used to adjust the weights of the multiple image processing modules, so as to retain the image processing modules that have a relatively large impact on the performance indicators of the visual task model, or in other words, the image processing modules that maintain or improve the performance indicators of the vision task model.
  • In this way, the image processing modules suitable for the visual task model, or in other words the image processing modules required by the visual task model, can be obtained, which reduces the time required for the image processing process, saves computing overhead, reduces the demand for computing power, and is more hardware-friendly.
  • Using the performance index obtained from the visual task model to adjust the parameters in the retained image processing modules, for example, using the performance index obtained from the visual task model to search the design space of the image processing modules, is beneficial to obtaining the optimal parameter configuration of each image processing module, thereby improving the performance of the vision task model.
  • the first stage and the second stage in the method 400 may be executed simultaneously. That is to say, the weight of the image processing module and the parameters in the image processing module are adjusted at the same time.
  • the manner in which the first phase and the second phase of the method 400 are executed simultaneously will be described below.
  • Method 400 may include the following steps. For the following steps, reference may be made to the description of the first stage and the second stage of the aforementioned method 400. For the sake of brevity, part of the description is appropriately omitted when describing the following steps.
  • the comparison result is fed back to the optimization algorithm, and the optimization algorithm is used to adjust the weights of the plurality of image processing modules.
  • An optimization algorithm is used to adjust parameters in the multiple image processing modules.
  • the optimization algorithm includes a Bayesian optimization method, an RNN model, and a reinforcement learning algorithm.
  • the optimization algorithm for adjusting the weights of the multiple image processing modules and the optimization algorithm for adjusting the parameters in the multiple image processing modules may be the same or different.
  • Step 5): the adjusted weights of the image processing modules are used as the weights of the image processing modules in step 2), and the adjusted parameters in the image processing modules are used as the parameters in the image processing modules in step 2); steps 2) to 4) are repeated until the training is completed.
  • the adjusted weights of the image processing modules are normalized, and the normalized weights are used as the weights of the image processing modules in step 5).
  • the training is completed.
  • When the accuracy of the current visual task model is greater than or equal to the inference accuracy of the visual task model before the method 400 is executed, the training is complete.
  • Step 6): delete part of the image processing modules according to the weights of the image processing modules after training. Step 6) corresponds to step S304 in mode 2. For a specific description, please refer to the description in mode 2, which is not repeated here.
  • Executing the first stage and the second stage at the same time can prevent an image processing module from being deleted because of an unreasonable parameter configuration, so that each image processing module processes the image under a better parameter configuration; the contribution of each image processing module to the performance index of the vision task model is then judged under this better parameter configuration, so as to retain the image processing modules required by the vision task model, and the performance index of the vision task model can be further improved.
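  • By way of illustration, the simultaneous adjustment of weights and parameters followed by pruning could be organized as in the following sketch; the optimizer interface (suggest/observe/best) and the evaluation function are hypothetical:

```python
def train_jointly(optimizer, modules, dataset, evaluate, n_steps, weight_threshold):
    """Steps 2)-5): weights and parameters are proposed and evaluated together;
    only after training does step 6) prune modules by their final weights."""
    for _ in range(n_steps):
        weights, params = optimizer.suggest()              # candidate weights and parameters
        accuracy = evaluate(weights, params, modules, dataset)
        optimizer.observe(weights, params, accuracy)       # feedback to the optimizer
    final_weights, final_params = optimizer.best()
    kept = [m for m, w in zip(modules, final_weights) if w > weight_threshold]
    return kept, final_params
```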
  • Method 400 is only an example of combining mode 1, mode 2 and mode 4.
  • Mode 1, mode 2, mode 3 and mode 4 can also be combined in other implementations.
  • mode 1, mode 2 and mode 3 are combined.
  • step S304 may include: adjusting the weights of the multiple image processing modules and the processing order of the multiple image processing modules according to the processing results of the visual task model, and deleting some image processing modules from the multiple image processing modules according to the adjusted weights of the image processing modules.
  • Alternatively, step S304 may include: adjusting the weights of the multiple image processing modules according to the processing results of the visual task model, and deleting part of the image processing modules from the multiple image processing modules according to the adjusted weights of the image processing modules; then adjusting the processing order of the image processing modules that have not been deleted according to the processing results of the visual task model. That is, step S304 is divided into two stages: in the first stage, some image processing modules are deleted, and in the second stage, the processing order of the image processing modules that have not been deleted is adjusted.
  • the adjusted image processing module is an image processing module required by the visual task model. There is a corresponding relationship between the adjusted image processing module and the vision task model. Different vision task models can correspond to different image processing modules. In this way, an appropriate image processing flow can be selected according to the application scenario.
  • Figure 6 shows an image processing method 700 provided by the embodiment of the present application.
  • the method shown in Figure 6 can be executed by an image processing device, which can be a cloud service device or a terminal device, such as a computer, server, etc.
  • The apparatus capable of performing image processing may also be a system composed of a cloud service device and a terminal device.
  • the method 700 may be executed by the preprocessing module in FIG. 1 .
  • the target image processing module in method 700 is obtained by method 300 or method 400 .
  • repeated descriptions are appropriately omitted when introducing the method 700 below.
  • the method 700 includes steps S701 to S704. Steps S701 to S704 will be described in detail below.
  • the third image is an image to be processed.
  • the third image may be a raw image acquired by the sensor.
  • The third image may be an image captured by a terminal device (or another device such as a computer or a server) through a camera, or the third image may be an image obtained from inside the terminal device (or another device such as a computer or a server), for example, an image stored in the photo album of the terminal device or an image obtained by the terminal device from the cloud, which is not limited in this embodiment of the present application.
  • S702. Determine at least one target image processing module according to the visual task model.
  • the at least one target image processing module is one or more image processing modules corresponding to the visual task model.
  • the vision task includes: target detection, image classification, target segmentation, target tracking, or image recognition.
  • the visual task model is used to perform visual tasks. For example, if the vision task is target detection, then the vision task model is the target detection model. For another example, if the visual task is image recognition, then the visual task model is an image recognition model.
  • the vision task model can be a trained model.
  • different visual task models may be used, and accordingly, at least one target image processing module matching the visual task model may be determined according to different visual task models. In this way, different image processing modules can be selected according to different application scenarios.
  • the visual task model employed may or may not be the same in overexposed and underexposed situations.
  • the first target detection model may be used as the visual task model, and at least one target image processing module corresponding to the first target detection model may be determined according to the first target detection model.
  • the second target detection model may be used as the visual task model, and at least one target image processing module corresponding to the second target detection model is determined according to the second target detection model.
  • the first target detection model and the second target detection model are different target detection models. In this way, different image processing processes can be selected according to different application scenarios to improve the performance of the vision task model.
  • one or more image processing modules corresponding to the visual task model are used to process the input third image to obtain the fourth image.
  • the fourth image may be an RGB image.
  • the fourth image may be an 8-bit RGB image. This is only an example, and the type of the fourth image can be set according to the input requirements of the visual task model.
  • the processing result of the fourth image can also be understood as the processing result of the third image.
  • the processing result of the fourth image is the reasoning result of the visual task model.
  • the inference results of the visual task model are related to the type of visual task.
  • the inference result of the vision task model may be the target frame on the fourth image and the category of the object in the target frame.
  • the reasoning result of the vision task model may be the category of the fourth image.
  • the configuration of the image processing module matching the current visual task model can be determined.
  • the configuration of the image processing modules includes at least one of the following: a combination of image processing modules, a weight of the image processing modules, a processing order of the image processing modules, or parameters in the image processing modules.
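  • One possible way to represent such a configuration and its correspondence with a visual task model is sketched below; the model names, module names, and parameter values are hypothetical:

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class PipelineConfig:
    """One possible representation of an image-processing-module configuration."""
    modules: List[str]                                   # combination of modules
    weights: Dict[str, float] = field(default_factory=dict)
    order: List[str] = field(default_factory=list)       # processing order
    params: Dict[str, Dict[str, float]] = field(default_factory=dict)

# Hypothetical correspondence, e.g. obtained in advance via method 300 / method 400,
# between vision task models and their matching configurations.
CONFIG_BY_MODEL: Dict[str, PipelineConfig] = {
    "first_target_detection_model": PipelineConfig(
        modules=["black_level_compensation", "demosaic",
                 "auto_white_balance", "gamma_correction"],
        params={"gamma_correction": {"gamma": 2.2}}),
}

def determine_target_modules(model_name: str) -> PipelineConfig:
    """Step S702: look up the configuration matching the given vision task model."""
    return CONFIG_BY_MODEL[model_name]
```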
  • step S702 includes: determining at least one target image processing module from multiple candidate image processing modules according to the visual task model.
  • a combination of image processing modules is determined from multiple candidate image processing modules according to the visual task model, and an image processing module in the combination of image processing modules is the at least one target image processing module.
  • the combination of image processing modules may also change accordingly.
  • the combination of image processing modules corresponding to the current visual task model can be determined, or in other words, the image processing module required for the visual task model can be determined according to the corresponding relationship, that is, the at least one target image processing module .
  • the at least one target image processing module may be obtained through the method 300 or the method 400 .
  • the correspondence between the combination of the visual task model and the image processing module is obtained through the method 300 or the method 400 .
  • the at least one target image processing module includes: a black level compensation module, a demosaic module, an automatic white balance module and a gamma correction module.
  • The combination of image processing modules can adaptively match the visual task model, making the current combination of image processing modules more suitable for the current visual task model, which is beneficial to improving the performance of the vision task model.
  • step S702 includes: determining the weight of at least one target image processing module according to the visual task model.
  • the weight of the at least one target image processing module is used to process the processing result of the at least one target image processing module to obtain a fourth image.
  • the combinations of image processing modules corresponding to different visual task models are the same.
  • the weight of the image processing module may change accordingly.
  • the combination of image processing modules corresponding to different visual task models is the same, which may be understood to mean that the functions implemented by the image processing modules adopted by different visual task models are the same.
  • The weight of the image processing module corresponding to the current visual task model, that is, the weight of the at least one target image processing module, can be determined.
  • the at least one target image processing module may be the nine image processing modules in FIG. 4, and the weights of the image processing modules may be the weights obtained in step S405.
  • The weights of the image processing modules can adaptively match the visual task model, making the weights of the current image processing modules more suitable for the current visual task model, which is beneficial to improving the performance of the vision task model.
  • the weight of the image processing module may also change, and other configurations of the image processing module may also change.
  • the combination of image processing modules may change.
  • the visual task model has a corresponding relationship with the weight of the image processing module and other configuration conditions of the image processing module.
  • the weight of the image processing module corresponding to the visual task model and other configurations of the image processing module can be determined according to the visual task model.
  • step S702 a combination of image processing modules corresponding to the visual task model and weights of image processing modules in the combination of image processing modules may be determined.
  • the at least one target image processing module corresponding to the visual task model may be obtained in step S406.
  • the at least one target image processing module includes a black level compensation module, a demosaic module, an automatic white balance module and a gamma correction module.
  • the weight of the at least one target image processing module may be the weight obtained in step S405.
  • step S702 includes: determining a processing sequence of at least one target image processing module according to the visual task model.
  • the combinations of image processing modules corresponding to different visual task models are the same.
  • the processing sequence of the image processing module may also change accordingly.
  • the processing order of the image processing modules corresponding to the current visual task model can be determined, that is, the processing order of the at least one target image processing module.
  • different visual task models correspond to the processing order of different image processing modules.
  • The processing order of the image processing modules can adaptively match the visual task model, making the processing order of the current image processing modules more suitable for the current vision task model, which is beneficial to improving the performance of the vision task model.
  • the processing order of the image processing module may change, and other configurations of the image processing module may also change.
  • the combination of image processing modules may change.
  • the visual task model has a corresponding relationship with the processing order of the image processing module and other configurations of the image processing module.
  • the processing sequence of the image processing module corresponding to the visual task model and other configurations of the image processing module can be determined according to the corresponding relationship.
  • the combination of the visual task model and the image processing module there is a corresponding relationship between the combination of the visual task model and the image processing module, and the processing sequence of the image processing module.
  • the combination of image processing modules corresponding to the vision task model and the processing order of the image processing modules in the combination of image processing modules can be determined.
  • the combinations of image processing modules corresponding to different visual task models may be the same or different.
  • the combinations of image processing modules corresponding to the two visual task models are the same, but the processing orders of the image processing modules in the combination of image processing modules are different.
  • In step S702, the combination of image processing modules corresponding to the visual task model, the weights of the image processing modules, and the processing order of the image processing modules can be determined; that is, the target image processing modules, the weights of the target image processing modules, and the processing order of the target image processing modules are determined from multiple candidate image processing modules.
  • the combinations of image processing modules corresponding to different visual task models may be the same or different.
  • the weights of the image processing modules in the combination of image processing modules may be the same or different.
  • the processing order of the image processing modules in the combination of image processing modules may be the same or different.
  • step S702 includes: determining parameters in the at least one target image processing module according to the visual task model.
  • the combinations of image processing modules corresponding to different visual task models are the same.
  • the parameters in the image processing module may change accordingly.
  • the image processing module corresponding to the first visual task model includes: a black level compensation module and a demosaic module.
  • the parameters of the black level compensation module include parameter A1
  • the parameters of the demosaic module include parameter B1.
  • the image processing module corresponding to the second visual task model includes: a black level compensation module and a demosaic module.
  • the parameters of the black level compensation module include parameter A2, and the parameters of the demosaic module include parameter B2.
  • The parameters used in the black level compensation processing and demosaic processing before the first visual task model are different from those used in the black level compensation processing and demosaic processing before the second visual task model.
  • parameters in the image processing module corresponding to the visual task model can be determined, that is, parameters in the at least one target image processing module.
  • different visual task models correspond to different parameters in the image processing module.
  • The parameters in the image processing modules can adaptively match the visual task model, making the parameters in the current image processing modules more suitable for the current vision task model, which is beneficial to improving the performance of the vision task model.
  • the visual task model has a corresponding relationship with parameters in the image processing module and other configurations of the image processing module. In this way, parameters in the image processing module corresponding to the current visual task model and other configurations of the image processing module can be determined according to the corresponding relationship.
  • the combination of the visual task model and the image processing module there is a corresponding relationship between the combination of the visual task model and the image processing module, and the parameters in the image processing module. According to the corresponding relationship, the combination of image processing modules corresponding to the current visual task model and the parameters of the image processing modules in the combination of image processing modules can be determined.
  • the combinations of image processing modules corresponding to different visual task models may be the same or different.
  • the combinations of image processing modules corresponding to the two vision task models are the same, but the parameters of the image processing modules in the combination of image processing modules are different.
  • the combination of the visual task model and the image processing module there is a corresponding relationship between the combination of the visual task model and the image processing module, the weight of the image processing module, and the parameters in the image processing module.
  • the combination of the image processing modules corresponding to the visual task model, the weight of the image processing modules, and the parameters in the image processing modules can be determined.
  • the combinations of image processing modules corresponding to different visual task models may be the same or different.
  • the weights of the image processing modules in the combination of image processing modules may be the same or different.
  • the parameters of the image processing modules in the combination of image processing modules may be the same or different.
  • different visual task models correspond to different image processing module configurations.
  • The image processing modules can adaptively match the visual task model, making the image processing flow more suitable for the visual task model, which is beneficial to improving the performance of the vision task model.
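  • Putting the pieces together, steps S702 to S704 could be sketched as follows, reusing the PipelineConfig representation from the earlier sketch; the module functions and the vision task model are placeholders:

```python
import numpy as np

def run_vision_task(third_image, config, modules_by_name, vision_task_model):
    """Steps S702-S704 end to end: apply the target modules in their configured
    order with their configured parameters, convert to 8-bit RGB, then infer."""
    img = third_image
    for name in (config.order or config.modules):          # fall back to the listed order
        img = modules_by_name[name](img, **config.params.get(name, {}))
    fourth_image = (np.clip(img, 0.0, 1.0) * 255.0 + 0.5).astype(np.uint8)
    return vision_task_model(fourth_image)                 # e.g. boxes + classes, or a label
```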
  • the device of the embodiment of the present application will be described below with reference to FIG. 7 to FIG. 8 . It should be understood that the device described below can execute the method of the aforementioned embodiment of the present application. In order to avoid unnecessary repetition, repeated descriptions are appropriately omitted when introducing the device of the embodiment of the present application below.
  • FIG. 7 is a schematic block diagram of an image processing device according to an embodiment of the present application.
  • the image processing device 4000 shown in FIG. 7 includes an acquisition unit 4010 and a processing unit 4020 .
  • the acquisition unit 4010 and the processing unit 4020 may be used to execute the image processing method of the embodiment of the present application.
  • the apparatus 4000 may be used to execute the method 300 or the method 400 .
  • the acquiring unit 4010 is configured to acquire the first image.
  • the processing unit 4020 is used to: process the first image through at least one image processing module to obtain a second image; input the second image into the visual task model for processing; adjust at least one image processing module according to the processing result of the visual task model .
  • At least one image processing module includes multiple image processing modules, and the processing unit 4020 is specifically configured to:
  • Part of the image processing modules in the plurality of image processing modules are deleted according to the processing results of the visual task model.
  • the processing unit 4020 is specifically configured to: adjust the weights of multiple image processing modules according to the processing results of the visual task model, and the weights of the multiple image processing modules are used to process the processing results of the multiple image processing modules Perform processing to obtain a second image; delete part of the image processing modules in the plurality of image processing modules according to the adjusted weights of the plurality of image processing modules.
  • the processing unit 4020 is specifically configured to: adjust parameters in at least one image processing module according to a processing result of the visual task model.
  • the processing unit 4020 is specifically configured to: adjust a processing sequence of at least one image processing module according to a processing result of the visual task model.
  • At least one image processing module includes: a black level compensation module, a green balance module, a dead point correction module, a demosaic module, a Bayer noise reduction module, an automatic white balance module, a color correction module, a gamma correction module, or a noise reduction and sharpening module.
  • the apparatus 4000 may be used to execute the method 700 .
  • the acquiring unit 4010 is configured to acquire a third image.
  • the processing unit 4020 is configured to: determine at least one target image processing module according to the visual task model; process the third image through at least one target image processing module to obtain the fourth image; process the fourth image through the visual task model to obtain the fourth image Four image processing results.
  • the processing unit 4020 is specifically configured to: determine at least one target image processing module from multiple candidate image processing modules according to the visual task model.
  • the processing unit 4020 is specifically configured to: determine parameters in at least one target image processing module according to the visual task model.
  • the processing unit 4020 is specifically configured to: determine a processing sequence of at least one target image processing module according to the visual task model.
  • At least one target image processing module includes: a black level compensation module, a green balance module, a dead point correction module, a demosaic module, a Bayer noise reduction module, an automatic white balance module, a color correction module, a gamma correction module, or a noise reduction and sharpening module.
  • unit here may be implemented in the form of software and/or hardware, which is not specifically limited.
  • a "unit” may be a software program, a hardware circuit or a combination of both to realize the above functions.
  • The hardware circuitry may include application specific integrated circuits (ASICs), electronic circuits, processors (such as shared processors, dedicated processors, or group processors) and memory for executing one or more software or firmware programs, a merged logic circuit, and/or other suitable components that support the described functionality.
  • the units of each example described in the embodiments of the present application can be realized by electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be regarded as exceeding the scope of the present application.
  • FIG. 8 is a schematic diagram of a hardware structure of an image processing device provided by an embodiment of the present application.
  • the image processing apparatus 6000 shown in FIG. 8 (the apparatus 6000 may specifically be a computer device) includes a memory 6001 , a processor 6002 , a communication interface 6003 and a bus 6004 .
  • the memory 6001 , the processor 6002 , and the communication interface 6003 are connected to each other through a bus 6004 .
  • the memory 6001 may be a read only memory (read only memory, ROM), a static storage device, a dynamic storage device or a random access memory (random access memory, RAM).
  • the memory 6001 may store programs, and when the programs stored in the memory 6001 are executed by the processor 6002, the processor 6002 is configured to execute various steps of the image processing method of the embodiment of the present application. Specifically, the processor 6002 may execute the method 300, the method 400 or the method 700 above.
  • The processor 6002 may be a general-purpose central processing unit (central processing unit, CPU), a microprocessor, an application specific integrated circuit (application specific integrated circuit, ASIC), a graphics processing unit (graphics processing unit, GPU), or one or more integrated circuits, and is configured to execute related programs to implement the image processing method of the method embodiments of the present application.
  • the processor 6002 may also be an integrated circuit chip with signal processing capabilities. During implementation, each step of the image processing method of the present application may be completed by an integrated logic circuit of hardware in the processor 6002 or instructions in the form of software.
  • The above-mentioned processor 6002 may also be a general-purpose processor, a digital signal processor (digital signal processing, DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (field programmable gate array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • Various methods, steps, and logic block diagrams disclosed in the embodiments of the present application may be implemented or executed.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a mature storage medium in the field such as random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, register.
  • The storage medium is located in the memory 6001; the processor 6002 reads the information in the memory 6001 and, in combination with its hardware, completes the functions required by the units included in the apparatus shown in FIG. 7, or executes the image processing method of the method embodiments of the present application.
  • the communication interface 6003 implements communication between the apparatus 6000 and other devices or communication networks by using a transceiver device such as but not limited to a transceiver. For example, training data can be obtained through the communication interface 6003 .
  • the bus 6004 may include pathways for transferring information between various components of the device 6000 (eg, memory 6001 , processor 6002 , communication interface 6003 ).
  • Although the above apparatus 6000 only shows a memory, a processor, and a communication interface, those skilled in the art should understand that in specific implementation the apparatus 6000 may also include other devices necessary for normal operation. Meanwhile, according to specific needs, those skilled in the art should understand that the apparatus 6000 may also include hardware devices for implementing other additional functions. In addition, those skilled in the art should understand that the apparatus 6000 may also include only the components necessary to realize the embodiments of the present application, and does not necessarily include all the components shown in FIG. 8.
  • The embodiment of the present application also provides a computer-readable storage medium. The computer-readable medium stores program code for execution by a device, and the program code includes instructions for executing the image processing method of the embodiments of the present application.
  • the embodiment of the present application further provides a computer program product including instructions, and when the computer program product is run on a computer, the computer is made to execute the image processing method in the embodiment of the present application.
  • the embodiment of the present application also provides a chip, the chip includes a processor and a data interface, and the processor reads the instructions stored in the memory through the data interface, and executes the image processing method in the embodiment of the present application.
  • the chip may further include a memory, the memory stores instructions, the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the The processor is configured to execute the method in any one of the implementation manners of the first aspect or the second aspect.
  • the aforementioned chip may specifically be an FPGA or an ASIC.
  • the processor in the embodiment of the present application may be a central processing unit (central processing unit, CPU), and the processor may also be other general-purpose processors, digital signal processors (digital signal processor, DSP), application specific integrated circuits (application specific integrated circuit, ASIC), off-the-shelf programmable gate array (field programmable gate array, FPGA) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
  • a general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like.
  • the memory in the embodiments of the present application may be a volatile memory or a nonvolatile memory, or may include both volatile and nonvolatile memories.
  • The non-volatile memory can be a read-only memory (read-only memory, ROM), a programmable read-only memory (programmable ROM, PROM), an erasable programmable read-only memory (erasable PROM, EPROM), an electrically erasable programmable read-only memory (electrically EPROM, EEPROM) or a flash memory.
  • Volatile memory can be random access memory (RAM), which acts as external cache memory.
  • static random access memory (static RAM, SRAM)
  • dynamic random access memory (dynamic RAM, DRAM)
  • synchronous dynamic random access memory (synchronous DRAM, SDRAM)
  • double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM)
  • enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM)
  • synchlink dynamic random access memory (synchlink DRAM, SLDRAM)
  • direct rambus random access memory (direct rambus RAM, DR RAM)
  • the above-mentioned embodiments may be implemented in whole or in part by software, hardware, firmware or other arbitrary combinations.
  • the above-described embodiments may be implemented in whole or in part in the form of computer program products.
  • the computer program product comprises one or more computer instructions or computer programs.
  • When the computer instructions or computer programs are loaded and executed on the computer, the processes or functions according to the embodiments of the present application are generated in whole or in part.
  • the computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable devices.
  • The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center in a wired or wireless (such as infrared, radio, microwave, etc.) manner.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center that includes one or more sets of available media.
  • the available media may be magnetic media (eg, floppy disk, hard disk, magnetic tape), optical media (eg, DVD), or semiconductor media.
  • the semiconductor medium may be a solid state drive.
  • "At least one" means one or more, and "multiple" means two or more.
  • "At least one of the following" or similar expressions refers to any combination of these items, including any combination of single items or plural items.
  • For example, at least one item (piece) of a, b, or c can represent: a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c can be single or multiple.
  • the sequence numbers of the above-mentioned processes do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
  • the disclosed systems, devices and methods may be implemented in other ways.
  • the device embodiments described above are only illustrative.
  • the division of the units is only a division by logical function; in actual implementation, there may be other division methods. For example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • if the functions described above are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present application.
  • the aforementioned storage medium includes media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

The present application relates to the field of artificial intelligence, and specifically relates to the field of computer vision. Provided are an image processing method and apparatus. The method comprises: processing an input image by means of at least one image processing module, taking the processing result as an input of a visual task model, and adjusting the at least one image processing module according to a processing result of the visual task model. According to the solution of the present application, an image processing flow suitable for a visual task model can be obtained, thereby helping to improve the performance of the visual task model.

Description

图像处理方法及装置 Image processing method and device

技术领域 Technical field
本申请涉及计算机视觉领域,并且更具体地,涉及一种图像处理方法及装置。The present application relates to the field of computer vision, and more specifically, to an image processing method and device.
背景技术Background technique
计算机视觉是各个应用领域,如制造业、检验、文档分析、医疗诊断,和军事等领域中各种智能/自主系统中不可分割的一部分,它是一门关于如何运用照相机/摄像机和计算机来获取我们所需的,被拍摄对象的数据与信息的学问。形象地说,就是给计算机安装上眼睛(照相机/摄像机)和大脑(算法)用来代替人眼对目标进行识别、跟踪和测量等,从而使计算机能够感知环境。计算机视觉可以看作是研究如何使人工系统从图像或多维数据中“感知”的科学。总的来说,计算机视觉就是用各种成象系统代替视觉器官获取输入信息,再由计算机来代替大脑对这些输入信息完成处理和解释。Computer vision is an integral part of various intelligent/autonomous systems in various application fields such as manufacturing, inspection, document analysis, medical diagnosis, and military. What we need is the knowledge of the data and information of the subject being photographed. To put it figuratively, it is to install eyes (cameras/video cameras) and brains (algorithms) on computers to replace human eyes to identify, track and measure targets, so that computers can perceive the environment. Computer vision can be seen as the science of how to make artificial systems "perceive" from images or multidimensional data. In general, computer vision is to use various imaging systems to replace the visual organs to obtain input information, and then use the computer to replace the brain to complete the processing and interpretation of these input information.
计算机视觉任务包括图像分类、目标检测、目标跟踪以及目标分割等任务。在实际应用中,通常先对生(raw)图进行一系列的图像信号处理(image signal processing,ISP),输出可视化的图像。该可视化的图像可以作为计算机视觉任务的输入图像。然而,ISP的目的通常是为了满足人的视觉需求。实际上,经过一系列的图像信号处理后得到的图像能够满足人的视觉需求,但基于该图像执行视觉任务不一定能得到理想的处理结果。Computer vision tasks include tasks such as image classification, object detection, object tracking, and object segmentation. In practical applications, a series of image signal processing (image signal processing, ISP) is usually performed on the raw (raw) image to output a visualized image. This visualized image can be used as an input image for computer vision tasks. However, the purpose of ISP is usually to meet human visual needs. In fact, the image obtained after a series of image signal processing can meet human visual needs, but performing visual tasks based on the image may not necessarily obtain ideal processing results.
发明内容Contents of the invention
本申请提供一种图像处理方法及装置,能够获得适合视觉任务的图像处理流程,提高视觉任务模型的性能。The present application provides an image processing method and device, which can obtain an image processing flow suitable for a visual task and improve the performance of a visual task model.
第一方面,提供了一种图像处理方法,该方法包括:获取第一图像;通过至少一个图像处理模块对第一图像进行处理,得到第二图像;将第二图像输入至视觉任务模型中进行处理;根据视觉任务模型的处理结果调整至少一个图像处理模块。In a first aspect, an image processing method is provided, the method comprising: acquiring a first image; processing the first image through at least one image processing module to obtain a second image; inputting the second image into a visual task model for processing Processing; adjusting at least one image processing module according to the processing results of the visual task model.
在本申请实施例的方案中,根据视觉任务模型的处理结果调整图像处理流程,有利于得到适合视觉任务的图像,以保证视觉任务模型的性能。本申请实施例的方案能够根据不同的应用场景的需求调整图像处理流程,以适应不同的应用场景。In the solution of the embodiment of the present application, the image processing flow is adjusted according to the processing result of the visual task model, which is beneficial to obtain an image suitable for the visual task, so as to ensure the performance of the visual task model. The solutions of the embodiments of the present application can adjust the image processing flow according to the requirements of different application scenarios, so as to adapt to different application scenarios.
示例性地,第一图像可以为传感器获取的raw图。Exemplarily, the first image may be a raw image acquired by a sensor.
图像处理模块用于对输入图像进行图像信号处理。The image processing module is used for image signal processing on the input image.
示例性地,第二图像可以为RGB图像。Exemplarily, the second image may be an RGB image.
可选地,通过至少一个图像处理模块对第一图像进行处理,得到第二图像,包括:通过至少一个图像处理模块和该至少一个图像处理模块的权重对第一图像进行处理,得到第二图像。Optionally, processing the first image through at least one image processing module to obtain the second image includes: processing the first image through at least one image processing module and the weight of the at least one image processing module to obtain the second image .
具体地,根据该至少一个图像处理模块的权重对该至少一个图像处理模块的处理结果进行调整,得到第二图像。Specifically, the processing result of the at least one image processing module is adjusted according to the weight of the at least one image processing module to obtain the second image.
示例性地,视觉任务包括:目标检测、图像分类、目标分割、目标跟踪或图像识别等。Exemplarily, the vision task includes: target detection, image classification, target segmentation, target tracking, or image recognition.
视觉任务模型用于执行视觉任务。例如,视觉任务为目标检测,则视觉任务模型为目标检测模型。再如,视觉任务为图像识别,则视觉任务模型为图像识别模型。The visual task model is used to perform visual tasks. For example, if the vision task is target detection, then the vision task model is the target detection model. For another example, if the visual task is image recognition, then the visual task model is an image recognition model.
视觉任务模型可以为训练好的模型。The vision task model can be a trained model.
视觉任务模型的处理结果可以包括视觉任务模型的性能指标。The processing results of the vision task model may include performance indicators of the vision task model.
示例性地,视觉任务模型的性能指标包括推理的准确度或损失函数的值等。损失函数可以根据需要设置。损失函数用于指示视觉任务模型的推理结果与第一图像对应的真值之间的差异。需要说明的是,此处的损失函数可以采用视觉任务模型训练过程中的损失函数,或者,也可以采用其他形式的损失函数。Exemplarily, the performance index of the vision task model includes the accuracy of reasoning or the value of the loss function. The loss function can be set as needed. The loss function is used to indicate the difference between the inference result of the vision task model and the true value corresponding to the first image. It should be noted that the loss function here may be the loss function in the training process of the vision task model, or other forms of loss functions may also be used.
例如,视觉任务为目标检测,则视觉任务模型的处理结果可以包括检测准确度。For example, if the vision task is target detection, the processing result of the vision task model may include detection accuracy.
再如,视觉任务为目标分割,则视觉任务模型的处理结果可以包括分割准确度。For another example, if the vision task is target segmentation, the processing result of the vision task model may include segmentation accuracy.
视觉任务模型可以采用神经网络模型,或者,也可以采用非神经网络模型。The visual task model may use a neural network model, or may also use a non-neural network model.
根据视觉任务模型的处理结果调整该至少一个图像处理模块,以使视觉任务模型的处理结果尽可能接近预期。The at least one image processing module is adjusted according to the processing result of the visual task model, so that the processing result of the visual task model is as close to expectation as possible.
示例性地,可以采用贝叶斯优化方法、RNN模型或强化学习算法等方式调整至少一个图像调整模块。Exemplarily, the at least one image adjustment module may be adjusted by means of a Bayesian optimization method, an RNN model, or a reinforcement learning algorithm.
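Purely as an illustrative sketch of this adjustment loop (the module names, parameter ranges, random-search strategy and helper objects below are assumptions rather than part of the embodiment; a Bayesian optimizer, an RNN controller or a reinforcement learning agent could equally drive the search), the idea can be written in Python as follows:

```python
import random

# Hypothetical search space for two ISP module parameters (assumed names/ranges).
SEARCH_SPACE = {
    "gamma_correction.gamma": [1.8, 2.0, 2.2, 2.4],
    "denoise_sharpen.strength": [0.0, 0.25, 0.5, 0.75, 1.0],
}

def evaluate_accuracy(params, raw_images, labels, isp_pipeline, task_model):
    """Run the ISP pipeline with the given parameters, then measure the vision
    task model's accuracy on the processed images (isp_pipeline and task_model
    are assumed helper objects supplied by the caller)."""
    isp_pipeline.set_params(params)
    processed = [isp_pipeline(img) for img in raw_images]
    return task_model.accuracy(processed, labels)

def random_search(raw_images, labels, isp_pipeline, task_model, n_trials=50):
    """Simple random search over the parameter space; a Bayesian optimizer or
    RL agent could replace this loop without changing the overall idea."""
    best_params, best_acc = None, -1.0
    for _ in range(n_trials):
        params = {k: random.choice(v) for k, v in SEARCH_SPACE.items()}
        acc = evaluate_accuracy(params, raw_images, labels, isp_pipeline, task_model)
        if acc > best_acc:
            best_params, best_acc = params, acc
    return best_params, best_acc
```

The only feedback signal used by the loop is the processing result of the visual task model, which is exactly the adjustment criterion described above.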
With reference to the first aspect, in some implementations of the first aspect, adjusting the at least one image processing module according to the processing result of the visual task model includes: adjusting the at least one image processing module according to the image processing time and the processing result of the visual task model.
The image processing time may be the processing time of the visual task model, or the processing time of the at least one image processing module, or the sum of the processing time of the visual task model and the processing time of the at least one image processing module.
这样,能够在保证视觉任务模型的性能的前提下,提高处理速度,降低时延。In this way, the processing speed can be improved and the time delay can be reduced under the premise of ensuring the performance of the visual task model.
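A minimal sketch of one possible way to balance the two factors is a scalar objective that rewards the accuracy reported by the visual task model and penalizes the total processing time; the function name and the weighting value below are assumptions for illustration only:

```python
def joint_score(accuracy, isp_time_ms, model_time_ms, latency_weight=0.001):
    """Hypothetical joint objective: reward accuracy, penalize the total
    processing time (ISP time plus vision-model time). The latency weight
    controls the trade-off and is an assumed value."""
    total_time_ms = isp_time_ms + model_time_ms
    return accuracy - latency_weight * total_time_ms
```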
With reference to the first aspect, in some implementations of the first aspect, the at least one image processing module includes a plurality of image processing modules, and adjusting the at least one image processing module according to the processing result of the visual task model includes: changing the at least one image processing module.
更改该至少一个图像处理模块,可以包括:删除该至少一个图像处理模块中的部分图像处理模块或/和增加其他图像处理模块。Changing the at least one image processing module may include: deleting some image processing modules in the at least one image processing module or/and adding other image processing modules.
在本申请实施例的方案中,根据视觉任务模型的处理结果更改图像处理模块的组合,能够获得更适合视觉任务模型的图像处理模块的组合,有利于提高视觉任务模型的性能。In the solution of the embodiment of the present application, the combination of image processing modules is changed according to the processing results of the visual task model, so that a combination of image processing modules more suitable for the visual task model can be obtained, which is conducive to improving the performance of the visual task model.
With reference to the first aspect, in some implementations of the first aspect, the at least one image processing module includes a plurality of image processing modules, and adjusting the at least one image processing module according to the processing result of the visual task model includes: deleting some of the plurality of image processing modules according to the processing result of the visual task model.
在本申请实施例的方案中,根据视觉任务模型的处理结果删除部分图像处理模块,能够减少图像处理所需的时间,提高处理速度,减少对计算力的要求。In the solution of the embodiment of the present application, some image processing modules are deleted according to the processing results of the visual task model, which can reduce the time required for image processing, increase the processing speed, and reduce the requirement for computing power.
With reference to the first aspect, in some implementations of the first aspect, deleting some of the plurality of image processing modules according to the processing result of the visual task model includes: adjusting weights of the plurality of image processing modules according to the processing result of the visual task model, where the weights of the plurality of image processing modules are used to process the processing results of the plurality of image processing modules to obtain the second image; and deleting some of the plurality of image processing modules according to the adjusted weights of the plurality of image processing modules.
In the solution of the embodiment of the present application, the image processing modules to be deleted are determined according to the weight of each image processing module, and image processing modules with relatively small weight values are deleted. This has little impact on the processing result of the visual task model, so the performance of the visual task model is barely affected after the deletion. In other words, the solution of the embodiment of the present application can reduce unnecessary operations, reduce computing overhead, and increase processing speed while ensuring the performance of the visual task model.
示例性地,该多个图像处理模块为m个图像处理模块。m为大于1的整数。从该m个图像处理模块中删除调整后的权重最小的n个图像处理模块。n为大于1且小于m的整数。Exemplarily, the multiple image processing modules are m image processing modules. m is an integer greater than 1. The n image processing modules with the smallest adjusted weights are deleted from the m image processing modules. n is an integer greater than 1 and less than m.
可替换地,从该m个图像处理模块中删除调整后的权重小于或等于权重阈值的图像处理模块。Alternatively, an image processing module whose adjusted weight is less than or equal to a weight threshold is deleted from the m image processing modules.
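The following Python sketch illustrates this weight-based pruning rule; the threshold value and the fallback of always keeping the highest-weight module are assumptions for illustration:

```python
def prune_modules(modules, weights, threshold=0.05):
    """Drop image processing modules whose adjusted weight is at or below the
    threshold; alternatively, the n smallest-weight modules could be dropped."""
    kept = [(m, w) for m, w in zip(modules, weights) if w > threshold]
    if not kept:
        # Keep at least the highest-weight module so the pipeline is not empty.
        kept = [max(zip(modules, weights), key=lambda mw: mw[1])]
    kept_modules = [m for m, _ in kept]
    kept_weights = [w for _, w in kept]
    return kept_modules, kept_weights
```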
结合第一方面,在第一方面的某些实现方式中,根据视觉任务模型的处理结果调整至少一个图像处理模块,包括:根据视觉任务模型的处理结果调整至少一个图像处理模块中的参数。With reference to the first aspect, in some implementation manners of the first aspect, adjusting at least one image processing module according to a processing result of the visual task model includes: adjusting parameters in at least one image processing module according to a processing result of the visual task model.
在本申请实施例的方案中,根据视觉任务模型的处理结果调整图像处理模块中的参数,能够获得更适合视觉任务的图像处理模块,有利于提高视觉任务的准确度。In the solutions of the embodiments of the present application, by adjusting the parameters in the image processing module according to the processing results of the visual task model, an image processing module more suitable for the visual task can be obtained, which is conducive to improving the accuracy of the visual task.
With reference to the first aspect, in some implementations of the first aspect, adjusting the at least one image processing module according to the processing result of the visual task model includes: deleting some image processing modules from the plurality of image processing modules according to the processing result of the visual task model; processing a fifth image through the image processing modules that were not deleted to obtain a sixth image, and inputting the sixth image into the visual task model for processing; and adjusting parameters of the image processing modules that were not deleted according to the processing result of the visual task model.
According to the solution of the embodiment of the present application, the performance indicators obtained by the visual task model, such as target detection accuracy or target segmentation accuracy, are used to adjust the weights of the plurality of image processing modules, and the image processing modules that have a relatively large impact on the performance indicators of the visual task model are retained, that is, the image processing modules that can maintain or improve the performance indicators of the visual task model are kept. In this way, image processing modules suitable for the visual task model, or the image processing modules actually required by the visual task model, can be obtained, which reduces the time required for the image processing flow, saves computing overhead, reduces the demand for computing power, and is more hardware-friendly.
Moreover, the performance indicators obtained by the visual task model are used to adjust the parameters in the retained image processing modules, for example, by searching the design space of the image processing modules, which helps to obtain the optimal parameter configuration of each image processing module and thereby improve the performance of the visual task model.
结合第一方面,在第一方面的某些实现方式中,根据视觉任务模型的处理结果调整至少一个图像处理模块,包括:根据视觉任务模型的处理结果调整至少一个图像处理模块的处理顺序。With reference to the first aspect, in some implementation manners of the first aspect, adjusting the at least one image processing module according to the processing result of the visual task model includes: adjusting the processing sequence of the at least one image processing module according to the processing result of the visual task model.
在本申请实施例的方案中,根据视觉任务模型的处理结果调整图像处理模块的处理顺序,能够获得更适合视觉任务的图像处理流程,有利于提高视觉任务的准确度。In the solution of the embodiment of the present application, the processing sequence of the image processing module is adjusted according to the processing result of the visual task model, so that an image processing flow more suitable for the visual task can be obtained, which is conducive to improving the accuracy of the visual task.
结合第一方面,在第一方面的某些实现方式中,至少一个图像处理模块包括:黑电平补偿模块、绿平衡模块、坏点修正模块、去马赛克模块、拜耳降噪模块、自动白平衡模块、色彩校正模块、伽马校正模块或降噪及锐化模块。In combination with the first aspect, in some implementations of the first aspect, at least one image processing module includes: a black level compensation module, a green balance module, a bad pixel correction module, a demosaic module, a Bayer noise reduction module, an automatic white balance module, color correction module, gamma correction module or noise reduction and sharpening module.
该至少一个图像处理模块中的任一图像处理模块可以采用神经网络算法实现,或者,也可以采用非神经网络算法实现。Any image processing module in the at least one image processing module may be implemented by a neural network algorithm, or may also be implemented by a non-neural network algorithm.
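As a rough sketch of how such a configurable processing flow might be wired together (the class name, the per-module weighting scheme and the blending rule are assumptions; they are only one possible realization of per-module weights, not the embodiment itself), consider:

```python
import numpy as np

class WeightedISPPipeline:
    """Applies a sequence of image processing modules; each module's output is
    blended with its input according to a per-module weight in [0, 1], so a
    weight near 0 effectively disables the module (an assumed design)."""

    def __init__(self, modules, weights):
        assert len(modules) == len(weights)
        self.modules = modules        # e.g. [black_level, demosaic, awb, gamma]
        self.weights = list(weights)  # one weight per module

    def __call__(self, image: np.ndarray) -> np.ndarray:
        out = image
        for module, w in zip(self.modules, self.weights):
            if w <= 0.0:
                continue              # a zero weight effectively removes the module
            processed = module(out)
            if processed.shape != out.shape:
                out = processed       # shape-changing steps (e.g. demosaicing) are taken as-is
            else:
                out = (1.0 - w) * out + w * processed
        return out
```

Setting a module's weight to zero in this sketch corresponds to removing it from the flow, which connects naturally to the weight-based pruning discussed above.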
In a second aspect, an image processing method is provided. The method includes: acquiring a third image; determining at least one target image processing module according to a visual task model; processing the third image through the at least one target image processing module to obtain a fourth image; and processing the fourth image through the visual task model to obtain a processing result of the fourth image.
According to the solution of the embodiment of the present application, different visual task models correspond to different configurations of image processing modules. When the visual task model changes, the image processing modules can adaptively match the visual task model, making the image processing flow more suitable for the visual task model, which helps to improve the performance of the visual task model.
示例性地,第三图像可以为传感器获取的raw图。Exemplarily, the third image may be a raw image acquired by the sensor.
第四图像的处理结果也可以理解为第三图像的处理结果。The processing result of the fourth image can also be understood as the processing result of the third image.
第四图像的处理结果即为视觉任务模型的推理结果。The processing result of the fourth image is the reasoning result of the visual task model.
该至少一个目标图像处理模块是与视觉任务模型对应的一个或多个图像处理模块。The at least one target image processing module is one or more image processing modules corresponding to the visual task model.
示例性地,视觉任务包括:目标检测、图像分类、目标分割、目标跟踪或图像识别等。Exemplarily, the vision task includes: target detection, image classification, target segmentation, target tracking, or image recognition.
视觉任务模型用于执行视觉任务。例如,视觉任务为目标检测,则视觉任务模型为目标检测模型。再如,视觉任务为图像识别,则视觉任务模型为图像识别模型。The visual task model is used to perform visual tasks. For example, if the vision task is target detection, then the vision task model is the target detection model. For another example, if the visual task is image recognition, then the visual task model is an image recognition model.
视觉任务模型可以为训练好的模型。The vision task model can be a trained model.
在不同的应用场景中,可以采用不同的视觉任务模型,相应地,根据不同的视觉任务模型即可确定与该视觉任务模型匹配的至少一个目标图像处理模块。这样,可以根据不同的应用场景选用不同的图像处理模块。In different application scenarios, different visual task models may be used, and accordingly, at least one target image processing module matching the visual task model may be determined according to different visual task models. In this way, different image processing modules can be selected according to different application scenarios.
视觉任务模型和图像处理模块的配置之间具有对应关系。根据视觉任务模型和图像处理模块的配置之间的对应关系可以确定与当前的视觉任务模型匹配的图像处理模块的配置。There is a corresponding relationship between the vision task model and the configuration of the image processing module. According to the corresponding relationship between the visual task model and the configuration of the image processing module, the configuration of the image processing module matching the current visual task model can be determined.
示例性地,图像处理模块的配置包括以下至少一项:图像处理模块的组合、图像处理模块的权重、图像处理模块的处理顺序或者图像处理模块中的参数。Exemplarily, the configuration of the image processing modules includes at least one of the following: a combination of image processing modules, a weight of the image processing modules, a processing order of the image processing modules, or parameters in the image processing modules.
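One simple way to picture such a correspondence is a lookup table from a vision task model to its image processing configuration; every identifier and value in the sketch below is an illustrative assumption rather than data from the embodiment:

```python
# Hypothetical mapping from a vision task model identifier to an image
# processing configuration (module combination, weights and parameters).
ISP_CONFIGS = {
    "detection_model_v1": {
        "modules": ["black_level", "demosaic", "awb", "gamma"],
        "weights": [1.0, 1.0, 0.8, 0.6],
        "params":  {"gamma": {"gamma": 2.2}},
    },
    "segmentation_model_v1": {
        "modules": ["black_level", "demosaic", "denoise_sharpen"],
        "weights": [1.0, 1.0, 0.5],
        "params":  {"denoise_sharpen": {"strength": 0.4}},
    },
}

def select_isp_config(task_model_name: str) -> dict:
    """Return the image processing configuration matched to the given vision
    task model, i.e. the 'at least one target image processing module'."""
    return ISP_CONFIGS[task_model_name]
```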
结合第二方面,在第二方面的某些实现方式中,根据视觉任务模型确定至少一个目标图像处理模块,包括:根据视觉任务模型从多个候选图像处理模块中确定至少一个目标图像处理模块。With reference to the second aspect, in some implementations of the second aspect, determining at least one target image processing module according to the visual task model includes: determining at least one target image processing module from multiple candidate image processing modules according to the visual task model.
According to the solution of the embodiment of the present application, different visual task models correspond to different combinations of image processing modules. When the visual task model changes, the combination of image processing modules can adaptively match the visual task model, so that the current combination of image processing modules is more suitable for the current visual task model, which helps to improve the performance of the visual task model.
而且,根据视觉任务模型从多个候选图像处理模块中选择适合的图像处理模块,无需使用所有的候选图像处理模块对图像进行处理,减少了处理流程,降低了对计算力的要求。Moreover, by selecting a suitable image processing module from multiple candidate image processing modules according to the visual task model, it is not necessary to use all the candidate image processing modules to process images, which reduces the processing flow and reduces the requirement for computing power.
视觉任务模型和图像处理模块的组合之间具有对应关系。根据该对应关系即可确定当前视觉任务模型对应的图像处理模块的组合,或者说,根据该对应关系即可确定用于该视觉任务模型所需的图像处理模块,即该至少一个目标图像处理模块。There is a correspondence between the combination of the visual task model and the image processing module. According to the corresponding relationship, the combination of image processing modules corresponding to the current visual task model can be determined, or in other words, the image processing module required for the visual task model can be determined according to the corresponding relationship, that is, the at least one target image processing module .
结合第二方面,在第二方面的某些实现方式中,根据视觉任务模型确定至少一个目标图像处理模块,包括:根据视觉任务模型确定至少一个目标图像处理模块的权重,至少一个目标图像处理模块的权重用于对至少一个目标图像处理模块的处理结果进行处理,得到第四图像。With reference to the second aspect, in some implementations of the second aspect, determining at least one target image processing module according to the visual task model includes: determining the weight of at least one target image processing module according to the visual task model, and at least one target image processing module The weights of are used to process the processing result of at least one target image processing module to obtain a fourth image.
According to the solution of the embodiment of the present application, different visual task models correspond to different weights of the image processing modules. When the visual task model changes, the weights of the image processing modules can adaptively match the visual task model, so that the current weights of the image processing modules are more suitable for the current visual task model, which helps to improve the performance of the visual task model.
结合第二方面,在第二方面的某些实现方式中,根据视觉任务模型确定至少一个目标图像处理模块,包括:根据视觉任务模型确定至少一个目标图像处理模块中的参数。With reference to the second aspect, in some implementation manners of the second aspect, determining at least one target image processing module according to the visual task model includes: determining parameters in the at least one target image processing module according to the visual task model.
According to the solution of the embodiment of the present application, different visual task models correspond to different parameters in the image processing modules. When the visual task model changes, the parameters in the image processing modules can adaptively match the visual task model, so that the current parameters in the image processing modules are more suitable for the current visual task model, which helps to improve the performance of the visual task model.
视觉任务模型和图像处理模块中的参数之间具有对应关系。根据视觉任务模型可以确定视觉任务模型对应的图像处理模块中的参数,即该至少一个目标图像处理模块中的参数。There is a corresponding relationship between the visual task model and the parameters in the image processing module. According to the visual task model, parameters in the image processing module corresponding to the visual task model can be determined, that is, parameters in the at least one target image processing module.
结合第二方面,在第二方面的某些实现方式中,其特征在于,根据视觉任务模型确定至少一个目标图像处理模块,包括:根据视觉任务模型确定至少一个目标图像处理模块的处理顺序。With reference to the second aspect, in some implementations of the second aspect, it is characterized in that determining at least one target image processing module according to the visual task model includes: determining a processing order of the at least one target image processing module according to the visual task model.
According to the solution of the embodiment of the present application, different visual task models correspond to different processing orders of the image processing modules. When the visual task model changes, the processing order of the image processing modules can adaptively match the visual task model, so that the current processing order of the image processing modules is more suitable for the current visual task model, which helps to improve the performance of the visual task model.
视觉任务模型和图像处理模块的处理顺序之间具有对应关系。根据该对应关系可以确定当前视觉任务模型对应的图像处理模块的处理顺序,即该至少一个目标图像处理模块的处理顺序。There is a corresponding relationship between the visual task model and the processing sequence of the image processing module. According to the corresponding relationship, the processing order of the image processing modules corresponding to the current visual task model can be determined, that is, the processing order of the at least one target image processing module.
结合第二方面,在第二方面的某些实现方式中,至少一个目标图像处理模块包括:黑电平补偿模块、绿平衡模块、坏点修正模块、去马赛克模块、拜耳降噪模块、自动白平衡模块、色彩校正模块、伽马校正模块或降噪及锐化模块。In combination with the second aspect, in some implementations of the second aspect, at least one target image processing module includes: a black level compensation module, a green balance module, a dead pixel correction module, a demosaic module, a Bayer noise reduction module, an automatic white Balance Module, Color Correction Module, Gamma Correction Module or Noise Reduction and Sharpening Module.
第三方面,提供了一种图像处理装置,该装置包括用于执行上述第一方面以及第一方面中的任意一种实现方式中的方法的模块或单元。In a third aspect, an image processing apparatus is provided, and the apparatus includes a module or unit for executing the method in any one of the above-mentioned first aspect and the first aspect.
第四方面,提供了一种图像处理装置,该装置包括用于执行上述第二方面以及第二方面中的任意一种实现方式中的方法的模块或单元。According to a fourth aspect, an image processing device is provided, and the device includes a module or unit for executing the method in any one of the above-mentioned second aspect and the second aspect.
应理解,在上述第一方面中对相关内容的扩展、限定、解释和说明也适用于第二方面、第三方面和第四方面中相同的内容。It should be understood that the expansion, limitation, explanation and illustration of the related content in the above first aspect are also applicable to the same content in the second aspect, the third aspect and the fourth aspect.
In a fifth aspect, an image processing apparatus is provided. The apparatus includes: a memory for storing a program; and a processor for executing the program stored in the memory. When the program stored in the memory is executed, the processor is configured to execute the method in the first aspect or any implementation of the first aspect.
The processor in the fifth aspect above may be a central processing unit (CPU), or a combination of a CPU and a neural network computing processor, where the neural network computing processor may include a graphics processing unit (GPU), a neural-network processing unit (NPU), a tensor processing unit (TPU), and the like. The TPU is an artificial intelligence accelerator application-specific integrated circuit fully customized by Google for machine learning.
In a sixth aspect, an image processing apparatus is provided. The apparatus includes: a memory for storing a program; and a processor for executing the program stored in the memory. When the program stored in the memory is executed, the processor is configured to execute the method in the second aspect or any implementation of the second aspect.
The processor in the sixth aspect above may be a central processing unit, or a combination of a CPU and a neural network computing processor, where the neural network computing processor may include a graphics processing unit, a neural-network processing unit, a tensor processing unit, and the like. The TPU is Google's fully customized artificial intelligence accelerator application-specific integrated circuit for machine learning.
In a seventh aspect, a computer-readable storage medium is provided. The computer-readable medium stores program code for execution by a device, and the program code includes instructions for executing the method in any implementation of the first aspect or the second aspect.
第八方面,提供一种包含指令的计算机程序产品,当该计算机程序产品在计算机上运行时,使得计算机执行上述第一方面或第二方面中的任意一种实现方式中的方法。In an eighth aspect, a computer program product containing instructions is provided, and when the computer program product is run on a computer, the computer is made to execute the method in any one of the above-mentioned first aspect or the second aspect.
In a ninth aspect, a chip is provided. The chip includes a processor and a data interface, and the processor reads instructions stored in a memory through the data interface to execute the method in any implementation of the first aspect or the second aspect.
可选地,作为一种实现方式,所述芯片还可以包括存储器,所述存储器中存储有指令,所述处理器用于执行所述存储器上存储的指令,当所述指令被执行时,所述处理器用于执行第一方面或第二方面中的任意一种实现方式中的方法。Optionally, as an implementation manner, the chip may further include a memory, the memory stores instructions, the processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the The processor is configured to execute the method in any one of the implementation manners of the first aspect or the second aspect.
上述芯片具体可以是现场可编程门阵列(field-programmable gate array,FPGA)或者专用集成电路(application-specific integrated circuit,ASIC)。The aforementioned chip may specifically be a field-programmable gate array (field-programmable gate array, FPGA) or an application-specific integrated circuit (application-specific integrated circuit, ASIC).
附图说明Description of drawings
图1为本申请实施例提供的一种系统架构的结构示意图;FIG. 1 is a schematic structural diagram of a system architecture provided in an embodiment of the present application;
图2为本申请实施例提供的一种图像处理流程的示意图;FIG. 2 is a schematic diagram of an image processing flow provided by an embodiment of the present application;
图3为本申请实施例提供的一种图像处理方法的示意性流程图;FIG. 3 is a schematic flowchart of an image processing method provided in an embodiment of the present application;
图4为本申请实施例提供的另一种图像处理流程的示意图;FIG. 4 is a schematic diagram of another image processing flow provided by the embodiment of the present application;
图5为本申请实施例提供的又一种图像处理流程的示意图;FIG. 5 is a schematic diagram of another image processing flow provided by the embodiment of the present application;
图6为本申请实施例提供的另一种图像处理方法的示意性流程图;FIG. 6 is a schematic flowchart of another image processing method provided by the embodiment of the present application;
图7是本申请实施例提供的一种图像处理装置的示意性框图;Fig. 7 is a schematic block diagram of an image processing device provided by an embodiment of the present application;
图8是本申请实施例提供的另一种图像处理装置的示意性框图。Fig. 8 is a schematic block diagram of another image processing apparatus provided by an embodiment of the present application.
具体实施方式detailed description
下面将结合附图,对本申请中的技术方案进行描述。The technical solution in this application will be described below with reference to the accompanying drawings.
本申请实施例可以应用在自动驾驶、图像分类、图像检索、图像语义分割、图像质量增强、图像超分辨率、监控、目标跟踪、目标检测等需要执行视觉任务的领域。The embodiments of the present application can be applied in fields such as automatic driving, image classification, image retrieval, image semantic segmentation, image quality enhancement, image super-resolution, monitoring, object tracking, object detection, etc. that need to perform visual tasks.
具体而言,本申请实施例的方法能够应用在图片分类和监控场景中,下面分别对这两种应用场景进行简单的介绍。Specifically, the method in the embodiment of the present application can be applied in picture classification and monitoring scenarios, and the following two application scenarios are briefly introduced respectively.
图片分类:Image classification:
当用户在终端设备(例如,手机)或者云盘上存储了大量的图片时,通过对相册中图像进行识别可以方便用户或者系统对相册进行分类管理,提升用户体验。When a user stores a large number of pictures on a terminal device (for example, a mobile phone) or a cloud disk, it is convenient for the user or the system to classify and manage the album by identifying the images in the album, thereby improving user experience.
利用本申请实施例的图像处理方法,能够获得适合执行分类任务的图像,提高分类的准确率。此外,能够减少图像处理流程,降低硬件开销,对终端设备更友好,提高对图片进行分类的速度,有利于实时为不同的类别的图片打上标签,便于用户查看和查找。另外, 这些图片的分类标签也可以提供给相册管理系统进行分类管理,节省用户的管理时间,提高相册管理的效率,提升用户体验。Using the image processing method of the embodiment of the present application, an image suitable for performing a classification task can be obtained, and the accuracy of classification can be improved. In addition, it can reduce the image processing process, reduce hardware overhead, be more friendly to terminal equipment, increase the speed of classifying pictures, and help to label pictures of different categories in real time, which is convenient for users to view and find. In addition, the classification tags of these pictures can also be provided to the album management system for classification management, which saves management time for users, improves the efficiency of album management, and improves user experience.
监控：Monitoring:
监控场景包括:智慧城市、野外监控、室内监控、室外监控、车内监控等。其中,智慧城市场景下,需要进行多种属性识别,例如行人属性识别和骑行属性识别,深度神经网络凭借着其强大的能力在多种属性识别中发挥着重要的作用。Surveillance scenarios include: smart city, field surveillance, indoor surveillance, outdoor surveillance, in-vehicle surveillance, etc. Among them, in the smart city scene, multiple attribute recognition is required, such as pedestrian attribute recognition and riding attribute recognition. Deep neural networks play an important role in multiple attribute recognition by virtue of their powerful capabilities.
通过采用本申请实施例的图像处理方法,能够获得适合执行属性识别任务的图像,提高识别的准确率。此外,能够减少图像处理流程,降低硬件开销,提高处理效率,有利于对输入的道路画面进行实时处理,更快地识别出道路画面中的不同的属性信息。By adopting the image processing method of the embodiment of the present application, an image suitable for performing an attribute recognition task can be obtained, and the accuracy of recognition can be improved. In addition, the image processing flow can be reduced, the hardware overhead can be reduced, and the processing efficiency can be improved, which is conducive to real-time processing of the input road picture and faster recognition of different attribute information in the road picture.
由于本申请实施例涉及大量神经网络的应用,为了便于理解,下面先对本申请实施例可能涉及的神经网络的相关术语和概念进行介绍。Since the embodiment of the present application involves the application of a large number of neural networks, for ease of understanding, the following first introduces the related terms and concepts of the neural network that may be involved in the embodiment of the present application.
(1)神经网络(1) neural network
A neural network may be composed of neural units. A neural unit may be an operation unit that takes $x_{s}$ and an intercept of 1 as inputs, and the output of the operation unit may be:

$h_{W,b}(x)=f\left(W^{T}x\right)=f\left(\sum_{s=1}^{n}W_{s}x_{s}+b\right)$

where $s=1,2,\ldots,n$, $n$ is a natural number greater than 1, $W_{s}$ is the weight of $x_{s}$, and $b$ is the bias of the neural unit.
f为神经单元的激活函数(activation functions),用于将非线性特性引入神经网络中,来将神经单元中的输入信号变换为输出信号。该激活函数的输出信号可以作为下一层的输入。例如,激活函数可以是ReLU,tanh或sigmoid函数。f is the activation function of the neural unit, which is used to introduce nonlinear characteristics into the neural network to transform the input signal in the neural unit into an output signal. The output signal of this activation function can be used as the input of the next layer. For example, the activation function can be a ReLU, tanh or sigmoid function.
神经网络是将多个上述单一的神经单元联结在一起形成的网络,即一个神经单元的输出可以是另一个神经单元的输入。每个神经单元的输入可以与前一层的局部接受域相连,来提取局部接受域的特征,局部接受域可以是由若干个神经单元组成的区域。A neural network is a network formed by connecting multiple above-mentioned single neural units, that is, the output of one neural unit can be the input of another neural unit. The input of each neural unit can be connected with the local receptive field of the previous layer to extract the features of the local receptive field. The local receptive field can be an area composed of several neural units.
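The computation of a single neural unit described by the formula above can be sketched in a few lines of Python (sigmoid is chosen here only as an example activation function):

```python
import numpy as np

def neural_unit(x: np.ndarray, w: np.ndarray, b: float) -> float:
    """Single neural unit: weighted sum of the inputs plus a bias, passed
    through a sigmoid activation (ReLU or tanh could be used instead)."""
    s = np.dot(w, x) + b
    return 1.0 / (1.0 + np.exp(-s))  # sigmoid activation f
```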
(2)深度神经网络(2) Deep Neural Network
深度神经网络(deep neural network,DNN),也称多层神经网络,可以理解为具有多层隐含层的神经网络。按照不同层的位置对DNN进行划分,DNN内部的神经网络可以分为三类:输入层,隐含层,输出层。一般来说第一层是输入层,最后一层是输出层,中间的层数都是隐含层。层与层之间是全连接的,也就是说,第i层的任意一个神经元一定与第i+1层的任意一个神经元相连。Deep neural network (DNN), also known as multi-layer neural network, can be understood as a neural network with multiple hidden layers. DNN is divided according to the position of different layers, and the neural network inside DNN can be divided into three categories: input layer, hidden layer, and output layer. Generally speaking, the first layer is the input layer, the last layer is the output layer, and the layers in the middle are all hidden layers. The layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the i+1-th layer.
Although a DNN looks complicated, the work of each layer is actually not complicated. In simple terms, each layer computes the following linear relationship:

$\vec{y}=\alpha\left(W\vec{x}+\vec{b}\right)$

where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the bias (offset) vector, $W$ is the weight matrix (also called coefficients), and $\alpha(\cdot)$ is the activation function. Each layer simply performs this operation on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Because a DNN has many layers, there are also many coefficients $W$ and bias vectors $\vec{b}$. These parameters are defined in the DNN as follows, taking the coefficient $W$ as an example: assume that in a three-layer DNN, the linear coefficient from the fourth neuron of the second layer to the second neuron of the third layer is defined as $W_{24}^{3}$. The superscript 3 represents the layer in which the coefficient $W$ is located, and the subscripts correspond to the output index 2 in the third layer and the input index 4 in the second layer.

In summary, the coefficient from the k-th neuron of layer L-1 to the j-th neuron of layer L is defined as $W_{jk}^{L}$.

It should be noted that the input layer has no $W$ parameters. In a deep neural network, more hidden layers enable the network to better characterize complex situations in the real world. In theory, a model with more parameters has higher complexity and a larger "capacity", which means that it can complete more complex learning tasks. Training a deep neural network is the process of learning the weight matrices, and its ultimate goal is to obtain the weight matrices of all layers of the trained deep neural network (the weight matrices formed by the vectors $W$ of many layers).
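A per-layer forward pass of this form can be sketched as follows (ReLU is used here only as an example of the activation α; the function names and data layout are illustrative assumptions):

```python
import numpy as np

def dense_layer(x: np.ndarray, W: np.ndarray, b: np.ndarray) -> np.ndarray:
    """One fully connected layer: y = alpha(W x + b), with ReLU as alpha."""
    return np.maximum(0.0, W @ x + b)

def dnn_forward(x: np.ndarray, layers) -> np.ndarray:
    """Stack of layers; `layers` is a list of (W, b) pairs, one per layer."""
    out = x
    for W, b in layers:
        out = dense_layer(out, W, b)
    return out
```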
(3)卷积神经网络(3) Convolutional neural network
卷积神经网络(convolutional neuron network,CNN)是一种带有卷积结构的深度神经网络。卷积神经网络包含了一个由卷积层和子采样层构成的特征抽取器,该特征抽取器可以看作是滤波器。卷积层是指卷积神经网络中对输入信号进行卷积处理的神经元层。在卷积神经网络的卷积层中,一个神经元可以只与部分邻层神经元连接。一个卷积层中,通常包含若干个特征平面,每个特征平面可以由一些矩形排列的神经单元组成。同一特征平面的神经单元共享权重,这里共享的权重就是卷积核。共享权重可以理解为提取图像信息的方式与位置无关。卷积核可以以随机大小的矩阵的形式化,在卷积神经网络的训练过程中卷积核可以通过学习得到合理的权重。另外,共享权重带来的直接好处是减少卷积神经网络各层之间的连接,同时又降低了过拟合的风险。Convolutional neural network (CNN) is a deep neural network with a convolutional structure. The convolutional neural network contains a feature extractor composed of a convolutional layer and a subsampling layer, which can be regarded as a filter. The convolutional layer refers to the neuron layer that performs convolution processing on the input signal in the convolutional neural network. In the convolutional layer of a convolutional neural network, a neuron can only be connected to some adjacent neurons. A convolutional layer usually contains several feature planes, and each feature plane can be composed of some rectangularly arranged neural units. Neural units of the same feature plane share weights, and the shared weights here are convolution kernels. Shared weights can be understood as a way to extract image information that is independent of location. The convolution kernel can be formalized as a matrix of random size, and the convolution kernel can obtain reasonable weights through learning during the training process of the convolutional neural network. In addition, the direct benefit of sharing weights is to reduce the connections between the layers of the convolutional neural network, while reducing the risk of overfitting.
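The weight-sharing idea can be sketched as a plain 2-D convolution in which one small kernel is slid over every position of the input (a naive, single-channel illustration rather than an optimized implementation):

```python
import numpy as np

def conv2d_single_channel(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Valid 2-D convolution (cross-correlation) with one shared kernel: the
    same weights are applied at every spatial position of the input."""
    kh, kw = kernel.shape
    oh, ow = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out
```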
(4)循环神经网络(4) Recurrent neural network
循环神经网络(recurrent neural networks,RNN)是用来处理序列数据的。在传统的神经网络模型中,是从输入层到隐含层再到输出层,层与层之间是全连接的,而对于每一层层内之间的各个节点是无连接的。这种普通的神经网络虽然解决了很多难题,但是却仍然对很多问题却无能无力。例如,你要预测句子的下一个单词是什么,一般需要用到前面的单词,因为一个句子中前后单词并不是独立的。RNN之所以称为循环神经网路,即一个序列当前的输出与前面的输出也有关。具体的表现形式为网络会对前面的信息进行记忆并应用于当前输出的计算中,即隐含层本层之间的节点不再无连接而是有连接的,并且隐含层的输入不仅包括输入层的输出还包括上一时刻隐含层的输出。理论上,RNN能够对任何长度的序列数据进行处理。对于RNN的训练和对传统的CNN或DNN的训练一样。同样使用误差反向传播算法,不过有一点区别:即,如果将RNN进行网络展开,那么其中的参数,如W,是共享的;而如上举例上述的传统神经网络却不是这样。并且在使用梯度下降算法中,每一步的输出不仅依赖当前步的网络,还依赖前面若干步网络的状态。该学习算法称为基于时间的反向传播算法(back propagation through time,BPTT)。Recurrent neural networks (RNN) are used to process sequence data. In the traditional neural network model, from the input layer to the hidden layer to the output layer, the layers are fully connected, and each node in each layer is unconnected. Although this ordinary neural network solves many problems, it is still powerless to many problems. For example, if you want to predict what the next word in a sentence is, you generally need to use the previous words, because the preceding and following words in a sentence are not independent. The reason why RNN is called a recurrent neural network is that the current output of a sequence is also related to the previous output. The specific manifestation is that the network will remember the previous information and apply it to the calculation of the current output, that is, the nodes between the hidden layer and the current layer are no longer connected but connected, and the input of the hidden layer not only includes The output of the input layer also includes the output of the hidden layer at the previous moment. In theory, RNN can process sequence data of any length. The training of RNN is the same as that of traditional CNN or DNN. The error backpropagation algorithm is also used, but there is a difference: that is, if the RNN is expanded to the network, then the parameters, such as W, are shared; while the above-mentioned traditional neural network is not the case. And in the gradient descent algorithm, the output of each step depends not only on the network of the current step, but also depends on the state of the previous several steps of the network. This learning algorithm is called back propagation through time (BPTT) based on time.
既然已经有了卷积神经网络,为什么还要循环神经网络?原因很简单,在卷积神经网络中,有一个前提假设是:元素之间是相互独立的,输入与输出也是独立的,比如猫和狗。但现实世界中,很多元素都是相互连接的,比如股票随时间的变化,再比如一个人说了:我喜欢旅游,其中最喜欢的地方是云南,以后有机会一定要去。这里填空,人类应该都知道是填“云南”。因为人类会根据上下文的内容进行推断,但如何让机器做到这一步?RNN就应运而生了。RNN旨在让机器像人一样拥有记忆的能力。因此,RNN的输出就需要依赖当前的输入信息和历史的记忆信息。Since there are already convolutional neural networks, why do we need recurrent neural networks? The reason is simple. In the convolutional neural network, there is a premise that the elements are independent of each other, and the input and output are also independent, such as cats and dogs. But in the real world, many elements are interconnected, such as the change of stocks over time, or a person said: I like to travel, and my favorite place is Yunnan, and I must go there in the future. Fill in the blank here, humans should know that it is to fill in "Yunnan". Because humans will infer based on the content of the context, but how to make the machine do this? RNN came into being. RNN is designed to allow machines to have the ability to remember like humans. Therefore, the output of RNN needs to depend on the current input information and historical memory information.
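The dependence of the current output on both the current input and the remembered hidden state can be sketched as follows (tanh and the parameter names are illustrative assumptions; the same parameters are reused at every time step, which is the weight sharing mentioned above):

```python
import numpy as np

def rnn_step(x_t: np.ndarray, h_prev: np.ndarray, W_x, W_h, b) -> np.ndarray:
    """One recurrent step: the new hidden state depends on the current input
    and on the memory carried in the previous hidden state."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

def rnn_forward(inputs, h0: np.ndarray, W_x, W_h, b) -> np.ndarray:
    """Unroll over a sequence, reusing the same (shared) parameters each step."""
    h = h0
    for x_t in inputs:
        h = rnn_step(x_t, h, W_x, W_h, b)
    return h
```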
(5)损失函数(5) Loss function
在训练深度神经网络的过程中,因为希望深度神经网络的输出尽可能的接近真正想要预测的值,所以可以通过比较当前网络的预测值和真正想要的目标值,再根据两者之间的差异情况来更新每一层神经网络的权重向量(当然,在第一次更新之前通常会有化的过程,即为深度神经网络中的各层预先配置参数),比如,如果网络的预测值高了,就调整权重向量让它预测低一些,不断地调整,直到深度神经网络能够预测出真正想要的目标值或与 真正想要的目标值非常接近的值。因此,就需要预先定义“如何比较预测值和目标值之间的差异”,这便是损失函数(loss function)或目标函数(objective function),它们是用于衡量预测值和目标值的差异的重要方程。其中,以损失函数举例,损失函数的输出值(loss)越高表示差异越大,那么深度神经网络的训练就变成了尽可能缩小这个loss的过程。通常地,loss越小,该深度神经网络的训练质量越高,loss越大,深度神经网络的训练质量越低。类似的,loss波动越小,训练越稳定;loss波动越大,训练越不稳定。In the process of training the deep neural network, because it is hoped that the output of the deep neural network is as close as possible to the value you really want to predict, you can compare the predicted value of the current network with the target value you really want, and then according to the difference between the two to update the weight vector of each layer of neural network (of course, there is usually a process of optimization before the first update, which is to pre-configure parameters for each layer in the deep neural network), for example, if the predicted value of the network If it is high, adjust the weight vector to make it predict lower, and keep adjusting until the deep neural network can predict the real desired target value or a value very close to the real desired target value. Therefore, it is necessary to pre-define "how to compare the difference between the predicted value and the target value", which is the loss function (loss function) or objective function (objective function), which is used to measure the difference between the predicted value and the target value important equation. Among them, taking the loss function as an example, the higher the output value (loss) of the loss function, the greater the difference. Then the training of the deep neural network becomes a process of reducing the loss as much as possible. Generally, the smaller the loss, the higher the training quality of the deep neural network, and the larger the loss, the lower the training quality of the deep neural network. Similarly, the smaller the loss fluctuation, the more stable the training; the larger the loss fluctuation, the more unstable the training.
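As a simple example of such a loss function, the mean squared error between the prediction and the target can be written as follows (one possible choice among many; cross-entropy is common for classification):

```python
import numpy as np

def mse_loss(pred: np.ndarray, target: np.ndarray) -> float:
    """Mean squared error: the smaller the value, the closer the network's
    prediction is to the desired target value."""
    return float(np.mean((pred - target) ** 2))
```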
如图1所示,本申请实施例提供了一种系统架构100。在图1中,数据采集设备170用于采集训练数据。例如,针对本申请实施例的图像处理方法来说,训练数据可以包括训练图像以及训练图像对应的真值(ground truth)。例如,若视觉任务为图像分类任务,则训练图像对应的真值可以为训练图像对应的分类结果,训练图像的分类结果可以是人工预先标注的结果。As shown in FIG. 1 , the embodiment of the present application provides a system architecture 100 . In FIG. 1 , the data collection device 170 is used to collect training data. For example, for the image processing method of the embodiment of the present application, the training data may include training images and ground truths corresponding to the training images. For example, if the vision task is an image classification task, the ground truth value corresponding to the training image may be the classification result corresponding to the training image, and the classification result of the training image may be the result of manual pre-labeling.
在采集到训练数据之后,数据采集设备170将这些训练数据存入数据库130,训练设备120基于数据库130中维护的训练数据训练得到目标模型/规则101。该目标模型/规则101即为视觉任务所使用的模型。例如,视觉任务为图像分类任务,则该目标模型/规则101可以为用于图像分类的网络模型。After collecting the training data, the data collection device 170 stores the training data in the database 130 , and the training device 120 obtains the target model/rule 101 based on training data maintained in the database 130 . The target model/rule 101 is the model used by the vision task. For example, if the vision task is an image classification task, the target model/rule 101 may be a network model for image classification.
The following describes how the training device 120 obtains the target model/rule 101 based on the training data. The training device 120 processes the input raw data and compares the output value with the target value until the difference between the value output by the training device 120 and the target value is less than a certain threshold, thereby completing the training of the target model/rule 101.
The target model/rule 101 in the embodiment of the present application may specifically be a neural network model, for example, a convolutional neural network or a residual network. It should be noted that, in practical applications, the training data maintained in the database 130 is not necessarily all collected by the data collection device 170, and may also be received from other devices. In addition, it should be noted that the training device 120 does not necessarily train the target model/rule 101 entirely based on the training data maintained by the database 130, and may also obtain training data from the cloud or elsewhere for model training. The above description should not be construed as a limitation on the embodiments of the present application.
The target model/rule 101 obtained by training with the training device 120 may be applied to different systems or devices, for example, to the execution device 110 shown in FIG. 1. The execution device 110 may be a terminal, such as a mobile phone terminal, a tablet computer, a laptop computer, an augmented reality (AR)/virtual reality (VR) device, or a vehicle-mounted terminal, or may be a server or a cloud. In FIG. 1, the execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices. A user may input data to the I/O interface 112 through the client device 140. In this embodiment of the application, the input data may include data to be processed that is input by the client device. Exemplarily, the input data may include a raw image in this embodiment of the application.
预处理模块113用于根据I/O接口112接收到的输入图像进行预处理,在本申请实施例中,预处理模块113可以用于对输入图像进行一系列的图像信号处理。预处理模块113中可以包括一个或多个图像处理模块。The preprocessing module 113 is used to perform preprocessing according to the input image received by the I/O interface 112. In the embodiment of the present application, the preprocessing module 113 may be used to perform a series of image signal processing on the input image. The preprocessing module 113 may include one or more image processing modules.
When the execution device 110 preprocesses the input data, or when the calculation module 111 of the execution device 110 performs calculation or other related processing, the execution device 110 may call data, code, and the like in the data storage system 150 for the corresponding processing, and may also store the data, instructions, and the like obtained from the corresponding processing in the data storage system 150.
最后,I/O接口112将处理结果,如上述得到的数据的处理结果返回给客户设备140,从而提供给用户。Finally, the I/O interface 112 returns the processing result, such as the processing result of the data obtained above, to the client device 140, thereby providing it to the user.
值得说明的是,训练设备120可以针对不同的目标或不同的任务,基于不同的训练数据生成相应的目标模型/规则101,该相应的目标模型/规则101即可以用于实现上述目标或完成上述任务,从而为用户提供所需的结果。It is worth noting that the training device 120 can generate corresponding target models/rules 101 based on different training data for different goals or different tasks, and the corresponding target models/rules 101 can be used to achieve the above-mentioned goals or complete the above-mentioned task to provide the user with the desired result.
In the situation shown in FIG. 1, the user may manually specify the input data, and this manual specification may be performed through an interface provided by the I/O interface 112. In another case, the client device 140 may automatically send the input data to the I/O interface 112; if the user's authorization is required for the client device 140 to automatically send the input data, the user may set the corresponding permission in the client device 140. The user may view, on the client device 140, the result output by the execution device 110, and the specific presentation form may be display, sound, action, or another specific manner. The client device 140 may also serve as a data collection terminal that collects the input data fed into the I/O interface 112 and the output result of the I/O interface 112 shown in the figure as new sample data and stores them in the database 130. Certainly, the collection may also be performed without the client device 140; instead, the I/O interface 112 may directly store the input data fed into the I/O interface 112 and the output result of the I/O interface 112 shown in the figure as new sample data in the database 130.
It should be noted that FIG. 1 is merely a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationships between the devices, components, modules, and the like shown in the figure do not constitute any limitation. For example, in FIG. 1, the data storage system 150 is an external memory relative to the execution device 110; in other cases, the data storage system 150 may also be placed in the execution device 110.
As shown in FIG. 1, the target model/rule 101 is obtained through training by the training device 120. In this embodiment of the present application, the target model/rule 101 may be the neural network model in the present application; specifically, the neural network model in this embodiment of the present application may be a CNN, a residual network, or the like.
图像信号处理器对传感器获取的raw图像经过一系列的处理之后,输出可视化的图像。这些图像可以作为视觉任务的输入图像。具体地,在视觉任务中可以利用神经网络算法或者非神经网络算法对输入图像进行处理,得到视觉任务的相关结果。The image signal processor outputs a visualized image after a series of processing on the raw image acquired by the sensor. These images can be used as input images for vision tasks. Specifically, a neural network algorithm or a non-neural network algorithm may be used to process an input image in a visual task to obtain relevant results of the visual task.
FIG. 2 shows a schematic diagram of an overall processing flow of a vision task. A raw image is used as the input image, a series of image signal processing operations is performed on the input image, and an 8-bit visualized red green blue (RGB) image is output. The RGB image is then used as the input image of the vision task to obtain the processing result of the vision task. For example, as shown in FIG. 2, the image signal processing modules include a black level compensation module, a green balance module, a bad pixel correction module, a demosaic module, a Bayer denoise module, an auto white balance module, a color correction module, a gamma correction module, a denoise and sharpness module, and the like. An image signal processing module may use a non-neural-network algorithm or a neural network algorithm.
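For illustration only, the following Python sketch shows one way such a modular processing flow could be composed in software: a raw frame is passed through a chain of module functions in order and an 8-bit RGB image is produced. The module implementations (the black level value, the toy demosaic, the gamma value) are simplified assumptions and do not correspond to the actual algorithms of the modules named above.

```python
import numpy as np

def black_level_compensation(raw, black_level=64.0):
    # Subtract an assumed sensor black level and clip negative values.
    return np.clip(raw - black_level, 0.0, None)

def toy_demosaic(raw):
    # Simplified stand-in for demosaicing: replicate the raw plane to R, G, B.
    return np.stack([raw, raw, raw], axis=-1)

def gamma_correction(rgb, gamma=2.2):
    # Normalize to [0, 1] and apply gamma encoding.
    norm = rgb / max(float(rgb.max()), 1e-6)
    return norm ** (1.0 / gamma)

def run_pipeline(raw, modules):
    # Apply the image processing modules one after another, as in Fig. 2.
    image = raw
    for module in modules:
        image = module(image)
    return image

raw = np.random.randint(0, 1024, size=(8, 8)).astype(np.float32)   # toy 10-bit raw frame
rgb8 = (run_pipeline(raw, [black_level_compensation, toy_demosaic, gamma_correction]) * 255).astype(np.uint8)
print(rgb8.shape, rgb8.dtype)   # (8, 8, 3) uint8, i.e. an 8-bit RGB image
```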
视觉任务的输入图像通常为经过图像信号处理的RGB图像。传统的图像信号处理的目的通常是为了满足人的视觉需求,基于该图像执行视觉任务得到的结果不一定是最优的结果。The input images of vision tasks are usually RGB images after image signal processing. The purpose of traditional image signal processing is usually to meet human visual needs, and the results obtained by performing visual tasks based on the images are not necessarily optimal.
本申请实施例提供了一种图像处理方法,根据视觉任务的处理结果调整视觉任务之前的图像处理流程,以得到满足需要的图像处理流程。The embodiment of the present application provides an image processing method, which adjusts the image processing flow before the vision task according to the processing result of the vision task, so as to obtain an image processing flow that meets requirements.
下面结合图3至图6对本申请实施例中的图像处理方法进行详细的描述。The image processing method in the embodiment of the present application will be described in detail below with reference to FIG. 3 to FIG. 6 .
FIG. 3 shows an image processing method 300 provided by an embodiment of the present application. The method shown in FIG. 3 may be executed by a computing apparatus, which may be a cloud service device or a terminal device, for example, a computer, a server, a mobile phone, a camera, a vehicle, a drone, or a robot, or may be a system composed of a cloud service device and a terminal device.
示例性地,方法300可以由训练设备或推理设备执行,例如,方法300可以由CPU、GPU或NPU等加速器执行。进一步地,加速器芯片可以位于FPGA、芯片仿真器(Emulator)或开发板(evaluation board,EVB)上。Exemplarily, the method 300 may be executed by a training device or an inference device, for example, the method 300 may be executed by an accelerator such as a CPU, a GPU, or an NPU. Further, the accelerator chip may be located on an FPGA, a chip emulator (Emulator) or a development board (evaluation board, EVB).
或者,方法300可以由硬件装置(例如,摄像头或相机)的ISP流水线(pipeline)的调优工具或校准工具执行。Alternatively, the method 300 may be executed by a tuning tool or a calibration tool of an ISP pipeline (pipeline) of a hardware device (eg, a camera or a camera).
方法300包括步骤S301至步骤S304。下面对步骤S301至步骤S304进行详细介绍。The method 300 includes step S301 to step S304. Step S301 to step S304 will be described in detail below.
S301,获取第一图像。S301. Acquire a first image.
示例性地,第一图像可以为传感器获取的raw图。Exemplarily, the first image may be a raw image acquired by a sensor.
训练数据集中包括多个图像,第一图像为训练数据集中的任一图像。在实际应用中,可以基于训练数据集中的多个图像多次执行方法300,直至得到需要的图像处理模块。The training data set includes multiple images, and the first image is any image in the training data set. In practical applications, the method 300 may be executed multiple times based on multiple images in the training data set until the required image processing modules are obtained.
示例性地,训练数据集可以采用开源数据集。或者,训练数据集也可以是自行制作的数据集。Exemplarily, the training data set may use an open source data set. Alternatively, the training data set can also be a self-made data set.
Exemplarily, the training data set may be pre-stored. For example, the training data set may be the training data maintained in the database 130 shown in FIG. 1. Alternatively, the training data set may also be data input by a user.
S302,通过至少一个图像处理模块对第一图像进行处理,得到第二图像。S302. Process the first image by at least one image processing module to obtain a second image.
图像处理模块用于对输入图像进行图像信号处理。The image processing module is used for image signal processing on the input image.
Exemplarily, the at least one image processing module may be located on an image signal processor. That is to say, step S302 is executed by the image processing modules in the image signal processor.
该至少一个图像处理模块中的任一图像处理模块可以采用神经网络算法实现,或者,也可以采用非神经网络算法实现。本申请实施例对图像处理模块的具体实现方式不做限定。Any image processing module in the at least one image processing module may be implemented by a neural network algorithm, or may also be implemented by a non-neural network algorithm. The embodiment of the present application does not limit the specific implementation manner of the image processing module.
Optionally, the at least one image processing module may include: a black level compensation module, a green balance module, a bad pixel correction module, a demosaic module, a Bayer noise reduction module, an automatic white balance module, a color correction module, a gamma correction module, or a noise reduction and sharpening module.
例如,如图4所示,将raw图作为第一图像,该至少一个图像处理模块包括9个图像处理模块,分别为黑电平补偿模块、绿平衡模块、坏点修正模块、去马赛克模块、Bayer降噪模块、自动白平衡模块、色彩校正模块、伽马校正模块以及降噪和锐化模块。该9个图像处理模块依次执行黑电平补偿、绿平衡处理、坏点修正、去马赛克、Bayer降噪、自动白平衡处理、色彩校正、伽马校正以及降噪和锐化。For example, as shown in Figure 4, the raw image is used as the first image, and the at least one image processing module includes 9 image processing modules, which are respectively a black level compensation module, a green balance module, a bad pixel correction module, a demosaic module, Bayer noise reduction module, automatic white balance module, color correction module, gamma correction module, and noise reduction and sharpening module. The nine image processing modules sequentially perform black level compensation, green balance processing, dead pixel correction, demosaicing, Bayer noise reduction, automatic white balance processing, color correction, gamma correction, and noise reduction and sharpening.
示例性地,黑电平模块、绿平衡模块和坏点修正模块可以用于对raw数据进行处理。去马赛克模块和Bayer降噪模块可以用于执行去马赛克处理。自动白平衡模块、色彩校正模块、伽马校正模块以及降噪和锐化模块可以用于执行图像增强处理。Exemplarily, the black level module, the green balance module and the bad pixel correction module can be used to process the raw data. A demosaic module and a Bayer denoising module may be used to perform demosaic processing. An automatic white balance module, a color correction module, a gamma correction module, and a noise reduction and sharpening module can be used to perform image enhancement processing.
例如,如图4所示,第二图像可以为RGB图像。进一步地,第二图像可以为8bit的RGB图像。此处仅为示例,第二图像的类型也可以根据视觉任务模型的输入需要设置。For example, as shown in FIG. 4, the second image may be an RGB image. Further, the second image may be an 8-bit RGB image. This is only an example, and the type of the second image may also be set according to the input requirements of the visual task model.
可选地,步骤S302包括:通过至少一个图像处理模块和该至少一个图像处理模块的权重对第一图像进行处理,得到第二图像。Optionally, step S302 includes: processing the first image by using at least one image processing module and the weight of the at least one image processing module to obtain the second image.
具体地,根据该至少一个图像处理模块的权重对该至少一个图像处理模块的处理结果进行调整,得到第二图像。Specifically, the processing result of the at least one image processing module is adjusted according to the weight of the at least one image processing module to obtain the second image.
示例性地,图像处理模块对输入该模块的图像进行处理,可以为调整输入该模块的图 像的全部或部分像素的像素值,也就是使全部或部分像素的像素值发生变化。在该情况下,可以根据该图像处理模块的权重对全部或部分像素的像素值的变化量进行调整。Exemplarily, the image processing module processes the image input to the module, which may be to adjust the pixel values of all or part of the pixels of the image input to the module, that is, to change the pixel values of all or part of the pixels. In this case, the variation of the pixel values of all or some pixels may be adjusted according to the weight of the image processing module.
例如,将该图像处理模块的权重与像素值的变化量相乘,得到调整后的像素的变化量,进而得到该模块的输出图像。若该图像处理模块的权重为0,则相当于该图像处理模块没有参与图像处理流程。For example, the weight of the image processing module is multiplied by the variation of the pixel value to obtain the adjusted variation of the pixel, and then the output image of the module is obtained. If the weight of the image processing module is 0, it means that the image processing module does not participate in the image processing process.
权重的具体取值可以根据需要设定,例如,权重可以为大于或等于0,且小于等于1的实数。The specific value of the weight can be set as required, for example, the weight can be a real number greater than or equal to 0 and less than or equal to 1.
Further, when the weights are set, the weights of the at least one image processing module may be normalized, that is, the sum of the weights of the at least one image processing module is made equal to 1, or the sum of the weights of the at least one image processing module is made close to 1.
如图4所示,9个图像处理模块的权重分别为w1、w2、w3、w4、w5、w6、w7、w8和w9。权重的取值范围为大于或等于0,且小于等于1的实数。这样,w1、w2、w3、w4、w5、w6、w7、w8和w9的最大的可能的总和为9。或者,也可以对该9个权重进行归一化处理,这样可以使该9个权重的总和为1。As shown in Figure 4, the weights of the nine image processing modules are w1, w2, w3, w4, w5, w6, w7, w8 and w9, respectively. The value range of the weight is a real number greater than or equal to 0 and less than or equal to 1. Thus, the largest possible sum of w1, w2, w3, w4, w5, w6, w7, w8 and w9 is nine. Alternatively, the nine weights can also be normalized so that the sum of the nine weights can be 1.
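A minimal sketch of how such a per-module weight could act on the pixel-value change produced by a module, and of the optional normalization of the nine weights. The blending formula and the assumption that a module preserves the image shape are illustrative choices, not a definitive implementation.

```python
import numpy as np

def apply_weighted_module(image, module, weight):
    # The module proposes new pixel values; the weight scales the change it makes.
    # weight = 0 bypasses the module entirely, weight = 1 applies it fully.
    # This sketch assumes the module keeps the image shape unchanged.
    processed = module(image)
    return image + weight * (processed - image)

def normalize_weights(weights):
    # Rescale the weights so that their sum is (close to) 1.
    total = float(np.sum(weights))
    return [w / total for w in weights] if total > 0 else list(weights)

w = normalize_weights([0.9, 0.1, 0.8, 0.2, 0.3, 0.7, 0.5, 0.6, 0.4])  # w1..w9
print(round(sum(w), 6))  # 1.0
```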
S303,将第二图像输入至视觉任务模型中进行处理。S303. Input the second image into the visual task model for processing.
示例性地,视觉任务包括:目标检测、图像分类、目标分割、目标跟踪或图像识别等。Exemplarily, the vision task includes: target detection, image classification, target segmentation, target tracking, or image recognition.
视觉任务模型用于执行视觉任务。例如,视觉任务为目标检测,则视觉任务模型为目标检测模型。再如,视觉任务为图像识别,则视觉任务模型为图像识别模型。The visual task model is used to perform visual tasks. For example, if the vision task is target detection, then the vision task model is the target detection model. For another example, if the visual task is image recognition, then the visual task model is an image recognition model.
视觉任务模型可以为训练好的模型。The vision task model can be a trained model.
视觉任务模型的输出的类型与视觉任务的类型有关。视觉任务模型的输出即为视觉任务模型的推理结果。The type of output of the visual task model is related to the type of visual task. The output of the visual task model is the inference result of the visual task model.
例如,视觉任务为目标检测,则视觉任务模型的输出可以为第二图像上的目标框以及该目标框中的物体的类别。再如,视觉任务为图像分类,则视觉任务模型的输出可以为第二图像的类别。For example, if the vision task is target detection, the output of the vision task model may be a target frame on the second image and the category of the object in the target frame. For another example, if the visual task is image classification, the output of the visual task model may be the category of the second image.
视觉任务模型的处理结果可以包括视觉任务模型的性能指标。The processing results of the vision task model may include performance indicators of the vision task model.
示例性地,视觉任务模型的性能指标包括推理的准确度或损失函数的值等。损失函数可以根据需要设置。损失函数用于指示视觉任务模型的推理结果与第一图像对应的真值之间的差异。需要说明的是,此处的损失函数可以采用视觉任务模型训练过程中的损失函数,或者,也可以采用其他形式的损失函数。Exemplarily, the performance index of the vision task model includes the accuracy of reasoning or the value of the loss function. The loss function can be set as needed. The loss function is used to indicate the difference between the inference result of the vision task model and the true value corresponding to the first image. It should be noted that the loss function here may be the loss function in the training process of the vision task model, or other forms of loss functions may also be used.
例如,视觉任务为目标检测,则视觉任务模型的处理结果可以包括检测准确度。For example, if the vision task is target detection, the processing result of the vision task model may include detection accuracy.
将第二图像输入视觉任务模型中进行处理,将得到的检测结果与第一图像对应的真值比较,得到两者之间的误差,根据两者之间的误差确定检测准确度。Input the second image into the visual task model for processing, compare the obtained detection result with the corresponding true value of the first image, obtain the error between the two, and determine the detection accuracy according to the error between the two.
再如,视觉任务为目标分割,则视觉任务模型的处理结果可以包括分割准确度。For another example, if the vision task is target segmentation, the processing result of the vision task model may include segmentation accuracy.
将第二图像输入视觉任务模型中进行处理,将得到的分割结果与第一图像对应的真值比较,得到两者之间的误差,根据两者之间的误差确定分割准确度。Input the second image into the visual task model for processing, compare the obtained segmentation result with the corresponding true value of the first image, obtain the error between the two, and determine the segmentation accuracy according to the error between the two.
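As a concrete illustration of comparing the model output with the ground truth of the first image, the sketch below computes a toy classification accuracy and an intersection-over-union score for segmentation. The specific metrics are assumptions chosen for illustration and are not mandated by the application.

```python
import numpy as np

def classification_accuracy(predictions, ground_truth):
    # Fraction of samples whose predicted class equals the labelled class.
    predictions = np.asarray(predictions)
    ground_truth = np.asarray(ground_truth)
    return float((predictions == ground_truth).mean())

def segmentation_iou(pred_mask, gt_mask):
    # Intersection-over-union between the predicted and labelled masks,
    # used here as a simple proxy for segmentation accuracy.
    pred_mask = np.asarray(pred_mask, dtype=bool)
    gt_mask = np.asarray(gt_mask, dtype=bool)
    union = np.logical_or(pred_mask, gt_mask).sum()
    if union == 0:
        return 1.0
    return float(np.logical_and(pred_mask, gt_mask).sum() / union)

print(classification_accuracy([1, 0, 2, 1], [1, 0, 1, 1]))  # 0.75
```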
视觉任务模型可以采用神经网络模型,或者,也可以采用非神经网络模型。神经网络模型可以是现有的神经网络模型,例如,残差网络。或者,该神经网络模型也可以是自行构建的其他结构的神经网络模型。本申请实施例对此不作限定。The visual task model may use a neural network model, or may also use a non-neural network model. The neural network model may be an existing neural network model, for example, a residual network. Alternatively, the neural network model may also be a neural network model of other structures constructed by itself. This embodiment of the present application does not limit it.
需要说明的是,对于相同的视觉任务,在不同的应用场景下,可能采用不同的视觉任 务模型。例如,对于驾驶场景中的目标检测任务,在曝光过度和曝光不足的情况下采用的视觉任务模型可能是相同的,也可能是不同的。在驾驶的过程中,若当前场景被识别为曝光过度,可以采用第一目标检测模型,若当前场景被识别为曝光不足,可以采用第二目标检测模型。第一目标检测模型和第二目标检测模型为不同的目标检测模型。It should be noted that for the same visual task, different visual task models may be used in different application scenarios. For example, for an object detection task in a driving scene, the visual task model employed may or may not be the same in overexposed and underexposed situations. During driving, if the current scene is recognized as overexposed, the first object detection model may be used, and if the current scene is recognized as underexposed, the second object detection model may be used. The first target detection model and the second target detection model are different target detection models.
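The following sketch illustrates one possible way to switch between such models at run time based on a crude exposure estimate; the mean-brightness heuristic and the threshold values are assumptions made purely for illustration.

```python
def select_detection_model(image, first_model, second_model, default_model,
                           over_threshold=200.0, under_threshold=60.0):
    # Pick the detection model according to an assumed exposure estimate:
    # very bright frames go to the model for overexposure, very dark frames
    # to the model for underexposure, everything else to a default model.
    mean_level = float(image.mean())
    if mean_level >= over_threshold:
        return first_model      # scene identified as overexposed
    if mean_level <= under_threshold:
        return second_model     # scene identified as underexposed
    return default_model
```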
示例性地,视觉任务模型的处理过程可以由图1中的计算模块111执行。Exemplarily, the processing of the visual task model can be executed by the calculation module 111 in FIG. 1 .
视觉任务模型可以部署于方法300的执行设备上,也可以部署于其他设备上。也就是说,视觉任务模型的处理过程可以由方法300的执行设备执行,也可以由其他设备执行,并将处理结果反馈至方法300的执行设备上。The vision task model can be deployed on the execution device of the method 300, or can be deployed on other devices. That is to say, the processing of the visual task model can be executed by the executing device of the method 300 or by other devices, and the processing result can be fed back to the executing device of the method 300 .
S304,根据视觉任务模型的处理结果调整该至少一个图像处理模块。S304. Adjust the at least one image processing module according to the processing result of the visual task model.
根据视觉任务模型的处理结果调整该至少一个图像处理模块,以使视觉任务模型的处理结果尽可能接近预期。The at least one image processing module is adjusted according to the processing result of the visual task model, so that the processing result of the visual task model is as close to expectation as possible.
或者说,根据视觉任务模型的性能指标调整该至少一个图像处理模块,以提高视觉任务模型的性能。In other words, the at least one image processing module is adjusted according to the performance index of the visual task model, so as to improve the performance of the visual task model.
例如,视觉任务模型的性能指标为视觉任务模型的推理的准确度,则调整该至少一个图像处理模块,以提高模型的推理的准确度。For example, if the performance index of the visual task model is the accuracy of inference of the visual task model, the at least one image processing module is adjusted to improve the accuracy of inference of the model.
再如,视觉任务模型的性能指标为视觉任务模型的损失函数的值,则调整该至少一个图像处理模块,以减少视觉任务模型的损失函数的值。For another example, if the performance index of the visual task model is the value of the loss function of the visual task model, the at least one image processing module is adjusted to reduce the value of the loss function of the visual task model.
在实际应用中可以基于训练数据集中多张图像执行方法300,直至满足预设条件。也就是说在实际应用中可以基于多张图像不断迭代调整图像处理模块。每一次迭代过程中采用的图像处理模块为上一次迭代后得到的图像处理模块。In practical applications, the method 300 may be executed based on multiple images in the training data set until a preset condition is met. That is to say, in practical applications, the image processing module can be adjusted iteratively based on multiple images. The image processing module used in each iteration is the image processing module obtained after the previous iteration.
预设条件可以根据需要设置,后文中会在方式1、方式2、方式3和方式4中举例说明。The preset conditions can be set as required, and examples will be given in Mode 1, Mode 2, Mode 3, and Mode 4 below.
进一步地,还可以根据图像处理的时间和视觉任务模型的处理结果调整该至少一个图像处理模块。Further, the at least one image processing module can also be adjusted according to the image processing time and the processing result of the visual task model.
The image processing time may be the processing time of the visual task model, or may be the processing time of the at least one image processing module, or may be the sum of the processing time of the visual task model and the processing time of the at least one image processing module.
这样,能够在保证视觉任务模型的性能的前提下,提高处理速度,降低时延。In this way, the processing speed can be improved and the time delay can be reduced under the premise of ensuring the performance of the visual task model.
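A minimal sketch of how the processing time and the processing result could be combined into a single score for adjusting the modules. The linear accuracy-minus-latency trade-off and the placeholder pipeline and model callables are assumptions, not the method required by the application.

```python
import time

def score_pipeline(pipeline, vision_model, image, ground_truth, time_penalty=0.1):
    # Score a candidate image processing flow by the vision-task result minus a
    # penalty on the total processing time (pipeline plus model inference).
    start = time.perf_counter()
    processed = pipeline(image)            # placeholder: the at least one image processing module
    prediction = vision_model(processed)   # placeholder: the vision task model
    elapsed = time.perf_counter() - start
    accuracy = 1.0 if prediction == ground_truth else 0.0   # toy per-image accuracy
    return accuracy - time_penalty * elapsed
```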
在本申请实施例的方案中,根据视觉任务模型的处理结果调整图像处理流程,有利于得到适合视觉任务的图像,以保证视觉任务模型的性能。In the solution of the embodiment of the present application, the image processing flow is adjusted according to the processing result of the visual task model, which is beneficial to obtain an image suitable for the visual task, so as to ensure the performance of the visual task model.
本申请实施例的方案能够根据不同的应用场景的需求调整图像处理流程,以适应不同的应用场景。The solutions of the embodiments of the present application can adjust the image processing flow according to the requirements of different application scenarios, so as to adapt to different application scenarios.
The same vision task may use different vision task models in different application scenarios. For example, for an object detection task in a driving scenario, the vision task models used in overexposed and underexposed conditions may be the same or may be different. During driving, if the current scene is identified as overexposed, the first object detection model may be used as the vision task model; if the current scene is identified as underexposed, the second object detection model may be used as the vision task model. The solution of the embodiment of the present application can adjust the image processing flow separately according to the processing results of the first object detection model and the second object detection model, so as to obtain an image processing flow suitable for the first object detection model and an image processing flow suitable for the second object detection model.
步骤S304可以采用多种方式实现,下面以其中四种方式(方式1、方式2、方式3和方式4)为例进行说明。Step S304 can be implemented in various ways, and the following four ways (mode 1, mode 2, mode 3 and mode 4) are taken as examples for illustration.
Mode 1
可选地,该至少一个图像处理模块包括多个图像处理模块,步骤S304包括:根据视觉任务模型的处理结果调整该多个图像处理模块的权重。Optionally, the at least one image processing module includes a plurality of image processing modules, and step S304 includes: adjusting weights of the plurality of image processing modules according to a processing result of the visual task model.
根据视觉任务模型的处理结果调整该多个图像处理模块的权重,以提高视觉任务模型的性能。The weights of the plurality of image processing modules are adjusted according to the processing results of the visual task model, so as to improve the performance of the visual task model.
如前所述,实际应用中可以基于训练数据集中多张图像执行方法300以实现对该多个图像处理模块的权重进行迭代调整,直至满足预设条件。满足预设条件后停止调整该多个图像处理模块的权重,或者说,停止刷新该多个图像处理模块的权重。As mentioned above, in practical applications, the method 300 can be executed based on multiple images in the training data set to implement iterative adjustment of the weights of the multiple image processing modules until the preset conditions are met. Stop adjusting the weights of the plurality of image processing modules after the preset condition is met, or stop refreshing the weights of the plurality of image processing modules.
示例性地,预设条件可以为该多个图像处理模块的权重收敛。Exemplarily, the preset condition may be that the weights of the plurality of image processing modules converge.
在该多个图像处理模块的权重收敛的情况下,不再执行方法300,即停止调整该多个图像处理模块的权重。权重收敛也可以理解为在连续执行多次方法300后得到的权重梯度变化较小。例如,连续执行多次方法300后得到的权重梯度的变化量小于或等于第一阈值的情况下,停止调整该多个图像处理模块的权重。When the weights of the plurality of image processing modules converge, the method 300 is not executed any more, that is, the adjustment of the weights of the plurality of image processing modules is stopped. The weight convergence can also be understood as the weight gradient obtained after performing the method 300 multiple times continuously has a small change. For example, when the change amount of the weight gradient obtained after performing the method 300 for multiple times is less than or equal to the first threshold, stop adjusting the weights of the multiple image processing modules.
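A small sketch of one possible convergence test: stop refreshing the weights once the largest change between two consecutive iterations falls below a threshold. Using the raw weight change instead of a weight gradient, and the threshold value itself, are assumptions for illustration.

```python
def weights_converged(previous_weights, current_weights, first_threshold=1e-3):
    # Compare the weights from two consecutive executions of method 300 and
    # report convergence when no module's weight changed by more than the threshold.
    deltas = [abs(c - p) for p, c in zip(previous_weights, current_weights)]
    return max(deltas) <= first_threshold
```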
可替换地,预设条件可以为视觉任务模型的准确度大于或等于第二阈值。Alternatively, the preset condition may be that the accuracy of the visual task model is greater than or equal to the second threshold.
在视觉任务模型的准确度大于或等于第二阈值的情况下,不再执行方法300,即停止调整该多个图像处理模块的权重。In the case that the accuracy of the visual task model is greater than or equal to the second threshold, the method 300 is not executed, that is, the adjustment of the weights of the plurality of image processing modules is stopped.
第二阈值可以为预设值。或者,第二阈值可以是在没有设置图像处理模块的权重的情况下得到的视觉任务模型的推理的准确度。例如,如图4所示,第二阈值可以在该9个图像处理模块没有设置权重的情况下的视觉任务模型的推理的准确度。或者可以理解为,第二阈值可以为在该9个图像处理模块的权重为1的情况下的视觉任务模型的推理的准确度。The second threshold may be a preset value. Alternatively, the second threshold may be the inference accuracy of the visual task model obtained without setting the weight of the image processing module. For example, as shown in FIG. 4 , the second threshold may be the inference accuracy of the visual task model when no weight is set for the nine image processing modules. Or it can be understood that the second threshold may be the inference accuracy of the visual task model when the weight of the nine image processing modules is 1.
That is to say, an image is input into the original image processing modules for processing, the processed image is input into the vision task model for processing, the inference accuracy is calculated, and that accuracy is used as the second threshold. When the method 300 is executed, an image is input into the image processing modules whose weights have currently been adjusted, the processed image is input into the vision task model for processing, the inference accuracy is calculated, and the currently obtained inference accuracy is compared with the second threshold; when the currently obtained inference accuracy is greater than or equal to the second threshold, the method 300 is no longer executed. In this way, processing images with the adjusted image processing modules can ensure the performance of the vision task model, or can improve the performance of the vision task model.
可替换地,预设条件可以为连续执行多次方法300后得到的视觉任务模型的损失函数值的变化量小于或等于第三阈值。Alternatively, the preset condition may be that the change amount of the loss function value of the visual task model obtained after performing the method 300 for multiple times is less than or equal to the third threshold.
也就是说,在视觉任务模型的损失函数值的变化趋于稳定的情况下,不再执行方法300。That is to say, when the change of the loss function value of the vision task model tends to be stable, the method 300 is not executed any more.
可替换地,预设条件可以为迭代次数大于或等于第四阈值。Alternatively, the preset condition may be that the number of iterations is greater than or equal to the fourth threshold.
也就是说,在执行方法300的次数大于或等于第四阈值的情况下,不再执行方法300。That is to say, in the case that the number of times the method 300 is executed is greater than or equal to the fourth threshold, the method 300 is not executed any more.
应理解,上述预设条件可以结合使用。例如,预设条件可以为视觉任务模型的准确度大于或等于第二阈值,且迭代次数大于或等于第四阈值。再如,预设条件可以为该多个图像处理模块的权重收敛,且视觉任务模型的准确度大于或等于第二阈值。It should be understood that the above preset conditions may be used in combination. For example, the preset condition may be that the accuracy of the visual task model is greater than or equal to the second threshold, and the number of iterations is greater than or equal to the fourth threshold. For another example, the preset condition may be that the weights of the plurality of image processing modules converge, and the accuracy of the visual task model is greater than or equal to the second threshold.
应理解,以上仅为示例,预设条件还可以为其他形式的条件,本申请对此不做限定。It should be understood that the above are examples only, and the preset conditions may also be conditions in other forms, which are not limited in the present application.
示例性地,可以采用贝叶斯优化方法、RNN模型或强化学习算法等方式调整该多个图像处理模块的权重。Exemplarily, the weights of the plurality of image processing modules may be adjusted by means of Bayesian optimization method, RNN model, or reinforcement learning algorithm.
下面以贝叶斯优化方法为例进行说明。The Bayesian optimization method is taken as an example to illustrate below.
例如,视觉任务模型为目标检测模型,视觉任务模型的性能指标可以为平均准确度(mean average precision,mAP)。通过贝叶斯优化方法调整该多个图像处理模块的权重,以提高目标检测模型的mAP。或者说,以目标检测模型的mAP最大化为目标调整该多个图像处理模块的权重。For example, the vision task model is a target detection model, and the performance index of the vision task model may be mean average precision (mAP). The weights of the multiple image processing modules are adjusted by a Bayesian optimization method to improve the mAP of the object detection model. In other words, the weights of the multiple image processing modules are adjusted with the goal of maximizing the mAP of the target detection model.
平均准确度指的是对于所有目标物体的检测准确度的平均值。The average accuracy refers to the average of the detection accuracies for all target objects.
将训练数据集中的图像输入目标检测模型中,得到该图像的检测准确度。将该图像的检测准确度输入贝叶斯优化模型中,贝叶斯优化模型调整各个图像处理模块的权重。Input the image in the training data set into the target detection model to obtain the detection accuracy of the image. The detection accuracy of the image is input into the Bayesian optimization model, and the Bayesian optimization model adjusts the weight of each image processing module.
进一步地,该图像的检测准确度可以保留在贝叶斯优化模型中。也就是说,当训练数据集中的其他图像输入至目标检测模型中,得到其他图像的检测准确度。贝叶斯优化模型可以根据其他图像的检测准确度以及之前的图像的检测准确度调整各个图像处理模块的权重。Further, the detection accuracy of the image can be preserved in the Bayesian optimization model. That is to say, when other images in the training data set are input into the target detection model, the detection accuracy of other images is obtained. The Bayesian optimization model can adjust the weight of each image processing module according to the detection accuracy of other images and the detection accuracy of previous images.
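A minimal sketch of such a weight search, assuming the open-source scikit-optimize package for Bayesian optimisation; the run_pipeline and detection_map functions below are toy stand-ins for the weighted processing flow of Fig. 4 and for the detector's mAP evaluation, not real implementations.

```python
import numpy as np
from skopt import gp_minimize          # assumes scikit-optimize is installed
from skopt.space import Real

def run_pipeline(raw, weights):
    # Toy stand-in for processing a raw image with the nine weighted modules.
    return raw * (0.5 + 0.5 * float(np.mean(weights)))

def detection_map(images, ground_truths):
    # Toy stand-in for the target detection model's mean average precision (mAP).
    return float(np.mean([1.0 - abs(float(img.mean()) - gt)
                          for img, gt in zip(images, ground_truths)]))

training_raw_images = [np.random.rand(8, 8) for _ in range(4)]
ground_truths = [0.5, 0.4, 0.6, 0.5]               # toy labels

def objective(weights):
    # gp_minimize minimises, so return the negative mAP of the detector.
    images = [run_pipeline(raw, weights) for raw in training_raw_images]
    return -detection_map(images, ground_truths)

search_space = [Real(0.0, 1.0, name=f"w{i}") for i in range(1, 10)]   # weights w1..w9
result = gp_minimize(objective, search_space, n_calls=20, random_state=0)
print("best weights:", [round(w, 2) for w in result.x])
```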
需要说明的是本申请实施例中的训练数据集用于训练各个图像处理模块,与视觉任务模型的训练数据集可以相同也可以不同。例如,本申请实施例中的训练数据集可以采用视觉任务模型的验证数据集或测试数据集等。It should be noted that the training data set in the embodiment of the present application is used to train each image processing module, which may be the same as or different from the training data set of the vision task model. For example, the training data set in the embodiment of the present application may use a verification data set or a test data set of the vision task model.
在本申请实施例的方案中,根据视觉任务模型的处理结果评估图像处理模块的权重,进而调整图像处理模块的权重,以增加与视觉任务模型的性能相关性较强的图像处理模块的权重,减少与视觉任务模型的性能相关性较弱的图像处理模块的权重,这样能够获得更适合视觉任务的图像处理流程,有利于提高视觉任务模型的性能。In the solution of the embodiment of the present application, the weight of the image processing module is evaluated according to the processing results of the visual task model, and then the weight of the image processing module is adjusted to increase the weight of the image processing module that has a strong correlation with the performance of the visual task model. Reducing the weight of image processing modules that are less correlated with the performance of the vision task model can obtain an image processing flow that is more suitable for the vision task, and is conducive to improving the performance of the vision task model.
Mode 2
可选地,步骤S304包括:根据视觉任务模型的处理结果更改该至少一个图像处理模块。Optionally, step S304 includes: modifying the at least one image processing module according to a processing result of the visual task model.
更改该至少一个图像处理模块,可以包括:删除该至少一个图像处理模块中的部分图像处理模块或/和增加其他图像处理模块。Changing the at least one image processing module may include: deleting some image processing modules in the at least one image processing module or/and adding other image processing modules.
In a possible implementation, step S304 may be: selecting a combination of image processing modules from a plurality of candidate image processing modules according to the processing result of the vision task model, and replacing the at least one image processing module with the selected combination of image processing modules.
示例性地,可以采用贝叶斯优化方法或强化学习算法等方式更改该至少一个图像处理模块。Exemplarily, the at least one image processing module can be changed by means of Bayesian optimization method or reinforcement learning algorithm.
如前所述,实际应用中可以基于训练数据集中多张图像执行方法300以实现对该多个图像处理模块的组合进行迭代调整,直至满足预设条件。满足预设条件后停止调整该多个图像处理模块的组合,或者说,停止刷新该多个图像处理模块的组合。As mentioned above, in practical applications, the method 300 can be executed based on multiple images in the training data set to realize iterative adjustment of the combination of the multiple image processing modules until the preset conditions are met. Stop adjusting the combination of the multiple image processing modules after the preset condition is met, or stop refreshing the combination of the multiple image processing modules.
示例性地,预设条件可以为迭代次数大于或等于第四阈值。Exemplarily, the preset condition may be that the number of iterations is greater than or equal to the fourth threshold.
在执行方法300的次数大于或等于第四阈值的情况下,不再执行方法300,即停止调整该图像处理模块的组合。In a case where the number of executions of the method 300 is greater than or equal to the fourth threshold, the execution of the method 300 is stopped, that is, the adjustment of the combination of the image processing modules is stopped.
应理解,此处仅为示例,其他的预设条件的设置可以参考方式1,此处不再赘述。It should be understood that this is only an example, and other preset conditions can be set with reference to mode 1, which will not be repeated here.
在本申请实施例的方案中,根据视觉任务模型的处理结果更改图像处理模块的组合,能够获得更适合视觉任务模型的图像处理模块的组合,有利于提高视觉任务模型的性能。In the solution of the embodiment of the present application, the combination of image processing modules is changed according to the processing results of the visual task model, so that a combination of image processing modules more suitable for the visual task model can be obtained, which is conducive to improving the performance of the visual task model.
该至少一个图像处理模块包括多个图像处理模块,步骤S304包括:根据视觉任务模型的处理结果从该多个图像处理模块中删除部分图像处理模块。The at least one image processing module includes a plurality of image processing modules, and step S304 includes: deleting part of the image processing modules from the plurality of image processing modules according to the processing result of the visual task model.
In a possible implementation, Mode 2 may use the processing result of Mode 1.
可选地,步骤S304包括:根据视觉任务模型的处理结果调整该多个图像处理模块的权重;根据调整后的该多个图像处理模块的权重从该多个图像处理模块中删除部分图像处理模块。Optionally, step S304 includes: adjusting the weights of the multiple image processing modules according to the processing results of the visual task model; deleting part of the image processing modules from the multiple image processing modules according to the adjusted weights of the multiple image processing modules .
示例性地,该多个图像处理模块为m个图像处理模块。m为大于1的整数。从该m个图像处理模块中删除调整后的权重最小的n个图像处理模块。n为大于1且小于m的整数。Exemplarily, the multiple image processing modules are m image processing modules. m is an integer greater than 1. The n image processing modules with the smallest adjusted weights are deleted from the m image processing modules. n is an integer greater than 1 and less than m.
可替换地,从该m个图像处理模块中删除调整后的权重小于或等于权重阈值的图像处理模块。Alternatively, an image processing module whose adjusted weight is less than or equal to a weight threshold is deleted from the m image processing modules.
For example, among the nine image processing modules shown in FIG. 4, if the weights corresponding to the green balance module, the bad pixel correction module, the Bayer noise reduction module, the color correction module, and the noise reduction and sharpening module are less than or equal to the weight threshold, these five modules are deleted.
在本申请实施例的方案中,根据视觉任务模型的处理结果删除部分图像处理模块,能够减少图像处理所需的时间,提高处理速度,减少对计算力的要求。In the solution of the embodiment of the present application, some image processing modules are deleted according to the processing results of the visual task model, which can reduce the time required for image processing, increase the processing speed, and reduce the requirement for computing power.
In addition, an image processing module with a higher weight has a stronger correlation with the vision task model; in other words, an image processing module with a higher weight has a greater influence on the performance of the vision task model. In the solution of the embodiment of the present application, the image processing modules to be deleted are determined according to the weights of the image processing modules, and the image processing modules with relatively small weights are deleted. This has little impact on the processing result of the vision task model, and the performance of the vision task model is affected only slightly after the deletion. That is to say, the solution of the embodiment of the present application can reduce unnecessary operations, reduce computing overhead, and increase the processing speed while ensuring the performance of the vision task model.
可选地,步骤S304包括:根据视觉任务模型的处理结果和该多个图像处理模块的处理速度从该多个图像处理模块中删除部分图像处理模块。Optionally, step S304 includes: deleting part of the image processing modules from the plurality of image processing modules according to the processing result of the visual task model and the processing speed of the plurality of image processing modules.
示例性地,根据调整后的该多个图像处理模块的权重和该多个图像处理模块的处理速度从该多个图像处理模块中删除部分图像处理模块。Exemplarily, some image processing modules are deleted from the plurality of image processing modules according to the adjusted weights of the plurality of image processing modules and the processing speeds of the plurality of image processing modules.
例如,从该多个图像处理模块中删除调整后的权重小于或等于权重阈值、且处理速度小于或等于速度阈值的图像处理模块。也就是说,删除处理速度较慢,且对视觉任务模型的影响较小的图像处理模块。这样,能进一步提高图像处理的速度。For example, an image processing module whose adjusted weight is less than or equal to a weight threshold and whose processing speed is less than or equal to a speed threshold is deleted from the plurality of image processing modules. That is, image processing modules that have slower processing speed and have less impact on the vision task model are deleted. In this way, the speed of image processing can be further increased.
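The following sketch illustrates such pruning: modules whose adjusted weight falls at or below a weight threshold are removed, and, when a speed threshold is also supplied, only modules that are additionally slow are removed. The threshold values and the convention that a lower speed value means slower processing are assumptions for illustration.

```python
def prune_modules(modules, weights, speeds, weight_threshold=0.05, speed_threshold=None):
    # Return the image processing modules to keep in the flow.
    kept = []
    for module, weight, speed in zip(modules, weights, speeds):
        low_weight = weight <= weight_threshold
        slow = speed_threshold is not None and speed <= speed_threshold
        if low_weight and (speed_threshold is None or slow):
            continue   # delete: low weight (and, if requested, also slow)
        kept.append(module)
    return kept

# Example: nine module names with assumed adjusted weights and relative speeds.
names = ["blc", "green_balance", "bad_pixel", "demosaic", "bayer_denoise",
         "awb", "color_correction", "gamma", "denoise_sharpen"]
weights = [0.20, 0.02, 0.03, 0.25, 0.04, 0.18, 0.03, 0.15, 0.04]
speeds = [0.9, 0.8, 0.3, 0.2, 0.1, 0.9, 0.4, 0.9, 0.1]
print(prune_modules(names, weights, speeds))                       # prune by weight only
print(prune_modules(names, weights, speeds, speed_threshold=0.5))  # prune by weight and speed
```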
Mode 3
该至少一个图像处理模块包括多个图像处理模块,步骤S304包括:根据视觉任务模型的处理结果调整该多个图像处理模块的处理顺序。The at least one image processing module includes a plurality of image processing modules, and step S304 includes: adjusting the processing order of the plurality of image processing modules according to the processing results of the visual task model.
根据视觉任务模型的处理结果调整该多个图像处理模块的处理顺序,以提高视觉任务模型的性能。The processing order of the plurality of image processing modules is adjusted according to the processing results of the visual task model, so as to improve the performance of the visual task model.
示例性地,可以采用贝叶斯优化方法、RNN模型或强化学习算法等方式调整该多个图像处理模块的处理顺序。Exemplarily, the processing order of the plurality of image processing modules may be adjusted by means of Bayesian optimization method, RNN model, or reinforcement learning algorithm.
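As an illustration, the sketch below searches over candidate processing orders by random sampling and keeps the order with the best vision-task score; random shuffling is used here only as a simple stand-in for the Bayesian optimisation, RNN or reinforcement learning methods mentioned above, and the scoring callable is an assumption.

```python
import random

def search_processing_order(modules, evaluate_order, n_candidates=20, seed=0):
    # Try random orderings of the image processing modules and keep the one
    # that gives the highest score from the vision task model.
    rng = random.Random(seed)
    best_order, best_score = list(modules), evaluate_order(list(modules))
    for _ in range(n_candidates):
        candidate = list(modules)
        rng.shuffle(candidate)
        score = evaluate_order(candidate)
        if score > best_score:
            best_order, best_score = candidate, score
    return best_order, best_score
```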
如前所述,实际应用中可以基于训练数据集中多张图像执行方法300,直至满足预设 条件。满足预设条件后停止调整该多个图像处理模块的处理顺序,或者说,停止刷新该多个图像处理模块的处理顺序。As mentioned above, in practical applications, the method 300 can be executed based on multiple images in the training data set until the preset conditions are met. Stop adjusting the processing sequence of the multiple image processing modules after the preset condition is satisfied, or stop refreshing the processing sequence of the multiple image processing modules.
示例性地,预设条件可以为该多个图像处理模块的处理顺序的变化量小于或等于第五阈值。Exemplarily, the preset condition may be that the variation of the processing sequence of the plurality of image processing modules is less than or equal to the fifth threshold.
例如,该多个图像处理模块的处理顺序的变化量可以为,在执行方法300后处理顺序发生变化的图像处理模块的数量。For example, the amount of change in the processing order of the plurality of image processing modules may be the number of image processing modules whose processing order changes after the method 300 is executed.
可替换地,预设条件可以为视觉任务模型的推理的准确度大于或等于第六阈值。Alternatively, the preset condition may be that the inference accuracy of the visual task model is greater than or equal to the sixth threshold.
在视觉任务模型的推理的准确度大于或等于第六阈值的情况下,不再执行方法300,即停止调整该多个图像处理模块的处理顺序。In a case where the inference accuracy of the visual task model is greater than or equal to the sixth threshold, the method 300 is not executed again, that is, the adjustment of the processing sequence of the plurality of image processing modules is stopped.
第六阈值可以为预设值。或者,第六阈值可以是在没有调整图像处理模块的处理顺序的情况下,视觉任务模型的推理的准确度。例如,如图4所示,第六阈值可以为按照如图4所示的图像处理模块的处理顺序对图像进行处理的情况下,视觉任务模型的推理的准确度。The sixth threshold may be a preset value. Alternatively, the sixth threshold may be the inference accuracy of the visual task model without adjusting the processing sequence of the image processing module. For example, as shown in FIG. 4 , the sixth threshold may be the inference accuracy of the visual task model when images are processed according to the processing order of the image processing module shown in FIG. 4 .
That is to say, an image is input into the original image processing modules and processed in the original processing order, the processed image is input into the vision task model for processing, the inference accuracy is calculated, and that accuracy is used as the sixth threshold. An image is then input into the image processing modules whose processing order has currently been adjusted, the processed image is input into the vision task model for processing, the inference accuracy is calculated, and the currently obtained inference accuracy is compared with the sixth threshold; when the currently obtained inference accuracy is greater than or equal to the sixth threshold, the method 300 is no longer executed. In this way, processing images in the adjusted processing order of the image processing modules can ensure the performance of the vision task model, or can improve the performance of the vision task model.
可替换地,预设条件可以为连续执行多次方法300后得到的视觉任务模型的损失函数值的变化量小于或等于第三阈值。Alternatively, the preset condition may be that the change amount of the loss function value of the visual task model obtained after performing the method 300 for multiple times is less than or equal to the third threshold.
也就是说,在视觉任务模型的损失函数值的变化趋于稳定的情况下,不再执行方法300。That is to say, when the change of the loss function value of the vision task model tends to be stable, the method 300 is not executed any more.
可替换地,预设条件可以为迭代次数大于或等于第四阈值。Alternatively, the preset condition may be that the number of iterations is greater than or equal to the fourth threshold.
也就是说,在执行方法300的次数大于或等于第四阈值的情况下,不再执行方法300。That is to say, in the case that the number of times the method 300 is executed is greater than or equal to the fourth threshold, the method 300 is not executed any more.
可替换地,上述预设条件可以结合使用。例如,预设条件可以为视觉任务模型的推理的准确度大于或等于第六阈值,且迭代次数大于或等于第四阈值。再如,预设条件可以为该多个图像处理模块的处理顺序的变化量小于或等于第五阈值,且视觉任务模型的准确度大于或等于第六阈值。Alternatively, the above preset conditions may be used in combination. For example, the preset condition may be that the inference accuracy of the visual task model is greater than or equal to the sixth threshold, and the number of iterations is greater than or equal to the fourth threshold. For another example, the preset condition may be that the variation of the processing order of the plurality of image processing modules is less than or equal to the fifth threshold, and the accuracy of the visual task model is greater than or equal to the sixth threshold.
应理解,以上仅为示例,预设条件还可以为其他形式的条件,本申请对此不做限定。It should be understood that the above are examples only, and the preset conditions may also be conditions in other forms, which are not limited in the present application.
在本申请实施例的方案中,根据视觉任务模型的处理结果调整图像处理模块的处理顺序,能够获得更适合视觉任务的图像处理流程,有利于提高视觉任务的准确度。In the solution of the embodiment of the present application, the processing sequence of the image processing module is adjusted according to the processing result of the visual task model, so that an image processing flow more suitable for the visual task can be obtained, which is conducive to improving the accuracy of the visual task.
Mode 4
可选地,步骤S304包括:根据视觉任务模型的处理结果调整该至少一个图像处理模块中的参数。Optionally, step S304 includes: adjusting parameters in the at least one image processing module according to a processing result of the visual task model.
根据视觉任务模型的处理结果调整该至少一个图像处理模块中的参数,以提高视觉任务模型的性能。The parameters in the at least one image processing module are adjusted according to the processing results of the visual task model, so as to improve the performance of the visual task model.
示例性地,若图像处理模块采用神经网络模型,则该图像处理模块中的参数即为该神 经网络模型的参数。Exemplarily, if the image processing module adopts a neural network model, the parameters in the image processing module are the parameters of the neural network model.
示例性地,可以采用贝叶斯优化方法、RNN模型、强化学习算法等方式调整该至少一个图像处理模块中的参数。Exemplarily, the parameters in the at least one image processing module may be adjusted by means of a Bayesian optimization method, an RNN model, a reinforcement learning algorithm, and the like.
基于当前的图像处理模块中的参数组合对输入图像进行处理,并将处理后的结果输入视觉任务模型中进行处理,例如由CPU或GPU执行视觉任务。根据视觉任务模型的性能的反馈优化更新该图像处理模块中的参数组合,即在搜索空间中寻找最优的图像处理模块中的参数组合,以提高视觉任务模型的性能。The input image is processed based on the parameter combination in the current image processing module, and the processed result is input into the vision task model for processing, for example, the vision task is performed by CPU or GPU. The parameter combination in the image processing module is optimized and updated according to the feedback of the performance of the visual task model, that is, the optimal parameter combination in the image processing module is found in the search space, so as to improve the performance of the visual task model.
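A minimal sketch of searching the parameter space of the image processing modules: candidate parameter combinations are scored by a vision-task metric and the best combination is kept. The grid search, the parameter names (gamma, denoise_strength) and the scoring callable are illustrative assumptions; the application itself names Bayesian optimisation, RNN models and reinforcement learning as possible search methods.

```python
import itertools

def search_module_parameters(param_grid, evaluate_params):
    # Score every parameter combination with the vision task model and keep the best.
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate_params(params)    # e.g. detection accuracy with these parameters
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical search space for two module parameters.
grid = {"gamma": [1.8, 2.0, 2.2, 2.4], "denoise_strength": [0.1, 0.3, 0.5]}
```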
如前所述,实际应用中可以基于训练数据集中多张图像执行方法300,直至满足预设条件。满足预设条件后停止调整该至少一个图像处理模块中的参数,或者说,停止刷新该至少一个图像处理模块中的参数。As mentioned above, in practical applications, the method 300 can be executed based on multiple images in the training data set until a preset condition is met. Stop adjusting the parameters in the at least one image processing module after the preset condition is met, or stop refreshing the parameters in the at least one image processing module.
示例性地,预设条件可以为视觉任务模型的推理的准确度大于或等于第七阈值。Exemplarily, the preset condition may be that the inference accuracy of the visual task model is greater than or equal to the seventh threshold.
在视觉任务模型的推理的准确度大于或等于第七阈值的情况下,不再执行方法300,即停止调整该至少一个图像处理模块中的参数。In a case where the inference accuracy of the visual task model is greater than or equal to the seventh threshold, the method 300 is not executed again, that is, the adjustment of the parameters in the at least one image processing module is stopped.
第七阈值可以为预设值。或者,第七阈值可以是在没有调整该至少一个图像处理模块中的参数的情况下得到的视觉任务模型的处理的准确度。例如,如图4所示,第七阈值可以在该9个图像处理模块没有调整参数的情况下,视觉任务模型的推理的准确度。The seventh threshold may be a preset value. Alternatively, the seventh threshold may be the processing accuracy of the visual task model obtained without adjusting the parameters in the at least one image processing module. For example, as shown in FIG. 4 , the seventh threshold may be the inference accuracy of the vision task model when the nine image processing modules do not adjust parameters.
That is to say, an image is input into the original image processing modules, that is, the image processing modules whose parameters have not been adjusted, the processed image is input into the vision task model for processing, the inference accuracy is calculated, and that accuracy is used as the seventh threshold. An image is then input into the image processing modules whose parameters have currently been adjusted, the processed image is input into the vision task model for processing, the inference accuracy is calculated, and the currently obtained inference accuracy is compared with the seventh threshold; when the currently obtained inference accuracy is greater than or equal to the seventh threshold, the method 300 is no longer executed. In this way, processing images with the adjusted image processing modules can ensure the performance of the vision task model, or can improve the performance of the vision task model.
可替换地,预设条件可以为连续执行多次方法300后得到的视觉任务模型的损失函数值的变化量小于或等于第三阈值。Alternatively, the preset condition may be that the change amount of the loss function value of the visual task model obtained after performing the method 300 for multiple times is less than or equal to the third threshold.
也就是说,在视觉任务模型的损失函数值的变化趋于稳定的情况下,不再执行方法300。That is to say, when the change of the loss function value of the vision task model tends to be stable, the method 300 is not executed any more.
可替换地,预设条件可以为迭代次数大于或等于第四阈值。Alternatively, the preset condition may be that the number of iterations is greater than or equal to the fourth threshold.
也就是说,在执行方法300的次数大于或等于第四阈值的情况下,不再执行方法300。That is to say, in the case that the number of times the method 300 is executed is greater than or equal to the fourth threshold, the method 300 is not executed any more.
可替换地,上述预设条件可以结合使用。例如,预设条件可以为视觉任务模型的推理的准确度大于或等于第七阈值,且迭代次数大于或等于第四阈值。Alternatively, the above preset conditions may be used in combination. For example, the preset condition may be that the inference accuracy of the visual task model is greater than or equal to the seventh threshold, and the number of iterations is greater than or equal to the fourth threshold.
应理解,以上仅为示例,预设条件还可以为其他形式的条件,本申请对此不做限定。It should be understood that the above are examples only, and the preset conditions may also be conditions in other forms, which are not limited in the present application.
在本申请实施例的方案中,根据视觉任务模型的处理结果调整图像处理模块中的参数,能够获得更适合视觉任务的图像处理模块,有利于提高视觉任务的准确度。In the solutions of the embodiments of the present application, by adjusting the parameters in the image processing module according to the processing results of the visual task model, an image processing module more suitable for the visual task can be obtained, which is conducive to improving the accuracy of the visual task.
需要说明的是,上述方式1、方式2、方式3和方式4中任两种或两种以上的方式可以结合使用。在结合使用时,各个方式可以同时执行,或者,各个方式也可以分别执行。It should be noted that any two or more of the above modes 1, 2, 3 and 4 may be used in combination. When used in combination, each method can be executed at the same time, or each method can also be executed separately.
Optionally, step S304 includes: deleting some image processing modules from the plurality of image processing modules according to the processing result of the vision task model; processing a fifth image through the image processing modules that are not deleted among the plurality of image processing modules to obtain a sixth image, and inputting the sixth image into the vision task model for processing; and adjusting parameters of the image processing modules that are not deleted according to the processing result of the vision task model.
示例性地,第五图像可以为训练数据集中的图像。第五图像的其他描述可以参考前文中的第一图像。第五图像和第一图像可以为相同的图像,也可以为不同的图像。Exemplarily, the fifth image may be an image in the training data set. For other descriptions of the fifth image, reference may be made to the first image above. The fifth image and the first image may be the same image or different images.
示例性地,第六图像可以为RGB图像。第六图像的描述可以参考前文中的第二图像。Exemplarily, the sixth image may be an RGB image. For the description of the sixth image, refer to the second image above.
According to the solution of the embodiment of the present application, performance indicators obtained from the vision task model, for example, object detection accuracy or object segmentation accuracy, are used to adjust the weights of the plurality of image processing modules, and the image processing modules that have a large influence on the performance indicators of the vision task model are retained; in other words, the image processing modules that can maintain or improve the performance indicators of the vision task model are retained. In this way, image processing modules suitable for the vision task model, that is, the image processing modules required by the vision task model, can be obtained, which reduces the time required by the image processing flow, saves computing overhead, reduces the demand for computing power, and is more hardware-friendly.
Moreover, the performance indicators obtained from the vision task model are used to adjust the parameters of the retained image processing modules; for example, the performance indicators obtained from the vision task model are used to search the design space of the image processing modules, which is conducive to obtaining the optimal parameter configuration of each image processing module, so as to improve the performance of the vision task model.
Optionally, step S304 includes: adjusting parameters of a plurality of image processing modules and weights of the plurality of image processing modules according to the processing result of the vision task model, and deleting some image processing modules from the plurality of image processing modules according to the adjusted weights of the plurality of image processing modules.
Optionally, step S304 includes: adjusting parameters of a plurality of image processing modules, weights of the plurality of image processing modules, and the processing order of the plurality of image processing modules according to the processing result of the vision task model, and deleting some image processing modules from the plurality of image processing modules according to the adjusted weights of the plurality of image processing modules.
本申请实施例提供了一种图像处理方法400,方法400可以视为方法300的一种具体实现方式,具体描述参考前述方法300,为了描述简洁,下面在介绍方法400时适当省略部分描述。具体地,方法400采用方式1、方式2和方式4的组合的方式。The embodiment of the present application provides an image processing method 400. The method 400 can be regarded as a specific implementation of the method 300. For the specific description, refer to the aforementioned method 300. For the sake of brevity, some descriptions are appropriately omitted when introducing the method 400 below. Specifically, the method 400 adopts a combination of mode 1, mode 2 and mode 4.
方法400包括步骤S401至步骤S410。下面对步骤S401至步骤S410进行说明。方法400可以视为两个阶段,第一阶段包括步骤S401至步骤S406,第二阶段包括步骤S407至步骤S410。The method 400 includes step S401 to step S410. Steps S401 to S410 will be described below. The method 400 can be regarded as two stages, the first stage includes steps S401 to S406, and the second stage includes steps S407 to S410.
S401,为多个图像处理模块设置初始的权重。S401. Set initial weights for multiple image processing modules.
例如,该多个图像处理模块可以包括如图5所示的9个图像处理模块。将各个图像处理模块的权重表示为w1、w2、w3、w4、w5、w6、w7、w8和w9。该9个权重的总和为1。For example, the plurality of image processing modules may include nine image processing modules as shown in FIG. 5 . The weights of the respective image processing modules are denoted as w1, w2, w3, w4, w5, w6, w7, w8 and w9. The sum of the 9 weights is 1.
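As a minimal sketch (not part of the claimed method), the uniform initialization of step S401 could be written as follows in Python; the nine module names are assumptions standing in for the modules of FIG. 5:

    # Hypothetical module names; the actual pipeline of FIG. 5 may differ.
    module_names = [
        "black_level_compensation", "green_balance", "bad_pixel_repair",
        "demosaic", "bayer_denoise", "auto_white_balance",
        "color_correction", "gamma_correction", "denoise_and_sharpen",
    ]

    # Uniform initialization: each weight is 1/9, so the 9 weights sum to 1.
    weights = {name: 1.0 / len(module_names) for name in module_names}
    assert abs(sum(weights.values()) - 1.0) < 1e-9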
以上仅为示例,其他权重设置方法可以参考步骤S302中的描述。The above is only an example, and other weight setting methods can refer to the description in step S302.
S402,将训练数据集中的图像输入至该多个图像处理模块中进行处理。S402. Input the images in the training data set to the plurality of image processing modules for processing.
也就是基于该多个图像处理模块的权重对输入的图像进行处理。或者说,基于该多个图像处理模块的权重对该多个图像处理模块的处理结果进行调整。That is, the input image is processed based on the weights of the plurality of image processing modules. In other words, the processing results of the multiple image processing modules are adjusted based on the weights of the multiple image processing modules.
例如，按照图5所示的图像处理模块及其对应的权重对输入图像进行处理。For example, the input image is processed according to the image processing modules and their corresponding weights shown in FIG. 5.
示例性地,处理结果可以为RGB图像。Exemplarily, the processing result may be an RGB image.
进一步地,处理结果可以为8bit的RGB图像。Further, the processing result may be an 8-bit RGB image.
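The text does not fix the weighted processing of step S402 to one formula; the sketch below assumes, purely for illustration, that each module's output is blended with its input in proportion to the module's weight and that the final result is quantized to an 8-bit RGB image. The function names and the blending rule are assumptions, not the claimed implementation.

    import numpy as np

    def run_weighted_pipeline(image, modules, weights):
        """modules is an ordered list of (name, callable); each callable maps an
        image array to a processed image array. The module output is blended with
        its input in proportion to the module weight (one possible reading of
        adjusting the processing results by the weights)."""
        x = image.astype(np.float32)
        for name, module in modules:
            w = weights[name]
            x = w * module(x) + (1.0 - w) * x
        # Clip and quantize to an 8-bit RGB result, as described above.
        return np.clip(x, 0.0, 255.0).astype(np.uint8)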
步骤S402与步骤S302对应,具体描述可以参见步骤S302中的描述。Step S402 corresponds to step S302, and for a specific description, refer to the description in step S302.
S403,该多个图像处理模块处理后的结果输入至视觉任务模型中进行推理,得到视觉任务模型的推理结果。S403, input the processed results of the plurality of image processing modules into the visual task model for reasoning, and obtain the reasoning result of the visual task model.
视觉任务模型可以为已经训练好的模型。The vision task model can be a trained model.
S404,将视觉任务模型的推理结果与训练数据集中的图像对应的真值进行对比,根据对比结果调整该多个图像处理模块的权重。S404. Compare the inference result of the visual task model with the true value corresponding to the image in the training data set, and adjust the weights of the plurality of image processing modules according to the comparison result.
或者说,将对比结果反馈至优化算法中,利用优化算法调整该多个图像处理模块的权重。In other words, the comparison result is fed back to the optimization algorithm, and the optimization algorithm is used to adjust the weights of the plurality of image processing modules.
示例性地,优化算法包括贝叶斯优化方法、RNN模型、强化学习算法。Exemplarily, the optimization algorithm includes a Bayesian optimization method, an RNN model, and a reinforcement learning algorithm.
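A minimal sketch of the loop formed by steps S402 to S404, assuming a frozen vision task model that returns one label per image and using a simple random-search update as a stand-in for the Bayesian optimization method, RNN model or reinforcement learning algorithm mentioned above; all names below (pipeline, vision_model, and so on) are placeholders:

    import numpy as np

    def accuracy(weights, pipeline, vision_model, images, labels):
        """Process each image with the weighted pipeline, run the frozen vision
        task model, and compare its inference result with the ground truth."""
        correct = sum(int(vision_model(pipeline(img, weights)) == label)
                      for img, label in zip(images, labels))
        return correct / len(images)

    def tune_weights(init_weights, pipeline, vision_model, images, labels,
                     iterations=100, step=0.05, seed=0):
        rng = np.random.default_rng(seed)
        best_w = dict(init_weights)
        best_acc = accuracy(best_w, pipeline, vision_model, images, labels)
        for _ in range(iterations):
            # Perturb the current weights; a Bayesian optimizer, an RNN controller
            # or an RL agent would propose candidates here instead of random noise.
            cand = {k: max(0.0, v + rng.normal(0.0, step)) for k, v in best_w.items()}
            total = sum(cand.values()) or 1.0
            cand = {k: v / total for k, v in cand.items()}   # keep the sum at 1
            acc = accuracy(cand, pipeline, vision_model, images, labels)
            if acc >= best_acc:        # keep candidates that do not reduce accuracy
                best_w, best_acc = cand, acc
        return best_w, best_acc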
S405,将调整后的图像处理模块的权重作为步骤S402中的图像处理模块的权重,重复步骤S402至步骤S404,直至满足第一预设条件。S405, using the adjusted weight of the image processing module as the weight of the image processing module in step S402, and repeating steps S402 to S404 until the first preset condition is met.
或者,步骤S405也可以为,将调整后的图像处理模块的权重进行归一化处理,将归一化处理后的权重作为步骤S402中的图像处理模块的权重。Alternatively, step S405 may also be to perform normalization processing on the adjusted weights of the image processing modules, and use the normalized weights as the weights of the image processing modules in step S402.
也就是说，在每次调整图像处理模块的权重之后，对调整后的权重进行归一化处理，使得归一化后的权重的总和为1，或者总和接近1。That is to say, after the weights of the image processing modules are adjusted each time, the adjusted weights are normalized, so that the sum of the normalized weights is 1, or the sum is close to 1.
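The normalization described here can be expressed as a one-line rescaling; a sketch:

    def normalize(weights):
        total = sum(weights.values())
        # After rescaling, the weights sum to 1 (up to floating-point error).
        return {name: w / total for name, w in weights.items()}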
满足第一预设条件后,终止步骤S402至步骤S404。示例性地,当前得到的图像处理模块的权重可以视为满足第一预设条件后得到的图像处理模块的权重。After the first preset condition is satisfied, step S402 to step S404 are terminated. Exemplarily, the currently obtained weight of the image processing module may be regarded as the weight of the image processing module obtained after satisfying the first preset condition.
例如,若当前视觉任务模型的准确度大于或等于没有设置图像处理模块的权重的情况下视觉任务模型的准确度,则终止步骤S402至步骤S404。For example, if the accuracy of the current visual task model is greater than or equal to the accuracy of the visual task model when the weight of the image processing module is not set, then step S402 to step S404 are terminated.
步骤S403至步骤S405可以视为方式1的具体实现方式,具体描述可以参考方式1中的描述,第一预设条件的设置方式可以参考方式1中的预设条件,此处不再赘述。Steps S403 to S405 can be regarded as a specific implementation of method 1. For specific description, refer to the description in method 1. For the setting method of the first preset condition, refer to the preset condition in method 1, which will not be repeated here.
S406,根据满足第一预设条件后得到的图像处理模块的权重删除部分图像处理模块。S406. Delete part of the image processing modules according to the weights of the image processing modules obtained after satisfying the first preset condition.
步骤S406与方法2中的步骤S304对应，具体描述可以参考方式2中的相关描述，此处不再赘述。Step S406 corresponds to step S304 in method 2. For a specific description, refer to the relevant description in mode 2, which will not be repeated here.
例如,如图5所示,删除调整后的权重值较小的绿平衡模块、坏点修复模块、bayer降噪模块、色彩校正模块、伽马校正模块以及降噪和锐化模块。For example, as shown in FIG. 5 , delete the green balance module, bad pixel repair module, bayer noise reduction module, color correction module, gamma correction module, and noise reduction and sharpening module with smaller adjusted weight values.
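A sketch of step S406 under the assumption that "smaller weight" means a weight below a chosen threshold; the threshold value is hypothetical, and keeping a fixed number of top-weighted modules would be an equally valid reading of the text:

    def prune_modules(weights, threshold=0.05):
        """Keep only the modules whose converged weight reaches the threshold."""
        kept = {name: w for name, w in weights.items() if w >= threshold}
        removed = [name for name in weights if name not in kept]
        return kept, removed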
S407,将训练数据集中的图像输入至该未被删除的图像处理模块中进行处理。S407. Input the images in the training data set into the undeleted image processing module for processing.
步骤S407中的图像与步骤S402中的图像可以为相同的图像,也可以为不同的图像。The image in step S407 and the image in step S402 may be the same image or different images.
也就是说,将未被删除的图像处理模块中的参数作为调优对象。或者说,将被保留的图像处理模块中的参数作为调优对象。That is to say, the parameters in the image processing module that have not been deleted are used as tuning objects. In other words, the parameters in the reserved image processing module are used as tuning objects.
进一步地,在步骤S407之前,还可以对未被删除的图像处理模块的权重进行归一化处理。Further, before step S407, normalization processing may also be performed on the weights of the image processing modules that have not been deleted.
例如,如图5所示,将训练数据集中的图像输入至黑电平补偿模块、去马赛克模块、自动白平衡模块和伽马校正模块中进行处理。进一步地,在执行步骤S407之前,可以对该4个图像处理模块的权重进行归一化处理。For example, as shown in Figure 5, the images in the training data set are input to the black level compensation module, the demosaic module, the automatic white balance module and the gamma correction module for processing. Further, before performing step S407, the weights of the four image processing modules may be normalized.
S408,将未被删除的图像处理模块处理后的结果输入至视觉任务模型中进行推理,得到视觉任务模型的推理结果。S408. Input the processed results of the undeleted image processing modules into the visual task model for inference, and obtain an inference result of the visual task model.
S409,将视觉任务模型的推理结果与训练数据集中的图像对应的真值进行对比,根据对比结果调整未被删除的图像处理模块中的参数。S409, comparing the inference result of the visual task model with the true value corresponding to the image in the training data set, and adjusting the parameters in the image processing module that have not been deleted according to the comparison result.
或者说,将对比结果反馈至优化算法中,利用优化算法调整该图像处理模块中的参数。In other words, the comparison result is fed back to the optimization algorithm, and the parameters in the image processing module are adjusted using the optimization algorithm.
示例性地,优化算法包括贝叶斯优化方法、RNN模型或强化学习算法。Exemplarily, the optimization algorithm includes Bayesian optimization method, RNN model or reinforcement learning algorithm.
应理解，步骤S409采用的优化算法与步骤S404采用的优化算法可以相同，也可以不同。It should be understood that the optimization algorithm used in step S409 may be the same as or different from the optimization algorithm used in step S404.
S410,将调整后的图像处理模块中的参数作为步骤S407中的图像处理模块中的参数,重复步骤S407至步骤S409,直至满足第二预设条件。S410, using the adjusted parameters in the image processing module as parameters in the image processing module in step S407, and repeating steps S407 to S409 until the second preset condition is met.
满足第二预设条件后,终止步骤S407至步骤S410。示例性地,当前得到的图像处理模块中的参数可以视为满足第二预设条件后得到的图像处理模块中的参数。After the second preset condition is satisfied, step S407 to step S410 are terminated. Exemplarily, the currently obtained parameters in the image processing module may be regarded as parameters in the image processing module obtained after satisfying the second preset condition.
例如,若当前视觉任务模型的准确度大于或等于没有设置图像处理模块的权重的情况下视觉任务模型的准确度,则终止步骤S407至步骤S410。For example, if the accuracy of the current visual task model is greater than or equal to the accuracy of the visual task model when the weight of the image processing module is not set, then step S407 to step S410 are terminated.
步骤S407至步骤S410可以视为方式3的具体实现方式,具体描述可以参考方式3中的描述,此处不再赘述。第二预设条件的设置方式可以参考方式3中的预设条件。Step S407 to step S410 can be regarded as a specific implementation manner of mode 3, and for specific description, refer to the description in mode 3, which will not be repeated here. For the setting method of the second preset condition, reference may be made to the preset condition in method 3.
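The second stage (steps S407 to S410) can be sketched as a search over the parameters of the retained modules, again scored only by the accuracy of the frozen vision task model. The discrete parameter space and the random-search proposal below are assumptions standing in for the Bayesian optimization method, RNN model or reinforcement learning algorithm:

    import random

    def tune_parameters(param_space, build_pipeline, vision_model, images, labels,
                        iterations=200, seed=0):
        """param_space: {module: {param: list of candidate values}} for the modules
        kept after step S406; build_pipeline(params) returns a callable that
        processes one image with the given parameter configuration."""
        rng = random.Random(seed)

        def accuracy(params):
            pipe = build_pipeline(params)
            hits = sum(int(vision_model(pipe(img)) == label)
                       for img, label in zip(images, labels))
            return hits / len(images)

        def sample():
            return {m: {p: rng.choice(vals) for p, vals in space.items()}
                    for m, space in param_space.items()}

        best_params = sample()
        best_acc = accuracy(best_params)
        for _ in range(iterations):
            cand = sample()                  # propose a new parameter configuration
            acc = accuracy(cand)
            if acc >= best_acc:              # analogue of the second preset condition
                best_params, best_acc = cand, acc
        return best_params, best_acc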
根据本申请实施例的方案，利用视觉任务模型得到的性能指标，例如，目标检测的准确度、目标分割准确率等，调整多个图像处理模块的权重，保留对视觉任务模型的性能指标影响较大的图像处理模块，或者说，保留能够维持或提升视觉任务模型的性能指标的图像处理模块。这样，能够得到适合视觉任务模型的图像处理模块，或者说，得到视觉任务模型所需的图像处理模块，减少了图像处理流程所需的时间，节省计算开销，减少计算力的需求，对硬件更加友好。According to the solution of the embodiment of the present application, the performance indicators obtained from the visual task model, such as the accuracy of target detection and the accuracy of target segmentation, are used to adjust the weights of the multiple image processing modules, so as to retain the image processing modules that have a relatively large impact on the performance indicators of the visual task model, or in other words, to retain the image processing modules that can maintain or improve the performance indicators of the visual task model. In this way, the image processing modules suitable for the visual task model, that is, the image processing modules required by the visual task model, can be obtained, which reduces the time required by the image processing flow, saves computing overhead, reduces the demand for computing power, and is more hardware-friendly.
而且，在第一阶段完成后，利用视觉任务模型得到的性能指标调整被保留的图像处理模块中的参数，例如，利用视觉任务模型得到的性能指标对图像处理模块进行设计空间的搜索，有利于得到各个图像处理模块最优的参数配置，以提升视觉任务模型的性能。Moreover, after the first stage is completed, the performance indicators obtained from the visual task model are used to adjust the parameters in the retained image processing modules, for example, to search the design space of the image processing modules, which is conducive to obtaining the optimal parameter configuration of each image processing module, so as to improve the performance of the visual task model.
在另一种可能的实现方式中，方法400中的第一阶段和第二阶段可以同时执行。也就是说同时调整图像处理模块的权重以及图像处理模块中的参数。下面对方法400的第一阶段和第二阶段同时执行的方式进行说明。方法400可以包括以下步骤。以下步骤可以参考前述方法400的第一阶段和第二阶段的描述，为了描述简洁，在描述以下步骤时适当省略部分描述。In another possible implementation manner, the first stage and the second stage in the method 400 may be executed simultaneously. That is to say, the weights of the image processing modules and the parameters in the image processing modules are adjusted at the same time. The manner in which the first stage and the second stage of the method 400 are executed simultaneously is described below. Method 400 may include the following steps. For the following steps, reference may be made to the description of the first stage and the second stage of the aforementioned method 400. For the sake of brevity, part of the description is appropriately omitted when describing the following steps.
1)为多个图像处理模块设置初始的权重。1) Set initial weights for multiple image processing modules.
2)将训练数据集中的图像输入至该多个图像处理模块中进行处理。2) Input the images in the training data set into the plurality of image processing modules for processing.
3)该多个图像处理模块处理后的结果输入至视觉任务模型中进行推理,得到视觉任务模型的推理结果。3) Input the processed results of the plurality of image processing modules into the visual task model for inference, and obtain the inference result of the visual task model.
4)将视觉任务模型的推理结果与训练数据集中的图像对应的真值进行对比,根据对比结果调整该多个图像处理模块的权重以及该多个图像处理模块中的参数。4) comparing the inference result of the vision task model with the true value corresponding to the image in the training data set, and adjusting the weights of the multiple image processing modules and the parameters in the multiple image processing modules according to the comparison results.
或者说,将对比结果反馈至优化算法中,利用优化算法调整该多个图像处理模块的权重。利用优化算法调整该多个图像处理模块中的参数。In other words, the comparison result is fed back to the optimization algorithm, and the optimization algorithm is used to adjust the weights of the plurality of image processing modules. An optimization algorithm is used to adjust parameters in the multiple image processing modules.
示例性地,优化算法包括贝叶斯优化方法、RNN模型、强化学习算法。Exemplarily, the optimization algorithm includes a Bayesian optimization method, an RNN model, and a reinforcement learning algorithm.
调整该多个图像处理模块的权重的优化算法和调整该多个图像处理模块中的参数的优化算法可以相同,也可以不同。The optimization algorithm for adjusting the weights of the multiple image processing modules and the optimization algorithm for adjusting the parameters in the multiple image processing modules may be the same or different.
5)将调整后的图像处理模块的权重作为步骤2)中的图像处理模块的权重,将调整后的图像处理模块中的参数作为步骤2)中的图像处理模块中的参数,重复步骤2)至步骤4),直至训练完成。5) The weight of the image processing module after adjustment is used as the weight of the image processing module in step 2), and the parameter in the image processing module after adjustment is used as the parameter in the image processing module in step 2), repeating step 2) Go to step 4) until the training is completed.
或者,将调整后的图像处理模块的权重进行归一化处理,将归一化处理后的权重作为步骤5)中的图像处理模块的权重。Alternatively, the adjusted weights of the image processing modules are normalized, and the normalized weights are used as the weights of the image processing modules in step 5).
也就是说，在每次调整图像处理模块的权重之后，对调整后的权重进行归一化处理，使得归一化后的权重的总和为1，或者总和接近1。That is to say, after the weights of the image processing modules are adjusted each time, the adjusted weights are normalized, so that the sum of the normalized weights is 1, or the sum is close to 1.
例如,若当前视觉任务模型的准确度大于或等于没有设置图像处理模块的权重的情况下视觉任务模型的推理的准确度,则训练完成。或者说,若当前视觉任务模型的准确度大于或等于方法400执行之前的视觉任务模型的推理的准确度,则训练完成。For example, if the accuracy of the current visual task model is greater than or equal to the inference accuracy of the visual task model when the weight of the image processing module is not set, the training is completed. In other words, if the accuracy of the current visual task model is greater than or equal to the inference accuracy of the visual task model before the method 400 is executed, the training is complete.
6)根据训练完成后的图像处理模块的权重删除部分图像处理模块。步骤6)与方法2中的步骤S304对应,具体描述可以参考方式2中的描述,此处不再赘述。6) Delete part of the image processing modules according to the weights of the image processing modules after training. Step 6) corresponds to step S304 in method 2. For specific description, please refer to the description in method 2, which will not be repeated here.
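When the two stages run simultaneously, as in steps 1) to 6) above, each optimizer step proposes new weights and new module parameters together before the next forward pass. A compressed sketch under the same placeholder interfaces as the earlier sketches; build_pipeline(params) is assumed to return a callable pipe(img, weights):

    import random
    import numpy as np

    def joint_tune(init_weights, param_space, build_pipeline, vision_model,
                   images, labels, iterations=200, step=0.05, seed=0):
        """Jointly perturb the module weights and the module parameters, as in
        steps 1) to 5) of the simultaneous variant of method 400."""
        rng_w = np.random.default_rng(seed)
        rng_p = random.Random(seed)

        def accuracy(weights, params):
            pipe = build_pipeline(params)
            hits = sum(int(vision_model(pipe(img, weights)) == label)
                       for img, label in zip(images, labels))
            return hits / len(images)

        def sample_params():
            return {m: {p: rng_p.choice(vals) for p, vals in space.items()}
                    for m, space in param_space.items()}

        best_w, best_p = dict(init_weights), sample_params()
        best_acc = accuracy(best_w, best_p)
        for _ in range(iterations):
            cand_w = {k: max(0.0, v + rng_w.normal(0.0, step))
                      for k, v in best_w.items()}
            total = sum(cand_w.values()) or 1.0
            cand_w = {k: v / total for k, v in cand_w.items()}   # normalize to sum 1
            cand_p = sample_params()
            acc = accuracy(cand_w, cand_p)
            if acc >= best_acc:
                best_w, best_p, best_acc = cand_w, cand_p, acc
        return best_w, best_p, best_acc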
这样，第一阶段和第二阶段同时执行，能够避免图像处理模块由于参数配置不合理而被删除，使得图像处理模块能够在较优的参数配置下对图像进行处理，进而判断较优的参数配置下的各个图像处理模块对视觉任务模型的性能指标的贡献程度，以保留视觉任务模型所需的图像处理模块，这样能够进一步提高视觉任务模型的性能指标。In this way, the first stage and the second stage are executed at the same time, which can prevent an image processing module from being deleted due to an unreasonable parameter configuration, so that the image processing modules can process images under a better parameter configuration. The degree of contribution of each image processing module, under the better parameter configuration, to the performance indicators of the visual task model can then be judged, so as to retain the image processing modules required by the visual task model, which can further improve the performance indicators of the visual task model.
方法400仅为将方式1、方式2和方式4进行结合的一种示例。方式1、方式2、方式3和方式4还可以以其他实现方式进行结合。Method 400 is only an example of combining mode 1, mode 2 and mode 4. Way 1, way 2, way 3 and way 4 can also be combined in other implementation ways.
示例性地,将方式1、方式2和方式3进行结合。Exemplarily, mode 1, mode 2 and mode 3 are combined.
例如，步骤S304可以包括：根据视觉任务模型的处理结果调整多个图像处理模块的权重和该多个图像处理模块的处理顺序，根据调整后的图像处理模块的权重从该多个图像处理模块中删除部分图像处理模块。For example, step S304 may include: adjusting the weights of the multiple image processing modules and the processing order of the multiple image processing modules according to the processing results of the visual task model, and deleting some image processing modules from the multiple image processing modules according to the adjusted weights of the image processing modules.
再如，步骤S304可以包括：根据视觉任务模型的处理结果调整多个图像处理模块的权重，根据调整后的图像处理模块的权重从该多个图像处理模块中删除部分图像处理模块；根据视觉任务模型的处理结果调整未被删除的图像处理模块的处理顺序。也就是将步骤S304分为两个阶段，在第一阶段中删除部分图像处理模块，在第二阶段中调整未被删除的图像处理模块的处理顺序。For another example, step S304 may include: adjusting the weights of the multiple image processing modules according to the processing results of the visual task model, and deleting some image processing modules from the multiple image processing modules according to the adjusted weights of the image processing modules; and adjusting the processing order of the image processing modules that have not been deleted according to the processing results of the visual task model. That is, step S304 is divided into two stages: in the first stage, some image processing modules are deleted, and in the second stage, the processing order of the image processing modules that have not been deleted is adjusted.
具体的结合方式可以参考方法400,此处不再赘述。For a specific combination manner, reference may be made to the method 400, which will not be repeated here.
应理解,以上结合方式均为示例,还可以对上述四种方式中的任两种及两种以上的方式进行结合,本申请实施例对此不做限定。It should be understood that the above combination manners are examples, and any two or more of the foregoing four manners may also be combined, which is not limited in this embodiment of the present application.
本申请实施例中,调整后的图像处理模块为视觉任务模型所需的图像处理模块。调整后的图像处理模块与视觉任务模型之间具有对应关系。不同的视觉任务模型可以对应不同的图像处理模块。这样,能够根据应用场景选择合适的图像处理流程。In the embodiment of the present application, the adjusted image processing module is an image processing module required by the visual task model. There is a corresponding relationship between the adjusted image processing module and the vision task model. Different vision task models can correspond to different image processing modules. In this way, an appropriate image processing flow can be selected according to the application scenario.
图6示出了本申请实施例提供的图像处理方法700，图6所示的方法可以由图像处理装置执行，该装置可以是云服务设备，也可以是终端设备，例如，电脑、服务器等运算能力足以用来执行图像处理的装置，也可以是由云服务设备和终端设备构成的系统。示例性地，方法700可以由图1中的预处理模块执行。FIG. 6 shows an image processing method 700 provided by an embodiment of the present application. The method shown in FIG. 6 may be executed by an image processing apparatus. The apparatus may be a cloud service device or a terminal device, for example, an apparatus whose computing capability is sufficient to perform image processing, such as a computer or a server, or may be a system composed of a cloud service device and a terminal device. Exemplarily, the method 700 may be executed by the preprocessing module in FIG. 1.
方法700中的目标图像处理模块是由方法300或方法400得到的。为了避免不必要的重复,下面在介绍方法700时适当省略重复的描述。The target image processing module in method 700 is obtained by method 300 or method 400 . In order to avoid unnecessary repetition, repeated descriptions are appropriately omitted when introducing the method 700 below.
方法700包括步骤S701至步骤S704。下面对步骤S701至步骤S704进行详细介绍。The method 700 includes steps S701 to S704. Steps S701 to S704 will be described in detail below.
S701,获取第三图像。S701. Acquire a third image.
第三图像为待处理的图像。The third image is an image to be processed.
示例性地,第三图像可以为传感器获取的raw图。Exemplarily, the third image may be a raw image acquired by the sensor.
示例性地，第三图像可以是终端设备(或者电脑、服务器等其他装置或设备)通过摄像头拍摄到的图像，或者，第三图像还可以是从终端设备(或者电脑、服务器等其他装置或设备)内部获得的图像(例如，终端设备的相册中存储的图像，或者终端设备从云端获取的图像)，本申请实施例对此并不限定。Exemplarily, the third image may be an image captured by a camera of the terminal device (or another apparatus or device such as a computer or a server), or the third image may be an image obtained from inside the terminal device (or another apparatus or device such as a computer or a server), for example, an image stored in an album of the terminal device or an image obtained by the terminal device from the cloud, which is not limited in this embodiment of the present application.
S702,根据视觉任务模型确定至少一个目标图像处理模块。S702. Determine at least one target image processing module according to the visual task model.
该至少一个目标图像处理模块是与视觉任务模型对应的一个或多个图像处理模块。The at least one target image processing module is one or more image processing modules corresponding to the visual task model.
示例性地,视觉任务包括:目标检测、图像分类、目标分割、目标跟踪或图像识别等。Exemplarily, the vision task includes: target detection, image classification, target segmentation, target tracking, or image recognition.
视觉任务模型用于执行视觉任务。例如,视觉任务为目标检测,则视觉任务模型为目标检测模型。再如,视觉任务为图像识别,则视觉任务模型为图像识别模型。The visual task model is used to perform visual tasks. For example, if the vision task is target detection, then the vision task model is the target detection model. For another example, if the visual task is image recognition, then the visual task model is an image recognition model.
视觉任务模型可以为训练好的模型。The vision task model can be a trained model.
在不同的应用场景中,可以采用不同的视觉任务模型,相应地,根据不同的视觉任务模型即可确定与该视觉任务模型匹配的至少一个目标图像处理模块。这样,可以根据不同的应用场景选用不同的图像处理模块。In different application scenarios, different visual task models may be used, and accordingly, at least one target image processing module matching the visual task model may be determined according to different visual task models. In this way, different image processing modules can be selected according to different application scenarios.
对于相同的视觉任务,在不同的应用场景下,可能采用不同的视觉任务模型。例如,对于驾驶场景中的目标检测任务,在曝光过度和曝光不足的情况下采用的视觉任务模型可能是相同的,也可能是不同的。在驾驶的过程中,若当前场景被识别为曝光过度,可以采用第一目标检测模型作为视觉任务模型,根据第一目标检测模型确定第一目标检测模型对应的至少一个目标图像处理模块。若当前场景被识别为曝光不足,可以采用第二目标检测模型作为视觉任务模型,根据第二目标检测模型确定与第二目标检测模型对应的至少一个目标图像处理模块。第一目标检测模型和第二目标检测模型为不同的目标检测模型。这样,可以根据不同的应用场景选择不同的图像处理流程,提高视觉任务模型的性能。For the same vision task, different vision task models may be used in different application scenarios. For example, for an object detection task in a driving scene, the visual task model employed may or may not be the same in overexposed and underexposed situations. During driving, if the current scene is identified as being overexposed, the first target detection model may be used as the visual task model, and at least one target image processing module corresponding to the first target detection model may be determined according to the first target detection model. If the current scene is identified as being underexposed, the second target detection model may be used as the visual task model, and at least one target image processing module corresponding to the second target detection model is determined according to the second target detection model. The first target detection model and the second target detection model are different target detection models. In this way, different image processing processes can be selected according to different application scenarios to improve the performance of the vision task model.
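The overexposure/underexposure example can be read as a simple dispatch from a detected scene condition to a (vision task model, target image processing modules) pair; the scene labels, model names and module lists below are purely illustrative assumptions:

    # Hypothetical registry mapping a scene condition to the detection model and
    # to the target image processing modules obtained for it by method 300/400.
    registry = {
        "overexposed": ("first_detection_model",
                        ["black_level_compensation", "gamma_correction"]),
        "underexposed": ("second_detection_model",
                         ["black_level_compensation", "demosaic",
                          "auto_white_balance", "gamma_correction"]),
    }

    def select_pipeline(scene_condition):
        """Return the vision task model and its target modules for the scene."""
        return registry[scene_condition]

    model_name, target_modules = select_pipeline("underexposed")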
S703,通过该至少一个目标图像处理模块对第三图像进行处理,得到第四图像。S703. Process the third image by the at least one target image processing module to obtain a fourth image.
也就是说,利用与视觉任务模型对应的一个或多个图像处理模块对输入的第三图像进行处理,得到第四图像。That is to say, one or more image processing modules corresponding to the visual task model are used to process the input third image to obtain the fourth image.
示例性地,第四图像可以为RGB图像。或者,第四图像可以为8bit的RGB图像。此处仅为示例,第四图像的类型可以根据视觉任务模型的输入需要设置。Exemplarily, the fourth image may be an RGB image. Alternatively, the fourth image may be an 8-bit RGB image. This is only an example, and the type of the fourth image can be set according to the input requirements of the visual task model.
S704,通过视觉任务模型对第四图像进行处理,得到第四图像的处理结果。S704. Process the fourth image by using the visual task model to obtain a processing result of the fourth image.
第四图像的处理结果也可以理解为第三图像的处理结果。The processing result of the fourth image can also be understood as the processing result of the third image.
第四图像的处理结果即为视觉任务模型的推理结果。视觉任务模型的推理结果与视觉任务的类型有关。The processing result of the fourth image is the reasoning result of the visual task model. The inference results of the visual task model are related to the type of visual task.
例如,视觉任务为目标检测,则视觉任务模型的推理结果可以为第四图像上的目标框以及该目标框中的物体的类别。再如,视觉任务为图像分类,则视觉任务模型的推理结果可以为第四图像的类别。For example, if the vision task is target detection, the inference result of the vision task model may be the target frame on the fourth image and the category of the object in the target frame. For another example, if the vision task is image classification, the reasoning result of the vision task model may be the category of the fourth image.
视觉任务模型和图像处理模块的配置之间具有对应关系。根据视觉任务模型和图像处理模块的配置之间的对应关系可以确定与当前的视觉任务模型匹配的图像处理模块的配置。There is a corresponding relationship between the vision task model and the configuration of the image processing module. According to the corresponding relationship between the visual task model and the configuration of the image processing module, the configuration of the image processing module matching the current visual task model can be determined.
示例性地,图像处理模块的配置包括以下至少一项:图像处理模块的组合、图像处理模块的权重、图像处理模块的处理顺序或者图像处理模块中的参数。Exemplarily, the configuration of the image processing modules includes at least one of the following: a combination of image processing modules, a weight of the image processing modules, a processing order of the image processing modules, or parameters in the image processing modules.
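One way to hold this correspondence is a configuration record per vision task model covering the items listed above (module combination, weights, processing order and module parameters); the concrete names and values below are placeholder assumptions:

    # Hypothetical lookup: vision task model -> image processing configuration.
    pipeline_config = {
        "detection_model_v1": {
            # The list order doubles as the processing order of the combination.
            "modules": ["black_level_compensation", "demosaic",
                        "auto_white_balance", "gamma_correction"],
            "weights": {"black_level_compensation": 0.3, "demosaic": 0.3,
                        "auto_white_balance": 0.2, "gamma_correction": 0.2},
            "params":  {"gamma_correction": {"gamma": 2.2}},
        },
    }

    def configure_for(model_name):
        """Step S702: look up the configuration matched to the vision task model."""
        return pipeline_config[model_name]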
可选地,步骤S702包括:根据视觉任务模型从多个候选图像处理模块中确定至少一个目标图像处理模块。Optionally, step S702 includes: determining at least one target image processing module from multiple candidate image processing modules according to the visual task model.
也就是说,根据视觉任务模型从多个候选图像处理模块中确定一个图像处理模块的组合,该图像处理模块的组合中的图像处理模块即为该至少一个目标图像处理模块。That is to say, a combination of image processing modules is determined from multiple candidate image processing modules according to the visual task model, and an image processing module in the combination of image processing modules is the at least one target image processing module.
在该情况下,当视觉任务模型发生变化时,相应地,图像处理模块的组合也可能发生变化。In this case, when the visual task model changes, the combination of image processing modules may also change accordingly.
视觉任务模型和图像处理模块的组合之间具有对应关系。根据该对应关系即可确定当前视觉任务模型对应的图像处理模块的组合,或者说,根据该对应关系即可确定用于该视觉任务模型所需的图像处理模块,即该至少一个目标图像处理模块。该至少一个目标图像处理模块可以是通过方法300或者方法400得到的。或者,可以理解为,视觉任务模型和图像处理模块的组合之间的对应关系是通过方法300或方法400得到的。There is a correspondence between the combination of the visual task model and the image processing module. According to the corresponding relationship, the combination of image processing modules corresponding to the current visual task model can be determined, or in other words, the image processing module required for the visual task model can be determined according to the corresponding relationship, that is, the at least one target image processing module . The at least one target image processing module may be obtained through the method 300 or the method 400 . Alternatively, it can be understood that the correspondence between the combination of the visual task model and the image processing module is obtained through the method 300 or the method 400 .
例如,视觉任务模型为图5所示的模型,则该至少一个目标图像处理模块包括:黑电平补偿模块、去马赛克模块、自动白平衡模块和伽马校正模块。For example, if the visual task model is the model shown in FIG. 5 , then the at least one target image processing module includes: a black level compensation module, a demosaic module, an automatic white balance module and a gamma correction module.
这样，不同的视觉任务模型对应不同的图像处理模块的组合，当视觉任务模型发生变化时，图像处理模块的组合能够自适应匹配视觉任务模型，使得当前的图像处理模块的组合更适合当前的视觉任务模型，有利于提高视觉任务模型的性能。In this way, different visual task models correspond to different combinations of image processing modules. When the visual task model changes, the combination of image processing modules can adaptively match the visual task model, making the current combination of image processing modules more suitable for the current visual task model, which is beneficial to improving the performance of the visual task model.
而且,根据视觉任务模型从多个候选图像处理模块中选择适合的图像处理模块,无需使用所有的候选图像处理模块对图像进行处理,减少了处理流程,降低了对计算力的要求。Moreover, by selecting a suitable image processing module from multiple candidate image processing modules according to the visual task model, it is not necessary to use all the candidate image processing modules to process images, which reduces the processing flow and reduces the requirement for computing power.
可选地,步骤S702包括:根据视觉任务模型确定至少一个目标图像处理模块的权重。至少一个目标图像处理模块的权重用于对至少一个目标图像处理模块的处理结果进行处理,得到第四图像。Optionally, step S702 includes: determining the weight of at least one target image processing module according to the visual task model. The weight of the at least one target image processing module is used to process the processing result of the at least one target image processing module to obtain a fourth image.
在一种实现方式中,不同的视觉任务模型对应的图像处理模块的组合是相同的。当视觉任务模型发生变化时,相应地,图像处理模块的权重可能发生变化。In an implementation manner, the combinations of image processing modules corresponding to different visual task models are the same. When the vision task model changes, the weight of the image processing module may change accordingly.
本申请实施例中,不同的视觉任务模型对应的图像处理模块的组合相同可以理解为不同的视觉任务模型所采用的图像处理模块所实现的功能是相同的。In the embodiment of the present application, the combination of image processing modules corresponding to different visual task models is the same, which may be understood to mean that the functions implemented by the image processing modules adopted by different visual task models are the same.
视觉任务模型和图像处理模块的权重之间具有对应关系。根据该对应关系可以确定当前的视觉任务模型对应的图像处理模块的权重,即该至少一个目标图像处理模块的权重。There is a corresponding relationship between the visual task model and the weights of the image processing module. According to the corresponding relationship, the weight of the image processing module corresponding to the current visual task model, that is, the weight of the at least one target image processing module, can be determined.
例如,视觉任务模型为图4所示的模型,则该至少一个目标图像处理模块可以为图4中的9个图像处理模块,图像处理模块的权重可以为步骤S405得到的权重。For example, if the visual task model is the model shown in FIG. 4, the at least one target image processing module may be the nine image processing modules in FIG. 4, and the weights of the image processing modules may be the weights obtained in step S405.
这样，不同的视觉任务模型对应不同的图像处理模块的权重，当视觉任务模型发生变化时，图像处理模块的权重能够自适应匹配视觉任务模型，使得当前的图像处理模块的权重更适合当前的视觉任务模型，有利于提高视觉任务模型的性能。In this way, different visual task models correspond to different weights of the image processing modules. When the visual task model changes, the weights of the image processing modules can adaptively match the visual task model, making the current weights of the image processing modules more suitable for the current visual task model, which is beneficial to improving the performance of the visual task model.
在另一种实现方式中,当视觉任务模型发生变化时,相应地,图像处理模块的权重也可能发生变化,图像处理模块的其他配置可能也发生变化。例如,图像处理模块的组合可能发生变化。In another implementation manner, when the visual task model changes, correspondingly, the weight of the image processing module may also change, and other configurations of the image processing module may also change. For example, the combination of image processing modules may change.
示例性地,视觉任务模型和图像处理模块的权重以及图像处理模块的其他配置情况具有对应关系。这样,根据视觉任务模型可以确定视觉任务模型对应的图像处理模块的权重,以及图像处理模块的其他配置情况。Exemplarily, the visual task model has a corresponding relationship with the weight of the image processing module and other configuration conditions of the image processing module. In this way, the weight of the image processing module corresponding to the visual task model and other configurations of the image processing module can be determined according to the visual task model.
例如,视觉任务模型和图像处理模块的组合以及图像处理模块的权重之间具有对应关系。步骤S702中可以确定视觉任务模型对应的图像处理模块的组合,以及该图像处理模块的组合中的图像处理模块的权重。For example, there is a corresponding relationship between the combination of the visual task model and the image processing module, and the weight of the image processing module. In step S702, a combination of image processing modules corresponding to the visual task model and weights of image processing modules in the combination of image processing modules may be determined.
若视觉任务模型为图5所示的模型,与该视觉任务模型对应的该至少一个目标图像处理模块可以是步骤S406得到的。该至少一个目标图像处理模块包括黑电平补偿模块、去马赛克模块、自动白平衡模块和伽马校正模块。该至少一个目标图像处理模块的权重可以是步骤S405得到的权重。If the visual task model is the model shown in FIG. 5 , the at least one target image processing module corresponding to the visual task model may be obtained in step S406. The at least one target image processing module includes a black level compensation module, a demosaic module, an automatic white balance module and a gamma correction module. The weight of the at least one target image processing module may be the weight obtained in step S405.
可选地,步骤S702包括:根据视觉任务模型确定至少一个目标图像处理模块的处理顺序。Optionally, step S702 includes: determining a processing sequence of at least one target image processing module according to the visual task model.
在一种实现方式中,不同的视觉任务模型对应的图像处理模块的组合是相同的。在该情况下,当视觉任务模型发生变化时,相应地,图像处理模块的处理顺序也可能发生变化。In an implementation manner, the combinations of image processing modules corresponding to different visual task models are the same. In this case, when the visual task model changes, the processing sequence of the image processing module may also change accordingly.
视觉任务模型和图像处理模块的处理顺序之间具有对应关系。根据该对应关系可以确定当前视觉任务模型对应的图像处理模块的处理顺序,即该至少一个目标图像处理模块的处理顺序。There is a corresponding relationship between the visual task model and the processing sequence of the image processing module. According to the corresponding relationship, the processing order of the image processing modules corresponding to the current visual task model can be determined, that is, the processing order of the at least one target image processing module.
这样，不同的视觉任务模型对应不同的图像处理模块的处理顺序，当视觉任务模型发生变化时，图像处理模块的处理顺序能够自适应匹配视觉任务模型，使得当前的图像处理模块的处理顺序更适合当前的视觉任务模型，有利于提高视觉任务模型的性能。In this way, different visual task models correspond to different processing orders of the image processing modules. When the visual task model changes, the processing order of the image processing modules can adaptively match the visual task model, making the current processing order of the image processing modules more suitable for the current visual task model, which is beneficial to improving the performance of the visual task model.
在另一种实现方式中,当视觉任务模型发生变化时,相应地,图像处理模块的处理顺序可能发生变化,图像处理模块的其他配置也可能发生变化。例如,图像处理模块的组合可能发生变化。In another implementation manner, when the visual task model changes, correspondingly, the processing order of the image processing module may change, and other configurations of the image processing module may also change. For example, the combination of image processing modules may change.
示例性地,视觉任务模型和图像处理模块的处理顺序以及图像处理模块的其他配置情况具有对应关系。这样,根据该对应关系可以确定视觉任务模型对应的图像处理模块的处理顺序,以及图像处理模块的其他配置情况。Exemplarily, the visual task model has a corresponding relationship with the processing order of the image processing module and other configurations of the image processing module. In this way, the processing sequence of the image processing module corresponding to the visual task model and other configurations of the image processing module can be determined according to the corresponding relationship.
例如,视觉任务模型和图像处理模块的组合以及图像处理模块的处理顺序之间具有对应关系。根据视觉任务模型可以确定视觉任务模型对应的图像处理模块的组合,以及该图像处理模块的组合中的图像处理模块的处理顺序。For example, there is a corresponding relationship between the combination of the visual task model and the image processing module, and the processing sequence of the image processing module. According to the vision task model, the combination of image processing modules corresponding to the vision task model and the processing order of the image processing modules in the combination of image processing modules can be determined.
在该情况下,不同的视觉任务模型对应的图像处理模块的组合可能是相同的,可能是不同的。例如,两个视觉任务模型对应的图像处理模块的组合是相同的,而该图像处理模块的组合中的图像处理模块的处理顺序是不同的。In this case, the combinations of image processing modules corresponding to different visual task models may be the same or different. For example, the combinations of image processing modules corresponding to the two visual task models are the same, but the processing orders of the image processing modules in the combination of image processing modules are different.
再如,视觉任务模型和图像处理模块的组合、图像处理模块的权重以及图像处理模块的处理顺序之间具有对应关系。步骤S702中可以确定视觉任务模型对应的图像处理模块的组合、该图像处理模块的权重以及图像处理模块的处理顺序,即从多个候选图像处理模块中确定目标图像处理模块、目标图像处理模块的权重以及目标图像处理模块的处理顺序。For another example, there is a corresponding relationship between the combination of the visual task model and the image processing module, the weight of the image processing module, and the processing order of the image processing module. In step S702, it is possible to determine the combination of image processing modules corresponding to the visual task model, the weight of the image processing modules, and the processing order of the image processing modules, that is, determine the target image processing module and the target image processing module from multiple candidate image processing modules. weights and the processing order of the target image processing module.
在该情况下,不同的视觉任务模型对应的图像处理模块的组合可能是相同的,也可能是不同的。在图像处理模块的组合相同的情况下,图像处理模块的组合中的图像处理模块的权重可能是相同的,也可能是不同的。在图像处理模块的组合相同的情况下,图像处理模块的组合中的图像处理模块的处理顺序可能是相同的,也可能是不同的。In this case, the combinations of image processing modules corresponding to different visual task models may be the same or different. In the case of the same combination of image processing modules, the weights of the image processing modules in the combination of image processing modules may be the same or different. In the case of the same combination of image processing modules, the processing order of the image processing modules in the combination of image processing modules may be the same or different.
可选地,步骤S702包括:根据视觉任务模型确定该至少一个目标图像处理模块中的参数。Optionally, step S702 includes: determining parameters in the at least one target image processing module according to the visual task model.
在一种实现方式中,不同的视觉任务模型对应的图像处理模块的组合是相同的。当视觉任务模型发生变化时,相应地,图像处理模块中的参数可能发生变化。In an implementation manner, the combinations of image processing modules corresponding to different visual task models are the same. When the vision task model changes, the parameters in the image processing module may change accordingly.
例如，第一视觉任务模型对应的图像处理模块包括：黑电平补偿模块和去马赛克模块。其中，黑电平补偿模块的参数包括参数A1，去马赛克模块的参数包括参数B1。第二视觉任务模型对应的图像处理模块包括：黑电平补偿模块和去马赛克模块。其中，黑电平补偿模块的参数包括参数A2，去马赛克模块的参数包括参数B2。图像在输入第一视觉任务模型和第二视觉任务模型之前，均需要经过黑电平补偿处理和去马赛克处理。但第一视觉任务模型之前的黑电平补偿处理和去马赛克处理与第二视觉任务模型之前的黑电平补偿处理和去马赛克处理所采用的参数是不同的。For example, the image processing modules corresponding to the first visual task model include: a black level compensation module and a demosaic module, where the parameters of the black level compensation module include parameter A1, and the parameters of the demosaic module include parameter B1. The image processing modules corresponding to the second visual task model include: a black level compensation module and a demosaic module, where the parameters of the black level compensation module include parameter A2, and the parameters of the demosaic module include parameter B2. Before being input into the first visual task model and the second visual task model, an image needs to undergo black level compensation processing and demosaic processing. However, the parameters used in the black level compensation processing and demosaic processing before the first visual task model are different from those used in the black level compensation processing and demosaic processing before the second visual task model.
视觉任务模型和图像处理模块中的参数之间具有对应关系。根据视觉任务模型可以确定视觉任务模型对应的图像处理模块中的参数,即该至少一个目标图像处理模块中的参数。There is a corresponding relationship between the visual task model and the parameters in the image processing module. According to the visual task model, parameters in the image processing module corresponding to the visual task model can be determined, that is, parameters in the at least one target image processing module.
这样，不同的视觉任务模型对应不同的图像处理模块中的参数，当视觉任务模型发生变化时，图像处理模块中的参数能够自适应匹配视觉任务模型，使得当前的图像处理模块中的参数更适合当前的视觉任务模型，有利于提高视觉任务模型的性能。In this way, different visual task models correspond to different parameters in the image processing modules. When the visual task model changes, the parameters in the image processing modules can adaptively match the visual task model, making the current parameters in the image processing modules more suitable for the current visual task model, which is beneficial to improving the performance of the visual task model.
在另一种实现方式中,视觉任务模型和图像处理模块中的参数以及图像处理模块的其他配置情况具有对应关系。这样,根据该对应关系可以确定当前视觉任务模型对应的图像处理模块中的参数,以及图像处理模块的其他配置情况。In another implementation manner, the visual task model has a corresponding relationship with parameters in the image processing module and other configurations of the image processing module. In this way, parameters in the image processing module corresponding to the current visual task model and other configurations of the image processing module can be determined according to the corresponding relationship.
例如,视觉任务模型和图像处理模块的组合以及图像处理模块中的参数之间具有对应关系。根据该对应关系可以确定当前视觉任务模型对应的图像处理模块的组合以及该图像处理模块的组合中的图像处理模块中的参数。For example, there is a corresponding relationship between the combination of the visual task model and the image processing module, and the parameters in the image processing module. According to the corresponding relationship, the combination of image processing modules corresponding to the current visual task model and the parameters of the image processing modules in the combination of image processing modules can be determined.
在该情况下,不同的视觉任务模型对应的图像处理模块的组合可能是相同的,可能是不同的。例如,两个视觉任务模型对应的图像处理模块的组合是相同的,而该图像处理模块的组合中的图像处理模块中的参数是不同的。In this case, the combinations of image processing modules corresponding to different visual task models may be the same or different. For example, the combinations of image processing modules corresponding to the two vision task models are the same, but the parameters of the image processing modules in the combination of image processing modules are different.
再如,视觉任务模型和图像处理模块的组合、图像处理模块的权重以及图像处理模块中的参数之间具有对应关系。根据该对应关系可以确定视觉任务模型对应的图像处理模块的组合、该图像处理模块的权重以及图像处理模块中的参数。For another example, there is a corresponding relationship between the combination of the visual task model and the image processing module, the weight of the image processing module, and the parameters in the image processing module. According to the corresponding relationship, the combination of the image processing modules corresponding to the visual task model, the weight of the image processing modules, and the parameters in the image processing modules can be determined.
在该情况下,不同的视觉任务模型对应的图像处理模块的组合可能是相同的,也可能是不同的。在图像处理模块的组合相同的情况下,图像处理模块的组合中的图像处理模块的权重可能是相同的,也可能是不同的。在图像处理模块的组合相同的情况下,图像处理模块的组合中的图像处理模块中的参数可能是相同的,也可能是不同的。In this case, the combinations of image processing modules corresponding to different visual task models may be the same or different. In the case of the same combination of image processing modules, the weights of the image processing modules in the combination of image processing modules may be the same or different. In the case of the same combination of image processing modules, the parameters of the image processing modules in the combination of image processing modules may be the same or different.
根据本申请实施例的方案，不同的视觉任务模型对应不同的图像处理模块的配置，当视觉任务模型发生变化时，图像处理模块能够自适应匹配视觉任务模型，使得图像处理流程更适合视觉任务模型，有利于提高视觉任务模型的性能。According to the solution of the embodiment of the present application, different visual task models correspond to different configurations of the image processing modules. When the visual task model changes, the image processing modules can adaptively match the visual task model, making the image processing flow more suitable for the visual task model, which is beneficial to improving the performance of the visual task model.
下面结合图7至图8对本申请实施例的装置进行说明。应理解,下面描述的装置能够执行前述本申请实施例的方法,为了避免不必要的重复,下面在介绍本申请实施例的装置时适当省略重复的描述。The device of the embodiment of the present application will be described below with reference to FIG. 7 to FIG. 8 . It should be understood that the device described below can execute the method of the aforementioned embodiment of the present application. In order to avoid unnecessary repetition, repeated descriptions are appropriately omitted when introducing the device of the embodiment of the present application below.
图7是本申请实施例的图像处理装置的示意性框图。图7所示的图像处理装置4000包括获取单元4010和处理单元4020。FIG. 7 is a schematic block diagram of an image processing device according to an embodiment of the present application. The image processing device 4000 shown in FIG. 7 includes an acquisition unit 4010 and a processing unit 4020 .
获取单元4010和处理单元4020可以用于执行本申请实施例的图像处理方法。The acquisition unit 4010 and the processing unit 4020 may be used to execute the image processing method of the embodiment of the present application.
在一种可能的实现方式中,装置4000可以用于执行方法300或方法400。In a possible implementation manner, the apparatus 4000 may be used to execute the method 300 or the method 400 .
具体地,获取单元4010用于获取第一图像。Specifically, the acquiring unit 4010 is configured to acquire the first image.
处理单元4020用于:通过至少一个图像处理模块对第一图像进行处理,得到第二图像;将第二图像输入至视觉任务模型中进行处理;根据视觉任务模型的处理结果调整至少一个图像处理模块。The processing unit 4020 is used to: process the first image through at least one image processing module to obtain a second image; input the second image into the visual task model for processing; adjust at least one image processing module according to the processing result of the visual task model .
可选地,作为一个实施例,至少一个图像处理模块包括多个图像处理模块,处理单元4020具体用于:Optionally, as an embodiment, at least one image processing module includes multiple image processing modules, and the processing unit 4020 is specifically configured to:
根据视觉任务模型的处理结果删除多个图像处理模块中的部分图像处理模块。Part of the image processing modules in the plurality of image processing modules are deleted according to the processing results of the visual task model.
可选地,作为一个实施例,处理单元4020具体用于:根据视觉任务模型的处理结果调整多个图像处理模块的权重,多个图像处理模块的权重用于对多个图像处理模块的处理结果进行处理,得到第二图像;根据调整后的多个图像处理模块的权重删除多个图像处理模块中的部分图像处理模块。Optionally, as an embodiment, the processing unit 4020 is specifically configured to: adjust the weights of multiple image processing modules according to the processing results of the visual task model, and the weights of the multiple image processing modules are used to process the processing results of the multiple image processing modules Perform processing to obtain a second image; delete part of the image processing modules in the plurality of image processing modules according to the adjusted weights of the plurality of image processing modules.
可选地,作为一个实施例,处理单元4020具体用于:根据视觉任务模型的处理结果调整至少一个图像处理模块中的参数。Optionally, as an embodiment, the processing unit 4020 is specifically configured to: adjust parameters in at least one image processing module according to a processing result of the visual task model.
可选地,作为一个实施例,处理单元4020具体用于:根据视觉任务模型的处理结果调整至少一个图像处理模块的处理顺序。Optionally, as an embodiment, the processing unit 4020 is specifically configured to: adjust a processing sequence of at least one image processing module according to a processing result of the visual task model.
可选地，作为一个实施例，至少一个图像处理模块包括：黑电平补偿模块、绿平衡模块、坏点修正模块、去马赛克模块、拜耳降噪模块、自动白平衡模块、色彩校正模块、伽马校正模块或降噪及锐化模块。Optionally, as an embodiment, the at least one image processing module includes: a black level compensation module, a green balance module, a bad pixel correction module, a demosaic module, a Bayer noise reduction module, an automatic white balance module, a color correction module, a gamma correction module, or a noise reduction and sharpening module.
在另一种可能的实现方式中,装置4000可以用于执行方法700。In another possible implementation manner, the apparatus 4000 may be used to execute the method 700 .
具体地,获取单元4010用于获取第三图像。Specifically, the acquiring unit 4010 is configured to acquire a third image.
处理单元4020用于:根据视觉任务模型确定至少一个目标图像处理模块;通过至少一个目标图像处理模块对第三图像进行处理,得到第四图像;通过视觉任务模型对第四图像进行处理,得到第四图像的处理结果。The processing unit 4020 is configured to: determine at least one target image processing module according to the visual task model; process the third image through at least one target image processing module to obtain the fourth image; process the fourth image through the visual task model to obtain the fourth image Four image processing results.
可选地,作为一个实施例,处理单元4020具体用于:根据视觉任务模型从多个候选图像处理模块中确定至少一个目标图像处理模块。Optionally, as an embodiment, the processing unit 4020 is specifically configured to: determine at least one target image processing module from multiple candidate image processing modules according to the visual task model.
可选地,作为一个实施例,处理单元4020具体用于:根据视觉任务模型确定至少一个目标图像处理模块中的参数。Optionally, as an embodiment, the processing unit 4020 is specifically configured to: determine parameters in at least one target image processing module according to the visual task model.
可选地,作为一个实施例,处理单元4020具体用于:根据视觉任务模型确定至少一个目标图像处理模块的处理顺序。Optionally, as an embodiment, the processing unit 4020 is specifically configured to: determine a processing sequence of at least one target image processing module according to the visual task model.
可选地，作为一个实施例，至少一个目标图像处理模块包括：黑电平补偿模块、绿平衡模块、坏点修正模块、去马赛克模块、拜耳降噪模块、自动白平衡模块、色彩校正模块、伽马校正模块或降噪及锐化模块。Optionally, as an embodiment, the at least one target image processing module includes: a black level compensation module, a green balance module, a bad pixel correction module, a demosaic module, a Bayer noise reduction module, an automatic white balance module, a color correction module, a gamma correction module, or a noise reduction and sharpening module.
需要说明的是,上述装置4000以功能单元的形式体现。这里的术语“单元”可以通过软件和/或硬件形式实现,对此不作具体限定。It should be noted that the above device 4000 is embodied in the form of functional units. The term "unit" here may be implemented in the form of software and/or hardware, which is not specifically limited.
例如，"单元"可以是实现上述功能的软件程序、硬件电路或二者结合。所述硬件电路可能包括应用特有集成电路(application specific integrated circuit, ASIC)、电子电路、用于执行一个或多个软件或固件程序的处理器(例如共享处理器、专有处理器或组处理器等)和存储器、合并逻辑电路和/或其它支持所描述的功能的合适组件。For example, a "unit" may be a software program, a hardware circuit, or a combination of the two that implements the above functions. The hardware circuit may include an application-specific integrated circuit (ASIC), an electronic circuit, a processor (for example, a shared processor, a dedicated processor, or a group processor) and a memory for executing one or more software or firmware programs, a merged logic circuit, and/or other suitable components that support the described functions.
因此，在本申请的实施例中描述的各示例的单元，能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行，取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能，但是这种实现不应认为超出本申请的范围。Therefore, the units of each example described in the embodiments of the present application can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are executed by hardware or software depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be regarded as exceeding the scope of the present application.
图8是本申请实施例提供的图像处理装置的硬件结构示意图。图8所示的图像处理装置6000(该装置6000具体可以是一种计算机设备)包括存储器6001、处理器6002、通信接口6003以及总线6004。其中,存储器6001、处理器6002、通信接口6003通过总线6004实现彼此之间的通信连接。FIG. 8 is a schematic diagram of a hardware structure of an image processing device provided by an embodiment of the present application. The image processing apparatus 6000 shown in FIG. 8 (the apparatus 6000 may specifically be a computer device) includes a memory 6001 , a processor 6002 , a communication interface 6003 and a bus 6004 . Wherein, the memory 6001 , the processor 6002 , and the communication interface 6003 are connected to each other through a bus 6004 .
存储器6001可以是只读存储器(read only memory,ROM),静态存储设备,动态存储设备或者随机存取存储器(random access memory,RAM)。存储器6001可以存储程序,当存储器6001中存储的程序被处理器6002执行时,处理器6002用于执行本申请实施例的图像处理方法的各个步骤。具体地,处理器6002可以执行上文中的方法300、方法400或方法700。The memory 6001 may be a read only memory (read only memory, ROM), a static storage device, a dynamic storage device or a random access memory (random access memory, RAM). The memory 6001 may store programs, and when the programs stored in the memory 6001 are executed by the processor 6002, the processor 6002 is configured to execute various steps of the image processing method of the embodiment of the present application. Specifically, the processor 6002 may execute the method 300, the method 400 or the method 700 above.
处理器6002可以采用通用的中央处理器(central processing unit,CPU),微处理器,应用专用集成电路(application specific integrated circuit,ASIC),图形处理器(graphics processing unit,GPU)或者一个或多个集成电路,用于执行相关程序,以实现本申请方法实施例的图像处理方法。The processor 6002 may be a general-purpose central processing unit (central processing unit, CPU), a microprocessor, an application specific integrated circuit (application specific integrated circuit, ASIC), a graphics processing unit (graphics processing unit, GPU) or one or more The integrated circuit is used to execute related programs to realize the image processing method of the method embodiment of the present application.
处理器6002还可以是一种集成电路芯片,具有信号的处理能力。在实现过程中,本申请的图像处理方法的各个步骤可以通过处理器6002中的硬件的集成逻辑电路或者软件形式的指令完成。The processor 6002 may also be an integrated circuit chip with signal processing capabilities. During implementation, each step of the image processing method of the present application may be completed by an integrated logic circuit of hardware in the processor 6002 or instructions in the form of software.
上述处理器6002还可以是通用处理器、数字信号处理器(digital signal processing,DSP)、专用集成电路(ASIC)、现成可编程门阵列(field programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器6001,处理器6002读取存储器6001中的信息,结合其硬件完成图7所示的装置中包括的单元所需执行的功能,或者,执行本申请方法实施例的图像处理方法。The above-mentioned processor 6002 can also be a general-purpose processor, a digital signal processor (digital signal processing, DSP), an application-specific integrated circuit (ASIC), a ready-made programmable gate array (field programmable gate array, FPGA) or other programmable logic devices, Discrete gate or transistor logic devices, discrete hardware components. Various methods, steps, and logic block diagrams disclosed in the embodiments of the present application may be implemented or executed. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor, or the like. The steps of the method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module can be located in a mature storage medium in the field such as random access memory, flash memory, read-only memory, programmable read-only memory or electrically erasable programmable memory, register. The storage medium is located in the memory 6001, and the processor 6002 reads the information in the memory 6001, and combines its hardware to complete the functions required by the units included in the device shown in Figure 7, or execute the image processing method of the method embodiment of the present application .
通信接口6003使用例如但不限于收发器一类的收发装置,来实现装置6000与其他设备或通信网络之间的通信。例如,可以通过通信接口6003获取训练数据。The communication interface 6003 implements communication between the apparatus 6000 and other devices or communication networks by using a transceiver device such as but not limited to a transceiver. For example, training data can be obtained through the communication interface 6003 .
总线6004可包括在装置6000各个部件(例如,存储器6001、处理器6002、通信接口6003)之间传送信息的通路。The bus 6004 may include pathways for transferring information between various components of the device 6000 (eg, memory 6001 , processor 6002 , communication interface 6003 ).
应注意,尽管上述装置6000仅仅示出了存储器、处理器、通信接口,但是在具体实现过程中,本领域的技术人员应当理解,装置6000还可以包括实现正常运行所必须的其他器件。同时,根据具体需要,本领域的技术人员应当理解,装置6000还可包括实现其他附加功能的硬件器件。此外,本领域的技术人员应当理解,装置6000也可仅仅包括实现本申请实施例所必须的器件,而不必包括图8中所示的全部器件。It should be noted that although the above device 6000 only shows a memory, a processor, and a communication interface, those skilled in the art should understand that the device 6000 may also include other devices necessary for normal operation during specific implementation. Meanwhile, according to specific needs, those skilled in the art should understand that the apparatus 6000 may also include hardware devices for implementing other additional functions. In addition, those skilled in the art should understand that the device 6000 may also only include the components necessary to realize the embodiment of the present application, and does not necessarily include all the components shown in FIG. 8 .
本申请实施例还提供了一种计算机可读存储介质，该计算机可读介质存储用于设备执行的程序代码，该程序代码包括用于执行本申请实施例中的图像处理方法。An embodiment of the present application further provides a computer-readable storage medium. The computer-readable medium stores program code to be executed by a device, and the program code is used to execute the image processing method in the embodiments of the present application.
本申请实施例还提供一种包含指令的计算机程序产品,当该计算机程序产品在计算机上运行时,使得计算机执行本申请实施例中的图像处理方法。The embodiment of the present application further provides a computer program product including instructions, and when the computer program product is run on a computer, the computer is made to execute the image processing method in the embodiment of the present application.
The embodiments of the present application further provide a chip. The chip includes a processor and a data interface, and the processor reads, through the data interface, instructions stored in a memory, to perform the image processing method in the embodiments of the present application.
Optionally, in an implementation, the chip may further include a memory, and the memory stores instructions. The processor is configured to execute the instructions stored in the memory, and when the instructions are executed, the processor is configured to perform the method in any one of the implementations of the first aspect or the second aspect.
The chip may specifically be an FPGA or an ASIC.
It should be understood that the processor in the embodiments of the present application may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or may be any conventional processor.
It should further be understood that the memory in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both a volatile memory and a non-volatile memory. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example rather than limitation, many forms of RAM are available, such as a static random access memory (SRAM), a dynamic random access memory (DRAM), a synchronous dynamic random access memory (SDRAM), a double data rate synchronous dynamic random access memory (DDR SDRAM), an enhanced synchronous dynamic random access memory (ESDRAM), a synchlink dynamic random access memory (SLDRAM), and a direct rambus random access memory (DR RAM).
All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used for implementation, the foregoing embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, the procedures or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or may be transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from a website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any usable medium accessible by a computer, or a data storage device, such as a server or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium. The semiconductor medium may be a solid-state drive.
It should be understood that the term "and/or" in this specification describes only an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate the following three cases: only A exists, both A and B exist, and only B exists, where A and B may be singular or plural. In addition, the character "/" in this specification generally indicates an "or" relationship between the associated objects, but may also indicate an "and/or" relationship, which may be understood with reference to the context.
In this application, "at least one" means one or more, and "a plurality of" means two or more. "At least one of the following" or a similar expression means any combination of the listed items, including a single item or any combination of a plurality of items. For example, at least one of a, b, or c may indicate a, b, c, a-b, a-c, b-c, or a-b-c, where a, b, and c each may be singular or plural.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the foregoing processes do not imply an order of execution. The execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation processes of the embodiments of the present application.
A person of ordinary skill in the art may be aware that the units and algorithm steps of the examples described with reference to the embodiments disclosed in this specification can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed by hardware or software depends on the particular application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each particular application, but such an implementation should not be considered as going beyond the scope of the present application.
A person skilled in the art may clearly understand that, for convenience and brevity of description, for the specific working processes of the system, apparatus, and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed system, apparatus, and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative. For example, the division into units is merely a logical function division and may be another division in actual implementation; for example, a plurality of units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be implemented through some interfaces, and the indirect couplings or communication connections between apparatuses or units may be implemented in electrical, mechanical, or other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each of the units may exist alone physically, or two or more units may be integrated into one unit.
When the functions are implemented in the form of a software functional unit and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the present application essentially, or the part contributing to the prior art, or part of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present application. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any variation or replacement readily figured out by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
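For illustration only, and not as part of the claimed subject matter, the following is a minimal Python sketch of the first image processing method recited in claims 1 to 6 below: a first image is processed by several weighted image processing modules, the fused second image is scored by a visual task model, and the module weights are adjusted (with low-weight modules deleted) according to the model's processing result. The module implementations, the weighting and fusion scheme, the finite-difference adjustment, and the pruning threshold are all assumptions introduced for this sketch and are not taken from the application. A second sketch, corresponding to claims 7 to 11, follows the claims.

import numpy as np

class GammaCorrection:
    """Hypothetical gamma correction module (one candidate ISP module)."""
    def __init__(self, gamma=2.2):
        self.gamma = gamma

    def __call__(self, img):
        return np.clip(img, 0.0, 1.0) ** (1.0 / self.gamma)

class NoiseReduction:
    """Hypothetical noise reduction module (a 3x3 box blur as a stand-in)."""
    def __call__(self, img):
        padded = np.pad(img, 1, mode="edge")
        out = np.zeros_like(img)
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                out += padded[1 + dy:1 + dy + img.shape[0], 1 + dx:1 + dx + img.shape[1]]
        return out / 9.0

def fuse(outputs, weights):
    # The module weights are used to process the module outputs to obtain the second image.
    w = np.asarray(weights, dtype=np.float64)
    w = w / w.sum()
    return sum(wi * oi for wi, oi in zip(w, outputs))

def adjust_pipeline(first_image, modules, weights, task_loss, lr=0.1, prune_below=0.05):
    """One adjustment step: estimate how the visual task model's loss reacts to each
    module's weight, update the weights accordingly, and delete modules whose weight
    falls below the (assumed) pruning threshold."""
    outputs = [m(first_image) for m in modules]
    base = task_loss(fuse(outputs, weights))   # processing result of the visual task model
    new_weights = []
    for i, w in enumerate(weights):
        bumped = list(weights)
        bumped[i] = w + 1e-2                   # finite-difference probe of this module's weight
        grad = (task_loss(fuse(outputs, bumped)) - base) / 1e-2
        new_weights.append(max(w - lr * grad, 0.0))
    kept = [(m, w) for m, w in zip(modules, new_weights) if w >= prune_below]
    return [m for m, _ in kept], [w for _, w in kept]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    first_image = rng.random((8, 8))
    modules = [GammaCorrection(), NoiseReduction()]
    weights = [0.5, 0.5]
    # Stand-in for the visual task model: the "loss" is lower when the image is brighter.
    task_loss = lambda img: float(np.mean((img - 1.0) ** 2))
    modules, weights = adjust_pipeline(first_image, modules, weights, task_loss)
    print(len(modules), weights)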

Claims (26)

  1. An image processing method, comprising:
    acquiring a first image;
    processing the first image by using at least one image processing module to obtain a second image;
    inputting the second image into a visual task model for processing; and
    adjusting the at least one image processing module according to a processing result of the visual task model.
  2. The method according to claim 1, wherein the at least one image processing module comprises a plurality of image processing modules, and the adjusting the at least one image processing module according to a processing result of the visual task model comprises:
    deleting some of the plurality of image processing modules according to the processing result of the visual task model.
  3. The method according to claim 2, wherein the deleting some of the plurality of image processing modules according to the processing result of the visual task model comprises:
    adjusting weights of the plurality of image processing modules according to the processing result of the visual task model, wherein the weights of the plurality of image processing modules are used to process processing results of the plurality of image processing modules to obtain the second image; and
    deleting some of the plurality of image processing modules according to the adjusted weights of the plurality of image processing modules.
  4. The method according to any one of claims 1 to 3, wherein the adjusting the at least one image processing module according to a processing result of the visual task model comprises:
    adjusting a parameter in the at least one image processing module according to the processing result of the visual task model.
  5. The method according to any one of claims 1 to 4, wherein the adjusting the at least one image processing module according to a processing result of the visual task model comprises:
    adjusting a processing order of the at least one image processing module according to the processing result of the visual task model.
  6. The method according to any one of claims 1 to 5, wherein the at least one image processing module comprises:
    a black level compensation module, a green balance module, a dead pixel correction module, a demosaic module, a Bayer noise reduction module, an automatic white balance module, a color correction module, a gamma correction module, or a noise reduction and sharpening module.
  7. An image processing method, comprising:
    acquiring a third image;
    determining at least one target image processing module according to a visual task model;
    processing the third image by using the at least one target image processing module to obtain a fourth image; and
    processing the fourth image by using the visual task model to obtain a processing result of the fourth image.
  8. The method according to claim 7, wherein the determining at least one target image processing module according to a visual task model comprises:
    determining the at least one target image processing module from a plurality of candidate image processing modules according to the visual task model.
  9. The method according to claim 7 or 8, wherein the determining at least one target image processing module according to a visual task model comprises:
    determining a parameter in the at least one target image processing module according to the visual task model.
  10. The method according to any one of claims 7 to 9, wherein the determining at least one target image processing module according to a visual task model comprises:
    determining a processing order of the at least one target image processing module according to the visual task model.
  11. The method according to any one of claims 7 to 10, wherein the at least one target image processing module comprises:
    a black level compensation module, a green balance module, a dead pixel correction module, a demosaic module, a Bayer noise reduction module, an automatic white balance module, a color correction module, a gamma correction module, or a noise reduction and sharpening module.
  12. An image processing apparatus, comprising:
    an acquisition unit, configured to acquire a first image; and
    a processing unit, configured to:
    process the first image by using at least one image processing module to obtain a second image;
    input the second image into a visual task model for processing; and
    adjust the at least one image processing module according to a processing result of the visual task model.
  13. The apparatus according to claim 12, wherein the at least one image processing module comprises a plurality of image processing modules, and the processing unit is specifically configured to:
    delete some of the plurality of image processing modules according to the processing result of the visual task model.
  14. The apparatus according to claim 13, wherein the processing unit is specifically configured to:
    adjust weights of the plurality of image processing modules according to the processing result of the visual task model, wherein the weights of the plurality of image processing modules are used to process processing results of the plurality of image processing modules to obtain the second image; and
    delete some of the plurality of image processing modules according to the adjusted weights of the plurality of image processing modules.
  15. The apparatus according to any one of claims 12 to 14, wherein the processing unit is specifically configured to:
    adjust a parameter in the at least one image processing module according to the processing result of the visual task model.
  16. The apparatus according to any one of claims 12 to 15, wherein the processing unit is specifically configured to:
    adjust a processing order of the at least one image processing module according to the processing result of the visual task model.
  17. The apparatus according to any one of claims 12 to 16, wherein the at least one image processing module comprises:
    a black level compensation module, a green balance module, a dead pixel correction module, a demosaic module, a Bayer noise reduction module, an automatic white balance module, a color correction module, a gamma correction module, or a noise reduction and sharpening module.
  18. An image processing apparatus, comprising:
    an acquisition unit, configured to acquire a third image; and
    a processing unit, configured to:
    determine at least one target image processing module according to a visual task model;
    process the third image by using the at least one target image processing module to obtain a fourth image; and
    process the fourth image by using the visual task model to obtain a processing result of the fourth image.
  19. The apparatus according to claim 18, wherein the processing unit is specifically configured to:
    determine the at least one target image processing module from a plurality of candidate image processing modules according to the visual task model.
  20. The apparatus according to claim 18 or 19, wherein the processing unit is specifically configured to:
    determine a parameter in the at least one target image processing module according to the visual task model.
  21. The apparatus according to any one of claims 18 to 20, wherein the processing unit is specifically configured to:
    determine a processing order of the at least one target image processing module according to the visual task model.
  22. The apparatus according to any one of claims 18 to 21, wherein the at least one target image processing module comprises:
    a black level compensation module, a green balance module, a dead pixel correction module, a demosaic module, a Bayer noise reduction module, an automatic white balance module, a color correction module, a gamma correction module, or a noise reduction and sharpening module.
  23. An image processing apparatus, comprising a processor and a memory, wherein the memory is configured to store program instructions, and the processor is configured to invoke the program instructions to perform the method according to any one of claims 1 to 6 or claims 7 to 11.
  24. A computer-readable storage medium, wherein the computer-readable storage medium is configured to store program code to be executed by a device, and the program code includes instructions for performing the method according to any one of claims 1 to 6 or claims 7 to 11.
  25. A computer program product comprising instructions, wherein when the computer program product is run on a computer, the computer is enabled to perform the method according to any one of claims 1 to 6 or claims 7 to 11.
  26. A chip, comprising a processor and a data interface, wherein the processor reads, through the data interface, instructions stored in a memory to perform the method according to any one of claims 1 to 6 or claims 7 to 11.
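For illustration only, and again not as part of the claims, the following is a minimal Python sketch corresponding to the second method recited in claims 7 to 11: the target image processing modules, their parameters, and their processing order are determined according to the visual task model (here keyed by a task type), the third image is processed through them to obtain a fourth image, and the fourth image is then processed by the visual task model. The candidate module implementations, the task-to-module mapping, and the stand-in model are assumptions made for this sketch.

from typing import Callable, Dict, List

import numpy as np

# Hypothetical registry of candidate image processing modules.
CANDIDATE_MODULES: Dict[str, Callable[[np.ndarray], np.ndarray]] = {
    "black_level": lambda img: np.clip(img - 0.02, 0.0, None),
    "white_balance": lambda img: img * np.array([1.1, 1.0, 0.9]),
    "gamma": lambda img: np.clip(img, 0.0, 1.0) ** (1.0 / 2.2),
}

# Assumed mapping from visual task type to the target modules and their processing order.
TASK_CONFIG: Dict[str, List[str]] = {
    "detection": ["black_level", "gamma"],
    "classification": ["black_level", "white_balance", "gamma"],
}

def determine_target_modules(task_type: str) -> List[Callable[[np.ndarray], np.ndarray]]:
    # Determine the target image processing modules (and their order) according to
    # the visual task model that will consume the processed image.
    return [CANDIDATE_MODULES[name] for name in TASK_CONFIG[task_type]]

def run(third_image: np.ndarray, task_type: str,
        visual_task_model: Callable[[np.ndarray], object]):
    fourth_image = third_image
    for module in determine_target_modules(task_type):
        fourth_image = module(fourth_image)    # obtain the fourth image
    return visual_task_model(fourth_image)     # processing result of the fourth image

if __name__ == "__main__":
    img = np.random.default_rng(1).random((4, 4, 3))
    # Stand-in for a visual task model: returns per-channel means as a dummy "result".
    result = run(img, "detection", lambda x: x.mean(axis=(0, 1)))
    print(result)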
PCT/CN2021/102739 2021-06-28 2021-06-28 Image processing method and apparatus WO2023272431A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2021/102739 WO2023272431A1 (en) 2021-06-28 2021-06-28 Image processing method and apparatus
CN202180099442.4A CN117529725A (en) 2021-06-28 2021-06-28 Image processing method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/102739 WO2023272431A1 (en) 2021-06-28 2021-06-28 Image processing method and apparatus

Publications (1)

Publication Number Publication Date
WO2023272431A1 (en)

Family

ID=84690936

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/102739 WO2023272431A1 (en) 2021-06-28 2021-06-28 Image processing method and apparatus

Country Status (2)

Country Link
CN (1) CN117529725A (en)
WO (1) WO2023272431A1 (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7235768B1 (en) * 2005-02-28 2007-06-26 United States Of America As Represented By The Secretary Of The Air Force Solid state vision enhancement device
CN102663745A (en) * 2012-03-23 2012-09-12 北京理工大学 Color fusion image quality evaluation method based on vision task.
CN110348572A (en) * 2019-07-09 2019-10-18 上海商汤智能科技有限公司 The processing method and processing device of neural network model, electronic equipment, storage medium
CN111901594A (en) * 2020-06-29 2020-11-06 北京大学 Visual analysis task-oriented image coding method, electronic device and medium
CN111898638A (en) * 2020-06-29 2020-11-06 北京大学 Image processing method, electronic device and medium fusing different visual tasks
CN111881785A (en) * 2020-07-13 2020-11-03 北京市商汤科技开发有限公司 Passenger flow analysis method and device, storage medium and system
CN112529150A (en) * 2020-12-01 2021-03-19 华为技术有限公司 Model structure, model training method, image enhancement method and device

Also Published As

Publication number Publication date
CN117529725A (en) 2024-02-06

Similar Documents

Publication Publication Date Title
WO2020253416A1 (en) Object detection method and device, and computer storage medium
WO2021043273A1 (en) Image enhancement method and apparatus
WO2021120719A1 (en) Neural network model update method, and image processing method and device
WO2021043168A1 (en) Person re-identification network training method and person re-identification method and apparatus
WO2020192736A1 (en) Object recognition method and device
WO2021018163A1 (en) Neural network search method and apparatus
WO2021147325A1 (en) Object detection method and apparatus, and storage medium
WO2022001805A1 (en) Neural network distillation method and device
WO2021018251A1 (en) Image classification method and device
CN110222717B (en) Image processing method and device
CN110309856A (en) Image classification method, the training method of neural network and device
US20220148291A1 (en) Image classification method and apparatus, and image classification model training method and apparatus
CN111914997B (en) Method for training neural network, image processing method and device
WO2021018245A1 (en) Image classification method and apparatus
CN113011562B (en) Model training method and device
CN112561027A (en) Neural network architecture searching method, image processing method, device and storage medium
CN110222718B (en) Image processing method and device
CN111291809A (en) Processing device, method and storage medium
CN111695673B (en) Method for training neural network predictor, image processing method and device
WO2022267036A1 (en) Neural network model training method and apparatus and data processing method and apparatus
WO2022179606A1 (en) Image processing method and related apparatus
CN112529146A (en) Method and device for training neural network model
WO2022156475A1 (en) Neural network model training method and apparatus, and data processing method and apparatus
CN113807183A (en) Model training method and related equipment
CN113128285A (en) Method and device for processing video

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21947395

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 202180099442.4

Country of ref document: CN

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21947395

Country of ref document: EP

Kind code of ref document: A1