CN117529725A - Image processing method and device

Info

Publication number: CN117529725A
Application number: CN202180099442.4A
Authority: CN (China)
Prior art keywords: image processing, module, visual task, image, task model
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 伍玮翔, 伍文龙
Current Assignee: Huawei Technologies Co., Ltd.
Original Assignee: Huawei Technologies Co., Ltd.
Application filed by Huawei Technologies Co., Ltd.

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

This application provides an image processing method and apparatus, relating to the field of artificial intelligence and, in particular, to the field of computer vision. The method includes: processing an input image through at least one image processing module, using the processing result as the input of a visual task model, and adjusting the at least one image processing module according to the processing result of the visual task model. With this scheme, an image processing flow suited to the visual task model can be obtained, and the performance of the visual task model can be improved.

Description

Image processing method and device
Technical Field
The present application relates to the field of computer vision, and more particularly, to an image processing method and apparatus.
Background
Computer vision is an integral part of intelligent/autonomous systems in many application fields, such as manufacturing, inspection, document analysis, medical diagnosis, and the military. It studies how to use cameras/video cameras and computers to acquire the data and information of a photographed subject. Figuratively speaking, eyes (cameras/video cameras) and a brain (algorithms) are installed on a computer so that it can identify, track, and measure targets in place of human eyes, enabling the computer to perceive the environment. Computer vision can be seen as the science of studying how to make artificial systems "perceive" from images or multidimensional data. In general, computer vision uses various imaging systems in place of the visual organs to acquire input information, and a computer in place of the brain to process and interpret this input information.
Computer vision tasks include image classification, target detection, target tracking, target segmentation, and the like. In practical applications, a series of image signal processing (ISP) operations is generally performed on a raw image to produce a visualized image, which can then be used as the input image for a computer vision task. However, the purpose of an ISP is generally to meet human visual needs: an image obtained after a series of image signal processing steps may satisfy human visual requirements, but performing a visual task on that image does not necessarily yield an ideal processing result.
Disclosure of Invention
The application provides an image processing method and device, which can obtain an image processing flow suitable for a visual task and improve the performance of a visual task model.
In a first aspect, an image processing method is provided, the method including: acquiring a first image; processing the first image through at least one image processing module to obtain a second image; inputting the second image into a visual task model for processing; and adjusting the at least one image processing module according to the processing result of the visual task model.
In the scheme of the embodiment of the application, the image processing flow is adjusted according to the processing result of the visual task model, so that an image suitable for the visual task is obtained, and the performance of the visual task model is ensured. According to the scheme, the image processing flow can be adjusted according to the requirements of different application scenes so as to adapt to the different application scenes.
The first image may, for example, be a raw image acquired by a sensor.
The image processing module is used for performing image signal processing on the input image.
The second image may be an RGB image, for example.
Optionally, processing the first image by the at least one image processing module to obtain the second image includes: processing the first image through the at least one image processing module and the weight of the at least one image processing module to obtain the second image.
Specifically, the processing result of the at least one image processing module is adjusted according to the weight of the at least one image processing module, so as to obtain a second image.
Illustratively, the visual tasks include: target detection, image classification, target segmentation, target tracking or image recognition, etc.
The visual task model is used to perform visual tasks. For example, the visual task is object detection, and the visual task model is an object detection model. For another example, if the visual task is image recognition, the visual task model is an image recognition model.
The visual task model may be a trained model.
The processing results of the visual task model may include performance metrics of the visual task model.
Illustratively, the performance metrics of the visual task model include the accuracy of reasoning, the value of a loss function, or the like. The loss function may be set as desired. The loss function is used to indicate a difference between the reasoning results of the visual task model and the true values corresponding to the first image. It should be noted that the loss function may be the loss function used in the training process of the visual task model, or may be another loss function.
For example, if the visual task is target detection, the processing results of the visual task model may include detection accuracy.
For another example, if the visual task is a target segmentation, the processing result of the visual task model may include segmentation accuracy.
The visual task model may employ a neural network model, or alternatively, a non-neural network model.
The at least one image processing module is adjusted according to the processing results of the visual task model so that the processing results of the visual task model are as close as possible to what is expected.
Illustratively, the at least one image processing module may be adjusted using a Bayesian optimization method, an RNN model, a reinforcement learning algorithm, or the like.
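As a rough illustration of this adjustment loop, the sketch below uses random search as a simple stand-in for Bayesian optimization or reinforcement learning: the ISP module names, the candidate parameter values, and the evaluate callback are assumptions made only for the example, not details from this application.

```python
import random

# Illustrative parameter search space for a few ISP modules; the names and
# candidate values are assumptions made for this sketch.
SEARCH_SPACE = {
    "gamma":   [1.8, 2.0, 2.2, 2.4],   # gamma-correction exponent
    "denoise": [0.0, 0.3, 0.6, 0.9],   # Bayer noise-reduction strength
    "sharpen": [0.0, 0.5, 1.0],        # sharpening strength
}

def tune_isp(evaluate, n_trials=50, seed=0):
    """Random-search stand-in for the Bayesian/RL tuning loop: `evaluate`
    runs the ISP with a candidate configuration, feeds the output to the
    frozen visual task model, and returns its performance metric
    (higher is better)."""
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = {k: rng.choice(v) for k, v in SEARCH_SPACE.items()}
        score = evaluate(cfg)  # feedback from the visual task model
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score

# Toy usage: pretend the task model does best with gamma near 2.2 and mild denoising.
best = tune_isp(lambda cfg: -abs(cfg["gamma"] - 2.2) - abs(cfg["denoise"] - 0.3))
```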
With reference to the first aspect, in certain implementation manners of the first aspect, adjusting at least one image processing module according to a processing result of the visual task model includes: the at least one image processing module is adjusted according to the time of image processing and the processing result of the visual task model.
The time of image processing may be the processing time of the visual task model, or may be the processing time of the at least one image processing module, or may be the sum of the processing time of the visual task model and the processing time of the at least one image processing module.
Therefore, the processing speed can be improved and the time delay can be reduced on the premise of ensuring the performance of the visual task model.
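For instance, the feedback signal could combine the two criteria into a single score; the sketch below is only an illustration, and the trade-off coefficient lam is an assumed value rather than one specified here.

```python
def combined_objective(accuracy, latency_ms, lam=0.01):
    """Score a candidate ISP configuration: reward the visual task model's
    accuracy and penalize the total image-processing + inference time.
    `lam` is an illustrative trade-off coefficient, not a value from the text."""
    return accuracy - lam * latency_ms
```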
With reference to the first aspect, in certain implementation manners of the first aspect, the at least one image processing module includes a plurality of image processing modules, and adjusting the at least one image processing module according to a processing result of the visual task model includes: the at least one image processing module is modified.
Altering the at least one image processing module may include deleting some of the image processing modules in the at least one image processing module and/or adding other image processing modules.
In the scheme of the embodiment of the application, the combination of the image processing modules is changed according to the processing result of the visual task model, so that the combination of the image processing modules which are more suitable for the visual task model can be obtained, and the performance of the visual task model is improved.
With reference to the first aspect, in certain implementation manners of the first aspect, the at least one image processing module includes a plurality of image processing modules, and adjusting the at least one image processing module according to a processing result of the visual task model includes: and deleting part of the image processing modules in the plurality of image processing modules according to the processing result of the visual task model.
In the scheme of the embodiment of the application, part of the image processing modules are deleted according to the processing result of the visual task model, so that the time required by image processing can be reduced, the processing speed is improved, and the requirement on computational power is reduced.
With reference to the first aspect, in certain implementation manners of the first aspect, deleting a part of the image processing modules in the plurality of image processing modules according to a processing result of the visual task model includes: the weights of the plurality of image processing modules are adjusted according to the processing results of the visual task model, and the weights of the plurality of image processing modules are used for processing the processing results of the plurality of image processing modules to obtain a second image; and deleting part of the image processing modules in the plurality of image processing modules according to the adjusted weights of the plurality of image processing modules.
In the scheme of the embodiments of the application, the image processing modules to be deleted are determined according to the weights of the image processing modules. Image processing modules with relatively small weight values have little influence on the processing result of the visual task model, so deleting them has little impact on the model's performance. That is, the scheme of the embodiments of the application can eliminate unnecessary operations, reduce computation overhead, and improve processing speed while maintaining the performance of the visual task model.
Illustratively, the plurality of image processing modules consists of m image processing modules, where m is an integer greater than 1. The n image processing modules with the smallest adjusted weights are deleted from the m image processing modules, where n is an integer greater than 1 and less than m.
Alternatively, the image processing modules whose adjusted weights are less than or equal to a weight threshold are deleted from the m image processing modules.
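A minimal sketch of both deletion criteria (smallest-n and threshold-based), assuming the modules and their adjusted weights are held in parallel lists:

```python
def prune_modules(modules, weights, n=None, threshold=None):
    """Delete the ISP modules that contribute least according to their adjusted
    weights: either the n modules with the smallest weights, or every module
    whose weight is at or below `threshold`. The list layout is an assumption
    made for this sketch."""
    if n is not None:
        order = sorted(range(len(modules)), key=lambda i: weights[i])
        keep = sorted(order[n:])  # keep the rest, preserving the original order
    else:
        keep = [i for i, w in enumerate(weights) if w > threshold]
    return [modules[i] for i in keep], [weights[i] for i in keep]

# Toy usage with module names standing in for the actual processing functions:
mods = ["black_level", "green_balance", "dead_pixel", "demosaic", "awb"]
ws   = [0.05, 0.02, 0.10, 0.60, 0.23]
print(prune_modules(mods, ws, n=2))  # drops the two smallest-weight modules
```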
With reference to the first aspect, in certain implementation manners of the first aspect, adjusting at least one image processing module according to a processing result of the visual task model includes: and adjusting parameters in at least one image processing module according to the processing result of the visual task model.
In the scheme of the embodiment of the application, the parameters in the image processing module are adjusted according to the processing result of the visual task model, so that the image processing module more suitable for the visual task can be obtained, and the accuracy of the visual task is improved.
With reference to the first aspect, in certain implementation manners of the first aspect, adjusting at least one image processing module according to a processing result of the visual task model includes: deleting part of the image processing modules from the plurality of image processing modules according to the processing results of the visual task model; processing the fifth image through an undeleted image processing module in the plurality of image processing modules to obtain a sixth image, and inputting the sixth image into the visual task model for processing; and adjusting parameters of the undeleted image processing module according to the processing result of the visual task model.
According to the scheme of the embodiments of the application, performance indexes obtained from the visual task model, such as target detection accuracy or target segmentation accuracy, are used to adjust the weights of the plurality of image processing modules, so that the image processing modules with a large influence on the performance indexes of the visual task model are retained, or the image processing modules that can maintain or improve those performance indexes are retained. In this way, the image processing modules suitable for, or required by, the visual task model can be obtained, the time required by the image processing flow is reduced, computation overhead is saved, the computing power requirement is lowered, and the method is more hardware-friendly.
In addition, the performance indexes obtained from the visual task model are used to adjust the parameters in the retained image processing modules, for example, by searching the design space of the image processing modules, so that the optimal parameter configuration of each image processing module is obtained and the performance of the visual task model is improved.
With reference to the first aspect, in certain implementation manners of the first aspect, adjusting at least one image processing module according to a processing result of the visual task model includes: and adjusting the processing sequence of at least one image processing module according to the processing result of the visual task model.
In the scheme of the embodiment of the application, the processing sequence of the image processing module is adjusted according to the processing result of the visual task model, so that an image processing flow more suitable for the visual task can be obtained, and the accuracy of the visual task is improved.
With reference to the first aspect, in certain implementations of the first aspect, the at least one image processing module includes: a black level compensation module, a green balance module, a dead pixel correction module, a demosaicing module, a Bayer noise reduction module, an automatic white balance module, a color correction module, a gamma correction module, or a noise reduction and sharpening module.
Any image processing module in the at least one image processing module can be implemented by adopting a neural network algorithm, or can also be implemented by adopting a non-neural network algorithm.
In a second aspect, an image processing method is provided, the method including: acquiring a third image; determining at least one target image processing module according to a visual task model; processing the third image by the at least one target image processing module to obtain a fourth image; and processing the fourth image through the visual task model to obtain a processing result of the fourth image.
According to the scheme of the embodiment of the application, different visual task models correspond to the configuration of different image processing modules, and when the visual task models change, the image processing modules can be adaptively matched with the visual task models, so that the image processing flow is more suitable for the visual task models, and the performance of the visual task models is improved.
The third image may, for example, be a raw image acquired by a sensor.
The processing result of the fourth image can also be understood as the processing result of the third image.
The processing result of the fourth image is the reasoning result of the visual task model.
The at least one target image processing module is one or more image processing modules corresponding to the visual task model.
Illustratively, the visual tasks include: target detection, image classification, target segmentation, target tracking or image recognition, etc.
The visual task model is used to perform visual tasks. For example, the visual task is object detection, and the visual task model is an object detection model. For another example, if the visual task is image recognition, the visual task model is an image recognition model.
The visual task model may be a trained model.
In different application scenarios, different visual task models can be adopted, and accordingly, at least one target image processing module matched with the visual task model can be determined according to the different visual task models. In this way, different image processing modules can be selected according to different application scenes.
There is a correspondence between the visual task model and the configuration of the image processing module. The configuration of the image processing module that matches the current visual task model may be determined from the correspondence between the visual task model and the configuration of the image processing module.
Illustratively, the configuration of the image processing module includes at least one of: a combination of image processing modules, a weight of an image processing module, a processing order of an image processing module, or a parameter in an image processing module.
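A minimal sketch of such a correspondence is a lookup table keyed by the visual task model; every concrete entry below (model names, module lists, orders, weights, and parameters) is an invented placeholder used only to illustrate the mapping:

```python
# Hypothetical correspondence between a visual task model and its ISP
# configuration. All concrete values are placeholders for illustration.
ISP_CONFIGS = {
    "detector_day": {
        "modules": ["black_level", "demosaic", "awb", "gamma"],
        "order":   ["black_level", "demosaic", "awb", "gamma"],
        "weights": {"black_level": 1.0, "demosaic": 1.0, "awb": 0.8, "gamma": 0.6},
        "params":  {"gamma": {"exponent": 2.2}},
    },
    "detector_night": {
        "modules": ["black_level", "bayer_denoise", "demosaic", "gamma"],
        "order":   ["black_level", "bayer_denoise", "demosaic", "gamma"],
        "weights": {"black_level": 1.0, "bayer_denoise": 0.9, "demosaic": 1.0, "gamma": 0.7},
        "params":  {"gamma": {"exponent": 1.8}, "bayer_denoise": {"strength": 0.6}},
    },
}

def select_isp_config(task_model_name):
    """Return the ISP configuration matched to the given visual task model."""
    return ISP_CONFIGS[task_model_name]
```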
With reference to the second aspect, in certain implementations of the second aspect, determining at least one target image processing module according to the visual task model includes: at least one target image processing module is determined from a plurality of candidate image processing modules according to the visual task model.
According to the scheme of the embodiment of the application, different visual task models correspond to the combination of different image processing modules, and when the visual task model changes, the combination of the image processing modules can be adaptively matched with the visual task model, so that the combination of the current image processing modules is more suitable for the current visual task model, and the performance of the visual task model is improved.
Moreover, suitable image processing modules are selected from a plurality of candidate image processing modules according to the visual task model, and not all of the candidate image processing modules need to be used to process the image, which shortens the processing flow and reduces the computing power requirement.
There is a correspondence between the visual task model and the combination of image processing modules. The combination of image processing modules corresponding to the current visual task model can be determined according to this correspondence; in other words, the image processing modules required by the visual task model, i.e. the at least one target image processing module, can be determined according to this correspondence.
With reference to the second aspect, in certain implementations of the second aspect, determining at least one target image processing module according to the visual task model includes: and determining the weight of at least one target image processing module according to the visual task model, wherein the weight of the at least one target image processing module is used for processing the processing result of the at least one target image processing module to obtain a fourth image.
According to the scheme of the embodiment of the application, different visual task models correspond to weights of different image processing modules, when the visual task models change, the weights of the image processing modules can be adaptively matched with the visual task models, so that the weights of the current image processing modules are more suitable for the current visual task models, and the performance of the visual task models is improved.
With reference to the second aspect, in certain implementations of the second aspect, determining at least one target image processing module according to the visual task model includes: parameters in at least one target image processing module are determined from the visual task model.
According to the scheme of the embodiment of the application, different visual task models correspond to parameters in different image processing modules, and when the visual task models change, the parameters in the image processing modules can be adaptively matched with the visual task models, so that the parameters in the current image processing modules are more suitable for the current visual task models, and the performance of the visual task models is improved.
There is a correspondence between the visual task model and parameters in the image processing module. Parameters in the image processing module corresponding to the visual task model, i.e. parameters in the at least one target image processing module, may be determined from the visual task model.
With reference to the second aspect, in certain implementations of the second aspect, determining at least one target image processing module according to the visual task model includes: a processing order of at least one target image processing module is determined based on the visual task model.
According to the scheme of the embodiment of the application, different visual task models correspond to different processing sequences of the image processing modules, and when the visual task models change, the processing sequences of the image processing modules can be adaptively matched with the visual task models, so that the processing sequences of the current image processing modules are more suitable for the current visual task models, and the performance of the visual task models is improved.
There is a correspondence between the visual task model and the processing order of the image processing modules. The processing order of the image processing modules corresponding to the current visual task model, i.e. the processing order of the at least one target image processing module, can be determined according to this correspondence.
With reference to the second aspect, in certain implementations of the second aspect, the at least one target image processing module includes: a black level compensation module, a green balance module, a dead pixel correction module, a demosaicing module, a Bayer noise reduction module, an automatic white balance module, a color correction module, a gamma correction module, or a noise reduction and sharpening module.
In a third aspect, an image processing apparatus is provided, the apparatus comprising means or units for performing the method of any one of the above-described first aspects and the first aspect.
In a fourth aspect, there is provided an image processing apparatus comprising means or units for performing the method of any one of the implementations of the second aspect and the above-described second aspect.
It should be appreciated that the extensions, limitations, explanations and illustrations of the relevant content in the first aspect described above also apply to the same content in the second, third and fourth aspects.
In a fifth aspect, there is provided an image processing apparatus comprising: a memory for storing a program; a processor for executing the program stored in the memory, the processor being configured to perform the method of the first aspect and any implementation manner of the first aspect when the program stored in the memory is executed.
The processor in the fifth aspect may be a central processing unit (CPU), or a combination of a CPU and a neural network operation processor, where the neural network operation processor may include a graphics processing unit (GPU), a neural-network processing unit (NPU), a tensor processing unit (TPU), and the like. The TPU is an application-specific integrated circuit fully customized by Google for machine learning as an artificial intelligence accelerator.
In a sixth aspect, there is provided an image processing apparatus comprising: a memory for storing a program; a processor for executing the program stored in the memory, the processor being configured to perform the method of the second aspect and any implementation manner of the second aspect when the program stored in the memory is executed.
The processor in the sixth aspect may be a CPU, or a combination of a CPU and a neural network operation processor, where the neural network operation processor may include a GPU, an NPU, a TPU, and the like. The TPU is an application-specific integrated circuit fully customized by Google for machine learning as an artificial intelligence accelerator.
In a seventh aspect, a computer readable storage medium is provided, the computer readable medium storing program code for execution by a device, the program code comprising instructions for performing the method in any one of the implementations of the first or second aspects.
In an eighth aspect, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of any one of the implementations of the first or second aspects described above.
In a ninth aspect, a chip is provided, the chip including a processor and a data interface, the processor reading instructions stored on a memory through the data interface, and executing the method in any implementation manner of the first aspect or the second aspect.
Optionally, as an implementation manner, the chip may further include a memory, where the memory stores instructions, and the processor is configured to execute the instructions stored on the memory, where the instructions, when executed, are configured to perform the method in any implementation manner of the first aspect or the second aspect.
The chip may be a field-programmable gate array (FPGA) or an application-specific integrated circuit (ASIC).
Drawings
Fig. 1 is a schematic structural diagram of a system architecture according to an embodiment of the present application;
fig. 2 is a schematic diagram of an image processing flow provided in an embodiment of the present application;
FIG. 3 is a schematic flow chart of an image processing method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of another image processing procedure according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of still another image processing procedure according to an embodiment of the present disclosure;
FIG. 6 is a schematic flow chart of another image processing method provided in an embodiment of the present application;
fig. 7 is a schematic block diagram of an image processing apparatus provided in an embodiment of the present application;
fig. 8 is a schematic block diagram of another image processing apparatus provided in an embodiment of the present application.
Detailed Description
The technical solutions in the present application will be described below with reference to the accompanying drawings.
The method and the device can be applied to the fields of automatic driving, image classification, image retrieval, image semantic segmentation, image quality enhancement, image super-resolution, monitoring, target tracking, target detection and the like which need to execute visual tasks.
Specifically, the method of the embodiment of the application can be applied to picture classification and monitoring scenes, and the two application scenes are respectively and simply described below.
Classifying pictures:
When a user stores a large number of pictures on a terminal device (for example, a mobile phone) or a cloud disk, identifying the images in the album allows the user or the system to classify and manage the album conveniently, improving user experience.
With the image processing method of this application, images suitable for performing the classification task can be obtained, improving classification accuracy. In addition, the image processing flow can be shortened and hardware cost reduced, which is friendlier to terminal devices; the speed of classifying pictures is increased, so that pictures of different categories can be labeled in real time for the user to view and search. Moreover, the classification labels of the pictures can be provided to an album management system for classified management, saving the user's management time, improving album management efficiency, and improving user experience.
And (3) monitoring:
Monitoring scenarios include: smart city, field monitoring, indoor monitoring, outdoor monitoring, in-vehicle monitoring, and the like. In the smart city scenario, various attribute identifications are required, such as pedestrian attribute identification and rider attribute identification, and deep neural networks, with their strong capabilities, play an important role in these attribute identification tasks.
By adopting the image processing method, the image suitable for executing the attribute identification task can be obtained, and the identification accuracy is improved. In addition, the method can reduce image processing flow, reduce hardware cost, improve processing efficiency, facilitate real-time processing of the input road picture and more quickly identify different attribute information in the road picture.
Since embodiments of the present application relate to a large number of applications of neural networks, for ease of understanding, related terms and concepts of the neural networks to which embodiments of the present application may relate are first described below.
(1) Neural network
A neural network may be composed of neural units. A neural unit may be an arithmetic unit that takes $x_s$ and an intercept of 1 as inputs, and the output of the arithmetic unit may be:

$$h_{W,b}(x) = f(W^{T}x) = f\left(\sum_{s=1}^{n} W_{s}x_{s} + b\right)$$

where $s = 1, 2, \ldots, n$, $n$ is a natural number greater than 1, $W_{s}$ is the weight of $x_{s}$, and $b$ is the bias of the neural unit.
Here f is an activation function of the neural unit, used to introduce a nonlinear characteristic into the neural network so as to transform an input signal of the neural unit into an output signal. The output signal of the activation function may be used as the input of the next layer. For example, the activation function may be a ReLU, tanh, or sigmoid function.
A neural network is a network formed by joining together a plurality of the above-described single neural units, i.e., the output of one neural unit may be the input of another neural unit. The input of each neural unit may be connected to a local receptive field of a previous layer to extract features of the local receptive field, which may be an area composed of several neural units.
(2) Deep neural network
A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with multiple hidden layers. According to the positions of the different layers, the layers inside a DNN can be divided into three types: the input layer, the hidden layers, and the output layer. Generally, the first layer is the input layer, the last layer is the output layer, and all intermediate layers are hidden layers. The layers are fully connected, that is, any neuron in the i-th layer must be connected to any neuron in the (i+1)-th layer.
Although a DNN appears complex, the work of each layer is actually not complex: it is simply the following linear relational expression: $\vec{y} = \alpha(W\vec{x} + \vec{b})$, where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the offset vector, $W$ is the weight matrix (also called coefficients), and $\alpha(\cdot)$ is the activation function. Each layer simply performs this operation on the input vector $\vec{x}$ to obtain the output vector $\vec{y}$. Since a DNN has many layers, the number of coefficient matrices $W$ and offset vectors $\vec{b}$ is also large. These parameters are defined in the DNN as follows, taking the coefficient $W$ as an example: assume that in a three-layer DNN, the linear coefficient from the 4th neuron of the second layer to the 2nd neuron of the third layer is defined as $W^{3}_{24}$. The superscript 3 represents the layer in which the coefficient $W$ is located, and the subscripts correspond to the output index 2 of the third layer and the input index 4 of the second layer.
In summary, the coefficient from the kth neuron of the (L-1)th layer to the jth neuron of the Lth layer is defined as $W^{L}_{jk}$.
It should be noted that the input layer is devoid of W parameters. In deep neural networks, more hidden layers make the network more capable of characterizing complex situations in the real world. Theoretically, the more parameters the higher the model complexity, the greater the "capacity", meaning that it can accomplish more complex learning tasks. The process of training the deep neural network, i.e. learning the weight matrix, has the final objective of obtaining a weight matrix (a weight matrix formed by a number of layers of vectors W) for all layers of the trained deep neural network.
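As an illustration of the per-layer operation described above, the following sketch applies y = α(Wx + b) layer by layer with a ReLU activation; the layer sizes and the use of NumPy are assumptions made only for the example:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def forward(x, layers):
    """Apply y = relu(W @ x + b) for each (W, b) pair in `layers`."""
    for W, b in layers:
        x = relu(W @ x + b)
    return x

# Toy 3-layer DNN with random parameters (shapes are illustrative only).
rng = np.random.default_rng(0)
layers = [(rng.standard_normal((8, 4)), rng.standard_normal(8)),
          (rng.standard_normal((8, 8)), rng.standard_normal(8)),
          (rng.standard_normal((2, 8)), rng.standard_normal(2))]
print(forward(rng.standard_normal(4), layers))
```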
(3) Convolutional neural network
The convolutional neural network (convolutional neuron network, CNN) is a deep neural network with a convolutional structure. The convolutional neural network comprises a feature extractor consisting of a convolutional layer and a sub-sampling layer, which can be regarded as a filter. The convolution layer refers to a neuron layer in the convolution neural network, which performs convolution processing on an input signal. In the convolutional layer of the convolutional neural network, one neuron may be connected with only a part of adjacent layer neurons. A convolutional layer typically contains a number of feature planes, each of which may be composed of a number of neural elements arranged in a rectangular pattern. Neural elements of the same feature plane share weights, where the shared weights are convolution kernels. Sharing weights can be understood as the way image information is extracted is independent of location. The convolution kernel can be formed in a matrix with random size, and reasonable weight can be obtained through learning in the training process of the convolution neural network. In addition, the direct benefit of sharing weights is to reduce the connections between layers of the convolutional neural network, while reducing the risk of overfitting.
(4) Circulating neural network
A recurrent neural network (RNN) is used to process sequence data. In the traditional neural network model, from the input layer to the hidden layer to the output layer, the layers are fully connected, while the nodes within each layer are unconnected. Although this ordinary neural network solves many problems, it is still powerless for many others. For example, to predict the next word of a sentence, the previous words are generally needed, because the preceding and following words in a sentence are not independent. The RNN is called a recurrent neural network because the current output for a sequence is also related to the previous outputs. The specific expression is that the network memorizes the previous information and applies it to the calculation of the current output; that is, the nodes between the hidden layers are no longer unconnected but connected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment. In theory, an RNN can process sequence data of any length. Training an RNN is the same as training a traditional CNN or DNN: the error back-propagation algorithm is also used, but with a difference: if the RNN is unfolded over time, the parameters in it, such as W, are shared, which is not the case with the traditional neural networks described above. Moreover, when the gradient descent algorithm is used, the output of each step depends not only on the network of the current step but also on the network states of the previous steps. This learning algorithm is referred to as back propagation through time (BPTT).
Why is a recurrent neural network still needed when convolutional neural networks already exist? The reason is simple: in a convolutional neural network there is a precondition that the elements are independent of each other, and the inputs and outputs are also independent, such as cats and dogs. In the real world, however, many elements are interconnected. For example, stocks change over time. Another example: someone says, "I like traveling, and my favorite place is Yunnan. In the future, when I have the chance, I will go to ___." A human knows that the blank should be filled with "Yunnan", because humans can infer from the context. But how can a machine do this? RNNs were developed for this purpose: to give machines the ability to memorize as humans do. Thus, the output of an RNN needs to rely on the current input information and historical memory information.
(5) Loss function
In training a deep neural network, because the output of the deep neural network is expected to be as close as possible to the value actually desired, the weight vector of each layer can be updated according to the difference between the predicted value of the current network and the actually desired target value (of course, there is usually an initialization process before the first update, in which parameters are pre-configured for each layer of the deep neural network). For example, if the predicted value of the network is too high, the weight vectors are adjusted so that the prediction becomes lower, and the adjustment continues until the deep neural network can predict the actually desired target value or a value very close to it. Therefore, it is necessary to define in advance "how to compare the difference between the predicted value and the target value"; this is the role of the loss function (loss function) or objective function (objective function), which are important equations for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) of the loss function indicates a larger difference, so training the deep neural network becomes a process of reducing this loss as much as possible. In general, the smaller the loss, the higher the training quality of the deep neural network, and the larger the loss, the lower the training quality. Similarly, the smaller the loss fluctuation, the more stable the training; the larger the loss fluctuation, the less stable the training.
As shown in fig. 1, an embodiment of the present application provides a system architecture 100. In fig. 1, a data acquisition device 170 is used to acquire training data. For example, for the image processing method of the embodiments of the present application, the training data may include a training image and a true value (ground truth) corresponding to the training image. For example, if the visual task is an image classification task, the true value corresponding to the training image may be the classification result corresponding to the training image, and the classification result may be a manually pre-labeled result.
After the training data is collected, the data collection device 170 stores the training data in the database 130 and the training device 120 trains the target model/rule 101 based on the training data maintained in the database 130. The target model/rule 101 is the model used for the visual task. For example, the visual task is an image classification task, then the target model/rule 101 may be a network model for image classification.
The training device 120 obtains the target model/rule 101 based on the training data, and the training device 120 processes the input raw data and compares the output value with the target value until the difference between the value output by the training device 120 and the target value is smaller than a certain threshold value, thereby completing the training of the target model/rule 101.
The target model/rule 101 in the embodiment of the present application may specifically be a neural network model. Such as convolutional neural networks or residual networks. It should be noted that, in practical applications, the training data maintained in the database 130 is not necessarily all acquired by the data acquisition device 170, but may be received from other devices. It should be noted that the training device 120 is not necessarily completely based on the training data maintained by the database 130 to perform training of the target model/rule 101, and it is also possible to obtain the training data from the cloud or other places to perform model training, which should not be taken as a limitation of the embodiments of the present application.
The target model/rule 101 obtained by training with the training device 120 may be applied to different systems or devices, such as the execution device 110 shown in fig. 1. The execution device 110 may be a terminal, such as a mobile phone terminal, a tablet computer, a notebook computer, an augmented reality (AR)/virtual reality (VR) device, or a vehicle-mounted terminal, or may be a server or a cloud. In fig. 1, the execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices, and a user may input data to the I/O interface 112 through a client device 140. In embodiments of the present application, the input data may include: data to be processed entered by the client device. Illustratively, the input data may include a raw image in embodiments of the present application.
The preprocessing module 113 is configured to perform preprocessing according to the input image received by the I/O interface 112, and in this embodiment of the present application, the preprocessing module 113 may be configured to perform a series of image signal processing on the input image. The preprocessing module 113 may include one or more image processing modules therein.
In preprocessing input data by the execution device 110, or in performing processing related to computation or the like by the computation module 111 of the execution device 110, the execution device 110 may call data, codes or the like in the data storage system 150 for corresponding processing, or may store data, instructions or the like obtained by corresponding processing in the data storage system 150.
Finally, the I/O interface 112 returns the processing results, such as the processing results of the data obtained as described above, to the client device 140, thereby providing the processing results to the user.
It should be noted that the training device 120 may generate, based on different training data, a corresponding target model/rule 101 for different targets or different tasks, where the corresponding target model/rule 101 may be used to achieve the targets or to complete the tasks, thereby providing the user with the desired result.
In the case shown in FIG. 1, the user may manually give input data that may be manipulated through an interface provided by the I/O interface 112. In another case, the client device 140 may automatically send the input data to the I/O interface 112, and if the client device 140 is required to automatically send the input data requiring the user's authorization, the user may set the corresponding permissions in the client device 140. The user may view the results output by the execution device 110 at the client device 140, and the specific presentation may be in the form of a display, a sound, an action, or the like. The client device 140 may also be used as a data collection terminal to collect input data of the input I/O interface 112 and output results of the output I/O interface 112 as new sample data as shown in the figure, and store the new sample data in the database 130. Of course, instead of being collected by the client device 140, the I/O interface 112 may directly store the input data input to the I/O interface 112 and the output result output from the I/O interface 112 as new sample data into the database 130.
It should be noted that fig. 1 is only a schematic diagram of a system architecture provided in the embodiments of the present application, and the positional relationship among devices, apparatuses, modules, etc. shown in the drawings is not limited in any way, for example, in fig. 1, the data storage system 150 is an external memory with respect to the execution device 110, and in other cases, the data storage system 150 may be disposed in the execution device 110.
As shown in fig. 1, the training device 120 trains to obtain the target model/rule 101. In the embodiments of the present application, the target model/rule 101 may be the neural network model of the present application; specifically, the neural network model in the embodiments of the present application may be a CNN or a residual network.
The image signal processor outputs a visualized image after a series of processing is performed on the raw image acquired by the sensor. These images may be used as input images for visual tasks. Specifically, in the visual task, the input image may be processed by using a neural network algorithm or a non-neural network algorithm, so as to obtain a relevant result of the visual task.
Fig. 2 shows a schematic diagram of the overall process flow of a visual task. The raw image is used as an input image, and a series of image signal processing is performed on the input image to output a visualized Red Green Blue (RGB) image of 8 bits. And taking the RGB image as an input image of the visual task to obtain a processing result of the visual task. For example, as shown in fig. 2, the image signal processing module includes a black level compensation (black level compensation) module, a green balance (green balance) module, a dead pixel correction (bad pixel correction) module, a demosaic (demosaic) module, a bayer noise reduction (bayer denoise) module, an automatic white balance (auto white balance) module, a color correction (color correction) module, a gamma correction (gamma correction) module, a noise reduction and sharpening (denoise sharpness) module, and the like. The image signal processing module can adopt a non-neural network algorithm or a neural network algorithm.
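A minimal sketch of this pipeline as an ordered sequence of stages is shown below; the stage functions are identity placeholders standing in for the real algorithms, and the 8-bit conversion is simplified:

```python
import numpy as np

def identity_stage(img):
    """Placeholder for a real ISP stage (black level compensation, demosaic, ...)."""
    return img

# The stage order mirrors Fig. 2; each entry would be a real algorithm in practice.
ISP_PIPELINE = [
    ("black_level_compensation", identity_stage),
    ("green_balance",            identity_stage),
    ("bad_pixel_correction",     identity_stage),
    ("demosaic",                 identity_stage),
    ("bayer_denoise",            identity_stage),
    ("auto_white_balance",       identity_stage),
    ("color_correction",         identity_stage),
    ("gamma_correction",         identity_stage),
    ("denoise_sharpen",          identity_stage),
]

def run_isp(raw, pipeline=ISP_PIPELINE):
    """Run the raw image through every stage, then quantize to an 8-bit RGB image."""
    img = raw.astype(np.float32)
    for name, stage in pipeline:
        img = stage(img)
    return (np.clip(img, 0.0, 1.0) * 255).astype(np.uint8)  # simplified 8-bit output

rgb = run_isp(np.random.rand(64, 64, 3))  # toy raw input for illustration
```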
The input image of a visual task is typically an RGB image that is subject to image signal processing. The purpose of conventional image signal processing is generally to meet the visual needs of a person, and the result of performing a visual task based on the image is not necessarily an optimal result.
The embodiment of the application provides an image processing method, which adjusts an image processing flow before a visual task according to a processing result of the visual task so as to obtain the image processing flow meeting the requirement.
The image processing method in the embodiment of the present application is described in detail below with reference to fig. 3 to 6.
Fig. 3 illustrates an image processing method 300 provided in an embodiment of the present application. The method shown in fig. 3 may be performed by a computing device, which may be a cloud service device, or may be a terminal device, for example, a computer, a server, a mobile phone, a camera, a vehicle, a drone, or a robot, or may be a system formed by the cloud service device and the terminal device.
For example, the method 300 may be performed by a training device or an inference device, e.g., by an accelerator such as a CPU, GPU, or NPU. Further, the accelerator chip may be located on an FPGA, a chip emulator (simulator), or a development board (EVB).
Alternatively, the method 300 may be performed by a tuning tool or a calibration tool of an ISP pipeline (pipeline) of a hardware device (e.g., a video camera or a camera).
The method 300 includes steps S301 to S304. The following describes step S301 to step S304 in detail.
S301, acquiring a first image.
The first image may be a raw image acquired by the sensor, for example.
The training dataset comprises a plurality of images, the first image being any image in the training dataset. In practice, the method 300 may be performed multiple times based on multiple images in the training dataset until the desired image processing module is obtained.
Illustratively, the training data set may employ an open source data set. Alternatively, the training data set may be a self-made data set.
For example, the training data set may be pre-stored. For example, the training data set may be training data maintained in database 130 shown in FIG. 1. Alternatively, the training data set may be data entered by the user.
S302, processing the first image through at least one image processing module to obtain a second image.
The image processing module is used for performing image signal processing on the input image.
The at least one image processing module may be located in the image signal processor, for example. That is, step S302 is performed by the image processing modules in the image signal processor.
Any image processing module in the at least one image processing module can be implemented by adopting a neural network algorithm, or can also be implemented by adopting a non-neural network algorithm. The embodiment of the application does not limit the specific implementation manner of the image processing module.
Optionally, the at least one image processing module may include: a black level compensation module, a green balance module, a dead pixel correction module, a demosaicing module, a Bayer noise reduction module, an automatic white balance module, a color correction module, a gamma correction module, or a noise reduction and sharpening module.
For example, as shown in fig. 4, the raw image is taken as the first image, and the at least one image processing module includes 9 image processing modules, namely a black level compensation module, a green balance module, a dead pixel correction module, a demosaicing module, a Bayer noise reduction module, an automatic white balance module, a color correction module, a gamma correction module and a noise reduction and sharpening module. The 9 image processing modules sequentially perform black level compensation, green balance processing, dead pixel correction, demosaicing, bayer noise reduction, automatic white balance processing, color correction, gamma correction, and noise reduction and sharpening.
Illustratively, a black level module, a green balance module, and a dead pixel correction module may be used to process raw data. The demosaicing module and Bayer noise reduction module may be used to perform the demosaicing process. An automatic white balance module, a color correction module, a gamma correction module, and a noise reduction and sharpening module may be used to perform the image enhancement process.
For example, as shown in fig. 4, the second image may be an RGB image. Further, the second image may be an 8bit RGB image. The type of the second image may also be set according to the input needs of the visual task model, here by way of example only.
Optionally, step S302 includes: and processing the first image through at least one image processing module and the weight of the at least one image processing module to obtain a second image.
Specifically, the processing result of the at least one image processing module is adjusted according to the weight of the at least one image processing module, so as to obtain a second image.
The image processing module may, for example, process the image input to the module by adjusting, i.e. varying, the pixel values of all or part of the pixels of the image input to the module. In this case, the amount of change in the pixel values of all or part of the pixels may be adjusted according to the weight of the image processing module.
For example, the weight of the image processing module is multiplied by the variation of the pixel value to obtain the variation of the adjusted pixel, and thus the output image of the module is obtained. If the weight of the image processing module is 0, the image processing module does not participate in the image processing flow.
The specific value of the weight may be set as required, for example, the weight may be a real number greater than or equal to 0 and less than or equal to 1.
Further, the weight of the at least one image processing module may be normalized when the weight is set, that is, the sum of the weights of the at least one image processing module is made to be 1, or the sum of the weights of the at least one image processing module is made to be close to 1.
As shown in fig. 4, the weights of the 9 image processing modules are w1, w2, w3, w4, w5, w6, w7, w8, and w9, respectively. Each weight is a real number greater than or equal to 0 and less than or equal to 1, so the largest possible sum of w1 through w9 is 9. Alternatively, the 9 weights may be normalized so that their sum becomes 1.
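The weighted application of a module described above (the module contributes only its weight times its pixel-value change, and a weight of 0 bypasses it) can be sketched as follows; the containers for modules and weights, and the toy usage, are assumptions made for illustration:

```python
import numpy as np

def apply_weighted_module(img, module, weight):
    """Blend the module's output with its input: the module contributes only
    `weight` of its pixel-value change, and weight 0 bypasses it entirely."""
    processed = module(img)
    return img + weight * (processed - img)

def run_weighted_pipeline(raw, modules, weights, normalize=True):
    """Run the weighted ISP pipeline; optionally normalize weights to sum to 1."""
    w = np.asarray(weights, dtype=np.float32)
    if normalize:
        w = w / w.sum()
    img = raw.astype(np.float32)
    for module, wi in zip(modules, w):
        img = apply_weighted_module(img, module, wi)
    return img

# Toy usage with identity-like modules and 9 illustrative weights:
out = run_weighted_pipeline(np.random.rand(8, 8, 3),
                            modules=[lambda x: x * 1.1] * 9,
                            weights=[0.2, 0.05, 0.1, 0.2, 0.1, 0.1, 0.1, 0.1, 0.05])
```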
S303, inputting the second image into the visual task model for processing.
Illustratively, the visual tasks include: target detection, image classification, target segmentation, target tracking or image recognition, etc.
The visual task model is used to perform visual tasks. For example, the visual task is object detection, and the visual task model is an object detection model. For another example, if the visual task is image recognition, the visual task model is an image recognition model.
The visual task model may be a trained model.
The type of output of the visual task model is related to the type of visual task. The output of the visual task model is the reasoning result of the visual task model.
For example, where the visual task is target detection, the output of the visual task model may be a target box on the second image and a category of objects in the target box. For another example, where the visual task is image classification, the output of the visual task model may be the classification of the second image.
The processing results of the visual task model may include performance metrics of the visual task model.
Illustratively, the performance metrics of the visual task model include the accuracy of reasoning or the value of the loss function, etc. The loss function may be set as desired. The loss function is used to indicate the difference between the reasoning results of the visual task model and the true values corresponding to the first image. It should be noted that the loss function may be the loss function used in the training process of the visual task model, or may be another loss function.
For example, if the visual task is target detection, the processing results of the visual task model may include detection accuracy.
Specifically, the second image is input into the visual task model for processing, the obtained detection result is compared with the true value corresponding to the first image to obtain the error between them, and the detection accuracy is determined according to that error.
For another example, if the visual task is a target segmentation, the processing result of the visual task model may include segmentation accuracy.
Specifically, the second image is input into the visual task model for processing, the obtained segmentation result is compared with the true value corresponding to the first image to obtain the error between them, and the segmentation accuracy is determined according to that error.
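For illustration only, one simple way such a comparison could be made for the detection case is sketched below (the IoU-based criterion and the function names are assumptions; the application does not prescribe a particular metric):

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def detection_accuracy(predicted_boxes, ground_truth_boxes, iou_threshold=0.5):
    """Fraction of ground-truth boxes matched by some prediction with IoU above the threshold."""
    if not ground_truth_boxes:
        return 1.0 if not predicted_boxes else 0.0
    hits = sum(any(iou(gt, p) >= iou_threshold for p in predicted_boxes)
               for gt in ground_truth_boxes)
    return hits / len(ground_truth_boxes)
```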
The visual task model may employ a neural network model or, alternatively, a non-neural network model. The neural network model may be an existing neural network model, for example, a residual network. Alternatively, the neural network model may be a self-built neural network model with another structure. The embodiments of the present application are not limited in this regard.
It should be noted that, for the same visual task, different visual task models may be adopted in different application scenarios. For example, for a target detection task in a driving scene, the visual task models employed in the case of overexposure and underexposure may be the same or may be different. During driving, a first target detection model may be employed if the current scene is identified as overexposed, and a second target detection model may be employed if the current scene is identified as underexposed. The first object detection model and the second object detection model are different object detection models.
Illustratively, the processing of the visual task model may be performed by the computing module 111 in FIG. 1.
The visual task model may be deployed on the execution device of the method 300, as well as on other devices. That is, the processing of the visual task model may be performed by the execution device of the method 300, or may be performed by another device, and the processing result is fed back to the execution device of the method 300.
S304, adjusting the at least one image processing module according to the processing result of the visual task model.
The at least one image processing module is adjusted according to the processing results of the visual task model so that the processing results of the visual task model are as close as possible to the expected results.
Or, the at least one image processing module is adjusted according to the performance index of the visual task model to improve the performance of the visual task model.
For example, if the performance index of the visual task model is the accuracy of the reasoning of the visual task model, the at least one image processing module is adjusted to improve the accuracy of the reasoning of the model.
For another example, if the performance index of the visual task model is the value of the loss function of the visual task model, the at least one image processing module is adjusted to reduce the value of the loss function of the visual task model.
In practical applications, the method 300 may be performed based on multiple images in the training dataset until a preset condition is met. That is, in practical applications, the image processing module may be iteratively adjusted based on a plurality of images. The image processing module adopted in each iteration process is the image processing module obtained after the last iteration.
The preset conditions may be set as needed, and will be exemplified in modes 1, 2, 3 and 4 hereinafter.
Further, the at least one image processing module may also be adjusted according to the time of image processing and the processing result of the visual task model.
The time of image processing may be the processing time of the visual task model, or may be the processing time of the at least one image processing module, or may be the sum of the processing time of the visual task model and the processing time of the at least one image processing module.
Therefore, the processing speed can be improved and the time delay can be reduced on the premise of ensuring the performance of the visual task model.
In the scheme of the embodiment of the application, the image processing flow is adjusted according to the processing result of the visual task model, so that an image suitable for the visual task is obtained, and the performance of the visual task model is ensured.
According to the scheme, the image processing flow can be adjusted according to the requirements of different application scenes so as to adapt to the different application scenes.
The same visual task may employ different visual task models in different application scenarios. For example, for a target detection task in a driving scene, the visual task models employed in the case of overexposure and underexposure may be the same or may be different. During driving, if the current scene is identified as overexposed, a first object detection model may be employed as the visual task model. If the current scene is identified as underexposed, a second object detection model may be employed as the visual task model. According to the scheme, the image processing flow can be adjusted according to the processing results of the first target detection model and the second target detection model respectively, so that the image processing flow suitable for the first target detection model and the image processing flow suitable for the second target detection model can be obtained respectively.
Step S304 may be implemented in various ways, and four ways (way 1, way 2, way 3, and way 4) are described below as examples.
Mode 1
Optionally, the at least one image processing module includes a plurality of image processing modules, and step S304 includes: and adjusting the weights of the plurality of image processing modules according to the processing results of the visual task model.
And adjusting the weights of the plurality of image processing modules according to the processing results of the visual task model so as to improve the performance of the visual task model.
As described above, in practical application, the method 300 may be performed based on a plurality of images in the training dataset to implement iterative adjustment of weights of the plurality of image processing modules until a preset condition is satisfied. And stopping adjusting the weights of the image processing modules after the preset conditions are met, or stopping refreshing the weights of the image processing modules.
For example, the preset condition may be a weight convergence of the plurality of image processing modules.
In the event that the weights of the plurality of image processing modules converge, the method 300 is no longer performed, i.e., the adjustment of the weights of the plurality of image processing modules is stopped. Weight convergence may also be understood as the weight gradient changing only slightly after the method 300 has been performed multiple times in succession. For example, when the amount of change in the weight gradient obtained after the method 300 is performed multiple times in succession is less than or equal to a first threshold, the adjustment of the weights of the plurality of image processing modules is stopped.
Alternatively, the preset condition may be that the accuracy of the visual task model is greater than or equal to a second threshold.
In the event that the accuracy of the visual task model is greater than or equal to the second threshold, the method 300 is no longer performed, i.e., the adjustment of the weights of the plurality of image processing modules is stopped.
The second threshold may be a preset value. Alternatively, the second threshold may be the accuracy of reasoning of the visual task model obtained without setting the weights of the image processing module. For example, as shown in fig. 4, the second threshold may be the accuracy of reasoning of the visual task model with no weights set by the 9 image processing modules. Or it can be appreciated that the second threshold may be the accuracy of the reasoning of the visual task model with the weight of the 9 image processing modules being 1.
That is, an image is input into the original image processing modules for processing, the processed image is input into the visual task model for processing, and the accuracy of reasoning is calculated; this accuracy is taken as the second threshold. The method 300 is then executed: the image is input into the image processing modules with the currently adjusted weights for processing, the processed image is input into the visual task model for processing, the accuracy of reasoning is calculated, and the currently obtained accuracy is compared with the second threshold. The method 300 is no longer executed once the currently obtained accuracy of reasoning is greater than or equal to the second threshold. In this way, processing images with the adjusted image processing modules can ensure, or even improve, the performance of the visual task model.
Alternatively, the preset condition may be that the amount of change in the loss function value of the visual task model obtained after the method 300 is continuously performed a plurality of times is less than or equal to the third threshold.
That is, in the event that the change in the loss function value of the visual task model tends to stabilize, the method 300 is no longer performed.
Alternatively, the preset condition may be that the number of iterations is greater than or equal to the fourth threshold.
That is, in the case where the number of times the method 300 is performed is greater than or equal to the fourth threshold value, the method 300 is not performed any more.
It should be appreciated that the above-described preset conditions may be used in combination. For example, the preset condition may be that the accuracy of the visual task model is greater than or equal to a second threshold and the number of iterations is greater than or equal to a fourth threshold. For another example, the preset condition may be that weights of the plurality of image processing modules converge, and accuracy of the visual task model is greater than or equal to a second threshold.
It should be understood that the foregoing is merely an example, and the preset condition may be other forms of conditions, which are not limited in this application.
Illustratively, the weights of the plurality of image processing modules may be adjusted by a bayesian optimization method, an RNN model, or a reinforcement learning algorithm, or the like.
The following describes a bayesian optimization method as an example.
For example, the visual task model is a target detection model, and the performance index of the visual task model may be an average accuracy (mean average precision, mAP). And adjusting the weights of the plurality of image processing modules by a Bayesian optimization method so as to improve mAP of the target detection model. Alternatively, the weights of the plurality of image processing modules are adjusted with the mAP maximization of the target detection model as a target.
Average accuracy refers to the average of the detection accuracy for all target objects.
Specifically, the images in the training data set are input into the target detection model to obtain the detection accuracy of the images. The detection accuracy of the images is input into a Bayesian optimization model, and the Bayesian optimization model adjusts the weight of each image processing module.
Further, the detection accuracy of the image may be retained in the Bayesian optimization model. That is, when other images in the training data set are input into the target detection model, the detection accuracy of those images is obtained, and the Bayesian optimization model can adjust the weights of the various image processing modules according to the detection accuracy of the other images together with the detection accuracy of the previous images.
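As a concrete illustration of this loop, the sketch below uses Bayesian optimization to search the 9 module weights so as to maximize the detection score (assuming the scikit-optimize library; the scoring function is a placeholder standing in for the actual pipeline-plus-detection-model evaluation):

```python
from skopt import gp_minimize
from skopt.space import Real

NUM_MODULES = 9

def pipeline_map(weights):
    """Placeholder: process the training images with the given module weights, run the
    trained target detection model, and return its mAP. The dummy value below only
    keeps the sketch runnable; it is not part of this application."""
    return 1.0 - sum((w - 0.5) ** 2 for w in weights)

def objective(weights):
    return -pipeline_map(weights)  # gp_minimize minimizes, so negate the score

result = gp_minimize(
    objective,
    dimensions=[Real(0.0, 1.0) for _ in range(NUM_MODULES)],  # each weight in [0, 1]
    n_calls=30,  # number of weight combinations to evaluate; earlier observations are retained
)
best_weights = result.x  # weight combination with the highest observed score
```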
It should be noted that, in the embodiment of the present application, the training data set is used to train each image processing module, and this training data set may be the same as or different from the training data set of the visual task model. For example, the training data set in the embodiments of the present application may be a verification data set or a test data set of the visual task model, or the like.
In the scheme of the embodiment of the application, the weight of the image processing module is evaluated according to the processing result of the visual task model, and then the weight of the image processing module is adjusted so as to increase the weight of the image processing module with stronger performance correlation with the visual task model and reduce the weight of the image processing module with weaker performance correlation with the visual task model, so that an image processing flow more suitable for the visual task can be obtained, and the performance of the visual task model is improved.
Mode 2
Optionally, step S304 includes: the at least one image processing module is modified according to the processing result of the visual task model.
Altering the at least one image processing module may include: deleting part of the image processing modules in the at least one image processing module or/and adding other image processing modules.
In a possible implementation, step S304 may be to select a combination of image processing modules from a plurality of candidate image processing modules according to the processing result of the visual task model, and replace the at least one image processing module with the combination of image processing modules.
Illustratively, the at least one image processing module may be modified by a bayesian optimization method or a reinforcement learning algorithm, or the like.
As described above, in practical applications, the method 300 may be performed based on a plurality of images in the training dataset to implement iterative adjustment of the combination of the plurality of image processing modules until a preset condition is met. And stopping adjusting the combination of the plurality of image processing modules after the preset condition is met, or stopping refreshing the combination of the plurality of image processing modules.
For example, the preset condition may be that the number of iterations is greater than or equal to a fourth threshold.
In case the number of times the method 300 is performed is greater than or equal to the fourth threshold value, the method 300 is no longer performed, i.e. the adjustment of the combination of the image processing modules is stopped.
It should be understood that this is only an example, and other preset conditions may be set in reference to mode 1, which is not described herein.
In the scheme of the embodiment of the application, the combination of the image processing modules is changed according to the processing result of the visual task model, so that the combination of the image processing modules which are more suitable for the visual task model can be obtained, and the performance of the visual task model is improved.
The at least one image processing module includes a plurality of image processing modules, and step S304 includes: and deleting part of the image processing modules from the plurality of image processing modules according to the processing results of the visual task model.
In one possible implementation, mode 2 may employ the processing results of mode 1.
Optionally, step S304 includes: adjusting weights of the plurality of image processing modules according to processing results of the visual task model; and deleting part of the image processing modules from the plurality of image processing modules according to the adjusted weights of the plurality of image processing modules.
Illustratively, the plurality of image processing modules are m image processing modules, where m is an integer greater than 1. The n image processing modules with the smallest adjusted weights are deleted from the m image processing modules, where n is an integer greater than 1 and less than m.
Alternatively, the image processing module whose adjusted weight is less than or equal to the weight threshold is deleted from the m image processing modules.
For example, as shown in fig. 4, among the 9 image processing modules, the weights corresponding to five modules, namely the green balance module, the dead pixel correction module, the Bayer noise reduction module, the color correction module, and the noise reduction and sharpening module, are less than or equal to the weight threshold, and these five modules are deleted.
In the scheme of the embodiment of the application, part of the image processing modules are deleted according to the processing result of the visual task model, so that the time required by image processing can be reduced, the processing speed is improved, and the requirement on computational power is reduced.
In addition, the image processing module with higher weight has stronger correlation with the visual task model, or the image processing module with higher weight has larger influence on the performance of the visual task model. In the scheme of the embodiment of the application, the deleted image processing modules are determined according to the weights of the image processing modules, and the image processing modules with relatively smaller weight values are deleted, so that the influence on the processing result of the visual task model is smaller, and the influence on the performance of the visual task model after deletion is smaller. That is, the scheme of the embodiment of the application can reduce unnecessary operation, reduce calculation cost and improve processing speed on the premise of ensuring the performance of the visual task model.
Optionally, step S304 includes: and deleting part of the image processing modules from the plurality of image processing modules according to the processing results of the visual task model and the processing speeds of the plurality of image processing modules.
Illustratively, a portion of the image processing modules is deleted from the plurality of image processing modules based on the adjusted weights of the plurality of image processing modules and the processing speeds of the plurality of image processing modules.
For example, the image processing module whose adjusted weight is less than or equal to the weight threshold and whose processing speed is less than or equal to the speed threshold is deleted from the plurality of image processing modules. That is, the image processing module whose processing speed is slow and whose influence on the visual task model is small is deleted. In this way, the speed of image processing can be further improved.
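A minimal sketch of these deletion rules follows (the module names, weights, and thresholds below are assumptions used only for illustration):

```python
def prune_modules(modules, weights, weight_threshold, speeds=None, speed_threshold=None):
    """Delete a module when its adjusted weight is at or below the weight threshold and,
    if processing speeds are supplied, its speed is also at or below the speed threshold."""
    kept = []
    for i, (name, w) in enumerate(zip(modules, weights)):
        slow = speeds is None or speeds[i] <= speed_threshold
        if w <= weight_threshold and slow:
            continue  # delete: low weight (and, optionally, slow processing)
        kept.append(name)
    return kept

modules = ["black_level", "green_balance", "dead_pixel", "demosaic", "bayer_denoise",
           "awb", "color_correction", "gamma", "denoise_sharpen"]
weights = [0.25, 0.02, 0.01, 0.20, 0.03, 0.18, 0.04, 0.22, 0.05]  # adjusted weights (example)
print(prune_modules(modules, weights, weight_threshold=0.05))
# -> ['black_level', 'demosaic', 'awb', 'gamma']
```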
Mode 3
The at least one image processing module includes a plurality of image processing modules, and step S304 includes: and adjusting the processing sequence of the plurality of image processing modules according to the processing result of the visual task model.
And adjusting the processing sequence of the plurality of image processing modules according to the processing result of the visual task model so as to improve the performance of the visual task model.
Illustratively, the processing sequence of the plurality of image processing modules may be adjusted by a bayesian optimization method, an RNN model, or a reinforcement learning algorithm, or the like.
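For illustration only, a very simple way to search the processing order is sketched below (random search over permutations is used here as an assumption; Bayesian optimization, an RNN model, or reinforcement learning could play the same role):

```python
import random

def search_processing_order(modules, evaluate, n_trials=20, seed=0):
    """Try random permutations of the module order and keep the order with the highest
    score, where `evaluate(order)` returns e.g. the inference accuracy of the visual
    task model on images processed in that order."""
    rng = random.Random(seed)
    best_order = list(modules)
    best_score = evaluate(best_order)
    for _ in range(n_trials):
        candidate = list(modules)
        rng.shuffle(candidate)
        score = evaluate(candidate)
        if score > best_score:
            best_order, best_score = candidate, score
    return best_order, best_score
```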
As described above, the method 300 may be performed based on a plurality of images in the training dataset until a predetermined condition is satisfied. And stopping adjusting the processing sequence of the plurality of image processing modules after the preset condition is met, or stopping refreshing the processing sequence of the plurality of image processing modules.
For example, the preset condition may be that the amount of change in the processing order of the plurality of image processing modules is less than or equal to a fifth threshold.
For example, the amount of change in the processing order of the plurality of image processing modules may be the number of image processing modules whose processing order has changed after performing the method 300.
Alternatively, the preset condition may be that the accuracy of the reasoning of the visual task model is greater than or equal to a sixth threshold.
In the event that the accuracy of the reasoning of the visual task model is greater than or equal to the sixth threshold, the method 300 is no longer performed, i.e., the adjustment of the processing order of the plurality of image processing modules is stopped.
The sixth threshold may be a preset value. Alternatively, the sixth threshold may be the accuracy of reasoning of the visual task model without adjusting the processing order of the image processing modules. For example, as shown in fig. 4, the sixth threshold may be the accuracy of reasoning of the visual task model in the case where the images are processed in the processing order of the image processing module as shown in fig. 4.
That is, an image is input into the original image processing modules, processed in the original processing order, the processed image is input into the visual task model for processing, and the accuracy of reasoning is calculated; this accuracy is taken as the sixth threshold. The image is then input into the image processing modules whose processing order has currently been adjusted, the processed image is input into the visual task model for processing, the accuracy of reasoning is calculated, and the currently obtained accuracy is compared with the sixth threshold. The method 300 is no longer executed once the currently obtained accuracy of reasoning is greater than or equal to the sixth threshold. In this way, processing images in the adjusted processing order of the image processing modules can ensure, or even improve, the performance of the visual task model.
Alternatively, the preset condition may be that the amount of change in the loss function value of the visual task model obtained after the method 300 is continuously performed a plurality of times is less than or equal to the third threshold.
That is, in the event that the change in the loss function value of the visual task model tends to stabilize, the method 300 is no longer performed.
Alternatively, the preset condition may be that the number of iterations is greater than or equal to the fourth threshold.
That is, in the case where the number of times the method 300 is performed is greater than or equal to the fourth threshold value, the method 300 is not performed any more.
Alternatively, the above-described preset conditions may be used in combination. For example, the preset condition may be that the accuracy of reasoning of the visual task model is greater than or equal to a sixth threshold and the number of iterations is greater than or equal to a fourth threshold. For another example, the preset condition may be that a variation of the processing sequence of the plurality of image processing modules is less than or equal to a fifth threshold, and an accuracy of the visual task model is greater than or equal to the sixth threshold.
It should be understood that the foregoing is merely an example, and the preset condition may be other forms of conditions, which are not limited in this application.
In the scheme of the embodiment of the application, the processing sequence of the image processing module is adjusted according to the processing result of the visual task model, so that an image processing flow more suitable for the visual task can be obtained, and the accuracy of the visual task is improved.
Mode 4
Optionally, step S304 includes: and adjusting parameters in the at least one image processing module according to the processing result of the visual task model.
And adjusting parameters in the at least one image processing module according to the processing result of the visual task model so as to improve the performance of the visual task model.
For example, if the image processing module adopts a neural network model, the parameters in the image processing module are parameters of the neural network model.
Illustratively, the parameters in the at least one image processing module may be adjusted using a Bayesian optimization method, an RNN model, a reinforcement learning algorithm, etc.
The input image is processed based on the current parameter combination in the image processing module, and the processed result is input into the visual task model for processing, for example, a CPU or GPU executes the visual task. The parameter combination in the image processing module is then updated according to feedback on the performance of the visual task model; that is, the optimal parameter combination for the image processing module is searched for in the search space so as to improve the performance of the visual task model.
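As an illustration of searching the parameter combination in a search space, the sketch below grid-searches two assumed module parameters and keeps the combination that gives the best visual task performance (the parameter names, value ranges, and scoring function are assumptions for illustration only):

```python
import itertools

def search_module_parameters(evaluate):
    """Grid-search a small parameter space and return the combination with the best score,
    where `evaluate(params)` returns e.g. the inference accuracy of the visual task model."""
    search_space = {
        "gamma": [1.8, 2.0, 2.2],             # assumed gamma correction parameter values
        "denoise_strength": [0.1, 0.3, 0.5],  # assumed noise reduction parameter values
    }
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*search_space.values()):
        params = dict(zip(search_space.keys(), values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```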
As described above, the method 300 may be performed based on a plurality of images in the training dataset until a predetermined condition is satisfied. And stopping adjusting parameters in the at least one image processing module after the preset condition is met, or stopping refreshing the parameters in the at least one image processing module.
For example, the preset condition may be that the accuracy of the reasoning of the visual task model is greater than or equal to a seventh threshold.
In case the accuracy of the reasoning of the visual task model is greater than or equal to the seventh threshold, the method 300 is no longer performed, i.e. the adjustment of the parameters in the at least one image processing module is stopped.
The seventh threshold may be a preset value. Alternatively, the seventh threshold may be an accuracy of processing of the visual task model obtained without adjusting parameters in the at least one image processing module. For example, as shown in fig. 4, the seventh threshold may be the accuracy of the reasoning of the visual task model without the 9 image processing modules adjusting the parameters.
That is, an image is input into the original image processing modules, i.e., the image processing modules whose parameters have not been adjusted, the processed image is input into the visual task model for processing, and the accuracy of reasoning is calculated; this accuracy is taken as the seventh threshold. The image is then input into the image processing modules with the currently adjusted parameters for processing, the processed image is input into the visual task model for processing, the accuracy of reasoning is calculated, and the currently obtained accuracy is compared with the seventh threshold. The method 300 is no longer executed once the currently obtained accuracy of reasoning is greater than or equal to the seventh threshold. In this way, processing images with the adjusted image processing modules can ensure, or even improve, the performance of the visual task model.
Alternatively, the preset condition may be that the amount of change in the loss function value of the visual task model obtained after the method 300 is continuously performed a plurality of times is less than or equal to the third threshold.
That is, in the event that the change in the loss function value of the visual task model tends to stabilize, the method 300 is no longer performed.
Alternatively, the preset condition may be that the number of iterations is greater than or equal to the fourth threshold.
That is, in the case where the number of times the method 300 is performed is greater than or equal to the fourth threshold value, the method 300 is not performed any more.
Alternatively, the above-described preset conditions may be used in combination. For example, the preset condition may be that the accuracy of reasoning of the visual task model is greater than or equal to a seventh threshold and the number of iterations is greater than or equal to a fourth threshold.
It should be understood that the foregoing is merely an example, and the preset condition may be other forms of conditions, which are not limited in this application.
In the scheme of the embodiment of the application, the parameters in the image processing module are adjusted according to the processing result of the visual task model, so that the image processing module more suitable for the visual task can be obtained, and the accuracy of the visual task is improved.
Any two or more of the above modes 1, 2, 3 and 4 may be used in combination. When used in combination, the various ways may be performed simultaneously or the various ways may be performed separately.
Optionally, step S304 includes: deleting part of the image processing modules from the plurality of image processing modules according to the processing results of the visual task model; processing the fifth image through an undeleted image processing module in the plurality of image processing modules to obtain a sixth image, and inputting the sixth image into the visual task model for processing; and adjusting parameters of the undeleted image processing module according to the processing result of the visual task model.
The fifth image may be an image in the training dataset, for example. Other descriptions of the fifth image may refer to the first image in the foregoing. The fifth image and the first image may be the same image or may be different images.
Illustratively, the sixth image may be an RGB image. The description of the sixth image may refer to the second image in the foregoing.
According to the scheme of the embodiment of the application, the performance indexes obtained by the visual task model, such as accuracy of target detection, target segmentation accuracy and the like, are utilized to adjust the weights of a plurality of image processing modules, so that the image processing modules with great influence on the performance indexes of the visual task model are reserved, or the image processing modules capable of maintaining or improving the performance indexes of the visual task model are reserved. Therefore, the image processing module suitable for the visual task model or the image processing module required by the visual task model can be obtained, the time required by the image processing flow is reduced, the calculation cost is saved, the calculation force requirement is reduced, and the method is more friendly to hardware.
And, the performance index obtained by the visual task model is used for adjusting parameters in the reserved image processing module, for example, the performance index obtained by the visual task model is used for searching the design space of the image processing module, so that the optimal parameter configuration of each image processing module is obtained, and the performance of the visual task model is improved.
Optionally, step S304 includes: and adjusting parameters of the plurality of image processing modules and weights of the plurality of image processing modules according to processing results of the visual task model, and deleting part of the image processing modules from the plurality of image processing modules according to the adjusted weights of the plurality of image processing modules.
Optionally, step S304 includes: and adjusting parameters of the plurality of image processing modules, weights of the plurality of image processing modules and processing sequences of the plurality of image processing modules according to processing results of the visual task model, and deleting part of the image processing modules from the plurality of image processing modules according to the adjusted weights of the plurality of image processing modules.
The embodiment of the present application provides an image processing method 400, where the method 400 may be regarded as a specific implementation manner of the method 300, and the specific description refers to the foregoing method 300, and for brevity of description, a part of the description is omitted when introducing the method 400. Specifically, method 400 employs a combination of modes 1, 2, and 4.
The method 400 includes steps S401 to S410. Step S401 to step S410 are explained below. The method 400 can be regarded as two stages, the first stage comprising steps S401 to S406 and the second stage comprising steps S407 to S410.
S401, setting initial weights for a plurality of image processing modules.
For example, the plurality of image processing modules may include 9 image processing modules as shown in fig. 5. The weights of the respective image processing modules are denoted as w1, w2, w3, w4, w5, w6, w7, w8, and w9. The sum of the 9 weights is 1.
The above is merely an example, and other weight setting methods may refer to the description in step S302.
S402, inputting the images in the training data set into the plurality of image processing modules for processing.
I.e. the input image is processed based on the weights of the plurality of image processing modules. Or, the processing results of the plurality of image processing modules are adjusted based on the weights of the plurality of image processing modules.
For example, the input image is processed according to the image processing modules and their corresponding weights shown in fig. 5.
Illustratively, the processing result may be an RGB image.
Further, the processing result may be an 8-bit RGB image.
Step S402 corresponds to step S302, and a specific description can be found in step S302.
S403, the results processed by the plurality of image processing modules are input into the visual task model for reasoning, and the reasoning result of the visual task model is obtained.
The visual task model may be a model that has been trained.
S404, comparing the reasoning result of the visual task model with the true value corresponding to the image in the training data set, and adjusting the weights of the image processing modules according to the comparison result.
Or, the comparison result is fed back to an optimization algorithm, and the weights of the plurality of image processing modules are adjusted by the optimization algorithm.
Illustratively, the optimization algorithm includes a Bayesian optimization method, an RNN model, and a reinforcement learning algorithm.
S405, taking the adjusted weight of the image processing module as the weight of the image processing module in the step S402, and repeating the steps S402 to S404 until the first preset condition is met.
Alternatively, in step S405, the adjusted weight of the image processing module may be normalized, and the normalized weight may be used as the weight of the image processing module in step S402.
That is, after each adjustment of the weights of the image processing modules, the adjusted weights are normalized so that the sum of the normalized weights is 1 or the sum approaches 1.
After the first preset condition is satisfied, the steps S402 to S404 are terminated. For example, the weight of the image processing module obtained at present may be regarded as the weight of the image processing module obtained after the first preset condition is satisfied.
For example, if the accuracy of the current visual task model is greater than or equal to the accuracy of the visual task model without setting the weight of the image processing module, steps S402 to S404 are terminated.
Step S403 to step S405 may be regarded as a specific implementation of mode 1; the specific description may refer to the description in mode 1, and the setting of the first preset condition may refer to the preset conditions in mode 1, which is not repeated herein.
S406, deleting part of the image processing modules according to the weight of the image processing modules obtained after the first preset condition is met.
Step S406 corresponds to step S304 in mode 2, and the specific description may refer to the description in mode 2, which is not repeated herein.
For example, as shown in fig. 5, the green balance module, the dead pixel correction module, the Bayer noise reduction module, the color correction module, and the noise reduction and sharpening module, whose adjusted weight values are smaller, are deleted.
S407, inputting the images in the training data set into the undeleted image processing module for processing.
The image in step S407 may be the same image as the image in step S402, or may be a different image.
That is, parameters in the image processing module that are not deleted are taken as tuning objects. Or, the parameters in the reserved image processing module are taken as tuning objects.
Further, before step S407, the weights of the image processing modules that have not been deleted may also be normalized.
For example, as shown in fig. 5, the images in the training data set are input to a black level compensation module, a demosaicing module, an automatic white balance module, and a gamma correction module for processing. Further, before step S407 is performed, the weights of the 4 image processing modules may be normalized.
S408, inputting the result processed by the undeleted image processing module into the visual task model for reasoning, and obtaining the reasoning result of the visual task model.
S409, comparing the reasoning result of the visual task model with the true value corresponding to the image in the training data set, and adjusting the parameters in the undeleted image processing module according to the comparison result.
Or, the comparison result is fed back to an optimization algorithm, and parameters in the image processing module are adjusted by the optimization algorithm.
Illustratively, the optimization algorithm includes a bayesian optimization method, an RNN model, or a reinforcement learning algorithm.
It should be understood that the optimization algorithm used in step S409 may be the same as or different from the optimization algorithm used in step S404.
S410, taking the parameters in the adjusted image processing module as the parameters in the image processing module in the step S407, and repeating the steps S407 to S409 until the second preset condition is met.
After the second preset condition is satisfied, step S407 to step S410 are terminated. For example, the parameters in the currently obtained image processing module may be regarded as parameters in the image processing module obtained after the second preset condition is satisfied.
For example, if the accuracy of the current visual task model is greater than or equal to the accuracy of the visual task model without setting the weight of the image processing module, steps S407 to S410 are terminated.
Step S407 to step S410 may be regarded as a specific implementation of mode 4, and the specific description may refer to the description in mode 4, which is not repeated herein. The setting of the second preset condition may refer to the preset conditions in mode 4.
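The two stages of method 400 can be summarized by the following skeleton (the three callables stand for the weight tuning of steps S402 to S405, the deletion of step S406, and the parameter tuning of steps S407 to S410; they are hypothetical and only indicate the control flow):

```python
def method_400(modules, init_weights, tune_weights, prune_by_weight, tune_parameters):
    """Two-stage flow of method 400, with each stage supplied as a callable."""
    # First stage (S401-S406): tune the module weights against the visual task
    # model's performance, then delete the modules whose adjusted weights stay small.
    weights = tune_weights(modules, init_weights)
    kept_modules, kept_weights = prune_by_weight(modules, weights)

    # Second stage (S407-S410): keep only the remaining modules and tune their
    # internal parameters until the second preset condition is met.
    params = tune_parameters(kept_modules, kept_weights)
    return kept_modules, kept_weights, params
```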
According to the scheme of the embodiment of the application, the performance indexes obtained by the visual task model, such as accuracy of target detection, target segmentation accuracy and the like, are utilized to adjust the weights of a plurality of image processing modules, so that the image processing modules with great influence on the performance indexes of the visual task model are reserved, or the image processing modules capable of maintaining or improving the performance indexes of the visual task model are reserved. Therefore, the image processing module suitable for the visual task model or the image processing module required by the visual task model can be obtained, the time required by the image processing flow is reduced, the calculation cost is saved, the calculation force requirement is reduced, and the method is more friendly to hardware.
After the first stage is completed, the parameters in the retained image processing modules are adjusted using the performance indexes obtained by the visual task model; for example, the design space of the image processing modules is searched using the performance indexes obtained by the visual task model, which helps obtain the optimal parameter configuration of each image processing module and improve the performance of the visual task model.
In another possible implementation, the first and second phases in method 400 may be performed simultaneously. That is to say, the weight of the image processing module and the parameters in the image processing module are adjusted at the same time. The manner in which the first and second phases of method 400 are performed simultaneously is described below. The method 400 may include the following steps. The following steps may be referred to the description of the first stage and the second stage of the method 400 described previously, and for brevity of description, a part of the description is appropriately omitted in describing the following steps.
1) Initial weights are set for the plurality of image processing modules.
2) The images in the training data set are input into the plurality of image processing modules for processing.
3) And the results processed by the plurality of image processing modules are input into the visual task model for reasoning, so that the reasoning results of the visual task model are obtained.
4) And comparing the reasoning result of the visual task model with a true value corresponding to the image in the training data set, and adjusting the weights of the plurality of image processing modules and the parameters in the plurality of image processing modules according to the comparison result.
Or, the comparison result is fed back to an optimization algorithm, and the weights of the plurality of image processing modules are adjusted by the optimization algorithm. Parameters in the plurality of image processing modules are adjusted using an optimization algorithm.
Illustratively, the optimization algorithm includes a Bayesian optimization method, an RNN model, and a reinforcement learning algorithm.
The optimization algorithm for adjusting the weights of the plurality of image processing modules and the optimization algorithm for adjusting the parameters in the plurality of image processing modules may be the same or different.
5) Taking the weight of the adjusted image processing module as the weight of the image processing module in the step 2), taking the parameter of the adjusted image processing module as the parameter of the image processing module in the step 2), and repeating the steps 2) to 4) until the training is completed.
Alternatively, the adjusted weights of the image processing modules are normalized, and the normalized weights are used as the weights of the image processing modules in step 2).
That is, after each adjustment of the weights of the image processing modules, the adjusted weights are normalized so that the sum of the normalized weights is 1 or the sum approaches 1.
For example, if the accuracy of the current visual task model is greater than or equal to the accuracy of the reasoning of the visual task model without setting the weights of the image processing module, the training is complete. Alternatively, training is complete if the accuracy of the current visual task model is greater than or equal to the accuracy of the reasoning of the visual task model prior to execution of method 400.
6) Part of the image processing modules are deleted according to the weights of the image processing modules after training. Step 6) corresponds to step S304 in mode 2, and the specific description may refer to the description in mode 2, which is not repeated here.
In this way, the first stage and the second stage are executed simultaneously, so that the image processing module can be prevented from being deleted due to unreasonable parameter configuration, the image processing module can process images under the optimal parameter configuration, and further the contribution degree of each image processing module under the optimal parameter configuration to the performance index of the visual task model is judged, so that the image processing module required by the visual task model is reserved, and the performance index of the visual task model can be further improved.
Method 400 is merely one example of combining modes 1, 2, and 4. Mode 1, mode 2, mode 3, and mode 4 can also be combined in other implementations.
Illustratively, modes 1, 2 and 3 are combined.
For example, step S304 may include: and adjusting weights of the plurality of image processing modules and processing sequences of the plurality of image processing modules according to processing results of the visual task model, and deleting part of the image processing modules from the plurality of image processing modules according to the adjusted weights of the image processing modules.
For another example, step S304 may include: adjusting weights of a plurality of image processing modules according to processing results of the visual task model, and deleting part of image processing modules from the plurality of image processing modules according to the adjusted weights of the image processing modules; and adjusting the processing sequence of the undeleted image processing modules according to the processing result of the visual task model. That is, step S304 is divided into two stages, in which a part of the image processing modules are deleted in the first stage, and in which the processing order of the image processing modules that are not deleted is adjusted in the second stage.
For specific combinations, reference may be made to method 400, which is not described in detail herein.
It should be understood that the above combination modes are all examples, and any two or more of the above four modes may be combined, which is not limited in this embodiment of the present application.
In the embodiment of the application, the adjusted image processing module is an image processing module required by the visual task model. The adjusted image processing module has a corresponding relation with the visual task model. Different visual task models may correspond to different image processing modules. In this way, an appropriate image processing flow can be selected according to the application scene.
Fig. 6 shows an image processing method 700 provided in an embodiment of the present application, where the method shown in fig. 6 may be performed by an image processing apparatus, and the apparatus may be a cloud service device, or may be a terminal device, for example, an apparatus with sufficient computing power for executing image processing, such as a computer, a server, or may be a system formed by the cloud service device and the terminal device. For example, method 700 may be performed by the preprocessing module of fig. 1.
The target image processing module in method 700 results from either method 300 or method 400. In order to avoid unnecessary repetition, a repetitive description is appropriately omitted below when describing the method 700.
The method 700 includes steps S701 to S704. The following describes step S701 to step S704 in detail.
S701, a third image is acquired.
The third image is the image to be processed.
The third image may be, for example, a raw map acquired by the sensor.
The third image may be an image captured by the terminal device (or other apparatus or device such as a computer, a server, or the like) through the camera, or may be an image obtained from inside the terminal device (or other apparatus or device such as a computer, a server, or the like) (for example, an image stored in an album of the terminal device, or an image obtained from a cloud end by the terminal device), which is not limited in the embodiment of the present application.
S702, determining at least one target image processing module according to the visual task model.
The at least one target image processing module is one or more image processing modules corresponding to the visual task model.
Illustratively, the visual tasks include: target detection, image classification, target segmentation, target tracking or image recognition, etc.
The visual task model is used to perform visual tasks. For example, the visual task is object detection, and the visual task model is an object detection model. For another example, if the visual task is image recognition, the visual task model is an image recognition model.
The visual task model may be a trained model.
In different application scenarios, different visual task models can be adopted, and accordingly, at least one target image processing module matched with the visual task model can be determined according to the different visual task models. In this way, different image processing modules can be selected according to different application scenes.
For the same visual task, different visual task models may be employed in different application scenarios. For example, for a target detection task in a driving scene, the visual task models employed in the case of overexposure and underexposure may be the same or may be different. In the driving process, if the current scene is identified as overexposure, the first target detection model may be adopted as a visual task model, and at least one target image processing module corresponding to the first target detection model is determined according to the first target detection model. If the current scene is identified as underexposed, a second object detection model may be employed as a visual task model, and at least one object image processing module corresponding to the second object detection model is determined from the second object detection model. The first object detection model and the second object detection model are different object detection models. In this way, different image processing flows can be selected according to different application scenes, and the performance of the visual task model is improved.
S703, processing the third image by the at least one target image processing module to obtain a fourth image.
That is, the input third image is processed using one or more image processing modules corresponding to the visual task model to obtain a fourth image.
The fourth image may be an RGB image, for example. Further, the fourth image may be an 8-bit RGB image. The type of the fourth image may be set according to the input requirements of the visual task model; the above is merely an example.
S704, processing the fourth image through the visual task model to obtain a processing result of the fourth image.
The processing result of the fourth image can also be understood as the processing result of the third image.
And the processing result of the fourth image is the reasoning result of the visual task model. The reasoning results of the visual task model are related to the type of visual task.
For example, if the visual task is target detection, the inference result of the visual task model may be a target frame on the fourth image and a category of an object in the target frame. For another example, if the visual task is image classification, the inference result of the visual task model may be the classification of the fourth image.
There is a correspondence between the visual task model and the configuration of the image processing module. The configuration of the image processing module that matches the current visual task model may be determined from the correspondence between the visual task model and the configuration of the image processing module.
Illustratively, the configuration of the image processing module includes at least one of: a combination of image processing modules, a weight of an image processing module, a processing order of an image processing module, or a parameter in an image processing module.
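One possible way to store and look up such a correspondence is sketched below (the model identifiers, module names, and configuration fields are assumptions for illustration; the correspondence itself would be obtained in advance, e.g., by method 300 or method 400):

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class PipelineConfig:
    modules: List[str]                                        # combination of image processing modules
    weights: Dict[str, float] = field(default_factory=dict)   # weight of each module
    order: List[str] = field(default_factory=list)            # processing order of the modules
    params: Dict[str, dict] = field(default_factory=dict)     # parameters in each module

# Correspondence between a visual task model and its image processing configuration.
PIPELINE_BY_MODEL = {
    "detector_overexposed": PipelineConfig(
        modules=["black_level", "demosaic", "awb", "gamma"],
        weights={"black_level": 0.3, "demosaic": 0.25, "awb": 0.2, "gamma": 0.25},
    ),
    "detector_underexposed": PipelineConfig(
        modules=["black_level", "demosaic", "awb", "gamma", "denoise_sharpen"],
    ),
}

def select_pipeline(model_id: str) -> PipelineConfig:
    """Step S702: determine the at least one target image processing module from the visual task model."""
    return PIPELINE_BY_MODEL[model_id]
```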
Optionally, step S702 includes: at least one target image processing module is determined from a plurality of candidate image processing modules according to the visual task model.
That is, a combination of image processing modules is determined from the plurality of candidate image processing modules according to the visual task model, and the image processing module in the combination of image processing modules is the at least one target image processing module.
In this case, when the visual task model is changed, the combination of the image processing modules may be changed accordingly.
There is a correspondence between the combination of the visual task model and the image processing module. The combination of the image processing modules corresponding to the current visual task model can be determined according to the corresponding relation, or the image processing module required by the visual task model, namely the at least one target image processing module, can be determined according to the corresponding relation. The at least one target image processing module may be obtained by the method 300 or the method 400. Alternatively, it is understood that the correspondence between the combination of the visual task model and the image processing module is obtained by the method 300 or the method 400.
For example, the visual task model is the model shown in fig. 5, then the at least one target image processing module comprises: the system comprises a black level compensation module, a demosaicing module, an automatic white balance module and a gamma correction module.
In this way, different visual task models correspond to different combinations of image processing modules, and when the visual task model changes, the combination of the image processing modules can be adaptively matched with the visual task model, so that the combination of the current image processing modules is more suitable for the current visual task model, and the performance of the visual task model is improved.
And moreover, a proper image processing module is selected from a plurality of candidate image processing modules according to the visual task model, and all the candidate image processing modules are not required to be used for processing the image, so that the processing flow is reduced, and the requirement on the computing power is reduced.
Optionally, step S702 includes: the weights of the at least one target image processing module are determined according to the visual task model. The weight of the at least one target image processing module is used for processing the processing result of the at least one target image processing module to obtain a fourth image.
In one implementation, the combination of image processing modules corresponding to different visual task models is the same. When the visual task model changes, the weight of the image processing module may change accordingly.
In this implementation, the fact that different visual task models correspond to the same combination of image processing modules may be understood as meaning that the functions implemented by the image processing modules adopted by the different visual task models are the same.
The visual task model and the weights of the image processing modules have a correspondence. The weights of the image processing modules corresponding to the current visual task model, namely the weights of the at least one target image processing module, can be determined according to this correspondence.
For example, if the visual task model is the model shown in fig. 4, the at least one target image processing module may be 9 image processing modules in fig. 4, and the weights of the image processing modules may be the weights obtained in step S405.
In this way, different visual task models correspond to weights of different image processing modules, and when the visual task models change, the weights of the image processing modules can be adaptively matched with the visual task models, so that the weights of the current image processing modules are more suitable for the current visual task models, and the performance of the visual task models is improved.
In another implementation, when the visual task model changes, the weight of the image processing module may also change accordingly, as may other configurations of the image processing module. For example, the combination of image processing modules may vary.
Illustratively, there is a correspondence between the visual task model on one hand and the weights of the image processing modules together with other configurations of the image processing modules on the other hand. In this way, the weights of the image processing modules corresponding to the visual task model, as well as the other configurations of the image processing modules, can be determined according to the visual task model.
For example, there is a correspondence between the visual task model and both the combination of image processing modules and the weights of the image processing modules. In step S702, the combination of image processing modules corresponding to the visual task model, and the weights of the image processing modules in that combination, may be determined.
If the visual task model is the model shown in fig. 5, the at least one target image processing module corresponding to the visual task model may be obtained in step S406. The at least one target image processing module includes a black level compensation module, a demosaicing module, an automatic white balance module, and a gamma correction module. The weight of the at least one target image processing module may be the weight obtained in step S405.
Optionally, step S702 includes: a processing order of at least one target image processing module is determined based on the visual task model.
In one implementation, the combination of image processing modules corresponding to different visual task models is the same. In this case, when the visual task model is changed, the processing order of the image processing modules may be changed accordingly.
There is a correspondence between the visual task model and the processing order of the image processing modules. The processing order of the image processing modules corresponding to the current visual task model, namely the processing order of the at least one target image processing module, is determined according to the correspondence.
In this way, different visual task models correspond to different processing orders of the image processing modules. When the visual task model changes, the processing order of the image processing modules can be adaptively matched to the visual task model, so that the current processing order of the image processing modules is more suitable for the current visual task model and the performance of the visual task model is improved.
In another implementation, when the visual task model changes, the processing order of the image processing modules may change accordingly, as may other configurations of the image processing modules. For example, the combination of image processing modules may vary.
Illustratively, there is a correspondence between the visual task model on one hand and the processing order of the image processing modules together with other configurations of the image processing modules on the other hand. In this way, the processing order of the image processing modules corresponding to the visual task model, as well as the other configurations of the image processing modules, can be determined according to the correspondence.
For example, there is a correspondence between the visual task model and both the combination of image processing modules and the processing order of the image processing modules. The combination of image processing modules corresponding to the visual task model, and the processing order of the image processing modules in that combination, may be determined from the visual task model.
In this case, the combination of image processing modules corresponding to different visual task models may be the same or may be different. For example, the combination of the image processing modules corresponding to the two visual task models is the same, and the processing order of the image processing modules in the combination of the image processing modules is different.
For another example, there is a correspondence between the visual task model and the combination of image processing modules, the weights of the image processing modules, and the processing order of the image processing modules. In step S702, the combination of image processing modules corresponding to the visual task model, the weights of the image processing modules, and the processing order of the image processing modules may be determined; that is, the target image processing modules, the weights of the target image processing modules, and the processing order of the target image processing modules are determined from the plurality of candidate image processing modules.
In this case, the combination of the image processing modules corresponding to the different visual task models may be the same or different. In the case where the combination of the image processing modules is the same, the weights of the image processing modules in the combination of the image processing modules may be the same or may be different. In the case where the combination of the image processing modules is the same, the processing order of the image processing modules in the combination of the image processing modules may be the same or may be different.
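The following sketch illustrates a model-specific processing order under the assumption that each visual task model is simply associated with an ordered list of module names; the module implementations are placeholder identity functions, so the snippet shows only the control flow, not real image processing.

```python
import numpy as np

# Placeholder modules (identity functions standing in for black level
# compensation, demosaicing, automatic white balance and gamma correction).
MODULES = {
    "black_level_compensation": lambda img: img,
    "demosaic": lambda img: img,
    "auto_white_balance": lambda img: img,
    "gamma_correction": lambda img: img,
}

# Hypothetical: two visual task models use the same module combination
# but in different processing orders.
MODEL_TO_ORDER = {
    "model_a": ["black_level_compensation", "demosaic",
                "auto_white_balance", "gamma_correction"],
    "model_b": ["black_level_compensation", "auto_white_balance",
                "demosaic", "gamma_correction"],
}

def run_in_model_order(raw_image: np.ndarray, visual_task_model: str) -> np.ndarray:
    """Apply the target image processing modules in the processing order
    associated with the given visual task model."""
    image = raw_image
    for name in MODEL_TO_ORDER[visual_task_model]:
        image = MODULES[name](image)
    return image

processed = run_in_model_order(np.zeros((8, 8, 3)), "model_b")
```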
Optionally, step S702 includes: parameters in the at least one target image processing module are determined from the visual task model.
In one implementation, the combination of image processing modules corresponding to different visual task models is the same. When the visual task model changes, parameters in the image processing module may change accordingly.
For example, the image processing modules corresponding to the first visual task model include a black level compensation module and a demosaicing module, where the parameters of the black level compensation module include a parameter A1 and the parameters of the demosaicing module include a parameter B1. The image processing modules corresponding to the second visual task model also include a black level compensation module and a demosaicing module, where the parameters of the black level compensation module include a parameter A2 and the parameters of the demosaicing module include a parameter B2. The image is subjected to black level compensation and demosaicing before being input into either the first visual task model or the second visual task model, but the parameters used for the black level compensation and demosaicing before the first visual task model differ from those used before the second visual task model.
There is a correspondence between the visual task model and parameters in the image processing module. Parameters in the image processing module corresponding to the visual task model, i.e. parameters in the at least one target image processing module, may be determined from the visual task model.
In this way, different visual task models correspond to different parameters in the image processing modules. When the visual task model changes, the parameters in the image processing modules can be adaptively matched to the visual task model, so that the current parameters in the image processing modules are more suitable for the current visual task model and the performance of the visual task model is improved.
In another implementation, there is a correspondence between the visual task model on one hand and the parameters in the image processing modules together with other configurations of the image processing modules on the other hand. In this way, the parameters in the image processing modules corresponding to the current visual task model, as well as the other configurations of the image processing modules, can be determined according to the correspondence.
For example, there is a correspondence between the visual task model and both the combination of image processing modules and the parameters in the image processing modules. The combination of image processing modules corresponding to the current visual task model, and the parameters in the image processing modules in that combination, are determined according to the correspondence.
In this case, the combination of image processing modules corresponding to different visual task models may be the same or may be different. For example, the combination of image processing modules corresponding to two visual task models is the same, while the parameters in the image processing modules in the combination of image processing modules are different.
For another example, there is a correspondence between the visual task model and the combination of image processing modules, the weights of the image processing modules, and the parameters in the image processing modules. The combination of image processing modules corresponding to the visual task model, the weights of the image processing modules, and the parameters in the image processing modules are determined according to the correspondence.
In this case, the combination of the image processing modules corresponding to the different visual task models may be the same or different. In the case where the combination of the image processing modules is the same, the weights of the image processing modules in the combination of the image processing modules may be the same or may be different. In the case where the combination of the image processing modules is the same, the parameters in the image processing modules in the combination of the image processing modules may be the same or different.
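Mirroring the A1/B1 versus A2/B2 example above, a minimal sketch of model-specific module parameters could be a nested lookup table; the concrete parameter names and values below (a black level pedestal and a demosaicing method) are assumptions made for illustration only.

```python
# Hypothetical per-model parameter sets for the same module combination,
# mirroring the A1/B1 vs. A2/B2 example in the text.
MODEL_TO_PARAMS = {
    "first_visual_task_model": {
        "black_level_compensation": {"pedestal": 64},   # plays the role of A1
        "demosaic": {"method": "bilinear"},             # plays the role of B1
    },
    "second_visual_task_model": {
        "black_level_compensation": {"pedestal": 16},   # plays the role of A2
        "demosaic": {"method": "edge_aware"},           # plays the role of B2
    },
}

def get_module_params(visual_task_model: str, module_name: str) -> dict:
    """Look up the parameters of one image processing module for the
    current visual task model according to the stored correspondence."""
    return MODEL_TO_PARAMS[visual_task_model][module_name]

print(get_module_params("first_visual_task_model", "black_level_compensation"))
```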
According to the solution of the embodiments of the application, different visual task models correspond to different configurations of the image processing modules. When the visual task model changes, the image processing modules can be adaptively matched to the visual task model, so that the image processing flow is more suitable for the visual task model and the performance of the visual task model is improved.
The apparatus of the embodiments of the present application will be described below with reference to fig. 7 to 8. It should be understood that the apparatus described below is capable of performing the method of the embodiments of the present application described above, and in order to avoid unnecessary repetition, the repeated description is appropriately omitted when introducing the apparatus of the embodiments of the present application.
Fig. 7 is a schematic block diagram of an image processing apparatus of an embodiment of the present application. The image processing apparatus 4000 shown in fig. 7 includes an acquisition unit 4010 and a processing unit 4020.
The acquisition unit 4010 and the processing unit 4020 can be used to execute the image processing method of the embodiment of the present application.
In one possible implementation, apparatus 4000 may be used to perform method 300 or method 400.
Specifically, the acquisition unit 4010 is used for acquiring the first image.
The processing unit 4020 is configured to: process the first image through the at least one image processing module to obtain a second image; input the second image into the visual task model for processing; and adjust the at least one image processing module according to the processing result of the visual task model.
Optionally, as an embodiment, the at least one image processing module includes a plurality of image processing modules, and the processing unit 4020 is specifically configured to:
delete some of the image processing modules in the plurality of image processing modules according to the processing result of the visual task model.
Optionally, as an embodiment, the processing unit 4020 is specifically configured to: adjust the weights of the plurality of image processing modules according to the processing result of the visual task model, where the weights of the plurality of image processing modules are used for processing the processing results of the plurality of image processing modules to obtain the second image; and delete some of the image processing modules in the plurality of image processing modules according to the adjusted weights of the plurality of image processing modules.
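As a rough, non-authoritative sketch of this adjust-then-prune behaviour, the weights could be nudged toward a per-module score derived from the visual task model's processing result, and the modules whose adjusted weight falls below a threshold could then be deleted; the score definition, update rule, learning rate and threshold are all assumptions made for illustration.

```python
def adjust_and_prune(weights: dict[str, float],
                     module_scores: dict[str, float],
                     lr: float = 0.1,
                     threshold: float = 0.05) -> dict[str, float]:
    """Move each module weight toward a score derived from the visual task
    model's processing result, then drop modules whose adjusted weight is
    below the pruning threshold."""
    adjusted = {name: weight + lr * (module_scores[name] - weight)
                for name, weight in weights.items()}
    return {name: weight for name, weight in adjusted.items()
            if weight >= threshold}

# Hypothetical example: the green balance module contributes little to the
# task result, so its weight is adjusted down and the module is pruned.
weights = {"demosaic": 0.5, "green_balance": 0.04, "gamma_correction": 0.4}
scores = {"demosaic": 0.9, "green_balance": 0.01, "gamma_correction": 0.6}
print(adjust_and_prune(weights, scores))
```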
Optionally, as an embodiment, the processing unit 4020 is specifically configured to: adjust parameters in the at least one image processing module according to the processing result of the visual task model.
Optionally, as an embodiment, the processing unit 4020 is specifically configured to: adjust the processing order of the at least one image processing module according to the processing result of the visual task model.
Optionally, as an embodiment, the at least one image processing module includes: a black level compensation module, a green balance module, a dead pixel correction module, a demosaicing module, a Bayer noise reduction module, an automatic white balance module, a color correction module, a gamma correction module, or a noise reduction and sharpening module.
In another possible implementation, the apparatus 4000 may be configured to perform the method 700.
Specifically, the acquisition unit 4010 is used for acquiring a third image.
The processing unit 4020 is configured to: determine at least one target image processing module according to the visual task model; process the third image through the at least one target image processing module to obtain a fourth image; and process the fourth image through the visual task model to obtain a processing result of the fourth image.
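Putting the three steps together, a hedged end-to-end sketch of this inference-side flow might look as follows; the lookup table, the stand-in modules and the placeholder visual task model are hypothetical and only illustrate the order of operations.

```python
import numpy as np

def determine_target_modules(visual_task_model: str) -> list:
    # Hypothetical lookup of the module combination for this model.
    table = {
        "detection_model": [
            lambda img: img - img.min(),         # stand-in for black level compensation
            lambda img: np.clip(img, 0.0, 1.0),  # stand-in for a later correction stage
        ],
    }
    return table[visual_task_model]

def visual_task_inference(image: np.ndarray) -> str:
    # Placeholder for the visual task model (e.g. a detector or classifier).
    return "processing result of the fourth image"

def run(third_image: np.ndarray, visual_task_model: str) -> str:
    # 1) Determine at least one target image processing module from the model.
    modules = determine_target_modules(visual_task_model)
    # 2) Process the third image with the target modules to obtain a fourth image.
    fourth_image = third_image
    for module in modules:
        fourth_image = module(fourth_image)
    # 3) Process the fourth image with the visual task model.
    return visual_task_inference(fourth_image)

result = run(np.random.rand(8, 8, 3), "detection_model")
```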
Optionally, as an embodiment, the processing unit 4020 is specifically configured to: determine the at least one target image processing module from a plurality of candidate image processing modules according to the visual task model.
Optionally, as an embodiment, the processing unit 4020 is specifically configured to: determine parameters in the at least one target image processing module according to the visual task model.
Optionally, as an embodiment, the processing unit 4020 is specifically configured to: determine the processing order of the at least one target image processing module according to the visual task model.
Optionally, as an embodiment, the at least one target image processing module includes: a black level compensation module, a green balance module, a dead pixel correction module, a demosaicing module, a Bayer noise reduction module, an automatic white balance module, a color correction module, a gamma correction module, or a noise reduction and sharpening module.
It should be noted that the above-mentioned apparatus 4000 is embodied in the form of a functional unit. The term "unit" herein may be implemented in software and/or hardware, without specific limitation.
For example, a "unit" may be a software program, a hardware circuit or a combination of both that implements the functions described above. The hardware circuitry may include application specific integrated circuits (application specific integrated circuit, ASICs), electronic circuits, processors (e.g., shared, proprietary, or group processors, etc.) and memory for executing one or more software or firmware programs, merged logic circuits, and/or other suitable components that support the described functions.
Thus, the elements of the examples described in the embodiments of the present application can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
Fig. 8 is a schematic hardware configuration of an image processing apparatus according to an embodiment of the present application. The image processing apparatus 6000 shown in fig. 8 (the apparatus 6000 may specifically be a computer device) includes a memory 6001, a processor 6002, a communication interface 6003, and a bus 6004. The memory 6001, the processor 6002, and the communication interface 6003 are connected to each other by a bus 6004.
The memory 6001 may be a Read Only Memory (ROM), a static storage device, a dynamic storage device, or a random access memory (random access memory, RAM). The memory 6001 may store a program, and when the program stored in the memory 6001 is executed by the processor 6002, the processor 6002 is configured to execute the respective steps of the image processing method of the embodiment of the present application. In particular, the processor 6002 may perform the method 300, method 400, or method 700 above.
The processor 6002 may employ a general-purpose central processing unit (central processing unit, CPU), microprocessor, application specific integrated circuit (application specific integrated circuit, ASIC), graphics processor (graphics processing unit, GPU) or one or more integrated circuits for executing associated programs to implement the image processing methods of the method embodiments of the present application.
The processor 6002 may also be an integrated circuit chip with signal processing capabilities. In implementation, the steps of the image processing method of the present application may be completed by an integrated logic circuit of hardware in the processor 6002 or an instruction in the form of software.
The processor 6002 may also be a general purpose processor, a digital signal processor (digital signal processing, DSP), an application specific integrated circuit (ASIC), a field programmable gate array (field programmable gate array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logical blocks disclosed in the embodiments of the present application. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read only memory, a programmable read only memory, an electrically erasable programmable memory, or registers. The storage medium is located in the memory 6001; the processor 6002 reads the information in the memory 6001 and, in combination with its hardware, performs the functions required of the units included in the apparatus shown in fig. 7, or performs the image processing method of the method embodiments of the present application.
The communication interface 6003 enables communication between the apparatus 6000 and other devices or communication networks using transceiving means such as, but not limited to, a transceiver. For example, training data may be acquired through the communication interface 6003.
Bus 6004 may include a path to transfer information between components of device 6000 (e.g., memory 6001, processor 6002, communication interface 6003).
It should be noted that although the above-described apparatus 6000 only shows a memory, a processor, a communication interface, in a specific implementation, it will be appreciated by those skilled in the art that the apparatus 6000 may also include other devices necessary to achieve normal operation. Also, as will be appreciated by those skilled in the art, the apparatus 6000 may also include hardware devices that perform other additional functions, as desired. Furthermore, it will be appreciated by those skilled in the art that the apparatus 6000 may also include only the devices necessary to implement the embodiments of the present application, and not necessarily all of the devices shown in fig. 8.
The present embodiment also provides a computer-readable storage medium storing program code for execution by a device, the program code including instructions for performing the image processing method in the embodiment of the present application.
The present embodiments also provide a computer program product containing instructions which, when run on a computer, cause the computer to perform the image processing method in the embodiments of the present application.
The embodiment of the application also provides a chip, which comprises a processor and a data interface, wherein the processor reads instructions stored in a memory through the data interface, and executes the image processing method in the embodiment of the application.
Optionally, as an implementation manner, the chip may further include a memory, where the memory stores instructions, and the processor is configured to execute the instructions stored on the memory, where the instructions, when executed, are configured to perform the method in any implementation manner of the first aspect or the second aspect.
The chip may be an FPGA or an ASIC.
It should be appreciated that the processor in the embodiments of the present application may be a central processing unit (central processing unit, CPU), or may be another general purpose processor, a digital signal processor (digital signal processor, DSP), an application specific integrated circuit (application specific integrated circuit, ASIC), a field programmable gate array (field programmable gate array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
It should also be appreciated that the memory in embodiments of the present application may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a read-only memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an electrically Erasable EPROM (EEPROM), or a flash memory. The volatile memory may be random access memory (random access memory, RAM) which acts as an external cache. By way of example but not limitation, many forms of random access memory (random access memory, RAM) are available, such as Static RAM (SRAM), dynamic Random Access Memory (DRAM), synchronous Dynamic Random Access Memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced Synchronous Dynamic Random Access Memory (ESDRAM), synchronous Link DRAM (SLDRAM), and direct memory bus RAM (DR RAM).
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless manner (e.g., infrared, radio, or microwave). The computer readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or a data center that contains one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.
It should be understood that the term "and/or" is merely an association relationship describing the associated object, and means that three relationships may exist, for example, a and/or B may mean: there are three cases, a alone, a and B together, and B alone, wherein a, B may be singular or plural. In addition, the character "/" herein generally indicates that the associated object is an "or" relationship, but may also indicate an "and/or" relationship, and may be understood by referring to the context.
In the present application, "at least one" means one or more, and "a plurality" means two or more. "at least one of" or the like means any combination of these items, including any combination of single item(s) or plural items(s). For example, at least one (one) of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, wherein a, b, c may be single or plural.
It should be understood that, in various embodiments of the present application, the sequence numbers of the foregoing processes do not mean the order of execution, and the order of execution of the processes should be determined by the functions and internal logic thereof, and should not constitute any limitation on the implementation process of the embodiments of the present application.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It will be clear to those skilled in the art that, for convenience and brevity of description, specific working procedures of the above-described systems, apparatuses and units may refer to corresponding procedures in the foregoing method embodiments, and are not repeated herein.
In the several embodiments provided in this application, it should be understood that the disclosed systems, devices, and methods may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the units is merely a logical function division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a read-only memory (ROM), a random access memory (random access memory, RAM), a magnetic disk, or an optical disk, or other various media capable of storing program codes.
The foregoing is merely specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any variation or replacement that can be readily figured out by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (26)

  1. An image processing method, comprising:
    acquiring a first image;
    processing the first image through at least one image processing module to obtain a second image;
    inputting the second image into a visual task model for processing;
    and adjusting the at least one image processing module according to the processing result of the visual task model.
  2. The method of claim 1, wherein the at least one image processing module comprises a plurality of image processing modules, the adjusting the at least one image processing module according to the processing results of the visual task model comprising:
    and deleting part of the image processing modules in the plurality of image processing modules according to the processing result of the visual task model.
  3. The method according to claim 2, wherein deleting a part of the image processing modules of the plurality of image processing modules according to the processing result of the visual task model comprises:
    adjusting the weights of the plurality of image processing modules according to the processing result of the visual task model, wherein the weights of the plurality of image processing modules are used for processing the processing results of the plurality of image processing modules to obtain the second image;
    and deleting part of the image processing modules in the plurality of image processing modules according to the adjusted weights of the plurality of image processing modules.
  4. A method according to any one of claims 1 to 3, wherein said adjusting said at least one image processing module according to the processing result of said visual task model comprises:
    and adjusting parameters in the at least one image processing module according to the processing result of the visual task model.
  5. The method according to any one of claims 1 to 4, wherein said adjusting said at least one image processing module according to the processing result of said visual task model comprises:
    and adjusting the processing sequence of the at least one image processing module according to the processing result of the visual task model.
  6. The method according to any one of claims 1 to 5, wherein the at least one image processing module comprises:
    a black level compensation module, a green balance module, a dead pixel correction module, a demosaicing module, a Bayer noise reduction module, an automatic white balance module, a color correction module, a gamma correction module, or a noise reduction and sharpening module.
  7. An image processing method, comprising:
    acquiring a third image;
    determining at least one target image processing module according to the visual task model;
    processing the third image through the at least one target image processing module to obtain a fourth image;
    and processing the fourth image through the visual task model to obtain a processing result of the fourth image.
  8. The method of claim 7, wherein said determining at least one target image processing module from the visual task model comprises:
    the at least one target image processing module is determined from a plurality of candidate image processing modules according to the visual task model.
  9. The method according to claim 7 or 8, wherein said determining at least one target image processing module from the visual task model comprises:
    determining parameters in the at least one target image processing module according to the visual task model.
  10. The method according to any one of claims 7 to 9, wherein said determining at least one target image processing module from a visual task model comprises:
    and determining the processing sequence of the at least one target image processing module according to the visual task model.
  11. The method according to any one of claims 7 to 10, wherein the at least one target image processing module comprises:
    a black level compensation module, a green balance module, a dead pixel correction module, a demosaicing module, a Bayer noise reduction module, an automatic white balance module, a color correction module, a gamma correction module, or a noise reduction and sharpening module.
  12. An image processing apparatus, comprising:
    an acquisition unit configured to acquire a first image;
    a processing unit for:
    processing the first image through at least one image processing module to obtain a second image;
    inputting the second image into a visual task model for processing;
    and adjusting the at least one image processing module according to the processing result of the visual task model.
  13. The apparatus according to claim 12, wherein said at least one image processing module comprises a plurality of image processing modules, said processing unit being specifically configured to:
    and deleting part of the image processing modules in the plurality of image processing modules according to the processing result of the visual task model.
  14. The apparatus according to claim 13, wherein the processing unit is specifically configured to:
    the weights of the plurality of image processing modules are adjusted according to the processing results of the visual task model, and the weights of the plurality of image processing modules are used for processing the processing results of the plurality of image processing modules to obtain the second image;
    and deleting part of the image processing modules in the plurality of image processing modules according to the adjusted weights of the plurality of image processing modules.
  15. The apparatus according to any one of claims 12 to 14, wherein the processing unit is specifically configured to:
    and adjusting parameters in the at least one image processing module according to the processing result of the visual task model.
  16. The apparatus according to any one of claims 12 to 15, wherein the processing unit is specifically configured to:
    And adjusting the processing sequence of the at least one image processing module according to the processing result of the visual task model.
  17. The apparatus according to any one of claims 12 to 16, wherein the at least one image processing module comprises:
    a black level compensation module, a green balance module, a dead pixel correction module, a demosaicing module, a Bayer noise reduction module, an automatic white balance module, a color correction module, a gamma correction module, or a noise reduction and sharpening module.
  18. An image processing apparatus, comprising:
    an acquisition unit configured to acquire a third image;
    a processing unit for:
    determining at least one target image processing module according to the visual task model;
    processing the third image through the at least one target image processing module to obtain a fourth image;
    and processing the fourth image through the visual task model to obtain a processing result of the fourth image.
  19. The apparatus according to claim 18, wherein the processing unit is specifically configured to:
    the at least one target image processing module is determined from a plurality of candidate image processing modules according to the visual task model.
  20. The apparatus according to claim 18 or 19, wherein the processing unit is specifically configured to:
    parameters in the at least one target image processing module are determined according to the visual task model.
  21. The apparatus according to any one of claims 18 to 20, wherein the processing unit is specifically configured to:
    and determining the processing sequence of the at least one target image processing module according to the visual task model.
  22. The apparatus according to any one of claims 18 to 21, wherein the at least one target image processing module comprises:
    a black level compensation module, a green balance module, a dead pixel correction module, a demosaicing module, a Bayer noise reduction module, an automatic white balance module, a color correction module, a gamma correction module, or a noise reduction and sharpening module.
  23. An image processing apparatus comprising a processor and a memory, the memory for storing program instructions, the processor for invoking the program instructions to perform the method of any of claims 1-6 or 7-11.
  24. A computer readable storage medium for storing program code for execution by a device, the program code comprising instructions for performing the method of any one of claims 1 to 6 or claims 7 to 11.
  25. A computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of any of claims 1 to 6 or claims 7 to 11.
  26. A chip comprising a processor and a data interface, the processor reading instructions stored on a memory via the data interface to perform the method of any one of claims 1 to 6 or 7 to 11.
CN202180099442.4A 2021-06-28 2021-06-28 Image processing method and device Pending CN117529725A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2021/102739 WO2023272431A1 (en) 2021-06-28 2021-06-28 Image processing method and apparatus

Publications (1)

Publication Number Publication Date
CN117529725A true CN117529725A (en) 2024-02-06

Family

ID=84690936

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202180099442.4A Pending CN117529725A (en) 2021-06-28 2021-06-28 Image processing method and device

Country Status (2)

Country Link
CN (1) CN117529725A (en)
WO (1) WO2023272431A1 (en)


Also Published As

Publication number Publication date
WO2023272431A1 (en) 2023-01-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination