CN115908969A - Method and apparatus for image processing and model training
- Publication number
- CN115908969A (application number CN202211358446.2A)
- Authority
- CN
- China
- Prior art keywords
- image processing
- conversion layer
- model
- channel
- image
- Prior art date
- Legal status
- Pending
Abstract
The application provides a method and equipment for image processing and model training. According to the method, when a pre-trained model is applied to a specific image processing task, an image processing model is obtained by inserting a channel tuning module into a transformer (conversion) layer of the pre-trained model. The channel tuning module transforms the features of at least one target channel in the intermediate feature map extracted by the conversion layer. During training, the channels with the richest features in the intermediate feature map are selected as target channels based on the data set of the current image processing task, and that data set is used to train only the parameters of the channel tuning module while the original parameters of the pre-trained model remain unchanged. This greatly reduces the number of trainable parameters and prevents overfitting, thereby improving the accuracy of the image processing model.
Description
Technical Field
The present application relates to computer technologies, and in particular, to a method and an apparatus for image processing and model training.
Background
Large-scale pre-trained models perform excellently on various computer vision tasks, such as image classification, image segmentation, and image detection. Pre-training on large public data sets enables a pre-trained model to learn rich visual representations, robust at both low and high levels, which can be used in downstream image processing tasks to improve their effectiveness.
When a pre-trained model is applied to a specific image processing task, all of its parameters typically need to be trained on a large-scale labeled data set so that the trained model fits that task.
However, in some fields (such as medicine and remote sensing) little data exists, or data sensitivity limits the size of the data set that can be acquired. For image processing tasks in these fields, training the pre-trained model on a small-scale data set easily causes overfitting, which results in low model accuracy on the task.
Disclosure of Invention
The application provides a method and equipment for image processing and model training, which address the problem that, when a pre-trained model is applied to a specific image processing task and trained on a small data set, overfitting easily occurs and the model consequently has low accuracy on that task.
In a first aspect, the present application provides a method for training an image processing model, including:
acquiring a data set of an image processing task and an image processing model to be trained, wherein the image processing model is obtained by inserting a channel tuning module into a conversion layer of a pre-training model;
inputting a sample image in the data set into the image processing model, extracting features of the sample image through the conversion layer, and transforming the features of at least one target channel in an intermediate feature map extracted by the conversion layer through a channel tuning module inserted in the conversion layer;
determining an image processing result according to the characteristics finally output by the conversion layer;
and training parameters of a channel tuning module in the image processing model according to the image processing result and the labeling information of the sample image to obtain a trained image processing model, wherein the trained image processing model is used for carrying out image processing on an input image to obtain an image processing result.
In a second aspect, the present application provides a method for training a remote sensing image processing model, including:
receiving a remote sensing image data set sent by user equipment, wherein the remote sensing image data set comprises a plurality of remote sensing images and annotation information of the remote sensing images;
acquiring an image processing model to be trained, wherein the image processing model is obtained by inserting a channel tuning module into a conversion layer of a pre-training model;
inputting the remote sensing image into the image processing model, extracting the features of the remote sensing image through the conversion layer, and transforming the features of at least one target channel in the intermediate feature map extracted by the conversion layer through the channel tuning module inserted in the conversion layer;
determining an image processing result of the remote sensing image according to the characteristics finally output by the conversion layer;
training parameters of a channel tuning module in the image processing model according to the image processing result and the labeling information of the remote sensing image to obtain a trained image processing model;
and outputting the model parameters of the trained image processing model to the user equipment.
In a third aspect, the present application provides an image processing method, including:
acquiring an image to be processed;
inputting the image into a trained image processing model, extracting the features of the image through a conversion layer of the image processing model, and performing transformation processing on the features of at least one target channel in an intermediate feature map extracted by the conversion layer through a channel tuning module inserted in the conversion layer;
determining an image processing result of the image according to the characteristics finally output by the conversion layer;
and outputting the image processing result.
In a fourth aspect, the present application provides an image processing model training system, comprising:
the end-side equipment is used for constructing a data set of an image processing task and sending the data set of the image processing task to the cloud-side equipment;
the cloud side equipment is used for receiving a data set of an image processing task and acquiring an image processing model to be trained, wherein the image processing model is obtained by inserting a channel tuning module into a conversion layer of a pre-training model; inputting a sample image in the data set into the image processing model, extracting the features of the sample image through the conversion layer, and transforming the features of at least one target channel in the intermediate feature map extracted by the conversion layer through a channel tuning module inserted in the conversion layer; determining an image processing result according to the features finally output by the conversion layer; training parameters of a channel tuning module in the image processing model according to the image processing result and the labeling information of the sample image to obtain a trained image processing model;
the cloud side device is further used for sending the model parameters of the trained image processing model to the end side device.
In a fifth aspect, the present application provides an image processing model training apparatus, including:
the data acquisition unit is used for acquiring a data set of an image processing task and an image processing model to be trained, wherein the image processing model is obtained by inserting a channel tuning module into a conversion layer of a pre-training model;
the image processing unit is used for inputting the sample image in the data set into the image processing model, extracting the features of the sample image through the conversion layer and transforming the features of at least one target channel in the intermediate feature map extracted by the conversion layer through a channel tuning module inserted in the conversion layer; determining an image processing result according to the characteristics finally output by the conversion layer;
and the parameter training unit is used for training the parameters of the channel tuning module in the image processing model according to the image processing result and the labeling information of the sample image to obtain a trained image processing model, and the trained image processing model is used for carrying out image processing on an input image to obtain an image processing result.
In a sixth aspect, the present application provides an image processing apparatus comprising:
the image acquisition unit is used for acquiring an image to be processed;
the image processing unit is used for inputting the image into a trained image processing model, extracting the features of the image through a conversion layer of the image processing model, and performing transformation processing on the features of at least one target channel in an intermediate feature map extracted by the conversion layer through a channel tuning module inserted in the conversion layer; determining an image processing result of the image according to the characteristics finally output by the conversion layer;
and the processing result output unit is used for outputting the image processing result.
In a seventh aspect, the present application provides an electronic device, comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored by the memory to implement the method of any of the above aspects.
In an eighth aspect, the present application provides a computer-readable storage medium having stored thereon computer-executable instructions for implementing the method of any one of the above aspects when executed by a processor.
In a ninth aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the method of any of the above aspects.
According to the image processing and model training method and device of the present application, the image processing model to be trained is obtained by inserting a channel tuning module into a transformer layer of the pre-trained model, and the channel tuning module transforms the features of at least one target channel in the intermediate feature map extracted by that layer. When the pre-trained model is transferred to a downstream specific image processing task, the parameters of the channel tuning modules are trained using the data set of that task while the original parameters of the pre-trained model remain unchanged; this greatly reduces the number of trainable parameters and avoids overfitting of a large-scale model to a small training set, thereby improving the accuracy of the model on the specific image processing task.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
FIG. 1 is a schematic diagram of an exemplary network architecture to which the present application is applicable;
FIG. 2 is a schematic diagram of an exemplary system architecture to which the present application is applicable;
FIG. 3 is a flowchart of an image processing model training method provided by an exemplary embodiment of the present application;
FIG. 4 is a flowchart of a process for transforming a feature of a target channel in an intermediate feature map according to an exemplary embodiment of the present application;
FIG. 5 is a block diagram of a transformation process performed on a feature of a target channel in an intermediate feature map according to an exemplary embodiment of the present application;
FIG. 6 is an exemplary diagram of an insertion location of a channel tuning module provided in an exemplary embodiment of the present application;
FIG. 7 is an exemplary diagram of an insertion location of a channel tuning module provided in accordance with another exemplary embodiment of the present application;
FIG. 8 is a flowchart of an image processing model training method provided by another exemplary embodiment of the present application;
FIG. 9 is a flowchart of a method for training a remote sensing image processing model according to an exemplary embodiment of the present application;
FIG. 10 is a flowchart of an image processing method provided in an exemplary embodiment of the present application;
FIG. 11 is a schematic diagram of an image processing model training system provided in an exemplary embodiment of the present application;
FIG. 12 is a schematic structural diagram of an image processing model training apparatus according to an exemplary embodiment of the present application;
fig. 13 is a schematic structural diagram of an image processing apparatus according to an exemplary embodiment of the present application;
fig. 14 is a schematic structural diagram of an electronic device according to an example embodiment of the present application.
With the above figures, there are shown specific embodiments of the present application, which will be described in more detail below. These drawings and written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings in which the same numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terms referred to in this application are explained first:
conversion layer: the method refers to transformer layers in a neural network model, each transformer layer comprises a first sublayer and a second sublayer, and the output of the first sublayer is used as the input of the second sublayer. Taking a Vision Transformer (ViT) model as an example, a backbone network of the ViT model includes 12 transform layers, and each transform layer includes two sublayers.
For an image processing task with a small data set, training all parameters of a pre-trained model on the small data set easily overfits, resulting in low model accuracy when the pre-trained model is applied to the task. In the present application, an image processing model to be trained is obtained by inserting a channel tuning module into a conversion layer of a pre-trained model. During model training, a data set of the image processing task is acquired, and the sample images in the data set are input into the image processing model for image processing to obtain image processing results. Specifically, feature extraction is performed on the sample image through the conversion layer, the features of at least one target channel in the intermediate feature map extracted by the conversion layer are transformed by the channel tuning module inserted in the conversion layer, and the image processing result is determined according to the features finally output by the conversion layer. The parameters of the channel tuning module in the image processing model are then trained according to the image processing result and the annotation information of the sample image to obtain the trained image processing model. Throughout training, the original parameters of the pre-trained model remain unchanged and only the parameters of the inserted channel tuning module are updated, so the number of trainable parameters is greatly reduced relative to the full parameters of the pre-trained model. This solves the problem that a large-scale pre-trained model easily overfits when trained on a small data set, and improves the accuracy of the image processing model: very few parameters are trained, yet the model reaches better accuracy.
Fig. 1 is a schematic diagram of an example network architecture to which the present application is applicable. As shown in fig. 1, the network architecture includes a server and an electronic device.
The server may be a server cluster deployed in the cloud or a device with local computing capability. The server stores a pre-trained model that has been pre-trained on a large-scale data set, such as a pre-trained ViT model. The server also stores an image processing model obtained by inserting a channel tuning module into a transformer layer of the pre-trained model, and can obtain a data set of a specific image processing task, which is used to train the image processing model. When training the image processing model, the server fixes the original parameters of the pre-trained model and trains the parameters of the channel tuning module based on the data set of the specific image processing task; after training, an image processing model for executing the current image processing task is obtained. Further, the server may transmit the model parameters of the obtained image processing model to a specified electronic device.
The electronic device may be a client device that requests the server for an image processing model for executing a specific image processing task, specifically, may be a computing device deployed locally by a user, or may be a server deployed in a cloud.
The electronic device provides a data set of a specific image processing task to a server, and receives an image processing model trained by the server based on the data set. Based on the trained image processing model, the electronic equipment can externally provide a function for executing an image processing task.
For example, little labeled data is available in the remote sensing field. Taking a remote sensing image recognition task as the image processing task, the server trains, on the data set of the recognition task, the image processing model obtained by inserting channel tuning modules into the transformer layers of the pre-trained model; during training it updates only the parameters of the inserted channel tuning modules and keeps the original parameters of the pre-trained model unchanged, obtaining a target model for the remote sensing image recognition task once training completes. The trained target model can be deployed to a local server or another cloud server to provide the remote sensing image recognition function externally. When the task needs to be executed, the device on which the target model is deployed obtains a remote sensing image to be recognized, inputs it into the trained target model for recognition, and outputs the recognition result or passes it on to other functional modules.
Illustratively, fig. 2 is a schematic diagram of an exemplary system architecture to which the present application is applicable. As shown in fig. 2, the system architecture includes a cloud-side device, a peer-side device, and a data production device. The cloud side equipment is in communication connection with the end side equipment through an end cloud link, and each end side equipment is in communication connection with the plurality of data production equipment.
The cloud-side device may be a central cloud device of a distributed cloud architecture, and the end-side device is an edge cloud device of the distributed cloud architecture. The data production equipment comprises various terminal equipment, including but not limited to smart phones, portable computers, tablet computers, intelligent household appliances and the like.
The data production equipment is responsible for producing, collecting, and uploading various data. The end-side equipment collects the data of the data production equipment within its coverage and preprocesses it to obtain high-value data (key information); the end-side equipment can transmit both the original data and the high-value data to the cloud-side equipment over the end-cloud link. Besides synchronizing the data of the end-side equipment, the cloud-side equipment integrates data from different end-side equipment, performs data operations according to preset rules, and can synchronize the operation results back to different end-side equipment.
The cloud-side equipment provides powerful computing and storage capacity but is deployed far from the user, while the end-side equipment is widely deployed and close to the user. The end-side equipment is an extension of the cloud-side equipment: the computing capability of the cloud side can sink to the end side, and service requirements that a centralized cloud computing mode cannot meet are satisfied through the integration and cooperative management of end and cloud.
Based on the system architecture shown in fig. 2, in this embodiment, the end-side device is responsible for collecting various types of data of the data production device within the coverage area of the end-side device, preprocessing the data, constructing a data set of an image processing task, and uploading the data set to the cloud-side device. The cloud side equipment receives the data sets of the image processing tasks sent by the end side equipment, integrates the data sets of the same image processing tasks from different end side equipment, and forms a larger data set. And the cloud side equipment trains an image processing model based on the integrated big data set. In addition, the cloud-side device may issue the model parameters of the trained image processing model to each end-side device, or deploy the image processing model according to a preset manner.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. These several specific embodiments may be combined with each other below, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 3 is a flowchart of an image processing model training method according to an exemplary embodiment of the present application. The main execution body of the method provided by this embodiment is the server in the network architecture shown in fig. 1 or the cloud-side device in the system architecture shown in fig. 2. As shown in fig. 3, the method comprises the following specific steps:
step S301, a data set of an image processing task and an image processing model to be trained are obtained, and the image processing model is obtained by inserting a channel tuning module into a conversion layer of a pre-training model.
The embodiment can be applied to various image processing tasks such as image classification, image recognition, image segmentation, image detection and the like, and when the embodiment is applied to different image processing tasks, a channel tuning module in an image processing model is trained based on a data set of the specifically applied image processing task, so that the image processing model suitable for the specific image processing task can be obtained.
The data set of the image processing task refers to a training data set used for training an image processing model to obtain a model suitable for the current image processing task.
For example, for two different image classification tasks, a data set of each image classification task may be acquired, and a channel tuning module in an image processing model is trained based on each data set to obtain a model specifically applying each image classification task. The parameters of the pre-training models in the models for executing different image classification tasks obtained through training are consistent, but the channel tuning module has different parameters.
In this embodiment, when the pre-training model is migrated to a downstream specific image processing task, a channel tuning module is inserted into a transform layer of the pre-training model, and the channel tuning module is configured to perform transform processing on a feature of at least one target channel in an intermediate feature map extracted by the transform layer, so as to obtain an image processing model to be trained. In the training process of the image processing model, the original parameters of the pre-training model are kept unchanged, and only the parameters of the added channel tuning modules are adjusted, so that the number of trainable parameters can be greatly reduced, and the overfitting of a large-scale model to a small training set is avoided.
Specifically, the backbone network of the pre-trained model typically includes multiple transformer layers, and the image processing model to be trained can be obtained by inserting a channel tuning module into one or more of them.
Preferably, a channel tuning module is inserted into every transformer layer, which improves the expressive ability of the trained image processing model.
Illustratively, the pre-trained model may be a pre-trained ViT model, which has a large number of parameters; fine-tuning all of them when migrating to a downstream task with a small data set easily overfits. The backbone network of the ViT model comprises 12 transformer layers, and a channel tuning module is inserted into each transformer layer to obtain the image processing model to be trained.
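As an illustration only, not the patent's reference implementation, the construction described here might look as follows in PyTorch, assuming a ViT whose transformer layers are exposed as `model.blocks`; `ChannelTuner` is a hypothetical name for the channel tuning module, and its full transformation (including channel extraction) is sketched in a later section:

```python
import torch.nn as nn

class ChannelTuner(nn.Module):
    """Hypothetical channel tuning module: a small linear mapping over the
    features of the K selected target channels (shape B x L x K)."""
    def __init__(self, k: int):
        super().__init__()
        self.proj = nn.Linear(k, k)  # the only trainable parameters added

    def forward(self, feat_k):             # feat_k: (B, L, K)
        return feat_k + self.proj(feat_k)  # fuse mapped features with originals

def build_image_processing_model(pretrained_vit, k: int = 96):
    # Keep the original parameters of the pre-trained model unchanged.
    for p in pretrained_vit.parameters():
        p.requires_grad = False
    # Insert one channel tuning module per transformer layer (12 for ViT-B).
    # Wiring each tuner into its block's forward pass follows the insertion
    # positions discussed later in this description.
    pretrained_vit.tuners = nn.ModuleList(
        ChannelTuner(k) for _ in pretrained_vit.blocks
    )
    return pretrained_vit
```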
In addition, in this embodiment, the insertion position of the channel tuning module in the transform layer is not specifically limited.
Step S302, inputting the sample image in the data set into an image processing model, extracting the characteristics of the sample image through a conversion layer, and converting the characteristics of at least one target channel in the intermediate characteristic diagram extracted by the conversion layer through a channel tuning module inserted in the conversion layer.
And step S303, determining an image processing result according to the characteristics finally output by the conversion layer.
When the image processing model is trained, the sample image in the data set is input into the image processing model for image processing. In this process, the transformer layers extract the features of the sample image, and the image processing result of the sample image is determined according to the features finally output by the transformer layers.
In this embodiment, because a channel tuning module is inserted into the transformer layer, during feature extraction on the sample image the channel tuning module transforms, according to its insertion position in the transformer layer, the features of at least one target channel in the intermediate feature map generated at that position to obtain an intermediate transformation feature; the intermediate transformation feature is then fed into the part after the insertion position, and processing continues until the image processing result is obtained.
The features finally output by the conversion layer may be the features output by the last transformer layer in the image processing model, which have undergone the transformation processing of the channel tuning modules in one or more transformer layers.
And S304, training parameters of a channel tuning module in the image processing model according to the image processing result and the labeling information of the sample image to obtain a trained image processing model, wherein the trained image processing model is used for carrying out image processing on the input image to obtain an image processing result.
After the image processing result of the sample image is obtained, the loss can be calculated from the image processing result and the annotation information of the sample image, and the parameters of the channel tuning modules in the image processing model are adjusted based on the loss; after multiple iterations, the trained image processing model is obtained.
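A hedged sketch of this training step in PyTorch; the loss (cross-entropy, assuming a classification task), the optimizer choice, and the loop structure are illustrative assumptions rather than details taken from the patent:

```python
import torch

def train_channel_tuners(model, data_loader, epochs: int = 10, lr: float = 1e-3):
    # Only parameters of the inserted channel tuning modules require gradients;
    # the original pre-trained parameters were frozen and stay unchanged.
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(trainable, lr=lr)
    criterion = torch.nn.CrossEntropyLoss()

    for _ in range(epochs):
        for images, labels in data_loader:  # labels: annotation information
            logits = model(images)          # image processing result
            loss = criterion(logits, labels)
            optimizer.zero_grad()
            loss.backward()                 # gradients reach only the tuners
            optimizer.step()
    return model
```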
The trained image processing model is used for executing the current image processing task. Specifically, an image to be processed is input into a trained image processing model, and the image processing model is used for processing the input image to obtain an image processing result of the input image.
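Using the trained model then reduces to a plain forward pass; a minimal illustrative example, assuming a classification-style task and a preprocessed input tensor:

```python
import torch

@torch.no_grad()
def process_image(model, image):
    """image: a preprocessed tensor of shape (C, H, W); returns a class index."""
    model.eval()
    logits = model(image.unsqueeze(0))   # add the batch dimension
    return logits.argmax(dim=-1).item()  # image processing result
```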
In this embodiment, a channel tuning module is inserted into a transformer layer of a pre-trained model to obtain the image processing model to be trained; the channel tuning module transforms the features of at least one target channel in the intermediate feature map extracted by that layer. When the pre-trained model is migrated to a downstream specific image processing task, the parameters of the channel tuning modules are trained on the data set of that task while the original parameters of the pre-trained model remain unchanged. The number of trainable parameters is thus greatly reduced, the overfitting that commonly arises when a large-scale pre-trained model is fine-tuned on a small data set is avoided, and the accuracy of the model on the specific image processing task is improved.
In addition, considering that directly tuning only part of the channels inside each transformer layer would damage the integrity of the model and cause model degradation, this embodiment instead adds a channel tuning module to transform the features of the important target channels. The integrity of the original pre-trained model is thereby preserved, degradation of the pre-trained model is avoided, and accuracy comparable to or even better than the prior art is achieved by training very few parameters.
Referring to fig. 4, in an alternative embodiment, for any conversion layer into which a channel tuning module is inserted, the conversion processing is performed on the feature of at least one target channel in the intermediate feature map extracted by the conversion layer through the channel tuning module inserted into the conversion layer in step S302, which may specifically be implemented by the following steps:
step S3021, extracting the characteristics of the target channel from the intermediate characteristic diagram obtained at the insertion position in the conversion layer according to the insertion position of the channel optimization module in the conversion layer.
In this embodiment, the number of target channels is denoted by K, a preset hyper-parameter; the same K may be used for every data set. The value of K may be determined and preconfigured based on experiments on public data sets. Experimental results on one public data set show that selecting the features of 32 target channels for transformation already yields significant performance, with only 0.01M trainable parameters. Model performance improves as K increases, but the improvement of K=192 over K=96 is very slight while the number of parameters is four times larger. Balancing accuracy and efficiency, the number of target channels may be chosen in the range [32, 96]; for example, it may default to 96, in which case the features of 96 target channels are extracted in this step as the original features to be transformed in the transformer layer.
Optionally, for the selection of the K target channels, the K channels may be randomly selected as the target channels in a random selection manner.
Optionally, for the selection of K target channels, the importance of features in each channel may be analyzed and the feature weight of the channel may be determined based on the data set of the current image processing task, where the richer and the more significant the channel is, the greater the feature weight of the channel is. K channels which are more important can be selected as target channels based on the characteristic weights of the channels.
In this step, the intermediate feature map generated at the insertion position of the channel tuning module in the conversion layer is obtained; its dimension can be expressed as B × L × C, where B is the number of sample images, L is the number of tokens of the transformer layer (i.e., the number of blocks of the input image), and C is the number of channels of the transformer layer.
Further, the features of K target channels are extracted from the B × L × C intermediate feature map by channel, and may be represented as B × L × K as the original features to be transformed.
And step S3022, performing linear mapping on the extracted features through the channel tuning module to obtain mapping features, and fusing the mapping features with the extracted features to obtain fused features.
And the channel tuning module performs linear mapping on the extracted features of the K target channels to obtain mapping features, and then fuses the mapping features with the extracted features (original features before transformation) of the K target channels to obtain fused features, wherein the dimension of the fused features is still BxL xK.
Alternatively, when the mapping feature and the extracted feature are fused to obtain a fused feature, the mapping feature and the extracted feature may be added to obtain the fused feature.
Optionally, when the mapping feature and the extracted feature are fused to obtain a fused feature, the mapping feature and the extracted feature may be weighted and summed according to a preset weighting coefficient to obtain the fused feature. The preset weight coefficient includes a first weight coefficient of the mapped feature and a second weight coefficient of the extracted original feature, and values of the first weight coefficient and the second weight coefficient may be set and adjusted empirically, which is not specifically limited herein.
And step S3023, replacing the characteristics of the target channel in the intermediate characteristic diagram with the fused characteristics to obtain intermediate transformation characteristics.
The dimension of the fused feature obtained in step S3022 is still B × L × K. The fused feature is split by channel to obtain the transformed features of the K target channels, and the features of the target channels in the intermediate feature map are replaced with the corresponding transformed features; the fused features are thereby written back into the intermediate feature map, yielding the intermediate transformation feature.
Step S3024, inputting the intermediate transformation feature into the inserted portion of the image processing model, and performing subsequent image processing to determine an image processing result of the sample image.
Exemplarily, fig. 5 is a frame diagram for performing a transformation process on features of target channels in an intermediate feature map according to an exemplary embodiment of the present application, and as shown in fig. 5, a dimension of the intermediate feature map obtained from an insertion position of a transform layer is B × L × C, K target channels are selected based on feature weights of the channels, features of the K target channels are extracted from the intermediate feature map, and a dimension of an extracted original feature is B × L × K. And performing linear mapping on the extracted original features based on a channel tuning module to obtain mapping features, wherein the dimension is still BxLxK. And fusing the mapping characteristics with the extracted original characteristics to obtain fused characteristics with dimension of BxL xK. And replacing the characteristics of the K target channels in the intermediate characteristic diagram with the fused characteristics to obtain converted intermediate conversion characteristics.
In this embodiment, the channel tuning module inserted in the transform layer is a linear mapping layer, and has few trainable parameters, and when the features of K target channels extracted from the intermediate feature map are transformed, the extracted original features are linearly mapped through the linear mapping layer, and the mapped features and the extracted original features are fused and then replaced into the intermediate feature map, so as to obtain transformed intermediate transformed features.
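Steps S3021 to S3023 amount to a gather, a linear mapping, a fusion, and a scatter back into the intermediate feature map. Below is a sketch under the assumption of PyTorch tensors, using the simple additive fusion; `target_idx` holds the (precomputed) indices of the K selected target channels:

```python
import torch
import torch.nn as nn

class ChannelTuner(nn.Module):
    def __init__(self, k: int, target_idx: torch.Tensor):
        super().__init__()
        self.proj = nn.Linear(k, k)                     # trainable linear mapping
        self.register_buffer("target_idx", target_idx)  # K target channel indices

    def forward(self, feat):  # feat: intermediate feature map, shape (B, L, C)
        extracted = feat[..., self.target_idx]  # S3021: original features (B, L, K)
        mapped = self.proj(extracted)           # S3022: linear mapping
        fused = extracted + mapped              # S3022: fuse mapped and original
        out = feat.clone()                      # leave the other channels untouched
        out[..., self.target_idx] = fused       # S3023: replace target channels
        return out                              # intermediate transformation feature
```

The weighted-sum fusion variant mentioned above would replace the addition with `w1 * mapped + w2 * extracted` for preset weight coefficients.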
On the basis of any of the above embodiments, the transform layer in the backbone network of the pre-trained model generally includes a first sub-layer and a second sub-layer, and an output of the first sub-layer is an input of the second sub-layer.
Taking the pre-trained model ViT as an example, the first sublayer of a transformer layer in the backbone network of the ViT model is the multi-head self-attention module, which consists of a normalization layer (LayerNorm) and a multi-head self-attention (MHSA) layer; the second sublayer is the multilayer perceptron module, which consists of a normalization layer (LayerNorm) and a multilayer perceptron (MLP).
Optionally, the insertion position of the channel tuning module in the conversion layer is between the first sublayer and the second sublayer, i.e. the channel tuning module is inserted after the first sublayer and before the second sublayer of the conversion layer.
Illustratively, taking the pre-training model as the ViT model as an example, as shown in fig. 6, the insertion position of the channel-tuning module is after the multi-headed self-attention (MHSA) layer and before the normalization layer (LayerNorm) in the multi-layered perceptron module, i.e. between the first sublayer and the second sublayer.
Further, in step S3021, the features of the target channels are extracted from the intermediate feature map output by the first sublayer of the conversion layer, according to the insertion position of the channel tuning module in the conversion layer.
In step S3024, the intermediate transformation feature obtained after the transformation is input into the second sublayer after the insertion position.
Optionally, the insertion position of the channel tuning module in the conversion layer is after the second sublayer.
Illustratively, taking the pre-trained model as the ViT model as an example, as shown in fig. 7, the insertion position of the channel tuning module is after the normalization layer (LayerNorm) of the multilayer perceptron module, i.e. after the second sublayer.
Further, in step S3021, the features of the target channels are extracted from the intermediate feature map output by the second sublayer of the conversion layer, according to the insertion position of the channel tuning module in the conversion layer.
In step S3024, the intermediate transformation feature obtained by the transformation is input into the next layer (a transformer layer or another layer) after the insertion position.
Optionally, the insertion position of the channel tuning module may also be inside the first sublayer, for example between the normalization layer (LayerNorm) and the multi-head self-attention layer in the first sublayer of the ViT model; or inside the second sublayer, for example between the normalization layer (LayerNorm) and the multilayer perceptron (MLP) in the second sublayer of the ViT model. In this embodiment, the channel tuning module can be inserted anywhere in the conversion layer, which is not specifically limited herein.
In this embodiment, inserting the channel tuning module between the first sublayer and the second sublayer of the conversion layer is the preferred arrangement: after the multi-head self-attention (MHSA) module has aggregated long-range dependencies, the features contain more significant and more important channels, which better adapt to downstream image processing tasks such as image classification, image segmentation, image detection, and image recognition, so the model performs better when applied to a specific downstream task.
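To make the preferred insertion position concrete, here is a sketch of a pre-norm ViT-style block with the tuner placed between the two sublayers, as in fig. 6; `norm1`, `attn`, `norm2`, and `mlp` are assumed attribute names of the frozen pre-trained block, not names taken from the patent:

```python
import torch.nn as nn

class TunedBlock(nn.Module):
    def __init__(self, block: nn.Module, tuner: nn.Module):
        super().__init__()
        self.block = block  # frozen pre-trained transformer layer
        self.tuner = tuner  # inserted channel tuning module

    def forward(self, x):
        x = x + self.block.attn(self.block.norm1(x))  # first sublayer (MHSA)
        x = self.tuner(x)   # transform K target channels between the sublayers
        x = x + self.block.mlp(self.block.norm2(x))   # second sublayer (MLP)
        return x
```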
On the basis of any of the above embodiments, when selecting the K target channels whose features the channel tuning module transforms, K channels can be randomly selected from the channels of the intermediate feature map extracted by the conversion layer to serve as the target channels of that layer; even with random selection, the model can achieve good performance.
In an optional embodiment, when K target channels of any conversion layer are selected, the feature weights of the channels in the conversion layer of the pre-training model when applied to the image processing task can be determined according to the data set of the image processing task; and selecting a preset number (K) of channels as target channels of the conversion layer according to the characteristic weight of each channel in the conversion layer. The preset number K is greater than or equal to 1.
The feature weight of each channel can be determined by analyzing the importance of the features in each channel based on the data set of the current image processing task: the more abundant and salient the features in a channel, the more important the channel is and the greater its feature weight.
Specifically, the feature weights of all channels in the conversion layer are sorted, and the K channels with the largest feature weights are selected as the target channels of the conversion layer. For example, the feature weights of the channels are sorted in descending order, and the Top-K channels are selected as target channels according to the sorting result.
In this embodiment, the importance of each channel in each conversion layer may be analyzed based on the current data set, K important channels are respectively selected as target channels in each conversion layer, the K target channels selected in different conversion layers may be different, and the target channels used when applied to different data sets are different, which can improve the performance of the model.
Specifically, the feature weight of each channel in the conversion layer of the pre-training model when applied to the image processing task is determined according to the data set of the image processing task, which can be specifically implemented in the following manner:
inputting sample images in a data set into a pre-training model, and performing feature extraction on the sample images through a conversion layer of the pre-training model; acquiring an intermediate characteristic diagram extracted by a conversion layer of a pre-training model according to the insertion position of a channel tuning module in an image processing model in the conversion layer; and taking the value obtained after L2 standardization operation is carried out on the features of each channel in the intermediate feature map extracted by the conversion layer as the feature weight of each channel in the conversion layer.
Illustratively, taking the pre-trained model as the ViT model as an example, the backbone network of the ViT model includes 12 transformer layers. The sample images in the data set are input into the ViT model, and, according to the insertion position of the channel tuning module, the features generated at the insertion position are obtained from each transformer layer as an intermediate feature map. Let $l$ denote any transformer layer, $l \in [1, 12]$; the intermediate feature map extracted by that layer can be expressed as $f^l$, with dimension $B \times L \times C$, where $B$ is the number of sample images in the data set, $L$ is the number of tokens of the transformer layer (i.e., the number of blocks of the input image), and $C$ is the number of channels of the transformer layer. Let $f_i^l$ denote the feature of channel $i$ in $f^l$, $i \in [1, C]$. To eliminate the effect of image shift, the value obtained by performing the L2 normalization operation on $f_i^l$ is used as the feature weight of channel $i$. The feature weights of the channels are concatenated in order into a vector: $Z^l = \mathrm{Concat}(\lVert f_1^l \rVert_2, \lVert f_2^l \rVert_2, \ldots, \lVert f_C^l \rVert_2)$, where $\lVert f_i^l \rVert_2$ denotes the value of $f_i^l$ after the L2 normalization operation and $\mathrm{Concat}(\cdot)$ denotes concatenation; $Z^l$ has dimension $1 \times C$. The channels are sorted by feature weight, and the $K$ channels with the highest feature weights are selected according to the sorting result as the target channels of transformer layer $l$.
The specific implementation manner of obtaining the intermediate feature map is the same as the manner of obtaining the intermediate feature map from the insertion position in the conversion layer in S3021, and is not described herein again.
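The feature-weight computation and Top-K selection above can be sketched as follows (PyTorch assumed; names illustrative). Here `feat` is the intermediate feature map $f^l$ collected from the insertion position of one layer:

```python
import torch

def select_target_channels(feat: torch.Tensor, k: int):
    """feat: f^l of shape (B, L, C); returns indices of the K target channels."""
    # Feature weight of channel i: L2 norm of f_i^l over all images and tokens,
    # which eliminates the effect of image shift.
    weights = torch.linalg.vector_norm(feat, ord=2, dim=(0, 1))  # Z^l, shape (C,)
    return torch.topk(weights, k).indices  # Top-K channels by feature weight
```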
In an optional embodiment, the feature weights of the channels in the conversion layer of the pre-trained model when applied to a public data set can be determined based on the public data set; and selecting a preset number (K) of channels as target channels of the conversion layer according to the characteristic weight of each channel in the conversion layer. The target channel determined based on the public data set is used when applied to downstream image processing tasks.
However, since channel importance varies from data set to data set in practical applications, it is recommended to analyze the importance of each channel using the data set of the specific image processing task, select the target channels accordingly, and train on that basis to obtain the image processing model for that task; this improves the accuracy of the model when applied to the specific image processing task.
In an optional embodiment, the image processing task is an image classification task, and the annotation information of the sample image is the category information of the sample image. According to the data set of the image processing task, determining the feature weight of each channel in the conversion layer of the pre-training model when the pre-training model is applied to the image processing task, and specifically adopting the following method:
inputting the sample images of each category into the pre-trained model according to the category information of the sample images in the data set, and performing feature extraction on the sample images through the conversion layer of the pre-trained model; acquiring the intermediate feature map extracted by the conversion layer of the pre-trained model according to the insertion position, in the conversion layer, of the channel tuning module in the image processing model; taking the value obtained by performing the L2 normalization operation on the features of each channel in the intermediate feature map as the feature weight of that channel corresponding to the category; and calculating the average of each channel's feature weights over all categories as the feature weight of that channel in the conversion layer.
Illustratively, taking the pre-trained model as the ViT model as an example, the backbone network of the ViT model includes 12 transformer layers. When applied to an image classification task, the data set contains sample images of multiple categories; taking the Caltech101 data set as an example, it contains 101 classes and 1000 images. The sample images are grouped by the categories in the data set, each group containing the sample images of one category. Let $M$ denote the number of categories contained in the data set, and let $N_c$ denote any one group, corresponding to one category, $N_c \in [1, M]$; for the Caltech101 data set, $M = 101$. Let $B_{N_c}$ denote the number of sample images in group $N_c$. For each group $N_c$, the sample images in the group are input into the ViT model, and, according to the insertion position of the channel tuning module, the features generated at the insertion position are obtained from each transformer layer as an intermediate feature map. Let $l$ denote any transformer layer, $l \in [1, 12]$; the intermediate feature map extracted by that layer can be expressed as $f_{N_c}^l$, with dimension $B_{N_c} \times L \times C$, where $L$ is the number of tokens of the transformer layer (i.e., the number of blocks of the input image) and $C$ is the number of channels. Let $f_{N_c,i}^l$ denote the feature of channel $i$ in $f_{N_c}^l$, $i \in [1, C]$. To eliminate the effect of image shift, the value obtained by performing the L2 normalization operation on $f_{N_c,i}^l$ is used as the feature weight of channel $i$ corresponding to category $N_c$. These weights are concatenated in order into a vector $Z_{N_c}^l = \mathrm{Concat}(\lVert f_{N_c,1}^l \rVert_2, \ldots, \lVert f_{N_c,C}^l \rVert_2)$ of dimension $1 \times C$. The final feature weight of each channel is obtained by averaging over all categories: $Z^l = \frac{1}{M} \sum_{N_c=1}^{M} Z_{N_c}^l$. The channels are sorted according to $Z^l$, and the $K$ channels with the highest feature weights are selected according to the sorting result as the target channels of transformer layer $l$.
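A sketch of this category-aware variant, assuming the intermediate feature maps have been grouped by category beforehand (the dictionary layout and names are illustrative assumptions):

```python
import torch

def class_aware_target_channels(feats_by_class: dict, k: int):
    """feats_by_class: category N_c -> f^l_{N_c} of shape (B_c, L, C)."""
    per_class = [
        torch.linalg.vector_norm(f, ord=2, dim=(0, 1))  # Z^l_{N_c}, shape (C,)
        for f in feats_by_class.values()
    ]
    z = torch.stack(per_class).mean(dim=0)  # average over the M categories
    return torch.topk(z, k).indices         # K target channels of this layer
```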
In this embodiment, when applied to an image classification task, the feature weight of each channel is estimated in a way that exploits the fact that the data set contains multiple categories. The influence of category is thus fully considered when analyzing channel importance, instead of treating the whole data set as a single population; the important channels can be selected more accurately as target channels, and model training on this basis improves the accuracy and performance of the model on the image classification task.
Fig. 8 is a flowchart of an image processing model training method according to an exemplary embodiment of the present application, where an execution subject of the method provided by the present application is a server in the network architecture shown in fig. 1. As shown in fig. 8, the method comprises the following specific steps:
step S801, receiving a data set of an image processing task sent by the user equipment.
In practical applications, when a user wants to acquire an image processing model for executing a specific image processing task, a data set of a current image processing task may be acquired by a user device, and the data set of the current image processing task is uploaded to a server. The server receives the data set of the image processing task sent by the user equipment, and trains to obtain an image processing model based on the data set of the current image processing task through subsequent steps S802-S805.
And S802, obtaining an image processing model to be trained, wherein the image processing model is obtained by inserting a channel tuning module into a conversion layer of a pre-training model.
Step S803, inputting the sample image in the data set into an image processing model, performing feature extraction on the sample image through the conversion layer, and performing transformation processing on the feature of at least one target channel in the intermediate feature map extracted by the conversion layer through the channel tuning module inserted in the conversion layer.
Step S804, determining an image processing result according to the features finally output by the conversion layer.
Step S805, training parameters of the channel tuning module in the image processing model according to the image processing result and the annotation information of the sample image to obtain the trained image processing model.
Step S806, outputting the model parameters of the trained image processing model to the user equipment.
In this embodiment, for the specific implementation of steps S802 to S805, refer to the model training process of steps S301 to S304 in the foregoing embodiment; details are not repeated here.
This embodiment provides the system architecture of the image processing model training method in a practical application.
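As a minimal sketch of steps S802 to S805, assuming a PyTorch classification setup, the training loop below updates only the channel tuning modules while the pre-trained parameters stay frozen; the naming convention that tuning parameters contain "channel_tuning" in their name, and the hyperparameters, are assumptions of this sketch:

```python
import torch
from torch import nn

def train_tuning_modules(model, dataloader, epochs=10, lr=1e-3):
    """Train only the channel tuning modules of the image processing model;
    the original parameters of the pre-training model remain unchanged."""
    for name, p in model.named_parameters():
        # Assumed convention: tuning-module parameters carry this name.
        p.requires_grad = "channel_tuning" in name
    trainable = [p for p in model.parameters() if p.requires_grad]
    optimizer = torch.optim.AdamW(trainable, lr=lr)
    criterion = nn.CrossEntropyLoss()  # e.g. an image classification task
    model.train()
    for _ in range(epochs):
        for images, labels in dataloader:
            logits = model(images)            # steps S803 and S804
            loss = criterion(logits, labels)  # compare with annotations
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                  # step S805, tuning params only
    return model
```

Because only the tuning parameters receive gradients, the number of trainable parameters stays small, which is the safeguard against over-fitting that the method relies on.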
The image processing model training method provided by the present application can be applied to fields such as medical treatment and remote sensing, where large amounts of training data are difficult to obtain, and thus to processing tasks on images such as medical images and remote sensing images. Fig. 9 is a flowchart of a remote sensing image processing model training method according to an exemplary embodiment of the present application. The method is executed by the server in the network architecture shown in fig. 1. As shown in fig. 9, the method comprises the following steps:
Step S901, receiving a remote sensing image data set sent by user equipment, where the remote sensing image data set includes a plurality of remote sensing images and annotation information of the remote sensing images.
In this embodiment, taking application to the remote sensing field as an example, when a user wants to obtain an image processing model for executing a processing task on remote sensing images, a remote sensing image data set for the current image processing task may be collected through user equipment and uploaded to the server. The remote sensing image data set comprises a plurality of remote sensing images and the annotation information of the remote sensing images.
The server receives the remote sensing image data set sent by the user equipment and, through subsequent steps S902 to S905, trains an image processing model based on the data set of the current image processing task.
Step S902, acquiring an image processing model to be trained, where the image processing model is obtained by inserting a channel tuning module into a conversion layer of a pre-training model.
Step S903, inputting the remote sensing image into the image processing model, performing feature extraction on the remote sensing image through the conversion layer, and transforming the features of at least one target channel in the intermediate feature map extracted by the conversion layer through the channel tuning module inserted in the conversion layer.
Step S904, determining an image processing result of the remote sensing image according to the features finally output by the conversion layer.
Step S905, training parameters of the channel tuning module in the image processing model according to the image processing result and the annotation information of the remote sensing image to obtain the trained image processing model.
Step S906, outputting the model parameters of the trained image processing model to the user equipment.
In this embodiment, for the specific implementation of steps S902 to S905, refer to the model training process of steps S301 to S304 in the foregoing embodiment; details are not repeated here.
This embodiment provides the system architecture of the image processing model training method when applied to a processing task on remote sensing images.
Fig. 10 is a flowchart of an image processing method according to an exemplary embodiment of the present application. The method is executed by the electronic device in the network architecture shown in fig. 1, which is responsible for executing an image processing task using the trained image processing model. As shown in fig. 10, the method comprises the following steps:
Step S1001, acquiring an image to be processed.
Step S1002, inputting the image into the trained image processing model, extracting the features of the image through a conversion layer of the image processing model, and performing transformation processing on the features of at least one target channel in the intermediate feature map extracted by the conversion layer through a channel tuning module inserted in the conversion layer.
In this step, feature extraction is performed on the image through the conversion layer of the image processing model, and the channel tuning module inserted in the conversion layer transforms the features of at least one target channel in the intermediate feature map extracted by the conversion layer. The specific implementation is similar to the processing of the sample image in step S302; refer to the relevant contents of the above embodiment, which are not repeated here.
The image processing model may be a model implementing any one of image classification, image recognition, image segmentation, and image detection tasks.
Step S1003, determining an image processing result of the image according to the features finally output by the conversion layer.
In this embodiment, because a channel tuning module is inserted into a conversion layer of the image processing model, after the image to be processed is input into the image processing model, during feature extraction through the conversion layer the channel tuning module transforms, according to its insertion position in the conversion layer, the features of at least one target channel in the intermediate feature map generated at the insertion position to obtain intermediate transformation features. The intermediate transformation features are then fed into the portion after the insertion position, and subsequent processing continues until the image processing result is obtained.
The features finally output by the conversion layer may be the features output by the last transformer layer in the image processing model; these features have undergone the transformation processing of the channel tuning module in one or more transformer layers.
Step S1004, outputting the image processing result.
The method of the embodiment can improve the precision of image processing.
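A minimal inference sketch of steps S1001 to S1004, assuming a classification task; the helper name run_inference and the preprocessing convention are illustrative rather than prescribed by this application:

```python
import torch

@torch.no_grad()
def run_inference(model, image):
    """Run the trained image processing model on one preprocessed image
    tensor of shape (3, H, W); the channel tuning modules transform the
    target channels inside model.forward()."""
    model.eval()
    logits = model(image.unsqueeze(0))   # add the batch dimension
    return logits.argmax(dim=-1).item()  # e.g. the predicted class index
```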
Fig. 11 is a schematic diagram of an image processing model training system according to an exemplary embodiment of the present application. As shown in fig. 11, the image processing model training system includes: an end-side device 1101, and a cloud-side device 1102 communicatively connected to the end-side device 1101. The end-side device 1101 constructs a plurality of sets of training data, forms a data set of the image processing task, and uploads the data set of the image processing task to the cloud-side device 1102. The cloud-side device 1102 trains an image processing model from the data set of the image processing task, and issues model parameters of the trained image processing model to the end-side device 1101 when the loss function of the model converges.
Specifically, the end-side device 1101 constructs a data set of an image processing task, and transmits the data set of the image processing task to the cloud-side device.
The cloud-side device 1102 is configured to: receive a data set of an image processing task and acquire an image processing model to be trained, where the image processing model is obtained by inserting a channel tuning module into a conversion layer of a pre-training model; input a sample image in the data set into the image processing model, perform feature extraction on the sample image through the conversion layer, and transform the features of at least one target channel in the intermediate feature map extracted by the conversion layer through the channel tuning module inserted in the conversion layer; determine an image processing result according to the features finally output by the conversion layer; and train parameters of the channel tuning module in the image processing model according to the image processing result and the annotation information of the sample image to obtain the trained image processing model.
The cloud side device 1102 is further configured to send model parameters of the trained image processing model to the end side device 1101.
Optionally, the cloud-side device 1102 may also deploy the trained image processing model in a preset manner, for example to a cloud server.
In this embodiment, the end-side device 1101 may be an edge cloud device on which various network platforms are deployed at the edge of the network. It is responsible for collecting the data generated by terminal devices within its coverage, preprocessing the data, constructing a data set of an image processing task, and uploading the data set to the cloud-side device. The end-side device 1101 may be a server-side device such as a conventional server, a cloud server, or a server array. Terminal devices include, but are not limited to, desktop computers, notebook computers, and smartphones.
The cloud-side device 1102 may be a central cloud device on which various network platforms are deployed at the network center, and may likewise be a server-side device such as a conventional server, a cloud server, or a server array. The cloud-side device receives the data sets of image processing tasks sent by end-side devices and integrates data sets of the same image processing task from different end-side devices into a larger data set. The cloud-side device then trains the image processing model based on the integrated data set.
In this embodiment, for the process by which the cloud-side device 1102 trains the image processing model based on the data set of the image processing task, refer to the relevant contents of steps S301 to S304 in the foregoing method embodiment; details are not repeated here.
Fig. 12 is a schematic structural diagram of an image processing model training apparatus according to an exemplary embodiment of the present application. The apparatus provided in this embodiment is used to execute the image processing model training method. As shown in fig. 12, the image processing model training apparatus 120 includes: a data acquisition unit 1201, an image processing unit 1202, and a parameter training unit 1203.
The data obtaining unit 1201 is configured to obtain a data set of an image processing task and an image processing model to be trained, where the image processing model is obtained by inserting a channel tuning module into a conversion layer of a pre-training model.
The image processing unit 1202 is configured to input a sample image in a data set into an image processing model, perform feature extraction on the sample image through a conversion layer, and perform transformation processing on features of at least one target channel in an intermediate feature map extracted by the conversion layer through a channel tuning module inserted in the conversion layer; and determining an image processing result according to the characteristics finally output by the conversion layer.
The parameter training unit 1203 is configured to train parameters of a channel tuning module in the image processing model according to the image processing result and the labeling information of the sample image, so as to obtain a trained image processing model, where the trained image processing model is used to perform image processing on an input image, so as to obtain an image processing result.
In an alternative embodiment, in transforming the features of at least one target channel in the intermediate feature map extracted by the conversion layer through the channel tuning module inserted in the conversion layer, the image processing unit 1202 is further configured to:
extract the features of the target channel from the intermediate feature map obtained at the insertion position in the conversion layer, according to the insertion position of the channel tuning module in the conversion layer; perform linear mapping on the extracted features through the channel tuning module to obtain mapping features, and fuse the mapping features with the extracted features to obtain fused features; replace the features of the target channel in the intermediate feature map with the fused features to obtain intermediate transformation features; and input the intermediate transformation features into the portion of the image processing model after the insertion position for subsequent image processing to determine the image processing result of the sample image.
In an alternative embodiment, in fusing the mapping features with the extracted features to obtain the fused features, the image processing unit 1202 is further configured to:
add the mapping features and the extracted features to obtain the fused features;
or,
perform a weighted summation of the mapping features and the extracted features according to a preset weight coefficient to obtain the fused features.
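A minimal sketch of a channel tuning module consistent with the extract, map, fuse, and replace steps above, assuming PyTorch; the class name, the single nn.Linear mapping over the selected channels, and the scalar fusion coefficient are assumptions of this sketch rather than details fixed by the present application:

```python
import torch
from torch import nn

class ChannelTuningModule(nn.Module):
    """Transforms the features of the target channels in an intermediate
    feature map of shape (B, L, C) and leaves the other channels as-is."""

    def __init__(self, target_channels, alpha=None):
        super().__init__()
        self.register_buffer("idx", torch.as_tensor(target_channels))
        k = len(target_channels)
        self.proj = nn.Linear(k, k)  # linear mapping over the K channels
        self.alpha = alpha           # None selects the additive fusion

    def forward(self, x):                 # x: (B, L, C)
        extracted = x[..., self.idx]      # features of the target channels
        mapped = self.proj(extracted)     # mapping features
        if self.alpha is None:
            fused = extracted + mapped    # additive fusion
        else:                             # weighted summation with a preset
            fused = self.alpha * mapped + (1.0 - self.alpha) * extracted
        out = x.clone()                   # keep non-target channels unchanged
        out[..., self.idx] = fused        # replace: intermediate transform
        return out
```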
In an alternative embodiment, the conversion layer comprises a first sublayer and a second sublayer, the output of the first sublayer being the input of the second sublayer, and the insertion position of the channel tuning module in the conversion layer is between the first sublayer and the second sublayer.
In extracting the features of the target channel from the intermediate feature map obtained at the insertion position in the conversion layer, the image processing unit 1202 is further configured to: extract the features of the target channel from the intermediate feature map output by the first sublayer of the conversion layer.
In another alternative embodiment, the conversion layer comprises a first sublayer and a second sublayer, the output of the first sublayer being the input of the second sublayer, and the insertion position of the channel tuning module in the conversion layer is after the second sublayer.
In extracting the features of the target channel from the intermediate feature map obtained at the insertion position in the conversion layer, the image processing unit 1202 is further configured to: extract the features of the target channel from the intermediate feature map output by the second sublayer.
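To illustrate the two insertion positions, the sketch below wraps one conversion layer, assuming the first sublayer is the multi-head attention and the second sublayer is the MLP, as in the usual ViT block; the attribute names attn, mlp, norm1, and norm2 follow common ViT implementations and are assumptions, and ChannelTuningModule refers to the sketch above:

```python
from torch import nn

class TunedTransformerBlock(nn.Module):
    """Wraps one frozen pre-trained transformer block and applies the
    channel tuning module either between the two sublayers or after
    the second sublayer."""

    def __init__(self, block, tuning_module, position="between"):
        super().__init__()
        self.block = block          # frozen pre-trained block
        self.tune = tuning_module   # trainable channel tuning module
        self.position = position    # "between" or "after"

    def forward(self, x):
        # First sublayer: multi-head attention with residual connection.
        x = x + self.block.attn(self.block.norm1(x))
        if self.position == "between":
            x = self.tune(x)        # tune the first sublayer's output
        # Second sublayer: MLP with residual connection.
        x = x + self.block.mlp(self.block.norm2(x))
        if self.position == "after":
            x = self.tune(x)        # tune the second sublayer's output
        return x
```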
In an optional embodiment, the image processing model training apparatus 120 further comprises a channel selection unit, configured to:
determine, according to the data set of the image processing task, the feature weight of each channel in the conversion layer of the pre-training model when applied to the image processing task; and select a preset number of channels as target channels of the conversion layer according to the feature weight of each channel in the conversion layer, where the preset number is greater than or equal to 1.
In an alternative embodiment, in determining the feature weight of each channel in the conversion layer of the pre-training model when applied to the image processing task according to the data set of the image processing task, the channel selection unit is further configured to:
input the sample images in the data set into the pre-training model and perform feature extraction on the sample images through the conversion layer of the pre-training model; obtain the intermediate feature map extracted by the conversion layer of the pre-training model according to the insertion position, in the conversion layer, of the channel tuning module in the image processing model; and take the value obtained after performing the L2 normalization operation on the features of each channel in the intermediate feature map extracted by the conversion layer as the feature weight of each channel in the conversion layer.
In an optional embodiment, the image processing task is an image classification task, and the annotation information of the sample image is the class information of the sample image.
In determining the feature weight of each channel in the conversion layer of the pre-training model when applied to the image processing task according to the data set of the image processing task, the channel selection unit is further configured to:
input the sample images of each class into the pre-training model according to the class information of the sample images in the data set, and perform feature extraction on the sample images through the conversion layer of the pre-training model; obtain the intermediate feature map extracted by the conversion layer of the pre-training model according to the insertion position, in the conversion layer, of the channel tuning module in the image processing model; take the value obtained by performing the L2 normalization operation on the features of each channel in the intermediate feature map extracted by the conversion layer as the feature weight of each channel in the conversion layer corresponding to the class; and calculate the mean of the feature weights of each channel in the conversion layer over the classes as the feature weight of each channel in the conversion layer.
In an alternative embodiment, after the trained image processing model is obtained, the method further includes: acquiring an image to be processed in the image processing task, and inputting the image to be processed into the trained image processing model for image processing to obtain an image processing result.
In an alternative embodiment, after the trained image processing model is obtained, the method further includes: outputting the model parameters of the trained image processing model to the end-side device.
The apparatus provided in this embodiment may be specifically configured to execute the image processing model training method provided in any of the above embodiments; the specific functions and achievable technical effects are not repeated here.
Fig. 13 is a schematic structural diagram of an image processing apparatus according to an exemplary embodiment of the present application. The apparatus provided in this embodiment is used to execute the image processing method. As shown in fig. 13, the image processing apparatus 130 includes: an image acquisition unit 1301, an image processing unit 1302, and a processing result output unit 1303.
The image acquiring unit 1301 is used for acquiring an image to be processed.
The image processing unit 1302 is configured to input an image into a trained image processing model, perform feature extraction on the image through a conversion layer of the image processing model, and perform transformation processing on a feature of at least one target channel in an intermediate feature map extracted by the conversion layer through a channel tuning module inserted in the conversion layer; and determining an image processing result of the image according to the characteristics finally output by the conversion layer.
The processing result output unit 1303 is used to output an image processing result.
The apparatus provided in this embodiment may be specifically configured to execute the image processing method provided in any of the above embodiments; the specific functions and achievable technical effects are not repeated here.
Fig. 14 is a schematic structural diagram of an electronic device according to an example embodiment of the present application. As shown in fig. 14, the electronic device 140 includes: a processor 1401, and a memory 1402 communicatively coupled to the processor 1401, the memory 1402 storing computer-executable instructions.
The processor executes the computer-executable instructions stored in the memory to implement the solution provided by any of the above method embodiments; the specific functions and achievable technical effects are not repeated here.
The embodiment of the present application further provides a computer-readable storage medium, where computer-executable instructions are stored in the computer-readable storage medium, and when the computer-executable instructions are executed by a processor, the computer-executable instructions are used to implement the solutions provided in any of the above method embodiments, and specific functions and technical effects that can be achieved are not described herein again.
An embodiment of the present application further provides a computer program product, comprising a computer program stored in a readable storage medium. At least one processor of an electronic device can read the computer program from the readable storage medium and execute it, so that the electronic device performs the solution provided by any one of the above method embodiments; the specific functions and achievable technical effects are not repeated here.
In addition, some of the flows described in the above embodiments and drawings include a plurality of operations presented in a certain order, but it should be clearly understood that these operations may be executed out of the order presented herein or in parallel; the sequence numbers merely distinguish different operations and do not themselves represent any execution order. The flows may also include more or fewer operations, which may be executed sequentially or in parallel. The descriptions of "first", "second", etc. herein are used to distinguish different messages, devices, modules, etc.; they do not represent a sequential order, nor do they limit "first" and "second" to different types. "A plurality" means two or more unless specifically limited otherwise.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.
Claims (14)
1. An image processing model training method, comprising:
acquiring a data set of an image processing task and an image processing model to be trained, wherein the image processing model is obtained by inserting a channel tuning module into a conversion layer of a pre-training model;
inputting a sample image in the data set into the image processing model, extracting the characteristics of the sample image through the conversion layer, and transforming the characteristics of at least one target channel in the intermediate characteristic diagram extracted by the conversion layer through a channel tuning module inserted in the conversion layer;
determining an image processing result according to the characteristics finally output by the conversion layer;
and training parameters of a channel tuning module in the image processing model according to the image processing result and the labeling information of the sample image to obtain a trained image processing model, wherein the trained image processing model is used for carrying out image processing on an input image to obtain an image processing result.
2. The method of claim 1, wherein the transforming the feature of at least one target channel in the intermediate feature map extracted by the conversion layer by the channel tuning module inserted in the conversion layer comprises:
extracting the characteristics of the target channel from an intermediate characteristic map obtained at the insertion position in the conversion layer according to the insertion position of the channel tuning module in the conversion layer;
performing linear mapping on the extracted features through the channel tuning module to obtain mapping features, and fusing the mapping features with the extracted features to obtain fused features;
replacing the features of the target channel in the intermediate feature map with the fused features to obtain intermediate transformation features;
and inputting the intermediate transformation characteristic into the part of the image processing model after the insertion position, and performing subsequent image processing to determine an image processing result of the sample image.
3. The method of claim 2, wherein fusing the mapped features with the extracted features to obtain fused features comprises:
adding the mapping features and the extracted features to obtain fused features;
or,
and according to a preset weight coefficient, carrying out weighted summation on the mapping characteristics and the extracted characteristics to obtain the fused characteristics.
4. The method of claim 2, wherein the conversion layer comprises a first sublayer and a second sublayer, an output of the first sublayer being an input of the second sublayer,
the insertion position of the channel tuning module in the conversion layer is between the first sublayer and the second sublayer, and
the extracting the features of the target channel from the intermediate feature map obtained at the insertion position in the conversion layer comprises:
extracting the features of the target channel from the intermediate feature map output by the first sublayer of the conversion layer.
5. The method of claim 2, wherein the conversion layer comprises a first sublayer and a second sublayer, an output of the first sublayer being an input of the second sublayer,
the insertion position of the channel tuning module in the conversion layer is after the second sublayer, and
the extracting the features of the target channel from the intermediate feature map obtained at the insertion position in the conversion layer comprises:
extracting the features of the target channel from the intermediate feature map output by the second sublayer.
6. The method of any of claims 1-5, wherein after the acquiring of the data set of the image processing task, the method further comprises:
determining the characteristic weight of each channel in the conversion layer of the pre-training model when the pre-training model is applied to the image processing task according to the data set of the image processing task;
and selecting a preset number of channels as target channels of the conversion layer according to the feature weight of each channel in the conversion layer, wherein the preset number is greater than or equal to 1.
7. The method of claim 6, wherein the determining, according to the data set of the image processing task, the feature weight of each channel in the conversion layer of the pre-training model when applied to the image processing task comprises:
inputting sample images in the data set into the pre-training model, and performing feature extraction on the sample images through a conversion layer of the pre-training model;
acquiring an intermediate feature map extracted by the conversion layer of the pre-training model according to the insertion position of the channel tuning module in the conversion layer in the image processing model;
and taking the value obtained after performing an L2 normalization operation on the features of each channel in the intermediate feature map extracted by the conversion layer as the feature weight of each channel in the conversion layer.
8. The method of claim 6, wherein the image processing task is an image classification task, the annotation information of the sample image is class information of the sample image,
the determining, according to the data set of the image processing task, the feature weight of each channel in the conversion layer of the pre-training model when applied to the image processing task comprises:
inputting the sample image of each category into the pre-training model according to the category information of the sample image in the data set, and performing feature extraction on the sample image through a conversion layer of the pre-training model;
acquiring an intermediate feature map extracted by the conversion layer of the pre-training model according to the insertion position of the channel tuning module in the conversion layer in the image processing model;
taking the value obtained by performing an L2 normalization operation on the features of each channel in the intermediate feature map extracted by the conversion layer as the feature weight of each channel in the conversion layer corresponding to the class;
and calculating the mean of the feature weights of each channel in the conversion layer corresponding to the classes as the feature weight of each channel in the conversion layer.
9. The method according to any one of claims 1-5, wherein after obtaining the trained image processing model, further comprising:
acquiring an image to be processed in the image processing task, and inputting the image to be processed into a trained image processing model for image processing to obtain an image processing result;
or,
and outputting the model parameters of the trained image processing model to the end-side equipment.
10. A training method of a remote sensing image processing model is characterized by comprising the following steps:
receiving a remote sensing image data set sent by user equipment, wherein the remote sensing image data set comprises a plurality of remote sensing images and annotation information of the remote sensing images;
acquiring an image processing model to be trained, wherein the image processing model is obtained by inserting a channel tuning module into a conversion layer of a pre-training model;
inputting the remote sensing image into the image processing model, extracting the characteristics of the remote sensing image through the conversion layer, and transforming the characteristics of at least one target channel in the intermediate characteristic diagram extracted by the conversion layer through a channel tuning module inserted in the conversion layer;
determining an image processing result of the remote sensing image according to the characteristics finally output by the conversion layer;
training parameters of a channel tuning module in the image processing model according to the image processing result and the labeling information of the remote sensing image to obtain a trained image processing model;
and outputting the model parameters of the trained image processing model to the user equipment.
11. An image processing method, characterized by comprising:
acquiring an image to be processed;
inputting the image into a trained image processing model, extracting the features of the image through a conversion layer of the image processing model, and performing transformation processing on the features of at least one target channel in an intermediate feature map extracted by the conversion layer through a channel tuning module inserted in the conversion layer;
determining an image processing result of the image according to the characteristics finally output by the conversion layer;
and outputting the image processing result.
12. An image processing model training system, comprising:
the end-side equipment is used for constructing a data set of an image processing task and sending the data set of the image processing task to the cloud-side equipment;
the cloud side equipment is used for receiving a data set of an image processing task and acquiring an image processing model to be trained, wherein the image processing model is obtained by inserting a channel tuning module into a conversion layer of a pre-training model; inputting a sample image in the data set into the image processing model, extracting the characteristics of the sample image through the conversion layer, and transforming the characteristics of at least one target channel in the intermediate characteristic diagram extracted by the conversion layer through a channel tuning module inserted in the conversion layer; determining an image processing result according to the characteristics finally output by the conversion layer; training parameters of a channel tuning module in the image processing model according to the image processing result and the labeling information of the sample image to obtain a trained image processing model;
the cloud side device is further used for sending the model parameters of the trained image processing model to the end side device.
13. An electronic device, comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored by the memory to implement the method of any of claims 1-11.
14. A computer-readable storage medium having computer-executable instructions stored therein, which when executed by a processor, are configured to implement the method of any one of claims 1-11.