CN117574982B

CN117574982B - Pre-training model fine tuning method and device based on linear transformation

Info

Publication number: CN117574982B
Application number: CN202410060305.5A
Authority: CN
Inventors: 王玉柱; 段曼妮; 王永恒
Original assignee: Zhejiang Lab
Current assignee: Zhejiang Lab
Priority date: 2024-01-16
Filing date: 2024-01-16
Publication date: 2024-04-26
Anticipated expiration: 2044-01-16
Also published as: CN117574982A

Abstract

A pretraining model fine tuning method and device based on linear transformation, the method comprises: collecting related image data of a downstream task, dividing a training set and a verification set for the image data, performing proper preprocessing, selecting a proper pre-training model, modifying a task head of the model to adapt to the downstream task, and freezing a backbone part of the pre-training model; inserting a linear transformation module between adjacent layers, the linear transformation module being used for scaling and translating features; the data of the downstream task is utilized to finely tune the pre-training model, and the model weight with the optimal performance on the verification set is saved; and integrating the parameters of the linear transformation module into the adjacent layers by utilizing a heavy parameter technology, and finally deploying the model to finish the downstream task. The invention has less introduced parameters to be learned, can realize higher accuracy on various downstream tasks, and adopts a heavy parameter technology to integrate the introduced parameters into a model backbone layer in a model reasoning stage, thereby greatly simplifying the deployment of the model.

Description

Pre-training model fine tuning method and device based on linear transformation

Technical Field

The invention relates to the field of fine tuning of a pre-training model, in particular to a method and a device for fine tuning of a pre-training model based on linear transformation.

Background

In the field of artificial intelligence, data size and model size have been increasing explosively, and nearly hundred basic models with excellent performance have been developed with the addition of large computational effort and large algorithms. Current research and application paradigms translate into 'pre-training + fine tuning' to address a wide variety of downstream tasks.

Initially, in order to address a certain downstream task, it is the practice of a researcher or engineer to conduct full-parameter tuning (full-tuning) of the pre-trained model, i.e., to update all parameters of the pre-trained model with data of the downstream task. The full-parameter fine tuning method can achieve better performance indexes, such as accuracy, but the fine-tuned pre-training model is not used for other downstream tasks, and on the other hand, the problem of fitting is caused by full-parameter fine tuning when the data of the downstream tasks are small. Therefore, in practice, the full parameter trimming method often requires a huge storage space. Later, researchers put forward a more convenient linear probe method (linear probe), for different downstream tasks, the method freezes the backbone (backup) of feature extraction, only needs to adjust the parameters of the network (head) of the part related to the task, thus realizing that a plurality of downstream tasks share one backbone, and saving the storage cost. But the accuracy of the linear detection method is significantly reduced relative to the full parameter fine tuning method.

Currently, researchers are enthusiastic to develop efficient fine-tuning strategies for parameters, hopeing to achieve accuracy similar to full-parameter fine-tuning, and hope to adjust only a slight, very small amount of parameters to ensure convenience in model deployment. For this reason, methods such as Adapter, bias, VPT, which are related to the field, have received a lot of attention. Taking VPT as an example, for each downstream task, the method introduces N additional learnable token in the input sequence for prompting the pre-training model to better identify the input image.

However, although the currently existing parameter efficient fine tuning methods achieve better accuracy, some parameters related to downstream tasks, such as the above-mentioned VPT example, are still additionally introduced during the training and reasoning phase. Although small in number, certain complications can be caused when the model is deployed.

Disclosure of Invention

The invention aims to overcome the defects of the prior art, and provides a pre-training model fine tuning method and device based on linear transformation by deeply analyzing the difference of characteristic distribution of pre-training data and downstream task data, so as to solve the problems that the prior art has more parameters to be learned, relatively low accuracy and the related parameters of the downstream task can not be removed when the model is deployed.

The invention freezes the backbone part (backbone) of the pre-training model, adds a linear transformation operation after each network layer, the purpose of the operation is to align the characteristic distribution of the downstream task data and the pre-training data, and only fine-tunes the parameters (few) of the linear transformation and the parameters of the network head (head) in the fine-tuning process of the downstream task. After fine tuning, the invention fuses the parameters of the linear transformation into the parameters of the backbone through the heavy parameter technology, thereby realizing no extra parameters in the reasoning stage and facilitating the deployment of the model.

In order to achieve the above object, the present invention provides the following technical solutions:

the first aspect of the invention relates to a pretrained model fine tuning method based on linear transformation, which comprises the following steps:

S1, data collection and pretreatment: collecting related image data of a downstream task, dividing a training set and a verification set for the image data, and carrying out proper preprocessing, wherein the preprocessing comprises the steps of keeping aspect ratio random scaling, random clipping, random horizontal overturning, RGB dithering, label smoothing and average value removal;

s2, preparing a pre-training model: selecting a neural network model of a main flow after pre-training on an ImageNet-1K or 21K data set, freezing a backbone part (backbone) of the pre-training model, namely, not updating corresponding network parameters, modifying the output dimension of a task head (head) of the pre-training model according to the category number of downstream tasks, and randomly initializing the parameters of the modified head part;

s3, introducing linear transformation: the invention observes that the feature distribution of the pre-training data and the feature distribution of the downstream data have the difference in scale and direction, so the purpose of the linear transformation module is to scale and translate the features output by the previous layer, so that the pre-training model can adapt to the downstream data;

s4, fine adjustment of a model: training parameters of the head part in the step S2 and parameters of the linear transformation module in the step S3 by using data of the downstream task;

S5, model weight parameters: selecting the model with the best performance on the verification set, storing the model weight, and merging the parameters of the linear transformation module introduced in the step S3 into the backup parameters of the pre-training model by utilizing a heavy parameter technology;

S6, model deployment: the model after the heavy parameters is deployed on terminal equipment, the terminal equipment receives new data and inputs the new data into the trained model to obtain a predictive probability vector, and then related downstream tasks are completed.

Further, in the step S3, when the transducer is used as the pre-training model, the number of layers of the model is recorded as m, and the input of the model layer 1+1 isB, L, D are the batch size of the input data, the length of the input sequence, and the dimension of the input sequence, respectively; the linear transformation module comprises two parts, namely, characteristic scaling/>And feature translation/>; The forward propagation process of the network after the linear transformation module is added is as follows:

Wherein the method comprises the steps of For input image data,/>The transducer module consists of a multi-head self-attention, a feedforward network, a multi-layer sensor and residual connection.

Further, in the step S3, when CNN is used as the pre-training model, the number of layers of the model is still recorded as m, and the input of the model layer 1+1 isWherein B, C, h and w are the batch size, the number of characteristic channels, the characteristic width and the characteristic height of the input data, respectively; the linear transformation module still comprises two parts, namely characteristic scaling/>And feature translation/>; The forward propagation process of the network after the linear transformation module is added is as follows:

Wherein the method comprises the steps of For input image data,/>The system is a CNN module and consists of a convolution layer, a batch standardization layer, a nonlinear activation function and residual error connection.

Further, in the step S4, when the image classification is used as the downstream task, the model training loss on the downstream task is thatWherein/>The prediction probability and the category label of the model to the ith input sample are respectively represented.

Further, in the step S4, when the target detection is used as the downstream task, the model training loss on the downstream task is thatWhereinThe loss is a frame regression.

Further, in the step S4, when the semantic segmentation is used as the downstream task, the model is trained on the downstream task with a loss ofWherein/>The prediction probability and the category label of the model to the jth pixel point of the ith input sample are respectively represented.

Further, in the step S5, the parameters of the linear transformation module are converted by using a heavy parameter techniqueAnd/>Blend into adjacent layer weights:

wherein, ，/>Based on the technology, only the weight of the model is required to be changed for the downstream task, and model deployment is not required to be implemented again, so that the engineering floor of the pre-training model is greatly facilitated.

A second aspect of the invention comprises a pre-training model fine tuning system based on linear transformation, comprising:

And the data collection and preprocessing module is used for: the method comprises the steps of collecting related image data of a downstream task, dividing a training set and a verification set for the image data, and preprocessing, wherein the preprocessing comprises aspect ratio random scaling, random clipping, random horizontal overturning, RGB dithering, label smoothing and average removal;

Preparing a pre-training model module: selecting a neural network model of a main flow after pre-training on an ImageNet-1K or 21K data set, freezing a backbone part (backbone) of the pre-training model, namely, not updating corresponding network parameters, modifying the output dimension of a task head (head) of the pre-training model according to the category number of downstream tasks, and randomly initializing the parameters of the modified head part;

And a linear transformation module: the method comprises the steps of inserting an additional linear transformation module between two adjacent layers of a pre-training model, wherein the dimension of the linear transformation module and the output characteristics of the adjacent layers meet the vector multiplication relation, and scaling and translating the characteristics output by the previous layer so that the pre-training model can adapt to downstream data;

model fine tuning module: training parameters of a task head part and parameters of a linear transformation module by utilizing data of a downstream task;

Model heavy parameter module: selecting the model with the best performance on the verification set, storing the model weight, and merging the parameters of the linear transformation module introduced by the linear transformation module into the backbone parameters of the pre-training model by utilizing a heavy parameter technology;

model deployment module: the model after the heavy parameters is deployed on the terminal equipment, and the terminal equipment receives new data and inputs the new data into the trained model so as to complete related downstream tasks.

A third aspect of the present invention comprises a linear transformation based pretraining model tuning apparatus comprising a memory and one or more processors, the memory having executable code stored therein, the one or more processors, when executing the executable code, for implementing a linear transformation based pretraining model tuning method of the present invention.

A fourth aspect of the invention comprises a computer readable storage medium having stored thereon a program which, when executed by a processor, implements a pre-training model tuning method of the invention based on linear transformations.

The technical scheme of the invention can be applied to various types of neural network models, and is not limited by the enumerated model types. The technical scheme of the invention can be used for various tasks without being limited by the enumerated task types.

The innovation points of the invention are as follows: by adding the linear transformation module, the efficient fine adjustment of the pre-training model is realized with very little parameter quantity to be learned (only about one thousandth of parameter is needed to be learned relative to the pre-training model), and no additional parameter is needed in the reasoning stage (only the original pre-training model parameter is needed to be updated, and no new parameter is introduced), so that the deployment of the model is greatly facilitated.

The beneficial effects of the invention are as follows:

① In the model fine tuning training stage, only few parameters to be learned are introduced, so that higher accuracy than methods in the prior art, such as full fine-tuning, adapter, bias, VPT, is realized;

② In the reasoning stage, parameters introduced in the training stage are fused into the backbone of the pre-training model through a heavy parameter technology, so that the same pre-training model can be simultaneously used for a plurality of downstream tasks, and the problem of huge cost of model storage is solved;

③ The method can be used for various mainstream network architectures including convolutional neural networks, vision transformer and the like, and can also be used for various pre-training modes including supervised pre-training, weak supervised pre-training, self-supervised pre-training and the like.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort to a person skilled in the art.

FIG. 1 is a schematic flow chart of a pre-training model fine tuning method based on linear transformation according to an embodiment of the present invention;

FIG. 2 is a feature distribution diagram of a pre-training model fine tuning method based on linear transformation according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a linear transformation module of a pre-training model fine tuning method based on linear transformation according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a linear transformation module of a pre-training model tuning method based on linear transformation according to another embodiment of the present invention;

Fig. 5 is a block diagram of a pretraining model fine tuning device based on linear transformation according to an embodiment of the present invention.

FIG. 6 is a block diagram of a pre-training model fine tuning system based on linear transformation according to an embodiment of the present invention.

Detailed Description

The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.

The features of the following examples and embodiments may be combined with each other without any conflict.

Example 1

Taking urban traffic highway target identification as an example, target categories such as pedestrians, bicycles, electric vehicles, trucks, buses and automobiles need to be identified, referring to fig. 1, the pedestrian vehicle identification method applying the pre-training model fine tuning method based on linear transformation of the invention comprises the following steps:

S1, data collection and pretreatment: image acquisition devices such as cameras arranged on roads are used for collecting image data related to road target recognition tasks, training sets and verification sets are divided into image data, and proper preprocessing is carried out, wherein the preprocessing comprises aspect ratio random scaling, random cutting, random horizontal overturning, RGB dithering, label smoothing, mean value removal and the like;

s2, preparing a pre-training model: selecting a main stream neural network model pre-trained on an ImageNet-21K data set, freezing a backbone part (backbone) of the pre-training model, namely, not updating corresponding network parameters, modifying the output dimension of a task head (head) of the pre-training model according to the category number of downstream tasks, and randomly initializing the parameters of the modified head part;

S3, introducing linear transformation: the invention observes that the feature distribution of the pre-training data and the feature distribution of the downstream data have the difference in scale and direction, so the purpose of the linear transformation module is to scale and translate the features output by the previous layer, so that the pre-training model can adapt to the downstream data; please refer to fig. 2 for feature distribution of the pre-training data and feature distribution of the downstream task data;

S4, fine adjustment of a model: training parameters of a head part in S2 and parameters of a linear transformation module in S3 by using data of a downstream task;

S6, model deployment: the model after the heavy parameters is deployed on terminal equipment, the terminal equipment receives new data and inputs the new data into the trained model to obtain a predictive probability vector, and then recognition of a highway target is completed.

S7, the predictive probability vector obtained in the step S6 is used for identifying the pedestrian vehicle. Specifically, a class label set of the pedestrian vehicle is defined, a predictive probability vector is obtained for each input image data through the model deployed in the step S6, an index of the maximum value of the predictive probability vector is calculated, and a corresponding class label is found out from the defined class label set of the pedestrian vehicle according to the index, namely, the class identification of the input image data is completed.

In the step S3, a transducer is used as a pre-training model, the number of layers of the model is recorded as m, and the input of the (1+1) th layer of the model isB, L, D are the batch size of the input data, the length of the input sequence, and the dimension of the input sequence, respectively; the linear transformation module comprises two parts, namely, characteristic scaling/>And feature translation/>; The forward propagation process of the network after the linear transformation module is added is that

Wherein the method comprises the steps ofFor input image data,/>The transformer module consists of a multi-head self-attention, a feedforward network, a multi-layer sensor and residual error connection, and the network architecture is shown in FIG. 3;

in the step S4, the image classification is used as the downstream task, and the model training loss on the downstream task is that Wherein/>Respectively representing the prediction probability and the category label of the model to the ith input sample;

in the step S5, the parameters of the linear transformation module are converted by utilizing the heavy parameter technology And/>Blend into adjacent layer weights

Wherein,，/>Based on the technology, only the weight of the model is required to be changed for the downstream task, and model deployment is not required to be implemented again, so that the engineering floor of the pre-training model is greatly facilitated;

in order to embody the remarkable advantages of the method, the method compares a plurality of existing methods on the main stream evaluation data set, uses ViT-B pre-trained on ImageNet-21K as a backbone, reports the classification accuracy on each data set, and the result is shown in a table 1. Experimental results show that the method is obviously superior to the existing method on a plurality of data sets, and comprises two aspects of an accuracy index and a parameter index needing to be learned, and the method has the beneficial advantages of less learnable parameters and high accuracy.

Table 1 results of the trimming method are compared, and a transducer (ViT-B) is used as a backbone

Example 2

The natural image recognition method applying the pretrained model fine adjustment method based on linear transformation of the invention, referring to fig. 1, comprises the following steps:

s1, data collection and pretreatment: collecting related image data of a downstream task, dividing a training set and a verification set for the image data, and carrying out proper preprocessing, wherein the preprocessing comprises the steps of keeping aspect ratio random scaling, random clipping, random horizontal overturning, RGB dithering, label smoothing, average value removal and the like;

S7, obtaining the predictive probability vector identification natural image by applying the step S6. Specifically, a class label set of the natural image is defined, a predictive probability vector is obtained for each input image data through the model deployed in the step S6, an index of the maximum value of the predictive probability vector is calculated, and a corresponding class label is found out from the defined class label set of the natural image according to the index, namely, the class identification of the input image data is completed.

In the step S3, CNN is used as a pre-training model, the number of layers of the model is recorded as m, and the input of the (1+1) th layer of the model isWherein B, C, h and w are the batch size, the number of characteristic channels, the characteristic width and the characteristic height of the input data, respectively; the linear transformation module still comprises two parts, namely characteristic scaling/>And feature translation/>; The forward propagation process of the network after the linear transformation module is added is that

Wherein the method comprises the steps ofFor input image data,/>The network architecture is shown in fig. 4, and is a CNN module and is composed of a convolution layer, a batch standardization layer, a nonlinear activation function and residual error connection;

In the step S4, when the image classification is used as the downstream task, the model training loss on the downstream task is that Wherein/>Respectively representing the prediction probability and the category label of the model to the ith input sample;

in order to embody the remarkable advantages of the method, the invention reports the classification accuracy on each data set on the main stream evaluating data set comprising two main stream natural image data sets of CIFAR-100 and ImageNet-1K, compared with a plurality of existing methods, and the method adopts ConvNeXt-B pre-training on ImageNet-21K as a backbone, and reports the classification accuracy on each data set, wherein the results are shown in Table 2. Experimental results show that the method of the invention has the beneficial advantages of less leachable parameters and high accuracy rate for CNN architecture.

Table 2 results of the trimming method are compared, CNN (ConvNeXt-B) is used as backbone

Example 3

The outdoor natural scene target detection and segmentation method based on the pre-training model fine tuning method based on linear transformation comprises the following steps:

S6, model deployment: and deploying the model with the heavy parameters on terminal equipment, and inputting the received new data into the trained model by the terminal equipment to obtain a predictive probability vector.

S7, the predictive probability vector is obtained in the step S6 to detect and divide the outdoor natural scene target. Specifically, a class label set for detecting and dividing an outdoor natural scene target is defined, a prediction probability vector and a boundary box of an image area are obtained for each input image data through a model deployed in the step S6, an index of the maximum value of the prediction probability vector is calculated, a corresponding class label is found out from the defined class label set of the outdoor natural scene according to the index, and the boundary of the image area is drawn on the input image according to the predicted boundary box, so that the target detection and division of the input image data are completed.

Wherein the method comprises the steps ofFor input image data,/>The transducer module consists of a multi-head self-attention, a feedforward network, a multi-layer sensor and residual error connection;

In the step S4, the target detection is used as a downstream task, the Mask R-CNN is used as a backbone, and the loss of training of the model on the downstream task is that WhereinRegression loss for the frame;

in the step S4, when semantic segmentation is used as the downstream task, UPerNet is used as the backbone, and the model training loss on the downstream task is that Wherein/>Respectively representing the prediction probability and the category label of the model to the jth pixel point of the ith input sample;

To demonstrate the significant advantages of the method of the present invention, the present invention compares the co and ADE20K on the mainstream evaluation dataset with the various existing methods, reporting the evaluation results on each dataset, as shown in table 3. Experimental results show that the method provided by the invention still has the beneficial advantages of less leachable parameters and high accuracy in target detection and semantic segmentation.

Table 3 results comparison, object detection and semantic segmentation of the fine tuning method

Example 4

Referring to fig. 6, the present embodiment relates to a pretrained model fine tuning system based on linear transformation, including:

And the data collection and preprocessing module is used for: the method comprises the steps that the server side is used for collecting image data related to an identification task, dividing the image data into a training set and a verification set, and carrying out proper preprocessing;

preparing a pre-training model module: selecting a main stream pre-training model, freezing a backbone part, modifying the output dimension of a task head of the pre-training model, and randomly initializing parameters of a modified head part;

And a linear transformation module: the linear transformation module is used for inserting additional linear transformation modules between two adjacent layers of the pre-training model;

model heavy parameter module: the parameters of the linear transformation module are fused into backbone parameters of adjacent layers by utilizing a heavy parameter technology;

Example 5

The present embodiment relates to a computer readable storage medium storing a computer program operable to perform a pre-training model tuning method based on linear transformation as provided in fig. 1 above.

Example 6

The present embodiment relates to a pre-training model fine tuning device based on linear transformation, which includes a memory and one or more processors, wherein executable codes are stored in the memory, and the one or more processors are used for implementing the pre-training model fine tuning method based on linear transformation of embodiment 1 when executing the executable codes.

At the hardware level, as shown in fig. 5, the pretrained model fine tuning device based on linear transformation includes a processor, an internal bus, a network interface, a memory and a nonvolatile memory, and may of course include hardware required by other services. The processor reads the corresponding computer program from the non-volatile memory into the memory and then runs to implement the method of data acquisition described above with respect to fig. 1. Of course, other implementations, such as logic devices or combinations of hardware and software, are not excluded from the present invention, that is, the execution subject of the following processing flows is not limited to each logic unit, but may be hardware or logic devices.

The foregoing is merely exemplary of the present invention and is not intended to limit the present invention. Various modifications and variations of the present invention will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the invention are to be included in the scope of the claims of the present invention.

Claims

1. A pretraining model fine tuning method based on linear transformation is characterized by comprising the following steps:

s1, data collection and pretreatment: collecting related image data of a downstream task, dividing a training set and a verification set for the image data, and preprocessing, wherein the preprocessing comprises aspect ratio random scaling, random clipping, random horizontal overturning, RGB dithering, label smoothing and average value removal;

S2, preparing a pre-training model: selecting a main stream neural network model pre-trained on an ImageNet-1K or 21K data set, freezing a backbone part backbone of the pre-trained model, namely, not updating corresponding network parameters, modifying the output dimension of a task head of the pre-trained model according to the category number of downstream tasks, and randomly initializing the parameters of the modified head part;

S3, introducing linear transformation: inserting additional linear transformation modules between two adjacent layers of the pre-training model, wherein the dimension of the linear transformation modules and the output characteristics of the adjacent layers meet the matrix multiplication relation, and scaling and translating the characteristics output by the previous layer so that the pre-training model can adapt to downstream data;

When a transducer is used as a pre-training model, the number of layers of the model is recorded as m, model number The layer input is/>B, L, D are the batch size of the input data, the length of the input sequence, and the dimension of the input sequence, respectively; the linear transformation module comprises two parts, namely, characteristic scaling/>And feature translation/>; The forward propagation process of the network after the linear transformation module is added is as follows: /(I)Wherein/>For input image data,/>The transducer module consists of a multi-head self-attention, a feedforward network, a multi-layer sensor and residual error connection;

when CNN is used as a pre-training model, the number of layers of the model is still recorded as m, model number The layer input is/>Wherein B, C, h and w are the batch size, the number of characteristic channels, the characteristic width and the characteristic height of the input data, respectively; the linear transformation module still comprises two parts, namely characteristic scaling/>And feature translation/>; The forward propagation process of the network after the linear transformation module is added is as follows: /(I)Wherein/>For input image data,/>The system is a CNN module and consists of a convolution layer, a batch standardization layer, a nonlinear activation function and residual error connection;

S5, model weight parameters: selecting the model with the best performance on the verification set, storing the model weight, and updating the parameters, namely merging the parameters of the linear transformation module introduced in the step S3 into the backup parameters of the pre-training model;

2. The method according to claim 1, wherein in the step S4, when the image classification is used as the downstream task, the model training loss on the downstream task is thatWherein/>，/>Respectively represent model pair/>Predictive probability and class labels for individual input samples.

3. The method according to claim 1, wherein in the step S4, when the target detection is used as the downstream task, the model training loss on the downstream task isWherein/>The loss is a frame regression.

4. The method for fine tuning a pre-training model based on linear transformation according to claim 1, wherein in the step S4, when semantic segmentation is used as the downstream task, the model training loss on the downstream task isWherein/>，/>Respectively represent model pair/>The prediction probability and class label of the j-th pixel point of the input sample.

5. The method for fine tuning a pre-training model based on linear transformation according to claim 1, wherein in step S5, the parameters of the linear transformation module are determined by a heavy parameter techniqueAnd/>Is integrated into the weight of the adjacent layer, and the calculation formula is as follows: /(I)Wherein/>Only the weights of the model need be changed for different downstream tasks without having to re-implement the model deployment.

6. A pre-training model fine tuning system based on linear transformation, comprising:

Preparing a pre-training model module: selecting a main stream neural network model pre-trained on an ImageNet-1K or 21K data set, freezing a backbone part backbone of the pre-trained model, namely, not updating corresponding network parameters, modifying the output dimension of a task head of the pre-trained model according to the category number of downstream tasks, and randomly initializing the parameters of the modified head part;

When a transducer is used as a pre-training model, the number of layers of the model is recorded as m, model number The layer is input asB, L, D are the batch size of the input data, the length of the input sequence, and the dimension of the input sequence, respectively; the linear transformation module comprises two parts, namely, characteristic scaling/>And feature translation/>; The forward propagation process of the network after the linear transformation module is added is as follows: /(I)Wherein/>For input image data,/>The transducer module consists of a multi-head self-attention, a feedforward network, a multi-layer sensor and residual error connection;

7. A pre-training model tuning device based on linear transformation, comprising a memory and one or more processors, the memory having executable code stored therein, the one or more processors being configured to implement a pre-training model tuning method based on linear transformation as claimed in any one of claims 1-5 when the executable code is executed.

8. A computer readable storage medium, having stored thereon a program which, when executed by a processor, implements a pre-training model tuning method based on a linear transformation as claimed in any one of claims 1-5.