CN117574982B - Pre-training model fine tuning method and device based on linear transformation - Google Patents

Pre-training model fine tuning method and device based on linear transformation

Info

Publication number
CN117574982B
CN117574982B CN202410060305.5A
Authority
CN
China
Prior art keywords
model
linear transformation
training
parameters
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410060305.5A
Other languages
Chinese (zh)
Other versions
CN117574982A (en)
Inventor
王玉柱
段曼妮
王永恒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202410060305.5A priority Critical patent/CN117574982B/en
Publication of CN117574982A publication Critical patent/CN117574982A/en
Application granted granted Critical
Publication of CN117574982B publication Critical patent/CN117574982B/en

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/0499 Feedforward networks
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 … using classification, e.g. of video objects
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/82 … using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

A pre-training model fine-tuning method and device based on linear transformation. The method comprises: collecting image data related to a downstream task, dividing the image data into a training set and a validation set, and performing suitable preprocessing; selecting a suitable pre-trained model, modifying the task head of the model to fit the downstream task, and freezing the backbone of the pre-trained model; inserting a linear transformation module between adjacent layers, the module being used to scale and translate features; fine-tuning the pre-trained model with the data of the downstream task, and saving the model weights that perform best on the validation set; and merging the parameters of the linear transformation modules into the adjacent layers by a re-parameterization technique, finally deploying the model to complete the downstream task. The invention introduces few parameters to be learned, achieves high accuracy on a variety of downstream tasks, and, by re-parameterizing the introduced parameters into the backbone layers at the inference stage, greatly simplifies model deployment.

Description

Pre-training model fine tuning method and device based on linear transformation
Technical Field
The invention relates to the field of fine tuning of a pre-training model, in particular to a method and a device for fine tuning of a pre-training model based on linear transformation.
Background
In the field of artificial intelligence, data scale and model scale have grown explosively, and, aided by large compute and advanced algorithms, nearly a hundred high-performing foundation models have been developed. The current research and application paradigm has shifted to "pre-training + fine-tuning" to address a wide variety of downstream tasks.
Initially, to address a given downstream task, the practice of researchers and engineers was full-parameter fine-tuning (full fine-tuning) of the pre-trained model, i.e., updating all parameters of the pre-trained model with the data of the downstream task. Full-parameter fine-tuning can achieve good performance indicators such as accuracy, but the fine-tuned model cannot be reused for other downstream tasks, and when the downstream data are scarce, full-parameter fine-tuning causes overfitting. Moreover, because a complete copy of the model must be kept per task, full-parameter fine-tuning often requires huge storage space in practice. Later, researchers proposed the more convenient linear probe method: for different downstream tasks, the feature-extraction backbone is frozen and only the parameters of the task-specific head network are adjusted, so that multiple downstream tasks share one backbone and storage cost is saved. However, the accuracy of linear probing drops significantly relative to full-parameter fine-tuning.
Currently, researchers are keen to develop parameter-efficient fine-tuning strategies, hoping to achieve accuracy similar to full-parameter fine-tuning while adjusting only a very small number of parameters so as to keep model deployment convenient. Methods in this line, such as Adapter, Bias, and VPT, have received much attention. Taking VPT as an example: for each downstream task, the method introduces N additional learnable tokens into the input sequence, prompting the pre-trained model to better recognize the input image.
However, although existing parameter-efficient fine-tuning methods achieve good accuracy, they still introduce additional task-specific parameters during both the training and inference phases, as in the VPT example above. Although few in number, these parameters cause certain complications when the model is deployed.
Disclosure of Invention
The invention aims to overcome the defects of the prior art, and provides a pre-training model fine-tuning method and device based on linear transformation, obtained by deeply analyzing the difference between the feature distributions of the pre-training data and the downstream-task data, so as to solve the problems of the prior art: many parameters to be learned, relatively low accuracy, and task-specific parameters that cannot be removed when the model is deployed.
The invention freezes the backbone of the pre-trained model and adds a linear transformation operation after each network layer; the purpose of this operation is to align the feature distribution of the downstream-task data with that of the pre-training data. During fine-tuning on the downstream task, only the (few) parameters of the linear transformations and the parameters of the network head are tuned. After fine-tuning, the invention fuses the parameters of the linear transformations into the backbone parameters through a re-parameterization technique, so that no extra parameters remain at the inference stage, which facilitates model deployment.
In order to achieve the above object, the present invention provides the following technical solutions:
The first aspect of the invention relates to a pre-training model fine-tuning method based on linear transformation, comprising the following steps:
S1, data collection and preprocessing: collecting image data related to a downstream task, dividing the image data into a training set and a validation set, and performing suitable preprocessing, including aspect-ratio-preserving random scaling, random cropping, random horizontal flipping, RGB jittering, label smoothing, and mean removal;
S2, preparing a pre-trained model: selecting a mainstream neural network model pre-trained on the ImageNet-1K or ImageNet-21K dataset, freezing the backbone of the pre-trained model (i.e., not updating the corresponding network parameters), modifying the output dimension of the task head of the pre-trained model according to the number of classes of the downstream task, and randomly initializing the parameters of the modified head;
S3, introducing linear transformations: the invention observes that the feature distributions of the pre-training data and the downstream data differ in scale and direction, so the purpose of the linear transformation module is to scale and translate the features output by the previous layer so that the pre-trained model can adapt to the downstream data;
S4, model fine-tuning: training the parameters of the head from step S2 and the parameters of the linear transformation modules from step S3 with the data of the downstream task;
S5, model re-parameterization: selecting the model that performs best on the validation set, saving its weights, and merging the parameters of the linear transformation modules introduced in step S3 into the backbone parameters of the pre-trained model using a re-parameterization technique;
S6, model deployment: deploying the re-parameterized model on a terminal device; the terminal device receives new data and feeds it into the trained model to obtain a predicted probability vector, thereby completing the downstream task.
Further, in step S3, when a Transformer is used as the pre-trained model, the number of layers of the model is denoted $m$, and the input of layer $l+1$ of the model is $x_l \in \mathbb{R}^{B \times L \times D}$, where $B$, $L$, $D$ are the batch size of the input data, the length of the input sequence, and the dimension of the input sequence, respectively. The linear transformation module comprises two parts: a feature scaling $\gamma_l \in \mathbb{R}^{D}$ and a feature translation $\beta_l \in \mathbb{R}^{D}$. The forward propagation of the network after the linear transformation module is added is:
$$x_{l+1} = \gamma_l \odot \mathrm{Block}_l(x_l) + \beta_l, \quad l = 0, 1, \dots, m-1,$$
where $x_0$ is the input image data, $\odot$ denotes element-wise multiplication broadcast over the batch and sequence dimensions, and $\mathrm{Block}_l$ is the $l$-th Transformer module, consisting of multi-head self-attention, a feed-forward multi-layer perceptron, and residual connections.
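The per-layer scaling and translation above can be sketched in a few lines of NumPy (a minimal illustration; the function and variable names are not from the patent):

```python
import numpy as np

def scale_shift(features, gamma, beta):
    """Apply the linear transformation module to Transformer features.

    features: (B, L, D) output of the previous Transformer block
    gamma, beta: (D,) learnable scaling and translation vectors
    Broadcasting applies the same per-dimension scale and shift to
    every token of every sample.
    """
    return gamma * features + beta

# Toy example: batch of 2 sequences, 3 tokens, 4 feature dimensions.
x = np.ones((2, 3, 4))
gamma = np.array([1.0, 2.0, 3.0, 4.0])
beta = np.array([0.5, 0.5, 0.5, 0.5])
y = scale_shift(x, gamma, beta)  # shape stays (2, 3, 4)
```

Only `gamma` and `beta` (2D parameters per layer) are trained, which is where the very small learnable-parameter count comes from.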
Further, in step S3, when a CNN is used as the pre-trained model, the number of layers is again denoted $m$, and the input of layer $l+1$ of the model is $x_l \in \mathbb{R}^{B \times C \times h \times w}$, where $B$, $C$, $h$, $w$ are the batch size, the number of feature channels, the feature height, and the feature width of the input data, respectively. The linear transformation module again comprises two parts: a feature scaling $\gamma_l \in \mathbb{R}^{C}$ and a feature translation $\beta_l \in \mathbb{R}^{C}$. The forward propagation of the network after the linear transformation module is added is:
$$x_{l+1} = \gamma_l \odot \mathrm{Block}_l(x_l) + \beta_l, \quad l = 0, 1, \dots, m-1,$$
where $x_0$ is the input image data, $\odot$ multiplies each channel by its scale (broadcast over the spatial dimensions), and $\mathrm{Block}_l$ is the $l$-th CNN module, consisting of convolutional layers, batch normalization, a nonlinear activation function, and residual connections.
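The CNN variant differs only in how the per-channel vectors broadcast: they must be reshaped to align with the channel axis of a (B, C, h, w) feature map. A minimal NumPy sketch (names are illustrative):

```python
import numpy as np

def scale_shift_cnn(features, gamma, beta):
    """Per-channel linear transformation for CNN feature maps.

    features: (B, C, h, w) output of the previous CNN block
    gamma, beta: (C,) per-channel scale and shift, broadcast over
    the spatial dimensions h and w.
    """
    return gamma[None, :, None, None] * features + beta[None, :, None, None]

x = np.ones((2, 3, 5, 5))            # B=2, C=3, 5x5 feature maps
gamma = np.array([2.0, 3.0, 4.0])
beta = np.array([1.0, 1.0, 1.0])
y = scale_shift_cnn(x, gamma, beta)  # shape stays (2, 3, 5, 5)
```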
Further, in step S4, when image classification is the downstream task, the training loss of the model on the downstream task is $L = -\frac{1}{N}\sum_{i=1}^{N} y_i \log p_i$, where $p_i$ and $y_i$ respectively denote the predicted probability and the class label of the model for the $i$-th input sample, and $N$ is the number of samples.
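This classification loss is the standard average cross-entropy; a small NumPy sketch (illustrative names, integer labels in place of one-hot vectors):

```python
import numpy as np

def cross_entropy_loss(probs, labels):
    """Average cross-entropy over N samples.

    probs:  (N, K) predicted class probabilities (rows sum to 1)
    labels: (N,)   integer class labels
    Returns -1/N * sum_i log p_i[y_i].
    """
    n = probs.shape[0]
    return -np.mean(np.log(probs[np.arange(n), labels]))

probs = np.array([[0.7, 0.2, 0.1],
                  [0.1, 0.8, 0.1]])
labels = np.array([0, 1])
loss = cross_entropy_loss(probs, labels)  # -(ln 0.7 + ln 0.8) / 2
```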
Further, in step S4, when object detection is the downstream task, the training loss of the model on the downstream task is $L = L_{cls} + L_{reg}$, where $L_{cls}$ is the classification loss and $L_{reg}$ is the bounding-box regression loss.
Further, in step S4, when semantic segmentation is the downstream task, the training loss of the model on the downstream task is $L = -\frac{1}{N}\sum_{i=1}^{N}\frac{1}{M}\sum_{j=1}^{M} y_{ij}\log p_{ij}$, where $p_{ij}$ and $y_{ij}$ respectively denote the predicted probability and the class label of the model for the $j$-th pixel of the $i$-th input sample, and $M$ is the number of pixels.
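The segmentation loss is the same cross-entropy averaged once more over pixels; a minimal NumPy sketch (illustrative names, integer labels in place of one-hot vectors):

```python
import numpy as np

def segmentation_loss(probs, labels):
    """Pixel-wise cross-entropy averaged over samples and pixels.

    probs:  (N, M, K) predicted class probabilities per pixel
    labels: (N, M)    integer class label per pixel
    Inner average over the M pixels, outer average over N samples.
    """
    n, m, _ = probs.shape
    picked = probs[np.arange(n)[:, None], np.arange(m)[None, :], labels]
    return -np.mean(np.log(picked))

probs = np.array([[[0.9, 0.1],
                   [0.2, 0.8]]])   # N=1 sample, M=2 pixels, K=2 classes
labels = np.array([[0, 1]])
loss = segmentation_loss(probs, labels)  # -(ln 0.9 + ln 0.8) / 2
```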
Further, in step S5, the parameters $\gamma_l$ and $\beta_l$ of the linear transformation module are merged into the adjacent layer weights using the re-parameterization technique:
$$W_l' = \gamma_l \odot W_l, \qquad b_l' = \gamma_l \odot b_l + \beta_l,$$
where $W_l$ and $b_l$ are the weight and bias of the adjacent layer, since $\gamma_l \odot (W_l x + b_l) + \beta_l = (\gamma_l \odot W_l)x + (\gamma_l \odot b_l + \beta_l)$. Based on this technique, only the weights of the model need to be changed for the downstream task, and model deployment does not need to be implemented again, which greatly facilitates the engineering roll-out of the pre-trained model.
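The merge can be checked numerically: applying the scale/shift after a linear layer must give exactly the same output as the single merged layer. A minimal NumPy sketch (illustrative names):

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4
W = rng.standard_normal((D, D))   # weight of the adjacent linear layer
b = rng.standard_normal(D)        # its bias
gamma = rng.standard_normal(D)    # learned feature scaling
beta = rng.standard_normal(D)     # learned feature translation
x = rng.standard_normal(D)

# Training-time path: linear layer followed by the scale/shift module.
y_train = gamma * (W @ x + b) + beta

# Merged weights: W' = gamma applied row-wise to W, b' = gamma * b + beta.
W_merged = gamma[:, None] * W
b_merged = gamma * b + beta

# Inference-time path: one plain linear layer, no extra parameters.
y_infer = W_merged @ x + b_merged
```

After the merge the deployed network has exactly the original architecture and parameter count, which is the basis of the "no extra parameters at inference" claim.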
A second aspect of the invention comprises a pre-training model fine-tuning system based on linear transformation, comprising:
A data collection and preprocessing module: for collecting image data related to a downstream task, dividing the image data into a training set and a validation set, and preprocessing, the preprocessing including aspect-ratio-preserving random scaling, random cropping, random horizontal flipping, RGB jittering, label smoothing, and mean removal;
A pre-trained model preparation module: for selecting a mainstream neural network model pre-trained on the ImageNet-1K or ImageNet-21K dataset, freezing the backbone of the pre-trained model (i.e., not updating the corresponding network parameters), modifying the output dimension of the task head according to the number of classes of the downstream task, and randomly initializing the parameters of the modified head;
A linear transformation module: for inserting an additional linear transformation between two adjacent layers of the pre-trained model, the dimensions of the linear transformation and the output features of the adjacent layers satisfying the element-wise multiplication relation, so as to scale and translate the features output by the previous layer and let the pre-trained model adapt to the downstream data;
A model fine-tuning module: for training the parameters of the task head and the parameters of the linear transformation modules with the data of the downstream task;
A model re-parameterization module: for selecting the model that performs best on the validation set, saving its weights, and merging the parameters of the introduced linear transformations into the backbone parameters of the pre-trained model using the re-parameterization technique;
A model deployment module: for deploying the re-parameterized model on a terminal device; the terminal device receives new data and feeds it into the trained model to complete the downstream task.
A third aspect of the invention comprises a linear-transformation-based pre-training model fine-tuning apparatus, comprising a memory and one or more processors, the memory storing executable code, and the one or more processors, when executing the executable code, implementing the linear-transformation-based pre-training model fine-tuning method of the invention.
A fourth aspect of the invention comprises a computer-readable storage medium on which a program is stored which, when executed by a processor, implements the linear-transformation-based pre-training model fine-tuning method of the invention.
The technical solution of the invention can be applied to various types of neural network models and is not limited to the model types enumerated above; likewise, it can be used for various tasks and is not limited to the task types enumerated above.
The innovation of the invention is as follows: by adding linear transformation modules, efficient fine-tuning of the pre-trained model is achieved with a very small number of parameters to be learned (only about one thousandth of the parameters of the pre-trained model), and no additional parameters are needed at the inference stage (only original pre-trained parameters are updated; no new parameters are introduced), which greatly facilitates model deployment.
The beneficial effects of the invention are as follows:
① In the model fine-tuning stage, only a few parameters to be learned are introduced, achieving higher accuracy than prior-art methods such as full fine-tuning, Adapter, Bias, and VPT;
② In the inference stage, the parameters introduced during training are fused into the backbone of the pre-trained model through the re-parameterization technique, so that the same pre-trained model can serve multiple downstream tasks simultaneously, solving the problem of huge model-storage cost;
③ The method can be used with various mainstream network architectures, including convolutional neural networks and Vision Transformers, and with various pre-training schemes, including supervised, weakly supervised, and self-supervised pre-training.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort to a person skilled in the art.
FIG. 1 is a schematic flow chart of a pre-training model fine tuning method based on linear transformation according to an embodiment of the present invention;
FIG. 2 is a feature distribution diagram of a pre-training model fine tuning method based on linear transformation according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a linear transformation module of a pre-training model fine tuning method based on linear transformation according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a linear transformation module of a pre-training model tuning method based on linear transformation according to another embodiment of the present invention;
Fig. 5 is a block diagram of a pretraining model fine tuning device based on linear transformation according to an embodiment of the present invention.
FIG. 6 is a block diagram of a pre-training model fine tuning system based on linear transformation according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The features of the following examples and embodiments may be combined with each other without any conflict.
Example 1
Taking urban road target recognition as an example, where target categories such as pedestrians, bicycles, electric vehicles, trucks, buses, and cars need to be recognized, and referring to FIG. 1, the pedestrian-and-vehicle recognition method applying the linear-transformation-based pre-training model fine-tuning method of the invention comprises the following steps:
S1, data collection and preprocessing: using image acquisition devices such as road-side cameras to collect image data related to the road target recognition task, dividing the image data into a training set and a validation set, and performing suitable preprocessing, including aspect-ratio-preserving random scaling, random cropping, random horizontal flipping, RGB jittering, label smoothing, and mean removal;
S2, preparing a pre-trained model: selecting a mainstream neural network model pre-trained on the ImageNet-21K dataset, freezing the backbone of the pre-trained model (i.e., not updating the corresponding network parameters), modifying the output dimension of the task head according to the number of classes of the downstream task, and randomly initializing the parameters of the modified head;
S3, introducing linear transformations: the invention observes that the feature distributions of the pre-training data and the downstream data differ in scale and direction, so the purpose of the linear transformation module is to scale and translate the features output by the previous layer so that the pre-trained model can adapt to the downstream data; see FIG. 2 for the feature distributions of the pre-training data and the downstream-task data;
S4, model fine-tuning: training the parameters of the head from S2 and the parameters of the linear transformation modules from S3 with the data of the downstream task;
S5, model re-parameterization: selecting the model that performs best on the validation set, saving its weights, and merging the parameters of the linear transformation modules introduced in step S3 into the backbone parameters of the pre-trained model using the re-parameterization technique;
S6, model deployment: deploying the re-parameterized model on a terminal device; the terminal device receives new data and feeds it into the trained model to obtain a predicted probability vector, thereby completing recognition of road targets.
S7, the predicted probability vector obtained in step S6 is used to recognize pedestrians and vehicles. Specifically, a class-label set for pedestrians and vehicles is defined; for each input image, the model deployed in step S6 produces a predicted probability vector, the index of the maximum of this vector is computed, and the corresponding class label is looked up in the defined label set, completing the class recognition of the input image.
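Step S7 amounts to an argmax over the predicted probability vector followed by a label lookup; a minimal sketch (this particular label set is illustrative, drawn from the categories named in Example 1):

```python
import numpy as np

# Hypothetical label set for the road-target recognition task of Example 1.
LABELS = ["pedestrian", "bicycle", "electric vehicle", "truck", "bus", "car"]

def predict_label(prob_vector):
    """Return the class label whose predicted probability is largest."""
    return LABELS[int(np.argmax(prob_vector))]

probs = np.array([0.05, 0.10, 0.05, 0.02, 0.70, 0.08])
label = predict_label(probs)  # -> "bus"
```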
In step S3, a Transformer is used as the pre-trained model; the number of layers is denoted $m$, and the input of layer $l+1$ of the model is $x_l \in \mathbb{R}^{B \times L \times D}$, where $B$, $L$, $D$ are the batch size of the input data, the length of the input sequence, and the dimension of the input sequence, respectively. The linear transformation module comprises two parts: a feature scaling $\gamma_l \in \mathbb{R}^{D}$ and a feature translation $\beta_l \in \mathbb{R}^{D}$. The forward propagation of the network after the linear transformation module is added is
$$x_{l+1} = \gamma_l \odot \mathrm{Block}_l(x_l) + \beta_l, \quad l = 0, 1, \dots, m-1,$$
where $x_0$ is the input image data and $\mathrm{Block}_l$ is the $l$-th Transformer module, consisting of multi-head self-attention, a feed-forward multi-layer perceptron, and residual connections; the network architecture is shown in FIG. 3;
In step S4, image classification is the downstream task, and the training loss of the model on the downstream task is $L = -\frac{1}{N}\sum_{i=1}^{N} y_i \log p_i$, where $p_i$ and $y_i$ respectively denote the predicted probability and the class label of the model for the $i$-th input sample;
In step S5, the parameters $\gamma_l$ and $\beta_l$ of the linear transformation module are merged into the adjacent layer weights using the re-parameterization technique:
$$W_l' = \gamma_l \odot W_l, \qquad b_l' = \gamma_l \odot b_l + \beta_l,$$
where $W_l$ and $b_l$ are the weight and bias of the adjacent layer. Based on this technique, only the weights of the model need to be changed for the downstream task, and model deployment does not need to be implemented again, which greatly facilitates the engineering roll-out of the pre-trained model;
To demonstrate the advantages of the method, the invention compares it against several existing methods on mainstream evaluation datasets, using ViT-B pre-trained on ImageNet-21K as the backbone and reporting classification accuracy on each dataset; the results are shown in Table 1. The experimental results show that the method is clearly superior to existing methods on multiple datasets, in terms of both accuracy and the number of parameters to be learned, i.e., it has the advantages of few learnable parameters and high accuracy.
Table 1. Comparison of fine-tuning methods, using a Transformer (ViT-B) as the backbone
Example 2
The natural-image recognition method applying the linear-transformation-based pre-training model fine-tuning method of the invention, referring to FIG. 1, comprises the following steps:
S1, data collection and preprocessing: collecting image data related to the downstream task, dividing the image data into a training set and a validation set, and performing suitable preprocessing, including aspect-ratio-preserving random scaling, random cropping, random horizontal flipping, RGB jittering, label smoothing, and mean removal;
S2, preparing a pre-trained model: selecting a mainstream neural network model pre-trained on the ImageNet-21K dataset, freezing the backbone of the pre-trained model (i.e., not updating the corresponding network parameters), modifying the output dimension of the task head according to the number of classes of the downstream task, and randomly initializing the parameters of the modified head;
S3, introducing linear transformations: the invention observes that the feature distributions of the pre-training data and the downstream data differ in scale and direction, so the purpose of the linear transformation module is to scale and translate the features output by the previous layer so that the pre-trained model can adapt to the downstream data; see FIG. 2 for the feature distributions of the pre-training data and the downstream-task data;
S4, model fine-tuning: training the parameters of the head from S2 and the parameters of the linear transformation modules from S3 with the data of the downstream task;
S5, model re-parameterization: selecting the model that performs best on the validation set, saving its weights, and merging the parameters of the linear transformation modules introduced in step S3 into the backbone parameters of the pre-trained model using the re-parameterization technique;
S6, model deployment: deploying the re-parameterized model on a terminal device; the terminal device receives new data and feeds it into the trained model to obtain a predicted probability vector, thereby completing the downstream task.
S7, the predicted probability vector obtained in step S6 is used to recognize natural images. Specifically, a class-label set for natural images is defined; for each input image, the model deployed in step S6 produces a predicted probability vector, the index of its maximum is computed, and the corresponding class label is looked up in the defined label set, completing the class recognition of the input image.
In step S3, a CNN is used as the pre-trained model; the number of layers is again denoted $m$, and the input of layer $l+1$ of the model is $x_l \in \mathbb{R}^{B \times C \times h \times w}$, where $B$, $C$, $h$, $w$ are the batch size, the number of feature channels, the feature height, and the feature width of the input data, respectively. The linear transformation module again comprises two parts: a feature scaling $\gamma_l \in \mathbb{R}^{C}$ and a feature translation $\beta_l \in \mathbb{R}^{C}$. The forward propagation of the network after the linear transformation module is added is
$$x_{l+1} = \gamma_l \odot \mathrm{Block}_l(x_l) + \beta_l, \quad l = 0, 1, \dots, m-1,$$
where $x_0$ is the input image data and $\mathrm{Block}_l$ is the $l$-th CNN module, consisting of convolutional layers, batch normalization, a nonlinear activation function, and residual connections; the network architecture is shown in FIG. 4;
In step S4, when image classification is the downstream task, the training loss of the model on the downstream task is $L = -\frac{1}{N}\sum_{i=1}^{N} y_i \log p_i$, where $p_i$ and $y_i$ respectively denote the predicted probability and the class label of the model for the $i$-th input sample;
In step S5, the parameters $\gamma_l$ and $\beta_l$ of the linear transformation module are merged into the adjacent layer weights using the re-parameterization technique:
$$W_l' = \gamma_l \odot W_l, \qquad b_l' = \gamma_l \odot b_l + \beta_l,$$
where $W_l$ and $b_l$ are the weight and bias of the adjacent layer. Based on this technique, only the weights of the model need to be changed for the downstream task, and model deployment does not need to be implemented again, which greatly facilitates the engineering roll-out of the pre-trained model;
To demonstrate the significant advantages of the method, the present invention compares it with various existing methods on the mainstream evaluation datasets, comprising the two mainstream natural image datasets CIFAR-100 and ImageNet-1K; the method adopts ConvNeXt-B pre-trained on ImageNet-21K as the backbone, and the classification accuracy on each dataset is reported, with the results shown in Table 2. The experimental results show that, for the CNN architecture, the method of the invention likewise has the beneficial advantages of few learnable parameters and high accuracy.
Table 2 Comparison of results of the fine-tuning methods, with CNN (ConvNeXt-B) as the backbone
Example 3
The outdoor natural scene target detection and segmentation method based on the pre-training model fine tuning method based on linear transformation comprises the following steps:
S1, data collection and preprocessing: collecting image data related to the downstream task, dividing the image data into a training set and a validation set, and applying appropriate preprocessing, wherein the preprocessing includes aspect-ratio-preserving random scaling, random cropping, random horizontal flipping, RGB jittering, label smoothing, mean removal, etc.;
S2, preparing a pre-training model: selecting a mainstream neural network model pre-trained on the ImageNet-21K dataset, freezing the backbone part (backbone) of the pre-training model, i.e., not updating the corresponding network parameters, modifying the output dimension of the task head (head) of the pre-training model according to the number of classes of the downstream task, and randomly initializing the parameters of the modified head part;
S3, introducing linear transformation: the invention observes that the feature distribution of the pre-training data and that of the downstream data differ in scale and direction, so the purpose of the linear transformation module is to scale and translate the features output by the previous layer so that the pre-training model can adapt to the downstream data; for the feature distributions of the pre-training data and the downstream task data, please refer to fig. 2;
S4, model fine-tuning: training the parameters of the head part in S2 and the parameters of the linear transformation module in S3 using the data of the downstream task;
S5, model re-parameterization: selecting the model with the best performance on the validation set, saving the model weights, and merging the parameters of the linear transformation module introduced in step S3 into the backbone parameters of the pre-training model using the re-parameterization technique;
S6, model deployment: deploying the re-parameterized model on the terminal device; the terminal device inputs received new data into the trained model to obtain a predictive probability vector.
S7, using the predictive probability vector obtained in step S6 to detect and segment outdoor natural scene targets. Specifically, a class label set for outdoor natural scene target detection and segmentation is defined; for each input image, the model deployed in step S6 produces a predictive probability vector and a bounding box for each image region, the index of the maximum value of the predictive probability vector is computed, the corresponding class label is looked up in the defined outdoor natural scene class label set according to that index, and the region boundary is drawn on the input image according to the predicted bounding box, thereby completing target detection and segmentation of the input image data.
In the step S3, a Transformer is used as the pre-training model, the number of layers of the model is denoted $m$, and the input of the $(l+1)$-th layer of the model is $x_l \in \mathbb{R}^{B \times L \times D}$, wherein $B$, $L$, $D$ are respectively the batch size of the input data, the length of the input sequence and the dimension of the input sequence; the linear transformation module comprises two parts, namely a feature scaling $\gamma_l \in \mathbb{R}^{D}$ and a feature translation $\beta_l \in \mathbb{R}^{D}$; the forward propagation process of the network after the linear transformation module is added is $x_{l+1} = f_{l+1}(\gamma_l \odot x_l + \beta_l)$, $l = 0, 1, \ldots, m-1$,
wherein $x_0$ is the input image data and $f_{l+1}(\cdot)$ is a Transformer module composed of multi-head self-attention, a feed-forward network, a multi-layer perceptron and residual connections;
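For Transformer features of shape (B, L, D), the scaling and translation act on the last dimension and broadcast over the batch and sequence dimensions. A minimal NumPy sketch with illustrative shapes:

```python
import numpy as np

rng = np.random.default_rng(2)
B, L, D = 2, 5, 8                    # batch size, sequence length, feature dimension
x = rng.standard_normal((B, L, D))   # output of layer l

gamma = rng.standard_normal(D)       # feature scaling, one factor per feature dimension
beta = rng.standard_normal(D)        # feature translation, one offset per feature dimension

# The transformation broadcasts over the batch and sequence dimensions.
y = gamma * x + beta
```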
In the step S4, when object detection is used as the downstream task, Mask R-CNN is used as the detection framework, and the model training loss on the downstream task is $\mathcal{L} = \mathcal{L}_{cls} + \mathcal{L}_{box} + \mathcal{L}_{mask}$, wherein $\mathcal{L}_{box}$ is the bounding-box regression loss;
in the step S4, when semantic segmentation is used as the downstream task, UPerNet is used as the segmentation framework, and the model training loss on the downstream task is $\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\frac{1}{P}\sum_{j=1}^{P} y_{ij} \log p_{ij}$, wherein $p_{ij}$, $y_{ij}$ respectively represent the model's predicted probability and the class label for the $j$-th pixel of the $i$-th input sample;
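The segmentation loss above averages cross-entropy over samples and pixels. The following is a minimal NumPy sketch; the function name and toy values are illustrative:

```python
import numpy as np

def pixel_cross_entropy(probs, labels):
    """Per-pixel cross-entropy averaged over samples and pixels.

    probs:  (N, P, K) predicted class probabilities per pixel
    labels: (N, P)    integer class labels per pixel
    """
    N, P, _ = probs.shape
    # picked[i, j] = probs[i, j, labels[i, j]]
    picked = probs[np.arange(N)[:, None], np.arange(P)[None, :], labels]
    return float(-np.mean(np.log(picked)))

# One image, two pixels, two classes; correct classes get 0.9 and 0.6.
loss = pixel_cross_entropy(np.array([[[0.9, 0.1], [0.4, 0.6]]]),
                           np.array([[0, 1]]))
```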
in the step S5, the re-parameterization technique is used to merge the parameters $\gamma_l$ and $\beta_l$ of the linear transformation module into the weights of the adjacent layer: $W'_{l+1} = W_{l+1}\,\mathrm{diag}(\gamma_l)$, $b'_{l+1} = b_{l+1} + W_{l+1}\beta_l$, wherein $W_{l+1}$ and $b_{l+1}$ are the weight and bias of the $(l+1)$-th layer; based on this technique, only the weights of the model need to be changed for the downstream task, without re-implementing the model deployment, which greatly facilitates the engineering deployment of the pre-training model;
To demonstrate the significant advantages of the method of the present invention, the invention compares it with various existing methods on the mainstream evaluation datasets COCO and ADE20K, reporting the evaluation results on each dataset, as shown in Table 3. The experimental results show that the method of the invention still retains the beneficial advantages of few learnable parameters and high accuracy on object detection and semantic segmentation.
Table 3 Comparison of results of the fine-tuning methods on object detection and semantic segmentation
Example 4
Referring to fig. 6, the present embodiment relates to a pretrained model fine tuning system based on linear transformation, including:
Data collection and preprocessing module: deployed on the server side, used for collecting image data related to the identification task, dividing the image data into a training set and a validation set, and applying appropriate preprocessing;
Pre-training model preparation module: used for selecting a mainstream pre-training model, freezing the backbone part, modifying the output dimension of the task head of the pre-training model, and randomly initializing the parameters of the modified head part;
Linear transformation module: used for inserting an additional linear transformation module between two adjacent layers of the pre-training model;
Model fine-tuning module: used for training the parameters of the task head part and the parameters of the linear transformation module with the data of the downstream task;
Model re-parameterization module: used for merging the parameters of the linear transformation module into the backbone parameters of the adjacent layers by the re-parameterization technique;
Model deployment module: used for deploying the re-parameterized model on the terminal device; the terminal device receives new data and inputs it into the trained model to complete the related downstream task.
Example 5
The present embodiment relates to a computer-readable storage medium storing a computer program operable to perform the pre-training model fine-tuning method based on linear transformation provided in fig. 1 above.
Example 6
The present embodiment relates to a pre-training model fine tuning device based on linear transformation, which includes a memory and one or more processors, wherein executable codes are stored in the memory, and the one or more processors are used for implementing the pre-training model fine tuning method based on linear transformation of embodiment 1 when executing the executable codes.
At the hardware level, as shown in fig. 5, the pre-training model fine-tuning device based on linear transformation includes a processor, an internal bus, a network interface, a memory and a nonvolatile memory, and may of course also include hardware required by other services. The processor reads the corresponding computer program from the nonvolatile memory into the memory and then runs it to implement the pre-training model fine-tuning method described above with respect to fig. 1. Of course, the present invention does not exclude other implementations, such as logic devices or a combination of hardware and software; that is, the execution subject of the processing flows is not limited to logic units and may also be hardware or logic devices.
The foregoing is merely exemplary of the present invention and is not intended to limit the present invention. Various modifications and variations of the present invention will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. which come within the spirit and principles of the invention are to be included in the scope of the claims of the present invention.

Claims (8)

1. A pretraining model fine tuning method based on linear transformation is characterized by comprising the following steps:
S1, data collection and preprocessing: collecting image data related to a downstream task, dividing the image data into a training set and a validation set, and performing preprocessing, wherein the preprocessing comprises aspect-ratio-preserving random scaling, random cropping, random horizontal flipping, RGB jittering, label smoothing and mean removal;
S2, preparing a pre-training model: selecting a mainstream neural network model pre-trained on an ImageNet-1K or 21K dataset, freezing the backbone part (backbone) of the pre-trained model, namely not updating the corresponding network parameters, modifying the output dimension of the task head of the pre-trained model according to the number of classes of the downstream task, and randomly initializing the parameters of the modified head part;
S3, introducing linear transformation: inserting additional linear transformation modules between two adjacent layers of the pre-training model, wherein the dimension of the linear transformation modules and the output characteristics of the adjacent layers meet the matrix multiplication relation, and scaling and translating the characteristics output by the previous layer so that the pre-training model can adapt to downstream data;
When a Transformer is used as the pre-training model, the number of layers of the model is denoted $m$, and the input of the $(l+1)$-th layer of the model is $x_l \in \mathbb{R}^{B \times L \times D}$, wherein $B$, $L$, $D$ are respectively the batch size of the input data, the length of the input sequence and the dimension of the input sequence; the linear transformation module comprises two parts, namely a feature scaling $\gamma_l \in \mathbb{R}^{D}$ and a feature translation $\beta_l \in \mathbb{R}^{D}$; the forward propagation process of the network after the linear transformation module is added is: $x_{l+1} = f_{l+1}(\gamma_l \odot x_l + \beta_l)$, wherein $x_0$ is the input image data and $f_{l+1}(\cdot)$ is a Transformer module composed of multi-head self-attention, a feed-forward network, a multi-layer perceptron and residual connections;
when a CNN is used as the pre-training model, the number of layers of the model is still denoted $m$, and the input of the $(l+1)$-th layer of the model is $x_l \in \mathbb{R}^{B \times C \times h \times w}$, wherein $B$, $C$, $h$ and $w$ are respectively the batch size, the number of feature channels, the feature height and the feature width of the input data; the linear transformation module still comprises two parts, namely a feature scaling $\gamma_l \in \mathbb{R}^{C}$ and a feature translation $\beta_l \in \mathbb{R}^{C}$; the forward propagation process of the network after the linear transformation module is added is: $x_{l+1} = f_{l+1}(\gamma_l \odot x_l + \beta_l)$, wherein $x_0$ is the input image data and $f_{l+1}(\cdot)$ is a CNN module composed of convolution layers, batch normalization layers, nonlinear activation functions and residual connections;
S4, model fine-tuning: training the parameters of the head part in the step S2 and the parameters of the linear transformation module in the step S3 using the data of the downstream task;
S5, model re-parameterization: selecting the model with the best performance on the validation set, saving the model weights, and performing the parameter update, namely merging the parameters of the linear transformation module introduced in the step S3 into the backbone parameters of the pre-training model;
S6, model deployment: deploying the re-parameterized model on the terminal device; the terminal device receives new data and inputs it into the trained model to obtain a predictive probability vector, thereby completing the related downstream task.
2. The pre-training model fine-tuning method based on linear transformation according to claim 1, wherein in the step S4, when image classification is used as the downstream task, the model training loss on the downstream task is $\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N} y_i \log p_i$, wherein $p_i$, $y_i$ respectively represent the model's predicted probability and the class label for the $i$-th input sample.
3. The pre-training model fine-tuning method based on linear transformation according to claim 1, wherein in the step S4, when object detection is used as the downstream task, the model training loss on the downstream task is $\mathcal{L} = \mathcal{L}_{cls} + \mathcal{L}_{box} + \mathcal{L}_{mask}$, wherein $\mathcal{L}_{box}$ is the bounding-box regression loss.
4. The pre-training model fine-tuning method based on linear transformation according to claim 1, wherein in the step S4, when semantic segmentation is used as the downstream task, the model training loss on the downstream task is $\mathcal{L} = -\frac{1}{N}\sum_{i=1}^{N}\frac{1}{P}\sum_{j=1}^{P} y_{ij} \log p_{ij}$, wherein $p_{ij}$, $y_{ij}$ respectively represent the model's predicted probability and the class label for the $j$-th pixel of the $i$-th input sample.
5. The pre-training model fine-tuning method based on linear transformation according to claim 1, wherein in the step S5, the parameters $\gamma_l$ and $\beta_l$ of the linear transformation module are merged into the weights of the adjacent layer by the re-parameterization technique, with the calculation formula: $W'_{l+1} = W_{l+1}\,\mathrm{diag}(\gamma_l)$, $b'_{l+1} = b_{l+1} + W_{l+1}\beta_l$, wherein $W_{l+1}$ and $b_{l+1}$ are the weight and bias of the adjacent layer; only the weights of the model need to be changed for different downstream tasks, without re-implementing the model deployment.
6. A pre-training model fine tuning system based on linear transformation, comprising:
Data collection and preprocessing module: used for collecting image data related to a downstream task, dividing the image data into a training set and a validation set, and performing preprocessing, wherein the preprocessing comprises aspect-ratio-preserving random scaling, random cropping, random horizontal flipping, RGB jittering, label smoothing and mean removal;
Pre-training model preparation module: used for selecting a mainstream neural network model pre-trained on an ImageNet-1K or 21K dataset, freezing the backbone part (backbone) of the pre-trained model, namely not updating the corresponding network parameters, modifying the output dimension of the task head of the pre-trained model according to the number of classes of the downstream task, and randomly initializing the parameters of the modified head part;
And a linear transformation module: the method comprises the steps of inserting an additional linear transformation module between two adjacent layers of a pre-training model, wherein the dimension of the linear transformation module and the output characteristics of the adjacent layers meet the vector multiplication relation, and scaling and translating the characteristics output by the previous layer so that the pre-training model can adapt to downstream data;
When a Transformer is used as the pre-training model, the number of layers of the model is denoted $m$, and the input of the $(l+1)$-th layer of the model is $x_l \in \mathbb{R}^{B \times L \times D}$, wherein $B$, $L$, $D$ are respectively the batch size of the input data, the length of the input sequence and the dimension of the input sequence; the linear transformation module comprises two parts, namely a feature scaling $\gamma_l \in \mathbb{R}^{D}$ and a feature translation $\beta_l \in \mathbb{R}^{D}$; the forward propagation process of the network after the linear transformation module is added is: $x_{l+1} = f_{l+1}(\gamma_l \odot x_l + \beta_l)$, wherein $x_0$ is the input image data and $f_{l+1}(\cdot)$ is a Transformer module composed of multi-head self-attention, a feed-forward network, a multi-layer perceptron and residual connections;
when a CNN is used as the pre-training model, the number of layers of the model is still denoted $m$, and the input of the $(l+1)$-th layer of the model is $x_l \in \mathbb{R}^{B \times C \times h \times w}$, wherein $B$, $C$, $h$ and $w$ are respectively the batch size, the number of feature channels, the feature height and the feature width of the input data; the linear transformation module still comprises two parts, namely a feature scaling $\gamma_l \in \mathbb{R}^{C}$ and a feature translation $\beta_l \in \mathbb{R}^{C}$; the forward propagation process of the network after the linear transformation module is added is: $x_{l+1} = f_{l+1}(\gamma_l \odot x_l + \beta_l)$, wherein $x_0$ is the input image data and $f_{l+1}(\cdot)$ is a CNN module composed of convolution layers, batch normalization layers, nonlinear activation functions and residual connections;
model fine tuning module: training parameters of a task head part and parameters of a linear transformation module by utilizing data of a downstream task;
Model re-parameterization module: used for selecting the model with the best performance on the validation set, saving the model weights, and merging the parameters of the linear transformation module into the backbone parameters of the pre-training model by the re-parameterization technique;
Model deployment module: used for deploying the re-parameterized model on the terminal device; the terminal device receives new data and inputs it into the trained model to complete the related downstream task.
7. A pre-training model tuning device based on linear transformation, comprising a memory and one or more processors, the memory having executable code stored therein, the one or more processors being configured to implement a pre-training model tuning method based on linear transformation as claimed in any one of claims 1-5 when the executable code is executed.
8. A computer readable storage medium, having stored thereon a program which, when executed by a processor, implements a pre-training model tuning method based on a linear transformation as claimed in any one of claims 1-5.
CN202410060305.5A 2024-01-16 2024-01-16 Pre-training model fine tuning method and device based on linear transformation Active CN117574982B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410060305.5A CN117574982B (en) 2024-01-16 2024-01-16 Pre-training model fine tuning method and device based on linear transformation


Publications (2)

Publication Number Publication Date
CN117574982A CN117574982A (en) 2024-02-20
CN117574982B true CN117574982B (en) 2024-04-26

Family

ID=89862867

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410060305.5A Active CN117574982B (en) 2024-01-16 2024-01-16 Pre-training model fine tuning method and device based on linear transformation

Country Status (1)

Country Link
CN (1) CN117574982B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114298158A (en) * 2021-12-06 2022-04-08 湖南工业大学 Multi-mode pre-training method based on image-text linear combination
CN115146651A (en) * 2022-06-30 2022-10-04 北京航空航天大学 Memory mechanism-based pre-training language model parameter fine-tuning method and device
WO2022243985A1 (en) * 2021-05-21 2022-11-24 Soul Machines Limited Transfer learning in image recognition systems
CN116186606A (en) * 2023-01-29 2023-05-30 北京邮电大学 Semi-supervised less sample time series anomaly detection and classification system and method based on pre-training model guided fine tuning
CN116644316A (en) * 2023-05-31 2023-08-25 杭州电子科技大学 Multi-mode multi-task learning oriented lightweight adaptive network learning method
CN116702760A (en) * 2023-06-01 2023-09-05 中国石油大学(华东) Geographic naming entity error correction method based on pre-training deep learning
CN116775885A (en) * 2023-07-04 2023-09-19 山西财经大学 Cross-domain aspect emotion classification method based on pretraining fine adjustment
CN117059103A (en) * 2023-10-12 2023-11-14 慧言科技(天津)有限公司 Acceleration method of voice recognition fine tuning task based on low-rank matrix approximation

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20230419170A1 (en) * 2022-06-24 2023-12-28 Fractal Analytics Private Limited System and method for efficient machine learning


Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Investigate the multipath erasure of nitrobenzene over nanoscale zero-valent-iron/N-doped biochar hybrid with extraordinary reduction performance;Wang, Yongheng等;《ENVIRONMENTAL RESEARCH》;20230106;第216卷;全文 *
Linear fine-tuning: a linear transformation based transfer strategy for deep MRI reconstruction;Bi, Wanqing等;《FRONTIERS IN NEUROSCIENCE》;20230715;全文 *
Research on Text Classification Based on Self-Attention Mechanism; Liu Bin; 《China Master's Theses Full-text Database (Information Science and Technology)》; 20201215 (No. 12); full text *
TextCGA: A Text Classification Network Based on Pre-trained Models; Yang Weiqi, Du Ye; Modern Computer; 20200425 (No. 12); full text *
Cross-domain and Cross-task Transfer Learning with Pre-trained Models; Ding Wenbo, Xu Yue; Science & Technology Information; 20200113 (No. 02); full text *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant