CN116663638A - Model fine-tuning training method, device, equipment and medium - Google Patents

Model fine-tuning training method, device, equipment and medium

Info

Publication number
CN116663638A
Authority
CN
China
Prior art keywords
matrix
target
value
model
singular
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310919300.9A
Other languages
Chinese (zh)
Inventor
刘微
张建安
曲磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hisense Group Holding Co Ltd
Original Assignee
Hisense Group Holding Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hisense Group Holding Co Ltd
Priority to CN202310919300.9A
Publication of CN116663638A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Feedback Control In General (AREA)

Abstract

The application relates to the technical field of artificial intelligence, and in particular to a model fine-tuning training method, device, equipment and medium. When a model undergoes fine-tuning training, the matrix to be adjusted in the fine-tuning model is decomposed based on a singular value decomposition algorithm to obtain a plurality of decomposition matrices, a target number is determined based on the plurality of decomposition matrices, and an incremental matrix is constructed based on the determined target number and the plurality of decomposition matrices. The incremental matrix is therefore constructed from the knowledge contained in the matrix to be adjusted rather than being completely independent of it. The incremental matrix is added to the fine-tuning model to obtain a target model, and fine-tuning training is performed on the target model, which avoids the problems of long fine-tuning training time and poor model performance. The technical scheme provided by the application is reliable, generalizable and robust, and conforms to the trustworthiness characteristic.

Description

Model fine-tuning training method, device, equipment and medium
Technical Field
The application relates to the technical field of artificial intelligence, and in particular to a model fine-tuning training method, device, equipment and medium.
Background
With the development of artificial intelligence technology, more and more models are applied in various industries. In particular, generative natural-language models, represented in recent years by the Chat Generative Pre-trained Transformer (ChatGPT) model, have attracted increasing attention. The core behind the ChatGPT model is a large-scale language model, which usually takes the Transformer model as its basic structure and involves on the order of hundreds of billions of parameters. Training a model with parameters at this scale requires enormous computing resources, for example more than 1000 graphics cards, and the training time can reach 1-2 months.
In order to reduce model development cost, those skilled in the art have explored various efficient parameter fine-tuning schemes, in which, starting from an existing large model, only part of the model's parameters are adjusted during fine-tuning training, so that the large model can be adapted to specific downstream tasks.
In the related art, a re-parameterization technique is generally used to fine-tune part of the parameters of a trained model. In a large-scale language model the parameter to be adjusted is usually a matrix; the re-parameterization technique adds an incremental matrix to the matrix to be adjusted, and during fine-tuning training the matrix to be adjusted itself is not adjusted, only the added incremental matrix is, thereby achieving efficient fine-tuning.
In the related art, the incremental matrix added to the matrix to be adjusted is generally determined based on Low-Rank Adaptation (LoRA). In practice, however, it is found that the incremental matrix determined by LoRA is completely independent of the matrix to be adjusted; that is, the knowledge contained in the matrix to be adjusted is not taken into account when determining the incremental matrix, which leads to long fine-tuning training time and poor results.
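For contrast, the following is a minimal sketch of how a LoRA-style incremental matrix is typically parameterized; the shapes, the Gaussian/zero initialization and the PyTorch usage follow the common LoRA formulation and are illustrative assumptions of the editor, not part of this application:

```python
import torch

m, n, r = 1024, 1024, 8                # illustrative dimensions and low-rank value

W = torch.randn(m, n)                  # frozen pre-trained weight (stand-in)
A = torch.randn(r, n) * 0.01           # LoRA factor: random Gaussian initialization
B = torch.zeros(m, r)                  # LoRA factor: zero initialization

delta_W = B @ A                        # incremental matrix, independent of W's content
x = torch.randn(2, m)                  # a batch of input features
h = x @ (W + delta_W)                  # re-parameterized forward pass, shape (2, n)
```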
Therefore, how to improve the efficiency and effectiveness of model fine-tuning training is a problem that needs to be solved.
Disclosure of Invention
The embodiments of the application provide a model fine-tuning training method, device, equipment and medium, which are used to solve the problems of long fine-tuning training time and poor model performance in the prior art.
The application provides a model fine tuning training method, which comprises the following steps:
acquiring a matrix to be adjusted in the fine-tuning model;
decomposing the matrix to be adjusted based on a singular value decomposition algorithm to obtain a plurality of decomposition matrices; determining a target number based on the plurality of decomposition matrices;
constructing an incremental matrix based on the target number and the plurality of decomposition matrices; and adding the increment matrix into the fine tuning model to obtain a target model, and carrying out fine tuning training on the target model.
Further, the determining the target number based on the plurality of decomposition matrices includes:
and determining singular values based on the decomposition matrixes, sequencing the singular values according to the order of magnitude, determining the minimum sum value when the first sum value of the maximum singular value and other singular values adjacent to the maximum singular value is larger than a preset threshold value, and including the target number of target singular values.
Further, the determining the minimum first sum value, among the first sum values of the maximum singular value and the other singular values adjacent to it, that is greater than the preset threshold, and the target number of target singular values included therein, comprises:
determining a target sum of singular values included in the singular value matrix;
determining a second sum of the maximum singular value and at least one other singular value adjacent to the maximum singular value one by one in the sorted singular values; sequentially judging whether the ratio of the second sum value to the target sum value is larger than the preset threshold value or not;
and when a second sum value whose ratio to the target sum value is greater than the preset threshold is identified, taking that second sum value as the target sum value, determining the singular values that make it up, taking the determined singular values as the target singular values, and determining the number of target singular values as the target number.
Further, the preset threshold is any value greater than 0 and less than 1.
Further, the constructing a delta matrix based on the target number and the plurality of decomposition matrices includes:
constructing a diagonal matrix using the target number of singular values;
selecting the target number of rows and/or columns of data from a first target matrix among the plurality of decomposition matrices to obtain a second target matrix;
and constructing an incremental matrix by using the diagonal matrix and the second target matrix.
Further, the selecting the target number of rows and/or columns of data from a first target matrix among the plurality of decomposition matrices to obtain a second target matrix comprises:
acquiring a first target matrix in the plurality of decomposition matrices, wherein the first target matrix comprises a left singular value vector matrix and a right singular value vector matrix;
selecting the target number of columns of data from the left singular value vector matrix as a first matrix; selecting the target number of rows of data from the right singular value vector matrix as a second matrix; and determining the first matrix and the second matrix as the second target matrix.
Further, said constructing an incremental matrix using said diagonal matrix and said second target matrix comprises:
and determining a matrix obtained by multiplying the first matrix, the diagonal matrix and the second matrix as the increment matrix.
The embodiment of the application also provides a device for fine tuning and training the model, which comprises:
an acquisition module, used for acquiring a matrix to be adjusted in the fine-tuning model;
the decomposition determining module is used for decomposing the matrix to be adjusted based on a singular value decomposition algorithm to obtain a plurality of decomposition matrices; determining a target number based on the plurality of decomposition matrices;
a construction module for constructing an incremental matrix based on the target number and the plurality of decomposition matrices;
and the training module is used for adding the increment matrix into the fine tuning model to obtain a target model, and carrying out fine tuning training on the target model.
The embodiment of the application also provides electronic equipment, which at least comprises a processor and a memory, wherein the processor is used for realizing the steps of the model fine tuning training method according to any one of the above steps when executing a computer program stored in the memory.
Embodiments of the present application also provide a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of a model fine tuning training method as described in any one of the above.
In the embodiments of the application, when a model undergoes fine-tuning training, the matrix to be adjusted in the fine-tuning model is decomposed based on a singular value decomposition algorithm to obtain a plurality of decomposition matrices, a target number is determined based on the plurality of decomposition matrices, and an incremental matrix is constructed based on the determined target number and the plurality of decomposition matrices. The incremental matrix is therefore constructed from the knowledge contained in the matrix to be adjusted rather than being completely independent of it. The incremental matrix is added to the fine-tuning model to obtain a target model, and fine-tuning training is performed on the target model, which avoids the problems of long fine-tuning training time and poor model performance.
Drawings
In order to more clearly illustrate the technical solutions of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a model fine tuning training method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of model fine tuning training provided in an embodiment of the present application;
FIG. 3 is a flowchart of another model fine tuning training method according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a model fine tuning training device according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments. All other embodiments, which are derived by a person skilled in the art based on the embodiments of the application, fall within the scope of protection of the application.
The embodiments of the application provide a model fine-tuning training method, device, equipment and medium. In the method, a matrix to be adjusted in a fine-tuning model is acquired; the matrix to be adjusted is decomposed based on a singular value decomposition algorithm to obtain a plurality of decomposition matrices; a target number is determined based on the plurality of decomposition matrices; an incremental matrix is constructed based on the target number and the plurality of decomposition matrices; and the incremental matrix is added to the fine-tuning model to obtain a target model, on which fine-tuning training is performed.
Fig. 1 is a flow chart of a model fine tuning training method according to an embodiment of the present application, where the process includes the following steps:
s101: and obtaining a matrix to be adjusted in the fine adjustment model.
The model fine tuning training method provided by the embodiment of the application is applied to electronic equipment, and the electronic equipment can be a server, a PC and the like.
In order to improve the fine-tuning training efficiency of the fine-tuning model, in the embodiments of the application a matrix to be adjusted of the fine-tuning model to undergo fine-tuning training can be acquired, an incremental matrix can be determined from the matrix to be adjusted, and the incremental matrix can be added to the fine-tuning model, so that fine-tuning training is performed on the fine-tuning model with the incremental matrix added. In the embodiments of the application, the fine-tuning model may be a model that has already been pre-trained, and it may be stored locally in the electronic equipment or sent to the electronic equipment by a user. The matrix to be adjusted in the fine-tuning model can be understood as training parameters of the fine-tuning model that were fixed after pre-training: during pre-training the fine-tuning model adjusts some of its training parameters according to its output results, and once training is finished these adjusted training parameters are fixed. In order to make the already-trained fine-tuning model suitable for specific application scenarios, fine-tuning training needs to be performed on it, and during fine-tuning training the fixed training parameters need to be adjusted again; these fixed training parameters are the matrix to be adjusted. In the embodiments of the application, the matrix to be adjusted may be identified in advance in the running code of the fine-tuning model or may be input by a user of the electronic equipment, and it may be a matrix of any dimension.
Specifically, a large-scale language model is mainly built on the Transformer architecture; that is, each large language model is formed by stacking a plurality of Transformer layers, and each Transformer layer consists of a multi-head attention structure and a fully connected forward network structure. The multi-head attention structure is generally calculated as follows:
H = softmax(Q K^T / sqrt(d_k)) V, with Q = X W_q, K = X W_k, V = X W_V
where W_q, W_k and W_V all represent matrix parameters to be adjusted; X represents the input data of the multi-head attention structure, which the structure multiplies with these matrix parameters to obtain the matrices Q, K and V; H represents the output result of the multi-head attention structure; softmax() represents the normalization function; d_k represents the dimension of the matrix K; and K^T represents the transpose of the matrix K.
The fully connected forward network structure is typically calculated as follows:
F = σ(X W_1 + b_1) W_2 + b_2
where F represents the output result of the fully connected forward network structure; σ represents the activation function; X represents the input data of the fully connected forward network structure; W_1 and W_2 represent matrix parameters to be adjusted; and b_1 and b_2 represent bias parameters.
It should be noted that the multi-head attention structure and the fully connected forward network structure are prior art and are only briefly described in the embodiments of the application; a detailed description is omitted.
From the analysis of the multi-head attention structure and the fully connected forward network structure above, it can be seen that the core of both is matrix operations, and the matrices participating in these operations form the most important parameters of a large-scale language model. In order to improve the efficiency of model fine-tuning training, when it is determined that fine-tuning training is to be performed on the fine-tuning model, incremental matrices can be added for the matrix parameters to be adjusted in the already pre-trained large-scale language model, and accordingly W_q, W_k, W_V, W_1 and W_2 can be determined as the matrices to be adjusted.
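To make the two formulas above concrete, the following sketch (a single-head simplification in NumPy, with illustrative dimensions chosen by the editor) shows where W_q, W_k, W_V, W_1 and W_2 appear in the attention and forward-network computations; these are the matrix parameters that would receive incremental matrices:

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

d_model, d_k, d_ff, seq_len = 64, 64, 256, 10            # illustrative sizes
rng = np.random.default_rng(0)

# matrix parameters to be adjusted
W_q = rng.standard_normal((d_model, d_k))
W_k = rng.standard_normal((d_model, d_k))
W_V = rng.standard_normal((d_model, d_k))
W_1 = rng.standard_normal((d_model, d_ff))
W_2 = rng.standard_normal((d_ff, d_model))
b_1, b_2 = np.zeros(d_ff), np.zeros(d_model)

X = rng.standard_normal((seq_len, d_model))               # input data

# attention: H = softmax(Q K^T / sqrt(d_k)) V, with Q = X W_q, K = X W_k, V = X W_V
Q, K, V = X @ W_q, X @ W_k, X @ W_V
H = softmax(Q @ K.T / np.sqrt(d_k)) @ V

# fully connected forward network: F = sigma(X W_1 + b_1) W_2 + b_2
F = np.maximum(X @ W_1 + b_1, 0) @ W_2 + b_2               # ReLU used here as an illustrative activation sigma
```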
S102: decomposing the matrix to be adjusted based on a singular value decomposition algorithm to obtain a plurality of decomposition matrices.
After the matrix to be adjusted of the fine tuning model is obtained, the matrix to be adjusted in the fine tuning model can be decomposed based on a singular value decomposition algorithm to obtain a plurality of decomposition matrices, wherein the plurality of decomposition matrices can comprise a left singular value vector matrix, a singular value matrix and a right singular value vector matrix.
Specifically, assuming that the matrix to be adjusted is W and consists of m rows and n columns, it can be expressed as W ∈ R^(m×n). During fine-tuning training, the matrix to be adjusted W can be decomposed based on a singular value decomposition algorithm into a left singular value vector matrix, a singular value matrix and a right singular value vector matrix, where the left singular value vector matrix can be expressed as U ∈ R^(m×m) and is an m×m orthogonal matrix; the right singular value vector matrix can be expressed as V ∈ R^(n×n) and is an n×n orthogonal matrix; and the singular value matrix can be expressed as Σ ∈ R^(m×n) and is a rectangular diagonal matrix with non-negative diagonal entries. It should be noted that how to decompose the matrix to be adjusted based on a singular value decomposition algorithm is described in the prior art and is not repeated here.
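A minimal NumPy sketch of this decomposition step (the dimensions are illustrative; np.linalg.svd returns the singular values as a vector and the right factor already transposed, so the third return value is named V here to match the V ∈ R^(n×n), V[:r, :] convention used later in this description):

```python
import numpy as np

m, n = 6, 4                                      # illustrative dimensions
rng = np.random.default_rng(0)
W = rng.standard_normal((m, n))                  # matrix to be adjusted

# singular value decomposition: W = U @ Sigma @ V
U, s, V = np.linalg.svd(W, full_matrices=True)   # U: (m, m), s: (k,), V: (n, n), k = min(m, n)

Sigma = np.zeros((m, n))
Sigma[:len(s), :len(s)] = np.diag(s)             # singular value matrix, diagonal entries are the singular values

assert np.allclose(U @ Sigma @ V, W)             # the decomposition reconstructs W
```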
S103: based on the plurality of decomposition matrices, a target number is determined.
After the matrix to be adjusted has been decomposed, the target number may be determined based on the plurality of decomposition matrices obtained. Because the singular values in the decomposition matrices represent the importance of the knowledge contained in the matrix to be adjusted, in the embodiments of the application a singular value threshold can be preconfigured, the singular values can be determined from the decomposition matrices, and the number of singular values greater than the singular value threshold can be determined as the target number. Alternatively, the average of the singular values in the plurality of decomposition matrices can be determined, and the number of singular values greater than this average can be determined as the target number.
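A sketch of these two simple options (the example singular values and the fixed threshold are illustrative assumptions):

```python
import numpy as np

s = np.array([9.1, 4.2, 1.3, 0.4, 0.1])          # singular values from the decomposition (example)

threshold = 1.0                                   # preconfigured singular-value threshold (assumed)
r_by_threshold = int(np.sum(s > threshold))       # target number: count of singular values above the threshold

r_by_mean = int(np.sum(s > s.mean()))             # target number: count of singular values above the average
```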
S104: constructing an incremental matrix based on the target number and the plurality of decomposition matrices; and adding the increment matrix into the fine tuning model to obtain a target model, and carrying out fine tuning training on the target model.
After the target number has been determined based on the decomposition matrices, an incremental matrix may be constructed based on the target number and the plurality of decomposition matrices. In one embodiment of the application, one decomposition matrix may be selected at random from the plurality of decomposition matrices, and the target number of rows or columns of data of the randomly selected decomposition matrix may be used as the incremental matrix.
After the incremental matrix is determined, it is added to the fine-tuning model to obtain the target model, and fine-tuning training is performed on the target model. During fine-tuning training of the target model, only the incremental matrix in the target model is adjusted; the originally fixed matrix parameters, i.e. the matrix to be adjusted, are not adjusted.
Specifically, assuming that the matrix to be adjusted is W, it can be expressed as W ∈ R^(m×n), i.e. the matrix to be adjusted consists of m rows and n columns. An incremental matrix W_0 with the same dimensions, also consisting of m rows and n columns, is newly added for the matrix to be adjusted, and W_0 is added to the fine-tuning model to obtain the target model, in which the parameters to be adjusted can then be expressed as h = X(W + W_0) = XW + XW_0. In the fine-tuning model the number of parameters related to the parameters to be adjusted is m × n; after the incremental matrix is added to the fine-tuning model, the number of parameters that actually need to be adjusted during fine-tuning training is far lower than that of the matrix to be adjusted when m and n are on the order of 1000-10000.
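As an illustrative order-of-magnitude check, assuming the low-rank factorization of W_0 described in the later embodiments (the dimensions and rank below are chosen by the editor, not taken from the application):

```python
m, n, r = 4096, 4096, 8                       # illustrative sizes, not taken from the application

full_params  = m * n                          # entries in the matrix to be adjusted W
delta_params = r * m + r + r * n              # trainable values in U_r, the diagonal of Sigma_r, and V_r

print(full_params, delta_params)              # 16777216 vs 65544 -- well under 1% of the original
```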
In the embodiments of the application, when a model undergoes fine-tuning training, the matrix to be adjusted in the fine-tuning model is decomposed based on a singular value decomposition algorithm to obtain a plurality of decomposition matrices, a target number is determined based on the plurality of decomposition matrices, and an incremental matrix is constructed based on the determined target number and the plurality of decomposition matrices. The incremental matrix is therefore constructed from the knowledge contained in the matrix to be adjusted rather than being completely independent of it. The incremental matrix is added to the fine-tuning model to obtain a target model, and fine-tuning training is performed on the target model, which avoids the problems of long fine-tuning training time and poor model performance.
In order to further improve the efficiency of fine tuning training, in the foregoing embodiments, in the embodiment of the present application, determining, based on the plurality of decomposition matrices, the target number includes:
Determining singular values based on the plurality of decomposition matrices, sorting the singular values in order of magnitude, determining the minimum first sum value, among the first sum values of the maximum singular value and the other singular values adjacent to it, that is greater than a preset threshold, and taking the number of target singular values included in that first sum value as the target number.
After the matrix to be adjusted in the fine-tuning model has been decomposed, singular values may be determined based on the plurality of decomposition matrices obtained. In the embodiments of the application, the singular value matrix among the plurality of decomposition matrices may be acquired and the singular values included in it obtained, where each element on the diagonal of the singular value matrix is a singular value, i.e. the singular value matrix can be represented as an m×n matrix whose only non-zero entries are its diagonal elements:
Σ = diag(σ_1, σ_2, …, σ_k), padded with zeros to an m×n matrix
where σ_i, i = 1, …, k are the singular values, and the number of singular values k is the minimum of the number of rows m and the number of columns n of the matrix to be adjusted, i.e. k = min(m, n). The singular value matrix exemplified above is described taking k = min(m, n) = m as an example.
According to matrix theory, the magnitude of a singular value represents the importance of the knowledge contained in the matrix to be adjusted. Therefore, in the embodiments of the application, the acquired singular values can be sorted in order of magnitude, and the first sum values of the largest singular value and the other singular values adjacent to it can be determined one by one; when a first sum value is greater than a preset threshold, it is determined as a candidate sum value. After each first sum value of the largest singular value and its adjacent other singular values has been determined, the smallest of the candidate sum values is determined as the target minimum sum value, the singular values involved in determining the target minimum sum value are determined as the target singular values, and the number of singular values involved is determined as the target number, which can be used as the low-rank parameter for the subsequent construction of the incremental matrix. The preset threshold may be preconfigured by a worker based on a large amount of statistics.
On the basis of the foregoing embodiments, in order to further improve the efficiency of fine-tuning training, in the embodiments of the application the determining the minimum first sum value, among the first sum values of the maximum singular value and the other singular values adjacent to it, that is greater than the preset threshold, and the target number of target singular values included therein, includes:
determining a target sum of singular values included in the singular value matrix;
determining a second sum of the maximum singular value and at least one other singular value adjacent to the maximum singular value one by one in the sorted singular values; sequentially judging whether the ratio of the second sum value to the target sum value is larger than the preset threshold value or not;
and when a second sum value whose ratio to the target sum value is greater than the preset threshold is identified, taking that second sum value as the target sum value, determining the singular values that make it up, taking the determined singular values as the target singular values, and determining the number of target singular values as the target number.
Since the parameters to be adjusted differ between pre-trained models, their singular values also differ. Therefore, in order to further improve the efficiency of fine-tuning training, when determining the target singular values and the target number, the target sum value of all the singular values included in the singular value matrix can first be determined.
The second sum values of the maximum singular value and at least one other singular value adjacent to it are then determined one by one among the sorted singular values. If the singular values are sorted in descending order, then when determining the second sum values, the second sum value of the first-ranked singular value and the second-ranked singular value is determined first; next, the second sum value of the first-ranked, second-ranked and third-ranked singular values is determined, and so on.
After each second sum value is determined, it is judged in turn whether the ratio of the second sum value to the target sum value is greater than a preset threshold, where the preset threshold is any value greater than 0 and less than 1, for example 0.8, and can be understood as a parameter indicating the degree to which information needs to be retained. When a second sum value whose ratio to the target sum value is greater than the preset threshold is identified, that second sum value may be taken as the target sum value, the singular values involved in determining it may be determined as the target singular values, and the number of target singular values may be determined as the target number.
Specifically, all singular values included in the singular value matrix are arranged in descending order to obtain a sorted singular value sequence [σ_1, σ_2, …, σ_k] with σ_1 ≥ σ_2 ≥ … ≥ σ_k. The target number r can then be determined for this sequence based on the following formula:
r = min r  s.t.  (σ_1 + σ_2 + … + σ_r) / (σ_1 + σ_2 + … + σ_k) ≥ ε
where ε represents the preset threshold; σ_i represents the i-th singular value in the singular value sequence; r represents the target number; k represents the total number of singular values in the singular value sequence; and s.t. represents the constraint to be satisfied. According to this formula, when the ratio of the sum of the first r singular values in the singular value sequence to the target sum value is greater than or equal to the preset threshold ε, the first r singular values are the target singular values, r is the target number, and r is the smallest value satisfying the condition.
In the embodiments of the application, the target singular values and the target number can be determined based on a greedy algorithm, i.e. by traversing r = 1, …, k and stopping at the first value of r that satisfies the condition.
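A sketch of this selection rule under the formula above (NumPy; the example singular values and the threshold ε = 0.8 mentioned earlier are illustrative):

```python
import numpy as np

def select_target_number(s, eps=0.8):
    """Smallest r such that the first r sorted singular values account for at least eps of the total."""
    s = np.sort(s)[::-1]                                # sort singular values in descending order
    total = s.sum()                                     # target sum value of all singular values
    cumulative = np.cumsum(s)                           # second sum values: sigma_1, sigma_1+sigma_2, ...
    r = int(np.argmax(cumulative / total >= eps)) + 1   # greedy: first index meeting the condition
    return r, s[:r]                                     # target number and target singular values

r, target_singular_values = select_target_number(np.array([9.1, 4.2, 1.3, 0.4, 0.1]))
```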
In order to further improve the efficiency of fine tuning training, in the foregoing embodiments, in the embodiment of the present application, the constructing an incremental matrix based on the target number and the plurality of decomposition matrices includes:
Constructing a diagonal matrix using the target number of singular values;
selecting the row and/or column data of the target number from a first target matrix in the multiple decomposition matrices to obtain a second target matrix;
and constructing an incremental matrix by using the diagonal matrix and the second target matrix.
When constructing the incremental matrix, a diagonal matrix can be constructed using the determined target number of singular values. In the embodiments of the application, the singular values can be determined from the plurality of decomposition matrices obtained by decomposition and sorted in descending order, the first target number of singular values in the sorted order can be selected as the target singular values, and the diagonal matrix can be constructed using these target singular values.
After the diagonal matrix is constructed, a first target matrix among the plurality of decomposition matrices may be acquired, where the first target matrix may be any one, or any several, of the plurality of decomposition matrices.
In the embodiments of the application, a second target matrix can be obtained by selecting the target number of rows and/or columns of data from the acquired first target matrix, and there may be one or more second target matrices.
When there is only one first target matrix, the target number of rows of data may be selected from it as the second target matrix, or the target number of columns of data may be selected as the second target matrix. When the target number of rows or columns of data is selected from the first target matrix, for example, the first target-number rows or columns of data may be selected, the last target-number rows or columns of data may be selected, or the target number of rows or columns of data at any positions in the first target matrix may be selected.
When there are a plurality of first target matrices, for example, the target number of rows or columns of data may be selected from target matrix A among the first target matrices as one second target matrix, and the target number of rows or columns of data may be selected from target matrix B among the first target matrices as another second target matrix.
After the diagonal matrix and the second target matrix are obtained, the incremental matrix can be constructed using them. When constructing the incremental matrix, whether the diagonal matrix and the second target matrix are added or multiplied to obtain the incremental matrix can be decided according to the numbers of rows and columns of the diagonal matrix and of the second target matrix. The embodiments of the application do not limit how the incremental matrix is constructed from the diagonal matrix and the second target matrix; a person skilled in the art can make an adjustment according to the determined numbers of rows and columns of the diagonal matrix and of the second target matrix.
In order to further improve the efficiency of fine tuning training, in the foregoing embodiments of the present application, selecting, from a first target matrix in the multiple decomposition matrices, the target number of row and/or column data, to obtain a second target matrix includes:
acquiring a first target matrix in the plurality of decomposition matrices, wherein the first target matrix comprises a left singular value vector matrix and a right singular value vector matrix;
selecting the target number of columns of data from the left singular value vector matrix as a first matrix; selecting the target number of rows of data from the right singular value vector matrix as a second matrix; and determining the first matrix and the second matrix as the second target matrix.
When determining the second target matrix, a first target matrix among the plurality of decomposition matrices obtained based on the singular value decomposition algorithm may be acquired, where the first target matrices may be the left singular value vector matrix and the right singular value vector matrix. In the embodiments of the application, the first target-number columns of data may be selected from the left singular value vector matrix as the first matrix, and the first target-number rows of data may be selected from the right singular value vector matrix as the second matrix; the first matrix and the second matrix are the determined second target matrix.
In an embodiment of the present application, said constructing an incremental matrix using said diagonal matrix and said second target matrix comprises:
and determining a matrix obtained by multiplying the first matrix, the diagonal matrix and the second matrix as the increment matrix.
After the first matrix and the second matrix, i.e. the second target matrix, have been determined, the first matrix, the diagonal matrix and the second matrix may be multiplied to obtain the incremental matrix.
Specifically, the diagonal matrix constructed using the target singular values is Σ_r, which can be expressed as Σ_r = diag([σ_1, …, σ_r]) ∈ R^(r×r), where diag() denotes constructing a diagonal matrix, σ_1, …, σ_r are the target number of singular values, and r is the target number, i.e. the low-rank parameter, so that the diagonal matrix Σ_r is a matrix with r rows and r columns. The first target-number r columns of data are selected from the left singular value vector matrix, giving the first matrix U_r, which can be expressed as U_r = U[:, :r] ∈ R^(m×r). The first target-number r rows of data are selected from the right singular value vector matrix, giving the second matrix V_r, which can be expressed as V_r = V[:r, :] ∈ R^(r×n). The incremental matrix constructed from the first matrix, the diagonal matrix and the second matrix can be expressed as W_0 = U_r Σ_r V_r.
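Putting this notation together, a minimal NumPy sketch of constructing the incremental matrix from the decomposition (the dimensions and r are illustrative):

```python
import numpy as np

m, n, r = 6, 4, 2
rng = np.random.default_rng(0)
W = rng.standard_normal((m, n))                  # matrix to be adjusted

U, s, V = np.linalg.svd(W, full_matrices=True)   # U: (m, m), s: descending singular values, V: (n, n)

Sigma_r = np.diag(s[:r])                         # diagonal matrix from the first r (target) singular values
U_r = U[:, :r]                                   # first matrix: first r columns of the left singular value vector matrix
V_r = V[:r, :]                                   # second matrix: first r rows of the right singular value vector matrix

W_0 = U_r @ Sigma_r @ V_r                        # incremental matrix W_0 = U_r Sigma_r V_r, shape (m, n)
```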
The following describes the fine-tuning process of a model with reference to a specific embodiment; the way of determining the incremental matrix for the matrix to be adjusted provided by the embodiments of the application is applicable to each layer of the fine-tuning model.
Assume that the fixed, pre-trained weight parameter of the fine-tuning model is W ∈ R^(m×n), i.e. the fixed weight parameter is a matrix W with m rows and n columns; this weight parameter W ∈ R^(m×n) of the fine-tuning model can be determined as the matrix to be adjusted. After the matrix to be adjusted is decomposed based on a singular value decomposition algorithm, a left singular value vector matrix U ∈ R^(m×m), a right singular value vector matrix V ∈ R^(n×n) and a singular value matrix Σ ∈ R^(m×n) can be obtained. The target singular values and the target number r are determined from the singular values included in the singular value matrix, the target number being the low-rank parameter. After the low-rank parameter is determined, an initialized low-rank matrix of low-rank dimension, i.e. the incremental matrix, is determined based on it: a diagonal matrix Σ_r ∈ R^(r×r) is constructed using the target singular values; the first r columns of data are selected from the left singular value vector matrix U ∈ R^(m×m) to obtain a first matrix U_r ∈ R^(m×r); and the first r rows of data are selected from the right singular value vector matrix V ∈ R^(n×n) to obtain a second matrix V_r ∈ R^(r×n). An incremental matrix is constructed from the determined first matrix, second matrix and diagonal matrix, and the incremental matrix is added to the fine-tuning model to obtain the target model. FIG. 2 is a schematic diagram of model fine-tuning training provided in an embodiment of the present application. As shown in FIG. 2, the figure can be regarded as the fine-tuning training process of one layer of the target model. The strip-shaped rectangle at the bottom of FIG. 2 can be regarded as an m-dimensional feature vector x, which is taken as the input of the layer. The layer processes the feature vector x based on the pre-trained weight, i.e. the matrix to be adjusted W ∈ R^(m×n) (a matrix with m rows and n columns), to obtain x1. At the same time, the layer processes the feature vector x based on the determined incremental matrix to obtain x2, where the incremental matrix is determined from the matrices V_r ∈ R^(r×n), Σ_r ∈ R^(r×r) and U_r ∈ R^(m×r), which are in turn determined from the decomposition matrices V ∈ R^(n×n), Σ ∈ R^(m×n) and U ∈ R^(m×m) obtained by decomposing the pre-trained weight W based on a singular value decomposition algorithm; how to determine the incremental matrix based on the decomposition matrices has been described in detail in the above embodiments and is not repeated here. After x1 and x2 are obtained, they are fused to obtain the strip-shaped matrix h at the top of FIG. 2, where h is the feature vector obtained by fusing x1 and x2.
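A sketch of the per-layer forward pass in FIG. 2, written as a PyTorch module under the assumptions above (the fusion of x1 and x2 is taken to be addition, matching h = XW + XW_0; the class and parameter names are the editor's):

```python
import torch
from torch import nn

class SvdDeltaLinear(nn.Module):
    """Frozen pre-trained weight W plus a trainable low-rank incremental path U_r Sigma_r V_r."""
    def __init__(self, W: torch.Tensor, r: int):
        super().__init__()
        self.W = nn.Parameter(W, requires_grad=False)          # matrix to be adjusted, kept fixed
        U, s, Vh = torch.linalg.svd(W, full_matrices=True)
        self.U_r = nn.Parameter(U[:, :r].clone())              # trainable first matrix
        self.sigma_r = nn.Parameter(s[:r].clone())             # trainable target singular values
        self.V_r = nn.Parameter(Vh[:r, :].clone())             # trainable second matrix

    def forward(self, x):
        x1 = x @ self.W                                        # path through the frozen pre-trained weight
        x2 = x @ (self.U_r @ torch.diag(self.sigma_r) @ self.V_r)   # path through the incremental matrix
        return x1 + x2                                         # fused feature vector h

layer = SvdDeltaLinear(torch.randn(16, 8), r=2)
h = layer(torch.randn(4, 16))                                  # h has shape (4, 8)
```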
Large-scale language model training mainly involves matrix operation parameters, and the embodiments of the application mainly propose adding an additional incremental matrix W_0 to achieve efficient fine-tuning. By performing singular value decomposition on the matrix to be adjusted W, the embodiments of the application use the decomposed left singular value vector matrix, singular value matrix and right singular value vector matrix to construct the initialized incremental matrix W_0, thereby achieving efficient parameter fine-tuning.
Compared with the widely used LoRA technique, the embodiments of the application select a reasonable target number as the low-rank dimension, at a theoretical level, through singular value decomposition, thereby reducing the difficulty of manually designing the low-rank coefficient; meanwhile, the decomposed matrices are used to construct the incremental matrix, which contains more model knowledge than a random Gaussian initialization matrix, so that the model is easier to train and contains more domain information.
The following describes a model fine tuning training process in connection with another specific embodiment, and fig. 3 is a schematic flow chart of another model fine tuning training method provided in the embodiment of the present application, as shown in fig. 3, including the following steps:
s301: and receiving the storage position of the fine tuning model and a matrix to be adjusted in the fine tuning model.
S302: decomposing the matrix to be adjusted in the fine-tuning model based on a singular value decomposition algorithm to obtain a left singular value vector matrix, a singular value matrix and a right singular value vector matrix.
S303: a low rank parameter, i.e. a target number, is determined.
Specifically, the diagonal elements of the singular value matrix are sorted in descending order, the target sum value of all the singular values is determined, and the second sum values of the largest singular value and at least one other singular value adjacent to it are determined one by one among the sorted singular values; it is judged in turn whether the ratio of each second sum value to the target sum value is greater than a preset threshold; and when a second sum value whose ratio to the target sum value is greater than the preset threshold is identified, that second sum value is taken as the target sum value, the singular values that make it up are determined as the target singular values, and the number of target singular values is determined as the target number.
S304: an initial delta matrix is constructed based on the determined number of targets.
Specifically, a diagonal matrix is constructed using the target singular values; the first target-number columns of data are selected from the left singular value vector matrix as a first matrix; the first target-number rows of data are selected from the right singular value vector matrix as a second matrix; and the incremental matrix is constructed using the first matrix, the diagonal matrix and the second matrix. How the incremental matrix is constructed has been described in detail in the above embodiments and is not repeated here.
S305: searching the received storage location where the fine-tuning model is stored and acquiring the fine-tuning model, adding the constructed incremental matrix to the fine-tuning model to obtain a target model, and performing fine-tuning training on the target model.
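An end-to-end sketch of steps S301-S305 (NumPy, using the illustrative threshold ε = 0.8 mentioned earlier and a randomly generated matrix standing in for the weight loaded from the received storage location; the function name is the editor's):

```python
import numpy as np

def build_delta_matrix(W, eps=0.8):
    """S302-S304: decompose W, pick the target number r, and construct the incremental matrix."""
    U, s, V = np.linalg.svd(W, full_matrices=True)   # S302: left/right singular value vector matrices and singular values
    ratios = np.cumsum(s) / s.sum()                  # S303: cumulative ratio of the sorted singular values
    r = int(np.argmax(ratios >= eps)) + 1            # smallest r whose ratio reaches the preset threshold
    W_0 = U[:, :r] @ np.diag(s[:r]) @ V[:r, :]       # S304: incremental matrix U_r Sigma_r V_r
    return W_0, r

W = np.random.default_rng(0).standard_normal((8, 6))  # stand-in for the matrix to be adjusted (S301)
W_0, r = build_delta_matrix(W)
# S305: W_0 would then be added to the fine-tuning model and adjusted during fine-tuning training,
# while W itself stays fixed: h = X @ (W + W_0)
```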
The technical scheme provided by the application is reliable, generalizable and robust, and conforms to the trustworthiness characteristic.
Fig. 4 is a schematic structural diagram of a model fine tuning training apparatus according to an embodiment of the present application, as shown in fig. 4, where the apparatus includes:
an obtaining module 401, configured to obtain a matrix to be adjusted in the fine tuning model;
the decomposition determining module 402 is configured to decompose the matrix to be adjusted based on a singular value decomposition algorithm to obtain a plurality of decomposition matrices; determining a target number based on the plurality of decomposition matrices;
A construction module 403 for constructing an incremental matrix based on the target number and the plurality of decomposition matrices;
and the training module 404 is configured to add the increment matrix to the fine tuning model to obtain a target model, and perform fine tuning training on the target model.
In a possible implementation manner, the decomposition determining module 402 is specifically configured to determine, based on the multiple decomposition matrices, singular values, rank the singular values according to a size order, determine a minimum sum value when a first sum value of a maximum singular value and other singular values adjacent thereto is greater than a preset threshold, and include a target number of target singular values.
In a possible implementation manner, the decomposition determination module 402 is specifically configured to determine a target sum value of the singular values included in the singular value matrix; determining a second sum of the maximum singular value and at least one other singular value adjacent to the maximum singular value one by one in the sorted singular values; sequentially judging whether the ratio of the second sum value to the target sum value is larger than the preset threshold value or not; and when a second sum value with the ratio of the second sum value to the target sum value being larger than the preset threshold value is identified, the second sum value is taken as the target sum value, the singular value of the target sum value is determined and obtained, the determined singular value is taken as the target singular value, and the number of the target singular values is determined as the target number.
In one possible embodiment, the preset threshold is any value greater than 0 and less than 1.
In a possible implementation, the constructing module 403 is specifically configured to construct a diagonal matrix using the target number of singular values; selecting the row and/or column data of the target number from a first target matrix in the multiple decomposition matrices to obtain a second target matrix; and constructing an incremental matrix by using the diagonal matrix and the second target matrix.
In a possible implementation manner, the obtaining module 401 is further configured to obtain a first target matrix of the plurality of decomposition matrices, where the first target matrix includes a left singular value vector matrix and a right singular value vector matrix;
the constructing module 403 is specifically configured to select, from the left singular value vector matrix, the first matrix of the target number of column data; selecting the data with the target number from the right singular value vector matrix as a second matrix; the first matrix and the second matrix are determined as the second target matrix.
In a possible implementation manner, the constructing module 403 is specifically configured to determine, as the delta matrix, a matrix obtained by multiplying the first matrix, the diagonal matrix, and the second matrix.
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application, and on the basis of the foregoing embodiments, the present application further provides an electronic device, as shown in fig. 5, including: the device comprises a processor 501, a communication interface 502, a memory 503 and a communication bus 504, wherein the processor 501, the communication interface 502 and the memory 503 are in communication with each other through the communication bus 504;
the memory 503 has stored therein a computer program which, when executed by the processor 501, causes the processor 501 to perform the steps of:
acquiring a matrix to be adjusted in the fine adjustment model;
decomposing the matrix to be adjusted based on a singular value decomposition algorithm to obtain a plurality of decomposition matrices; determining a target number based on the plurality of decomposition matrices;
constructing an incremental matrix based on the target number and the plurality of decomposition matrices; and adding the increment matrix into the fine tuning model to obtain a target model, and carrying out fine tuning training on the target model.
In a possible implementation manner, the processor 501 is further configured to determine, based on the multiple decomposition matrices, singular values, rank the singular values in order of magnitude, and determine a minimum sum value when a first sum value of a maximum singular value and other singular values adjacent thereto is greater than a preset threshold, and include a target number of target singular values.
In a possible implementation, the processor 501 is further configured to determine a target sum value of singular values included in the singular value matrix;
determining a second sum of the maximum singular value and at least one other singular value adjacent to the maximum singular value one by one in the sorted singular values; sequentially judging whether the ratio of the second sum value to the target sum value is larger than the preset threshold value or not;
and when a second sum value with the ratio of the second sum value to the target sum value being larger than the preset threshold value is identified, the second sum value is taken as the target sum value, the singular value of the target sum value is determined and obtained, the determined singular value is taken as the target singular value, and the number of the target singular values is determined as the target number.
In a possible implementation, the processor 501 is further configured to construct a diagonal matrix using the target number of singular values;
selecting the row and/or column data of the target number from a first target matrix in the multiple decomposition matrices to obtain a second target matrix;
and constructing an incremental matrix by using the diagonal matrix and the second target matrix.
In a possible implementation manner, the processor 501 is further configured to obtain a first target matrix of the plurality of decomposition matrices, where the first target matrix includes a left singular value vector matrix and a right singular value vector matrix;
Selecting the number of the columns of data of the target number from the left singular value vector matrix as a first matrix; selecting the data with the target number from the right singular value vector matrix as a second matrix; the first matrix and the second matrix are determined as the second target matrix.
In a possible implementation manner, the processor 501 is further configured to determine a matrix obtained by multiplying the first matrix, the diagonal matrix, and the second matrix as the delta matrix.
On the basis of the above embodiments, the present application also provides a computer readable storage medium having stored therein a computer program executable by a processor, which when run on the processor, causes the processor to perform the steps of:
acquiring a matrix to be adjusted in the fine adjustment model;
decomposing the matrix to be adjusted based on a singular value decomposition algorithm to obtain a plurality of decomposition matrices; determining a target number based on the plurality of decomposition matrices;
constructing an incremental matrix based on the target number and the plurality of decomposition matrices; and adding the increment matrix into the fine tuning model to obtain a target model, and carrying out fine tuning training on the target model.
In a possible implementation manner, the determining the target number based on the plurality of decomposition matrices includes:
and determining singular values based on the decomposition matrixes, sequencing the singular values according to the order of magnitude, determining the minimum sum value when the first sum value of the maximum singular value and other singular values adjacent to the maximum singular value is larger than a preset threshold value, and including the target number of target singular values.
In one possible implementation manner, the determining the first sum value of the maximum singular value and other singular values adjacent to the maximum singular value is greater than the minimum sum value when the first sum value is greater than a preset threshold, and the target number of the target singular values includes:
determining a target sum of singular values included in the singular value matrix;
determining a second sum of the maximum singular value and at least one other singular value adjacent to the maximum singular value one by one in the sorted singular values; sequentially judging whether the ratio of the second sum value to the target sum value is larger than the preset threshold value or not;
and when a second sum value with the ratio of the second sum value to the target sum value being larger than the preset threshold value is identified, the second sum value is taken as the target sum value, the singular value of the target sum value is determined and obtained, the determined singular value is taken as the target singular value, and the number of the target singular values is determined as the target number.
In one possible embodiment, the preset threshold is any value greater than 0 and less than 1.
In one possible implementation, the constructing a delta matrix based on the target number and the plurality of decomposition matrices includes:
constructing a diagonal matrix using the target number of singular values;
selecting the row and/or column data of the target number from a first target matrix in the multiple decomposition matrices to obtain a second target matrix;
and constructing an incremental matrix by using the diagonal matrix and the second target matrix.
In a possible implementation manner, the selecting the target number of rows and/or columns of data in the first target matrix in the multiple decomposition matrices to obtain the second target matrix includes:
acquiring a first target matrix among the plurality of decomposition matrices, wherein the first target matrix comprises a left singular vector matrix and a right singular vector matrix;
selecting the target number of columns of data from the left singular vector matrix as a first matrix; selecting the target number of rows of data from the right singular vector matrix as a second matrix; and determining the first matrix and the second matrix as the second target matrix.
In one possible implementation, the constructing an increment matrix using the diagonal matrix and the second target matrix includes:
and determining the matrix obtained by multiplying the first matrix, the diagonal matrix, and the second matrix as the increment matrix.
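A minimal sketch of this construction, assuming the decomposition W = U Σ V^T produced by NumPy's SVD; the weight shape, the target number r = 8, and the variable names are illustrative assumptions.

```python
# Hypothetical sketch of the increment-matrix construction; shapes are assumptions.
import numpy as np

rng = np.random.default_rng(0)
weight = rng.standard_normal((768, 512))       # matrix to be adjusted (example shape)
u, s, vh = np.linalg.svd(weight, full_matrices=False)

r = 8                                          # target number (see the selection step above)
first_matrix = u[:, :r]                        # r columns of the left singular vector matrix
diagonal_matrix = np.diag(s[:r])               # diagonal matrix of the target singular values
second_matrix = vh[:r, :]                      # r rows of the right singular vector matrix

increment_matrix = first_matrix @ diagonal_matrix @ second_matrix
assert increment_matrix.shape == weight.shape  # same shape as the matrix to be adjusted
```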
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (10)

1. A method of model fine tuning training, the method comprising:
acquiring a matrix to be adjusted in the fine-tuning model;
decomposing the matrix to be adjusted based on a singular value decomposition algorithm to obtain a plurality of decomposition matrices; determining a target number based on the plurality of decomposition matrices;
constructing an increment matrix based on the target number and the plurality of decomposition matrices; and adding the increment matrix into the fine-tuning model to obtain a target model, and carrying out fine-tuning training on the target model.
2. The method of claim 1, wherein the determining the target number based on the plurality of decomposition matrices comprises:
determining singular values based on the plurality of decomposition matrices, sorting the singular values in descending order of magnitude, determining the smallest first sum value that is greater than a preset threshold, where a first sum value is the sum of the largest singular value and the other singular values adjacent to it, and taking the number of target singular values contained in that smallest first sum value as the target number.
3. The method of claim 2, wherein determining the smallest first sum value of the largest singular value and its adjacent singular values that is greater than the preset threshold, and the target number of target singular values contained in that sum, comprises:
determining a target sum of singular values included in the singular value matrix;
determining a second sum of the maximum singular value and at least one other singular value adjacent to the maximum singular value one by one in the sorted singular values; sequentially judging whether the ratio of the second sum value to the target sum value is larger than the preset threshold value or not;
and when a second sum value whose ratio to the target sum value is greater than the preset threshold is identified, determining the singular values that make up that second sum value, taking the determined singular values as the target singular values, and determining the number of the target singular values as the target number.
4. A method according to claim 3, wherein the predetermined threshold is any value greater than 0 and less than 1.
5. The method of claim 1, wherein the constructing an increment matrix based on the target number and the plurality of decomposition matrices comprises:
constructing a diagonal matrix using the target number of singular values;
selecting the target number of rows and/or columns of data from a first target matrix among the plurality of decomposition matrices to obtain a second target matrix;
and constructing an increment matrix by using the diagonal matrix and the second target matrix.
6. The method of claim 5, wherein selecting the target number of rows and/or columns of data in a first target matrix of the plurality of decomposition matrices to obtain a second target matrix comprises:
acquiring a first target matrix among the plurality of decomposition matrices, wherein the first target matrix comprises a left singular vector matrix and a right singular vector matrix;
selecting the target number of columns of data from the left singular vector matrix as a first matrix; selecting the target number of rows of data from the right singular vector matrix as a second matrix; and determining the first matrix and the second matrix as the second target matrix.
7. The method of claim 6, wherein constructing an increment matrix using the diagonal matrix and the second target matrix comprises:
and determining a matrix obtained by multiplying the first matrix, the diagonal matrix and the second matrix as the increment matrix.
8. A model fine tuning training apparatus, the apparatus comprising:
the acquisition module is used for acquiring a matrix to be adjusted in the fine-tuning model;
the decomposition determining module is used for decomposing the matrix to be adjusted based on a singular value decomposition algorithm to obtain a plurality of decomposition matrices; determining a target number based on the plurality of decomposition matrices;
a construction module for constructing an incremental matrix based on the target number and the plurality of decomposition matrices;
and the training module is used for adding the increment matrix into the fine-tuning model to obtain a target model, and carrying out fine-tuning training on the target model.
9. An electronic device comprising at least a processor and a memory, the processor being adapted to implement the steps of the model fine tuning training method according to any one of the preceding claims 1-7 when executing a computer program stored in the memory.
10. A computer readable storage medium, characterized in that it stores a computer program which, when executed by a processor, implements the steps of the model fine tuning training method according to any one of the preceding claims 1-7.
CN202310919300.9A 2023-07-26 2023-07-26 Model fine adjustment training method, device, equipment and medium Pending CN116663638A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310919300.9A CN116663638A (en) 2023-07-26 2023-07-26 Model fine adjustment training method, device, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310919300.9A CN116663638A (en) 2023-07-26 2023-07-26 Model fine adjustment training method, device, equipment and medium

Publications (1)

Publication Number Publication Date
CN116663638A true CN116663638A (en) 2023-08-29

Family

ID=87724401

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310919300.9A Pending CN116663638A (en) 2023-07-26 2023-07-26 Model fine adjustment training method, device, equipment and medium

Country Status (1)

Country Link
CN (1) CN116663638A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117350360A (en) * 2023-09-21 2024-01-05 摩尔线程智能科技(北京)有限责任公司 Fine tuning method and device for large model, electronic equipment and storage medium
CN117273067A (en) * 2023-11-20 2023-12-22 上海芯联芯智能科技有限公司 Dialogue response method and device based on large language model
CN117273067B (en) * 2023-11-20 2024-02-02 上海芯联芯智能科技有限公司 Dialogue response method and device based on large language model

Similar Documents

Publication Publication Date Title
CN116663638A (en) Model fine adjustment training method, device, equipment and medium
Yeh et al. Learning deep latent space for multi-label classification
CN109948029B (en) Neural network self-adaptive depth Hash image searching method
WO2022006919A1 (en) Activation fixed-point fitting-based method and system for post-training quantization of convolutional neural network
CN109919183B (en) Image identification method, device and equipment based on small samples and storage medium
CN110706303B (en) Face image generation method based on GANs
CN112508190A (en) Method, device and equipment for processing structured sparse parameters and storage medium
CN112598062A (en) Image identification method and device
CN117994635B (en) Federal element learning image recognition method and system with enhanced noise robustness
CN115358485A (en) Traffic flow prediction method based on graph self-attention mechanism and Hox process
CN110222816B (en) Deep learning model establishing method, image processing method and device
Abbasi Yadkori et al. Near minimax optimal players for the finite-time 3-expert prediction problem
CN114297934A (en) Model parameter parallel simulation optimization method and device based on proxy model
Wang et al. Blind Image Quality Assessment via Adaptive Graph Attention
EP3660742B1 (en) Method and system for generating image data
CN113505838B (en) Image clustering method and device, electronic equipment and storage medium
CN115984025A (en) Influence propagation estimation method and system based on deep learning graph network model
CN113111308B (en) Symbolic regression method and system based on data-driven genetic programming algorithm
CN115238134A (en) Method and apparatus for generating a graph vector representation of a graph data structure
CN114461619A (en) Energy internet multi-source data fusion method and device, terminal and storage medium
CN113747500A (en) High-energy-efficiency low-delay task unloading method based on generation countermeasure network in mobile edge computing environment
CN112215272A (en) Bezier curve-based image classification neural network attack method
CN110110214A (en) Dynamic recommendation and plus method for de-noising based on bidirectional weighting value and user behavior
CN117176734B (en) Method and device for distributing edge computing servers of Internet of things system
CN115019101B (en) Image classification method based on information bottleneck algorithm in image classification network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230829