CN116524259A - Updating method and device of model parameters, testing method and device, and storage medium

Updating method and device of model parameters, testing method and device, and storage medium

Info

Publication number
CN116524259A
Authority
CN
China
Prior art keywords
model
parameter
training
model parameters
parameters
Prior art date
Legal status
Pending
Application number
CN202310457812.8A
Other languages
Chinese (zh)
Inventor
节世博 (Shibo Jie)
邓志鸿 (Zhi-Hong Deng)
Current Assignee
Peking University
Original Assignee
Peking University
Priority date
Filing date
Publication date
Application filed by Peking University
Priority to CN202310457812.8A
Publication of CN116524259A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G06V10/765: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects using rules for classification or partitioning the feature space
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/0499: Feedforward networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a method and device for updating model parameters, a testing method, a device, and a storage medium. The method comprises the following steps: acquiring training data and a pre-trained recognition model for multimedia data, wherein the recognition model comprises first model parameters; processing the recognition model and re-parameterizing the first model parameters into a tensor to obtain second model parameters, wherein the second model parameters comprise a parameter increment; training the parameter increment according to the training data and the recognition model to obtain adjustment values for the parameter increment; and loading the adjustment values into the second model parameters to obtain updated model parameters. With this method, only part of the parameters in the recognition model are updated, so that when the pre-trained recognition model performs downstream tasks, only the updated model parameters need to be stored for each downstream task, while the model parameters that make up the bulk of the model are shared among all downstream tasks. This greatly reduces storage overhead and has broad practical significance and application value.

Description

Updating method and device of model parameters, testing method and device, and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular, to a method and apparatus for updating model parameters, a test method, a device, and a storage medium.
Background
At present, a general paradigm in the fields of computer vision and natural language processing is to use the labeled data of a downstream task to fine-tune a large-scale pre-trained model, so that the fine-tuned model achieves excellent performance on that downstream task. This paradigm can be applied to downstream tasks in many fields, including image classification, object detection, semantic segmentation, natural language understanding, natural language generation, and so on. Since this paradigm updates all parameters in the pre-trained model, it requires the user to store, for each downstream task, a task-specific fine-tuned model that is the same size as the pre-trained model.
In recent years, the size of commonly used pre-trained models has kept increasing, and the need under this paradigm to store multiple task-specific fine-tuned models in a multi-task scenario leads to a significant increase in storage overhead. For example, for an OPT model containing 175 billion parameters, more than 300 GB of storage overhead is incurred for each downstream task under the above paradigm. This drawback severely limits the application of large models in multi-task scenarios.
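As a rough check on that figure (assuming the model parameters are stored at 16-bit precision; the precision is an assumption made here for illustration, not a detail stated above):

    1.75 × 10^11 parameters × 2 bytes per parameter = 3.5 × 10^11 bytes ≈ 326 GiB,

which is consistent with the stated overhead of more than 300 GB per task-specific fine-tuned copy.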
Disclosure of Invention
In view of the foregoing, it is necessary to provide a method and device for updating model parameters, a testing method, a device, and a storage medium, so as to solve the problem in the prior art that the ever-increasing size of commonly used pre-trained models significantly increases storage overhead and severely limits the application of large models in multi-task scenarios.
In a first aspect, the present application provides a method for updating model parameters. The method comprises the following steps:
acquiring training data and a pre-trained recognition model for multimedia data; wherein the recognition model comprises first model parameters;
processing the recognition model, and re-parameterizing the first model parameters into a tensor to obtain second model parameters; wherein the second model parameters comprise a parameter increment;
training the parameter increment according to the training data and the recognition model to obtain adjustment values for the parameter increment;
and loading the adjustment values into the second model parameters to obtain updated model parameters.
In one embodiment, processing the recognition model and re-parameterizing the first model parameters into a tensor to obtain the second model parameters includes:
dividing the weight matrices in the recognition model into a plurality of sub-matrices of the same size;
stacking all the sub-matrices into a tensor of preset dimensions to obtain a tensor representation of the first model parameters;
and denoting the tensor representation of the first model parameters as the second model parameters.
In one embodiment, training the parameter increment according to the training data and the recognition model to obtain the adjustment values of the parameter increment includes:
decomposing the parameter increment to obtain a plurality of factors;
and training each factor according to the training data and the recognition model to obtain the adjustment values of the parameter increment.
In one embodiment, training each factor according to the training data and the recognition model to obtain the adjustment values of the parameter increment includes:
reconstructing the parameter increment from the factors, and loading the reconstructed parameter increment onto the second model parameters to obtain an intermediate recognition model;
inputting the training data into the intermediate recognition model, training the intermediate recognition model under the constraint of a loss function, and updating each factor during training until a preset convergence condition is reached;
and taking the values of the factors when the preset convergence condition is reached as the adjustment values of the parameter increment.
In one embodiment, the recognition model includes a plurality of sub-modules, each of which includes sub-model parameters;
training the intermediate recognition model under the constraint of the loss function, updating each factor during training until the preset convergence condition is reached, and taking the values of the factors at convergence as the adjustment values of the parameter increment includes:
updating each factor and each sub-model parameter while training the intermediate recognition model under the constraint of the loss function, until the preset convergence condition is reached;
and taking the values of the factors and the sub-model parameters when the preset convergence condition is reached as the adjustment values of the parameter increment.
In a second aspect, the present application provides a testing method. The method comprises the following steps:
obtaining updated model parameters by the method for updating model parameters of any embodiment of the first aspect;
and testing the multimedia data to be recognized with the recognition model carrying the updated model parameters to obtain a test result.
In a third aspect, the present application further provides a device for updating model parameters. The device comprises:
an acquisition module, configured to acquire training data and a pre-trained recognition model for multimedia data; wherein the recognition model comprises first model parameters;
a processing module, configured to process the recognition model and re-parameterize the first model parameters into a tensor to obtain second model parameters;
a training module, configured to train the parameter increment according to the training data and the recognition model to obtain adjustment values for the parameter increment;
and a loading module, configured to load the adjustment values into the second model parameters to obtain updated model parameters.
In a fourth aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor which, when executing the computer program, implements the method steps of any embodiment of the first aspect and the method steps of the second aspect.
In a fifth aspect, the present application also provides a computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the method steps of any embodiment of the first aspect and the method steps of the second aspect.
In a sixth aspect, the present application also provides a computer program product. The computer program product comprises a computer program which, when executed by a processor, implements the method steps of any embodiment of the first aspect and the method steps of the second aspect.
The method and device for updating model parameters, the testing method, the device, and the storage medium described above have at least the following advantages:
training data and a pre-trained recognition model are acquired, and the first model parameters in the recognition model are re-parameterized into a tensor to obtain second model parameters, wherein the second model parameters comprise a parameter increment; the parameter increment is trained according to the training data and the recognition model to obtain adjustment values for the parameter increment; and the adjustment values are loaded into the second model parameters to obtain updated model parameters. Because only part of the parameters in the recognition model are updated, when the pre-trained recognition model performs downstream tasks, only the updated model parameters need to be stored for each downstream task, while the model parameters that make up the bulk of the model are shared among all downstream tasks. This greatly reduces storage overhead and has broad practical significance and application value.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is an application environment diagram of a method of updating model parameters in one embodiment;
FIG. 2 is a flow diagram of a method of updating model parameters in one embodiment;
FIG. 3 is a flowchart illustrating steps for obtaining second model parameters in one embodiment;
FIG. 4 is a flowchart illustrating steps for obtaining an adjustment value of a parameter increment according to one embodiment;
FIG. 5 is a flowchart illustrating steps for obtaining an adjustment value of a parameter increment according to another embodiment;
FIG. 6 is a block diagram of an embodiment of a device for updating model parameters;
FIG. 7 is an internal structure diagram of a computer device in one embodiment.
Detailed Description
Other advantages and effects of the present invention will become apparent to those skilled in the art from the following disclosure, which describes embodiments of the present invention with reference to specific examples. The invention may also be practiced or carried out in other embodiments, and the details in this description may be modified or varied in various respects without departing from the spirit and scope of the present invention. It should be noted that, in the absence of conflict, the following embodiments and the features in the embodiments may be combined with each other.
Some exemplary embodiments of the invention have been described for illustrative purposes; it should be understood that the invention may be practiced otherwise than as specifically described or shown in the accompanying drawings.
The method for updating model parameters provided by the embodiments of the present application can be applied to the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 via a network. The data storage system may store the data that the server 104 needs to process; it may be integrated on the server 104 or placed on a cloud or other network server.
The terminal 102 may acquire the training data and the pre-trained recognition model and send them to the server 104 for processing. For example, the server 104 processes the first model parameters of the recognition model and re-parameterizes them into a tensor to obtain second model parameters; trains the parameter increment in the second model parameters according to the training data and the recognition model to obtain adjustment values; and loads the adjustment values into the second model parameters to obtain updated model parameters. The server 104 then feeds the updated model parameters back to the terminal 102.
In the method for updating model parameters described above, training data and a pre-trained recognition model are acquired, and the first model parameters in the recognition model are re-parameterized into a tensor to obtain second model parameters; the parameter increment is trained according to the training data and the recognition model to obtain adjustment values; and the adjustment values are loaded into the second model parameters to obtain updated model parameters. Because only part of the parameters in the recognition model are updated, the pre-trained recognition model only needs to store the updated model parameters for each downstream task when performing downstream tasks, while the model parameters that make up the bulk of the model are shared among all downstream tasks, which greatly reduces storage overhead and has broad practical significance and application value.
The terminal 102 may be, but is not limited to, a personal computer, a notebook computer, a smart phone, a tablet computer, an Internet-of-Things device, or a portable wearable device; the Internet-of-Things device may be a smart speaker, a smart television, a smart air conditioner, a smart vehicle-mounted device, or the like, and the portable wearable device may be a smart watch, a smart bracelet, a headset, or the like. The server 104 may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
In a possible embodiment, the present application provides a method for updating model parameters, which is described below by taking its application to the server in FIG. 1 as an example.
Referring to FIG. 2, FIG. 2 is a flow chart of the method for updating model parameters according to this embodiment, which specifically includes the following steps:
Step S202: acquiring training data and a pre-trained recognition model for multimedia data; wherein the recognition model comprises first model parameters.
Step S204: processing the recognition model, and re-parameterizing the first model parameters into a tensor to obtain second model parameters; wherein the second model parameters comprise a parameter increment.
Step S206: training the parameter increment according to the training data and the recognition model to obtain adjustment values for the parameter increment.
Step S208: loading the adjustment values into the second model parameters to obtain updated model parameters.
Specifically, the training data comprises pre-collected raw multimedia data together with its labels, and can be randomly divided into a training set and a test set according to a preset ratio. The training set is used to train the recognition model; the training step includes, for example: inputting the training set into the initial recognition model to obtain predicted values, computing the loss between the predicted values and the labeled values with a loss function, correcting the parameters of the initial recognition model according to the loss value, stopping the correction process when the loss value satisfies a preset condition, and selecting the parameters with the minimum loss value as the parameters of the trained recognition model. The test set is used to verify the recognition accuracy of the trained model: for example, the test set is input into the trained recognition model, and if the output predictions reach a preset accuracy, the verification passes. The multimedia data includes picture data and text data; in one possible example, it may also include video data and audio data. When processing video data, the video may first be saved as a number of pictures, which are then labeled; when processing audio data, the audio may be transcribed into text data, which is then labeled.
The pre-trained recognition model in this embodiment is a neural network trained on raw multimedia data and labeled multimedia data, where the multimedia data used for pre-training may be the same as or different from the training data. In addition, the neural network typically has a multi-layer structure, for example a multi-head self-attention module followed by a feed-forward neural network module, or a multi-head self-attention module, a multi-head cross-attention module, and a feed-forward neural network module, and so on.
In the method for updating model parameters described above, training data and a pre-trained recognition model are acquired, and the first model parameters in the recognition model are re-parameterized into a tensor to obtain second model parameters, wherein the second model parameters comprise a parameter increment; the parameter increment is trained according to the training data and the recognition model to obtain adjustment values; and the adjustment values are loaded into the second model parameters to obtain updated model parameters. By converting the first model parameters of the recognition model into tensor form, only part of the parameters need to be updated, so that when the pre-trained recognition model performs downstream tasks, only the updated model parameters need to be stored for each downstream task, while the model parameters that make up the bulk of the model are shared among all downstream tasks, which greatly reduces storage overhead and has broad practical significance and application value.
Referring to FIG. 3, optionally, processing the recognition model and re-parameterizing the first model parameters into a tensor to obtain the second model parameters includes:
Step S302: dividing the weight matrices in the recognition model into a plurality of sub-matrices of the same size;
Step S304: stacking all the sub-matrices into a tensor of preset dimensions to obtain a tensor representation of the first model parameters;
Step S306: denoting the tensor representation of the first model parameters as the second model parameters.
Specifically, a tensor is a multilinear map defined on the Cartesian product of vector spaces and their dual spaces; for example, a scalar can be regarded as a 0th-order tensor, a vector as a 1st-order tensor, and a matrix as a 2nd-order tensor.
All weight matrices in the multi-head self-attention modules and feed-forward neural network modules of the pre-trained recognition model are split and spliced into a number of sub-matrices of the same size, and these matrices are then stacked into a tensor of preset dimensions to obtain the tensor representation of the first model parameters. The tensor representation of the first model parameters consists of a base tensor and a parameter increment; for convenience of description, this tensor representation is denoted the second model parameters in this embodiment. The base tensor holds the model parameters that can be shared across downstream tasks; the parameter increment holds the model parameters to be updated and is typically a tensor of exactly the same shape as the base tensor, initialized to zero.
For example, the 4d × d and d × 4d fully connected weight matrices of each feed-forward neural network module are split into four d × d matrices each, and these are then stacked together with the d × d transformation matrices of the multi-head self-attention modules and multi-head cross-attention modules. The resulting k matrices of size d × d are stacked to form a single tensor of shape k × d × d, i.e., the tensor representation of the recognition model.
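As an illustration of this splitting and stacking step, the following is a minimal PyTorch sketch; the hidden size d, the random stand-in weights, and all variable names are assumptions for illustration rather than details taken from this application:

    import torch

    d = 8  # hidden size; a real model might use 768 (assumed here for illustration)

    # Toy stand-ins for pre-trained weights: attention projections are d x d,
    # and the feed-forward layers are 4d x d (up) and d x 4d (down).
    w_q, w_k, w_v, w_o = (torch.randn(d, d) for _ in range(4))
    w_up = torch.randn(4 * d, d)
    w_down = torch.randn(d, 4 * d)

    blocks = [w_q, w_k, w_v, w_o]
    blocks += list(w_up.split(d, dim=0))    # four d x d sub-matrices
    blocks += list(w_down.split(d, dim=1))  # four d x d sub-matrices

    # Stack the k sub-matrices into a single k x d x d tensor: the tensor
    # representation of the first model parameters (the base tensor).
    base_tensor = torch.stack(blocks, dim=0)
    print(base_tensor.shape)  # torch.Size([12, 8, 8]) -> k = 12 here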
Optionally, in the process of re-parameterizing the first model parameters into a tensor, only part of the parameters may be selected for processing as needed; that is, only part of the weight matrices in the recognition model are split and spliced into sub-matrices of the same size and then stacked into a single tensor, which reduces processing complexity.
Referring to FIG. 4, optionally, training the parameter increment according to the training data and the recognition model to obtain the adjustment values of the parameter increment includes:
Step S402: decomposing the parameter increment to obtain a plurality of factors;
Step S404: training each factor according to the training data and the recognition model to obtain the adjustment values of the parameter increment.
Specifically, in this embodiment a tensor decomposition method is used to decompose the parameter increment into a plurality of factors, each of which characterizes information hidden in the parameter increment. In multilinear algebra, tensor decomposition can be regarded as the generalization of matrix singular value decomposition to tensors, and has been applied in statistics, signal processing, computer vision, numerical analysis, and data mining. The main tensor decomposition methods include CP decomposition, Tucker decomposition, t-SVD decomposition, Tensor-Train decomposition, and so on; this embodiment is explained using the Tensor-Train and Tucker decomposition methods. Tensor-Train decomposition expresses an Nth-order tensor as a contracted product of N second- or third-order tensors. Tucker decomposition decomposes a tensor into a set of matrices, each representing a basis along one mode, and a small core tensor that links the bases of the different modes, so that an approximation of the original data can be reconstructed from the per-mode matrices and the core tensor.
For example, the tensor representation of the first model parameters is W = W0 + ΔW, comprising the base tensor W0 and the parameter increment ΔW, where W, W0, and ΔW all have shape k × d × d.
For the Tensor-Train decomposition method, the parameter increment ΔW can be decomposed into the factors Σ (of shape k × r × r), U, and V (each of shape d × r), with
ΔW(i, j, l) = s · sum_{a,b=1..r} Σ(i, a, b) · U(j, a) · V(l, b).
For the Tucker decomposition method, the parameter increment ΔW can be decomposed into the core factor C (of shape r × r × r) and the factors P (of shape k × r), U, and V (each of shape d × r), with
ΔW(i, j, l) = s · sum_{p,a,b} C(p, a, b) · P(i, p) · U(j, a) · V(l, b).
In the expressions for the parameter increment ΔW, s and r are adjustable hyperparameters (a scaling coefficient and a rank; in general, the rank of each mode may be chosen independently); k and d are the dimensions of the base tensor; and i, j, and l are index values of the tensor elements.
For the Tensor-Train decomposition method, the factor V is zero-initialized, while the factors Σ and U are randomly initialized.
For the Tucker decomposition method, the factor V is zero-initialized, while the factors C, U, and P are randomly initialized.
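The following sketch sets up trainable factors matching the two decompositions above and reconstructs the parameter increment from them. The shapes follow the expressions above; the rank r, the initialization scale, and all names are illustrative assumptions:

    import torch
    import torch.nn as nn

    k, d, r = 12, 8, 4  # k and d from the base tensor; r is the adjustable rank
    s = 1.0             # adjustable scaling hyperparameter

    # Tensor-Train-style factors: V is zero-initialized, Sigma and U are random,
    # so the reconstructed increment starts out exactly zero.
    tt_sigma = nn.Parameter(torch.randn(k, r, r) * 0.02)
    tt_u = nn.Parameter(torch.randn(d, r) * 0.02)
    tt_v = nn.Parameter(torch.zeros(d, r))

    def delta_tt():
        # delta_w(i, j, l) = s * sum_{a,b} sigma(i, a, b) * u(j, a) * v(l, b)
        return s * torch.einsum('iab,ja,lb->ijl', tt_sigma, tt_u, tt_v)

    # Tucker-style factors: V is zero-initialized; C, U, and P are random.
    tk_c = nn.Parameter(torch.randn(r, r, r) * 0.02)
    tk_p = nn.Parameter(torch.randn(k, r) * 0.02)
    tk_u = nn.Parameter(torch.randn(d, r) * 0.02)
    tk_v = nn.Parameter(torch.zeros(d, r))

    def delta_tucker():
        # delta_w(i, j, l) = s * sum_{p,a,b} c(p, a, b) * P(i, p) * u(j, a) * v(l, b)
        return s * torch.einsum('pab,ip,ja,lb->ijl', tk_c, tk_p, tk_u, tk_v)

    print(delta_tt().abs().sum())  # tensor(0.) before any training

Because V starts at zero, both reconstructions are exactly zero at initialization, so loading the increment leaves the pre-trained model's behavior unchanged until training begins.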
In the above embodiment, the first model parameters are re-parameterized into a tensor and the parameter increment is decomposed into a plurality of factors, so that after training only the lightweight factors need to be stored rather than the whole model (for the Tensor-Train form, for example, the factors hold k·r² + 2·d·r values, compared with k·d² values for the full parameter increment), which greatly reduces storage overhead.
Referring to FIG. 5, optionally, training each factor according to the training data and the recognition model to obtain the adjustment values of the parameter increment includes:
Step S502: reconstructing the parameter increment from the factors, and loading the reconstructed parameter increment onto the second model parameters to obtain an intermediate recognition model;
Step S504: inputting the training data into the intermediate recognition model, training the intermediate recognition model under the constraint of a loss function, and updating each factor during training until a preset convergence condition is reached;
Step S506: taking the values of the factors when the preset convergence condition is reached as the adjustment values of the parameter increment.
Specifically, the tensor representation of the first model parameters comprises the base tensor and the parameter increment. To obtain the overall gradient information during training of the recognition model, in this embodiment the factors obtained by the decomposition in the above steps are reconstructed into the parameter increment ΔW, the reconstructed parameter increment is loaded onto the second model parameters to replace the initial parameter increment, and the reconstructed parameter increment together with the base tensor forms the intermediate recognition model.
The training data comprises pre-collected raw multimedia data and its labels, and can be randomly divided into a training set and a test set according to a preset ratio. The training set is input into the intermediate recognition model to obtain predicted values, and a loss function is used to compute the loss between the predicted values and the labeled values. During training, the values of the parameters in the base tensor are locked, the values of the factors of the parameter increment are corrected according to the loss value, the correction process stops when the preset convergence condition is reached, and the values of the factors at that point are selected as the adjustment values of the parameter increment. Here the factors of the parameter increment refer to Σ, U, and V for the Tensor-Train decomposition method, or C, U, V, and P for the Tucker decomposition method. Further, the test set may also be input into the recognition model constructed from the adjustment values to verify its accuracy.
Optionally, the preset convergence condition may be judged as follows: if the loss value is smaller than a preset threshold, or the number of training iterations reaches a preset count, the preset convergence condition is considered to be met.
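A minimal sketch of this training procedure follows, continuing the factor sketch above and assuming model is the intermediate recognition model whose forward pass adds the reconstructed increment to the frozen base tensor; the optimizer, learning rate, and convergence constants are illustrative assumptions:

    import torch

    def train_increment(model, factors, train_loader, loss_fn,
                        max_steps=1000, loss_threshold=1e-3):
        # Lock the base tensor: only the decomposition factors receive gradients.
        for p in model.parameters():
            p.requires_grad_(False)
        optimizer = torch.optim.AdamW(factors, lr=1e-3)

        step = 0
        while step < max_steps:
            for inputs, labels in train_loader:
                preds = model(inputs)          # forward pass uses base + increment
                loss = loss_fn(preds, labels)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
                step += 1
                # Preset convergence condition: loss below a threshold,
                # or the training-step budget exhausted.
                if loss.item() < loss_threshold or step >= max_steps:
                    # The factor values at convergence are the adjustment values.
                    return [f.detach().clone() for f in factors]
        return [f.detach().clone() for f in factors]

When the recognition model also carries task-specific sub-modules such as a classification head (discussed below), their parameters would simply be appended to the optimized list alongside the factors.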
In the above embodiment, a tensor decomposition method is used to decompose the parameter increment into a plurality of factors characterizing the information hidden within it, and each factor is trained on the training data and the recognition model, so that a globally optimal solution for the parameters can be obtained and, in turn, a recognition model with higher recognition accuracy.
Optionally, the recognition model comprises a plurality of sub-modules, each sub-module comprising sub-model parameters;
training the intermediate recognition model under the constraint of the loss function, updating each factor during training until the preset convergence condition is reached, and taking the values of the factors at convergence as the adjustment values of the parameter increment includes:
updating each factor and each sub-model parameter while training the intermediate recognition model under the constraint of the loss function, until the preset convergence condition is reached;
and taking the values of the factors and the sub-model parameters when the preset convergence condition is reached as the adjustment values of the parameter increment.
Specifically, the structure of the recognition model differs across recognition task scenarios; for example, the recognition model may further comprise sub-modules such as a classification head, a detection head, or a segmentation decoder, added according to the classification, detection, or segmentation task at hand. During training, the sub-model parameters of each sub-module are adjusted together with the factors so that the recognition model meets the training requirements. After training, the adjustment values of the parameter increment are stored, including the updated factors and the adjusted sub-modules such as the classification head, detection head, or segmentation decoder. When a new test task is received, the stored adjustment values are loaded into the second model parameters to replace the initial parameter increment, yielding the updated model parameters, and the recognition model with the updated model parameters is used to test the multimedia data to be recognized and obtain a test result.
It will be appreciated that the second model parameters are a tensor representation of the first model parameters; the second model parameters therefore correspond one-to-one to the first model parameters, and adjusting the second model parameters during training is equivalent to adjusting the corresponding portion of the first model parameters. The updated model parameters may be kept in tensor form, or the second model parameters may be restored to the layout of the first model parameters.
For example, for the Tensor-Train decomposition method, the expression for the updated model parameters W is:
W(i, j, l) = W0(i, j, l) + s · sum_{a,b=1..r} Σ(i, a, b) · U(j, a) · V(l, b).
For the Tucker decomposition method, the expression for the updated model parameters W is:
W(i, j, l) = W0(i, j, l) + s · sum_{p,a,b} C(p, a, b) · P(i, p) · U(j, a) · V(l, b).
when the specific task is tested, the identification model with updated model parameters is adopted to complete the testing task.
To enable those skilled in the art to fully understand the present application, the steps of the method for updating model parameters of the present application are described in detail below:
Training data and a pre-trained recognition model are acquired. The training data comprises pre-collected raw multimedia data and its labels, and can be randomly divided into a training set and a test set according to a preset ratio; the training set is used to train the recognition model, and the test set is used to verify the recognition accuracy of the trained model. The pre-trained recognition model is a multi-layer neural network trained on the raw multimedia data and the labeled multimedia data.
Further, the recognition model is processed: the weight matrices in the recognition model are divided into a plurality of sub-matrices of the same size, and all the sub-matrices are stacked into a tensor of preset dimensions to obtain the tensor representation of the first model parameters, which is denoted the second model parameters. The second model parameters comprise a base tensor and a parameter increment.
Further, a tensor decomposition method is used to decompose the parameter increment into a plurality of factors; the parameter increment is reconstructed from these factors, and the reconstructed parameter increment is loaded onto the second model parameters to replace the initial parameter increment, forming the intermediate recognition model together with the base tensor. The training data is input into the intermediate recognition model, which is trained under the constraint of the loss function, and each factor is updated during training until the preset convergence condition is reached; the values of the factors at that point are taken as the adjustment values of the parameter increment.
Further, when the recognition model comprises a plurality of sub-modules, each containing sub-model parameters, each factor and each sub-model parameter are updated while the intermediate recognition model is trained under the constraint of the loss function until the preset convergence condition is reached, and the values of the factors and the sub-model parameters at that point are taken as the adjustment values of the parameter increment.
The obtained adjustment values of the parameter increment are loaded into the second model parameters to obtain the updated model parameters.
When a new test task is received, the multimedia data to be tested is input into the recognition model with the updated model parameters, and the test can then be performed.
In the method for updating model parameters described above, the first model parameters of the recognition model are processed into a tensor, and the parameter increment of the tensor is decomposed to obtain a plurality of factors; each factor is trained according to the training data and the recognition model, and the values of the factors when the preset convergence condition is reached are taken as the adjustment values of the parameter increment. Because only part of the parameters in the recognition model are updated, when the pre-trained recognition model performs downstream tasks, only the updated model parameters need to be stored for each downstream task, while the model parameters that make up the bulk of the model are shared among all downstream tasks, which greatly reduces storage overhead and has broad practical significance and application value.
It should be understood that, although the steps in the flowcharts related to the embodiments described above are sequentially shown as indicated by arrows, these steps are not necessarily sequentially performed in the order indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of steps or a plurality of stages, which are not necessarily performed at the same time, but may be performed at different times, and the order of the steps or stages is not necessarily performed sequentially, but may be performed alternately or alternately with at least some of the other steps or stages.
Based on the same inventive concept, an embodiment of the present application also provides a device for updating model parameters that implements the method for updating model parameters described above. The implementation of the solution provided by the device is similar to that described for the method, so for the specific limitations of the one or more embodiments of the device provided below, reference may be made to the limitations of the method for updating model parameters above, which are not repeated here.
Referring to FIG. 6, in a possible embodiment, an embodiment of the present application provides a device for updating model parameters, comprising an acquisition module, a processing module, a training module, and a loading module, wherein:
the acquisition module is used to acquire training data and a pre-trained recognition model for multimedia data, wherein the recognition model comprises first model parameters;
the processing module is used to process the recognition model and re-parameterize the first model parameters into a tensor to obtain second model parameters;
the training module is used to train the parameter increment according to the training data and the recognition model to obtain adjustment values for the parameter increment;
and the loading module is used to load the adjustment values into the second model parameters to obtain updated model parameters.
Optionally, the processing module processing the recognition model and re-parameterizing the first model parameters into a tensor to obtain the second model parameters includes:
step one, dividing the weight matrices in the recognition model into a plurality of sub-matrices of the same size;
step two, stacking all the sub-matrices into a tensor of preset dimensions to obtain a tensor representation of the first model parameters;
and step three, denoting the tensor representation of the first model parameters as the second model parameters.
Optionally, the training module training the parameter increment according to the training data and the recognition model to obtain the adjustment values of the parameter increment includes:
decomposing the parameter increment to obtain a plurality of factors;
and training each factor according to the training data and the recognition model to obtain the adjustment values of the parameter increment.
Optionally, the training module training each factor according to the training data and the recognition model to obtain the adjustment values of the parameter increment includes:
step one, reconstructing the parameter increment from the factors, and loading the reconstructed parameter increment onto the second model parameters to obtain an intermediate recognition model;
step two, inputting the training data into the intermediate recognition model, training the intermediate recognition model under the constraint of a loss function, and updating each factor during training until a preset convergence condition is reached;
and step three, taking the values of the factors when the preset convergence condition is reached as the adjustment values of the parameter increment.
Optionally, when the recognition model comprises a plurality of sub-modules, each containing sub-model parameters, the training module training the intermediate recognition model under the constraint of the loss function, updating each factor during training until the preset convergence condition is reached, and taking the values of the factors at convergence as the adjustment values of the parameter increment includes:
updating each factor and each sub-model parameter while training the intermediate recognition model under the constraint of the loss function, until the preset convergence condition is reached;
and taking the values of the factors and the sub-model parameters when the preset convergence condition is reached as the adjustment values of the parameter increment. After training, the adjustment values of the parameter increment are stored, including the updated factors and the adjusted sub-modules such as the classification head, detection head, or segmentation decoder. When a new test task is received, the stored adjustment values are loaded into the second model parameters to replace the initial parameter increment, yielding the updated model parameters, and the recognition model with the updated model parameters is used to test the multimedia data to be recognized and obtain a test result.
The device for updating model parameters described above processes the first model parameters of the recognition model into a tensor and decomposes the parameter increment of the tensor to obtain a plurality of factors; each factor is trained according to the training data and the recognition model, and the values of the factors when the preset convergence condition is reached are taken as the adjustment values of the parameter increment. Because only part of the parameters in the recognition model are updated, when the pre-trained recognition model performs downstream tasks, only the updated model parameters need to be stored for each downstream task, while the model parameters that make up the bulk of the model are shared among all downstream tasks, which greatly reduces storage overhead and has broad practical significance and application value.
Each of the modules in the above device for updating model parameters may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke and execute the operations corresponding to each module.
In a possible embodiment, an embodiment of the present application further provides a testing method, which includes:
obtaining updated model parameters by the method for updating model parameters of any one of the above embodiments; and testing the multimedia data to be recognized with the recognition model carrying the updated model parameters to obtain a test result.
Specifically, the updated model parameters are derived from the training data and the pre-trained recognition model. It should be understood that the type of multimedia data used to train the model differs with the test task: for a text recognition test task, for example, the multimedia data used for training is text data, while for an image recognition test task it is picture data. After a new test task is received, the pre-stored adjustment values of the parameter increment are retrieved and loaded onto the model parameters of the pre-trained recognition model to obtain the updated model parameters, and the multimedia data to be recognized is then tested to obtain a test result, so that the recognition model is shared across different tasks.
In addition, the updated model parameters may be kept in tensor form, or the tensor representation may be restored to the initial format; the specific format is set according to the test requirements.
In the testing method described above, only part of the parameters in the pre-trained recognition model are updated before testing, so that when the pre-trained recognition model performs downstream tasks, only the updated model parameters are stored for each downstream task, while the model parameters that make up the bulk of the model are shared among all downstream tasks, which greatly reduces storage overhead and has broad practical significance and application value.
In one possible embodiment, a computer device is provided, which may be a terminal, and the internal structure thereof may be as shown in fig. 7. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input means. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface, the display unit and the input device are connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program, when executed by a processor, implements a method of updating or testing model parameters. The display unit of the computer device is used for forming a visual picture, and can be a display screen, a projection device or a virtual reality imaging device. The display screen can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be a key, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those skilled in the art that the structure shown in fig. 7 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In a possible embodiment, a computer device is provided, comprising a memory and a processor, the memory having stored therein a computer program, which when executed by the processor implements the method steps of a method for updating or testing model parameters disclosed in the above-mentioned embodiments.
In a possible embodiment, a computer readable storage medium is provided, on which a computer program is stored, which computer program, when being executed by a processor, implements the method steps of a model parameter updating method or a testing method disclosed in the above embodiments.
In a possible embodiment, a computer program product is provided, comprising a computer program which, when being executed by a processor, implements the method steps of a model parameter updating method or testing method disclosed in the above-mentioned embodiments.
Those skilled in the art will appreciate that all or part of the flows of the above-described method embodiments may be implemented by a computer program stored on a non-volatile computer-readable storage medium, which, when executed, may include the flows of the above method embodiments. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. The volatile memory may include random access memory (RAM), external cache memory, or the like. By way of illustration and not limitation, RAM is available in a variety of forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of relational and non-relational databases; non-relational databases may include, but are not limited to, blockchain-based distributed databases and the like. The processors referred to in the embodiments provided herein may be, without limitation, general-purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum-computing-based data processing logic units, and the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments have been described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only a few implementations of the present application, and although they are described in relative detail, they should not be construed as limiting the scope of the application. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present application, and these all fall within the protection scope of the present application. Accordingly, the protection scope of the present application shall be subject to the appended claims.

Claims (10)

1. A method for updating model parameters, the method comprising:
acquiring training data and a pre-trained recognition model for multimedia data; wherein the recognition model comprises first model parameters;
processing the recognition model, and re-parameterizing the first model parameters into a tensor to obtain second model parameters; wherein the second model parameters comprise a parameter increment;
training the parameter increment according to the training data and the recognition model to obtain adjustment values for the parameter increment;
and loading the adjustment values into the second model parameters to obtain updated model parameters.
2. The method of claim 1, wherein processing the recognition model and re-parameterizing the first model parameters into a tensor to obtain the second model parameters comprises:
dividing the weight matrices in the recognition model into a plurality of sub-matrices of the same size;
stacking all the sub-matrices into a tensor of preset dimensions to obtain a tensor representation of the first model parameters;
and denoting the tensor representation of the first model parameters as the second model parameters.
3. The method of claim 1, wherein training the parameter increment according to the training data and the recognition model to obtain the adjustment values of the parameter increment comprises:
decomposing the parameter increment to obtain a plurality of factors;
and training each factor according to the training data and the recognition model to obtain the adjustment values of the parameter increment.
4. The method of claim 3, wherein training each of the factors according to the training data and the recognition model to obtain the adjustment values of the parameter increment comprises:
reconstructing the parameter increment from the factors, and loading the reconstructed parameter increment onto the second model parameters to obtain an intermediate recognition model;
inputting the training data into the intermediate recognition model, training the intermediate recognition model under the constraint of a loss function, and updating each factor during training until a preset convergence condition is reached;
and taking the values of the factors when the preset convergence condition is reached as the adjustment values of the parameter increment.
5. The method of claim 4, wherein the recognition model comprises a plurality of sub-modules, each sub-module comprising sub-model parameters;
training the intermediate recognition model under the constraint of the loss function, updating each factor during training until the preset convergence condition is reached, and taking the values of the factors at convergence as the adjustment values of the parameter increment comprises:
updating each factor and each sub-model parameter while training the intermediate recognition model under the constraint of the loss function, until the preset convergence condition is reached;
and taking the values of the factors and the sub-model parameters when the preset convergence condition is reached as the adjustment values of the parameter increment.
6. A testing method, the method comprising:
obtaining updated model parameters by the method for updating model parameters according to any one of claims 1 to 5;
and testing the multimedia data to be recognized with the recognition model carrying the updated model parameters to obtain a test result.
7. A device for updating model parameters, the device comprising:
an acquisition module, configured to acquire training data and a pre-trained recognition model for multimedia data; wherein the recognition model comprises first model parameters;
a processing module, configured to process the recognition model and re-parameterize the first model parameters into a tensor to obtain second model parameters;
a training module, configured to train the parameter increment according to the training data and the recognition model to obtain adjustment values for the parameter increment;
and a loading module, configured to load the adjustment values into the second model parameters to obtain updated model parameters.
8. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the method of any one of claims 1 to 5 or the steps of the method of claim 6.
9. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 5 or the steps of the method of claim 6.
10. A computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 5 or the steps of the method of claim 6.
CN202310457812.8A 2023-04-25 2023-04-25 Updating method and device of model parameters, testing method and device, and storage medium Pending CN116524259A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310457812.8A CN116524259A (en) 2023-04-25 2023-04-25 Updating method and device of model parameters, testing method and device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310457812.8A CN116524259A (en) 2023-04-25 2023-04-25 Updating method and device of model parameters, testing method and device, and storage medium

Publications (1)

Publication Number Publication Date
CN116524259A true CN116524259A (en) 2023-08-01

Family

ID=87405901

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310457812.8A Pending CN116524259A (en) 2023-04-25 2023-04-25 Updating method and device of model parameters, testing method and device, and storage medium

Country Status (1)

Country Link
CN (1) CN116524259A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination