CN117217292A - Model training method and device - Google Patents


Info

Publication number
CN117217292A
Authority
CN
China
Prior art keywords
prompt
parameter set
model
training
hint
Prior art date
Legal status
Pending
Application number
CN202210604357.5A
Other languages
Chinese (zh)
Inventor
史佳欣
朴世豪
田奇
Current Assignee
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Cloud Computing Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Cloud Computing Technologies Co Ltd
Priority to CN202210604357.5A
Priority to PCT/CN2023/077318 (WO2023231458A1)
Publication of CN117217292A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/0895 Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/096 Transfer learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

A model training method and apparatus are provided. In the method, a training device selects a first prompt parameter set from a prompt parameter pool of an AI platform; obtains an initialized task model based on the first prompt parameter set, where the task model includes a prompt layer and a base model, the first prompt parameter set is used to initialize the prompt layer, and the base model is a pre-trained AI model deployed on the AI platform; and trains the initialized task model on a training data set to obtain a trained task model. Because training starts from a task model initialized with the first prompt parameter set, training efficiency can be improved and computation overhead reduced.

Description

Model training method and device
Technical Field
This application relates to the field of artificial intelligence technologies, and in particular, to a model training method and device.
Background
Training a neural network model typically requires algorithm engineers with specialized knowledge to label samples and then to train the model for a long time on powerful graphics processing units (GPUs). Transfer learning is a learning paradigm in which a model learned in an old domain is applied to a new domain by exploiting the similarity among data, tasks, or models; its core is finding the similarity between the new problem and the old one so that knowledge can be transferred smoothly. Deep learning enables a machine to acquire knowledge from data autonomously and apply it to new problems. Currently, improving model training efficiency in transfer learning is a problem that urgently needs to be solved.
Disclosure of Invention
This application provides a model training method and device for improving model training efficiency.
In a first aspect, an embodiment of this application provides a model training method that may be performed by a computing device (for example, a server). In the method, the computing device selects a prompt parameter set (denoted as the first prompt parameter set) from a prompt parameter pool of an artificial intelligence (AI) platform based on the task model before initialization and the training data of the task; assigns values to the prompt layer of the initial task model using the first prompt parameter set to obtain an initialized task model, where the task model includes the prompt layer and a base model, and the base model is a pre-trained AI model deployed on the AI platform; and then trains the initialized task model using the training data set of the task to obtain a trained task model.
With this design, the task model includes a pre-trained model and a prompt layer (the prompt layer contains one or more prompt parameters). A first prompt parameter set is selected from the prompt parameter pool and used to assign values to the task model, yielding an initialized task model; the prompt parameters of the initialized task model are then fine-tuned with the task's training data to obtain the trained task model. This improves training efficiency and reduces computation overhead. In addition, different tasks can share the same task model: when a task is executed, only the prompt parameter set of that task (the prompt parameter set of its trained task model) needs to be supplied, which further reduces computation cost and storage pressure.
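The flow above can be sketched in a few lines of Python. This is a toy illustration under assumptions that are not taken from this application: the prompt layer is a short vector of prompt parameters concatenated in front of the input, the frozen base model is a linear map, the loss is mean squared error, and the first prompt parameter set is picked with the minimum-loss condition. The names `task_forward`, `mse`, and `train_task_model` are hypothetical.

```python
# Toy sketch of the first-aspect flow (hypothetical names, not the patent's code).
# Assumptions: prompt parameters are concatenated in front of the input and the
# frozen "base model" is a linear map; only prompt parameters are trained.


def task_forward(base_w, prompt, x):
    """Task model = prompt layer + frozen base model."""
    z = list(prompt) + list(x)                  # splice prompt parameters with the input
    return sum(w * v for w, v in zip(base_w, z))


def mse(base_w, prompt, data):
    """Mean squared error of the task model on (x, y) training samples."""
    return sum((task_forward(base_w, prompt, x) - y) ** 2 for x, y in data) / len(data)


def train_task_model(base_w, pool, data, lr=0.05, steps=300):
    """Select a first prompt set from the pool, initialize the prompt layer with it,
    then update only the prompt parameters by gradient descent."""
    first = min(pool, key=lambda p: mse(base_w, p, data))   # first prompt parameter set
    prompt = list(first)                                     # initialized prompt layer
    k = len(prompt)
    for _ in range(steps):
        grad = [0.0] * k
        for x, y in data:
            err = task_forward(base_w, prompt, x) - y
            for j in range(k):                               # dL/dprompt_j = 2*err*base_w[j]/N
                grad[j] += 2.0 * err * base_w[j] / len(data)
        prompt = [p - lr * g for p, g in zip(prompt, grad)]
    return first, prompt                                     # base_w is never updated


# Usage with made-up numbers: 2 prompt parameters followed by 2 input features.
base_w = (1.0, 0.5, 2.0, -1.0)                  # frozen base-model weights
pool = [[0.0, 0.0], [1.0, 1.0], [3.0, -2.0]]    # prompt parameter pool
data = [([1.0, 0.0], 5.0), ([0.0, 1.0], 2.0), ([1.0, 1.0], 4.0)]
first, trained = train_task_model(base_w, pool, data)
```

Only the prompt parameters change; `base_w` is a tuple that is never reassigned, mirroring the fixed pre-trained model. In a real system the base model would be a large pre-trained network and gradients would come from an automatic-differentiation framework.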
In a possible implementation, after obtaining the trained task model, the computing device stores the set of parameters in the prompt layer of the trained task model (denoted as the second prompt parameter set) in the prompt parameter pool of the AI platform.
With this design, the second prompt parameter set of the trained task model is stored in the prompt parameter pool, so the pool is updated promptly as tasks change, which strengthens the relevance between the pool and subsequent task models.
In a possible implementation, the first prompt parameter set is the prompt parameter set in the prompt parameter pool that meets a set condition. The set condition is either of the following: for each prompt parameter set in the pool, a loss function value of the task model is obtained based on that set and the training samples, and the set with the smallest loss function value is taken as the first prompt parameter set; or, for each prompt parameter set in the pool, the accuracy of the task model is obtained based on that set and the training samples, and the set with the highest accuracy is taken as the first prompt parameter set.
With this design, the first prompt parameter set is the set in the pool with the smallest loss function value or the highest accuracy, so training starts from a task model initialized with a well-suited set, which can greatly improve training efficiency.
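The two set conditions can be expressed as a single selection routine, sketched below in Python. The `evaluate` callback is a hypothetical stand-in for running the task model, initialized with one candidate set, on the training samples and returning `(loss, accuracy)`; the toy logistic evaluator and its samples are illustrative assumptions only.

```python
import math
from typing import Callable, List, Sequence, Tuple

PromptSet = List[float]
Evaluator = Callable[[PromptSet], Tuple[float, float]]  # returns (loss, accuracy)


def select_first_prompt_set(pool: Sequence[PromptSet], evaluate: Evaluator,
                            condition: str = "min_loss") -> PromptSet:
    """Return the pool entry that meets the set condition."""
    scores = [(evaluate(p), p) for p in pool]
    if condition == "min_loss":
        return min(scores, key=lambda s: s[0][0])[1]
    if condition == "max_accuracy":
        return max(scores, key=lambda s: s[0][1])[1]
    raise ValueError(f"unknown condition: {condition}")


# Hypothetical task model for illustration: logistic score 4*x + prompt[0].
SAMPLES = [(0.2, 0), (0.4, 0), (0.7, 1), (0.9, 1)]


def toy_evaluate(prompt: PromptSet) -> Tuple[float, float]:
    loss, correct = 0.0, 0
    for x, label in SAMPLES:
        prob = 1.0 / (1.0 + math.exp(-(4.0 * x + prompt[0])))
        loss -= math.log(prob if label == 1 else 1.0 - prob)   # cross-entropy
        correct += int((prob >= 0.5) == (label == 1))
    return loss / len(SAMPLES), correct / len(SAMPLES)


pool = [[0.0], [-2.2], [-5.0]]
best_by_loss = select_first_prompt_set(pool, toy_evaluate, "min_loss")
best_by_accuracy = select_first_prompt_set(pool, toy_evaluate, "max_accuracy")
```

Each candidate costs one evaluation pass over the training samples, so the selection cost grows with the size of the pool.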
In a possible implementation, the prompt parameter pool includes original prompt parameter sets and/or meta prompt parameter sets. An original prompt parameter set contains the prompt parameters of the prompt layer in the trained task model of another task, where the task models of the other tasks also consist of the base model and a prompt layer. A meta prompt parameter set is obtained by processing one or more original prompt parameter sets.
With this design, the first prompt parameter set is selected from the original and/or meta prompt parameter sets and used to initialize the prompt layer of the task model, which can speed up model training compared with assigning random values to the prompt layer.
In a possible implementation, a meta prompt parameter set is generated as follows: the original prompt parameter sets in the prompt parameter pool are clustered into one or more categories, and a meta prompt parameter set is generated from the one or more original prompt parameter sets included in a category.
With this design, clustering the original prompt parameter sets yields a smaller number of meta prompt parameter sets, so selecting the first prompt parameter set from the meta prompt parameter sets is faster and saves computing power.
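This application does not name a clustering algorithm; the Python sketch below assumes k-means over flattened prompt parameter sets, with deterministic initialization from the first k sets. The function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Vec = List[float]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two flattened prompt parameter sets."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(prompt_sets: Sequence[Sequence[float]], k: int,
           iterations: int = 20) -> Tuple[List[Vec], List[int]]:
    """Cluster original prompt parameter sets into k categories.
    Returns (category centers, category label of each original set)."""
    centers = [list(p) for p in prompt_sets[:k]]  # deterministic init: first k sets
    labels = [0] * len(prompt_sets)
    for _ in range(iterations):
        # Assignment step: each set joins the category with the nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in prompt_sets]
        # Update step: each center moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(prompt_sets, labels) if lab == c]
            if members:  # keep the previous center if a category ends up empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, labels


original_sets = [[0.0, 0.0], [10.0, 10.0], [0.2, 0.1], [9.8, 10.1], [0.1, 0.3]]
centers, labels = kmeans(original_sets, k=2)
```

Five original sets collapse into two categories here, so a subsequent selection step would evaluate two meta sets instead of five originals.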
In a possible implementation, generating a meta prompt parameter set from the one or more original prompt parameter sets included in a category includes:
obtaining a weight value for each of the one or more original prompt parameter sets included in the category, where the weight value is the distance between that original prompt parameter set and the center point of the category; and
calculating the values of the prompt parameters in the meta prompt parameter set based on each original prompt parameter set and its weight value.
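A minimal Python sketch of these two steps follows. It uses the distance to the category center as each set's weight, exactly as stated, and then assumes a normalized weighted sum (dividing by the total weight), which this application does not specify. `meta_prompt_set` is a hypothetical name.

```python
import math
from typing import List, Sequence


def meta_prompt_set(members: Sequence[Sequence[float]],
                    center: Sequence[float]) -> List[float]:
    """Combine the original prompt parameter sets of one category into a meta set.

    Weight of each original set = Euclidean distance to the category center, as
    stated in the text. Normalizing by the sum of weights is an assumption here.
    """
    weights = [math.dist(p, center) for p in members]
    total = sum(weights)
    if total == 0.0:  # every member sits exactly on the center: use a plain mean
        weights, total = [1.0] * len(members), float(len(members))
    return [sum(w * p[i] for w, p in zip(weights, members)) / total
            for i in range(len(center))]


meta = meta_prompt_set([[0.0, 0.0], [2.0, 0.0]], center=[0.5, 0.0])  # -> [1.5, 0.0]
```

With distance-proportional weights, sets farther from the center contribute more; a different weighting would only change the `weights` line.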
In a second aspect, an embodiment of this application further provides a model training apparatus. The apparatus has functions for implementing the behavior of the computing device in the method example of the first aspect; for its beneficial effects, refer to the description of the first aspect, which is not repeated here. The functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the foregoing functions.
In a third aspect, the present application also provides a computing device comprising a processor and a communication interface, the processor performing the method provided by the first aspect or any of the possible implementation manners of the first aspect. The communication interface is used for communicating with other devices, such as receiving a request.
In a fourth aspect, this application provides a computer-readable storage medium that stores a program; when the program is executed by a computing device, the computing device performs the method provided in the first aspect or any possible implementation of the first aspect. The storage medium includes, but is not limited to, volatile memory such as random access memory, and non-volatile memory such as flash memory, a hard disk drive (HDD), or a solid-state drive (SSD).
In a fifth aspect, this application provides a computer program product including computer instructions that, when executed by a computing device, cause the computing device to perform the method provided in the first aspect or any possible implementation of the first aspect. The computer program product may be a software installation package, which can be downloaded and executed on a computing device when the method provided in the first aspect or any possible implementation of the first aspect needs to be used.
In a sixth aspect, the present application further provides a chip for implementing the method described in the first aspect and each possible implementation manner of the first aspect by executing a software program.
In a seventh aspect, the present application also provides a cluster of computing nodes comprising a plurality of computing devices operable to implement the method provided in the first aspect or any of the possible implementations of the first aspect.
Advantageous effects of any implementation manner of the second aspect to the seventh aspect are described with reference to the first aspect, and are not repeated here.
Drawings
FIG. 1 is a schematic diagram of an artificial intelligence main framework according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a system architecture according to an embodiment of the present application;
FIG. 3 is a schematic flow chart corresponding to a model training method according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a task model according to an embodiment of the present application;
FIG. 5 is a schematic diagram of another task model according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of yet another task model according to an embodiment of the present application;
FIG. 7 is a schematic diagram of a distillation process of a prompt parameter set according to an embodiment of the present application;
FIG. 8 is a schematic diagram of a computing device according to an embodiment of the present application;
FIG. 9 is a schematic structural diagram of another computing device according to an embodiment of the present application.
Detailed Description
The embodiments of this application involve neural network applications. To help understand the solutions in the embodiments, the following first describes some terms and concepts related to neural networks that may be involved.
(1) Neural network
Neural Networks (NNs) are mathematical algorithms that mimic the behavioral characteristics of animal neural networks and perform distributed parallel information processing. The aim of processing information can be achieved by adjusting the connection relation among a large number of nodes in the neural network, and the neural network has self-learning and self-adapting capabilities.
Specifically, a neural network usually comprises multiple layers connected end to end, such as convolutional layers, fully connected (FC) layers, activation layers, and pooling layers. Each layer can be expressed as a function $y = f_w(x) + b$, where $f$ is the layer's function and must be differentiable, $w$ is the weight (also called the weight tensor), $b$ is the bias of the neuron, $x$ is the input (also called the input tensor), and $y$ is the output (also called the output tensor).
(2) Deep neural network
A deep neural network (DNN), also known as a multi-layer neural network, can be understood as a neural network with multiple hidden layers. By position, the layers of a DNN fall into three types: the input layer, hidden layers, and the output layer. Typically, the first layer is the input layer, the last layer is the output layer, and all layers in between are hidden layers. Adjacent layers are fully connected; that is, every neuron in layer i is connected to every neuron in layer i+1. Although a DNN looks complex, the work done by each layer is simple: it computes $\vec{y} = \alpha(W\vec{x} + \vec{b})$, where $\vec{x}$ is the input vector, $\vec{y}$ is the output vector, $\vec{b}$ is the bias (offset) vector, $W$ is the weight matrix (also called the coefficients), and $\alpha(\cdot)$ is the activation function. Each layer simply applies this operation to its input vector $\vec{x}$ to obtain its output vector $\vec{y}$.
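The per-layer computation $\vec{y} = \alpha(W\vec{x} + \vec{b})$ can be sketched in plain Python as follows; `dense_layer` and `relu` are illustrative names, and the weights are made up.

```python
import math


def dense_layer(W, x, b, alpha=math.tanh):
    """Compute y = alpha(W x + b) for one fully connected layer."""
    return [alpha(sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i)
            for row, b_i in zip(W, b)]


def relu(v: float) -> float:
    return max(0.0, v)


# Two stacked layers: the output vector of one layer is the input vector of the next.
x = [2.0, 1.0]
h = dense_layer([[1.0, -1.0], [0.5, 0.5]], x, [0.0, -1.0], alpha=relu)  # -> [1.0, 0.5]
y = dense_layer([[1.0, 2.0]], h, [0.5], alpha=relu)                     # -> [2.5]
```

Each row of `W` connects every input neuron to one output neuron, which is what "fully connected" means above.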
(3) Convolutional neural network (convolutional neural network, CNN)
A convolutional neural network is a deep neural network with a convolutional structure. It contains a feature extractor made up of convolutional layers and sub-sampling layers, which can be regarded as a filter. A convolutional layer is a layer of neurons in the CNN that performs convolution on the input signal. In a convolutional layer, a neuron may be connected to only some of the neurons in the adjacent layer. A convolutional layer usually contains several feature planes, each of which may consist of neurons arranged in a rectangle. Neurons in the same feature plane share weights, and the shared weights are the convolution kernel. Weight sharing can be understood as meaning that the way image information is extracted does not depend on position. A convolution kernel can be initialized as a matrix of random size, and reasonable weights are learned during training of the CNN. A direct benefit of weight sharing is that it reduces the number of connections between layers of the CNN while also reducing the risk of overfitting.
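To make weight sharing concrete, the Python sketch below applies one kernel at every position of a single-channel input (stride 1, no padding). Like most deep learning frameworks, it computes cross-correlation and calls it convolution; `conv2d_valid` is a hypothetical name.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Single-channel 2-D convolution (cross-correlation), stride 1, no padding.
    The same kernel weights are reused at every output position."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
kernel = [[1, 0],
          [0, -1]]
feature_map = conv2d_valid(image, kernel)  # -> [[-4, -4], [-4, -4]]
```

The layer has only the 4 kernel weights however large the image is, whereas a fully connected layer mapping these 9 inputs to 4 outputs would need 36 weights; this is the reduction in connections described above.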
(4) Recurrent neural networks (recurrent neural network, RNN)
A recurrent neural network is used to process sequence data. In a conventional neural network model, data flows from the input layer through the hidden layer to the output layer; adjacent layers are fully connected, but the nodes within each layer are not connected to one another. Although this ordinary neural network has solved many problems, it is still powerless for many others. For example, to predict the next word in a sentence, the preceding words are generally needed, because the words in a sentence are not independent of one another. Suppose someone says: "I like traveling, and my favorite place is Yunnan. When I have the chance, I will go to __________." A human knows the blank should be filled with "Yunnan" by inferring from the context, but how can a machine do this? RNNs were developed for this purpose. RNNs aim to give machines a memory like that of humans. Accordingly, the output of an RNN depends on both the current input and historical (memorized) information.
An RNN is called a recurrent neural network because the current output for a sequence also depends on the preceding outputs. Concretely, the network memorizes earlier information and applies it when computing the current output: the nodes within the hidden layer are no longer unconnected but connected, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous time step. In theory, an RNN can process sequence data of any length. An RNN is trained with back-propagation, as a conventional CNN or DNN is, except that the error is propagated back through the time steps (back-propagation through time).
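The recurrence can be sketched with a scalar Elman-style RNN, assuming the common update $h_t = \tanh(W_x x_t + W_h h_{t-1} + b)$; the text above does not fix a specific cell, and `rnn_forward` is an illustrative name.

```python
import math
from typing import List, Sequence


def rnn_forward(xs: Sequence[float], W_x: float, W_h: float, b: float,
                h0: float = 0.0) -> List[float]:
    """Scalar Elman-style RNN: h_t = tanh(W_x * x_t + W_h * h_{t-1} + b).
    The hidden state h carries information from earlier steps into later ones."""
    h, states = h0, []
    for x in xs:                      # works for a sequence of any length
        h = math.tanh(W_x * x + W_h * h + b)
        states.append(h)
    return states


states = rnn_forward([1.0, 1.0], W_x=1.0, W_h=0.5, b=0.0)
```

The two inputs are identical, yet the two outputs differ because the second step also sees the hidden state from the first; with `W_h=0.0` the memory disappears and the outputs would be equal.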
(5) Loss function
When a deep neural network is trained, its output is expected to be as close as possible to the value that is actually desired. Therefore, the predicted value of the current network is compared with the actually desired target value, and the weight vector of each layer is updated according to the difference between them (of course, there is usually an initialization process before the first update, that is, parameters are preconfigured for each layer of the deep neural network). For example, if the network's predicted value is too high, the weight vector is adjusted so that the prediction becomes lower, and the adjustment continues until the deep neural network can predict the desired target value or a value very close to it. It is therefore necessary to define in advance how to compare the difference between the predicted value and the target value. This is the loss function or objective function: an important equation for measuring the difference between the predicted value and the target value. Taking the loss function as an example, a higher output value (loss) indicates a larger difference, so training a deep neural network becomes a process of reducing the loss as much as possible.
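The loop just described (initialize, compare prediction with target through a loss, adjust the weight, repeat) can be shown on a one-weight model in Python. Mean squared error and plain gradient descent are assumed here as one common choice; the data and names are illustrative.

```python
def mse_loss(w: float, data) -> float:
    """Loss for the one-weight model y_hat = w * x: a larger value means a larger gap."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


def gradient_step(w: float, data, lr: float) -> float:
    """One update. If predictions are too high, the gradient is positive and w decreases."""
    grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
    return w - lr * grad


def fit(w0: float, data, lr: float = 0.05, steps: int = 50) -> float:
    w = w0                            # initialization before the first update
    for _ in range(steps):
        w = gradient_step(w, data, lr)
    return w


data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]   # target relation: y = 2x
w = fit(5.0, data)                             # start with predictions that are too high
```

Starting from `w = 5.0`, every prediction overshoots, so each step lowers `w` and the loss shrinks until `w` is essentially 2.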
(6) Transfer learning
Transfer learning is a machine learning method in which a pre-trained model (for example, bidirectional encoder representations from transformers, BERT) is reused for another task. Specifically, transfer learning transfers knowledge from one domain (the source domain) to another domain (the target domain), so that a better learning effect is achieved in the target domain.
Artificial intelligence (artificial intelligence, AI) is the theory, methods, technologies, and application systems that use digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain optimal results. In other words, AI is a branch of computer science that attempts to understand the essence of intelligence and produce new intelligent machines that can respond in a way similar to human intelligence. AI studies the design principles and implementation methods of various intelligent machines, so that the machines have the capabilities of perception, reasoning, and decision-making.
With the continuous development of AI technologies, natural language human-machine interaction systems, which let humans and machines interact through natural language, are becoming increasingly important. In natural language processing, tasks mainly include intent classification, sentiment analysis, and human-machine dialogue. An intent is what the user wants to do. Intent classification assigns a user utterance to one of several predefined intent categories according to the domain and intent it involves. Related tasks include text classification (for example, recognizing "spam" versus "non-spam"), sentiment analysis (recognizing the emotion expressed in a text), and recognizing the domain to which a text belongs (for example, "bomber" and "aircraft carrier" belong to the military domain, while food-related content belongs to the food domain). Human-machine dialogue is sometimes also described in terms of "dialogue acts", that is, acts by which the state or context of the information shared by the users in a conversation changes and is continually updated. In a human-machine dialogue system, the machine must determine the intent the user wants to express and then respond accurately to the sentences the user inputs.
For such tasks, a task model can be obtained by fine-tuning (training) a pre-trained model with the training data of the target task. A conventional training method is as follows: labeled data of the target task is first obtained and then input into the pre-trained model for training. This process adjusts the weight parameters of the pre-trained language model, is also called model fine-tuning, and yields the task model of the target task. In this approach, the task model has the same structure as the pre-trained model; only the values of their parameters may differ. A pre-trained model is obtained by self-supervised training on a large amount of unlabeled data (including various text such as news, blogs, journals, and papers), which gives the model strong semantic representation and transfer learning capabilities so that it can quickly adapt to a variety of tasks.
However, pre-trained models usually have a very large number of parameters, in some cases from billions to trillions, and a single pre-trained model can occupy tens to hundreds of gigabytes. Fine-tuning all of these parameters therefore consumes a large amount of computing power.
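Back-of-the-envelope arithmetic shows where the gigabytes come from and why keeping one prompt parameter set per task is cheaper to store than one fine-tuned copy per task. The sizes below (a 10-billion-parameter model, a prompt layer of 100 vectors of dimension 4096, 10 tasks, FP32 storage) are hypothetical, and the sketch counts parameter storage only, not optimizer state or activations.

```python
from typing import Tuple


def param_bytes(num_params: int, bytes_per_param: int = 4) -> int:
    """Bytes needed to store num_params parameters (4 bytes = FP32, 2 bytes = FP16)."""
    return num_params * bytes_per_param


def storage_for_tasks(base_params: int, prompt_params: int, num_tasks: int,
                      bytes_per_param: int = 4) -> Tuple[int, int]:
    """(full fine-tuning, prompt training) parameter storage in bytes.
    Full fine-tuning keeps one complete model copy per task; prompt training keeps
    one shared base model plus one prompt parameter set per task."""
    full = num_tasks * param_bytes(base_params, bytes_per_param)
    prompt = (param_bytes(base_params, bytes_per_param)
              + num_tasks * param_bytes(prompt_params, bytes_per_param))
    return full, prompt


# Illustrative sizes only: 10 billion base parameters, 100 x 4096 prompt parameters.
single_copy = param_bytes(10_000_000_000)             # 40 GB at FP32
full, prompt = storage_for_tasks(10_000_000_000, 100 * 4096, num_tasks=10)
```

Here ten fine-tuned copies need 400 GB, while the shared base model plus ten prompt sets needs about 40.02 GB.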
In view of this, an embodiment of this application provides a model training method that, for a specific task, introduces additional parameters (referred to as prompt parameters) on top of a pre-trained model and splices the prompt parameters with the pre-trained model. During training, the parameter values of the pre-trained model are fixed and only the prompt parameters are trained to obtain the trained task model, which saves computation cost and improves training efficiency.
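The following Python sketch illustrates how one frozen pre-trained model can serve several tasks, each contributing only its own prompt parameters. It assumes the prompt vectors are prepended to the input sequence; the exact splicing position in this application's task-model structures may differ. `SharedBaseModel`, `splice`, and `toy_base` are hypothetical, and `toy_base` is a trivial stand-in for a real pre-trained model.

```python
from typing import Callable, Dict, List

Vector = List[float]


def splice(prompt_vectors: List[Vector], inputs: List[Vector]) -> List[Vector]:
    """Concatenate the prompt layer's vectors in front of the input sequence."""
    return [list(v) for v in prompt_vectors] + [list(v) for v in inputs]


class SharedBaseModel:
    """One frozen base model shared by all tasks; only prompt sets differ per task."""

    def __init__(self, base_fn: Callable[[List[Vector]], float]):
        self.base_fn = base_fn                        # frozen; never updated per task
        self.prompt_sets: Dict[str, List[Vector]] = {}

    def register_task(self, task: str, prompt_vectors: List[Vector]) -> None:
        self.prompt_sets[task] = prompt_vectors       # the only per-task storage

    def run(self, task: str, inputs: List[Vector]) -> float:
        return self.base_fn(splice(self.prompt_sets[task], inputs))


def toy_base(seq: List[Vector]) -> float:
    """Stand-in for a pre-trained model: sums the first feature over the sequence."""
    return sum(v[0] for v in seq)


model = SharedBaseModel(toy_base)
model.register_task("intent", [[1.0, 0.0]])
model.register_task("sentiment", [[-2.0, 0.0], [0.5, 0.0]])
inputs = [[3.0, 1.0]]
```

Switching tasks only swaps which prompt parameter set is spliced in; the base model object is the same for every call.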
FIG. 1 is a schematic diagram of an artificial intelligence main framework, which describes the overall workflow of an AI system and applies to general requirements in the AI field.
The AI main framework is described in detail below from two dimensions: the "intelligent information chain" (horizontal axis) and the "information technology (information technology, IT) value chain" (vertical axis).
The "intelligent information chain" reflects a series of processes from data acquisition to processing, for example, the general processes of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output. In this process, the data goes through a "data-information-knowledge-wisdom" refinement process.
The "IT value chain" reflects the value that AI brings to the information technology industry, from the underlying infrastructure and information of AI (provision and processing technology implementations) to the industrial ecosystem of the system.
(1) Infrastructure:
The infrastructure provides computing power support for the AI system, enables communication with the external world, and provides support through the basic platform.
The infrastructure may communicate with the outside through sensors, and the computing power of the infrastructure may be provided by the smart chip.
The smart chip may be a hardware acceleration chip such as a central processing unit (central processing unit, CPU), a neural-network processing unit (neural-network processing unit, NPU), a graphics processing unit (graphics processing unit, GPU), an application-specific integrated circuit (application specific integrated circuit, ASIC), or a field-programmable gate array (field programmable gate array, FPGA).
The basic platform of the infrastructure can comprise a distributed computing framework, network and other relevant platform guarantees and supports, and can comprise cloud storage, computing, interconnection network and the like.
For example, for an infrastructure, data may be obtained through sensor and external communication and then provided to a smart chip in a distributed computing system provided by the base platform for computation.
(2) Data:
Data at the layer above the infrastructure represents the data sources in the AI field. The data involves graphics, images, speech, and text, as well as Internet of Things data from conventional devices, including service data of existing systems and sensing data such as force, displacement, liquid level, temperature, and humidity.
(3) Data processing:
Data processing typically includes data training, machine learning, deep learning, searching, reasoning, decision-making, and the like.
Wherein machine learning and deep learning can perform symbolized and formalized intelligent information modeling, extraction, preprocessing, training and the like on data.
Reasoning refers to the process of simulating human intelligent reasoning modes in a computer or an intelligent system, and carrying out machine thinking and problem solving by using formal information according to a reasoning control strategy, and typical functions are searching and matching.
Decision-making refers to the process of making decisions after reasoning over intelligent information, and usually provides functions such as classification, ranking, and prediction.
(4) General capabilities:
After the data has been processed, some general capabilities can be formed based on the results of the data processing, such as algorithms or a general-purpose system, for example, translation, text analysis, computer vision processing, speech recognition, and image recognition.
(5) Intelligent product and industry application:
Intelligent products and industry applications are products and applications of AI systems in various fields. They package the overall AI solution, turn intelligent information decision-making into products, and enable practical deployment. The application fields mainly include intelligent manufacturing, intelligent transportation, smart home, intelligent healthcare, intelligent security, autonomous driving, intelligent terminals, and the like.
The embodiment of the application can be applied to various fields in artificial intelligence, such as intelligent manufacturing, intelligent transportation, intelligent home, intelligent medical treatment, intelligent security, automatic driving and the like. Specifically, the training method of the neural network model in the embodiment of the application can be particularly applied to the fields of automatic driving, image classification, image retrieval, image semantic segmentation, natural language processing and the like, which need to use (deep) neural network models.
Fig. 2 is a schematic diagram of a system architecture according to an embodiment of the present application. In fig. 2, a data acquisition device 160 is used to acquire training data.
For a text classification neural network model, the training data may include training texts and the classification results corresponding to them, where the classification result of a training text may be manually pre-labeled. After collecting the training data, the data acquisition device 160 stores it in the database 130. It should be noted that in practical applications, the training data maintained in the database 130 is not necessarily all collected by the data acquisition device 160; it may also be received from other devices, for example input by the client device 140. It should also be noted that the training device 120 does not necessarily train the target model/rule 103 solely on the training data maintained in the database 130; it may also obtain training data from the cloud or elsewhere for model training, which should not be construed as a limitation on the embodiments of the present application.
The training device 120 is configured to obtain an initialized task model based on the initial task model 102 and one prompt parameter set in the prompt parameter pool 171 (denoted as the target prompt parameter set), and to train the initialized task model on the training data maintained in the database 130 to obtain a trained task model (denoted as the target model/rule 103). This is described in detail below. The target model/rule 103 in the embodiments of the present application may specifically be a neural network model. The initial task model 102 includes the pre-training model 101 and a prompt layer, and may be generated by the training device 120 or acquired from the AI platform 170. The training device 120 may also be used to train the pre-training model 101 and store it in the AI platform 170. Specifically, the neural network models constructed in the embodiments of the present application (e.g., the pre-training model 101 or the initial task model) may include a CNN, a deep convolutional neural network (DCNN), a recurrent neural network (RNN), and so on. The embodiments of the present application are not limited thereto.
The AI platform 170 may be a single physical server, a desktop computer, a notebook computer, or the like, or may be a server cluster or a distributed system formed by a plurality of physical servers. Illustratively, the training device 120 may be a server cluster corresponding to the AI platform 170 or one of the computing nodes in the distributed system, or may be a physical server or virtual machine independent of the AI platform 170.
The target model/rule 103 obtained by the training device 120 may be applied to different systems or devices, such as the execution device 110 shown in fig. 2. The execution device 110 may be a terminal, such as a mobile phone, a tablet computer, a notebook computer, an augmented reality (AR)/virtual reality (VR) device or a vehicle-mounted terminal, or may be a cloud server, a network server, an application server, a management server, or another device or server with data processing capability. In fig. 2, the execution device 110 is configured with an input/output (I/O) interface 112 for data interaction with external devices, and a user may input data to the I/O interface 112 through the client device 140. In the embodiments of the present application, the input data may include text information to be processed, entered through the client device.
When the computation module 111 of the execution device 110 performs computation or other related processing, the execution device 110 may call data, code and the like in the data storage system 150 for the corresponding processing, and may also store the results of that processing, such as classification results, in the data storage system 150.
Finally, the I/O interface 112 returns the processing result, such as the text classification result obtained as described above, to the client device 140, thereby providing the processing result to the user.
It should be noted that the training device 120 may generate, based on different training data, a corresponding target model/rule 103 for different targets or different tasks, where the corresponding target model/rule 103 may be used to achieve the targets or complete the tasks, thereby providing the user with the desired result.
In the case shown in fig. 2, the user may manually provide input data through an interface provided by the I/O interface 112. In another case, the client device 140 may automatically send input data to the I/O interface 112; if automatic sending by the client device 140 requires the user's authorization, the user may set the corresponding permission in the client device 140. The user may view the results output by the execution device 110 on the client device 140, and the results may be presented as a display, a sound, an action, or the like. The client device 140 may also act as a data collection terminal, collecting the input data fed into the I/O interface 112 and the output results from the I/O interface 112 as new sample data, as shown in the figure, and storing them in the database 130. Alternatively, instead of the client device 140 collecting them, the I/O interface 112 may directly store the input data it receives and the output results it produces into the database 130 as new sample data.
It should be noted that fig. 2 is only a schematic diagram of a system architecture provided by an embodiment of the present application, and the positional relationships among the devices, apparatuses, modules, etc. shown in the figure do not constitute any limitation. For example, in fig. 2 the data storage system 150 is an external memory relative to the execution device 110, whereas in other cases the data storage system 150 may be disposed in the execution device 110. In addition, fig. 2 shows only one client device 140; in practical applications there may be multiple client devices 140, and different client devices 140 may trigger different tasks.
In another possible scenario, the client device 140 may directly serve as an executing device, directly receive input from the user and directly process the input by the hardware of the client device 140 itself, and the specific process is similar to that of the previous scenario, and reference is made to the above description and will not be repeated here. It should be understood that the foregoing is illustrative of an application scenario, and is not intended to limit the application scenario of the present application in any way.
The method of model training provided in the embodiment of the present application is applied to the system shown in fig. 2, and is described in detail below with reference to fig. 3. Fig. 3 is a schematic flow chart corresponding to a model training method provided in an embodiment of the present application, and as understood in conjunction with fig. 2 and fig. 3, the method may include the following steps:
In step 301, the training device 120 obtains training data for a target task.
As previously described, the training device 120 may obtain training data for the target task from the database 130 and/or obtain training data for the target task from other sources, which is not limited by the present application.
In step 302, the training device 120 obtains the pre-training model 101 corresponding to the target task.
In one possible scenario, as shown in FIG. 2, the training device 120 obtains the pre-training model 101 corresponding to the target task from the AI platform 170. In another possible scenario, the pre-training model 101 may also be stored in the training device 120, such that the training device 120 obtains the pre-training model 101 locally.
In step 303, the training device 120 generates an initial task model based on the pre-training model 101.
Illustratively, the task model includes a pre-training model 101 and at least one prompt layer, and in particular, the one prompt layer may be spliced with one layer in the pre-training model 101, for example, as shown in fig. 4, the task model includes one prompt layer spliced with an input layer of the pre-training model 101. Referring to fig. 5, the task model includes a prompt layer that is spliced with the middle layer of the pre-training model 101. Referring to fig. 6, the task model may further include multiple prompt layers, where one prompt layer may be spliced with one of the layers of the pre-training model 101. The splicing mode is not limited.
As can be seen, in these examples the task model has the same number of layers as the pre-trained model. In the task model, the parameters of the pre-training model may be referred to as original parameters, and the parameters of the prompt layer may be referred to as prompt parameters. The prompt parameters may be of the weight or bias type described above, or of other types, such as word vectors, which is not limited in the embodiments of the present application. A prompt parameter set may include the values of all prompt parameters of a task model. It should be noted that fig. 4 to fig. 6 are only examples; the task model may also have a different number of layers from the pre-training model 101, for example when the prompt layer in the task model is an independent layer, which is not limited in the embodiments of the present application.
It should be noted that, in the initial task model, the value of the prompt parameter is a random value or a default value or a preset value, which is not limited in particular.
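As a minimal sketch of the task model described above, the Python code below assumes a single frozen pre-trained layer and a prompt layer spliced with the input layer as in fig. 4, with one prompt weight W0i per input neuron. The names `TaskModel`, `initial_task_model`, `layer1` and `W01` to `W05` are illustrative assumptions, not part of the application.

```python
import random
from dataclasses import dataclass, field


@dataclass
class TaskModel:
    # Original parameters of the pre-trained model (kept frozen); hypothetical layout.
    original_params: dict
    # Prompt parameters of the prompt layer, e.g. {"W01": 0.3, "W02": 0.7, ...}.
    prompt_params: dict = field(default_factory=dict)

    def prompt_parameter_set(self) -> dict:
        """One prompt parameter set: the values of all prompt parameters of the model."""
        return dict(self.prompt_params)

    def forward(self, x: list) -> float:
        # Prompt layer spliced with the input layer: input i is scaled by W0i
        # before it reaches the frozen pre-trained layer.
        prompted = [xi * self.prompt_params[f"W0{i + 1}"] for i, xi in enumerate(x)]
        weights = self.original_params["layer1"]
        return sum(w * p for w, p in zip(weights, prompted))


def initial_task_model(original_params, prompt_names, default=None, seed=0):
    """Build the initial task model; prompt values are a default/preset value or random."""
    rng = random.Random(seed)
    values = {name: default if default is not None else rng.random() for name in prompt_names}
    return TaskModel(original_params, values)
```

For instance, with `default=1.0` and all pre-trained weights equal to 1.0, `forward([1, 2, 3, 4, 5])` returns 15.0, and `prompt_parameter_set()` contains only the five prompt values, never the original parameters.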
It should be noted that there is no strict order between step 301 and step 302: step 301 may be performed before or after step 302, or the two may be performed simultaneously. It should further be noted that steps 302 and 303 are optional. For example, the training device 120 may instead obtain the initial task model from the AI platform 170; in other words, steps 302 and 303 may be replaced by the training device obtaining the initial task model from the AI platform 170. In practical applications, different tasks may use the same initial task model and differ only in the values of the prompt parameters in their task models.
For convenience of description, this initial task model is referred to as the initial task model 102. The AI platform 170 may store the initial task model 102, and the training device 120 may obtain the same initial task model 102 from the AI platform 170 for different tasks, which avoids creating task models repeatedly and saves computing power. Further, the training device 120 may also store the initial task model 102 locally, saving the network bandwidth otherwise used to transmit it. Of course, when different tasks correspond to different initial task models, the initial task model of a task may also be generated and stored by the AI platform, which is not limited in the present application.
At step 304, the training device 120 determines a target prompt parameter set for the initial task model 102.
As shown in fig. 2, the AI platform 170 holds a prompt parameter pool, which may include a plurality of prompt parameter sets, each containing the values of one or more prompt parameters. As described above, in the present application a prompt parameter set may consist of the values of the prompt parameters included in a task model.
For example, assume that in the present application different tasks use the same initial task model 102, or initial task models 102 with the same architecture (the values of their prompt parameters may differ). The values of the prompt parameters in the trained task models may still differ from task to task. Taking fig. 4 as an example, the prompt parameters in the initial task model 102 include $W_{01}$, $W_{02}$, $W_{03}$, $W_{04}$ and $W_{05}$, where $W_{0i}$ denotes the weight of the $i$-th neuron of the input layer. Illustratively, the values of the prompt parameters in the initial task model 102 may be random numbers.
Assume that task 1, task 2 and task 3 exist and all correspond to this initial task model. In the trained task model of task 1, $W_{01}=m_1$, $W_{02}=n_1$, $W_{03}=i_1$, $W_{04}=j_1$, $W_{05}=k_1$; in the trained task model of task 2, $W_{01}=m_2$, $W_{02}=n_2$, $W_{03}=i_2$, $W_{04}=j_2$, $W_{05}=k_2$; in the trained task model of task 3, $W_{01}=m_3$, $W_{02}=n_3$, $W_{03}=i_3$, $W_{04}=j_3$, $W_{05}=k_3$. Here $m_i$, $n_i$, $i_i$, $j_i$ and $k_i$ all denote numerical values whose range is not particularly limited.
The prompt parameter set obtained from the trained task model of task 1 is $\{m_1, n_1, i_1, j_1, k_1\}$. Similarly, the prompt parameter set obtained from the trained task model of task 2 is $\{m_2, n_2, i_2, j_2, k_2\}$, and the prompt parameter set obtained from the trained task model of task 3 is $\{m_3, n_3, i_3, j_3, k_3\}$.
The present application may store the prompt parameter sets obtained from the trained task models in a prompt parameter pool, such as the prompt parameter pool in the AI platform 170, which includes but is not limited to $\{m_1, n_1, i_1, j_1, k_1\}$, $\{m_2, n_2, i_2, j_2, k_2\}$ and $\{m_3, n_3, i_3, j_3, k_3\}$. This is only an example; the present application limits neither the number of prompt parameters in a prompt parameter set nor the number of prompt parameter sets in the prompt parameter pool.
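The prompt parameter pool can be pictured as a collection of such sets. The sketch below is a hypothetical in-memory stand-in (the class name `PromptParameterPool` and the `task` label are assumptions); the numbers used in place of $m_i$ to $k_i$ are made up purely for illustration.

```python
class PromptParameterPool:
    """Hypothetical in-memory stand-in for a prompt parameter pool such as the one on AI platform 170."""

    def __init__(self):
        # Each entry records which task produced the set and the set itself.
        self._entries = []

    def add(self, task, prompt_set):
        # Copy the values so that later changes to a model do not alter the pool.
        self._entries.append({"task": task, "params": dict(prompt_set)})

    def __len__(self):
        return len(self._entries)

    def __iter__(self):
        # Iterate over the stored prompt parameter sets only.
        return (entry["params"] for entry in self._entries)


names = ["W01", "W02", "W03", "W04", "W05"]
pool = PromptParameterPool()
# Illustrative values standing in for {m_1..k_1}, {m_2..k_2} and {m_3..k_3}.
pool.add("task 1", dict(zip(names, [0.1, 0.2, 0.3, 0.4, 0.5])))
pool.add("task 2", dict(zip(names, [0.6, 0.7, 0.8, 0.9, 1.0])))
pool.add("task 3", dict(zip(names, [1.1, 1.2, 1.3, 1.4, 1.5])))
```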
It should be appreciated that the set of hint parameters in the hint parameter pool can be generated prior to step 304.
When performing step 304, the training device 120 may take part of the training data of the target task as a verification set, use the verification set to traverse the prompt parameter sets in the prompt parameter pool, and select the prompt parameter set that meets a preset condition (i.e., the target prompt parameter set). For example, the preset condition may be that the loss function value is the smallest. The process may include: selecting a prompt parameter set (denoted as the first prompt parameter set) from the prompt parameter pool, assigning the prompt parameters in the initial task model with the first prompt parameter set, inputting the verification set into the assigned task model, and calculating the loss function value of the first prompt parameter set.
Next, a new prompt parameter set (denoted as the second prompt parameter set) is selected from the remaining prompt parameter sets, the prompt parameters in the initial task model are assigned with the second prompt parameter set, the verification set is input into the assigned task model, and the loss function value of the second prompt parameter set is calculated.
Then, another new prompt parameter set (denoted as the third prompt parameter set) is selected from the remaining prompt parameter sets, the prompt parameters in the initial task model are assigned with the third prompt parameter set, the verification set is input into the assigned task model, and the loss function value of the third prompt parameter set is calculated.
This continues until the loss function value of every prompt parameter set in the prompt parameter pool has been calculated.
It should be noted that the prompt parameter sets selected here are of the same type as the prompt parameter set of the initial task model. Being of the same type means that the sets involve the same prompt parameters; the values of those parameters may be the same or different.
For example, continuing the above example, assume that the initial task model of the target task is the model shown in fig. 4 and that the first prompt parameter set is $\{m_1, n_1, i_1, j_1, k_1\}$. After the initial task model is assigned with the first prompt parameter set, the task model of the target task has $W_{01}=m_1$, $W_{02}=n_1$, $W_{03}=i_1$, $W_{04}=j_1$, $W_{05}=k_1$. The verification set is then input into the assigned task model, and the loss function value of the first prompt parameter set is calculated. Specifically, a loss function value may be obtained for each input data item, and the loss function value of the first prompt parameter set may be obtained by averaging the loss function values of all input data in the verification set, or by another algorithm; similar parts are not repeated here.
Assume that the second prompt parameter set is $\{m_2, n_2, i_2, j_2, k_2\}$. After the initial task model is assigned with the second prompt parameter set, the task model of the target task has $W_{01}=m_2$, $W_{02}=n_2$, $W_{03}=i_2$, $W_{04}=j_2$, $W_{05}=k_2$. The verification set is then input into the assigned task model, and the loss function value of the second prompt parameter set is calculated.
Assume that the third prompt parameter set is $\{m_3, n_3, i_3, j_3, k_3\}$. After the initial task model is assigned with the third prompt parameter set, the task model of the target task has $W_{01}=m_3$, $W_{02}=n_3$, $W_{03}=i_3$, $W_{04}=j_3$, $W_{05}=k_3$. The verification set is then input into the assigned task model, and the loss function value of the third prompt parameter set is calculated.
In this way, the loss function value of each prompt parameter set of the same type is calculated, and the prompt parameter set with the smallest loss function value is taken as the target prompt parameter set.
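The traversal in step 304 can be sketched as below. This is a minimal illustration, assuming a squared-error loss averaged over the verification set (the application does not fix the loss function or the averaging rule) and a toy `toy_forward(prompt_set, x)` standing in for the assigned task model; both are assumptions.

```python
def validation_loss(prompt_set, forward, validation):
    """Loss of one prompt parameter set: average per-sample squared error."""
    losses = [(forward(prompt_set, x) - y) ** 2 for x, y in validation]
    return sum(losses) / len(losses)


def select_target_prompt_set(pool, forward, validation):
    """Assign each candidate set to the model in turn and keep the one with the smallest loss."""
    best_set, best_loss = None, float("inf")
    for candidate in pool:  # first, second, third prompt parameter set, ...
        loss = validation_loss(candidate, forward, validation)
        if loss < best_loss:
            best_set, best_loss = candidate, loss
    return best_set, best_loss


def toy_forward(prompt_set, x):
    # Hypothetical assigned task model: input i weighted by prompt parameter W0i.
    return sum(prompt_set[f"W0{i + 1}"] * xi for i, xi in enumerate(x))
```

With three candidate sets whose values are all 1.0, all 2.0 and all 3.0, and a verification set whose targets are 2.0, the second set is selected with a loss of 0.0. Ties keep the earlier candidate because the comparison is strict.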
In the present application, the prompt parameter pool may contain prompt parameter sets of several types. Two sets are of different types if at least one of the prompt parameters they involve differs or if they contain different numbers of prompt parameters. For example, the prompt parameter pool may contain at least three types: $\{W_{01}, W_{02}, W_{03}, W_{04}, W_{05}\}$, $\{W_{01}, W_{02}, W_{03}, W_{04}\}$ and $\{W_{01}, W_{02}, W_{03}, W_{04}, W_{06}\}$; the sets $\{m_1, n_1, i_1, j_1, k_1\}$, $\{m_2, n_2, i_2, j_2, k_2\}$ and $\{m_3, n_3, i_3, j_3, k_3\}$ above are all of type $\{W_{01}, W_{02}, W_{03}, W_{04}, W_{05}\}$. If the pool contains several types, the prompt parameter sets of the same type as the prompt parameter set of the target task's task model can first be filtered out, and the target prompt parameter set is then determined from them. Alternatively, a subset can be further selected from the same-type sets, and only that subset is traversed to select the target prompt parameter set. The subset may be selected randomly or by task type. For example, if the target task is text classification (a news app, for instance, typically includes text classification tasks), the prompt parameter sets of task models with the same task type can be selected from the pool, and one of them is selected as the target prompt parameter set.
In an optional embodiment, knowledge distillation may be performed on the prompt parameter sets in the prompt parameter pool when a set condition is met, for example, when the number of prompt parameter sets in the pool reaches a preset threshold. As another example, knowledge distillation may be performed periodically, in which case the set condition may be that the elapsed interval reaches a preset duration, and so on.
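The type check and the optional narrowing by task type can be sketched as follows. Pool entries are modeled as dictionaries with hypothetical `params` and `task_type` keys; the optional random subsampling (`sample_size`, `seed`) is likewise an illustrative assumption.

```python
import random


def prompt_set_type(prompt_set):
    # Two sets are of the same type when they involve exactly the same prompt parameters.
    return frozenset(prompt_set)


def candidate_sets(pool, target_names, task_type=None, sample_size=None, seed=0):
    """Keep same-type sets, optionally narrowed by task type and/or a random subset."""
    wanted = frozenset(target_names)
    same_type = [e for e in pool if prompt_set_type(e["params"]) == wanted]
    if task_type is not None:
        same_type = [e for e in same_type if e["task_type"] == task_type]
    if sample_size is not None and sample_size < len(same_type):
        same_type = random.Random(seed).sample(same_type, sample_size)
    return [e["params"] for e in same_type]
```

Only the returned candidates would then be traversed with the verification set, which shrinks the search when the pool is large.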
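The two example set conditions can be checked as below. The thresholds `size_threshold` and `period_s` are hypothetical defaults, and combining both conditions with a logical OR is an illustrative design choice; the application presents them as alternative examples.

```python
import time


def should_distill(pool_size, last_distill_time, now=None, size_threshold=100, period_s=24 * 3600.0):
    """Return True when either set condition holds (thresholds are hypothetical)."""
    now = time.time() if now is None else now
    pool_is_large = pool_size >= size_threshold             # count reaches a preset threshold
    period_elapsed = (now - last_distill_time) >= period_s  # interval reaches a preset duration
    return pool_is_large or period_elapsed
```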
For convenience of description, the prompt parameter set before distillation is referred to as the original prompt parameter set in fig. 7, and the prompt parameter set after distillation is referred to as the meta prompt parameter set. Illustratively, the knowledge distillation process may include:
(1) Convert each original prompt parameter set into a feature vector.
(2) Cluster the feature vectors.
(3) Generate a meta-prompt parameter set based on the feature vectors included in each category.
Taking one original prompt parameter set as an example, in one embodiment the prompt parameters in the set may be spliced (concatenated) into a feature vector; for example, the original prompt parameter set $\{m_1, n_1, i_1, j_1, k_1\}$ corresponds to the feature vector $[m_1, n_1, i_1, j_1, k_1]$. In this way, each original prompt parameter set in the prompt parameter pool is converted into a feature vector.
The resulting feature vectors are then clustered into one or more categories. The clustering method may be an existing algorithm, such as k-means, or a clustering algorithm that may be applied in the future, which is not specifically limited. Then, the meta-prompt parameter set corresponding to each category is calculated.
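Steps (1) and (2) can be sketched in plain Python: each original prompt parameter set is spliced into a feature vector using a fixed parameter order, and the vectors are grouped with a basic k-means loop. k-means is only the example algorithm named above; the fixed iteration count, the random initialization and the function names are assumptions.

```python
import math
import random


def to_feature_vector(prompt_set, names):
    # Splice (concatenate) the prompt parameter values in a fixed order.
    return [prompt_set[name] for name in names]


def kmeans(vectors, k, iters=20, seed=0):
    """Basic k-means: returns a category label per vector and the category centers."""
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: each vector joins its nearest center (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(v, centers[c])) for v in vectors]
        # Update step: each center moves to the mean of its member vectors.
        for c in range(k):
            members = [v for v, label in zip(vectors, labels) if label == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

A fixed parameter order matters: two sets of the same type must map each prompt parameter to the same vector position, otherwise distances between them are meaningless.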
Taking one category as an example, the meta-prompt parameter set of the category may be obtained as a weighted average of the feature vectors in the category, using a weight value for each feature vector. Illustratively, the process may include: calculating the center of the category from its feature vectors, where each element of the center's feature vector is, for example, the average of the corresponding elements of the feature vectors in the category (alternatively, the center may be computed in other ways, such as a distance-based calculation); then calculating the distance between each feature vector and the center, and determining each feature vector's weight value from that distance, where the weight value may be the distance itself or a ratio determined from it; finally, computing the weighted average of the feature vectors in the category with their weight values.
In another embodiment, when generating the feature vector, the prompt parameters in an original prompt parameter set may be summed to obtain a one-dimensional feature vector; the subsequent process is the same as above and is not repeated here.
In the embodiments of the present application, the training device 120 may also traverse the meta-prompt parameter sets to select the target prompt parameter set from one or more meta-prompt parameter sets. For this process, refer to the description of selecting the target prompt parameter set from the original prompt parameter sets; details are not repeated here.
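Step (3) for one category can be sketched as follows. The application allows the weight to be the distance itself or a ratio derived from it; this sketch assumes one such ratio, normalized inverse distance, so that vectors closer to the center weigh more. That choice, the small `eps` guard against division by zero and the function name are assumptions.

```python
import math


def meta_prompt_vector(members, eps=1e-9):
    """Weighted average of one category's feature vectors (inverse-distance weights assumed)."""
    dim = len(members[0])
    # Center: element-wise mean of the feature vectors in the category.
    center = [sum(v[i] for v in members) / len(members) for i in range(dim)]
    # Weight per vector: a ratio derived from its distance to the center.
    inverse = [1.0 / (math.dist(v, center) + eps) for v in members]
    total = sum(inverse)
    weights = [w / total for w in inverse]
    return [sum(w * v[i] for w, v in zip(weights, members)) for i in range(dim)]
```

For the one-dimensional members 0, 0 and 3, the center is 1, the distances are 1, 1 and 2, the normalized weights are 0.4, 0.4 and 0.2, and the meta value is about 0.6.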
By adopting the mode, the target prompt parameter set is selected from the meta prompt parameter sets, so that the selection of the target prompt parameter set can be accelerated, and the calculation force is saved.
It should be noted that in the above example the loss function values of the prompt parameter sets are calculated serially; the present application may also calculate them in parallel, for example with multiple processes computing the loss function values of several prompt parameter sets at the same time, which improves the execution efficiency of step 304. In addition, the above preset condition is only an example. In the present application the preset condition may also be the highest output accuracy: after the initial task model is assigned with a selected prompt parameter set, the verification set is input into the assigned task model, the accuracy of its outputs is calculated, and the prompt parameter set with the highest accuracy is taken as the target prompt parameter set. The details follow the loss-function approach described above and are not repeated here.
In the present application, the target prompt parameter set is determined using a verification set, which reduces computational overhead while keeping the training results valid. Of course, the verification set may be replaced by all the training data of the target task, that is, all the training data of the target task may be used to select the target prompt parameter set, which is not limited in the present application.
Step 305, obtaining an initialized task model based on the target prompt parameter set.
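Evaluating the candidate sets in parallel under the accuracy criterion might look like the sketch below. It uses `concurrent.futures.ThreadPoolExecutor` from the Python standard library so the example stays self-contained; for CPU-bound evaluation in pure Python, a `ProcessPoolExecutor` (multiple processes, as described above) would usually be the better fit, though it requires picklable, module-level functions. `predict` is a hypothetical stand-in for the assigned task model.

```python
from concurrent.futures import ThreadPoolExecutor


def accuracy(prompt_set, predict, validation):
    """Fraction of verification samples the assigned task model classifies correctly."""
    correct = sum(1 for x, y in validation if predict(prompt_set, x) == y)
    return correct / len(validation)


def select_by_accuracy_parallel(pool, predict, validation, workers=4):
    """Score all candidate sets concurrently and return the one with the highest accuracy."""
    with ThreadPoolExecutor(max_workers=workers) as executor:
        scores = list(executor.map(lambda ps: accuracy(ps, predict, validation), pool))
    best = max(range(len(pool)), key=scores.__getitem__)
    return pool[best], scores[best]
```

`executor.map` returns results in input order, so `scores[i]` always belongs to `pool[i]` regardless of which worker finishes first.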
Specifically, the initial task model is assigned with the target prompt parameter set to obtain the initialized task model. For example, in the above example, if the target prompt parameter set is $\{m_3, n_3, i_3, j_3, k_3\}$, then in the initialized task model $W_{01}=m_3$, $W_{02}=n_3$, $W_{03}=i_3$, $W_{04}=j_3$, $W_{05}=k_3$. Note that after initialization, the values of the parameters belonging to the pre-training model in the task model remain unchanged.
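Step 305 amounts to copying the target values into the prompt layer while leaving the pre-trained parameters untouched. A minimal sketch, assuming a task model represented as a dictionary with hypothetical `pretrained` and `prompt` sections:

```python
import copy


def initialize_task_model(initial_model, target_set):
    """Return a new task model whose prompt layer is assigned from target_set."""
    unknown = set(target_set) - set(initial_model["prompt"])
    if unknown:
        # A set of a different type cannot be used to assign this prompt layer.
        raise KeyError(f"prompt parameters not in the model: {sorted(unknown)}")
    model = copy.deepcopy(initial_model)  # the shared initial task model stays reusable
    model["prompt"].update(target_set)    # only prompt parameters are assigned
    return model
```

Deep-copying keeps the shared initial task model 102 intact, so it can be reused for the next task.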
Step 306, training the initialized task model by using training data of the target task to obtain a trained task model.
Taking text classification as an example of the target task, for a text classification neural network model, the training data may include training texts and their corresponding classification results, which may be manually pre-labeled. For example, the training text is "flavor man-made" and the classification result is "food"; or the training text is "bomber" and the classification result is "military". These examples are kept short for illustration; in practice, the volume of training data is large.
The training device 120 trains the initialized task model based on training data of the target task, resulting in a trained task model (denoted as target model/rule 103).
To obtain the target model/rule 103 from the training data, the training device 120 processes the input training text, compares the output classification result with the labeled classification result of that text, and adjusts the prompt parameters in the task model until the output classification results are consistent with the labeled classification results, thereby completing the training of the target model/rule 103.
The task model of the embodiments of the present application may be applied to text input scenarios, such as the text recognition tasks of intelligent customer service like AliMe, and to voice input scenarios, such as the speech recognition tasks of voice assistants like "little", Xiao Ai and Tmall Genie, which is not limited in the embodiments of the present application.
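The loop described above can be illustrated with a toy classifier in which one pre-trained weight is frozen and a single prompt parameter acts as a bias. This is a sketch under those assumptions: the perceptron-style update and the hypothetical `max_epochs` cap are illustrative stand-ins for the gradient-based fine-tuning a real neural network model would use.

```python
def train_prompt_until_consistent(w_frozen, b, data, lr=0.5, max_epochs=100):
    """Adjust only the prompt parameter b until every prediction matches its label."""
    for _ in range(max_epochs):
        errors = 0
        for x, label in data:
            pred = 1 if w_frozen * x + b > 0 else 0  # frozen pre-trained weight, prompt bias
            if pred != label:
                b += lr * (label - pred)  # perceptron-style update of the prompt parameter only
                errors += 1
        if errors == 0:
            return b, True   # outputs are consistent with the labels
    return b, False          # stopped at the epoch cap without full consistency
```

Starting from `b = 1.0` with `w_frozen = 1.0` and labels that are 1 only for inputs above 2, the loop converges to `b = -1.0` after a few epochs; `w_frozen` is never modified.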
Step 307, the set of prompt parameters obtained based on the trained task model is stored in the prompt parameter pool.
Specifically, the set of prompt parameters includes values of various prompt parameters in the trained task model.
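Step 307 keeps only the prompt-layer values of the trained task model. A minimal sketch, assuming a dictionary-based task model with hypothetical `pretrained` and `prompt` sections and modeling the pool as a list of entries:

```python
def store_trained_prompt_set(pool, trained_model, task_type=None):
    """Append the trained model's prompt parameter set (and only that) to the pool."""
    # Copy the values so that further changes to the model do not alter the pool.
    prompt_set = dict(trained_model["prompt"])
    pool.append({"params": prompt_set, "task_type": task_type})
    return prompt_set
```

Because the pre-trained parameters are shared and unchanged, storing the prompt values alone is enough to reconstruct the trained task model later.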
In the above manner, the task model includes a pre-training model and a prompt layer (which includes one or more prompt parameters); a target prompt parameter set is selected from the prompt parameter pool, the prompt layer of the task model is assigned with the target prompt parameter set to obtain an initialized task model, and the prompt parameters of the initialized task model are then fine-tuned with the training data of the target task to obtain the trained task model. Training starts from an initialized task model derived from the target prompt parameter set, which can improve training efficiency and save computational overhead. Furthermore, different tasks can share the same pre-training model; when a task is executed, only its prompt parameter set (the prompt parameter set of its trained task model) needs to be input, further saving computational overhead.
Based on the same inventive concept as the method embodiment, the present application also provides a computing device for performing the method performed by the training device 120 in the method embodiment of fig. 3. As shown in fig. 8, the computing device 800 includes a selection module 801, a generation module 802 and a training module 803; in the computing device 800, the modules are connected through communication paths.
A selection module 801 that selects a first set of hint parameters (i.e., the target hint parameter set of fig. 3) from a pool of hint parameters of the AI platform; the detailed implementation is described with reference to step 304 in fig. 3, and will not be described herein.
A generating module 802, configured to obtain an initialized task model based on the first prompt parameter set, where the task model includes a prompt layer and a base model, and the first prompt parameter set is used to initialize the prompt layer, and the base model is a pre-trained AI model deployed in the AI platform; the detailed implementation is described with reference to step 305 in fig. 3, and will not be described herein.
A training module 803, configured to train the initialized task model based on the training data set to obtain a trained task model. The detailed implementation is described with reference to step 306 in fig. 3, and will not be described herein.
In one possible implementation manner, after obtaining the trained task model, the training module 803 is further configured to store a second set of prompt parameters into the prompt parameter pool, where the second set of prompt parameters is a set of parameters in a prompt layer in the trained task model. The detailed implementation is described with reference to step 307 in fig. 3, and will not be described again here.
In one possible implementation manner, the first prompt parameter set is a prompt parameter set in the prompt parameter pool that meets a set condition. Illustratively, the set condition includes: obtaining a loss function value of the task model based on each prompt parameter set in the prompt parameter pool and the training samples, and taking the prompt parameter set with the smallest loss function value as the first prompt parameter set; or obtaining the accuracy of the task model based on each prompt parameter set in the prompt parameter pool and the training samples, and taking the prompt parameter set with the highest accuracy as the first prompt parameter set.
In a possible implementation manner, the prompt parameter pool includes an original prompt parameter set and/or a meta prompt parameter set, wherein one original prompt parameter set includes various prompt parameters included in a prompt layer in a task model of other tasks after training, and the task model of the other tasks includes the basic model and the prompt layer; wherein, a meta-hint parameter set is obtained by processing one or more original hint parameter sets.
Based on the same concept as above, the present application provides a computing device 900, as shown in fig. 9, for performing the method performed by the training device 120 in the method embodiment of fig. 3 described above.
The processor 901, the memory 902 and the communication interface 904 may be connected through the bus system 903. The memory 902 may store instructions, and the processor 901 may execute the instructions stored in the memory 902 to control the communication interface 904 to receive or send signals, so as to complete the steps performed by the training device 120 in the method described above.
The memory 902 may be integrated into the processor 901 or may be a physical entity different from the processor 901.
As an implementation, the functions of the communication interface 904 may be considered to be implemented by a transceiver circuit or a dedicated chip for transceiving. Processor 901 may be considered to be implemented by a dedicated processing chip, processing circuit, processor, or general-purpose chip.
As another implementation, a computer may be considered to implement the functionality of the training device 120 in the method embodiment of FIG. 3 provided by the embodiment of the present application. I.e. program code that implements the functions of the processor 901 and the communication interface 904, is stored in the memory 902, and a general purpose processor may implement the functions of the processor 901 and the communication interface 904 by executing the code in the memory.
For the concepts, explanations, and detailed descriptions of the technical solutions provided by the present application, and for other steps related to the computing device 900, refer to the descriptions of the foregoing method or other embodiments; they are not repeated here.
An embodiment of the present application further provides a computer storage medium storing computer instructions. When the computer instructions run on a computer, the computer performs the steps of the related method to implement the method performed by the training device 120 in the foregoing embodiments. For these steps, refer to the description of fig. 3; details are not repeated here.
An embodiment of the present application further provides a computer program product. When the computer program product runs on a computer, the computer performs the related steps described above to implement the method performed by the training device 120 in the foregoing embodiments. For these steps, refer to the description of fig. 3; details are not repeated here.
In addition, an embodiment of the present application further provides an apparatus, which may be a chip, a component, or a module, and which may include a processor and a power supply circuit connected to each other. The power supply circuit supplies power to the processor, and when the apparatus runs, the processor executes computer-executable instructions so that the chip performs the method performed by the training device 120 in the foregoing method embodiments. For these steps, refer to the description of fig. 3; details are not repeated here.
The computer storage medium, computer program product, and chip provided in the embodiments of the present application are all used to perform the method performed by the training device 120 described above (see the steps of fig. 3). Therefore, for the beneficial effects they achieve, refer to the beneficial effects of the corresponding method provided above; details are not repeated here.
A person skilled in the art will understand that, for convenience and brevity of description, the above division into functional modules is used only as an example. In practical applications, the above functions may be allocated to different functional modules as required; that is, the internal structure of the apparatus may be divided into different functional modules to perform all or some of the functions described above.
It should be understood that, in the several embodiments provided in the present application, the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative: the division into modules or units is merely a logical functional division, and other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another apparatus, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and the parts shown as units may be one physical unit or a plurality of physical units, may be located in one place, or may be distributed in a plurality of different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit (or module) in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium. Based on such an understanding, the technical solutions of the embodiments of the present application essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or some of the steps of the methods in the embodiments of the present application. The foregoing storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Optionally, the computer-executable instructions in the embodiments of the present application may also be referred to as application program code; this is not specifically limited in the embodiments of the present application.
All or some of the foregoing embodiments may be implemented by software, hardware, firmware, or any combination thereof. When software is used, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the procedures or functions according to the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, by wired means (e.g., coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, or microwave). The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, or a magnetic tape), an optical medium (e.g., a DVD), or a semiconductor medium (e.g., a solid state disk (SSD)).
The various illustrative logical blocks and circuits described in connection with the embodiments of the present application may be implemented or performed with a general purpose processor, a digital signal processor, an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof designed to perform the functions described herein. A general purpose processor may be a microprocessor, but in the alternative, the general purpose processor may be any conventional processor, controller, microcontroller, or state machine. A processor may also be implemented as a combination of computing devices, e.g., a digital signal processor and a microprocessor, a plurality of microprocessors, one or more microprocessors in conjunction with a digital signal processor core, or any other similar configuration.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software unit executed by a processor, or in a combination of the two. The software elements may be stored in RAM memory, flash memory, ROM memory, EPROM memory, EEPROM memory, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. In an example, a storage medium may be coupled to the processor such that the processor can read information from, and write information to, the storage medium. In the alternative, the storage medium may be integral to the processor. The processor and the storage medium may reside in an ASIC.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
Although the application has been described in connection with specific features and embodiments thereof, it will be apparent that various modifications and combinations can be made without departing from the spirit and scope of the application. Accordingly, the specification and drawings are merely exemplary illustrations of the present application as defined in the appended claims and are considered to cover any and all modifications, variations, combinations, or equivalents that fall within the scope of the application. It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (15)

1. A method of model training, comprising:
selecting a first prompt parameter set from a prompt parameter pool of an artificial intelligence (AI) platform;
obtaining an initialized task model based on the first prompt parameter set, wherein the task model comprises a prompt layer and a base model, the first prompt parameter set is used to initialize the prompt layer, and the base model is a pre-trained AI model deployed on the AI platform; and
training the initialized task model based on a training data set to obtain a trained task model.
2. The method of claim 1, wherein after obtaining the trained task model, the method further comprises:
storing a second prompt parameter set in the prompt parameter pool, wherein the second prompt parameter set consists of the parameters in the prompt layer of the trained task model.
3. The method according to claim 1 or 2, wherein the first prompt parameter set is a prompt parameter set in the prompt parameter pool that meets a set condition.
4. The method according to claim 3, wherein the set condition comprises: obtaining, for each prompt parameter set in the prompt parameter pool, a loss function value of the task model based on that prompt parameter set and the training samples, and taking the prompt parameter set corresponding to the minimum loss function value as the first prompt parameter set; or
obtaining, for each prompt parameter set in the prompt parameter pool, an accuracy of the task model based on that prompt parameter set and the training samples, and taking the prompt parameter set corresponding to the maximum accuracy as the first prompt parameter set.
5. The method according to any one of claims 1-4, wherein the prompt parameter pool comprises original prompt parameter sets and/or meta prompt parameter sets, wherein an original prompt parameter set comprises the prompt parameters in the prompt layer of a trained task model of another task, the task model of the other task comprising the base model and a prompt layer; and wherein a meta prompt parameter set is obtained by processing one or more original prompt parameter sets.
6. The method according to claim 5, wherein the meta prompt parameter set is generated by:
clustering the original prompt parameter sets in the prompt parameter pool to obtain one or more categories; and
generating a meta prompt parameter set based on one or more original prompt parameter sets included in one of the categories.
7. The method according to claim 6, wherein generating a meta prompt parameter set based on one or more original prompt parameter sets included in one of the categories comprises:
obtaining a weight value for each of the one or more original prompt parameter sets included in the category, the weight value being the distance between the original prompt parameter set and the center point of the category; and
calculating the values of the prompt parameters in the meta prompt parameter set according to each original prompt parameter set and its weight value.
8. A model training apparatus, the apparatus comprising:
a selection module, configured to select a first prompt parameter set from a prompt parameter pool of an AI platform;
a generation module, configured to obtain an initialized task model based on the first prompt parameter set, wherein the task model comprises a prompt layer and a base model, the first prompt parameter set is used to initialize the prompt layer, and the base model is a pre-trained AI model deployed on the AI platform; and
a training module, configured to train the initialized task model based on a training data set to obtain a trained task model.
9. The apparatus according to claim 8, wherein after obtaining the trained task model, the training module is further configured to store a second prompt parameter set in the prompt parameter pool, wherein the second prompt parameter set consists of the parameters in the prompt layer of the trained task model.
10. The apparatus according to claim 8 or 9, wherein the first prompt parameter set is a prompt parameter set in the prompt parameter pool that meets a set condition.
11. The apparatus according to claim 10, wherein the set condition comprises: obtaining, for each prompt parameter set in the prompt parameter pool, a loss function value of the task model based on that prompt parameter set and the training samples, and taking the prompt parameter set corresponding to the minimum loss function value as the first prompt parameter set; or
obtaining, for each prompt parameter set in the prompt parameter pool, an accuracy of the task model based on that prompt parameter set and the training samples, and taking the prompt parameter set corresponding to the maximum accuracy as the first prompt parameter set.
12. The apparatus according to any one of claims 8-11, wherein the prompt parameter pool comprises original prompt parameter sets and/or meta prompt parameter sets, wherein an original prompt parameter set comprises the prompt parameters in the prompt layer of a trained task model of another task, the task model of the other task comprising the base model and a prompt layer; and wherein a meta prompt parameter set is obtained by processing one or more original prompt parameter sets.
13. A computing device, comprising a processor and a memory, wherein the memory stores a computer-executable program that, when invoked by the processor, causes the processor to perform the method according to any one of claims 1-7.
14. A computing node cluster, comprising a plurality of computing devices configured to perform the method according to any one of claims 1-7.
15. A computer readable storage medium storing a program which, when called by a processor, performs the method of any one of claims 1-7.
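The selection, initialization, training, and store-back steps of claims 1, 2, and 4 can be sketched end to end in Python. Everything concrete here is hypothetical: the pool is a dict from a task name to a flattened prompt parameter set, the "task model" is a toy in which a fixed base model computes y = 2 * (x + prompt[0]), and training runs gradient descent on the prompt parameter only. The claims do not state whether the base model is frozen; keeping it fixed is an assumption borrowed from common prompt-tuning practice.

```python
from typing import Callable, Dict, List, Sequence, Tuple

PromptSet = List[float]       # flattened prompt-layer parameters (assumption)
Sample = Tuple[float, float]  # (input, label) pair for the toy task below


def select_first_prompt_set(
    pool: Dict[str, PromptSet],
    samples: Sequence[Sample],
    score_fn: Callable[[PromptSet, Sequence[Sample]], float],
    higher_is_better: bool = False,
) -> str:
    """Claim 4: evaluate the task model once per pooled prompt set and keep the best.

    With a loss as score_fn the minimum wins; for accuracy, pass
    higher_is_better=True so the maximum wins.
    """
    pick = max if higher_is_better else min
    return pick(pool, key=lambda name: score_fn(pool[name], samples))


def toy_loss(prompt: PromptSet, samples: Sequence[Sample]) -> float:
    """Hypothetical task model: fixed base model y = 2 * (x + prompt[0]), mean squared error."""
    return sum((2 * (x + prompt[0]) - y) ** 2 for x, y in samples) / len(samples)


def toy_train(prompt: PromptSet, samples: Sequence[Sample],
              lr: float = 0.05, steps: int = 200) -> PromptSet:
    """Gradient descent on the prompt parameter only; the base model is kept fixed."""
    p = prompt[0]
    for _ in range(steps):
        # d/dp of (2 * (x + p) - y) ** 2 is 4 * (2 * (x + p) - y), averaged over samples.
        grad = sum(4 * (2 * (x + p) - y) for x, y in samples) / len(samples)
        p -= lr * grad
    return [p]


def train_with_prompt_pool(pool: Dict[str, PromptSet], samples: Sequence[Sample],
                           new_name: str) -> PromptSet:
    """Claims 1-2: select a set, initialize the prompt layer, train, store the result back."""
    chosen = select_first_prompt_set(pool, samples, toy_loss)
    prompt_layer = list(pool[chosen])   # copy so the pooled set stays unchanged
    trained = toy_train(prompt_layer, samples)
    pool[new_name] = trained            # the "second prompt parameter set" of claim 2
    return trained
```

For example, with `pool = {"task_a": [0.9], "task_b": [-3.0]}` and samples `[(1.0, 4.0), (2.0, 6.0)]`, the set of `task_a` has the lower loss and is selected, training moves its parameter toward 1.0, and the result is added to the pool under the new name. Passing an accuracy function with `higher_is_better=True` covers the second alternative of claim 4.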
CN202210604357.5A 2022-05-30 2022-05-30 Model training method and device Pending CN117217292A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202210604357.5A CN117217292A (en) 2022-05-30 2022-05-30 Model training method and device
PCT/CN2023/077318 WO2023231458A1 (en) 2022-05-30 2023-02-21 Model training method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210604357.5A CN117217292A (en) 2022-05-30 2022-05-30 Model training method and device

Publications (1)

Publication Number Publication Date
CN117217292A true CN117217292A (en) 2023-12-12

Family

ID=89026824

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210604357.5A Pending CN117217292A (en) 2022-05-30 2022-05-30 Model training method and device

Country Status (2)

Country Link
CN (1) CN117217292A (en)
WO (1) WO2023231458A1 (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111353076B (en) * 2020-02-21 2023-10-10 华为云计算技术有限公司 Method for training cross-modal retrieval model, cross-modal retrieval method and related device
US11521075B2 (en) * 2020-05-15 2022-12-06 Microsoft Technology Licensing, Llc Transfer learning system for automated software engineering tasks
CN114154641A (en) * 2020-09-07 2022-03-08 华为云计算技术有限公司 AI model training method and device, computing equipment and storage medium
CN114862493A (en) * 2022-04-07 2022-08-05 北京中科深智科技有限公司 Generation model for generating personalized commodity description based on light-weight fine adjustment
CN115018043A (en) * 2022-04-28 2022-09-06 阿里巴巴(中国)有限公司 Model training method and device, computer readable storage medium and computer equipment

Also Published As

Publication number Publication date
WO2023231458A1 (en) 2023-12-07


Legal Events

Date Code Title Description
PB01 Publication