CN117094361A - Method for selecting a parameter-efficient fine-tuning module - Google Patents

Method for selecting a parameter-efficient fine-tuning module

Info

Publication number
CN117094361A
CN117094361A CN202311352064.3A
Authority
CN
China
Prior art keywords
hidden state
input sample
parameter
final
efficient fine
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311352064.3A
Other languages
Chinese (zh)
Other versions
CN117094361B (en)
Inventor
游世学
郭锐
王丙栋
乔亚飞
徐峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongke Huilian Technology Co ltd
Original Assignee
Beijing Zhongke Huilian Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongke Huilian Technology Co ltd filed Critical Beijing Zhongke Huilian Technology Co ltd
Priority to CN202311352064.3A priority Critical patent/CN117094361B/en
Publication of CN117094361A publication Critical patent/CN117094361A/en
Application granted granted Critical
Publication of CN117094361B publication Critical patent/CN117094361B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G06N3/0455 Auto-encoder networks; Encoder-decoder networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Other Investigation Or Analysis Of Materials By Electrical Means (AREA)

Abstract

The invention provides a method for selecting a parameter-efficient fine-tuning module, comprising the following steps: acquiring a candidate parameter-efficient fine-tuning module and the final hidden state of an input sample; constructing a parameter-efficient fine-tuning super-network and obtaining the final characterization of the input sample from its final hidden state, the final characterization being the product of a learnable coefficient and the final hidden state; judging whether the learnable coefficient exceeds a threshold value: if so, the candidate parameter-efficient fine-tuning module is selected; if not, it is discarded. The invention addresses the limited flexibility and high training cost of prior-art methods for fine-tuning large-scale language models.

Description

Method for selecting a parameter-efficient fine-tuning module
Technical Field
The invention relates to the technical field of language models, and in particular to a method for selecting a parameter-efficient fine-tuning module.
Background
Existing large-scale language models, while increasingly powerful, exhibit a degree of general learning ability: after observing a few groups of examples, they can partially complete tasks they have never seen. However, to support the differing needs of customers across application scenarios, it is often impossible to run exactly the same model everywhere, so the model may need to be customized on customer data. For example, a client may have privacy-protection requirements, so the input is dialogue data that has been encrypted and the text may look entirely different; the model must then be customized in order to understand the dialogue content and formulate replies. To serve different customization tasks from a single large-model base, a parameter-efficient fine-tuning method is needed.
Existing approaches use only a single parameter-efficient fine-tuning method to fine-tune a large model for a given task, and the training process is costly.
Disclosure of Invention
To overcome the defects of the prior art, the invention aims to provide a method for selecting a parameter-efficient fine-tuning module, addressing the limited flexibility and high training cost of prior-art methods for fine-tuning large-scale language models.
In order to achieve the above object, the present invention provides the following solutions:
A method of selecting a parameter-efficient fine-tuning module, comprising:
acquiring a candidate parameter-efficient fine-tuning module and the final hidden state of an input sample;
constructing a parameter-efficient fine-tuning super-network and obtaining the final characterization of the input sample from its final hidden state, the final characterization being the product of a learnable coefficient and the final hidden state;
judging whether the learnable coefficient exceeds a threshold value: if so, the candidate parameter-efficient fine-tuning module is selected; if not, it is discarded.
Preferably, acquiring the candidate parameter-efficient fine-tuning module and the final hidden state of the input sample includes:
acquiring a first hidden state of the input sample;
obtaining a second hidden state of the input sample from the first hidden state and part of the operations of the transformer layer;
obtaining a third hidden state of the input sample from the second hidden state and the candidate parameter-efficient fine-tuning module;
obtaining the final hidden state of the input sample from the remaining operations of the transformer layer and the third hidden state of the input sample.
Preferably, the final hidden state of the input sample is expressed as:
$h = f_2(P_i(f_1(h_0)))$
wherein $P_i$ is the candidate parameter-efficient fine-tuning module, $h_0$ is the first hidden state, $h$ is the final hidden state, $f_1$ is the first functional expression (the operations of the transformer layer before the module) and $f_2$ is the second functional expression (the remaining operations of the transformer layer).
Preferably, the final characterization of the input sample is given by:
$\hat{h} = a_i \cdot h$
wherein $a_i$ is the learnable coefficient and $h$ is the final hidden state.
Preferably, the learnable coefficient is modulated during training by a Bernoulli random number taking the value 0 or 1, where the probability of the value 1 is 0.5.
According to the specific embodiments provided by the invention, the invention discloses the following technical effects:
the invention provides a method for selecting a parameter efficient fine-tuning module, which allows a plurality of parameter fine-tuning methods to be used by setting a parameter efficient fine-tuning super network, and eliminates redundant parameter efficient fine-tuning modules according to the final hidden state of an input sample and judgment of learning parameters, so that training consumption of a language model is reduced, and different transducer layers select corresponding parameter efficient fine-tuning modules to obtain better effects.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions of the prior art, the drawings required in the embodiments are briefly introduced below. It is apparent that the drawings in the following description relate to only some embodiments of the present invention, and that a person skilled in the art may derive other drawings from them without inventive effort.
FIG. 1 is a flowchart of a method for selecting a parameter-efficient fine-tuning module according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of three known parameter-efficient fine-tuning methods according to an embodiment of the present invention.
Detailed Description
The following describes the technical solutions in the embodiments of the present invention clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without inventive effort fall within the scope of protection of the present invention.
The invention aims to provide a method for selecting a parameter-efficient fine-tuning module, addressing the limited flexibility and high training cost of prior-art methods for fine-tuning large-scale language models.
In order that the above objects, features and advantages of the present invention may be more readily understood, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
As shown in FIG. 1, the present invention provides a method for selecting a parameter-efficient fine-tuning module, comprising:
Step 100: acquiring a candidate parameter-efficient fine-tuning module and the final hidden state of an input sample;
Step 200: constructing a parameter-efficient fine-tuning super-network and obtaining the final characterization of the input sample from its final hidden state, the final characterization being the product of a learnable coefficient and the final hidden state;
Step 300: judging whether the learnable coefficient exceeds a threshold value: if so, the candidate parameter-efficient fine-tuning module is selected; if not, it is discarded.
Specifically, assume we already have a large-scale language model. This model may be one we pre-trained ourselves or an open-source model such as ChatGLM-6B, and its parameter count is generally above 5 billion. During training, the pre-trained trunk is left unchanged; only the parameters of the PEFT modules on the right-hand side of the figure are updated.
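For illustration only, a minimal PyTorch-style sketch of this setup follows; the assumption that PEFT parameters can be recognized by a "peft" substring in their names is a hypothetical naming convention, not something specified by the patent.

```python
import torch.nn as nn

def freeze_trunk(model: nn.Module) -> None:
    """Freeze the pre-trained trunk; leave only PEFT-module parameters trainable."""
    for name, param in model.named_parameters():
        # Hypothetical convention: PEFT parameters carry "peft" in their names.
        param.requires_grad = "peft" in name.lower()
```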
Further, acquiring the candidate parameter-efficient fine-tuning module and the final hidden state of the input sample includes:
acquiring a first hidden state of the input sample;
obtaining a second hidden state of the input sample from the first hidden state and part of the operations of the transformer layer;
obtaining a third hidden state of the input sample from the second hidden state and the candidate parameter-efficient fine-tuning module;
obtaining the final hidden state of the input sample from the remaining operations of the transformer layer and the third hidden state of the input sample.
Specifically, assume that for one candidate parameter-efficient fine-tuning module $P_i$, the hidden state of the input sample before entering the transformer layer is $h_0$. Before reaching the position where $P_i$ is inserted, the sample passes through part of the transformer layer's operations (recorded as a function $f_1$), so the hidden state just before the module is $h_1 = f_1(h_0)$. After passing through $P_i$ it becomes $h_2 = P_i(h_1)$, and the remaining operations of the transformer layer (denoted as a function $f_2$) then yield the final hidden state $h = f_2(h_2)$.
The final hidden state of the input sample is therefore expressed as:
$h = f_2(P_i(f_1(h_0)))$
wherein $P_i$ is the candidate parameter-efficient fine-tuning module, $h_0$ is the first hidden state, $h$ is the final hidden state, $f_1$ is the first functional expression and $f_2$ is the second functional expression.
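A minimal sketch of this composition in PyTorch-style code follows; the callables f1, peft and f2 are placeholders standing in for the symbols above, not an implementation taken from the patent.

```python
import torch
import torch.nn as nn

def transformer_layer_with_peft(h0: torch.Tensor,
                                f1: nn.Module,    # operations of the layer before the insertion point
                                peft: nn.Module,  # candidate parameter-efficient fine-tuning module P_i
                                f2: nn.Module     # remaining operations of the layer
                                ) -> torch.Tensor:
    h1 = f1(h0)      # second hidden state
    h2 = peft(h1)    # third hidden state produced by the candidate module
    return f2(h2)    # final hidden state h = f2(P_i(f1(h0)))
```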
This embodiment uses three known parameter-efficient fine-tuning methods, as shown in FIG. 2: LoRA, Prefix tuning and Adapter tuning. LoRA modifies the parameter matrix; Prefix tuning concatenates randomly initialized parameter vectors onto the vector representation of the sample; Adapter tuning modifies the hidden state output by each transformer layer.
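For orientation only, a minimal LoRA-style layer in PyTorch (following the standard published formulation of LoRA rather than a definition from this patent) adds a trainable low-rank update to a frozen weight matrix:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Frozen linear layer plus a trainable low-rank update (illustrative sketch)."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        self.base.weight.requires_grad_(False)                        # keep the pre-trained matrix fixed
        self.lora_a = nn.Linear(base.in_features, rank, bias=False)
        self.lora_b = nn.Linear(rank, base.out_features, bias=False)
        nn.init.zeros_(self.lora_b.weight)                            # the low-rank update starts at zero

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + self.lora_b(self.lora_a(x))
```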
Specifically, we first construct a parameter-efficient fine-tuning super-network, i.e. we allow the six parameter-efficient fine-tuning modules in FIG. 2 (two Adapter positions, two Prefix positions and two LoRA positions) to all be used simultaneously. For each candidate module $P_i$, the final characterization of the input sample (the representation of the sample after passing through the transformer layer) is then:
$\hat{h} = a_i \cdot h$
wherein $a_i$ is the learnable coefficient, with a value between 0 and 1 (a real-valued parameter passed through a sigmoid). What we ultimately want is: given a threshold $t$, if $a_i$ is greater than $t$, the module $P_i$ is kept; otherwise $P_i$ is removed. Thus, by adjusting the threshold, redundant parameter-efficient fine-tuning modules can be eliminated.
The transformer layers of the pre-trained model form a trunk, and the parameter-efficient fine-tuning modules are branches attached to that trunk; both process and transform the hidden state of the sample so that it becomes better suited to producing useful outputs, such as the category of a sentence or how the next sentence should be written.
Further, the learnable coefficient is modulated during training by a Bernoulli random number taking the value 0 or 1, where the probability of the value 1 is 0.5.
Specifically, regarding learning the $a_i$ parameters: the $a_i$ are treated as part of the model parameters and are learned along with the parameters of the parameter-efficient fine-tuning modules. After learning, an $a_i$ close to 0 indicates that the corresponding PEFT module is less important for the customization task. Although the $a_i$ parameters can be learned this way, a problem remains: we ultimately want to screen out the less important PEFT modules, but then the set of PEFT modules used during training differs from the set used at deployment. Training $a_i$ directly leaves some modules insufficiently trained, so the importance learned by $a_i$ becomes inaccurate. To ensure that the $a_i$ parameters truly express the importance of the PEFT modules, we propose the following regularization method. On each forward propagation, for each $a_i$ parameter we randomly sample a Bernoulli random number $b_i$; the random number takes the value 0 or 1, with probability p = 0.5 of being 1. The coefficient actually used is the product $a_i \cdot b_i$, so whether module $P_i$ is used is determined by $b_i$. Because of the randomness of the Bernoulli numbers, each forward propagation activates a different subset of parameter-efficient fine-tuning modules and produces a different result. Denote the hidden states obtained by two such forward propagations as $h^{(1)}$ and $h^{(2)}$. We argue that the super-network must keep the sample's semantic representation stable when different PEFT modules are used, since the transformer layers at different positions of the model should be free to select different PEFT modules; only then can the $a_i$ learned from the regularization term together with the training-data loss be trusted, with modules whose $a_i$ are small finally being removed. We therefore add a regularization term on top of the training loss:
$L = L_{\text{train}} + R, \quad R = \| h^{(1)} - h^{(2)} \|$
wherein $R$ is the regularization term; that is, the difference between $h^{(1)}$ and $h^{(2)}$ is required to be as small as possible. Under the constraint of this regularization term, every module in the PEFT super-network can be trained sufficiently, so that the corresponding $a_i$ parameters truly reflect their importance.
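The sketch below illustrates this regularization under several assumptions beyond the text above: the super-network is assumed to accept per-module coefficients through a hypothetical `coefficients` argument, the gap between the two representations is measured with a mean-squared error, and `reg_weight` is an illustrative weighting hyperparameter.

```python
import torch
import torch.nn.functional as F

def masked_coefficients(gates, p: float = 0.5):
    """Multiply each learnable coefficient a_i by a Bernoulli random number b_i in {0, 1}."""
    return [a * torch.bernoulli(torch.full_like(a, p)) for a in gates]

def training_loss(supernet, x, target, gates, task_loss_fn, reg_weight: float = 1.0):
    # Two forward passes activate two different random subsets of PEFT modules.
    h1 = supernet(x, coefficients=masked_coefficients(gates))  # hypothetical interface
    h2 = supernet(x, coefficients=masked_coefficients(gates))
    task_loss = task_loss_fn(h1, target)
    reg = F.mse_loss(h1, h2)          # require the two representations to stay close
    return task_loss + reg_weight * reg
```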
The beneficial effects of the invention are as follows:
the invention provides a method for selecting a parameter efficient fine-tuning module, which allows a plurality of parameter fine-tuning methods to be used by setting a parameter efficient fine-tuning super network, and eliminates redundant parameter efficient fine-tuning modules according to the final hidden state of an input sample and judgment of learning parameters, so that training consumption of a language model is reduced, and different transducer layers select corresponding parameter efficient fine-tuning modules to obtain better effects.
The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and identical or similar parts among the embodiments may be cross-referenced.
The principles and embodiments of the present invention are described herein with reference to specific examples, which are intended only to help understand the method of the present invention and its core ideas. Meanwhile, a person of ordinary skill in the art may, based on the ideas of the present invention, make changes to the specific embodiments and the scope of application. In view of the foregoing, the content of this specification should not be construed as limiting the invention.

Claims (5)

1. A method of selecting a parameter-efficient fine-tuning module, comprising:
acquiring a candidate parameter-efficient fine-tuning module and the final hidden state of an input sample;
constructing a parameter-efficient fine-tuning super-network and obtaining the final characterization of the input sample from its final hidden state, the final characterization being the product of a learnable coefficient and the final hidden state;
judging whether the learnable coefficient exceeds a threshold value: if so, the candidate parameter-efficient fine-tuning module is selected; if not, it is discarded.
2. The method of claim 1, wherein acquiring the candidate parameter-efficient fine-tuning module and the final hidden state of the input sample comprises:
acquiring a first hidden state of the input sample;
obtaining a second hidden state of the input sample from the first hidden state and part of the operations of the transformer layer;
obtaining a third hidden state of the input sample from the second hidden state and the candidate parameter-efficient fine-tuning module;
obtaining the final hidden state of the input sample from the remaining operations of the transformer layer and the third hidden state of the input sample.
3. The method of claim 1, wherein the final hidden state of the input sample is expressed as:
$h = f_2(P_i(f_1(h_0)))$
wherein $P_i$ is the candidate parameter-efficient fine-tuning module, $h_0$ is the first hidden state, $h$ is the final hidden state, $f_1$ is the first functional expression and $f_2$ is the second functional expression.
4. The method of claim 3, wherein the final characterization of the input sample is given by:
$\hat{h} = a_i \cdot h$
wherein $a_i$ is the learnable coefficient and $h$ is the final hidden state.
5. The method of claim 1, wherein the learnable coefficient is modulated by a Bernoulli random number whose value is 0 or 1, the probability of the value 1 being 0.5.
CN202311352064.3A 2023-10-19 2023-10-19 Method for selecting a parameter-efficient fine-tuning module Active CN117094361B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311352064.3A CN117094361B (en) 2023-10-19 2023-10-19 Method for selecting a parameter-efficient fine-tuning module

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311352064.3A CN117094361B (en) 2023-10-19 2023-10-19 Method for selecting a parameter-efficient fine-tuning module

Publications (2)

Publication Number Publication Date
CN117094361A true CN117094361A (en) 2023-11-21
CN117094361B CN117094361B (en) 2024-01-26

Family

ID=88772147

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311352064.3A Active CN117094361B (en) Method for selecting a parameter-efficient fine-tuning module

Country Status (1)

Country Link
CN (1) CN117094361B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110580543A (en) * 2019-08-06 2019-12-17 天津大学 Power load prediction method and system based on deep belief network
US20200364574A1 (en) * 2019-05-16 2020-11-19 Samsung Electronics Co., Ltd. Neural network model apparatus and compressing method of neural network model
CN114357172A (en) * 2022-01-07 2022-04-15 北京邮电大学 Rumor detection method based on ERNIE-BiGRU-Attention
CN114676234A (en) * 2022-02-22 2022-06-28 华为技术有限公司 Model training method and related equipment
KR20220124389A (en) * 2021-03-03 2022-09-14 에스케이 주식회사 Method and provision system for finetuned model service using pretrain model
CN116882474A (en) * 2023-07-18 2023-10-13 平安科技(深圳)有限公司 Fine tuning method, device, equipment and medium of pre-training model

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200364574A1 (en) * 2019-05-16 2020-11-19 Samsung Electronics Co., Ltd. Neural network model apparatus and compressing method of neural network model
CN110580543A (en) * 2019-08-06 2019-12-17 天津大学 Power load prediction method and system based on deep belief network
KR20220124389A (en) * 2021-03-03 2022-09-14 에스케이 주식회사 Method and provision system for finetuned model service using pretrain model
CN114357172A (en) * 2022-01-07 2022-04-15 北京邮电大学 Rumor detection method based on ERNIE-BiGRU-Attention
CN114676234A (en) * 2022-02-22 2022-06-28 华为技术有限公司 Model training method and related equipment
CN116882474A (en) * 2023-07-18 2023-10-13 平安科技(深圳)有限公司 Fine tuning method, device, equipment and medium of pre-training model

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
JIAMING HAN et al.: "ImageBind-LLM: Multi-modality Instruction Tuning", arXiv, pages 1-24 *
TAO JIANG et al.: "Gaseous emission during the composting of pig feces from Chinese Ganqinfen system", Chemosphere, vol. 90, no. 4, pages 1545-1551 *
ZINIU LI et al.: "ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models", arXiv, pages 1-20 *
徐峰 et al.: "A search algorithm for SoC design schemes combining module selection, resource sharing and task scheduling", Journal of Computer-Aided Design & Computer Graphics, vol. 21, no. 7, pages 1005-1010 *
王亮亮 et al.: "Fast vehicle detection algorithm based on deep learning", China Master's Theses Full-text Database, Engineering Science and Technology II (monthly), no. 1, pages 034-1013 *
羽林小王子: "Understanding LoRA: fine-tuning large language models (LLM) in one article", pages 1-10, Retrieved from the Internet <URL:https://developer.aliyun.com/article/1257855> *

Also Published As

Publication number Publication date
CN117094361B (en) 2024-01-26

Similar Documents

Publication Publication Date Title
CN113962315B (en) Model pre-training method, device, equipment, storage medium and program product
CN110134968B (en) Poem generation method, device, equipment and storage medium based on deep learning
CN111914551B (en) Natural language processing method, device, electronic equipment and storage medium
JP6876814B2 (en) Batch renormalization layer
CN110598224A (en) Translation model training method, text processing device and storage medium
US20170243575A1 (en) Computer-Implemented Method And Apparatus For Generating Grapheme-To-Phoneme Model
US11488060B2 (en) Learning method, learning program, learning device, and learning system
WO2023137911A1 (en) Intention classification method and apparatus based on small-sample corpus, and computer device
US20200057811A1 (en) Hybrid Natural Language Understanding
CN111125323B (en) Chat corpus labeling method and device, electronic equipment and storage medium
CN116049387A (en) Short text classification method, device and medium based on graph convolution
CN114265921A (en) Question-answer knowledge base construction method and device, equipment, medium and product thereof
CN117094361B (en) Method for selecting a parameter-efficient fine-tuning module
CN112989843B (en) Intention recognition method, device, computing equipment and storage medium
CN111797220B (en) Dialog generation method, apparatus, computer device and storage medium
CN117033961A (en) Multi-mode image-text classification method for context awareness
CN111090740A (en) Knowledge graph generation method for dialog system
CN111401069A (en) Intention recognition method and intention recognition device for conversation text and terminal
CN115240654A (en) Speech recognition model training method, device, equipment and storage medium
CN114048296A (en) Semantic gate-based chatting type multi-round conversation method, system, medium and equipment
CN117332791B (en) Large language model training method, device, equipment and storage medium
CN117808083B (en) Distributed training communication method, device, system, equipment and storage medium
CN110909142B (en) Question and sentence processing method and device of question-answer model, electronic equipment and storage medium
Nie et al. Graph neural net-based user simulator
US11664010B2 (en) Natural language domain corpus data set creation based on enhanced root utterances

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant