CN117708307B - Method and device for fusing micro-tuning and Adapter of large language model - Google Patents
- Publication number
- CN117708307B CN202410170139.4A
- Authority
- CN
- China
- Prior art keywords
- question
- dialogue
- answer
- adapter
- lora
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/332—Query formulation
- G06F16/3329—Natural language query formulation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Evolutionary Computation (AREA)
- Evolutionary Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Computational Biology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Mathematical Physics (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
Description
Technical Field
The present invention relates to the field of deep learning, and more specifically to a method and device for large language model fine-tuning and Adapter fusion.
Background
Training large language models has important research and application value: it improves performance on natural language processing tasks, improves the interactive experience of dialogue systems, and promotes the broad accessibility of scientific research, technological innovation, and artificial intelligence. By training on massive corpora, large language models learn rich linguistic knowledge and grammatical rules, and therefore perform better on natural language processing tasks such as machine translation, text generation, and text classification. These models can understand and generate more accurate and fluent natural language. Large language models can also be used to build intelligent dialogue systems that provide more natural, accurate, and personalized responses through dialogue with users. A trained model that understands and generates human language can better meet user needs and improve the interactive experience of a dialogue system. Training large language models requires massive amounts of data and substantial computing resources, which makes this work significant for advancing scientific research and technological innovation. Many technical challenges must be solved during training, such as data processing, model design, and training algorithms, and solving them has a positive effect on research and development in related fields.
The conventional approach to large language model training is to collect a large amount of instruction fine-tuning data, fuse it into a large-scale dataset, and fine-tune an open-source large language model on that dataset. However, fusing multiple datasets to build a multi-purpose dataset appears infeasible: on the one hand, different datasets may contradict each other and it is difficult to assess data quality; on the other hand, these datasets consist of instances of various specific tasks, such as mathematics, coding, role-playing, and writing. If these datasets are mixed and the model is fine-tuned on the fused dataset, the performance of the large language model may degrade, sometimes severely.
Summary of the Invention
Embodiments of the present invention provide a large language model fine-tuning and Adapter fusion method and device, which can prevent the performance degradation caused by conflicts between different datasets in the semantic space.
An embodiment of the present invention provides a large language model fine-tuning and Adapter fusion method, including:
collecting multiple question-answering datasets and dialogue datasets from a designated network platform, and performing LoRA-adapter fine-tuning on the question-answering datasets and the dialogue datasets respectively to obtain, in turn, a question-answering large language model, a question-answering negative log-likelihood loss function, a dialogue large language model, and a dialogue negative log-likelihood loss function;
obtaining, from the question-answering negative log-likelihood loss function, the dialogue negative log-likelihood loss function, and the initial fusion weight associated with each LoRA-adapter fine-tuning, an ideal loss function of the question-answering datasets and the dialogue datasets in the ideal case, and obtaining the ideal fusion weights and the first ideal parameters corresponding to the ideal loss function from the minimum of the ideal loss function, where the first ideal parameters denote all of the LoRA-adapters added to the question-answering large language model and the dialogue large language model respectively;
fine-tuning, according to the ideal loss function, the question-answering LoRA-adapter corresponding to each question-answering dataset and the dialogue LoRA-adapter corresponding to each dialogue dataset, to obtain the optimal parameters of the question-answering LoRA-adapters, the optimal parameters of the dialogue LoRA-adapters, and the optimal fusion parameters; and
obtaining a universal LoRA-adapter from the optimal parameters of the question-answering LoRA-adapters, the optimal parameters of the dialogue LoRA-adapters, and the optimal fusion parameters.
Preferably, performing LoRA-adapter fine-tuning on the question-answering datasets to obtain, in turn, the question-answering large language model and the question-answering negative log-likelihood loss function specifically includes:
training on the question-answering dataset to obtain a question-answering LoRA-adapter, and obtaining the question-answering large language model from the question-answering LoRA-adapter and the question-answering dataset;
obtaining the question-answering negative log-likelihood loss function from the question-answering large language model and its tokens;
the question-answering dataset, the question-answering large language model, and the question-answering negative log-likelihood loss function are as follows:

$Q_i = \{(s_{i,j}, q_{i,j}, r_{i,j})\}_{j=1}^{|Q_i|}$

$p_{\theta, A_{Q_i}}(r_{i,j} \mid s_{i,j}, q_{i,j}) = \prod_{k=1}^{|r_{i,j}|} p_{\theta}(r_k \mid A_{Q_i}, s_{i,j}, q_{i,j}, r_{<k})$

$\mathcal{L}_{Q_i} = -\sum_{j=1}^{|Q_i|} \sum_{k=1}^{|r_{i,j}|} \log p_{\theta}(r_k \mid A_{Q_i}, s_{i,j}, q_{i,j}, r_{<k})$

where $Q_i$ denotes the i-th question-answering dataset, $s_{i,j}$ denotes the j-th system message of the i-th question-answering dataset, $q_{i,j}$ denotes the j-th question of the i-th question-answering dataset, $r_{i,j}$ denotes the j-th reply of the i-th question-answering dataset, $|Q_i|$ denotes the length of the question-answering dataset $Q_i$, $A_{Q_i}$ denotes the question-answering LoRA-adapter trained on $Q_i$, $|r_{i,j}|$ denotes the length of $r_{i,j}$, $r_k$ denotes the k-th token generated by the large language model, $p_\theta$ denotes the large language model, $\theta$ denotes the frozen parameters of the large language model, and $\mathcal{L}_{Q_i}$ denotes the question-answering negative log-likelihood loss function.
Preferably, performing LoRA-adapter fine-tuning on the dialogue datasets to obtain, in turn, the dialogue large language model and the dialogue negative log-likelihood loss function specifically includes:
training on the dialogue dataset to obtain a dialogue LoRA-adapter, and obtaining the dialogue large language model from the dialogue LoRA-adapter and the dialogue dataset;
obtaining the dialogue negative log-likelihood loss function from the dialogue large language model and its tokens;
the dialogue dataset, the dialogue large language model, and the dialogue negative log-likelihood loss function are as follows:

$C_i = \{(q_{i,j}^1, r_{i,j}^1, \ldots, q_{i,j}^T, r_{i,j}^T)\}_{j=1}^{|C_i|}$

$p_{\theta, A_{C_i}}(R_j \mid Q_j) = \prod_{k=1}^{N_{i,j}} p_{\theta}(x_k \mid A_{C_i}, x_{<k})^{\mathbb{1}(x_k \in R_j)}$

$\mathcal{L}_{C_i} = -\sum_{j=1}^{|C_i|} \sum_{k=1}^{N_{i,j}} \mathbb{1}(x_k \in R_j)\, \log p_{\theta}(x_k \mid A_{C_i}, x_{<k})$

where $C_i$ denotes the i-th dialogue dataset, $q_{i,j}^T$ denotes the j-th query of the i-th dialogue dataset in round T, $r_{i,j}^T$ denotes the j-th reply of the i-th dialogue dataset in round T, $|C_i|$ denotes the length of the dialogue dataset $C_i$, $A_{C_i}$ denotes the dialogue LoRA-adapter trained on $C_i$, $Q_j$ denotes all tokens belonging to the user queries, $R_j$ denotes the target tokens, $x_k$ denotes the k-th token of the j-th instance, $N_{i,j}$ denotes the number of tokens in the j-th instance of the dialogue dataset $C_i$, $\mathcal{L}_{C_i}$ denotes the dialogue negative log-likelihood loss function, $p_\theta$ denotes the large language model, and $\theta$ denotes the frozen parameters of the large language model.
Preferably, the ideal loss function is as follows:

$L(A, \omega) = \sum_{i=1}^{M} \omega_{Q_i} \mathcal{L}_{Q_i} + \sum_{i=1}^{N} \omega_{C_i} \mathcal{L}_{C_i}$

The minimum of the ideal loss function is as follows:

$A^*, \omega^* = \arg\min_{A, \omega} L(A, \omega)$

where $L$ denotes the ideal loss function, $\omega_{Q_i}$ denotes the initial fusion weight of $A_{Q_i}$ obtained by fine-tuning on the question-answering dataset $Q_i$, $\omega_{C_i}$ denotes the initial fusion weight of $A_{C_i}$ obtained by fine-tuning on the dialogue dataset $C_i$, $A^*$ denotes all of the first ideal parameters, $\omega^*$ denotes all of the ideal fusion weights, $A$ denotes the first ideal parameters, and $\omega$ denotes the ideal fusion weights.
Preferably, the first ideal parameters are as follows:

$A = \{A_{Q_1}, \ldots, A_{Q_M}, A_{C_1}, \ldots, A_{C_N}\}$

The ideal fusion weights are as follows:

$\omega = \{\omega_{Q_1}, \ldots, \omega_{Q_M}, \omega_{C_1}, \ldots, \omega_{C_N}\}$

where $A$ denotes the first ideal parameters, $A_{Q_M}$ denotes the first ideal parameter obtained by fine-tuning on the question-answering dataset $Q_M$, $A_{C_N}$ denotes the first ideal parameter obtained by fine-tuning on the dialogue dataset $C_N$, $M$ denotes the number of question-answering datasets, $N$ denotes the number of dialogue datasets, $\omega$ denotes the ideal fusion weights, $\omega_{Q_M}$ denotes the ideal fusion weight of $A_{Q_M}$, and $\omega_{C_N}$ denotes the ideal fusion weight of $A_{C_N}$.
Preferably, the optimal parameters of the question-answering LoRA-adapters, the optimal parameters of the dialogue LoRA-adapters, and the optimal fusion parameters are as follows:

$A_{Q_i}^{**} = \arg\min_{A_{Q_i}} \mathcal{L}_{Q_i}(A_{Q_i})$

$A_{C_i}^{**} = \arg\min_{A_{C_i}} \mathcal{L}_{C_i}(A_{C_i})$

$\omega^{**} = \arg\min_{\omega} \left( \sum_{i=1}^{M} \omega_{Q_i} \mathcal{L}_{Q_i}(A_{Q_i}^{**}) + \sum_{i=1}^{N} \omega_{C_i} \mathcal{L}_{C_i}(A_{C_i}^{**}) \right)$

where $A_{Q_i}^{**}$ denotes the optimal parameters of the question-answering LoRA-adapter, $A_{C_i}^{**}$ denotes the optimal parameters of the dialogue LoRA-adapter, $\omega^{**}$ denotes the optimal fusion parameters, $\mathcal{L}_{Q_i}$ denotes the question-answering negative log-likelihood loss function, $\mathcal{L}_{C_i}$ denotes the dialogue negative log-likelihood loss function, $\omega_{Q_i}$ denotes the initial fusion weight obtained by fine-tuning on the question-answering dataset $Q_i$, $\omega_{C_i}$ denotes the initial fusion weight obtained by fine-tuning on the dialogue dataset $C_i$, and $A_{C_i}$ denotes the dialogue LoRA-adapter trained on the dialogue dataset $C_i$.
An embodiment of the present invention provides a large language model fine-tuning and Adapter fusion device, including:
a first obtaining unit, configured to collect multiple question-answering datasets and dialogue datasets from a designated network platform, and to perform LoRA-adapter fine-tuning on the question-answering datasets and the dialogue datasets respectively to obtain, in turn, a question-answering large language model, a question-answering negative log-likelihood loss function, a dialogue large language model, and a dialogue negative log-likelihood loss function;
a second obtaining unit, configured to obtain, from the question-answering negative log-likelihood loss function, the dialogue negative log-likelihood loss function, and the initial fusion weight associated with each LoRA-adapter fine-tuning, an ideal loss function of the question-answering datasets and the dialogue datasets in the ideal case, and to obtain the ideal fusion weights and the first ideal parameters corresponding to the ideal loss function from the minimum of the ideal loss function, where the first ideal parameters denote all of the LoRA-adapters added to the question-answering large language model and the dialogue large language model respectively;
a third obtaining unit, configured to fine-tune, according to the ideal loss function, the question-answering LoRA-adapter corresponding to each question-answering dataset and the dialogue LoRA-adapter corresponding to each dialogue dataset, to obtain the optimal parameters of the question-answering LoRA-adapters, the optimal parameters of the dialogue LoRA-adapters, and the optimal fusion parameters; and
a fourth obtaining unit, configured to obtain a universal LoRA-adapter from the optimal parameters of the question-answering LoRA-adapters, the optimal parameters of the dialogue LoRA-adapters, and the optimal fusion parameters.
An embodiment of the present invention provides a computer device, the computer device including a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform any one of the large language model fine-tuning and Adapter fusion methods described above.
An embodiment of the present invention provides a computer-readable storage medium storing a computer program that, when executed by a processor, causes the processor to perform any one of the large language model fine-tuning and Adapter fusion methods described above.
Embodiments of the present invention provide a large language model fine-tuning and Adapter fusion method and device. The method includes: collecting multiple question-answering datasets and dialogue datasets from a designated network platform; performing LoRA-adapter fine-tuning on the question-answering datasets and the dialogue datasets respectively to obtain, in turn, a question-answering large language model, a question-answering negative log-likelihood loss function, a dialogue large language model, and a dialogue negative log-likelihood loss function; obtaining, from the question-answering negative log-likelihood loss function, the dialogue negative log-likelihood loss function, and the initial fusion weight associated with each LoRA-adapter fine-tuning, an ideal loss function of the question-answering datasets and the dialogue datasets in the ideal case, and obtaining the ideal fusion weights and the first ideal parameters corresponding to the ideal loss function from the minimum of the ideal loss function, where the first ideal parameters denote all of the LoRA-adapters added to the question-answering large language model and the dialogue large language model respectively; fine-tuning, according to the ideal loss function, the question-answering LoRA-adapter corresponding to each question-answering dataset and the dialogue LoRA-adapter corresponding to each dialogue dataset, to obtain the optimal parameters of the question-answering LoRA-adapters, the optimal parameters of the dialogue LoRA-adapters, and the optimal fusion parameters; and obtaining a universal LoRA-adapter from the optimal parameters of the question-answering LoRA-adapters, the optimal parameters of the dialogue LoRA-adapters, and the optimal fusion parameters. The method constructs multiple instruction fine-tuning datasets and uses QLoRA quantization to reduce GPU (Graphics Processing Unit) consumption, providing a high-quality training approach for large language models that saves computing resources; it also designs a Grid-Search-based optimization of the multi-LoRA-adapter fusion to fuse the trained LoRA-adapters. By fusing LoRA-adapters, the method effectively avoids the semantic-space conflicts caused by dataset fusion while improving the generalization performance of the large language model on multiple tasks, thereby solving the performance degradation caused in the prior art by conflicts between different datasets in the semantic space.
Brief Description of the Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a large language model fine-tuning and Adapter fusion method provided by an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a large language model fine-tuning and Adapter fusion device provided by an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the scope of protection of the present invention.
Training large language models is not only about improving the performance of natural language processing tasks and the interactive experience of dialogue systems; it also carries broader value and significance for research and applications.
First, by training on massive corpora, large language models learn rich linguistic knowledge and grammatical rules. These models can understand and generate more accurate and fluent natural language, providing better performance for natural language processing tasks such as machine translation, text generation, and text classification. In machine translation, a large language model can more accurately understand the meaning of the source language and generate more natural translations in the target language. In text generation, the model can produce more logical and coherent content. In text classification, the model can more accurately determine the category of a text and improve classification accuracy.
Second, large language models can be used to build intelligent dialogue systems that provide more natural, accurate, and personalized responses through dialogue with users. This capability is very useful in everyday scenarios such as chatbots and intelligent customer service. A trained model that understands and generates human language can better meet user needs and improve the interactive experience of the dialogue system. The dialogue system can generate personalized responses through the model, giving users an interaction that feels like talking with a person and increasing user satisfaction.
In addition, training large language models requires processing massive amounts of data and substantial computing resources, which is significant for advancing scientific research and technological innovation. Many technical challenges must be addressed during training, such as data processing, model design, and training algorithms. Solving these challenges not only advances language models themselves but also contributes to research and development in related fields. For example, improving and optimizing the models raises their efficiency and performance and provides technical support for developing and applying other natural language processing tasks.
Finally, trained large language models can provide intelligent natural language processing services and promote the inclusive development of artificial intelligence technology. These models can be applied in many fields, such as education, healthcare, and finance. In education, they can assist learning and answer questions intelligently, providing personalized learning resources and communication platforms. In healthcare, they can assist doctors with diagnosis and intelligent medical record keeping, improving the quality and efficiency of medical services. In finance, they can support intelligent customer service and risk management, providing more personalized and efficient financial services.
In summary, training large language models has important research and application value: it improves the performance of natural language processing tasks and the interactive experience of dialogue systems, and promotes the broad accessibility of scientific research, technological innovation, and artificial intelligence. By training large language models, intelligent natural language processing technology can be applied across many fields, providing society with better intelligent services and solutions.
Under the conventional approach currently used for training large language models, fusing multiple datasets into one multi-purpose dataset appears infeasible: different datasets may contradict each other, and it is difficult to assess data quality. To build a high-quality and capable large language model, an embodiment of the present invention therefore proposes an efficient training method. Multiple open-source datasets from the Huggingface platform are cleaned and organized into several distinct knowledge question-answering datasets and dialogue datasets; QLoRA (quantized low-rank adaptation) is then used to train a separate LoRA-adapter on each dataset; finally, Grid-Search is used to dynamically optimize the fusion weights of these LoRA-adapters.
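For illustration only, the sketch below shows how a single dataset-specific LoRA-adapter could be trained in this spirit with 4-bit QLoRA. The base model name, LoRA rank, and target modules are assumptions rather than values specified by the patent, and the Hugging Face transformers/peft APIs are used as one possible implementation.

```python
# Hypothetical QLoRA setup for one dataset-specific adapter (illustrative only).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

BASE_MODEL = "meta-llama/Llama-2-7b-hf"   # assumed base model, not specified by the patent

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,                    # QLoRA: keep frozen weights in 4-bit NF4
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
model = AutoModelForCausalLM.from_pretrained(
    BASE_MODEL, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,               # assumed hyperparameters
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)                # theta stays frozen; only the adapter trains
model.print_trainable_parameters()
```

In this setup the frozen base parameters play the role of $\theta$ and the trainable adapter plays the role of one $A_{Q_i}$ or $A_{C_i}$; one such adapter would be trained per dataset before fusion.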
FIG. 1 is a schematic flow chart of a large language model fine-tuning and Adapter fusion method provided by an embodiment of the present invention. As shown in FIG. 1, the method includes the following steps:
Step 101: collect multiple question-answering datasets and dialogue datasets from a designated network platform; perform LoRA-adapter fine-tuning on the question-answering datasets and the dialogue datasets respectively to obtain, in turn, a question-answering large language model, a question-answering negative log-likelihood loss function, a dialogue large language model, and a dialogue negative log-likelihood loss function.
Step 102: from the question-answering negative log-likelihood loss function, the dialogue negative log-likelihood loss function, and the initial fusion weight associated with each LoRA-adapter fine-tuning, obtain the ideal loss function of the question-answering datasets and the dialogue datasets in the ideal case, and from the minimum of the ideal loss function obtain the ideal fusion weights and the first ideal parameters corresponding to the ideal loss function, where the first ideal parameters denote all of the LoRA-adapters added to the question-answering large language model and the dialogue large language model respectively.
Step 103: according to the ideal loss function, fine-tune the question-answering LoRA-adapter corresponding to each question-answering dataset and the dialogue LoRA-adapter corresponding to each dialogue dataset, to obtain the optimal parameters of the question-answering LoRA-adapters, the optimal parameters of the dialogue LoRA-adapters, and the optimal fusion parameters.
Step 104: obtain a universal LoRA-adapter from the optimal parameters of the question-answering LoRA-adapters, the optimal parameters of the dialogue LoRA-adapters, and the optimal fusion parameters.
It should be noted that the large language model fine-tuning and Adapter fusion method provided by the embodiments of the present invention is executed by a processor.
In step 101, multiple question-answering datasets and dialogue datasets are first collected from a designated network platform. The designated network platform here may be the Huggingface community; the embodiments of the present invention do not specifically limit the designated network platform.
Specifically, after multiple datasets are collected from the designated network platform, they need to be cleaned; after cleaning, multiple question-answering datasets and multiple dialogue datasets are finally obtained. In the embodiments of the present invention, the cleaning rules for the datasets are as follows:
1) delete dialogue instances with ChatGPT (Chat Generative Pre-trained Transformer)-3.5-Turbo and keep only dialogue instances with GPT-4;
2) delete dialogues in which GPT-4 refuses to answer or merely explains the refusal;
3) delete dialogues in which the GPT-4 answer is empty or missing;
4) delete dialogues containing toxic or illegal information;
5) delete dialogues containing the words OpenAI or ChatGPT, or replace them with information that gives the model the correct identity;
6) delete user questions whose similarity to the benchmark questions exceeds 85%;
7) split overly long dialogue instances into dialogues that fit the model's maximum context length.
Through the above rules, the multiple question-answering datasets and multiple dialogue datasets required by the embodiments of the present invention can finally be obtained, for example by applying filters such as those sketched below.
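As a minimal sketch only, the following shows how rules 1), 3), 4), 5), and 6) above might be applied to a list of dialogue records. The record format, the keyword lists, and the similarity function are assumptions made for illustration, not details fixed by the patent.

```python
# Illustrative data-cleaning pass over dialogue records (assumed record format).
from difflib import SequenceMatcher

TOXIC_WORDS = {"toxic-example-1", "toxic-example-2"}        # placeholder keyword list
IDENTITY_WORDS = {"OpenAI", "ChatGPT"}

def similar(a: str, b: str) -> float:
    """Rough string similarity in [0, 1]; stands in for the 85% similarity check."""
    return SequenceMatcher(None, a, b).ratio()

def clean(records, benchmark_questions):
    kept = []
    for rec in records:                                      # rec: {"model": ..., "turns": [(q, r), ...]}
        if rec["model"] != "gpt-4":                          # rule 1: keep GPT-4 dialogues only
            continue
        if any(r.strip() == "" for _, r in rec["turns"]):    # rule 3: drop empty/missing answers
            continue
        text = " ".join(q + " " + r for q, r in rec["turns"])
        if any(w in text for w in TOXIC_WORDS):              # rule 4: drop toxic/illegal content
            continue
        if any(w in text for w in IDENTITY_WORDS):           # rule 5: drop identity-leaking dialogues
            continue
        if any(similar(q, b) > 0.85                          # rule 6: drop near-benchmark questions
               for q, _ in rec["turns"] for b in benchmark_questions):
            continue
        kept.append(rec)
    return kept
```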
In the embodiments of the present invention, the question-answering datasets may be denoted as $\{Q_1, Q_2, \ldots, Q_M\}$ and the dialogue datasets as $\{C_1, C_2, \ldots, C_N\}$.
Specifically, a question-answering dataset can be expressed by formula (1):

$Q_i = \{(s_{i,j}, q_{i,j}, r_{i,j})\}_{j=1}^{|Q_i|}$    (1)

where s denotes the system message, q denotes the user's query, r denotes the AI's response, $Q_i$ denotes the i-th question-answering dataset, $s_{i,j}$ denotes the j-th system message of the i-th question-answering dataset, $q_{i,j}$ denotes the j-th question of the i-th question-answering dataset, $r_{i,j}$ denotes the j-th reply of the i-th question-answering dataset, and $|Q_i|$ denotes the length of the question-answering dataset $Q_i$.
In the embodiments of the present invention, when LoRA-adapter fine-tuning is performed on the obtained question-answering dataset, given a system message $s_{i,j}$ and a query $q_{i,j}$ from a specific instance, the large language model should learn to generate the corresponding reply $r_{i,j}$. This process yields the question-answering large language model, as follows:

$p_{\theta, A_{Q_i}}(r_{i,j} \mid s_{i,j}, q_{i,j}) = \prod_{k=1}^{|r_{i,j}|} p_{\theta}(r_k \mid A_{Q_i}, s_{i,j}, q_{i,j}, r_{<k})$

where $A_{Q_i}$ denotes the question-answering LoRA-adapter trained on the question-answering dataset $Q_i$, $p_\theta$ denotes the large language model, $|r_{i,j}|$ denotes the length of $r_{i,j}$, $r_k$ denotes the k-th token generated by the large language model, $\theta$ denotes the frozen parameters of the large language model, and $r_{<k}$ denotes all tokens $r$ with index less than k.
Further, the question-answering negative log-likelihood loss function is obtained from the question-answering large language model and its tokens, as follows:

$\mathcal{L}_{Q_i} = -\sum_{j=1}^{|Q_i|} \sum_{k=1}^{|r_{i,j}|} \log p_{\theta}(r_k \mid A_{Q_i}, s_{i,j}, q_{i,j}, r_{<k})$

where $\mathcal{L}_{Q_i}$ denotes the question-answering negative log-likelihood loss function, $s_{i,j}$ denotes the j-th system message of the i-th question-answering dataset, $q_{i,j}$ denotes the j-th question of the i-th question-answering dataset, and $r_{i,j}$ denotes the j-th reply of the i-th question-answering dataset.
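As a minimal, hedged sketch of how this question-answering loss could be computed per instance with a causal language model, the snippet below masks out the prompt tokens so that the negative log-likelihood is taken over the reply tokens only. The model/tokenizer objects and the prompt template are implementation assumptions, not part of the patent.

```python
# Illustrative per-instance QA negative log-likelihood (PyTorch; model/tokenizer assumed to exist).
import torch
import torch.nn.functional as F

def qa_nll(model, tokenizer, system_msg: str, question: str, reply: str) -> torch.Tensor:
    prompt = f"{system_msg}\n{question}\n"                   # assumed prompt template
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    reply_ids = tokenizer(reply, return_tensors="pt", add_special_tokens=False).input_ids
    input_ids = torch.cat([prompt_ids, reply_ids], dim=1)

    labels = input_ids.clone()
    labels[:, : prompt_ids.shape[1]] = -100                  # ignore prompt tokens in the loss

    out = model(input_ids=input_ids)                         # logits: [1, seq_len, vocab]
    logits = out.logits[:, :-1, :]                           # token t predicted from tokens < t
    targets = labels[:, 1:]
    nll = F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        ignore_index=-100,
        reduction="sum",                                     # sum over reply tokens, matching the loss above
    )
    return nll
```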
Correspondingly, a dialogue dataset can be expressed by formula (4):

$C_i = \{(q_{i,j}^1, r_{i,j}^1, \ldots, q_{i,j}^T, r_{i,j}^T)\}_{j=1}^{|C_i|}$    (4)

where the dialogue dataset contains multiple dialogue instances with T rounds, $C_i$ denotes the i-th dialogue dataset, $q_{i,j}^T$ denotes the j-th query of the i-th dialogue dataset in round T, $r_{i,j}^T$ denotes the j-th reply of the i-th dialogue dataset in round T, and $|C_i|$ denotes the length of the dialogue dataset $C_i$.
In the embodiments of the present invention, when LoRA-adapter fine-tuning is performed on a dialogue dataset, the large language model learns to predict each reply $r_{i,j}^T$ given the dialogue history and the query before round T. This process yields the dialogue large language model, as follows:

$p_{\theta, A_{C_i}}(R_j \mid Q_j) = \prod_{k=1}^{N_{i,j}} p_{\theta}(x_k \mid A_{C_i}, x_{<k})^{\mathbb{1}(x_k \in R_j)}$

where $A_{C_i}$ denotes the dialogue LoRA-adapter trained on the dialogue dataset $C_i$, $Q_j$ denotes all tokens belonging to the user queries, $R_j$ denotes the target tokens, $x_k$ denotes the k-th token of the j-th instance, and $N_{i,j}$ denotes the number of tokens in the j-th instance of the dialogue dataset $C_i$.
Further, the dialogue negative log-likelihood loss function is obtained from the dialogue large language model and its tokens, as follows:

$\mathcal{L}_{C_i} = -\sum_{j=1}^{|C_i|} \sum_{k=1}^{N_{i,j}} \mathbb{1}(x_k \in R_j)\, \log p_{\theta}(x_k \mid A_{C_i}, x_{<k})$

where $\mathcal{L}_{C_i}$ denotes the dialogue negative log-likelihood loss function, $N_{i,j}$ denotes the number of tokens in the j-th instance of the dialogue dataset $C_i$, $p_\theta$ denotes the large language model, $\theta$ denotes the frozen parameters of the large language model, and $A_{C_i}$ denotes the dialogue LoRA-adapter trained on the dialogue dataset $C_i$.
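A hedged sketch of the corresponding label construction for one multi-turn instance follows: query tokens receive the ignore label so that only reply tokens contribute to the dialogue loss above. The turn template and the ignore value -100 (the PyTorch cross-entropy convention) are implementation assumptions.

```python
# Illustrative label masking for one multi-turn dialogue instance (tokenizer assumed to exist).
IGNORE_INDEX = -100

def build_dialogue_labels(tokenizer, turns):
    """turns: list of (query, reply) pairs for one instance; returns (input_ids, labels)."""
    input_ids, labels = [], []
    for query, reply in turns:
        q_ids = tokenizer(f"User: {query}\nAssistant: ", add_special_tokens=False).input_ids
        r_ids = tokenizer(reply + tokenizer.eos_token, add_special_tokens=False).input_ids
        input_ids += q_ids + r_ids
        labels += [IGNORE_INDEX] * len(q_ids) + r_ids        # loss is taken only on reply tokens R_j
    return input_ids, labels
```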
It should be noted that, in the embodiments of the present invention, for LoRA-adapter fusion, the loss function of each fine-tuned LoRA-adapter is assigned a trainable weight, and the loss functions of all LoRA-adapters are fine-tuned together with their fusion weights, where the fusion weights can be expressed as $\omega = \{\omega_{Q_1}, \ldots, \omega_{Q_M}, \omega_{C_1}, \ldots, \omega_{C_N}\}$.
In step 102, from the question-answering negative log-likelihood loss function, the dialogue negative log-likelihood loss function, and the initial fusion weight associated with each LoRA-adapter fine-tuning, the ideal loss function of the question-answering datasets and the dialogue datasets in the ideal case can be obtained, as follows:

$L(A, \omega) = \sum_{i=1}^{M} \omega_{Q_i} \mathcal{L}_{Q_i} + \sum_{i=1}^{N} \omega_{C_i} \mathcal{L}_{C_i}$

where $L$ denotes the ideal loss function, $\mathcal{L}_{Q_i}$ denotes the question-answering negative log-likelihood loss function, $\mathcal{L}_{C_i}$ denotes the dialogue negative log-likelihood loss function, $\omega_{Q_i}$ denotes the initial fusion weight of $A_{Q_i}$ obtained by fine-tuning on the question-answering dataset $Q_i$, and $\omega_{C_i}$ denotes the initial fusion weight of $A_{C_i}$ obtained by fine-tuning on the dialogue dataset $C_i$.
Further, the ideal fusion weights and the first ideal parameters corresponding to the ideal loss function are obtained from the minimum of the ideal loss function, expressed by the following formula:

$A^*, \omega^* = \arg\min_{A, \omega} L(A, \omega)$

where $A^*$ denotes all of the ideal LoRA-adapters, $\omega^*$ denotes all of the ideal fusion weights, and argmin denotes the values of the parameters $A$ and $\omega$ at which the expression attains its minimum.
In the embodiments of the present invention, all of the ideal LoRA-adapters may also be called all of the first ideal parameters; the first ideal parameters denote all of the LoRA-adapters added to the question-answering large language model and the dialogue large language model respectively, expressed by the following formula:

$A = \{A_{Q_1}, \ldots, A_{Q_M}, A_{C_1}, \ldots, A_{C_N}\}$

where $A$ denotes the first ideal parameters, $A_{Q_M}$ denotes the first ideal parameter obtained by fine-tuning on the question-answering dataset $Q_M$, $A_{C_N}$ denotes the first ideal parameter obtained by fine-tuning on the dialogue dataset $C_N$, $M$ denotes the number of question-answering datasets, and $N$ denotes the number of dialogue datasets.
Specifically, $\omega$ denotes the ideal fusion weights of all LoRA-adapters, that is, the ideal fusion weights of the question-answering LoRA-adapters and the dialogue LoRA-adapters, which can be expressed by the following formula:

$\omega = \{\omega_{Q_1}, \ldots, \omega_{Q_M}, \omega_{C_1}, \ldots, \omega_{C_N}\}$

where $\omega$ denotes the ideal fusion weights, $\omega_{Q_M}$ denotes the ideal fusion weight of $A_{Q_M}$, and $\omega_{C_N}$ denotes the ideal fusion weight of $A_{C_N}$.
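As a small illustrative sketch only, the weighted combination above can be computed from per-dataset losses as follows; the dictionaries of losses and weights are assumed inputs, not structures defined by the patent.

```python
# Illustrative weighted combination of per-dataset losses (plain Python).
def ideal_loss(qa_losses, dlg_losses, qa_weights, dlg_weights):
    """qa_losses/dlg_losses: {dataset_name: NLL}; *_weights: {dataset_name: fusion weight}."""
    return (sum(qa_weights[name] * loss for name, loss in qa_losses.items())
            + sum(dlg_weights[name] * loss for name, loss in dlg_losses.items()))
```

For example, `ideal_loss({"q1": 2.1}, {"c1": 1.7}, {"q1": 0.6}, {"c1": 0.4})` returns 0.6 × 2.1 + 0.4 × 1.7 = 1.94.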
In step 103, according to the ideal loss function, the question-answering LoRA-adapter corresponding to each question-answering dataset and the dialogue LoRA-adapter corresponding to each dialogue dataset are fine-tuned to obtain the optimal parameters of the question-answering LoRA-adapters, the optimal parameters of the dialogue LoRA-adapters, and the optimal fusion parameters respectively.
In practice, for efficiency and simplicity, the first ideal parameters and the ideal fusion weights are fine-tuned sequentially. In the first stage, each LoRA-adapter (all of the first ideal parameters) is fine-tuned separately on its question-answering dataset or dialogue dataset; that is, the minimization of the ideal loss function is first decomposed into the per-dataset minimizations shown below, which yield the optimal parameters of the question-answering LoRA-adapters and the optimal parameters of the dialogue LoRA-adapters:

$A_{Q_i}^{**} = \arg\min_{A_{Q_i}} \mathcal{L}_{Q_i}(A_{Q_i})$

$A_{C_i}^{**} = \arg\min_{A_{C_i}} \mathcal{L}_{C_i}(A_{C_i})$

where $A_{Q_i}^{**}$ denotes the optimal parameters of the question-answering LoRA-adapter, $A_{C_i}^{**}$ denotes the optimal parameters of the dialogue LoRA-adapter, $A_{C_i}$ denotes the dialogue LoRA-adapter trained on the dialogue dataset $C_i$, $A_{Q_i}$ denotes the question-answering LoRA-adapter trained on the question-answering dataset $Q_i$, $\mathcal{L}_{Q_i}$ denotes the question-answering negative log-likelihood loss function, and $\mathcal{L}_{C_i}$ denotes the dialogue negative log-likelihood loss function.
Further, in the second stage only the ideal fusion weights are fine-tuned, while the base large language model and the first ideal parameters are frozen, which yields the optimal fusion parameters:

$\omega^{**} = \arg\min_{\omega} \left( \sum_{i=1}^{M} \omega_{Q_i} \mathcal{L}_{Q_i}(A_{Q_i}^{**}) + \sum_{i=1}^{N} \omega_{C_i} \mathcal{L}_{C_i}(A_{C_i}^{**}) \right)$

where $\omega^{**}$ denotes the optimal fusion parameters, $\omega_{Q_i}$ denotes the initial fusion weight of $A_{Q_i}$ obtained by fine-tuning on the question-answering dataset $Q_i$, $\omega_{C_i}$ denotes the initial fusion weight of $A_{C_i}$ obtained by fine-tuning on the dialogue dataset $C_i$, $\mathcal{L}_{Q_i}$ denotes the question-answering negative log-likelihood loss function, $\mathcal{L}_{C_i}$ denotes the dialogue negative log-likelihood loss function, and $\omega$ denotes the ideal fusion weights.
It should be noted that, in practice, when the number of question-answering datasets and dialogue datasets is small, some simple and fast algorithms, such as the Grid-Search mentioned above, can be used to optimize the ideal fusion weights; one such sketch is given below.
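For illustration only, the following sketch performs a grid search over candidate fusion weights, evaluating the already fine-tuned, frozen adapters under each weight combination. The candidate grid and the loss-evaluation callback are assumptions, not values fixed by the patent.

```python
# Illustrative Grid-Search over LoRA-adapter fusion weights (adapters and base model frozen).
import itertools

def grid_search_fusion_weights(eval_loss, num_adapters, grid=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """eval_loss(weights) -> float: assumed callback that fuses the frozen adapters with the
    given weights and returns the validation loss of the resulting model.
    Returns the best weight tuple found on the grid and its loss."""
    best_weights, best_loss = None, float("inf")
    for combo in itertools.product(grid, repeat=num_adapters):
        if sum(combo) == 0:                      # skip the all-zero combination
            continue
        loss = eval_loss(combo)
        if loss < best_loss:
            best_weights, best_loss = combo, loss
    return best_weights, best_loss
```

Here `eval_loss` would typically apply the candidate weights to combine the adapters (as in the merge step of step 104) and measure the negative log-likelihood on held-out question-answering and dialogue data; this detail is an assumption about how the Grid-Search is realized.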
In step 104, the universal LoRA-adapter is obtained from the optimal parameters of the question-answering LoRA-adapters, the optimal parameters of the dialogue LoRA-adapters, and the optimal fusion parameters.
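A minimal sketch of one way such a universal adapter could be assembled is shown below: each LoRA-adapter's low-rank update ΔW = B·A is scaled by its fusion weight and the results are summed per target module. Merging by weighted summation of the low-rank updates is an assumption about the fusion step, not a construction spelled out by the patent.

```python
# Illustrative weighted merge of LoRA updates into one universal adapter (PyTorch tensors).
import torch

def merge_lora_adapters(adapters, weights):
    """adapters: list of dicts {module_name: (lora_A, lora_B)} with lora_A: [r, in], lora_B: [out, r].
    weights: optimal fusion weights, one per adapter. Returns {module_name: delta_W}."""
    merged = {}
    for adapter, w in zip(adapters, weights):
        for name, (lora_A, lora_B) in adapter.items():
            delta = w * (lora_B @ lora_A)                # weighted low-rank update for this module
            merged[name] = merged.get(name, torch.zeros_like(delta)) + delta
    return merged
```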
In summary, the embodiments of the present invention provide a large language model fine-tuning and Adapter fusion method and device. By fusing LoRA-adapters rather than datasets, the method effectively avoids the semantic-space conflicts caused by dataset fusion while improving the generalization performance of the large language model across multiple tasks, thereby solving the performance degradation caused in the prior art by conflicts between different datasets in the semantic space.
Based on the same inventive concept, an embodiment of the present invention provides a large language model fine-tuning and Adapter fusion device. Since the principle by which this device solves the technical problem is similar to that of the large language model fine-tuning and Adapter fusion method, the implementation of the device may refer to the implementation of the method, and repeated parts are not described again.
As shown in FIG. 2, the device mainly includes a first obtaining unit 201, a second obtaining unit 202, a third obtaining unit 203, and a fourth obtaining unit 204.
The first obtaining unit 201 is configured to collect multiple question-answering datasets and dialogue datasets from a designated network platform, and to perform LoRA-adapter fine-tuning on the question-answering datasets and the dialogue datasets respectively to obtain, in turn, a question-answering large language model, a question-answering negative log-likelihood loss function, a dialogue large language model, and a dialogue negative log-likelihood loss function.
The second obtaining unit 202 is configured to obtain, from the question-answering negative log-likelihood loss function, the dialogue negative log-likelihood loss function, and the initial fusion weight associated with each LoRA-adapter fine-tuning, an ideal loss function of the question-answering datasets and the dialogue datasets in the ideal case, and to obtain the ideal fusion weights and the first ideal parameters corresponding to the ideal loss function from the minimum of the ideal loss function, where the first ideal parameters denote all of the LoRA-adapters added to the question-answering large language model and the dialogue large language model respectively.
The third obtaining unit 203 is configured to fine-tune, according to the ideal loss function, the question-answering LoRA-adapter corresponding to each question-answering dataset and the dialogue LoRA-adapter corresponding to each dialogue dataset, to obtain the optimal parameters of the question-answering LoRA-adapters, the optimal parameters of the dialogue LoRA-adapters, and the optimal fusion parameters.
The fourth obtaining unit 204 is configured to obtain a universal LoRA-adapter from the optimal parameters of the question-answering LoRA-adapters, the optimal parameters of the dialogue LoRA-adapters, and the optimal fusion parameters.
It should be understood that the units included in the above large language model fine-tuning and Adapter fusion device are divided logically only according to the functions implemented by the device; in practice, these units may be combined or split. The functions implemented by the device provided in this embodiment correspond one-to-one to the large language model fine-tuning and Adapter fusion method provided in the above embodiment; the more detailed processing flow implemented by the device has been described in detail in the above method embodiment and is not repeated here.
Another embodiment of the present invention further provides a computer device, including a processor and a memory. The memory is configured to store computer program code including computer instructions; when the processor executes the computer instructions, the device performs each step of the large language model fine-tuning and Adapter fusion method in the method flow shown in the above method embodiment.
Another embodiment of the present invention further provides a computer-readable storage medium storing computer instructions; when the computer instructions run on a computer device, the computer device performs each step of the large language model fine-tuning and Adapter fusion method in the method flow shown in the above method embodiment.
Although preferred embodiments of the present invention have been described, those skilled in the art may make additional changes and modifications to these embodiments once they learn the basic inventive concept. Therefore, the appended claims are intended to be interpreted as including the preferred embodiments and all changes and modifications that fall within the scope of the present invention.
Obviously, those skilled in the art can make various changes and variations to the present invention without departing from the spirit and scope of the present invention. If these modifications and variations fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include them.
Claims (8)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410170139.4A CN117708307B (en) | 2024-02-06 | 2024-02-06 | Method and device for fusing micro-tuning and Adapter of large language model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410170139.4A CN117708307B (en) | 2024-02-06 | 2024-02-06 | Method and device for fusing micro-tuning and Adapter of large language model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117708307A CN117708307A (en) | 2024-03-15 |
CN117708307B true CN117708307B (en) | 2024-05-14 |
Family
ID=90144771
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410170139.4A Active CN117708307B (en) | 2024-02-06 | 2024-02-06 | Method and device for fusing micro-tuning and Adapter of large language model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117708307B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115712709A (en) * | 2022-11-18 | 2023-02-24 | 哈尔滨工业大学 | Multi-modal dialog question-answer generation method based on multi-relationship graph model |
CN116975241A (en) * | 2023-09-20 | 2023-10-31 | 广东技术师范大学 | Liver cancer auxiliary diagnosis and question-answering method, system and medium based on large language model |
CN117033608A (en) * | 2023-09-28 | 2023-11-10 | 中国电子科技集团公司第十研究所 | Knowledge graph generation type question-answering method and system based on large language model |
CN117171326A (en) * | 2023-09-20 | 2023-12-05 | 宜宾电子科技大学研究院 | Rapid construction method of financial question-answering algorithm and life cycle management platform |
WO2023235346A1 (en) * | 2022-06-03 | 2023-12-07 | Google Llc | Prompting machine-learned models using chains of thought |
CN117216234A (en) * | 2023-08-18 | 2023-12-12 | 腾讯科技(深圳)有限公司 | Artificial intelligence-based speaking operation rewriting method, device, equipment and storage medium |
CN117217289A (en) * | 2023-10-09 | 2023-12-12 | 北银金融科技有限责任公司 | Banking industry large language model training method |
CN117290492A (en) * | 2023-11-27 | 2023-12-26 | 深圳市灵智数字科技有限公司 | Knowledge base question-answering method and device, electronic equipment and storage medium |
CN117371527A (en) * | 2023-11-01 | 2024-01-09 | 中国科学院计算技术研究所 | Multi-mode entity linking method and system based on large model |
CN117455009A (en) * | 2023-10-27 | 2024-01-26 | 腾讯科技(深圳)有限公司 | Federal learning method, federal prediction method, apparatus, device, and storage medium |
CN117453925A (en) * | 2023-10-24 | 2024-01-26 | 腾讯科技(深圳)有限公司 | Knowledge migration method, apparatus, device, readable storage medium and program product |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110188182B (en) * | 2019-05-31 | 2023-10-27 | 中国科学院深圳先进技术研究院 | Model training method, dialogue generating method, device, equipment and medium |
- 2024-02-06: CN application CN202410170139.4A granted as patent CN117708307B (en), status Active
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023235346A1 (en) * | 2022-06-03 | 2023-12-07 | Google Llc | Prompting machine-learned models using chains of thought |
CN115712709A (en) * | 2022-11-18 | 2023-02-24 | 哈尔滨工业大学 | Multi-modal dialog question-answer generation method based on multi-relationship graph model |
CN117216234A (en) * | 2023-08-18 | 2023-12-12 | 腾讯科技(深圳)有限公司 | Artificial intelligence-based speaking operation rewriting method, device, equipment and storage medium |
CN116975241A (en) * | 2023-09-20 | 2023-10-31 | 广东技术师范大学 | Liver cancer auxiliary diagnosis and question-answering method, system and medium based on large language model |
CN117171326A (en) * | 2023-09-20 | 2023-12-05 | 宜宾电子科技大学研究院 | Rapid construction method of financial question-answering algorithm and life cycle management platform |
CN117033608A (en) * | 2023-09-28 | 2023-11-10 | 中国电子科技集团公司第十研究所 | Knowledge graph generation type question-answering method and system based on large language model |
CN117217289A (en) * | 2023-10-09 | 2023-12-12 | 北银金融科技有限责任公司 | Banking industry large language model training method |
CN117453925A (en) * | 2023-10-24 | 2024-01-26 | 腾讯科技(深圳)有限公司 | Knowledge migration method, apparatus, device, readable storage medium and program product |
CN117455009A (en) * | 2023-10-27 | 2024-01-26 | 腾讯科技(深圳)有限公司 | Federal learning method, federal prediction method, apparatus, device, and storage medium |
CN117371527A (en) * | 2023-11-01 | 2024-01-09 | 中国科学院计算技术研究所 | Multi-mode entity linking method and system based on large model |
CN117290492A (en) * | 2023-11-27 | 2023-12-26 | 深圳市灵智数字科技有限公司 | Knowledge base question-answering method and device, electronic equipment and storage medium |
Non-Patent Citations (3)
Title |
---|
Zoie Zhao. More human than human: LLM-generated narratives outperform human-LLM interleaved narratives. C&C '23: Proceedings of the 15th Conference on Creativity and Cognition, 2023 (full text). *
Definition extraction method based on the BiLSTM model; Yang Ping; Xie Zhipeng; Computer Engineering; 2020-03-31; Vol. 46, No. 03 (full text) *
Research on personalized chatbots based on deep learning; Wang Qianming; Li Yin; Computer Technology and Development; 2020-04-30; Vol. 30, No. 04 (full text) *
Also Published As
Publication number | Publication date |
---|---|
CN117708307A (en) | 2024-03-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN116226334B (en) | Method for training generated large language model and searching method based on model | |
CN110427461B (en) | Intelligent question and answer information processing method, electronic equipment and computer readable storage medium | |
CN116166782A (en) | Intelligent question-answering method based on deep learning | |
CN116127020B (en) | Method for training generated large language model and searching method based on model | |
CN107741976B (en) | Intelligent response method, device, medium and electronic equipment | |
CN109635080A (en) | Acknowledgment strategy generation method and device | |
CN107451230A (en) | A kind of answering method and question answering system | |
JP2022068120A (en) | How to handle chat channel communication, chat channel processing system, program (intelligent chat channel processor) | |
CN114691831A (en) | Task-type intelligent automobile fault question-answering system based on knowledge graph | |
CN114385817B (en) | Entity relationship identification method, device and readable storage medium | |
US20240153259A1 (en) | Single image concept encoder for personalization using a pretrained diffusion model | |
CN108491515A (en) | A kind of sentence pair matching degree prediction technique for campus psychological consultation | |
CN117077792B (en) | Knowledge graph-based method and device for generating prompt data | |
CN113177415A (en) | Semantic understanding method and device, electronic equipment and storage medium | |
CN113761220A (en) | Information acquisition method, device, equipment and storage medium | |
CN112784591B (en) | Data processing method and device, electronic equipment and storage medium | |
CN117726011A (en) | Model distillation method and device, medium and equipment for natural language processing | |
CN110322959A (en) | A kind of Knowledge based engineering depth medical care problem method for routing and system | |
CN118410457B (en) | Multimodal recognition method, device, electronic device and storage medium based on large language model | |
CN117708307B (en) | Method and device for fusing micro-tuning and Adapter of large language model | |
CN117315067A (en) | Method and device for generating text graph and computer equipment | |
CN116910217A (en) | Natural language question-answering method, device and medium based on small language model cluster | |
CN115292492A (en) | Method, device and equipment for training intention classification model and storage medium | |
CN115618968B (en) | New idea discovery method and device, electronic device and storage medium | |
CN115129861B (en) | Text classification method and device, storage medium and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |