WO2024031891A1 - Fine-tuning method, device and application for a classification model with decoupled knowledge representation - Google Patents

Fine-tuning method, device and application for a classification model with decoupled knowledge representation

Info

Publication number
WO2024031891A1
Authority
WO
WIPO (PCT)
Prior art keywords
classification
instance
vector
model
phrases
Prior art date
Application number
PCT/CN2022/137938
Other languages
English (en)
French (fr)
Inventor
张宁豫
李磊
陈想
陈华钧
Original Assignee
浙江大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江大学 filed Critical 浙江大学
Publication of WO2024031891A1

Links

Images

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 - Computing arrangements using knowledge-based models
    • G06N5/02 - Knowledge representation; Symbolic representation

Definitions

  • the invention belongs to the technical field of natural language processing, and specifically relates to a fine-tuning method, device and application of a classification model with decoupled knowledge representation.
  • Pre-trained classification models have achieved exciting and remarkable results in the field of natural language processing by deeply learning knowledge from massive amounts of data.
  • the pre-trained classification model is trained on large-scale corpora by designing general pre-training tasks, such as masked language modeling (MLM) and next sentence prediction (NSP), and is then applied to downstream classification tasks such as relation classification and sentiment classification, where good performance can be obtained by fine-tuning it with only a small amount of data.
  • the emergence of prompt learning reduces the gap between the fine-tuning stage and the pre-training stage of the pre-trained classification model, giving the pre-trained classification model the further ability of few-shot and zero-shot learning.
  • Prompt learning can be divided into discrete prompts and continuous prompts: discrete prompts convert the input form by manually constructing discrete prompt templates, while continuous prompts add a series of learnable continuous embedding vectors to the input sequence, reducing the need for prompt engineering.
  • Patent document CN101127042A discloses a sentiment classification method based on a classification model.
  • Patent document CN108363753A discloses a review-text sentiment classification model training and sentiment classification method, device and equipment. Both patent applications first extract embedding vectors of the text and then build sentiment classification on top of those embedding vectors. When sample data is scarce, the extracted embedding vectors are poor, and it is difficult for these two methods to classify sentiment accurately.
  • the purpose of the present invention is to provide a fine-tuning method, device and application for a classification model with decoupled knowledge representation, in which the knowledge representations obtained by the classification model are decoupled into a knowledge base.
  • the knowledge base serves as a similarity guide to optimize the classification model, improving the knowledge representation capability and accuracy of the classification model and thereby improving the classification accuracy of downstream classification tasks.
  • the embodiment provides a method for fine-tuning a classification model with decoupled knowledge representation, including the following steps:
  • Step 1 Build a knowledge base for retrieval. The knowledge base stores multiple instance phrases, and each instance phrase is stored in the form of a key-value pair, where the key stores the embedding vector of the instance words and the value stores the true label value of the instance phrase;
  • Step 2 Build a classification model including a pre-trained language model and a predictive classification module
  • Step 3 Use the pre-trained language model to extract the first embedding vector of the masked word in the input instance text, take the first embedding vector as the first query vector, and for each label category query the knowledge base for the multiple instance phrases nearest to the first query vector as the first neighboring instance phrases; the aggregation result obtained by aggregating all the first neighboring instance phrases with the first query vector is used as the input data of the pre-trained language model;
  • Step 4 Use the pre-trained language model to extract the second embedding vector of the masked word in the input data, use the prediction classification module to perform classification prediction on the second embedding vector to obtain the classification prediction probability, and calculate the classification loss based on the classification prediction probability and the true label value of the masked word;
  • Step 5 Use the true label value of the masked word to construct a weight factor, and adjust the classification loss based on the weight factor to make the classification loss pay more attention to misclassified instances;
  • Step 6 Use the adjusted classification loss to optimize the parameters of the classification model to obtain a classification model with optimized parameters.
  • the embodiment provides a fine-tuning device for a classification model with decoupled knowledge representation, including:
  • the knowledge base construction and update unit is used to build a knowledge base for retrieval; the knowledge base stores multiple instance phrases.
  • Each instance phrase is stored in the form of a key-value pair, where the key stores the embedding vector of the instance words and the value stores the true label value of the instance phrase;
  • a classification model building unit used to build a classification model including a pre-trained language model and a predictive classification module;
  • the query and aggregation unit is used to use the pre-trained language model to extract the first embedding vector of the masked word in the input instance text, take the first embedding vector as the first query vector, and for each label category query the knowledge base for the multiple instance phrases nearest to the first query vector as the first neighboring instance phrases, with the aggregation result obtained by aggregating all the first neighboring instance phrases with the first query vector used as the input data of the pre-trained language model;
  • the loss calculation unit is used to use the pre-trained language model to extract the second embedding vector of the masked word in the input data, use the prediction classification module to perform classification prediction on the second embedding vector to obtain the classification prediction probability, and calculate the classification loss based on the classification prediction probability and the true label value of the masked word;
  • the loss adjustment unit is used to construct a weight factor based on the true label value of the masked word, and adjust the classification loss based on the weight factor so that the classification loss pays more attention to misclassified instances;
  • the parameter optimization unit is used to optimize the parameters of the classification model using the adjusted classification loss to obtain a parameter-optimized classification model.
  • embodiments also provide a task classification method using a knowledge representation decoupled classification model.
  • the task classification method applies the knowledge base constructed by the above fine-tuning method and the parameter-optimized classification model, and includes the following steps:
  • Step 1 Use the parameter-optimized pre-trained language model to extract the third embedding vector of the masked word in the input instance text, take the third embedding vector as the third query vector, and for each label category query the knowledge base for the multiple instance phrases nearest to the third query vector as the third neighboring instance phrases; the aggregation result obtained by aggregating all the third neighboring instance phrases with the third query vector is used as the input data of the pre-trained language model;
  • Step 2 Use the parameter-optimized pre-trained language model to extract the fourth embedding vector of the masked word in the input data, and for each category query the knowledge base for the multiple instance texts nearest to the fourth query vector as the fourth neighboring instance texts, then calculate the category correlation probability based on the similarity between the fourth query vector and the fourth neighboring instance texts;
  • Step 3 Use the parameter-optimized prediction and classification module to perform classification prediction on the fourth embedding vector to obtain the classification prediction probability
  • Step 4 The weighted result of each category's correlation probability and classification prediction probability is used as the overall classification prediction result.
  • the beneficial effects of the present invention include at least:
  • the knowledge representation is decoupled from the classification model and stored in a knowledge base, and matching and aggregation are performed by retrieval at application time, which limits rote memorization by the learning model and improves its generalization ability; meanwhile, KNN is used to retrieve neighboring instance phrases from the knowledge base as continuous neural demonstrations, and the neural demonstrations are used to guide classification model training and correct classification model predictions, improving the ability of the classification model in few-shot and zero-shot scenarios. When the amount of data is sufficient, the knowledge base correspondingly holds better and richer information, and the classification model also performs very well in the fully supervised scenario.
  • Figure 1 is a flow chart of the fine-tuning method of the knowledge representation decoupled classification model provided by the embodiment
  • Figure 2 is a schematic diagram of the structure and training of the classification model, a schematic diagram of knowledge base update, and a schematic diagram of classification prediction provided by the embodiment;
  • Figure 3 is a flow chart of a task classification method using a knowledge representation decoupled classification model provided by the embodiment.
  • the embodiment provides a fine-tuning method and device for a classification model with decoupled knowledge representation, and a classification application of the fine-tuned classification model.
  • by building a knowledge base from the training instance texts, memory is decoupled from the pre-trained language model, which provides reference knowledge for model training and prediction and improves the generalization ability of the model.
  • Figure 1 is a flow chart of the fine-tuning method of the knowledge representation decoupled classification model provided by the embodiment. As shown in Figure 1, the fine-tuning method of the knowledge representation decoupled classification model provided by the embodiment includes the following steps:
  • Step 1 Build a knowledge base for retrieval.
  • the knowledge base serves as additional reference information that decouples the knowledge representation from part of the memory of the classification model, and is mainly used to store the knowledge representations obtained from the classification model.
  • the knowledge representations exist in the form of instance phrases; specifically, each instance phrase is stored in the form of a key-value pair, where the key stores the embedding vector of the instance words and the value stores the true label value of the instance phrase.
  • the embedding vector of the instance phrase is learned from the pre-trained language model based on the instance text of the prompt template. Specifically, it is the hidden vector output by the last layer of the pre-trained language model for the mask position in the instance text.
  • the knowledge base can be freely added to, edited and deleted, as shown in Figure 2.
  • in each round of training, the first embedding vector of the masked word in the input instance text and its corresponding true label value form a new instance phrase, which is asynchronously updated into the knowledge base.
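  • As a minimal sketch of how such a key-value knowledge base could be built (assuming a RoBERTa-style masked language model and the hypothetical helper below; none of these names come from the patent), each entry pairs the mask-position hidden vector with its ground-truth label:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

# Assumed setup: a masked language model whose last-layer hidden state at [MASK] is the key.
tokenizer = AutoTokenizer.from_pretrained("roberta-large")
model = AutoModelForMaskedLM.from_pretrained("roberta-large", output_hidden_states=True)

def mask_embedding(text: str) -> torch.Tensor:
    """Last-layer hidden vector at the mask position of a prompt-formatted input."""
    prompted = f"{text} {tokenizer.mask_token}"            # analogue of "[CLS] text [MASK] [SEP]"
    enc = tokenizer(prompted, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).hidden_states[-1]            # (1, seq_len, dim)
    mask_pos = (enc["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0]
    return hidden[0, mask_pos].squeeze(0)                  # (dim,)

# Knowledge base: key = mask embedding, value = true label of the instance phrase.
knowledge_base = [(mask_embedding(t), y) for t, y in
                  [("This movie has no meaning", "bad"), ("A touching and wonderful film", "good")]]
# During fine-tuning, new (embedding, label) pairs would be appended asynchronously each round.
```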
  • Step 2 Build a classification model including a pre-trained language model and a prediction classification module.
  • the classification model constructed in the embodiment includes a pre-trained language model.
  • the pre-trained language model is used to perform knowledge representation on the input instance text to extract the embedding vector of the mask position.
  • the input instance text needs to be serialized and converted by a prompt template; the prompt template takes the form: [CLS] instance text [MASK] [SEP], for example: [CLS] This movie has no meaning [MASK] [SEP]. At the same time, the true label value is mapped into the vocabulary space of the pre-trained language model through a mapping function to obtain the label vector.
  • the prediction classification module is used to perform classification prediction on the input embedding vector to output the classification prediction probability.
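  • Continuing the sketch above, one possible reading of this label mapping and prediction classification module is shown below; the verbalizer words are invented placeholders, not taken from the patent:

```python
import torch

# Assumed verbalizer: maps each class to a label word in the model vocabulary.
verbalizer = {"good": "great", "bad": "terrible"}          # hypothetical label words
label_token_ids = {c: tokenizer.encode(" " + w, add_special_tokens=False)[0]
                   for c, w in verbalizer.items()}

def classify_from_mask_logits(logits_at_mask: torch.Tensor) -> dict:
    """Prediction classification module: class probabilities from the MLM logits at [MASK]."""
    scores = torch.stack([logits_at_mask[label_token_ids[c]] for c in verbalizer])
    probs = torch.softmax(scores, dim=-1)
    return {c: p.item() for c, p in zip(verbalizer, probs)}

# logits_at_mask would be model(**enc).logits[0, mask_pos] for the same prompted input.
```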
  • Step 3 Use the pre-trained language model to extract the first embedding vector of the masked word in the input instance text, and aggregate the input data by querying adjacent instance phrases from the knowledge base.
  • a pre-trained language model is used to extract the first embedding vector of the masked word in the input instance text, and the first embedding vector is used as the first query vector.
  • for each label category, the KNN (k-nearest-neighbor) algorithm is used to search the knowledge base for the m instance phrases nearest to the first query vector as the first neighboring instance phrases.
  • These first neighboring instance phrases are used as additional demonstration inputs, and the aggregation result obtained by aggregating them with the first query vector is used as the input data of the pre-trained language model, where the aggregation formula is:
  • α_i^l = exp(h_q·h_i^l) / Σ_{j=1..m} exp(h_q·h_j^l), I = [x̃; Σ_{i=1..m} α_i^1·h_i^1; e(v_1); …; Σ_{i=1..m} α_i^L·h_i^L; e(v_L)], where x̃ is the initial vector of the prompt-serialized input instance text, h_q is the first query vector, h_i^l is the embedding vector of the i-th first neighboring instance phrase of the l-th label class, α_i^l is its softmax similarity to the first query vector, e(v_l) is the true label value of the first neighboring instance phrases of class l, m is the number of first neighboring instance phrases, L is the number of labels, and I is the aggregation result.
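  • A rough sketch of this per-class retrieval and aggregation, reusing the knowledge_base list from the earlier sketch (simplified: the label-word embeddings e(v_l) are omitted and only a stack of vectors is returned):

```python
import torch

def aggregate(query: torch.Tensor, knowledge_base, m: int = 4) -> torch.Tensor:
    """For each class, softmax-weight the m stored embeddings most similar to the query."""
    demos = []
    for c in sorted({label for _, label in knowledge_base}):
        embs = torch.stack([e for e, label in knowledge_base if label == c])
        sims = embs @ query                                  # inner-product similarity to h_q
        top = torch.topk(sims, k=min(m, len(embs)))
        alpha = torch.softmax(top.values, dim=0)             # alpha_i^l over retrieved neighbors
        demos.append((alpha.unsqueeze(1) * embs[top.indices]).sum(0))
    # The weighted demonstrations are concatenated with the query as extra input context.
    return torch.stack([query] + demos)
```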
  • Step 4 Use the pre-trained language model to extract the second embedding vector of the masked word in the input data, use the prediction classification module to perform classification prediction on the second embedding vector, and calculate the classification loss based on the classification prediction probability.
  • the cross-entropy between the classification prediction probability corresponding to the input data and the true label value of the masked word is used as the classification loss L_CE.
  • Step 5 Use the true value of the label of the masked word to construct a weight factor, and adjust the classification loss based on the weight factor to make the classification loss pay more attention to misclassified instances.
  • the true value of the label of the masked word is used to adjust the weights of correct classification and misclassification in the classification loss so that the classification model can better focus on misclassified samples.
  • the specific formula is as follows: L = (1 + β·F(p_knn))·L_CE, where L_CE represents the classification loss, β represents the adjustment parameter, F(p_knn) represents the weight factor, expressed as F(p_knn) = -log(p_knn), and p_knn represents the true label value of the masked word.
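  • A minimal sketch of this weighted loss; here p_knn is read as the KNN-derived probability mass on the true label (an assumption, since the patent text only calls it the true label value of the masked word), so harder instances receive a larger weight:

```python
import torch
import torch.nn.functional as F

def adjusted_loss(logits: torch.Tensor, target: torch.Tensor,
                  p_knn: torch.Tensor, beta: float = 1.0) -> torch.Tensor:
    """L = (1 + beta * (-log p_knn)) * L_CE, averaged over the batch."""
    ce = F.cross_entropy(logits, target, reduction="none")   # per-instance cross-entropy
    weight = 1.0 + beta * (-torch.log(p_knn.clamp_min(1e-8)))
    return (weight * ce).mean()
```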
  • Step 6 Use the adjusted classification loss to optimize the parameters of the classification model to obtain a classification model with optimized parameters.
  • the constructed classification loss is used to optimize the parameters of the classification model, and in each round of training, the first embedding vector of the input instance text is used to construct the instance phrase and updated into the knowledge base.
  • the classification model fine-tuned by the above fine-tuning method for the knowledge representation decoupled classification model has improved capabilities in few-shot and zero-shot scenarios.
  • when the amount of data is sufficient, the knowledge base correspondingly holds better and richer information, and the classification model also performs very well in fully supervised scenarios.
  • the embodiment also provides a fine-tuning device for a classification model with decoupled knowledge representation, including:
  • the knowledge base construction and update unit is used to build a knowledge base for retrieval; the knowledge base stores multiple instance phrases.
  • Each instance phrase is stored in the form of a key-value pair, where the key stores the embedding vector of the instance words and the value stores the true label value of the instance phrase;
  • a classification model building unit used to build a classification model including a pre-trained language model and a predictive classification module;
  • the query and aggregation unit is used to use the pre-trained language model to extract the first embedding vector of the masked word in the input instance text, take the first embedding vector as the first query vector, and for each label category query the knowledge base for the multiple instance phrases nearest to the first query vector as the first neighboring instance phrases, with the aggregation result obtained by aggregating all the first neighboring instance phrases with the first query vector used as the input data of the pre-trained language model;
  • the loss calculation unit is used to use the pre-trained language model to extract the second embedding vector of the masked word in the input data, use the prediction classification module to perform classification prediction on the second embedding vector to obtain the classification prediction probability, and calculate the classification loss based on the classification prediction probability and the true label value of the masked word;
  • the loss adjustment unit is used to construct a weight factor based on the true label value of the masked word, and adjust the classification loss based on the weight factor so that the classification loss pays more attention to misclassified instances;
  • the parameter optimization unit is used to optimize the parameters of the classification model using the adjusted classification loss to obtain a parameter-optimized classification model.
  • it should be noted that, when the fine-tuning device for the knowledge representation decoupled classification model provided in the above embodiment fine-tunes the classification model, the division into the above functional units is given only as an example.
  • the above functions can be allocated to different functional units as needed, that is, the internal structure of the terminal or server is divided into different functional units to complete all or part of the functions described above.
  • the fine-tuning device for a classification model with decoupled knowledge representation and the fine-tuning method for a classification model with decoupled knowledge representation provided in the above embodiments belong to the same concept. For details on the implementation process, see the fine-tuning method for a classification model with decoupled knowledge representation. The embodiments will not be described again here.
  • the embodiment also provides a task classification method using a knowledge representation decoupled classification model.
  • the task classification method applies the knowledge base constructed by the above fine-tuning method and the parameter-optimized classification model and, as shown in Figure 3, includes the following steps:
  • Step 1 Use the pre-trained language model with optimized parameters to extract the third embedding vector of the masked word in the input instance text, and aggregate the input data by querying adjacent instance phrases from the knowledge base.
  • the parameter-optimized pre-trained language model is used to extract the third embedding vector of the masked word in the input instance text, the third embedding vector is taken as the third query vector, and for each label category the knowledge base is queried for the multiple instance phrases nearest to the third query vector as the third neighboring instance phrases; the aggregation result obtained by aggregating all the third neighboring instance phrases with the third query vector is used as the input data of the pre-trained language model.
  • the non-parametric method KNN is used to retrieve instance phrases adjacent to the input instance text from the knowledge base.
  • the results of KNN retrieval are regarded as indication information of easy and difficult instances, allowing the classification model to pay more attention to difficult samples during training.
  • Step 2 Use the pre-trained language model with optimized parameters to extract the fourth embedding vector of the masked word in the input data, and calculate the category correlation probability by querying adjacent instance phrases from the knowledge base.
  • a parameter-optimized pre-trained language model is used to extract the fourth embedding vector of the masked word in the input data.
  • for each class, KNN search is used to query the knowledge base for the multiple instance texts nearest to the fourth query vector as the fourth neighboring instance texts.
  • the category correlation probability is calculated based on the similarity between the fourth query vector and the fourth neighboring instance texts.
  • specifically, the following formula is used to calculate the category correlation probability from the similarity between the fourth query vector and the fourth neighboring instance texts: P_KNN(y_i|q_t) ∝ Σ_{(c_i, y_i)∈N} exp(h_{q_t}·h_{c_i}), where P_KNN(y_i|q_t) represents the category correlation probability of the i-th classification category for the input instance text q_t, h_{q_t}·h_{c_i} is the inner product between the fourth query vector h_{q_t} and the embedding vector h_{c_i} of an instance phrase c_i belonging to the i-th classification category y_i, used as the inner-product similarity, and N represents the knowledge base.
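  • A sketch of this category correlation probability over the same (embedding, label) knowledge base as above (exponentiated inner-product similarities of the retrieved neighbors, accumulated per class and normalized):

```python
import torch

def knn_category_probability(query: torch.Tensor, knowledge_base, k: int = 16) -> dict:
    """P_KNN(y|q) from the k nearest stored instance phrases."""
    embs = torch.stack([e for e, _ in knowledge_base])
    labels = [label for _, label in knowledge_base]
    sims = embs @ query
    top = torch.topk(sims, k=min(k, len(labels)))
    scores = {}
    for idx, s in zip(top.indices.tolist(), top.values):
        scores[labels[idx]] = scores.get(labels[idx], 0.0) + torch.exp(s).item()
    total = sum(scores.values())
    return {c: v / total for c, v in scores.items()}
```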
  • KNN is a non-parametric method that can very easily make predictions for the input instance text without any classification layer; therefore, the classification result of KNN (the category correlation probability) can be intuitively used as a kind of prior knowledge to guide the pre-trained classification model so that it pays more attention to difficult samples (or atypical samples).
  • Step 3 Use the parameter-optimized prediction classification module to perform classification prediction on the fourth embedding vector to obtain the classification prediction probability.
  • Step 4 The weighted result of each category's correlation probability and classification prediction probability is used as the overall classification prediction result.
  • the category correlation probability P_KNN(y_i|q_t) obtained through KNN retrieval and the classification prediction probability P(y_i|q_t) output by the classification model are weighted and summed as P = γ·P_KNN(y_i|q_t) + (1-γ)·P(y_i|q_t), where γ represents the weight parameter.
  • the category correlation probability P_KNN(y_i|q_t) obtained through KNN retrieval can thus be further used in the inference process of the classification model to correct errors produced by the classification model during inference.
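  • The interpolation itself is a one-liner; gamma and the two probability dictionaries are placeholders:

```python
def combine(p_knn: dict, p_model: dict, gamma: float = 0.5) -> dict:
    """Overall prediction P = gamma * P_KNN + (1 - gamma) * P_model, per class."""
    return {c: gamma * p_knn.get(c, 0.0) + (1 - gamma) * p_model[c] for c in p_model}
```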
  • the task classification method using the knowledge representation decoupled classification model provided in the embodiment can be used for relation classification tasks.
  • the true label values of the instance phrases stored in the knowledge base are relation types, including friend, kinship, colleague and classmate relations.
  • when performing relation classification, the category correlation probability of each relation type is calculated from the input instance text through steps 1 and 2, the classification prediction probability is calculated according to step 3, the overall classification prediction result corresponding to each relation type is calculated according to step 4, and the largest overall classification prediction result is selected as the final relation classification result corresponding to the input instance text.
  • the task classification method using the knowledge representation decoupled classification model provided in the embodiment can be used for sentiment classification tasks.
  • the true label values of the instance phrases stored in the knowledge base are sentiment types, including positive sentiment and negative sentiment.
  • when performing sentiment classification, the category correlation probability of each sentiment type is calculated from the input instance text through steps 1 and 2, the classification prediction probability is calculated according to step 3, the overall classification prediction result corresponding to each sentiment type is calculated according to step 4, and the largest overall classification prediction result is selected as the final sentiment classification result corresponding to the input instance text.
  • in the sentiment classification task, RoBERTa-large is used as the pre-trained language model.
  • to improve retrieval speed, the open-source library FAISS is used for KNN retrieval.
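  • A minimal FAISS sketch of the retrieval step (exact inner-product index; the dimensions and data below are placeholders, not values from the patent):

```python
import faiss
import numpy as np

dim = 1024                                           # hidden size of RoBERTa-large
index = faiss.IndexFlatIP(dim)                       # exact inner-product (similarity) search

keys = np.random.rand(1000, dim).astype("float32")   # placeholder: stored mask embeddings (keys)
index.add(keys)

query = np.random.rand(1, dim).astype("float32")     # placeholder: query embedding h_q
scores, neighbor_ids = index.search(query, 16)       # 16 nearest instance phrases and their scores
```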
  • the two probabilities P_KNN(y_i|q_t) and P(y_i|q_t) are weighted and summed to obtain the overall classification prediction result.
  • with the weight parameter γ selected as 0.5, the overall classification prediction probability of the label "bad review" is 0.6 and the overall classification prediction probability of "good review" is 0.4.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a fine-tuning method, device and application for a classification model with decoupled knowledge representation. The knowledge representation is decoupled from the classification model and stored in a knowledge base, and matching and aggregation are performed by retrieval at application time, which limits rote memorization by the learning model and improves the generalization ability of the model. At the same time, KNN is used to retrieve neighboring instance phrases from the knowledge base as continuous neural demonstrations, and the neural demonstrations are used to guide classification model training and correct classification model predictions, improving the ability of the classification model in few-shot and zero-shot scenarios. When the amount of data is sufficient, the knowledge base correspondingly holds better and richer information, and the classification model also performs very well in the fully supervised scenario.

Description

Fine-tuning method, device and application for a classification model with decoupled knowledge representation
Technical Field
The invention belongs to the technical field of natural language processing, and specifically relates to a fine-tuning method, device and application for a classification model with decoupled knowledge representation.
Background Art
Pre-trained classification models have achieved exciting and remarkable results in the field of natural language processing by deeply learning knowledge from massive amounts of data. A pre-trained classification model is trained on large-scale corpora by designing general pre-training tasks such as masked language modeling (MLM) and next sentence prediction (NSP); when it is applied to downstream classification tasks such as relation classification and sentiment classification, good performance can be obtained by fine-tuning the pre-trained classification model with only a small amount of data.
The emergence of prompt learning reduces the gap between the fine-tuning stage and the pre-training stage of the pre-trained classification model, giving the pre-trained classification model the further ability of few-shot and zero-shot learning. Prompt learning can be divided into discrete prompts and continuous prompts: discrete prompts convert the input form by manually constructing discrete prompt templates, while continuous prompts add a series of learnable continuous embedding vectors to the input sequence, reducing the need for prompt engineering.
However, recent research shows that when data is extremely scarce, the generalization ability of pre-trained classification models is unsatisfactory. One potential reason is that it is difficult for a parametric model to master sparse and difficult samples through memorization, resulting in insufficient generalization. When the data follows a long-tailed distribution with small clusters of atypical instances, the pre-trained classification model tends to make predictions by rote-memorizing these atypical instances rather than by learning more general patterns of knowledge, which causes the knowledge representations learned by the pre-trained classification model to perform poorly in downstream classification tasks, with low classification accuracy.
Patent document CN101127042A discloses a sentiment classification method based on a classification model, and patent document CN108363753A discloses a review-text sentiment classification model training and sentiment classification method, device and equipment. Both patent applications first extract embedding vectors of the text and then build sentiment classification on top of the embedding vectors. When sample data is scarce, the extracted embedding vectors are poor, and it is difficult for these two methods to classify sentiment accurately.
Summary of the Invention
In view of the above technical problems in the prior art, the purpose of the present invention is to provide a fine-tuning method, device and application for a classification model with decoupled knowledge representation, which decouples the knowledge representations obtained by the classification model into a knowledge base; the knowledge base serves as a similarity guide to optimize the classification model, so as to improve the knowledge representation capability and accuracy of the classification model and thereby improve the classification accuracy of downstream classification tasks.
To achieve the above object of the invention, an embodiment provides a fine-tuning method for a classification model with decoupled knowledge representation, including the following steps:
Step 1: build a knowledge base for retrieval; the knowledge base stores multiple instance phrases, and each instance phrase is stored in the form of a key-value pair, where the key stores the embedding vector of the instance words and the value stores the true label value of the instance phrase;
Step 2: build a classification model including a pre-trained language model and a prediction classification module;
Step 3: use the pre-trained language model to extract the first embedding vector of the masked word in the input instance text, take the first embedding vector as the first query vector, and for each label category query the knowledge base for the multiple instance phrases nearest to the first query vector as the first neighboring instance phrases; the aggregation result obtained by aggregating all the first neighboring instance phrases with the first query vector is used as the input data of the pre-trained language model;
Step 4: use the pre-trained language model to extract the second embedding vector of the masked word in the input data, use the prediction classification module to perform classification prediction on the second embedding vector to obtain the classification prediction probability, and calculate the classification loss based on the classification prediction probability and the true label value of the masked word;
Step 5: use the true label value of the masked word to construct a weight factor, and adjust the classification loss according to the weight factor so that the classification loss pays more attention to misclassified instances;
Step 6: use the adjusted classification loss to optimize the parameters of the classification model to obtain a parameter-optimized classification model.
To achieve the above object of the invention, an embodiment provides a fine-tuning device for a classification model with decoupled knowledge representation, including:
a knowledge base construction and update unit, used to build a knowledge base for retrieval, the knowledge base storing multiple instance phrases, each instance phrase being stored in the form of a key-value pair, where the key stores the embedding vector of the instance words and the value stores the true label value of the instance phrase;
a classification model construction unit, used to build a classification model including a pre-trained language model and a prediction classification module;
a query and aggregation unit, used to use the pre-trained language model to extract the first embedding vector of the masked word in the input instance text, take the first embedding vector as the first query vector, and for each label category query the knowledge base for the multiple instance phrases nearest to the first query vector as the first neighboring instance phrases, with the aggregation result obtained by aggregating all the first neighboring instance phrases with the first query vector used as the input data of the pre-trained language model;
a loss calculation unit, used to use the pre-trained language model to extract the second embedding vector of the masked word in the input data, use the prediction classification module to perform classification prediction on the second embedding vector to obtain the classification prediction probability, and calculate the classification loss based on the classification prediction probability and the true label value of the masked word;
a loss adjustment unit, used to construct a weight factor from the true label value of the masked word and adjust the classification loss according to the weight factor so that the classification loss pays more attention to misclassified instances;
a parameter optimization unit, used to optimize the parameters of the classification model with the adjusted classification loss to obtain a parameter-optimized classification model.
To achieve the above object of the invention, an embodiment further provides a task classification method using a classification model with decoupled knowledge representation. The task classification method applies the knowledge base constructed by the above fine-tuning method and the parameter-optimized classification model, and includes the following steps:
Step 1: use the parameter-optimized pre-trained language model to extract the third embedding vector of the masked word in the input instance text, take the third embedding vector as the third query vector, and for each label category query the knowledge base for the multiple instance phrases nearest to the third query vector as the third neighboring instance phrases; the aggregation result obtained by aggregating all the third neighboring instance phrases with the third query vector is used as the input data of the pre-trained language model;
Step 2: use the parameter-optimized pre-trained language model to extract the fourth embedding vector of the masked word in the input data, and for each class query the knowledge base for the multiple instance texts nearest to the fourth query vector as the fourth neighboring instance texts, and calculate the category correlation probability based on the similarity between the fourth query vector and the fourth neighboring instance texts;
Step 3: use the parameter-optimized prediction classification module to perform classification prediction on the fourth embedding vector to obtain the classification prediction probability;
Step 4: take the weighted result of each category correlation probability and the classification prediction probability as the overall classification prediction result.
Compared with the prior art, the beneficial effects of the present invention include at least the following:
The knowledge representation is decoupled from the classification model and stored in a knowledge base, and matching and aggregation are performed by retrieval at application time, which limits rote memorization by the learning model and improves its generalization ability. At the same time, KNN is used to retrieve neighboring instance phrases from the knowledge base as continuous neural demonstrations, and the neural demonstrations are used to guide classification model training and correct classification model predictions, improving the ability of the classification model in few-shot and zero-shot scenarios. When the amount of data is sufficient, the knowledge base correspondingly holds better and richer information, and the classification model also performs very well in the fully supervised scenario.
Brief Description of the Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention or in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and other drawings can be obtained from them by those of ordinary skill in the art without creative effort.
Figure 1 is a flow chart of the fine-tuning method for the knowledge representation decoupled classification model provided by the embodiment;
Figure 2 is a schematic diagram of the structure and training of the classification model, of the knowledge base update, and of the classification prediction provided by the embodiment;
Figure 3 is a flow chart of the task classification method using the knowledge representation decoupled classification model provided by the embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present invention and do not limit the protection scope of the present invention.
Traditional prompt learning and fine-tuning methods cannot handle atypical samples well, which results in weak representation ability of the classification model and in turn affects the prediction accuracy of classification tasks. The prior art makes predictions by rote-memorizing these atypical instances rather than by learning more general patterns of knowledge, which leads to poor model representation ability. This is the opposite of how humans learn knowledge by analogy: through associative learning, humans can recall related skills from deep memory so that they reinforce one another, and thus have an extraordinary ability to solve few-shot and zero-shot tasks. Inspired by this, the embodiments provide a fine-tuning method and device for a classification model with decoupled knowledge representation, as well as a classification application of the fine-tuned classification model; by building a knowledge base from the training instance texts, memory is decoupled from the pre-trained language model, providing reference knowledge for model training and prediction and improving the generalization ability of the model.
Figure 1 is a flow chart of the fine-tuning method for the knowledge representation decoupled classification model provided by the embodiment. As shown in Figure 1, the fine-tuning method for the knowledge representation decoupled classification model provided by the embodiment includes the following steps:
Step 1: build a knowledge base for retrieval.
In the embodiment, the knowledge base, as a kind of additional reference information, decouples the knowledge representation from part of the memory of the classification model and is mainly used to store the knowledge representations obtained from the classification model. The knowledge representations exist in the form of instance phrases; specifically, each instance phrase is stored in the form of a key-value pair, where the key stores the embedding vector of the instance words and the value stores the true label value of the instance phrase. The embedding vector of an instance phrase is obtained by passing the prompt-template-formatted instance text through the pre-trained language model; specifically, it is the hidden vector output by the last layer of the pre-trained language model at the mask position of the instance text.
It should be noted that entries in the knowledge base can be freely added, edited and deleted. As shown in Figure 2, in each round of training, the first embedding vector of the masked word in the input instance text and its corresponding true label value form a new instance phrase, which is asynchronously updated into the knowledge base.
Step 2: build a classification model including a pre-trained language model and a prediction classification module.
As shown in Figure 2, the classification model built in the embodiment includes a pre-trained language model, which is used to perform knowledge representation on the input instance text so as to extract the embedding vector at the mask position. Specifically, the input instance text needs to be serialized and converted by a prompt template, and the prompt template takes the form: [CLS] instance text [MASK] [SEP], for example: [CLS] This movie has no meaning [MASK] [SEP]; at the same time, the true label value is mapped into the vocabulary space of the pre-trained language model through a mapping function to obtain the label vector. The prediction classification module is used to perform classification prediction on the input embedding vector and output the classification prediction probability.
Step 3: use the pre-trained language model to extract the first embedding vector of the masked word in the input instance text, and obtain the input data by aggregation through querying neighboring instance phrases from the knowledge base.
In the embodiment, the pre-trained language model is used to extract the first embedding vector of the masked word in the input instance text, and the first embedding vector is taken as the first query vector. For each label category, KNN (k-nearest-neighbor) search is used to query the knowledge base for the m instance phrases nearest to the first query vector as the first neighboring instance phrases. These first neighboring instance phrases serve as additional demonstration inputs, and the aggregation result obtained by aggregating them with the first query vector is used as the input data of the pre-trained language model, where the aggregation formula is:
α_i^l = exp(h_q·h_i^l) / Σ_{j=1..m} exp(h_q·h_j^l)
I = [x̃; Σ_{i=1..m} α_i^1·h_i^1; e(v_1); …; Σ_{i=1..m} α_i^L·h_i^L; e(v_L)]
where x̃ represents the initial vector of the input instance text after prompt-template serialization, h_q represents the first query vector of the masked word in the input instance text, h_i^l represents the embedding vector of the i-th first neighboring instance phrase of the l-th label class, m is the total number of first neighboring instance phrases, α_i^l represents the softmax value of h_q·h_i^l and indicates the correlation with the first query vector, e(v_l) represents the true label value of the first neighboring instance phrases, L represents the total number of labels, and I represents the aggregation result obtained by aggregation. As the input data, this aggregation result combines instance phrases from the knowledge base as context-enhancing information, which is used to guide classification model training and correct classification model predictions, improving the ability of the classification model in few-shot and zero-shot scenarios.
Step 4: use the pre-trained language model to extract the second embedding vector of the masked word in the input data, use the prediction classification module to perform classification prediction on the second embedding vector, and calculate the classification loss based on the classification prediction probability.
In the embodiment, when constructing and calculating the classification loss, the cross-entropy between the classification prediction probability corresponding to the input data and the true label value of the masked word is used as the classification loss L_CE.
Step 5: use the true label value of the masked word to construct a weight factor, and adjust the classification loss according to the weight factor so that the classification loss pays more attention to misclassified instances.
In the embodiment, the true label value of the masked word is used to adjust the weights of correct classification and misclassification in the classification loss so that the classification model pays more attention to misclassified samples. The specific formula is as follows:
L = (1 + β·F(p_knn))·L_CE
where L_CE represents the classification loss, β represents the adjustment parameter, F(p_knn) represents the weight factor, expressed as F(p_knn) = -log(p_knn), and p_knn represents the true label value of the masked word.
Step 6: use the adjusted classification loss to optimize the parameters of the classification model to obtain a parameter-optimized classification model.
In the embodiment, the constructed classification loss is used to optimize the parameters of the classification model, and in each round of training the first embedding vector of the input instance text is used to construct an instance phrase that is updated into the knowledge base.
The classification model fine-tuned by the above fine-tuning method for the knowledge representation decoupled classification model has improved capabilities in few-shot and zero-shot scenarios. When the amount of data is sufficient, the knowledge base correspondingly holds better and richer information, and the classification model also performs very well in the fully supervised scenario.
Based on the same inventive concept, an embodiment further provides a fine-tuning device for a classification model with decoupled knowledge representation, including:
a knowledge base construction and update unit, used to build a knowledge base for retrieval, the knowledge base storing multiple instance phrases, each instance phrase being stored in the form of a key-value pair, where the key stores the embedding vector of the instance words and the value stores the true label value of the instance phrase;
a classification model construction unit, used to build a classification model including a pre-trained language model and a prediction classification module;
a query and aggregation unit, used to use the pre-trained language model to extract the first embedding vector of the masked word in the input instance text, take the first embedding vector as the first query vector, and for each label category query the knowledge base for the multiple instance phrases nearest to the first query vector as the first neighboring instance phrases, with the aggregation result obtained by aggregating all the first neighboring instance phrases with the first query vector used as the input data of the pre-trained language model;
a loss calculation unit, used to use the pre-trained language model to extract the second embedding vector of the masked word in the input data, use the prediction classification module to perform classification prediction on the second embedding vector to obtain the classification prediction probability, and calculate the classification loss based on the classification prediction probability and the true label value of the masked word;
a loss adjustment unit, used to construct a weight factor from the true label value of the masked word and adjust the classification loss according to the weight factor so that the classification loss pays more attention to misclassified instances;
a parameter optimization unit, used to optimize the parameters of the classification model with the adjusted classification loss to obtain a parameter-optimized classification model.
It should be noted that when the fine-tuning device for the knowledge representation decoupled classification model provided in the above embodiment fine-tunes the classification model, the division into the above functional units is given only as an example; the above functions may be allocated to different functional units as needed, that is, the internal structure of the terminal or server is divided into different functional units to complete all or part of the functions described above. In addition, the fine-tuning device for the knowledge representation decoupled classification model provided in the above embodiment and the embodiments of the fine-tuning method for the knowledge representation decoupled classification model belong to the same concept; for its specific implementation process, see the embodiments of the fine-tuning method for the knowledge representation decoupled classification model, which will not be repeated here.
Based on the same inventive concept, an embodiment further provides a task classification method using a classification model with decoupled knowledge representation. The task classification method applies the knowledge base constructed by the above fine-tuning method and the parameter-optimized classification model and, as shown in Figure 3, includes the following steps:
Step 1: use the parameter-optimized pre-trained language model to extract the third embedding vector of the masked word in the input instance text, and obtain the input data by aggregation through querying neighboring instance phrases from the knowledge base.
In the embodiment, the parameter-optimized pre-trained language model is used to extract the third embedding vector of the masked word in the input instance text, the third embedding vector is taken as the third query vector, and for each label category the knowledge base is queried for the multiple instance phrases nearest to the third query vector as the third neighboring instance phrases; the aggregation result obtained by aggregating all the third neighboring instance phrases with the third query vector is used as the input data of the pre-trained language model.
The non-parametric KNN method is used to retrieve instance phrases neighboring the input instance text from the knowledge base, and the results of KNN retrieval are regarded as indication information of easy and difficult instances, allowing the classification model to pay more attention to difficult samples during training.
Step 2: use the parameter-optimized pre-trained language model to extract the fourth embedding vector of the masked word in the input data, and calculate the category correlation probability by querying neighboring instance phrases from the knowledge base.
In the embodiment, the parameter-optimized pre-trained language model is used to extract the fourth embedding vector of the masked word in the input data; for each class, KNN search is used to query the knowledge base for the multiple instance texts nearest to the fourth query vector as the fourth neighboring instance texts, and the category correlation probability is calculated based on the similarity between the fourth query vector and the fourth neighboring instance texts. Specifically, the following formula is used to calculate the category correlation probability from the similarity between the fourth query vector and the fourth neighboring instance texts:
P_KNN(y_i|q_t) ∝ Σ_{(c_i, y_i)∈N} exp(h_{q_t}·h_{c_i})
where P_KNN(y_i|q_t) represents the category correlation probability of the i-th classification category for the input instance text q_t, and h_{q_t}·h_{c_i} represents the inner product between the fourth query vector h_{q_t} of the input instance text q_t and the embedding vector h_{c_i} of an instance phrase c_i belonging to the i-th classification category y_i, used as the inner-product similarity; N represents the knowledge base.
KNN is a non-parametric method that can very easily make predictions for the input instance text without any classification layer; therefore, the classification result of KNN (the category correlation probability) can be intuitively used as a kind of prior knowledge to guide the pre-trained classification model so that it pays more attention to difficult samples (or atypical samples).
Step 3: use the parameter-optimized prediction classification module to perform classification prediction on the fourth embedding vector to obtain the classification prediction probability.
Step 4: take the weighted result of each category correlation probability and the classification prediction probability as the overall classification prediction result.
A traditional pre-trained language model relies only on its parametric memory when making predictions; after the non-parametric KNN method is introduced, the model can make decisions at prediction time by retrieving nearest-neighbor samples, similar to an "open-book exam". The category correlation probability P_KNN(y_i|q_t) obtained by KNN retrieval and the classification prediction probability P(y_i|q_t) output by the classification model are weighted and summed to obtain the overall classification prediction result, expressed as:
P = γ·P_KNN(y_i|q_t) + (1-γ)·P(y_i|q_t)
where γ represents the weight parameter.
The category correlation probability P_KNN(y_i|q_t) obtained by KNN retrieval can further be used in the inference process of the classification model to correct errors produced by the classification model during inference.
The task classification method using the knowledge representation decoupled classification model provided by the embodiment can be used for relation classification tasks. When it is used for a relation classification task, the true label values of the instance phrases stored in the knowledge base are relation types, including friend, kinship, colleague and classmate relations. When performing relation classification, the category correlation probability of each relation type is calculated from the input instance text through steps 1 and 2, the classification prediction probability is calculated according to step 3, the overall classification prediction result corresponding to each relation type is calculated according to step 4, and the largest overall classification prediction result is selected as the final relation classification result corresponding to the input instance text.
The task classification method using the knowledge representation decoupled classification model provided by the embodiment can be used for sentiment classification tasks. When it is used for a sentiment classification task, the true label values of the instance phrases stored in the knowledge base are sentiment types, including positive sentiment and negative sentiment. When performing sentiment classification, the category correlation probability of each sentiment type is calculated from the input instance text through steps 1 and 2, the classification prediction probability is calculated according to step 3, the overall classification prediction result corresponding to each sentiment type is calculated according to step 4, and the largest overall classification prediction result is selected as the final sentiment classification result corresponding to the input instance text.
In the sentiment classification task, RoBERTa-large is used as the pre-trained language model, and to improve retrieval speed the open-source library FAISS is used for KNN retrieval. When the input instance text is "This movie has no meaning at all!", the sentiment classification process is as follows:
(1) A prompt template is constructed to convert the input instance text; after prompt-template conversion, the input becomes "[CLS] This movie has no meaning at all! [MASK] [SEP]".
(2) The pre-trained language model is used to obtain the embedding vector of the input instance text at the [MASK] position, neural demonstrations are retrieved from the knowledge base, concatenated and aggregated with the embedding vector of the input instance text at the [MASK] position, and the result is then fed into the pre-trained language model.
(3) The hidden state of the input instance text at the [MASK] position in the last layer of the language model is used as the query vector to retrieve the nearest-neighbor instance phrases from the knowledge base, and the category correlation probability P_KNN(y_i|q_t) is calculated from these instance phrases; the probability of the label "bad review" is 0.8 and the probability of "good review" is 0.2;
(4) The prediction classification module is used to obtain the classification prediction probability P(y_i|q_t) for the query vector, where the probability of the label "bad review" is 0.4 and the probability of "good review" is 0.6;
(5) The two probabilities P_KNN(y_i|q_t) and P(y_i|q_t) are weighted and summed to obtain the overall classification prediction result; the weight parameter γ is chosen as 0.5, so that the overall classification prediction probability of the label "bad review" is 0.6 and the overall classification prediction probability of "good review" is 0.4.
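The weighted sum in step (5) can be checked directly (a trivial verification of the numbers above):

```python
gamma = 0.5
p_knn = {"bad review": 0.8, "good review": 0.2}
p_model = {"bad review": 0.4, "good review": 0.6}
overall = {c: gamma * p_knn[c] + (1 - gamma) * p_model[c] for c in p_knn}
# overall ≈ {"bad review": 0.6, "good review": 0.4}  ->  "bad review" is selected
```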
The specific embodiments described above explain the technical solutions and beneficial effects of the present invention in detail. It should be understood that the above are only the most preferred embodiments of the present invention and are not intended to limit the present invention; any modification, supplement or equivalent replacement made within the scope of the principles of the present invention shall fall within the protection scope of the present invention.

Claims (10)

  1. A fine-tuning method for a classification model with decoupled knowledge representation, characterized by including the following steps:
    Step 1: build a knowledge base for retrieval; the knowledge base stores multiple instance phrases, and each instance phrase is stored in the form of a key-value pair, where the key stores the embedding vector of the instance words and the value stores the true label value of the instance phrase;
    Step 2: build a classification model including a pre-trained language model and a prediction classification module;
    Step 3: use the pre-trained language model to extract the first embedding vector of the masked word in the input instance text, take the first embedding vector as the first query vector, and for each label category query the knowledge base for the multiple instance phrases nearest to the first query vector as the first neighboring instance phrases; the aggregation result obtained by aggregating all the first neighboring instance phrases with the first query vector is used as the input data of the pre-trained language model;
    Step 4: use the pre-trained language model to extract the second embedding vector of the masked word in the input data, use the prediction classification module to perform classification prediction on the second embedding vector to obtain the classification prediction probability, and calculate the classification loss based on the classification prediction probability and the true label value of the masked word;
    Step 5: use the true label value of the masked word to construct a weight factor, and adjust the classification loss according to the weight factor so that the classification loss pays more attention to misclassified instances;
    Step 6: use the adjusted classification loss to optimize the parameters of the classification model to obtain a parameter-optimized classification model.
  2. The fine-tuning method for a classification model with decoupled knowledge representation according to claim 1, characterized in that, in step 2, KNN retrieval is used to query the knowledge base for the multiple instance phrases nearest to the first query vector as the first neighboring instance phrases, and all the first neighboring instance phrases are aggregated with the first query vector in the following way:
    α_i^l = exp(h_q·h_i^l) / Σ_{j=1..m} exp(h_q·h_j^l)
    I = [x̃; Σ_{i=1..m} α_i^1·h_i^1; e(v_1); …; Σ_{i=1..m} α_i^L·h_i^L; e(v_L)]
    where I represents the aggregation result obtained by aggregation, x̃ represents the initial vector of the input instance text after prompt-template serialization, h_q represents the first query vector of the masked word in the input instance text, h_i^l represents the embedding vector of the i-th first neighboring instance phrase of the l-th label class, m is the total number of first neighboring instance phrases, α_i^l represents the softmax value of h_q·h_i^l and indicates the correlation with the first query vector, e(v_l) represents the true label value of the first neighboring instance phrases, and L represents the total number of labels.
  3. The fine-tuning method for a classification model with decoupled knowledge representation according to claim 1, characterized in that, in step 5, the adjusted classification loss L is expressed as:
    L = (1 + β·F(p_knn))·L_CE
    where L_CE represents the classification loss, β represents the adjustment parameter, F(p_knn) represents the weight factor, expressed as F(p_knn) = -log(p_knn), and p_knn represents the true label value of the masked word.
  4. The fine-tuning method for a classification model with decoupled knowledge representation according to claim 1, characterized by including: calculating the classification loss as the cross-entropy between the classification prediction probability and the true label value of the masked word.
  5. The fine-tuning method for a classification model with decoupled knowledge representation according to any one of claims 1 to 4, characterized by further including: forming a new instance phrase from the first embedding vector extracted by the pre-trained language model and its corresponding true label value, and updating it into the knowledge base.
  6. A fine-tuning device for the classification model with decoupled knowledge representation according to claim 1, characterized by including:
    a knowledge base construction and update unit, used to build a knowledge base for retrieval, the knowledge base storing multiple instance phrases, each instance phrase being stored in the form of a key-value pair, where the key stores the embedding vector of the instance words and the value stores the true label value of the instance phrase;
    a classification model construction unit, used to build a classification model including a pre-trained language model and a prediction classification module;
    a query and aggregation unit, used to use the pre-trained language model to extract the first embedding vector of the masked word in the input instance text, take the first embedding vector as the first query vector, and for each label category query the knowledge base for the multiple instance phrases nearest to the first query vector as the first neighboring instance phrases, with the aggregation result obtained by aggregating all the first neighboring instance phrases with the first query vector used as the input data of the pre-trained language model;
    a loss calculation unit, used to use the pre-trained language model to extract the second embedding vector of the masked word in the input data, use the prediction classification module to perform classification prediction on the second embedding vector to obtain the classification prediction probability, and calculate the classification loss based on the classification prediction probability and the true label value of the masked word;
    a loss adjustment unit, used to construct a weight factor from the true label value of the masked word and adjust the classification loss according to the weight factor so that the classification loss pays more attention to misclassified instances;
    a parameter optimization unit, used to optimize the parameters of the classification model with the adjusted classification loss to obtain a parameter-optimized classification model.
  7. A task classification method using a classification model with decoupled knowledge representation, characterized in that the task classification method applies the knowledge base constructed by the fine-tuning method according to any one of claims 1 to 5 and the parameter-optimized classification model, and includes the following steps:
    Step 1: use the parameter-optimized pre-trained language model to extract the third embedding vector of the masked word in the input instance text, take the third embedding vector as the third query vector, and for each label category query the knowledge base for the multiple instance phrases nearest to the third query vector as the third neighboring instance phrases; the aggregation result obtained by aggregating all the third neighboring instance phrases with the third query vector is used as the input data of the pre-trained language model;
    Step 2: use the parameter-optimized pre-trained language model to extract the fourth embedding vector of the masked word in the input data, and for each class query the knowledge base for the multiple instance texts nearest to the fourth query vector as the fourth neighboring instance texts, and calculate the category correlation probability based on the similarity between the fourth query vector and the fourth neighboring instance texts;
    Step 3: use the parameter-optimized prediction classification module to perform classification prediction on the fourth embedding vector to obtain the classification prediction probability;
    Step 4: take the weighted result of each category correlation probability and the classification prediction probability as the overall classification prediction result.
  8. The task classification method using a classification model with decoupled knowledge representation according to claim 7, characterized in that the following formula is used to calculate the category correlation probability from the similarity between the fourth query vector and the fourth neighboring instance texts:
    P_KNN(y_i|q_t) ∝ Σ_{(c_i, y_i)∈N} exp(h_{q_t}·h_{c_i})
    where P_KNN(y_i|q_t) represents the category correlation probability of the i-th classification category for the input instance text q_t, h_{q_t}·h_{c_i} represents the inner product between the fourth query vector h_{q_t} of the input instance text q_t and the embedding vector h_{c_i} of an instance phrase c_i belonging to the i-th classification category y_i, used as the inner-product similarity, and N represents the knowledge base.
  9. The task classification method using a classification model with decoupled knowledge representation according to claim 7, characterized in that, when it is used for a relation classification task, the true label values of the instance phrases stored in the knowledge base are relation types, including friend, kinship, colleague and classmate relations; when performing relation classification, the category correlation probability of each relation type is calculated from the input instance text through steps 1 and 2, the classification prediction probability is calculated according to step 3, the overall classification prediction result corresponding to each relation type is calculated according to step 4, and the largest overall classification prediction result is selected as the final relation classification result corresponding to the input instance text.
  10. The task classification method using a classification model with decoupled knowledge representation according to claim 7, characterized in that, when it is used for a sentiment classification task, the true label values of the instance phrases stored in the knowledge base are sentiment types, including positive sentiment and negative sentiment; when performing sentiment classification, the category correlation probability of each sentiment type is calculated from the input instance text through steps 1 and 2, the classification prediction probability is calculated according to step 3, the overall classification prediction result corresponding to each sentiment type is calculated according to step 4, and the largest overall classification prediction result is selected as the final sentiment classification result corresponding to the input instance text.
PCT/CN2022/137938 2022-08-10 2022-12-09 知识表征解耦的分类模型的微调方法、装置和应用 WO2024031891A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210955108.0A CN115270988A (zh) 2022-08-10 2022-08-10 知识表征解耦的分类模型的微调方法、装置和应用
CN202210955108.0 2022-08-10

Publications (1)

Publication Number Publication Date
WO2024031891A1 true WO2024031891A1 (zh) 2024-02-15

Family

ID=83751784

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/137938 WO2024031891A1 (zh) 2022-08-10 2022-12-09 知识表征解耦的分类模型的微调方法、装置和应用

Country Status (2)

Country Link
CN (1) CN115270988A (zh)
WO (1) WO2024031891A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117743315A (zh) * 2024-02-20 2024-03-22 浪潮软件科技有限公司 一种为多模态大模型系统提供高质量数据的方法
CN118070925A (zh) * 2024-04-17 2024-05-24 腾讯科技(深圳)有限公司 模型训练方法、装置、电子设备、存储介质及程序产品
CN118152428A (zh) * 2024-05-09 2024-06-07 烟台海颐软件股份有限公司 一种电力客服系统查询指令的预测和增强方法及其装置
CN118171650A (zh) * 2024-03-21 2024-06-11 行至智能(北京)技术有限公司 一种完全无监督的大语言模型微调训练平台
CN118504586A (zh) * 2024-07-18 2024-08-16 河南嵩山实验室产业研究院有限公司洛阳分公司 一种基于大语言模型的用户风险行为感知方法及相关设备

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115270988A (zh) * 2022-08-10 2022-11-01 浙江大学 知识表征解耦的分类模型的微调方法、装置和应用

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111401077A (zh) * 2020-06-02 2020-07-10 腾讯科技(深圳)有限公司 语言模型的处理方法、装置和计算机设备
US20210089724A1 (en) * 2019-09-25 2021-03-25 Google Llc Contrastive Pre-Training for Language Tasks
CN112614538A (zh) * 2020-12-17 2021-04-06 厦门大学 一种基于蛋白质预训练表征学习的抗菌肽预测方法和装置
CN113987209A (zh) * 2021-11-04 2022-01-28 浙江大学 基于知识指导前缀微调的自然语言处理方法、装置、计算设备和存储介质
CN114510572A (zh) * 2022-04-18 2022-05-17 佛山科学技术学院 一种终身学习的文本分类方法及系统
CN114565104A (zh) * 2022-03-01 2022-05-31 腾讯科技(深圳)有限公司 语言模型的预训练方法、结果推荐方法及相关装置
WO2022141878A1 (zh) * 2020-12-28 2022-07-07 平安科技(深圳)有限公司 端到端的语言模型预训练方法、系统、设备及存储介质
CN115270988A (zh) * 2022-08-10 2022-11-01 浙江大学 知识表征解耦的分类模型的微调方法、装置和应用

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210089724A1 (en) * 2019-09-25 2021-03-25 Google Llc Contrastive Pre-Training for Language Tasks
CN111401077A (zh) * 2020-06-02 2020-07-10 腾讯科技(深圳)有限公司 语言模型的处理方法、装置和计算机设备
CN112614538A (zh) * 2020-12-17 2021-04-06 厦门大学 一种基于蛋白质预训练表征学习的抗菌肽预测方法和装置
WO2022141878A1 (zh) * 2020-12-28 2022-07-07 平安科技(深圳)有限公司 端到端的语言模型预训练方法、系统、设备及存储介质
CN113987209A (zh) * 2021-11-04 2022-01-28 浙江大学 基于知识指导前缀微调的自然语言处理方法、装置、计算设备和存储介质
CN114565104A (zh) * 2022-03-01 2022-05-31 腾讯科技(深圳)有限公司 语言模型的预训练方法、结果推荐方法及相关装置
CN114510572A (zh) * 2022-04-18 2022-05-17 佛山科学技术学院 一种终身学习的文本分类方法及系统
CN115270988A (zh) * 2022-08-10 2022-11-01 浙江大学 知识表征解耦的分类模型的微调方法、装置和应用

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117743315A (zh) * 2024-02-20 2024-03-22 浪潮软件科技有限公司 一种为多模态大模型系统提供高质量数据的方法
CN117743315B (zh) * 2024-02-20 2024-05-14 浪潮软件科技有限公司 一种为多模态大模型系统提供高质量数据的方法
CN118171650A (zh) * 2024-03-21 2024-06-11 行至智能(北京)技术有限公司 一种完全无监督的大语言模型微调训练平台
CN118070925A (zh) * 2024-04-17 2024-05-24 腾讯科技(深圳)有限公司 模型训练方法、装置、电子设备、存储介质及程序产品
CN118152428A (zh) * 2024-05-09 2024-06-07 烟台海颐软件股份有限公司 一种电力客服系统查询指令的预测和增强方法及其装置
CN118504586A (zh) * 2024-07-18 2024-08-16 河南嵩山实验室产业研究院有限公司洛阳分公司 一种基于大语言模型的用户风险行为感知方法及相关设备

Also Published As

Publication number Publication date
CN115270988A (zh) 2022-11-01

Similar Documents

Publication Publication Date Title
WO2024031891A1 (zh) 知识表征解耦的分类模型的微调方法、装置和应用
Du et al. Text classification research with attention-based recurrent neural networks
CN111177374A (zh) 一种基于主动学习的问答语料情感分类方法及系统
CN108038492A (zh) 一种基于深度学习的感性词向量及情感分类方法
CN112800190B (zh) 基于Bert模型的意图识别与槽值填充联合预测方法
CN110555084A (zh) 基于pcnn和多层注意力的远程监督关系分类方法
US11663668B1 (en) Apparatus and method for generating a pecuniary program
CN110555459A (zh) 基于模糊聚类和支持向量回归的成绩预测方法
CN111581368A (zh) 一种基于卷积神经网络的面向智能专家推荐的用户画像方法
Shen et al. A deep learning method for Chinese singer identification
Kozhevnikov et al. Research of the text data vectorization and classification algorithms of machine learning
Song et al. Classification of traditional chinese medicine cases based on character-level bert and deep learning
CN115687609A (zh) 一种基于Prompt多模板融合的零样本关系抽取方法
US20230368003A1 (en) Adaptive sparse attention pattern
CN114077836A (zh) 一种基于异构神经网络的文本分类方法及装置
CN114417851A (zh) 一种基于关键词加权信息的情感分析方法
CN115329101A (zh) 一种电力物联网标准知识图谱构建方法及装置
Zheng et al. Named entity recognition: A comparative study of advanced pre-trained model
Zheng et al. Optimizing the online learners’ verbal intention classification efficiency based on the multi-head attention mechanism algorithm
Yuan [Retracted] A Classroom Emotion Recognition Model Based on a Convolutional Neural Network Speech Emotion Algorithm
CN117436451A (zh) 基于IDCNN-Attention的农业病虫害命名实体识别方法
CN117436522A (zh) 生物事件关系抽取方法及癌症主题的大规模生物事件关系知识库构建方法
Wang et al. W-RNN: News text classification based on a Weighted RNN
Wu et al. TW-TGNN: Two windows graph-based model for text classification
Zhang et al. Research on a kind of multi-objective evolutionary fuzzy system with a flowing data pool and a rule pool for interpreting neural networks

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22954829

Country of ref document: EP

Kind code of ref document: A1