CN117251555A - Language generation model training method and device - Google Patents

Language generation model training method and device

Info

Publication number
CN117251555A
Authority
CN
China
Prior art keywords
dictionary
entry
enhancement
original
generation model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311533038.0A
Other languages
Chinese (zh)
Other versions
CN117251555B (en)
Inventor
孙海亮
暴宇健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Xumi Yuntu Space Technology Co Ltd
Original Assignee
Shenzhen Xumi Yuntu Space Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Xumi Yuntu Space Technology Co Ltd filed Critical Shenzhen Xumi Yuntu Space Technology Co Ltd
Priority to CN202311533038.0A priority Critical patent/CN117251555B/en
Publication of CN117251555A publication Critical patent/CN117251555A/en
Application granted granted Critical
Publication of CN117251555B publication Critical patent/CN117251555B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G06F16/338 Presentation of query results
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Machine Translation (AREA)

Abstract

The application relates to the technical field of artificial intelligence and provides a language generation model training method and device. The method comprises the following steps: acquiring a training corpus, where the training corpus comprises a plurality of pieces of dictionary information, and each piece of dictionary information comprises an original vocabulary entry, an original vocabulary entry description, and a dictionary positive example sample and a dictionary negative example sample corresponding to each original vocabulary entry; inputting the training corpus into a pre-trained dictionary enhancement language generation model adopting an encoder-decoder structure to obtain a plurality of feature vectors, where the dictionary enhancement language generation model is obtained through an entry prediction task and an entry description judgment task; and acquiring a target loss function according to the plurality of feature vectors and a fine tuning data set, so as to update the parameters of the dictionary enhancement language generation model and obtain the fine-tuned dictionary enhancement language generation model. By introducing dictionary description information and combining it with contrastive learning enhancement, the language generation model is trained so that it can process context effectively.

Description

Language generation model training method and device
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a language generation model training method and device.
Background
The pre-training model makes full use of the ideas of self-supervised learning and transfer learning: it is generally independent of any specific natural language processing task, is trained on a large-scale unlabeled data set, and is then applied to downstream natural language processing tasks. In some research on pre-trained language models, researchers have integrated external knowledge, such as knowledge graphs, into the pre-trained language model to obtain better results, but knowledge graph information for a specific scene is often not easy to obtain. Existing pre-trained language models also have problems in learning universal knowledge such as common concepts and common sense: the model usually learns grammar and vocabulary collocation knowledge, but does not learn common sense or entity entry knowledge. Dictionary descriptions, by contrast, are easier to obtain than the information used for knowledge graph enhancement.
Therefore, how to propose a new method that uses dictionary description information as external knowledge to enhance a pre-trained language model is a technical problem to be solved.
Disclosure of Invention
In view of this, embodiments of the present application provide a language generation model training method, apparatus, electronic device, and computer readable storage medium, so as to solve the problem that existing pre-trained language models do not learn dictionary entry knowledge for natural language processing.
In a first aspect of an embodiment of the present application, a language generation model training method is provided, including:
acquiring a training corpus; the training corpus comprises a plurality of pieces of dictionary information, wherein each piece of dictionary information comprises an original vocabulary entry, an original vocabulary entry description, and a dictionary positive example sample and a dictionary negative example sample corresponding to each original vocabulary entry;
inputting the training corpus into a pre-trained dictionary enhanced language generation model adopting an encoder-decoder structure, and obtaining a plurality of feature vectors; the dictionary enhancement language generation model is obtained through an entry prediction task and an entry description judgment task, wherein the entry prediction task is used for predicting an entry according to the original entry and the original entry description, and the entry description judgment task is used for performing contrast learning according to the dictionary positive example sample, the dictionary negative example sample and the enhancement sample;
and acquiring a target loss function according to the plurality of feature vectors and the fine tuning data set so as to update parameters of the dictionary enhancement language generation model and acquire the dictionary enhancement language generation model subjected to fine tuning.
In a second aspect of an embodiment of the present application, there is provided a language generation model training apparatus, adapted to the language generation model training method described in the first aspect, where the apparatus includes:
the training corpus acquisition module can acquire training corpus; the training corpus comprises a plurality of pieces of dictionary information, wherein each piece of dictionary information comprises an original vocabulary entry, an original vocabulary entry description, and a dictionary positive example sample and a dictionary negative example sample corresponding to each original vocabulary entry;
the enhanced feature vector acquisition module can input the training corpus into a pre-trained dictionary enhanced language generation model adopting an encoder-decoder structure to acquire a plurality of feature vectors; the dictionary enhancement language generation model is obtained through an entry prediction task and an entry description judgment task, wherein the entry prediction task is used for predicting an entry according to the original entry and the original entry description, and the entry description judgment task is used for performing contrast learning according to the dictionary positive example sample, the dictionary negative example sample and the enhancement sample;
and the model training module can acquire a target loss function according to the plurality of feature vectors and the fine tuning data set so as to update parameters of the dictionary enhancement language generation model and acquire the dictionary enhancement language generation model subjected to fine tuning.
In a third aspect of the embodiments of the present application, there is provided an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the method of the first aspect when the computer program is executed.
In a fourth aspect of embodiments of the present application, there is provided a computer readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method of the first aspect.
Compared with the prior art, the beneficial effects of the embodiments of the present application at least include the following. First, a training corpus is acquired, where the training corpus comprises a plurality of pieces of dictionary information, and each piece of dictionary information comprises an original vocabulary entry, an original vocabulary entry description, and a dictionary positive example sample and a dictionary negative example sample corresponding to each original vocabulary entry. The training corpus is then input into a pre-trained dictionary enhancement language generation model adopting an encoder-decoder structure to obtain a plurality of feature vectors, where the dictionary enhancement language generation model is obtained through an entry prediction task and an entry description judgment task. Finally, a target loss function is acquired according to the plurality of feature vectors and the fine tuning data set, so as to update the parameters of the dictionary enhancement language generation model and obtain the fine-tuned dictionary enhancement language generation model. By introducing dictionary description information and combining it with contrastive learning enhancement, the language generation model is trained so that it can process context effectively.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the following description will briefly introduce the drawings that are needed in the embodiments or the description of the prior art, it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a language generation model training method according to an embodiment of the present application;
FIG. 2 is a second flow chart of a language generation model training method according to an embodiment of the present disclosure;
FIG. 3 is a third flow chart of a language generation model training method according to an embodiment of the present disclosure;
FIG. 4 is a flowchart of a language generation model training method according to an embodiment of the present disclosure;
FIG. 5 is a fifth flow chart of a language generation model training method provided in an embodiment of the present application;
FIG. 6 is a schematic diagram of a language generation model training apparatus according to an embodiment of the present disclosure;
fig. 7 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
A language generating model training method, device, electronic equipment and storage medium according to the embodiment of the application will be described in detail below with reference to the accompanying drawings.
As described in the background, deep learning has achieved great success over the past decade and is widely used in natural language processing and computer vision. In the field of natural language processing, current deep learning methods depend heavily on labeled data; however, large-scale labeled data is not common in this field. Compared with computer vision, the labeling task for natural language processing data is more difficult, since annotators usually need to read and understand a long passage of text before labeling. A small amount of manually labeled data cannot drive a large, deep learning model, which limits the learning capability of the model. Pre-trained language models (PLMs) are one of the important approaches to solving the above problems. The pre-training model makes full use of the ideas of self-supervised learning and transfer learning: it is generally independent of any specific natural language processing task, is trained on a large-scale unlabeled data set, and is then applied to downstream natural language processing tasks. Early pre-trained language models were designed to obtain word representation vectors, such as Word2Vec and GloVe; the currently prevailing pre-trained language models mostly aim to obtain deep semantic representations, such as ELMo, BERT, BART and GPT.
Existing pre-trained language models have problems in learning universal knowledge such as common concepts and common sense: the model usually learns grammar and vocabulary collocation knowledge, but does not learn common sense or entity entry knowledge. A dictionary-class corpus is a corpus containing term definitions, such as a dictionary or an encyclopedia. Such corpora typically contain a large number of phrases and entity names, as well as relationships associated with these terms. Pre-trained language models have some difficulties in processing such corpora, including: (1) lack of entity recognition capability, i.e., pre-trained language models are typically trained on text generation tasks and therefore have difficulty recognizing entity names; (2) lack of context information, i.e., entries in a dictionary-class corpus are typically independent of each other and therefore lack context.
The inventor has found through searching that, in some research on pre-trained language models, researchers have integrated external knowledge, such as knowledge graphs, into the pre-trained language model to obtain better results, but knowledge graph information for a specific scene is often not easy to obtain, whereas dictionary descriptions are easier to obtain than the information used for knowledge graph enhancement. Therefore, how to propose a new method that uses dictionary description information as external knowledge to enhance a pre-trained language model is a technical problem to be solved.
FIG. 1 is a schematic flow chart of a language generation model training method according to an embodiment of the invention. The method comprises the following steps:
s101: acquiring a training corpus; the training corpus comprises a plurality of pieces of dictionary information, wherein each piece of dictionary information comprises an original vocabulary entry, an original vocabulary entry description, and a dictionary positive example sample and a dictionary negative example sample corresponding to each original vocabulary entry.
S102: inputting the training corpus into a pre-trained dictionary enhancement language generation model adopting an encoder-decoder structure, and obtaining a plurality of feature vectors; the dictionary enhancement language generation model is obtained through an entry prediction task and an entry description judgment task, the entry prediction task is used for predicting an entry according to the original entry and the original entry description, and the entry description judgment task is used for performing contrast learning according to the dictionary positive example sample, the dictionary negative example sample and the enhancement sample.
S103: and obtaining a target loss function according to the plurality of feature vectors and the fine tuning data set so as to update parameters of the dictionary enhancement language generation model and obtain the dictionary enhancement language generation model subjected to fine tuning.
In particular, four types of information are used during the pre-training phase of the model: dictionary entries, dictionary entry descriptions, dictionary synonyms, and dictionary antonyms. Meanwhile, in order to improve the robustness of the entry embedding representation, synonyms and antonyms of the original entries are used for contrastive learning. One embodiment of the present application is described by taking an original dictionary entry as an example:
Original entry: forest
Original entry description: a large area of land covered with trees and plants, usually larger than a wood, or the trees and plants themselves
Synonyms: jungle, woodland
Antonyms: desert, wasteland
In some embodiments, the dictionary positive example sample includes a synonym entry and a synonym entry description corresponding to the original entry; and/or the dictionary negative example sample includes an antonym entry and an antonym entry description corresponding to the original entry.
Specifically, take the synonym woodland and the antonym desert as examples (a data-structure sketch of one such dictionary record follows the examples below):
Synonym: woodland
Synonym description: land covered with wood or trees
Antonym: desert
Antonym description: arid land with little or no vegetation
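For illustration only, the following sketch shows how one piece of dictionary information of the kind listed above could be organized; the Python representation and the field names are assumptions of this description, not something specified by the present application.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DictionaryInfo:
    """One piece of dictionary information: an original entry and its description,
    plus the positive (synonym) and negative (antonym) samples attached to it."""
    entry: str
    description: str
    synonyms: List[str] = field(default_factory=list)
    synonym_descriptions: List[str] = field(default_factory=list)
    antonyms: List[str] = field(default_factory=list)
    antonym_descriptions: List[str] = field(default_factory=list)

# The "forest" example given above:
forest = DictionaryInfo(
    entry="forest",
    description=("a large area of land covered with trees and plants, usually "
                 "larger than a wood, or the trees and plants themselves"),
    synonyms=["woodland"],
    synonym_descriptions=["land covered with wood or trees"],
    antonyms=["desert"],
    antonym_descriptions=["arid land with little or no vegetation"],
)
```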
In some embodiments, the entry prediction task, as shown in fig. 2, includes:
S211: inputting the original vocabulary entry and the original vocabulary entry description into a mask language model to obtain a prediction feature vector corresponding to the original vocabulary entry.
S212: determining the entry prediction loss function L_pred according to the prediction feature vector.
Specifically, the entry prediction task is effectively a masked language model (MLM) task: given the entry e and its description desc, the content of the entry e is replaced with the special character MASK, and the model then restores the masked content. In particular, when the entry e contains multiple tokens, all of the tokens need to be masked. The use of the masked language model MLM is not described here in detail.
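A minimal sketch of the masking step just described, assuming the input is already tokenized; the [MASK] and [SEP] symbols and the label convention are illustrative assumptions, since the application only states that every token of the entry is replaced with a special MASK character and then restored.

```python
MASK_TOKEN = "[MASK]"
SEP_TOKEN = "[SEP]"

def build_entry_prediction_example(entry_tokens, description_tokens):
    """Mask every token of the entry e and keep its description desc visible;
    the labels ask the model to restore the masked entry tokens."""
    input_tokens = [MASK_TOKEN] * len(entry_tokens) + [SEP_TOKEN] + description_tokens
    # Only masked positions carry a target; None marks positions ignored by the loss
    # (in practice these would be mapped to an ignore index such as -100).
    labels = list(entry_tokens) + [None] * (1 + len(description_tokens))
    return input_tokens, labels

# Example with the "forest" entry, which here is a single token:
inputs, labels = build_entry_prediction_example(
    ["forest"],
    "a large area of land covered with trees and plants".split(),
)
```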
In some embodiments, the entry description judgment task, as shown in fig. 3, includes:
S311: acquiring a first feature vector according to the combined statement of the original entry and the original entry description configured with a specific mark.
S312: acquiring a second feature vector according to the combined statement of the synonym entry and the synonym entry description configured with a specific mark.
S313: acquiring a third feature vector according to the combined statement of the antonym entry and the antonym entry description configured with a specific mark.
S314: determining a first contrast learning loss function L_cl1 according to the first feature vector, the second feature vector and the third feature vector.
Specifically, the entry description judgment task is essentially a contrastive learning task. Contrastive learning aims to pull similar data closer and push dissimilar data apart, so as to learn effective data representations. In this contrastive learning, the unified format "entry + description" is used for model encoding, i.e., original entry + description, synonym + description, and antonym + description are encoded respectively, the corresponding hidden-layer features are obtained, and the contrastive learning loss is computed. In addition, the dictionary enhancement language generation model is constructed based on the BART model. The BART model is a Transformer encoder-decoder neural network structure. In order to achieve the technical effect of the application, the total number of parameters of the BART model used here exceeds 5 billion, and the model needs to be pre-trained with the training task commonly used for autoregressive language models such as the GPT model.
Specifically, the training process is still described taking the original dictionary entry forest as an example. First, the contrastive learning samples are constructed: [CLS] forest [SEP] a large area …, whose whole-sentence [CLS] feature vector is denoted h1; [CLS] woodland [SEP] land covered with …, whose whole-sentence [CLS] feature vector is denoted h2; and [CLS] desert [SEP] arid land with little …, whose whole-sentence [CLS] feature vector is denoted h3. Then, the classical contrastive learning loss function, the triplet loss, is introduced to calculate the first contrast learning loss function L_cl1. The distance between the feature vectors of synonyms should be smaller than the distance between the feature vectors of antonyms, i.e., the loss is small when h1 and h2 are close and when h1 and h3 are far apart.
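A minimal sketch of this first contrast learning loss, assuming PyTorch, a Euclidean distance, and a fixed margin; the application does not name the distance or the margin, so these are assumptions. h1, h2 and h3 are the whole-sentence [CLS] feature vectors described above.

```python
import torch
import torch.nn.functional as F

def first_contrast_loss(h1, h2, h3, margin=1.0):
    """Triplet-style loss L_cl1: small when the anchor h1 (original entry + description)
    is close to the synonym sample h2 and far from the antonym sample h3."""
    d_pos = F.pairwise_distance(h1, h2)  # distance to the synonym sample
    d_neg = F.pairwise_distance(h1, h3)  # distance to the antonym sample
    return torch.clamp(d_pos - d_neg + margin, min=0.0).mean()

# Example with random (batch, hidden) vectors standing in for encoder outputs:
h1, h2, h3 = (torch.randn(4, 768) for _ in range(3))
loss_cl1 = first_contrast_loss(h1, h2, h3)
```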
In some embodiments, the enhancement samples corresponding to the original vocabulary entry are obtained using an existing large language model, wherein the enhancement samples include at least 1 positive enhancement sample, and the positive enhancement sample includes an enhancement synonym entry and an enhancement synonym entry description.
In some embodiments, the entry description judgment task, as shown in fig. 4, further includes:
S411: obtaining a fourth feature vector according to the combined statement of the enhanced synonym entry and the enhanced synonym entry description configured with a specific mark.
S412: determining the first contrast learning loss function L_cl1 according to the first feature vector, the fourth feature vector and the third feature vector.
Specifically, the enhancement samples can be obtained by means of data enhancement based on a mature large language model, which further increases the amount of training corpus and makes the training of the model more accurate. For example, for the positive enhancement sample, a large language model with the highest current precision, such as GPT-4, is used to generate the enhanced synonym entry and the enhanced synonym entry description. Still taking the forest entry above as an example, the following prompt is first input to GPT-4: please give at least 1 synonym for the word "forest" and give an explanation of each synonym. The answer given by GPT-4 is then obtained, e.g., "Woodland: land covered with trees or grass". Referring to the entry + description construction method above, the sample [CLS] woodland [SEP] land covered with trees or grass can be constructed, and the whole-sentence [CLS] feature vector is denoted h4 as a positive sample. At this time, h1 and h4 form a positive pair, h1 and h3 form a negative pair, and the classical triplet loss is still introduced to calculate the first contrast learning loss function L_cl1: the loss is small when h1 and h4 are close and when h1 and h3 are far apart.
In some embodiments, the enhancement samples include at least 2 hard negative enhancement samples, and each hard negative enhancement sample includes a hard negative enhancement entry and a hard negative enhancement entry description.
Specifically, for the negative enhancement samples, a large language model with the highest current precision, such as GPT-4, is used to generate the hard negative enhancement entries and the hard negative enhancement entry descriptions (a prompt-construction sketch follows the example answers below). Still taking the forest entry above as an example, the following prompt is first input to GPT-4: please give at least 2 entries that are related to the entry "forest" but different from it, and give an explanation of each entry. The answer given by GPT-4 is then obtained as:
Wetland: Wetlands are areas where water covers the soil, or is present either at or near the surface of the soil all year or for varying periods of time during the year, including during the growing season.
Grove: A grove is a small group or cluster of trees, often found in a specific area. It can be a collection of trees of the same species or a mixture of different species. Groves are usually smaller than forests and may have a more open and scattered arrangement of trees.
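The sketch below only shows how the two enhancement prompts above might be assembled; the actual call to GPT-4 (or any other large language model) is left behind a placeholder `call_llm` callable, because the application does not prescribe a specific API, and parsing of the returned answers is omitted.

```python
def build_augmentation_prompts(entry: str):
    """Prompts mirroring the positive and hard negative enhancement requests above."""
    positive_prompt = (
        f'Please give at least 1 synonym for the word "{entry}" '
        "and give an explanation of each synonym."
    )
    hard_negative_prompt = (
        f'Please give at least 2 entries related to the entry "{entry}" '
        "but different from it, and give an explanation of each entry."
    )
    return positive_prompt, hard_negative_prompt

def augment_entry(entry: str, call_llm):
    """call_llm: placeholder for a large-language-model client that maps a prompt
    string to the model's answer text (e.g. a GPT-4 endpoint)."""
    positive_prompt, hard_negative_prompt = build_augmentation_prompts(entry)
    positive_answer = call_llm(positive_prompt)            # e.g. "Woodland: land covered with trees or grass"
    hard_negative_answer = call_llm(hard_negative_prompt)  # e.g. the Wetland / Grove answers above
    return positive_answer, hard_negative_answer
```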
In some embodiments, the entry description judgment task, as shown in fig. 5, further includes:
S511: obtaining a fifth feature vector according to the combined statement of the first hard negative enhancement entry and the first hard negative enhancement entry description configured with a specific mark.
S512: obtaining a sixth feature vector according to the combined statement of the first hard negative enhancement entry and the second hard negative enhancement entry description configured with a specific mark.
S513: determining the second contrast learning loss function L_cl2 according to the fifth feature vector and the sixth feature vector.
Specifically, assume that the Wetland obtained above is the first hard negative enhancement entry and Grove is the second hard negative enhancement entry. First, with reference to the entry + description construction method above, the first hard negative enhancement entry Wetland and its own entry description are combined as [CLS] wetland [SEP] wetlands are areas …, and the whole-sentence [CLS] feature vector is denoted h5 as a negative sample. Then, the first hard negative enhancement entry Wetland and the entry description of the second hard negative enhancement entry Grove are combined as [CLS] wetland [SEP] a grove is a small group …, and the whole-sentence [CLS] feature vector is denoted h6 as a negative sample. Finally, a vector distance method can be used to determine the second contrast learning loss function L_cl2 from h5 and h6; the vector distance method here includes using the Euclidean distance.
In some embodiments, the target loss function L_total is determined according to the entry prediction loss function L_pred, the first contrast learning loss function L_cl1, and the second contrast learning loss function L_cl2, where λ1 and λ2 are the weight factors of the first contrast learning loss function L_cl1 and the second contrast learning loss function L_cl2, respectively. In one embodiment, λ1 and λ2 are typically between 0.1 and 0.3.
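Reading the weight factors above as a weighted sum gives the following sketch; the additive form is an assumption of this description, since the application only states that L_total is determined from L_pred, L_cl1 and L_cl2 with weights λ1 and λ2 (typically 0.1 to 0.3).

```python
def target_loss(loss_pred, loss_cl1, loss_cl2, lambda1=0.2, lambda2=0.2):
    """Combine the entry prediction loss with the two weighted contrastive losses.
    The default weights sit in the middle of the 0.1-0.3 range mentioned above."""
    return loss_pred + lambda1 * loss_cl1 + lambda2 * loss_cl2
```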
In some embodiments, the fine tuning data set includes an open-source instruction fine tuning data set, a dictionary-based interpretation data set, and a dictionary-based question-answer data set; the dictionary-based question-answer data set comprises question-answer pairs constructed based on the matching relation between entries and entry descriptions.
Specifically, to fully enhance the generalization and multitask capabilities of the large-scale language model, the model may be instruction fine-tuned using the open-source instruction fine tuning data set plus the dictionary-based interpretation data set. The open-source instruction fine tuning data set may include the Chinese general open-source instruction data set COIG, the Flan Collection, and the like. Because there are a large number of such data sets, they are not described further here. In addition, Chinese-English bilingual and multilingual data sets are selected as far as possible; it has been verified that, compared with a monolingual data set, a multilingual data set can improve training precision and the generalization capability of the model.
For the dictionary-based question-answer data set, its construction includes: (1) definition questions, i.e., the entry is given directly and the model generates the interpretation text corresponding to the entry; (2) summary questions, i.e., the interpretation text of an entry is extracted and rewritten by GPT-4 with the requirement that its meaning remain unchanged, the rewritten text is used as the model input, and the model gives the entry; (3) matching-degree questions and answers, i.e., entries and interpretations are randomly matched with a certain probability, for example the original entry-interpretation pairing is kept with 20% probability; after the pair is input, the model judges whether the entry and the interpretation match, and if not, gives the original entry corresponding to the interpretation (see the sketch after the example below). For example:
Word: Grove. Explanation: Groves are areas where water covers the soil, or is present either at or near the surface of the soil all year or for varying periods of time during the year, including during the growing season.
Question: Does the explanation match the word? If not, please give the correct word that matches the explanation.
Answer: No, the answer should be wetland.
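A sketch of the matching-degree construction just illustrated: with a keep probability (20% in the example above) the original entry keeps its own interpretation, otherwise the interpretation is paired with a different entry and the answer names the correct entry. The prompt wording and the handling of single-entry corpora are assumptions of this sketch.

```python
import random

def build_matching_qa(dictionary, keep_prob=0.2, rng=random):
    """dictionary: list of (entry, explanation) pairs.
    Returns (question, answer) pairs in seq2seq form for the matching-degree task."""
    qa_pairs = []
    for entry, explanation in dictionary:
        other_entries = [e for e, _ in dictionary if e != entry]
        if other_entries and rng.random() >= keep_prob:
            # Mismatched pair: show a different entry with this explanation.
            shown_entry = rng.choice(other_entries)
            answer = f"No, the answer should be {entry}."
        else:
            # Matched pair is kept (with probability keep_prob).
            shown_entry, answer = entry, "Yes."
        question = (
            f"Word: {shown_entry}. Explanation: {explanation}\n"
            "Question: Does the explanation match the word? "
            "If not, please give the correct word that matches the explanation."
        )
        qa_pairs.append((question, answer))
    return qa_pairs
```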
Dictionary-based questions and answers are built entirely in the form of question-answer pairs, and the instruction fine tuning data sets are trained in a unified form to perform the seq2seq task; the loss function can be the standard sequence-to-sequence generation loss. When instruction fine tuning is performed, all the data sets participating in instruction fine tuning are mixed and then trained uniformly, so the question-answer data constructed based on the dictionary must not be excessive, for example it should not exceed 30% of the total data set; otherwise the generalization capability of the whole model may be harmed.
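One way to realize the mixing constraint described above is sketched below: all instruction fine tuning sources are pooled and shuffled together, while the dictionary-based question-answer portion is capped at a fraction of the final mixture (30% in the text); capping by random subsampling is an assumption of this sketch.

```python
import random

def mix_instruction_data(open_source_data, dict_explanation_data, dict_qa_data,
                         qa_cap=0.30, rng=random):
    """Pool all instruction fine tuning samples, keeping the dictionary-based
    question-answer share at or below qa_cap of the final mixture."""
    base = list(open_source_data) + list(dict_explanation_data)
    qa = list(dict_qa_data)
    # Largest qa count n satisfying n <= qa_cap * (len(base) + n).
    max_qa = int(qa_cap * len(base) / (1.0 - qa_cap))
    if len(qa) > max_qa:
        qa = rng.sample(qa, max_qa)
    mixture = base + qa
    rng.shuffle(mixture)
    return mixture
```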
In addition, there are two methods for using the dictionary enhancement language generation model in the prediction stage. One method is to provide semantic representation capability as a representation model: after the text to be represented passes through the encoder of the model, the feature vector corresponding to the [CLS] position of the encoder is extracted as the feature vector of the whole sentence and used for vector retrieval. The other method is to use the model directly as an end-to-end question-answering robot that answers questions in various dictionary-related fields and serves as the controller of a whole dialogue system, for example to construct a conversational search engine.
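For the first usage mode (sentence representation for vector retrieval), the sketch below runs only the encoder and takes the hidden state at the first position as the whole-sentence vector; the Hugging Face Transformers API and the public facebook/bart-base checkpoint are stand-ins, since the dictionary enhancement language generation model itself is not a published checkpoint, and BART's leading start token plays the role of [CLS] here.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("facebook/bart-base")  # placeholder checkpoint
model = AutoModel.from_pretrained("facebook/bart-base")
model.eval()

def sentence_vector(text: str) -> torch.Tensor:
    """Encode the text with the encoder only and return the first position's
    hidden state as the whole-sentence feature vector for retrieval."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        encoder_output = model.get_encoder()(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
        )
    return encoder_output.last_hidden_state[:, 0, :]  # shape: (1, hidden_size)

vec = sentence_vector("forest [SEP] a large area of land covered with trees and plants")
```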
It can be seen that the dictionary enhancement language generation model uses dictionary description information as external knowledge to enhance the pre-trained language model, and dictionary descriptions are easier to obtain than the information used for knowledge graph enhancement. By seamlessly integrating the generation capability of the large language model with dictionary knowledge, the enhancement of entry information in the large language generation model is realized, and the expert information of a professional dictionary is integrated into the large language generation model so as to make full use of the model's strength in processing context for text representation. Meanwhile, the model trained in this way has excellent zero-shot multitask learning capability: because large language model technology and dictionary knowledge are adopted, it can give more professional and reliable answers in the fields covered by the dictionary, and because of the common-sense question-answer training in the general field, it can also give reliable general question answering in the open domain.
Any combination of the above optional solutions may be adopted to form an optional embodiment of the present application, which is not described herein in detail.
The following are device embodiments of the present application, which may be used to perform method embodiments of the present application. For details not disclosed in the device embodiments of the present application, please refer to the method embodiments of the present application.
Fig. 6 is a schematic diagram of a language generating model training apparatus according to an embodiment of the present application. As shown in fig. 6, the language generation model training apparatus includes:
the corpus acquisition module 601 can acquire a corpus; the training corpus comprises a plurality of pieces of dictionary information, wherein each piece of dictionary information comprises an original vocabulary entry, an original vocabulary entry description, and a dictionary positive example sample and a dictionary negative example sample corresponding to each original vocabulary entry;
the enhanced feature vector obtaining module 602 is capable of inputting the training corpus into a pre-trained dictionary enhanced language generating model adopting an encoder-decoder structure to obtain a plurality of feature vectors; the dictionary enhancement language generation model is obtained through an entry prediction task and an entry description judgment task, wherein the entry prediction task is used for predicting an entry according to the original entry and the original entry description, and the entry description judgment task is used for performing contrast learning according to the dictionary positive example sample, the dictionary negative example sample and the enhancement sample;
the model training module 603 can obtain a target loss function according to the feature vectors and the fine tuning data set, so as to update parameters of the dictionary enhancement language generation model and obtain the dictionary enhancement language generation model after fine tuning.
It should be understood that, the language generating model training apparatus according to the embodiments of the present disclosure may further execute the method executed by the language generating model training apparatus in fig. 1 to 5, and implement the functions of the language generating model training apparatus in the examples shown in fig. 1 to 5, which are not described herein.
Meanwhile, the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
Fig. 7 is a schematic diagram of an electronic device 7 provided in an embodiment of the present application. As shown in fig. 7, the electronic device 7 of this embodiment includes: a processor 701, a memory 702 and a computer program 703 stored in the memory 702 and executable on the processor 701. The steps of the various method embodiments described above are implemented by the processor 701 when executing the computer program 703. Alternatively, the processor 701, when executing the computer program 703, performs the functions of the modules/units of the apparatus embodiments described above.
The electronic device 7 may be a desktop computer, a notebook computer, a palm computer, a cloud server, or the like. The electronic device 7 may include, but is not limited to, a processor 701 and a memory 702. It will be appreciated by those skilled in the art that fig. 7 is merely an example of the electronic device 7 and is not limiting of the electronic device 7 and may include more or fewer components than shown, or different components.
The memory 702 may be an internal storage unit of the electronic device 7, for example, a hard disk or a memory of the electronic device 7. The memory 702 may also be an external storage device of the electronic device 7, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card) or the like, which are provided on the electronic device 7. The memory 702 may also include both internal storage units and external storage devices of the electronic device 7. The memory 702 is used to store computer programs and other programs and data required by the electronic device.
The processor 701 may be a central processing unit (Central Processing Unit, CPU) or other general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The processor 701 reads the corresponding computer program from the nonvolatile memory into the memory and then runs it, forming a shared resource access control device at the logic level. The processor is used to execute the programs stored in the memory and is specifically used to perform the following operations:
acquiring a training corpus; the training corpus comprises a plurality of pieces of dictionary information, wherein each piece of dictionary information comprises an original vocabulary entry, an original vocabulary entry description, and a dictionary positive example sample and a dictionary negative example sample corresponding to each original vocabulary entry;
inputting the training corpus into a pre-trained dictionary enhancement language generation model adopting an encoder-decoder structure, and obtaining a plurality of feature vectors; the dictionary enhancement language generation model is obtained through an entry prediction task and an entry description judgment task, wherein the entry prediction task is used for predicting an entry according to the original entry and the original entry description, and the entry description judgment task is used for performing contrast learning according to the dictionary positive example sample, the dictionary negative example sample and the enhancement sample;
and obtaining a target loss function according to the plurality of feature vectors and the fine tuning data set so as to update parameters of the dictionary enhancement language generation model and obtain the dictionary enhancement language generation model subjected to fine tuning.
The language generating model training method disclosed in the embodiments shown in fig. 1 to 5 of the present specification may be applied to the processor 701 or implemented by the processor 701. The processor 701 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in a processor or by instructions in the form of software. The above-described processor may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present specification. A general purpose processor may be a microprocessor or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the embodiments of the present specification may be embodied directly in hardware, in a decoded processor, or in a combination of hardware and software modules in a decoded processor. The software modules may be located in a random access memory, flash memory, read only memory, programmable read only memory, or electrically erasable programmable memory, registers, etc. as well known in the art. The storage medium is located in a memory, and the processor reads the information in the memory and, in combination with its hardware, performs the steps of the above method.
Of course, in addition to the software implementation, the electronic device of the embodiments of the present disclosure does not exclude other implementations, such as a logic device or a combination of software and hardware, that is, the execution subject of the following processing flow is not limited to each logic unit, but may also be hardware or a logic device.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the present application implements all or part of the flow in the methods of the above embodiments, which may also be completed by instructing the related hardware through a computer program; the computer program may be stored in a computer readable storage medium, and when executed by a processor, the computer program may implement the steps of the respective method embodiments described above. The computer program may comprise computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. It should be noted that the content contained in the computer readable medium can be appropriately increased or decreased according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, the computer readable medium does not include electrical carrier signals and telecommunication signals.
The present description also proposes a computer-readable storage medium storing one or more programs, the one or more programs comprising instructions, which when executed by a portable electronic device comprising a plurality of application programs, enable the portable electronic device to perform the language generation model training method of the embodiments shown in fig. 1 to 5, and in particular to perform the following method:
acquiring a training corpus; the training corpus comprises a plurality of pieces of dictionary information, wherein each piece of dictionary information comprises an original vocabulary entry, an original vocabulary entry description, and a dictionary positive example sample and a dictionary negative example sample corresponding to each original vocabulary entry;
inputting the training corpus into a pre-trained dictionary enhancement language generation model adopting an encoder-decoder structure, and obtaining a plurality of feature vectors; the dictionary enhancement language generation model is obtained through an entry prediction task and an entry description judgment task, wherein the entry prediction task is used for predicting an entry according to the original entry and the original entry description, and the entry description judgment task is used for performing contrast learning according to the dictionary positive example sample, the dictionary negative example sample and the enhancement sample;
and obtaining a target loss function according to the plurality of feature vectors and the fine tuning data set so as to update parameters of the dictionary enhancement language generation model and obtain the dictionary enhancement language generation model subjected to fine tuning.
In summary, the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present specification should be included in the protection scope of the present specification.
The system, apparatus, module or unit set forth in the above embodiments may be implemented in particular by a computer chip or entity, or by a product having a certain function. One typical implementation is a computer. In particular, the computer may be, for example, a personal computer, a laptop computer, a cellular telephone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
Computer readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium which can be used to store information that can be accessed by a computing device. As defined herein, computer readable media do not include transitory computer readable media, such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article or apparatus that comprises the element.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for system embodiments, since they are substantially similar to method embodiments, the description is relatively simple, as relevant to see a section of the description of method embodiments.
The above embodiments are only for illustrating the technical solution of the present application, and are not limiting thereof; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (10)

1. A method for training a language generation model, comprising:
acquiring a training corpus; the training corpus comprises a plurality of pieces of dictionary information, wherein each piece of dictionary information comprises an original vocabulary entry, an original vocabulary entry description, and a dictionary positive example sample and a dictionary negative example sample corresponding to each original vocabulary entry;
inputting the training corpus into a pre-trained dictionary enhanced language generation model adopting an encoder-decoder structure, and obtaining a plurality of feature vectors; the dictionary enhancement language generation model is obtained through an entry prediction task and an entry description judgment task, wherein the entry prediction task is used for predicting an entry according to the original entry and the original entry description, and the entry description judgment task is used for performing contrast learning according to the dictionary positive example sample, the dictionary negative example sample and the enhancement sample;
and acquiring a target loss function according to the plurality of feature vectors and the fine tuning data set so as to update parameters of the dictionary enhancement language generation model and acquire the dictionary enhancement language generation model subjected to fine tuning.
2. The method of claim 1, wherein the term prediction task comprises:
inputting the original vocabulary entry and the original vocabulary entry description into a mask language model to obtain a prediction feature vector corresponding to the original vocabulary entry;
determining an entry prediction loss function L_pred according to the prediction feature vector.
3. The method of claim 2, wherein the dictionary positive example sample includes a synonym entry and a synonym entry description corresponding to the original vocabulary entry; and/or the dictionary negative example sample includes an antonym entry and an antonym entry description corresponding to the original vocabulary entry; and/or, the entry description judgment task comprises:
acquiring a first feature vector according to the original entry configured with the specific mark and the combined statement described by the original entry;
obtaining a second feature vector according to the synonym entry configured with the specific mark and the combined statement described by the synonym entry;
acquiring a third feature vector according to the combined statement of the antonym entry and the antonym entry description configured with the specific mark;
determining a first contrast learning loss function L_cl1 according to the first feature vector, the second feature vector and the third feature vector.
4. The method of claim 3, wherein the enhancement samples corresponding to the original vocabulary entries are obtained using an existing large language model, wherein the enhancement samples comprise at least 1 positive enhancement samples comprising enhancement synonyms and enhancement synonym descriptions; and, the term description judging task further includes:
obtaining a fourth feature vector according to the enhanced synonym entry configured with the specific mark and the combined statement described by the enhanced synonym entry;
determining the first contrast learning loss function L_cl1 according to the first feature vector, the fourth feature vector and the third feature vector.
5. The method of claim 4, wherein the enhancement samples comprise at least 2 hard negative enhancement samples, each hard negative enhancement sample comprising a hard negative enhancement entry and a hard negative enhancement entry description; and the entry description judgment task further comprises:
obtaining a fifth feature vector according to the combined statement of the first hard negative enhancement entry and the first hard negative enhancement entry description configured with the specific mark;
obtaining a sixth feature vector according to the combined statement of the first hard negative enhancement entry and the second hard negative enhancement entry description configured with the specific mark;
determining the second contrast learning loss function L_cl2 according to the fifth feature vector and the sixth feature vector.
6. The method of claim 5, wherein the target loss function L_total is determined according to the entry prediction loss function L_pred, the first contrast learning loss function L_cl1, and the second contrast learning loss function L_cl2; wherein λ1 and λ2 are the weight factors of the first contrast learning loss function L_cl1 and the second contrast learning loss function L_cl2, respectively.
7. The method of claim 1, wherein the fine tuning data set comprises an open-source instruction fine tuning data set, a dictionary-based interpretation data set, and a dictionary-based question-answer data set; the dictionary-based question-answer data set comprises question-answer pairs constructed based on the matching relation between entries and entry descriptions.
8. A language generation model training apparatus adapted to the language generation model training method of any one of claims 1 to 7, the apparatus comprising:
a training corpus acquisition module, configured to acquire a training corpus; the training corpus comprises a plurality of pieces of dictionary information, each piece of dictionary information comprising an original entry, an original entry description, and a dictionary positive example sample and a dictionary negative example sample corresponding to the original entry;
an enhanced feature vector acquisition module, configured to input the training corpus into a pre-trained dictionary-enhanced language generation model adopting an encoder-decoder structure to obtain a plurality of feature vectors; the dictionary-enhanced language generation model is obtained through an entry prediction task and an entry description judgment task, wherein the entry prediction task is used for predicting an entry according to the original entry and the original entry description, and the entry description judgment task is used for performing contrast learning according to the dictionary positive example sample, the dictionary negative example sample and the enhancement samples; and
a model training module, configured to obtain a target loss function according to the plurality of feature vectors and the fine-tuning data set, so as to update parameters of the dictionary-enhanced language generation model and obtain a fine-tuned dictionary-enhanced language generation model.
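A rough sketch of how the three modules of claim 8 could be wired together in code; the class and method names (`encode`, `target_loss`) are illustrative assumptions, not interfaces defined by the patent:

```python
import torch

class DictionaryEnhancedTrainer:
    """Illustrative wiring of the three modules in claim 8."""
    def __init__(self, corpus_loader, model, optimizer):
        self.corpus_loader = corpus_loader  # training corpus acquisition module
        self.model = model                  # pre-trained dictionary-enhanced encoder-decoder model
        self.optimizer = optimizer

    def fine_tune_step(self, corpus_batch, fine_tune_batch):
        features = self.model.encode(corpus_batch)                # enhanced feature vector acquisition
        loss = self.model.target_loss(features, fine_tune_batch)  # model training module: target loss
        self.optimizer.zero_grad()
        loss.backward()
        self.optimizer.step()
        return loss.item()
```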
9. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 7.
CN202311533038.0A 2023-11-17 2023-11-17 Language generation model training method and device Active CN117251555B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311533038.0A CN117251555B (en) 2023-11-17 2023-11-17 Language generation model training method and device

Publications (2)

Publication Number Publication Date
CN117251555A true CN117251555A (en) 2023-12-19
CN117251555B CN117251555B (en) 2024-04-16

Family

ID=89131727

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311533038.0A Active CN117251555B (en) 2023-11-17 2023-11-17 Language generation model training method and device

Country Status (1)

Country Link
CN (1) CN117251555B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118410060A (en) * 2024-07-01 2024-07-30 杭州智通福科技有限公司 GQL corpus generation model training method, GQL corpus generation model training device and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220121884A1 (en) * 2011-09-24 2022-04-21 Z Advanced Computing, Inc. System and Method for Extremely Efficient Image and Pattern Recognition and Artificial Intelligence Platform
CN114528919A (en) * 2022-01-14 2022-05-24 北京健康之家科技有限公司 Natural language processing method and device and computer equipment
CN114707483A (en) * 2022-03-07 2022-07-05 华泰证券股份有限公司 Zero sample event extraction system and method based on contrast learning and data enhancement
CN114780691A (en) * 2022-06-21 2022-07-22 安徽讯飞医疗股份有限公司 Model pre-training and natural language processing method, device, equipment and storage medium
WO2022228041A1 (en) * 2021-04-26 2022-11-03 北京有竹居网络技术有限公司 Translation model training method, apparatus, and device, and storage medium
CN116150621A (en) * 2023-02-18 2023-05-23 阳光保险集团股份有限公司 Training method, device and equipment for text model
CN116595999A (en) * 2023-07-17 2023-08-15 深圳须弥云图空间科技有限公司 Machine translation model training method and device
WO2023160312A1 (en) * 2022-02-23 2023-08-31 厦门市美亚柏科信息股份有限公司 Person re-identification method and apparatus based on self-supervised learning, and device and storage medium

Also Published As

Publication number Publication date
CN117251555B (en) 2024-04-16

Similar Documents

Publication Publication Date Title
CN110427463B (en) Search statement response method and device, server and storage medium
CN113987209A (en) Natural language processing method, apparatus, computing device and storage medium based on knowledge-guided prefix fine-tuning
CN111078836A (en) Machine reading comprehension method, system and device based on external knowledge enhancement
Hao et al. BertNet: Harvesting knowledge graphs with arbitrary relations from pretrained language models
CN117149984B (en) Customization training method and device based on large model thinking chain
CN117251555B (en) Language generation model training method and device
CN116595999B (en) Machine translation model training method and device
Tang et al. AttenSy-SNER: software knowledge entity extraction with syntactic features and semantic augmentation information
Shu et al. Transcending language boundaries: Harnessing llms for low-resource language translation
CN112949293B (en) Similar text generation method, similar text generation device and intelligent equipment
CN114490926A (en) Method and device for determining similar problems, storage medium and terminal
Yang et al. Task independent fine tuning for word embeddings
WO2023169301A1 (en) Text processing method and apparatus, and electronic device
Opitz et al. Natural Language Processing RELIES on Linguistics
CN116860947A (en) Multiple-choice question generation method, system and storage medium for text reading comprehension
Pradeep et al. ConvKGYarn: Spinning Configurable and Scalable Conversational Knowledge Graph QA datasets with Large Language Models
Wang et al. Improving relation extraction by multi-task learning
Long A Grammatical Error Correction Model for English Essay Words in Colleges Using Natural Language Processing
CN111126066B (en) Method and device for determining Chinese congratulation technique based on neural network
Murugathas et al. Domain specific question & answer generation in tamil
Zhang et al. Robust dialog state tracker with contextual-feature augmentation
Cai et al. Editing Knowledge Representation of Language Model via Rephrased Prefix Prompts
Pereira et al. Predictive authoring for Brazilian Portuguese augmentative and alternative communication
Matsuyoshi et al. User's Intention Understanding in Question-Answering System Using Attention-based LSTM
Li et al. Improving word vector with prior knowledge in semantic dictionary

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant