CN116340779A - Training method and device for next-generation universal basic model and electronic equipment - Google Patents

Training method and device for next-generation universal basic model and electronic equipment

Info

Publication number
CN116340779A
Authority
CN
China
Prior art keywords
training
model
data
basic model
generation
Prior art date
Legal status
Pending
Application number
CN202310620027.XA
Other languages
Chinese (zh)
Inventor
王业全
李响
姜鑫
孟绪颖
孙爱欣
Current Assignee
Beijing Zhiyuan Artificial Intelligence Research Institute
Original Assignee
Beijing Zhiyuan Artificial Intelligence Research Institute
Priority date
Filing date
Publication date
Application filed by Beijing Zhiyuan Artificial Intelligence Research Institute filed Critical Beijing Zhiyuan Artificial Intelligence Research Institute
Priority to CN202310620027.XA
Publication of CN116340779A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/205: Parsing
    • G06F40/211: Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a training method and device for a next-generation universal basic model, and an electronic device, in the technical field of natural language processing. In the language training stage, the model is trained on the original data so that the next-generation universal basic model can generate the corresponding unified data from the input original data; in the teacher training stage, the model is trained on the unified data so that the next-generation universal basic model can judge the correctness of propositions; the two stages are iterated alternately to obtain the trained next-generation universal basic model. By using both original language data and task-aware data during training, the method teaches the model task knowledge while preserving its role as a language model. A user can handle multiple tasks with the single trained model, without additional fine-tuning for each task; the modeling cost is low, the model generalizes well, and business performance can be improved.

Description

Training method and device for next-generation universal basic model and electronic equipment
Technical Field
The present invention relates to the field of natural language processing technologies, and in particular to a training method and apparatus for a next-generation universal basic model, and an electronic device.
Background
Natural language processing (Natural Language Processing, NLP) is an important direction in the fields of computer science and artificial intelligence. It studies theories and methods that enable effective communication between humans and computers in natural language.
At present, widely studied and used natural language processing models include small models and large models. A small model is a language model trained separately for each task on task-specific data. Since a dedicated model must be built for each task, the modeling cost is high and the model generalizes poorly. A large model is a basic model: an AI model trained on large-scale, broadly sourced data sets using neural networks and self-supervised learning. Training a basic model comprises two steps. The first step is pre-training, whose purpose is to train the model on as much self-supervised data as possible; pre-training may use self-supervised learning techniques (e.g., autoregressive language modeling and auto-encoding), and monolingual, multilingual, and multimodal models can all be trained this way. The second step is fine-tuning, which adapts the network to a specific task; the training data may be text, text-image pairs, or text-video pairs. After fine-tuning, the model supports technologies such as classification, sequence labeling, structured prediction, and sequence generation, and applications such as summarization, machine translation, image retrieval, and video annotation. The basic model is therefore mainly trained under the paradigm of "pre-training followed by fine-tuning". Although models obtained under this paradigm have been remarkably successful on many natural language processing tasks, they learn only linguistic knowledge in the language-model training stage and cannot learn knowledge related to actual tasks, so fine-tuning is required for each task and the modeling cost is very high. For example, after fine-tuning a large model on 100 tasks, 100 dedicated large models are obtained, and each dedicated large model is less general.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides the following technical scheme.
The first aspect of the present invention provides a training method for a next-generation universal base model, comprising:
acquiring training data, wherein the training data comprises original data and unified data; the unified data comprises propositions described in natural language and corresponding true/false labels, and the unified data is obtained by converting the original data for the different original labels in a classification problem target space;
alternately iterating language training and teacher training on the next-generation universal basic model to be trained to obtain a trained next-generation universal basic model; wherein,
training the next generation universal basic model by utilizing the original data in the language training stage so that the next generation universal basic model can generate corresponding unified data based on the input original data;
and training the next-generation universal basic model by utilizing the unified data in the teacher training stage so that the next-generation universal basic model can judge the correctness of the proposition.
Preferably, converting the original data for the different original labels in the classification problem target space includes: under the guidance of a specific template, converting the original data into a proposition described in natural language for each original label in the classification problem target space.
Preferably, the specific template specifies the conversion rule from the original data to the unified data, including a task description of the original data, the context of the original data, and a mapping rule from the original label to the proposition.
Preferably, the original data includes a natural language processing data set corresponding to at least one classification task, the at least one classification task including topic classification, natural language inference, sentiment analysis, reading comprehension, automatic question answering, and story continuation.
Preferably, the natural language processing data set corresponding to the at least one classification task is integrated by means of building a template.
Preferably, in the teacher training stage, the model learns with propositions as input and proposition correctness labels as supervision; a preset character is appended to the end of the proposition, the output of the model at the position corresponding to the preset character is selected as the feature representation of the preset character, the feature representation is mapped to a true/false two-dimensional space through a linear layer, and finally the correctness probability of the proposition is computed through a softmax activation function.
A second aspect of the present invention provides a training apparatus for a next generation universal base model, comprising:
the training data acquisition module is used for acquiring training data, wherein the training data comprises original data and unified data; the unified data comprises propositions described in natural language and corresponding true/false labels, and the unified data is obtained by converting the original data for the different original labels in a classification problem target space;
the model training module is used for performing alternately iterated language training and teacher training on the next-generation universal basic model to be trained, so as to obtain a trained next-generation universal basic model; wherein,
training the next generation universal basic model by utilizing the original data in the language training stage so that the next generation universal basic model can generate corresponding unified data based on the input original data;
and training the next-generation universal basic model by utilizing the unified data in the teacher training stage so that the next-generation universal basic model can judge the correctness of the proposition.
A third aspect of the present invention provides a classification method using a next-generation universal base model, comprising:
inputting the original data to be classified into a next-generation universal basic model, and outputting a classification result of the original data to be classified according to the original label corresponding to the proposition with the highest correctness probability;
the next-generation universal basic model is obtained by training in advance by adopting the training method in the first aspect.
A fourth aspect of the present invention provides a memory storing a plurality of instructions for implementing the training method of the next-generation generic base model according to the first aspect, or the classification method using the next-generation generic base model according to the third aspect.
A fifth aspect of the present invention provides an electronic device comprising a processor and a memory coupled to the processor, the memory storing a plurality of instructions that are loadable and executable by the processor to enable the processor to perform the method of training the next generation generic base model as described in the first aspect or the method of classifying using the next generation generic base model as described in the third aspect.
The beneficial effects of the invention are as follows: to address the technical problems of large models obtained under the prior-art "pre-training followed by fine-tuning" paradigm, the invention provides a method for building a next-generation universal basic model. Because the learned task knowledge reduces the search and optimization space, language knowledge and task knowledge reinforce each other during the alternating, cyclic learning process, which speeds up training of the next-generation universal basic model. In addition, the proposed technology can use a single model, trained on one's own data or even open-source data, to handle multiple specific tasks in a service, without additional fine-tuning for each task. The invention therefore reduces modeling cost, yields a model with strong generalization, and can improve business performance.
Drawings
FIG. 1 is a schematic flow chart of a training method of a next generation universal basic model according to the present invention;
FIG. 2 is a schematic diagram of a unified data acquisition mode according to the present invention;
FIG. 3 is a schematic diagram of the training process (alternately iterating language training and teacher training) of the next generation universal base model according to the present invention;
FIG. 4 is a schematic diagram of the language training and teacher training modes according to the present invention;
FIG. 5 is a functional structural diagram of a training device of the next-generation universal basic model according to the present invention.
Detailed Description
In order to better understand the above technical solutions, the following detailed description will be given with reference to the accompanying drawings and specific embodiments.
The method provided by the invention can be implemented in a terminal environment, and the terminal can comprise one or more of the following components: a processor, a memory, and a display screen. The memory stores at least one instruction that is loaded and executed by the processor to implement the methods described in the embodiments below.
The processor may include one or more processing cores. The processor connects the various parts of the terminal through various interfaces and lines, and performs the terminal's functions and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory and by invoking the data stored in the memory.
The memory may include random access memory (Random Access Memory, RAM) or read-only memory (Read-Only Memory, ROM). The memory may be used to store instructions, programs, code, code sets, or instruction sets.
The display screen is used for displaying a user interface of each application program.
In addition, it will be appreciated by those skilled in the art that the structure of the terminal described above is not limiting, and that the terminal may include more or fewer components, combine certain components, or use a different arrangement of components. For example, the terminal may further include components such as a radio frequency circuit, an input unit, a sensor, an audio circuit, and a power supply, which are not described herein.
Example 1
As shown in FIG. 1, an embodiment of the present invention provides a training method for a next-generation universal basic model, including: S101, acquiring training data, wherein the training data comprises original data and unified data; the unified data comprises propositions described in natural language and corresponding true/false labels, and the unified data is obtained by converting the original data for the different original labels in a classification problem target space; S102, performing alternately iterated language training and teacher training on the next-generation universal basic model to be trained, so as to obtain a trained next-generation universal basic model; wherein, in the language training stage, the next-generation universal basic model is trained with the original data so that it can generate the corresponding unified data based on the input original data; and in the teacher training stage, the next-generation universal basic model is trained with the unified data so that it can judge the correctness of propositions.
Converting the original data for the different original labels in the classification problem target space further comprises the following step: under the guidance of a specific template, converting the original data into a proposition described in natural language for each original label in the classification problem target space.
In a preferred embodiment of the present invention, the specific templates specify the conversion rules from original data to unified data, including a task description of the original data, the context of the original data, and mapping rules from original labels to propositions.
The original data comprises a natural language processing data set corresponding to at least one classification task, wherein the at least one classification task comprises topic classification, natural language inference, sentiment analysis, reading comprehension, automatic question answering, and story continuation.
Further, the natural language processing data sets corresponding to the classification tasks can be integrated by building templates. Take text topic classification data as an example: the original data set typically consists of a number of <text, label> tuples, where the label indicates the category of the text, while the final unified data consists of <proposition, correctness label> tuples. The "template" describes the rule for converting original data into the proposition of the unified data. A typical template is: "[background] [text] [proposition core]". Here "[background]" stands for, e.g., "The following is a piece of news text:"; "[text]" is replaced by the text in the actual data, that is, the 'text' field of the original data, such as "In the NBA playoffs, the Clippers lead the Suns 2-1..."; and "[proposition core]" is generated from the label of each piece of original data, e.g., for an original label of "Sports" the [proposition core] becomes "The topic of this news is 'Sports'." The proposition part of the resulting unified data is then "The following is a piece of news text: In the NBA playoffs, the Clippers lead the Suns 2-1... The topic of this news is 'Sports'." As for the "correctness label": the true original label yields a correct proposition, i.e. the proposition in the unified data is correct and the correctness label is 1; a wrong label produced by negative sampling yields an incorrect proposition, i.e. the proposition in the unified data is wrong and the correctness label is 0.
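To make the template-based conversion concrete, the following Python sketch turns one <text, label> example into <proposition, correctness label> pairs, marking the proposition built from the gold label as correct (1) and propositions built from negatively sampled labels as wrong (0). The template wording, the label set, and the single-negative default are illustrative assumptions, not the patent's exact templates.

```python
import random

# Hypothetical template for a news-topic classification data set.
TEMPLATE = "The following is a piece of news text: {text} The topic of this news is '{label}'."
LABELS = ["World", "Sports", "Business", "Sci/Tech"]

def to_unified_data(text, gold_label, num_negatives=1):
    """Convert one <text, label> example into <proposition, correctness label> pairs."""
    unified = [(TEMPLATE.format(text=text, label=gold_label), 1)]    # true proposition
    wrong_labels = [l for l in LABELS if l != gold_label]
    for wrong in random.sample(wrong_labels, num_negatives):         # negative sampling
        unified.append((TEMPLATE.format(text=text, label=wrong), 0)) # wrong proposition
    return unified

# Example using the Clippers/Suns sentence from the description.
for proposition, correct in to_unified_data(
        "In the NBA playoffs, the Clippers lead the Suns 2-1.", "Sports"):
    print(correct, proposition)
```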
In another preferred embodiment of the invention, in the teacher training stage, the model learns with propositions as input and proposition correctness labels as supervision; a preset character, such as a '[cls]' special token, is appended to the end of the proposition, the output of the model at the position corresponding to the preset character is selected as the feature representation of the preset character, the feature representation is mapped to a true/false two-dimensional space through a linear layer, and finally the correctness probability of the proposition is computed through a softmax activation function.
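As an illustration of this teacher-stage head, here is a PyTorch sketch that reads the hidden state at the trailing '[cls]' position, maps it to the true/false two-dimensional space, and applies softmax. The backbone interface (a causal Transformer returning hidden states of shape (batch, seq_len, hidden)) and the class and argument names are assumptions for illustration, not the patent's implementation.

```python
import torch
import torch.nn as nn

class PropositionCorrectnessHead(nn.Module):
    """Judge proposition correctness from the hidden state at the appended [cls] position."""

    def __init__(self, backbone: nn.Module, hidden_size: int):
        super().__init__()
        self.backbone = backbone                        # assumed: returns (batch, seq_len, hidden)
        self.to_true_false = nn.Linear(hidden_size, 2)  # map to the true/false two-dimensional space

    def forward(self, input_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.backbone(input_ids)               # (batch, seq_len, hidden)
        cls_state = hidden[:, -1, :]                    # [cls] is appended at the end of the proposition
        logits = self.to_true_false(cls_state)
        return torch.softmax(logits, dim=-1)            # (batch, 2): correctness probability of the proposition
```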
In the invention, in order to give one language model the ability to solve multiple tasks simultaneously, the various tasks are unified in form. Based on the assumption that all classification problems can be converted into true-or-false judgment problems, the invention provides a modeling approach that solves various classification problems uniformly, namely proposition correctness judgment. Accordingly, a variety of NLU (Natural Language Understanding) data sets are integrated to build the Unified Data used for training the model.
FIG. 2 shows the unified modeling approach of proposition correctness judgment and the process of converting data into unified data. Taking a text topic classification task as an example: under the guidance of a specific template (Template), the original data is converted into a proposition described in natural language for each label (0 - "World", 1 - "Sports", 2 - "Business", 3 - "Sci/Tech") in the target space of the classification problem. In this way, one piece of original data is converted into N propositions (N being the size of the target classification space), of which only one is a true proposition. Finally, in the teacher training stage, the propositions and their corresponding true/false labels are input into the model to train its ability to judge proposition correctness. In the inference stage, the original data to be classified is likewise input into the model, and the classification result of the original data to be classified can be output according to the original label corresponding to the proposition with the highest correctness probability, thereby solving the classification problem.
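The following sketch illustrates this inference stage: one proposition is built per candidate label, each is scored by the correctness head, and the label whose proposition receives the highest correctness probability is returned. The tokenizer interface and the argument names are assumptions; `head`, `labels`, and `template` can be the head and the LABELS/TEMPLATE from the sketches above.

```python
import torch

def classify(head, tokenizer, text, labels, template):
    """Return the label whose proposition the model judges most likely to be true.

    `head` is the PropositionCorrectnessHead sketched above; `tokenizer` is assumed
    to return a (1, seq_len) tensor of token ids ending in the [cls] token."""
    best_label, best_prob = None, -1.0
    for label in labels:
        proposition = template.format(text=text, label=label)
        input_ids = tokenizer(proposition)
        with torch.no_grad():
            prob_true = head(input_ids)[0, 1].item()   # probability that the proposition is correct
        if prob_true > best_prob:
            best_label, best_prob = label, prob_true
    return best_label
```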
Under this task-form conversion framework, any NLP data set that is essentially a classification task can be integrated by constructing a template for that data set, continuously expanding the scale of the unified data and further improving the generalization and robustness of the final trained model on each task.
Based on this unified form of classification tasks, the training process of the next-generation universal basic model (FreeLM) can be as shown in FIG. 3. Training of FreeLM is divided into two stages: the language training stage and the teacher training stage, corresponding respectively to the two training objectives of "predicting the next word" and "judging proposition correctness". The two stages are alternately iterated, which ensures that the model retains both the text generation capability of a language model and the ability to judge proposition correctness. FreeLM can be trained in the manner shown in FIG. 4. The language training stage (Language Iterative Stage) of the model (a Transformer) is no different from conventional autoregressive language model training, which predicts the k-th word given the preceding k-1 words. Taking the language training stage (a) in FIG. 4 as an example, the model receives "The cat sat on the mat" as input; through the attention masking mechanism, the model predicts the current word conditioned only on the word sequence before it, e.g. receiving "The cat" and predicting "sat", or receiving "The cat sat on the" and predicting "mat". Retaining the language-model training objective preserves the model's language generation capability. The teacher training stage (Teacher Iterative Stage) takes the constructed unified data (propositions) as input and learns with the proposition correctness labels as supervision. A '[cls]' special token is appended to the end of the proposition text, the output of the model at the corresponding position is selected as the feature representation, which is mapped to a True/False two-dimensional space through a linear layer, and the correctness probability of the proposition is computed through a softmax activation function. Taking the teacher training stage (b) in FIG. 4 as an example, the input first states that the current task is natural language inference, the two given sentences are "The cat sat on the mat" and "The cat did not sit on the mat", and the proposition is "Here, the premise entails the latter hypothesis". The two sentences are clearly contradictory, so there is no entailment; the proposition is therefore wrong, and the model is trained to judge it as wrong.
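To make the alternation of the two objectives concrete, here is a rough PyTorch sketch of the training loop: even steps run the language stage (autoregressive next-word prediction on original data) and odd steps run the teacher stage (proposition correctness judgment on unified data). The 1:1 alternation ratio, the batch formats, and the `lm_logits` helper on the backbone are assumptions for illustration; they are not specified by the patent.

```python
import itertools
import torch
import torch.nn.functional as F

def train_freelm(head, lm_batches, unified_batches, optimizer, num_steps):
    """Alternate language training and teacher training, step by step."""
    lm_iter = itertools.cycle(lm_batches)             # batches of token ids from original data
    teacher_iter = itertools.cycle(unified_batches)   # (proposition ids, 0/1 correctness labels)
    for step in range(num_steps):
        optimizer.zero_grad()
        if step % 2 == 0:
            # Language stage: next-word prediction under a causal attention mask.
            input_ids = next(lm_iter)                             # (batch, seq_len)
            logits = head.backbone.lm_logits(input_ids)           # assumed helper: (batch, seq_len, vocab)
            loss = F.cross_entropy(logits[:, :-1].flatten(0, 1),  # predict token k from tokens < k
                                   input_ids[:, 1:].flatten())
        else:
            # Teacher stage: judge proposition correctness from the [cls] position.
            input_ids, correctness = next(teacher_iter)
            probs = head(input_ids)                               # (batch, 2) softmax probabilities
            loss = F.nll_loss(torch.log(probs), correctness)
        loss.backward()
        optimizer.step()
```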
The large model constructed by the method provided by the invention has the following beneficial effects: (1) language knowledge and task knowledge are learned through alternating, cyclic iteration within one model, achieving the Fine-Tuning-Free goal and reducing modeling cost; (2) on natural language understanding tasks, the method is substantially superior to the GPT (Generative Pre-trained Transformer) series, including InstructGPT, leaves a very large space for expanding the training data set, and its capability can keep growing as the model size and the training set grow; (3) on natural language generation tasks, the next-generation universal basic model constructed by the invention shows an advantage in understanding in practice; more importantly, it is fully compatible with capabilities such as instruction learning and in-context learning, so the language generation capability can be further enhanced; (4) the solution provided by the present invention can be adapted to basic models in almost all fields, including but not limited to vision, speech, music, writing, biology, and medicine.
Example 2
As shown in FIG. 5, an embodiment of the present invention provides a training apparatus for a next-generation universal basic model, including: a training data acquisition module 501, configured to acquire training data, where the training data includes original data and unified data; the unified data comprises propositions described in natural language and corresponding true/false labels, and the unified data is obtained by converting the original data for the different original labels in a classification problem target space; and a model training module 502, configured to perform alternately iterated language training and teacher training on the next-generation universal basic model to be trained, so as to obtain a trained next-generation universal basic model.
Wherein, in the language training stage, the next-generation universal basic model is trained with the original data so that it can generate the corresponding unified data based on the input original data; and in the teacher training stage, the next-generation universal basic model is trained with the unified data so that it can judge the correctness of propositions.
Further, in the training data acquisition module, converting the original data for the different original labels in the classification problem target space further includes: under the guidance of a specific template, converting the original data into a proposition described in natural language for each original label in the classification problem target space.
The specific template specifies the conversion rule from the original data to the unified data, including a task description of the original data, the context of the original data, and a mapping rule from the original label to the proposition.
Further, the original data includes a natural language processing data set corresponding to at least one classification task, the at least one classification task including topic classification, natural language inference, sentiment analysis, reading comprehension, automatic question answering, and story continuation.
Furthermore, the natural language processing data set corresponding to the at least one classification task is integrated by means of a template construction.
Further, in the model training module, in the teacher training stage, the model learns with propositions as input and proposition correctness labels as supervision; a preset character is appended to the end of the proposition, the output of the model at the position corresponding to the preset character is selected as the feature representation of the preset character, the feature representation is mapped to a true/false two-dimensional space through a linear layer, and finally the correctness probability of the proposition is computed through a softmax activation function.
The apparatus may be implemented according to the training method of the next-generation universal basic model provided in the first embodiment; for the specific implementation, reference may be made to the first embodiment, which is not repeated here.
Example 3
The embodiment of the invention provides a classification method using a next-generation universal basic model, comprising the following steps: inputting the original data to be classified into the next-generation universal basic model, and outputting the classification result of the original data to be classified according to the original label corresponding to the proposition with the highest correctness probability; the next-generation universal basic model is trained in advance using the training method of the first embodiment.
For details of the training method of the next-generation universal basic model, reference may be made to the description of the first embodiment, which is not repeated here.
The embodiment of the invention also provides a memory, which stores a plurality of instructions for realizing the training method of the next-generation universal basic model according to the first embodiment or the classification method using the next-generation universal basic model according to the third embodiment.
The embodiment of the invention also provides electronic equipment, which comprises a processor and a memory connected with the processor, wherein the memory stores a plurality of instructions which can be loaded and executed by the processor so that the processor can execute the training method of the next-generation general basic model as described in the first embodiment or the classification method using the next-generation general basic model as described in the third embodiment.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention. It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (10)

1. A method for training a next generation generic base model, comprising:
acquiring training data, wherein the training data comprises original data and unified data; the unified data comprises propositions described in natural language and corresponding true/false labels, and the unified data is obtained by converting the original data for the different original labels in a classification problem target space;
alternately iterating language training and teacher training on the next-generation universal basic model to be trained to obtain a trained next-generation universal basic model; wherein,
training the next generation universal basic model by utilizing the original data in the language training stage so that the next generation universal basic model can generate corresponding unified data based on the input original data;
and training the next-generation universal basic model by utilizing the unified data in the teacher training stage so that the next-generation universal basic model can judge the correctness of the proposition.
2. The method for training a next-generation generic base model according to claim 1, wherein converting the original data for the different original labels in the classification problem target space further comprises: under the guidance of a specific template, converting the original data into a proposition described in natural language for each original label in the classification problem target space.
3. The training method of the next-generation generic basic model according to claim 2, wherein the specific templates specify the conversion rules from the original data to the unified data, including the task description of the original data, the context of the original data, and the mapping rules from the original labels to the propositions.
4. The training method of the next-generation generic basic model of claim 1, wherein the original data comprises a natural language processing dataset corresponding to at least one classification task, the at least one classification task comprising topic classification, natural language inference, sentiment analysis, reading comprehension, automatic question answering, and story continuation.
5. The method of claim 4, wherein the natural language processing data set corresponding to the at least one classification task is integrated by way of a template.
6. The training method of the next-generation universal basic model according to claim 1, wherein, in the teacher training stage, the model learns with propositions as input and proposition correctness labels as supervision; a preset character is appended to the end of the proposition, the output of the model at the position corresponding to the preset character is selected as the feature representation of the preset character, the feature representation is mapped to a true/false two-dimensional space through a linear layer, and finally the correctness probability of the proposition is computed through a softmax activation function.
7. A training device for a next generation generic basic model, comprising:
the training data acquisition module is used for acquiring training data, wherein the training data comprises original data and unified data; the unified data comprises propositions described in natural language and corresponding true/false labels, and the unified data is obtained by converting the original data for the different original labels in a classification problem target space;
the model training module is used for performing alternately iterated language training and teacher training on the next-generation universal basic model to be trained, so as to obtain a trained next-generation universal basic model; wherein,
training the next generation universal basic model by utilizing the original data in the language training stage so that the next generation universal basic model can generate corresponding unified data based on the input original data;
and training the next-generation universal basic model by utilizing the unified data in the teacher training stage so that the next-generation universal basic model can judge the correctness of the proposition.
8. A classification method using a next-generation universal base model, comprising:
inputting the original data to be classified into a next-generation universal basic model, and outputting a classification result of the original data to be classified according to the original label corresponding to the proposition with the highest correctness probability;
wherein the next-generation universal base model is trained beforehand using the training method according to any one of claims 1 to 6.
9. A memory, characterized in that a plurality of instructions for realizing the training method of the next-generation general basic model according to any one of claims 1 to 6 or the classification method using the next-generation general basic model according to claim 8 are stored.
10. An electronic device comprising a processor and a memory coupled to the processor, the memory storing a plurality of instructions that are loadable and executable by the processor to enable the processor to perform the method of training the next generation generic basis model as set forth in any one of claims 1-6 or the method of classifying using the next generation generic basis model as set forth in claim 8.
CN202310620027.XA 2023-05-30 2023-05-30 Training method and device for next-generation universal basic model and electronic equipment Pending CN116340779A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310620027.XA CN116340779A (en) 2023-05-30 2023-05-30 Training method and device for next-generation universal basic model and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310620027.XA CN116340779A (en) 2023-05-30 2023-05-30 Training method and device for next-generation universal basic model and electronic equipment

Publications (1)

Publication Number Publication Date
CN116340779A 2023-06-27

Family

ID=86879096

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310620027.XA Pending CN116340779A (en) 2023-05-30 2023-05-30 Training method and device for next-generation universal basic model and electronic equipment

Country Status (1)

Country Link
CN (1) CN116340779A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220067274A1 (en) * 2020-09-02 2022-03-03 Zhejiang Lab Compression method and platform of pre-training language model based on knowledge distillation
CN114528383A (en) * 2021-12-29 2022-05-24 阿里云计算有限公司 Pre-training language model processing method based on comparative learning and intelligent question-answering system
CN114818902A (en) * 2022-04-21 2022-07-29 浪潮云信息技术股份公司 Text classification method and system based on knowledge distillation
CN115526332A (en) * 2022-08-17 2022-12-27 阿里巴巴(中国)有限公司 Student model training method and text classification system based on pre-training language model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XIANG LI et al.: "FreeLM: Fine-Tuning-Free Language Model", http://arxiv.org/pdf/2305.01616.pdf, pages 1-11 *

Similar Documents

Publication Publication Date Title
US11501182B2 (en) Method and apparatus for generating model
CN109992782B (en) Legal document named entity identification method and device and computer equipment
CN113987209B (en) Natural language processing method, device, computing equipment and storage medium based on knowledge-guided prefix fine adjustment
CN110490213B (en) Image recognition method, device and storage medium
CN107273355B (en) Chinese word vector generation method based on word and phrase joint training
CN110807332A (en) Training method of semantic understanding model, semantic processing method, semantic processing device and storage medium
CN111708869B (en) Processing method and device for man-machine conversation
CN110795945A (en) Semantic understanding model training method, semantic understanding device and storage medium
CN110263325A (en) Chinese automatic word-cut
CN111898636B (en) Data processing method and device
CN113987147A (en) Sample processing method and device
CN111782769A (en) Intelligent knowledge graph question-answering method based on relation prediction
CN116820429B (en) Training method and device of code processing model, electronic equipment and storage medium
CN114676255A (en) Text processing method, device, equipment, storage medium and computer program product
CN111858898A (en) Text processing method and device based on artificial intelligence and electronic equipment
CN115034201A (en) Augmenting textual data for sentence classification using weakly supervised multi-reward reinforcement learning
CN114510570A (en) Intention classification method and device based on small sample corpus and computer equipment
CN111597815A (en) Multi-embedded named entity identification method, device, equipment and storage medium
CN111475645B (en) Knowledge point labeling method, knowledge point labeling device and computer readable storage medium
CN116258137A (en) Text error correction method, device, equipment and storage medium
CN111710428A (en) Biomedical text representation method for modeling global and local context interaction
CN115062617A (en) Task processing method, device, equipment and medium based on prompt learning
CN111597816A (en) Self-attention named entity recognition method, device, equipment and storage medium
CN113326367B (en) Task type dialogue method and system based on end-to-end text generation
CN117236335B (en) Two-stage named entity recognition method based on prompt learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230627