CN113961669A - Training method of pre-training language model, storage medium and server - Google Patents

Training method of pre-training language model, storage medium and server

Info

Publication number
CN113961669A
Authority
CN
China
Prior art keywords
text
word
training
language model
words
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111251502.8A
Other languages
Chinese (zh)
Inventor
程德生
王梨
余星
万晶
钱刚
周靖峰
陈志方
刘阳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Soft Hangzhou Anren Network Communication Co ltd
Original Assignee
China Soft Hangzhou Anren Network Communication Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Soft Hangzhou Anren Network Communication Co ltd
Priority to CN202111251502.8A
Publication of CN113961669A
Legal status: Pending (current)

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval of unstructured textual data
    • G06F 16/33 - Querying
    • G06F 16/3331 - Query processing
    • G06F 16/334 - Query execution
    • G06F 16/3344 - Query execution using natural language analysis
    • G06F 16/35 - Clustering; Classification
    • G06F 40/00 - Handling natural language data
    • G06F 40/20 - Natural language analysis
    • G06F 40/279 - Recognition of textual entities
    • G06F 40/289 - Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/30 - Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a training method for a pre-training language model, a storage medium and a server. The training method pre-trains a general-domain language model with a text corpus from a specific scene, so that the resulting domain-specific pre-training language model better captures the information unique to the corpus of that scene. Texts are segmented with a word-segmentation tool so that whole words, rather than single characters, become the units that may or may not be masked; this raises the difficulty of the training task, strengthens the semantic understanding of the language model, and thereby improves the accuracy of the pre-training language model obtained by training. The category label added to each text carries rich semantic information, and adding it helps the pre-training language model grasp the overall meaning of the text. As a result, accuracy and efficiency are improved when the pre-training language model is used for downstream natural language processing tasks.

Description

Training method of pre-training language model, storage medium and server
Technical Field
The invention relates to the field of natural language processing, in particular to a training method, a storage medium and a server for a pre-training language model.
Background
Natural language processing is an important branch of artificial intelligence. Pre-training language models have proven effective in practice at improving many natural language processing tasks, such as natural language inference, question answering and sequence labeling. In the currently successful masked language model (MLM) objective, single-character words in a sentence are masked at random with a probability of 15%, and the masked language model learns to fill in the single-character word at each masked position according to the given targets.
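For illustration, the character-level random masking described above can be sketched roughly as follows; this is a minimal sketch, and the helper name, the [MASK] string and the example sentence are illustrative assumptions, not taken from the patent.

```python
import random

def random_char_mask(chars, mask_prob=0.15, mask_token="[MASK]"):
    """Mask each single character independently with probability 15%;
    the MLM objective then predicts the original character at every
    masked position (None marks positions that are not scored)."""
    masked, targets = [], []
    for ch in chars:
        if random.random() < mask_prob:
            masked.append(mask_token)
            targets.append(ch)
        else:
            masked.append(ch)
            targets.append(None)
    return masked, targets

masked, targets = random_char_mask(list("自然语言处理很重要"))
```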
This training scheme is relatively simple: it directly predicts the single-character word at each masked position, which is a much easier task than predicting whole words, so the training task offers limited challenge. Moreover, the original masked language model makes no use of labeled corpus data, even though labeled corpus data can be obtained in some scenarios.
Disclosure of Invention
The invention provides a training method for a pre-training language model, a storage medium and a server, which improve the accuracy of the pre-training language model obtained by training and thereby the accuracy and efficiency of downstream natural language processing tasks handled with the pre-training language model.
In a first aspect, the present invention provides a training method for pre-training a language model, the training method comprising:
acquiring a text corpus of a specific scene, wherein the text corpus comprises a plurality of texts;
labeling each text with a category label;
segmenting each text with a word segmentation tool to obtain a segmented text of each text;
inputting the segmented texts into a Word2vec model for training to obtain a lexicon containing the word-vector information of each word;
adding a start marker before and a first end marker after each segmented text;
appending the category label of each segmented text after its first end marker, and adding a second end marker after the category label, to obtain a labeled text of each text;
for each labeled text, randomly selecting words for masking according to a set probability value, and extracting a similar word of each masked word from the lexicon through the Word2vec model for similar-word replacement, to obtain a masked replacement text of each text;
converting each labeled text and its masked replacement text into numeric IDs;
and inputting the numeric IDs and the category label of each text into a pre-training language model for supervised training, to obtain a pre-training language model containing label information.
In the above scheme, a general-domain language model is pre-trained with the text corpus of a specific scene, so that the resulting domain-specific pre-training language model better captures the information unique to the text corpus of that scene. Because the texts are segmented with a word-segmentation tool, whole words rather than single characters become the units that may or may not be masked, which raises the difficulty of training the language model, strengthens its semantic understanding, and thereby improves the accuracy of the pre-training language model obtained by training. In addition, the category label added to each text carries rich semantic information, and adding it helps the pre-training language model grasp the overall meaning of the text. Accuracy and efficiency are therefore improved when the pre-training language model is used for downstream natural language processing tasks.
In a specific embodiment, the word segmentation tool is a Jieba word segmentation tool or a Hanlp word segmentation tool.
In a specific embodiment, inputting the segmented texts into a Word2vec model for training to obtain a lexicon containing the word-vector information of each word comprises: predicting the center word from its surrounding words based on the Word2vec model to obtain the word-vector information of each word, which improves the accuracy of the obtained word-vector information.
In a specific embodiment, the start marker is [cls], and the first end marker and the second end marker are both [sep].
In a specific embodiment, appending the category label of each segmented text after its first end marker comprises: defining the n category labels as the reserved tokens [unused1], [unused2], [unused3], ..., [unusedn], respectively; and splicing the [unused] token corresponding to the category label of each text after the first end marker of its segmented text, so that the category-label information is conveniently integrated and the accuracy of the pre-training language model is improved.
In a specific embodiment, for each labeled text, randomly selecting words for masking according to a set probability value, and extracting a similar word of each masked word from the lexicon through the Word2vec model for similar-word replacement to obtain a masked replacement text of each text, comprises: masking m words of each labeled text as continuous spans in the manner of an N-gram model, wherein m = the set probability value × the total number of words in the segmented text, rounded to an integer; skipping the current word when the current word is the start marker, the first end marker or the second end marker; and, when the current word is to be masked, replacing it with [mask] with probability P1, keeping it unchanged with probability P2, and with probability (1 - P1 - P2) extracting a similar word of the current word from the lexicon through the Word2vec model and substituting it, to obtain the masked replacement text of each text, wherein the similar word has the same length as the current word. This improves the semantic understanding of the language model and thus the accuracy of the pre-training language model obtained by training.
In a specific embodiment, the set probability value is 15%, P1 = 80% and P2 = 10%, so as to improve the accuracy of the pre-training language model obtained by the final training.
In a specific embodiment, converting each labeled text and its masked replacement text into numeric IDs comprises: cutting each text into sub-words according to BPE and converting it into numeric IDs according to the vocab.txt file; and cutting the masked replacement text of each text into sub-words according to BPE and converting it into numeric IDs according to the vocab.txt file.
In a second aspect, the present invention also provides a storage medium having a computer program stored therein, which, when run on a computer, causes the computer to perform any of the training methods described above.
In a third aspect, the present invention further provides a server, which includes a processor and a memory, wherein the memory stores a computer program, and the processor is configured to execute any one of the training methods by calling the computer program stored in the memory.
Drawings
FIG. 1 is a flowchart of a training method for pre-training a language model according to an embodiment of the present invention;
FIG. 2 is a flowchart of another training method for pre-training a language model according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to facilitate understanding of the training method of the pre-training language model provided by the embodiment of the present invention, an application scenario of the pre-training language model provided by the embodiment of the present invention is described below. The training method is described in detail below with reference to the accompanying drawings.
Referring to fig. 1, a training method for a pre-training language model provided by an embodiment of the present invention includes:
S10: acquiring a text corpus of a specific scene, wherein the text corpus comprises a plurality of texts;
S20: labeling each text with a category label;
S30: segmenting each text with a word segmentation tool to obtain a segmented text of each text;
S40: inputting the segmented texts into a Word2vec model for training to obtain a lexicon containing the word-vector information of each word;
S50: adding a start marker before and a first end marker after each segmented text;
S60: appending the category label of each segmented text after its first end marker, and adding a second end marker after the category label, to obtain a labeled text of each text;
S70: for each labeled text, randomly selecting words for masking according to a set probability value, and extracting a similar word of each masked word from the lexicon through the Word2vec model for similar-word replacement, to obtain a masked replacement text of each text;
S80: converting each labeled text and its masked replacement text into numeric IDs;
S90: inputting the numeric IDs and the category label of each text into a pre-training language model for supervised training, to obtain a pre-training language model containing label information.
In the above scheme, a general-domain language model is pre-trained with the text corpus of a specific scene, so that the resulting domain-specific pre-training language model better captures the information unique to the text corpus of that scene. Because the texts are segmented with a word-segmentation tool, whole words rather than single characters become the units that may or may not be masked, which raises the difficulty of training the language model, strengthens its semantic understanding, and thereby improves the accuracy of the pre-training language model obtained by training. In addition, the category label added to each text carries rich semantic information, and adding it helps the pre-training language model grasp the overall meaning of the text. Accuracy and efficiency are therefore improved when the pre-training language model is used for downstream natural language processing tasks. Each of the above steps is described in detail below with reference to the accompanying drawings.
First, referring to FIG. 1 and FIG. 2, text corpus data of a specific scene is obtained. The text corpus includes a plurality of texts; specifically, the number of texts in the corpus may be 50, 100, 200, and so on. The specific scene may be a professional scene such as sports, finance, military affairs, entertainment, history or taxation, and the text corpus of such a scene accordingly consists of texts from that professional field.
Next, as shown in FIG. 1 and FIG. 2, each text is labeled with a category label. In practice, each text in the specific scene can be labeled manually. The label assigned to a text is the category it belongs to, and the texts in the corpus cover at least two categories. Note that these categories are finer-grained than the scene category itself. For example, for a specific scene in the sports field, the texts in the corpus may be categorized into basketball, football, badminton, and so on.
Next, with continued reference to FIG. 1 and FIG. 2, a word-segmentation tool is used to segment each text and obtain its segmented text. Specifically, an open-source word-segmentation tool such as Jieba or Hanlp can be used to segment each text and obtain the segmented text of each text.
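A rough sketch of this step is given below, assuming the open-source Jieba tokenizer; the example sentence is made up for illustration.

```python
import jieba  # open-source Chinese word-segmentation tool (pip install jieba)

texts = ["湖人队在昨晚的篮球比赛中险胜勇士队"]          # illustrative sports-scene text
segmented_texts = [jieba.lcut(t) for t in texts]         # list of whole-word tokens per text
# e.g. ['湖人队', '在', '昨晚', '的', '篮球', '比赛', '中', '险胜', '勇士队']
```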
Next, as shown in FIG. 1 and FIG. 2, the segmented texts are input into the Word2vec model for training to obtain a lexicon containing the word-vector information of each word. That is, after each text has been segmented it is fed into the Word2vec model for training, yielding a lexicon that contains the word vector of every word and serves as the similar-word lexicon for the subsequent similar-word replacement.
In addition, when the segmented texts are input into the Word2vec model for training to obtain the lexicon containing the word-vector information of each word, the center word can be predicted from its surrounding words based on the Word2vec model, which improves the accuracy of the obtained word-vector information.
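A minimal sketch of this training step with gensim's Word2vec implementation follows; sg=0 selects the CBOW scheme in which the center word is predicted from its surrounding words, and all hyper-parameter values are assumptions, not taken from the patent.

```python
from gensim.models import Word2Vec

# segmented_texts comes from the word-segmentation step above
w2v_model = Word2Vec(
    sentences=segmented_texts,
    vector_size=128,   # dimensionality of the word vectors (assumed)
    window=5,          # number of surrounding words used as context
    min_count=1,       # keep every word in this small illustration
    sg=0,              # CBOW: predict the center word from its context
)
w2v_model.save("similar_word_lexicon.model")  # the lexicon used later for similar-word replacement
```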
Next, as shown in FIG. 1 and FIG. 2, a start marker and a first end marker are added before and after each segmented text, respectively. The start marker may be [cls] and the first end marker may be [sep]; after they are added, the text has the form: [cls] segmented text [sep].
Next, with continued reference to FIG. 1 and FIG. 2, the category label of each segmented text is appended after its first end marker, and a second end marker is added after the category label, yielding the labeled text of each text. The second end marker may also be [sep]. After the category label is added, the labeled text of each text has the form: [cls] segmented text [sep] category label [sep].
In addition, when appending the category label of each segmented text after its first end marker, the n category labels can be mapped in turn to the reserved tokens [unused1], [unused2], [unused3], ..., [unusedn], where n is any integer greater than 1. The [unused] token corresponding to the category label of each text is then spliced after the first end marker of its segmented text. The final labeled text of each text is therefore: [cls] segmented text [sep] [unusedi] [sep], where [unusedi] is the [unused] token corresponding to the category label of that text. This makes it convenient to integrate the category-label information and improves the accuracy of the pre-training language model.
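A hedged sketch of assembling the labeled text is shown below; the label-to-[unusedN] mapping and the helper name are illustrative assumptions.

```python
# hypothetical mapping from category labels to reserved [unusedN] tokens
label_to_unused = {"basketball": "[unused1]", "football": "[unused2]", "badminton": "[unused3]"}

def build_labeled_text(segmented_words, category_label):
    # [cls] segmented text [sep] [unusedi] [sep]
    return ["[cls]"] + list(segmented_words) + ["[sep]", label_to_unused[category_label], "[sep]"]

labeled_tokens = build_labeled_text(segmented_texts[0], "basketball")
```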
Next, for each labeled text, words are randomly selected for masking according to a set probability value, and a similar word of each masked word is extracted from the lexicon through the Word2vec model for similar-word replacement, yielding the masked replacement text of each text. The set probability value may be about 15%, which improves the accuracy of the pre-training language model obtained by the final training.
Specifically, when, for each labeled text, words are randomly selected for masking according to the set probability value and a similar word of each masked word is extracted from the lexicon through the Word2vec model for similar-word replacement to obtain the masked replacement text, m words of each labeled text are masked as continuous spans in the manner of an N-gram model. Here m = the set probability value × the total number of words in the corresponding segmented text; that is, the number m of words to be masked in each labeled text is proportional to the total number of words in its segmented text, with the set probability value as the proportion, and the result may be rounded either up or down.
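The count m and the span selection might be sketched as follows; rounding down is used here, the span lengths are drawn at random, and the helper name is an assumption, so this is one plausible reading rather than the patent's exact procedure.

```python
import math
import random

def choose_mask_positions(segmented_words, labeled_tokens, prob=0.15, max_ngram=3):
    """Pick m = floor(prob * number of words in the segmented text) positions,
    taken as short continuous spans (N-grams) over the labeled token sequence."""
    m = math.floor(prob * len(segmented_words))
    positions = set()
    while len(positions) < m:
        n = random.randint(1, max_ngram)                    # span length
        start = random.randint(0, len(labeled_tokens) - 1)  # span start
        for pos in range(start, min(start + n, len(labeled_tokens))):
            if len(positions) < m:
                positions.add(pos)
    return sorted(positions)

positions_to_mask = choose_mask_positions(segmented_texts[0], labeled_tokens)
```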
When the current word is the start marker, the first end marker or the second end marker, the current word needs to be skipped to prevent the markers from being masked.
In addition, when the current word is to be masked, it is replaced with [mask] with probability P1, kept unchanged with probability P2, and with probability (1 - P1 - P2) a similar word of the current word is extracted from the lexicon through the Word2vec model and substituted, yielding the masked replacement text of each text; the similar word has the same length as the current word. This improves the semantic understanding of the language model and thus the accuracy of the pre-training language model obtained by training. P1 = 80% and P2 = 10% may be adopted; that is, when the current word is to be masked, it is replaced with [mask] with 80% probability, kept unchanged with 10% probability, and with the remaining 10% probability replaced by a similar word extracted from the lexicon through the Word2vec model, which improves the accuracy of the pre-training language model obtained by the final training.
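A hedged sketch of this 80/10/10 rule is given below; `similar_word_same_length` is a hypothetical helper sketched after the next paragraph, and the marker strings follow the lowercase form used in this description.

```python
import random

SPECIAL_MARKERS = {"[cls]", "[sep]"}

def mask_whole_words(labeled_tokens, positions_to_mask, w2v_model, p1=0.80, p2=0.10):
    """Apply the masking rule: 80% -> [mask], 10% -> unchanged,
    remaining 10% -> similar-word replacement; markers are never masked."""
    masked = list(labeled_tokens)
    targets = {}                              # position -> original word to predict
    for pos in positions_to_mask:
        word = labeled_tokens[pos]
        if word in SPECIAL_MARKERS:           # skip [cls]/[sep]; the [unusedN] label could be skipped too
            continue
        targets[pos] = word
        r = random.random()
        if r < p1:
            masked[pos] = "[mask]"
        elif r < p1 + p2:
            pass                              # keep the current word unchanged
        else:
            masked[pos] = similar_word_same_length(word, w2v_model)
    return masked, targets
```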
In addition, when a similar word of the current word is extracted from the lexicon through the Word2vec model for replacement, the word vector of the current word can be computed with the Word2vec model, and the word in the lexicon with the highest similarity to the current word and the same length is selected as its similar word for the replacement, which improves the accuracy of the pre-training language model obtained by the final training.
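The same-length similar-word lookup could be sketched as below using gensim's most_similar; this completes the masking sketch above and is one plausible reading, not the patent's exact code.

```python
def similar_word_same_length(word, w2v_model, topn=20):
    """Return the most similar word of the same character length from the
    Word2vec lexicon, falling back to the word itself if none qualifies."""
    if word not in w2v_model.wv:
        return word
    for candidate, _score in w2v_model.wv.most_similar(word, topn=topn):
        if len(candidate) == len(word):
            return candidate
    return word

masked_tokens, mask_targets = mask_whole_words(labeled_tokens, positions_to_mask, w2v_model)
```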
Next, as shown in FIG. 1 and FIG. 2, each labeled text and its masked replacement text are converted into numeric IDs. In some embodiments, each text is cut into sub-words according to BPE to obtain a character-level token sequence and converted into numeric IDs according to the vocab.txt file, and the masked replacement text of each text is cut according to BPE and converted into numeric IDs according to vocab.txt in the same way. The vocab.txt file is the vocabulary file of a general Chinese pre-training model and can be obtained by download. BPE stands for Byte Pair Encoding, a simple sub-word segmentation algorithm.
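A minimal sketch of the ID-conversion step follows, assuming a downloaded vocab.txt file (one token per line, line number = ID) from a general Chinese pre-training model; the full BPE sub-word cutting is omitted and tokens are looked up directly, falling back to [UNK].

```python
def load_vocab(path="vocab.txt"):
    with open(path, encoding="utf-8") as f:
        return {line.rstrip("\n"): idx for idx, line in enumerate(f)}

vocab = load_vocab()

def to_ids(tokens, unk_token="[UNK]"):
    # a fuller implementation would first cut each token into BPE sub-word pieces
    return [vocab.get(tok, vocab.get(unk_token, 0)) for tok in tokens]

input_ids = to_ids(masked_tokens)     # numeric IDs of the masked replacement text
target_ids = to_ids(labeled_tokens)   # numeric IDs of the original labeled text
```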
Finally, the numeric IDs and the category label of each text are input into the pre-training language model for supervised training, yielding a pre-training language model that contains label information, that is, an N-gram, whole-word-masking pre-training language model based on label information.
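As one possible realization of this supervised pre-training step, the sketch below uses Hugging Face's BertForMaskedLM as the backbone (an assumption, not the patent's model) and scores only the positions that were altered by masking, which is a simplification of the usual MLM loss.

```python
import torch
from transformers import BertForMaskedLM

model = BertForMaskedLM.from_pretrained("bert-base-chinese")  # assumed general Chinese backbone

inputs = torch.tensor([input_ids])    # masked replacement text
labels = torch.tensor([target_ids])   # original labeled text (includes the [unusedN] label token)
labels[inputs == labels] = -100       # ignore unchanged positions in the loss

loss = model(input_ids=inputs, labels=labels).loss
loss.backward()                       # an optimizer step (e.g. AdamW) would follow in a real loop
```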
Compared with the prior art, a general-domain language model is pre-trained with the text corpus of a specific scene, so that the resulting domain-specific pre-training language model better captures the information unique to the text corpus of that scene. Because the texts are segmented with a word-segmentation tool, whole words rather than single characters become the units that may or may not be masked, which raises the difficulty of training the language model, strengthens its semantic understanding, and thereby improves the accuracy of the pre-training language model obtained by training. In addition, the category label added to each text carries rich semantic information; adding it improves the generalization ability of the language model and helps the pre-training language model grasp the overall meaning of the text. Accuracy and efficiency are therefore improved when the pre-training language model is used for downstream natural language processing tasks. The label-aware N-gram, whole-word-masking pre-training language model increases the difficulty of training and improves the semantic understanding of the pre-training language model.
Furthermore, an embodiment of the present invention provides a storage medium in which a computer program is stored; when the computer program runs on a computer, the computer is caused to execute any one of the training methods described above. For the resulting effects, refer to the description above; they are not repeated here.
In addition, an embodiment of the present invention provides a server that includes a processor and a memory, wherein the memory stores a computer program and the processor executes any one of the training methods above by calling the computer program stored in the memory. For the resulting effects, refer to the description above; they are not repeated here.
The above description is only for the specific embodiment of the present invention, but the scope of the present invention is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present invention are included in the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (10)

1. A training method for pre-training a language model, comprising:
acquiring a text corpus of a specific scene, wherein the text corpus comprises a plurality of texts;
labeling each text with a category label;
segmenting each text with a word segmentation tool to obtain a segmented text of each text;
inputting the segmented texts into a Word2vec model for training to obtain a lexicon containing the word-vector information of each word;
adding a start marker before and a first end marker after each segmented text;
appending the category label of each segmented text after its first end marker, and adding a second end marker after the category label, to obtain a labeled text of each text;
for each labeled text, randomly selecting words for masking according to a set probability value, and extracting a similar word of each masked word from the lexicon through the Word2vec model for similar-word replacement, to obtain a masked replacement text of each text;
converting each labeled text and its masked replacement text into numeric IDs;
and inputting the numeric IDs and the category label of each text into a pre-training language model for supervised training, to obtain a pre-training language model containing label information.
2. The training method of claim 1, wherein the word segmentation tool is a Jieba word segmentation tool or a Hanlp word segmentation tool.
3. The training method of claim 1, wherein inputting the segmented texts into a Word2vec model for training to obtain a lexicon containing the word-vector information of each word comprises:
predicting the center word from its surrounding words based on the Word2vec model to obtain the word-vector information of each word.
4. The training method of claim 1, wherein the start marker is [cls] and the first end marker and the second end marker are both [sep].
5. The training method of claim 1, wherein said appending the category label of each segmented text after its first end marker comprises:
defining the n category labels as the reserved tokens [unused1], [unused2], [unused3], ..., [unusedn], respectively;
and splicing the [unused] token corresponding to the category label of each text after the first end marker of its segmented text.
6. The training method of claim 1, wherein, for each labeled text, randomly selecting words for masking according to a set probability value, and extracting a similar word of each masked word from the lexicon through the Word2vec model for similar-word replacement, to obtain a masked replacement text of each text, comprises:
masking m words of each labeled text as continuous spans in the manner of an N-gram model, wherein m = the set probability value × the total number of words in the segmented text;
skipping a current word when the current word is the start marker, the first end marker, or the second end marker;
when the current word is to be masked, replacing it with [mask] with probability P1, keeping it unchanged with probability P2, and with probability (1 - P1 - P2) extracting a similar word of the current word from the lexicon through the Word2vec model and substituting it, to obtain the masked replacement text of each text; wherein the similar word has the same length as the current word.
7. The training method of claim 6, wherein the set probability value is 15%, P1 = 80%, and P2 = 10%.
8. The training method of claim 1, wherein converting each labeled text and its masked replacement text into numeric IDs comprises:
cutting each text into sub-words according to BPE and converting it into the numeric IDs according to the vocab.txt file;
and cutting the masked replacement text of each text into sub-words according to BPE and converting it into the numeric IDs according to the vocab.txt file.
9. A storage medium having stored thereon a computer program which, when run on a computer, causes the computer to perform the training method of any one of claims 1 to 8.
10. A server, characterized by comprising a processor and a memory, wherein the memory stores a computer program, and the processor executes the training method according to any one of claims 1 to 8 by calling the computer program stored in the memory.
CN202111251502.8A 2021-10-26 2021-10-26 Training method of pre-training language model, storage medium and server Pending CN113961669A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111251502.8A CN113961669A (en) 2021-10-26 2021-10-26 Training method of pre-training language model, storage medium and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111251502.8A CN113961669A (en) 2021-10-26 2021-10-26 Training method of pre-training language model, storage medium and server

Publications (1)

Publication Number Publication Date
CN113961669A true CN113961669A (en) 2022-01-21

Family

ID=79467298

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111251502.8A Pending CN113961669A (en) 2021-10-26 2021-10-26 Training method of pre-training language model, storage medium and server

Country Status (1)

Country Link
CN (1) CN113961669A (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110489555A (en) * 2019-08-21 2019-11-22 创新工场(广州)人工智能研究有限公司 A kind of language model pre-training method of combination class word information
CN111444721A (en) * 2020-05-27 2020-07-24 南京大学 Chinese text key information extraction method based on pre-training language model
CN111950281A (en) * 2020-07-02 2020-11-17 中国科学院软件研究所 Demand entity co-reference detection method and device based on deep learning and context semantics
CN112560486A (en) * 2020-11-25 2021-03-26 国网江苏省电力有限公司电力科学研究院 Power entity identification method based on multilayer neural network, storage medium and equipment
CN112257421A (en) * 2020-12-21 2021-01-22 完美世界(北京)软件科技发展有限公司 Nested entity data identification method and device and electronic equipment
CN112612892A (en) * 2020-12-29 2021-04-06 达而观数据(成都)有限公司 Special field corpus model construction method, computer equipment and storage medium
CN112507628A (en) * 2021-02-03 2021-03-16 北京淇瑀信息科技有限公司 Risk prediction method and device based on deep bidirectional language model and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YIMING CUI et al.: "Revisiting Pre-trained Models for Chinese Natural Language Processing" *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114626371A (en) * 2022-03-22 2022-06-14 鼎富智能科技有限公司 Training method and device for pre-training language model
CN114626371B (en) * 2022-03-22 2024-05-10 鼎富智能科技有限公司 Training method and device for pre-training language model
CN117709355A (en) * 2024-02-05 2024-03-15 四川蜀天信息技术有限公司 Method, device and medium for improving training effect of large language model
CN117709355B (en) * 2024-02-05 2024-05-17 四川蜀天信息技术有限公司 Method, device and medium for improving training effect of large language model

Similar Documents

Publication Publication Date Title
CN107526967B (en) Risk address identification method and device and electronic equipment
CN111291566B (en) Event main body recognition method, device and storage medium
US11699275B2 (en) Method and system for visio-linguistic understanding using contextual language model reasoners
Xin et al. Learning better internal structure of words for sequence labeling
CN110750993A (en) Word segmentation method, word segmentation device, named entity identification method and system
CN111159412B (en) Classification method, classification device, electronic equipment and readable storage medium
CN113961669A (en) Training method of pre-training language model, storage medium and server
CN113821605B (en) Event extraction method
CN110555136A (en) Video tag generation method and device and computer storage medium
CN112232024A (en) Dependency syntax analysis model training method and device based on multi-labeled data
CN111723569A (en) Event extraction method and device and computer readable storage medium
CN111339250A (en) Mining method of new category label, electronic equipment and computer readable medium
CN113836925B (en) Training method and device for pre-training language model, electronic equipment and storage medium
CN110968725B (en) Image content description information generation method, electronic device and storage medium
CN112257452A (en) Emotion recognition model training method, device, equipment and storage medium
CN112287100A (en) Text recognition method, spelling error correction method and voice recognition method
CN112163560A (en) Video information processing method and device, electronic equipment and storage medium
CN112270184A (en) Natural language processing method, device and storage medium
CN114817633A (en) Video classification method, device, equipment and storage medium
CN113887206B (en) Model training and keyword extraction method and device
CN108491381A (en) A kind of syntactic analysis method of Chinese bipartite structure
CN114860942A (en) Text intention classification method, device, equipment and storage medium
CN115858773A (en) Keyword mining method, device and medium suitable for long document
CN112084788A (en) Automatic marking method and system for implicit emotional tendency of image captions
CN114610878A (en) Model training method, computer device and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220121