CN113326700B - ALBert-based complex heavy equipment entity extraction method - Google Patents
- Publication number: CN113326700B (application CN202110217185.1A)
- Authority
- CN
- China
- Prior art keywords
- albert
- model
- entity
- heavy equipment
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
Abstract
The invention discloses an ALBert-based complex heavy equipment entity extraction method, implemented according to the following steps: step 1, collecting texts in the field of complex heavy equipment and constructing a corpus; step 2, pre-training the ALBert model with the corpus obtained in step 1 to obtain a pre-trained word representation model; step 3, marking entity names in the corpus obtained in step 1 and adjusting the text format to the format read by the algorithm to obtain a training set and a verification set; step 4, training a model by feeding the labeled data into the ALBert-BGRU-Attention-CRF algorithm to obtain a trained model; step 5, creating a dictionary Dict; and step 6, inputting the text to be extracted into the model obtained in step 4 and combining it with the dictionary Dict constructed in step 5 to obtain the entity extraction result. The invention can complete the entity extraction task in the field of complex heavy equipment.
Description
Technical Field
The invention belongs to the technical field of knowledge graphs, and particularly relates to an ALBert-based complex heavy equipment entity extraction method.
Background
Complex heavy equipment is among the most important basic equipment in the manufacturing industry, an important guarantee for social and economic development and the national defense industry, and, as national heavy equipment, particularly significant. As high-end equipment, it is widely applied in important industries and fields such as energy, transportation, shipbuilding, engineering machinery, metallurgy, aerospace, and the military industry. Heavy equipment has a long development cycle and complex stages, including early investigation, design, manufacture, purchasing, matching, installation, debugging, delivery, quality control, after-sales service, etc. A large amount of knowledge is generated in these processes, most of it stored in text form.
With the development of new internet technologies, effective knowledge management and knowledge reuse in the equipment manufacturing industry can better assist the whole process of design, production, operation and maintenance. The knowledge graph is an efficient mode of knowledge organization and management; entity extraction is one of the important links in knowledge graph construction, and its accuracy determines the accuracy of the knowledge graph to a certain extent. Entity extraction for complex heavy equipment text lays a foundation for subsequent knowledge graph construction, effective knowledge management, and knowledge reuse.
Disclosure of Invention
The invention aims to provide an ALBert-based entity extraction method for complex heavy equipment, which can complete the entity extraction task in the field of complex heavy equipment.
The technical scheme adopted by the invention is an ALBert-based complex heavy equipment entity extraction method, implemented according to the following steps:
Step 1, collecting texts in the field of complex heavy equipment, and constructing a corpus;
Step 2, pre-training the ALBert model using the corpus obtained in step 1 to obtain a pre-trained word representation model ALBert;
Step 3, marking entity names in the corpus obtained in step 1, and adjusting the text format to the format read by the algorithm to obtain a training set and a verification set;
Step 4, training a model, namely feeding the labeled data into the ALBert-BGRU-Attention-CRF algorithm to obtain a trained model;
Step 5, creating a dictionary Dict;
Step 6, inputting the text to be extracted into the model obtained in step 4, and combining it with the dictionary Dict constructed in step 5 to obtain an entity extraction result.
The present invention is also characterized in that,
In step 1, the web crawler framework Scrapy is used to capture relevant complex heavy equipment information from web pages and store it as text files; the stored text is then combined with existing, manually collected documents from the complex heavy equipment field to serve as a data source. The data source is processed to eliminate special symbols, formulas, and measurement units; the processed data serves as the corpus and is stored as text files.
In step 2, the ALBert model takes single Chinese characters as input, with a start identifier [CLS] added before the first character of each sentence and an end identifier [SEP] added at the end of each sentence; for each input character, ALBert outputs a representation vector fused with the text's semantic information. On the basis of the ALBert pre-trained model, the subsequent connection-layer parameters are fine-tuned on the corpus in the data source, while ALBert's internal pre-trained parameters do not participate in training, yielding a fine-tuned ALBert model.
In step 3, entity labeling is completed manually using the BIO labeling scheme: the first character of an entity is marked with a B-Type label, non-first characters of an entity with an I-Type label, and non-entity characters and punctuation marks with an O label, where Type represents the entity category.
The training of the model in step 4 is specifically as follows:
Step 4.1, inputting the training set and the verification set obtained in step 3 into the ALBert model fine-tuned in step 2 to generate word vectors;
Step 4.2, inputting the word vectors generated in step 4.1 into a bidirectional gated recurrent unit (BGRU) to obtain each character's score on every label;
Step 4.3, weighting the result of step 4.2 with an Attention mechanism to obtain each character's weighted score on every label;
Step 4.4, constraining the label sequence with a conditional random field (CRF) to reduce the probability of invalid sequences;
Step 4.5, obtaining the trained entity extraction model.
Step 5 is specifically as follows:
Relevant names, including but not limited to part, combination, and product names, are extracted from the complex heavy equipment detailed information table to form the dictionary Dict.
Step 6 is specifically as follows:
Step 6.1, for large volumes of text to be extracted, all texts are imported into the entity extraction model trained in step 4 to obtain a primary recognition result; on this basis, the dictionary Dict constructed in step 5 is applied for secondary extraction to obtain the final entity extraction result;
Step 6.2, for entity extraction from individual sentences, online recognition is used: the sentence to be extracted is pasted into the online recognition window, the model obtained in step 4 is invoked, and the extraction result is given in combination with the dictionary Dict.
The beneficial effects of the method are as follows. The ALBert-based complex heavy equipment entity extraction method labels text from existing documents and web pages in the field as a corpus, realizes word embedding with the fine-tuned ALBert, and trains an entity extraction model with the deep learning algorithm BGRU-Attention-CRF; to improve extraction accuracy while accounting for the special terminology of the complex heavy equipment industry, a domain dictionary is added. When a new corpus is input, the trained model identifies the entities in it and, combined with the dictionary, gives the final entity extraction result.
Drawings
FIG. 1 is the general flow chart of the ALBert-based complex heavy equipment entity extraction method of the present invention;
FIG. 2 is a flow chart of the deep learning algorithm ALBert-BGRU-Attention-CRF used to establish the entity extraction model in the ALBert-based complex heavy equipment entity extraction method.
Detailed Description
The invention will be described in detail below with reference to the drawings and the detailed description.
In the ALBert-based complex heavy equipment entity extraction method, whose flow chart is shown in FIG. 1, an entity extraction model is trained with the deep-learning-based ALBert-BGRU-Attention-CRF algorithm on top of data collection and processing; the corpus to be extracted undergoes primary entity extraction, which is combined with a dictionary (Dict) to obtain the final extraction result. The method is implemented according to the following steps:
Step 1, collecting texts in the field of complex heavy equipment, and constructing a corpus;
In step 1, the web crawler framework Scrapy is used to capture relevant complex heavy equipment information from web pages and store it as text files; the stored text is then combined with existing, manually collected documents from the complex heavy equipment field to serve as a data source. The data source is processed to eliminate special symbols, formulas, and measurement units; the processed data serves as the corpus and is stored as text files.
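The cleaning pass in step 1 can be sketched with a few regular expressions. This is a minimal illustration, not the patent's implementation: the function name, the symbol set, and the list of measurement units are all assumptions for demonstration.

```python
import re

# Hypothetical unit list; the patent does not enumerate which units it removes.
MEASUREMENT_UNITS = r"(?:mm|cm|MPa|kN|t/h|r/min)"

def clean_corpus_line(line: str) -> str:
    """Remove special symbols, inline formulas, and measurement units from one line."""
    line = re.sub(r"\$[^$]*\$", "", line)                             # drop inline formulas like $F = ma$
    line = re.sub(r"\d+(?:\.\d+)?\s*" + MEASUREMENT_UNITS, "", line)  # drop readings like "200 MPa"
    line = re.sub(r"[#*@^~|\\]", "", line)                            # drop special symbols
    return re.sub(r"\s+", " ", line).strip()                          # normalize whitespace

print(clean_corpus_line("The press applies $F = ma$ up to 200 MPa of force #test"))
```

Each cleaned line would then be appended to the corpus text file described above.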
Step 2, pre-training the ALBert model using the corpus obtained in step 1 to obtain a pre-trained word representation model ALBert;
In step 2, the ALBert model takes single Chinese characters as input, with a start identifier [CLS] added before the first character of each sentence and an end identifier [SEP] added at the end of each sentence; for each input character, ALBert outputs a representation vector fused with the text's semantic information. On the basis of the ALBert pre-trained model, the subsequent connection-layer parameters are fine-tuned on the corpus in the data source, while ALBert's internal pre-trained parameters do not participate in training, yielding a fine-tuned ALBert model.
Step 3, marking entity names in the corpus obtained in step 1, and adjusting the text format to the format read by the algorithm to obtain a training set and a verification set;
In step 3, entity labeling is completed manually using the BIO labeling scheme: the first character of an entity is marked with a B-Type label, non-first characters of an entity with an I-Type label, and non-entity characters and punctuation marks with an O label, where Type represents the entity category.
For step 3, a web page system for manual annotation and automatic format adjustment was developed. Entity labeling is completed manually using this data labeling web page.
The pseudocode of the entity labeling and text format adjustment algorithm is as follows:
Input: text data to be labeled;
Output: labeled data with tags;
1. Text preprocessing:
1.1. Remove line feeds and blank spaces in the text, and display the formatted text;
1.2. Create a tag array and initialize the label of every character in the text to O;
2. Label entities:
2.1. Click a label type and select the entity corresponding to it; the labels of the selected entity's characters are set to that label type;
2.2. If full-text labeling is enabled, search the full text and set the labels of all entities with the same name to the selected label type;
3. Generate annotation data in the standard format: output the text character by character, appending to each character its corresponding label and a line-feed character;
Return the format-standardized, labeled data.
The labeling uses the BIO scheme: the first character of an entity is marked with a B-Type label, non-first characters of an entity with an I-Type label, and non-entity characters and punctuation marks with an O label, where Type represents the entity category.
For example, given the corpus sentence "金属挤压机是实现金属挤压加工最主要的设备。" ("The metal extrusion press is the most important equipment for realizing metal extrusion processing."), the entities are labeled character by character: 金 B-Product 属 I-Product 挤 I-Product 压 I-Product 机 I-Product 是 O 实 O 现 O 金 B-Way 属 I-Way 挤 I-Way 压 I-Way 加 I-Way 工 I-Way 最 O 主 O 要 O 的 O 设 O 备 O 。 O
Non-entity characters are marked "O"; "B-Product" marks the first character of a "Product" entity, "I-Product" marks non-first characters of a "Product" entity, "B-Way" marks the first character of a "processing mode" entity, and "I-Way" marks non-first characters of a "processing mode" entity.
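The labeling scheme above can be sketched as a small function that, given a sentence and its known entity spans, emits one (character, label) pair per character, matching the pseudocode's "initialize every label to O, then overwrite entity spans" flow. The function and its span-based input format are assumptions for illustration; the shortened example sentence follows the metal-extrusion example above.

```python
def bio_label(sentence, entities):
    """entities: list of (start, end, type) character spans, end exclusive."""
    labels = ["O"] * len(sentence)            # step 1.2: initialize every label to O
    for start, end, etype in entities:
        labels[start] = f"B-{etype}"          # first character of the entity
        for i in range(start + 1, end):
            labels[i] = f"I-{etype}"          # non-first characters of the entity
    return list(zip(sentence, labels))

# "金属挤压机" (metal extrusion press) labeled as a Product entity:
pairs = bio_label("金属挤压机是设备", [(0, 5, "Product")])
print(pairs)
```

Writing each pair on its own line, character then label then a line feed, produces exactly the standard annotation format described in step 3 of the pseudocode.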
Step 4, training a model, namely feeding the labeled data into the ALBert-BGRU-Attention-CRF algorithm to obtain a trained model; the flow chart is shown in FIG. 2.
The training of the model in step 4 is specifically as follows:
Step 4.1, inputting the training set and the verification set obtained in step 3 into the ALBert model fine-tuned in step 2 to generate word vectors;
Step 4.2, inputting the word vectors generated in step 4.1 into a bidirectional gated recurrent unit (BGRU) to obtain each character's score on every label;
Step 4.3, weighting the result of step 4.2 with an Attention mechanism to obtain each character's weighted score on every label;
Step 4.4, constraining the label sequence with a conditional random field (CRF) to reduce the probability of invalid sequences;
Step 4.5, obtaining the trained entity extraction model.
The pseudocode for training the entity extraction model is as follows:
Input: training set and verification set;
Output: entity extraction model;
1. Import the training set and the validation set;
2. Import the fine-tuned ALBert model;
3. Feed the word vectors into GRU-Attention-CRF;
4. Specify the model parameters;
5. Input the training set and the verification set and start training;
Return the entity extraction model.
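The CRF constraint in step 4.4 can be illustrated with a tiny Viterbi decode over per-character tag scores (standing in for the BGRU-Attention output), where invalid transitions such as O followed by I-Product, or a sentence starting with I-Product, are forbidden. All scores below are toy numbers, not trained parameters, and the two-tag-type setup is an assumption for illustration.

```python
NEG_INF = float("-inf")
TAGS = ["O", "B-Product", "I-Product"]

def allowed(prev, cur):
    # An I-Type label may only follow B-Type or I-Type of the same type.
    if cur.startswith("I-"):
        return prev in (cur, "B-" + cur[2:])
    return True

def viterbi(emissions):
    """emissions: list of {tag: score} per character; returns the best valid tag sequence."""
    # A sentence may not start with an I- tag.
    best = [{t: ((emissions[0][t] if not t.startswith("I-") else NEG_INF), [t])
             for t in TAGS}]
    for em in emissions[1:]:
        step = {}
        for cur in TAGS:
            score, path = max(
                ((best[-1][p][0] + em[cur]) if allowed(p, cur) else NEG_INF,
                 best[-1][p][1] + [cur])
                for p in TAGS
            )
            step[cur] = (score, path)
        best.append(step)
    return max(best[-1].values())[1]

# Even though I-Product scores highest at position 0, the constraint forces a valid start:
ems = [{"O": 0.1, "B-Product": 0.3, "I-Product": 0.9},
       {"O": 0.2, "B-Product": 0.1, "I-Product": 0.8}]
print(viterbi(ems))  # ['B-Product', 'I-Product']
```

This is exactly the sense in which the CRF "reduces the probability of invalid sequences": sequences that violate the BIO grammar receive an effectively infinite penalty and can never be decoded.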
Step 5, creating a dictionary Dict;
Step 5 is specifically as follows:
Relevant names, including but not limited to part, combination, and product names, are extracted from the complex heavy equipment detailed information table to form the dictionary Dict.
Step 6, inputting the text to be extracted into the model obtained in step 4, and combining it with the dictionary Dict constructed in step 5 to obtain an entity extraction result.
Step 6 is specifically as follows:
Step 6.1, for large volumes of text to be extracted, all texts are imported into the entity extraction model trained in step 4 to obtain a primary recognition result; on this basis, the dictionary Dict constructed in step 5 is applied for secondary extraction to obtain the final entity extraction result;
Step 6.2, for entity extraction from individual sentences, online recognition is used: the sentence to be extracted is pasted into the online recognition window, the model obtained in step 4 is invoked, and the extraction result is given in combination with the dictionary Dict.
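The secondary extraction in step 6.1 can be sketched as a merge of the model's first-pass entities with exact-match hits from the domain dictionary Dict. The dictionary entries, the example sentence, and the merge policy (dictionary matches are added only where they do not overlap a model entity) are assumptions for illustration; the patent only states that Dict is combined with the model output.

```python
# Hypothetical Dict entries: 挤压筒 (extrusion container, a Part),
# 金属挤压机 (metal extrusion press, a Product).
DICT = {"挤压筒": "Part", "金属挤压机": "Product"}

def secondary_extract(text, model_entities):
    """model_entities: list of (start, end, type) spans; returns the merged, sorted list."""
    merged = list(model_entities)
    for name, etype in DICT.items():
        start = text.find(name)
        while start != -1:
            end = start + len(name)
            # Keep the model's prediction when spans overlap.
            if not any(s < end and start < e for s, e, _ in merged):
                merged.append((start, end, etype))
            start = text.find(name, end)
    return sorted(merged)

# The model found only the Product entity; the dictionary recovers the Part it missed.
text = "金属挤压机的挤压筒需要定期维护"
print(secondary_extract(text, [(0, 5, "Product")]))
```

This is where the domain dictionary earns its place: industry-specific nouns that the trained model misses in the primary pass are still recovered in the final result.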
Claims (3)
1. An ALBert-based complex heavy equipment entity extraction method, characterized by comprising the following steps:
Step 1, collecting texts in the field of complex heavy equipment, and constructing a corpus;
In step 1, the web crawler framework Scrapy is used to capture relevant complex heavy equipment information from web pages and store it as text files; the stored text is then combined with existing, manually collected documents from the complex heavy equipment field to serve as a data source; the data source is processed to eliminate special symbols, formulas, and measurement units; the processed data serves as the corpus and is stored as text files;
Step 2, pre-training the ALBert model using the corpus obtained in step 1 to obtain a pre-trained word representation model ALBert;
In step 2, the ALBert model takes single Chinese characters as input, with a start identifier [CLS] added before the first character of each sentence and an end identifier [SEP] added at the end of each sentence; for each input character, ALBert outputs a representation vector fused with the text's semantic information; on the basis of the ALBert pre-trained model, the subsequent connection-layer parameters are fine-tuned on the corpus in the data source, while ALBert's internal pre-trained parameters do not participate in training, yielding a fine-tuned ALBert model;
Step 3, marking entity names in the corpus obtained in step 1, and adjusting the text format to the format read by the algorithm to obtain a training set and a verification set;
In step 3, entity labeling is completed manually using the BIO labeling scheme: the first character of an entity is marked with a B-Type label, non-first characters of an entity with an I-Type label, and non-entity characters and punctuation marks with an O label, where Type represents the entity category;
Step 4, training a model, namely feeding the labeled data into the ALBert-BGRU-Attention-CRF algorithm to obtain a trained model;
The training of the model in step 4 is specifically as follows:
Step 4.1, inputting the training set and the verification set obtained in step 3 into the ALBert model fine-tuned in step 2 to generate word vectors;
Step 4.2, inputting the word vectors generated in step 4.1 into a bidirectional gated recurrent unit (BGRU) to obtain each character's score on every label;
Step 4.3, weighting the result of step 4.2 with an Attention mechanism to obtain each character's weighted score on every label;
Step 4.4, constraining the label sequence with a conditional random field (CRF) to reduce the probability of invalid sequences;
Step 4.5, obtaining the trained entity extraction model;
Step 5, creating a dictionary Dict;
Step 6, inputting the text to be extracted into the model obtained in step 4, and combining it with the dictionary Dict constructed in step 5 to obtain an entity extraction result.
2. The ALBert-based complex heavy equipment entity extraction method as set forth in claim 1, wherein step 5 is specifically as follows:
Relevant names, including part, combination, and product names, are extracted from the complex heavy equipment detailed information table to form the dictionary Dict.
3. The ALBert-based complex heavy equipment entity extraction method as claimed in claim 2, wherein step 6 is specifically as follows:
Step 6.1, for large volumes of text to be extracted, all texts are imported into the entity extraction model trained in step 4 to obtain a primary recognition result; on this basis, the dictionary Dict constructed in step 5 is applied for secondary extraction to obtain the final entity extraction result;
Step 6.2, for entity extraction from individual sentences, online recognition is used: the sentence to be extracted is pasted into the online recognition window, the model obtained in step 4 is invoked, and the extraction result is given in combination with the dictionary Dict.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110217185.1A CN113326700B (en) | 2021-02-26 | 2021-02-26 | ALBert-based complex heavy equipment entity extraction method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113326700A CN113326700A (en) | 2021-08-31 |
CN113326700B (en) | 2024-05-14
Family
ID=77414448
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110217185.1A Active CN113326700B (en) | 2021-02-26 | 2021-02-26 | ALBert-based complex heavy equipment entity extraction method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113326700B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106815293A (en) * | 2016-12-08 | 2017-06-09 | 中国电子科技集团公司第三十二研究所 | System and method for constructing knowledge graph for information analysis |
CN109359293A (en) * | 2018-09-13 | 2019-02-19 | 内蒙古大学 | Neural-network-based Mongolian named entity recognition method and recognition system |
CN110188347A (en) * | 2019-04-29 | 2019-08-30 | 西安交通大学 | Text-oriented method for recognizing and extracting relations between knowledge topics |
CN110598203A (en) * | 2019-07-19 | 2019-12-20 | 中国人民解放军国防科技大学 | Military scenario document entity information extraction method and device combined with a dictionary |
CN110990525A (en) * | 2019-11-15 | 2020-04-10 | 华融融通(北京)科技有限公司 | Natural language processing-based public opinion information extraction and knowledge base generation method |
CN111199152A (en) * | 2019-12-20 | 2020-05-26 | 西安交通大学 | Named entity identification method based on label attention mechanism |
CN111444721A (en) * | 2020-05-27 | 2020-07-24 | 南京大学 | Chinese text key information extraction method based on pre-training language model |
CN111860882A (en) * | 2020-06-17 | 2020-10-30 | 国网江苏省电力有限公司 | Method and device for constructing power grid dispatching fault processing knowledge graph |
CN111950540A (en) * | 2020-07-24 | 2020-11-17 | 浙江师范大学 | Knowledge point extraction method, system, device and medium based on deep learning |
CN112036185A (en) * | 2020-11-04 | 2020-12-04 | 长沙树根互联技术有限公司 | Method and device for constructing named entity recognition model based on industrial enterprise |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10049103B2 (en) * | 2017-01-17 | 2018-08-14 | Xerox Corporation | Author personality trait recognition from short texts with a deep compositional learning approach |
Non-Patent Citations (1)
Title |
---|
结合ALBERT和双向门控循环单元的专利文本分类 ("Patent text classification combining ALBERT and bidirectional gated recurrent units"); Wen Chaodong et al.; Journal of Computer Applications (《计算机应用》); 2021-02-10; Vol. 41, No. 2; full text * |
Also Published As
Publication number | Publication date |
---|---|
CN113326700A (en) | 2021-08-31 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||