CN113408288A - Named entity identification method based on BERT and BiGRU-CRF - Google Patents
Named entity identification method based on BERT and BiGRU-CRF
- Publication number
- CN113408288A
- Authority
- CN
- China
- Prior art keywords
- bert
- model
- text data
- bigru
- crf
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a named entity identification method based on BERT and BiGRU-CRF, relating to the technical field of computers. The method comprises the following steps: obtaining comment text data of the e-commerce industry through a web crawler and labeling the text data; preprocessing the labeled text data and constructing a training data set and a verification data set; and training a BiGRU-CRF algorithm model and a BERT algorithm model according to the training data set and the verification data set. In the method, the word vector representation of a sentence is trained through a BERT pre-trained language model, and the traditional BERT training task is improved by using fragment masking in place of traditional word masking; semantic information of the current word and its context is further extracted through a BiGRU model; and the optimal label sequence that is consistent with the context is extracted through a CRF algorithm.
Description
Technical Field
The invention relates to the technical field of computers, in particular to a named entity identification method based on BERT and BiGRU-CRF.
Background
With the rapid development of internet technology, electronic commerce has become a familiar part of people's daily life. E-commerce platforms generally provide a product review area where consumers can post reviews online. Consumers can choose satisfactory products by reading the review area, and merchants can learn from it how satisfied consumers are with their products. By checking the review area promptly, merchants can find shortcomings in the transaction process and correct them in time, which is of great significance for the continued development of a shop.
However, with the rapid development of the mobile internet, e-commerce platforms have accumulated a large number of complicated reviews, which makes it difficult for consumers to obtain correct product information in a short time. Merchants likewise have difficulty obtaining useful consumer feedback from the vast number of reviews. Therefore, efficiently mining the information contained in these numerous and complicated reviews greatly helps to promote consumption and to prompt merchants to improve their services or product quality, and directly influences the economic benefit of the e-commerce platform.
Because large volumes of review text keep flowing in and users write freely, the format of reviews is not uniform and grammatical rules are difficult to capture. If natural language processing relies on manual work, the speed at which experts can establish rules and corpora cannot keep up with the growth of review data; such an approach cannot meet the demand, involves a huge workload, and wastes human resources.
An effective solution to the problems in the related art has not been proposed yet.
Disclosure of Invention
Aiming at the problems in the related art, the invention provides a named entity identification method based on BERT and BiGRU-CRF, so as to overcome the technical problems in the prior related art.
The technical scheme of the invention is realized as follows:
A named entity identification method based on BERT and BiGRU-CRF comprises the following steps:
obtaining comment text data of the e-commerce industry through a web crawler, and labeling the text data;
preprocessing the labeled text data, constructing a training data set and a verification data set, and training a BiGRU-CRF algorithm model and a BERT algorithm model according to the training data set and the verification data set;
training the word vector representation of a sentence through a BERT pre-trained language model, wherein the traditional BERT training task is improved by using fragment masking in place of traditional word masking;
further extracting semantic information of the current word and its context through a BiGRU model;
and extracting, through a CRF algorithm, the optimal label sequence that is consistent with the context.
Further, labeling the text data comprises the following steps:
training a model on the already labeled part of the text data in an incremental learning manner, and predicting labels for the remaining unlabeled text data with the trained model;
and directly taking prediction results whose confidence is higher than a preset threshold as the labels of the text data, and manually labeling text data whose confidence is lower than the preset threshold.
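The confidence-based routing in the labeling steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: the predictor signature, the threshold value of 0.9, and the toy predictor are all assumptions, and texts whose confidence exactly equals the threshold are sent to manual labeling because the source does not say how ties are handled.

```python
def split_by_confidence(texts, predict, threshold=0.9):
    """Route each unlabeled text to auto-labeling or manual annotation.

    predict is a hypothetical callable returning (labels, confidence in [0, 1]).
    """
    auto_labeled = []   # (text, predicted labels) accepted as-is
    needs_manual = []   # texts sent to human annotators
    for text in texts:
        labels, confidence = predict(text)
        if confidence > threshold:
            auto_labeled.append((text, labels))
        else:
            needs_manual.append(text)
    return auto_labeled, needs_manual


def toy_predict(text):
    # Stand-in for the trained model: short texts get high confidence.
    labels = ["O"] * len(text)
    confidence = 0.95 if len(text) <= 5 else 0.40
    return labels, confidence


auto, manual = split_by_confidence(["good", "arrived late and damaged"], toy_predict)
```

In an incremental loop, the auto-labeled items and the manually corrected items would both be added to the training set before the next round of training.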
Further, using fragment masking in place of traditional word masking comprises the following steps:
randomly selecting a masking length according to a geometric distribution, then randomly selecting a starting position according to a uniform distribution, and finally masking a span of characters in the sentence according to the selected length.
Further, extracting semantic information of the current word and its context through the BiGRU model includes extracting the entities contained in the text data to be predicted through the trained BiGRU-CRF algorithm model, which comprises the following steps:
modeling the text data sequence in the forward and backward directions using a bidirectional GRU model;
and using a conditional random field (CRF) to constrain the relations between label results, scoring the whole prediction path, and extracting the entities contained in the text data.
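One common way to write the path score that a CRF layer assigns on top of BiGRU outputs is given below. The source does not state these formulas; the symbols P (per-position label scores produced by the BiGRU), A (a learned label transition matrix), and the normalization over all candidate paths y' are assumptions taken from the standard neural CRF tagging formulation.

```latex
s(X, y) = \sum_{i=1}^{n} P_{i,\,y_i} + \sum_{i=2}^{n} A_{y_{i-1},\,y_i},
\qquad
p(y \mid X) = \frac{\exp s(X, y)}{\sum_{y'} \exp s(X, y')},
\qquad
y^{*} = \arg\max_{y} \, s(X, y)
```

Under this formulation, the transition term is what lets the CRF penalize label sequences that are inconsistent with each other, while the emission term carries the contextual evidence from the bidirectional GRU.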
Further, the method also comprises the following steps:
if the text data is a long text, predicting, from the text data to be predicted and the extracted entities, the relations between the entities through the trained BERT algorithm model, which comprises the following steps:
acquiring an original BERT model, in which the [CLS] token represents the overall type features of a sentence in the article and [SEP] separates the multiple sentences of the input article;
combining the BERT input with the entities extracted upstream, and encoding with a structure of the form {[CLS] article sentence [SEP] subject [object] [SEP]};
concatenating the subject entity vector, the sentence vector, and the object entity vector, and predicting the relation type through a fully connected layer and softmax; wherein a = [a1, a2, ..., an] denotes the subject entity vector and b = [b1, b2, ..., bn] denotes the object entity vector.
The invention has the following beneficial effects:
The named entity identification method based on BERT and BiGRU-CRF obtains comment text data of the e-commerce industry through a web crawler and labels the text data; preprocesses the labeled text data and constructs a training data set and a verification data set; trains a BiGRU-CRF algorithm model and a BERT algorithm model according to the training data set and the verification data set; extracts the entities contained in the text data to be predicted through the trained BiGRU-CRF algorithm model; predicts the relations between the entities from the text data and the extracted entities through the trained BERT algorithm model and establishes entity connection relations; and then further analyzes semantic information according to the connection relations. Built on the named entity recognition task and on natural language processing technology under deep learning, the method thus uses the BiGRU-CRF algorithm model to extract the required entities from text and the BERT model to predict the relations between them as the basis for semantic analysis.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings in the following description show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a named entity identification method based on BERT and BiGRU-CRF according to an embodiment of the present invention;
FIG. 2 is a schematic block diagram of a named entity identification method based on BERT and BiGRU-CRF according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present invention.
According to the embodiment of the invention, a named entity identification method based on BERT and BiGRU-CRF is provided.
As shown in FIGS. 1-2, the named entity identification method based on BERT and BiGRU-CRF according to an embodiment of the present invention includes the following steps:
step one, obtaining comment text data of the e-commerce industry through a web crawler, and labeling the text data;
step two, preprocessing the labeled text data to construct a training data set and a verification data set, and training a BiGRU-CRF algorithm model and a BERT algorithm model according to the training data set and the verification data set;
step three, extracting the entities contained in the text data to be predicted through the trained BiGRU-CRF algorithm model;
step four, predicting, from the text data to be predicted and the extracted entities, the relations between the entities through the trained BERT algorithm model;
and step five, further analyzing the semantic information according to the connection relations.
By means of the above technical scheme, and starting from the problem of named entity recognition in e-commerce reviews, the invention provides an automatic information extraction method based on deep learning natural language processing. On the basis of named entity recognition in review text, the invention provides a high-precision named entity extraction method for extracting information from unstructured review text, based on natural language processing technology under deep learning.
In addition, labeling the text data includes: training a model on the already labeled part of the text data in an incremental learning manner, and predicting labels for the remaining unlabeled text data with the trained model; and directly taking prediction results whose confidence is higher than the preset threshold as the labels of the text data, and manually re-labeling text data whose confidence is lower than the preset threshold. An entity library of product review data for the e-commerce industry is constructed, and entities are replaced with entities of the same type drawn at random from the entity library. For general entity types such as time and amount, random generators that produce numerals in different surface forms are constructed, and the times, amounts, and similar values in the originally labeled text data are replaced with randomly generated ones. Performing entity replacement on the labeled text data with the entity library and the time and amount generators expands the labeled data set and lets the model better learn the linguistic knowledge of the context; compared with common methods, the robustness and accuracy of the model are improved.
The incremental labeling approach greatly improves the efficiency of data labeling and reduces labor cost.
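The entity-replacement augmentation described above can be sketched as follows. This is only an illustration under stated assumptions: the entity library contents, the type names PRODUCT and AMOUNT, the amount formats, and the span representation (token offsets with an exclusive end, non-overlapping) are all hypothetical and are not taken from the source.

```python
import random

# Hypothetical entity library: entity type -> known surface forms.
ENTITY_LIBRARY = {
    "PRODUCT": ["phone case", "headphones", "keyboard"],
}


def random_amount(rng):
    """Generate an amount in one of several surface forms (illustrative formats)."""
    value = rng.randint(1, 999)
    return rng.choice([f"{value} yuan", f"¥{value}", f"{value}.00 RMB"])


def augment(tokens, spans, rng):
    """Replace each labeled span with a same-type entity or a generated value.

    tokens: list of word tokens; spans: list of (start, end, type), end exclusive.
    Returns new tokens and spans with updated offsets.
    """
    new_tokens, new_spans, cursor = [], [], 0
    for start, end, etype in sorted(spans):
        new_tokens.extend(tokens[cursor:start])      # copy text before the span
        if etype == "AMOUNT":
            replacement = random_amount(rng).split()
        elif etype in ENTITY_LIBRARY:
            replacement = rng.choice(ENTITY_LIBRARY[etype]).split()
        else:
            replacement = tokens[start:end]          # unknown type: keep as-is
        new_start = len(new_tokens)
        new_tokens.extend(replacement)
        new_spans.append((new_start, len(new_tokens), etype))
        cursor = end
    new_tokens.extend(tokens[cursor:])               # copy text after the last span
    return new_tokens, new_spans


rng = random.Random(0)
tokens = ["the", "mouse", "cost", "30", "yuan", "total"]
spans = [(1, 2, "PRODUCT"), (3, 5, "AMOUNT")]
aug_tokens, aug_spans = augment(tokens, spans, rng)
```

Because the span offsets are recomputed as the new sentence is built, the augmented example stays correctly labeled even when the replacement has a different number of tokens than the original entity.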
In addition, extracting the entities contained in the text data to be predicted through the trained BiGRU-CRF algorithm model comprises the following steps: modeling the text data sequence in the forward and backward directions using a bidirectional GRU model; and using a conditional random field (CRF) to constrain the relations between label results, scoring the whole prediction path, and extracting the entities contained in the text data. In this embodiment, a deep learning sequence labeling model labels each sentence to perform named entity recognition, so that the entities present in the text are identified.
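To illustrate what scoring the whole prediction path means in practice, the sketch below decodes the best label sequence with the Viterbi algorithm, which is the usual decoding procedure for a linear-chain CRF. The source does not name a decoding algorithm, so Viterbi is an assumption here, as are the BIO label set, the toy emission scores (standing in for BiGRU outputs), and the large negative penalty used to forbid the O to I-PROD transition.

```python
def viterbi_decode(emissions, transitions):
    """Return the highest-scoring label path and its score.

    emissions[t][j]: score of label j at position t (e.g. from a BiGRU).
    transitions[i][j]: score of moving from label i to label j.
    """
    num_labels = len(transitions)
    scores = list(emissions[0])
    backpointers = []
    for emission in emissions[1:]:
        new_scores, pointers = [], []
        for j in range(num_labels):
            best_prev = max(range(num_labels), key=lambda i: scores[i] + transitions[i][j])
            new_scores.append(scores[best_prev] + transitions[best_prev][j] + emission[j])
            pointers.append(best_prev)
        scores = new_scores
        backpointers.append(pointers)
    last = max(range(num_labels), key=lambda i: scores[i])
    best_score = scores[last]
    path = [last]
    for pointers in reversed(backpointers):   # follow backpointers to the start
        last = pointers[last]
        path.append(last)
    path.reverse()
    return path, best_score


LABELS = ["O", "B-PROD", "I-PROD"]
NEG = -1e4  # large penalty standing in for a forbidden transition
transitions = [
    [0.0, 0.0, NEG],   # O -> I-PROD is not allowed
    [0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0],
]
emissions = [
    [2.0, 1.5, 0.2],
    [0.5, 1.0, 2.5],
    [2.0, 0.1, 0.3],
]
path, score = viterbi_decode(emissions, transitions)
decoded = [LABELS[i] for i in path]
```

In this toy input, picking the best label independently at each position would give O, I-PROD, O, which is not a valid BIO sequence; scoring the whole path with the transition constraint yields B-PROD, I-PROD, O instead.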
In addition, if the text data is a long text, the relations between the entities are predicted from the text data to be predicted and the extracted entities through the trained BERT algorithm model, and relation connections between related entities are established, which comprises the following steps:
acquiring an original BERT model, in which the [CLS] token represents the overall type features of a sentence in the article and [SEP] separates the multiple sentences of the input article;
combining the BERT input with the entities extracted upstream, and encoding with a structure of the form {[CLS] article sentence [SEP] subject [object] [SEP]};
concatenating the subject entity vector, the sentence vector, and the object entity vector, and predicting the relation type through a fully connected layer and softmax;
wherein a = [a1, a2, ..., an] denotes the subject entity vector and b = [b1, b2, ..., bn] denotes the object entity vector.
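A minimal sketch of assembling the relation-prediction input is shown below. The template in the source is ambiguous about how the subject and object are delimited, so this sketch assumes a [SEP] token between them; that separator placement, the function name, and the word-level tokens are all assumptions for illustration.

```python
def build_relation_input(sentence_tokens, subject_tokens, object_tokens):
    """Assemble one token sequence for relation classification.

    Assumed layout: [CLS] sentence [SEP] subject [SEP] object [SEP].
    """
    return (
        ["[CLS]"] + list(sentence_tokens) + ["[SEP]"]
        + list(subject_tokens) + ["[SEP]"]
        + list(object_tokens) + ["[SEP]"]
    )


tokens = build_relation_input(
    ["the", "case", "cost", "30", "yuan"], ["case"], ["30", "yuan"]
)
```

One sequence of this kind would be built for each candidate subject and object pair produced by the entity recognition step.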
The method trains an entity recognition model and a relation prediction model separately on the labeled data:
constructing a BiGRU-CRF model and training it for entity recognition, in which a bidirectional GRU model models the sequence in the forward and backward directions and a conditional random field (CRF) constrains the relations between label results;
constructing a BERT model as the relation analysis model, processing the input data of the model into a structure of the form {[CLS] article sentence [SEP] subject [object] [SEP]}, and training the BERT model;
passing the data to be predicted through the entity recognition model BiGRU-CRF to predict the entities appearing in the text;
and pairing the entities extracted in the previous step, combining each pair with the original text in the format of the training data, and inputting the result into the model to predict the relations between the entities.
The invention provides a BERT model with an improved masking strategy to analyze the semantic relations between entities. BERT uses a multi-layer self-attention mechanism to produce a bidirectional encoded representation of text, extracting semantic and syntactic information at different levels of the text from low to high. The traditional BERT model trains a language model through the masked language model (MLM) task, which masks 15% of the words in a text and predicts the masked words. The invention replaces traditional word masking with fragment masking: specifically, a masking length is randomly selected according to a geometric distribution, a starting position is then randomly selected according to a uniform distribution, and finally a span of characters in the sentence is masked according to the selected length. Transfer learning with BERT can generally provide strong support for downstream tasks.
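The fragment masking procedure described above can be sketched as follows. The success probability p = 0.2, the length cap of 10, and the choice to mask a single span per call are assumptions made for illustration; the source specifies only that the length follows a geometric distribution and the start position a uniform distribution. Capping the length means the distribution is a truncated geometric one.

```python
import random


def sample_span_length(rng, p=0.2, max_len=10):
    """Sample a span length >= 1 from a geometric distribution, capped at max_len."""
    length = 1
    while length < max_len and rng.random() > p:
        length += 1
    return length


def mask_one_span(tokens, rng, p=0.2, max_len=10, mask_token="[MASK]"):
    """Mask one contiguous span: geometric length, uniform start position."""
    length = min(sample_span_length(rng, p, max_len), len(tokens))
    start = rng.randint(0, len(tokens) - length)  # uniform over valid starts
    masked = list(tokens)
    for i in range(start, start + length):
        masked[i] = mask_token
    return masked, (start, start + length)


rng = random.Random(42)
tokens = "the battery of this phone lasts very long".split()
masked, (start, end) = mask_one_span(tokens, rng)
```

Masking contiguous tokens together, rather than isolated ones, forces the model to predict a whole phrase from its surroundings, which is the motivation the text gives for replacing word-level masking.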
The original BERT model uses the [CLS] token to represent the overall type features of a sentence and uses [SEP] to separate multiple input sentences. Exploiting this special input structure, the invention proposes a BERT structure for semantic relation analysis: the BERT input is combined with the entities extracted upstream and encoded with a structure of the form {[CLS] article sentence [SEP] subject [object] [SEP]}. Let a = [a1, a2, ..., an] denote the subject entity vector and b = [b1, b2, ..., bn] denote the object entity vector. The sentence vector, the subject entity vector, and the object entity vector are concatenated, and the relation type is finally predicted through a fully connected layer and softmax, expressed as:
V′ = W[concat(a, b)] + λ;
p = softmax(V′).
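The two formulas above can be traced with a small numeric sketch. It follows the formulas as written, concatenating only a and b; treating λ as a per-class bias vector is an assumption, and the vectors and weights are toy values chosen by hand rather than learned parameters.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def predict_relation(a, b, W, bias):
    """Compute p = softmax(W * concat(a, b) + bias)."""
    features = list(a) + list(b)                      # concat(a, b)
    logits = [
        sum(w * x for w, x in zip(row, features)) + bias_k
        for row, bias_k in zip(W, bias)
    ]                                                 # V' = W[concat(a, b)] + lambda
    return softmax(logits)


a = [1.0, 0.0]          # toy subject entity vector
b = [0.0, 1.0]          # toy object entity vector
W = [                   # one row per relation type (hypothetical weights)
    [2.0, 0.0, 0.0, 2.0],
    [0.0, 1.0, 1.0, 0.0],
    [0.5, 0.5, 0.5, 0.5],
]
bias = [0.0, 0.0, 0.0]  # plays the role of lambda
p = predict_relation(a, b, W, bias)
predicted_type = p.index(max(p))
```

Each entry of p is the predicted probability of one relation type, and the type with the largest probability is taken as the prediction.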
To address the problem that word-level masking in the traditional BERT masking strategy splits the information of words with related meanings, fragment masking is adopted: a masking length is randomly selected according to a geometric distribution, a starting position is then randomly selected according to a uniform distribution, and finally a span of characters in the sentence is masked according to the selected length. The entity pairs are added to the model in an effective way, and features are extracted to predict the relations between entities. Meanwhile, to prevent overfitting, the entity pair whose relation is to be predicted is masked with [MASK] in the sentence. Compared with existing relation extraction approaches, the method improves the extraction results. The invention also removes text noise by syntactic relation analysis: to address the problem of fuzzy entity regions in text, the extracted entities are taken as semantic head words, denoising is performed according to the connection relations, the text within a limited number of steps that has a dependency relation with the entities is retained, and unnecessary text noise caused by the fuzzy distribution of entities is removed. This resolves the ambiguity of entity regions and improves the accuracy of entity relation analysis.
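Masking the entity pair inside the sentence, as described above, can be sketched as follows. The span representation (token offsets with an exclusive end) and the choice to emit one [MASK] per entity token are assumptions; the source states only that the entity pair is masked with [MASK] in the sentence.

```python
def mask_entity_pair(tokens, subject_span, object_span, mask_token="[MASK]"):
    """Replace tokens inside the subject and object spans with mask_token.

    Spans are (start, end) token offsets with end exclusive.
    """
    masked = list(tokens)
    for start, end in (subject_span, object_span):
        for i in range(start, end):
            masked[i] = mask_token
    return masked


tokens = ["the", "case", "cost", "30", "yuan"]
masked = mask_entity_pair(tokens, (1, 2), (3, 5))
```

With the entity surface forms hidden in the sentence, the model has to rely on the surrounding context rather than memorizing which specific entity pairs tend to carry which relation.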
In summary, by means of the technical scheme of the present invention, a semantic relation extraction method is provided for information extraction from unstructured text, based on natural language processing technology under deep learning. First, entities contained in a text, such as product entities, amounts, times, places, and organizations, are extracted through the BiGRU-CRF algorithm model; the text and the extracted entities are then passed through the BERT model to predict the relations between the entities, relation connections between related entities are established, and semantic analysis of the text is completed according to the connection relations. The invention thus addresses the analysis and extraction of semantic relations in unstructured documents with an automatic information extraction method based on deep learning natural language processing. The natural language analysis process, which combines natural language processing with a deep learning pre-trained model, was arrived at through practice, exploration, and research; it has good prediction performance, high algorithmic efficiency, and strong pertinence. In engineering practice, compared with the rule-based extraction processes commonly adopted in text data mining projects and with general technical methods, the method has higher accuracy and higher processing speed. Moreover, the invention has the following three advantages:
1) An entity library of e-commerce review data is constructed, and entities are replaced with entities of the same type drawn at random from the library. For general entity types such as time and amount, random generators that produce numerals in different surface forms are constructed, and the times, amounts, and similar values in the originally labeled text data are replaced with randomly generated ones, so that the model can better learn the linguistic knowledge of the context.
2) An incremental learning approach is adopted: a model trained on part of the labeled data predicts another batch of unlabeled text data, prediction results whose confidence is higher than a threshold are used directly as labels, and data with low confidence is manually re-labeled. This greatly improves the efficiency of data labeling and reduces labor cost.
3) To address the fuzziness of entity regions in e-commerce review text, irrelevant information in the text is removed as noise by means of syntactic relation analysis. The extracted entities are taken as semantic head words, and noise is removed according to the syntactic relations: the text within a limited number of steps that has a dependency relation with the entities is retained. This shortens the analyzed text as much as possible while preserving the original context structure, and removes unnecessary text noise caused by the fuzzy distribution of entities. The training speed and prediction accuracy of the model are greatly improved, and the speed disadvantage of pre-trained models in training and inference on long sentences is well mitigated.
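The dependency-based denoising in advantage 3) can be sketched as a breadth-first search over the dependency graph that keeps only tokens within a fixed number of hops of an entity. This is one plausible reading of "limited step length", not a confirmed detail of the method: treating dependency edges as undirected, the step limit of 1, and the hand-written toy parse are all assumptions.

```python
from collections import deque


def keep_within_steps(num_tokens, dependency_edges, entity_indices, max_steps):
    """Return sorted token indices within max_steps dependency hops of an entity.

    dependency_edges: (head, dependent) index pairs, treated as undirected.
    """
    neighbors = {i: set() for i in range(num_tokens)}
    for head, dependent in dependency_edges:
        neighbors[head].add(dependent)
        neighbors[dependent].add(head)

    distance = {i: 0 for i in entity_indices}   # entities are the BFS sources
    queue = deque(entity_indices)
    while queue:
        node = queue.popleft()
        if distance[node] == max_steps:
            continue                            # do not expand past the limit
        for nxt in neighbors[node]:
            if nxt not in distance:
                distance[nxt] = distance[node] + 1
                queue.append(nxt)
    return sorted(distance)


tokens = ["honestly", "the", "case", "fits", "my", "phone", "well"]
# Hand-written toy parse; a real system would obtain edges from a dependency parser.
edges = [(3, 0), (2, 1), (3, 2), (3, 5), (5, 4), (3, 6)]
kept = keep_within_steps(len(tokens), edges, entity_indices=[2, 5], max_steps=1)
kept_tokens = [tokens[i] for i in kept]
```

Returning the indices in sorted order keeps the surviving tokens in their original sequence, so the shortened text preserves the original context structure around the entities.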
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (5)
1. A named entity identification method based on BERT and BiGRU-CRF, characterized by comprising the following steps:
obtaining comment text data of the e-commerce industry through a web crawler, and labeling the text data;
preprocessing the labeled text data, constructing a training data set and a verification data set, and training a BiGRU-CRF algorithm model and a BERT algorithm model according to the training data set and the verification data set;
training the word vector representation of a sentence through a BERT pre-trained language model, wherein the traditional BERT training task is improved by using fragment masking in place of traditional word masking;
further extracting semantic information of the current word and its context through a BiGRU model;
and extracting, through a CRF algorithm, the optimal label sequence that is consistent with the context.
2. The named entity identification method based on BERT and BiGRU-CRF as claimed in claim 1, wherein labeling the text data comprises the following steps:
training a model on the already labeled part of the text data in an incremental learning manner, and predicting labels for the remaining unlabeled text data with the trained model;
and directly taking prediction results whose confidence is higher than a preset threshold as the labels of the text data, and manually labeling text data whose confidence is lower than the preset threshold.
3. The named entity identification method based on BERT and BiGRU-CRF as claimed in claim 2, wherein using fragment masking in place of traditional word masking comprises the following steps:
randomly selecting a masking length according to a geometric distribution, then randomly selecting a starting position according to a uniform distribution, and finally masking a span of characters in the sentence according to the selected length.
4. The named entity identification method based on BERT and BiGRU-CRF as claimed in claim 3, wherein extracting semantic information of the current word and its context through the BiGRU model includes extracting the entities contained in the text data to be predicted through the trained BiGRU-CRF algorithm model, which comprises the following steps:
modeling the text data sequence in the forward and backward directions using a bidirectional GRU model;
and using a conditional random field (CRF) to constrain the relations between label results, scoring the whole prediction path, and extracting the entities contained in the text data.
5. The named entity identification method based on BERT and BiGRU-CRF as claimed in claim 4, further comprising the following steps:
if the text data is a long text, predicting, from the text data to be predicted and the extracted entities, the relations between the entities through the trained BERT algorithm model, which comprises the following steps:
acquiring an original BERT model, in which the [CLS] token represents the overall type features of a sentence in the article and [SEP] separates the multiple sentences of the input article;
combining the BERT input with the entities extracted upstream, and encoding with a structure of the form {[CLS] article sentence [SEP] subject [object] [SEP]};
concatenating the subject entity vector, the sentence vector, and the object entity vector, and predicting the relation type through a fully connected layer and softmax; wherein a = [a1, a2, ..., an] denotes the subject entity vector and b = [b1, b2, ..., bn] denotes the object entity vector.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110725919.7A CN113408288A (en) | 2021-06-29 | 2021-06-29 | Named entity identification method based on BERT and BiGRU-CRF |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110725919.7A CN113408288A (en) | 2021-06-29 | 2021-06-29 | Named entity identification method based on BERT and BiGRU-CRF |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113408288A (en) | 2021-09-17 |
Family
ID=77680045
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110725919.7A Withdrawn CN113408288A (en) | 2021-06-29 | 2021-06-29 | Named entity identification method based on BERT and BiGRU-CRF |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113408288A (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113935310A (en) * | 2021-09-22 | 2022-01-14 | 三一重机有限公司 | Customer demand information extraction method and device |
CN114510948A (en) * | 2021-11-22 | 2022-05-17 | 北京中科凡语科技有限公司 | Machine translation detection method and device, electronic equipment and readable storage medium |
CN114510943A (en) * | 2022-02-18 | 2022-05-17 | 北京大学 | Incremental named entity identification method based on pseudo sample playback |
CN114510943B (en) * | 2022-02-18 | 2024-05-28 | 北京大学 | Incremental named entity recognition method based on pseudo sample replay |
CN114861600A (en) * | 2022-07-07 | 2022-08-05 | 之江实验室 | NER-oriented Chinese clinical text data enhancement method and device |
CN114861600B (en) * | 2022-07-07 | 2022-12-13 | 之江实验室 | NER-oriented Chinese clinical text data enhancement method and device |
US11972214B2 (en) | 2022-07-07 | 2024-04-30 | Zhejiang Lab | Method and apparatus of NER-oriented chinese clinical text data augmentation |
CN115221882A (en) * | 2022-07-28 | 2022-10-21 | 平安科技(深圳)有限公司 | Named entity identification method, device, equipment and medium |
CN115221882B (en) * | 2022-07-28 | 2023-06-20 | 平安科技(深圳)有限公司 | Named entity identification method, device, equipment and medium |
CN115587594A (en) * | 2022-09-20 | 2023-01-10 | 广东财经大学 | Network security unstructured text data extraction model training method and system |
CN115587594B (en) * | 2022-09-20 | 2023-06-30 | 广东财经大学 | Unstructured text data extraction model training method and system for network security |
CN117010390A (en) * | 2023-07-04 | 2023-11-07 | 北大荒信息有限公司 | Company entity identification method, device, equipment and medium based on bidding information |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN113408288A (en) | Named entity identification method based on BERT and BiGRU-CRF | |
CN109766524B (en) | Method and system for extracting combined purchasing recombination type notice information | |
CN112417888A (en) | Method for analyzing sparse semantic relationship by combining BiLSTM-CRF algorithm and R-BERT algorithm | |
CN113868432B (en) | Automatic knowledge graph construction method and system for iron and steel manufacturing enterprises | |
CN113191148B (en) | Rail transit entity identification method based on semi-supervised learning and clustering | |
Zhang et al. | Aspect-based sentiment analysis for user reviews | |
CN111259153B (en) | Attribute-level emotion analysis method of complete attention mechanism | |
CN113204967B (en) | Resume named entity identification method and system | |
CN110889786A (en) | Legal action insured advocate security use judging service method based on LSTM technology | |
CN112861541A (en) | Commodity comment sentiment analysis method based on multi-feature fusion | |
CN113094502A (en) | Multi-granularity takeaway user comment sentiment analysis method | |
CN112818698A (en) | Fine-grained user comment sentiment analysis method based on dual-channel model | |
CN115391570A (en) | Method and device for constructing emotion knowledge graph based on aspects | |
LU506520B1 (en) | A sentiment analysis method based on multimodal review data | |
Zhang et al. | Efficiency improvement of function point-based software size estimation with deep learning model | |
Caciularu et al. | Cross-document language modeling | |
CN115438709A (en) | Code similarity detection method based on code attribute graph | |
CN110750646A (en) | Attribute description extracting method for hotel comment text | |
CN114388108A (en) | User feedback analysis method based on multi-task learning | |
Velmurugan et al. | Mining implicit and explicit rules for customer data using natural language processing and apriori algorithm | |
CN116805010A (en) | Multi-data chain integration and fusion knowledge graph construction method oriented to equipment manufacturing | |
CN116186241A (en) | Event element extraction method and device based on semantic analysis and prompt learning, electronic equipment and storage medium | |
CN115098637A (en) | Text semantic matching method and system based on Chinese character shape-pronunciation-meaning multi-element knowledge | |
CN114297408A (en) | Relation triple extraction method based on cascade binary labeling framework | |
CN112800762A (en) | Element content extraction method for processing text with format style | |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20210917 |