CN112800764A - Entity extraction method in legal field based on Word2Vec-BiLSTM-CRF model
- Publication number
- CN112800764A (application number CN202011620453.6A)
- Authority
- CN
- China
- Prior art keywords
- entity
- legal field
- legal
- sentence
- word2vec
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/18—Legal services
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a named entity recognition method for the legal field based on Word2Vec-BiLSTM-CRF, comprising the following steps: acquiring original data in the legal field and preprocessing it to obtain training corpus data; feeding the training corpus data into the Word2Vec algorithm with the CBOW model to obtain word vectors specific to the legal field; labeling the preprocessed training corpus data by combining template matching with the enumeration-comma ("pause mark", 、) pattern of Chinese text to obtain a labeled corpus; using Bi-LSTM as the encoding layer of the model, with the labeled corpus combined with the word vectors as its input and text semantic features as its output; and feeding the semantic features produced by the Bi-LSTM layer into a CRF, which outputs the named entity recognition result. The method identifies the many types of entities found in legal documents, which enables fine-grained description of legal-domain entities and structuring of legal-domain data, and is significant for further mining the relationships between those entities.
Description
Technical Field
The invention relates to the field of named entity recognition, and in particular to an entity extraction method for the legal field based on a Word2Vec-BiLSTM-CRF model.
Background
In the legal field, the named entities involved are numerous and complex, whether during case investigation or at trial. The most common are the elements that describe the course of a case, such as persons (criminal suspects, victims), time, place, motive, and events. These case elements have different characteristics and forms of expression depending on the applicable criminal law provision and the offence charged.
The legal field contains a wide variety of entities, each of which may appear in different forms. Identifying named entities of differing forms with one uniform method enables fine-grained description of legal-domain entities and structuring of legal-domain data, and is of great significance for further mining the relationships between different entities in the legal field.
Chinese patent publication No. CN110807084A, published on February 18, 2020, discloses a patent term relation extraction method based on Bi-LSTM with an attention mechanism and a keyword strategy, comprising the following steps: step 1): preprocessing the patent text, identifying term features, adding position information, obtaining category keyword features through an improved TextRank algorithm, and forming a vector matrix; step 2): feeding the vector matrix into a Bi-LSTM model and obtaining the global features of the text with an attention mechanism; step 3): selecting the key features of each sentence as local features with a max-pooling layer; step 4): fusing the global and local features; step 5): outputting the classification result with a softmax classifier. That invention addresses the long-distance dependency problem of traditional deep learning methods in patent term relation extraction; according to its own experimental comparisons, it outperforms existing methods and meets the needs of practical application.
Compared with the legal field, the named entities in patents are simple and uniform. The method above can extract patent terms, but it does not carry over to the legal field, where named entities are complex; there is no effective recognition method for mining legal-domain entities, and extraction results are poor.
Therefore, it is necessary to provide a new extraction method to solve the above problems.
Disclosure of Invention
In order to solve the problems described in the background, the invention provides a named entity recognition method for the legal field based on Word2Vec-BiLSTM-CRF, which supports mining the relationships between different entities in the legal field.
To achieve this purpose, the invention provides the following technical solution: a named entity recognition method for the legal field based on Word2Vec-BiLSTM-CRF, comprising the following steps:
step A: acquiring original data in the legal field and preprocessing it to obtain training corpus data;
step B: feeding the corpus data obtained in step A into the Word2Vec algorithm with the CBOW model to obtain word vectors specific to the legal field;
step C: labeling the training corpus data preprocessed in step A by combining template matching with the enumeration-comma ("pause mark", 、) pattern of Chinese text to obtain a labeled corpus, specifically:
step C1: constructing a label system according to the specific entities contained in the legal field, using the BIO labeling scheme, in which a B label marks the beginning of an entity, an I label marks a non-beginning part of an entity, and an O label marks a non-entity part;
step C2: constructing an initial entity library for the legal field;
step C3: traversing the training corpus data set to obtain the set of sentences that match the enumeration-comma pattern;
step C4: using the enumeration-comma pattern to match synonyms and parallel words of the entities in the initial entity library, and expanding the entity library with them;
step C5: labeling entities in the training corpus data by template matching against the entities in the legal entity library;
step C6: manually checking the labeled training corpus data obtained in step C5, correcting wrong labels and adding missing ones, updating the entity library, and finally obtaining correctly labeled training corpus data;
step D: using Bi-LSTM as the encoding layer of the model, with the labeled corpus obtained in step C combined with the word vectors obtained in step B as its input, and text semantic features as its output;
step E: feeding the text semantic features produced by the Bi-LSTM layer in step D into the CRF, which outputs the named entity recognition result.
Further, step B specifically comprises:
step B1: constructing a stop-word list specific to the legal field, and performing word segmentation and stop-word removal on the training corpus data obtained in step A with the jieba and ltp Chinese word segmentation tools;
step B2: converting the semantic information contained in the vocabulary into n-dimensional word vectors with the Word2Vec algorithm and the CBOW model, to obtain word vectors specific to the legal field.
Compared with the prior art, the entity extraction method in the legal field based on Word2Vec-BiLSTM-CRF has the following beneficial effects: it identifies the many types of entities found in legal documents, which enables fine-grained description of legal-domain entities and structuring of legal-domain data, and is significant for further mining the relationships between different entities in the legal field.
Drawings
FIG. 1 is a schematic flow chart of the entity extraction method in the legal field based on Word2Vec-BiLSTM-CRF according to the invention.
FIG. 2 is a flow chart of obtaining the labeled corpus according to the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention is further explained below with reference to the drawings and the embodiments.
Referring to FIG. 1, the invention provides an entity extraction method in the legal field based on Word2Vec-BiLSTM-CRF, which specifically comprises the following steps:
Step A: acquiring original data in the legal field and preprocessing it to obtain training corpus data, comprising:
step A1: acquiring original data in the legal field from the Internet, including case statements, litigation reports, judgment documents and the like, by combining web crawling with manual screening;
step A2: performing preliminary cleaning and noise reduction on the semi-structured or unstructured multi-source data obtained, to obtain usable data.
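The source does not specify what the cleaning in step A2 consists of. As a minimal sketch, assuming the crawled documents are text with leftover HTML tags, HTML entities, stray whitespace and repeated lines, the cleaning could look like the following; the function name `clean_document` and the `min_chars` threshold are hypothetical choices for illustration only.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")          # leftover HTML tags from crawled pages
SPACE_RE = re.compile(r"[ \t\u3000]+")   # runs of ASCII and full-width spaces


def clean_document(raw: str, min_chars: int = 5) -> list[str]:
    """Return de-duplicated, non-trivial text lines from one crawled document.

    Assumption: a line shorter than `min_chars` or repeated verbatim is noise.
    """
    text = html.unescape(TAG_RE.sub(" ", raw))
    seen: set[str] = set()
    lines: list[str] = []
    for line in text.splitlines():
        line = SPACE_RE.sub(" ", line).strip()
        if len(line) < min_chars or line in seen:
            continue  # drop fragments and verbatim repeats such as page footers
        seen.add(line)
        lines.append(line)
    return lines
```

Manual screening, as described in step A1, would still be needed on top of such rules.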
Step B: training word vectors for the legal field by feeding the corpus data obtained in step A into the Word2Vec algorithm with the CBOW model, comprising:
step B1: constructing a stop-word list for the legal field, and performing word segmentation and stop-word removal on the training corpus data with Chinese word segmentation tools such as jieba and ltp;
step B2: obtaining word vectors for the legal field with the Word2Vec algorithm;
step B3: the Word2Vec algorithm uses the CBOW model to convert semantic information into n-dimensional vectors. The input of the CBOW model is the word vectors of the context words surrounding a given target word, and the output is the word vector of that target word, so contextual semantic information is well preserved.
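To make steps B1 and B3 concrete, the sketch below removes stop words from an already segmented sentence and then builds the (context words, target word) pairs that a CBOW model is trained on. It assumes segmentation has already been done (for example by jieba); the function names, the example sentence and the window size are hypothetical, and the actual vector training would be left to a Word2Vec implementation.

```python
def remove_stop_words(tokens, stop_words):
    """Drop every token that appears in the stop-word list (step B1)."""
    return [t for t in tokens if t not in stop_words]


def cbow_pairs(tokens, window=2):
    """Build CBOW training pairs: the context words predict the target word.

    For each position, the context is up to `window` words on each side.
    """
    pairs = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:  # a one-word sentence has no context to learn from
            pairs.append((context, target))
    return pairs


# Hypothetical segmented sentence and stop-word list, for illustration only.
tokens = ["被告人", "张三", "的", "行为", "构成", "盗窃罪"]
filtered = remove_stop_words(tokens, {"的"})
for context, target in cbow_pairs(filtered, window=1):
    print(context, "->", target)
```

Each pair corresponds to one CBOW training example: the vectors of the context words are combined to predict the target word, which is how the surrounding semantic information ends up encoded in the target word's vector.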
Step C: constructing an initial entity library for the legal field from the training corpus data preprocessed in step A, and labeling the data by combining template matching with the enumeration-comma ("pause mark", 、) pattern of Chinese text to obtain a labeled corpus; the enumeration-comma pattern effectively reduces manual labeling work. The step comprises:
step C1: constructing a label system for named entities in the legal field, where the named entities include the types, components and characteristics of laws; the BIO labeling scheme is adopted, in which a B label marks the beginning of an entity, an I label marks a non-beginning part of an entity, and an O label marks a non-entity part;
step C2: manually constructing an initial entity library for the legal field;
step C3: traversing the training corpus data set to obtain the set of sentences that match the enumeration-comma pattern;
In Chinese text, the enumeration comma (、) is mainly used to list words of the same kind. It is therefore assumed that when a known entity appears immediately before or after an enumeration comma, the words listed in parallel with it are usually of the same kind as, or synonyms of, that entity, and can be added to the entity library as entities. This is called the enumeration-comma pattern.
The pattern is not limited to two entities joined by a single enumeration comma; it generally also covers several related expressions.
step C4: using the enumeration-comma pattern to match synonyms, parallel words and the like of the entities in the initial entity library, and expanding the entity library with them;
step C5: labeling entities in the training corpus data by template matching against the entities in the legal entity library;
step C6: manually checking the labeled training corpus data obtained in step C5, correcting wrong labels and adding missing ones, updating the entity library, and finally obtaining correctly labeled training corpus data.
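Steps C3 to C5 can be sketched as follows, under several assumptions that are not in the source: sentences are already segmented into tokens, the entity library maps a tuple of tokens to an entity type, only the plain "A、B、C" form of the pattern is handled, and "template matching" is approximated by longest-match lookup. The type names `PER` and `LOC` and all function names are hypothetical.

```python
ENUM_COMMA = "、"


def enumeration_runs(tokens):
    """Step C3: find runs of tokens joined by enumeration commas (A、B、C)."""
    runs, i = [], 0
    while i < len(tokens):
        run, j = [tokens[i]], i + 1
        while (tokens[i] != ENUM_COMMA and j + 1 < len(tokens)
               and tokens[j] == ENUM_COMMA):
            run.append(tokens[j + 1])
            j += 2
        if len(run) > 1:
            runs.append(run)
            i = j
        else:
            i += 1
    return runs


def expand_entity_library(sentences, library):
    """Step C4: give unknown items in a run the type of the known items."""
    expanded = dict(library)
    for tokens in sentences:
        for run in enumeration_runs(tokens):
            types = {expanded[(t,)] for t in run if (t,) in expanded}
            if len(types) == 1:  # only expand when the known type is unambiguous
                (etype,) = types
                for t in run:
                    expanded.setdefault((t,), etype)
    return expanded


def bio_label(tokens, library):
    """Step C5: BIO-label a sentence by longest match against the library."""
    labels = ["O"] * len(tokens)
    max_len = max((len(entity) for entity in library), default=0)
    i = 0
    while i < len(tokens):
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            etype = library.get(tuple(tokens[i:i + n]))
            if etype is not None:
                labels[i] = "B-" + etype
                labels[i + 1:i + n] = ["I-" + etype] * (n - 1)
                i += n
                break
        else:
            i += 1  # no entity starts here; the token keeps its O label
    return labels


# Hypothetical example: one seed place name pulls in the two listed with it.
tokens = ["张三", "在", "北京", "、", "上海", "、", "广州", "等", "地", "作案"]
library = expand_entity_library([tokens], {("张三",): "PER", ("北京",): "LOC"})
print(list(zip(tokens, bio_label(tokens, library))))
```

Automatically expanded entries can be wrong, for example when a listed word is not an entity at all, which is why the manual check in step C6 follows.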
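Step D: taking a Bi-LSTM model as the encoding layer, with $X = (x_1, x_2, x_3, \ldots, x_n)$ as the input of the encoding layer, where $x_i$ is the legal-domain word vector, trained in step B, of the $i$-th word in the training corpus data labeled in step C. The gates and the hidden state of each LSTM unit are computed as:
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$
$$h_t = o_t * \tanh(C_t)$$
where $f_t$, $i_t$ and $o_t$ are the forget, input and output gates, $\sigma$ is the sigmoid function, and $C_t$ is the cell state; in the standard LSTM formulation the cell state is updated as $C_t = f_t * C_{t-1} + i_t * \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$. The hidden states of the forward (L) and backward (R) LSTMs are concatenated position by position:
$$\{h_0, h_1, \ldots, h_n\} = \{[h_{L0}, h_{Rn}], [h_{L1}, h_{R(n-1)}], \ldots, [h_{Ln}, h_{R0}]\}$$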
Bi-LSTM can effectively use past features (through forward states) and future features (through backward states) within a specified time horizon, using back propagation through time to train a Bi-directional LSTM network.
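The encoding step can be illustrated with a small pure-Python sketch of one LSTM step and of the bidirectional concatenation. It assumes the standard LSTM cell-state update, zero initial states, and small random weights; the parameter names (`Wf`, `bf`, and so on) and helper functions are hypothetical, and training by back-propagation through time is not shown.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Compute W·v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def init_params(input_dim, hidden, seed):
    """Small random weights for the f, i, o gates and the candidate cell (c)."""
    rng = random.Random(seed)
    params = {}
    for gate in "fioc":
        params["W" + gate] = [[rng.uniform(-0.1, 0.1) for _ in range(hidden + input_dim)]
                              for _ in range(hidden)]
        params["b" + gate] = [0.0] * hidden
    return params


def lstm_step(p, h_prev, c_prev, x):
    """One LSTM step on the concatenated vector [h_{t-1}, x_t]."""
    z = h_prev + x
    f = [sigmoid(a) for a in affine(p["Wf"], p["bf"], z)]    # forget gate
    i = [sigmoid(a) for a in affine(p["Wi"], p["bi"], z)]    # input gate
    o = [sigmoid(a) for a in affine(p["Wo"], p["bo"], z)]    # output gate
    g = [math.tanh(a) for a in affine(p["Wc"], p["bc"], z)]  # candidate cell
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


def run_lstm(p, xs, hidden):
    h, c, outputs = [0.0] * hidden, [0.0] * hidden, []
    for x in xs:
        h, c = lstm_step(p, h, c, x)
        outputs.append(h)
    return outputs


def bilstm(fwd, bwd, xs, hidden):
    """Concatenate forward and backward hidden states for each input position."""
    left = run_lstm(fwd, xs, hidden)
    right = run_lstm(bwd, xs[::-1], hidden)[::-1]  # re-align to input order
    return [l + r for l, r in zip(left, right)]
```

Reversing the backward outputs before concatenation is what pairs the first forward state with the last backward state, so every position sees both its left and its right context.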
Step E: feeding the feature vectors obtained from the Bi-LSTM layer into a CRF layer to obtain the score of each label for each word;
The CRF layer makes effective use of sentence-level label information and imposes constraints that keep the final predicted label sequence valid; the CRF layer learns these constraints automatically from the training data. Specifically,
the sentence in which entities are to be recognized is expressed as follows, where $x_i$ denotes a word in the sentence:
$$X = (x_1, x_2, \ldots, x_n);$$
the corresponding labels of the sentence are:
$$Y = (y_1, y_2, \ldots, y_n);$$
the score function of the sentence and its label sequence, written in the standard BiLSTM-CRF form consistent with the definitions that follow, is:
$$s(X, Y) = \sum_{i=0}^{n} A_{y_i, y_{i+1}} + \sum_{i=1}^{n} P_{i, y_i}$$
where $A$ is the transition score matrix and $A_{i,j}$ is the score of a transition from label $i$ to label $j$; $y_0$ and $y_{n+1}$ are the start and end tags of the sentence, so the dimension of $A$ is $(k+2) \times (k+2)$, where $k$ is the number of labels; $P$ is the score matrix output by the Bi-LSTM network, of dimension $n \times k$, and $P_{i,j}$ is the score of the $j$-th label for the $i$-th word in the sentence.
The aim is to find the label sequence that maximizes the score function.
For a given sentence $X$, the probability of the label sequence $Y$ is:
$$p(Y \mid X) = \frac{e^{s(X, Y)}}{\sum_{\tilde{Y} \in Y_X} e^{s(X, \tilde{Y})}}$$
where $Y_X$ denotes all possible label sequences for the sentence $X$; each candidate sequence thus has a score and a probability, and training maximizes the probability of the true sequence of the sentence.
Accordingly, a loss function is defined and minimized:
$$\mathrm{Loss} = -\log p(Y \mid X)$$
Expressed through the log-likelihood:
$$\log p(Y \mid X) = s(X, Y) - \log \sum_{\tilde{Y} \in Y_X} e^{s(X, \tilde{Y})}$$
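The CRF computations can be sketched in plain Python as below. The sketch assumes the standard linear-chain formulation: labels are numbered 0 to k-1, the start and end tags take indices k and k+1 in the transition matrix `A`, the normalizer is computed with the forward algorithm, and the highest-scoring sequence is found with Viterbi decoding. All function names are hypothetical, and gradient-based training of `A` and `P` is not shown.

```python
import math


def sequence_score(P, A, tags):
    """Score of one label sequence: transition scores plus emission scores."""
    k = len(P[0])
    path = [k] + list(tags) + [k + 1]  # add the start (k) and end (k+1) tags
    transition = sum(A[a][b] for a, b in zip(path, path[1:]))
    emission = sum(P[i][t] for i, t in enumerate(tags))
    return transition + emission


def logsumexp(values):
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(P, A):
    """Log of the sum of exp(score) over all label sequences (forward algorithm)."""
    k = len(P[0])
    alpha = [A[k][j] + P[0][j] for j in range(k)]
    for i in range(1, len(P)):
        alpha = [logsumexp([alpha[a] + A[a][j] for a in range(k)]) + P[i][j]
                 for j in range(k)]
    return logsumexp([alpha[j] + A[j][k + 1] for j in range(k)])


def neg_log_likelihood(P, A, tags):
    """Loss for the true sequence: log-normalizer minus its score."""
    return log_partition(P, A) - sequence_score(P, A, tags)


def viterbi(P, A):
    """Return the label sequence with the highest score."""
    k = len(P[0])
    score = [A[k][j] + P[0][j] for j in range(k)]
    backpointers = []
    for i in range(1, len(P)):
        prev = [max(range(k), key=lambda a, j=j: score[a] + A[a][j])
                for j in range(k)]
        score = [score[prev[j]] + A[prev[j]][j] + P[i][j] for j in range(k)]
        backpointers.append(prev)
    best = [max(range(k), key=lambda j: score[j] + A[j][k + 1])]
    for prev in reversed(backpointers):
        best.append(prev[best[-1]])
    return best[::-1]
```

The forward algorithm avoids enumerating all label sequences explicitly, which would otherwise grow exponentially with sentence length; minimizing `neg_log_likelihood` during training is equivalent to maximizing the probability of the true sequence.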
Finally, the recognized named entities from the course of the case, such as persons, motives and events, are output. In this way the many types of entities in legal documents are identified, fine-grained description of legal-domain entities and structuring of legal-domain data are achieved, and the relationships between different entities in the legal field can be further mined.
The above description is only a preferred embodiment of the present invention, and all equivalent changes and modifications made in accordance with the claims of the present invention should be covered by the present invention.
Claims (4)
1. A named entity recognition method for the legal field based on Word2Vec-BiLSTM-CRF, characterized by comprising the following steps:
step A: acquiring original data in the legal field and preprocessing it to obtain training corpus data;
step B: feeding the corpus data obtained in step A into the Word2Vec algorithm with the CBOW model to obtain word vectors specific to the legal field;
step C: labeling the training corpus data preprocessed in step A by combining template matching with the enumeration-comma ("pause mark", 、) pattern of Chinese text to obtain a labeled corpus, specifically:
step C1: constructing a label system according to the specific entities contained in the legal field, using the BIO labeling scheme, in which a B label marks the beginning of an entity, an I label marks a non-beginning part of an entity, and an O label marks a non-entity part;
step C2: constructing an initial entity library for the legal field;
step C3: traversing the training corpus data set to obtain the set of sentences that match the enumeration-comma pattern;
step C4: using the enumeration-comma pattern to match synonyms and parallel words of the entities in the initial entity library, and expanding the entity library with them;
step C5: labeling entities in the training corpus data by template matching against the entities in the legal entity library;
step C6: manually checking the labeled training corpus data obtained in step C5, correcting wrong labels and adding missing ones, updating the entity library, and finally obtaining correctly labeled training corpus data;
step D: using Bi-LSTM as the encoding layer of the model, with the labeled corpus obtained in step C combined with the word vectors obtained in step B as its input, and text semantic features as its output;
step E: feeding the text semantic features produced by the Bi-LSTM layer in step D into the CRF, which outputs the named entity recognition result.
2. The named entity recognition method for the legal field based on Word2Vec-BiLSTM-CRF according to claim 1, wherein step B specifically comprises:
step B1: constructing a stop-word list specific to the legal field, and performing word segmentation and stop-word removal on the training corpus data obtained in step A with the jieba and ltp Chinese word segmentation tools;
step B2: converting the semantic information contained in the vocabulary into n-dimensional word vectors with the Word2Vec algorithm and the CBOW model, to obtain word vectors specific to the legal field.
3. The named entity recognition method for the legal field based on Word2Vec-BiLSTM-CRF according to claim 1, wherein in step E:
the sentence in which entities are to be recognized is expressed as follows, where $x_i$ denotes a word in the sentence:
$$X = (x_1, x_2, \ldots, x_n);$$
the corresponding labels of the sentence are:
$$Y = (y_1, y_2, \ldots, y_n);$$
the score function of the sentence and its label sequence is determined as follows, written in the standard BiLSTM-CRF form consistent with the definitions below, the aim being to obtain its maximum value:
$$s(X, Y) = \sum_{i=0}^{n} A_{y_i, y_{i+1}} + \sum_{i=1}^{n} P_{i, y_i}$$
where $A$ is the transition score matrix and $A_{i,j}$ is the score of a transition from label $i$ to label $j$; $y_0$ and $y_{n+1}$ are the start and end tags of the sentence, so the dimension of $A$ is $(k+2) \times (k+2)$; $P$ is the score matrix output by the Bi-LSTM network, of dimension $n \times k$, and $P_{i,j}$ is the score of the $j$-th label for the $i$-th word in the sentence;
for a given sentence $X$, the probability of its true label sequence is maximized:
$$p(Y \mid X) = \frac{e^{s(X, Y)}}{\sum_{\tilde{Y} \in Y_X} e^{s(X, \tilde{Y})}}$$
where $Y_X$ denotes all possible label sequences for the sentence $X$.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011620453.6A CN112800764B (en) | 2020-12-31 | 2020-12-31 | Entity extraction method in legal field based on Word2Vec-BiLSTM-CRF model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112800764A | 2021-05-14 |
CN112800764B | 2023-07-04 |
Family
ID=75804975
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011620453.6A Active CN112800764B (en) | 2020-12-31 | 2020-12-31 | Entity extraction method in legal field based on Word2Vec-BiLSTM-CRF model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112800764B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113312918A (en) * | 2021-06-10 | 2021-08-27 | 临沂大学 | Word segmentation and capsule network law named entity identification method fusing radical vectors |
CN113377916A (en) * | 2021-06-22 | 2021-09-10 | 哈尔滨工业大学 | Extraction method of main relations in multiple relations facing legal text |
CN114048748A (en) * | 2021-11-17 | 2022-02-15 | 上海勃池信息技术有限公司 | Named entity recognition system, method, electronic device, and medium |
CN114330349A (en) * | 2022-01-05 | 2022-04-12 | 北京航空航天大学 | Specific field named entity recognition method |
CN115270780A (en) * | 2022-07-20 | 2022-11-01 | 北京新纽科技有限公司 | Method for recognizing terms |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019113122A1 (en) * | 2017-12-04 | 2019-06-13 | Conversica, Inc. | Systems and methods for improved machine learning for conversations |
CN110633409A (en) * | 2018-06-20 | 2019-12-31 | 上海财经大学 | Rule and deep learning fused automobile news event extraction method |
CN110990525A (en) * | 2019-11-15 | 2020-04-10 | 华融融通(北京)科技有限公司 | Natural language processing-based public opinion information extraction and knowledge base generation method |
CN111444726A (en) * | 2020-03-27 | 2020-07-24 | 河海大学常州校区 | Method and device for extracting Chinese semantic information of long-time and short-time memory network based on bidirectional lattice structure |
Non-Patent Citations (1)
Title |
---|
翟社平: "基于BILSTM_CRF的知识图谱实体抽取方法", 《计算机应用与软件》, vol. 36, no. 5, pages 269 - 274 * |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113312918A (en) * | 2021-06-10 | 2021-08-27 | 临沂大学 | Word segmentation and capsule network law named entity identification method fusing radical vectors |
CN113312918B (en) * | 2021-06-10 | 2022-05-17 | 临沂大学 | Word segmentation and capsule network law named entity identification method fusing radical vectors |
CN113377916A (en) * | 2021-06-22 | 2021-09-10 | 哈尔滨工业大学 | Extraction method of main relations in multiple relations facing legal text |
CN114048748A (en) * | 2021-11-17 | 2022-02-15 | 上海勃池信息技术有限公司 | Named entity recognition system, method, electronic device, and medium |
CN114048748B (en) * | 2021-11-17 | 2024-04-05 | 上海勃池信息技术有限公司 | Named entity recognition system, named entity recognition method, named entity recognition electronic equipment and named entity recognition medium |
CN114330349A (en) * | 2022-01-05 | 2022-04-12 | 北京航空航天大学 | Specific field named entity recognition method |
CN115270780A (en) * | 2022-07-20 | 2022-11-01 | 北京新纽科技有限公司 | Method for recognizing terms |
Also Published As
Publication number | Publication date |
---|---|
CN112800764B (en) | 2023-07-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN112800764B (en) | Entity extraction method in legal field based on Word2Vec-BiLSTM-CRF model | |
CN110866401A (en) | Chinese electronic medical record named entity identification method and system based on attention mechanism | |
CN111897908A (en) | Event extraction method and system fusing dependency information and pre-training language model | |
CN111694924A (en) | Event extraction method and system | |
CN109753660B (en) | LSTM-based winning bid web page named entity extraction method | |
CN109635280A (en) | A kind of event extraction method based on mark | |
CN113642330A (en) | Rail transit standard entity identification method based on catalog topic classification | |
CN110825848B (en) | Text classification method based on phrase vectors | |
CN110134954B (en) | Named entity recognition method based on Attention mechanism | |
CN112966525B (en) | Law field event extraction method based on pre-training model and convolutional neural network algorithm | |
CN112749562A (en) | Named entity identification method, device, storage medium and electronic equipment | |
CN111444704B (en) | Network safety keyword extraction method based on deep neural network | |
CN112800184B (en) | Short text comment emotion analysis method based on Target-Aspect-Opinion joint extraction | |
CN113128203A (en) | Attention mechanism-based relationship extraction method, system, equipment and storage medium | |
CN113869053A (en) | Method and system for recognizing named entities oriented to judicial texts | |
CN114818717A (en) | Chinese named entity recognition method and system fusing vocabulary and syntax information | |
CN114239574A (en) | Miner violation knowledge extraction method based on entity and relationship joint learning | |
CN115310448A (en) | Chinese named entity recognition method based on combining bert and word vector | |
CN114091450A (en) | Judicial domain relation extraction method and system based on graph convolution network | |
CN116432645A (en) | Traffic accident named entity recognition method based on pre-training model | |
CN111428501A (en) | Named entity recognition method, recognition system and computer readable storage medium | |
CN112818698A (en) | Fine-grained user comment sentiment analysis method based on dual-channel model | |
CN112989830B (en) | Named entity identification method based on multiple features and machine learning | |
CN117236338B (en) | Named entity recognition model of dense entity text and training method thereof | |
CN114970537B (en) | Cross-border ethnic cultural entity relation extraction method and device based on multi-layer labeling strategy |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |