CN112800764A - Entity extraction method in legal field based on Word2Vec-BiLSTM-CRF model - Google Patents


Info

Publication number
CN112800764A
Authority
CN
China
Prior art keywords
entity
legal field
legal
sentence
word2vec
Legal status
Granted
Application number
CN202011620453.6A
Other languages
Chinese (zh)
Other versions
CN112800764B (en)
Inventor
李参宏
Current Assignee
Jiangsu Netmarch Technologies Co ltd
Original Assignee
Jiangsu Netmarch Technologies Co ltd
Priority date: 2020-12-31
Filing date: 2020-12-31
Publication date: 2021-05-14
Application filed by Jiangsu Netmarch Technologies Co ltd
Priority to CN202011620453.6A
Publication of CN112800764A
Application granted
Publication of CN112800764B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/18 Legal services
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Business, Economics & Management (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Human Resources & Organizations (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Business, Economics & Management (AREA)
  • Strategic Management (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • Technology Law (AREA)
  • Economics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a named entity recognition method for the legal field based on Word2Vec-BiLSTM-CRF, comprising the following steps: acquiring original data in the legal field and preprocessing it to obtain training corpus data; feeding the training corpus data into the Word2Vec algorithm with the CBOW model to obtain word vectors for the legal field; labeling the preprocessed training corpus data by combining template matching with the enumeration-comma (、…等) pattern of Chinese text to obtain a labeled corpus; using Bi-LSTM as the encoding layer of the model, with the labeled corpus and the word vectors as its input, to extract text semantic features; and taking the semantic features produced by the Bi-LSTM layer as the input of a CRF, which outputs the named entity recognition result. The method recognizes the many types of entities in legal documents, enables fine-grained characterization of legal entities and structuring of legal data, and provides a basis for further mining the relationships between different legal entities.

Description

Entity extraction method in legal field based on Word2Vec-BiLSTM-CRF model
Technical Field
The invention relates to the field of named entity recognition, and in particular to an entity extraction method for the legal field based on a Word2Vec-BiLSTM-CRF model.
Background
In the legal field, the named entities involved in case investigation and court trials are numerous and complex. The most common are case-fact elements such as persons (criminal suspects, victims), time, place, motive, and events. The same element can take different characteristics and forms of expression depending on the applicable criminal law provision and charge.
Because the legal field contains a wide variety of entities, each of which may be expressed in different forms, recognizing them with a uniform method is of great significance: it enables fine-grained characterization of legal entities, structuring of legal data, and further mining of the relationships between different entities.
Chinese patent publication No. CN110807084A, published on 18 February 2020, discloses a patent term relationship extraction method based on Bi-LSTM and a keyword strategy with an attention mechanism, comprising: step 1) preprocessing the patent text, identifying term features, adding position information, obtaining category keyword features through an improved TextRank algorithm, and forming a vector matrix; step 2) feeding the vector matrix into a Bi-LSTM model and using an attention mechanism to capture the global features of the text; step 3) selecting the key features of each sentence as local features with a max-pooling layer; step 4) fusing the global and local features; step 5) outputting the classification result with a softmax classifier. That invention targets the long-distance dependency problem of traditional deep learning methods in patent term relationship extraction, and reports that in several comparative experiments it outperforms existing methods and meets the requirements of practical applications.
Compared with legal texts, the named entities in patents are simple and uniform. That method can extract patent terms, but it does not carry over to the legal field, whose named entities are complex; there is no effective method for recognizing entities in the legal field, and existing extraction performs poorly.
Therefore, it is necessary to provide a new extraction method to solve the above problems.
Disclosure of Invention
To solve the problems described in the background, the invention provides a named entity recognition method for the legal field based on Word2Vec-BiLSTM-CRF, which supports mining the relationships between different legal entities.
To achieve this purpose, the invention provides the following technical solution: a named entity recognition method for the legal field based on Word2Vec-BiLSTM-CRF, comprising the following steps:
step A: acquiring original data in the legal field and preprocessing it to obtain training corpus data;
step B: inputting the corpus data obtained in step A into the Word2Vec algorithm with the CBOW model to obtain word vectors for the legal field;
step C: labeling the training corpus data preprocessed in step A by combining template matching with the enumeration-comma pattern of Chinese text (items separated by the enumeration comma 、 and often closed by 等, "etc.") to obtain a labeled corpus, specifically:
step C1: constructing a tag system for the specific entities in the legal field using the BIO scheme, in which a B tag marks the beginning of an entity, an I tag marks a non-initial part of an entity, and an O tag marks a non-entity part;
step C2: constructing an initial entity library for the legal field;
step C3: traversing the training corpus to obtain the set of sentences that match the enumeration-comma pattern;
step C4: using the enumeration-comma pattern to match synonyms and coordinate terms of the entities in the initial entity library, and expanding the entity library with them;
step C5: labeling entities in the training corpus data by template matching against the legal entity library;
step C6: manually reviewing the labeled training corpus obtained in step C5, correcting and supplementing entity labels, and updating the entity library, to finally obtain a correctly labeled training corpus;
step D: using Bi-LSTM as the encoding layer of the model, with the labeled corpus obtained in step C and the word vectors obtained in step B as its input, and outputting text semantic features;
step E: taking the text semantic features produced by the Bi-LSTM layer in step D as the input of the CRF, and finally outputting the named entity recognition result.
In step B, a stop-word list specific to the legal field is constructed, and the training corpus data obtained in step A is segmented and stripped of stop words with the jieba and LTP Chinese word segmentation tools; the Word2Vec algorithm with the CBOW model then converts the semantic information of the vocabulary into n-dimensional word vectors, giving word vectors specific to the legal field.
Compared with the prior art, the Word2Vec-BiLSTM-CRF-based entity extraction method for the legal field has the following beneficial effects: it recognizes the many types of entities in legal documents, enables fine-grained characterization of legal entities and structuring of legal data, and provides a basis for further mining the relationships between different legal entities.
Drawings
FIG. 1 is a schematic flow chart of the Word2Vec-BiLSTM-CRF-based entity extraction method for the legal field according to the invention.
FIG. 2 is a flow chart of obtaining the labeled corpus according to the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention is further explained below with reference to the drawings and the embodiments.
Referring to FIG. 1, the invention provides an entity extraction method for the legal field based on Word2Vec-BiLSTM-CRF, which comprises the following steps:
Step A: obtaining the training corpus data, comprising the following steps:
step A1: acquiring original legal-domain data, including case statements, litigation reports, and judgment documents, from the Internet by combining web crawling with manual screening;
step A2: performing preliminary cleaning and noise reduction on the acquired semi-structured or unstructured multi-source data to obtain usable data.
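Step A2 does not specify the cleaning rules. A minimal sketch of what preliminary cleaning could look like, assuming the crawled pages arrive as raw HTML strings; the helper names `clean_document` and `clean_corpus` and the `min_chars` threshold are hypothetical, not part of the patent:

```python
import html
import re
import unicodedata
from typing import Iterable, List

TAG_RE = re.compile(r"<[^>]+>")  # leftover HTML/XML tags from crawled pages
SPACE_RE = re.compile(r"\s+")    # runs of whitespace (including non-breaking spaces)


def clean_document(raw: str) -> str:
    """Basic noise reduction for one crawled document (illustrative rules only)."""
    text = html.unescape(raw)                   # decode entities such as &nbsp; and &amp;
    text = TAG_RE.sub(" ", text)                # strip markup, keep the text between tags
    text = unicodedata.normalize("NFKC", text)  # full-width letters/digits -> half-width
    return SPACE_RE.sub(" ", text).strip()      # collapse whitespace


def clean_corpus(raw_docs: Iterable[str], min_chars: int = 10) -> List[str]:
    """Clean every document, then drop near-empty documents and exact duplicates."""
    seen = set()
    corpus = []
    for raw in raw_docs:
        doc = clean_document(raw)
        if len(doc) < min_chars or doc in seen:
            continue
        seen.add(doc)
        corpus.append(doc)
    return corpus
```

NFKC also maps full-width punctuation such as ， to its ASCII form while leaving the enumeration comma 、 unchanged; if later steps rely on full-width punctuation, that normalization can be skipped.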
Step B: training word vectors for the legal domain: the corpus data obtained in step A is fed into the Word2Vec algorithm with the CBOW model to obtain word vectors for the legal field, comprising the following steps:
step B1: constructing a stop-word list for the legal field, and performing word segmentation and stop-word removal on the training corpus data with Chinese word segmentation tools such as jieba and LTP;
step B2: obtaining word vectors for the legal field with the Word2Vec algorithm;
step B3: Word2Vec uses the CBOW model to convert semantic information into n-dimensional vectors. The input of the CBOW model is the word vectors of the context words surrounding a target word, and the output is the prediction of that target word, so the learned vectors preserve contextual semantic information well.
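A minimal sketch of the data flow in steps B1 and B3, assuming sentences have already been segmented into word lists (for example by a segmenter such as jieba); the function names, window size, and toy vectors are hypothetical. Real Word2Vec training replaces the full softmax below with negative sampling or hierarchical softmax and learns the vectors by gradient descent, which is not shown:

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

Vector = List[float]


def remove_stop_words(tokens: Sequence[str], stop_words: Set[str]) -> List[str]:
    """Drop stop words and blank tokens from one segmented sentence."""
    return [t for t in tokens if t.strip() and t not in stop_words]


def cbow_pairs(tokens: Sequence[str], window: int = 2) -> List[Tuple[List[str], str]]:
    """Build (context words, target word) training pairs as used by CBOW."""
    pairs = []
    for i, target in enumerate(tokens):
        context = list(tokens[max(0, i - window):i]) + list(tokens[i + 1:i + 1 + window])
        if context:
            pairs.append((context, target))
    return pairs


def cbow_hidden(context: Sequence[str], emb: Dict[str, Vector]) -> Vector:
    """CBOW hidden layer: the average of the context words' input vectors."""
    dim = len(emb[context[0]])
    h = [0.0] * dim
    for w in context:
        for d, v in enumerate(emb[w]):
            h[d] += v / len(context)
    return h


def predict_target(h: Vector, out_emb: Dict[str, Vector]) -> Dict[str, float]:
    """Full softmax over the vocabulary: probability of each word being the target."""
    scores = {w: sum(a * b for a, b in zip(h, v)) for w, v in out_emb.items()}
    m = max(scores.values())  # subtract the max for numerical stability
    z = sum(math.exp(s - m) for s in scores.values())
    return {w: math.exp(s - m) / z for w, s in scores.items()}
```

In practice this step is usually delegated to a library; for example, gensim's `Word2Vec` class selects CBOW with `sg=0` and sets the vector dimension n with `vector_size` (gensim 4.x parameter names).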
Step C: for the training corpus data preprocessed in step A, constructing an initial entity library for the legal field and labeling the data by combining template matching with the enumeration-comma pattern of Chinese text, which effectively reduces manual labeling work, to obtain a labeled corpus; comprising the following steps:
step C1: constructing a tag system for named entities in the legal field, where the named entities cover the types, components, and characteristics of laws; the BIO scheme is adopted, in which a B tag marks the beginning of an entity, an I tag marks a non-initial part of an entity, and an O tag marks a non-entity part;
step C2: manually constructing an initial entity library for the legal field;
step C3: traversing the training corpus to obtain the set of sentences that match the enumeration-comma pattern;
In Chinese text, the enumeration comma (、) is mainly used to list words of the same kind. The method therefore assumes that words joined to an entity by enumeration commas, before or after it, are usually of the same category as that entity or are its synonyms, and can be added to the entity library as new entities. This is called the enumeration-comma pattern (顿号等模式, the "A、B、C等" pattern, where 等 means "etc.").
The pattern is not limited to entities directly joined by an enumeration comma; typical expressions include the following:
[Figure BDA0002876002960000041: typical expressions of the enumeration-comma pattern]
step C4: using the enumeration-comma pattern to find synonyms and coordinate terms of the entities in the initial entity library, and adding them to expand the library;
step C5: labeling entities in the training corpus data by template matching against the legal entity library;
step C6: manually reviewing the labeled training corpus obtained in step C5, correcting and supplementing entity labels, and updating the entity library, to finally obtain a correctly labeled training corpus.
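Steps C2 to C5 can be sketched as follows, assuming segmented sentences for pattern mining and raw sentence strings for character-level BIO labeling. Forward maximum matching is assumed here as the template-matching rule; the entity types (`PER`, `GOODS`), the `NON_ITEMS` punctuation set, and all function names are hypothetical. The library is only expanded when the known members of an enumeration agree on one type, and the manual review of step C6 is not modeled:

```python
from typing import Dict, List, Sequence, Tuple

ENUM_COMMA = "、"
# Tokens that can never be an enumeration item (hypothetical; extend as needed).
NON_ITEMS = {"、", "，", "。", "；", "：", "等"}


def enumeration_groups(tokens: Sequence[str]) -> List[List[str]]:
    """Find runs like A 、 B 、 C (等) in a segmented sentence; return [A, B, C]."""
    groups, i = [], 0
    while i < len(tokens):
        group, j = [tokens[i]], i + 1
        while j + 1 < len(tokens) and tokens[j] == ENUM_COMMA and tokens[j + 1] not in NON_ITEMS:
            group.append(tokens[j + 1])
            j += 2
        if len(group) > 1 and tokens[i] not in NON_ITEMS:
            groups.append(group)
            i = j
        else:
            i += 1
    return groups


def expand_library(library: Dict[str, str], sentences: Sequence[Sequence[str]]) -> Dict[str, str]:
    """Add enumeration siblings of known entities to the library with the same type."""
    expanded = dict(library)
    for tokens in sentences:
        for group in enumeration_groups(tokens):
            types = {expanded[w] for w in group if w in expanded}
            if len(types) == 1:  # only expand when the type is unambiguous
                etype = types.pop()
                for w in group:
                    expanded.setdefault(w, etype)
    return expanded


def bio_label(sentence: str, library: Dict[str, str]) -> List[Tuple[str, str]]:
    """Character-level BIO tags via forward maximum matching against the library."""
    max_len = max((len(e) for e in library), default=0)
    tags = ["O"] * len(sentence)
    i = 0
    while i < len(sentence):
        for n in range(min(max_len, len(sentence) - i), 0, -1):
            etype = library.get(sentence[i:i + n])
            if etype:
                tags[i] = "B-" + etype
                for k in range(i + 1, i + n):
                    tags[k] = "I-" + etype
                i += n
                break
        else:  # no library entity starts at position i
            i += 1
    return list(zip(sentence, tags))
```

For example, with 手机 already labeled `GOODS`, the segmented sentence 张三 / 盗窃 / 手机 / 、 / 钱包 / 等 / 财物 adds 钱包 to the library as `GOODS`, after which 张三盗窃钱包 is tagged B-PER, I-PER, O, O, B-GOODS, I-GOODS.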
Step D: a Bi-LSTM model is used as the encoding layer, with $X=(x_1,x_2,x_3,\dots,x_n)$ as its input, where $x_i$ is the legal-domain word vector (trained in step B) of the i-th word of the training corpus labeled in step C. Each LSTM direction computes the following, and the outputs of the two directions are concatenated:
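$$f_t=\sigma(W_f\cdot[h_{t-1},x_t]+b_f)$$
$$i_t=\sigma(W_i\cdot[h_{t-1},x_t]+b_i)$$
$$\tilde{C}_t=\tanh(W_C\cdot[h_{t-1},x_t]+b_C)$$
$$C_t=f_t*C_{t-1}+i_t*\tilde{C}_t$$
$$o_t=\sigma(W_o\cdot[h_{t-1},x_t]+b_o)$$
$$h_t=o_t*\tanh(C_t)$$
$$\{h_0,h_1,\dots,h_n\}=\{[h_{L0},h_{Rn}],[h_{L1},h_{R(n-1)}],\dots,[h_{Ln},h_{R0}]\}$$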
Bi-LSTM makes effective use of past features (through the forward states) and future features (through the backward states) within a given time span; the bidirectional LSTM network is trained with backpropagation through time.
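A minimal pure-Python sketch of the forward pass described by the LSTM equations in step D, assuming small random weights, zero initial states, and the standard candidate-cell and cell-update equations; the class and function names are hypothetical, and training by backpropagation through time is not shown. The backward direction runs over the reversed sequence and its outputs are re-aligned, so position t concatenates the forward state after $x_1..x_t$ with the backward state after $x_n..x_t$:

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W: Matrix, v: Vector, b: Vector) -> Vector:
    """Compute W·v + b for a row-major matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


class LSTMCell:
    """One LSTM direction; every gate has weights of shape hidden x (hidden + input)."""

    def __init__(self, input_dim: int, hidden_dim: int, seed: int = 0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        # f: forget gate, i: input gate, c: candidate cell state, o: output gate
        self.W = {g: [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim + input_dim)]
                      for _ in range(hidden_dim)] for g in "fico"}
        self.b = {g: [0.0] * hidden_dim for g in "fico"}

    def step(self, x: Vector, h_prev: Vector, c_prev: Vector) -> Tuple[Vector, Vector]:
        hx = h_prev + x  # list concatenation = [h_{t-1}, x_t]
        f = [sigmoid(z) for z in affine(self.W["f"], hx, self.b["f"])]
        i = [sigmoid(z) for z in affine(self.W["i"], hx, self.b["i"])]
        c_tilde = [math.tanh(z) for z in affine(self.W["c"], hx, self.b["c"])]
        o = [sigmoid(z) for z in affine(self.W["o"], hx, self.b["o"])]
        c = [ft * cp + it * ct for ft, cp, it, ct in zip(f, c_prev, i, c_tilde)]
        h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
        return h, c

    def run(self, xs: List[Vector]) -> List[Vector]:
        h, c = [0.0] * self.hidden_dim, [0.0] * self.hidden_dim
        outputs = []
        for x in xs:
            h, c = self.step(x, h, c)
            outputs.append(h)
        return outputs


def bilstm(xs: List[Vector], fwd: LSTMCell, bwd: LSTMCell) -> List[Vector]:
    """Concatenate forward and (re-aligned) backward hidden states per position."""
    h_fwd = fwd.run(xs)
    h_bwd = bwd.run(xs[::-1])[::-1]
    return [hf + hb for hf, hb in zip(h_fwd, h_bwd)]
```

Each output vector has twice the hidden size; in the model these vectors feed the CRF layer of step E.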
Step E: the feature vectors produced by the Bi-LSTM layer are fed into the CRF layer to obtain a score for each candidate tag of each word.
The CRF layer makes effective use of sentence-level tag information: it imposes constraints between adjacent tags so that the final predicted sequence is valid (for example, under the BIO scheme an I tag cannot directly follow an O tag), and these constraints are learned automatically by the CRF layer from the training data. Specifically:
the sentence in which entities are to be recognized is written as follows, where $x_i$ denotes the i-th word of the sentence:
$$X=(x_1,x_2,\dots,x_n)$$
and the corresponding tag sequence of the sentence is:
$$Y=(y_1,y_2,\dots,y_n)$$
The score of a candidate tag sequence y for the sentence is defined as:
$$s(X,y)=\sum_{i=0}^{n}A_{y_i,y_{i+1}}+\sum_{i=1}^{n}P_{i,y_i}$$
where A is the transition score matrix and $A_{i,j}$ is the score of a transition from tag i to tag j; $y_0$ and $y_{n+1}$ are the start and end tags added to the sentence, so A has dimension $(k+2)\times(k+2)$, where k is the number of tags. P is the score matrix output by the Bi-LSTM network, with dimension $n\times k$, and $P_{i,j}$ is the score of the j-th tag for the i-th word of the sentence.
The goal is the tag sequence that maximizes this scoring function.
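A minimal sketch of the scoring function, assuming tags are indexed 0..k-1 and the two extra rows and columns of A (indices k and k+1) stand for the start tag $y_0$ and the end tag $y_{n+1}$; the function name, the two-tag set, and the toy numbers are hypothetical:

```python
from typing import Sequence


def crf_score(P: Sequence[Sequence[float]], A: Sequence[Sequence[float]],
              y: Sequence[int]) -> float:
    """s(X, y) = sum_{i=0..n} A[y_i][y_{i+1}] + sum_{i=1..n} P[i][y_i].

    P: n x k emission scores from the Bi-LSTM (row i-1 holds word i).
    A: (k+2) x (k+2) transition scores; index k = start tag, k+1 = end tag.
    """
    k = len(P[0])
    path = [k] + list(y) + [k + 1]  # y_0 = start, y_{n+1} = end
    transition = sum(A[path[i]][path[i + 1]] for i in range(len(path) - 1))
    emission = sum(P[i][tag] for i, tag in enumerate(y))
    return transition + emission


k = 2  # tag 0 = O, tag 1 = B-PER (hypothetical tag set)
P = [[1.0, 2.0], [0.5, 0.25]]
A = [[0.0] * (k + 2) for _ in range(k + 2)]
A[2][1] = 0.5   # start -> B-PER
A[1][0] = 0.25  # B-PER -> O
A[0][3] = 0.25  # O -> end
print(crf_score(P, A, [1, 0]))  # 3.5
```

Here the transitions contribute 0.5 + 0.25 + 0.25 = 1.0 and the emissions 2.0 + 0.5 = 2.5.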
For a given sentence X, the probability of tag sequence y is obtained by normalizing over all possible tag sequences:
$$p(y\mid X)=\frac{e^{s(X,y)}}{\sum_{\tilde{y}\in Y_X}e^{s(X,\tilde{y})}}$$
where $Y_X$ denotes all possible tag sequences for sentence X. Every candidate sequence thus has a score and a probability, and training maximizes the probability of the true tag sequence of the sentence.
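The denominator sums over all $k^n$ tag sequences in $Y_X$, which cannot be enumerated for real sentences. Linear-chain CRF implementations normally compute it with the forward algorithm in $O(n\cdot k^2)$ time, working in log space. A self-contained sketch, assuming tags 0..k-1 with indices k and k+1 of A for the start and end tags; the function names are hypothetical, and the brute-force version is only a check for tiny inputs:

```python
import itertools
import math
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def logsumexp(values: List[float]) -> float:
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def crf_score(P: Matrix, A: Matrix, y: Sequence[int]) -> float:
    k = len(P[0])
    path = [k] + list(y) + [k + 1]  # add start (k) and end (k+1) tags
    return (sum(A[path[i]][path[i + 1]] for i in range(len(path) - 1))
            + sum(P[i][t] for i, t in enumerate(y)))


def log_partition(P: Matrix, A: Matrix) -> float:
    """log of the sum over all tag sequences of exp(s(X, y~)), via the forward algorithm."""
    k = len(P[0])
    start, end = k, k + 1
    # alpha[j]: log-sum of the scores of all prefixes ending in tag j at the current word
    alpha = [A[start][j] + P[0][j] for j in range(k)]
    for i in range(1, len(P)):
        alpha = [logsumexp([alpha[p] + A[p][j] for p in range(k)]) + P[i][j]
                 for j in range(k)]
    return logsumexp([alpha[j] + A[j][end] for j in range(k)])


def sequence_probability(P: Matrix, A: Matrix, y: Sequence[int]) -> float:
    """p(y | X) = exp(s(X, y)) / sum over y~ of exp(s(X, y~))."""
    return math.exp(crf_score(P, A, y) - log_partition(P, A))


def brute_force_log_partition(P: Matrix, A: Matrix) -> float:
    """Enumerate every tag sequence explicitly (only usable for tiny n and k)."""
    k, n = len(P[0]), len(P)
    return logsumexp([crf_score(P, A, y) for y in itertools.product(range(k), repeat=n)])
```

Working in log space with `logsumexp` avoids the overflow that summing raw exponentials of large scores would cause.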
During training, a loss function is defined and minimized, namely the negative log-likelihood of the true tag sequence:
$$\mathcal{L}=-\log p(y\mid X)=\log\sum_{\tilde{y}\in Y_X}e^{s(X,\tilde{y})}-s(X,y)$$
Expressed as a log-likelihood, minimizing this loss is equivalent to maximizing:
$$\log p(y\mid X)=s(X,y)-\log\sum_{\tilde{y}\in Y_X}e^{s(X,\tilde{y})}$$
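A self-contained sketch of the training loss and of decoding, assuming tags 0..k-1 with indices k and k+1 of A for the start and end tags. The loss uses the forward algorithm; decoding, which picks the sequence that maximizes s(X, y), is shown with the Viterbi algorithm, the usual choice for linear-chain CRFs, although the text does not name a decoder. Function names are hypothetical, and `brute_force_best` exists only to check results on tiny inputs:

```python
import itertools
import math
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def logsumexp(values: List[float]) -> float:
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def crf_score(P: Matrix, A: Matrix, y: Sequence[int]) -> float:
    k = len(P[0])
    path = [k] + list(y) + [k + 1]  # start tag k, end tag k+1
    return (sum(A[path[i]][path[i + 1]] for i in range(len(path) - 1))
            + sum(P[i][t] for i, t in enumerate(y)))


def log_partition(P: Matrix, A: Matrix) -> float:
    k = len(P[0])
    alpha = [A[k][j] + P[0][j] for j in range(k)]
    for i in range(1, len(P)):
        alpha = [logsumexp([alpha[p] + A[p][j] for p in range(k)]) + P[i][j]
                 for j in range(k)]
    return logsumexp([alpha[j] + A[j][k + 1] for j in range(k)])


def nll_loss(P: Matrix, A: Matrix, y: Sequence[int]) -> float:
    """-log p(y | X) = log sum_{y~} exp(s(X, y~)) - s(X, y); minimized in training."""
    return log_partition(P, A) - crf_score(P, A, y)


def viterbi(P: Matrix, A: Matrix) -> Tuple[List[int], float]:
    """Return the tag sequence with the highest score s(X, y), and that score."""
    k = len(P[0])
    delta = [A[k][j] + P[0][j] for j in range(k)]  # best prefix score ending in tag j
    backpointers = []
    for i in range(1, len(P)):
        best_prev = [max(range(k), key=lambda p, j=j: delta[p] + A[p][j]) for j in range(k)]
        delta = [delta[best_prev[j]] + A[best_prev[j]][j] + P[i][j] for j in range(k)]
        backpointers.append(best_prev)
    last = max(range(k), key=lambda j: delta[j] + A[j][k + 1])
    best_score = delta[last] + A[last][k + 1]
    path = [last]
    for best_prev in reversed(backpointers):  # follow pointers back to the first word
        path.append(best_prev[path[-1]])
    return path[::-1], best_score


def brute_force_best(P: Matrix, A: Matrix) -> Tuple[int, ...]:
    """Exhaustive argmax over all tag sequences (only for tiny n and k)."""
    return max(itertools.product(range(len(P[0])), repeat=len(P)),
               key=lambda y: crf_score(P, A, y))
```

Viterbi replaces the sum in the forward algorithm with a max and keeps back-pointers, so decoding also runs in $O(n\cdot k^2)$ time.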
Finally, the recognized named entities in the case facts, such as persons, motives, and events, are output. In this way the many types of entities in legal documents are recognized, enabling fine-grained characterization of legal entities, structuring of legal data, and further mining of the relationships between different legal entities.
The above description is only a preferred embodiment of the present invention, and all equivalent changes and modifications made in accordance with the claims of the present invention should be covered by the present invention.

Claims (4)

1. A named entity recognition method for the legal field based on Word2Vec-BiLSTM-CRF, characterized by comprising the following steps:
step A: acquiring original data in the legal field and preprocessing it to obtain training corpus data;
step B: inputting the corpus data obtained in step A into the Word2Vec algorithm with the CBOW model to obtain word vectors for the legal field;
step C: labeling the training corpus data preprocessed in step A by combining template matching with the enumeration-comma (、…等) pattern of Chinese text to obtain a labeled corpus, specifically:
step C1: constructing a tag system for the specific entities in the legal field using the BIO scheme, in which a B tag marks the beginning of an entity, an I tag marks a non-initial part of an entity, and an O tag marks a non-entity part;
step C2: constructing an initial entity library for the legal field;
step C3: traversing the training corpus to obtain the set of sentences that match the enumeration-comma pattern;
step C4: using the enumeration-comma pattern to match synonyms and coordinate terms of the entities in the initial entity library, and expanding the entity library with them;
step C5: labeling entities in the training corpus data by template matching against the legal entity library;
step C6: manually reviewing the labeled training corpus obtained in step C5, correcting and supplementing entity labels, and updating the entity library, to finally obtain a correctly labeled training corpus;
step D: using Bi-LSTM as the encoding layer of the model, with the labeled corpus obtained in step C and the word vectors obtained in step B as its input, and outputting text semantic features;
step E: taking the text semantic features produced by the Bi-LSTM layer in step D as the input of the CRF, and finally outputting the named entity recognition result.
2. The named entity recognition method for the legal field based on Word2Vec-BiLSTM-CRF according to claim 1, wherein step B specifically comprises:
step B1: constructing a stop-word list specific to the legal field, and performing word segmentation and stop-word removal on the training corpus data obtained in step A with the jieba and LTP Chinese word segmentation tools;
step B2: converting the semantic information of the vocabulary into n-dimensional word vectors with the Word2Vec algorithm and the CBOW model, to obtain word vectors specific to the legal field.
3. The named entity recognition method for the legal field based on Word2Vec-BiLSTM-CRF according to claim 1, wherein in step E:
the sentence in which entities are to be recognized is expressed as follows, where $x_i$ denotes the i-th word of the sentence:
$$X=(x_1,x_2,\dots,x_n)$$
the corresponding tag sequence of the sentence is:
$$Y=(y_1,y_2,\dots,y_n)$$
the scoring function of the sentence, whose maximum is sought, is:
$$s(X,y)=\sum_{i=0}^{n}A_{y_i,y_{i+1}}+\sum_{i=1}^{n}P_{i,y_i}$$
where A is the transition score matrix and $A_{i,j}$ is the score of a transition from tag i to tag j; $y_0$ and $y_{n+1}$ are the start and end tags of the sentence, so A has dimension $(k+2)\times(k+2)$, where k is the number of tags; P is the score matrix output by the Bi-LSTM network, with dimension $n\times k$, and $P_{i,j}$ is the score of the j-th tag for the i-th word of the sentence;
for a given sentence X, the probability of a tag sequence y, which is maximized for the true sequence, is:
$$p(y\mid X)=\frac{e^{s(X,y)}}{\sum_{\tilde{y}\in Y_X}e^{s(X,\tilde{y})}}$$
where $Y_X$ denotes all possible tag sequences for sentence X.
4. The named entity recognition method for the legal field based on Word2Vec-BiLSTM-CRF according to claim 3, wherein a loss function is provided and minimized, transformed into the following equation:
$$-\log p(y\mid X)=\log\sum_{\tilde{y}\in Y_X}e^{s(X,\tilde{y})}-s(X,y)$$
CN202011620453.6A 2020-12-31 2020-12-31 Entity extraction method in legal field based on Word2Vec-BiLSTM-CRF model Active CN112800764B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011620453.6A CN112800764B (en) 2020-12-31 2020-12-31 Entity extraction method in legal field based on Word2Vec-BiLSTM-CRF model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011620453.6A CN112800764B (en) 2020-12-31 2020-12-31 Entity extraction method in legal field based on Word2Vec-BiLSTM-CRF model

Publications (2)

Publication Number Publication Date
CN112800764A 2021-05-14
CN112800764B 2023-07-04

Family

ID=75804975

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011620453.6A Active CN112800764B (en) 2020-12-31 2020-12-31 Entity extraction method in legal field based on Word2Vec-BiLSTM-CRF model

Country Status (1)

Country Link
CN (1) CN112800764B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019113122A1 (en) * 2017-12-04 2019-06-13 Conversica, Inc. Systems and methods for improved machine learning for conversations
CN110633409A (en) * 2018-06-20 2019-12-31 上海财经大学 Rule and deep learning fused automobile news event extraction method
CN110990525A (en) * 2019-11-15 2020-04-10 华融融通(北京)科技有限公司 Natural language processing-based public opinion information extraction and knowledge base generation method
CN111444726A (en) * 2020-03-27 2020-07-24 河海大学常州校区 Method and device for extracting Chinese semantic information of long-time and short-time memory network based on bidirectional lattice structure

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
翟社平: "基于BILSTM_CRF的知识图谱实体抽取方法", 《计算机应用与软件》, vol. 36, no. 5, pages 269 - 274 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113312918A (en) * 2021-06-10 2021-08-27 临沂大学 Word segmentation and capsule network law named entity identification method fusing radical vectors
CN113312918B (en) * 2021-06-10 2022-05-17 临沂大学 Word segmentation and capsule network law named entity identification method fusing radical vectors
CN113377916A (en) * 2021-06-22 2021-09-10 哈尔滨工业大学 Extraction method of main relations in multiple relations facing legal text
CN114048748A (en) * 2021-11-17 2022-02-15 上海勃池信息技术有限公司 Named entity recognition system, method, electronic device, and medium
CN114048748B (en) * 2021-11-17 2024-04-05 上海勃池信息技术有限公司 Named entity recognition system, named entity recognition method, named entity recognition electronic equipment and named entity recognition medium
CN114330349A (en) * 2022-01-05 2022-04-12 北京航空航天大学 Specific field named entity recognition method
CN115270780A (en) * 2022-07-20 2022-11-01 北京新纽科技有限公司 Method for recognizing terms

Also Published As

Publication number Publication date
CN112800764B (en) 2023-07-04

Similar Documents

Publication Publication Date Title
CN112800764B (en) Entity extraction method in legal field based on Word2Vec-BiLSTM-CRF model
CN110866401A (en) Chinese electronic medical record named entity identification method and system based on attention mechanism
CN111897908A (en) Event extraction method and system fusing dependency information and pre-training language model
CN111694924A (en) Event extraction method and system
CN109753660B (en) LSTM-based winning bid web page named entity extraction method
CN109635280A (en) A kind of event extraction method based on mark
CN113642330A (en) Rail transit standard entity identification method based on catalog topic classification
CN110825848B (en) Text classification method based on phrase vectors
CN110134954B (en) Named entity recognition method based on Attention mechanism
CN112966525B (en) Law field event extraction method based on pre-training model and convolutional neural network algorithm
CN112749562A (en) Named entity identification method, device, storage medium and electronic equipment
CN111444704B (en) Network safety keyword extraction method based on deep neural network
CN112800184B (en) Short text comment emotion analysis method based on Target-Aspect-Opinion joint extraction
CN113128203A (en) Attention mechanism-based relationship extraction method, system, equipment and storage medium
CN113869053A (en) Method and system for recognizing named entities oriented to judicial texts
CN114818717A (en) Chinese named entity recognition method and system fusing vocabulary and syntax information
CN114239574A (en) Miner violation knowledge extraction method based on entity and relationship joint learning
CN115310448A (en) Chinese named entity recognition method based on combining bert and word vector
CN114091450A (en) Judicial domain relation extraction method and system based on graph convolution network
CN116432645A (en) Traffic accident named entity recognition method based on pre-training model
CN111428501A (en) Named entity recognition method, recognition system and computer readable storage medium
CN112818698A (en) Fine-grained user comment sentiment analysis method based on dual-channel model
CN112989830B (en) Named entity identification method based on multiple features and machine learning
CN117236338B (en) Named entity recognition model of dense entity text and training method thereof
CN114970537B (en) Cross-border ethnic cultural entity relation extraction method and device based on multi-layer labeling strategy

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant