CN115130475A - Extensible universal end-to-end named entity identification method - Google Patents

Extensible universal end-to-end named entity identification method

Info

Publication number
CN115130475A
Authority
CN
China
Prior art keywords
entity
text
model
word
category
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210617397.3A
Other languages
Chinese (zh)
Inventor
李祥学
李轩舟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Normal University
Original Assignee
East China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Normal University filed Critical East China Normal University
Priority to CN202210617397.3A priority Critical patent/CN115130475A/en
Publication of CN115130475A publication Critical patent/CN115130475A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/166Editing, e.g. inserting or deleting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Abstract

The invention discloses an extensible universal end-to-end named entity recognition method comprising a text preprocessing process, a model M, a training process, and a prediction and entity parsing process using model M, where model M consists of an input layer, a context coding layer and a graph modeling layer. Text preprocessing receives a text input and an entity category and generates an input sequence. Training the model comprises acquiring a data set, converting it into a training set, and training the model on that set for multiple rounds. After training, the input sequence produced by text preprocessing is fed into model M, the graph modeling layer of model M computes the connection relations between words, and finally the entities identified in the graph are parsed out. The method handles recognition under entity overlap and entity discontinuity, and adapts to requirement changes such as newly added entity categories without modifying the model structure, making it easy to extend and to transfer across domains.

Description

Extensible universal end-to-end named entity identification method
Technical Field
The invention relates to the technical field of natural language processing, in particular to an extensible universal end-to-end named entity identification method.
Background
Named Entity Recognition (NER) is an important component of natural language processing. It refers to the process of identifying, in text, the names or symbols of things with specific meanings; named entities mainly include names of people, places and organizations, dates, proper nouns, and the like. Many downstream NLP tasks and applications rely on NER for information extraction, such as question answering, relation extraction, event extraction, and entity linking. The more accurately the named entities in a text are recognized, the better a computer can understand the semantics of the language and execute its tasks, improving the human-computer interaction experience.
The Transformer was proposed in the 2017 Google Brain paper "Attention Is All You Need". It is built from attention mechanisms and fully connected layers; unlike the serial computation structure of recurrent neural networks, it can fully exploit parallel computation.
Deep-neural-network-based named entity recognition methods generally treat NER as a multi-class classification task or a sequence labeling task, and can be divided into an input representation layer, a context coding layer, and a label decoding process. Input representation can be character-level, word-level, or mixed, depending on the coding object, and yields a vector representation of each word. Semantic coding generally applies a deep neural network, such as a bidirectional long short-term memory (BiLSTM) network, Bidirectional Encoder Representations from Transformers (BERT), and the like, so that the vector of each word in the text contains contextual information. Label decoding is done by a classifier, which usually uses a fully connected neural network with a Softmax layer, or a conditional random field with the Viterbi algorithm, to derive the label of each word.
Sequence-labeling named entity recognition models mostly use a CRF as the label decoding layer, performing global optimization by adding a label transition score matrix and sequence-level prediction scores; however, when the number of labels is large, CRF performance drops noticeably and its time complexity is high. Span-based recognition methods appeared later: computing the start and end positions of entities can handle recognition under entity overlap, but cannot identify discontinuous entities.
In practice, named entity recognition scenarios are often accompanied by corpus shortage and changing requirements; for example, a requirement change may demand one more entity category. One possible approach is to initialize the model's weights with the trained weights of a model from a similar scenario task, reusing the knowledge that model has learned. However, most named entity recognition models have a final classification layer specially designed for one application scenario, and different numbers of entity categories lead to different output dimensions of that layer. Because of the difference in task scenarios, the output dimensions of the two models' last layers will most likely differ, so the last layer must be discarded and only the learned weights of the earlier layers used as the model's training starting point.
Disclosure of Invention
In view of the above problems, an object of the present invention is to provide an extensible universal end-to-end named entity recognition method that can be applied, without modifying the model, to task scenarios with different entity categories or with changing requirements, and is therefore very easy to extend and to migrate to other fields. After the model has been trained on another task scenario, training on the next scenario's task can proceed directly, without discarding a scenario-dependent final classification layer, so the knowledge the model learned from other tasks is retained. When requirements change, for example when several entity categories are newly added, the model need not be modified and retrained; only training data for the newly added categories must be provided.
The method is suitable for recognizing discontinuous entities as well as overlapping entities.
The specific technical scheme for realizing the purpose of the invention is as follows:
an extensible universal end-to-end named entity recognition method comprises the following specific steps:
step 1: the text preprocessing process generates an input sequence, and specifically comprises the following steps:
receiving a text input and an entity category, adding a symbol at the head and at the tail of the text respectively, and appending the entity category after the tail symbol;
segmenting the text, with the head/tail symbols and entity category added, to obtain a word sequence;
mapping the word sequence to numbers, where the mapping between numbers and words is one-to-one and satisfies a bijective relation, and outputting the resulting number sequence as the input sequence;
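The preprocessing steps above can be sketched as follows (a minimal sketch of ours, not the patent's reference code; the symbol names [S]/[E] follow the embodiment, and each character is treated as one word as in the embodiment):

```python
def preprocess(text, category, vocab):
    """Add head/tail symbols, append the entity category, tokenize
    (one character = one word), and map words to numbers through a
    bijective word<->id dictionary that grows on first sight."""
    tokens = ["[S]"] + list(text) + ["[E]"] + list(category)
    ids = []
    for tok in tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)  # assign the next unused id
        ids.append(vocab[tok])
    return ids, tokens
```

Because each distinct word receives exactly one id and each id maps back to one word, the mapping is a bijection, as step 1 requires.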
step 2: constructing a model M, comprising:
using a context coding layer to receive the input sequence produced by the text preprocessing process, generating a group of word vectors via a self-attention mechanism, and discarding the word vectors corresponding to the entity category name;
modeling the directed connection relation between words with a directed connection graph: computing the directed connection graph between the words from the word vector group, representing the graph as a matrix, and outputting the matrix-represented graph;
Step 3: training the model M;
Step 4: predicting with the model M;
Step 5: performing entity parsing on the model output of step 4, specifically:
receiving the graph output by model M and traversing it from the head symbol; excluding the path where the head symbol connects directly to the tail symbol, every path that starts at the head symbol and ends at the tail symbol corresponds to one entity of the queried category, whose words are given by the path's node sequence in order; the set of parsed entities is output.
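The path traversal of step 5 can be sketched as a depth-first search over a thresholded adjacency matrix (our own minimal sketch; the 0.5 threshold is an assumption, not stated in the patent):

```python
def parse_entities(adj, tokens, threshold=0.5):
    """Enumerate every path from the head symbol (index 0) to the
    tail symbol (last index); the intermediate words of each path
    form one recognized entity. The direct head->tail edge is skipped."""
    n = len(tokens)
    end = n - 1
    entities = []

    def dfs(node, path):
        for nxt in range(node + 1, n):
            if adj[node][nxt] >= threshold:
                if nxt == end:
                    if path:  # ignore the direct [S] -> [E] path
                        entities.append("".join(tokens[i] for i in path))
                else:
                    dfs(nxt, path + [nxt])

    dfs(0, [])
    return entities
```

Because an edge may skip over positions, discontinuous entities appear simply as paths with gaps, and overlapping entities share a common path prefix or suffix.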
Step 2, modeling the directed connection relation between the entity words by using the directed connection graph, and calculating the directed connection graph between each word by using the word vector group, wherein the method specifically comprises the following steps:
if a word is the beginning of an entity, establishing a directed edge from the sentence-start symbol to that word;
if, within an entity, a word B follows a word A, establishing a directed edge from word A to word B;
if a word is the end of an entity, establishing a directed edge from that word to the tail symbol;
the words except the head and tail symbols are called intermediate words, and the corresponding word vectors are intermediate word vectors;
calculating the connection relation between the start symbol and each intermediate word from the head-symbol word vector and the intermediate word's vector, representing the probability that the intermediate word begins an entity;
calculating the connection relation between any two intermediate words from their word vectors;
calculating the connection relation between each intermediate word and the sentence-end symbol from the tail-symbol word vector and the intermediate word's vector, representing the probability that the intermediate word ends an entity;
and after the calculation is finished, obtaining a directed connection graph represented by a matrix between the words.
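The edge rules above also define how a ground-truth adjacency matrix is built from labeled entities. A sketch (the helper and its position-list representation are our own assumptions; positions index tokens after the head symbol has been prepended at index 0):

```python
def gold_adjacency(n, entity_positions):
    """Build a 0/1 adjacency target. n counts all tokens including
    the head symbol (index 0) and tail symbol (index n-1). Each
    entity is a list of word positions, possibly non-contiguous."""
    E = [[0] * n for _ in range(n)]
    head, tail = 0, n - 1
    for pos in entity_positions:
        E[head][pos[0]] = 1              # head symbol -> first word
        for a, b in zip(pos, pos[1:]):   # word -> next word in entity
            E[a][b] = 1
        E[pos[-1]][tail] = 1             # last word -> tail symbol
    return E
```

With the embodiment's worked example (entities at positions 4-5 and 12-13 in an 18-token sequence), this produces exactly the six 1-entries described later in the text.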
The training model M specifically comprises the following steps:
acquiring a labeled data set, wherein each piece of data in the data set comprises a text t and a label y, all entity category names contained in the text and a corresponding entity set are recorded in the label y, and if the text t does not contain any type of entity, the label y is empty;
converting the data set to a training set:
defining the set S of all entity category names appearing in the data set, and letting S contain n entity category names; for each piece of data (t, y) in the data set, where t is the text and y is the label, and for each category s in S: if label y contains entities of category s, that is, text t contains a non-empty entity set e belonging to category s, then category s together with entity set e forms a label y', and the text t with label y' becomes one piece of training data; if text t contains no entity of category s, then category s together with an empty entity set e' forms a label y', and the text t with label y' becomes one piece of training data;
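The conversion rule can be sketched as follows (a minimal sketch under a data representation of our own choosing, where a label is a dict mapping category names to entity lists):

```python
def to_training_set(dataset, categories):
    """Expand each (text, label) pair into one training sample per
    category: (text, category, entity set). The entity set is empty
    when the text contains no entity of that category."""
    train = []
    for text, label in dataset:
        for s in categories:
            train.append((text, s, label.get(s, [])))
    return train
```

Note that every text yields one sample per category, so the model also sees explicit negative evidence (category present in the input, empty entity set as target).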
performing multiple rounds of training on the model M using the training set, each round of training comprising:
dividing the data of the training set into a plurality of batches, extracting a batch of data from the training set each time, and generating a true value of an adjacency matrix of the batch of data by using an entity set in a label for each extracted batch of data;
for each piece of data, processing the entity categories in the text and the label in the piece of data by using the text preprocessing process in the step 1 to generate an input sequence;
inputting the input sequence into a model, calculating the connection relation among all words including symbols by the model, and outputting an adjacency matrix;
and finally, calculating loss by using a matrix predicted by the model and a true value matrix generated by the label, and updating the weight of the model according to the loss.
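The loss of the last step is, per the embodiment, binary cross-entropy between the predicted and true adjacency matrices. A numpy sketch; restricting the loss to the strictly upper-triangular entries is our reading of the embodiment's statement that only the part above the main diagonal is used:

```python
import numpy as np

def adjacency_bce(pred, target, eps=1e-7):
    """Binary cross-entropy over the strictly upper-triangular
    entries of the predicted (probabilities in (0,1)) and true
    (0/1) adjacency matrices."""
    iu = np.triu_indices(pred.shape[0], k=1)
    p = np.clip(pred[iu], eps, 1 - eps)
    t = target[iu]
    return float(np.mean(-(t * np.log(p) + (1 - t) * np.log(1 - p))))
```

A near-perfect prediction yields a loss close to 0, while an inverted prediction is heavily penalized.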
The prediction by using the model M specifically comprises the following steps:
inputting a piece of text from which entities are to be extracted; the input includes no labels or other information;
selecting an entity category, i.e. the category of entities to search for in the text;
inputting the text and the entity type into the text preprocessing process in the step 1 to obtain an input sequence;
the input sequence is input into a model M, which outputs a graph of the adjacency matrix representation.
Compared with the prior art, the invention provides an extensible universal end-to-end named entity recognition method that suits different task scenarios without modifying the model, and therefore migrates easily to other fields. After the model has been trained on other task scenarios, no scenario-dependent final classification layer needs to be discarded and no modification of the model is needed; it can be trained directly on the next scenario's task, so the knowledge learned from other tasks is reused. When requirements change, for example when several entity categories are newly added, a conventional model must have the output of its final classification layer modified and be retrained, whereas the model of this method needs no modification and only requires training data for the newly added categories. The method models the connection relations between entity words with a directed graph, and is suitable for recognizing discontinuous entities as well as overlapping entities.
Drawings
FIG. 1 is a block diagram of a named entity recognition model according to an embodiment of the present invention;
FIG. 2 is a text pre-processing process flow diagram of an embodiment of the invention;
FIG. 3 is a flow chart of an embodiment of the present invention for converting a data set to a training set;
FIG. 4 is a training flow diagram of a method of named entity recognition in accordance with an embodiment of the present invention;
FIG. 5 is a flow diagram of a method of named entity identification, according to an embodiment of the invention.
Detailed Description
Examples
The method of the present invention is illustrated below. For simplicity, the word segmenter treats each Chinese character as a word, and [S] and [E] are used as the head and tail characters:
the embodiment provides an extensible universal end-to-end named entity identification method, which comprises the following steps:
(1) The text preprocessing program receives two inputs, an input text and an entity category, as shown in FIG. 2. The symbols [S] and [E] are added at the beginning and end of the text respectively; the entity category serves as a prompt telling the model which category of entities to extract from the input text. The text with head/tail symbols added, together with the entity category, is converted into a word sequence through word segmentation and dictionary mapping; the words are mapped to numbers one-to-one, satisfying a bijective relation, and the resulting number sequence is used as the input sequence;
(2) Build the model. As shown in FIG. 1, the model is divided into an input layer, a context coding layer and a graph modeling layer; in this embodiment BERT serves as the context coding layer. The input sequence produced by the text preprocessing program is fed into the context coding layer; each word vector in the word vector group output by the BERT layer is 768-dimensional and is reduced to 64 dimensions through a fully connected layer. Finally, the connection relations between words are computed from the word vector group, with a Sigmoid function as the activation function of the last layer, modeling a directed connection graph represented by an adjacency matrix; the main diagonal and the part below it are set to 0, and only the part above the main diagonal is used;
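The graph modeling computation of this step can be sketched in numpy, with random vectors standing in for the BERT output. The 768-to-64 projection, the sigmoid activation, and the upper-triangle restriction are from the embodiment; the bilinear form of the pairwise score is our own assumption, since the patent does not fix the exact pairwise scoring function:

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_bert, d_proj = 18, 768, 64

H = rng.normal(size=(seq_len, d_bert))           # stand-in for BERT outputs
W_down = rng.normal(size=(d_bert, d_proj)) * 0.01
U = H @ W_down                                    # reduce 768 -> 64

W_pair = rng.normal(size=(d_proj, d_proj)) * 0.01
scores = U @ W_pair @ U.T                         # pairwise word-to-word scores
probs = 1.0 / (1.0 + np.exp(-scores))             # Sigmoid activation

adj = np.triu(probs, k=1)  # zero the main diagonal and below; keep upper part
```

The resulting `adj` is the matrix-represented directed connection graph: entry (i, j) with i < j is the predicted probability of a directed edge from word i to word j.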
(3) Acquire a labeled data set. In most currently public data sets, each piece of data comprises a text t and a label y; the label y records all entity categories contained in the text and the corresponding entity sets, and if the text t contains no entity of any category, the label y is empty;
(4) As shown in FIG. 3, convert the data set into a training set: define the set S of all entity category names appearing in the data set, and let S contain n entity category names. For each piece of data (t, y) in the data set, where t is the text and y is the label, and for each category s in S: if label y contains entities of category s, that is, text t contains a non-empty entity set e belonging to category s, then category s together with entity set e forms a label y', and the text t with label y' becomes one piece of training data; if text t contains no entity of category s, then category s together with an empty entity set e' forms a label y', and the text t with label y' becomes one piece of training data;
(5) as shown in fig. 4, the set-up model is subjected to multiple rounds of training using a training set, each round of training comprising:
(51) dividing the data of the training set into a plurality of batches, extracting a batch of data from the training set each time, and generating a true value of the adjacency matrix by using an entity set in the label for each extracted data;
(52) processing each piece of data with the text preprocessing program: adding the [S] symbol at the head of the text and the [E] symbol at the tail, taking the category name out of the piece's label and splicing it after [E], then segmenting the text with the special symbols and category name added, and generating the input sequence;
(53) inputting the input sequence into a model, calculating the connection relation among all words including special characters by the model, and outputting an adjacency matrix;
(54) and finally, calculating the binary cross entropy loss by using a matrix predicted by the model and a true value matrix generated by the label, and updating the weight of the model according to the loss.
(6) After model training is finished, the model is used for prediction, and the specific process is as follows:
(61) selecting a piece of text to extract an entity therein;
(62) selecting an entity category to determine an entity category to search for in the text, which may be an entity category that has not been present in the training set;
(63) inputting the text and the entity type into a text preprocessing process to obtain an input sequence;
(64) inputting the input sequence into a model M;
(65) the entities identified in the graph are parsed using an entity parsing process.
Through steps (1) to (5) the model is trained. When performing entity recognition on a text, each entity category in the category set must be queried once in turn; each time, the model outputs only the entities belonging to that category, and parsing the paths of the graph output by the model yields the entities the model has recognized in the text.
The invention also takes the named-entity category as part of the input, the aim being to extract the entities of that category from the text. Take the input text "Xiaoming works at A-Soft and Xiaohong works at B-Song." as an example. After adding the special head and tail characters [S] and [E], the category name "company" is spliced after the text, so that the text becomes "[S]Xiaoming works at A-Soft and Xiaohong works at B-Song.[E]company". "A-Soft" and "B-Song" are the entities of category "company" contained in this text, and the target of the model is to output "A-Soft" and "B-Song". Indexing from 0, in the character sequence "[S]Xiaoming works at A-Soft and Xiaohong works at B-Song.[E]", "A" and "Soft" are at positions 4 and 5, "B" and "Song" are at positions 12 and 13, and the two special characters "[S]" and "[E]" are at positions 0 and 17. The adjacency matrix E is an 18x18 matrix in which E(0,4), E(4,5) and E(5,17) have value 1, and E(0,12), E(12,13) and E(13,17) are likewise 1. E(0,4) and E(0,12) denote that entities begin with "A" and "B" respectively; E(5,17) and E(13,17) indicate "Soft" and "Song" as ending words of entities; and E(4,5) and E(12,13) indicate that the word after "A" is "Soft" and the word after "B" is "Song". The first row of the matrix records the beginning words of the entities, and the last column records the ending words. Although "Xiaoming" and "Xiaohong" belong to the person-name category, when the input category is "company" these two words should not appear in the output graph.
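The worked example above can be written down directly as a check (our own verification snippet; token indices follow the description):

```python
import numpy as np

# 18 tokens: [S] at index 0, [E] at 17;
# "A","Soft" at positions 4,5 and "B","Song" at positions 12,13.
E = np.zeros((18, 18), dtype=int)
for a, b in [(0, 4), (4, 5), (5, 17), (0, 12), (12, 13), (13, 17)]:
    E[a, b] = 1

starts = np.flatnonzero(E[0])     # first row: entity beginning words
ends = np.flatnonzero(E[:, 17])   # last column: entity ending words
```

Reading off the first row and last column recovers exactly the beginning words (positions 4 and 12) and ending words (positions 5 and 13) stated in the text.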
As the above example shows, modeling the connections between words easily copes with entity overlap and discontinuity: entities with an overlapping part share a common sub-path in the graph, and for a discontinuous entity a connection relation exists even when two words are separated by some distance. The entities in the graph output by the model are exactly the paths that start with "[S]" and end with "[E]".
Because the category information is part of the input, categories that never appeared in the training set can also be spliced after the text and fed into the model. With the model trained on the CLUENER2020 dataset, which contains 10 categories (name, address, organization, company, government, book, game, movie, job, sight), inputting "[S]There are many animals on the great grassland, such as lions, tigers and antelopes.[E]animal" makes the model correctly output "lion", "tiger" and "antelope", showing that the model can use the knowledge learned in training on other tasks to predict entity categories it has never seen.
It should be understood by those skilled in the art that the features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations are described, but any combination of these features that involves no contradiction should be considered within the scope of this disclosure.
The above examples merely illustrate several embodiments of the present invention; their description is specific and detailed, but should not therefore be construed as limiting the scope of the invention. A person skilled in the art can make several variations and modifications without departing from the inventive concept, and these fall within the scope of the invention. The protection scope of the present invention is therefore subject to the appended claims.

Claims (4)

1. An extensible universal end-to-end named entity recognition method is characterized by comprising the following specific steps:
step 1: the text preprocessing process generates an input sequence, and specifically comprises the following steps:
receiving a text input and an entity category, adding a symbol at the head and at the tail of the text respectively, and appending the entity category after the tail symbol;
segmenting the text, with the head/tail symbols and entity category added, to obtain a word sequence;
mapping the word sequence to numbers, where the mapping between numbers and words is one-to-one and satisfies a bijective relation, and outputting the resulting number sequence as the input sequence;
step 2: constructing a model M, comprising:
using a context coding layer to receive the input sequence produced by the text preprocessing process, generating a group of word vectors via a self-attention mechanism, and discarding the word vectors corresponding to the entity category name;
modeling the directed connection relation between words with a directed connection graph: computing the directed connection graph between the words from the word vector group, representing the graph as a matrix, and outputting the matrix-represented graph;
Step 3: training the model M;
Step 4: predicting with the model M;
Step 5: performing entity parsing on the model output of step 4, specifically:
receiving the graph output by model M and traversing it from the head symbol; excluding the path where the head symbol connects directly to the tail symbol, every path that starts at the head symbol and ends at the tail symbol corresponds to one entity of the queried category, whose words are given by the path's node sequence in order; the set of parsed entities is output.
2. The method for identifying generic end-to-end named entities according to claim 1, wherein step 2 is to model the directed connection relationship between entity words by using a directed connection graph, and calculate the directed connection graph between words by using a word vector group, specifically:
if a word is the beginning of an entity, establishing a directed edge from the sentence-start symbol to that word;
if, within an entity, a word B follows a word A, establishing a directed edge from word A to word B;
if a word is the end of an entity, establishing a directed edge from that word to the tail symbol;
the words except the head and tail symbols are called intermediate words, and the corresponding word vectors are intermediate word vectors;
calculating the connection relation between the start symbol and each intermediate word from the head-symbol word vector and the intermediate word's vector, representing the probability that the intermediate word begins an entity;
calculating the connection relation between any two intermediate words from their word vectors;
calculating the connection relation between each intermediate word and the sentence-end symbol from the tail-symbol word vector and the intermediate word's vector, representing the probability that the intermediate word ends an entity;
and after the calculation is finished, obtaining a directed connection graph represented by a matrix between the words.
3. The method for identifying generic end-to-end named entities as claimed in claim 1, wherein the training model M is specifically:
acquiring a labeled data set, wherein each piece of data in the data set comprises a text t and a label y, all entity categories and corresponding entity sets contained in the text are recorded in the label y, and if the text t does not contain any type of entity, the label y is empty;
converting the data set to a training set:
defining the set S of all entity category names appearing in the data set, and letting S contain n entity category names; for each piece of data (t, y) in the data set, where t is the text and y is the label, and for each category s in S: if label y contains entities of category s, that is, text t contains a non-empty entity set e belonging to category s, then category s together with entity set e forms a label y', and the text t with label y' becomes one piece of training data; if text t contains no entity of category s, then category s together with an empty entity set e' forms a label y', and the text t with label y' becomes one piece of training data;
performing multiple rounds of training on the model M using the training set, each round of training comprising:
dividing the data of the training set into a plurality of batches, extracting one batch of data from the training set each time, and generating a true value of an adjacency matrix of the batch of data by using an entity set in a label for each extracted batch of data;
for each piece of data, processing the text in the piece of data and the entity category in the label by using the text preprocessing process in the step 1 to generate an input sequence;
inputting the input sequence into a model, calculating the connection relation among all words including symbols by the model, and outputting an adjacency matrix;
and finally, calculating loss by using a matrix predicted by the model and a true value matrix generated by the label, and updating the weight of the model according to the loss.
4. The method for generic end-to-end named entity recognition according to claim 1, wherein the prediction is performed using a model M, specifically:
inputting a piece of text from which entities are to be extracted; the input includes no labels or other information;
selecting an entity category, i.e. the category of entities to search for in the text;
inputting the text and the entity type into the text preprocessing process in the step 1 to obtain an input sequence;
the input sequence is input into a model M, which outputs a graph of the adjacency matrix representation.
CN202210617397.3A 2022-06-01 2022-06-01 Extensible universal end-to-end named entity identification method Pending CN115130475A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210617397.3A CN115130475A (en) 2022-06-01 2022-06-01 Extensible universal end-to-end named entity identification method

Publications (1)

Publication Number Publication Date
CN115130475A true CN115130475A (en) 2022-09-30

Family

ID=83378459

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210617397.3A Pending CN115130475A (en) 2022-06-01 2022-06-01 Extensible universal end-to-end named entity identification method

Country Status (1)

Country Link
CN (1) CN115130475A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115438658A (en) * 2022-11-08 2022-12-06 浙江大华技术股份有限公司 Entity recognition method, recognition model training method and related device

Similar Documents

Publication Publication Date Title
CN110852087B (en) Chinese error correction method and device, storage medium and electronic device
CN107729309B (en) Deep learning-based Chinese semantic analysis method and device
CN111222305B (en) Information structuring method and device
CN109960728B (en) Method and system for identifying named entities of open domain conference information
CN110134946B (en) Machine reading understanding method for complex data
CN112231447B (en) Method and system for extracting Chinese document events
CN109543181B (en) Named entity model and system based on combination of active learning and deep learning
CN110196982B (en) Method and device for extracting upper-lower relation and computer equipment
CN109684642B (en) Abstract extraction method combining page parsing rule and NLP text vectorization
CN107797987B (en) Bi-LSTM-CNN-based mixed corpus named entity identification method
CN111709242B (en) Chinese punctuation mark adding method based on named entity recognition
CN112149421A (en) Software programming field entity identification method based on BERT embedding
CN111274804A (en) Case information extraction method based on named entity recognition
CN112966525B (en) Law field event extraction method based on pre-training model and convolutional neural network algorithm
CN113591483A (en) Document-level event argument extraction method based on sequence labeling
CN115292463B (en) Information extraction-based method for joint multi-intention detection and overlapping slot filling
CN109983473B (en) Flexible integrated recognition and semantic processing
CN107977353A (en) A kind of mixing language material name entity recognition method based on LSTM-CNN
CN111159485A (en) Tail entity linking method, device, server and storage medium
CN114282527A (en) Multi-language text detection and correction method, system, electronic device and storage medium
CN113128203A (en) Attention mechanism-based relationship extraction method, system, equipment and storage medium
CN113742733B (en) Method and device for extracting trigger words of reading and understanding vulnerability event and identifying vulnerability type
CN114239574A (en) Miner violation knowledge extraction method based on entity and relationship joint learning
CN113065349A (en) Named entity recognition method based on conditional random field
CN115544303A (en) Method, apparatus, device and medium for determining label of video

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination