CN114298047A - Chinese named entity recognition method and system based on stroke convolution and word vectors - Google Patents

Chinese named entity recognition method and system based on stroke convolution and word vectors

Info

Publication number
CN114298047A
Authority
CN
China
Prior art keywords
stroke
vector
word
character
chinese character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111641955.1A
Other languages
Chinese (zh)
Inventor
何东之
张震
王鹏飞
孙亚茹
郭隆杭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202111641955.1A priority Critical patent/CN114298047A/en
Publication of CN114298047A publication Critical patent/CN114298047A/en
Pending legal-status Critical Current

Landscapes

  • Document Processing Apparatus (AREA)

Abstract

The invention provides a Chinese named entity recognition method and system based on stroke convolution and word vectors, relating to the technical field of named entity recognition. The method comprises the following steps: acquiring a stroke sequence corresponding to each Chinese character in the text and a character feature vector of each Chinese character; inputting the stroke sequence into a stroke convolutional neural network to obtain a stroke feature vector; setting a sliding window according to the maximum length of an entity in the text, and acquiring a word vector of each character in the sliding window through a self-attention mechanism; concatenating the stroke feature vector, the word vector and the character feature vector of each Chinese character in the text, and inputting the result into a BiLSTM network to obtain the score of each Chinese character for each entity label; and determining an optimal entity label for each Chinese character in the text by using a CRF model. The method takes into account the influence of a Chinese character's stroke sequence, combines the stroke feature vector, the word vector and the character feature vector of each Chinese character, and then performs named entity recognition, thereby improving recognition performance.

Description

Chinese named entity recognition method and system based on stroke convolution and word vectors
Technical Field
The invention relates to the technical field of named entity recognition, and in particular to a Chinese named entity recognition method and system based on stroke convolution and word vectors.
Background
With the rapid development of internet technology, unstructured data is growing continuously, and the world is in a massive unstructured data era. How to efficiently manage data and extract effective information from unstructured data becomes a problem which needs to be solved urgently.
Named Entity Recognition (NER) aims to identify predefined named entities, such as person names, place names and organization names, in unstructured text; it is a basic, core task for information retrieval and information extraction. Chinese NER is the branch of NER that deals with Chinese text, and the characteristics of Chinese characters leave many problems open. The main difficulties of Chinese NER are: 1) Chinese characters are often polysemous, and their meanings may differ greatly between contexts; 2) unlike English text, Chinese text has no explicit entity boundary markers such as spaces; 3) research on Chinese NER started late, so labeled data sets are few and tend to cover a single domain.
Existing Chinese named entity recognition generally follows one of two approaches: word-based sequence labeling and character-based sequence labeling. The word-based approach first segments the text with a word segmentation tool and then performs entity recognition. Because word boundaries are taken as entity boundaries, an error in the segmentation stage prevents the subsequent NER model from recognizing the entity correctly. The character-based approach generally suffers from insufficient semantic information, so research has focused on how to make better use of word information. Some applications introduce external vocabulary information on top of the character-based approach and fuse it into the vector representation at the input layer; this changes the model, and introducing external word vectors lowers training efficiency and ultimately reduces the accuracy of named entity recognition. Other applications build an ELMo model based only on stroke sequences on top of the character-based approach, and fall short in the effectiveness and accuracy of named entity recognition.
Disclosure of Invention
To solve the above problems, the invention provides a Chinese named entity recognition method and system based on stroke convolution and word vectors.
To achieve the above object, the present invention provides a Chinese named entity recognition method based on stroke convolution and word vectors, comprising:
acquiring a stroke sequence corresponding to each Chinese character in the text and a character feature vector of each Chinese character;
inputting the stroke sequence into a stroke convolution neural network to obtain a stroke feature vector;
setting a sliding window according to the maximum length of the entity in the text, and acquiring a word vector of each word in the sliding window through a self-attention mechanism;
concatenating the stroke feature vector, the word vector and the character feature vector of each Chinese character in the text, and inputting the result into a BiLSTM network to obtain the score of each Chinese character for each entity label;
and determining an optimal entity label for each Chinese character in the text by adopting a CRF model.
As a further improvement of the invention, a mapping table from Chinese characters to stroke sequences is constructed, and the stroke sequences corresponding to the Chinese characters are obtained through the mapping table.
As a further improvement of the present invention, the stroke convolution neural network convolves the stroke sequence by convolution kernels of different window sizes to obtain the stroke feature vector.
As a further improvement of the invention, the stroke convolutional neural network obtains a stroke feature map by convolving with kernels of different window sizes, and applies max pooling and a fully connected layer to the feature map to obtain the stroke feature vector, according to the formula:
Figure BDA0003444023500000021
wherein:
w represents the weights in convolutional neural network training;
$M_{t,t+k-1}$ represents the input features;
b represents the bias in convolutional neural network training.
As a further improvement of the invention, a classification loss function L(cls) is added in the training process of the stroke convolutional neural network:
$L(cls) = -\log P(z \mid X) = -\log \mathrm{softmax}(w \cdot s_{emb})$
wherein:
X represents the input stroke sequence;
z represents the Chinese character label corresponding to the stroke sequence;
w represents a parameter of the network;
$s_{emb}$ represents the stroke feature vector.
As a further improvement of the present invention, acquiring the word vector of each character within the sliding window through the self-attention mechanism comprises:
calculating the similarity between every two characters in the sliding window through the self-attention mechanism;
and obtaining the word vector of each character in the sliding window from the similarities by applying a softmax function.
As a further improvement of the present invention,
for each Chinese character in the sliding window, a corresponding Query vector, Key vector and Value vector are generated from the character feature vector;
and the dot product of the Query vector and the Key vector is calculated to obtain the score of each character, and the score is multiplied by the Value vector of each character to obtain the word vector of that character in the sliding window.
As a further improvement of the present invention, determining an optimal entity label for each Chinese character in the text by using the CRF model comprises:
defining the character sequence of the input text as $x = (x_1, x_2, \ldots, x_n)$ and the predicted tag sequence as $y = (y_1, y_2, \ldots, y_n)$;
defining
Figure BDA0003444023500000031
as the predicted score, output by the BiLSTM network model, of the i-th character being labeled $y_i$;
defining a label transition matrix
Figure BDA0003444023500000032
wherein
Figure BDA0003444023500000033
represents the score of transitioning from label $y_i$ to label $y_{i+1}$;
calculating a final score for each of the predicted tag sequences by the formula
Figure BDA0003444023500000034
and taking the predicted tag sequence with the highest score as a final tag sequence, and acquiring the Chinese named entity according to the tag.
As a further improvement of the present invention,
the conditional probability of each of said predicted tag sequences is calculated as
Figure BDA0003444023500000035
and if the predicted tag sequence with the highest score also has the highest conditional probability, it is taken as the final tag sequence.
The invention also provides a Chinese named entity recognition system based on stroke convolution and word vectors, which comprises a pre-preparation module, a stroke feature acquisition module, a word vector acquisition module, a label prediction module and an optimal label acquisition module;
the pre-preparation module is configured to:
acquiring a stroke sequence corresponding to each Chinese character in the text and a character feature vector of each Chinese character;
the stroke characteristic acquisition module is used for:
inputting the stroke sequence into a stroke convolution neural network to obtain a stroke feature vector;
the word vector acquisition module is configured to:
setting a sliding window according to the maximum length of the entity in the text, and acquiring a word vector of each word in the sliding window through a self-attention mechanism;
the label prediction module is configured to:
concatenating the stroke feature vector, the word vector and the character feature vector of each Chinese character in the text, and inputting the result into a BiLSTM network to obtain the score of each Chinese character for each entity label;
the best tag obtaining module is configured to:
and determining an optimal entity label for each Chinese character in the text by adopting a CRF model.
Compared with the prior art, the invention has the beneficial effects that:
the invention considers the influence of the stroke sequence of the Chinese character on the basis of the character-based sequence labeling method in the named entity recognition method, combines the stroke characteristic vector, the word characteristic vector and the character characteristic vector of the Chinese character, and then performs the named entity recognition, thereby improving the effect of the named entity recognition.
In obtaining the stroke feature vector, the method extracts the stroke features of a Chinese character by convolution, which suits the range of stroke counts found in Chinese characters; convolution kernels of multiple window sizes are applied to the stroke sequence to obtain the most effective stroke feature vector.
In obtaining the word feature vector of a Chinese character, the word vector information within the sliding window is obtained through a self-attention mechanism, which compensates for the lack of semantic information and avoids the drop in prediction accuracy that the prior art incurs when external vocabulary is introduced.
In the stroke convolution neural network training process, the classification loss function is added, so that the stroke convolution neural network training accuracy is improved.
Drawings
FIG. 1 is a flow chart of a Chinese named entity recognition method based on stroke convolution and word vectors according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a Chinese named entity recognition system based on stroke convolution and word vectors according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a stroke convolution neural network according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a model of a self-attention mechanism according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a bidirectional timing model and a CRF model according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The invention is described in further detail below with reference to the attached drawing figures:
As shown in FIG. 1, the Chinese named entity recognition method based on stroke convolution and word vectors provided by the present invention includes:
S1, acquiring a stroke sequence corresponding to each Chinese character in the text and a character feature vector of each Chinese character;
wherein:
during training, the stroke sequence of each Chinese character in the training set is obtained from a Chinese dictionary website and a mapping table from Chinese characters to stroke sequences is constructed; the stroke sequence corresponding to each Chinese character in the text is then obtained through the mapping table.
For example, as shown in FIG. 3, the stroke sequence obtained from the mapping table consists of a left-falling stroke, the stroke shown in Figure RE-GDA0003528531880000052, a turning stroke and a horizontal stroke.
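The lookup described in S1 can be sketched as a plain dictionary. This is a minimal illustration, not the patent's data: the table entries, the romanized stroke names and the integer ids in `STROKE_TABLE` and `STROKE_IDS` are assumptions chosen for the example.

```python
# Hypothetical mapping table from Chinese characters to stroke-name sequences.
# Entries and stroke names are illustrative assumptions, not the patent's table.
STROKE_TABLE = {
    "人": ["pie", "na"],          # left-falling, right-falling
    "大": ["heng", "pie", "na"],  # horizontal, left-falling, right-falling
    "十": ["heng", "shu"],        # horizontal, vertical
}

# Assumed integer ids for the basic stroke types (0 could be kept for padding).
STROKE_IDS = {"heng": 1, "shu": 2, "pie": 3, "na": 4, "zhe": 5}


def strokes_for_text(text, table=STROKE_TABLE):
    """Return one list of stroke ids per character.

    Characters missing from the table map to an empty list.
    """
    return [[STROKE_IDS[name] for name in table.get(ch, [])] for ch in text]


print(strokes_for_text("大人"))
```

In practice the table would be built once from the dictionary source for every character in the training set, and the id sequences would feed the stroke network in S2.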
S2, inputting the stroke sequence into a stroke convolution neural network to obtain a stroke feature vector;
wherein:
as shown in FIG. 3, the stroke convolutional neural network convolves the stroke sequence with convolution kernels of different window sizes to obtain a stroke feature map, and then applies max pooling and a fully connected layer to the feature map to obtain the stroke feature vector, according to the formula:
Figure BDA0003444023500000051
wherein:
w represents the weights in convolutional neural network training;
$M_{t,t+k-1}$ represents the input features;
b represents the bias in convolutional neural network training;
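A rough sketch of the convolution, max pooling and fully connected steps is given below in plain Python. The formula image is not available here, so the ReLU activation, the zero padding of short sequences and the function names `conv1d_max` and `stroke_features` are assumptions; only the use of kernels with different window sizes over $M_{t,t+k-1}$, a weight w and a bias b comes from the text.

```python
def conv1d_max(seq, kernel, bias):
    """Slide one kernel (k x d weights) over seq (n x d, non-empty) and max-pool.

    seq is a list of stroke embedding vectors; kernel has window size k = len(kernel).
    """
    k = len(kernel)
    if len(seq) < k:
        # Assumption: pad short stroke sequences with zero vectors.
        seq = seq + [[0.0] * len(seq[0]) for _ in range(k - len(seq))]
    outputs = []
    for t in range(len(seq) - k + 1):
        window = seq[t:t + k]  # corresponds to M_{t, t+k-1}
        z = bias + sum(
            w * x
            for w_row, x_row in zip(kernel, window)
            for w, x in zip(w_row, x_row)
        )
        outputs.append(max(0.0, z))  # ReLU is an assumed activation
    return max(outputs)  # max pooling over all window positions


def stroke_features(seq, kernels, fc_weights, fc_bias):
    """Pool one value per kernel, then apply a fully connected layer."""
    pooled = [conv1d_max(seq, kernel, bias) for kernel, bias in kernels]
    return [
        b + sum(w * p for w, p in zip(row, pooled))
        for row, b in zip(fc_weights, fc_bias)
    ]


# Toy input: three strokes with 1-dimensional embeddings.
seq = [[1.0], [2.0], [3.0]]
kernels = [
    ([[1.0], [1.0]], 0.0),          # window size 2
    ([[1.0], [0.0], [-1.0]], 0.5),  # window size 3
]
print(stroke_features(seq, kernels, [[1.0, 1.0], [2.0, 0.0]], [0.0, 1.0]))
```

A real model would use several kernels per window size and learned weights; the toy values only make the arithmetic easy to follow.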
In the invention, a classification loss function is added in the training process of the stroke convolutional neural network to improve training accuracy. The classification loss function is expressed as:
$L(cls) = -\log P(z \mid X) = -\log \mathrm{softmax}(w \cdot s_{emb})$
wherein:
X represents the input stroke sequence;
z represents the Chinese character label corresponding to the stroke sequence;
w represents a parameter of the network;
$s_{emb}$ represents the stroke feature vector.
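The classification loss can be read as a cross-entropy over character classes. The sketch below assumes that w is a matrix with one row per candidate character and that z is the index of the correct character; both are interpretations of the symbols above rather than details stated in the text.

```python
import math


def classification_loss(W, s_emb, z):
    """Return -log softmax(W . s_emb)[z], computed with the log-sum-exp trick.

    W: list of weight rows, one per candidate character (assumed layout).
    s_emb: stroke feature vector.
    z: index of the correct character.
    """
    logits = [sum(w * s for w, s in zip(row, s_emb)) for row in W]
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_norm - logits[z]


# With all-zero weights every class is equally likely, so the loss is log(2).
print(classification_loss([[0.0, 0.0], [0.0, 0.0]], [1.0, 2.0], 0))
```

Minimising this loss pushes the stroke feature vector to carry enough information to identify the character it came from.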
S3, setting a sliding window according to the maximum length of an entity in the text, and acquiring a word vector of each word in the sliding window through a self-attention mechanism;
wherein:
the character-based sequence labeling method generally suffers from insufficient semantic information; to make better use of word vector information, the self-attention (SA) mechanism is used to obtain the word vector information within a sliding window.
During training, the maximum length of an entity in the training set is obtained and used as the size of the sliding window, and the similarity between every two characters in the sliding window is calculated through the self-attention mechanism; a softmax function is then applied to the similarities to obtain the word vector of each character in the sliding window.
Specifically, for each Chinese character in the sliding window, a corresponding Query vector, Key vector and Value vector are generated from the character feature vector;
the dot product of the Query vector and the Key vector is calculated to obtain the score of each character, and the score is multiplied by the Value vector of each character to obtain the word vector of that character in the sliding window.
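As an illustration of how the window size might be derived, the sketch below scans training labels for the longest entity. It assumes a B-/I-/O tagging scheme (the text later mentions B-LOC, I-LOC and O tags) and, for brevity, does not check that an I- tag has the same type as the preceding B- tag; the function name `max_entity_length` is hypothetical.

```python
def max_entity_length(tag_sequences):
    """Return the length, in characters, of the longest entity in the labels."""
    longest = 0
    for tags in tag_sequences:
        current = 0
        for tag in tags:
            if tag.startswith("B-"):
                current = 1          # a new entity starts here
            elif tag.startswith("I-") and current > 0:
                current += 1         # the current entity continues
            else:
                current = 0          # outside any entity
            longest = max(longest, current)
    return longest


training_tags = [
    ["B-LOC", "I-LOC", "I-LOC", "O", "B-PER", "I-PER"],
    ["O", "B-ORG"],
]
print(max_entity_length(training_tags))
```

The returned value would then be used as the sliding window size for the self-attention step.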
For example:
as shown in FIG. 4, if the text content is "Beijing city", $e_1$, $e_2$ and $e_3$ are the character feature vectors of its characters. A Query vector, a Key vector and a Value vector are generated for each character by multiplying its character feature vector by three weight matrices created during training. The score of each character is calculated as the dot product between the Query vector and the Key vector, and the scores are then multiplied by the corresponding Value vectors to obtain the word vector of each character in the sliding window, according to the formula:
Figure BDA0003444023500000071
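The Query/Key/Value computation inside one window can be sketched in plain Python as follows. Because the formula image is not reproduced, this follows only the prose: dot-product scores, a softmax over the window, and a weighted sum of Value vectors. No scaling factor is applied, which is an assumption, and the names `matvec`, `softmax` and `window_self_attention` are illustrative.

```python
import math


def matvec(vec, mat):
    """Multiply a row vector (length d) by a matrix (d x m)."""
    return [sum(v * mat[i][j] for i, v in enumerate(vec)) for j in range(len(mat[0]))]


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def window_self_attention(E, Wq, Wk, Wv):
    """E holds one character feature vector per character in the window."""
    Q = [matvec(e, Wq) for e in E]
    K = [matvec(e, Wk) for e in E]
    V = [matvec(e, Wv) for e in E]
    out = []
    for q in Q:
        scores = [sum(a * b for a, b in zip(q, k)) for k in K]  # Query . Key
        weights = softmax(scores)
        # Weighted sum of Value vectors gives the word vector for this character.
        out.append([sum(w * v[j] for w, v in zip(weights, V)) for j in range(len(V[0]))])
    return out


identity = [[1.0, 0.0], [0.0, 1.0]]
E = [[1.0, 0.0], [0.0, 1.0]]
print(window_self_attention(E, identity, identity, identity))
```

With learned weight matrices, each output row mixes information from every character in the window in proportion to its similarity to the current character.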
S4, concatenating the stroke feature vector, the word vector and the character feature vector of each Chinese character in the text, and inputting the result into a BiLSTM network to obtain the score of each Chinese character for each entity label;
wherein:
the concatenation joins the vectors directly along the feature dimension: if the stroke feature vector of a Chinese character has shape 1 × 20, its word vector 1 × 30 and its character feature vector 1 × 60, the concatenated feature vector has shape 1 × 110.
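The dimension arithmetic in this example can be checked with a short sketch; the helper name `concat_features` and the zero-filled placeholder vectors are assumptions used only to show the shapes.

```python
def concat_features(stroke_vec, word_vec, char_vec):
    """Join the three per-character vectors along the feature dimension."""
    return list(stroke_vec) + list(word_vec) + list(char_vec)


stroke_vec = [0.0] * 20  # stroke feature vector, 1 x 20
word_vec = [0.0] * 30    # word vector from self-attention, 1 x 30
char_vec = [0.0] * 60    # character feature vector, 1 x 60

combined = concat_features(stroke_vec, word_vec, char_vec)
print(len(combined))
```

The resulting 110-dimensional vector is what each time step of the BiLSTM would receive for one character.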
BiLSTM (Bi-directional Long Short-Term Memory) is a bidirectional long short-term memory network. LSTM (Long Short-Term Memory) is an improved sequential network that alleviates the vanishing-gradient problem and makes effective use of long-distance information, but it captures sequence information in one direction only. Because context information strongly influences the NER (named entity recognition) task, this application uses a BiLSTM network to capture context information;
as shown in FIG. 5, taking "Beijing Smith" as an example, the score of each character for each of multiple labels is obtained through forward LSTM and backward LSTM computation, where the labels are preset and may include: address, time, person name, book name, etc.
And S5, determining an optimal entity label for each Chinese character in the text by adopting a CRF model.
wherein:
adjacent tags in the NER task are strongly constrained: for example, a B-LOC tag (the start tag of an address) can only be followed by an I-LOC tag or an O tag, not by other tags such as a B-PER tag (the start tag of a person name). Therefore, after sequence modeling by the BiLSTM network, a Conditional Random Field (CRF) is used to predict the tags of the entire sequence, specifically:
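The tag constraint in this paragraph can be expressed as a small predicate. The rule for B- tags follows the text (only the matching I- tag or O may follow); what may follow an O or I- tag is not stated, so that part of `is_valid_transition` is an assumption. A CRF learns such preferences as transition scores rather than hard rules, so this is only an illustration of the constraint.

```python
def is_valid_transition(prev_tag, next_tag):
    """Check whether next_tag may directly follow prev_tag."""
    if next_tag.startswith("I-"):
        # An I-X tag may only continue an entity of the same type X.
        return prev_tag[:2] in ("B-", "I-") and prev_tag[2:] == next_tag[2:]
    if prev_tag.startswith("B-"):
        # Per the text, a B- tag is followed only by its own I- tag or by O.
        return next_tag == "O"
    # Assumption: after O or I-X, either O or any B- tag is allowed.
    return True


for pair in [("B-LOC", "I-LOC"), ("B-LOC", "O"), ("B-LOC", "B-PER"), ("O", "I-LOC")]:
    print(pair, is_valid_transition(*pair))
```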
defining the character sequence of the input text as $x = (x_1, x_2, \ldots, x_n)$ and the predicted tag sequence as $y = (y_1, y_2, \ldots, y_n)$; $Y(x)$ represents the set of all possible tag sequences for the text;
defining
Figure BDA0003444023500000072
as the predicted score, output by the BiLSTM network model, of the i-th character being labeled $y_i$;
defining a label transition matrix
Figure BDA0003444023500000073
wherein
Figure BDA0003444023500000074
represents the score of transitioning from label $y_i$ to label $y_{i+1}$;
calculating a final score for each predicted tag sequence by the formula
Figure BDA0003444023500000075
and taking the predicted tag sequence with the highest score as a final tag sequence, and acquiring the Chinese named entity according to the tag.
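Scoring a tag sequence and picking the highest-scoring one can be sketched as below. The sketch assumes the usual BiLSTM-CRF reading of the definitions above: `emissions[i][t]` is the network's score for character i with tag t, and `transitions[a][b]` is the score of moving from tag a to tag b. Start and end transitions are left out, which is a simplification; the function names are illustrative.

```python
def sequence_score(emissions, transitions, tags):
    """Sum of per-character scores plus tag-to-tag transition scores."""
    score = sum(emissions[i][t] for i, t in enumerate(tags))
    score += sum(transitions[a][b] for a, b in zip(tags, tags[1:]))
    return score


def viterbi(emissions, transitions):
    """Return (best tag index sequence, its score) by dynamic programming."""
    n_tags = len(emissions[0])
    best = list(emissions[0])  # best score of any path ending in each tag
    back = []                  # back-pointers, one list per later position
    for emit in emissions[1:]:
        prev_best, best, pointers = best, [], []
        for j in range(n_tags):
            cand = [prev_best[i] + transitions[i][j] for i in range(n_tags)]
            i_star = max(range(n_tags), key=cand.__getitem__)
            best.append(cand[i_star] + emit[j])
            pointers.append(i_star)
        back.append(pointers)
    last = max(range(n_tags), key=best.__getitem__)
    path = [last]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, max(best)


emissions = [[1.0, 0.5], [0.2, 1.5], [0.3, 0.4]]  # 3 characters, 2 tags
transitions = [[0.5, -1.0], [-0.5, 0.2]]
path, score = viterbi(emissions, transitions)
print(path, score)
```

The dynamic program avoids enumerating every possible tag sequence, whose number grows exponentially with the length of the text.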
Further, a loss function may be set, for example by
calculating the conditional probability of each predicted tag sequence
Figure BDA0003444023500000081
and, if the predicted tag sequence with the highest score also has the maximum conditional probability, taking it as the final tag sequence.
Finally, the optimal tag sequence is found by the Viterbi algorithm, according to the formula:
Figure BDA0003444023500000082
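The formula images for the CRF step are not reproduced in this text. Under the standard BiLSTM-CRF formulation, which matches the definitions given for the score, the transition matrix, the conditional probability and the set $Y(x)$, they would plausibly take the following form. The symbols $P$ (score matrix output by the BiLSTM), $A$ (label transition matrix), $s$ and $\tilde{y}$ are assumed names, and any start or end transition terms the original may contain are omitted.

```latex
% Score of a tag sequence y for input x (assumed symbols P and A)
s(x, y) = \sum_{i=1}^{n} P_{i, y_i} + \sum_{i=1}^{n-1} A_{y_i, y_{i+1}}

% Conditional probability of y, normalised over all candidate sequences Y(x)
p(y \mid x) = \frac{\exp\bigl(s(x, y)\bigr)}{\sum_{\tilde{y} \in Y(x)} \exp\bigl(s(x, \tilde{y})\bigr)}

% Decoding: the highest-scoring sequence, found with the Viterbi algorithm
y^{*} = \arg\max_{\tilde{y} \in Y(x)} s(x, \tilde{y})
```

Because the denominator of $p(y \mid x)$ is the same for every candidate, the sequence with the highest score is also the one with the highest conditional probability.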
As shown in FIG. 2, the present invention further provides a Chinese named entity recognition system based on stroke convolution and word vectors, which includes a pre-preparation module, a stroke feature acquisition module, a word vector acquisition module, a label prediction module, and an optimal label acquisition module;
a pre-preparation module to:
acquiring a stroke sequence corresponding to each Chinese character in the text and a character feature vector of each Chinese character;
a stroke characteristic acquisition module for:
inputting the stroke sequence into a stroke convolution neural network to obtain a stroke feature vector;
a word vector acquisition module to:
setting a sliding window according to the maximum length of an entity in the text, and acquiring a word vector of each word in the sliding window through a self-attention mechanism;
a label prediction module to:
concatenating the stroke feature vector, the word vector and the character feature vector of each Chinese character in the text, and inputting the result into a BiLSTM network to obtain the score of each Chinese character for each entity label;
a best label acquisition module to:
and determining an optimal entity label for each Chinese character in the text by adopting a CRF model.
The invention has the advantages that:
the invention considers the influence of the stroke sequence of the Chinese character on the basis of the character-based sequence labeling method in the named entity recognition method, combines the stroke characteristic vector, the word characteristic vector and the character characteristic vector of the Chinese character, and then performs the named entity recognition, thereby improving the effect of the named entity recognition.
In obtaining the stroke feature vector, the method extracts the stroke features of a Chinese character by convolution, which suits the range of stroke counts found in Chinese characters; convolution kernels of multiple window sizes are applied to the stroke sequence to obtain the most effective stroke feature vector.
In obtaining the word feature vector of a Chinese character, the word vector information within the sliding window is obtained through a self-attention mechanism, which compensates for the lack of semantic information and avoids the drop in prediction accuracy that the prior art incurs when external vocabulary is introduced.
In the stroke convolution neural network training process, the classification loss function is added, so that the stroke convolution neural network training accuracy is improved.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. A Chinese named entity recognition method based on stroke convolution and word vectors, characterized by comprising the following steps:
acquiring a stroke sequence corresponding to each Chinese character in the text and a character feature vector of each Chinese character;
inputting the stroke sequence into a stroke convolution neural network to obtain a stroke feature vector;
setting a sliding window according to the maximum length of the entity in the text, and acquiring a word vector of each word in the sliding window through a self-attention mechanism;
concatenating the stroke feature vector, the word vector and the character feature vector of each Chinese character in the text, and inputting the result into a BiLSTM network to obtain the score of each Chinese character for each entity label;
and determining an optimal entity label for each Chinese character in the text by adopting a CRF model.
2. The method of claim 1, wherein: a mapping table from Chinese characters to stroke sequences is constructed, and the stroke sequence corresponding to each Chinese character is obtained through the mapping table.
3. The method of claim 1, wherein: the stroke convolutional neural network convolves the stroke sequence with convolution kernels of different window sizes to obtain the stroke feature vector.
4. The method of claim 3, wherein: the stroke convolutional neural network obtains a stroke feature map by convolving with kernels of different window sizes, and applies max pooling and a fully connected layer to the feature map to obtain the stroke feature vector, according to the formula:
Figure FDA0003444023490000011
wherein:
w represents the weights in convolutional neural network training;
$M_{t,t+k-1}$ represents the input features;
b represents the bias in convolutional neural network training.
5. The method of claim 1, wherein: a classification loss function L(cls) is added in the training process of the stroke convolutional neural network:
$L(cls) = -\log P(z \mid X) = -\log \mathrm{softmax}(w \cdot s_{emb})$
wherein:
X represents the input stroke sequence;
z represents the Chinese character label corresponding to the stroke sequence;
w represents a parameter of the network;
$s_{emb}$ represents the stroke feature vector.
6. The method of claim 1, wherein acquiring the word vector of each character in the sliding window through the self-attention mechanism comprises:
calculating the similarity between every two characters in the sliding window through the self-attention mechanism;
and obtaining the word vector of each character in the sliding window from the similarities by applying a softmax function.
7. The method of claim 6, wherein the method comprises:
for each Chinese character in the sliding window, generating a corresponding Query vector, a corresponding Key vector and a corresponding Value vector according to the character feature vector;
and calculating the dot product of the Query vector and the Key vector to obtain the score of each word, and multiplying the score by the Value vector of each word to obtain the word vector of the word in the sliding window.
8. The method of claim 1, wherein determining an optimal entity label for each Chinese character in the text by using the CRF model comprises:
defining the character sequence of the input text as $x = (x_1, x_2, \ldots, x_n)$ and the predicted tag sequence as $y = (y_1, y_2, \ldots, y_n)$;
defining
Figure FDA0003444023490000023
as the predicted score, output by the BiLSTM network model, of the i-th character being labeled $y_i$;
defining a label transition matrix
Figure FDA0003444023490000024
wherein
Figure FDA0003444023490000025
represents the score of transitioning from label $y_i$ to label $y_{i+1}$;
calculating a final score for each of the predicted tag sequences by the formula
Figure FDA0003444023490000021
and taking the predicted tag sequence with the highest score as a final tag sequence, and acquiring the Chinese named entity according to the tag.
9. The method of claim 8, wherein the method comprises:
the conditional probability of each of said predicted tag sequences is calculated as
Figure FDA0003444023490000022
and if the predicted tag sequence with the highest score also has the maximum conditional probability, it is taken as the final tag sequence.
10. A system for implementing the method for identifying a named entity in chinese according to any one of claims 1 to 9, comprising a pre-preparation module, a stroke feature acquisition module, a word vector acquisition module, a label prediction module, and an optimal label acquisition module;
the pre-preparation module is configured to:
acquiring a stroke sequence corresponding to each Chinese character in the text and a character feature vector of each Chinese character;
the stroke characteristic acquisition module is used for:
inputting the stroke sequence into a stroke convolution neural network to obtain a stroke feature vector;
the word vector acquisition module is configured to:
setting a sliding window according to the maximum length of the entity in the text, and acquiring a word vector of each word in the sliding window through a self-attention mechanism;
the label prediction module is configured to:
concatenating the stroke feature vector, the word vector and the character feature vector of each Chinese character in the text, and inputting the result into a BiLSTM network to obtain the score of each Chinese character for each entity label;
the best tag obtaining module is configured to:
and determining an optimal entity label for each Chinese character in the text by adopting a CRF model.
CN202111641955.1A 2021-12-29 2021-12-29 Chinese named entity recognition method and system based on stroke volume and word vector Pending CN114298047A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111641955.1A CN114298047A (en) 2021-12-29 2021-12-29 Chinese named entity recognition method and system based on stroke volume and word vector

Publications (1)

Publication Number Publication Date
CN114298047A (en) 2022-04-08

Family

ID=80972401

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
CN202111641955.1A Pending CN114298047A (en) 2021-12-29 2021-12-29 Chinese named entity recognition method and system based on stroke volume and word vector

Country Status (1)

Country Link
CN (1) CN114298047A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114757184A (en) * 2022-04-11 2022-07-15 中国航空综合技术研究所 Method and system for realizing knowledge question answering in aviation field
CN114757184B (en) * 2022-04-11 2023-11-10 中国航空综合技术研究所 Method and system for realizing knowledge question and answer in aviation field

Similar Documents

Publication Publication Date Title
WO2021147726A1 (en) Information extraction method and apparatus, electronic device and storage medium
CN109753660B (en) LSTM-based winning bid web page named entity extraction method
CN111160031A (en) Social media named entity identification method based on affix perception
CN112541355B (en) Entity boundary type decoupling few-sample named entity recognition method and system
CN108287911B (en) Relation extraction method based on constrained remote supervision
CN113591483A (en) Document-level event argument extraction method based on sequence labeling
CN113392209B (en) Text clustering method based on artificial intelligence, related equipment and storage medium
CN111738169B (en) Handwriting formula recognition method based on end-to-end network model
CN111475622A (en) Text classification method, device, terminal and storage medium
CN114330354B (en) Event extraction method and device based on vocabulary enhancement and storage medium
CN113177412A (en) Named entity identification method and system based on bert, electronic equipment and storage medium
CN110276396B (en) Image description generation method based on object saliency and cross-modal fusion features
CN108509423A (en) A kind of acceptance of the bid webpage name entity abstracting method based on second order HMM
CN114091450A (en) Judicial domain relation extraction method and system based on graph convolution network
CN115565177A (en) Character recognition model training method, character recognition device, character recognition equipment and medium
CN113076758B (en) Task-oriented dialog-oriented multi-domain request type intention identification method
CN113191150B (en) Multi-feature fusion Chinese medical text named entity identification method
CN113360654B (en) Text classification method, apparatus, electronic device and readable storage medium
CN114417874A (en) Chinese named entity recognition method and system based on graph attention network
CN112699685B (en) Named entity recognition method based on label-guided word fusion
CN114298047A (en) Chinese named entity recognition method and system based on stroke volume and word vector
CN113076744A (en) Cultural relic knowledge relation extraction method based on convolutional neural network
Li et al. Review network for scene text recognition
CN115186670B (en) Method and system for identifying domain named entities based on active learning
CN111737470A (en) Text classification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination