CN117592482A - Operation ticket naming entity identification method based on BiLSTM+CRF model - Google Patents

Publication number
CN117592482A
Authority
CN
China
Prior art keywords
operation ticket
crf
lstm
model
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311556835.0A
Other languages
Chinese (zh)
Inventor
笪涛
马海涛
朱江渝
刘小荷
侯超
马骏毅
丁瑾
吴林
张佳
吴昊
王支奎
袁立刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Ruiyuan Electric Power Technology Co ltd
State Grid Jiangsu Electric Power Co ltd Zhenjiang Power Supply Branch
Original Assignee
Nanjing Ruiyuan Electric Power Technology Co ltd
State Grid Jiangsu Electric Power Co ltd Zhenjiang Power Supply Branch
Application filed by Nanjing Ruiyuan Electric Power Technology Co ltd and State Grid Jiangsu Electric Power Co ltd Zhenjiang Power Supply Branch
Priority to CN202311556835.0A
Publication of CN117592482A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/0442 Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Abstract

The invention discloses an operation ticket named entity recognition method based on a BiLSTM+CRF model, comprising the following steps: 1) preparing a corpus; 2) preprocessing the data; 3) labeling the data with the BIEO labeling method; 4) sentence segmentation: splitting sentences at punctuation marks; 5) splitting the labeled sentences into a word list and a corresponding label list; 6) constructing a vocabulary and converting the operation ticket text data into a numeric representation that the LSTM+CRF model can process; 7) counting the operation ticket vocabulary and entity labels, constructing the corresponding dictionaries, and mapping each word to a unique integer ID; 8) sentence vectorization; 9) splitting the vectorized data into a training set, a validation set and a test set; 10) constructing the LSTM+CRF neural network model; 11) predicting and recognizing the named entities in operation tickets. Compared with traditional methods and with using a BiLSTM or CRF model alone, the invention generally achieves higher evaluation metrics such as accuracy, recall and F1 score, and the recognition rate is over 9.

Description

Operation ticket naming entity identification method based on BiLSTM+CRF model
Technical Field
The invention relates to an operation ticket naming entity identification method based on BiLSTM+CRF model, belonging to the technical field of text natural language processing.
Background
The operation ticket is an important electronic text record for the daily overhaul and maintenance of distribution network power system equipment. It contains a large amount of entity information, such as lines, switches, ring main units, switch operation states and other equipment. At present, recognizing entity information in unstructured operation ticket text has the following problems and shortcomings:
1. Manual processing: in the prior art, distribution network entity information in operation ticket text is read and interpreted by relying on manual experience in the professional field, so processing efficiency is low and errors occur easily;
2. Accuracy problems: owing to the complexity and diversity of operation ticket text, conventional techniques have low accuracy in entity boundary identification and classification;
3. Insufficient generality: the prior art suits only specific types of operation ticket and performs poorly on other types;
4. Data islands: conventional techniques analyze text with low accuracy, so unstructured text cannot be shared across systems, and the unshared information forms data islands.
Thus, there is an urgent need for a solution that can automatically identify named entities in an operation ticket.
Disclosure of Invention
The invention aims to provide an operation ticket named entity recognition method based on a BiLSTM+CRF model, where 'BiLSTM+CRF' is a technical scheme that combines a bidirectional long short-term memory network (Bidirectional Long Short-Term Memory, BiLSTM) with a conditional random field (Conditional Random Field, CRF) and is used for the operation ticket named entity recognition (Named Entity Recognition, NER) task. It solves the following technical problems: 1. Automatic recognition: an automatic operation ticket named entity recognition method is provided, which reduces manual intervention and improves processing efficiency. 2. Improved accuracy: the deep learning model BiLSTM+CRF is introduced to mark entity boundaries in operation ticket text accurately, improving classification and recognition accuracy. 3. Improved generality: a highly general model is designed that suits various operation tickets, improving the model's adaptability and generalization ability.
The aim of the invention is realized by the following technical scheme:
An operation ticket named entity recognition method based on a BiLSTM+CRF model, comprising:
1) Corpus preparation: collecting texts, databases and log files containing dispatching and maintenance operation ticket data;
2) Data preprocessing: performing manual word segmentation and part-of-speech tagging on the original operation ticket text data;
3) Data labeling with the BIEO (Begin, Inside, End, Outside) labeling method: the start position of an entity is labeled 'B', the middle part 'I', the end position 'E', and non-entity parts 'O';
4) Sentence segmentation: splitting sentences at punctuation marks;
5) Splitting the labeled sentences into a word list and a corresponding label list;
6) Constructing a vocabulary and converting the operation ticket text data into a numeric representation that the LSTM+CRF model can process;
7) Counting the operation ticket vocabulary and entity labels, constructing the corresponding dictionaries, and mapping each word to a unique integer ID;
8) Sentence vectorization: the operation ticket text is vectorized by converting words into their corresponding IDs and padding or truncating each sentence to the maximum sentence length so that all sentences have the same length; the final vectorized representations are the ID sequence of the sentence and the ID sequence of its labels;
9) Splitting the vectorized data into a training set, a validation set and a test set for subsequent model training, evaluation and prediction;
10) Constructing the LSTM+CRF neural network model: word vectors are input into a bidirectional LSTM layer, the LSTM output is mapped to a score for each label through a fully connected layer, and finally a CRF layer decodes the optimal label sequence;
11) Operation ticket named entity prediction and recognition: the operation ticket text is fed into the model for prediction, and the named entities in the operation ticket are identified automatically.
Further, the LSTM+CRF neural network model architecture is as follows:
Input layer: receives the input sequence of operation ticket text, in which each word is typically represented by its ID;
Word embedding layer: converts each input word ID into the corresponding word vector;
Bidirectional LSTM layer: receives the word vector sequence and processes the sequence information in two directions, forward and backward;
Fully connected layer: maps the output of the LSTM to the label space, that is, it maps the LSTM output dimension to the number of labels in order to compute a score for each label;
Conditional random field layer: decodes the output sequence and models the dependencies between labels in the sequence; the CRF layer decodes on top of the LSTM output to obtain the optimal label sequence, so that the output label sequence has the highest overall probability.
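The decoding step performed by the CRF layer can be illustrated with a small Viterbi decoder. This is only a sketch: the source does not name the decoding algorithm, so Viterbi (the usual choice for linear-chain CRFs) is assumed here, and the function name and the list-of-lists score layout are hypothetical.

```python
from typing import List


def viterbi_decode(emissions: List[List[float]],
                   transitions: List[List[float]]) -> List[int]:
    """Return the highest-scoring label index sequence.

    emissions[t][j]   : score of label j at position t (fully connected layer output)
    transitions[i][j] : score of moving from label i to label j (CRF parameter)
    """
    n_tags = len(transitions)
    score = list(emissions[0])          # best score of any path ending in each label
    backptr: List[List[int]] = []       # best previous label for each position and label
    for emit in emissions[1:]:
        new_score, ptr = [], []
        for j in range(n_tags):
            best_i = max(range(n_tags), key=lambda i: score[i] + transitions[i][j])
            ptr.append(best_i)
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
        score = new_score
        backptr.append(ptr)
    # Trace the best path backwards from the best final label.
    path = [max(range(n_tags), key=lambda j: score[j])]
    for ptr in reversed(backptr):
        path.append(ptr[path[-1]])
    path.reverse()
    return path
```

With a large negative transition score for a forbidden move (for example from 'B' directly to 'O'), the decoder picks a globally consistent sequence even when the per-position scores alone would favor an invalid one.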
Compared with the prior art, the invention has the beneficial effects that:
1. Contextual information capture: the bidirectional long short-term memory network (BiLSTM) considers the preceding and following context of the operation ticket at the same time, so the NER task can better understand the context of an entity, which improves recognition accuracy.
2. Mitigates the long-term dependency problem: conventional RNNs are prone to vanishing or exploding gradients when handling long-term dependencies, whereas the LSTM units in BiLSTM alleviate this problem, and processing the sequence forward and backward lets the model handle long operation ticket text sequences better.
3. Sequence modeling: the conditional random field (CRF) models the entire label sequence and takes the interrelationships between entity labels into account. This sequence modeling further improves NER performance and avoids generating illegal label sequences.
4. Context consistency: because the CRF considers the dependencies between labels in the sequence, the BiLSTM+CRF model keeps entity boundaries consistent and avoids generating unreasonable entity boundaries.
5. End-to-end learning: BiLSTM+CRF is an end-to-end deep learning model that learns the mapping between features and labels directly from the original text, so features need not be designed manually and the complexity of feature engineering is reduced.
6. Excellent effect: the BiLSTM+CRF scheme performs well on many named entity recognition tasks. Compared with traditional methods and with using a BiLSTM or CRF model alone, it generally achieves higher evaluation metrics such as accuracy, recall and F1 score, and the recognition rate is over 9.
Through these technical features, the invention greatly improves the automation level of operation ticket processing, reduces errors and omissions, and provides a more reliable named entity recognition solution for the operation, maintenance and overhaul of distribution network power systems. Because the model learns features and patterns from the data, BiLSTM+CRF adapts flexibly to the different types of named entity in operation tickets, including switches, station rooms, ring main units and the like.
Drawings
FIG. 1 is a diagram of an LSTM-CRF model architecture of the present invention;
FIG. 2 is a diagram of an overall model architecture for BiLSTM+CRF model applied to ticket text recognition.
Detailed Description
The invention will be further described with reference to the drawings and the specific examples.
For operation ticket named entity recognition, the invention builds the whole neural network as a deep learning model based on BiLSTM+CRF, a technical scheme that combines a bidirectional long short-term memory network (Bidirectional Long Short-Term Memory, BiLSTM) with a conditional random field (Conditional Random Field, CRF). NER is an important task in natural language processing; its goal here is to identify named entities of specified categories, such as switches, lines, station rooms, ring main units, switching stations and switch states, in operation ticket text.
The BiLSTM+CRF named entity recognition scheme has six main technical features:
1. Data preprocessing:
Text labeling: each word or character in the original operation ticket text is labeled, for example with the BIEO (Begin, Inside, End, Outside) labeling method, in which the start position of an entity is labeled 'B', the middle part 'I', the end position 'E', and non-entity parts 'O'.
2. Building the bidirectional long short-term memory network (BiLSTM):
BiLSTM is an extension of the recurrent neural network (RNN) that takes both the preceding and the following context into account. The network has two LSTM layers, one processing the sequence forward and one backward, which capture the context of each word. This effectively addresses the long-term dependency problem faced by conventional RNNs.
3. Conditional random field (CRF):
The CRF is a statistical model used to label sequence data. In the NER task it models the whole label sequence and takes the interrelationships between entity labels into account. By modeling the probability of the entity label sequence, the CRF further improves NER performance.
4. Model combination: the BiLSTM and the CRF are combined as shown in FIG. 1 to form an end-to-end model. The BiLSTM extracts context-related features from the input labeled operation ticket text, while the CRF models the label sequence of the NER task. During training, the entire model is optimized by maximizing the conditional log-likelihood of the CRF.
5. Model training and evaluation: the model is trained on the labeled operation ticket training data and tuned on the validation set to avoid overfitting. In the evaluation stage, the model is evaluated on the test data using metrics such as accuracy, recall and F1 score.
6. Prediction: finally, given new unlabeled operation ticket text, the trained model predicts the named entities in it, achieving automatic named entity recognition for operation ticket text.
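The evaluation metrics mentioned in feature 5 can be sketched at entity level as follows. This is an illustration under assumptions: the text does not say whether scoring is per entity or per token, the metric rendered as "accuracy" is taken to mean precision, and representing an entity as a `(start, end, type)` tuple is a hypothetical choice.

```python
from typing import Iterable, Tuple

Entity = Tuple[int, int, str]  # (start index, end index, entity type); assumed format


def precision_recall_f1(gold: Iterable[Entity],
                        pred: Iterable[Entity]) -> Tuple[float, float, float]:
    """Entity-level scores: a prediction counts only if span and type both match."""
    gold_set, pred_set = set(gold), set(pred)
    true_pos = len(gold_set & pred_set)
    precision = true_pos / len(pred_set) if pred_set else 0.0
    recall = true_pos / len(gold_set) if gold_set else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Exact-match scoring is strict: a predicted entity whose boundary is off by one token counts as both a false positive and a false negative.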
In general, the BiLSTM+CRF scheme captures context information through the bidirectional long short-term memory network and then models the relationships within the label sequence through the conditional random field, achieving efficient and accurate named entity recognition.
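The training objective mentioned above, maximizing the conditional log-likelihood of the CRF, is commonly written as follows for a linear-chain CRF. The notation is an assumption, not taken from the source: $E_{t,k}$ is the score the fully connected layer gives label $k$ at position $t$, $T_{i,j}$ is the learned transition score from label $i$ to label $j$, $n$ is the sentence length, and $\mathcal{Y}$ is the label set.

```latex
s(x, y) = \sum_{t=1}^{n} E_{t,\,y_t} + \sum_{t=2}^{n} T_{y_{t-1},\,y_t}

\log p(y \mid x) = s(x, y) - \log \sum_{y' \in \mathcal{Y}^{n}} \exp\bigl(s(x, y')\bigr)
```

Training maximizes $\log p(y \mid x)$ over the labeled sentences, and prediction returns the sequence $y$ with the largest score $s(x, y)$.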
The following describes the implementation of the invention in detail by means of specific examples:
1. Data cleaning:
This step cleans the original operation ticket text data: automatic switch tickets, tickets that do not involve switch operation, in-station switch tickets and the like are filtered out, and only non-automatic switch ticket data are kept. For example:
Lv Na 171 line Lv Mengna path 1# ring main unit Lv Na 171 switch is changed from running to cold standby (ring opening)
2. Data labeling:
Named entity recognition (NER) is a supervised learning task, so the model must be trained on a labeled dataset in which each input (here, each word sequence) is associated with its corresponding output (the named entity labels). The goal is to learn the mapping from input data to output labels so that the model can accurately predict the entities in new, unseen data. For NER, this requires labeling the data to identify the boundaries and types of the named entities in the text. The labeling process involves manually marking each word segment in the text with a label that indicates the entity type (e.g., switch, ring main unit, line). The scheme adopts the BIEO label types and marks a line entity as 'line', a ring main unit entity as 'ring', a switch entity as 'switch', an initial cold standby entity as 'ori', a target operation entity as 'targ', and so on. The label forms are as follows:
BX: start of type X entity
IX: inside an X-type entity
EX: end of X type entity
O: outside any entity (non-entity)
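A minimal sketch of how entity spans could be turned into such labels. Several details are assumptions rather than the patent's own format: the hyphenated form (`B-line`), the `(start, end, type)` span tuples with an inclusive end index, and the treatment of a single-token entity, which gets only a `B` label because the scheme above defines no single-token tag.

```python
from typing import List, Tuple


def bieo_tags(n_tokens: int, spans: List[Tuple[int, int, str]]) -> List[str]:
    """Build a BIEO label list from (start, end_inclusive, entity_type) spans."""
    tags = ["O"] * n_tokens                 # everything is non-entity by default
    for start, end, etype in spans:
        tags[start] = f"B-{etype}"          # first token of the entity
        for i in range(start + 1, end):
            tags[i] = f"I-{etype}"          # interior tokens
        if end > start:
            tags[end] = f"E-{etype}"        # last token (multi-token entities only)
    return tags
```

For a five-token sentence with a `line` entity on tokens 0 to 2 and a one-token `switch` entity on token 4, this yields `B-line, I-line, E-line, O, B-switch`.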
3. Sentence segmentation:
Because some dispatchers have different habits when filling in operation tickets, punctuation marks may appear in the text content. In this step, the labeled data are split into sentences at punctuation marks and then fed into LSTM+CRF model training.
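A sketch of this segmentation step, assuming the labeled data are held as `(token, label)` pairs and that the punctuation tokens themselves are dropped. The delimiter set is a hypothetical choice; the source does not list which punctuation marks are used.

```python
from typing import FrozenSet, List, Tuple

Pair = Tuple[str, str]  # (token, label)

# Assumed sentence delimiters; adjust to the punctuation found in real tickets.
DEFAULT_DELIMITERS: FrozenSet[str] = frozenset("。；;！!？?")


def split_labeled(pairs: List[Pair],
                  delimiters: FrozenSet[str] = DEFAULT_DELIMITERS) -> List[List[Pair]]:
    """Split a labeled token stream into sentences at punctuation tokens."""
    sentences: List[List[Pair]] = []
    current: List[Pair] = []
    for token, tag in pairs:
        if token in delimiters:
            if current:                 # close the sentence, skip empty ones
                sentences.append(current)
                current = []
        else:
            current.append((token, tag))
    if current:                         # flush the trailing sentence
        sentences.append(current)
    return sentences
```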
4. Splitting the labeled sentences into a word list and a corresponding label list:
The labeled sentences are processed line by line and divided into a word list and a label list, which converts the text data into a form suited to a sequence labeling task and to model training and prediction. During training, the labels guide the model to learn the correct label for each position, giving it the ability to identify entities or parts of speech. During prediction, the model predicts the corresponding label list from the word list of the input sentence, thereby identifying the entities or parts of speech in the text.
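As an illustration, assuming each labeled line holds one token followed by whitespace and its label (a CoNLL-style layout; the patent does not specify the file format), the split could look like this. The function name is hypothetical.

```python
from typing import Iterable, List, Tuple


def parse_labeled_lines(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split 'token label' lines into parallel word and label lists."""
    words: List[str] = []
    tags: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:                            # ignore blank lines
            continue
        token, tag = line.rsplit(maxsplit=1)    # the label is the last field
        words.append(token)
        tags.append(tag)
    return words, tags
```

The two lists stay aligned by index, so position `i` of the label list is the training target for position `i` of the word list.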
5. Constructing the vocabulary (all distinct words in the corpus):
Constructing a vocabulary is an important step in natural language processing tasks. Its purpose is to convert the operation ticket text data into a numeric representation that the LSTM+CRF model can process. Its main characteristics are:
Numeric representation: all operation ticket data form a corpus, and constructing a vocabulary maps each word to a unique integer ID, which converts the text data into a numeric representation that a computer model can process easily.
Reducing the data dimension: operation ticket text data typically have a high dimension, with each word being a feature. Constructing the vocabulary converts the operation ticket text into a low-dimensional numeric sequence and reduces the computational cost of model training and inference.
Extracting features: the operation ticket vocabulary maps each word to a unique ID, and the words are ordered by information such as frequency so that common word segments map to smaller IDs, which captures information about common features.
Maintaining consistency: the operation ticket vocabulary ensures that each word has a unique ID and keeps the text data consistent across stages, for example by using the same IDs in the training set, the validation set and the test set.
6. Building the dictionaries ("word: number", "number: word", "tag: word", "word: tag"):
The operation ticket vocabulary and entity labels are counted, the corresponding dictionaries are constructed, and each word is mapped to a unique integer ID. For example:
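A minimal sketch of such dictionaries, ordered by frequency so that common words receive smaller IDs. The `<PAD>` and `<UNK>` entries, the reserved IDs 0 and 1, and the function names are assumptions added for illustration and are not taken from the patent's data.

```python
from collections import Counter
from typing import Dict, List, Tuple

PAD, UNK = "<PAD>", "<UNK>"  # assumed special entries: padding and unknown word


def build_vocab(sentences: List[List[str]]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Map each distinct word to an integer ID, most frequent words first."""
    counts = Counter(word for sentence in sentences for word in sentence)
    # Sort by descending frequency, then alphabetically for a stable order.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    word2id = {PAD: 0, UNK: 1}
    for word in ordered:
        word2id[word] = len(word2id)
    id2word = {i: w for w, i in word2id.items()}
    return word2id, id2word


def build_tag_map(tag_lists: List[List[str]]) -> Tuple[Dict[str, int], Dict[int, str]]:
    """Map each distinct label to an integer ID, keeping 0 for padding."""
    tag2id = {PAD: 0}
    for tag in sorted({t for tags in tag_lists for t in tags}):
        tag2id[tag] = len(tag2id)
    id2tag = {i: t for t, i in tag2id.items()}
    return tag2id, id2tag
```

Reserving ID 0 for padding in both maps matches the later step in which short sentences are filled with 0.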
7. Sentence vectorization:
During training of the machine learning model, text vectorization based on the generated dictionaries converts the text data into a numeric representation that the model can train on. Because natural language sentences differ in length, the text data must be padded or truncated when vectorized.
In the following example, the operation ticket text is vectorized: the words are converted into their corresponding IDs, and each sentence is padded or truncated to the maximum sentence length so that all sentences have the same length. Here the maximum sentence length is set to 10, so sentences shorter than 10 are padded with 0 at the end until every sentence has length 10. The resulting vectorized representations are the ID sequence of the sentence and the ID sequence of its labels.
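A sketch of the padding and truncation described here, using the maximum length of 10 and the padding value 0 from the text. The function name and the use of ID 1 for words missing from the dictionary are hypothetical.

```python
from typing import Dict, List


def vectorize(tokens: List[str], mapping: Dict[str, int],
              max_len: int = 10, pad_id: int = 0, unk_id: int = 1) -> List[int]:
    """Convert tokens to IDs, then truncate or right-pad to exactly max_len."""
    ids = [mapping.get(token, unk_id) for token in tokens][:max_len]
    return ids + [pad_id] * (max_len - len(ids))
```

The same function can be applied to the label list with the label dictionary, which keeps the two ID sequences aligned position by position.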
8. Splitting into a training set, a validation set and a test set:
The vectorized data are split into a training set, a validation set and a test set, divided at an 8:2 ratio by label category, for subsequent model training, evaluation and prediction.
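The text gives a single 8:2 ratio for three sets, so the sketch below rests on an interpretation: 80% of the samples go to training and the remaining 20% are divided evenly between validation and test. It also shuffles at random rather than stratifying by label category, which is a simplification. The function name and seed are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(samples: Sequence[T], train_ratio: float = 0.8,
                  seed: int = 42) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle, then split into train / validation / test (assumed 8:1:1)."""
    items = list(samples)
    random.Random(seed).shuffle(items)      # fixed seed for a reproducible split
    n_train = int(len(items) * train_ratio)
    held_out = items[n_train:]
    n_val = len(held_out) // 2              # halve the held-out part
    return items[:n_train], held_out[:n_val], held_out[n_val:]
```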
9. Saving as a pkl file:
The data are stored as a binary pkl file containing the vocabulary, the label table, and the vectorized training, validation and test sets, so that they can be loaded quickly during subsequent model training and testing.
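A minimal sketch of this step with Python's standard `pickle` module. The dictionary keys and function names are hypothetical; the source only states which items the file contains.

```python
import pickle
from typing import Any, Dict


def save_dataset(path: str, word2id: Dict[str, int], tag2id: Dict[str, int],
                 train: Any, val: Any, test: Any) -> None:
    """Bundle the dictionaries and the three splits into one pkl file."""
    bundle = {"word2id": word2id, "tag2id": tag2id,
              "train": train, "val": val, "test": test}
    with open(path, "wb") as f:             # binary mode is required for pickle
        pickle.dump(bundle, f)


def load_dataset(path: str) -> Dict[str, Any]:
    """Load the bundle written by save_dataset."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.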
10. Constructing the LSTM+CRF neural network model:
The structure of the constructed LSTM-CRF model is shown in FIG. 2: word vectors are input into a bidirectional LSTM layer, the LSTM output is mapped to a score for each label through a fully connected layer, and finally the CRF layer decodes the optimal label sequence.
Long short-term memory (LSTM) is a special kind of RNN designed mainly to address vanishing and exploding gradients when training on long sequences. In short, LSTM performs better than an ordinary RNN on longer text sequences.
Because LSTM performs very well on sequence problems, the scheme uses the BiLSTM+CRF model for operation ticket text recognition. The overall model architecture is as follows:
input layer: an input sequence of ticket text is received, each word being generally represented by its number (index).
Word Embedding Layer (Embedding Layer): and converting the input word segmentation number into a corresponding word vector. The scheme adopts a pytorch built-in word vector generation mode.
Bidirectional LSTM layer (Bidirectional LSTM Layer): a sequence of word vectors is received and sequence information is processed in two directions, forward (forward) and backward (reverse) respectively. This captures the context information for each word segment.
Full link Layer (Linear Layer): the output of the LSTM is mapped to the tag space. It maps the output dimension of the LSTM to the number of labels to score each label.
Conditional random field Layer (CRF Layer): the method is used for decoding the output sequences and solving the dependency relationship between the tag sequences. The CRF layer decodes on the output of the LSTM to obtain an optimal tag sequence such that the output tag sequence satisfies the overall optimal probability.
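The first four layers can be sketched in PyTorch as below. This is an illustration under assumptions: the class name, the embedding size of 100 and the hidden size of 128 are hypothetical, and the CRF layer is deliberately left out; the per-label scores this module returns are what a CRF layer would take as input.

```python
import torch
from torch import nn


class BiLSTMEmissions(nn.Module):
    """Embedding -> bidirectional LSTM -> fully connected layer (CRF not included)."""

    def __init__(self, vocab_size: int, num_tags: int,
                 embed_dim: int = 100, hidden_dim: int = 128, pad_id: int = 0):
        super().__init__()
        # Word embedding layer: word IDs to dense vectors; the pad ID maps to zeros.
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_id)
        # Bidirectional LSTM: forward and backward states are concatenated.
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            batch_first=True, bidirectional=True)
        # Fully connected layer: 2 * hidden_dim features to one score per label.
        self.fc = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        embedded = self.embedding(token_ids)    # (batch, seq_len, embed_dim)
        lstm_out, _ = self.lstm(embedded)       # (batch, seq_len, 2 * hidden_dim)
        return self.fc(lstm_out)                # (batch, seq_len, num_tags)
```

For a batch of two sentences padded to length 10 and five labels, the output has shape `(2, 10, 5)`.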
11. Operation ticket named entity prediction and recognition:
The operation ticket text is fed into the model for prediction, and the named entities in the operation ticket are identified automatically; according to the actual result diagram, all entities in the operation ticket are resolved correctly.
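Once the model has predicted a label for each token, the entities still have to be read off the label sequence. A minimal sketch follows, assuming hyphenated labels such as `B-line` and treating only a `B ... E` run of one type as an entity; runs that never reach a matching `E` label are discarded, which is a simplification. The function name and the sample tokens are hypothetical.

```python
from typing import List, Optional, Tuple


def extract_entities(tokens: List[str],
                     tags: List[str]) -> List[Tuple[str, List[str]]]:
    """Collect (entity_type, tokens) for every B ... E run of a single type."""
    entities: List[Tuple[str, List[str]]] = []
    start: Optional[int] = None
    etype: Optional[str] = None
    for i, tag in enumerate(tags):
        prefix, _, label = tag.partition("-")
        if prefix == "B":
            start, etype = i, label             # open a new entity
        elif prefix == "E" and start is not None and label == etype:
            entities.append((label, tokens[start:i + 1]))
            start, etype = None, None           # close the entity
        elif prefix == "O" or (prefix == "I" and label != etype):
            start, etype = None, None           # broken run: drop it
    return entities
```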
In addition to the above embodiments, other embodiments of the present invention are possible, and all technical solutions formed by equivalent substitution or equivalent transformation are within the scope of the present invention.

Claims (2)

1. An operation ticket named entity recognition method based on a BiLSTM+CRF model, characterized by comprising the following steps:
1) Corpus preparation: collecting texts, databases and log files containing dispatching and maintenance operation ticket data;
2) Data preprocessing: performing manual word segmentation and part-of-speech tagging on the original operation ticket text data;
3) Data labeling with the BIEO labeling method: the start position of an entity is labeled 'B', the middle part 'I', the end position 'E', and non-entity parts 'O';
4) Sentence segmentation: splitting sentences at punctuation marks;
5) Splitting the labeled sentences into a word list and a corresponding label list;
6) Constructing a vocabulary and converting the operation ticket text data into a numeric representation that the LSTM+CRF model can process;
7) Counting the operation ticket vocabulary and entity labels, constructing the corresponding dictionaries, and mapping each word to a unique integer ID;
8) Sentence vectorization: the operation ticket text is vectorized by converting words into their corresponding IDs and padding or truncating each sentence to the maximum sentence length so that all sentences have the same length; the final vectorized representations are the ID sequence of the sentence and the ID sequence of its labels;
9) Splitting the vectorized data into a training set, a validation set and a test set for subsequent model training, evaluation and prediction;
10) Constructing the LSTM+CRF neural network model: word vectors are input into a bidirectional LSTM layer, the LSTM output is mapped to a score for each label through a fully connected layer, and finally a CRF layer decodes the optimal label sequence;
11) Operation ticket named entity prediction and recognition: the operation ticket text is fed into the model for prediction, and the named entities in the operation ticket are identified automatically.
2. The operation ticket named entity recognition method based on a BiLSTM+CRF model according to claim 1, wherein the LSTM+CRF neural network model architecture is as follows:
Input layer: receives the input sequence of operation ticket text, in which each word is typically represented by its ID;
Word embedding layer: converts each input word ID into the corresponding word vector;
Bidirectional LSTM layer: receives the word vector sequence and processes the sequence information in two directions, forward and backward;
Fully connected layer: maps the output of the LSTM to the label space, that is, it maps the LSTM output dimension to the number of labels in order to compute a score for each label;
Conditional random field layer: decodes the output sequence and models the dependencies between labels in the sequence; the CRF layer decodes on top of the LSTM output to obtain the optimal label sequence, so that the output label sequence has the highest overall probability.
CN202311556835.0A 2023-11-21 2023-11-21 Operation ticket naming entity identification method based on BiLSTM+CRF model Pending CN117592482A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311556835.0A CN117592482A (en) 2023-11-21 2023-11-21 Operation ticket naming entity identification method based on BiLSTM+CRF model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311556835.0A CN117592482A (en) 2023-11-21 2023-11-21 Operation ticket naming entity identification method based on BiLSTM+CRF model

Publications (1)

Publication Number Publication Date
CN117592482A true CN117592482A (en) 2024-02-23

Family

ID=89914568

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311556835.0A Pending CN117592482A (en) 2023-11-21 2023-11-21 Operation ticket naming entity identification method based on BiLSTM+CRF model

Country Status (1)

Country Link
CN (1) CN117592482A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117992068A (en) * 2024-04-02 2024-05-07 天津南大通用数据技术股份有限公司 LSTM and TRM combined intelligent database grammar analysis method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination