CN110826332A - GP-based automatic identification method for named entities of traditional Chinese medicine patents - Google Patents
- Publication number
- CN110826332A CN110826332A CN201911062344.4A CN201911062344A CN110826332A CN 110826332 A CN110826332 A CN 110826332A CN 201911062344 A CN201911062344 A CN 201911062344A CN 110826332 A CN110826332 A CN 110826332A
- Authority
- CN
- China
- Prior art keywords
- chinese medicine
- traditional chinese
- tree
- named entities
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
- G06F16/316—Indexing structures
- G06F16/322—Trees
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/12—Computing arrangements based on biological models using genetic models
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Life Sciences & Earth Sciences (AREA)
- Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Software Systems (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Evolutionary Computation (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Computational Linguistics (AREA)
- Genetics & Genomics (AREA)
- Mathematical Physics (AREA)
- Evolutionary Biology (AREA)
- Bioinformatics & Computational Biology (AREA)
- Databases & Information Systems (AREA)
- Machine Translation (AREA)
Abstract
A GP-based automatic recognition method for named entities of traditional Chinese medicine patents realizes automatic extraction of patent document features through autonomous learning of a model, and then labels named entities according to the extracted feature information. The invention applies genetic programming to the task of recognizing named entities in traditional Chinese medicine patents, so that the algorithm can learn autonomously. Compared with current mainstream deep learning methods, it has fewer parameters and is easier to operate. In the learning process, both the context information of each word and the contextual dependency relationships between words are considered, so information extraction is more thorough. Compared with the gate-based LSTM algorithm, GP is used to search for memory cells, so more complex operation structures can be found; more named entities are recognized than with the original method, improving the algorithm's performance. The method is used to automatically identify named entities in traditional Chinese medicine patents and can also be extended to related tasks such as keyword extraction.
Description
Technical Field
The invention relates to the field of natural language processing, in particular to a GP-based automatic identification method for named entities of traditional Chinese medicine patents.
Background
Patent indexing is the core work of deep data processing: through indexing, the various kinds of retrieval information in patent documents can be effectively extracted, improving the efficiency and accuracy of patent document retrieval. Traditional Chinese medicine patent data, a kind of patent data with important value, contains a large amount of specialized vocabulary (traditional medicines, compounds, and so on), which makes automatic indexing extremely difficult. Within automatic indexing, named entity recognition is the important first step, and its results affect the subsequent tasks.
Named Entity Recognition (NER), a key task in natural language processing, aims to recognize entities with specific meanings in text; the technology can therefore effectively recognize related entities such as drugs and compounds in traditional Chinese medicine patent data. NER methods have long comprised three main categories: (1) rule/dictionary-based methods; (2) methods based on traditional machine learning (e.g., CRF, HMM, MEMM); (3) deep learning-based methods. In recent years, deep learning has become the mainstream approach to named entity recognition and has achieved good results, with structures including Bi-LSTM, LSTM+CRF, RNN+CRF, and Attention-based and transfer-learning-based neural networks. Named entity recognition based on traditional methods generally requires word segmentation of the text, and the quality of the segmentation result directly affects the experimental result, particularly for traditional Chinese medicine patent data rich in specialized vocabulary; deep learning-based methods involve many network types, depend heavily on parameter settings, and offer poor model interpretability.
Genetic Programming (GP), a branch of evolutionary computing, is a method that mimics human intelligence: through autonomous learning it can construct computer programs to solve tasks given in advance, with few parameters and easy operation. Combining GP with the task of recognizing named entities in traditional Chinese medicine patents, the invention provides a Genetic Programming (GP)-based method for automatic identification of named entities.
Disclosure of Invention
The invention aims to solve the following problems:
(1) performing named entity recognition of the drugs and compounds in traditional Chinese medicine patent data, facilitating later automatic indexing of the patent data;
(2) designing a universal named entity recognition model that can learn autonomously for different data types, without excessive human participation;
(3) optimizing through the GP algorithm, thereby avoiding problems such as vanishing and exploding gradients that arise when traditional deep learning solves the optimization problem;
(4) adopting the GP algorithm to search the input and output points of the model, so that more complex structures can be found; more named entities are found than with the original method, giving better performance;
(5) compared with current mainstream deep learning methods, the method has fewer parameters, is easy to operate, and avoids errors caused by word segmentation in traditional methods.
The invention discloses a GP-based automatic identification method for named entities of traditional Chinese medicine patents, which has the advantages of fewer parameters, easy operation, sufficient extracted information and good performance.
A GP-based automatic recognition method for named entities of traditional Chinese medicine patents is characterized in that automatic extraction of document features is achieved through active learning of a model, and then named entity labeling is achieved according to extracted feature information.
The method comprises the following specific steps:
the first step is as follows: preparing data: data cleaning, namely manually marking named entities in Chinese patent documents;
the second step is that: structured representation of data: converting 'Chinese characters' in training data into a vector form by a character embedding method, wherein each 'Chinese character' is embedded into a vector with l dimension;
the third step: and in the model learning process based on the GP algorithm, the training is carried out sentence by sentence.
1. Local information extraction for words
A. Context information representation of words
For each sentence in the data set, the word vectors of the words it contains are concatenated in order into matrix form; for example, a sentence of length s can be structurally represented as a matrix of dimension s × l. When extracting the local information of a word, the window is set to 5 × l; that is, the information of the two preceding words and the two following words is considered simultaneously, and if the current word does not have two preceding or two following words, the missing positions are padded with 0 vectors. The context information of each word can thus be represented as a 5 × l matrix whose entries are denoted P11, P12, ..., P5l; FIG. 1 gives the context information matrix representation of the word "xi".
B. Local information extraction for words
Through A, each word in the sentence corresponds to a context information matrix. The local information extraction process converts each Chinese character, according to its context information matrix, into a vector containing local information by learning a tree structure; this vector is denoted yt = [yt1, yt2, ..., ytm], t = 1, 2, ..., s, where the invention assumes s words in a sentence and m is the dimension of the resulting local-information "word vector". The specific process is as follows:
(1) Randomly initialize T tree structures, each consisting of m subtrees. The leaf nodes of a tree are indices into the word's context information matrix, and the intermediate nodes are randomly given operators and elementary functions (each subtree is in fact a function expression whose variables are the leaf-node indices). The outputs of the m subtrees are multiplied by C and concatenated into the m-dimensional vector yt, i.e., the root of the tree, where C is a coefficient in (0, 0.5). We call this type of tree structure the first genetic programming tree, abbreviated GP1; an example is shown in FIG. 2. In fact, each tree is a local information extraction model.
(2) Input the context information matrix corresponding to each word in the sentence into the T initialized trees to form T y vectors, which serve as the input of the next process.
(3) The optimization of the tree structure is realized through self-evolution, finally learning an optimal tree structure so that the local information of the character is extracted to the maximum extent (the self-evolution process of the tree is described in detail in the fourth step).
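As an illustrative sketch of how such GP1 trees could be represented and evaluated (the operator set `OPS`, the tuple encoding, and all names are assumptions for illustration, not the patent's actual implementation):

```python
import random

import numpy as np

OPS = [np.add, np.subtract, np.multiply]  # example operator set; the patent leaves the set open


def random_subtree(depth, n_rows, n_cols, rng):
    """Build a random expression tree: leaves are (row, col) indices into the
    5 x l context matrix, internal nodes are randomly chosen binary operators."""
    if depth == 0 or rng.random() < 0.3:
        return ("leaf", rng.randrange(n_rows), rng.randrange(n_cols))
    op = rng.choice(OPS)
    return ("op", op,
            random_subtree(depth - 1, n_rows, n_cols, rng),
            random_subtree(depth - 1, n_rows, n_cols, rng))


def eval_tree(node, P):
    """Evaluate a subtree on a context information matrix P (shape 5 x l)."""
    if node[0] == "leaf":
        return P[node[1], node[2]]
    _, op, left, right = node
    return op(eval_tree(left, P), eval_tree(right, P))


def gp1_output(subtrees, P, C=0.25):
    """y_t: each subtree's output scaled by a coefficient C in (0, 0.5)
    and concatenated into an m-dimensional vector (the tree's root)."""
    return np.array([C * eval_tree(t, P) for t in subtrees])
```

A tree with m subtrees evaluated on a word's 5 × l context matrix yields that word's m-dimensional local-information vector; initializing T such trees gives the T candidate extraction models described above.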
2. Sequence labeling
In sequence data of the traditional Chinese medicine patent type, long-range dependency relationships often exist between words. Inspired by the learning process of LSTM (Long Short-Term Memory networks), the invention proposes a genetic programming tree with long- and short-term memory capability.
Through the above steps, each word is represented as a vector yt of dimension m. T tree structures are likewise initialized, here called the second genetic programming tree, abbreviated GP2; the T outputs of the GP1 tree structures correspond to the inputs of GP2, but GP2 differs clearly from GP1.
The structure of GP2 is described below:
(1) For each word label, the GP2 learning process corresponds to three inputs (i.e., the leaf nodes of the tree have three types of input): the word's own input yt, the output of the previous word ht−1, and the memory cell Ct−1 produced by the previous step; the leaf nodes take node values from the corresponding dimensions of these three vectors (for the first character, ht−1 and Ct−1 are given randomly). The three inputs feed the upper-layer nodes of the tree in a fully connected manner;
(2) The output of GP1 is an m-dimensional vector, while GP2 corresponds to two types of output: ht and Ct. ht is the learned information of a certain dimension of the current word vector; ht is fed to the following three positions:
a. input to higher layers;
b. as input for the next word;
c. as input for the next iteration.
Ct is the long-term memory cell, which stores long-range dependency information among the word sequence. Compared with the traditional gate-based LSTM model, the GP-tree-based learning process can learn more complex functional relationships to enrich the memory cell Ct;
(3) In addition, the intermediate nodes of GP2 and GP1 are also different: the intermediate nodes of GP1 are randomly given operators and elementary functions, and GP2 additionally adds three types of activation functions commonly used in deep learning: sigmoid, tanh, relu.
A simple illustration of the sequence learning (FIG. 3) and of the GP2 used in the learning process (FIG. 4) is given below. As can be seen from FIG. 3, the ht output to the higher layer passes through a softmax transform to output the probabilities of all labels, and the label with the maximum probability is taken as the label of the word. GP2 is likewise learned through a form of self-evolution.
For example, if the final output for "Goujin" is [0.6, 0.1, 0.3], the probabilities of labels "B", "I", and "O" are 0.6, 0.1, and 0.3 respectively; the model's predicted probability for "B" is clearly the largest, so the model outputs the label "B".
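The softmax-then-argmax decision described above can be sketched as follows; the function name and label tuple are illustrative:

```python
import numpy as np


def predict_label(h, labels=("B", "I", "O")):
    """Softmax over the h_t outputs; the label with maximum probability wins."""
    e = np.exp(h - np.max(h))  # subtract the max for numerical stability
    probs = e / e.sum()
    return labels[int(np.argmax(probs))], probs
```

Feeding in scores whose softmax is [0.6, 0.1, 0.3] reproduces the example: the predicted label is "B".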
The fourth step: function of adaptive value
T models are formed through the above process, and each model gives corresponding labels. A reasonable fitness function is needed to judge the quality of each model: an individual (model) with a larger fitness value has a greater probability of being inherited into the next generation.
Cross entropy is commonly used to measure the difference between two distributions: for each sentence, the more similar the learned labels are to the true label distribution, the smaller the corresponding cross entropy. In genetic operations, a larger individual fitness value is expected to mean stronger adaptive capacity; to accord with the rule of survival of the fittest, the negative of the cross entropy is adopted as the fitness function, and to prevent overfitting, the width and depth of the tree are constrained. Thus the corresponding fitness function is:
where pji is the probability of the true label and p̂ji is the corresponding predicted probability; NTk is the depth of GPk and DTk is the width of GPk, k = 1 or 2.
Evolution process of the tree structure: the initialized trees (tree depth is set to at most 10) are evolved into the final tree structure through repeated selection, crossover, and mutation operations:
Selection operator: roulette-wheel selection is adopted, and the m trees with the largest fitness values enter the next generation; trees with higher fitness values have a higher probability of being selected;
Crossover operator: randomly select subtrees in two individuals and exchange the positions of the two subtrees;
Mutation operator: symbols or subtrees in the tree are randomly transformed with a probability of 1%.
The fifth step: and selecting an optimal model. After the model training is finished, the final model is verified through a verification set, and the optimal tree structures (GP1 and GP2) in the verification set are selected as the final model.
And a sixth step: and testing the learned model on a test set.
According to the method, automatic extraction of document features is realized through autonomous learning of the model, so that named entity labeling is achieved from the extracted feature information without excessive manual participation, and the number of parameters is small. In extracting word information, both the context information of each word and the contextual dependency relationships between words are considered, so information extraction is more thorough. Searching with the GP algorithm can automatically discover more complex structures, label more named entities, and improve the algorithm's performance. Compared with current mainstream deep learning methods, the method has fewer parameters, is easy to operate, and avoids the vanishing-gradient and exploding-gradient problems of deep learning algorithms.
Drawings
FIG. 1 is the context information matrix representation of the word "xi";
FIG. 2 is the local information extraction tree GP1 of a word;
FIG. 3 is a learning process for a text sequence;
FIG. 4 is the GP2 structure in the sequence model;
fig. 5 is a model learning flowchart.
Detailed Description
A GP-based automatic identification method for named entities of traditional Chinese medicine patents is characterized by comprising the following steps:
the first step is as follows: preparing data: data cleaning, namely manually marking named entities in Chinese patent documents; the first word of a named entity is marked "B" (begin), the second word of the named entity and the remaining words are marked "I" (inside), and words that are not named entities are marked "O" (out). An example of a labeled training data portion is shown in FIG. 1, with labeled words on the left and corresponding label forms for each word on the right, where "Goji" is labeled as the named entity.
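The BIO labeling scheme described above can be sketched as follows; the function name is illustrative, and English tokens stand in for the Chinese characters of the patent text:

```python
def bio_labels(sentence, entities):
    """Label each character of a sentence: "B" for the first character of a
    named entity, "I" for its remaining characters, "O" for all others."""
    labels = ["O"] * len(sentence)
    for ent in entities:
        start = sentence.find(ent)
        while start != -1:
            labels[start] = "B"  # entity-initial character
            for i in range(start + 1, start + len(ent)):
                labels[i] = "I"  # entity-internal characters
            start = sentence.find(ent, start + len(ent))
    return labels
```

For instance, labeling the toy string "abcde" with the entity "bc" yields O, B, I, O, O — the pattern the "Goji" example in FIG. 1 follows at the character level.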
The second step is that: structured representation of data: converting 'Chinese characters' in training data into a vector form by a character embedding method, wherein each 'Chinese character' is embedded into a vector with l dimension; the specific process is as follows:
1. First, each Chinese character ci in the training data, i = 1, 2, ..., n, is expressed in one-hot vector form (i.e., for the m distinct characters in the training data, each character corresponds to one dimension; for the i-th character only the i-th dimension is 1, and all other dimensions are 0):
"Carry": [1,0,0,0,......,0]
"Take": [0,1,0,0,......,0]
"From": [0,0,1,0,......,0]
"Qi": [0,0,0,1,......,0]
......
2. Taking the one-hot "Chinese character" representations as input, train them through a word2vec model into densely distributed character vectors with certain semantic relations; the vector length is l, and the new "Chinese character" vector in a sentence is denoted xi = [xi1, xi2, ......, xil], i = 1, 2, ..., n;
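The one-hot step can be sketched as follows; the function name is illustrative, and English tokens stand in for the Chinese characters. (The subsequent dense-embedding step would in practice use a word2vec implementation such as the one in gensim — an assumption, since the patent does not name a library — so only the one-hot encoding is shown.)

```python
import numpy as np


def one_hot_vocab(chars):
    """Map each distinct character to a one-hot vector, in first-seen order:
    the i-th distinct character gets a 1 in dimension i and 0 elsewhere."""
    vocab = {}
    for c in chars:
        if c not in vocab:
            vocab[c] = len(vocab)
    n = len(vocab)
    return {c: np.eye(n)[i] for c, i in vocab.items()}


# Toy stand-ins for the four characters in the example above.
onehot = one_hot_vocab(["carry", "take", "from", "qi"])
```

Here `onehot["carry"]` is [1, 0, 0, 0] and `onehot["qi"]` is [0, 0, 0, 1], matching the listing above.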
The third step: model learning process based on GP algorithm
And training sentence by sentence.
1. Local information extraction for words
A. Context information representation of words
(1) Splicing the word vectors of each word in the sentence into a matrix form in sequence; for example: sentences of length s may be represented as a matrix of dimension s x l.
(2) For each word in the sentence, take the word vectors of the two preceding and two following words as context information (positions before or after the current word that do not exist are filled with 0 vectors), and represent each word as a 5 × l matrix whose entries are denoted P11, P12, ..., P5l; FIG. 1 gives the context information matrix representation of the word "xi".
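A minimal sketch of the 5 × l context matrix construction with zero padding; the function and variable names are illustrative:

```python
import numpy as np


def context_matrix(word_vectors, t, window=5):
    """Build the 5 x l context matrix for the word at position t.

    word_vectors: (s, l) array of character vectors for one sentence.
    Positions outside the sentence are padded with 0 vectors, as in the
    method's local information extraction step.
    """
    s, l = word_vectors.shape
    half = window // 2
    rows = []
    for offset in range(-half, half + 1):
        i = t + offset
        if 0 <= i < s:
            rows.append(word_vectors[i])
        else:
            rows.append(np.zeros(l))  # pad missing neighbours with 0 vectors
    return np.stack(rows)  # shape (5, l)
```

For the first word of a sentence, the first two rows are zero vectors; for the last word, the last two rows are.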
B. Local information extraction for words
Based on its context information matrix, a word is converted, by learning a tree structure, into a word vector containing local information, denoted yt = [yt1, yt2, ..., ytm], t = 1, 2, ..., s, where the invention assumes s words in a sentence and m is the dimension of the resulting local-information "word vector".
The specific process is as follows:
(1) Randomly initialize T tree structures, each consisting of m subtrees. The leaf nodes of a tree are indices into the word's context information matrix, and the intermediate nodes are randomly given operators and elementary functions (each subtree is in fact a function expression whose variables are the leaf-node indices). The outputs of the m subtrees are multiplied by C and concatenated into the m-dimensional vector yt, i.e., the root of the tree, where C is a coefficient in (0, 0.5). We call this type of tree structure the first genetic programming tree, abbreviated GP1; an example is shown in FIG. 2. In fact, each tree is a local information extraction model.
(2) Input the context information matrix corresponding to each word in the sentence into the T initialized trees to form T y vectors, which serve as the input of the next process.
(3) The optimization of the tree structure is realized through self-evolution, finally learning an optimal tree structure so that the local information of the character is extracted to the maximum extent (the self-evolution process of the tree is described in detail in the fourth step).
2. Sequence labeling
In sequence data of the traditional Chinese medicine patent type, long-range dependency relationships often exist between words. Inspired by the learning process of LSTM (Long Short-Term Memory networks), the invention provides a GP tree with long- and short-term memory capability.
Through the above steps, each word is represented as a vector yt of dimension m. T tree structures are likewise initialized, here called the second genetic programming tree, abbreviated GP2; the T outputs of the GP1 tree structures correspond to the inputs of GP2, but GP2 differs clearly from GP1.
The structure of GP2 is described below:
(1) For each word label, the GP2 learning process corresponds to three inputs (i.e., the leaf nodes of the tree have three types of input): the word's own input yt, the output of the previous word ht−1, and the memory cell Ct−1 produced by the previous step; the leaf nodes take node values from the corresponding dimensions of these three vectors (for the first character, ht−1 and Ct−1 are given randomly). The three inputs feed the upper-layer nodes of the tree in a fully connected manner;
(2) The output of GP1 is an m-dimensional vector, while GP2 corresponds to two types of output: ht and Ct. ht is the learned information of a certain dimension of the current word vector; ht is fed to the following three positions:
a. input to higher layers;
b. as input for the next word;
c. as input for the next iteration.
Ct is the long-term memory cell, which stores long-range dependency information among the word sequence. Compared with the traditional gate-based LSTM model, the GP-tree-based learning process can learn more complex functional relationships to enrich the memory cell Ct;
(3) In addition, the intermediate nodes of GP2 and GP1 are also different: the intermediate nodes of GP1 are randomly given operators and elementary functions, and GP2 additionally adds three types of activation functions commonly used in deep learning: sigmoid, tanh, relu.
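As an illustrative sketch of the recurrence GP2 implements over a sentence (not the patent's actual evolved trees): `f_h` and `f_c` below are hypothetical placeholder functions standing in for whatever tree structures GP evolves for ht and Ct, and `sigmoid` is one of the three activation functions just mentioned.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gp2_scan(ys, f_h, f_c, m, seed=0):
    """Run a GP2-style recurrence over one sentence.

    ys: sequence of m-dimensional y_t vectors from GP1.
    f_h, f_c: functions producing h_t and the memory cell C_t from
    (y_t, h_{t-1}, C_{t-1}); placeholders for the evolved trees.
    h_0 and C_0 are randomly given, as in the method.
    """
    rng = np.random.default_rng(seed)
    h = rng.standard_normal(m)  # h_0: random, as for the first character
    C = rng.standard_normal(m)  # C_0: random, as for the first character
    hs = []
    for y in ys:
        h_new = f_h(y, h, C)  # h_t from (y_t, h_{t-1}, C_{t-1})
        C = f_c(y, h, C)      # C_t from the same previous state
        h = h_new
        hs.append(h)
    return hs
```

The returned per-word ht vectors are what the softmax layer turns into label probabilities.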
FIGS. 3 and 4 respectively show a simple illustration of the learning of the text sequence and of the GP2 used in the learning process. As can be seen from FIG. 3, the ht output to the higher layer passes through a softmax transform to output the probabilities of all labels, and the label with the maximum probability is taken as the label of the word. GP2 is likewise learned through a form of self-evolution.
For example, if the final output for "Goujin" is [0.6, 0.1, 0.3], the probabilities of labels "B", "I", and "O" are 0.6, 0.1, and 0.3 respectively; the model's predicted probability for "B" is clearly the largest, so the model outputs the label "B".
The fourth step: function of adaptive value
T models are formed through the above process, and each model gives corresponding labels. A reasonable fitness function is needed to judge the quality of each model: an individual (model) with a larger fitness value has a greater probability of being inherited into the next generation.
Cross entropy is commonly used to measure the difference between two distributions: for each sentence, the more similar the learned labels are to the true label distribution, the smaller the corresponding cross entropy. In genetic operations, a larger individual fitness value is expected to mean stronger adaptive capacity; to accord with the rule of survival of the fittest, the negative of the cross entropy is adopted as the fitness function, and to prevent overfitting, the width and depth of the tree are constrained. Thus the corresponding fitness function is:
where pji is the probability of the true label and p̂ji is the corresponding predicted probability; NTk is the depth of GPk and DTk is the width of GPk, k = 1 or 2.
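A sketch of such a fitness function; the additive form of the size penalty and the coefficient `alpha` are illustrative assumptions, since the text only states that the negative cross entropy is used and that tree depth and width are constrained:

```python
import numpy as np


def fitness(p_true, p_pred, depth, width, alpha=0.01):
    """Negative cross entropy minus a tree-size penalty.

    p_true, p_pred: (n_words, n_labels) true and predicted label
    distributions for a sentence. Larger fitness = better model.
    alpha and the (depth + width) penalty form are assumptions.
    """
    eps = 1e-12  # avoid log(0)
    ce = -np.sum(p_true * np.log(p_pred + eps))
    return -ce - alpha * (depth + width)
</test>```

A model whose predicted distributions sit closer to the true labels gets a smaller cross entropy and hence a larger fitness, exactly the ordering the selection operator needs.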
Evolution process of the tree structure: the initialized trees (tree depth is set to at most 10) are evolved into the final tree structure through repeated selection, crossover, and mutation operations:
Selection operator: roulette-wheel selection is adopted, and the m trees with the largest fitness values enter the next generation; trees with higher fitness values have a higher probability of being selected;
Crossover operator: randomly select subtrees in two individuals and exchange the positions of the two subtrees;
Mutation operator: symbols or subtrees in the tree are randomly transformed with a probability of 1%.
The fifth step: and selecting an optimal model. After the model training is finished, the final model is verified through a verification set, and the optimal tree structures (GP1 and GP2) in the verification set are selected as the final model.
And a sixth step: and testing the learned model on a test set.
The overall algorithm flow is shown in fig. 5.
Description of the model flow:
step 1: preparing data;
step 2: the data is structurally represented in a vector form;
step 3: loop over the data sentence by sentence;
step 4: respectively initializing T first genetic programming trees and T second genetic programming trees, and giving selection, intersection and variation parameters required in an algorithm;
step 5: extracting local information of the words in the data;
step 6: marking the characters in the sentence according to the initialized genetic programming tree;
step 7: calculating an adaptive value of the genetic programming tree according to the marking information of the characters;
step 10: and judging whether the algorithm meets a termination condition, if so, terminating the algorithm, and if not, entering step 11.
step 11: and respectively executing selection-cross-mutation operation on the first genetic programming tree and the second genetic programming tree to form a new population, and returning to execute step 5.
Claims (3)
1. A GP-based automatic recognition method for named entities of traditional Chinese medicine patents is characterized in that automatic extraction of document features is achieved through active learning of a model, and then named entity labeling is achieved according to extracted feature information.
2. The GP-based automatic identification method for named entities of traditional Chinese medicine patents according to claim 1, comprising the following steps:
(1) data cleaning, namely manually marking named entities in Chinese patent documents;
(2) converting "Chinese characters" in the training data into vector form by a character embedding method, wherein each "Chinese character" is embedded into an l-dimensional vector;
(3) model learning based on GP algorithm;
(4) the fitness function is:
where pji is the probability of the true label and p̂ji is the corresponding predicted probability; NTk is the depth of GPk and DTk is the width of GPk, k is 1 or 2;
(5) after the model training is finished, verifying the final model through a verification set, and selecting the optimal tree structure under the verification set as the final model;
(6) the final model is tested on the test set.
3. The GP-based automatic recognition method for named entities of traditional Chinese medicine patents according to claim 2, wherein a model learning process based on GP algorithm comprises the following steps:
Step1: separately initializingTA first genetic programming tree and a second genetic programming tree are provided, and selection, intersection and variation parameters required in the algorithm are given;
Step2: representing words in the sentence in a matrix form containing context information;
Step3: extracting local information of the words in the data to form a new word vector;
Step4: inputting the formed new word vectors into a second genetic programming tree in sequence, and labeling the words in the sentence;
Step5: calculating an adaptive value of the genetic programming tree according to the marking information of the characters;
Step6: judging whether the algorithm meets the termination condition, if so, ending the training of the current sentence, and turning tostep2, training the next sentence until all sentences are trained; if not, enterstep7;
step7: performing selection-crossover-mutation operations on the first genetic programming tree and the second genetic programming tree, respectively, to form a new population, and returning to performstep5。
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911062344.4A CN110826332A (en) | 2019-11-02 | 2019-11-02 | GP-based automatic identification method for named entities of traditional Chinese medicine patents |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911062344.4A CN110826332A (en) | 2019-11-02 | 2019-11-02 | GP-based automatic identification method for named entities of traditional Chinese medicine patents |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110826332A true CN110826332A (en) | 2020-02-21 |
Family
ID=69552223
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911062344.4A Withdrawn CN110826332A (en) | 2019-11-02 | 2019-11-02 | GP-based automatic identification method for named entities of traditional Chinese medicine patents |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110826332A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---
CN112988733A (en) * | 2021-04-16 | 2021-06-18 | 北京妙医佳健康科技集团有限公司 | Method and device for improving and enhancing data quality
CN112988733B (en) * | 2021-04-16 | 2021-08-27 | 北京妙医佳健康科技集团有限公司 | Method and device for improving and enhancing data quality
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109783817B (en) | Text semantic similarity calculation model based on deep reinforcement learning | |
CN110334354B (en) | Chinese relation extraction method | |
WO2023024412A1 (en) | Visual question answering method and apparatus based on deep learning model, and medium and device | |
CN107943784B (en) | Relationship extraction method based on generation of countermeasure network | |
CN111241294B (en) | Relationship extraction method of graph convolution network based on dependency analysis and keywords | |
CN112883738A (en) | Medical entity relation extraction method based on neural network and self-attention mechanism | |
CN108549658B (en) | Deep learning video question-answering method and system based on attention mechanism on syntax analysis tree | |
CN109657239A (en) | The Chinese name entity recognition method learnt based on attention mechanism and language model | |
CN112487143A (en) | Public opinion big data analysis-based multi-label text classification method | |
CN110222163A (en) | A kind of intelligent answer method and system merging CNN and two-way LSTM | |
CN111966812B (en) | Automatic question answering method based on dynamic word vector and storage medium | |
CN110188195B (en) | Text intention recognition method, device and equipment based on deep learning | |
CN109063164A (en) | A kind of intelligent answer method based on deep learning | |
CN113435211B (en) | Text implicit emotion analysis method combined with external knowledge | |
CN110516070B (en) | Chinese question classification method based on text error correction and neural network | |
CN111027595A (en) | Double-stage semantic word vector generation method | |
CN111400494B (en) | Emotion analysis method based on GCN-Attention | |
CN111222318B (en) | Trigger word recognition method based on double-channel bidirectional LSTM-CRF network | |
CN111738007A (en) | Chinese named entity identification data enhancement algorithm based on sequence generation countermeasure network | |
CN110298044B (en) | Entity relationship identification method | |
CN110472062B (en) | Method and device for identifying named entity | |
CN113220876B (en) | Multi-label classification method and system for English text | |
CN115130538A (en) | Training method of text classification model, text processing method, equipment and medium | |
CN113282721A (en) | Visual question-answering method based on network structure search | |
CN112989833A (en) | Remote supervision entity relationship joint extraction method and system based on multilayer LSTM |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WW01 | Invention patent application withdrawn after publication | Application publication date: 20200221