CN110826332A - GP-based automatic identification method for named entities of traditional Chinese medicine patents - Google Patents


Info

Publication number
CN110826332A
CN110826332A (Application CN201911062344.4A)
Authority
CN
China
Prior art keywords
chinese medicine
traditional chinese
tree
named entities
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN201911062344.4A
Other languages
Chinese (zh)
Inventor
张亚宇
谷波
钱宇华
马国帅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanxi University
Original Assignee
Shanxi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanxi University filed Critical Shanxi University
Priority to CN201911062344.4A priority Critical patent/CN110826332A/en
Publication of CN110826332A publication Critical patent/CN110826332A/en
Withdrawn legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/316 Indexing structures
    • G06F16/322 Trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/12 Computing arrangements based on biological models using genetic models

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Genetics & Genomics (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

A GP-based automatic recognition method for named entities of traditional Chinese medicine patents realizes automatic extraction of patent-document features through autonomous learning of a model, and then labels named entities according to the extracted feature information. The invention applies genetic programming to the task of recognizing named entities in traditional Chinese medicine patents, so that the algorithm can learn autonomously; compared with current mainstream deep learning methods, it has fewer parameters and is easy to operate. In the learning process, both the context information of each word and the dependency relationships between words are considered, so information extraction is more complete. Compared with the gate-based LSTM algorithm, using GP to search for the memory cells can discover more complex operation structures, find more named entities than the original method, and improve the performance of the algorithm. The method is used to automatically identify named entities in traditional Chinese medicine patents and can also be extended to related tasks such as keyword extraction.

Description

GP-based automatic identification method for named entities of traditional Chinese medicine patents
Technical Field
The invention relates to the field of natural language processing, in particular to a GP-based automatic identification method for named entities of traditional Chinese medicine patents.
Background
Patent indexing is the core of deep data processing: through indexing, the various kinds of retrieval information in patent documents can be extracted effectively, improving the efficiency and accuracy of patent document retrieval. Traditional Chinese medicine patent data, a class of patent data with important value, contains a large number of specialized terms (traditional medicines, compounds, etc.), which makes automatic indexing extremely difficult. Within automatic indexing, named entity recognition is an important first step, and its results affect all subsequent tasks.
Named Entity Recognition (NER), a key task in natural language processing, aims to recognize entities with specific meanings in text; the technique can therefore identify entities such as drugs and compounds in traditional Chinese medicine patent data and is an effective approach. NER methods have long fallen into three main categories: (1) rule/dictionary-based methods; (2) methods based on traditional machine learning (e.g., CRF, HMM, MEMM); (3) deep-learning-based methods. In recent years, deep learning has been the mainstream approach to named entity recognition, with good results from Bi-LSTM, LSTM+CRF, RNN+CRF, and neural network structures based on attention and transfer learning. Traditional NER methods generally require word segmentation of the text, and the quality of the segmentation directly affects the experimental results, especially for traditional Chinese medicine patent data rich in specialized vocabulary; deep-learning-based methods involve many network types, depend heavily on parameter settings, and have poor model interpretability.
Genetic Programming (GP), a branch of evolutionary computation, mimics human intelligence: it can select computer programs through autonomous learning to solve a task given in advance, with few parameters and easy operation. Combining GP with the traditional Chinese medicine patent named-entity recognition task, the invention provides a Genetic Programming (GP)-based automatic named entity recognition method.
Disclosure of Invention
The invention aims to solve the following problems:
(1) recognize named entities (drugs and compounds) in traditional Chinese medicine patent data, facilitating later automatic indexing of the patent data;
(2) design a universal named entity recognition model that can learn autonomously from different data types without excessive human involvement;
(3) optimize via the GP algorithm, avoiding problems such as gradient vanishing and gradient explosion that arise when traditional deep learning solves the optimization problem;
(4) use the GP algorithm to search the model's input and output points, so that more complex structures can be found, more named entities are discovered than with the original method, and performance is better;
(5) compared with current mainstream deep learning methods, use fewer parameters, be easy to operate, and avoid the errors that word segmentation introduces in traditional methods.
The invention discloses a GP-based automatic identification method for named entities of traditional Chinese medicine patents, which has the advantages of fewer parameters, easy operation, sufficient extracted information and good performance.
A GP-based automatic recognition method for named entities of traditional Chinese medicine patents is characterized in that automatic extraction of document features is achieved through autonomous learning of a model, and named entity labeling is then achieved according to the extracted feature information.
The method comprises the following specific steps:
The first step: data preparation: data cleaning and manual labeling of the named entities in the Chinese patent documents;
The second step: structured representation of the data: converting the "Chinese characters" in the training data into vector form by a character embedding method, each "Chinese character" being embedded as an l-dimensional vector;
The third step: the model learning process based on the GP algorithm; training proceeds sentence by sentence.
1. Local information extraction for words
A. Context information representation of words
For each sentence in the data set, the word vectors of its words are concatenated in order into matrix form; for example, a sentence of length s can be represented as an s x l matrix. When extracting a word's local information, the window is set to 5 x l, i.e., the information of the two preceding and the two following words is considered at the same time; if the current word lacks two preceding or following words, zero vectors are used for padding. The context information of each word can thus be represented as a 5 x l matrix, whose entries are indexed P_11, P_12, ..., P_5l; Figure 1 gives the context information matrix representation of the word "xi".
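A minimal sketch of the windowing step, assuming a window of 5 and zero-vector padding as described (function and variable names are our own, not the patent's):

```python
# Sketch: build the 5 x l context-information matrix for each word in a
# sentence, padding with zero vectors when fewer than two neighbours exist.

def context_matrix(vectors, t, window=5):
    """vectors: list of l-dim word vectors for one sentence; t: word index."""
    l = len(vectors[0])
    half = window // 2
    rows = []
    for i in range(t - half, t + half + 1):
        if 0 <= i < len(vectors):
            rows.append(vectors[i])
        else:
            rows.append([0.0] * l)   # zero-vector padding at sentence edges
    return rows                       # 5 rows, each of length l

sentence = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]   # s=3 words, l=2
m = context_matrix(sentence, 0)                    # first word: two pad rows
```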
B. Local information extraction for words
Through A, each word in the sentence corresponds to a context information matrix. The local information extraction step converts each Chinese character, according to its context information matrix, into a vector containing local information by learning a tree structure; this vector is denoted y_t = [y_t1, y_t2, ..., y_tm], t = 1, 2, ..., s, where the invention assumes s words in the sentence and the resulting local-information "word vector" has dimension m. The specific process is as follows:
(1) Randomly initialize T tree structures, each consisting of m subtrees. The leaf nodes of a tree are indices into the word's context information matrix, and the intermediate nodes are randomly given operators and elementary functions (each subtree is in fact a function expression whose variables are the indices at its leaves). The outputs of the m subtrees are each multiplied by C and concatenated into an m-dimensional vector y_t, which forms the root of the tree; C is a coefficient in (0, 0.5). We call this type of tree structure the first genetic programming tree, GP1 for short; one such tree is shown in Fig. 2. In effect, each tree is a local information extraction model.
(2) Input the context information matrix of each word in the sentence into the T initialized trees to form T y vectors; these vectors serve as the input to the next stage.
(3) The tree structures are optimized through self-evolution, finally learning an optimal tree structure so that the local information of the character is extracted as fully as possible (the self-evolution process of the trees is described in detail in the fourth step).
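A hedged sketch of such a GP1 tree, with a toy operator set and hand-written subtrees standing in for randomly initialized ones (all names and the operator pool are illustrative assumptions):

```python
import math

# Sketch of a GP1-style tree: each subtree is an expression whose variables
# are indices P[i][j] of the 5 x l context matrix; its output, scaled by a
# coefficient C in (0, 0.5), becomes one dimension of the vector y_t.

OPS = {'+': lambda a, b: a + b,
       '*': lambda a, b: a * b,
       'sin': lambda a: math.sin(a)}

def eval_tree(node, P):
    """node: ('idx', i, j) leaf, or (op, child, ...) internal node."""
    if node[0] == 'idx':
        return P[node[1]][node[2]]
    op = OPS[node[0]]
    return op(*(eval_tree(c, P) for c in node[1:]))

def gp1_output(subtrees, C, P):
    # Concatenate the scaled subtree outputs into the m-dim vector y_t.
    return [C * eval_tree(st, P) for st in subtrees]

P = [[0.5, 1.0], [2.0, 3.0], [1.0, 0.0], [0.0, 0.0], [0.0, 0.0]]  # 5 x 2
subtrees = [('+', ('idx', 0, 0), ('idx', 1, 1)),   # P[0][0] + P[1][1]
            ('*', ('idx', 2, 0), ('idx', 1, 0))]   # P[2][0] * P[1][0]
y = gp1_output(subtrees, 0.4, P)                    # m = 2 dimensions
```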
2. Sequence labeling
For sequence data of the traditional Chinese medicine patent type, long-range dependencies often exist between words. Inspired by the learning process of LSTM (Long Short-Term Memory) networks, the invention proposes a genetic programming tree with long- and short-term memory capability.
Through the steps above, each word is represented as an m-dimensional vector y_t. T tree structures are again initialized, here called the second genetic programming tree, GP2 for short; the T outputs of the GP1 trees correspond to the inputs of GP2, but GP2 differs clearly from GP1.
The structure of GP2 is described below:
(1) For the labeling of each word, GP2 has three inputs (i.e., three types of leaf nodes in the tree): the word's own input y_t, the output h_{t-1} of the previous word, and the memory cell C_{t-1} produced by the previous step; the leaf nodes take the values of the corresponding dimensions of these three vectors (for the first character, h_{t-1} and C_{t-1} are given randomly). The three inputs are fed into the upper-layer nodes of the tree in a fully connected manner;
(2) The output of GP1 is an m-dimensional vector, while GP2 has two types of outputs: h_t and C_t. h_t is the learned information for one dimension of the current word vector; h_t is fed, respectively, to the following three places:
a. input to higher layers;
b. as input for the next word;
c. as input for the next iteration.
C_t is the long-term memory cell, storing long-range dependency information across the word sequence. Compared with the traditional gate-based LSTM model, the GP-tree-based learning process can learn more complex functional relationships to enrich the memory cell C_t.
(3) In addition, the intermediate nodes of GP2 differ from those of GP1: GP1's intermediate nodes are randomly given operators and elementary functions, while GP2 additionally includes three activation functions commonly used in deep learning: sigmoid, tanh, and relu.
A simple illustration of the sequence learning (FIG. 3) and of the GP2 used in the learning process (FIG. 4) are given below. As can be seen from FIG. 3, the h_t output to the higher layer is transformed by softmax into the probabilities of all labels, and the label with the maximum probability is taken as the label of the word. GP2 is likewise learned through a form of self-evolution.
For example, if the final output for "Goujin" is [0.6, 0.1, 0.3], i.e., the probabilities of the labels "B", "I", and "O" are 0.6, 0.1, and 0.3 respectively, then the predicted probability of "B" is clearly the largest, and the model outputs the label "B".
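The softmax step in this example can be sketched as follows (the raw scores fed to softmax are made-up illustrative numbers, not values from the patent):

```python
import math

# Sketch: turn the per-label scores produced for a word into probabilities
# with softmax and pick the argmax label, as in the "B"/"I"/"O" example above.

def softmax(scores):
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

LABELS = ['B', 'I', 'O']

def predict(scores):
    probs = softmax(scores)
    return LABELS[probs.index(max(probs))], probs

label, probs = predict([2.0, 0.5, 1.0])   # highest raw score -> label "B"
```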
The fourth step: function of adaptive value
T models are formed through the process, each model gives a corresponding label, a reasonable adaptive value function needs to be given for judging the quality of the model, and the probability that an individual (model) with a larger adaptive value is inherited to the next generation is larger.
The cross entropy is always used for measuring the difference information between two distributions, and for each sentence, when the learned label is more similar to the real label distribution, the corresponding cross entropy is also smaller. In the genetic operation, the larger the individual adaptive value is expected to be, the stronger the adaptive capacity is, in order to accord with the survival rule of a suitable person, the negative value of the cross entropy is supposed to be adopted as the adaptive value function, and in order to prevent overfitting, the width and the depth of the tree are constrained by the method. Thus, the corresponding fitness function is:
Figure BDA0002258328920000041
pjiis the probability of the true mark being,is the corresponding prediction probability; n is a radical ofTkOf GPkDepth, DTkFor the width of GPk, k is 1 or 2.
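A minimal sketch of this fitness computation; the penalty weights lam_d and lam_w are our own assumption, since the source only states that tree depth and width are constrained:

```python
import math

# Sketch: fitness = negative cross entropy between true and predicted label
# distributions, minus penalties on tree depth and width.

def fitness(true_dists, pred_dists, depth, width, lam_d=0.01, lam_w=0.01):
    ce = 0.0
    for p, q in zip(true_dists, pred_dists):        # one pair per word
        ce -= sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)
    return -ce - lam_d * depth - lam_w * width

true_dists = [[1, 0, 0], [0, 0, 1]]                  # gold labels "B", "O"
pred_dists = [[0.6, 0.1, 0.3], [0.2, 0.2, 0.6]]
f = fitness(true_dists, pred_dists, depth=5, width=8)
```

A prediction closer to the gold distribution yields a larger (less negative) fitness, which is what the selection step rewards.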
The evolution process of the tree structures: the initialized trees (tree depth is limited to at most 10) evolve into the final tree structure through repeated selection, crossover, and mutation operations:
Selection operator: roulette-wheel selection is used; the m trees with the largest fitness values enter the next generation, and trees with larger fitness values are selected with higher probability;
Crossover operator: a subtree is randomly selected in each of two individuals, and the positions of the two subtrees are swapped;
Mutation operator: symbols or subtrees in a tree are randomly transformed with a probability of 1%.
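A hedged sketch of two of these operators, roulette-wheel selection and point mutation; the tree encoding and operator pool are illustrative assumptions, and fitness values are shifted to be positive before weighting since the negative cross entropy can be below zero:

```python
import random

# Sketch: roulette-wheel selection proportional to (shifted) fitness,
# plus a 1% point mutation that swaps operator symbols in a tree.

def roulette_select(population, fitnesses, k):
    low = min(fitnesses)
    weights = [f - low + 1e-6 for f in fitnesses]   # shift: fitness may be < 0
    return random.choices(population, weights=weights, k=k)

def mutate(tree, ops=('+', '-', '*'), rate=0.01):
    """tree: nested tuples as in GP1; leaves ('idx', i, j) are left alone."""
    if tree[0] == 'idx':
        return tree
    op = random.choice(ops) if random.random() < rate else tree[0]
    return (op,) + tuple(mutate(c, ops, rate) for c in tree[1:])

random.seed(0)
pop = ['t1', 't2', 't3']                            # stand-ins for trees
chosen = roulette_select(pop, [-1.2, -0.3, -2.5], k=3)
```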
The fifth step: selecting the optimal model. After model training is finished, the final model is validated on a validation set, and the optimal tree structures (GP1 and GP2) on the validation set are selected as the final model.
The sixth step: testing the learned model on a test set.
According to the method, document features are extracted automatically through the model's autonomous learning, so named entity labeling is achieved from the extracted feature information without excessive manual involvement and with few parameters. In extracting word information, both the context of each word and the dependency relationships between words are considered, so information extraction is more complete. Searching with the GP algorithm can automatically discover more complex structures, labeling more named entities and improving the algorithm's performance. Compared with current mainstream deep learning methods, the approach has fewer parameters and is easy to operate, and it avoids the gradient vanishing and gradient explosion problems of deep learning algorithms.
Drawings
FIG. 1 is the context information matrix representation of the word "xi";
FIG. 2 is the local information extraction tree GP1 of a word;
FIG. 3 is a learning process for a text sequence;
FIG. 4 is the GP2 structure in the sequence model;
fig. 5 is a model learning flowchart.
Detailed Description
A GP-based automatic identification method for named entities of traditional Chinese medicine patents is characterized by comprising the following steps:
the first step is as follows: preparing data: data cleaning, namely manually marking named entities in Chinese patent documents; the first word of a named entity is marked "B" (begin), the second word of the named entity and the remaining words are marked "I" (inside), and words that are not named entities are marked "O" (out). An example of a labeled training data portion is shown in FIG. 1, with labeled words on the left and corresponding label forms for each word on the right, where "Goji" is labeled as the named entity.
The second step: structured representation of the data: converting the "Chinese characters" in the training data into vector form by a character embedding method, each "Chinese character" being embedded as an l-dimensional vector. The specific process is as follows:
1. First, each Chinese character c_i in the training data, i = 1, 2, ..., n, is expressed in one-hot vector form (i.e., for the m distinct characters in the training data, each character corresponds to one dimension of the index; for the i-th character, only the i-th dimension is 1 and all other dimensions are 0):
"carry": [1,0,0,0,......,0]
Taking: [0,1,0,0,......,0]
"from": [0,0,1,0,......,0]
'Qi': [0,0,0,1,......,0]
......
2. Taking the one-hot-represented "Chinese characters" as input, a word2vec model trains them into densely distributed character vectors carrying semantic relations; the vector length is l, and the new "Chinese character" vector in a sentence is denoted x_i = [x_i1, x_i2, ..., x_il], i = 1, 2, ..., n;
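A minimal sketch of this two-stage representation; the randomly filled embedding table stands in for the trained word2vec vectors, and the characters and dimension l = 4 are illustrative assumptions:

```python
import random

# Sketch: map each distinct character to a one-hot index, then look up a
# dense l-dimensional vector from an embedding table. In the patent the
# table comes from training word2vec on the corpus; here it is randomly
# filled just to illustrate the lookup.

chars = ['枸', '杞', '子']                 # distinct characters (illustrative)
index = {c: i for i, c in enumerate(chars)}

def one_hot(c):
    v = [0] * len(chars)
    v[index[c]] = 1
    return v

random.seed(42)
l = 4                                      # embedding dimension (illustrative)
embed = {c: [random.uniform(-1, 1) for _ in range(l)] for c in chars}

oh = one_hot('杞')                         # sparse one-hot representation
x = embed['杞']                            # dense l-dim character vector
```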
The third step: model learning process based on GP algorithm
Training proceeds sentence by sentence.
1. Local information extraction for words
A. Context information representation of words
(1) Concatenate the word vectors of the words in the sentence, in order, into matrix form; for example, a sentence of length s can be represented as an s x l matrix.
(2) For each word in the sentence, take the word vectors of the two preceding and the two following words as context information (when the current word lacks two preceding or following words, pad with zero vectors), so that each word is represented as a 5 x l matrix whose entries are indexed P_11, P_12, ..., P_5l; Figure 1 gives the context information matrix representation of the word "xi".
B. Local information extraction for words
By learning a tree structure, a word is converted, based on its context information matrix, into a word vector containing local information, denoted y_t = [y_t1, y_t2, ..., y_tm], t = 1, 2, ..., s, where the invention assumes s words in the sentence and the resulting local-information "word vector" has dimension m.
The specific process is as follows:
(1) Randomly initialize T tree structures, each consisting of m subtrees. The leaf nodes of a tree are indices into the word's context information matrix, and the intermediate nodes are randomly given operators and elementary functions (each subtree is in fact a function expression whose variables are the indices at its leaves). The outputs of the m subtrees are each multiplied by C and concatenated into an m-dimensional vector y_t, which forms the root of the tree; C is a coefficient in (0, 0.5). We call this type of tree structure the first genetic programming tree, GP1 for short; one such tree is shown in Fig. 2. In effect, each tree is a local information extraction model.
(2) Input the context information matrix of each word in the sentence into the T initialized trees to form T y vectors; these vectors serve as the input to the next stage.
(3) The tree structures are optimized through self-evolution, finally learning an optimal tree structure so that the local information of the character is extracted as fully as possible (the self-evolution process of the trees is described in detail in the fourth step).
2. Sequence labeling
For sequence data of the traditional Chinese medicine patent type, long-range dependencies often exist between words. Inspired by the learning process of LSTM (Long Short-Term Memory) networks, the invention proposes a GP tree with long- and short-term memory capability.
Through the steps above, each word is represented as an m-dimensional vector y_t. T tree structures are again initialized, here called the second genetic programming tree, GP2 for short; the T outputs of the GP1 trees correspond to the inputs of GP2, but GP2 differs clearly from GP1.
The structure of GP2 is described below:
(1) For the labeling of each word, GP2 has three inputs (i.e., three types of leaf nodes in the tree): the word's own input y_t, the output h_{t-1} of the previous word, and the memory cell C_{t-1} produced by the previous step; the leaf nodes take the values of the corresponding dimensions of these three vectors (for the first character, h_{t-1} and C_{t-1} are given randomly). The three inputs are fed into the upper-layer nodes of the tree in a fully connected manner;
(2) The output of GP1 is an m-dimensional vector, while GP2 has two types of outputs: h_t and C_t. h_t is the learned information for one dimension of the current word vector; h_t is fed, respectively, to the following three places:
a. input to higher layers;
b. as input for the next word;
c. as input for the next iteration.
C_t is the long-term memory cell, storing long-range dependency information across the word sequence. Compared with the traditional gate-based LSTM model, the GP-tree-based learning process can learn more complex functional relationships to enrich the memory cell C_t.
(3) In addition, the intermediate nodes of GP2 differ from those of GP1: GP1's intermediate nodes are randomly given operators and elementary functions, while GP2 additionally includes three activation functions commonly used in deep learning: sigmoid, tanh, and relu.
FIGS. 3 and 4 show, respectively, a simple illustration of the learning of the text sequence and of the GP2 used in the learning process. As can be seen from FIG. 3, the h_t output to the higher layer is transformed by softmax into the probabilities of all labels, and the label with the maximum probability is taken as the label of the word. GP2 is likewise learned through a form of self-evolution.
For example, if the final output for "Goujin" is [0.6, 0.1, 0.3], i.e., the probabilities of the labels "B", "I", and "O" are 0.6, 0.1, and 0.3 respectively, then the predicted probability of "B" is clearly the largest, and the model outputs the label "B".
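A hedged illustration of the recurrence just described; the tanh-based combining functions below merely stand in for whatever functional forms GP2 would evolve, and are not the patent's actual trees:

```python
import math

# Sketch of the GP2 recurrence: for each word, a learned tree combines the
# current input y_t, the previous output h_{t-1}, and the memory cell C_{t-1}
# into new (h_t, C_t).

def step(y_t, h_prev, c_prev):
    c_t = [math.tanh(ci + yi) for ci, yi in zip(c_prev, y_t)]   # stand-in C-tree
    h_t = [math.tanh(hi + ci) for hi, ci in zip(h_prev, c_t)]   # stand-in h-tree
    return h_t, c_t

def run_sentence(ys, dim):
    h, c = [0.0] * dim, [0.0] * dim    # h_0, C_0 given (randomly, per the text)
    outputs = []
    for y_t in ys:
        h, c = step(y_t, h, c)
        outputs.append(h)               # each h_t also feeds softmax labelling
    return outputs

ys = [[0.5, -0.2], [1.0, 0.3]]          # m=2 dim vectors coming from GP1
outs = run_sentence(ys, dim=2)
```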
The fourth step: function of adaptive value
T models are formed through the process, each model gives a corresponding label, a reasonable adaptive value function needs to be given for judging the quality of the model, and the probability that an individual (model) with a larger adaptive value is inherited to the next generation is larger.
The cross entropy is always used for measuring the difference information between two distributions, and for each sentence, when the learned label is more similar to the real label distribution, the corresponding cross entropy is also smaller. In the genetic operation, the larger the individual adaptive value is expected to be, the stronger the adaptive capacity is, in order to accord with the survival rule of a suitable person, the negative value of the cross entropy is supposed to be adopted as the adaptive value function, and in order to prevent overfitting, the width and the depth of the tree are constrained by the method. Thus, the corresponding fitness function is:
Figure BDA0002258328920000081
pjiis the probability of the true mark being,
Figure BDA0002258328920000082
is the corresponding prediction probability; n is a radical ofTkIs the depth of GPk, DTkFor the width of GPk, k is 1 or 2.
The evolution process of the tree structures: the initialized trees (tree depth is limited to at most 10) evolve into the final tree structure through repeated selection, crossover, and mutation operations:
Selection operator: roulette-wheel selection is used; the m trees with the largest fitness values enter the next generation, and trees with larger fitness values are selected with higher probability;
Crossover operator: a subtree is randomly selected in each of two individuals, and the positions of the two subtrees are swapped;
Mutation operator: symbols or subtrees in a tree are randomly transformed with a probability of 1%.
The fifth step: selecting the optimal model. After model training is finished, the final model is validated on a validation set, and the optimal tree structures (GP1 and GP2) on the validation set are selected as the final model.
The sixth step: testing the learned model on a test set.
The overall algorithm flow is shown in fig. 5.
Model flow description:
step 1: preparing data;
step 2: the data is structurally represented in a vector form;
step 3: circulating sentence by sentence;
step 4: respectively initializing T first genetic programming trees and T second genetic programming trees, and giving selection, intersection and variation parameters required in an algorithm;
step 5: extracting local information of the words in the data;
step 6: marking the characters in the sentence according to the initialized genetic programming tree;
step 7: calculating an adaptive value of the genetic programming tree according to the marking information of the characters;
step 8: judging whether the algorithm meets the termination condition; if so, terminating the algorithm, and if not, entering step 9.
step 9: performing selection-crossover-mutation operations on the first genetic programming tree and the second genetic programming tree respectively to form a new population, and returning to step 5.
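The flow above can be sketched as a generic evolutionary loop; evaluate() is a hypothetical stand-in for labelling a sentence and computing the fitness, and the loop body stands in for the selection-crossover-mutation step (all names are our own):

```python
import random

# Sketch of the overall flow: initialize a population of models, score each
# on the data, then select (and, in the full method, cross and mutate) until
# the termination condition, here a fixed generation budget, is met.

def evolve(init_population, evaluate, generations=10, seed=0):
    random.seed(seed)
    pop = list(init_population)
    for _ in range(generations):                     # termination condition
        fits = [evaluate(ind) for ind in pop]
        low = min(fits)
        weights = [f - low + 1e-6 for f in fits]     # shift: fitness may be < 0
        pop = random.choices(pop, weights=weights, k=len(pop))  # selection
        # (crossover and mutation of the GP trees would be applied here)
    fits = [evaluate(ind) for ind in pop]
    return pop[fits.index(max(fits))]

# Toy stand-in: individuals are numbers, fitness prefers larger values.
best = evolve([1, 5, 3, 2], evaluate=lambda x: float(x), generations=20)
```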

Claims (3)

1. A GP-based automatic recognition method for named entities of traditional Chinese medicine patents, characterized in that automatic extraction of document features is achieved through autonomous learning of a model, and named entity labeling is then achieved according to the extracted feature information.
2. The GP-based automatic identification method for named entities of traditional Chinese medicine patents according to claim 1, comprising the following steps:
(1) data cleaning, namely manually marking named entities in Chinese patent documents;
(2) converting "Chinese characters" in the training data into vector form by a character embedding method, each "Chinese character" being embedded into an l-dimensional vector;
(3) model learning based on GP algorithm;
(4) the fitness function is:

F = -Σ_j Σ_i p_ji log(p̂_ji) - λ1·N_Tk - λ2·D_Tk

where p_ji is the probability of the true label, p̂_ji is the corresponding predicted probability, N_Tk is the depth of GPk, D_Tk is the width of GPk, k is 1 or 2, and λ1, λ2 are penalty coefficients on tree depth and width;
(5) after the model training is finished, verifying the final model through a verification set, and selecting the optimal tree structure under the verification set as the final model;
(6) the final model is tested on the test set.
3. The GP-based automatic recognition method for named entities of traditional Chinese medicine patents according to claim 2, wherein a model learning process based on GP algorithm comprises the following steps:
Step 1: separately initialize T first genetic programming trees and T second genetic programming trees, and give the selection, crossover, and mutation parameters required by the algorithm;
Step 2: represent the words in the sentence in matrix form containing context information;
Step 3: extract the local information of the words in the data to form new word vectors;
Step 4: input the new word vectors, in order, into the second genetic programming tree, and label the words in the sentence;
Step 5: calculate the fitness values of the genetic programming trees from the words' label information;
Step 6: judge whether the algorithm meets the termination condition; if so, end training on the current sentence and go to Step 2 to train the next sentence until all sentences are trained; if not, go to Step 7;
Step 7: perform selection-crossover-mutation operations on the first genetic programming tree and the second genetic programming tree respectively to form a new population, and return to Step 5.
CN201911062344.4A 2019-11-02 2019-11-02 GP-based automatic identification method for named entities of traditional Chinese medicine patents Withdrawn CN110826332A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911062344.4A CN110826332A (en) 2019-11-02 2019-11-02 GP-based automatic identification method for named entities of traditional Chinese medicine patents


Publications (1)

Publication Number Publication Date
CN110826332A true CN110826332A (en) 2020-02-21

Family

ID=69552223

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911062344.4A Withdrawn CN110826332A (en) 2019-11-02 2019-11-02 GP-based automatic identification method for named entities of traditional Chinese medicine patents

Country Status (1)

Country Link
CN (1) CN110826332A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112988733A (en) * 2021-04-16 2021-06-18 北京妙医佳健康科技集团有限公司 Method and device for improving and enhancing data quality
CN112988733B (en) * 2021-04-16 2021-08-27 北京妙医佳健康科技集团有限公司 Method and device for improving and enhancing data quality

Similar Documents

Publication Publication Date Title
CN109783817B (en) Text semantic similarity calculation model based on deep reinforcement learning
CN110334354B (en) Chinese relation extraction method
WO2023024412A1 (en) Visual question answering method and apparatus based on deep learning model, and medium and device
CN107943784B (en) Relationship extraction method based on generation of countermeasure network
CN111241294B (en) Relationship extraction method of graph convolution network based on dependency analysis and keywords
CN112883738A (en) Medical entity relation extraction method based on neural network and self-attention mechanism
CN108549658B (en) Deep learning video question-answering method and system based on attention mechanism on syntax analysis tree
CN109657239A (en) The Chinese name entity recognition method learnt based on attention mechanism and language model
CN112487143A (en) Public opinion big data analysis-based multi-label text classification method
CN110222163A (en) A kind of intelligent answer method and system merging CNN and two-way LSTM
CN111966812B (en) Automatic question answering method based on dynamic word vector and storage medium
CN110188195B (en) Text intention recognition method, device and equipment based on deep learning
CN109063164A (en) A kind of intelligent answer method based on deep learning
CN113435211B (en) Text implicit emotion analysis method combined with external knowledge
CN110516070B (en) Chinese question classification method based on text error correction and neural network
CN111027595A (en) Double-stage semantic word vector generation method
CN111400494B (en) Emotion analysis method based on GCN-Attention
CN111222318B (en) Trigger word recognition method based on double-channel bidirectional LSTM-CRF network
CN111738007A (en) Chinese named entity identification data enhancement algorithm based on sequence generation countermeasure network
CN110298044B (en) Entity relationship identification method
CN110472062B (en) Method and device for identifying named entity
CN113220876B (en) Multi-label classification method and system for English text
CN115130538A (en) Training method of text classification model, text processing method, equipment and medium
CN113282721A (en) Visual question-answering method based on network structure search
CN112989833A (en) Remote supervision entity relationship joint extraction method and system based on multilayer LSTM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200221
