CN108491382A - A kind of semi-supervised biomedical text semantic disambiguation method - Google Patents
- Publication number
- CN108491382A (application number CN201810207213.XA)
- Authority
- CN
- China
- Prior art keywords
- sentence
- data
- semantic disambiguation
- words
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
- G06F18/2155—Generating training patterns; Bootstrap methods, e.g. bagging or boosting characterised by the incorporation of unlabelled data, e.g. multiple instance learning [MIL], semi-supervised techniques using expectation-maximisation [EM] or naïve labelling
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Evolutionary Computation (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Machine Translation (AREA)
Abstract
The present invention is a semantic disambiguation method for polysemous words in biomedical text. It mainly comprises: representing the words of biomedical text as vectors using Word2Vec; building vector representations of context sentences from the word vectors with a language model based on a bidirectional LSTM; then, using the similarity of sentences in vector space together with a label transfer method, passing the labels of already annotated medical data to the most similar unlabeled data according to probability; and finally combining all labeled data to perform semantic disambiguation on biomedical text. Because biomedical data is highly specialized and rich in terminology, processing medical data manually is time-consuming, labor-intensive and error-prone. The present invention can greatly reduce the cost of manual labeling and, compared with traditional machine learning methods, can effectively improve the accuracy of semantic disambiguation.
Description
Technical Field
The invention belongs to the field of semantic disambiguation in natural language processing, and relates to a semi-supervised method and system for semantic disambiguation of biomedical text. Specifically, polysemous words in medical texts are disambiguated using a bidirectional long short-term memory model (Bi-LSTM) combined with a label transfer method.
Background
With the explosive growth of digital information in recent years, medical and healthcare data have become increasingly available. In the biomedical field, text data contains a great deal of specialized knowledge and information, and extracting useful information from digitized text is becoming more and more important. Compared with general text data, medical text data is highly specialized and difficult to annotate. Understanding the semantics of biomedical text and automatically labeling medical data have therefore become research hotspots.
Traditional biomedical text semantic disambiguation methods include supervised learning methods, unsupervised learning methods, and knowledge-base-based methods. A supervised learning method learns a classifier from labeled data and then uses the classifier to predict the semantics of unseen data. This approach usually requires a large amount of labeled data to ensure high classifier accuracy, and manual labeling is time-consuming and labor-intensive, so it is not the best choice in biomedical subfields where little data is available. An unsupervised learning method requires no labeled data and classifies samples only by their latent similarities. It greatly reduces the manual labeling effort, but its accuracy still needs further improvement, and it is not suitable for fields with a low tolerance for errors, such as medicine. A knowledge-base-based method uses established, open-source medical knowledge bases as training samples. Its advantage is high data reliability; its disadvantages are that knowledge bases are hard to extend and difficult to maintain.
Biomedical text semantic disambiguation commonly uses a word vector model to vectorize each word in the text: the semantic information of a word is stored as a vector in a low-dimensional space, and semantically similar words have similar word vector representations. A common word vector technique is the Word2Vec model, which includes the Skip-gram model and the CBOW model. The Skip-gram model uses the target word to predict the word vectors of the words in the adjacent window, and the CBOW model uses the adjacent window words to predict the word vector of the target word. Similarly, a sentence vector fuses the word vector features of the words in a sentence to represent the semantic information of the sentence. Common traditional fusion methods include cascading (concatenation), averaging and weighted summation. The cascading method directly splices the word vectors of the words in a sentence in their order of appearance; the averaging method averages all word vectors in the sentence to obtain a sentence vector; the weighted summation method assigns each word a weight according to its importance to the semantic information and then sums the word vectors according to these weights to obtain a sentence vector. Sentence vectors are often used as features to initialize language models for subsequent natural language processing tasks.
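The three traditional fusion methods described above can be illustrated with a short NumPy sketch. The word vectors and weights below are made-up toy values, and the function names are hypothetical; this is only meant to show how the three sentence vectors differ in shape and content.

```python
import numpy as np

def concat_fusion(word_vecs):
    """Cascading: splice the word vectors together in sentence order."""
    return np.concatenate(word_vecs)

def average_fusion(word_vecs):
    """Averaging: element-wise mean of all word vectors in the sentence."""
    return np.mean(word_vecs, axis=0)

def weighted_sum_fusion(word_vecs, weights):
    """Weighted summation: sum of word vectors scaled by per-word weights."""
    w = np.asarray(weights, dtype=float)
    return w @ np.asarray(word_vecs)  # (n,) @ (n, d) -> (d,)

# Toy sentence of three words with 2-dimensional vectors (illustrative values).
words = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
weights = [0.5, 0.25, 0.25]  # hypothetical importance weights

concat = concat_fusion(words)               # length grows with the sentence
avg = average_fusion(words)                 # fixed length, word order is lost
wsum = weighted_sum_fusion(words, weights)  # fixed length, weights matter
```

Note that the cascaded vector has a length that depends on the number of words, while the averaged and weighted vectors keep the word-vector dimension but discard word order.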
The Recurrent Neural Network (RNN) is a neural network model for processing text information. It carries information from previous time steps into the task at the current time step and therefore has a certain memory. In theory, an RNN can handle long-term dependencies in long sentences. In practice, however, Bengio et al. (1994) studied the problem in depth and found that an RNN cannot successfully learn such dependencies: when the relevant words are far apart, the gradients may explode or vanish, so that backpropagation fails and the text information cannot be retained effectively. To overcome this drawback, an improved RNN model, the long short-term memory model (LSTM), was proposed. It adds three gate structures to the internal structure of the RNN: a forget gate determines how much information from the previous time step is kept, an input gate determines how much information from the current time step is taken in, and an output gate determines how much information is output at the current time step. Through this gate structure, the LSTM selectively uses the information of the previous and current time steps and effectively avoids the long-term dependency problem of the RNN.
In recent years, semi-supervised learning has been applied successfully to semantic disambiguation tasks, among which the bootstrapping algorithm achieves good accuracy. A classifier with low recall is learned from a small set of labeled examples and used to label an unlabeled corpus; the labeled set is then expanded with the sentences that the classifier labels with high confidence. More recently, a label propagation algorithm for word sense disambiguation was proposed and compared with bootstrapping and with a supervised Support Vector Machine (SVM) classifier. Label propagation achieves better performance because it assigns labels by optimizing a global objective, whereas traditional algorithms such as bootstrapping propagate labels based on the local similarity of instances.
Disclosure of Invention
The invention provides a biomedical text semantic disambiguation method and system based on semi-supervised learning and deep learning. It alleviates, to a certain extent, problems of traditional disambiguation methods such as weak use of global information, difficult manual labeling and high cost, and improves the accuracy of semantic disambiguation for both biomedical texts and general texts.
The invention consists of two parts: 1. A bidirectional long short-term memory (LSTM) network model fuses word vectors into sentence vectors and generates the semantic features of sentences. 2. A semi-supervised semantic disambiguation model based on the label transfer method automatically labels unlabeled data using its similarity to the labeled data, and at the same time resolves semantic ambiguity.
The technical scheme adopted by the invention comprises the following steps:
(I) Forming sentence vectors based on the bidirectional long short-term memory (LSTM) network model and generating semantic features of sentences
The bidirectional LSTM network model comprises an input layer, a forward hidden layer, a backward hidden layer and an output layer. Six distinct weights are reused at every time step: input layer to forward and backward hidden layers (w1, w3), hidden layer to hidden layer (w2, w5), and forward and backward hidden layers to output layer (w4, w6).
The hidden layer is an LSTM model composed of three gates (forget gate, input gate, output gate) and a memory cell.
The word vector of each word is used as the input of the bidirectional LSTM, and the current output is computed from it together with the output of the previous time step. The process is divided into three stages.
The first stage: the forget gate layer uses a sigmoid function to selectively filter the information of the previous time step:
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$
where $h_{t-1}$ is the output of the previous time step, $x_t$ is the current input, i.e. the current word vector, and $f_t$ takes values between 0 and 1 and is used to filter the information learned at the previous time step.
The second stage: generating the new information that needs to be updated.
First, the input gate layer decides through a sigmoid which values to update:
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$
Then a tanh layer generates a new candidate value:
$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$
The cell state is then updated with the candidate value $\tilde{C}_t$ of the new information:
$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$
The third stage: the output of the model.
An initial output is obtained through a sigmoid layer:
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$
Then $C_t$ is scaled by the tanh function, and the two are multiplied to obtain the output of the model:
$$h_t = o_t * \tanh(C_t)$$
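The three stages above can be sketched as a single LSTM time step in NumPy. This is a minimal illustration that assumes the standard LSTM gate equations; the parameter names (`W_f`, `b_f`, and so on), the dimensions and the random initialization are hypothetical and not taken from the patent.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step: forget, update, output."""
    z = np.concatenate([h_prev, x_t])  # previous output joined with current word vector
    # Stage 1: forget gate filters the previous cell state (values in (0, 1)).
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])
    # Stage 2: input gate and tanh candidate produce the new information.
    i_t = sigmoid(params["W_i"] @ z + params["b_i"])
    c_tilde = np.tanh(params["W_c"] @ z + params["b_c"])
    c_t = f_t * c_prev + i_t * c_tilde
    # Stage 3: output gate times the tanh-scaled cell state.
    o_t = sigmoid(params["W_o"] @ z + params["b_o"])
    h_t = o_t * np.tanh(c_t)
    return h_t, c_t

rng = np.random.default_rng(0)
hidden, embed = 4, 3  # illustrative sizes
params = {}
for gate in "fico":
    params[f"W_{gate}"] = rng.normal(scale=0.1, size=(hidden, hidden + embed))
    params[f"b_{gate}"] = np.zeros(hidden)

# Feed a toy "sentence" of five random word vectors through the cell.
h, c = np.zeros(hidden), np.zeros(hidden)
sentence = rng.normal(size=(5, embed))
for x_t in sentence:
    h, c = lstm_step(x_t, h, c, params)
```

After the loop, `h` holds the hidden state for the last word; a backward pass would run the same step over the reversed sentence with its own parameters.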
(II) Semi-supervised semantic disambiguation model based on the label transfer method
The label transfer method uses the similarity among the samples to transfer the labels of the labeled data to the unlabeled data according to probability. First, a graph model is constructed over all samples, in which each sample is a node, and the similarity between two nodes is calculated as follows:
The similarity formula has one hyper-parameter. Each node propagates its label to the surrounding nodes with a probability determined by their similarity; the probability is calculated as follows:
where n represents the number of edges.
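As a rough illustration of the graph construction, the sketch below builds a similarity matrix and turns it into propagation probabilities. It assumes the similarity stated in claim 4 (the reciprocal of the Euclidean distance between sentence vectors) and, as a further assumption, normalizes each node's similarities over its edges so that they sum to 1. The function names and the small `eps` guard against division by zero are hypothetical.

```python
import numpy as np

def similarity_matrix(X, eps=1e-12):
    """Similarity as the reciprocal of the pairwise Euclidean distance."""
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    sim = 1.0 / (dist + eps)    # eps avoids division by zero (assumption)
    np.fill_diagonal(sim, 0.0)  # no self-edges
    return sim

def transition_probabilities(sim):
    """Row-normalize so each node's outgoing probabilities sum to 1."""
    return sim / sim.sum(axis=1, keepdims=True)

# Three toy sentence vectors (illustrative values).
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
W = similarity_matrix(X)
P = transition_probabilities(W)
```

In this toy example node 0 is at distance 1 from node 1 and distance 2 from node 2, so it passes its label to node 1 with twice the probability it passes it to node 2.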
Drawings
FIG. 1 is a schematic diagram of the system of the present invention.
FIG. 2 is a view showing the internal structure of the LSTM according to the present invention.
Detailed Description
(1) The user inputs biomedical text, from which sentence vector features are generated
First, the biomedical text is segmented into words, and a word vector is generated for each word using the Word2Vec model. The word vectors of each sentence are then input in sequence into the bidirectional long short-term memory model, which outputs two sentence vectors, $\overrightarrow{h}$ (forward) and $\overleftarrow{h}$ (backward). These are cascaded to form a new sentence vector:
$$h = [\overrightarrow{h}; \overleftarrow{h}]$$
The new sentence vector is then input into a multi-layer perceptron (MLP) to obtain the final sentence vector:
$$s = \mathrm{MLP}(h)$$
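The cascade-then-MLP step can be sketched as follows. To keep the example short, a plain tanh recurrent cell stands in for the LSTM cell in each direction; this is a deliberate simplification, not the model described in the patent. All weights, sizes and function names are hypothetical.

```python
import numpy as np

def run_rnn(word_vecs, W_x, W_h):
    """Final hidden state of a plain tanh recurrent cell (stand-in for an LSTM)."""
    h = np.zeros(W_h.shape[0])
    for x in word_vecs:
        h = np.tanh(W_x @ x + W_h @ h)
    return h

def sentence_vector(word_vecs, fwd, bwd, mlp):
    h_fwd = run_rnn(word_vecs, *fwd)        # left-to-right pass
    h_bwd = run_rnn(word_vecs[::-1], *bwd)  # right-to-left pass
    h = np.concatenate([h_fwd, h_bwd])      # cascade the two directions
    W1, b1, W2, b2 = mlp
    return W2 @ np.tanh(W1 @ h + b1) + b2   # one-hidden-layer perceptron

rng = np.random.default_rng(0)
embed, hidden, out = 3, 4, 5  # illustrative sizes
fwd = (rng.normal(scale=0.1, size=(hidden, embed)),
       rng.normal(scale=0.1, size=(hidden, hidden)))
bwd = (rng.normal(scale=0.1, size=(hidden, embed)),
       rng.normal(scale=0.1, size=(hidden, hidden)))
mlp = (rng.normal(size=(8, 2 * hidden)), np.zeros(8),
       rng.normal(size=(out, 8)), np.zeros(out))

words = rng.normal(size=(6, embed))  # six toy word vectors
s = sentence_vector(words, fwd, bwd, mlp)
```

The forward and backward passes use separate parameters, so the cascaded vector has twice the hidden size before the perceptron maps it to the final sentence vector.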
(2) Automatic labeling of unlabeled data and disambiguation of polysemous words using label transfer
The sentence vector features obtained in step (1) are taken as the nodes of the graph, and the similarity between nodes is calculated. According to the label transfer method, each unlabeled sample automatically receives the label of the most similar samples; for a polysemous word, the sense that best matches the sentence vector is transferred according to the similarity.
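Step (2) can be sketched as an iterative label propagation loop. The sketch assumes the reciprocal-of-Euclidean-distance similarity from claim 4, row-normalized propagation probabilities, and clamping of the known labels after every iteration, which is the usual label propagation scheme; the function name, the iteration count and the toy data are hypothetical.

```python
import numpy as np

def label_propagation(X, y, n_classes, n_iter=100, eps=1e-12):
    """Propagate labels over a similarity graph; y uses -1 for unlabeled."""
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    sim = 1.0 / (dist + eps)                  # reciprocal Euclidean distance
    np.fill_diagonal(sim, 0.0)
    P = sim / sim.sum(axis=1, keepdims=True)  # propagation probabilities

    labeled = y >= 0
    F = np.full((len(y), n_classes), 1.0 / n_classes)  # uniform start
    F[labeled] = np.eye(n_classes)[y[labeled]]         # one-hot known labels
    clamp = F[labeled].copy()
    for _ in range(n_iter):
        F = P @ F           # each node absorbs its neighbors' label distributions
        F[labeled] = clamp  # labeled nodes keep their original labels
    return F.argmax(axis=1), F

# Two tight clusters of toy sentence vectors, one labeled sample in each.
X = np.array([[0.0, 0.0], [0.2, 0.1], [5.0, 5.0],
              [5.1, 4.8], [0.1, 0.3], [4.9, 5.2]])
y = np.array([0, -1, 1, -1, -1, -1])
pred, probs = label_propagation(X, y, n_classes=2)
```

Here each unlabeled sample ends up with the label of the labeled sample in its own cluster, which mirrors how a polysemous word in an unlabeled sentence would receive the sense of its most similar labeled sentences.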
(3) Results of the experiment
Following step (1) and step (2), experiments were carried out on two internationally used medical text datasets, the MSH WSD dataset and the NLM WSD dataset. The MSH WSD dataset contains 203 ambiguous medical entities and a total of 37888 ambiguous sentences, of which 37090 samples were manually annotated; the NLM WSD dataset contains 50 ambiguous entities and 552153 common sentences, with 100 manually labeled samples for each ambiguous entity. The experiment used a 20:1 ratio of labeled to unlabeled data: unlabeled data amounting to one twentieth of the original labeled data was randomly added from other medical corpora, and tests were carried out with the semi-supervised model based on the label transfer method. The test results are compared as follows:
Table 1. MSH WSD dataset experimental results.
Table 2. NLM WSD dataset experimental results.
Here, SVM denotes a support vector machine model, LSTM denotes a unidirectional long short-term memory model, and Bi-LSTM denotes a bidirectional long short-term memory model; WE (Con) denotes cascaded word vectors as the sentence semantic feature, WE (Avg) denotes averaged word vectors as the sentence semantic feature, WE (Wsum) denotes weighted-sum word vectors as the sentence semantic feature, and Con denotes the sentence semantic feature produced by the model adopted by the invention; LP denotes the label transfer method proposed by the invention. According to the experimental results, the unlabeled data added to the language model requires no manual labeling, which reduces the labeling cost for medical personnel, and the method obtains the best accuracy in semantic disambiguation of medical texts, showing that it is feasible and effective.
Claims (5)
1. A semi-supervised biomedical text semantic disambiguation method, characterized by comprising the following steps:
(1) representing the words of medical text as vectors based on the Word2Vec language model;
(2) representing medical text sentences as vectors, on the basis of the word vectors, using the bidirectional long short-term memory model (Bi-LSTM);
(3) automatically labeling unlabeled data by a label transfer method using sentence vector space similarity, and performing semantic disambiguation on polysemous words.
2. The vectorized representation of words of medical text based on the Word2Vec language model according to claim 1, wherein: the words may include both medical terminology and general text words.
3. The vectorized representation of medical text sentences based on the bidirectional long short-term memory model (Bi-LSTM) according to claim 1, wherein: the Bi-LSTM takes as input the word vector representation of each word in the sentence and outputs the vectorized representation of the sentence.
4. The sentence vector space similarity according to claim 1, wherein: the geometric distance between sentence vectors is calculated using the Euclidean distance formula, and the similarity of the sentence vectors is calculated as the reciprocal of the geometric distance.
5. The automatic labeling of unlabeled data and semantic disambiguation of polysemous words based on label transfer according to claim 1, wherein: labels are transferred to the unlabeled data according to probability using the similarity between sentence vectors, and semantic disambiguation is performed automatically on the medical text data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810207213.XA CN108491382A (en) | 2018-03-14 | 2018-03-14 | A kind of semi-supervised biomedical text semantic disambiguation method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810207213.XA CN108491382A (en) | 2018-03-14 | 2018-03-14 | A kind of semi-supervised biomedical text semantic disambiguation method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108491382A true CN108491382A (en) | 2018-09-04 |
Family
ID=63339234
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810207213.XA Pending CN108491382A (en) | 2018-03-14 | 2018-03-14 | A kind of semi-supervised biomedical text semantic disambiguation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108491382A (en) |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109377203A (en) * | 2018-09-13 | 2019-02-22 | 平安医疗健康管理股份有限公司 | Medical settlement data processing method, device, computer equipment and storage medium |
CN110059185A (en) * | 2019-04-03 | 2019-07-26 | 天津科技大学 | A kind of medical files specialized vocabulary automation mask method |
CN110287337A (en) * | 2019-06-19 | 2019-09-27 | 上海交通大学 | The system and method for medicine synonym is obtained based on deep learning and knowledge mapping |
CN110705206A (en) * | 2019-09-23 | 2020-01-17 | 腾讯科技(深圳)有限公司 | Text information processing method and related device |
CN111221960A (en) * | 2019-10-28 | 2020-06-02 | 支付宝(杭州)信息技术有限公司 | Text detection method, similarity calculation method, model training method and device |
CN111414473A (en) * | 2020-02-13 | 2020-07-14 | 合肥工业大学 | Semi-supervised classification method and system |
CN111597296A (en) * | 2019-02-20 | 2020-08-28 | 阿里巴巴集团控股有限公司 | Commodity data processing method, device and system |
CN111881979A (en) * | 2020-07-28 | 2020-11-03 | 复旦大学 | Multi-modal data annotation device and computer-readable storage medium containing program |
CN113158687A (en) * | 2021-04-29 | 2021-07-23 | 新声科技(深圳)有限公司 | Semantic disambiguation method and device, storage medium and electronic device |
CN113742458A (en) * | 2021-09-18 | 2021-12-03 | 苏州大学 | Natural language instruction disambiguation method and system for mechanical arm grabbing |
CN113779987A (en) * | 2021-08-23 | 2021-12-10 | 科大国创云网科技有限公司 | Event co-reference disambiguation method and system based on self-attention enhanced semantics |
CN115293158A (en) * | 2022-06-30 | 2022-11-04 | 撼地数智(重庆)科技有限公司 | Disambiguation method and device based on label assistance |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2002010985A2 (en) * | 2000-07-28 | 2002-02-07 | Tenara Limited | Method of and system for automatic document retrieval, categorization and processing |
CN1916887A (en) * | 2006-09-06 | 2007-02-21 | 哈尔滨工程大学 | Method for eliminating ambiguity without directive word meaning based on technique of substitution words |
US20140040275A1 (en) * | 2010-02-09 | 2014-02-06 | Siemens Corporation | Semantic search tool for document tagging, indexing and search |
CN104268200A (en) * | 2013-09-22 | 2015-01-07 | 中科嘉速(北京)并行软件有限公司 | Unsupervised named entity semantic disambiguation method based on deep learning |
CN106569998A (en) * | 2016-10-27 | 2017-04-19 | 浙江大学 | Text named entity recognition method based on Bi-LSTM, CNN and CRF |
CN106919646A (en) * | 2017-01-18 | 2017-07-04 | 南京云思创智信息科技有限公司 | Chinese text summarization generation system and method |
CN106980608A (en) * | 2017-03-16 | 2017-07-25 | 四川大学 | A kind of Chinese electronic health record participle and name entity recognition method and system |
CN106997379A (en) * | 2017-03-20 | 2017-08-01 | 杭州电子科技大学 | A kind of merging method of the close text based on picture text click volume |
CN107301213A (en) * | 2017-06-09 | 2017-10-27 | 腾讯科技(深圳)有限公司 | Intelligent answer method and device |
Non-Patent Citations (3)
Title |
---|
DAYU YUAN等: "Semi-supervisedWord Sense Disambiguation with Neural Models", 《ARXIV:1603.07012V2[CS.CL]》 * |
ZHENG-YU NIU等: "Word Sense Disambiguation Using Label Propagation Based Semi-Supervised Learning", 《PROCEEDINGS OF THE 43RD ANNUAL MEETING OF THE ACL》 * |
李丽双等: "基于CNN-BLSTM-CRF模型的生物医学命名实体识别", 《中文信息学报》 * |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109377203A (en) * | 2018-09-13 | 2019-02-22 | 平安医疗健康管理股份有限公司 | Medical settlement data processing method, device, computer equipment and storage medium |
CN111597296A (en) * | 2019-02-20 | 2020-08-28 | 阿里巴巴集团控股有限公司 | Commodity data processing method, device and system |
CN110059185A (en) * | 2019-04-03 | 2019-07-26 | 天津科技大学 | A kind of medical files specialized vocabulary automation mask method |
CN110059185B (en) * | 2019-04-03 | 2022-10-04 | 天津科技大学 | Medical document professional vocabulary automatic labeling method |
CN110287337A (en) * | 2019-06-19 | 2019-09-27 | 上海交通大学 | The system and method for medicine synonym is obtained based on deep learning and knowledge mapping |
CN110705206A (en) * | 2019-09-23 | 2020-01-17 | 腾讯科技(深圳)有限公司 | Text information processing method and related device |
CN111221960A (en) * | 2019-10-28 | 2020-06-02 | 支付宝(杭州)信息技术有限公司 | Text detection method, similarity calculation method, model training method and device |
CN111414473A (en) * | 2020-02-13 | 2020-07-14 | 合肥工业大学 | Semi-supervised classification method and system |
CN111414473B (en) * | 2020-02-13 | 2021-09-07 | 合肥工业大学 | Semi-supervised classification method and system |
CN111881979A (en) * | 2020-07-28 | 2020-11-03 | 复旦大学 | Multi-modal data annotation device and computer-readable storage medium containing program |
CN113158687A (en) * | 2021-04-29 | 2021-07-23 | 新声科技(深圳)有限公司 | Semantic disambiguation method and device, storage medium and electronic device |
CN113779987A (en) * | 2021-08-23 | 2021-12-10 | 科大国创云网科技有限公司 | Event co-reference disambiguation method and system based on self-attention enhanced semantics |
CN113742458A (en) * | 2021-09-18 | 2021-12-03 | 苏州大学 | Natural language instruction disambiguation method and system for mechanical arm grabbing |
CN115293158A (en) * | 2022-06-30 | 2022-11-04 | 撼地数智(重庆)科技有限公司 | Disambiguation method and device based on label assistance |
CN115293158B (en) * | 2022-06-30 | 2024-02-02 | 撼地数智(重庆)科技有限公司 | Label-assisted disambiguation method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108491382A (en) | A kind of semi-supervised biomedical text semantic disambiguation method | |
Sharma et al. | Literature survey of statistical, deep and reinforcement learning in natural language processing | |
CN109800437B (en) | Named entity recognition method based on feature fusion | |
Yao et al. | Bi-directional LSTM recurrent neural network for Chinese word segmentation | |
Gasmi et al. | LSTM recurrent neural networks for cybersecurity named entity recognition | |
Nguyen et al. | Recurrent neural network-based models for recognizing requisite and effectuation parts in legal texts | |
US20210141863A1 (en) | Multi-perspective, multi-task neural network model for matching text to program code | |
US20200302118A1 (en) | Korean Named-Entity Recognition Method Based on Maximum Entropy Model and Neural Network Model | |
CN110457682B (en) | Part-of-speech tagging method for electronic medical record, model training method and related device | |
CN111782769B (en) | Intelligent knowledge graph question-answering method based on relation prediction | |
Jabreel et al. | Target-dependent sentiment analysis of tweets using bidirectional gated recurrent neural networks | |
CN111274829B (en) | Sequence labeling method utilizing cross-language information | |
CN109960728A (en) | A kind of open field conferencing information name entity recognition method and system | |
US20240233877A1 (en) | Method for predicting reactant molecule, training method, apparatus, and electronic device | |
Popov | Neural network models for word sense disambiguation: an overview | |
Ren et al. | Detecting the scope of negation and speculation in biomedical texts by using recursive neural network | |
Thattinaphanich et al. | Thai named entity recognition using Bi-LSTM-CRF with word and character representation | |
Deng et al. | Self-attention-based BiGRU and capsule network for named entity recognition | |
Anandika et al. | A study on machine learning approaches for named entity recognition | |
Zhang et al. | Using a pre-trained language model for medical named entity extraction in Chinese clinic text | |
Foland et al. | CU-NLP at SemEval-2016 task 8: AMR parsing using LSTM-based recurrent neural networks | |
Bhuyan et al. | Textual entailment as an evaluation metric for abstractive text summarization | |
CN114373554A (en) | Drug interaction relation extraction method using drug knowledge and syntactic dependency relation | |
Abd et al. | A comparative study of word representation methods with conditional random fields and maximum entropy markov for bio-named entity recognition | |
Liu et al. | Recognizing proper names in ur iii texts through supervised learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20180904 |