CN108875809A - The biomedical entity relationship classification method of joint attention mechanism and neural network - Google Patents
The biomedical entity relationship classification method of joint attention mechanism and neural network

- Publication number: CN108875809A (application CN201810554915.5A)
- Authority: CN (China)
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications

- G06F18/00 — Pattern recognition
- G06F18/28 — Determining representative reference patterns, e.g. by averaging or distorting; generating dictionaries
- G06F18/214 — Generating training patterns; bootstrap methods, e.g. bagging or boosting
- G06F18/24 — Classification techniques
Abstract
A biomedical entity relationship classification method combining an attention mechanism with a neural network, belonging to the field of biomedicine and data mining technology, is provided to solve the problem of biomedical entity relationship classification. The main steps are: S1, text processing based on cataphora resolution; S2, construction of model input vectors based on an attention mechanism; S3, construction of a biomedical entity relationship classification model based on a bidirectional LSTM; S4, biomedical entity relationship classification using the classification model. For sentences in biomedical literature, the invention designs a cataphora resolution step for "following"-style constructions and then, starting from the words that make up a sentence, weights each word's embedding vector with an attention mechanism. This highlights the keywords that most influence biomedical entity relationship classification and makes the relationship between the candidate entities more explicit, thereby enabling biomedical entity relationship classification.
Description
Technical field
The present invention relates to the field of biomedicine and data mining technology, and in particular to a biomedical entity relationship classification method that combines an attention mechanism with a neural network.
Background art
With the development of data-driven bioinformatics, discovering and predicting relationships between biomedical entities by computational methods has become a trend. Computational text-mining methods can discover patterns and knowledge in the large volume of available biomedical databases and unstructured text. At present, massive amounts of new unstructured data lie hidden in specialized databases and the scientific literature, so using text-mining techniques to detect and predict biomedical entity relationships from literature and databases is an effective and feasible approach. In addition, this can automate database annotation that is currently done manually, and it supports the construction of biomedical knowledge graphs.
Traditional research on mining biomedical entity relationships from text mainly uses machine learning methods from statistical learning theory, such as support vector machines. These methods depend on well-designed kernels or carefully engineered features. Feature engineering requires domain experts and is labor-intensive trial-and-error work; moreover, such methods generalize poorly to out-of-vocabulary words. Neural network methods, by contrast, can automatically learn multi-level representations of unstructured text by composing simple non-linear models, and have shown their potential across tasks in natural language processing. There are currently two main neural network architectures, convolutional neural networks and recurrent neural networks. The former is better suited to learning contiguous local patterns. The latter can learn discontinuous global patterns, but it has a bias characteristic: later inputs dominate the target representation.

However, biomedical literature, with the characteristics of scientific language, tends toward long and complex sentences, and the words important to the final relationship do not necessarily appear at the end of a sentence. Although the text-mining methods above explore the interactions between various classes of biomedical entities, their performance on classifying biomedical entity interactions in long, complex sentences remains unsatisfactory.
Summary of the invention
The object of the present invention is to provide a neural network architecture based on an attention mechanism that classifies the annotated entity relationships in biomedical literature more accurately and effectively.

The technical solution adopted by the present invention is a biomedical entity relationship classification method combining an attention mechanism with a neural network, comprising the following steps:
S1, text processing based on cataphora resolution: acquire a publicly available annotated data set and process the sentences with text-processing techniques, cataphora resolution, and pruning, as follows:

A1, initial processing: replace every digit substring that does not belong to a biomedical entity with a special symbol; delete brackets that do not contain a candidate entity; for generalization, replace all biomedical entities with entity*, where * denotes 0, 1, 2, ...; delete sentences that contain only one entity, or whose two entities share the same symbol.
A2, cataphora resolution based on "following": for sentences in biomedical text that contain a colon ":" together with a "following [cataphora word]" construction, if the two candidate entities lie on opposite sides of the colon, rewrite the sentence with the following rules, where [w]* denotes zero or more word tokens:

Sentence pattern 1: entity1 [w]* following [cataphora word]: [w]* entity2 [w]*.
Rule 1: entity1 [w]* following entity2.
Sentence pattern 2: [w]* following [cataphora word] [w]* entity2: [w]* entity1 [w]*.
Rule 2: [w]* following entity1 [w]* entity2.
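A minimal sketch of how the pattern-1 rewrite might be implemented with regular expressions; the cataphora word list and the exact patterns are illustrative assumptions, not the patent's implementation:

```python
import re

# Hypothetical cataphora words; the patent does not enumerate them.
CATAPHORA_WORDS = r"(?:agents|drugs|medications|compounds)"

def rewrite_pattern1(sentence: str) -> str:
    # entity1 [w]* following [cataphora word] : [w]* entity2 [w]*
    #   -> entity1 [w]* following entity2.
    pat = re.compile(
        r"(entity\d+)(.*?following)\s+" + CATAPHORA_WORDS + r"\s*:.*?(entity\d+).*"
    )
    return pat.sub(r"\1\2 \3.", sentence)

s = "entity0 may interact with the following agents: entity1 and other drugs."
print(rewrite_pattern1(s))
# -> entity0 may interact with the following entity1.
```

Sentences that do not match the pattern are returned unchanged, so the rewrite can be applied to a whole corpus safely.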
A3, sentence pruning: trim every sentence in the corpus to a fixed input length. After computing the maximum spacing between all candidate entities, choose a text length n larger than this spacing as the input length of the sentences. For a sentence longer than n, retain the two entities and all text between them, plus an equal number of words before and after the entities, and delete the surplus words; for a sentence shorter than n, pad it at the end with a special symbol up to length n.
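The pruning strategy of A3 can be sketched as follows; how the surplus word budget is split between the two sides of the entity pair is an assumption, since the patent only requires an equal number of words before and after the entities:

```python
def prune_sentence(tokens, e1_idx, e2_idx, n, pad="PAD"):
    """Trim or pad a token list to fixed length n, keeping everything
    between the two entities plus words on each side (sketch of A3)."""
    lo, hi = min(e1_idx, e2_idx), max(e1_idx, e2_idx)
    if len(tokens) > n:
        between = hi - lo + 1          # entities and everything between them
        extra = n - between            # word budget around the entities
        left = extra // 2
        right = extra - left
        start = max(0, lo - left)
        kept = tokens[start:hi + 1 + right][:n]
    else:
        kept = tokens + [pad] * (n - len(tokens))  # pad short sentences
    return kept

toks = ["w%d" % i for i in range(12)]
out = prune_sentence(toks, 3, 6, 8)
print(out, len(out))
```

Both branches always return exactly n tokens, matching the fixed input length required by the embedding layer.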
S2, construction of the model input vector based on the attention mechanism: the model input vector comprises the input feature vectors, the input attention vector, and the sentence vector, processed as follows:
B1, construction of the input feature vectors: given a pruned sentence S = {w_1, w_2, ..., w_i, ..., w_n}, each word w_i is represented by three features: the word itself, its PoS tag, and its position. The position feature reflects the relative distance from the current word w_i to the two mentioned candidate entities: subtracting the positions of entity 1 and entity 2 from the position of the current word in the sentence gives the distances d_1 and d_2 of w_i relative to entity 1 and entity 2. Combining a word with its PoS tag distinguishes the word's senses in different sentences; parsing the processed sentence with the Stanford parser yields the PoS tag of each word in the sentence. Each feature group has an embedding dictionary. Let V^k be the embedding dictionary of the k-th feature group, where the hyper-parameter m_k is the dimension of the feature embedding vectors and l_k is the number of features contained in dictionary V^k. The entries of each embedding dictionary may be initialized randomly or with pre-trained word embeddings. For a word w_i, looking up the index of each feature in the corresponding embedding dictionary maps it to a real-valued row vector, yielding the word, PoS, and position embedding vectors of w_i, denoted x_i^w, x_i^p, x_i^{d1}, and x_i^{d2}.
B2, construction of the input attention vector: the attention mechanism turns the initial word embedding vectors from B1 into word embedding vectors oriented toward the candidate entities. Two row vectors α_j, each of length equal to the maximum sentence length n, quantify the relevance factor between each word w_i in a sentence and the j-th candidate entity e_j, where j ∈ {1, 2}. α_j is defined as:

α_j^i = exp(score(w_i, e_j)) / Σ_{i'=1}^{n} exp(score(w_{i'}, e_j))

where x_i^w and x_{e_j}^w are the word embedding vectors of word w_i and the j-th candidate entity e_j, and the score function, regarded as a function oriented toward the candidate entity, is defined as:

score(w_i, e_j) = (x_i^w · x_{e_j}^w) / m_1

where the symbol · denotes the dot product of the two vectors x_i^w and x_{e_j}^w, and m_1 is the dimension of the word embedding vectors. The relevance factors α_1^i and α_2^i act on the initial word embedding vector x_i^w of word w_i; their joint effect α_i is expressed as α_i = (α_1^i + α_2^i) / 2. Applying α_i to the initial word embedding vector x_i^w gives the candidate-entity-oriented word embedding vector, defined as x_i^{w,att} = α_i * x_i^w, where the symbol * denotes element-wise multiplication.
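The input-attention computation can be sketched as below; the softmax normalisation of the scores and the averaging of the two per-entity weight vectors follow the description above, but their exact form is a reconstruction, not the patent's verbatim code:

```python
import numpy as np

def input_attention(X_words, e1_vec, e2_vec):
    """Reweight word embeddings toward the two candidate entities (B2):
    score each word against each entity by a scaled dot product, softmax
    over the sentence, average the two weight vectors, then rescale."""
    m1 = X_words.shape[1]
    def alpha(e_vec):
        scores = X_words @ e_vec / m1          # score(w_i, e_j)
        exp = np.exp(scores - scores.max())    # stable softmax
        return exp / exp.sum()
    a = (alpha(e1_vec) + alpha(e2_vec)) / 2    # joint relevance per word
    return a[:, None] * X_words                # element-wise reweighting

rng = np.random.default_rng(1)
X = rng.normal(size=(5, 8))                    # 5 words, dim 8
out = input_attention(X, X[1], X[3])           # words 1 and 3 as entities
print(out.shape)
```

The output has the same shape as the input word embeddings, so it can be concatenated with the PoS and position embeddings unchanged in B3.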
B3, construction of the sentence vector:

Construct the vector x_i = x_i^{w,att} || x_i^p || x_i^{d1} || x_i^{d2} to represent the semantics of word w_i, where x_i ∈ R^m, m = m_1 + m_2 + 2 m_3, m_2 and m_3 are the dimensions of the PoS and position embedding vectors respectively, and "||" denotes concatenation. The sentence S can then be expressed as a real-valued vector array S_emb = [x_1, x_2, ..., x_i, ..., x_n].
S3, construction of the biomedical entity relationship classification model based on a bidirectional LSTM: supervised learning is performed with a bidirectional recurrent neural network with long short-term memory (LSTM) units. Its embedding layer realizes the vector representation of each feature obtained in B1 of step S2; the input attention layer realizes the construction of the input attention vector in B2; the merge layer realizes the construction of the sentence vector in B3; the bidirectional LSTM layer uses a bidirectional LSTM network composed of a forward LSTM and a backward LSTM. For a word w_i, the two LSTMs collect the available contextual information from front to back and from back to front, respectively. The output of the bidirectional LSTM layer at time n is the concatenation of the LSTM output vectors in the two directions, h_n = h_n^fwd || h_n^bwd, which represents the high-level semantics of the entire sentence. A logistic regression classifier with a softmax function serves as the classifier over candidate instances. The softmax function takes the bidirectional LSTM output h_n as input, and its output y represents the probability distribution of the candidate instance over the relationship class labels. The probability of the j-th class label is p(y = j | S) = softmax(h_n W_s + b_s), where S is the sentence, W_s is a weight matrix to be learned, and b_s is a bias vector to be learned. The label of the class with the highest probability is the relationship type of the candidate instance, expressed as ŷ = argmax_{j∈C} p(y = j | S), where C is the set of possible class labels in the biomedical corpus. The loss function for the prediction error is the cross-entropy loss J(θ) = −(1/L) Σ_{k=1}^{L} log p(y^k | S^k, θ), where L is the number of annotated samples in the training set, the superscript k denotes the k-th classified sentence, and θ denotes all parameters of the model. The parameters of the loss function are updated with the RMSprop optimization algorithm (resilient mean square propagation); the training procedure is as follows:
e1: set the RMSprop hyper-parameters: learning rate η, momentum parameter β, initial velocity v, gradient-accumulation decay rate ρ, gradient accumulation r, maximum number of iterations maxIter, and mini-batch size m.

e2: initialize the gradient accumulation r = 0 and the iteration counter iterCount = 0; initialize the current and previous errors to infinity, i.e. currError = lastError = ∞; randomly initialize the parameters θ.

e3: update the model parameters θ as follows:

r ← ρ r + (1 − ρ) g ⊙ g
v ← β v − η g / √(r + ε)
θ ← θ + v

where g is the gradient of the loss function with respect to θ, ε is a small smoothing constant, and ⊙ denotes element-wise multiplication.

e4: increment iterCount by 1 and compute the current error with the loss function of step S3. If the current error exceeds the previous error (currError > lastError), or the iteration counter reaches the maximum (iterCount = maxIter), the convergence condition is met; go to step e5. Otherwise set lastError = currError and continue with e3.

e5: save all model parameters θ to a file.
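Steps e1 to e4 can be sketched as an RMSprop-with-momentum loop on a toy one-dimensional loss; the update equations are reconstructed from the standard RMSprop formulation and the hyper-parameters named above, so this is an assumption rather than the patent's exact code:

```python
import numpy as np

def rmsprop_momentum(grad_fn, theta, eta=0.01, beta=0.9, rho=0.9,
                     eps=1e-8, max_iter=2000):
    """Minimise a loss given its gradient, with RMSprop plus momentum."""
    r = np.zeros_like(theta)        # gradient accumulation
    v = np.zeros_like(theta)        # velocity (momentum term)
    for _ in range(max_iter):
        g = grad_fn(theta)
        r = rho * r + (1 - rho) * g * g             # accumulate squared grads
        v = beta * v - eta * g / np.sqrt(r + eps)   # momentum step
        theta = theta + v                            # theta <- theta + v
    return theta

# Toy quadratic loss (t - 3)^2, whose gradient is 2(t - 3); minimum at 3.
theta = rmsprop_momentum(lambda t: 2 * (t - 3.0), np.array([0.0]))
print(theta)
```

In the real model, grad_fn would be the gradient of the cross-entropy loss of step S3 over a mini-batch, and the early-stopping check of step e4 would wrap this loop.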
S4, predicting biomedical entity relationships: read the parameter values θ obtained by training in step e5 and pass them to the network model of S3. Extract from biomedical literature the sentences containing at least two biomedical entities, and build relationship instances by pairwise matching of the entities. Apply the initial processing, cataphora resolution, and pruning of step S1 to the text; then, with the method of B1 in step S2, obtain the words of each sentence, the PoS tag of each word, and the relative distances from the current word to the two entities, and look up the corresponding vector dictionaries to obtain the indices of these features. Given this input, the model outputs a probability value for each relationship instance in each class; the class with the highest probability is the class label of the candidate instance, which yields the relationship type between the candidate entities.
The beneficial effects of the present invention are as follows. By introducing an input attention mechanism, the invention proposes a novel biomedical entity relationship classification method that combines an attention mechanism with a neural network. To a certain extent the method overcomes the bias defect of the recurrent LSTM network, which causes an LSTM processing long biomedical sentences to neglect information early in the sentence that is important to the final classification result. Moreover, the method can effectively identify both short-range and long-range patterns between words in long, complex sentences, thereby automatically and efficiently classifying entity relationships in biomedical literature. In the embodiment, the improvements on the primary evaluation metric F-score over the DrugBank, Medline, and Overall data sets of the DDIExtraction 2013 evaluation corpus are 3.3%, 21.7%, and 6.2%, respectively, which demonstrates the effectiveness of the method for classifying entity relationships in biomedical literature.
Brief description of the drawings

Fig. 1 is a flow diagram of the relationship classification method of the present invention;

Fig. 2 is a schematic diagram of the biomedical entity relationship classification model of the present invention combining the attention mechanism and the neural network;

Fig. 3 is a visualization of the input attention in the embodiment of the present invention.
Specific embodiment
The present invention is described below in conjunction with the drawings and the following specific embodiment.

Embodiment:
The present embodiment uses the DrugBank and Medline data sets of the DDIExtraction 2013 evaluation task, each of which is divided into a training set and a test set. The training and test sets contain 31270 and 1221 sentences respectively, representing sentences from the DrugBank database and from biomedical articles. In the experiments, the training sets of the two data sets are merged into one training set; testing uses the two test sets unchanged, plus their union, denoted Overall.
The specific steps of the biomedical entity relationship classification method combining the attention mechanism and the neural network are as follows:

1. Text processing based on cataphora resolution: acquire the annotated data sets above, process them with the text-processing techniques and cataphora resolution, and then trim each sentence in the corpus to a fixed input length. After computing the maximum spacing between all candidate entities, 5 words are kept on each side of the two candidate entities; the pruned sentence length n is then set to 85.
2. Construct the model input vector based on the attention mechanism: the model input vector comprises the input feature vectors, the input attention vector, and the sentence vector, as follows:

For the input word vectors, the pre-training corpus consists of two parts totaling about 2.5 GB. One part comprises the abstracts of articles in Medline up to 2016 retrieved from PubMed with the query keyword "drug"; the other part is the corpus of the DDIExtraction 2013 evaluation task. The PoS vectors were trained on the PoS-annotated sentences of the DDIExtraction 2013 evaluation corpus. Both kinds of embedding vectors are trained with the open-source word2vec tool, using the Skip-Gram model and negative sampling. The position embedding vectors are initialized randomly, with all vectors drawn from the standard normal distribution. The dimension of the word embedding vectors is set to m_1 = 200, and the dimensions of the PoS and position embedding vectors, m_2 and m_3, are both set to 10.
When constructing the input attention vector, the attention mechanism is applied to the input word embedding vectors of B1. Two row vectors α_1, α_2 ∈ R^85, of length equal to the maximum sentence length 85, quantify the relevance factor between each word w_i in a sentence and the j-th candidate entity e_j, j ∈ {1, 2}, with the score function score(w_i, e_j) = (x_i^w · x_{e_j}^w) / m_1, where i ≤ 85. The relevance factors α_1^i and α_2^i act on the initial word embedding vector x_i^w of word w_i; their joint effect is α_i = (α_1^i + α_2^i) / 2. For the corpus sentence "Synergism was also noted when entity0 was combined with entity1 and entity2.", the word-level attention computed with α_i is shown in Fig. 3. The words "synergism", "combined", and "when" receive higher attention weights than the other words. Since the true relationship between the two candidate entities entity0 and entity1 is "effect", these computed attention values are reasonable. Next, α_i is applied to the initial word embedding vector x_i^w to obtain the candidate-entity-oriented word embedding vector x_i^{w,att} = α_i * x_i^w, where the symbol * denotes element-wise multiplication.

Finally, the word embedding vector x_i^{w,att}, the PoS embedding vector x_i^p, and the position embedding vectors x_i^{d1} and x_i^{d2} are concatenated into a new vector x_i representing the semantics of word w_i, where x_i ∈ R^m, m = 230. The sentence S can then be expressed as a real-valued vector array S_emb = [x_1, x_2, ..., x_i, ..., x_85].
3. Supervised learning with the bidirectional LSTM-based relationship classification architecture: the training data produced by the preceding steps are passed to the bidirectional LSTM recurrent neural network shown in Fig. 2 for supervised learning. The bidirectional LSTM network consists of a forward LSTM and a backward LSTM; for a word w_i, the two LSTMs collect the available contextual information from front to back and from back to front, respectively. The three adaptive gates of the LSTM are computed from the previous state h_{t−1} and the current input x_t as follows:

i_t = σ(W_i · x_t + U_i · h_{t−1} + b_i)
f_t = σ(W_f · x_t + U_f · h_{t−1} + b_f)
o_t = σ(W_o · x_t + U_o · h_{t−1} + b_o)

where σ denotes the sigmoid function and the values of the three gates range over [0, 1]. Given the three gates, the previous cell state C_{t−1} and the candidate state C̃_t jointly determine the current cell state C_t, and the output h_t of the LSTM unit is the cell state gated by the output gate, updated as follows:

C̃_t = tanh(W_c · x_t + U_c · h_{t−1} + b_c)
C_t = f_t ⊙ C_{t−1} + i_t ⊙ C̃_t
h_t = o_t ⊙ tanh(C_t)

The output of the BLSTM at time n = 85 is the concatenation of the LSTM output vectors in the two directions, h_n = h_n^fwd || h_n^bwd. The number of hidden units of each LSTM is set equal to the input dimension 230, so the dimension of h_n is 460.
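A didactic sketch of one LSTM step implementing the gate and state equations above in plain NumPy; the dimensions are reduced for illustration (the patent uses hidden size 230), and the weight packing into dicts is a convenience, not part of the patent:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, U, b):
    """One LSTM step: input/forget/output gates, candidate state,
    cell-state update, and gated output, as in the equations above."""
    i = sigmoid(W['i'] @ x_t + U['i'] @ h_prev + b['i'])        # input gate
    f = sigmoid(W['f'] @ x_t + U['f'] @ h_prev + b['f'])        # forget gate
    o = sigmoid(W['o'] @ x_t + U['o'] @ h_prev + b['o'])        # output gate
    C_tilde = np.tanh(W['c'] @ x_t + U['c'] @ h_prev + b['c'])  # candidate
    C = f * C_prev + i * C_tilde       # new cell state
    h = o * np.tanh(C)                 # new hidden state
    return h, C

rng = np.random.default_rng(2)
d, k = 6, 4                            # toy input dim, hidden dim
W = {g: rng.normal(size=(k, d)) for g in "ifoc"}
U = {g: rng.normal(size=(k, k)) for g in "ifoc"}
b = {g: np.zeros(k) for g in "ifoc"}
h, C = lstm_step(rng.normal(size=d), np.zeros(k), np.zeros(k), W, U, b)
print(h.shape, C.shape)
```

Running this step forward over the sentence, and again backward, then concatenating the two final hidden states, yields the h_n used by the softmax classifier.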
A logistic regression classifier with a softmax function serves as the classifier over candidate instances. The softmax function takes the bidirectional LSTM output h_n as input; its output y represents the probability distribution of the candidate instance over the relationship class labels. The probability of the j-th class label is p(y = j | S) = softmax(h_n W_s + b_s), where S is the sentence, W_s is the weight matrix to be learned, and b_s is the bias vector to be learned. The label of the class with the highest probability is the relationship type of the candidate instance, expressed as ŷ = argmax_{j∈C} p(y = j | S), with |C| = 5. The loss function for the prediction error is the cross-entropy loss J(θ) = −(1/L) Σ_{k=1}^{L} log p(y^k | S^k, θ), where L is the number of annotated samples in the training set and the superscript k denotes the k-th classified sentence.
4. Train the parameters of the entity relationship classification model: the parameters of the loss function are updated with the RMSprop optimization algorithm, with learning rate η = 0.001 and momentum parameter β = 0.9. Five-fold sentence-level cross-validation on the training set is used to tune the hyper-parameters and optimize system performance; the resulting parameters are saved to a file.
5. Predict biomedical entity relationships: read the trained parameter values θ from the file of step 4 and pass them to the network model shown in Fig. 2. Extract from the test set the sentences containing at least two biomedical entities, and build relationship instances by pairwise matching of the entities. Process the text with the method of B1 in steps S1 and S2 to obtain the words of each sentence, the PoS tag of each word, and the relative distances to the two entities; look up the corresponding vector dictionaries to obtain the indices of these features and pass them to the embedding layer of the model in Fig. 2. Given this input, the model outputs a probability value for each relationship instance in each class; the class with the highest probability is the class label of the candidate instance, which yields the relationship type between the entities.
To verify the effectiveness of the method, three baselines were selected for the experiments:

(1) RAIHANI, an SVM-based method: the RAIHANI system designs many rules and features, such as chunks, trigger words, negation filtering, and SAME_BLOK, and additionally designs many distinct features in its classifier for each relationship subtype.

(2) joint AB-LSTM, an LSTM-based method: joint AB-LSTM combines two LSTM networks, one of which applies an attention technique in the pooling layer; it is one of the best existing LSTM-based methods among neural network architectures on the DDIExtraction 2013 corpus.

(3) MCCNN, a convolutional neural network-based method: MCCNN uses multi-channel word embedding vectors and is one of the best existing CNN-based methods among neural network architectures on the DDIExtraction 2013 corpus.

In addition, besides the common text-processing techniques, methods (2) and (3) also exclude irrelevant negative examples with filtering techniques.
Table 1 gives the F-score test results for relationship classification on the Overall test set of the experimental corpus described above for the four methods, i.e. the method of the present invention and the three baselines. F-score here refers to the micro-averaged Micro_F commonly used in multi-class classification, the generally accepted evaluation metric for relation extraction in the text domain. It is defined as follows:

Micro_P = Σ_i TP_i / Σ_i (TP_i + FP_i)
Micro_R = Σ_i TP_i / Σ_i (TP_i + FN_i)
Micro_F = 2 · Micro_P · Micro_R / (Micro_P + Micro_R)

where Micro_P denotes precision and Micro_R denotes recall; TP_i (true positives) is the number of instances predicted as class i that actually belong to class i, FP_i (false positives) is the number of instances predicted as class i that are actually negative, and FN_i (false negatives) is the number of instances predicted as negative that actually belong to class i. Precision Micro_P and recall Micro_R measure the exactness and the completeness of an algorithm, respectively, but neither index alone fully reflects the performance of a classification system; therefore the Micro_F value, which balances precision Micro_P and recall Micro_R, is generally used to evaluate the overall performance of an algorithm.
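The micro-averaged metrics can be sketched as below, treating every label outside the positive relation classes as negative; the toy gold/predicted labels are illustrative, not experimental data:

```python
def micro_f(gold, pred, classes):
    """Micro-averaged precision, recall and F over the positive relation
    classes, following the Micro_P / Micro_R / Micro_F definitions above."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        for c in classes:
            if p == c and g == c:
                tp += 1          # predicted c, actually c
            elif p == c and g != c:
                fp += 1          # predicted c, actually not c
            elif g == c and p != c:
                fn += 1          # actually c, predicted otherwise
    P = tp / (tp + fp) if tp + fp else 0.0
    R = tp / (tp + fn) if tp + fn else 0.0
    F = 2 * P * R / (P + R) if P + R else 0.0
    return P, R, F

gold = ["MEC", "EFF", "none", "ADV", "EFF"]
pred = ["MEC", "EFF", "EFF", "none", "EFF"]
P, R, F = micro_f(gold, pred, {"MEC", "EFF", "ADV", "INT"})
print(round(P, 3), round(R, 3), round(F, 3))
# -> 0.75 0.75 0.75
```

Pooling the per-class counts before dividing is what distinguishes the micro average from the macro average, which would average the per-class scores instead.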
Bold values in Table 1 are the best results for each relationship class on the corresponding data set, and the symbol "-" indicates that no value is available (likewise below). "DEC" denotes binary relationship detection only, i.e. relationship versus no relationship; "CLA" denotes relationship classification; "MEC", "EFF", "ADV", and "INT" denote the "mechanism", "effect", "advice", and "int" types, respectively.

Table 1. Performance comparison of different systems for relationship classification on the Overall test set.
Table 2 gives the F-score test results for relationship classification (CLA) on the three test sets of the experimental corpus for three methods, i.e. the method of the present invention and two of the baselines; the joint AB-LSTM method did not report the corresponding experimental results.

Table 2. Performance comparison of different systems on the three data sets.
The experimental results in the two tables above show that the proposed method achieves the best detection (DEC) and classification (CLA) performance on the experimental data sets. This demonstrates that the cataphora-resolution-based text processing, the attention-based input vectors, and the suitable model in this method genuinely deliver a substantial performance boost for entity relationship classification in biomedical literature. The improvements on the primary evaluation metric F-score over the DrugBank, Medline, and Overall data sets of the DDIExtraction 2013 evaluation corpus are 3.3%, 21.7%, and 6.2%, respectively, which verifies the effectiveness of the method for classifying entity relationships in biomedical literature.
The above content further describes the present invention in conjunction with specific preferred technical solutions, but the specific implementation of the invention is not limited to these descriptions. For those of ordinary skill in the art to which the present invention belongs, a number of simple deductions or substitutions may be made without departing from the concept of the invention, and all shall be regarded as falling within the protection scope of the present invention.
Claims (6)
1. A biomedical entity relationship classification method combining an attention mechanism with a neural network, characterized by comprising the following steps: S1, text processing based on cataphora resolution; S2, construction of model input vectors based on an attention mechanism; S3, construction of a biomedical entity relationship classification model based on a bidirectional LSTM; S4, biomedical entity relationship classification using the relationship classification model.
2. the biomedical entity relationship point of a kind of joint attention mechanism and neural network according to claim 1
Class method, which is characterized in that the step:
S1, the text-processing based on reference parsing:The open data set marked of acquisition, using text-processing technology to text into
Row initialization process, using based on following reference dissection process and technology of prunning branches sentence is handled;
S2, mode input vector of the building based on attention mechanism:The mode input vector include input feature value,
Attention vector sum sentence vector is inputted, processing method is as follows:
The building of B1, input feature value:Sentence S={ the w of a given beta pruning1,w2,…,wi,…,wn, each word wiBy table
It is shown as three feature vectors:Word itself word, the PoS label of word and position, are expressed asAndWith
B2, the building for inputting attention vector:Using initial word insertion vector of the attention mechanism into B1 with life
Vector is embedded at the word towards candidate entity;It is equal to the row vector α of sentence maximum length n using two lengthjTo quantify one
Each word w in a sentenceiWith the degree of correlation factor of j-th candidates entity, αjIt is defined as follows shown in formula:
Wherein, { 1,2 } j ∈,WithIt is word w respectivelyiWith j-th candidates entity ejWord be embedded in vector, score function quilt
Regard the function towards candidate entity as, is defined as follows:
Wherein, symbol dot indicates two vectorsWithOn dot product operations;m1It is the dimension of word insertion vector;Degree of correlation
The factorWithAct on word wiInitial word be embedded in vectorOn, the synergy α of the twoiIt is expressed as
αiIt is applied to initial word insertion vectorOn regard as towards candidate entity word insertion vectorIts definition is expressed asWherein symbol * indicates the multiplication that step-by-step calculates;
The building of B3, sentence vector:
Construct vectorTo indicate word wiSemanteme, wherein xi∈Rm, m=m1+m2+2m3, m2With
m3It is the dimension of PoS and position insertion vector respectively;" | | " indicate attended operation;Sentence S is expressed as a real-valued vectors array
Semb=[x1,x2,…,xi,...,xn];
S3, building the biomedical entity relationship classification model based on bidirectional LSTM: supervised learning is performed with a recurrent neural network that uses long short-term memory units in both directions. The model comprises an embedding layer, an input attention layer, a merge layer, and a bidirectional LSTM layer. A logistic regression classifier with a softmax function serves as the classifier for the candidate instances. The softmax function takes the output h_n of the bidirectional LSTM layer as input and outputs y, the probability distribution of the candidate instance over the different relationship class labels. The probability of the j-th class label is expressed as p(y = j | S) = softmax(h_n W_s + b_s), where S denotes the sentence, W_s is a weight matrix to be learned, and b_s is a bias vector to be learned. The label of the class with the highest probability is the relationship type of the candidate instance, expressed as ŷ = argmax_{j ∈ C} p(y = j | S), where C is the set of all possible class labels in the biological corpus. The loss function for the prediction error is the cross-entropy loss J(θ) = −(1/L) Σ_{k=1..L} log p(y_k | S_k, θ), where L is the number of labeled samples in the training set, the subscript k denotes the k-th sentence being classified, and θ denotes all parameters of the model;
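The softmax classifier and cross-entropy loss of step S3 can be sketched in NumPy as follows; the shapes and the helper names `classify` and `cross_entropy` are illustrative, not part of the claims:

```python
import numpy as np

def softmax(z):
    # numerically stable softmax over the class scores
    e = np.exp(z - z.max())
    return e / e.sum()

def classify(h_n, W_s, b_s):
    """p(y=j|S) = softmax(h_n W_s + b_s); the predicted label is the argmax."""
    p = softmax(h_n @ W_s + b_s)
    return p, int(np.argmax(p))

def cross_entropy(prob_rows, gold_labels):
    """J(theta) = -(1/L) * sum_k log p(y_k | S_k) over L labeled samples."""
    return -np.mean([np.log(p[y]) for p, y in zip(prob_rows, gold_labels)])
```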
S4, predicting biomedical entity relationships: the text is first processed with the method of step S1, including cataphora resolution and sentence pruning; then, with the method of B1 in step S2, the words of the sentence, the PoS label of each word, and the relative distances between the current word and the two entities are obtained, and the corresponding vector dictionaries are consulted for the indices of these features. Given this input, the model outputs, for each pair of relation instances, a probability value for every class; the class with the largest probability value gives the class label of the candidate instance, thereby yielding the relationship type between the entities.
3. The biomedical entity relationship classification method combining an attention mechanism and a neural network according to claim 2, characterized in that in step S1, the initial processing comprises cataphora-based reference processing and sentence pruning, expressed as:
A1, initial processing: a special symbol replaces digit substrings that are not part of a biomedical entity substring; brackets that do not contain a candidate entity are deleted; for generalization of the method, all biomedical entities are replaced with entity*, where * denotes 0, 1, 2, …; sentences containing only one entity, or in which the two entities carry the same symbol, are deleted;
A2, cataphora-based reference resolution: for sentence patterns in biological literature texts that contain ":" together with "following [cataphora word]", if the two entities of a candidate pair lie on either side of the ":", the following rules are applied for replacement, where [w]* denotes one or more word tokens:
Sentence pattern 1: entity1 [w]* following [cataphora word]: [w]* entity2 [w]*.
Rule 1: entity1 [w]* following entity2.
Sentence pattern 2: [w]* following [cataphora word] [w]* entity2: [w]* entity1 [w]*.
Rule 2: [w]* following entity1 [w]* entity2.
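A hedged sketch of Rule 1 as a regular-expression rewrite; the pattern and the helper name `resolve_cataphora` are illustrative and cover only sentence pattern 1, not the full rule set:

```python
import re

# Illustrative rewrite for sentence pattern 1; entity tokens follow the
# generalized entity* form used by the method, and the cataphora word
# ("agents", "drugs", ...) is matched as a single word token.
PATTERN1 = re.compile(
    r"(entity\d+)\s+(.*?)following\s+\w+\s*:\s*.*?(entity\d+).*")

def resolve_cataphora(sentence):
    """Apply Rule 1: 'entity1 [w]* following [cataphora word]: ... entity2 ...'
    becomes 'entity1 [w]* following entity2.' (a sketch, not the full claim)."""
    m = PATTERN1.match(sentence)
    if m:
        return f"{m.group(1)} {m.group(2)}following {m.group(3)}."
    return sentence
```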
A3, sentence pruning: each sentence in the corpus is trimmed to a fixed input length. After computing the maximum spacing between all candidate entities, a text length n larger than this spacing is chosen as the input length of the sentences. To reach this fixed input length n, sentences longer than n retain the two entities and all text between them, plus an equal number of words before and after the entities; surplus words are deleted from the sentence according to this strategy. Sentences shorter than n are padded at the end with a special symbol until each input sentence reaches length n.
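The pruning and padding strategy of A3 can be sketched as follows; the helper name and the `<pad>` symbol are illustrative, and the left/right budget split is one reasonable reading of "an equal number of words before and after the entities":

```python
def prune_or_pad(tokens, n, e1, e2, pad="<pad>"):
    """Trim or pad a token list to the fixed input length n.

    e1, e2 are the indices of the two candidate entities. Longer sentences
    keep everything between (and including) the entities plus context words
    split between the two sides; shorter ones are padded at the end with a
    special symbol.
    """
    if len(tokens) <= n:
        return tokens + [pad] * (n - len(tokens))
    lo, hi = min(e1, e2), max(e1, e2)
    span = hi - lo + 1                  # entities and everything between them
    extra = n - span                    # words left for the two context sides
    left = min(lo, extra // 2)
    right = min(len(tokens) - hi - 1, extra - left)
    left = extra - right                # give any unused budget back to the left
    return tokens[lo - left: hi + 1 + right]
```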
4. The biomedical entity relationship classification method combining an attention mechanism and a neural network according to claim 2, characterized in that in step S2, the word, PoS and position embedding vectors described in step B2 are obtained as follows: subtracting the position of entity 1 and of entity 2 from the position of the current word w_i in the sentence yields the distances d1 and d2 of word w_i relative to entity 1 and entity 2; combining a word with its PoS label distinguishes the semantics of the word in different sentences, and the PoS label of each word in a sentence is obtained by parsing the processed sentences above with the Stanford parser. Each feature group has an embedding dictionary: assume V_k is the embedding dictionary of the k-th feature group, where m_k is the dimension of a feature embedding vector and l_k is the number of features contained in dictionary V_k; the entries of each embedding dictionary can be randomly initialized, or initialized with pre-trained word vectors. For a word w_i, the index of each feature in its dictionary is mapped to a real-valued row vector by looking up the corresponding embedding dictionary, thereby obtaining the word, PoS and position embedding vectors corresponding to w_i.
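A sketch of the relative-distance features and the embedding-dictionary lookup described in claim 4; the dictionary size and the index shift are illustrative assumptions:

```python
import numpy as np

def relative_distances(n, e1, e2):
    """d1, d2: each word position minus the positions of entity 1 and entity 2."""
    idx = np.arange(n)
    return idx - e1, idx - e2

def lookup(V, indices):
    """Map feature indices to real-valued row vectors in an embedding dictionary.

    V is an (l_k, m_k) array (randomly initialized or pre-trained); each
    row is one feature embedding, as with the dictionaries V_k of claim 4.
    """
    return V[np.asarray(indices)]

# Toy position dictionary: distances in [-10, 10] shifted to indices [0, 20]
rng = np.random.default_rng(2)
V_pos = rng.normal(size=(21, 3))
d1, d2 = relative_distances(5, 1, 3)
emb = lookup(V_pos, d1 + 10)
```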
5. The biomedical entity relationship classification method combining an attention mechanism and a neural network according to claim 2, characterized in that in step S3, the bidirectional LSTM layer uses a bidirectional LSTM network composed of one forward LSTM and one backward LSTM; the output of the bidirectional LSTM layer at time step n is the concatenation of the LSTM output vectors in the two directions, h_n = [h_n(forward) || h_n(backward)], and h_n represents the high-level semantics of the entire sentence.
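A minimal NumPy sketch of the bidirectional LSTM output h_n as the concatenation of the final forward and backward hidden states; the gate-weight packing and toy dimensions are illustrative assumptions, not the claimed implementation:

```python
import numpy as np

def lstm(seq, Wx, Wh, b):
    """Minimal LSTM over a sequence; returns the final hidden state h.

    Wx: (m, 4H), Wh: (H, 4H), b: (4H,); gate weights are packed as
    [input, forget, cell, output] blocks of size H.
    """
    H = Wh.shape[0]
    h, c = np.zeros(H), np.zeros(H)
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    for x in seq:
        z = Wx.T @ x + Wh.T @ h + b
        i, f = sig(z[:H]), sig(z[H:2 * H])
        g, o = np.tanh(z[2 * H:3 * H]), sig(z[3 * H:])
        c = f * c + i * g
        h = o * np.tanh(c)
    return h

def bilstm_hn(seq, params_f, params_b):
    """h_n = [forward output || backward output over the reversed sequence]."""
    return np.concatenate([lstm(seq, *params_f), lstm(seq[::-1], *params_b)])

# Toy usage: 4-step sequence of 3-dim inputs, hidden size H = 2
rng = np.random.default_rng(3)
H, m, n = 2, 3, 4
make = lambda: (rng.normal(size=(m, 4 * H)), rng.normal(size=(H, 4 * H)),
                np.zeros(4 * H))
h_n = bilstm_hn(rng.normal(size=(n, m)), make(), make())
```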
6. The biomedical entity relationship classification method combining an attention mechanism and a neural network according to claim 2, characterized in that in step S3, the loss function is minimized during training with the RMSprop optimization algorithm.
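A sketch of a single RMSprop update as used in claim 6; the hyperparameter values are illustrative defaults, not values stated in the patent:

```python
import numpy as np

def rmsprop_step(theta, grad, cache, lr=0.001, rho=0.9, eps=1e-8):
    """One RMSprop update: scale the gradient by a running RMS of its history."""
    cache = rho * cache + (1.0 - rho) * grad ** 2
    theta = theta - lr * grad / (np.sqrt(cache) + eps)
    return theta, cache

# Minimizing f(x) = x^2 (gradient 2x) drives x toward 0
x, cache = 5.0, 0.0
for _ in range(2000):
    x, cache = rmsprop_step(x, 2.0 * x, cache, lr=0.01)
```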
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810554915.5A CN108875809A (en) | 2018-06-01 | 2018-06-01 | The biomedical entity relationship classification method of joint attention mechanism and neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108875809A true CN108875809A (en) | 2018-11-23 |
Family
ID=64336156
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810554915.5A Pending CN108875809A (en) | 2018-06-01 | 2018-06-01 | The biomedical entity relationship classification method of joint attention mechanism and neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108875809A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106354710A (en) * | 2016-08-18 | 2017-01-25 | 清华大学 | Neural network relation extracting method |
US20170229071A1 (en) * | 2016-02-05 | 2017-08-10 | Hand Held Products, Inc. | Dynamic identification badge |
CN107818141A (en) * | 2017-10-10 | 2018-03-20 | 大连理工大学 | Incorporate the biomedical event extraction method of structuring key element identification |
Non-Patent Citations (4)
Title |
---|
ZENG DJ ET AL: "Relation Classification via Convolutional Deep Neural Network", COLING 2014 * |
ZHENG WEI ET AL: "An attention-based effective neural model for drug-drug interactions extraction", BMC Bioinformatics 18 * |
IAN GOODFELLOW: "Deep Learning" (Chinese edition), 31 July 2017, Posts & Telecom Press * |
LI QIANG ET AL: "Preparation and Surface Modification of Biomedical Porous Metal Materials", 30 August 2016, Metallurgical Industry Press * |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109710930A (en) * | 2018-12-20 | 2019-05-03 | 重庆邮电大学 | A kind of Chinese Resume analytic method based on deep neural network |
CN109754012A (en) * | 2018-12-29 | 2019-05-14 | 新华三大数据技术有限公司 | Entity Semantics relationship classification method, model training method, device and electronic equipment |
CN109871414A (en) * | 2019-01-15 | 2019-06-11 | 大连交通大学 | Biomedical entity relationship classification method based on the context vector kernel of graph |
CN110060773A (en) * | 2019-04-22 | 2019-07-26 | 东华大学 | Alzheimer's disease progression of the disease forecasting system based on two-way LSTM |
CN110060773B (en) * | 2019-04-22 | 2023-10-27 | 东华大学 | Alzheimer's disease progression prediction system based on bidirectional LSTM |
CN112036181A (en) * | 2019-05-14 | 2020-12-04 | 上海晶赞融宣科技有限公司 | Entity relationship identification method and device and computer readable storage medium |
CN111950279A (en) * | 2019-05-17 | 2020-11-17 | 百度在线网络技术(北京)有限公司 | Entity relationship processing method, device, equipment and computer readable storage medium |
CN110377912A (en) * | 2019-07-24 | 2019-10-25 | 贵州大学 | A kind of relation recognition method based on multichannel deep neural network |
CN110688486A (en) * | 2019-09-26 | 2020-01-14 | 北京明略软件系统有限公司 | Relation classification method and model |
CN111222338A (en) * | 2020-01-08 | 2020-06-02 | 大连理工大学 | Biomedical relation extraction method based on pre-training model and self-attention mechanism |
CN111353306A (en) * | 2020-02-22 | 2020-06-30 | 杭州电子科技大学 | Entity relationship and dependency Tree-LSTM-based combined event extraction method |
CN111859967A (en) * | 2020-06-12 | 2020-10-30 | 北京三快在线科技有限公司 | Entity identification method and device and electronic equipment |
CN111859967B (en) * | 2020-06-12 | 2024-04-09 | 北京三快在线科技有限公司 | Entity identification method and device and electronic equipment |
WO2022028692A1 (en) * | 2020-08-05 | 2022-02-10 | Siemens Aktiengesellschaft | Enhancement of bootstrapping for information extraction |
CN113032618A (en) * | 2021-03-26 | 2021-06-25 | 齐鲁工业大学 | Music recommendation method and system based on knowledge graph |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108875809A (en) | Biomedical entity relationship classification method combining attention mechanism and neural network | |
CN112214995B (en) | Hierarchical multitasking term embedded learning for synonym prediction | |
CN107992597B (en) | Text structuring method for power grid fault case | |
CN109446338B (en) | Neural network-based drug disease relation classification method | |
CN112001185B (en) | Emotion classification method combining Chinese syntax and graph convolution neural network | |
CN110287481B (en) | Named entity corpus labeling training system | |
Chang et al. | Chinese named entity recognition method based on BERT | |
CN108984526A (en) | Document topic vector extraction method based on deep learning | |
CN108460089A (en) | Chinese text classification method fusing diverse features based on attention neural networks | |
CN112001186A (en) | Emotion classification method using graph convolution neural network and Chinese syntax | |
CN111222318B (en) | Trigger word recognition method based on double-channel bidirectional LSTM-CRF network | |
CN111554360A (en) | Drug relocation prediction method based on biomedical literature and domain knowledge data | |
CN110750645A (en) | Cross-domain fake review detection method based on adversarial training | |
CN110083836A (en) | Key evidence extraction method for text prediction results | |
CN115952292B (en) | Multi-label classification method, apparatus and computer readable medium | |
CN111476024A (en) | Text word segmentation method and device and model training method | |
CN111582506A (en) | Multi-label learning method based on global and local label relation | |
Lim et al. | Bibliographic analysis on research publications using authors, categorical labels and the citation network | |
CN112836051A (en) | Online self-learning court electronic file text classification method | |
CN110245234A (en) | Multi-source data sample association method based on ontology and semantic similarity | |
Kang et al. | A Research Toward Chinese Named Entity Recognition Based on Transfer Learning | |
CN116245107A (en) | Electric power audit text entity identification method, device, equipment and storage medium | |
CN116070700A (en) | Biomedical relation extraction method and system integrating iterative active learning | |
Kim | Research on Text Classification Based on Deep Neural Network | |
Kohsasih et al. | Sentiment Analysis for Financial News Using RNN-LSTM Network |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20181123 |