CN106294322A - A Chinese zero anaphora resolution method based on LSTM - Google Patents

A Chinese zero anaphora resolution method based on LSTM

Info

Publication number
CN106294322A
CN106294322A
Authority
CN
China
Prior art keywords
lstm
word
zero
layer
extraction
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610633621.2A
Other languages
Chinese (zh)
Inventor
赵铁军 (Zhao Tiejun)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Harbin Institute of Technology
Original Assignee
Harbin Institute of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Harbin Institute of Technology
Priority to CN201610633621.2A
Publication of CN106294322A
Legal status: Pending

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/205 Parsing
    • G06F 40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

A Chinese zero anaphora resolution method based on LSTM. The present invention relates to a Chinese zero anaphora resolution method based on LSTM, and aims to solve the problems that existing methods achieve low accuracy on the Chinese zero anaphora resolution task and understand semantic information poorly. Step 1: each word in the existing text data is preprocessed, and the word2vec tool is used to train on each word of the processed text data, yielding a word vector dictionary. Step 2: an antecedent candidate set is selected for each zero anaphor. Step 3: if a candidate phrase in the antecedent candidate set of the current zero anaphor is the true antecedent of the zero anaphor, the corresponding training sample is a positive example; otherwise it is a negative example. Step 4: a Dropout layer is followed by a logistic regression layer whose output represents the probability that the model input sample is judged a positive example, and this value serves as the output of the model. The present invention is applicable to the field of natural language processing.

Description

A Chinese zero anaphora resolution method based on LSTM
Technical field
The present invention relates to a Chinese zero anaphora resolution method based on LSTM.
Background technology
Anaphora means that a referring expression in a discourse points back to a linguistic unit mentioned earlier. In linguistics, the referring expression is called the anaphor, and the object or content it refers to is called the antecedent. Anaphora is a rhetorical phenomenon in which the same word, the same person, or the same thing is mentioned again and again within a passage or a discourse. Anaphora resolution is precisely the process of determining the relation between an anaphor and its antecedent, and is one of the key problems of natural language processing. In natural language, parts whose referents the reader can infer from context are often omitted; the omitted part still serves a syntactic role in the sentence and refers back to a linguistic unit mentioned earlier. This phenomenon is called zero anaphora, i.e., the position where a referring expression should appear is instead filled by a zero pronoun. For example, in a sentence like "When Xiao Qin was nine, (she) was cooking lunch at noon, (she) heard her mother humming pleasantly, (she) stood by the table and listened for a while, and (she) even forgot about the cooking," every omitted subject refers to "Xiao Qin," yet no overt personal pronoun appears; zero anaphora is used throughout, but the understanding of the full sentence is unaffected.
In East Asian languages such as Chinese, up to 36% of syntactic constituents may be omitted, which shows how pervasive zero anaphora is in Chinese. The prevalence of zero anaphora makes research in many areas of Chinese language processing difficult. For example, in machine translation, when the meaning represented by the omitted part is unknown, the Chinese sentence cannot be translated correctly into the target language. Research on Chinese zero anaphora is therefore one of the key and hot topics of natural language processing and is of great importance for natural-language text understanding. Within a discourse, much information is often omitted to keep the text concise; human readers recover it from context, but machines cannot understand the omitted parts, so a method is needed to recover the omitted information from the text. Research on Chinese zero anaphora was proposed precisely to solve this kind of problem. It not only plays an important role in information extraction, but is also crucial in applications such as machine translation and text classification.
Early research on zero anaphora mainly resolved it with logical rules built from the syntactic features of the language; representative approaches include centering theory and syntax-based methods. The main problems of such methods are that the representations and procedures are very difficult to build and they require substantial manual intervention, while the resulting systems have poor portability and a low degree of automation. Machine learning methods were therefore applied to anaphora resolution, such as decision trees, SVMs, and tree-kernel methods. However, methods based on syntactic feature vectors or syntax-tree structures have struggled to further improve the accuracy of zero anaphora resolution. With the rise and development of deep learning research, word vectors are increasingly used in natural language processing and have achieved good results; the "word vector" representation of words is a core technique that deep learning has brought to the NLP field. Using word vectors and neural network methods to tackle zero anaphora resolution has thus become a necessary attempt and innovation.
At present, methods for Chinese zero anaphora resolution fall into three classes:
(1) Chinese zero anaphora resolution is treated as a binary classification task. For each zero-anaphor position in a sentence, its antecedent candidate set is first determined by rules; features are then extracted from the complete syntax tree according to designed feature templates to obtain positive and negative training samples, and a binary classifier is trained to perform Chinese zero anaphora resolution.
(2) The problem is likewise treated as binary classification. The zero-anaphor positions, antecedent candidates, and positive/negative labels are first determined on the complete syntax tree; the subtree containing the zero-anaphor position and the antecedent candidate is extracted, and, following the tree-kernel principle, the SVM-TK tool is used to train a binary classifier for zero anaphora resolution.
(3) Unsupervised methods. Many unsupervised methods have also been applied to the Chinese zero anaphora resolution problem, such as ranking models, integer linear programming models, and probabilistic models.
The traditional methods above only exploit the syntactic information of the context around the zero-anaphor position in the sentence and do not exploit its semantic information, which leads to low accuracy on the Chinese zero anaphora resolution task and poor understanding of semantic information.
Summary of the invention
The purpose of the invention is to overcome the shortcomings of existing methods, namely low accuracy on the Chinese zero anaphora resolution task and poor understanding of semantic information, by proposing a Chinese zero anaphora resolution method based on LSTM.
The above object of the invention is achieved through the following technical solution:
Step 1: each word in the existing text data is preprocessed, and the word2vec tool is used to train on each word of the processed text data, yielding a word vector dictionary in which every word corresponds to a word vector;
Step 2: the Chinese data in the OntoNotes 5.0 corpus are used; in these data, the zero anaphors of each sentence and their antecedents are explicitly annotated. Each sentence with annotated zero-anaphor positions is first converted into a complete syntax tree with a syntactic parser; in the complete syntax tree, among all NP nodes that appear before the zero-anaphor position, the maximal NP nodes and the modifier NP nodes are chosen as the antecedent candidate set of that zero anaphor;
NP denotes a noun phrase;
Step 3: keywords are extracted from the part of the sentence that follows the zero-anaphor position, and each noun phrase in the antecedent candidate set of the zero anaphor forms one training sample with those keywords; if the candidate phrase in the antecedent candidate set of the current zero anaphor is the true antecedent of the zero anaphor, the training sample is a positive example, otherwise a negative example;
Step 4: all words in the positive and negative samples form a word dictionary, and each word is assigned an id label; every word in the positive and negative samples is replaced with its id label, producing the word sequence that serves as the model input. The input word sequence is connected to an Embedding layer, which converts the input id labels into word vectors; all word vectors of the Embedding layer are initialized with the word vector dictionary obtained in Step 1. The Embedding layer is connected to a bidirectional LSTM layer; the outputs of the bidirectional LSTM layer at each time step are concatenated and fed into a Dropout layer. The Dropout layer is connected to a logistic regression layer, which outputs a value between 0 and 1 representing the probability that the model input sample is judged a positive example; this value serves as the output of the model;
The Embedding layer is an embedding layer; LSTM stands for long short-term memory.
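The word-dictionary and id-label replacement of Step 4 can be sketched as follows; the sample word lists and the choice of reserving id 0 for the "*" filler symbol are illustrative assumptions, not taken from the patent:

```python
def build_word_dict(samples):
    """Assign an integer id label to every word seen in the training samples.
    Id 0 is reserved for the '*' filler so it can map to a zero vector later."""
    word_dict = {"*": 0}
    for sample in samples:
        for word in sample:
            if word not in word_dict:
                word_dict[word] = len(word_dict)
    return word_dict

def to_id_sequence(sample, word_dict):
    """Replace each word in a sample with its id label (the model input)."""
    return [word_dict[w] for w in sample]

# Two toy samples in the [candidate words + keywords] form used by the method.
samples = [["产品", "对外", "贸易", "占", "比重", "*"],
           ["*", "*", "中国", "占", "比重", "*"]]
word_dict = build_word_dict(samples)
seq = to_id_sequence(samples[0], word_dict)
```

The resulting id sequences are what the Embedding layer consumes; the Embedding row for id 0 stays a zero vector.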
Effects of the invention
The related research of the present invention not only provides evidence for theories in informatics and linguistics, but also advances natural language understanding. The present invention addresses the problem that traditional methods only exploit lexical and syntactic-structure information or statistical probability information and do not resolve the Chinese zero anaphora task at the level of semantic analysis, and it novelly proposes to perform this task with word vectors and an LSTM model. On the same data set, the F1-score of the present invention is 5.8% higher than that of traditional supervised methods and 2% higher than that of unsupervised methods. Word vectors trained on corpus data have been shown to carry specific structural and semantic information and are a good form of semantic representation. The present invention proposes a keyword extraction method that extracts, from the words following the zero-anaphor position in a sentence, those related to the antecedent, and forms one sample with each antecedent candidate; Chinese zero anaphora resolution is thus converted into a binary classification task, a bidirectional LSTM neural network structure suited to this classification problem is redesigned, and the binary classification model is obtained by training. With this model, performing Chinese zero anaphora resolution only requires converting a sentence into the corresponding input form and feeding it to the model to obtain the classification result. This overcomes the shortcoming of existing methods that only exploit the syntactic information of the context around the zero-anaphor position without exploiting its semantic information, leading to low accuracy on the Chinese zero anaphora resolution task and poor understanding of semantic information; by taking semantic information into account, the present invention improves both the accuracy of the Chinese zero anaphora resolution task and the accuracy of semantic understanding.
The present invention proposes a keyword extraction method: the related nouns and verbs are extracted from the text that follows the zero-anaphor position in the sentence, and a keyword-length parameter is set; if more keywords are extracted than this parameter allows, they are trimmed, otherwise they are padded. For the antecedent candidate phrases of the zero anaphor, since the number of words in a phrase is not fixed, a word-count parameter is likewise set and the phrase is trimmed or padded accordingly.
The present invention uses word vectors as a form of semantic representation and uses a bidirectional LSTM neural network to model semantic relations. By semantically modeling the antecedent candidate phrase together with the keywords following the zero anaphor in the sentence, the semantic relation between the two is discovered, so that Chinese zero anaphora resolution can be performed better at the semantic level.
The present invention uses a bidirectional LSTM network. The word vector dictionary obtained by training is used to initialize the Embedding layer parameters in the LSTM network. The bidirectional LSTM layer consists of a forward LSTM layer and a backward LSTM layer; the outputs of these two LSTM layers at each time step serve as the input of a logistic regression layer, and the output of the logistic regression layer finally serves as the output of the binary classification model.
The flow and effect of the invention are illustrated with the sentence "China's foreign trade in mechanical and electrical products continues to increase; *pro* accounts for a share of total imports and exports that continues to rise." Here "*pro*" marks the position where the zero anaphor occurs. The sentence is converted into a complete syntax tree, and from the NP nodes appearing before "*pro*" the antecedent candidate phrases of the zero anaphor are determined as: "China's foreign trade in mechanical and electrical products", "China", and "mechanical and electrical products". The true antecedent of "*pro*" is "China's foreign trade in mechanical and electrical products". Following the keyword extraction rules, the keywords extracted from the text after the zero-anaphor position "*pro*" are: "accounts for", "imports and exports", "share", "continues", "rise". The maximum number of keywords is set to 6 and the maximum number of words in an antecedent candidate phrase to 3; where there are not enough words, the symbol "*" is used as filler. Three samples are obtained: [products foreign trade accounts-for imports-exports share continues rise *], [* * China accounts-for imports-exports share continues rise *] and [* mechanical-electrical products accounts-for imports-exports share continues rise *]. The words in these samples are replaced with word ids via the word dictionary and then input into the trained binary classification model based on the bidirectional LSTM. The model classifies [products foreign trade accounts-for imports-exports share continues rise *] as a positive example and the other two samples as negative examples, recognizing "China's foreign trade in mechanical and electrical products" as the true antecedent of "*pro*".
Accompanying drawing explanation
Fig. 1 is the overall flowchart of Chinese zero anaphora resolution based on a bidirectional LSTM;
Fig. 2 is the structure diagram of the bidirectional LSTM network proposed in Embodiment 1;
Fig. 3 is the structure diagram of a conventional network;
Fig. 4 is the structure diagram of a dropout network.
Detailed description of the invention
Embodiment 1: this embodiment is described with reference to Fig. 1. A Chinese zero anaphora resolution method based on word vectors and a bidirectional LSTM according to this embodiment is specifically carried out according to the following steps:
Step 1: each word in the existing text data is simply preprocessed, and the word2vec tool is used to train on each word of the processed text data (word2vec is an open-source tool that converts the words of pre-segmented text into corresponding vectors through its internal model), yielding a word vector dictionary in which every word corresponds to a word vector;
Step 2: the Chinese portion of the OntoNotes 5.0 corpus is used; in this portion, the zero anaphors of each sentence and their antecedents are explicitly annotated. Each sentence text with annotated zero-anaphor positions is first converted into a complete syntax tree with a syntactic parser (a tool that converts a sentence into tree form, e.g. the Stanford Parser). In the complete syntax tree, among all NP (noun phrase) nodes appearing before the zero-anaphor position, the maximal NP nodes (those with no NP node among their ancestors) and the modifier NP nodes (those whose parent is an NP node and whose right sibling is also an NP node) are chosen as the antecedent candidate set of the zero anaphor;
NP denotes a noun phrase;
Step 3: keywords are extracted from the part of the sentence after the zero-anaphor position (from the zero-anaphor position to the end of the sentence), and each noun phrase NP in the antecedent candidate set of the zero anaphor forms one training sample with those keywords; if the candidate phrase in the antecedent candidate set of the current zero anaphor is the true antecedent of the zero anaphor, the training sample is a positive example, otherwise a negative example;
Step 4: all words in the positive and negative samples form a word dictionary, and each word is assigned an id label; every word in the positive and negative samples is replaced with its id label, producing the word sequence that serves as the model input. The input word sequence is connected to an Embedding layer, which converts the input id labels into word vectors; all word vector parameters of the Embedding layer are initialized with the word vector dictionary obtained in Step 1. The Embedding layer is connected to a bidirectional LSTM layer, which is used to extract features; the outputs of the bidirectional LSTM layer at each time step are concatenated and fed into a Dropout layer. The Dropout layer is connected to a logistic regression layer, which outputs a value between 0 and 1 representing the probability that the model input sample is judged a positive example; this value serves as the output of the model;
The Embedding layer is an embedding layer; LSTM stands for long short-term memory; the Dropout layer is a special network structure: during model training, the dropout layer randomly disables a certain proportion of the hidden units, as shown in Fig. 3 and Fig. 4 (Fig. 3 is the structure diagram of a conventional network and Fig. 4 that of a dropout network);
Dropout means that during model training the weights of some hidden-layer nodes of the network are randomly made inactive. The inactive nodes can temporarily be regarded as not being part of the network structure, but their weights must be kept (they are merely not updated), because they may be active again when the next sample is input (this is somewhat abstract; see the experimental part below for the concrete implementation). Dropout may be considered a special kind of network structure.
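The dropout behavior described above can be sketched as follows; the mask-based formulation and the fixed seed are illustrative assumptions (the patent does not specify an implementation, and the common inverted-scaling trick is omitted for simplicity):

```python
import random

def dropout_mask(n_units, p=0.5, seed=42):
    """During training, randomly disable a proportion p of hidden units.
    Returns a 0/1 mask; 0 means the unit does not participate in this pass."""
    rng = random.Random(seed)
    return [0 if rng.random() < p else 1 for _ in range(n_units)]

def apply_dropout(activations, mask):
    """Dropped units output 0; their weights are kept, just not used/updated."""
    return [a * m for a, m in zip(activations, mask)]

mask = dropout_mask(8, p=0.5, seed=42)
out = apply_dropout([1.0] * 8, mask)
```

At test time no mask is applied, so the full connectivity is restored, matching the description in the experimental part below.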
Embodiment 2: this embodiment differs from Embodiment 1 in that the simple preprocessing of the existing text data in Step 1 is: a word segmentation program is used to segment the sentences of the existing text data, and special characters are removed so that only Chinese characters, English and punctuation are retained (special characters include Greek letters, Russian letters, phonetic notation symbols, special symbols, etc.).
Embodiment 3: this embodiment differs from Embodiments 1 and 2 in that the antecedent candidate set in Step 2 is processed as follows:
The maximum number of words in an antecedent candidate is set to n, 1 ≤ n ≤ maxW, where maxW denotes the maximum number of words in a sentence;
If the number of words in an antecedent candidate is less than n, the symbol * is used as filler until the number of words equals n;
If the number of words in an antecedent candidate is greater than n, only the last n words are kept;
At the stage where words are mapped to word vectors, * is mapped to the zero vector.
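Under the assumption, suggested by the worked example above (where the one-word candidate "China" becomes [* * China]), that the filler is applied on the left, the candidate processing can be sketched as:

```python
def fit_candidate(words, n):
    """Left-pad an antecedent candidate with '*' up to n words, or keep only
    its last n words if it is too long. Left padding is an assumption drawn
    from the worked example; the rule text does not fix the direction."""
    if len(words) < n:
        return ["*"] * (n - len(words)) + words
    return words[-n:]

padded = fit_candidate(["中国"], 3)              # too short: pad
trimmed = fit_candidate(["甲", "乙", "丙", "丁"], 3)  # too long: keep last 3
```

Keeping the last n words preserves the head of a Chinese noun phrase, which is typically phrase-final; this is consistent with the right-aligned padding.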
Embodiment 4: this embodiment differs from Embodiments 1 to 3 in that keywords are extracted in Step 3 from the part of the sentence after the zero-anaphor position (from the zero-anaphor position to the end of the sentence); the detailed procedure is:
The maximum number of keywords is set to m, 1 ≤ m ≤ maxW, where maxW denotes the maximum number of words in a sentence; the keyword extraction rule is: extract the nouns and verbs in the sentence;
If the total number of extracted words is less than m, the symbol * is used as filler until m words are reached;
If the total number of extracted words equals m, no extra processing is needed;
If the total number of extracted words is greater than m, the extracted words are trimmed: first the modifier nouns are deleted and the total number of remaining words is computed; if it equals m, no extra processing is needed, and if it is less than m, the symbol * is used as filler until m words are reached;
If the total number of extracted words is still greater than m, the nouns other than the modifier nouns are deleted next and the total number of remaining words is computed; if it is less than m, the symbol * is used as filler until m words are reached, and if it equals m, no extra processing is needed; if it is still greater than m, the verbs are deleted next and the total number of words remaining after deleting the verbs is computed; if it is less than m, the symbol * is used as filler until m words are reached, and if it equals m, no extra processing is needed.
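The padding/trimming cascade can be sketched as follows; the POS tag names ("mod_noun", "noun", "verb") and the sample tagged words are illustrative assumptions, and trimming deletes one whole category at a time in the order the rules give:

```python
def extract_keywords(tagged_words, m):
    """Keep nouns and verbs from the text after the zero-anaphor position,
    then trim whole categories (modifier nouns, then other nouns, then verbs)
    while more than m words remain, and finally pad with '*' up to m words."""
    kept = [(w, t) for w, t in tagged_words if t in ("mod_noun", "noun", "verb")]
    for tag in ("mod_noun", "noun", "verb"):
        if len(kept) > m:
            kept = [(w, t) for w, t in kept if t != tag]
    words = [w for w, _ in kept]
    return words + ["*"] * (m - len(words)) if len(words) < m else words[:m]

tagged = [("占", "verb"), ("进出口", "noun"), ("总额", "mod_noun"),
          ("比重", "noun"), ("继续", "verb"), ("上升", "verb"), ("的", "other")]
k6 = extract_keywords(tagged, 6)   # fits exactly, no trimming
k5 = extract_keywords(tagged, 5)   # one over: modifier noun dropped
k8 = extract_keywords(tagged, 8)   # too few: padded with '*'
```

Deleting a whole category can overshoot below m, which is why the rules follow each deletion with a padding step; the function mirrors that behavior.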
Embodiment 5: this embodiment differs from Embodiments 1 to 4 in that the bidirectional LSTM layer in Step 4 consists of a forward LSTM layer and a backward LSTM layer, as shown in Fig. 2; the role of the LSTM layers is to extract features from the input keyword sequence;
All words in a positive or negative sample are input in forward order into the forward LSTM layer and in reverse order into the backward LSTM layer; the bidirectional LSTM layers are used to store the information of the two input directions separately. In theory this enables the model, when processing the data of the current time step, to use the contextual information of the whole sequence; finally the outputs of the two LSTM layers at each time step are concatenated. The design of the three gates and of the independent memory cell gives the LSTM unit the ability to store, read, reset and update long-distance historical information.
Embodiment 6: this embodiment differs from Embodiments 1 to 5 in that the LSTM layer consists of LSTM units, with one LSTM unit per time step. An LSTM unit receives one word vector at each time step and outputs one value; the output values of all time steps are concatenated (the concatenation of two vectors can be regarded as appending the second vector to the end of the first, merging them into a new vector) to obtain a feature vector, which is fed into the Dropout layer. The Dropout layer is connected to the logistic regression layer, which outputs a value between 0 and 1 representing the probability that the input sample is judged a positive example; this value serves as the output of the model.
Embodiment 7: this embodiment differs from Embodiments 1 to 6 in that the LSTM layer consists of LSTM units, with one LSTM unit per time step; an LSTM unit receives one word vector at each time step and outputs one value. The detailed procedure is:
The LSTM unit uses a specially designed memory cell to store historical information; the updating and use of the historical information are controlled by three gates: the input gate, the forget gate, and the output gate.
Let h denote the output data of the LSTM unit, c the memory cell value (with c̃ the candidate memory cell value), and x the input data of the LSTM unit;
(1) The candidate memory cell value at the current time step is computed as in a traditional RNN, where W_xc and W_hc are the weight parameters of the LSTM unit's input x_t at the current time step and of the LSTM unit's output h_{t-1} at the previous time step, b_c is a bias parameter, and tanh is the activation function:
c̃_t = tanh(W_xc x_t + W_hc h_{t-1} + b_c)
(2) The value i_t of the input gate is computed; the input gate controls the influence of the current input on the memory cell state value. The computation of every gate is influenced not only by the current input x_t and the previous LSTM unit output h_{t-1}, but also by the memory cell value c_{t-1} of the previous time step:
i_t = σ(W_xi x_t + W_hi h_{t-1} + W_ci c_{t-1} + b_i)
where W_xi is the weight parameter of the current input x_t, W_hi the weight parameter of the previous output h_{t-1}, W_ci the weight parameter of the previous memory cell value c_{t-1}, and b_i a bias parameter; σ is the activation function;
(3) The value f_t of the forget gate is computed; the forget gate controls the influence of historical information on the current memory cell state value:
f_t = σ(W_xf x_t + W_hf h_{t-1} + W_cf c_{t-1} + b_f)
where W_xf is the weight parameter of the current input x_t, W_hf the weight parameter of the previous output h_{t-1}, W_cf the weight parameter of the previous memory cell value c_{t-1}, and b_f a bias parameter;
(4) The memory cell value c_t at the current time step is computed:
c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t
where ⊙ denotes the pointwise product. The formula shows that the memory cell update depends on the memory cell value c_{t-1} of the previous time step and on the candidate memory cell value c̃_t of the current time step, and that these two contributions are regulated by the forget gate and the input gate respectively.
(5) The output gate o_t is computed; it controls the output of the memory cell state value:
o_t = σ(W_xo x_t + W_ho h_{t-1} + W_co c_{t-1} + b_o)
where W_xo is the weight parameter of the current input x_t, W_ho the weight parameter of the previous output h_{t-1}, W_co the weight parameter of the previous memory cell value c_{t-1}, and b_o a bias parameter;
(6) Finally, the output of the LSTM unit is:
h_t = o_t ⊙ tanh(c_t).
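Equations (1) to (6) can be checked with a minimal scalar LSTM step (one-dimensional states, so every weight W is a single number); the concrete weight values are arbitrary assumptions for illustration:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W):
    """One LSTM step with peephole terms W_c*, following equations (1)-(6).
    W maps parameter names to scalar weights and biases."""
    c_tilde = math.tanh(W["xc"] * x_t + W["hc"] * h_prev + W["bc"])                 # (1)
    i = sigmoid(W["xi"] * x_t + W["hi"] * h_prev + W["ci"] * c_prev + W["bi"])      # (2)
    f = sigmoid(W["xf"] * x_t + W["hf"] * h_prev + W["cf"] * c_prev + W["bf"])      # (3)
    c = f * c_prev + i * c_tilde                                                    # (4)
    o = sigmoid(W["xo"] * x_t + W["ho"] * h_prev + W["co"] * c_prev + W["bo"])      # (5)
    h = o * math.tanh(c)                                                            # (6)
    return h, c

W = {k: 0.5 for k in ("xc", "hc", "xi", "hi", "ci", "xf", "hf", "cf",
                      "xo", "ho", "co")}
W.update(bc=0.0, bi=0.0, bf=0.0, bo=0.0)
h1, c1 = lstm_step(x_t=1.0, h_prev=0.0, c_prev=0.0, W=W)
```

With zero initial state, equation (4) reduces to c_1 = i_1 · c̃_1, which makes the step easy to verify by hand.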
Embodiment 8: this embodiment differs from Embodiments 1 to 7 in that σ is usually taken to be the logistic sigmoid function, whose values lie in the range (0, 1).
Embodiment 9: this embodiment differs from Embodiments 1 to 8 in that logistic regression performs the binary classification on the values output by the LSTM units. The output of the logistic regression layer is the probability that the model's input sample is predicted to be a positive example (the final output of the model proposed in this patent is exactly this probability value; the more accurate this probability, the better the model). The detailed procedure by which this value serves as the model output is:
The classification formula is:
p(y = 1 | x) = exp(w·x + b) / (1 + exp(w·x + b))
where x is the feature vector output by the dropout network, w is the weight vector, b is the bias, and y is the classification label, either the positive-example label or the negative-example label. The logistic regression p(y = 1 | x) computes the probability that y is the positive-example label given the input feature vector x, within the Chinese zero anaphora resolution framework based on the bidirectional LSTM model.
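The classification formula can be computed directly; the feature vector, weights and bias below are arbitrary illustrative values:

```python
import math

def positive_probability(x, w, b):
    """p(y=1|x) = exp(w·x+b) / (1+exp(w·x+b)): the logistic sigmoid of the
    score w·x+b for the feature vector x coming from the dropout layer."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return math.exp(score) / (1.0 + math.exp(score))

p = positive_probability(x=[0.2, -0.1, 0.4], w=[1.0, 2.0, -0.5], b=0.1)
```

Dividing numerator and denominator by exp(w·x+b) shows this is the standard sigmoid 1/(1+exp(-(w·x+b))), so the output always lies strictly between 0 and 1.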
To prevent the neural network from overfitting, the dropout technique is used. During model training, the dropout layer randomly deactivates a certain proportion of the hidden nodes (the proportion p is usually taken to be 0.5); the weights corresponding to the inactive nodes are not updated in the current training pass. When the model is used, however, all nodes are used and the full connectivity is restored. Overfitting is prevented through this mechanism.
The construction process of the whole LSTM binary classification network is: in the data preprocessing stage, the extracted keyword sequences are converted into word-label sequences using the word dictionary; these word-label sequences then serve as the input of the neural network and are connected to the Embedding layer, which converts the word label of each time step into a word vector and passes the sequence in forward order to the forward LSTM layer and in reverse order to the backward LSTM layer. The two LSTM layers each produce an output at every time step; these outputs are spliced horizontally (joined by a concatenate operation) and fed into the dropout layer; the output of the dropout layer is fed into the logistic regression classification layer, which finally outputs the classification probability.
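The splice-and-classify head at the end of this network can be sketched as follows; the per-time-step forward and backward outputs are made-up stand-ins for the two LSTM layers, and dropout is treated as the identity, as it is at test time:

```python
import math

def splice(forward_outputs, backward_outputs):
    """Horizontally concatenate the forward and backward LSTM outputs of each
    time step into a single feature vector."""
    feats = []
    for f, b in zip(forward_outputs, backward_outputs):
        feats.extend([f, b])
    return feats

def classify(features, w, b):
    """Logistic regression head on the (dropout-passed) feature vector."""
    score = sum(wi * xi for wi, xi in zip(w, features)) + b
    return 1.0 / (1.0 + math.exp(-score))

fwd = [0.1, 0.3, -0.2]   # assumed per-time-step forward LSTM outputs
bwd = [0.2, -0.1, 0.4]   # assumed per-time-step backward LSTM outputs
feats = splice(fwd, bwd)
prob = classify(feats, w=[0.5] * len(feats), b=0.0)
```

A sample is then labeled a positive example (true antecedent) when this probability exceeds a threshold such as 0.5.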
The following example verifies the beneficial effects of the present invention:
Example 1: this example is specifically carried out according to the following steps:
(1) Sample extraction. Sentences containing Chinese zero anaphors and their complete syntax trees are extracted from the OntoNotes 5.0 corpus. The antecedent candidate set is extracted from the complete syntax tree of each sentence. Each antecedent candidate phrase forms one sample with the zero anaphor; whether the sample is a positive or a negative example is determined by whether the candidate phrase is the true antecedent of the zero anaphor.
(2) Keyword extraction. With the keyword extraction strategy proposed by the present invention, the keywords from the zero-anaphor position to the end of the sentence and the keywords of the candidate phrase are extracted; finally these keywords are replaced with word labels according to the word dictionary.
(3) The positive and negative training samples are fed into the bidirectional LSTM model framework proposed by the present invention; after training, a Chinese zero anaphora resolution model is obtained.
(4) Finally, new test samples (obtained with the same method and from the same corpus) are fed into the model; the test data are obtained from the model's predictions and the true labels of the test samples.
The test results are as follows:

Precision    Recall    F1
50.7         50.7      50.7
The present invention may also have various other embodiments. Without departing from the spirit and essence of the present invention, those skilled in the art can make various corresponding changes and variations according to the present invention, but all such changes and variations shall fall within the protection scope of the appended claims of the present invention.

Claims (9)

1. A Chinese zero anaphora resolution method based on LSTM, characterized in that the method is specifically carried out according to the following steps:
Step 1: each word in the existing text data is processed; the word2vec tool is used to train on each word of the processed text data, obtaining a word-vector dictionary in which every word corresponds to a word vector;
Step 2: the Chinese data of the OntoNotes 5.0 corpus are used; in these data the zero anaphors and the antecedents of each sentence are explicitly annotated. A sentence whose zero-anaphor position has been annotated is first converted into a complete syntax tree with a syntactic parsing tool; among all NP nodes of the complete syntax tree that occur before the zero-anaphor position, the maximal NP nodes and the modified NP nodes are chosen as the antecedent candidate set of that zero anaphor;
NP here denotes a noun phrase;
Step 3: keywords are extracted from the part of the sentence that occurs after the zero-anaphor position, and one training sample is formed with each noun phrase in the zero anaphor's antecedent candidate set; if the candidate phrase from the antecedent candidate set is the true antecedent of the zero anaphor, the training sample is a positive example, otherwise it is a negative example;
Step 4: all words of the positive and negative example samples form a word dictionary, and each word is assigned an id label; all words in the samples are replaced by their id labels, yielding word sequences that serve as the model input. The input word sequence is connected to an Embedding layer, which converts the input id labels into word vectors; the word-vector dictionary obtained in Step 1 is used to initialize all word vectors of the Embedding layer. The Embedding layer is connected to a bidirectional LSTM network layer; at each time step the output results of the bidirectional LSTM layer are concatenated and fed into a Dropout layer. The Dropout layer is connected to a logistic-regression layer, which outputs a value between 0 and 1 representing the probability that the model input sample is judged a positive example; this value is the output of the model;
The Embedding layer is an embedding layer; LSTM denotes the long short-term memory model.
2. The Chinese zero anaphora resolution method based on LSTM, characterized in that the processing of the existing text data in Step 1 is: a word-segmentation program is used to segment the sentences of the existing text data, and special characters are removed so that only Chinese characters, English and punctuation are retained.
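The cleanup step of this claim might be expressed as a regular-expression filter like the one below. The exact character classes retained (the CJK ideograph range, ASCII letters, and a handful of punctuation marks) are assumptions, since the claim only says Chinese characters, English and punctuation are kept:

```python
import re

# Characters to keep: CJK ideographs, ASCII letters, and common Chinese/
# English punctuation. Everything else counts as a special character.
_DROP = re.compile(r"[^\u4e00-\u9fffA-Za-z，。！？、；：,.!?;: ]")

def clean_sentence(s):
    """Remove special characters, retaining only Chinese, English and punctuation."""
    return _DROP.sub("", s)
```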
3. The Chinese zero anaphora resolution method based on LSTM, characterized in that the antecedent candidate set in Step 2 is processed as follows:
the maximum word count of an antecedent candidate is set to n, 1 ≤ n ≤ maxW, where maxW denotes the maximum word count of a sentence;
if the candidate's word count is less than n, it is padded with the symbol * until the word count equals n;
if the candidate's word count is greater than n, only the last n words are retained;
if the candidate's word count equals n, no processing is needed;
at the stage where words are mapped to word vectors, * is mapped to the zero vector.
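The length normalization of claim 3 is straightforward to express in code (a sketch; representing a candidate as a Python list of word strings is an illustrative choice):

```python
def normalize_candidate(words, n):
    """Pad with '*' up to n words, or keep only the last n words.
    At the vector-mapping stage '*' maps to the zero vector."""
    if len(words) < n:
        return words + ["*"] * (n - len(words))
    if len(words) > n:
        return words[-n:]          # keep only the last n words
    return words
```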
4. The Chinese zero anaphora resolution method based on LSTM, characterized in that in Step 3 keywords are extracted from the part of the sentence that occurs after the zero-anaphor position; the detailed process is:
the maximum number of keywords is set to m, 1 ≤ m ≤ maxW, where maxW denotes the maximum word count of a sentence; the keyword-extraction rule is: extract the nouns and the verbs of the sentence;
if the total number of extracted words is less than m, pad with the symbol * until m words are reached;
if the total number of extracted words equals m, do nothing;
if the total number of extracted words is greater than m, prune the extracted words: first delete the modified nouns and recount the extracted words; if the total now equals m, do nothing; if it is less than m, pad with the symbol * until m words are reached;
if the total is still greater than m, further delete the nouns other than the modified nouns and recount; if the total is less than m, pad with the symbol * until m words are reached; if it equals m, do nothing; if it is still greater than m, delete the verbs and recount; if the total is less than m, pad with the symbol * until m words are reached; if it equals m, do nothing.
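The pruning cascade of claim 4 (drop modified nouns, then the remaining nouns, then verbs, padding with `*` whenever the count falls to m or below) can be sketched as below. The part-of-speech tags and the test for a "modified noun" are hypothetical placeholders; the claim does not specify how they are computed:

```python
def extract_keywords(tagged, m, is_modified_noun=lambda w: False):
    """tagged: list of (word, pos) pairs, pos in {'n', 'v', ...}.
    Returns exactly m keywords, pruned in the order the claim gives."""
    words = [(w, p) for w, p in tagged if p in ("n", "v")]  # nouns and verbs

    def finish(ws):
        # pad with '*' up to m words (no-op when already m words long)
        return [w for w, _ in ws] + ["*"] * (m - len(ws))

    if len(words) <= m:
        return finish(words)
    # 1) delete the modified nouns
    words = [(w, p) for w, p in words if not (p == "n" and is_modified_noun(w))]
    if len(words) <= m:
        return finish(words)
    # 2) delete the remaining nouns
    words = [(w, p) for w, p in words if p != "n"]
    if len(words) <= m:
        return finish(words)
    # 3) delete the verbs
    words = [(w, p) for w, p in words if p != "v"]
    return finish(words)
```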
5. The Chinese zero anaphora resolution method based on LSTM, characterized in that the bidirectional LSTM network layer in Step 4 comprises a forward LSTM layer and a backward LSTM layer; all words of the positive and negative example samples are input to the forward LSTM layer in forward order and to the backward LSTM layer in reverse order; the two LSTM layers preserve the information of the two input directions respectively.
6. The Chinese zero anaphora resolution method based on LSTM, characterized in that the LSTM layer is composed of LSTM units, one LSTM unit per time step; at each time step the LSTM unit receives one word vector as input and outputs one value; the output values of all time steps are concatenated into a feature vector and fed into the Dropout layer; the Dropout layer is connected to the logistic-regression layer, which outputs a value between 0 and 1 representing the probability that the input sample is judged a positive example; this value is the output of the model.
7. The Chinese zero anaphora resolution method based on LSTM, characterized in that the LSTM layer is composed of LSTM units, one LSTM unit per time step; at each time step the LSTM unit receives one word vector as input and outputs one value; the detailed process is:
(1) The candidate memory value of the current time step is calculated according to the formula of a traditional RNN:
c̃_t = tanh(W_xc x_t + W_hc h_{t-1} + b_c)
where W_xc and W_hc are the weight parameters of the LSTM unit's current input x_t and of the previous output h_{t-1} respectively, b_c is a bias parameter, and tanh is the activation function; RNN denotes a recurrent neural network;
(2) The value i_t of the input gate is calculated:
i_t = σ(W_xi x_t + W_hi h_{t-1} + W_ci c_{t-1} + b_i)
where W_xi is the weight parameter of the current input x_t, W_hi is the weight parameter of the previous output h_{t-1}, W_ci is the weight parameter of the previous memory-cell value c_{t-1}, and b_i is a bias parameter; σ is the activation function;
(3) The value f_t of the forget gate is calculated:
f_t = σ(W_xf x_t + W_hf h_{t-1} + W_cf c_{t-1} + b_f)
where W_xf is the weight parameter of the current input x_t, W_hf is the weight parameter of the previous output h_{t-1}, W_cf is the weight parameter of the previous memory-cell value c_{t-1}, and b_f is a bias parameter;
(4) The memory-cell value c_t of the current time step is calculated:
c_t = f_t ⊙ c_{t-1} + i_t ⊙ c̃_t
where ⊙ denotes the pointwise (element-wise) product;
(5) The output gate o_t is calculated:
o_t = σ(W_xo x_t + W_ho h_{t-1} + W_co c_{t-1} + b_o)
where W_xo is the weight parameter of the current input x_t, W_ho is the weight parameter of the previous output h_{t-1}, W_co is the weight parameter of the previous memory-cell value c_{t-1}, and b_o is a bias parameter;
(6) The output of the LSTM unit is
h_t = o_t ⊙ tanh(c_t).
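Taken together, the six equations of the claim translate directly into NumPy. The sketch below uses randomly shaped parameters for illustration (the peephole terms W_ci, W_cf, W_co are applied element-wise, an assumption; these are not the patent's trained parameters):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, P):
    """One LSTM time step following equations (1)-(6); the dictionary P
    holds the weight/bias parameters named as in the text."""
    c_cand = np.tanh(P["Wxc"] @ x_t + P["Whc"] @ h_prev + P["bc"])                     # (1)
    i_t = sigmoid(P["Wxi"] @ x_t + P["Whi"] @ h_prev + P["Wci"] * c_prev + P["bi"])    # (2)
    f_t = sigmoid(P["Wxf"] @ x_t + P["Whf"] @ h_prev + P["Wcf"] * c_prev + P["bf"])    # (3)
    c_t = f_t * c_prev + i_t * c_cand                                                  # (4)
    o_t = sigmoid(P["Wxo"] @ x_t + P["Who"] @ h_prev + P["Wco"] * c_prev + P["bo"])    # (5)
    h_t = o_t * np.tanh(c_t)                                                           # (6)
    return h_t, c_t
```

Running the step over a word-vector sequence forwards and backwards, and concatenating the per-step h_t values, yields the bidirectional feature vector that is fed to the dropout layer.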
8. The Chinese zero anaphora resolution method based on LSTM, characterized in that the value of σ lies in the range 0 ≤ σ ≤ 1.
9. The Chinese zero anaphora resolution method based on LSTM, characterized in that the values output by the LSTM units are concatenated into a feature vector and fed into the Dropout layer, which is connected to the logistic-regression layer; logistic regression is used for binary classification; the logistic-regression layer outputs a value between 0 and 1 representing the probability that the input sample is judged a positive example, and this value serves as the output of the model; the detailed process is:
The classification formula is
p(y = 1 | x) = exp(w·x + b) / (1 + exp(w·x + b))
where x is the feature vector output by the dropout network, w is a weight vector, b is a bias term, and y is the classification label, which is either the positive-example label or the negative-example label; the logistic regression p(y = 1 | x) computes the probability that y is the positive-example label given that the input feature vector is x, within the Chinese zero anaphora resolution framework based on the bidirectional LSTM model.
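The classification formula above is the standard logistic sigmoid applied to a linear score; a minimal sketch:

```python
import math

def logistic_probability(w, x, b):
    """p(y=1 | x) = exp(w.x + b) / (1 + exp(w.x + b)): the probability
    that the input feature vector x is a positive example."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return math.exp(score) / (1.0 + math.exp(score))
```

A score of zero gives probability 0.5, and the output always lies strictly between 0 and 1, matching the range required of the model's output.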
CN201610633621.2A 2016-08-04 2016-08-04 A kind of Chinese based on LSTM zero reference resolution method Pending CN106294322A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610633621.2A CN106294322A (en) 2016-08-04 2016-08-04 A kind of Chinese based on LSTM zero reference resolution method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610633621.2A CN106294322A (en) 2016-08-04 2016-08-04 A kind of Chinese based on LSTM zero reference resolution method

Publications (1)

Publication Number Publication Date
CN106294322A true CN106294322A (en) 2017-01-04

Family

ID=57664940

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610633621.2A Pending CN106294322A (en) 2016-08-04 2016-08-04 A kind of Chinese based on LSTM zero reference resolution method

Country Status (1)

Country Link
CN (1) CN106294322A (en)

Cited By (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106919646A (en) * 2017-01-18 2017-07-04 南京云思创智信息科技有限公司 Chinese text summarization generation system and method
CN107145483A (en) * 2017-04-24 2017-09-08 北京邮电大学 A kind of adaptive Chinese word cutting method based on embedded expression
CN107203813A (en) * 2017-05-22 2017-09-26 成都准星云学科技有限公司 A kind of new default entity nomenclature and its system
CN107330032A (en) * 2017-06-26 2017-11-07 北京理工大学 A kind of implicit chapter relationship analysis method based on recurrent neural network
CN107679035A (en) * 2017-10-11 2018-02-09 石河子大学 A kind of information intent detection method, device, equipment and storage medium
CN107797989A (en) * 2017-10-16 2018-03-13 平安科技(深圳)有限公司 Enterprise name recognition methods, electronic equipment and computer-readable recording medium
CN108287817A (en) * 2017-05-08 2018-07-17 腾讯科技(深圳)有限公司 A kind of information processing method and equipment
CN108319666A (en) * 2018-01-19 2018-07-24 国网浙江省电力有限公司电力科学研究院 A kind of electric service appraisal procedure based on multi-modal the analysis of public opinion
CN108566627A (en) * 2017-11-27 2018-09-21 浙江鹏信信息科技股份有限公司 A kind of method and system identifying fraud text message using deep learning
CN108595408A (en) * 2018-03-15 2018-09-28 中山大学 A kind of reference resolution method based on end-to-end neural network
CN108897896A (en) * 2018-07-13 2018-11-27 深圳追科技有限公司 Keyword abstraction method based on intensified learning
CN108959630A (en) * 2018-07-24 2018-12-07 电子科技大学 A kind of character attribute abstracting method towards English without structure text
CN109165386A (en) * 2017-08-30 2019-01-08 哈尔滨工业大学 A kind of Chinese empty anaphora resolution method and system
CN109446517A (en) * 2018-10-08 2019-03-08 平安科技(深圳)有限公司 Reference resolution method, electronic device and computer readable storage medium
CN109471919A (en) * 2018-11-15 2019-03-15 北京搜狗科技发展有限公司 Empty anaphora resolution method and device
CN109492223A (en) * 2018-11-06 2019-03-19 北京邮电大学 A kind of Chinese missing pronoun complementing method based on ANN Reasoning
CN109726389A (en) * 2018-11-13 2019-05-07 北京邮电大学 A kind of Chinese missing pronoun complementing method based on common sense and reasoning
CN109783801A (en) * 2018-12-14 2019-05-21 厦门快商通信息技术有限公司 A kind of electronic device, multi-tag classification method and storage medium
CN109885841A (en) * 2019-03-20 2019-06-14 苏州大学 Reference resolution method based on node representation
CN109948166A (en) * 2019-03-25 2019-06-28 腾讯科技(深圳)有限公司 Text interpretation method, device, storage medium and computer equipment
CN110019788A (en) * 2017-09-30 2019-07-16 北京国双科技有限公司 File classification method and device
CN110134944A (en) * 2019-04-08 2019-08-16 国家计算机网络与信息安全管理中心 A kind of reference resolution method based on intensified learning
CN110162600A (en) * 2019-05-20 2019-08-23 腾讯科技(深圳)有限公司 A kind of method of information processing, the method and device of conversational response
CN110321342A (en) * 2019-05-27 2019-10-11 平安科技(深圳)有限公司 Business valuation studies method, apparatus and storage medium based on intelligent characteristic selection
CN110377750A (en) * 2019-06-17 2019-10-25 北京百度网讯科技有限公司 Comment generates and comment generates model training method, device and storage medium
TWI685760B (en) * 2018-01-10 2020-02-21 威盛電子股份有限公司 Method for analyzing semantics of natural language
CN111488733A (en) * 2020-04-07 2020-08-04 苏州大学 Chinese zero-index resolution method and system based on Mask mechanism and twin network
CN111626042A (en) * 2020-05-28 2020-09-04 成都网安科技发展有限公司 Reference resolution method and device
CN111737949A (en) * 2020-07-22 2020-10-02 江西风向标教育科技有限公司 Topic content extraction method and device, readable storage medium and computer equipment
CN112256868A (en) * 2020-09-30 2021-01-22 华为技术有限公司 Zero-reference resolution method, method for training zero-reference resolution model and electronic equipment
WO2021164293A1 (en) * 2020-02-18 2021-08-26 平安科技(深圳)有限公司 Big-data-based zero anaphora resolution method and apparatus, and device and medium
CN114676709A (en) * 2022-04-11 2022-06-28 昆明理工大学 Chinese-Yue data enhancement method based on zero-pronoun completion
WO2023279921A1 (en) * 2021-07-08 2023-01-12 华为技术有限公司 Neural network model training method, data processing method, and apparatuses
US11645465B2 (en) 2020-12-10 2023-05-09 International Business Machines Corporation Anaphora resolution for enhanced context switching

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102298635A (en) * 2011-09-13 2011-12-28 苏州大学 Method and system for fusing event information


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
WU, Bingbing et al.: "A Context-Aware Model Using Distributed Representations for Chinese Zero Pronoun Resolution" *
YIN, Qingyu et al.: "A Deep Neural Network for Chinese Zero Pronoun Resolution", arXiv:1604.05800 *
HU, Xinchen: "Research on Semantic Relation Classification Based on LSTM" (基于LSTM的语义关系分类研究), China Master's Theses Full-text Database, Information Science and Technology *

Cited By (47)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106919646A (en) * 2017-01-18 2017-07-04 南京云思创智信息科技有限公司 Chinese text summarization generation system and method
CN106919646B (en) * 2017-01-18 2020-06-09 南京云思创智信息科技有限公司 Chinese text abstract generating system and method
CN107145483A (en) * 2017-04-24 2017-09-08 北京邮电大学 A kind of adaptive Chinese word cutting method based on embedded expression
CN108287817B (en) * 2017-05-08 2020-08-11 腾讯科技(深圳)有限公司 Information processing method and device
CN108287817A (en) * 2017-05-08 2018-07-17 腾讯科技(深圳)有限公司 A kind of information processing method and equipment
CN107203813A (en) * 2017-05-22 2017-09-26 成都准星云学科技有限公司 A kind of new default entity nomenclature and its system
CN107330032A (en) * 2017-06-26 2017-11-07 北京理工大学 A kind of implicit chapter relationship analysis method based on recurrent neural network
CN109165386A (en) * 2017-08-30 2019-01-08 哈尔滨工业大学 A kind of Chinese empty anaphora resolution method and system
CN110019788A (en) * 2017-09-30 2019-07-16 北京国双科技有限公司 File classification method and device
CN107679035A (en) * 2017-10-11 2018-02-09 石河子大学 A kind of information intent detection method, device, equipment and storage medium
CN107679035B (en) * 2017-10-11 2020-06-12 石河子大学 Information intention detection method, device, equipment and storage medium
CN107797989A (en) * 2017-10-16 2018-03-13 平安科技(深圳)有限公司 Enterprise name recognition methods, electronic equipment and computer-readable recording medium
CN108566627A (en) * 2017-11-27 2018-09-21 浙江鹏信信息科技股份有限公司 A kind of method and system identifying fraud text message using deep learning
TWI685760B (en) * 2018-01-10 2020-02-21 威盛電子股份有限公司 Method for analyzing semantics of natural language
CN108319666A (en) * 2018-01-19 2018-07-24 国网浙江省电力有限公司电力科学研究院 A kind of electric service appraisal procedure based on multi-modal the analysis of public opinion
CN108319666B (en) * 2018-01-19 2021-09-28 国网浙江省电力有限公司营销服务中心 Power supply service assessment method based on multi-modal public opinion analysis
CN108595408A (en) * 2018-03-15 2018-09-28 中山大学 A kind of reference resolution method based on end-to-end neural network
WO2020010955A1 (en) * 2018-07-13 2020-01-16 深圳追一科技有限公司 Keyword extraction method based on reinforcement learning, and computer device and storage medium
CN108897896A (en) * 2018-07-13 2018-11-27 深圳追科技有限公司 Keyword abstraction method based on intensified learning
CN108959630A (en) * 2018-07-24 2018-12-07 电子科技大学 A kind of character attribute abstracting method towards English without structure text
CN109446517B (en) * 2018-10-08 2022-07-05 平安科技(深圳)有限公司 Reference resolution method, electronic device and computer readable storage medium
CN109446517A (en) * 2018-10-08 2019-03-08 平安科技(深圳)有限公司 Reference resolution method, electronic device and computer readable storage medium
CN109492223A (en) * 2018-11-06 2019-03-19 北京邮电大学 A kind of Chinese missing pronoun complementing method based on ANN Reasoning
CN109492223B (en) * 2018-11-06 2020-08-04 北京邮电大学 Chinese missing pronoun completion method based on neural network reasoning
CN109726389A (en) * 2018-11-13 2019-05-07 北京邮电大学 A kind of Chinese missing pronoun complementing method based on common sense and reasoning
CN109726389B (en) * 2018-11-13 2020-10-13 北京邮电大学 Chinese missing pronoun completion method based on common sense and reasoning
CN109471919A (en) * 2018-11-15 2019-03-15 北京搜狗科技发展有限公司 Empty anaphora resolution method and device
CN109783801B (en) * 2018-12-14 2023-08-25 厦门快商通信息技术有限公司 Electronic device, multi-label classification method and storage medium
CN109783801A (en) * 2018-12-14 2019-05-21 厦门快商通信息技术有限公司 A kind of electronic device, multi-tag classification method and storage medium
CN109885841B (en) * 2019-03-20 2023-07-11 苏州大学 Reference digestion method based on node representation method
CN109885841A (en) * 2019-03-20 2019-06-14 苏州大学 Reference resolution method based on node representation
CN109948166A (en) * 2019-03-25 2019-06-28 腾讯科技(深圳)有限公司 Text interpretation method, device, storage medium and computer equipment
CN110134944A (en) * 2019-04-08 2019-08-16 国家计算机网络与信息安全管理中心 A kind of reference resolution method based on intensified learning
CN110162600A (en) * 2019-05-20 2019-08-23 腾讯科技(深圳)有限公司 A kind of method of information processing, the method and device of conversational response
CN110162600B (en) * 2019-05-20 2024-01-30 腾讯科技(深圳)有限公司 Information processing method, session response method and session response device
CN110321342A (en) * 2019-05-27 2019-10-11 平安科技(深圳)有限公司 Business valuation studies method, apparatus and storage medium based on intelligent characteristic selection
CN110377750A (en) * 2019-06-17 2019-10-25 北京百度网讯科技有限公司 Comment generates and comment generates model training method, device and storage medium
CN110377750B (en) * 2019-06-17 2022-05-27 北京百度网讯科技有限公司 Comment generation method, comment generation device, comment generation model training device and storage medium
WO2021164293A1 (en) * 2020-02-18 2021-08-26 平安科技(深圳)有限公司 Big-data-based zero anaphora resolution method and apparatus, and device and medium
CN111488733A (en) * 2020-04-07 2020-08-04 苏州大学 Chinese zero-index resolution method and system based on Mask mechanism and twin network
CN111488733B (en) * 2020-04-07 2023-12-19 苏州大学 Chinese zero reference resolution method and system based on Mask mechanism and twin network
CN111626042A (en) * 2020-05-28 2020-09-04 成都网安科技发展有限公司 Reference resolution method and device
CN111737949A (en) * 2020-07-22 2020-10-02 江西风向标教育科技有限公司 Topic content extraction method and device, readable storage medium and computer equipment
CN112256868A (en) * 2020-09-30 2021-01-22 华为技术有限公司 Zero-reference resolution method, method for training zero-reference resolution model and electronic equipment
US11645465B2 (en) 2020-12-10 2023-05-09 International Business Machines Corporation Anaphora resolution for enhanced context switching
WO2023279921A1 (en) * 2021-07-08 2023-01-12 华为技术有限公司 Neural network model training method, data processing method, and apparatuses
CN114676709A (en) * 2022-04-11 2022-06-28 昆明理工大学 Chinese-Yue data enhancement method based on zero-pronoun completion

Similar Documents

Publication Publication Date Title
CN106294322A (en) A kind of Chinese based on LSTM zero reference resolution method
US11631007B2 (en) Method and device for text-enhanced knowledge graph joint representation learning
CN110032648A (en) A kind of case history structuring analytic method based on medical domain entity
Fahad et al. Inflectional review of deep learning on natural language processing
CN106776562A (en) A kind of keyword extracting method and extraction system
Suleiman et al. The use of hidden Markov model in natural ARABIC language processing: a survey
CN110209822A (en) Sphere of learning data dependence prediction technique based on deep learning, computer
CN112836046A (en) Four-risk one-gold-field policy and regulation text entity identification method
CN106599032A (en) Text event extraction method in combination of sparse coding and structural perceptron
CN110321563A (en) Text emotion analysis method based on mixing monitor model
CN110532328A (en) A kind of text concept figure building method
CN110717330A (en) Word-sentence level short text classification method based on deep learning
CN110222344B (en) Composition element analysis algorithm for composition tutoring of pupils
Sadr et al. Unified topic-based semantic models: a study in computing the semantic relatedness of geographic terms
CN114330338A (en) Program language identification system and method fusing associated information
CN110851593A (en) Complex value word vector construction method based on position and semantics
CN111339772B (en) Russian text emotion analysis method, electronic device and storage medium
Al-Harbi et al. Lexical disambiguation in natural language questions (nlqs)
CN107894976A (en) A kind of mixing language material segmenting method based on Bi LSTM
Sarmah et al. Survey on word sense disambiguation: an initiative towards an Indo-Aryan language
Zhao Research and design of automatic scoring algorithm for English composition based on machine learning
Putra et al. Sentence boundary disambiguation for Indonesian language
CN111813927A (en) Sentence similarity calculation method based on topic model and LSTM
Muaidi Levenberg-Marquardt learning neural network for part-of-speech tagging of Arabic sentences
Cui et al. Aspect level sentiment classification based on double attention mechanism

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170104