CN104915448A - Entity and paragraph linking method based on a hierarchical convolutional network - Google Patents
Entity and paragraph linking method based on a hierarchical convolutional network
- Publication number: CN104915448A (application CN201510372795.3A)
- Authority: CN (China)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Abstract
The invention discloses an entity and paragraph linking method based on a hierarchical convolutional network, comprising the following steps: a convolutional neural network transforms the word-vector representations into a sentence-vector representation; a second convolution over the sentence vectors, which takes the sentence order information into account, yields a paragraph-vector representation; with the existing entities as supervision information, the sentence and paragraph vector representations are passed through a Softmax output to train the hierarchical convolutional network; the training of the hierarchical convolutional network is further improved by the pair-wise similarity between the paragraph semantic vector features and the entity semantic vector features; given a test description paragraph, the trained hierarchical convolutional network extracts its deep semantic features to obtain a vector representation of the test paragraph, from which the paragraph can be linked directly to the target entity through the Softmax output.
Description
Technical field
The present invention relates to the technical field of knowledge base construction, and more particularly to an entity and paragraph linking method based on a hierarchical convolutional network.
Background technology
Widely used large-scale knowledge bases today include Freebase, WordNet and YAGO. All of them aim to build a comprehensive resource bank that allows machines to access and acquire structured public information more conveniently. These knowledge bases also provide application interfaces (APIs) so that people can query richer information about related entities. For example, when the city name "Washington D.C." is retrieved in the YAGO database, the result is as shown in Table 1 below:
Table 1
As can be seen, the returned object information consists entirely of highly structured organizational records. However, such structured information does not match the actual context and semantics with which people understand an entity. Unlike the YAGO database, Freebase and WordNet additionally return a descriptive paragraph related to the retrieved entity alongside the structured information, as shown in Table 2 below:
Table 2
As can be seen, a descriptive paragraph such as the one in Table 2 is more valuable for helping a user understand the concrete context and semantics of the query entity word. However, the descriptive paragraphs in Freebase and WordNet are all edited manually, which limits the paragraph description of entities under big data and consumes substantial time and manpower. Designing an efficient method for automatically linking entities to descriptive paragraphs is therefore an urgent task for knowledge base construction in the big-data era.
It can also be seen from the results in Table 2 that the descriptive content does not necessarily contain the query entity word itself; it need only contain related terms that describe the entity from many aspects. To solve this problem, an entity and paragraph linking method must therefore proceed from two directions: 1. capture the topic information of the text from a given descriptive paragraph; 2. find the important descriptive content related to the entity. Many traditional methods extract the topic information of a paragraph with topic models such as latent Dirichlet allocation (LDA) and probabilistic latent semantic analysis (PLSA). The common problem of these methods is that the extracted topic information is obtained from document-level term co-occurrence, which is seriously affected by the high sparsity of short texts such as social media posts, and which discards the word-order information of the text.
In recent years, with the rise of deep neural networks, some researchers have attempted to learn deep latent semantic representations of descriptive paragraphs with depth models and word-vector representations in order to solve the entity-paragraph linking problem. However, when extracting the semantic features of a descriptive paragraph, existing depth-model methods simply treat the whole paragraph as one long sentence, or directly take a weighted average of multiple sentence vectors as the semantic vector. In fact, the order of the sentences in a paragraph also carries semantic and logical relations.
On the other hand, capturing the descriptive clues in a paragraph that are closely related to the entity is also very important. For instance, although the descriptive paragraph returned in Table 2 above does not directly contain the query entity word "Washington D.C.", it contains many related words and phrases, such as "George Washington", "United States" and "capital". A vectorized feature representation of the entity therefore contributes to the work of linking entities with descriptive paragraphs.
Summary of the invention
In view of the above technical problems, the main purpose of the present invention is to provide an entity and paragraph linking method based on a hierarchical convolutional network, so that entity words and descriptive paragraphs on the internet can be linked automatically without manual participation, contributing to the construction of semantic knowledge bases under big data.
To achieve these goals, the invention provides an entity and paragraph linking method based on a hierarchical convolutional network, comprising the following steps:
A convolutional neural network transforms the word-vector representations into a sentence-vector representation; this convolutional network helps extract the important clues about the query entity in the descriptive paragraph.
A second pass through the convolutional neural network, which takes the sentence order information into account, turns the sentence-vector representations into a paragraph-vector representation.
The sentence and paragraph vector representations are passed through a Softmax output, and the convolutional neural network model is trained with the existing entities as supervision information.
At the same time, the pair-wise similarity between the paragraph semantic vector features and the entity semantic vector features is used to further improve the training of the convolutional neural network model.
Given a test description paragraph, the trained neural network model extracts its deep semantic features to obtain a vector representation of the test paragraph; based on this semantic representation, the paragraph can be linked directly to the target entity through the Softmax output.
The entity and paragraph linking method of the present invention divides the feature-learning problem of entity-paragraph linking into four levels: the feature-matrix layer obtained by representing the raw text paragraph with word vectors; the sentence-vector feature layer obtained by the convolutional neural network; the paragraph-vector feature layer obtained by the convolutional neural network; and the entity-word vector feature layer obtained from a word-vector look-up table. Through the convolutional feature network and the word-vector look-up, the accuracy (ACC) of the inventive entity and paragraph linking method on two text data sets is significantly superior to the other comparison methods; relative to the best comparison method (comparison method two), the accuracy of the inventive method improves by 12.4% and 16.76% on the two data sets respectively.
Accompanying drawing explanation
Fig. 1 is a flowchart of the entity and paragraph linking method based on a hierarchical convolutional network according to one embodiment of the invention;
Fig. 2 is a framework schematic of the entity and paragraph linking method based on a hierarchical convolutional network according to one embodiment of the invention;
Fig. 3 is a performance diagram of the entity and paragraph linking method based on a hierarchical convolutional network according to one embodiment of the invention.
Embodiment
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below in conjunction with specific embodiments and with reference to the accompanying drawings.
The invention discloses an entity and paragraph linking method based on a hierarchical convolutional network, by which entity words and descriptive paragraphs on the internet can be linked automatically without manual participation. Its general idea is as follows: first, a hierarchical convolutional neural network convolves the word vectors in a paragraph to obtain the vector representation of each sentence. Taking the order of the sentences in the paragraph into account, a second convolution over the sentence vectors then yields the vector representation of the paragraph. The entity features are used as supervision information to guide the parameter learning of the convolutional neural network model, and at the same time the pair-wise affinity between the deep semantic features of the paragraph and the entity semantic vectors is used to improve the learning of the model. Given a new descriptive paragraph, the trained convolutional neural network model can then extract its deep semantic features, and the corresponding entity link is obtained from the output based on these features.
More specifically, the method first uses a convolutional neural network to transform the word-vector representations into a sentence-vector representation. The sentence vectors are then passed through the convolutional neural network again, taking the sentence order information into account, to obtain the paragraph-vector representation. The sentence and paragraph representations are passed through a Softmax output, and the convolutional neural network model is trained with the existing entities as supervision information. At the same time, the pair-wise similarity between the paragraph semantic vector features and the entity semantic vector features further improves the training of the model. Given a test description paragraph, the trained neural network model extracts its deep semantic features to obtain the vector representation of the test paragraph, from which the paragraph can be linked directly to the target entity through the Softmax output.
The entity and paragraph linking method based on a hierarchical convolutional network according to one embodiment of the invention is described in detail below in conjunction with the accompanying drawings.
Fig. 1 is the process flow diagram of the entity based on level convolutional network as one embodiment of the invention and paragraph link method.
With reference to Fig. 1, in step S101, the vector representation of every sentence in the paragraph to be processed is extracted through the convolutional neural network model and the word-vector representations.
According to one exemplary embodiment of the present invention, the step of extracting the vector representation of every sentence in the paragraph to be processed through the convolutional neural network model and the word-vector representations comprises:
In step S1011, given a sentence in the paragraph to be processed, a look-up table is used to obtain the word-vector representations and the sentence is characterized as a matrix;
In step S1012, a one-dimensional convolution is performed on the sentence matrix representation to obtain the convolved feature matrix;
In step S1013, average sampling is performed on the convolved feature matrix to compress the features, yielding the vector representation of the sentence.
According to one exemplary embodiment of the present invention, the step of obtaining the word-vector representations with a look-up table and characterizing the sentence as a matrix comprises:
Given a word-vector set $W \in \mathbb{R}^{|V| \times d}$ trained by word2vec, where $|V|$ is the dictionary size and $d$ is the dimension of the word vectors, any sentence of length $n$ in the paragraph can be expressed as:
$$s = (x_1; x_2; \ldots; x_n) \quad (1)$$
where $x_i$ is the vector representation of the $i$-th word found in the word-vector set via the look-up table. If a word $x_i$ does not appear in the trained word-vector set, then in this exemplary embodiment of the present invention its representation is directly initialized at random.
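As an illustrative sketch only (not part of the patented method), the look-up of formula (1) with random initialization for out-of-vocabulary words can be written in NumPy as follows; the function name, the 0.1 scale of the random vectors, and the fixed seed are assumptions:

```python
import numpy as np

def sentence_to_matrix(sentence, embeddings, d=100, rng=None):
    # Build the n x d sentence matrix of formula (1); words absent from
    # the pretrained word-vector set get a random vector, mirroring the
    # random-initialization fallback described above.
    rng = rng if rng is not None else np.random.default_rng(0)
    rows = []
    for word in sentence:
        if word not in embeddings:
            embeddings[word] = 0.1 * rng.standard_normal(d)  # assumed scale
        rows.append(embeddings[word])
    return np.stack(rows)  # shape (n, d)
```

The mutated `embeddings` dictionary doubles as the look-up table, so a repeated out-of-vocabulary word keeps the same random vector.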
In step S1012, the step of performing a one-dimensional convolution on the sentence matrix representation to obtain the convolved feature matrix comprises:
Here, let $x_{i:i+h_s-1}$ denote the $h_s$ consecutive word features starting from the $i$-th word of sentence $s$. Given a one-dimensional convolution kernel $W^{(1)} \in \mathbb{R}^{h_s}$, the convolved feature of these $h_s$ consecutive words is:
$$c_i = f(W^{(1)} \cdot x_{i:i+h_s-1} + b^{(1)}) \quad (2)$$
where $b^{(1)}$ is a bias term, $f$ is the activation function, and $c_i$ is the convolved feature of the $h_s$ consecutive word features $x_{i:i+h_s-1}$. The feature matrix of the sentence after convolution is then:
$$c = [c_1; c_2; \ldots; c_{n-h_s+1}] \quad (3)$$
In step S1013, the step of performing average sampling on the convolved feature matrix to compress the features and obtain the vector representation of the sentence comprises:
In this exemplary embodiment of the present invention, the average sampling step is:
$$\hat{c} = \frac{1}{n-h_s+1} \sum_{i=1}^{n-h_s+1} c_i \quad (4)$$
Thus each convolution kernel $W^{(1)}$ generates a $d$-dimensional feature vector $\hat{c}$. If $k$ convolution kernels are used, then after one convolutional layer the vector representation of the sentence is obtained; the dimension of the sentence vector representation is therefore $dk$.
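A minimal NumPy sketch of the sentence-level pipeline of formulas (2)-(4), under stated assumptions: the patent leaves the activation $f$ unspecified, so `tanh` is used here, and each length-$h_s$ kernel is applied per embedding dimension so that one kernel yields a $d$-dimensional pooled vector:

```python
import numpy as np

def sentence_vector(X, kernels, h_s=3, b=0.0):
    # Formulas (2)-(4): each length-h_s kernel slides over the word
    # windows, the (assumed) tanh activation is applied, and average
    # sampling over all window positions compresses the result to a
    # d-dim vector; the k kernel outputs are concatenated into the
    # dk-dim sentence vector.
    n, d = X.shape
    pooled = []
    for w in kernels:
        c = np.stack([np.tanh(w @ X[i:i + h_s] + b)
                      for i in range(n - h_s + 1)])  # (n-h_s+1, d)
        pooled.append(c.mean(axis=0))                # average sampling -> (d,)
    return np.concatenate(pooled)                    # (d*k,)
```

With $n=5$, $d=4$, $h_s=3$ and $k=2$ kernels this returns an 8-dimensional sentence vector.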
In step S102, the deep semantic features of the paragraph are learned using the convolutional neural network structure and the sentence vector representations.
According to one exemplary embodiment of the present invention, the method of learning the deep semantic features of the paragraph comprises:
In step S1021, using the sentence vector features in the paragraph, the paragraph is characterized as a matrix according to the order of its sentences;
In step S1022, a one-dimensional convolution is performed on the paragraph matrix representation to obtain the convolved feature matrix;
In step S1023, average sampling is performed on the convolved feature matrix to compress the features, followed by one linear transformation, yielding the vector representation of the paragraph.
According to one exemplary embodiment of the present invention, the step of characterizing the paragraph as a matrix from the sentence vector features, in the order of the sentences in the paragraph, comprises:
Having obtained the vector representations of the $l$ sentences of the paragraph, the paragraph can be expressed as:
$$t = (s_1; s_2; \ldots; s_l) \quad (5)$$
In step S1022, the step of performing a one-dimensional convolution on the paragraph matrix representation to obtain the convolved feature matrix comprises:
Here, let $s_{i:i+h_t-1}$ denote the $h_t$ sequential sentence features starting from the $i$-th sentence of paragraph $t$. Given a one-dimensional convolution kernel $W^{(2)} \in \mathbb{R}^{h_t}$, the convolved feature of these $h_t$ sequential sentences is:
$$q_i = f(W^{(2)} \cdot s_{i:i+h_t-1} + b^{(2)}) \quad (6)$$
where $b^{(2)}$ is a bias term, $f$ is the activation function, and $q_i$ is the convolved feature of the $h_t$ sequential sentence features $s_{i:i+h_t-1}$. The feature of the paragraph after convolution is then:
$$q = [q_1; q_2; \ldots; q_{l-h_t+1}] \quad (7)$$
In step S1023, the step of performing average sampling on the convolved feature matrix to compress the features, followed by one linear transformation, to obtain the vector representation of the paragraph comprises:
In this exemplary embodiment of the present invention, the average sampling step is:
$$\hat{q} = \frac{1}{l-h_t+1} \sum_{i=1}^{l-h_t+1} q_i \quad (8)$$
Thus the convolution kernel $W^{(2)}$ generates a $dk$-dimensional feature vector $\hat{q}$. To conveniently calculate the similarity between the paragraph feature and the entity feature, the vector dimensions must be unified, so one linear transformation is applied to the paragraph vector:
$$z = W^{(3)} \cdot \hat{q} \quad (9)$$
where $W^{(3)} \in \mathbb{R}^{d \times dk}$ is a linear-transformation matrix, and the feature vector $z$ is the final paragraph feature vector in one exemplary embodiment of the present invention.
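The paragraph-level stage of formulas (6)-(9) can be sketched the same way, again assuming a `tanh` activation (the patent does not name $f$); the kernel convolves over the sentence sequence, preserving sentence order, and the linear map $W^{(3)}$ brings the $dk$-dimensional pooled vector down to the entity-embedding dimension $d$:

```python
import numpy as np

def paragraph_vector(S, W2, W3, h_t=2, b2=0.0):
    # Formulas (6)-(9): convolve over the ordered sentence vectors,
    # average-sample over window positions, then apply the linear
    # transformation W3 so the paragraph vector z is comparable with
    # the d-dimensional entity embeddings.
    l, dk = S.shape
    q = np.stack([np.tanh(W2 @ S[i:i + h_t] + b2)
                  for i in range(l - h_t + 1)])  # (l-h_t+1, dk)
    q_bar = q.mean(axis=0)                       # average sampling -> (dk,)
    return W3 @ q_bar                            # linear transform -> (d,)
```

For a paragraph of $l=4$ sentence vectors of dimension $dk=6$ and $W^{(3)} \in \mathbb{R}^{3 \times 6}$, this yields a 3-dimensional paragraph vector.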
In step S103, the vector representations of the sentence and of the paragraph are each passed through a Softmax output to fit the entity to which the paragraph belongs.
According to one exemplary embodiment of the present invention, the method of fitting the entity to which the paragraph belongs from the sentence and paragraph vector representations comprises the following steps:
In step S1031, linear transformations are applied to the sentence vector and the paragraph vector to obtain output vectors, and the Dropout technique is used for regularization;
In step S1032, the Softmax function is used to calculate the link probability of each candidate entity.
According to one exemplary embodiment of the present invention, the step of applying linear transformations to the sentence and paragraph vectors to obtain output vectors, regularized with the Dropout technique, comprises:
Linear transformations are applied to the sentence vector feature $s$ and the paragraph vector feature $z$ respectively, giving two output vectors:
$$y_s = W^{(4)} \cdot (s \circ r) + b^{(4)} \quad (10)$$
$$y_p = W^{(5)} \cdot (z \circ r) + b^{(5)} \quad (11)$$
where $W^{(4)} \in \mathbb{R}^{m \times dk}$ and $W^{(5)} \in \mathbb{R}^{m \times d}$ are weight matrices, $m$ is the number of entities in one exemplary embodiment of the present invention, the symbol $\circ$ denotes element-wise multiplication, and $r$ is a mask drawn from a Bernoulli distribution with probability $\rho$. The Dropout technique is used to prevent over-fitting and strengthens the robustness of the neural network model.
In step S1032, the step of calculating the link probability of each candidate entity with the Softmax function comprises:
The Softmax activation function is used at the two output layers of the sentence vector feature and the paragraph vector feature to calculate the probability of each corresponding entity word:
$$ps_i = \frac{\exp(y_{s,i})}{\sum_{j=1}^{m} \exp(y_{s,j})} \quad (12) \qquad p_i = \frac{\exp(y_{p,i})}{\sum_{j=1}^{m} \exp(y_{p,j})} \quad (13)$$
In formulas (12) and (13), $ps_i$ and $p_i$ denote the probability of the $i$-th entity word at the respective output layer.
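A hedged sketch of formulas (10)-(13) for one branch (the helper name and seed are assumptions): a Bernoulli Dropout mask is applied element-wise, a linear output layer produces the $m$ entity scores, and Softmax converts the scores into link probabilities. The max is subtracted before exponentiation purely for numerical stability; it does not change the Softmax result:

```python
import numpy as np

def link_probabilities(v, W, b, rho=0.5, train=True, rng=None):
    # Formulas (10)-(13): Dropout mask r ~ Bernoulli(rho), linear layer
    # to the m candidate-entity scores, then Softmax link probabilities.
    rng = rng if rng is not None else np.random.default_rng(0)
    if train:
        r = rng.binomial(1, rho, size=v.shape)  # Dropout mask
        v = v * r
    y = W @ v + b
    e = np.exp(y - y.max())                     # stable Softmax
    return e / e.sum()
```

Passing `train=False` skips the mask, matching the Dropout-free output layer used at test time.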
In step S104, the pair-wise similarity between the entity vector representations and the paragraph vector representation is calculated.
Given an entity word set $E = \{e_1, e_2, \ldots, e_m\}$, initialized with word2vec, the similarity between the entity word set $E$ and the paragraph feature vector $z$ is:
$$\mathrm{sim}(z, E) = \{z \cdot e_1, z \cdot e_2, \ldots, z \cdot e_m\} \quad (14)$$
where the operator $z \cdot e$ denotes the similarity between the paragraph feature vector $z$ and the corresponding entity word $e$.
In step S105, the convolutional neural network model is trained by error back-propagation, fitting the target entity word through Softmax and using the pair-wise similarity between the paragraph feature vector and the target entity word.
According to one exemplary embodiment of the present invention, the step of training the convolutional neural network model by error back-propagation, fitting the target entity word through Softmax and using the pair-wise similarity between the paragraph feature vector and the target entity word, comprises:
In step S1051, according to the sentence feature and paragraph feature outputs, objective functions are set from the Softmax fitting results of the target entity words in the training data set;
In step S1052, an objective function is set according to the pair-wise similarity between the paragraph feature and the target entity word;
In step S1053, the global objective constraint function is set;
In step S1054, stochastic gradient descent is used to update the parameters of the model.
According to one exemplary embodiment of the present invention, the step of setting objective functions from the Softmax fitting results of the target entity words in the training data set, according to the sentence feature and paragraph feature outputs, comprises:
Using formulas (10), (11) and formulas (12), (13), the objective constraint functions of the sentence vector features and of the paragraph vector features are set respectively as:
$$L_s = -\sum_{s_i \in S} \log ps(e_i^{s} \mid s_i) \quad (15)$$
$$L_{p1} = -\sum_{t_i \in T} \log p(e_i^{t} \mid t_i) \quad (16)$$
where $L_s$ is the objective constraint function of the sentence vector features, $L_{p1}$ is the objective constraint function of the paragraph vector features, $S$ is the set of all sentences in the paragraph set $T$ of the whole corpus, $e_i^{s}$ is the correct entity word to which the $i$-th sentence belongs, and $e_i^{t}$ is the correct entity word to which the $i$-th paragraph belongs.
In step S1052, the step of setting an objective function according to the pair-wise similarity between the paragraph feature and the target entity word comprises:
In order to strengthen the semantic representation ability of the paragraph and the entity, the present invention sets an objective constraint function that strengthens the similarity between the paragraph vector feature and the vector feature of the entity word to which it belongs, and weakens the similarity between the paragraph vector feature and the vector features of the entity words to which it does not belong. The objective constraint function is as follows:
$$L_{p2} = \sum_{j \neq r} \max(0, \; 1 - z \cdot e_r + z \cdot e_j) \quad (17)$$
where $e_r$ is the correct entity word to which the given paragraph $z$ belongs.
In step S1053, the global objective constraint function is set as follows:
$$L = L_s + (1-\alpha) \cdot L_{p1} + \alpha \cdot L_{p2} \quad (18)$$
where $\alpha$ is a weight harmonic coefficient used to balance the two constraints $L_{p1}$ and $L_{p2}$ on the paragraph vector features.
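The combination of formula (18) can be sketched numerically as follows. This is an illustrative assumption-laden sketch, not the patented implementation: the negative-log-likelihood terms for $L_s$ and $L_{p1}$ follow the Softmax fitting described above, and the pair-wise term here uses a margin-1 ranking form (an assumption; the patent's exact pair-wise constraint is given only in prose):

```python
import numpy as np

def global_objective(ps, p, sim, r, alpha=0.5):
    # Formula (18): L = L_s + (1 - alpha) * L_p1 + alpha * L_p2,
    # combining the negative log-likelihoods of the sentence and
    # paragraph Softmax outputs with a pair-wise ranking term over
    # the entity similarities of formula (14).
    L_s = -np.log(ps[r])                          # sentence branch NLL
    L_p1 = -np.log(p[r])                          # paragraph branch NLL
    L_p2 = sum(max(0.0, 1.0 - sim[r] + sim[j])    # assumed margin form
               for j in range(len(sim)) if j != r)
    return L_s + (1.0 - alpha) * L_p1 + alpha * L_p2
```

When the correct entity already has the highest similarity by a margin of 1, the pair-wise term vanishes and only the two Softmax terms remain.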
In step S1054, the step of updating the parameters of the model with stochastic gradient descent comprises:
All trainable parameters of the model appearing in the objective constraint functions above are collectively denoted $\theta$:
$$\theta = (x, W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)}, \alpha, W^{(3)}, W^{(4)}, b^{(4)}, W^{(5)}, b^{(5)}, E) \quad (19)$$
In one exemplary embodiment of the present invention, stochastic gradient descent with error back-propagation is adopted to optimize the objective function.
In step S106, the updated convolutional neural network model performs deep semantic feature extraction on the test description paragraph, and the paragraph is then linked with the corresponding entity word based on its vector representation.
According to one exemplary embodiment of the present invention, the step of performing deep semantic feature extraction on the test description paragraph with the updated convolutional neural network model, and linking it with the corresponding entity word based on the paragraph vector representation, comprises:
In step S1061, given a test paragraph text, the vector feature $s$ of each sentence in the paragraph is first calculated by formulas (2), (3), (4);
In step S1062, the vector feature $z$ of the paragraph is calculated by formulas (6), (7), (8), (9);
In step S1063, using the generated paragraph vector feature $z$, a linear transformation without Dropout and the Softmax function output the matching probability of each corresponding entity word:
$$y = W^{(5)} \cdot z + b^{(5)} \quad (20)$$
The entity word with the highest matching probability is then the entity word to which the test paragraph belongs.
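The test-time linking step of formula (20) can be sketched as below (an illustrative sketch; the function and variable names are assumptions). Dropout is omitted, and the argmax of the Softmax output selects the linked entity; since Softmax is monotone, the argmax of the raw scores would select the same entity:

```python
import numpy as np

def link_entity(z, W5, b5, entities):
    # Formula (20): Dropout-free linear layer plus Softmax; the entity
    # with the highest matching probability is returned as the link.
    y = W5 @ z + b5
    e = np.exp(y - y.max())          # stable Softmax
    probs = e / e.sum()
    return entities[int(np.argmax(probs))]
```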
Fig. 2 is the block schematic illustration of the entity based on level convolutional network as one embodiment of the invention and paragraph link method.
With reference to Fig. 2, the entity and paragraph linking method based on a hierarchical convolutional network has four levels of feature-vector representation, namely:
Feature level one: the feature matrix obtained by representing the raw text paragraph with word vectors;
Feature level two: the sentence vector features obtained by the convolutional neural network;
Feature level three: the paragraph vector features obtained by the convolutional neural network;
Feature level four: the entity-word vector features obtained from the word-vector look-up table.
The whole model-training stage is guided by supervision information in three places, namely:
Supervision one: the matching information between the sentence vector features, after a linear transformation and Softmax output, and the entity word the sentence belongs to;
Supervision two: the matching information between the paragraph vector features, after a linear transformation and Softmax output, and the entity word the paragraph belongs to;
Supervision three: the pair-wise similarity information between the paragraph vector features, after a linear transformation, and the entity word the paragraph belongs to.
In order to accurately evaluate the entity-paragraph linking performance of the inventive method, the precision (ACC) is obtained by comparing the entity-paragraph linking result with the true entity of each paragraph. Given a descriptive paragraph sample $x^{(i)}$, let the entity word linked by the inventive method be $e^{(i)}$ and the true entity word of the paragraph be $\hat{e}^{(i)}$; precision is then defined as:
$$ACC = \frac{1}{N} \sum_{i=1}^{N} \delta(e^{(i)}, \hat{e}^{(i)}) \quad (21)$$
where $N$ is the number of descriptive paragraphs and $\delta(x, y)$ is the indicator function: when $x = y$ the indicator is 1, and when $x \neq y$ it is 0.
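The precision ACC defined above reduces to a simple counting loop; this small sketch (function name assumed) computes the fraction of paragraphs whose linked entity equals the true entity:

```python
def linking_accuracy(predicted, gold):
    # ACC: mean of the indicator delta(e, e_true), i.e. the fraction of
    # paragraphs linked to their true entity word.
    assert len(predicted) == len(gold)
    hits = sum(1 for e, e_true in zip(predicted, gold) if e == e_true)
    return hits / len(predicted)
```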
Two open text data sets are adopted in the tests of the present invention:
History: this data set comprises 409 entities and 1704 paragraphs.
Literature: this data set comprises 445 entities and 2247 paragraphs.
No preprocessing (such as stop-word removal or stemming) is applied to these text data sets. On average each paragraph contains 4-6 sentences, and each paragraph contains exactly one entity word. The detailed statistics of the data sets are shown in Table 3:
Table 3
The following comparison methods are adopted in the tests of the present invention:
Comparison method one: bag-of-words model with logistic regression, which directly applies logistic regression on the bag-of-words model of the raw text;
Comparison method two: a linking method based on convolutional neural networks, which adopts a traditional convolutional neural network model and simply treats the entity-paragraph linking problem as a classification problem.
The parameter settings shown in Table 4 are adopted in the tests of the present invention:
Table 4
Data set | ρ | h_s | h_t | d | k |
History | 0.5 | 3 | 6 | 100 | 1 |
Literature | 0.5 | 3 | 8 | 100 | 1 |
In Table 4, ρ is the Dropout ratio used during model training, h_s is the convolution-kernel window size for the sentence vectorized feature representation, h_t is the convolution-kernel window size for the paragraph vectorized feature representation, d is the word-vector dimension, and k is the number of convolution kernels for the sentence vectorized feature representation.
In the tests of the present invention, every entity-paragraph linking method is run 50 times and the mean accuracy (ACC) is reported; the final results are shown in Table 5:
Table 5
Method | History/ accuracy value (%) | Literature/ accuracy value (%) |
Comparison method one | 65.10±0.01 | 61.17±0.05 |
Comparison method two | 77.01±3.92 | 74.50±10.3 |
The inventive method | 89.41±1.05 | 91.26±0.50 |
Table 5 gives the accuracy (ACC) evaluation results of the inventive method and of comparison methods one and two on the two text data sets. The results show that the inventive method significantly outperforms the other comparison methods, improving accuracy over the best of them (comparison method two) by 12.4 and 16.76 percentage points on the two data sets respectively.
The tests also verify how the sliding word-window size of the convolution kernel used for sentence feature representation affects the linking accuracy of the inventive method; the results are shown in Figure 3. The inventive method performs best on both data sets when the word-window size is 3, and its accuracy degrades when the window size exceeds 3. A sliding word-window size of 3 is therefore adopted for the sentence-feature convolution kernel in the experiments of the present invention.
The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present invention. It should be understood that the foregoing are merely specific embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within its protection scope.
Claims (10)
1. An entity-paragraph linking method based on a hierarchical convolutional network, comprising the following steps:
extracting, through a convolutional neural network model and word-vector representations, the vectorized representation feature of each sentence in a paragraph to be processed;
learning the deep semantic feature of said paragraph using the convolutional neural network structure and the sentence vectorized representations;
passing the vectorized representations of said sentences and the vectorized representation of the paragraph through Softmax respectively to output and fit the entity the paragraph belongs to;
computing the pair-wise similarity information between the vectorized representation of said entity and the paragraph vectorized representation;
training said convolutional neural network model through error back-propagation on the Softmax fitting of the target entity word and on the pair-wise similarity information between the paragraph feature vector and the target entity word;
performing deep semantic feature extraction on the paragraph to be processed with the updated convolutional neural network model, and then linking it to the corresponding entity word based on the paragraph's vectorized representation.
2. The entity-paragraph linking method based on a hierarchical convolutional network according to claim 1, wherein the step of extracting, through the convolutional neural network model and word-vector representations, the vectorized representation feature of each sentence in the paragraph to be processed comprises:
for a given sentence in the paragraph to be processed, obtaining word-vector representations via a look-up table and representing said sentence in matrix form;
performing one-dimensional convolution on the sentence's matrix representation to obtain a convolved feature matrix;
applying average sampling to the convolved features to compress them, yielding the vectorized representation of said sentence.
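The sentence vectorization step above (look-up-table word vectors, one-dimensional convolution, then average sampling) can be sketched as follows; the shapes, the multi-kernel layout, and the function name are illustrative assumptions, not the exact implementation:

```python
import numpy as np

def sentence_vector(words, kernels):
    """words: (n, d) word-vector matrix of one sentence (from a look-up table);
    kernels: list of (h_s, d) one-dimensional convolution filters.
    Convolve each filter over the sentence, then average-sample (pool)."""
    n = words.shape[0]
    feats = []
    for w in kernels:
        h = w.shape[0]
        # Slide the filter over every window of h consecutive words.
        conv = [float(np.sum(words[i:i + h] * w)) for i in range(n - h + 1)]
        feats.append(np.mean(conv))  # average sampling compresses the sequence
    return np.array(feats)           # one feature per kernel (k-dimensional)
```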
3. The entity-paragraph linking method based on a hierarchical convolutional network according to claim 1, wherein the step of learning the deep semantic feature of said paragraph using the convolutional neural network structure and the sentence vectorized representations comprises:
representing the paragraph in matrix form from the sentence feature vectors within it, in the order the sentences appear in the paragraph;
performing one-dimensional convolution on the paragraph's matrix representation to obtain a convolved feature matrix;
applying average sampling to the convolved features to compress them and performing one linear transformation, yielding the vectorized representation of said paragraph.
4. The entity-paragraph linking method based on a hierarchical convolutional network according to claim 1, wherein the step of passing the vectorized representations of said sentences and the vectorized representation of said paragraph through Softmax respectively to output and fit the entity said paragraph belongs to comprises:
applying a linear transformation to the sentence vector and the paragraph vector respectively to obtain output vectors, with Dropout used for regularization;
computing the link probability of each candidate entity with the Softmax function.
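A minimal sketch of this linking step (linear transformation, Dropout regularization, Softmax over candidate entities); the inverted-Dropout formulation and the keep ratio ρ are assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))  # subtract max for numerical stability
    return e / e.sum()

def link_probabilities(z, W, b, rho=0.5, train=False, seed=0):
    """Linear transform of a sentence/paragraph vector z, then Softmax,
    giving one link probability per candidate entity. During training,
    Dropout (inverted formulation, keep ratio rho -- an assumption here)
    regularizes the representation; at test time it is skipped."""
    if train:
        rng = np.random.default_rng(seed)
        z = z * (rng.random(z.shape) < rho) / rho
    return softmax(W @ z + b)
```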
5. The entity-paragraph linking method based on a hierarchical convolutional network according to claim 1, wherein the method of computing the pair-wise similarity information between the vectorized representation of said entity and the paragraph vectorized representation is as follows:
given an entity word set E = {e_1, e_2, ..., e_m}, initialized with word2vec, the similarity between the entity word set E and the paragraph feature vector z is:
sim(z, E) = {z·e_1, z·e_2, ..., z·e_m};
wherein the operator z·e denotes the similarity between the paragraph feature vector z and the corresponding entity word e.
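The similarity sim(z, E) is simply the dot product of z with each entity word vector; a one-line sketch, where stacking the entity vectors row-wise into a matrix is an assumption about the layout:

```python
import numpy as np

def pairwise_similarity(z, entity_vectors):
    """sim(z, E) = {z.e_1, ..., z.e_m}: dot product of the paragraph
    feature vector z with each (word2vec-initialized) entity word vector.
    entity_vectors: (m, d) matrix with one entity word vector per row."""
    return entity_vectors @ z
```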
6. The entity-paragraph linking method based on a hierarchical convolutional network according to claim 1, wherein the step of training said convolutional neural network model through error back-propagation on the Softmax fitting of the target entity word and on the pair-wise similarity information between the paragraph feature vector and the target entity word comprises:
setting an objective function for the fitting of the target entity words in the training data through said Softmax, according to the sentence feature and paragraph feature outputs;
setting an objective function according to the pair-wise similarity information between said paragraph feature and said target entity word;
unifying said objective functions by setting a global objective constraint function;
updating the parameters of said convolutional neural network model by stochastic gradient descent.
7. The entity-paragraph linking method based on a hierarchical convolutional network according to claim 6, wherein the step of setting an objective function according to the pair-wise similarity information between said paragraph feature and said target entity word comprises:
to strengthen the semantic representation ability of the paragraph and the entities, setting a goal constraint function that strengthens the similarity between the paragraph vectorized feature and the vectorized feature of the entity word it belongs to, while weakening the similarity between the paragraph vectorized feature and the vectorized features of entity words it does not belong to; said goal constraint function is as follows:
wherein e_r is the correct entity word belonging to the given paragraph z.
8. The entity-paragraph linking method based on a hierarchical convolutional network according to claim 6, wherein the step of unifying said objective functions through the set global objective constraint function comprises:
setting said global objective constraint function as:
L = L_s + (1 − α)·L_p1 + α·L_p2;
wherein α is a weight-balancing coefficient used to balance the two constraints on the paragraph vectorized feature, namely the term L_p1 fitting the target entity words in the training data through Softmax on the paragraph feature output, and the pair-wise similarity term L_p2 between the paragraph feature and said target entity word.
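The global objective combines the three terms directly; a trivial sketch in which the loss values passed in are placeholders:

```python
def global_objective(L_s, L_p1, L_p2, alpha=0.5):
    """Global constraint L = L_s + (1 - alpha)*L_p1 + alpha*L_p2.
    alpha balances the Softmax-fitting term L_p1 against the pair-wise
    similarity term L_p2 for the paragraph vectorized feature."""
    return L_s + (1 - alpha) * L_p1 + alpha * L_p2
```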
9. The entity-paragraph linking method based on a hierarchical convolutional network according to claim 1, wherein the step of performing deep semantic feature extraction on the paragraph to be processed with said updated convolutional neural network model, and then linking it to the corresponding entity word based on the paragraph's vectorized representation, comprises:
for a given paragraph text to be processed, first computing the vectorized features of its sentences with the trained convolutional neural network model;
computing the vectorized feature of said paragraph with the trained convolutional neural network model;
using the generated paragraph vectorized feature, outputting the matching probability of the corresponding entity word through a linear transformation without Dropout and the Softmax function.
10. The entity-paragraph linking method based on a hierarchical convolutional network according to claim 1, wherein the sliding word-window size of the sentence-feature convolution kernel adopted in said convolutional neural network model is 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510372795.3A CN104915448B (en) | 2015-06-30 | 2015-06-30 | A kind of entity based on level convolutional network and paragraph link method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104915448A true CN104915448A (en) | 2015-09-16 |
CN104915448B CN104915448B (en) | 2018-03-27 |
Family
ID=54084511
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510372795.3A Active CN104915448B (en) | 2015-06-30 | 2015-06-30 | A kind of entity based on level convolutional network and paragraph link method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104915448B (en) |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106326985A (en) * | 2016-08-18 | 2017-01-11 | 北京旷视科技有限公司 | Neural network training method, neural network training device, data processing method and data processing device |
CN106339718A (en) * | 2016-08-18 | 2017-01-18 | 苏州大学 | Classification method based on neural network and classification device thereof |
CN106446526A (en) * | 2016-08-31 | 2017-02-22 | 北京千安哲信息技术有限公司 | Electronic medical record entity relation extraction method and apparatus |
CN106844765A (en) * | 2017-02-22 | 2017-06-13 | 中国科学院自动化研究所 | Notable information detecting method and device based on convolutional neural networks |
CN107144569A (en) * | 2017-04-27 | 2017-09-08 | 西安交通大学 | The fan blade surface defect diagnostic method split based on selective search |
CN107168956A (en) * | 2017-05-26 | 2017-09-15 | 北京理工大学 | A kind of Chinese structure of an article analysis method and system based on pipeline |
WO2017162134A1 (en) * | 2016-03-22 | 2017-09-28 | 索尼公司 | Electronic device and method for text processing |
CN107704563A (en) * | 2017-09-29 | 2018-02-16 | 广州多益网络股份有限公司 | A kind of question sentence recommends method and system |
CN108304552A (en) * | 2018-02-01 | 2018-07-20 | 浙江大学 | A kind of name entity link method that knowledge based planting modes on sink characteristic extracts |
CN108764233A (en) * | 2018-05-08 | 2018-11-06 | 天津师范大学 | A kind of scene character recognition method based on continuous convolution activation |
CN109344244A (en) * | 2018-10-29 | 2019-02-15 | 山东大学 | A kind of the neural network relationship classification method and its realization system of fusion discrimination information |
CN109426664A (en) * | 2017-08-30 | 2019-03-05 | 上海诺悦智能科技有限公司 | A kind of sentence similarity calculation method based on convolutional neural networks |
CN109697288A (en) * | 2018-12-25 | 2019-04-30 | 北京理工大学 | A kind of example alignment schemes based on deep learning |
CN109992629A (en) * | 2019-02-28 | 2019-07-09 | 中国科学院计算技术研究所 | A kind of neural network Relation extraction method and system of fusion entity type constraint |
CN110674317A (en) * | 2019-09-30 | 2020-01-10 | 北京邮电大学 | Entity linking method and device based on graph neural network |
CN110717339A (en) * | 2019-12-12 | 2020-01-21 | 北京百度网讯科技有限公司 | Semantic representation model processing method and device, electronic equipment and storage medium |
CN111222314A (en) * | 2020-01-03 | 2020-06-02 | 北大方正集团有限公司 | Layout document comparison method, device, equipment and storage medium |
WO2020151688A1 (en) * | 2019-01-24 | 2020-07-30 | 腾讯科技(深圳)有限公司 | Coding method and device, equipment and storage medium |
CN112328800A (en) * | 2019-08-05 | 2021-02-05 | 上海交通大学 | System and method for automatically generating programming specification question answers |
CN113361261A (en) * | 2021-05-19 | 2021-09-07 | 重庆邮电大学 | Method and device for selecting legal case candidate paragraphs based on enhance matrix |
CN115130435A (en) * | 2022-06-27 | 2022-09-30 | 北京百度网讯科技有限公司 | Document processing method and device, electronic equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130212049A1 (en) * | 2012-02-15 | 2013-08-15 | American Gnc Corporation | Machine Evolutionary Behavior by Embedded Collaborative Learning Engine (eCLE) |
CN104317834A (en) * | 2014-10-10 | 2015-01-28 | 浙江大学 | Cross-media sorting method based on deep neural network |
CN104462357A (en) * | 2014-12-08 | 2015-03-25 | 百度在线网络技术(北京)有限公司 | Method and device for realizing personalized search |
CN104615767A (en) * | 2015-02-15 | 2015-05-13 | 百度在线网络技术(北京)有限公司 | Searching-ranking model training method and device and search processing method |
CN104679863A (en) * | 2015-02-28 | 2015-06-03 | 武汉烽火众智数字技术有限责任公司 | Method and system for searching images by images based on deep learning |
Non-Patent Citations (3)
Title |
---|
M IYYER ETAL: "A Neural Network for Factoid Question Answering over Paragraphs", 《CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING》 * |
N. KALCHBRENNER: "A Convolutional Neural Network for Modelling Sentences", 《PROCEEDINGS OF THE 52ND ANNUAL MEETING OF THE ASSOCIATION FOR COMPUTATIONAL LINGUISTICS》 * |
Y. KIM ETAL: "Convolutional neural networks for sentence classification", 《PROCEEDINGS OF THE 2014 CONFERENCE ON EMPIRICAL METHODS IN NATURAL LANGUAGE PROCESSING》 * |
Cited By (32)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10860798B2 (en) | 2016-03-22 | 2020-12-08 | Sony Corporation | Electronic device and method for text processing |
WO2017162134A1 (en) * | 2016-03-22 | 2017-09-28 | 索尼公司 | Electronic device and method for text processing |
CN106339718A (en) * | 2016-08-18 | 2017-01-18 | 苏州大学 | Classification method based on neural network and classification device thereof |
CN106326985A (en) * | 2016-08-18 | 2017-01-11 | 北京旷视科技有限公司 | Neural network training method, neural network training device, data processing method and data processing device |
CN106446526B (en) * | 2016-08-31 | 2019-11-15 | 北京千安哲信息技术有限公司 | Electronic health record entity relation extraction method and device |
CN106446526A (en) * | 2016-08-31 | 2017-02-22 | 北京千安哲信息技术有限公司 | Electronic medical record entity relation extraction method and apparatus |
CN106844765A (en) * | 2017-02-22 | 2017-06-13 | 中国科学院自动化研究所 | Notable information detecting method and device based on convolutional neural networks |
CN106844765B (en) * | 2017-02-22 | 2019-12-20 | 中国科学院自动化研究所 | Significant information detection method and device based on convolutional neural network |
CN107144569A (en) * | 2017-04-27 | 2017-09-08 | 西安交通大学 | The fan blade surface defect diagnostic method split based on selective search |
CN107168956A (en) * | 2017-05-26 | 2017-09-15 | 北京理工大学 | A kind of Chinese structure of an article analysis method and system based on pipeline |
CN107168956B (en) * | 2017-05-26 | 2020-06-02 | 北京理工大学 | Chinese chapter structure analysis method and system based on pipeline |
CN109426664A (en) * | 2017-08-30 | 2019-03-05 | 上海诺悦智能科技有限公司 | A kind of sentence similarity calculation method based on convolutional neural networks |
CN107704563A (en) * | 2017-09-29 | 2018-02-16 | 广州多益网络股份有限公司 | A kind of question sentence recommends method and system |
CN107704563B (en) * | 2017-09-29 | 2021-05-18 | 广州多益网络股份有限公司 | Question recommendation method and system |
CN108304552A (en) * | 2018-02-01 | 2018-07-20 | 浙江大学 | A kind of name entity link method that knowledge based planting modes on sink characteristic extracts |
CN108764233A (en) * | 2018-05-08 | 2018-11-06 | 天津师范大学 | A kind of scene character recognition method based on continuous convolution activation |
CN108764233B (en) * | 2018-05-08 | 2021-10-15 | 天津师范大学 | Scene character recognition method based on continuous convolution activation |
CN109344244A (en) * | 2018-10-29 | 2019-02-15 | 山东大学 | A kind of the neural network relationship classification method and its realization system of fusion discrimination information |
CN109697288A (en) * | 2018-12-25 | 2019-04-30 | 北京理工大学 | A kind of example alignment schemes based on deep learning |
US11934788B2 (en) | 2019-01-24 | 2024-03-19 | Tencent Technology (Shenzhen) Company Limited | Encoding method, apparatus, and storage medium |
WO2020151688A1 (en) * | 2019-01-24 | 2020-07-30 | 腾讯科技(深圳)有限公司 | Coding method and device, equipment and storage medium |
CN109992629B (en) * | 2019-02-28 | 2021-08-06 | 中国科学院计算技术研究所 | Neural network relation extraction method and system fusing entity type constraints |
CN109992629A (en) * | 2019-02-28 | 2019-07-09 | 中国科学院计算技术研究所 | A kind of neural network Relation extraction method and system of fusion entity type constraint |
CN112328800A (en) * | 2019-08-05 | 2021-02-05 | 上海交通大学 | System and method for automatically generating programming specification question answers |
CN110674317A (en) * | 2019-09-30 | 2020-01-10 | 北京邮电大学 | Entity linking method and device based on graph neural network |
CN110674317B (en) * | 2019-09-30 | 2022-04-12 | 北京邮电大学 | Entity linking method and device based on graph neural network |
CN110717339A (en) * | 2019-12-12 | 2020-01-21 | 北京百度网讯科技有限公司 | Semantic representation model processing method and device, electronic equipment and storage medium |
US11520991B2 (en) | 2019-12-12 | 2022-12-06 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Method, apparatus, electronic device and storage medium for processing a semantic representation model |
CN111222314A (en) * | 2020-01-03 | 2020-06-02 | 北大方正集团有限公司 | Layout document comparison method, device, equipment and storage medium |
CN113361261A (en) * | 2021-05-19 | 2021-09-07 | 重庆邮电大学 | Method and device for selecting legal case candidate paragraphs based on enhance matrix |
CN115130435A (en) * | 2022-06-27 | 2022-09-30 | 北京百度网讯科技有限公司 | Document processing method and device, electronic equipment and storage medium |
CN115130435B (en) * | 2022-06-27 | 2023-08-11 | 北京百度网讯科技有限公司 | Document processing method, device, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN104915448B (en) | 2018-03-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN104915448A (en) | Substance and paragraph linking method based on hierarchical convolutional network | |
CN104951548B (en) | A kind of computational methods and system of negative public sentiment index | |
CN110321925B (en) | Text multi-granularity similarity comparison method based on semantic aggregated fingerprints | |
CN103235772B (en) | A kind of text set character relation extraction method | |
CN103154936B (en) | For the method and system of robotization text correction | |
CN109508459B (en) | Method for extracting theme and key information from news | |
CN111191002B (en) | Neural code searching method and device based on hierarchical embedding | |
CN106156272A (en) | A kind of information retrieval method based on multi-source semantic analysis | |
Maharjan et al. | A multi-task approach to predict likability of books | |
CN106202010A (en) | The method and apparatus building Law Text syntax tree based on deep neural network | |
CN106980609A (en) | A kind of name entity recognition method of the condition random field of word-based vector representation | |
Chang et al. | Research on detection methods based on Doc2vec abnormal comments | |
CN106055673A (en) | Chinese short-text sentiment classification method based on text characteristic insertion | |
CN103336852B (en) | Across language ontology construction method and device | |
CN106326212A (en) | Method for analyzing implicit type discourse relation based on hierarchical depth semantics | |
CN110598219A (en) | Emotion analysis method for broad-bean-net movie comment | |
CN111597328B (en) | New event theme extraction method | |
CN109325114A (en) | A kind of text classification algorithm merging statistical nature and Attention mechanism | |
CN103631858A (en) | Science and technology project similarity calculation method | |
CN107688870A (en) | A kind of the classification factor visual analysis method and device of the deep neural network based on text flow input | |
CN110851593A (en) | Complex value word vector construction method based on position and semantics | |
CN114997288A (en) | Design resource association method | |
CN106055633A (en) | Chinese microblog subjective and objective sentence classification method | |
CN108009187A (en) | A kind of short text Topics Crawling method for strengthening Text Representation | |
CN116629258B (en) | Structured analysis method and system for judicial document based on complex information item data |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |