CN104615589A - Named-entity recognition model training method and named-entity recognition method and device - Google Patents

Named-entity recognition model training method and named-entity recognition method and device

Info

Publication number
CN104615589A
CN104615589A CN201510082318.3A CN201510082318A
Authority
CN
China
Prior art keywords
named entity
segmented word
rnn
mark
text string
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201510082318.3A
Other languages
Chinese (zh)
Inventor
Zhang Jun (张军)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Baidu Online Network Technology Beijing Co Ltd
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201510082318.3A
Publication of CN104615589A
Legal status: Pending (Current)

Landscapes

  • Machine Translation (AREA)

Abstract

An embodiment of the invention provides a named-entity recognition model training method and a named-entity recognition method and device. The method for training a recurrent neural network (RNN) named-entity recognition model includes: acquiring multiple labeled sample data, where each sample datum includes a text string and multiple word-segment labeled data, and each word-segment labeled datum includes a segmented word separated from the text string and its named-entity attribute tag in the text string; mapping the segmented words in the labeled sample data to word vectors, taking the sample data as training samples, training the RNN named-entity recognition model, and learning the parameters of the RNN named-entity recognition model. With the named-entity recognition model training method and the named-entity recognition method and device, the trained model has better generalization ability, named entities in natural language texts can be recognized rapidly, and the recognition accuracy of named entities is improved.

Description

Method for training a named-entity recognition model, named-entity recognition method, and device
Technical field
The present invention relates to the field of natural language processing, and in particular to a method for training a named-entity recognition model, a named-entity recognition method, and a corresponding device.
Background art
Named entities (such as person names, place names, organization names, and web neologisms with a specific meaning) are an important component of natural language understanding. Establishing and maintaining a named-entity lexicon is therefore one of the core tasks in many natural language processing (Natural Language Processing, NLP) applications, such as search systems and machine translation systems. For example, if a search engine can use a named-entity lexicon to recognize that the user's query "I had never expected" is the title of a web series, it can return more accurate search results to the user.
In the prior art, the following two named-entity recognition methods are generally adopted. The first method mines named entities from the query logs of a search engine using rule-based techniques. Specifically, search terms recently entered by users are compared with search terms entered in the past. If a new search term is found, the probability that it is a named entity is computed from a designed probability formula based on the growth of the search term and its similarity to past search terms, and search terms whose probability exceeds a certain threshold are identified as named entities. Although this method can accurately identify newly emerging named entities on the Internet, its implementation depends on query-log data, and there is a delay between the time a user first searches for a term and the time it is identified as a named entity, which affects the user's query experience.
The second method starts from a pre-labeled corpus (a set of texts in which the named entities have been marked manually), builds a hidden Markov model by statistical methods, and then uses this model to label new named entities in large amounts of text data. Although this method achieves good results on small-scale data, it relies on the Markov assumption (whether the current word is part of a named entity depends only on a fixed number of preceding words, typically 2), so the model lacks generalization ability and its recognition accuracy on large-scale data is not high.
Summary of the invention
The object of the embodiments of the present invention is to provide a method for training a named-entity recognition model, a named-entity recognition method, and a device, which can quickly and automatically recognize named entities in natural language text and improve the accuracy of named-entity recognition.
To achieve the above object, an embodiment of the present invention provides a method for training a recurrent neural network (RNN) named-entity recognition model, comprising: obtaining multiple labeled sample data, each sample datum comprising a text string and multiple word-segment labeled data, where each word-segment labeled datum comprises a segmented word separated from the text string and its named-entity attribute tag in the text string; mapping the segmented words in the multiple labeled sample data to word vectors, taking the sample data as training samples, and training the RNN named-entity recognition model so as to learn the parameters of the RNN named-entity recognition model.
An embodiment of the present invention further provides a device for training a recurrent neural network (RNN) named-entity recognition model, comprising: a sample-data acquisition module, configured to obtain multiple labeled sample data, each sample datum comprising a text string and multiple word-segment labeled data, where each word-segment labeled datum comprises a segmented word separated from the text string and its named-entity attribute tag in the text string; and a parameter learning module, configured to map the segmented words in the multiple labeled sample data to word vectors, take the sample data as training samples, and train the RNN named-entity recognition model so as to learn the parameters of the RNN named-entity recognition model.
An embodiment of the present invention further provides a named-entity recognition method, comprising: obtaining a text string; performing word segmentation on the text string to obtain multiple segmented words; obtaining, for each segmented word, the named-entity attribute tag with the maximum probability by using the RNN named-entity recognition model trained by the method according to claim 5; and recognizing named entities in the text string according to the maximum-probability named-entity attribute tags of the segmented words.
An embodiment of the present invention further provides a named-entity recognition device, comprising: a text-string acquisition module, configured to obtain a text string; a text-string word-segmentation module, configured to perform word segmentation on the text string to obtain multiple segmented words; a named-entity attribute tag acquisition module, configured to obtain, for each segmented word, the named-entity attribute tag with the maximum probability by using the RNN named-entity recognition model trained by the device according to claim 17; and a named-entity recognition module, configured to recognize named entities in the text string according to the maximum-probability named-entity attribute tags of the segmented words.
With the method for training a named-entity recognition model, the named-entity recognition method, and the device provided by the embodiments of the present invention, multiple labeled sample data are obtained, the segmented words in the multiple labeled sample data are mapped to word vectors, the sample data are taken as training samples, and the RNN named-entity recognition model is trained so as to learn its parameters. Compared with the prior art, there is no need to rely on query logs or the hidden Markov assumption; the model has better generalization ability, can automatically and quickly recognize named entities in natural language text, and improves the accuracy of named-entity recognition.
Brief description of the drawings
Fig. 1 is a block diagram of the basic principle of an embodiment of the present invention;
Fig. 2 is a flowchart of the method for training an RNN named-entity recognition model according to Embodiment One of the present invention;
Fig. 3 is a schematic diagram of the RNN named-entity recognition model according to Embodiment One of the present invention;
Fig. 4 is a flowchart of the named-entity recognition method according to Embodiment Two of the present invention;
Fig. 5 is a logic diagram of the device for training an RNN named-entity recognition model according to Embodiment Three of the present invention;
Fig. 6 is a logic diagram of the named-entity recognition device according to Embodiment Four of the present invention.
Detailed description of the embodiments
The basic concept of the present invention is to obtain multiple labeled sample data, map the segmented words in the multiple labeled sample data to word vectors, take the sample data as training samples, and train the RNN named-entity recognition model so as to learn its parameters. On the other hand, each segmented word in an acquired text string is taken as input, and the named-entity attribute tag corresponding to the segmented word is obtained with the trained named-entity recognition model; finally, named entities can be recognized in the text string according to the tags corresponding to the segmented words. The model has better generalization ability, makes the recognition of named entities faster, and improves the accuracy of named-entity recognition.
Fig. 1 is a block diagram of the basic principle of an embodiment of the present invention. With reference to Fig. 1, in the present invention, training samples are first obtained. Specifically, weakly labeled sample data (text in which the named entities have been marked in advance) are obtained by processing text strings with heuristic rules and used as training samples, so that sample data can be obtained automatically; of course, training samples can also be obtained by manual labeling. Next, these training samples are used to train the RNN named-entity recognition model and learn its parameters, that is, the designed training algorithm is used to train the established RNN named-entity recognition model and obtain its parameters. Finally, a text string to be recognized is obtained; with these parameters, the maximum-probability named-entity attribute tag of each segmented word in the text string can be obtained, the text string can be labeled with these tags, and the named entities are finally obtained.
Through the above process, a large number of named entities can be labeled in large-scale natural language text content (such as high-quality web page libraries and forum posts). To ensure the accuracy of the named entities, the number of times each phrase (consisting of one or more words) is tagged as a named entity can also be counted, and a threshold can be set; if the term frequency of a word tagged as a named entity (term frequency refers to the number of times a given word appears in the files in question) exceeds this threshold, the word is taken as a new named entity. In this way, an automatically mined named-entity lexicon is obtained, which is mainly used in NLP applications such as search engines and machine translation.
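As a rough illustration of the frequency-threshold filtering described above, the Python sketch below counts how often each phrase is tagged as a named entity and keeps only the frequent ones; the function name, the toy data, and the threshold value are illustrative assumptions rather than values taken from the patent.

```python
from collections import Counter

def build_entity_lexicon(tagged_phrases, min_count=50):
    """Keep only phrases tagged as named entities more often than a set threshold.
    `tagged_phrases` is the list of phrases the recognizer tagged across the corpus."""
    counts = Counter(tagged_phrases)                     # phrase -> tag frequency
    return {phrase for phrase, c in counts.items() if c > min_count}

# Toy usage: a frequently tagged series title survives, a rare mistag does not.
lexicon = build_entity_lexicon(["I had never expected"] * 120 + ["noise phrase"] * 3)
print(lexicon)   # {'I had never expected'}
```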
A method for training a recurrent neural network named-entity recognition model, a named-entity recognition method, and a device according to embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Embodiment one
Fig. 2 is a flowchart of the method for training an RNN named-entity recognition model according to Embodiment One of the present invention. The RNN named-entity recognition model is used to recognize named entities in text.
With reference to Fig. 2, in step S110, multiple labeled sample data are obtained; each sample datum comprises a text string and multiple word-segment labeled data, where each word-segment labeled datum comprises a segmented word separated from the text string and its named-entity attribute tag in the text string.
Specifically, according to the concept of the present invention, the named-entity attribute tag of a segmented word in the text string includes information on whether the segmented word belongs to a named entity.
In addition, the named-entity attribute tag of the segmented word in the text string may also include a position tag of the segmented word within the named entity to which it belongs.
For example, the named-entity attribute tags of segmented words in the text string may include a named-entity beginning tag, a named-entity continuation tag, and a non-named-entity tag. That is, the tag of a segmented word in the text string indicates whether it is the beginning of a named entity (tag B), a continuation of a named entity (tag I), or not part of any named entity (tag O), so that the named-entity attribute tags of all entity words in a text string can be obtained. It should be noted that tag B means Begin and marks the beginning of a named entity of a certain type, tag I means In and marks the continuation of a named entity, and tag O means Out and indicates that the word is not a named-entity word.
Preferably, the named-entity attribute tag of the segmented word in the text string may also include the type of the named entity to which the segmented word belongs. Here, the types of named entities may include, but are not limited to, person names, place names, organization names, movie and TV series titles, book titles, or web neologisms with a specific meaning. For example, the tag of a segmented word in the text string indicates whether it is the beginning of a named entity of a given type (e.g., B-DRAMA), a continuation of such a named entity (e.g., I-DRAMA), or not part of any named entity (e.g., O); DRAMA can be replaced by other predefined named-entity types (e.g., PERSON for person names, ADDR for addresses). Table 1 shows one labeled sample datum. As shown in Table 1, a labeled sample datum includes the text string "why I had never expected so fire absolutely?" and multiple word-segment labeled data, where each word-segment labeled datum comprises a segmented word separated from the text string and its named-entity attribute tag in the text string, for example the segmented word "absolutely" paired with the tag "B-DRAMA".
Table 1
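Table 1 itself is not reproduced in this record; the sketch below only illustrates the shape of one labeled sample datum. The segmentation shown is assumed for illustration, and only the pairing of the segmented word "absolutely" with the tag "B-DRAMA" comes from the running example.

```python
# One labeled sample datum: the text string plus (segmented word, tag) pairs,
# using the B/I/O scheme with an optional entity-type suffix.
sample = {
    "text": "why I had never expected so fire absolutely?",
    "segments": [
        ("why", "O"),
        ("I had never expected", "O"),
        ("so", "O"),
        ("fire", "O"),
        ("absolutely", "B-DRAMA"),   # beginning of a series title (from the example)
    ],
}
```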
According to the concept of the present invention, the training samples comprise, for example, M groups of <text string, multiple word-segment labeled data> sample data. Here, the value of M is generally large enough, usually above the order of ten million. The content of Table 1 above is a concrete example of one sample datum. Obviously, labeling these M groups of sample data purely by hand would be very time-consuming and labor-intensive. Therefore, the method may further comprise: obtaining multiple labeled sample data from natural language text according to heuristic rules. For example, if the natural language text contains paired title marks (in Chinese, 《 》), the text string containing the paired title marks is taken as a sample datum, and the named-entity attribute tag corresponding to each segmented word in the text string is labeled. As another example, if a text string in the natural language text contains a segmented word that exactly matches a predetermined title, the text string containing that segmented word is taken as a sample datum, and the named-entity attribute tag corresponding to each segmented word in the text string is labeled. By labeling text strings with the foregoing heuristic rules, weakly labeled sample data can be obtained automatically, improving processing efficiency.
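The title-mark heuristic can be sketched as below, assuming a word-segmentation callable `segment` is available; the regular expression, tag names, and helper function are illustrative rather than the patent's exact rule.

```python
import re

TITLE_SPAN = re.compile(r"《([^》]+)》")   # text enclosed by paired title marks

def weak_label(text, segment):
    """Tag every segmented word inside 《...》 as B/I (entity) and the rest as O."""
    entity_spans = set(TITLE_SPAN.findall(text))
    labeled = []
    for piece in re.split(r"[《》]", text):
        words = segment(piece)
        if piece in entity_spans:
            labeled += [(w, "B" if i == 0 else "I") for i, w in enumerate(words)]
        else:
            labeled += [(w, "O") for w in words]
    return labeled

# Toy usage with a whitespace segmenter standing in for a real one:
# weak_label("why is 《I had never expected》 so popular", str.split)
# -> [('why','O'), ('is','O'), ('I','B'), ('had','I'), ('never','I'), ('expected','I'),
#     ('so','O'), ('popular','O')]
```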
In step S120, the segmented words in the multiple labeled sample data are mapped to word vectors, the sample data are taken as training samples, and the RNN named-entity recognition model is trained so as to learn its parameters.
According to another embodiment of the present invention, step S120 may comprise: generating the input layer of the RNN named-entity recognition model from the segmented words of the text strings in the training samples; looking up, in a predefined vocabulary, the word vector corresponding to each segmented word in the input layer, and generating the word-vector layer of the RNN named-entity recognition model from these word vectors; applying a matrix mapping to the word-vector layer to obtain the hidden layer of the RNN named-entity recognition model; taking the word vector of each segmented word as the condition, and computing the probabilities of the multiple named-entity attribute tags corresponding to each segmented word under that condition, as the output layer of the RNN named-entity recognition model; and training the RNN named-entity recognition model with the multiple labeled sample data to obtain its parameters.
Specifically, Fig. 3 is a schematic diagram of the RNN named-entity recognition model according to Embodiment One of the present invention. With reference to Fig. 3, the text strings in the training samples are segmented; for example, suppose a text string comprises T segmented words, denoted Text = (w_1, ..., w_T). Each segmented word obtained by the segmentation is taken as input, generating the input layer of the RNN named-entity recognition model. Each segmented word w_i in the text string belongs to a word in a predefined vocabulary of size |V| (which includes a special word <OOV> for marking out-of-vocabulary words not in the dictionary). Each segmented word is mapped to its corresponding word vector by dictionary lookup, and this vector layer is called the word-vector layer of the RNN named-entity recognition model.
It should be noted here that word vectors are a way of mathematizing the words of a language: as the name suggests, a word vector represents a word as a vector. The simplest word-vector scheme represents a word with a very long vector whose length is the size of the vocabulary; only one component of the vector is "1" and all the others are "0", and the position of the "1" corresponds to the position of the word in the vocabulary. For example, "microphone" would be represented as [0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 ...]. However, this scheme cannot capture the similarity between words well. On this basis, another word-vector representation appeared that overcomes this drawback; its basic principle is to represent a word directly with an ordinary dense vector, for example [0.792, 0.177, 0.107, 0.109, 0.542, ...].
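The two representations described above can be contrasted in a few lines of Python with numpy; the vocabulary, dimensions, and random values below are purely illustrative.

```python
import numpy as np

vocab = ["<OOV>", "microphone", "drama", "popular"]      # toy vocabulary of size |V|

# One-hot representation: a |V|-length vector with a single 1 at the word's index.
one_hot = np.zeros(len(vocab))
one_hot[vocab.index("microphone")] = 1.0                  # [0., 1., 0., 0.]

# Dense word vector: a short real-valued vector looked up in an embedding table
# C of shape (|V|, EMBEDDING_SIZE); the values here are random placeholders.
EMBEDDING_SIZE = 5
C = np.random.randn(len(vocab), EMBEDDING_SIZE)
dense = C[vocab.index("microphone")]
```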
In practical applications, the word-vector layer of the network represents the word vector corresponding to each input word w_i, which is a column vector C(w_i) of length EMBEDDING_SIZE; the hidden layer of the network represents the state of the designed RNN named-entity recognition model at each time step i, a column vector h_i of length HIDDEN_SIZE. Here, EMBEDDING_SIZE commonly ranges from 50 to 1000, and HIDDEN_SIZE is commonly 1 to 4 times EMBEDDING_SIZE.
The hidden layer of the RNN named-entity recognition model sits on top of the word-vector layer. A characteristic of an RNN is that, when computing the value of the current hidden layer, it uses both the value of the word-vector layer and the vector value of the hidden-layer nodes at the previous time step. On top of the hidden layer is the output layer, in which each node represents a possible named-entity attribute tag (such as B, I, or O) for a given segmented word. The output layer can also be called the SoftMax layer; it computes the probability that each segmented word belongs to each named-entity attribute tag. The RNN named-entity recognition model is established from the input layer, word-vector layer, hidden layer, and output layer generated above. The starting point of this embodiment is to learn the parameters of the established RNN named-entity recognition model from the aforementioned labeled sample data, so that named entities can be recognized in other texts to which the rules cannot be generalized (for example, texts without title marks).
Preferably, the matrix mapping applied to the word-vector layer to obtain the hidden layer of the RNN named-entity recognition model is performed by the following formula:
[h_i]_j = sigmoid([W·C(w_i)]_j + [U·h_{i-1}]_j)
where [h_i]_j is the j-th element of the i-th vector of the hidden layer, W and U are transformation-matrix parameters of the RNN named-entity recognition model, C(w_i) is the i-th word vector of the word-vector layer, and h_{i-1} is the (i-1)-th vector of the hidden layer. Here, W is a matrix with HIDDEN_SIZE rows and EMBEDDING_SIZE columns, U is a matrix with HIDDEN_SIZE rows and HIDDEN_SIZE columns, and sigmoid is a nonlinear transformation function.
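A minimal numpy sketch of this hidden-layer recurrence is given below; the matrix shapes follow the text, while the concrete sizes and the random initialization are assumptions made only for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

EMBEDDING_SIZE, HIDDEN_SIZE = 100, 200            # illustrative sizes within the stated ranges
W = np.random.randn(HIDDEN_SIZE, EMBEDDING_SIZE)  # word-vector -> hidden transform
U = np.random.randn(HIDDEN_SIZE, HIDDEN_SIZE)     # previous hidden -> hidden transform

def hidden_state(c_wi, h_prev):
    """h_i = sigmoid(W C(w_i) + U h_{i-1}), applied element-wise."""
    return sigmoid(W @ c_wi + U @ h_prev)

# e.g. h1 = hidden_state(np.random.randn(EMBEDDING_SIZE), np.zeros(HIDDEN_SIZE))
```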
Further, taking the word vector of each segmented word as the condition, the probabilities of the multiple named-entity attribute tags corresponding to each segmented word under that condition are computed by the following formula, as the output layer of the RNN named-entity recognition model:
P(label = L_i | w_i) = exp(O_{L_i} · h_i) / Σ_{k=1..K} exp(O_k · h_i)
where L_i is the i-th named-entity attribute tag, w_i is the i-th segmented word, h_i is the i-th vector of the hidden layer, O is a transformation-matrix parameter of the RNN named-entity recognition model, and K is the number of rows of the transformation matrix O. Here O is a matrix with K rows and HIDDEN_SIZE columns.
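The SoftMax output layer amounts to the following computation; subtracting the maximum score for numerical stability is an implementation detail added here and is not part of the formula above.

```python
import numpy as np

def tag_probabilities(h_i, O):
    """P(label = k | w_i) = exp(O_k · h_i) / sum_j exp(O_j · h_i).
    O has K rows (one per named-entity attribute tag) and HIDDEN_SIZE columns."""
    scores = O @ h_i                  # one score per tag
    scores = scores - scores.max()    # shift for numerical stability
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()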
Preferably, the process of training the RNN named-entity recognition model with the multiple labeled sample data to obtain its parameters may comprise: obtaining the conditional probabilities of the multiple named-entity attribute tags corresponding to each segmented word; establishing a loss function according to the conditional probabilities of the multiple named-entity attribute tags; and training on the loss function with the multiple labeled sample data to obtain the parameter set of the RNN named-entity recognition model that minimizes the loss function, where the parameter set comprises the word vectors and the transformation-matrix parameters.
Specifically, the RNN named-entity recognition model is trained with the multiple labeled sample data by minimizing the following loss function, thereby obtaining the parameters of the model:
J(θ) = - Σ_{<Text, Label>} Σ_i log P(label = L_i | w_i; θ)
where the sum over <Text, Label> pairs runs over all labeled sample data, θ is the parameter set of the RNN named-entity recognition model that makes J(θ) minimal, the parameter set comprises the word vectors and the transformation-matrix parameters, L_i is the i-th named-entity attribute tag, and w_i is the i-th segmented word. It should be noted here that the parameters of the RNN named-entity recognition model are: the word vector C(w) of each word w in the vocabulary, and the transformation-matrix parameters W, U, O; this set of parameters is denoted θ.
It should also be noted that the above formula is the loss function, and the RNN named-entity recognition model is trained by stochastic gradient descent. Specifically, the optimal parameters θ can be obtained with stochastic gradient descent (Stochastic Gradient Descent, SGD) and the back-propagation-through-time algorithm (Back Propagation Through Time, BPTT). The idea of SGD is to iteratively update the randomly initialized parameters by computing the gradient (the partial derivatives with respect to the parameters) on a group of training samples; each update subtracts from the parameters a set learning rate multiplied by the computed gradient, so that after many iterations the difference, under the defined loss function, between the values computed by the RNN named-entity recognition model from its parameters and the actual values is minimized. In addition, BPTT is an efficient method for computing the parameter gradients in an RNN.
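A compact sketch of the per-sample loss and the SGD update rule is shown below; the full RNN forward pass and the BPTT gradient computation are not spelled out in the patent, so they are assumed to be supplied elsewhere and only the loss and the parameter update are illustrated, with an assumed learning-rate value.

```python
import numpy as np

def sample_loss(tag_probs, gold_tags):
    """Negative log-likelihood of the gold tags for one text string:
    J = -sum_i log P(label = L_i | w_i); `tag_probs[i]` maps tags to probabilities."""
    return -sum(np.log(tag_probs[i][t]) for i, t in enumerate(gold_tags))

def sgd_step(params, grads, learning_rate=0.05):
    """One SGD update: move each parameter against its gradient (as computed by BPTT),
    scaled by a set learning rate; the rate value here is illustrative."""
    for name in params:
        params[name] -= learning_rate * grads[name]
    return params
```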
With this method for training an RNN named-entity recognition model, multiple labeled sample data are obtained, the segmented words in the multiple labeled sample data are mapped to word vectors, the sample data are taken as training samples, and the RNN named-entity recognition model is trained so as to learn its parameters. Compared with the prior art, there is no need to rely on query logs or the hidden Markov assumption; the model has better generalization ability, can be applied to recognizing named entities in natural language text, and recognizes named entities quickly and with higher accuracy.
Embodiment two
Fig. 4 is a flowchart of the named-entity recognition method according to Embodiment Two of the present invention. The method may be performed, for example, on a search engine server.
With reference to Fig. 4, in step S210, a text string is obtained.
The text string may be a search query sent from a client. For example, a user enters "why I had never expected so fire absolutely?" in the search interface of a browser and searches, and the browser application sends the search query to the search engine server.
In step S220, word segmentation is performed on the text string to obtain multiple segmented words.
For example, the search engine server may use existing word-segmentation techniques to segment the acquired text string into multiple segmented words.
In step S230, the named-entity attribute tag with the maximum probability corresponding to each segmented word is obtained with the RNN named-entity recognition model trained by the method according to Embodiment One. The method for training the RNN named-entity recognition model has been described in Embodiment One above.
In step S240, named entities are recognized in the text string according to the maximum-probability named-entity attribute tags corresponding to the segmented words.
After the named-entity attribute tags corresponding to the segmented words have been obtained in step S230, the text string can be labeled according to these tags, and the named entities in the text string are finally recognized.
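Recovering entities from the per-word tags can be done with a simple pass over the tag sequence, sketched below; the function and the example words are illustrative, and joining with spaces is only one choice (Chinese text would typically be concatenated without separators).

```python
def decode_entities(words, tags):
    """Turn per-word B/I/O tags (optionally typed, e.g. B-DRAMA) into
    (entity_text, entity_type) pairs; the type is None for untyped tags."""
    entities, current, current_type = [], [], None

    def flush():
        if current:
            entities.append((" ".join(current), current_type))

    for word, tag in zip(words, tags):
        if tag.startswith("B"):
            flush()
            current = [word]
            current_type = tag.split("-", 1)[1] if "-" in tag else None
        elif tag.startswith("I") and current:
            current.append(word)
        else:                     # "O", or an I tag with no entity currently open
            flush()
            current, current_type = [], None
    flush()
    return entities

# e.g. decode_entities(["I had", "never expected", "is", "popular"],
#                      ["B-DRAMA", "I-DRAMA", "O", "O"])
# -> [("I had never expected", "DRAMA")]
```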
Further, as mentioned above, the maximum-probability named-entity attribute tag corresponding to a segmented word may also include the type of the named entity to which the segmented word belongs; therefore, the method may further comprise: obtaining the type of the named entity according to the maximum-probability named-entity attribute tags corresponding to the segmented words.
With this named-entity recognition method, the acquired text string is segmented into multiple segmented words, the maximum-probability named-entity attribute tag corresponding to each segmented word is obtained with the trained RNN named-entity recognition model, and finally named entities can be recognized in the text string according to these tags. Compared with the prior art, named entities in natural language text can be recognized quickly, the accuracy of named-entity recognition is improved, and the type of the recognized named entity can also be obtained.
Embodiment three
Fig. 5 is a logic diagram of the device for training an RNN named-entity recognition model according to Embodiment Three of the present invention.
With reference to Fig. 5, the RNN named-entity recognition model is used to recognize named entities in text, and the device for training the RNN named-entity recognition model comprises a sample-data acquisition module 310 and a parameter learning module 320.
The sample-data acquisition module 310 is configured to obtain multiple labeled sample data; each sample datum comprises a text string and multiple word-segment labeled data, where each word-segment labeled datum comprises a segmented word separated from the text string and its named-entity attribute tag in the text string.
Optionally, the named-entity attribute tag of the segmented word in the text string includes information on whether the segmented word belongs to a named entity. Further, the named-entity attribute tag of the segmented word in the text string may also include a position tag of the segmented word within the named entity to which it belongs.
Preferably, the named-entity attribute tags of segmented words in the text string include: a named-entity beginning tag, a named-entity continuation tag, and a non-named-entity tag.
Further, the named-entity attribute tag of the segmented word in the text string also includes the type of the named entity to which the segmented word belongs.
Optionally, the sample-data acquisition module 310 is also configured to obtain multiple labeled sample data from natural language text according to heuristic rules: if the natural language text contains paired title marks, the sample-data acquisition module 310 takes the text string containing the paired title marks as a sample datum and labels the named-entity attribute tag corresponding to each segmented word in the text string; or, if a text string in the natural language text contains a segmented word that exactly matches a predetermined title, the sample-data acquisition module 310 takes the text string containing that segmented word as a sample datum and labels the named-entity attribute tag corresponding to each segmented word in the text string.
The parameter learning module 320 is configured to map the segmented words in the multiple labeled sample data to word vectors, take the sample data as training samples, and train the RNN named-entity recognition model so as to learn its parameters.
Preferably, the parameter learning module 320 may comprise:
an input-layer generation unit, configured to generate the input layer of the RNN named-entity recognition model from the segmented words of the text strings in the training samples;
a word-vector-layer generation unit, configured to look up, in a predefined vocabulary, the word vector corresponding to each segmented word in the input layer, and generate the word-vector layer of the RNN named-entity recognition model from these word vectors;
a hidden-layer generation unit, configured to apply a matrix mapping to the word-vector layer to obtain the hidden layer of the RNN named-entity recognition model;
an output-layer generation unit, configured to take the word vector of each segmented word as the condition and compute the probabilities of the multiple named-entity attribute tags corresponding to each segmented word under that condition, as the output layer of the RNN named-entity recognition model;
a parameter learning unit, configured to train the RNN named-entity recognition model with the multiple labeled sample data to obtain its parameters.
Further, the hidden-layer generation unit is configured to apply the matrix mapping to the word-vector layer by the following formula to obtain the hidden layer:
[h_i]_j = sigmoid([W·C(w_i)]_j + [U·h_{i-1}]_j)
where [h_i]_j is the j-th element of the i-th vector of the hidden layer, W and U are transformation-matrix parameters of the RNN named-entity recognition model, C(w_i) is the i-th word vector of the word-vector layer, and h_{i-1} is the (i-1)-th vector of the hidden layer.
Optionally, the output-layer generation unit is configured to compute the probabilities of the multiple named-entity attribute tags corresponding to each segmented word by the following formula, as the output layer of the RNN named-entity recognition model:
P(label = L_i | w_i) = exp(O_{L_i} · h_i) / Σ_{k=1..K} exp(O_k · h_i)
where L_i is the i-th named-entity attribute tag, w_i is the i-th segmented word, h_i is the i-th vector of the hidden layer, O is a transformation-matrix parameter of the RNN named-entity recognition model, and K is the number of rows of the transformation matrix O.
Preferably, the parameter learning unit is configured to obtain the conditional probabilities of the multiple named-entity attribute tags corresponding to each segmented word, establish a loss function according to these conditional probabilities, and train on the loss function with the multiple labeled sample data to obtain the parameter set of the RNN named-entity recognition model that minimizes the value of the loss function, where the parameter set comprises the word vectors and the transformation-matrix parameters.
Specifically, the RNN named-entity recognition model is trained with the multiple labeled sample data by minimizing the following loss function, thereby obtaining its parameters:
J(θ) = - Σ_{<Text, Label>} Σ_i log P(label = L_i | w_i; θ)
where the sum over <Text, Label> pairs runs over all labeled sample data, θ is the parameter set of the RNN named-entity recognition model that makes J(θ) minimal, the parameter set comprises the word vectors and the transformation-matrix parameters, L_i is the i-th named-entity attribute tag, and w_i is the i-th segmented word.
With this device for training an RNN named-entity recognition model, multiple labeled sample data are obtained, the segmented words in the multiple labeled sample data are mapped to word vectors, the sample data are taken as training samples, and the RNN named-entity recognition model is trained so as to learn its parameters. Compared with the prior art, there is no need to rely on query logs or the hidden Markov assumption; the device has better generalization ability, can be applied to recognizing named entities in natural language text, and recognizes named entities quickly and with higher accuracy.
Embodiment four
Fig. 6 is a logic diagram of the named-entity recognition device according to Embodiment Four of the present invention.
With reference to Fig. 6, the named-entity recognition device comprises a text-string acquisition module 410, a text-string word-segmentation module 420, a named-entity attribute tag acquisition module 430, and a named-entity recognition module 440.
The text-string acquisition module 410 is configured to obtain a text string.
The text-string word-segmentation module 420 is configured to perform word segmentation on the text string to obtain multiple segmented words.
The named-entity attribute tag acquisition module 430 is configured to obtain the maximum-probability named-entity attribute tag corresponding to each segmented word with the RNN named-entity recognition model trained by the device according to Embodiment Three. The device for training the RNN named-entity recognition model has been described in Embodiment Three above.
The named-entity recognition module 440 is configured to recognize named entities in the text string according to the maximum-probability named-entity attribute tags corresponding to the segmented words.
Further, the recognition device may also comprise: a named-entity type acquisition module (not shown), configured to obtain the type of the named entity according to the maximum-probability named-entity attribute tags corresponding to the segmented words.
With this named-entity recognition device, the acquired text string is segmented into multiple segmented words, the maximum-probability named-entity attribute tag corresponding to each segmented word is obtained with the trained RNN named-entity recognition model, and finally named entities can be recognized in the text string according to these tags. Compared with the prior art, named entities in natural language text can be recognized quickly, the accuracy of named-entity recognition is improved, and the type of the recognized named entity can also be obtained.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
An integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a portable hard drive, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.
The above are only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any changes or substitutions that can readily occur to those skilled in the art within the technical scope disclosed by the present invention should be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims (20)

1. A method for training a recurrent neural network (RNN) named-entity recognition model, the RNN named-entity recognition model being used to recognize named entities in text, characterized in that the method comprises:
obtaining multiple labeled sample data, each sample datum comprising a text string and multiple word-segment labeled data, where each word-segment labeled datum comprises a segmented word separated from the text string and its named-entity attribute tag in the text string;
mapping the segmented words in the multiple labeled sample data to word vectors, taking the sample data as training samples, and training the RNN named-entity recognition model so as to learn the parameters of the RNN named-entity recognition model.
2. The method according to claim 1, characterized in that the named-entity attribute tag of the segmented word in the text string includes information on whether the segmented word belongs to a named entity.
3. The method according to claim 2, characterized in that the named-entity attribute tag of the segmented word in the text string also includes a position tag of the segmented word within the named entity to which it belongs.
4. The method according to claim 1, characterized in that the named-entity attribute tags of segmented words in the text string include: a named-entity beginning tag, a named-entity continuation tag, and a non-named-entity tag.
5. The method according to any one of claims 1 to 4, characterized in that the named-entity attribute tag of the segmented word in the text string also includes the type of the named entity to which the segmented word belongs.
6. The method according to claim 5, characterized in that the method further comprises:
obtaining multiple labeled sample data from natural language text according to heuristic rules, wherein
if the natural language text contains paired title marks, the text string containing the paired title marks is taken as a sample datum, and the named-entity attribute tag corresponding to each segmented word in the text string is labeled, or
if a text string in the natural language text contains a segmented word that exactly matches a predetermined title, the text string containing that segmented word is taken as a sample datum, and the named-entity attribute tag corresponding to each segmented word in the text string is labeled.
7. The method according to claim 5, characterized in that the process of taking the sample data as training samples and training the RNN named-entity recognition model so as to learn its parameters comprises:
generating the input layer of the RNN named-entity recognition model from the segmented words of the text strings in the training samples,
looking up, in a predefined vocabulary, the word vector corresponding to each segmented word in the input layer, and generating the word-vector layer of the RNN named-entity recognition model from these word vectors,
applying a matrix mapping to the word-vector layer to obtain the hidden layer of the RNN named-entity recognition model,
taking the word vector of each segmented word as the condition, and computing the probabilities of the multiple named-entity attribute tags corresponding to each segmented word under that condition, as the output layer of the RNN named-entity recognition model,
training the RNN named-entity recognition model with the multiple labeled sample data to obtain the parameters of the RNN named-entity recognition model.
8. The method according to claim 7, characterized in that the process of training the RNN named-entity recognition model with the multiple labeled sample data to obtain its parameters comprises:
obtaining the conditional probabilities of the multiple named-entity attribute tags corresponding to each segmented word,
establishing a loss function according to the conditional probabilities of the multiple named-entity attribute tags,
training on the loss function with the multiple labeled sample data to obtain the parameter set of the RNN named-entity recognition model that minimizes the loss function, where the parameter set comprises the word vectors and the transformation-matrix parameters.
9. A named-entity recognition method, characterized in that the recognition method comprises:
obtaining a text string;
performing word segmentation on the text string to obtain multiple segmented words;
obtaining, for each segmented word, the named-entity attribute tag with the maximum probability by using the RNN named-entity recognition model trained by the method according to claim 5;
recognizing named entities in the text string according to the maximum-probability named-entity attribute tags corresponding to the segmented words.
10. The method according to claim 9, characterized in that the method further comprises: obtaining the type of the named entity according to the maximum-probability named-entity attribute tags corresponding to the segmented words.
11. A device for training a recurrent neural network (RNN) named-entity recognition model, the RNN named-entity recognition model being used to recognize named entities in text, characterized in that the device comprises:
a sample-data acquisition module, configured to obtain multiple labeled sample data, each sample datum comprising a text string and multiple word-segment labeled data, where each word-segment labeled datum comprises a segmented word separated from the text string and its named-entity attribute tag in the text string;
a parameter learning module, configured to map the segmented words in the multiple labeled sample data to word vectors, take the sample data as training samples, and train the RNN named-entity recognition model so as to learn the parameters of the RNN named-entity recognition model.
12. The device according to claim 11, characterized in that the named-entity attribute tag of the segmented word in the text string includes information on whether the segmented word belongs to a named entity.
13. The device according to claim 12, characterized in that the named-entity attribute tag of the segmented word in the text string also includes a position tag of the segmented word within the named entity to which it belongs.
14. The device according to claim 11, characterized in that the named-entity attribute tags of segmented words in the text string include: a named-entity beginning tag, a named-entity continuation tag, and a non-named-entity tag.
15. The device according to any one of claims 11 to 14, characterized in that the named-entity attribute tag of the segmented word in the text string also includes the type of the named entity to which the segmented word belongs.
16. The device according to claim 15, characterized in that the sample-data acquisition module is also configured to obtain multiple labeled sample data from natural language text according to heuristic rules, wherein
if the natural language text contains paired title marks, the sample-data acquisition module takes the text string containing the paired title marks as a sample datum, and labels the named-entity attribute tag corresponding to each segmented word in the text string, or
if a text string in the natural language text contains a segmented word that exactly matches a predetermined title, the sample-data acquisition module takes the text string containing that segmented word as a sample datum, and labels the named-entity attribute tag corresponding to each segmented word in the text string.
17. The device according to claim 15, characterized in that the parameter learning module comprises:
an input-layer generation unit, configured to generate the input layer of the RNN named-entity recognition model from the segmented words of the text strings in the training samples,
a word-vector-layer generation unit, configured to look up, in a predefined vocabulary, the word vector corresponding to each segmented word in the input layer, and generate the word-vector layer of the RNN named-entity recognition model from these word vectors,
a hidden-layer generation unit, configured to apply a matrix mapping to the word-vector layer to obtain the hidden layer of the RNN named-entity recognition model,
an output-layer generation unit, configured to take the word vector of each segmented word as the condition and compute the probabilities of the multiple named-entity attribute tags corresponding to each segmented word under that condition, as the output layer of the RNN named-entity recognition model,
a parameter learning unit, configured to train the RNN named-entity recognition model with the multiple labeled sample data to obtain the parameters of the RNN named-entity recognition model.
18. The device according to claim 17, characterized in that the parameter learning unit is configured to obtain the conditional probabilities of the multiple named-entity attribute tags corresponding to each segmented word, establish a loss function according to these conditional probabilities, and train on the loss function with the multiple labeled sample data to obtain the parameter set of the RNN named-entity recognition model that minimizes the loss function, where the parameter set comprises the word vectors and the transformation-matrix parameters.
19. A named-entity recognition device, characterized in that the recognition device comprises:
a text-string acquisition module, configured to obtain a text string;
a text-string word-segmentation module, configured to perform word segmentation on the text string to obtain multiple segmented words;
a named-entity attribute tag acquisition module, configured to obtain, for each segmented word, the named-entity attribute tag with the maximum probability by using the RNN named-entity recognition model trained by the device according to claim 17;
a named-entity recognition module, configured to recognize named entities in the text string according to the maximum-probability named-entity attribute tags corresponding to the segmented words.
20. The device according to claim 19, characterized in that the recognition device further comprises: a named-entity type acquisition module, configured to obtain the type of the named entity according to the maximum-probability named-entity attribute tags corresponding to the segmented words.
CN201510082318.3A 2015-02-15 2015-02-15 Named-entity recognition model training method and named-entity recognition method and device Pending CN104615589A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510082318.3A CN104615589A (en) 2015-02-15 2015-02-15 Named-entity recognition model training method and named-entity recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510082318.3A CN104615589A (en) 2015-02-15 2015-02-15 Named-entity recognition model training method and named-entity recognition method and device

Publications (1)

Publication Number Publication Date
CN104615589A true CN104615589A (en) 2015-05-13

Family

ID=53150041

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510082318.3A Pending CN104615589A (en) 2015-02-15 2015-02-15 Named-entity recognition model training method and named-entity recognition method and device

Country Status (1)

Country Link
CN (1) CN104615589A (en)

Cited By (60)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104899304A (en) * 2015-06-12 2015-09-09 北京京东尚科信息技术有限公司 Named entity identification method and device
CN105183720A (en) * 2015-08-05 2015-12-23 百度在线网络技术(北京)有限公司 Machine translation method and apparatus based on RNN model
CN105320645A (en) * 2015-09-24 2016-02-10 天津海量信息技术有限公司 Recognition method for Chinese company name
CN105786782A (en) * 2016-03-25 2016-07-20 北京搜狗科技发展有限公司 Word vector training method and device
CN105808523A (en) * 2016-03-08 2016-07-27 浪潮软件股份有限公司 Method and apparatus for identifying document
CN105868184A (en) * 2016-05-10 2016-08-17 大连理工大学 Chinese name recognition method based on recurrent neural network
CN105894088A (en) * 2016-03-25 2016-08-24 苏州赫博特医疗信息科技有限公司 Medical information extraction system and method based on depth learning and distributed semantic features
CN105893354A (en) * 2016-05-03 2016-08-24 成都数联铭品科技有限公司 Word segmentation method based on bidirectional recurrent neural network
CN105930413A (en) * 2016-04-18 2016-09-07 北京百度网讯科技有限公司 Training method for similarity model parameters, search processing method and corresponding apparatuses
CN105955953A (en) * 2016-05-03 2016-09-21 成都数联铭品科技有限公司 Word segmentation system
CN105955952A (en) * 2016-05-03 2016-09-21 成都数联铭品科技有限公司 Information extraction method based on bidirectional recurrent neural network
CN105955954A (en) * 2016-05-03 2016-09-21 成都数联铭品科技有限公司 New enterprise name discovery method based on bidirectional recurrent neural network
CN105975456A (en) * 2016-05-03 2016-09-28 成都数联铭品科技有限公司 Enterprise entity name analysis and identification system
CN105975455A (en) * 2016-05-03 2016-09-28 成都数联铭品科技有限公司 information analysis system based on bidirectional recurrent neural network
CN106095749A (en) * 2016-06-03 2016-11-09 杭州量知数据科技有限公司 A kind of text key word extracting method based on degree of depth study
CN106202574A (en) * 2016-08-19 2016-12-07 清华大学 The appraisal procedure recommended towards microblog topic and device
CN106383816A (en) * 2016-09-26 2017-02-08 大连民族大学 Chinese minority region name identification method based on deep learning
CN106407183A (en) * 2016-09-28 2017-02-15 医渡云(北京)技术有限公司 Method and device for generating medical named entity recognition system
CN106557462A (en) * 2016-11-02 2017-04-05 数库(上海)科技有限公司 Name entity recognition method and system
CN106557563A (en) * 2016-11-15 2017-04-05 北京百度网讯科技有限公司 Query statement based on artificial intelligence recommends method and device
CN106570170A (en) * 2016-11-09 2017-04-19 武汉泰迪智慧科技有限公司 Text classification and naming entity recognition integrated method and system based on depth cyclic neural network
CN106708804A (en) * 2016-12-27 2017-05-24 努比亚技术有限公司 Method and device for generating word vectors
CN106776562A (en) * 2016-12-20 2017-05-31 上海智臻智能网络科技股份有限公司 A kind of keyword extracting method and extraction system
CN106815194A (en) * 2015-11-27 2017-06-09 北京国双科技有限公司 Model training method and device and keyword recognition method and device
CN106815193A (en) * 2015-11-27 2017-06-09 北京国双科技有限公司 Model training method and device and wrong word recognition methods and device
CN106844788A (en) * 2017-03-17 2017-06-13 重庆文理学院 A kind of library's intelligent search sort method and system
CN106970902A (en) * 2016-01-13 2017-07-21 北京国双科技有限公司 A kind of Chinese word cutting method and device
CN107704454A (en) * 2017-10-25 2018-02-16 古联(北京)数字传媒科技有限公司 The recognition methods of ancient books proper name and device
CN107783960A (en) * 2017-10-23 2018-03-09 百度在线网络技术(北京)有限公司 Method, apparatus and equipment for Extracting Information
CN107797987A (en) * 2017-10-12 2018-03-13 北京知道未来信息技术有限公司 A kind of mixing language material name entity recognition method based on Bi LSTM CNN
CN107818080A (en) * 2017-09-22 2018-03-20 新译信息科技(北京)有限公司 Term recognition methods and device
CN107832303A (en) * 2017-11-22 2018-03-23 古联(北京)数字传媒科技有限公司 The recognition methods of ancient books title and device
WO2018059302A1 (en) * 2016-09-29 2018-04-05 腾讯科技(深圳)有限公司 Text recognition method and device, and storage medium
CN108074565A (en) * 2016-11-11 2018-05-25 上海诺悦智能科技有限公司 Phonetic order redirects the method and system performed with detailed instructions
CN108090044A (en) * 2017-12-05 2018-05-29 五八有限公司 Contact information identification method and device
CN108205524A (en) * 2016-12-20 2018-06-26 北京京东尚科信息技术有限公司 Text data processing method and device
CN108363701A (en) * 2018-04-13 2018-08-03 达而观信息科技(上海)有限公司 Named entity recognition method and system
CN108509423A (en) * 2018-04-04 2018-09-07 福州大学 Named entity extraction method for bid-winning webpages based on second-order HMM
CN108536733A (en) * 2017-03-02 2018-09-14 埃森哲环球解决方案有限公司 Artificial intelligence digital agent
CN108595430A (en) * 2018-04-26 2018-09-28 携程旅游网络技术(上海)有限公司 Flight change information extraction method and system
CN108920460A (en) * 2018-06-26 2018-11-30 武大吉奥信息技术有限公司 Training method and device for a multi-task deep learning model for multi-type entity recognition
CN109033427A (en) * 2018-08-10 2018-12-18 北京字节跳动网络技术有限公司 Stock screening method and device, computer equipment and readable storage medium
CN109710925A (en) * 2018-12-12 2019-05-03 新华三大数据技术有限公司 Named entity recognition method and device
CN109726398A (en) * 2018-12-27 2019-05-07 北京奇安信科技有限公司 Entity recognition and attribute determination method, system, equipment and medium
CN109740150A (en) * 2018-12-20 2019-05-10 出门问问信息科技有限公司 Address resolution method, device, computer equipment and computer readable storage medium
CN110222340A (en) * 2019-06-06 2019-09-10 掌阅科技股份有限公司 Training method, electronic device and storage medium for a book character name recognition model
CN110275953A (en) * 2019-06-21 2019-09-24 四川大学 Personality classification method and device
CN110402445A (en) * 2017-04-20 2019-11-01 谷歌有限责任公司 Processing sequence data using recurrent neural networks
CN110516228A (en) * 2019-07-04 2019-11-29 湖南星汉数智科技有限公司 Named entity recognition method and device, computer device and computer-readable storage medium
CN110598210A (en) * 2019-08-29 2019-12-20 深圳市优必选科技股份有限公司 Entity recognition model training method, entity recognition device, entity recognition equipment and medium
CN110728147A (en) * 2018-06-28 2020-01-24 阿里巴巴集团控股有限公司 Model training method and named entity recognition method
CN110889287A (en) * 2019-11-08 2020-03-17 创新工场(广州)人工智能研究有限公司 Method and device for named entity recognition
CN110929875A (en) * 2019-10-12 2020-03-27 平安国际智慧城市科技股份有限公司 Intelligent language learning method, system, device and medium based on machine learning
CN111105458A (en) * 2018-10-25 2020-05-05 深圳市深蓝牙医疗科技有限公司 Oral implant positioning method, oral tissue identification model establishing method, device, equipment and storage medium
CN111191107A (en) * 2018-10-25 2020-05-22 北京嘀嘀无限科技发展有限公司 System and method for recalling points of interest using annotation model
WO2020132985A1 (en) * 2018-12-26 2020-07-02 深圳市优必选科技有限公司 Self-training method and apparatus for model, computer device, and storage medium
CN111368036A (en) * 2020-03-05 2020-07-03 百度在线网络技术(北京)有限公司 Method and apparatus for searching information
CN111523314A (en) * 2020-07-03 2020-08-11 支付宝(杭州)信息技术有限公司 Model adversarial training and named entity recognition method and device
CN111563380A (en) * 2019-01-25 2020-08-21 浙江大学 Named entity identification method and device
US11113608B2 (en) 2017-10-30 2021-09-07 Accenture Global Solutions Limited Hybrid bot framework for enterprises

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120265521A1 (en) * 2005-05-05 2012-10-18 Scott Miller Methods and systems relating to information extraction
CN101075228A (en) * 2006-05-15 2007-11-21 松下电器产业株式会社 Method and apparatus for named entity recognition in natural language
CN102314417A (en) * 2011-09-22 2012-01-11 西安电子科技大学 Method for identifying Web named entity based on statistical model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
GUOYU WANG et al.: "Using hybrid neural network to address Chinese named entity recognition", Proceedings of CCIS 2014 *

Cited By (87)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104899304A (en) * 2015-06-12 2015-09-09 北京京东尚科信息技术有限公司 Named entity identification method and device
CN104899304B (en) * 2015-06-12 2018-02-16 北京京东尚科信息技术有限公司 Named entity recognition method and device
CN105183720A (en) * 2015-08-05 2015-12-23 百度在线网络技术(北京)有限公司 Machine translation method and apparatus based on RNN model
CN105183720B (en) * 2015-08-05 2019-07-09 百度在线网络技术(北京)有限公司 Machine translation method and device based on RNN model
CN105320645B (en) * 2015-09-24 2019-07-12 天津海量信息技术股份有限公司 Chinese enterprise name recognition method
CN105320645A (en) * 2015-09-24 2016-02-10 天津海量信息技术有限公司 Recognition method for Chinese company name
CN106815194A (en) * 2015-11-27 2017-06-09 北京国双科技有限公司 Model training method and device and keyword recognition method and device
CN106815193A (en) * 2015-11-27 2017-06-09 北京国双科技有限公司 Model training method and device and wrong word recognition method and device
CN106970902A (en) * 2016-01-13 2017-07-21 北京国双科技有限公司 Chinese word segmentation method and device
CN105808523A (en) * 2016-03-08 2016-07-27 浪潮软件股份有限公司 Method and apparatus for identifying document
CN105786782A (en) * 2016-03-25 2016-07-20 北京搜狗科技发展有限公司 Word vector training method and device
CN105786782B (en) * 2016-03-25 2018-10-19 北京搜狗信息服务有限公司 Word vector training method and device
CN105894088B (en) * 2016-03-25 2018-06-29 苏州赫博特医疗信息科技有限公司 Medical information extraction system and method based on deep learning and distributed semantic features
CN105894088A (en) * 2016-03-25 2016-08-24 苏州赫博特医疗信息科技有限公司 Medical information extraction system and method based on deep learning and distributed semantic features
CN105930413A (en) * 2016-04-18 2016-09-07 北京百度网讯科技有限公司 Training method for similarity model parameters, search processing method and corresponding apparatuses
CN105955952A (en) * 2016-05-03 2016-09-21 成都数联铭品科技有限公司 Information extraction method based on bidirectional recurrent neural network
CN105893354A (en) * 2016-05-03 2016-08-24 成都数联铭品科技有限公司 Word segmentation method based on bidirectional recurrent neural network
CN105955953A (en) * 2016-05-03 2016-09-21 成都数联铭品科技有限公司 Word segmentation system
CN105955954A (en) * 2016-05-03 2016-09-21 成都数联铭品科技有限公司 New enterprise name discovery method based on bidirectional recurrent neural network
CN105975456A (en) * 2016-05-03 2016-09-28 成都数联铭品科技有限公司 Enterprise entity name analysis and identification system
CN105975455A (en) * 2016-05-03 2016-09-28 成都数联铭品科技有限公司 Information analysis system based on bidirectional recurrent neural network
CN105868184A (en) * 2016-05-10 2016-08-17 大连理工大学 Chinese name recognition method based on recurrent neural network
CN105868184B (en) * 2016-05-10 2018-06-08 大连理工大学 Chinese personal name recognition method based on recurrent neural network
CN106095749A (en) * 2016-06-03 2016-11-09 杭州量知数据科技有限公司 Text keyword extraction method based on deep learning
CN106202574A (en) * 2016-08-19 2016-12-07 清华大学 Evaluation method and device for microblog topic recommendation
CN106383816A (en) * 2016-09-26 2017-02-08 大连民族大学 Chinese minority region name identification method based on deep learning
CN106383816B (en) * 2016-09-26 2018-11-30 大连民族大学 Recognition method for place names in Chinese minority areas based on deep learning
CN106407183A (en) * 2016-09-28 2017-02-15 医渡云(北京)技术有限公司 Method and device for generating medical named entity recognition system
CN106407183B (en) * 2016-09-28 2019-06-28 医渡云(北京)技术有限公司 Medical named entity recognition system generation method and device
CN107885716B (en) * 2016-09-29 2020-02-11 腾讯科技(深圳)有限公司 Text recognition method and device
US11068655B2 (en) 2016-09-29 2021-07-20 Tencent Technology (Shenzhen) Company Limited Text recognition based on training of models at a plurality of training nodes
WO2018059302A1 (en) * 2016-09-29 2018-04-05 腾讯科技(深圳)有限公司 Text recognition method and device, and storage medium
CN107885716A (en) * 2016-09-29 2018-04-06 腾讯科技(深圳)有限公司 Text recognition method and device
CN106557462A (en) * 2016-11-02 2017-04-05 数库(上海)科技有限公司 Named entity recognition method and system
CN106570170A (en) * 2016-11-09 2017-04-19 武汉泰迪智慧科技有限公司 Integrated text classification and named entity recognition method and system based on deep recurrent neural network
CN108074565A (en) * 2016-11-11 2018-05-25 上海诺悦智能科技有限公司 Method and system for voice instruction jump execution and detailed instruction execution
CN106557563B (en) * 2016-11-15 2020-09-25 北京百度网讯科技有限公司 Query statement recommendation method and device based on artificial intelligence
CN106557563A (en) * 2016-11-15 2017-04-05 北京百度网讯科技有限公司 Query statement recommendation method and device based on artificial intelligence
CN108205524B (en) * 2016-12-20 2022-01-07 北京京东尚科信息技术有限公司 Text data processing method and device
CN108205524A (en) * 2016-12-20 2018-06-26 北京京东尚科信息技术有限公司 Text data processing method and device
CN106776562B (en) * 2016-12-20 2020-07-28 上海智臻智能网络科技股份有限公司 Keyword extraction method and extraction system
CN106776562A (en) * 2016-12-20 2017-05-31 上海智臻智能网络科技股份有限公司 Keyword extraction method and extraction system
CN106708804A (en) * 2016-12-27 2017-05-24 努比亚技术有限公司 Method and device for generating word vectors
CN108536733A (en) * 2017-03-02 2018-09-14 埃森哲环球解决方案有限公司 Artificial intelligence digital agent
CN106844788B (en) * 2017-03-17 2020-02-18 重庆文理学院 Library intelligent search sorting method and system
CN106844788A (en) * 2017-03-17 2017-06-13 重庆文理学院 Library intelligent search sorting method and system
CN110402445B (en) * 2017-04-20 2023-07-11 谷歌有限责任公司 Method and system for browsing sequence data using recurrent neural network
CN110402445A (en) * 2017-04-20 2019-11-01 谷歌有限责任公司 Processing sequence data using recurrent neural networks
CN107818080A (en) * 2017-09-22 2018-03-20 新译信息科技(北京)有限公司 Term recognition method and device
CN107797987A (en) * 2017-10-12 2018-03-13 北京知道未来信息技术有限公司 Mixed-corpus named entity recognition method based on Bi-LSTM-CNN
CN107797987B (en) * 2017-10-12 2021-02-09 北京知道未来信息技术有限公司 Bi-LSTM-CNN-based mixed corpus named entity identification method
CN107783960A (en) * 2017-10-23 2018-03-09 百度在线网络技术(北京)有限公司 Method, apparatus and device for extracting information
US11288593B2 (en) 2017-10-23 2022-03-29 Baidu Online Network Technology (Beijing) Co., Ltd. Method, apparatus and device for extracting information
CN107704454A (en) * 2017-10-25 2018-02-16 古联(北京)数字传媒科技有限公司 Method and device for recognizing proper names in ancient books
US11113608B2 (en) 2017-10-30 2021-09-07 Accenture Global Solutions Limited Hybrid bot framework for enterprises
CN107832303A (en) * 2017-11-22 2018-03-23 古联(北京)数字传媒科技有限公司 Method and device for recognizing ancient book titles
CN108090044A (en) * 2017-12-05 2018-05-29 五八有限公司 Contact information identification method and device
CN108090044B (en) * 2017-12-05 2022-03-15 五八有限公司 Contact information identification method and device
CN108509423A (en) * 2018-04-04 2018-09-07 福州大学 Named entity extraction method for bid-winning webpages based on second-order HMM
CN108363701A (en) * 2018-04-13 2018-08-03 达而观信息科技(上海)有限公司 Named entity recognition method and system
CN108595430B (en) * 2018-04-26 2022-02-22 携程旅游网络技术(上海)有限公司 Flight change information extraction method and system
CN108595430A (en) * 2018-04-26 2018-09-28 携程旅游网络技术(上海)有限公司 Flight change information extraction method and system
CN108920460A (en) * 2018-06-26 2018-11-30 武大吉奥信息技术有限公司 Training method and device for a multi-task deep learning model for multi-type entity recognition
CN108920460B (en) * 2018-06-26 2022-03-11 武大吉奥信息技术有限公司 Training method of multi-task deep learning model for multi-type entity recognition
CN110728147B (en) * 2018-06-28 2023-04-28 阿里巴巴集团控股有限公司 Model training method and named entity recognition method
CN110728147A (en) * 2018-06-28 2020-01-24 阿里巴巴集团控股有限公司 Model training method and named entity recognition method
CN109033427A (en) * 2018-08-10 2018-12-18 北京字节跳动网络技术有限公司 Stock screening method and device, computer equipment and readable storage medium
CN111191107A (en) * 2018-10-25 2020-05-22 北京嘀嘀无限科技发展有限公司 System and method for recalling points of interest using annotation model
CN111105458A (en) * 2018-10-25 2020-05-05 深圳市深蓝牙医疗科技有限公司 Oral implant positioning method, oral tissue identification model establishing method, device, equipment and storage medium
US11093531B2 (en) 2018-10-25 2021-08-17 Beijing Didi Infinity Technology And Development Co., Ltd. Systems and methods for recalling points of interest using a tagging model
CN111191107B (en) * 2018-10-25 2023-06-30 北京嘀嘀无限科技发展有限公司 System and method for recalling points of interest using annotation model
CN109710925A (en) * 2018-12-12 2019-05-03 新华三大数据技术有限公司 Named entity recognition method and device
CN109740150A (en) * 2018-12-20 2019-05-10 出门问问信息科技有限公司 Address resolution method, device, computer equipment and computer readable storage medium
WO2020132985A1 (en) * 2018-12-26 2020-07-02 深圳市优必选科技有限公司 Self-training method and apparatus for model, computer device, and storage medium
CN109726398A (en) * 2018-12-27 2019-05-07 北京奇安信科技有限公司 Entity recognition and attribute determination method, system, equipment and medium
CN109726398B (en) * 2018-12-27 2023-07-07 奇安信科技集团股份有限公司 Entity identification and attribute judgment method, system, equipment and medium
CN111563380A (en) * 2019-01-25 2020-08-21 浙江大学 Named entity identification method and device
CN110222340A (en) * 2019-06-06 2019-09-10 掌阅科技股份有限公司 Training method, electronic device and storage medium for a book character name recognition model
CN110275953A (en) * 2019-06-21 2019-09-24 四川大学 Personality classification method and device
CN110516228A (en) * 2019-07-04 2019-11-29 湖南星汉数智科技有限公司 Named entity recognition method and device, computer device and computer-readable storage medium
CN110598210B (en) * 2019-08-29 2023-08-04 深圳市优必选科技股份有限公司 Entity recognition model training, entity recognition method, entity recognition device, entity recognition equipment and medium
CN110598210A (en) * 2019-08-29 2019-12-20 深圳市优必选科技股份有限公司 Entity recognition model training method, entity recognition device, entity recognition equipment and medium
CN110929875A (en) * 2019-10-12 2020-03-27 平安国际智慧城市科技股份有限公司 Intelligent language learning method, system, device and medium based on machine learning
CN110889287A (en) * 2019-11-08 2020-03-17 创新工场(广州)人工智能研究有限公司 Method and device for named entity recognition
CN111368036A (en) * 2020-03-05 2020-07-03 百度在线网络技术(北京)有限公司 Method and apparatus for searching information
CN111368036B (en) * 2020-03-05 2023-09-26 百度在线网络技术(北京)有限公司 Method and device for searching information
CN111523314A (en) * 2020-07-03 2020-08-11 支付宝(杭州)信息技术有限公司 Model adversarial training and named entity recognition method and device

Similar Documents

Publication Publication Date Title
CN104615589A (en) Named-entity recognition model training method and named-entity recognition method and device
CN109145153B (en) Intention category identification method and device
CN107679039B (en) Method and device for determining statement intention
CN110019471B (en) Generating text from structured data
Nguyen et al. Relation extraction: Perspective from convolutional neural networks
US20230169270A1 (en) Entity linking method and apparatus
CN108962224B (en) Joint modeling method, dialogue method and system for spoken language understanding and language model
CN112711948B (en) Named entity recognition method and device for Chinese sentences
CN104615767B (en) Training method of search ranking model, search processing method and device
CN109376222B (en) Question-answer matching degree calculation method, question-answer automatic matching method and device
CN110795911B (en) Real-time adding method and device for online text labels and related equipment
US20180053107A1 (en) Aspect-based sentiment analysis
CN111160031A (en) Social media named entity identification method based on affix perception
CN111444320A (en) Text retrieval method and device, computer equipment and storage medium
CN109284397A (en) Domain lexicon construction method, device, equipment and storage medium
Chrupała Text segmentation with character-level text embeddings
CN111611452B (en) Method, system, equipment and storage medium for identifying ambiguity of search text
CN109472022B (en) New word recognition method based on machine learning and terminal equipment
CN104699797A (en) Webpage data structured analytic method and device
CN103823857A (en) Space information searching method based on natural language processing
KR20220120545A (en) Method and apparatus for obtaining PIO status information
CN113553853B (en) Named entity recognition method and device, computer equipment and storage medium
CN113761868B (en) Text processing method, text processing device, electronic equipment and readable storage medium
CN112507337A (en) Implementation method of malicious JavaScript code detection model based on semantic analysis
CN107656921A (en) Short text dependency analysis method based on deep learning

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150513