CN109710927A - Named entity recognition method and device, readable storage medium, and electronic device - Google Patents

Named entity recognition method and device, readable storage medium, and electronic device

Info

Publication number
CN109710927A
Authority
CN
China
Prior art keywords
participle
target
condition probability
segmented
probability
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811519563.6A
Other languages
Chinese (zh)
Other versions
CN109710927B (en
Inventor
贾弼然
崔朝辉
赵立军
张霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Neusoft Corp
Original Assignee
Neusoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neusoft Corp
Priority to CN201811519563.6A
Publication of CN109710927A
Application granted
Publication of CN109710927B
Legal status: Active
Anticipated expiration

Abstract

This disclosure relates to a named entity recognition method and device, a readable storage medium, and an electronic device. The method comprises: determining all possible true word segments corresponding to the $t$-th target word segment $x_t$ in a text; for each true segment, determining the first conditional probability $p(a_d \mid l_i)$ of that true segment under each segment state, where $a_d$ denotes the $d$-th true segment and $l_i$ denotes the $i$-th segment state; determining, from the second conditional probability $p(x_t \mid a_d)$ of the target segment $x_t$ given each true segment and the first conditional probability $p(a_d \mid l_i)$, the third conditional probability $p(x_t \mid l_i)$ of the target segment $x_t$ under each segment state; and performing named entity recognition on the target segment $x_t$ according to the third conditional probability $p(x_t \mid l_i)$. This improves the accuracy and recall of named entity recognition and effectively avoids extra, missing, or wrong characters arising during text recognition.

Description

Named entity recognition method and device, readable storage medium, and electronic device
Technical field
This disclosure relates to the field of natural language processing, and in particular to a named entity recognition method and device, a readable storage medium, and an electronic device.
Background
With the spread of artificial intelligence applications, natural language processing has received growing attention and adoption. In natural language processing pipelines, named entity recognition is an important early step: identifying entities such as times, numbers, personal names, place names, and organization names in text matters in many research fields. At present, named entity recognition mostly uses the hidden Markov model (HMM), but problems often arise during recognition. For example, a transliterated entity in open-domain text may have many different translated forms, which causes considerable ambiguity and a high error rate during recognition. Likewise, annotated or translated text obtained from low-quality corpora may contain extra, missing, or wrong characters. The existing HMM therefore cannot accurately recognize named entities in text.
Summary of the invention
To overcome the problems in the prior art, embodiments of the present disclosure provide a named entity recognition method and device, a readable storage medium, and an electronic device.
To achieve the above object, a first aspect of the disclosure provides a named entity recognition method, comprising:
determining all possible true segments corresponding to the $t$-th target segment $x_t$ in a text;
for each true segment, determining the first conditional probability $p(a_d \mid l_i)$ of that true segment under each segment state, where $a_d$ denotes the $d$-th true segment and $l_i$ denotes the $i$-th segment state;
determining, according to the second conditional probability $p(x_t \mid a_d)$ of the target segment $x_t$ given each true segment and the first conditional probability $p(a_d \mid l_i)$, the third conditional probability $p(x_t \mid l_i)$ of the target segment $x_t$ under each segment state;
performing named entity recognition on the target segment $x_t$ according to the third conditional probability $p(x_t \mid l_i)$.
Optionally, determining, for each true segment, the first conditional probability $p(a_d \mid l_i)$ of that true segment under each segment state comprises:
for each true segment, determining the fourth conditional probability $p(a_d \mid x_t)$ of that true segment given the target segment $x_t$;
estimating, according to the fourth conditional probability $p(a_d \mid x_t)$ of each true segment given the target segment $x_t$, the first conditional probability $p(a_d \mid l_i)$ of each true segment under each segment state.
Optionally, estimating the first conditional probability $p(a_d \mid l_i)$ according to the fourth conditional probability $p(a_d \mid x_t)$ comprises:
according to formula (1) and formula (2), determining the values for which $d(z_t, y_i)$ satisfies a preset condition as the first conditional probabilities of the true segments under each segment state:
[Formulas (1) and (2) are images in the original publication and are not reproduced in this text.]
where $D$ denotes the total number of true segments; $z_t = (p(a_1 \mid x_t), \ldots, p(a_D \mid x_t))$ is the vector of fourth conditional probabilities of the true segments given the target segment $x_t$, its $d$-th component being the fourth conditional probability of the $d$-th true segment; $y_i = (p(a_1 \mid l_i), \ldots, p(a_D \mid l_i))$ is the vector of first conditional probabilities of the true segments under the $i$-th segment state, its $d$-th component being the first conditional probability of the $d$-th true segment; and $d(z_t, y_i)$ denotes the relative entropy of $z_t$ and $y_i$.
Optionally, the preset condition is that a loss function (an image in the original publication, not reproduced in this text) is minimized, where $T_i$ denotes the total number of target segments belonging to the $i$-th segment state $l_i$, $L$ denotes the total number of segment states, and an indicator records whether the $i$-th segment state and the target segment $x_t$ are related: 1 if they are related, 0 otherwise.
Optionally, determining the third conditional probability $p(x_t \mid l_i)$ of the target segment $x_t$ under each segment state, according to the second conditional probability $p(x_t \mid a_d)$ and the first conditional probability $p(a_d \mid l_i)$, comprises:
determining the third conditional probability $p(x_t \mid l_i)$ according to formula (3):
$$p(x_t \mid l_i) = \sum_{d=1}^{D} p(x_t \mid a_d)\, p(a_d \mid l_i) \qquad (3)$$
where $D$ denotes the total number of true segments.
Optionally, performing named entity recognition on the target segment $x_t$ according to the third conditional probability $p(x_t \mid l_i)$ comprises:
determining the segment state corresponding to the largest third conditional probability as the named entity recognition result for the target segment $x_t$.
A second aspect of the disclosure provides a named entity recognition device, comprising:
a first determining module, configured to determine all possible true segments corresponding to the $t$-th target segment $x_t$ in a text;
a second determining module, configured to determine, for each true segment determined by the first determining module, the first conditional probability $p(a_d \mid l_i)$ of that true segment under each segment state, where $a_d$ denotes the $d$-th true segment and $l_i$ denotes the $i$-th segment state;
a third determining module, configured to determine, according to the second conditional probability $p(x_t \mid a_d)$ of the target segment $x_t$ given each true segment and the first conditional probability $p(a_d \mid l_i)$ determined by the second determining module, the third conditional probability $p(x_t \mid l_i)$ of the target segment $x_t$ under each segment state;
an identification module, configured to perform named entity recognition on the target segment $x_t$ according to the third conditional probability $p(x_t \mid l_i)$ determined by the third determining module.
Optionally, the second determining module includes:
a first determining submodule, configured to determine, for each true segment, the fourth conditional probability $p(a_d \mid x_t)$ of that true segment given the target segment $x_t$;
an estimation submodule, configured to estimate, according to the fourth conditional probability $p(a_d \mid x_t)$ determined by the first determining submodule, the first conditional probability $p(a_d \mid l_i)$ of each true segment under each segment state.
Optionally, the estimation submodule includes:
a second determining submodule, configured to determine, according to formula (1) and formula (2), the values for which $d(z_t, y_i)$ satisfies a preset condition as the first conditional probabilities of the true segments under each segment state:
[Formulas (1) and (2) are images in the original publication and are not reproduced in this text.]
where $D$ denotes the total number of true segments; $z_t = (p(a_1 \mid x_t), \ldots, p(a_D \mid x_t))$ is the vector of fourth conditional probabilities of the true segments given the target segment $x_t$; $y_i = (p(a_1 \mid l_i), \ldots, p(a_D \mid l_i))$ is the vector of first conditional probabilities of the true segments under the $i$-th segment state; and $d(z_t, y_i)$ denotes the relative entropy of $z_t$ and $y_i$.
Optionally, the preset condition is that a loss function (an image in the original publication, not reproduced in this text) is minimized, where $T_i$ denotes the total number of target segments belonging to the $i$-th segment state $l_i$, $L$ denotes the total number of segment states, and an indicator records whether the $i$-th segment state and the target segment $x_t$ are related: 1 if they are related, 0 otherwise.
Optionally, the third determining module includes:
a third determining submodule, configured to determine the third conditional probability $p(x_t \mid l_i)$ of the target segment $x_t$ under each segment state according to formula (3):
$$p(x_t \mid l_i) = \sum_{d=1}^{D} p(x_t \mid a_d)\, p(a_d \mid l_i) \qquad (3)$$
where $D$ denotes the total number of true segments.
Optionally, the identification module includes:
a fourth determining submodule, configured to determine the segment state corresponding to the largest third conditional probability as the named entity recognition result for the target segment $x_t$.
A third aspect of the disclosure further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the method provided by the first aspect of the disclosure.
A fourth aspect of the disclosure further provides an electronic device, comprising:
a memory storing a computer program;
a processor, configured to execute the computer program in the memory to implement the steps of the method provided by the first aspect of the disclosure.
With the above technical solution, named entity recognition on a target segment takes into account all possible true segments corresponding to the target segment, the first conditional probability of each true segment under each segment state, and the second conditional probability of the target segment given each true segment. The resulting third conditional probability of the target segment under each segment state therefore characterizes the relationship among the target segment, the true segments, and the segment states. This improves the accuracy and recall of named entity recognition and effectively avoids extra, missing, or wrong characters during text recognition.
Other features and advantages of the disclosure are described in detail in the detailed description below.
Brief description of the drawings
The accompanying drawings provide a further understanding of the disclosure and form part of the specification. Together with the detailed description below they serve to explain the disclosure, but they do not limit it. In the drawings:
Fig. 1 is a flow chart of a named entity recognition method according to an exemplary embodiment.
Fig. 2 is a flow chart of a named entity recognition method according to another exemplary embodiment.
Fig. 3 is a block diagram of a named entity recognition device according to an exemplary embodiment.
Fig. 4 is a block diagram of an electronic device according to an exemplary embodiment.
Detailed description
Specific embodiments of the disclosure are described in detail below with reference to the drawings. The specific embodiments described here serve only to describe and explain the disclosure and do not limit it.
First, the HMM is described.
An HMM consists of five parts:
(1) The number of segment states $L$ in the model, that is, the size of the role-tagging state set.
(2) The number $T_i$ of distinct symbols (also called segments) that each segment state may output, that is, the total number of segments a role-tagging state may emit.
(3) The state transition probability matrix $A = \{a_{ij}\}$, that is, the matrix of transition probabilities between all role-tagging states:
$$a_{ij} = P(l_j \mid l_i), \quad 1 \le i, j \le L$$
$$a_{ij} \ge 0$$
where $a_{ij}$ is the probability that the role-tagging state moves from state $l_i$ to state $l_j$, $i$ indexes the $i$-th role-tagging state, and $j$ indexes the $j$-th role-tagging state.
(4) The probability distribution matrix $B = \{b_i(t)\}$ of segment $x_t$ occurring in state $l_i$, also called the emission probability matrix, which characterizes the relationship between states and segments:
$$b_i(t) = P(x_t \mid l_i), \quad 1 \le i \le L,\ 1 \le t \le T_i$$
$$b_i(t) \ge 0$$
where $t$ indexes the $t$-th segment and $T_i$ denotes the total number of segments belonging to the $i$-th role-tagging state $l_i$.
(5) The initial state probability distribution $\pi = \{\pi_i\}$, that is, the probability that a segment initially has each role-tagging state:
$$\pi_i = P(l_i), \quad 1 \le i \le L$$
$$\pi_i \ge 0$$
In summary, the HMM can be written as the five-tuple $\mu = (l, t, A, B, \pi)$.
When the HMM is used for named entity recognition, the text is first input into the HMM and coarsely segmented. The coarse segmentation result is then compared with the training corpus to perform role tagging, and the values required by the Viterbi algorithm, namely the values of the parameters in the HMM five-tuple, are obtained by counting. The Viterbi algorithm then recognizes the text on the basis of these values. That is, the probability distribution matrix of segment $x_t$ occurring in state $l_i$ is computed, where $1 \le i \le L$ and $1 \le t \le M$, and the text is recognized according to this matrix. Illustratively, following existing role-tagging sets, the state $l_i$ may be a role such as surname, first character of a two-character given name, or the context preceding a name. There is no absolute standard for the role-tagging set; it has to be adjusted using earlier researchers' summaries and expert knowledge.
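The decoding step described above can be sketched as follows. This is a minimal illustration of the standard Viterbi algorithm, not the patent's implementation; the state names, the toy probabilities, and the function name `viterbi` are all assumptions made for the example.

```python
def viterbi(observations, states, pi, A, B):
    """Return the most likely state sequence for a list of segments.

    pi[s] is the initial probability of state s, A[p][s] the transition
    probability from p to s, and B[s][o] the emission probability of
    segment o in state s (missing entries are treated as 0).
    """
    # best[t][s] = (probability of the best path ending in s at step t, previous state)
    best = [{s: (pi[s] * B[s].get(observations[0], 0.0), None) for s in states}]
    for obs in observations[1:]:
        row = {}
        for s in states:
            row[s] = max(
                (best[-1][p][0] * A[p][s] * B[s].get(obs, 0.0), p) for p in states
            )
        best.append(row)
    # Backtrack from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for row in reversed(best[1:]):
        path.append(row[path[-1]][1])
    return list(reversed(path))


# Hypothetical two-state model (all numbers are made up for illustration).
states = ["surname", "given_name"]
pi = {"surname": 0.9, "given_name": 0.1}
A = {
    "surname": {"surname": 0.1, "given_name": 0.9},
    "given_name": {"surname": 0.5, "given_name": 0.5},
}
B = {
    "surname": {"贝": 0.8, "雅翠丝": 0.05},
    "given_name": {"贝": 0.1, "雅翠丝": 0.7},
}
tags = viterbi(["贝", "雅翠丝"], states, pi, A, B)
```

A production decoder would normally work with log-probabilities to avoid underflow on long texts; plain products are used here only to keep the sketch short.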
In summary, the accuracy of named entity recognition in text depends on the accuracy of the probability distribution matrix of segment $x_t$ occurring in state $l_i$. To improve recognition accuracy, this matrix must therefore be computed accurately in the HMM.
Next, the named entity recognition method provided by the disclosure is described. Referring to Fig. 1, Fig. 1 is a flow chart of a named entity recognition method according to an exemplary embodiment. As shown in Fig. 1, the method may include the following steps.
In step 11, all possible true segments corresponding to the $t$-th target segment $x_t$ in the text are determined.
Segmenting the text yields multiple target segments, and for each target segment all possible true segments corresponding to it are determined. A target segment may be a single character or a word made up of several characters; the disclosure does not specifically limit this.
For ease of description, the disclosure takes a single target segment as an example. Illustratively, as described in step 11, for the $t$-th target segment $x_t$ in the text, all possible true segments corresponding to $x_t$ are determined. Specifically, when the target segment $x_t$ is known, historical text recognition results can be used to count all the true segments actually encountered when recognizing $x_t$.
For example, suppose the text is "Beatrice", the name of a singer. The correct transliteration is "贝雅翠丝", but the name is sometimes rendered as "比阿特丽斯". In that case the target segments obtained are "比" and "阿特丽斯". For the target segment "比", the statistics give the corresponding true segments "贝" and "比"; for the target segment "阿特丽斯", they give the true segments "雅翠丝" and "阿特丽斯".
As another example, if the text is "Beijing Union Hospital mind section", the target segment is "Beijing", "Union Hospital", or "mind section", and the statistics give the corresponding true segment as "Beijing", "Union Hospital", or "neurology department".
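Step 11 can be sketched as a lookup built from historical recognition results. The sketch below assumes the history is available as (observed segment, true segment) pairs; that data layout, the sample pairs, and the function name `candidate_true_segments` are hypothetical and not taken from the patent.

```python
from collections import defaultdict


def candidate_true_segments(history):
    """Map each observed target segment to the set of true segments seen for it."""
    candidates = defaultdict(set)
    for observed, true in history:
        candidates[observed].add(true)
    return candidates


# Hypothetical history of (observed, true) pairs.
history = [
    ("比", "贝"),
    ("比", "比"),
    ("阿特丽斯", "雅翠丝"),
    ("阿特丽斯", "阿特丽斯"),
]
candidates = candidate_true_segments(history)
```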
In step 12, for each true segment, the first conditional probability $p(a_d \mid l_i)$ of that true segment under each segment state is determined.
Here $a_d$ denotes the $d$-th true segment and $l_i$ denotes the $i$-th segment state. The segment states are determined from the People's Daily annotated corpus and stored in the HMM in advance.
In step 13, the third conditional probability $p(x_t \mid l_i)$ of the target segment $x_t$ under each segment state is determined according to the second conditional probability $p(x_t \mid a_d)$ of the target segment given each true segment and the first conditional probability $p(a_d \mid l_i)$.
The second conditional probability $p(x_t \mid a_d)$ characterizes the relationship between the target segment and a true segment, and can be determined from historical text recognition results. Specifically, as described above, once all possible true segments corresponding to the target segment $x_t$ are known, the probability of $x_t$ occurring given each true segment is computed as the second conditional probability $p(x_t \mid a_d)$. The third conditional probability $p(x_t \mid l_i)$ is then determined from this second conditional probability and the first conditional probability $p(a_d \mid l_i)$ determined in step 12.
The second conditional probability can be expressed as $p(x_t \mid a_d) = w(x_t, a_d) / w(a_d)$, where $w(x_t, a_d)$ is the number of times the target segment $x_t$ occurs when the true segment $a_d$ occurs, and $w(a_d)$ is the number of times the true segment $a_d$ occurs.
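The count ratio for the second conditional probability can be sketched as below. The sketch assumes the historical recognition results are stored as (observed segment, true segment) pairs; the sample data and the function name `second_conditional` are illustrative assumptions.

```python
from collections import Counter


def second_conditional(history):
    """Estimate p(x_t | a_d) = w(x_t, a_d) / w(a_d) from (observed, true) pairs."""
    pair_counts = Counter(history)                      # w(x_t, a_d)
    true_counts = Counter(true for _, true in history)  # w(a_d)
    return {
        (observed, true): n / true_counts[true]
        for (observed, true), n in pair_counts.items()
    }


# Hypothetical history: "贝" was written as "比" once and correctly twice.
history = [("比", "贝"), ("贝", "贝"), ("贝", "贝"), ("比", "比")]
p_second = second_conditional(history)
```

Keys are (observed, true) pairs, so `p_second[("比", "贝")]` is the estimated probability of seeing "比" when the true segment is "贝".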
In step 14, named entity recognition is performed on the target segment $x_t$ according to the third conditional probability $p(x_t \mid l_i)$.
After the third conditional probability $p(x_t \mid l_i)$ has been determined, named entity recognition is performed on the target segment $x_t$ according to it.
Note that steps 11 to 14 can be carried out for every target segment in the text, so that named entity recognition is performed on each target segment in the text.
With the above technical solution, named entity recognition on a target segment takes into account all possible true segments corresponding to the target segment, the first conditional probability of each true segment under each segment state, and the second conditional probability of the target segment given each true segment. The resulting third conditional probability of the target segment under each segment state therefore characterizes the relationship among the target segment, the true segments, and the segment states. This improves the accuracy and recall of named entity recognition and effectively avoids extra, missing, or wrong characters during text recognition.
After the first conditional probability $p(a_d \mid l_i)$ of each true segment under each segment state has been determined, the third conditional probability $p(x_t \mid l_i)$ of the target segment $x_t$ under each segment state can be determined by the law of total probability. Specifically, step 13 may be implemented by determining the third conditional probability according to formula (3):
$$p(x_t \mid l_i) = \sum_{d=1}^{D} p(x_t \mid a_d)\, p(a_d \mid l_i) \qquad (3)$$
where $D$ denotes the total number of true segments.
The third conditional probability determined in this way characterizes the relationship among the target segment, the true segments, and the segment states. This improves the accuracy and recall of named entity recognition and effectively avoids extra, missing, or wrong characters during text recognition.
In addition, after the third conditional probability $p(x_t \mid l_i)$ has been determined, one possible embodiment performs named entity recognition on the target segment $x_t$ according to it. In another, preferred embodiment, to further improve the accuracy and recall of named entity recognition, step 14 may be implemented by determining the segment state corresponding to the largest third conditional probability as the named entity recognition result for the target segment $x_t$.
Specifically, the third conditional probability comprises one probability of the target segment $x_t$ for each of the segment states. Because the probability of $x_t$ occurring generally differs from state to state, these probabilities differ, and the target segment is most likely to occur under the state with the largest third conditional probability. The disclosure therefore takes that state as the named entity recognition result for $x_t$, further improving recognition accuracy.
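Selecting the state with the largest third conditional probability is a plain argmax. A minimal sketch, with hypothetical state names and probability values:

```python
def recognize(third_conditional):
    """Return the segment state whose third conditional probability is largest."""
    return max(third_conditional, key=third_conditional.get)


# Hypothetical values of p(x_t | l_i) for one target segment.
scores = {"surname": 0.42, "given_name": 0.05, "other": 0.01}
result = recognize(scores)
```

If two states tie, `max` returns the first one encountered in the dictionary; the source does not say how ties should be broken.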
Illustratively, for the text "Beatrice", suppose the target segment obtained from the translation is "比", whose corresponding true segments are "贝" and "比". In the prior art, the true segment "贝" is not input into the HMM, and "比" never has the state "surname" in the historical data, so "比" cannot be recognized as a surname. Similarly, for the target segment "阿特丽斯", the true segment "雅翠丝" is not input into the HMM and "阿特丽斯" never has the state "given name" in the historical data, so "阿特丽斯" cannot be recognized as a given name. In the prior art, therefore, if the input text is "比阿特丽斯", it cannot be recognized as a personal name.
In the present solution, the true segment "贝" is input into the HMM, so the possible true segments encountered when recognizing the target segment "比" are "贝" and "比". The first conditional probabilities of these true segments under the state "surname" are $p(\text{贝} \mid \text{surname})$ and $p(\text{比} \mid \text{surname})$, and the second conditional probabilities of the target segment given each true segment are $p(\text{比} \mid \text{贝})$ and $p(\text{比} \mid \text{比})$. The third conditional probability of the target segment "比" under the state "surname" is therefore $p(\text{比} \mid \text{贝})\,p(\text{贝} \mid \text{surname}) + p(\text{比} \mid \text{比})\,p(\text{比} \mid \text{surname})$. As noted above, "比" never has the state "surname" in the historical data, so the first conditional probability $p(\text{比} \mid \text{surname})$ may be zero, whereas "贝" as a surname reflects the true situation and $p(\text{贝} \mid \text{surname})$ is comparatively large. Adding the term $p(\text{比} \mid \text{贝})\,p(\text{贝} \mid \text{surname})$ to $p(\text{比} \mid \text{比})\,p(\text{比} \mid \text{surname})$ therefore raises the third conditional probability and greatly increases the likelihood that "比" is recognized as a surname. By the same reasoning, the third conditional probability of "阿特丽斯" under the state "given name" also increases, which greatly increases the likelihood that "阿特丽斯" is recognized as a given name. In summary, if the input text is "比阿特丽斯", the transliteration can be recognized as a personal name.
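The sum over true segments in this example can be sketched in a few lines. The probability values below are invented for illustration (the source gives none), and the function name `third_conditional` is an assumption.

```python
def third_conditional(second, first):
    """Compute p(x_t | l_i) as the sum over true segments a_d of
    p(x_t | a_d) * p(a_d | l_i).

    second[a] holds p(x_t | a) for the fixed target segment x_t;
    first[a] holds p(a | l_i) for the fixed state l_i.
    """
    return sum(second[a] * first.get(a, 0.0) for a in second)


# Target segment "比", state "surname"; all numbers are hypothetical.
second = {"贝": 0.3, "比": 1.0}         # p(比 | 贝), p(比 | 比)
first_surname = {"贝": 0.6, "比": 0.0}  # p(贝 | surname), p(比 | surname)

with_true_segment = third_conditional(second, first_surname)
without_true_segment = third_conditional({"比": 1.0}, first_surname)
```

With these made-up numbers the score for "surname" is zero when only "比" is considered and becomes positive once the true segment "贝" contributes its term, which is the effect the paragraph above describes.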
In summary, compared with the prior art, the disclosure uses the true segments to adjust the probability of the target segment under each segment state, which further improves the accuracy of named entity recognition for the target segment.
Because the segment states in the HMM are hidden parameters, the first conditional probability $p(a_d \mid l_i)$ of each true segment under each segment state cannot be determined directly from historical text recognition results. In the disclosure, it is therefore estimated from the fourth conditional probability, which is determined from historical text recognition results. Specifically, as shown in Fig. 2, step 12 may include the following steps.
In step 121, for each true segment, the fourth conditional probability $p(a_d \mid x_t)$ of that true segment given the target segment $x_t$ is determined.
In the disclosure, the fourth conditional probability $p(a_d \mid x_t)$ may be called the posterior probability, and it can be obtained by counting. As described above, when the target segment $x_t$ is known, all true segments corresponding to $x_t$ can be found in the historical text recognition results, and from them the fourth conditional probability of each true segment given $x_t$ can be determined.
The fourth conditional probability can be expressed as $p(a_d \mid x_t) = w(a_d, x_t) / w(x_t)$, where $w(a_d, x_t)$ is the number of times the true segment $a_d$ occurs when the target segment $x_t$ occurs, and $w(x_t)$ is the number of times the target segment $x_t$ occurs.
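The posterior for one target segment can be collected into a vector over an ordered list of true segments, which is the form the relative-entropy step works with. The sketch below assumes the same hypothetical (observed, true) pair layout for the history; `posterior_vector` is an illustrative name.

```python
from collections import Counter


def posterior_vector(history, target, true_segments):
    """Return [p(a_1 | x_t), ..., p(a_D | x_t)] with p(a_d | x_t) = w(a_d, x_t) / w(x_t).

    Assumes the target segment occurs at least once in the history.
    """
    pair_counts = Counter(history)                                # w(a_d, x_t)
    target_count = sum(1 for obs, _ in history if obs == target)  # w(x_t)
    return [pair_counts[(target, a)] / target_count for a in true_segments]


# Hypothetical history: "比" stood for "贝" three times and for "比" once.
history = [("比", "贝"), ("比", "贝"), ("比", "贝"), ("比", "比")]
z_t = posterior_vector(history, "比", ["贝", "比"])
```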
In step 122, the first conditional probability $p(a_d \mid l_i)$ of each true segment under each segment state is estimated according to the fourth conditional probability $p(a_d \mid x_t)$ of each true segment given the target segment $x_t$.
Because the fourth conditional probability characterizes the relationship between the true segments and the target segment and is determined from historical text recognition results, it is comparatively accurate. On this basis, the disclosure can accurately estimate the first conditional probability $p(a_d \mid l_i)$ of each true segment under each segment state.
Illustratively, the KL divergence (Kullback-Leibler divergence), also called relative entropy, measures the difference between two probability distributions over the same event space. The relative entropy formula can therefore be used to determine the first conditional probability $p(a_d \mid l_i)$ closest to the fourth conditional probability.
Specifically, step 122 may be implemented as follows: according to formula (1) and formula (2), the values for which $d(z_t, y_i)$ satisfies a preset condition are determined as the first conditional probabilities of the true segments under each segment state. [Formulas (1) and (2) are images in the original publication and are not reproduced in this text.]
Here $D$ denotes the total number of true segments; $z_t = (p(a_1 \mid x_t), \ldots, p(a_D \mid x_t))$ is the vector of fourth conditional probabilities of the true segments given the target segment $x_t$, its $d$-th component being the fourth conditional probability of the $d$-th true segment; $y_i = (p(a_1 \mid l_i), \ldots, p(a_D \mid l_i))$ is the vector of first conditional probabilities of the true segments under the $i$-th segment state, its $d$-th component being the first conditional probability of the $d$-th true segment; and $d(z_t, y_i)$ denotes the relative entropy of $z_t$ and $y_i$.
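Formulas (1) and (2) do not survive in this text. A plausible reading, offered only as an assumption, is that formula (1) is the standard relative entropy of $z_t$ with respect to $y_i$ and that formula (2) is the normalization constraint on the first conditional probabilities; neither form is confirmed by the source.

$$d(z_t, y_i) = \sum_{d=1}^{D} p(a_d \mid x_t)\,\log\frac{p(a_d \mid x_t)}{p(a_d \mid l_i)} \qquad \text{(assumed form of (1))}$$

$$\sum_{d=1}^{D} p(a_d \mid l_i) = 1 \qquad \text{(assumed form of (2))}$$

Under this reading, $d(z_t, y_i) \ge 0$ with equality only when the two distributions coincide, so a small value means the first conditional probabilities for state $l_i$ are close to the observed posterior for $x_t$.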
Each true segment determined above is an independent unit, but it depends on its context. Moreover, the true segments that occur under each segment state in the HMM are not fixed; each segment state can produce several possibilities. In the disclosure, the first conditional probability $p(a_d \mid l_i)$ must therefore also satisfy the following condition:
$$0 \le P(a_d \mid l_i) \le 1$$
In one possible embodiment, the preset condition is an acceptable difference between the first conditional probability distribution and the fourth conditional probability distribution. This difference may be a default value or a value set by the user, and in either case it is greater than zero.
In practice, many similar and differing words occur, and when the text is segmented and tagged with segment states, each segment state may include several target segments. To improve recognition accuracy for named entities across the whole text, another preferred embodiment takes as the preset condition that a loss function is minimized. [The loss function, referred to below as formula (4), is an image in the original publication and is not reproduced in this text.] Here $T_i$ denotes the total number of target segments belonging to the $i$-th segment state $l_i$, $L$ denotes the total number of segment states, and an indicator records whether the $i$-th segment state and the target segment $x_t$ are related: 1 if they are related, 0 otherwise.
In this way, the problem of finding the y_i^d for which d(z_t, y_i) meets the preset condition is converted into the problem of solving formula (4), that is, into an optimization problem. Formula (4) is then solved together with formula (2) above to obtain the optimal solution.
Solving formula (4) gives formula (5).
As described above, the indicator is 1 if the state l_i and the observation x_t are related; otherwise the pair does not take part in the calculation. Formula (6) therefore follows,
where T_i denotes the total number of target segmented words belonging to the i-th segmentation state l_i.
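The formula images for (4) to (6) did not survive in this text, so the following is only a hedged reconstruction, not the formulas of the disclosure. It assumes that the loss sums the relative entropies d(z_t, y_i) weighted by a 0/1 indicator (written here with the hypothetical symbol \delta_{it}), that each y_i is a probability vector, and that T_i = \sum_t \delta_{it}. Under those assumptions, a Lagrange multiplier argument gives the arithmetic mean of the related z_t as the minimizer:

```latex
% Assumed loss (hypothetical reconstruction), with constraint \sum_{d=1}^{D} y_i^d = 1:
J(y_1,\dots,y_L) = \sum_{i=1}^{L} \sum_{t} \delta_{it} \sum_{d=1}^{D} z_t^d \log\frac{z_t^d}{y_i^d}

% Lagrangian for state l_i with multiplier \lambda_i:
\mathcal{L}_i = \sum_{t} \delta_{it} \sum_{d=1}^{D} z_t^d \log\frac{z_t^d}{y_i^d}
              + \lambda_i \Bigl( \sum_{d=1}^{D} y_i^d - 1 \Bigr)

% Stationarity in y_i^d:
\frac{\partial \mathcal{L}_i}{\partial y_i^d}
  = -\frac{1}{y_i^d} \sum_{t} \delta_{it}\, z_t^d + \lambda_i = 0
  \;\Longrightarrow\;
  y_i^d = \frac{1}{\lambda_i} \sum_{t} \delta_{it}\, z_t^d

% Summing over d and using \sum_d z_t^d = 1 gives \lambda_i = T_i, hence:
y_i^d = \frac{1}{T_i} \sum_{t} \delta_{it}\, z_t^d
```

Because the assumed loss is convex in each y_i, this stationary point is the minimizer, which is consistent with the statement that T_i appears in the solution.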
Furthermore, a theorem can be used to verify the obtained solution and thereby confirm that formula (5) or formula (6) is the optimal solution of formula (4). Proving by theorem that the obtained solution is the optimal solution of formula (4) belongs to the prior art and is not repeated here.
With the above technical solution, the relative entropy formula turns the problem of solving the first conditional probability p(a_d | l_i) into the optimization of a convex function. A theorem shows that this optimization problem has a strict local minimum point, and the solution found at that point is the first conditional probability p(a_d | l_i).
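The convex optimization described above can be checked numerically. The sketch below assumes that, for a single segmentation state, the loss is the sum of the relative entropies d(z_t, y_i) over the target segmented words related to that state; under that assumption the minimizer is the arithmetic mean of the z_t vectors. All function names and the randomly generated data are hypothetical.

```python
import math
import random


def kl(z, y):
    """Relative entropy d(z, y) for strictly positive y."""
    return sum(z_d * math.log(z_d / y_d) for z_d, y_d in zip(z, y) if z_d > 0.0)


def state_loss(zs, y):
    """Assumed loss for one state: sum of d(z_t, y) over its related words."""
    return sum(kl(z, y) for z in zs)


def mean_distribution(zs):
    """Component-wise arithmetic mean of the probability vectors in zs."""
    count = len(zs)
    return [sum(z[d] for z in zs) / count for d in range(len(zs[0]))]


def random_distribution(dim, rng):
    """Random strictly positive probability vector of length dim."""
    weights = [rng.random() + 1e-3 for _ in range(dim)]
    total = sum(weights)
    return [w / total for w in weights]


rng = random.Random(0)
# Five hypothetical target segmented words related to one state, D = 4.
zs = [random_distribution(4, rng) for _ in range(5)]
y_star = mean_distribution(zs)
best = state_loss(zs, y_star)

# No randomly drawn candidate should achieve a lower loss than the mean.
mean_is_best = all(
    state_loss(zs, random_distribution(4, rng)) >= best - 1e-12
    for _ in range(1000)
)
print(best, mean_is_best)
```

A random search is not a proof, but it gives a quick sanity check on the closed-form solution before it is used as p(a_d | l_i).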
Based on the same inventive concept, the present disclosure also provides a named entity recognition apparatus. Fig. 3 is a block diagram of a named entity recognition apparatus according to an exemplary embodiment. As shown in Fig. 3, the apparatus may include:
a first determining module 31, configured to determine all possible true segmented words corresponding to the t-th target segmented word x_t in a text;
a second determining module 32, configured to determine, for each true segmented word determined by the first determining module, the first conditional probability p(a_d | l_i) that each segmentation state corresponds to that true segmented word, where a_d denotes the d-th true segmented word and l_i denotes the i-th segmentation state;
a third determining module 33, configured to determine the third conditional probability p(x_t | l_i) that each segmentation state corresponds to the target segmented word x_t, according to the first conditional probability p(a_d | l_i) and the second conditional probability p(x_t | a_d) that each true segmented word determined by the second determining module corresponds to the target segmented word x_t;
an identification module 34, configured to perform named entity recognition on the target segmented word x_t according to the third conditional probability p(x_t | l_i) determined by the third determining module.
Optionally, the second determining module includes:
a first determining submodule, configured to determine, for each true segmented word, the fourth conditional probability p(a_d | x_t) that the target segmented word x_t corresponds to that true segmented word;
an estimating submodule, configured to estimate the first conditional probability p(a_d | l_i) that each segmentation state corresponds to each true segmented word, according to the fourth conditional probability p(a_d | x_t), determined by the first determining submodule, that the target segmented word x_t corresponds to each true segmented word.
Optionally, the estimating submodule includes:
a second determining submodule, configured to determine, according to formulas (1) and (2) above, the y_i^d for which d(z_t, y_i) meets the preset condition as the first conditional probability that each segmentation state corresponds to each true segmented word.
Optionally, the preset condition is that the loss function is minimized, where T_i denotes the total number of target segmented words belonging to the i-th segmentation state l_i, L denotes the total number of segmentation states, and the indicator in the loss function denotes whether the i-th segmentation state is related to the target segmented word x_t: it is 1 if they are related and 0 otherwise.
Optionally, the third determining module includes:
a third determining submodule, configured to determine, according to formula (3) above, the third conditional probability p(x_t | l_i) that each segmentation state corresponds to the target segmented word x_t.
Optionally, the identification module includes:
a fourth determining submodule, configured to determine the segmentation state corresponding to the maximum third conditional probability as the named entity recognition result of the target segmented word x_t.
For the apparatus in the above embodiment, the specific manner in which each module operates has been described in detail in the method embodiments and is not elaborated here.
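The data flow through the third determining module and the identification module can be sketched as follows. This is a minimal sketch, assuming that formula (3) is the sum over the true segmented words of p(x_t | a_d) p(a_d | l_i); the function names, the state labels and the probability values are all hypothetical.

```python
def third_conditional(p_x_given_a, p_a_given_l):
    """Return p(x_t | l_i) for every segmentation state l_i.

    p_x_given_a: list of length D holding p(x_t | a_d).
    p_a_given_l: list of L rows, each of length D, holding p(a_d | l_i).
    Assumes p(x_t | l_i) = sum_d p(x_t | a_d) * p(a_d | l_i).
    """
    return [
        sum(p_x * p_a for p_x, p_a in zip(p_x_given_a, row))
        for row in p_a_given_l
    ]


def recognise(p_x_given_a, p_a_given_l, state_names):
    """Pick the state with the maximum third conditional probability."""
    scores = third_conditional(p_x_given_a, p_a_given_l)
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return state_names[best_index], scores


# Hypothetical example with D = 2 true segmented words and L = 3 states.
state_names = ["PERSON", "LOCATION", "OTHER"]
p_x_given_a = [0.9, 0.1]
p_a_given_l = [
    [0.8, 0.2],  # p(a_d | l_1)
    [0.1, 0.9],  # p(a_d | l_2)
    [0.5, 0.5],  # p(a_d | l_3)
]
label, scores = recognise(p_x_given_a, p_a_given_l, state_names)
print(label, scores)
```

Keeping the probability computation separate from the arg-max step mirrors the split between the third determining module and the identification module.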
Fig. 4 is a block diagram of an electronic device 400 according to an exemplary embodiment. As shown in Fig. 4, the electronic device 400 may include a processor 401 and a memory 402. The electronic device 400 may further include one or more of a multimedia component 403, an input/output (I/O) interface 404, and a communication component 405.
The processor 401 controls the overall operation of the electronic device 400 so as to complete all or some of the steps of the named entity recognition method described above. The memory 402 stores various types of data to support operation of the electronic device 400. Such data may include, for example, instructions for any application or method operated on the electronic device 400, as well as application-related data such as contact data, sent and received messages, pictures, audio, and video. The memory 402 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc. The multimedia component 403 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals. A received audio signal may be further stored in the memory 402 or sent through the communication component 405. The audio component further includes at least one speaker for outputting audio signals. The I/O interface 404 provides an interface between the processor 401 and other interface modules, such as a keyboard, a mouse, or buttons. The buttons may be virtual or physical. The communication component 405 is used for wired or wireless communication between the electronic device 400 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, near field communication (NFC), 2G, 3G, 4G, or a combination of one or more of these; accordingly, the communication component 405 may include a Wi-Fi module, a Bluetooth module, and an NFC module.
In an exemplary embodiment, the electronic device 400 may be implemented by one or more application specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, for executing the named entity recognition method described above.
In another exemplary embodiment, a computer-readable storage medium including program instructions is also provided; when the program instructions are executed by a processor, the steps of the named entity recognition method described above are implemented. For example, the computer-readable storage medium may be the above memory 402 including program instructions, and the program instructions may be executed by the processor 401 of the electronic device 400 to complete the named entity recognition method described above.
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings. However, the present disclosure is not limited to the specific details of the above embodiments. Within the scope of the technical concept of the present disclosure, various simple variations can be made to its technical solutions, and these simple variations all fall within the protection scope of the present disclosure.
It should further be noted that the specific technical features described in the above embodiments may be combined in any suitable manner provided that they do not contradict one another. To avoid unnecessary repetition, the present disclosure does not further describe the various possible combinations.
In addition, the different embodiments of the present disclosure may also be combined in any manner, and as long as a combination does not depart from the idea of the present disclosure, it should likewise be regarded as content disclosed herein.

Claims (10)

1. A named entity recognition method, characterized by comprising:
determining all possible true segmented words corresponding to the t-th target segmented word x_t in a text;
for each true segmented word, determining the first conditional probability p(a_d | l_i) that each segmentation state corresponds to that true segmented word, where a_d denotes the d-th true segmented word and l_i denotes the i-th segmentation state;
determining the third conditional probability p(x_t | l_i) that each segmentation state corresponds to the target segmented word x_t, according to the second conditional probability p(x_t | a_d) that each true segmented word corresponds to the target segmented word x_t and the first conditional probability p(a_d | l_i);
performing named entity recognition on the target segmented word x_t according to the third conditional probability p(x_t | l_i).
2. The method according to claim 1, characterized in that determining, for each true segmented word, the first conditional probability p(a_d | l_i) that each segmentation state corresponds to that true segmented word comprises:
for each true segmented word, determining the fourth conditional probability p(a_d | x_t) that the target segmented word x_t corresponds to that true segmented word;
estimating the first conditional probability p(a_d | l_i) that each segmentation state corresponds to each true segmented word, according to the fourth conditional probability p(a_d | x_t) that the target segmented word x_t corresponds to each true segmented word.
3. The method according to claim 2, characterized in that estimating the first conditional probability p(a_d | l_i) that each segmentation state corresponds to each true segmented word, according to the fourth conditional probability p(a_d | x_t) that the target segmented word x_t corresponds to each true segmented word, comprises:
determining, according to the following formulas (1) and (2), the y_i^d for which d(z_t, y_i) meets a preset condition as the first conditional probability that each segmentation state corresponds to each true segmented word:
where D denotes the total number of true segmented words; z_t^d denotes the fourth conditional probability that the target segmented word x_t corresponds to the d-th true segmented word; y_i^d denotes the first conditional probability that the i-th segmentation state corresponds to the d-th true segmented word; z_t = (z_t^1, ..., z_t^D) denotes the vector of fourth conditional probabilities that the target segmented word x_t corresponds to each true segmented word; y_i = (y_i^1, ..., y_i^D) denotes the vector of first conditional probabilities that the i-th segmentation state corresponds to each true segmented word; and d(z_t, y_i) denotes the relative entropy of z_t and y_i.
4. The method according to claim 3, characterized in that the preset condition is that the loss function is minimized, where T_i denotes the total number of target segmented words belonging to the i-th segmentation state l_i, L denotes the total number of segmentation states, and the indicator in the loss function denotes whether the i-th segmentation state is related to the target segmented word x_t: it is 1 if they are related and 0 otherwise.
5. The method according to any one of claims 1-4, characterized in that determining the third conditional probability p(x_t | l_i) that each segmentation state corresponds to the target segmented word x_t, according to the second conditional probability p(x_t | a_d) that each true segmented word corresponds to the target segmented word x_t and the first conditional probability p(a_d | l_i), comprises:
determining, according to the following formula (3), the third conditional probability p(x_t | l_i) that each segmentation state corresponds to the target segmented word x_t:
where D denotes the total number of true segmented words.
6. The method according to any one of claims 1-4, characterized in that performing named entity recognition on the target segmented word x_t according to the third conditional probability p(x_t | l_i) comprises:
determining the segmentation state corresponding to the maximum third conditional probability as the named entity recognition result of the target segmented word x_t.
7. A named entity recognition apparatus, characterized by comprising:
a first determining module, configured to determine all possible true segmented words corresponding to the t-th target segmented word x_t in a text;
a second determining module, configured to determine, for each true segmented word determined by the first determining module, the first conditional probability p(a_d | l_i) that each segmentation state corresponds to that true segmented word, where a_d denotes the d-th true segmented word and l_i denotes the i-th segmentation state;
a third determining module, configured to determine the third conditional probability p(x_t | l_i) that each segmentation state corresponds to the target segmented word x_t, according to the first conditional probability p(a_d | l_i) and the second conditional probability p(x_t | a_d) that each true segmented word determined by the second determining module corresponds to the target segmented word x_t;
an identification module, configured to perform named entity recognition on the target segmented word x_t according to the third conditional probability p(x_t | l_i) determined by the third determining module.
8. The apparatus according to claim 7, characterized in that the third determining module comprises:
a third determining submodule, configured to determine, according to the following formula (3), the third conditional probability p(x_t | l_i) that each segmentation state corresponds to the target segmented word x_t:
where D denotes the total number of true segmented words.
9. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1-6.
10. An electronic device, characterized by comprising:
a memory on which a computer program is stored;
a processor, configured to execute the computer program in the memory to implement the steps of the method according to any one of claims 1-6.
CN201811519563.6A 2018-12-12 2018-12-12 Named entity identification method and device, readable storage medium and electronic equipment Active CN109710927B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811519563.6A CN109710927B (en) 2018-12-12 2018-12-12 Named entity identification method and device, readable storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN109710927A true CN109710927A (en) 2019-05-03
CN109710927B CN109710927B (en) 2022-12-20

Family

ID=66256392

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811519563.6A Active CN109710927B (en) 2018-12-12 2018-12-12 Named entity identification method and device, readable storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN109710927B (en)

Also Published As

Publication number Publication date
CN109710927B (en) 2022-12-20

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant